Closed Bug 98304 (bz-postgres) Opened 23 years ago Closed 19 years ago

Make Bugzilla's SQL generally compatible with PostgreSQL (PgSQL)

Category: Bugzilla :: Database
Severity: enhancement
Priority: P1
Version: 2.15
Status: RESOLVED FIXED
Target Milestone: Bugzilla 2.20
Reporter: justdave
Assignee: mkanat
Attachments: 19 (all obsolete)

David Lawrence is already in the process of converting Bugzilla so that it will
work with either MySQL or PgSQL.  I couldn't find a bug to go with it, so I'm
creating one so that the necessary patches can be posted here once they've
gotten to a usable state.
Depends on: bz-enums
Depends on: 43600, bz-custres
A few quick comments from a quick look:

I think the driver statements should check that the driver is PostgreSQL too, and
fail appropriately if it doesn't recognise the driver.  When we add
drivers, I think it's better to give explicit error messages.

I think that the REGEXP/~ stuff, INSTR/STRPOS stuff, etc. should be put in subs,
eg RegExpSQL ($$).  This will make it easier to add new drivers to the code base
as well as making the code more readable, because the cross-DB support isn't in
the way.

Why is there still groupbitset stuff in here?  I thought we were removing that
stuff?
I agree that smaller utility functions such as SqlDate and the regular
expression stuff would make the code cleaner and require fewer changes
throughout all the files. I will make the necessary changes in the next day or
so. I left in the groupbit stuff to basically make the PostgreSQL port
functionally equivalent to the current Mozilla code base. The group changes are
a separate patch that will be merged in separately.
*** Bug 1104 has been marked as a duplicate of this bug. ***
This is now on the "we really want this for 2.16, but won't hold the release for
it if it's not done by then" list.
Priority: -- → P1
Target Milestone: --- → Bugzilla 2.16
Just attached a second-pass patch for PostgreSQL. The main differences are the
addition of two SQL utility functions, SqlRegEx and SqlStrSearch. These replace
the REGEXP and INSTR functions used by MySQL with functions that return the
proper syntax depending on which database you are currently using. Also I have
changed the syntax from

if ($::driver eq 'mysql') {

} else {

}

to

if ($::driver eq 'mysql') {

} elsif ($::driver eq 'Pg') {

}
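
For reference, the SqlRegEx idea boils down to something like this (a simplified
sketch only; the actual function in the patch may differ):

sub SqlRegEx {
    my ($expr, $pattern) = @_;
    # MySQL spells a regular-expression match REGEXP; PostgreSQL's
    # case-insensitive POSIX regex operator is ~*.
    if ($::driver eq 'mysql') {
        return " $expr REGEXP $pattern ";
    } elsif ($::driver eq 'Pg') {
        return " $expr ~* $pattern ";
    }
}

# Callers can then build SQL without caring which database is underneath, eg:
# SendSQL("SELECT bug_id FROM bugs WHERE " . SqlRegEx('short_desc', SqlQuote($re)));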

buglist.cgi is currently broken due to a LEFT JOIN added in by the new
SelectVisible() function. It generates the following error:

SELECT bugs.bug_id, bugs.groupset, substring(bugs.bug_severity, 1, 3),
substring(bugs.priority, 1, 3), substring(bugs.rep_platform, 1, 3),
map_assigned_to.login_name, substring(bugs.bug_status,1,4),
substring(bugs.resolution,1,4), substring(bugs.short_desc, 1, 60) FROM bugs,
profiles map_assigned_to, profiles map_reporter LEFT JOIN cc selectVisible_cc ON
bugs.bug_id = selectVisible_cc.bug_id AND selectVisible_cc.who = 1 WHERE
((bugs.groupset & int8(9223372036854775807)) = bugs.groupset OR
(bugs.reporter_accessible = 1 AND bugs.reporter = 1) OR
(bugs.assignee_accessible = 1 AND bugs.assigned_to = 1) OR
(bugs.qacontact_accessible = 1 AND bugs.qa_contact = 1) OR
(bugs.cclist_accessible = 1 AND selectVisible_cc.who = 1 AND
selectVisible_cc.who IS NOT NULL)) AND bugs.assigned_to = map_assigned_to.userid
AND bugs.reporter = map_reporter.userid AND (bugs.bug_status = 'NEW' OR
bugs.bug_status = 'ASSIGNED' OR bugs.bug_status = 'REOPENED') GROUP BY
bugs.bug_id ORDER BY bugs.bug_status, bugs.resolution, bugs.bug_status,
map_assigned_to.login_name, bugs.bug_status, priority, bugs.bug_id: ERROR:
JOIN/ON clause refers to "bugs", which is not part of JOIN at globals.pl line 235.

If someone has an idea on how to fix the error let me know.
I also need to attach the pgsetup.pl script for creation of PostgreSQL tables as
soon as I sync it up with the current checksetup.pl.
V0.2 has been checked in on branch Bugzilla_PgSQL_branch

you can get this with:

cvs checkout -rBugzilla_PgSQL_branch Bugzilla

or

cvs update -rBugzilla_PgSQL_branch

If you don't know how to merge with the trunk (which we'll need to do somewhat
frequently until this is ready for prime time) I can show you how to do that.

I also set up a tinderbox testing this, and it fails (Burning).  See
http://tinderbox.mozilla.org/Bugzilla/ and click the "L" to look at the error
logs.  If it's the test's fault because of something PgSQL does that we don't
handle correctly, let us know so we can fix the test. :)
Try moving the bugs table in the FROM part to just left of the LEFT JOIN?
Yeah, you need to have bugs LEFT JOIN cc, I think.

If you can't rework the regexp to do this (and doing so may break other
queries), then you'll need to do:

bugs AS selectVisible_bugs LEFT JOIN cc AS selectVisible_cc ON ...

and then add selectVisible_bugs.bug_id = bugs.bug_id to the WHERE part, unless
you can think of something better. I'd think that that would be slower, though.
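
To make that concrete, a trimmed-down sketch of the same query with bugs moved
next to its join (untested against the branch; only the join-relevant parts are
kept here):

SELECT bugs.bug_id
  FROM profiles map_assigned_to,
       profiles map_reporter,
       bugs LEFT JOIN cc selectVisible_cc
            ON bugs.bug_id = selectVisible_cc.bug_id
           AND selectVisible_cc.who = 1
 WHERE bugs.assigned_to = map_assigned_to.userid
   AND bugs.reporter = map_reporter.userid;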
We might do some of this for 2.16 but not all, so moving off to 2.18.
Target Milestone: Bugzilla 2.16 → Bugzilla 2.18
I have made a change to CanSeeBug to alleviate some of the joining problems.
Please see my comment made on bug 68022 which is relevant to this report also.
Status: NEW → ASSIGNED
Depends on: 120980
From IRC:

<bbaetz> I think we should do this in stages
<bbaetz> 1. group stuff
<bbaetz> 2. " -> '
<bbaetz>  /other sql syntax changes which don't affect mysql
<bbaetz> 3. checksetup/conversion/etc
<bbaetz> 4. transactions
<bbaetz> 5. constraints
<bbaetz> oh, and 5 may include ON DELETE CASCADE type stuff

4 and 5 may be swapped around/combined/etc
Oh, and somewhere in there we need to handle locking (coarse-grained locking
before transactions are supported is OK).

The pgsql branch currently just wraps all calls to LOCK/UNLOCK in an if
($::driver eq 'mysql') block - that's not a good idea...
The reason I conditioned out the LOCK stuff to be MySQL-only is because PgSQL
handles locking pretty well out of the box. It does also support explicit
locking using LOCK and UNLOCK, but the syntax is a little different than MySQL's,
so I chose the other route.
I must be missing something, then. What's to stop two simultaneous updates to a
bug, if we don't do locking? If I add an attachment (which for some reason
doesn't use an auto_increment field), what's to stop one getting one id, and the
other getting the same id (and the latter failing due to the uniqueness test)?
PostgreSQL has a special locking mechanism called MVCC (Multi Version
Concurrency Control). For more information on this check out:

http://www.postgresql.org/idocs/index.php?mvcc.html

Basically, when a user accesses the database, he/she gets a current snapshot of
the database to themselves. If they need a sequence number it grabs the next id;
if they do not use it and their session ends, it does not go back in the pool, so
it is lost. I don't really see this happening a lot, but if it does it is only a
single-number gap in the worst case.

PostgreSQL does support SELECT FOR UPDATE, so if we move in that direction in
the future, it shouldn't be a problem to get rid of the conditionals at that time.
OK, then I think I do misunderstand ;)

That only protects against other _uncommitted_ transactions, AIUI.

Consider the following, which is how editattachstatuses.cgi adds new attachments.

table test2 has two columns, id, and test_id, with a unique index on id.

user1                                user2
---------------------------------------------------------
bbaetz=> begin;                      bbaetz=> begin;
BEGIN                                BEGIN
bbaetz=> select max(id) from test2;
 max 
-----
   4
(1 row)
                                     bbaetz=> select max(id) from test2;
                                      max 
                                     -----
                                        4
                                    (1 row)
bbaetz=>insert into test2 values(5,5);
INSERT 18776 1
                                     bbaetz=> insert into test2 values(5,7);
                                     (blocks....)
bbaetz=>commit;
COMMIT
                                     ERROR: Cannot insert a duplicate key into
                                     unique index test2_id_index

Oops. So the index is still unique, but the second attachmentstatus addition
failed. Now our code has to know how to rollback, and start again. Yuck.

What the code really needs to do is to lock the table in SHARE ROW EXCLUSIVE
mode first. The current code does LOCK TABLES attachstatusdefs WRITE, before this.

MySQL only has read and write locks, which translate directly to SHARE and
ACCESS EXCLUSIVE modes, I think. ACCESS EXCLUSIVE is very heavyweight - it even
blocks other SELECTs. Here we can get away with something simpler, since we
don't mind other people trying to select on the table for display. (Of course,
if they want a read lock, they'd have to block.) Locks only matter within
transactions, too, and PostgreSQL doesn't have an UNLOCK command, so we'd have
to have pseudo-support for those, at least.

What am I missing? (If you want to discuss this on IRC, that would probably be
easier...)
That is one reason why I have found that doing SELECT MAX(id)+1 to get the
next primary key is not really a good idea. That is where sequences are helpful.
This is a paragraph from the PgSQL docs that explains a little.

To avoid blocking of concurrent transactions that obtain numbers from the same
sequence, a nextval operation is never rolled back; that is, once a value has
been fetched it is considered used, even if the transaction that did the nextval
later aborts. This means that aborted transactions may leave unused "holes" in
the sequence of assigned values. setval operations are never rolled back, either.

So if you were using sequences, which I am, you would also need to LOCK the
sequence itself to make sure that no one can get a number until you are
done, if the above method weren't implemented.
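
For comparison with the two-session example above, here is roughly what the
insert looks like when the id comes from a sequence (the sequence name is made
up for the example):

BEGIN;
INSERT INTO test2 (id, test_id) VALUES (nextval('test2_id_seq'), 7);
COMMIT;

-- nextval() hands each session a distinct value immediately, without waiting
-- for the other transaction to commit, so the unique index on id is never hit;
-- an aborted transaction simply leaves a hole in the numbering.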
Sure. But:

a) The code does this already, so we have to support it (and the frequency of
doing an INSERT for a new bug/etc is low enough that locking the sequence
table/using a trigger for MAX(id)+1/etc wouldn't introduce noticeable contention).

A sequence won't work for tokens, once we move to making them non-unique.

b) That was the simple case, but consider adding a vote to a bug. You haven't
changed doeditvotes.cgi, but it needs to be locked. Currently it deletes all
votes, then adds the existing ones, counting + updating the cache on the bugs
record. You don't want that happening concurrently... Even if it was using
RelationSet, you'd still have problems if the same user modified their votes
simultaneously.

(OTOH, looking through the code I see a lack of locking in places - eg I think
it's possible for the <title> of a bug to be different from its summary in the
textfield, since the SELECT is called once from show_bug, and again from
bug_form, while process_bug can happen in the middle. Using pgsql doesn't change
that, but if it was fixed, SHARE locking would be needed on the bugs table.)

I agree that pgsql can get rid of a large number of the locks, but not all of
them...
I think PostgreSQL will fail transactions that clash in that way, so they need
to be reapplied.
> I think PostgreSQL will fail transactions that clash in that way, so they need
> to be reapplied.

Unless you take care in locking, which is what we'd want to do, rather than
using serialized transactions and handling retrying ourselves.
Suggest moving transaction discussion to newly filed bug #121069 - transactions
are not necessary to support PostgreSQL AFAIK.
I suggest the first version of the PgSQL code should use locking in the same way
as MySQL does.

This would simplify the initial landing to what is necessary to get it working.

We can definitely use MVCC later but MVCC won't work unless you're using
transactions.
From the PostgreSQL documentation, table locking is only effective when 
utilizing transactions, which we do not use yet. Also, most of the necessary row locking
is done automatically for you by PostgreSQL for most common operations such
as SELECT, UPDATE, INSERT, and DELETE. Please refer to the following page
and correct me if I am off base.

http://www3.us.postgresql.org/users-lounge/docs/7.1/reference/sql-lock.html
bbaetz mentioned this on IRC and mooted the idea that for a first pass we could
replace locks and unlocks with equivalent subs that did an implicit transaction
for PgSQL.  This would be reasonably easy.

Maybe we will introduce proper transactions for 2.18 but it would be nice if it
was a separate patch.

The full transactionising task is probably a lot harder; we need to audit for
places where there is insufficient locking currently.  We know some of these
places exist.
And, again, I want to try it out, _then_ work on an API. When I get time, I'll
sit down with some paper, a pen, and the bugzilla source, and work out what
locks we need (as opposed to what we currently have)
Note to pgSQL implementers...
When running a query that relies on the ordering of NULL values versus non-NULL
values in SELECTs, MySQL and PgSQL follow opposite conventions.
MySQL 3.23 puts NULL first if ascending, last if descending
MySQL 4.0 documentation claims NULL is first regardless
pgSQL documentation claims NULL is last if ascending and first if descending.

Since doing JOINS with ISNULL is vital to get decent performance out of all of
the maps Bugzilla is getting, it is a good idea to make a provision for
portability on this.  ORDER BY statements that are relying on this behavior
should be able to do something like "ORDER BY fieldname $::NullFirst" where
NullFirst would be defined according to the database in use.
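
A rough sketch of the kind of provision being suggested (the global is
illustrative and nothing like it exists in the tree yet; note that flipping
ASC/DESC also reverses the non-NULL rows, so each query has to pick its
direction deliberately):

# Set once, next to the driver detection:
if ($::driver eq 'mysql') {
    $::NullFirst = 'ASC';    # MySQL 3.23 sorts NULLs first on an ascending sort
} elsif ($::driver eq 'Pg') {
    $::NullFirst = 'DESC';   # PostgreSQL sorts NULLs first on a descending sort
}

# A query that depends on the NULL rows sorting first would then say:
# SendSQL("SELECT bug_id FROM bugs ORDER BY qa_contact $::NullFirst");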

Attachment #55101 - Attachment is obsolete: true
Attachment #55518 - Attachment is obsolete: true
Attachment #55523 - Attachment is obsolete: true
Attached patch Port CVS Head to PostgreSQL (v1) (obsolete) — Splinter Review
This is an initial first pass at making the new CVS head with the integrated group
schema changes work with PostgreSQL. Still needs work:

1. regular expression searches in Search.pm
2. buglist.cgi has permission checking turned off to allow it to work. Probable
cause of error is different JOIN syntax with Pg.
3. Attachments are large text fields for now. Need to be converted to support
Pg's large binary object interface.
4. Some substring searches are probably broken too in Search.pm but I have not
tried them yet.

Most of this stuff just needs me to bring over my changes from the previous
PostgreSQL port, which had most of these working.
Comment on attachment 100291 [details] [diff] [review]
Port CVS Head to PostgreSQL (v1)

This diff has a whole lot of other changes - I suspect it wasn't diffed against
the correct base tag.

A lot of the conditional stuff shouldn't be there, since it's valid for mysql
too (eg the 'delete from logincookies' stuff)

What mode does |LOCK TABLE| w/o an explicit mode use?

Can't you handle the LOCK stuff in SendSQL, like you do UNLOCK? Also, unless
you lock the tables in the same order (eg alphabetical), Pg will end up in a
deadlock state eventually.
Attachment #100291 - Flags: review-
Attachment #100291 - Attachment is obsolete: true
Attached patch Port CVS Head to PostgreSQL (v2) (obsolete) — Splinter Review
Second attempt. Made changes based on bbaetz's suggestions.

Transactions seem to work properly now with AutoCommit set to off.
No if/else for LOCK TABLES. Handled now in SendSQL.
Added support functions for SqlRegEx() and CurrId().
Changed occurrences of INSTR() to POSITION(), which both MySQL and PgSQL seem to
like.
Your REPLACE changes are subject to races - for pg, you need to LOCK the table
in share row exclusive (I think) mode, then SELECT FOR UPDATE the row.

You need the lock in case the entry doesn't exist, and the FOR UPDATE in case it
does.

HandleError should really be a die, but I've handled that separately. Implicitly
rolling back won't help, because then you'd have subsequent statements executing.

Please do not use an END sub - that's just going to stuff mod_perl up. What you
should do instead is to have the LOCK commands change autocommit, then call
->begin, and have unlock do ->commit. Yes, this isn't safe, but it's the best
you're going to get for the moment.
Actually, after a bit of Google searching, the best replacement for REPLACE in
this case is:

BEGIN
LOCK TABLE foo IN SHARE ROW EXCLUSIVE MODE
DELETE FROM foo where (unique_key=val);
INSERT INTO foo (unique_key,a,b) values (unique_key,new_a_val,new_b_val);
COMMIT

This allows concurrent processes to access the old value until this is committed
(including other SELECT FOR UPDATEs on other rows), but blocks any updates (to
any row). Given the shortness/non-perf-critical nature of this, I don't think
it's a problem. (It also means that we don't need to SELECT first, and then test
the result.)

For mysql, just use a write lock on the table.

I do note that query.cgi's use of REPLACE looks racy, because there aren't any
locks on the namedqueries table...
Correcting summary to make bug easier to find.
Summary: Allow Bugzilla to work with Postgres SQL (PgSQL) → Allow Bugzilla to work with PostgresSQL (PgSQL)
PostgresSQL -> PostgreSQL

If you're going to be pedantic... :)

'postgres' should still hit as a case-insensitive substring.
Summary: Allow Bugzilla to work with PostgresSQL (PgSQL) → Allow Bugzilla to work with PostgreSQL (PgSQL)
Attached patch Port CVS Head to PostgreSQL (v3) (obsolete) — Splinter Review
Various bug fixes. 
Better locking/transaction support for PostgreSQL.
Much testing needs to be done.
Attachment #100656 - Attachment is obsolete: true
I really really really think that you're going to want to use checksetup. With
other db support coming along, having to update the schema in 5 different places
is just going to be a nightmare.

Also, you probably want to create tables WITHOUT OIDS, to save space.
Depends on: 114696
Attached patch Port CVS Head to PostgreSQL (v4) (obsolete) — Splinter Review
I felt the need to sync this patch because I'm basing the Sybase conversion on
it.

globals.pl had a conflict with the E|A|R patch that I had a hard time figuring
out (postgres removed a column from a query and added another one, and E|A|R
added the same new column, but didn't remove the other one)

I wiped out reports.cgi from this patch.  It got *completely* rewritten
recently, so we're better off starting over with that than trying to resolve
the 450-line conflict :-)

pgsetup.pl is grossly out of date.  We've had a lot of schema changes in the
last month.

I *will* be doing some more work on this in the next few days, since I have a
deadline to meet with the Sybase conversion.

I have postgresql 7.2.2 installed on Landfill, and an installation running this
patch there also.  I'll publish the URL as soon as I get pgsetup.pl up-to-date
and run it. :-)
Attachment #102377 - Attachment is obsolete: true
Depends on: 156834
No longer depends on: 120980
Attachment #105532 - Attachment is obsolete: true
Attached patch Port CVS Head to PostgreSQL (v5) (obsolete) — Splinter Review
Updated patch with PostgreSQL changes ported to current CVS. Will post a new
patch soon with the merged checksetup.pl that I worked on while on my road tour ;)
Alias: bz-postgres
Depends on: 174295
Depends on: 190432
Comment on attachment 107200 [details] [diff] [review]
Port CVS Head to PostgreSQL (v5)

This has bitrotted. A few quick comments:

Some of the mysql vs pg differences don't need to be there. Date arithmetic can
mostly be done the same way with mysql 3.23, I think (ie foo + interval '1
day'). However, pg requires quotes around the '1 day', and mysql doesn't allow
them. Sigh. Anyway, we can abstract that, which is probably better/easier/etc.

The CurrId stuff should take two arguments, rather than a :-separated string
which is split.

Would it be worth installing sql functions for the mysql IF function?
Similarly for IFNULL.

For attachment.cgi, use a recent DBD::Pg version, and bind the var as a blob.
See the dbi-dev/pg-interfaces discussions from a couple of months back for details.

Avoid extra indentation - it just makes stuff hard to follow.

Since you have presumably found all of the places delta_ts needs to be updated
(because pg doesn't have an auto-updating timestamp type), it's probably worth
splitting that out, making it a 'normal' time val on the mysql side, and then
you can drop the date format stuff.

Can the security stuff be rewritten to use EXISTS for pg, and thus be faster?
Attachment #107200 - Flags: review-
I'd like to try out and help debug PgSQL support. Given that the most recent
patch has bitrot, where can I obtain the latest code? Is there a branch in CVS?
The attachment has code like this in several places:

  if ($::db_driver eq 'mysql') {
    do it the mysql way
  } elsif ($::db_driver eq 'Pg') {
    do it the pg way
  }

And, of course, my imagination extends it like this:

  if ($::db_driver eq 'mysql') {
    do it the mysql way
  } elsif ($::db_driver eq 'Pg') {
    do it the pg way
  } elsif ($::db_driver eq 'oracle') {
    do it the oracle way
  } elsif ($::db_driver eq 'informix') {
    do it the informix way
  } elsif ($::db_driver eq 'sybase') {
    do it the sybase way
  } elsif ($::db_driver eq 'ehm... msaccess' ) {
    whatever
  }

All this in numerous places around numerous files. Do we really want it like
this? How about creating an abstract database interface? (Don't hit me!)
WRT comment 45,

  The lowest common denominator between all those DBs is really low.  Hopefully,
*MOST* places where DB operations are done can be common to all the DBs.

I think comment 45 was asking if we could consolidate all of the
database-specific stuff into one file for better tracking...
Yeah, we did a Bugzilla/DBCompat.pm for the Sybase port that contains stuff like:

sub SQLNow {
    if ($::db_driver eq 'mysql') {
        return 'NOW()';
    }
    elsif ($::db_driver eq 'Sybase') {
        return 'GETDATE()';
    }
}

So then when we need to get the current time, we can do this:

SendSQL("SELECT @{[SQLNow()]}");

(just a simple example, there's worse ones, like date formatting and date math
and so forth).

It's on my long list of things to get contributed one of these days.
Just FYI, I just posted the DBCompat.pm to bug 173130
With comment 45 I was actually suggesting what I just now filed as bug 211769,
but there are also less extreme solutions, such as bug 131136 or Dave's
DBCompat.pm file.
Attached file Schema for Bugzilla (obsolete) —
This is a schema based on CVS Bugzilla's database creation as per
checksetup.pl.	I think the way to go would be to rip all database specifics
from the checksetup.pl and make files like: schema.pgsql, schema.mysql,
schema.oracle, etc.  I have some work on that which I hope to post soon.
Attached file Compatibility layer for PostgreSQL (obsolete) —
Based on Dave Miller's DBCompat.pm for Sybase/MySQL I added PostgreSQL code to
the file.
This is an updated pgsetup.pl file.  It creates the database, sets things up. 
All without problems and using a schema which benefits pgsql.
Attachment #130562 - Attachment is obsolete: true
Attachment #130925 - Attachment mime type: text/html → text/plain
This new compatibility layer replaces the previous one I submitted.  It has
stubs for Oracle, introduces a DBConnect function/method (which needs to be
fixed wrt db_user and db_pass, I can get them from Bugzilla::Config, but it is
not picking them up, help appreciated), and some more fleshed out skeleton
documentation.
Attachment #130751 - Attachment is obsolete: true
Comment on attachment 130979 [details]
PostgreSQL compatibility layer, revision 1

People should use the Bugzilla::DB connect method instead.
The Red Hat guys did a quick 'n dirty port.  It works, but doesn't quite make 
use of the best of PostgreSQL.  Also, their tarball is out of date with the 
current schema used by Bugzilla.
dkl:  in two weeks it will have been one year since the last time you touched
this bug.  Is there any chance at all of you being able to work on this again to
get it landed, or should we allow someone else to take it over and drive it in?
Going with bbaetz's comment 39:

One thing that really needs doing is a unified way of doing checksetup.pl so
that it works with all the databases.  Having a separate schema file for each
database is going to get out of sync fast, and people who add a feature that
requires schema changes are going to forget to update the schema files other
than the one they're using.  It would be best if we had some generic way to
describe the schema that all of them used, but that could easily be adjusted to
suit the individual databases.

A Perl hash which describes the schema would probably work well for this.  The
database could be a single object which contains an array of table objects,
which contains an array of column objects, index objects, references to other
tables, etc.  We should have our own definitions for table types that are a
superset of everything available on the different databases, and let the actual
setup code pick the best type for that database based on that.  Using specific
hash keys also allows database-specific keys to be added where needed without
interfering with things the setup scripts for other databases expect to find,
but keeping it all in the same place.
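
Just to illustrate the shape of the thing (all the key names and abstract type
names here are made up for the example; nothing like this exists in the tree yet):

our %abstract_schema = (
    attachments => {
        FIELDS => [
            attach_id => {TYPE => 'MEDIUMSERIAL', NOTNULL => 1, PRIMARYKEY => 1},
            bug_id    => {TYPE => 'INT3', NOTNULL => 1,
                          REFERENCES => {TABLE => 'bugs', COLUMN => 'bug_id'}},
            ispatch   => {TYPE => 'BOOLEAN'},
        ],
        INDEXES => [
            attachments_bug_id_idx => ['bug_id'],
        ],
    },
    # ...one entry per table...
);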
I've started something along these lines today, which I think could work well,
except for the enumerated fields. It is quite possible to constrain these values
within a text field, but this will lead to some nasty bloatage. If I were doing
this in PostgreSQL (or almost anything but MySQL) from scratch I would put these
values in a table (or one per enumeration) with a code for each value and then
use a foreign key constraint on the code in the referencing table, possibly
adding functions to encode/decode values. But you can only do FKs with MySQL
3.23.44 and up and then only using InnoDB, and you can only define functions in
C, as I understand it. 

I'm sure there are many other wrinkles (e.g. fulltext) but that's what I came
across first up.

So although a super-schema sounds nice in theory, ISTM in practice that it might
not work so well, and a better way would be to have a schema per RDBMS
supported, and functions to go along with it that hide the actual implementation
from most of the code.
The problem with multiple schema documents is who is going to maintain them? 
Anyone maintaining a database schema will have to monitor all changes to any of
the other schemas and make changes to match.  With the generic scheme it's
likely someone with knowledge of only one database could add to that schema (if
we have limitations such as "no enums" spelled out) and have a reasonable
certainty that it would still work on the other databases.
I mostly agree with comment 60, and I've made some additional comments
supporting this in my disillusioned bug 211769 comment 1. A lowest common
denominator schema may be possible, however, although it might be better to use
lowest common denominator CREATE TABLE statements instead of Perl data structures.
There's no such thing as a lowest-common-denominator CREATE TABLE statement,
especially because of data types.  Although a lot of databases claim to be
"ANSI-compatible", a lot of them (like Sybase) suck at it in practice.  Sure, the
table create works, but you wind up with crappy performance.  It would be better
for performance reasons to declare our own datatype for each column which
explicitly defines the type of use we have for that column, and let each
database driver translate that to the most effective datatype for that database.

There may be cases where 3 different columns might have the same datatype in the
MySQL schema, but the most effective schema in some other database might use
different types for each of the three.

For example, there are a number of tables that have what are essentially boolean
columns in them (attachments.ispatch, groups.isbuggroup, bugs.cc_accessible,
etc).  MySQL doesn't have booleans, so they're TINYINT in our schema.  Postgres
and Sybase both have boolean types, and in most cases it would make better sense
to actually use them.

Other differences (like how many digits versus how many bytes for integers, and
how many characters vs how many bytes in unicode fields, etc) could come into
play, too.
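
Roughly the kind of translation layer I mean (the abstract type names are
invented for illustration):

my %type_map = (
    mysql => {
        BOOLEAN      => 'tinyint',
        MEDIUMSERIAL => 'mediumint auto_increment',
        LONGTEXT     => 'longtext',
    },
    Pg => {
        BOOLEAN      => 'boolean',
        MEDIUMSERIAL => 'serial',
        LONGTEXT     => 'text',
    },
);

# Each driver picks the concrete type for an abstract column type.
sub db_specific_type {
    my ($abstract_type) = @_;
    return $type_map{$::db_driver}{$abstract_type};
}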
I expect to have something along the lines of Dave's suggestion in a day or two.

If we are fine saying "no enums" etc. then we might get somewhere. Can we also
say MySQL >= 3.23.44 and InnoDb?

In any case it will mean major surgery, but it is the Right Thing (tm).
Databases like the older versions of MySQL could even ignore the foreign key
references, just like we do now.  And if the schema is in a perl module file
which can be loaded by other scripts than just checksetup.pl, then sanitycheck
can use those reference definitions to test for it on the databases that don't
enforce it themselves.
Before anyone goes off to write their own generic database schema module,
please take a look at DBIx::DBSchema.  It currently supports MySQL,
PostgreSQL and (partially) Sybase.  It's being actively updated, and I'm
sure the author would welcome patches.

http://search.cpan.org/~ivan/DBIx-DBSchema-0.22/

As far as making database connections more transparent, the DBIx::AnyDBD
module may help with that.

http://search.cpan.org/~msergeant/DBIx-AnyDBD-2.01/
Having spent a few weeks hacking on the PostgreSQL support before I got pulled 
into migrating the office to a new building over the course of a month I can 
tell you that not only the schema is important to fix.  The entire code as it is 
is good for MySQL, but pretty braindead if you want to use a real RDBMS.  The 
way MySQL locks tables is unacceptable for (most) other databases.  So real 
transactions need to be introduced in order to solve these things.  Doing that 
also requires a lot of rewrites on the source level.  I am not sure this is 2.18 
material.  Also, a lot of the backwards compatibility cruft isn't helping 
rewriting the code.  Yes, I understand the importance, but man, it is a LOT of 
additional stuff you have to worry about.
Usually when you have to rip everything out and rewrite it, the project just
never gets done.  That might be where this bug is.  "baby-steps" is my motto.

Would it be possible to divide this project up in to smaller bits and work away
at it over a few releases?  For instance, get DBIx::DBSchema and
DBIx-AnyDBD-2.01 supported in spirit for optional use.  Then when somebody
touched a bit of code they could opt to use DBIx::DBSchema instead of writing
mysql-compatible code directly.  Helpers could port small bits as they have
time.  For a while, you'd have the overhead of the abstraction to still only
support one database, but eventually enough would get done that one final slog
could pick up the leftover bits and complete the transition.

Even though the bug is for pgsql support, database independence is the nicer
goal.  Comments show some conflicts though: independence (mysql,pgsql,sybase),
ease of coding, and speed (write 1 SQL statement vs n optimized SQL statements).
 I think it's a 'pick any two' situation.  It would be helpful to pick the two
before embarking on the solution.

If all the SQL were sql-standard, that would be a good start.  There are
probably several queries that are not taking advantage of any mysql tricks that
could be brought up to standard which would help get the ball rolling.  There's
a web validator here:  http://developer.mimer.com/validator/  Maybe helpers
could pick off some of those queries to chip away at the problem?

Of course, to even do that, we'd have to pick the standard first (92?), which is
a balancing of trade-offs.  [aside: The only reason vendors get away with shoddy
standards support is because the customers let them...]   Mozilla.org stuff
_tends_ towards supporting standards on principle.  I'd be happy to file some
breakout bugs if any of this sounds reasonable.
Some things can be done in baby steps, some can't - they just require big jumps.

I tried getting DBIx::DBSchema working, but it wanted a version of DBD::Pg that
isn't even on CPAN, so I gave up. 

I now have a sort of abstract database structure and some database dependent
stuff that will now generate DDL for both MySQL and PostgreSQL and is easily
extensible to other Dbs. It has the whole Bugzilla schema down. It even has
rules for emulating MySQL's hokey auto-update-timestamp thingy in PostgreSQL.
The one MySQL thing that is gone is the enums in the bugs table - replaced by FK
references to tables holding the allowed values. It still has some testing to go
but appears to work. The aim is to have something that doesn't bitrot when the
schema changes.
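
(For what it's worth, the auto-update-timestamp emulation boils down to a
PostgreSQL trigger, roughly like the following; the names and the exact form
the generator emits may differ, and plpgsql has to be installed in the database.)

CREATE FUNCTION update_delta_ts() RETURNS trigger AS '
BEGIN
    NEW.delta_ts := now();
    RETURN NEW;
END;
' LANGUAGE 'plpgsql';
-- (PostgreSQL 7.2 and earlier spell the return type "opaque" instead of "trigger")

CREATE TRIGGER bugs_update_delta_ts
    BEFORE UPDATE ON bugs
    FOR EACH ROW EXECUTE PROCEDURE update_delta_ts();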

However - and this is really where the non-baby steps come in - what I think is
really needed is to provide a layer between the cgi code and the database, so
that the cgi code does no SQL construction and the database layer does no
presentation. It just becomes procedural - make a call, get back a hash. I've
done quite a few Perl webapps (in the commercial world) using this architecture,
and ported them with minimal fuss between MySQL, PostgreSQL and Oracle. *BUT* it
is a big job, and before I can get started on it I need to know the code far
more intimately.

I would like very much to get this done for 2.18, but I have no idea of the
timeframe involved. If anyone can enlighten me a bit on that it would help me to
prioritize (WAGs OK). 
Excuse an amateur, but could the "layer between the cgi code and the database" 
referred to in #69 be ODBC?

* Almost all databases have ODBC support
* there is a DBI interface
    http://search.cpan.org/~jurl/DBD-ODBC-1.06/ODBC.pm
* there is plenty of experience in using ODBC on the Win platform
* there are free ODBC drivers for non-Windows platforms, for example unixODBC

/Mike (don't reinvent the wheel)
The key issue is not how one communicates the SQL itself to the server or even
the exact syntax of the table creation commands.

The key issue is that the various databases have fundamental differences in
their approach.  MySQL (MyISAM) requires table locking.  PgSQL and Oracle support
transactions (as does MySQL InnoDB).  Some databases support boolean operations
while others do not.  The behavior of NULL values varies by database.  There are
differences in the degree of support for foreign keys and enumerated datatypes.
Some DBs handle LEFT JOINs very efficiently while others demand that they be
avoided.  Some DBs have POSIX regular expression support while others lack it.

All of those issues currently have driven design decisions that are captured in
the very way that Bugzilla is structured.  Moving to a DB-agnostic structure
will require some rather dramatic changes.  

My suggestion is we start by evaluating....
a) Moving recommended MySQL version to 4.something
b) Migrating from MyISAM to InnoDB table types
c) Replacement of MySQL-specific SQL with more generic SQL
d) Freeze of checksetup.pl (upgrade to final MySQL-only version)
e) Creation of new checksetup.pl replacement that abstracts multiple databases.
(No need to port old conversion code.... existing installations can be presumed
to have MySQL, so they can be upgraded to final state of checksetup.pl.  Only
the new checksetup.pl replacement needs to be maintained beyond that point.)
This is some proof of concept code that aims at the 'single schema - multiple
database engines' idea that Dave proposed.

I haven't worked on integrating it yet with the bugzilla code, and
unfortunately a large project has just landed on my plate, so it might be a
while before I can get back to it.
That proof-of-concept code looks nice. However, arrays might not be the way to go.

I'm almost sure that the way to go is something like DBCompat.pm, overriding it.
I'd like to see Bugzilla::DB::Pg, Bugzilla::DB::MySQL, Bugzilla::DB::Sybase,
etc. (In fact, I thought this _was_ the way we were going -- I thought I talked
about this with somebody already, who expressed this.)

Impractical Theoretical Solution:
In an ideal world, I'd have every bug as an object, and have every object load
itself into memory with all of its fields initialized by reading the DB, and
have it write itself back out to the database when needed. I'd abstract out
every DB read/write to getters and setters. :-) Basically, under most
circumstances, I'd have to have an object in memory for every bug in a database.
(The database just becomes a persistence layer.) However, I've implemented
solutions like this before, and the memory usage is larger than most Bugzilla
administrators would be likely to accept.

[Note: There are other ways to implement this that _would_ be practical in
Bugzilla, using memcached and lazy reads.]

Though that's impractical, there is something to be gained from it. It
demonstrates that the root of the problem lies in abstraction of database reads
and writes. (I think this is what justdave was getting at.)

So, the general, ideal, practical solution is just more abstraction, one way or
another.

I think a good first step would be to have checksetup call functions in
DBCompat.pm, just like the rest of Bugzilla should.

Possibly, it would mean a LOT of functions in DBCompat.pm. I think that's okay.

DB-independent schemas are a really cool idea, and they're a lot of fun to
think about.

By the way, another possibility would be to take that proof-of-concept code and
make it more modular, using objects.

I'm very willing to work on this.

-M
Depends on: 228917
Regarding use of arrays: it was done deliberately where order could be important
- namely tables (for FK dependencies) and fields (to preserve field order).

Regardless of how OO it is made, ISTM it will only be maintainable if a
DB-independent schema is kept in *one* place. Otherwise it will be far too easy to
forget to update some DB's schema and break things. That's what I tried to do.
Wrapping some OO-ness around that would be hard.
We need to name all of our indexes, so that we can drop them in the schema change
code.

How will the schema change stuff happen independently?
>How will the schema change stuff happen independently?

I think DBIx::AnyDBD could handle that.  You'd have to write methods like
addColumn("table", "colname", "coltype"), deleteColumn("table", "colname"),
etc. for each supported database; checksetup.pl (or its replacement)
could then call the "generic" method, which would then call the appropriate
database-specific method depending on the current Bugzilla configuration
(which would determine which database is being used).

See also Comment #66.
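
A rough sketch of that dispatch (method names from above; the bodies are
illustrative only):

sub addColumn {
    my ($table, $colname, $coltype) = @_;
    if ($::db_driver eq 'mysql') {
        SendSQL("ALTER TABLE $table ADD COLUMN $colname $coltype");
    } elsif ($::db_driver eq 'Pg') {
        # PostgreSQL 7.x can't add NOT NULL / DEFAULT in the same statement,
        # so a real implementation would issue follow-up ALTER TABLEs here.
        SendSQL("ALTER TABLE $table ADD COLUMN $colname $coltype");
    }
}

# checksetup.pl (or its replacement) then only ever calls addColumn(), and the
# driver-specific branch is picked by the current Bugzilla configuration.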
By the way, I'm not sure that DBIx::AnyDBD is really what we're interested in,
anyhow. That's what DBI is for.

I think that DBIx::DBSchema and DBIx::Abstract are what's interesting.

-M
Be careful about loading up more module dependencies - I had trouble getting
DBIx::DBSchema installed. Your (Max K-A's) earlier suggestion of taking my POC
code and OO-ifying it might be a better way to go.
Regarding Comment #58:

I noticed that Red Hat has a new download on their Bugzilla index.cgi
page (dated 2003-11-20) for a version of Bugzilla that uses PostgreSQL.
I have no idea if this is DKL's work, or someone else's.

  https://bugzilla.redhat.com/bugzilla/index.cgi

Here's the direct download link:

  http://bugzilla.redhat.com/download/bugzilla-redhat-20031120.tar.gz
According to Dave Lawrence at RedHat, the version that you see on their website
is based on version 2.17.1:

It is based on 2.17.1 which is available at http://www.bugzilla.org.

On Mon, 2004-01-26 at 15:34 -0500, Joshua Kramer wrote:

>> Hello,
>> 
>> I see you have a PostgreSQL version of Bugzilla at
>> https://bugzilla.redhat.com/bugzilla/index.cgi.  What version of Bugzilla
>> is this based on - does it have all of the features and fixes of 2.16?
>> 
>> Thanks,
>> --Josh
>> 
Okay, so there are two actual bugs here -- one for schema, and one for the SQL
that Bugzilla uses. Since all the docs for PG integration point to this bug,
let's make this the SQL bug. I'll file another bug for DB-independent schema,
and move the important attachments and CC's there. When it's all made, I'll post
the bug number here.

Also, dkl hasn't worked on this in a while, I'll assign it to nobody for now.

-M
Assignee: dkl → nobody
Status: ASSIGNED → NEW
Summary: Allow Bugzilla to work with PostgreSQL (PgSQL) → Make Bugzilla's SQL compatible with PostgreSQL (PgSQL)
All right, it turns out that bug 146679 was close enough to what we were talking
about that our ideas could be merged over there. DB Schema conversation goes
there. For this bug, let's focus on the actual SQL that Bugzilla sends, outside
of checksetup.pl.

What I suggest is to use this as a tracking bug. Find little points within
Bugzilla where you could make the SQL DB-independent, and then file a bug
against that part, blocking this bug.

Once that is complete, we can file another bug for "use DBCompat.pm", which I
think is what we should do for the differences between databases that can't be
resolved.

This means that certain bugs actually now block the new bug, not this one, so
those blockers will be moved there, to bug 146679, and that bug will be marked
as blocking this one.

-Max
Blocks: bz-dbschema
No longer depends on: bz-enums
No longer blocks: bz-dbschema
Attached patch Port CVS Head to PostgreSQL (v6) (obsolete) — Splinter Review
Another stab at creating a better patch against CVS head. I just noticed the
compat layer created earlier by another person. I have also created a
DBcompat.pm module; I will look at merging the better of the two into one
and upload a new patch soon. A couple of things to note: FlagType matches are
commented out in Bug.pm until I figure out what is wrong. Also I have included
a pgsetup.sql script for database creation until I or someone else can create a
unified checksetup.pl in order to get rid of pgsetup.pl. Please let me know
what you all think about the latest patch.
Attachment #107200 - Attachment is obsolete: true
Comment on attachment 141569 [details] [diff] [review]
Port CVS Head to PostgreSQL (v6)

Hrm, this is a big hulking patch. I'll look over DBCompat first.

My first note is that I'm *really* happy about the subroutine docs. :-)

>+# Contributor(s): Terry Weissman <terry@mozilla.org>

   Hrm... aren't *you* actually the contributor for this one? :-)

>+sub GetLastInsertID {
>+    my ($table, $column) = @_;
>+    if ($db_driver eq 'mysql') {
>+        return "SELECT LAST_INSERT_ID()";

     Is this actually guaranteed to do what the documentation above says? If I
insert into table A, and then insert into table B, and I call this on table A,
I get...?

     You might also want to note in the subroutine doc that this should always
be called inside of a transaction, so there aren't threading issues. (Thread1:
INSERT Thread2: INSERT Thread1: GetLastInsertID())

>+# subroutine:  RegExp
>+# description: Outputs SQL syntax for doing regular expressions searches in format
>+#              suitable for a given database.
>[snip]

    An example of how this works would be useful to me, and probably to others.
I don't fully understand how that function is used, looking at the code of it.
(Is it actually complete?)

>+sub DateFormat {
>[snip]

   I like this because it allows us to keep the MySQL syntax we have there
already.

>+sub ToDays {
>+    my ($column) = @_;
>+    if ($db_driver eq 'mysql') {
>+        return " TO_DAYS($column) ";
>+    } elsif ($db_driver eq 'Pg') {
>+        return " $column ";
>+    }
>+}

     For Pg, I think that should actually be to_char($column, 'J')::int, based
on some code I've fixed in our local Pg Bugzilla. That returns the
date/timestamp as # of Julian days, which can be added/subtracted from other
numbers of the same type.

>+sub Rand {
>+    return " RAND() " if $db_driver eq 'mysql';
>+    return " RANDOM() " if $db_driver eq 'Pg';

Nit: The Pg idiom seems to be lowercase function names. (At least in the docs.)

-M
I see that you have some dummy code in DB.pm, too, for handling LOCK TABLES and
UNLOCK TABLES. Is there a reason that you added the "return;"?

I think a better solution than this would be to add a Lock($table, $row) and
Unlock($table, $row) function to DBCompat.pm, or even to DB.pm. Then, change all
the LOCK TABLES and UNLOCK TABLES statements to that (which would be a simple
grep-then-edit). That way, for MySQL you could ignore the $table and $row, and
for PgSQL you could get more granular locks.
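
Something like this is the shape I mean (just a sketch; the Pg branch uses the
SHARE ROW EXCLUSIVE mode bbaetz described earlier in this bug, and MySQL simply
ignores the row argument):

sub Lock {
    my ($table, $row_id) = @_;
    if ($::db_driver eq 'mysql') {
        SendSQL("LOCK TABLES $table WRITE");   # $row_id ignored; whole-table lock
    } elsif ($::db_driver eq 'Pg') {
        SendSQL("BEGIN");
        SendSQL("LOCK TABLE $table IN SHARE ROW EXCLUSIVE MODE");
        # a row-level alternative would be SELECT ... FOR UPDATE on $row_id
    }
}

sub Unlock {
    my ($table) = @_;
    if ($::db_driver eq 'mysql') {
        SendSQL("UNLOCK TABLES");
    } elsif ($::db_driver eq 'Pg') {
        SendSQL("COMMIT");   # Pg releases locks at end of transaction
    }
}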
I hope you can forgive the spam. There's just no way to digest that whole patch
at once. :-)

Further comments:

+ You have a few if(db_driver) statements that do two whole chunks of SQL for
only a difference between (e.g.) "INTERVAL 3 DAY" and "INTERVAL '3 days'". (See
userprefs.cgi for an example.) 
     I think that a TimeInterval($num, $type) function in DBCompat would do the
trick. You could do a switch statement on $type, which would be either "D" or
"days" or something like that, depending on your (dkl) personal preference.

+ There's a lot of DateFormat that goes on in the patch. I would think that we
should do the DateFormat on the presentation side, and let the databases handle
their own date types internally. I think it makes it simpler overall. The
double-comment bug over on the RH BZ
<https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=114353> is an example of what
can go wrong.

+ In query.cgi, you changed a REPLACE into an INSERT. If there's a PRIMARY KEY
on that table (I didn't check), then those two won't actually be the same.

+ In process_bug.cgi:

>                detaint_natural($::FORM{"$field"});
>                SendSQL("UPDATE longdescs SET isprivate = $private
> 	
>                        WHERE bug_id = $::FORM{'id'} AND " .
>
> Bugzilla::DB::DateFormat('bug_when','%Y%m%d%H%i%s') . 
>                        " = " . SqlQuote($::FORM{"$field"}));

  You don't need to SqlQuote that field, since detaint_natural assures that it's
an integer. (However, now that I'm thinking about it, why in the world is
bug_when being compared to an integer??)

+ In long_list.cgi, you changed a "IFNULL(bugs.alias, '')" to just "bugs.alias"
-- are you certain that that is safe?

+ In buglist.cgi
   * You manually limited the buglist to 2000 results. Is that a Red
Hat-specific thing, or a PgSQL limitation?
   * Another change of REPLACE to INSERT for namedqueries.
   * I don't understand why "SELECT DISTINCT  groups.id, name, description,
isactive" was changed to "SELECT DISTINCT  groups.id, name, description, isactive"

+ For attachment.cgi, A brief explanation of why attachments need to be coded
into Base64 for PostgreSQL might be useful for those who aren't initiates. (In
fact, I'm not sure, myself.)
(In reply to comment #86)
>    * You manually limited the buglist to 2000 results. Is that a Red
> Hat-specific thing, or a PgSQL limitation?

  Oh, slight correction. You limited *something* having to do with bug group
privacy to 2000 bug ids.
Hi David,

yeah the code resembles mine so it seems we're at least on the same track.

However, I see you kept the MySQL locking stuff in there.  I am still not
convinced this is the way to go since it effectively limits the throughput of
Bugzilla on PostgreSQL or another transaction-capable database.

I'll do a compare on it all and see what we do and do not agree about and then 
we'll hash it out Tekken-style.  (Nah, just kidding. ;) We'll do some discussion 
back and forth.)
Attached patch Port CVS Head to PostgreSQL (v7) (obsolete) — Splinter Review
Many changes. Will comment further in replies to bug comments.
Attachment #141569 - Attachment is obsolete: true

(In reply to comment #88)
> Hi David,
> 
> yeah the code resembles mine so it seems we're at least on the same track.
> 
> However, I see you kept the MySQL locking stuff in there.  I am still not 
> convinced this is the way to go since it effectively limits the throughput of 
> Bugzilla on a PostgreSQL or other transactions-able database.
> 
> I'll do a compare on it all and see what we do and do not agree about and then 
> we'll hash it out Tekken-style.  (Nah, just kidding. ;) We'll do some discussion 
> back and forth.)


In the v7 patch, I merged a lot of your stuff in with what I had. I removed the
SQL from the function names since I thought it was redundant, and without it
the names are shorter and easier to type ;) I can add it back if everyone thinks
we need that level of description. Let me know what you think. As for the
locking, I agree that we could remove it; at least PostgreSQL does not need it,
per se, since it does its own level of locking. I am not familiar with what is
required with the new MySQL that has transaction support. It may be the same
situation there.
(In reply to comment #84)
> (From update of attachment 141569 [details] [diff] [review])
> Hrm, this is a big hulking patch. I'll look over DBCompat first.
> 
> My first note is that I'm *really* happy about the subroutine docs. :-)
> 
> >+# Contributor(s): Terry Weissman <terry@mozilla.org>
> 
>    Hrm... aren't *you* actually the contributor for this one? :-)
> 

Heh, fixed that.

> >+sub GetLastInsertID {
> >+    my ($table, $column) = @_;
> >+    if ($db_driver eq 'mysql') {
> >+        return "SELECT LAST_INSERT_ID()";
> 
>      Is this actually guaranteed to do what the documentation above says? If I
> insert into table A, and then insert into table B, and I call this on table A,
> I get...?
> 
>      You might also want to note in the subroutine doc that this should always
> be called inside of a transaction, so there aren't threading issues. (Thread1:
> INSERT Thread2: INSERT Thread1: GetLastInsertID())
>

It does make the assumption that you are running it right after the insert that
you meant to retrieve the current value from. I admit that it is not the most
ideal solution, but it is not really that much different from the way it happens now.
 
> >+# subroutine:  RegExp
> >+# description: Outputs SQL syntax for doing regular expressions searches in
format
> >+#              suitable for a given database.
> >[snip]
> 
>     An example of how this works would be useful to me, and probably to others.
> I don't fully understand how that function is used, looking at the code of it.
> (Is it actually complete?)
>

It is complete and I added some more description to the code in comments.
 
> >+sub DateFormat {
> >[snip]
> 
>    I like this because it allows us to keep the MySQL syntax we have there
> already.
> 
> >+sub ToDays {
> >+    my ($column) = @_;
> >+    if ($db_driver eq 'mysql') {
> >+        return " TO_DAYS($column) ";
> >+    } elsif ($db_driver eq 'Pg') {
> >+        return " $column ";
> >+    }
> >+}
> 
>      For Pg, I think that should actually be to_char($column, 'J')::int, based
> on some code I've fixed in our local Pg Bugzilla. That returns the
> date/timestamp as # of Julian days, which can be added/subtracted from other
> numbers of the same type.
>

Fixed
 
> >+sub Rand {
> >+    return " RAND() " if $db_driver eq 'mysql';
> >+    return " RANDOM() " if $db_driver eq 'Pg';
> 
> Nit: The Pg idiom seems to be lowercase function names. (At least in the docs.)
> 

I like to keep column names and values in lower case and SQL keywords in
UPPERCASE. Helps me see things better.

>+ You have a few if(db_driver) statements that do two whole chunks of SQL for
>only a difference between (e.g.) "INTERVAL 3 DAY" and "INTERVAL '3 days'". (See
>userprefs.cgi for an example.) 
>     I think that a TimeInterval($num, $type) function in DBCompat would do the
>trick. You could do a switch statement on $type, which would be either "D" or
>"days" or something like that, depending on your (dkl) personal preference.

I added a function called Interval to DBcompat.pm that does that now.
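Roughly, it returns the driver-appropriate spelling (simplified here; see the
patch for the real thing):

sub Interval {
    my ($count, $units) = @_;              # e.g. Interval(3, 'DAY')
    if ($db_driver eq 'mysql') {
        return " INTERVAL $count $units ";     # INTERVAL 3 DAY
    } elsif ($db_driver eq 'Pg') {
        return " INTERVAL '$count $units' ";   # INTERVAL '3 DAY'
    }
}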

>+ There's a lot of DateFormat that goes on in the patch. I would think that we
>should do the DateFormat on the presentation side, and let the databases handle
>their own date types internally. I think it makes it simpler overall. The
>double-comment bug over on the RH BZ
><https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=114353> is example of >what
>can go wrong.

I think I see what you are saying. Can you provide some code examples to further
explain this?

>+ In query.cgi, you changed a REPLACE into an INSERT. If there's a PRIMARY KEY
>on that table (I didn't check), then those two won't actually be the same.

Well, there isn't a primary key per se in namedqueries, but there is a unique
constraint on (userid, name), which may possibly cause a problem. I did this
originally since PostgreSQL does not have a REPLACE statement.

>+ In process_bug.cgi:
>
>                detaint_natural($::FORM{"$field"});
>                SendSQL("UPDATE longdescs SET isprivate = $private
> 	
>                        WHERE bug_id = $::FORM{'id'} AND " .
>
> Bugzilla::DB::DateFormat('bug_when','%Y%m%d%H%i%s') . 
>                        " = " . SqlQuote($::FORM{"$field"}));
>
>  You don't need to SqlQuote that field, since detaint_natural assures that >it's
>an integer. (However, now that I'm thinking about it, why in the world is
>bug_when being compared to an integer??)

Took the SqlQuote off.

>+ In long_list.cgi, you changed a "IFNULL(bugs.alias, '')" to just "bugs.alias"
>-- are you certain that that is safe?

Wrapped it with Bugzilla::DB::IfNull()

>+ In buglist.cgi
>   * You manually limited the buglist to 2000 results. Is that a Red
>Hat-specific thing, or a PgSQL limitation?

At the time I was having problems with putting more than 2000 integers inside
the IN() clause. It was causing PostgreSQL to generate an error. It may have
something to do with a statement length limitation or something else. If the number
of bug_ids was more than 2000 it just didn't do the check, and there wasn't any
color highlighting showing private bugs. That may have been an older problem
with PostgreSQL and might be fixed now. I will try it out sometime.

>+ For attachment.cgi, A brief explanation of why attachments need to be coded
>into Base64 for PostgreSQL might be useful for those who aren't initiates. (In
>fact, I'm not sure, myself.)

It was a quick fix at the time to allow binary attachments with little code change
to the main Bugzilla code. What needs to be done eventually is to utilize the
built-in binary object data types that PostgreSQL uses. I am going to work on
something like that soon.

P.S. Changes to this bug report sure take a long time with the number of Cc
addresses attached to it ;)
One additional comment. I added two new functions to DBcompat.pm called
LockTable and UnlockTable. Please comment on those as well. I have only added
the functions to votes.cgi so far, so people can comment before I change every LOCK
TABLES to the new format and waste time. You will notice that I first call
LockTable() with dummy args in the if condition before actually sending the real
LOCK TABLES, because on databases that don't need it, LockTable() will just return
an empty value and the statement will be skipped instead of SendSQL sending
something invalid.
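
In other words, the pattern in votes.cgi looks roughly like this (simplified;
the real argument format in the patch may differ):

# DBcompat.pm
sub LockTable {
    my (@tables) = @_;
    if ($db_driver eq 'mysql') {
        return "LOCK TABLES " . join(', ', @tables);
    }
    return '';    # PostgreSQL and others: no explicit lock needed here
}

# votes.cgi
if (Bugzilla::DB::LockTable('dummy')) {
    SendSQL(Bugzilla::DB::LockTable('bugs WRITE', 'votes WRITE'));
}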
(In reply to comment #91)
>>+ There's a lot of DateFormat that goes on in the patch. [snip]
> 
> I think I see what you are saying. Can you provide some code examples to further
> explain this?

  Okay, I suppose the basic idea that I'm proposing is that at some point,
DATE_FORMAT() shouldn't happen at all in SQL, and instead should happen in perl. :-)

  There's an example in this patch, though, that I can give, from process_bug.cgi:

-                SendSQL("UPDATE longdescs SET isprivate = $private 
-                    WHERE bug_id = $::FORM{'id'} AND bug_when = " .
$::FORM{"$field"});
+                SendSQL("UPDATE longdescs SET isprivate = $private
+                        WHERE bug_id = $::FORM{'id'} AND " .
+                        Bugzilla::DB::DateFormat('bug_when','%Y%m%d%H%i%s') . 
+                        " = $::FORM{$field}");

  As you can see here, every time we touch bug_when, we have to remember to
DateFormat the value we insert, and DateFormat it when we take it out again,
before the comparison. This is a sort of "hidden requirement" that programmers
have to remember without any way to enforce the requirement in code (that I know
of). 

  Otherwise, we end up with various hidden inconsistencies as a result of the
fact that PostgreSQL is actually tracking microseconds in every date/time type.
If we hide the microseconds, we have two choices: we can round off the
microseconds, or we can truncate them. Either way, this means that 10:10:10 and
10:10:10 aren't guaranteed to have *actually* been simultaneous, and 10:10:10
and 10:10:11 aren't guaranteed to have *actually* been one second apart.

  This is the worst when you're dealing with delta_ts and lastdiffed, which are
almost always a half-second apart. One way or another, if we don't keep
microseconds in them, we'll end up in trouble. When I was working on the patch
for the double-comment bug in the Red Hat Bugzilla, I tried a *lot* of different
ways to work around this. The only viable long-term solution was to keep the
microseconds.

  But, if you keep microseconds on one field, you have to keep microseconds on
all of them, since you *might* at some point compare a date-with-microseconds to
a date-without-microseconds and get an invalid answer. (Even if Bugzilla doesn't
compare two specific date fields now, it may in the future.)

  So, all storing and comparison of dates in Bugzilla has to be done in the
native format of the database being used. Otherwise, it's a headache and hidden
bugs waiting to happen. It's okay to SELECT them with DateFormat, but *only* if
you're not going to use that as a comparison argument.
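  In other words (just a sketch, with $id and $native_when as placeholders --
$native_when is assumed to already be in the database's own timestamp format):

    # Store and compare in the database's native format -- no DateFormat here.
    SendSQL("UPDATE longdescs SET isprivate = $private" .
            " WHERE bug_id = $id AND bug_when = " . SqlQuote($native_when));

    # Formatting is fine for display, as long as the result is never used
    # in a comparison afterwards.
    SendSQL("SELECT " .
            Bugzilla::DB::DateFormat('bug_when', '%Y.%m.%d %H:%i:%s') .
            " FROM longdescs WHERE bug_id = $id");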

  What about moving bugs from MySQL to PgSQL? Well, that's not a problem, since
the MySQL dates will all be correct relative to each other, and the new PgSQL
dates will all *become* correct relative to each other on the first update. I'm
not sure that moving between databases is something that we even want to think
about right now, though. :-)

-M
Assignee: nobody → dkl
Comment on attachment 141703 [details] [diff] [review]
Port CVS Head to PostgreSQL (v7)

  Okay, going over DBCompat.pm first, again.

  You might want to put a comment at the top of the file about the fact that
not all of these functions are currently used in every SendSQL, but should be
if they can be. Also, that Oracle support is nonexistent, currently, but
perhaps give them a bug number to contribute to.

>+# Sybase:   Dave Miller <justdave@netscape.com>
>+#       Gayathri Swaminath <gayathrik00@netscape.com>

  I think that justdave is at bugzilla.org, now.

>+sub LastKey {
>+    my ($table, $column) = @_;
>+    if ($db_driver eq 'mysql') {
>+        return "SELECT LAST_INSERT_ID()";
>+    } elsif ($db_driver eq 'Pg') {
>+        my $seq = $table . "_" . $column . "_seq";
>+        return "SELECT currval('$seq')";

  I think that this shouldn't include the SELECT, so that people can use the
function as part of a larger SendSQL statement, if they want.

>+sub RegExp {

   My only concern with this function is that someday we might run into a
database that doesn't do RegExps the same way as other DBs do, and we have to
actually mangle the pattern itself. Honestly, I can't say if this will ever
happen, since I don't know how databases do RegExps besides PgSQL and MySQL.

>+# subroutine:   ToDays
> [snip]
>+# returns:      formatted SQL for TO_DAYS function. TO_DAYS($column)
> for Mysql and just $column for Postgresql. (scalar)

  What it returns changed for PgSQL. :-)

>+sub If {
> [snip]
>+    elsif ($db_driver eq "Oracle") {
>+        # Stub
>+    }

  If you're going to do the Stub for this one, you probably ought to do it for
the rest of them, too. :-) In fact, instead of doing a Stub here, you might
want to see if there's a ThrowCodeError for "not_implemented" or something.
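  Something like this, maybe (a sketch; it assumes a "not_implemented" code-error
tag exists or gets added):

    elsif ($db_driver eq "Oracle") {
        ThrowCodeError("not_implemented");
    }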

>+# description:  Returns list of values from the enum data in Mysql 
>                 or a particular table in PostgreSQL and others.
>+#               Assumes that enums don't ever contain an apostrophe!

  Is that a safe assumption? (Did we actually make it before and it just wasn't
documented?)

>+# subroutine:   Interval
> [snip]

  This looks great.

>+sub DBConnect {

  Hrm, is this supposed to be called by ConnectToDatabase? (That is, what's it
for?)


>+sub DaysDiff {
>+    my ($lowdate, $highdate) = @_;
>+    if ($db_driver eq "mysql") {

  For the MySQL and PgSQL versions, it might make more sense to implement this
using the ToDays function that you wrote above instead of manually using the
TO_DAYS SQL. (It's too bad that we have to have this function at all -- I see
why, for Sybase, though.)

>+    elsif ($db_driver eq "Pg") {
>+        return "COALESCE($one,'$two')";

  Why does this have a space before the $two?

>+# subroutine:   IfNullString
>+# description:  Outputs SQL syntax for the Mysql IFNULL function based on database type

   You might want to describe when this should be used instead of good ol'
IfNull.

>+sub IsNullString {

  Also, this sub is called "IsNullString" but the docs are for "IfNullString"

>+sub Limit {
> [snip]
>+    elsif ($db_driver eq "Sybase") {
>+        # Sybase doesn't support LIMIT, but we have a hack in SendSQL
>+        # to remove it from the SQL string and emulate it from there
>+        # using $::db->{rowcount}
>+        return "LIMIT $limit";

  Hrm, I think that hack is only in justdave's version. I think that Sybase
support in general requires a *lot* of hacks, and that justdave was of the
opinion that upstream Bugzilla will never need to support it. If you removed
all of the if...sybase, I don't think anybody would complain.

>+sub DateFormat {
>+    my ($data) = @_;
>+    # For compatibility reasons, we force the format to
>+    my $format = "%Y.%m.%d %H:%i:%s";
>+    # for everyone.  Use Date::Parse and friends after retrieving it if you
>+    # need to convert it to another format.

  Eek. I can see why you do this, but I think that it is maybe not such a good
idea (see my previous comment for my reasoning). If Sybase support is the
reason, then I think we should scrap Sybase support.

>+# subroutine:   LockTable
> [snip]
>+# params:       $tables = hash structure containing table names and type of lock needed
>+#               { 'foo' => 'WRITE', 'bar' => 'READ', 'foo2 AS bar2' => 'READ' } 

  Is this a necessary syntax? Is it possible to just call LockTable() multiple
times in a row, or does that Not Work?

>+    if ($db_driver eq 'mysql') {
>+        $str = "LOCK TABLES ";

  If it's possible, I'd tend to think that this function should just *do* the
table locking, the SendSQL and everything. Is there a situation where LOCK
TABLE needs to be part of a larger statement? Other than that, I think it's a
really good idea and it looks good. :-)

>+    } elsif ($db_driver eq 'Pg') {
>+       $str = "LOCK TABLE ";
>+        foreach my $key (keys %{$tablesref}) {
>+            push (@tables, "$key");
>+        }
>+        $str .= join(', ', @tables);
>+        return $str; 

  This LOCK mode will default to ACCESS EXCLUSIVE for PgSQL, which is a global
lock on that table. It means that nobody else can even *access* that table
while the lock is being held. I think that MySQL WRITE locks are close to PgSQL
EXCLUSIVE MODE and READ locks are close to PgSQL ACCESS SHARE MODE. I don't
know a lot about MySQL's locking. For PgSQL, I'd like to see people just use
PgSQL's inherent locking, except for when they need a guarantee that LastKey
won't change during their write. (In which case I think they should be using
transactions instead of locking, anyway.) However, for now you might want to
use the MODE statements.
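  For illustration, the Pg branch could do something along these lines if it
issued the locks itself (a sketch only, using the rough WRITE -> EXCLUSIVE /
READ -> ACCESS SHARE mapping above, and assuming we're already inside a
transaction):

    } elsif ($db_driver eq 'Pg') {
        # One LOCK TABLE per table, with an explicit mode, instead of
        # letting Pg default to ACCESS EXCLUSIVE.
        foreach my $table (keys %{$tablesref}) {
            my $mode = ($tablesref->{$table} eq 'WRITE') ? 'EXCLUSIVE'
                                                         : 'ACCESS SHARE';
            &::SendSQL("LOCK TABLE $table IN $mode MODE");
        }
        return;
    }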

  Also, does Bugzilla ever try to do a LOCK TABLES outside of a transaction? If
it does, that won't work for Pg.

  Man, that ol' DBCompat is getting pretty big. :-) It sure is impressive,
though.
(In reply to comment #94)
> For PgSQL, I'd like to see people just use
> PgSQL's inherent locking, except for when they need a guarantee that LastKey
> won't change during their write. (In which case I think they should be using
> transactions instead of locking, anyway.) However, for now you might want to
> use the MODE statements.
> 

LastKey appears to be calling currval() for Pg. This should be safe without any
special locking - see the Pg docs for currval(), which state:

"Return the value most recently obtained by nextval for this sequence in the
current session. (An error is reported if nextval has never been called for this
sequence in this session.) Notice that because this is returning a session-local
value, it gives a predictable answer even if other sessions are executing
nextval meanwhile."

and nextval() states:

"Advance the sequence object to its next value and return that value. This is
done atomically: even if multiple sessions execute nextval concurrently, each
will safely receive a distinct sequence value."


(I hope I haven't misunderstood what you said).
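So in practice the usual insert-then-fetch-the-id pattern should be safe without
any extra lock. A sketch against a made-up table foo (id SERIAL, name text):

    # The SERIAL column's default calls nextval('foo_id_seq') during the
    # INSERT; currval() then returns that same value for this session only.
    &::SendSQL("INSERT INTO foo (name) VALUES ('bar')");
    &::SendSQL("SELECT currval('foo_id_seq')");
    my ($new_id) = &::FetchSQLData();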
(In reply to comment #94)
> (From update of attachment 141703 [details] [diff] [review])
>   Okay, going over DBCompat.pm first, again.
> 
>   You might want to put a comment at the top of the file about the fact that
> not all of these functions are currently used in every SendSQL, but should be
> if they can be. Also, that Oracle support is nonexistent, currently, but
> perhaps give them a bug number to contribute to.
> 
> >+# Sybase:   Dave Miller <justdave@netscape.com>
> >+#       Gayathri Swaminath <gayathrik00@netscape.com>
> 
>   I think that justdave is at bugzilla.org, now.
> 

Will fix.

> >+sub LastKey {
> >+    my ($table, $column) = @_;
> >+    if ($db_driver eq 'mysql') {
> >+        return "SELECT LAST_INSERT_ID()";
> >+    } elsif ($db_driver eq 'Pg') {
> >+        my $seq = $table . "_" . $column . "_seq";
> >+        return "SELECT currval('$seq')";
> 
>   I think that this shouldn't include the SELECT, so that people can use the
> function as part of a larger SendSQL statement, if they want.
> 

Good point. Easy change.

> >+sub RegExp {
> 
>    My only concern with this function is that someday we might run into a
> database that doesn't do RegExps the same way as other DBs do, and we have to
> actually mangle the pattern itself. Honestly, I can't say if this will ever
> happen, since I don't know how databases do RegExps besides PgSQL and MySQL.
> 

Well, in the past I have not been able to find a way to do it with Oracle without
using some sort of external procedures. And then the syntax would be something like
Regex(column, pattern), which would again be incompatible. So now that I think of it,
having just the operator broken out would break if someone was using Oracle and it
returned an empty string.

> >+# subroutine:   ToDays
> > [snip]
> >+# returns:      formatted SQL for TO_DAYS function. TO_DAYS($column)
> > for Mysql and just $column for Postgresql. (scalar)
> 
>   What it returns changed for PgSQL. :-)
> 

Will fix.

> >+sub If {
> > [snip]
> >+    elsif ($db_driver eq "Oracle") {
> >+        # Stub
> >+    }
> 
>   If you're going to do the Stub for this one, you probably ought to do it for
> the rest of them, too. :-) In fact, instead of doing a Stub here, you might
> want to see if there's a ThrowCodeError for "not_implemented" or something.
> 

Will change.

> >+# description:  Returns list of values from the enum data in Mysql 
> >                 or a particular table in PostgreSQL and others.
> >+#               Assumes that enums don't ever contain an apostrophe!
> 
>   Is that a safe assumption? (Did we actually make it before and it just wasn't
> documented?)
> 

I pulled the SplitEnumType straight out of globals.pl so I used the same
comment that was already there.

> >+# subroutine:   Interval
> > [snip]
> 
>   This looks great.
> 

Cool

> >+sub DBConnect {
> 
>   Hrm, is this supposed to be called by ConnectToDatabase? (That is, what's it
> for?)
> 

This was part of the merge of Jeroen's DBCompat.pm code, so eventually it can be
called from within DB.pm to form the dsn line correctly. I hadn't implemented
that far yet.

> >+sub DaysDiff {
> >+    my ($lowdate, $highdate) = @_;
> >+    if ($db_driver eq "mysql") {
> 
>   For the MySQL and PgSQL versions, it might make more sense to implement this
> using the ToDays function that you wrote above instead of manually using the
> TO_DAYS SQL. (It's too bad that we have to have this function at all -- I see
> why, for Sybase, though.)

Ok, will change. 

> >+    elsif ($db_driver eq "Pg") {
> >+        return "COALESCE($one,'$two')";
> 
>   Why does this have a space before the $two?
> 
> >+# subroutine:   IfNullString
> >+# description:  Outputs SQL syntax for the Mysql IFNULL function based on database type
> 
>    You might want to describe when this should be used instead of good ol'
> IfNull.
> 

Yeah, documentation incomplete. Will fix.

> >+sub IsNullString {
> 
>   Also, this sub is called "IsNullString" but the docs are for "IfNullString"
>

ditto
 
> >+sub Limit {
> > [snip]
> >+    elsif ($db_driver eq "Sybase") {
> >+        # Sybase doesn't support LIMIT, but we have a hack in SendSQL
> >+        # to remove it from the SQL string and emulate it from there
> >+        # using $::db->{rowcount}
> >+        return "LIMIT $limit";
> 
>   Hrm, I think that hack is only in justdave's version. I think that Sybase
> support in general requires a *lot* of hacks, and that justdave was of the
> opinion that upstream Bugzilla will never need to support it. If you removed
> all of the if...sybase, I don't think anybody would complain.
> 

I can remove the Sybase conditions if Dave doesn't think it will ever be merged
in. Dave?

> >+sub DateFormat {
> >+    my ($data) = @_;
> >+    # For compatibility reasons, we force the format to
> >+    my $format = "%Y.%m.%d %H:%i:%s";
> >+    # for everyone.  Use Date::Parse and friends after retrieving it if you
> >+    # need to convert it to another format.
> 
>   Eek. I can see why you do this, but I think that it is maybe not such a good
> idea (see my previous comment for my reasoning). If Sybase support is the
> reason, then I think we should scrap Sybase support.

If we pull out DateFormat altogether then this won't be a problem anymore.

> >+# subroutine:   LockTable
> > [snip]
> >+# params:       $tables = hash structure containing table names and type of lock needed
> >+#               { 'foo' => 'WRITE', 'bar' => 'READ', 'foo2 AS bar2' => 'READ' } 
> 
>   Is this a necessary syntax? Is it possible to just call LockTable() multiple
> times in a row, or does that Not Work?

First stab. I think it is more efficient on the database side to just call it
altogether in one statement but I have not looked into whether it could be
separated out. Anyone tried this before?
 
> >+    if ($db_driver eq 'mysql') {
> >+        $str = "LOCK TABLES ";
> 
>   If it's possible, I'd tend to think that this function should just *do* the
> table locking, the SendSQL and everything. Is there a situation where LOCK
> TABLE needs to be part of a larger statement? Other than that, I think it's a
> really good idea and it looks good. :-)
> 

It could, but I was trying to refrain from actually doing any real SQL calls
within DBcompat.pm and have it purely only return SQL strings. 

> >+    } elsif ($db_driver eq 'Pg') {
> >+       $str = "LOCK TABLE ";
> >+        foreach my $key (keys %{$tablesref}) {
> >+            push (@tables, "$key");
> >+        }
> >+        $str .= join(', ', @tables);
> >+        return $str; 
> 
>   This LOCK mode will default to ACCESS EXCLUSIVE for PgSQL, which is a global
> lock on that table. It means that nobody else can even *access* that table
> while the lock is being held. I think that MySQL WRITE locks are close to PgSQL
> EXCLUSIVE MODE and READ locks are close to PgSQL ACCESS SHARE MODE. I don't
> know a lot about MySQL's locking. For PgSQL, I'd like to see people just use
> PgSQL's inherent locking, except for when they need a guarantee that LastKey
> won't change during their write. (In which case I think they should be using
> transactions instead of locking, anyway.) However, for now you might want to
> use the MODE statements.
> 

Will look into that.

>   Also, does Bugzilla ever try to do a LOCK TABLES outside of a transaction? If
> it does, that won't work for Pg.
> 

Right, you need to be in a transaction to do table locks in Pg. I should
implement some sort of BeginWork(), Commit(), Rollback() functions also in
DBcompat.pm. These would need to be added in almost every CGI that does DB access.
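Roughly what I have in mind, as a first sketch (DBI's begin_work/commit/rollback
handle the AutoCommit flipping; the names and the mysql no-ops are not final):

    sub BeginWork {
        # MySQL (MyISAM) has no transactions; we keep using LOCK TABLES there.
        return if $db_driver eq 'mysql';
        $::db->begin_work();
    }

    sub Commit {
        return if $db_driver eq 'mysql';
        $::db->commit();
    }

    sub Rollback {
        return if $db_driver eq 'mysql';
        $::db->rollback();
    }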
 
(In reply to comment #96)
> > >+# Sybase:   Dave Miller <justdave@netscape.com>
> > >+#       Gayathri Swaminath <gayathrik00@netscape.com>
> > 
> >   I think that justdave is at bugzilla.org, now.
> 
> Will fix.

Actually, make those <davem00@aol.com> and <gayathrik00@aol.com>.  Neither
address will work after next week (Gayathri's has been gone for a year already),
but AOL owns the copyright, so any relicensing issues would need to go through
them unless I can get them to reassign copyright on it to MF like they did with
the Mozilla stuff (I'll work on that).

> > >+# description:  Returns list of values from the enum data in Mysql 
> > >                 or a particular table in PostgreSQL and others.
> > >+#               Assumes that enums don't ever contain an apostrophe!
> > 
> >   Is that a safe assumption? (Did we actually make it before and it just wasn't
> > documented?)
> > 
> 
> I pulled the SplitEnumType straight out of globals.pl so I used the same
> comment that was already there.

I would venture to say that we're intending for this to just go away in the
MySQL code as well.  However, if you've already got the compatibility hack for
it, there's no reason not to use this until then.

> > >+sub Limit {
> > > [snip]
> > >+    elsif ($db_driver eq "Sybase") {
> > >+        # Sybase doesn't support LIMIT, but we have a hack in SendSQL
> > >+        # to remove it from the SQL string and emulate it from there
> > >+        # using $::db->{rowcount}
> > >+        return "LIMIT $limit";
> > 
> >   Hrm, I think that hack is only in justdave's version. I think that Sybase
> > support in general requires a *lot* of hacks, and that justdave was of the
> > opinion that upstream Bugzilla will never need to support it. If you removed
> > all of the if...sybase, I don't think anybody would complain.
> > 
> 
> I can remove the Sybase conditions if Dave doesn't think it will ever be
> merged in. Dave?

I don't think we'll ever have Sybase support out of the box.  However, I have no
objection to including code that will make the hacking easier to do.

> > >+sub DateFormat {
> > >+    # For compatibility reasons, we force the format to
> > >+    my $format = "%Y.%m.%d %H:%i:%s";
> > >+    # for everyone.  Use Date::Parse and friends after retrieving it if you
> > >+    # need to convert it to another format.
> > 
> >   Eek. I can see why you do this, but I think that it is maybe not such a
> > good idea (see my previous comment for my reasoning). If Sybase support is
> > the reason, then I think we should scrap Sybase support.
> 
> If we pull out DateFormat altogether then this won't be a problem anymore.

You have to date format on Sybase in order to get accurate resolution, because
it'll format it for you if you don't ask it to, and the default format doesn't
even include seconds.  But yes, if Sybase is the only reason we need it, skip
it.  I still think there's a bit to be gained by using a common format to
retrieve the timestamps and letting the display code deal with formatting it. 
Date formatting tends to be something people like to localize anyway.

> >   Is this a necessary syntax? Is it possible to just call LockTable() multiple
> > times in a row, or does that Not Work?
> 
> First stab. I think it is more efficient on the database side to just call it
> altogether in one statement but I have not looked into whether it could be
> separated out. Anyone tried this before?

You can't in MySQL.  Any time you call LOCK TABLES in MySQL it unlocks anything
you already had locked previously that's not mentioned in the current lock
statement.

> >   If it's possible, I'd tend to think that this function should just *do* the
> > table locking, the SendSQL and everything. Is there a situation where LOCK
> > TABLE needs to be part of a larger statement? Other than that, I think it's a
> > really good idea and it looks good. :-)
> > 
> 
> It could, but I was trying to refrain from actually doing any real SQL calls
> within DBcompat.pm and have it purely only return SQL strings. 

Maybe LockTables should be a function in Bugzilla::DB.pm.  It should in turn
retrieve the SQL from DBCompat. :)  (just throwing ideas out, don't feel like
you have to follow it)
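Something like this, maybe (very rough, hypothetical names):

    # In Bugzilla/DB.pm: does the actual SendSQL, using the string DBCompat builds.
    sub LockTables {
        my ($tablesref) = @_;
        my $sql = Bugzilla::DBcompat::LockTable($tablesref);
        &::SendSQL($sql) if $sql;    # empty string means "no lock needed here"
    }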

> >   Also, does Bugzilla ever try to do a LOCK TABLES outside of a transaction?
> > If it does, that won't work for Pg.
> 
> Right, you need to be in a transaction to do table locks in Pg. I should
> implement some sort of BeginWork(), Commit(), Rollback() functions also in
> DBcompat.pm. These would need to be added in almost every CGI that does DB
> access.  

Yes, MySQL (at the time all of Bugzilla was originally written) didn't have
transactions.  Using locks is how you create a transaction type environment in
MySQL (except that you don't get rollback).  We have autocommit turned on in the
DBI handle.
Frankly, I don't like what I'm reading in the recent comments here. Please don't
throw out any of the existing Sybase stuff in DBCompat.pm or limit your solution
in such a way that it would make adding Sybase support more difficult. The
appropriate solution to this bug should make Bugzilla work with virtually any
RDBMS, not just PostgreSQL and MySQL. Don't throw the baby out with the bath
water just because all you are personally concerned about is PostgreSQL.
(In reply to comment #98)
> Frankly, I don't like what I'm reading in the recent comments here. Please don't
> throw out any of the existing Sybase stuff in DBCompat.pm or limit your solution
> in such a way that it would make adding Sybase support more difficult.

  Ed --

  It's good to know that there are users interested in Sybase support. For right
now, I think our focus is to push out a version that supports *any* other
database, *at all*. :-) To that effect, I'd like to make the port to that
platform (Postgres) as easy as possible, and possibly grease other ports along
the way. 

Sybase implemented its database in a way that made sense at the time, but
doesn't make as much sense in the modern world. That doesn't mean that I think
that we should make life hard for people who want to port Bugzilla to Sybase --
quite the contrary. :-) I'd love to see Bugzilla work on every database in
existence. I'd also like to see a Postgres version get out the door, a version
with as few potential bugs as possible, that doesn't break the MySQL version. 

We've been waiting for PostgreSQL support for over two years now. At this
point, if Sybase support will delay PostgreSQL support, then I think that we
should skip it. However, I don't see anything wrong with Sybase support in 2.20
or 2.22.

-M
dkl -- In regard to REPLACE, I just noticed comment 35 and bug 190432 -- I think
one of those has the fix.
(In reply to comment #100)
> dkl -- In regard to REPLACE, I just noticed comment 35 and bug 190432 -- I think
> one of those has the fix.

When transactions are in place for both PostgreSQL and MySQL with proper table
locking then this should no longer be a problem with race conditions right?
Probably will defer worrying about this one til the bigger picture is solved.
For those who are interested in MySQL <> PostgreSQL SQL differences, please see:

http://www.in-nomine.org/~asmodai/mysql-to-pgsql.txt

I need to update it again and rework it though.
(In reply to comment #99)
> (In reply to comment #98)
> > Frankly, I don't like what I'm reading in the recent comments here. Please don't
> > throw out any of the existing Sybase stuff in DBCompat.pm or limit your solution
> > in such a way that it would make adding Sybase support more difficult.
> 
>   Ed --
> 
>   It's good to know that there are users interested in Sybase support. For right
> now, I think our focus is to push out a version that supports *any* other
> database, *at all*. :-) To that effect, I'd like to make the port to that
> platform (Postgres) as easy as possible, and possibly grease other ports along
> the way. 
> 
> Sybase implemented its database in a way that made sense at the time, but
> doesn't make as much sense in the modern world. That doesn't mean that I think
> that we should make life hard for people who want to port Bugzilla to Sybase --
> quite the contrary. :-) I'd love to see Bugzilla work on every database in
> existence. I'd also like to see a Postgres version get out the door, a version
> with as few potential bugs as possible, that doesn't break the MySQL version. 
> 
> We've been waiting for PostgreSQL support for over two years now. At this
> point, if Sybase support will delay PostgreSQL support, then I think that we
> should skip it. However, I don't see anything wrong with Sybase support in 2.20
> or 2.22.
> 
> -M


No worries, ed. I am not going to remove any previous Sybase work that is
currently there. But it has been a long time goal of mine to get Pg support into
the main code base so I think we should focus on Mysql and Pg for the time being
and worry about other support in future releases. If anyone is actively using
Sybase for Bugzilla (Dave?) and would be interested in testing then that would
be great. Otherwise I don't have access to a server myself.

-d
(In reply to comment #103)
> If anyone is actively using
> Sybase for Bugzilla (Dave?) and would be interested in testing then that would
> be great. Otherwise I don't have access to a server myself.

I lose access to mine in a week.

I do have the "developers edition" of Sybase ASE on my laptop, but I don't have
enough hard drive space to actually run it :)  So I could probably test again at
some point in the future when I come up with a bigger hard drive.
(In reply to comment #103)
> No worries, ed. I am not going to remove any previous Sybase work that is
> currently there. But it has been a long time goal of mine to get Pg support into
> the main code base so I think we should focus on Mysql and Pg for the time being
> and worry about other support in future releases. If anyone is actively using
> Sybase for Bugzilla (Dave?) and would be interested in testing then that would
> be great. Otherwise I don't have access to a server myself.

I've got no problems with focusing on PostgreSQL first, and I applaud the
efforts so far. My main concerns were about a couple of comments that "Bugzilla
would never support Sybase" (or words to that effect) followed by suggestions to
rip out the Sybase support that was already present. I'm glad to hear that that
will not be happening, Dave. Although I do have a particular interest in using
Bugzilla on Sybase, I would hope that I'm not speaking only for the prospective
Sybase Bugzilla users, but also for users of Oracle and other popular DBMSes.
The goal should be to make Bugzilla as DBMS-agnostic as possible.

As far as date/time fields go, I strongly feel that you should consider NOT
using DBMS-specific date/time field types at all. I would strongly advocate
using char fields or float fields. I've been writing DBMS-agnostic database
applications for years, specifically Sybase, Oracle, INGRES, and MySQL, and, if
you steer clear of the things that the various DBMS implementations disagree
upon from the beginning, then you save yourself a lot of work. A char field
containing a date/time in ISO 8601 format (YYYY-MM-DD hh:mm:ss.ssss) is just as
effective (alphabetical sorting results in chronological sorting) and it will
work with every DBMS out there guaranteed. For some applications, to save on
storage space, I convert date/times to MJDs (Modified Julian Dates) and store
them as floats. Anyway, those are a couple of options that I feel should be
considered.
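To illustrate the ISO 8601 approach in Perl (a sketch, against a hypothetical
char/varchar column some_table.changed_when, with $id as a placeholder):

    use POSIX qw(strftime);

    # Lexical order of ISO 8601 strings is chronological order, and every
    # DBMS can store and compare plain strings the same way.
    my $now = strftime("%Y-%m-%d %H:%M:%S", localtime());
    SendSQL("UPDATE some_table SET changed_when = " . SqlQuote($now) .
            " WHERE id = $id");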
Here are my latest changes to DBcompat.pm so that people can chew on them over
the weekend. I have implemented LockTable and UnlockTable differently in that I
am performing the actual locks inside instead of simply returning the SQL
string.
This allows me to format it specifically for the database or return nothing if
we don't need it. I have only converted votes.cgi so far and it seems to work.
You can look at that file to see how the function is called.

Also cleaned up some things, added some more docs, throws code error for
functions not yet supported for a particular database.

More to come!
I will look at this, at some point. One comment is that it may be better off
going to transactions for everything, since we're going to have to audit all the
LOCK TABLE commands anyway to do this.

That requires innodb for mysql, though, and that is very, very, very, very slow.
Maybe they've improved it recently - anyone know?

Re Sybase, there was a whole lot of ugliness required, starting at the fact that
the DBD driver doesn't let you have two open statements at once. That one is a
killer. MySQL does support that (although by default it reads all the data into
RAM at once). Postgres < 7.4 doesn't support it at all, but DBD hides it by
storing all the data internally as a buffer. Or you could use cursors, but that's
an invasive change. I don't know if DBD for postgres >= 7.4 uses the max row
count option to fetch bits of a portal, rather than the entire thing at once.
Starting up a second DBI connection for Sybase doesn't work because you then
don't have LOCKs/transactions from the first.
(In reply to comment #107)
> Re Sybase, there was a whole lot of ugliness required, starting at the fact
> that the DBD driver doesn't let you have two open statements at once. That
> one is a killer.

Actually, the DBD does let you have multiple open statements.  But the core
C-level driver for Sybase doesn't.  The DBD will emulate it by opening
additional connections behind the scenes and not telling you (which of course
kills all hopes of maintaining locks and transactions).  You can, of course,
disable that behavior, and force it to only have one open connection, but then
you have all kinds of problems when you're dealing with looping over data in
hopes of doing something with it, unless you can afford to load all of said data
into RAM first.
I didn't realise that DBD did the multiple connections thing for you. It's still
silly and broken, though. Like I said, other DBs have DBD pretend to support this...
Attached patch Port CVS Head to PostgreSQL (v8) (obsolete) — Splinter Review
Change all table locking to use the new LockTables() and UnlockTables() in
DBcompat.pm. Added new subroutines StartTransaction() and EndTransaction() as a
first step towards transaction support. Seems to work correctly, since DBI will
automatically switch over to AutoCommit=0 when begin_work is called and back
to AutoCommit=1 when committed. Also changed LIMIT to support Mysql- and
Postgresql-style OFFSET.
Attachment #141703 - Attachment is obsolete: true
If we want to move on with this, I'd suggest making a landing plan, which
involves perhaps breaking up the patch into smaller bits and tying them into
separate bugs for review. If the patches are small enough, I volunteer to review
them.
Comment on attachment 142476 [details] [diff] [review]
Port CVS Head to PostgreSQL (v8)

>Index: Attachment.pm
>@@ -76,9 +76,9 @@
>-              SELECT attach_id, DATE_FORMAT(creation_ts, '%Y.%m.%d %H:%i'),
>+              SELECT attach_id, " . Bugzilla::DB::DateFormat('creation_ts', '%Y.%m.%d %H:%i') . ", 

This was this way already, so we probably should fix it in another bug, but
shouldn't we be retrieving seconds on all our time retrievals?

>Index: Bug.pm
>@@ -148,7 +148,14 @@
>     where bugs.bug_id = $bug_id
>       AND products.id = bugs.product_id
>       AND components.id = bugs.component_id
>-    group by bugs.bug_id";
>+    GROUP BY 
>+      bugs.bug_id, alias, bugs.product_id, products.name, version, 
>+      rep_platform, op_sys, bug_status, resolution, priority, 
>+      bug_severity, bugs.component_id, components.name, assigned_to, 
>+      reporter, bug_file_loc, short_desc, target_milestone, 
>+      qa_contact, status_whiteboard, creation_ts, delta_ts,
>+      reporter_accessible, cclist_accessible,
>+      estimated_time, remaining_time";
> 
>   &::SendSQL($query);
>   my @row = ();

This completely toasted us on Sybase...  what does DISTINCT bug_id do if you
use that at the beginning of the query in place of the GROUP BY?  That's what
we ended up having to do for Sybase.

>@@ -211,44 +218,43 @@
>       $self->{'keywords'} = join(', ', @list);
>     }
>   }
>-
>   $self->{'attachments'} = Attachment::query($self->{bug_id});
> 
>   # The types of flags that can be set on this bug.
>   # If none, no UI for setting flags will be displayed.
>-  my $flag_types = 
>-    Bugzilla::FlagType::match({ 'target_type'  => 'bug', 
>-                                'product_id'   => $self->{'product_id'}, 
>-                                'component_id' => $self->{'component_id'} });
>-  foreach my $flag_type (@$flag_types) {
>-      $flag_type->{'flags'} = 
>-        Bugzilla::Flag::match({ 'bug_id'      => $self->{bug_id},
>-                                'type_id'     => $flag_type->{'id'},
>-                                'target_type' => 'bug' });
>-  }
>-  $self->{'flag_types'} = $flag_types;
>-  $self->{'any_flags_requesteeble'} = grep($_->{'is_requesteeble'}, @$flag_types);
>+  #my $flag_types = 
>+  #  Bugzilla::FlagType::match({ 'target_type'  => 'bug', 
>+  #                              'product_id'   => $self->{'product_id'}, 
>+  #                              'component_id' => $self->{'component_id'} });
>+  #foreach my $flag_type (@$flag_types) {
>+  #    $flag_type->{'flags'} = 
>+  #      Bugzilla::Flag::match({ 'bug_id'      => $self->{bug_id},
>+  #                              'type_id'     => $flag_type->{'id'},
>+  #                              'target_type' => 'bug' });
>+  #}
>+  #$self->{'flag_types'} = $flag_types;
>+  #$self->{'any_flags_requesteeble'} = grep($_->{'is_requesteeble'}, @$flag_types);
> 
>   # The number of types of flags that can be set on attachments to this bug
>   # and the number of flags on those attachments.  One of these counts must be
>   # greater than zero in order for the "flags" column to appear in the table
>   # of attachments.
>-  my $num_attachment_flag_types =
>-    Bugzilla::FlagType::count({ 'target_type'  => 'attachment',
>-                                'product_id'   => $self->{'product_id'},
>-                                'component_id' => $self->{'component_id'},
>-                                'is_active'    => 1 });
>-  my $num_attachment_flags =
>-    Bugzilla::Flag::count({ 'target_type'  => 'attachment',
>-                            'bug_id'       => $self->{bug_id} });
>-
>-  $self->{'show_attachment_flags'}
>-    = $num_attachment_flag_types || $num_attachment_flags;
>+  #my $num_attachment_flag_types =
>+  #  Bugzilla::FlagType::count({ 'target_type'  => 'attachment',
>+  #                              'product_id'   => $self->{'product_id'},
>+  #                              'component_id' => $self->{'component_id'},
>+  #                              'is_active'    => 1 });
>+  #my $num_attachment_flags =
>+  #  Bugzilla::Flag::count({ 'target_type'  => 'attachment',
>+  #                          'bug_id'       => $self->{bug_id} });
>+  #
>+  #$self->{'show_attachment_flags'}
>+  #  = $num_attachment_flag_types || $num_attachment_flags;
> 
>-  $self->{'milestoneurl'} = $::milestoneurl{$self->{product}};
>+  #$self->{'milestoneurl'} = $::milestoneurl{$self->{product}};
> 
>-  $self->{'isunconfirmed'} = ($self->{bug_status} eq $::unconfirmedstate);
>-  $self->{'isopened'} = &::IsOpenedState($self->{bug_status});
>+  #$self->{'isunconfirmed'} = ($self->{bug_status} eq $::unconfirmedstate);
>+  #$self->{'isopened'} = &::IsOpenedState($self->{bug_status});
>   
>   my @depends = EmitDependList("blocked", "dependson", $bug_id);
>   if (@depends) {

Why is the flag stuff all getting commented out?

>@@ -340,11 +346,11 @@
>              " LEFT JOIN user_group_map" .
>              " ON user_group_map.group_id = groups.id" .
>              " AND user_id = $::userid" .
>-             " AND NOT isbless" .
>+             " AND isbless = 0" .
>              " LEFT JOIN group_control_map" .
>              " ON group_control_map.group_id = groups.id" .
>              " AND group_control_map.product_id = " . $self->{'product_id'} .
>-             " WHERE isbuggroup");
>+             " WHERE isbuggroup = 1");

Heh, I see Postgres has the same problem with Booleans that Sybase does. :)

>Index: Bugzilla.pm
>===================================================================
>RCS file: /cvsroot/mozilla/webtools/bugzilla/Bugzilla.pm,v
>retrieving revision 1.9
>diff -u -r1.9 Bugzilla.pm
>--- Bugzilla.pm	27 Nov 2003 01:00:59 -0000	1.9
>+++ Bugzilla.pm	27 Feb 2004 23:08:45 -0000
>@@ -29,6 +29,7 @@
> use Bugzilla::Config;
> use Bugzilla::Constants;
> use Bugzilla::DB;
>+use Bugzilla::DBcompat;
> use Bugzilla::Template;
> use Bugzilla::User;

Why are we adding Bugzilla::DBcompat here if we're not adding any references to
it?

>Index: CGI.pl
>===================================================================
>RCS file: /cvsroot/mozilla/webtools/bugzilla/CGI.pl,v
>retrieving revision 1.209
>diff -u -r1.209 CGI.pl
>--- CGI.pl	30 Oct 2003 01:31:57 -0000	1.209
>+++ CGI.pl	27 Feb 2004 23:08:45 -0000
>@@ -313,21 +313,22 @@
>     die "Invalid id: $id" unless $id=~/^\s*\d+\s*$/;
> 
>     if (defined $starttime) {
>-        $datepart = "and bugs_activity.bug_when > " . SqlQuote($starttime);
>+        $datepart = "and bugs_activity.bug_when > TO_TIMESTAMP(" . SqlQuote($starttime) . ", 'YYYYMMDDHH24MISS') ";
>     }

Won't this break MySQL?  This should be a call into one of the compatibility
functions.  In fact, the entire date comparison should probably be a
compatibility function, since you have to do DATEDIFF() and check if the result
is positive or not on Sybase.

>@@ -157,10 +157,10 @@
> sub CleanTokenTable {
>-    &::SendSQL("LOCK TABLES tokens WRITE");
>-    &::SendSQL("DELETE FROM tokens 
>-                WHERE TO_DAYS(NOW()) - TO_DAYS(issuedate) >= " . $maxtokenage);
>-    &::SendSQL("UNLOCK TABLES");
>+    Bugzilla::DB::LockTables("LOCK TABLES tokens WRITE");
>+    &::SendSQL("DELETE FROM tokens WHERE " . Bugzilla::DB::ToDays('NOW()') . " - " . 
>+                Bugzilla::DB::ToDays('issuedate') . " >= " . $maxtokenage);
>+    Bugzilla::DB::UnlockTables();

OK, I realize this was already here, but what's the point of a lock here? 
Isn't a single DELETE statement atomic on its own?

>@@ -233,9 +233,9 @@
>     close SENDMAIL;
> 
>     # Delete the token from the database.
>-    &::SendSQL("LOCK TABLES tokens WRITE");
>+    Bugzilla::DB::LockTables("LOCK TABLES tokens WRITE");
>     &::SendSQL("DELETE FROM tokens WHERE token = $quotedtoken");
>-    &::SendSQL("UNLOCK TABLES");
>+    Bugzilla::DB::UnlockTables();
> }

ditto here.

>Index: attachment.cgi
>@@ -476,6 +477,11 @@
>+    if ($db_driver eq 'Pg') {
>+        use MIME::Base64;
>+        $thedata = decode_base64($thedata);
>+    }

Heh, this brings back memories.  We have to encode it as hexadecimal digits for
Sybase.

>@@ -837,15 +843,23 @@
>   $filename = SqlQuote($filename);
>   my $description = SqlQuote($::FORM{'description'});
>   my $contenttype = SqlQuote($::FORM{'contenttype'});
>-  my $thedata = SqlQuote($data);
>+  my $thedata = $::FORM{'data'};

Where did we lose $data from that we're going back to the FORM to get it?  If
we need to do this, use $cgi->param('data').  $::FORM is going away, if it
hasn't already, and you'll have less cvs conflicts :)

>@@ -961,15 +975,16 @@
>   # Get a list of flag types that can be set for this attachment.
>   SendSQL("SELECT product_id, component_id FROM bugs WHERE bug_id = $bugid");
>   my ($product_id, $component_id) = FetchSQLData();
>-  my $flag_types = Bugzilla::FlagType::match({ 'target_type'  => 'attachment' , 
>-                                               'product_id'   => $product_id , 
>-                                               'component_id' => $component_id });
>-  foreach my $flag_type (@$flag_types) {
>-    $flag_type->{'flags'} = Bugzilla::Flag::match({ 'type_id'   => $flag_type->{'id'}, 
>-                                                    'attach_id' => $::FORM{'id'} });
>-  }
>-  $vars->{'flag_types'} = $flag_types;
>-  $vars->{'any_flags_requesteeble'} = grep($_->{'is_requesteeble'}, @$flag_types);
>+  # PGSQL
>+#  my $flag_types = Bugzilla::FlagType::match({ 'target_type'  => 'attachment' , 
>+#                                               'product_id'   => $product_id , 
>+#                                               'component_id' => $component_id });
>+#  foreach my $flag_type (@$flag_types) {
>+#    $flag_type->{'flags'} = Bugzilla::Flag::match({ 'type_id'   => $flag_type->{'id'}, 
>+#                                                    'attach_id' => $::FORM{'id'} });
>+#  }
>+#  $vars->{'flag_types'} = $flag_types;
>+#  $vars->{'any_flags_requesteeble'} = grep($_->{'is_requesteeble'}, @$flag_types);

Again we have flagtype stuff commented out.

>Index: buglist.cgi
>@@ -220,12 +220,12 @@
>     return if !$userid;
> 
>     SendSQL("
>-        SELECT DISTINCT  groups.id, name, description, isactive
>-                   FROM  groups, user_group_map
>-                  WHERE  user_id = $userid AND NOT isbless
>-                    AND  user_group_map.group_id = groups.id
>-                    AND  isbuggroup
>-               ORDER BY  description ");
>+        SELECT  groups.id, name, description, isactive
>+          FROM  groups, user_group_map
>+         WHERE  user_id = $userid AND isbless = 0
>+           AND  user_group_map.group_id = groups.id
>+           AND  isbuggroup = 1
>+      ORDER BY  description ");

Why did we lose the DISTINCT here?  Was it unnecessary to begin with?

>Index: collectstats.pl

reminder for myself that I left off here. :)

More to come...
(In reply to comment #110)

Surely the correct way to do attachments in Postgres is to use a field of type
BYTEA, which does binary data quite happily. Alternatively, Postgres has its own
lightning fast base64 encode and decode routines. Using MIME::Base64 routines
seems ... odd. Let the database do the work for you.
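For what it's worth, a minimal sketch of the bytea route via DBD::Pg placeholders
(assuming the attachment data lives in a bytea column called thedata, $::db being
the DBI handle, and $data/$attach_id as placeholders; this is not what the current
patch does):

    use DBD::Pg qw(:pg_types);

    # Bind the raw attachment as PG_BYTEA so DBD::Pg escapes the binary
    # data itself -- no base64 round-trip in Perl.
    my $sth = $::db->prepare(
        "UPDATE attachments SET thedata = ? WHERE attach_id = ?");
    $sth->bind_param(1, $data, { pg_type => PG_BYTEA });
    $sth->bind_param(2, $attach_id);
    $sth->execute();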
(In reply to comment #112)
> (From update of attachment 142476 [details] [diff] [review])
> >Index: Attachment.pm
> >@@ -76,9 +76,9 @@
> >-              SELECT attach_id, DATE_FORMAT(creation_ts, '%Y.%m.%d %H:%i'),
> >+              SELECT attach_id, " . Bugzilla::DB::DateFormat('creation_ts',
'%Y.%m.%d %H:%i') . ", 
> 
> This was this way already, so we probably should fix it in another bug, but
> shouldn't we be retrieving seconds on all our time retrievals?
> 

Added %s to the format string. 

> >Index: Bug.pm
> >@@ -148,7 +148,14 @@
> >     where bugs.bug_id = $bug_id
> >       AND products.id = bugs.product_id
> >       AND components.id = bugs.component_id
> >-    group by bugs.bug_id";
> >+    GROUP BY 
> >+      bugs.bug_id, alias, bugs.product_id, products.name, version, 
> >+      rep_platform, op_sys, bug_status, resolution, priority, 
> >+      bug_severity, bugs.component_id, components.name, assigned_to, 
> >+      reporter, bug_file_loc, short_desc, target_milestone, 
> >+      qa_contact, status_whiteboard, creation_ts, delta_ts,
> >+      reporter_accessible, cclist_accessible,
> >+      estimated_time, remaining_time";
> > 
> >   &::SendSQL($query);
> >   my @row = ();
> 
> This completely toasted us on Sybase...  what does DISTINCT bug_id do if you
> use that at the beginning of the query in place of the GROUP BY?  That's what
> we ended up having to do for Sybase.
> 

This still doesn't work with PostgreSQL. Adding DISTINCT and removing the GROUP
BY causes SQL error because of the sum() function around votes.count.


> >+  #my $flag_types = 
> >+  #  Bugzilla::FlagType::match({ 'target_type'  => 'bug', 
> >+  #                              'product_id'   => $self->{'product_id'}, 
> >+  #                              'component_id' => $self->{'component_id'} });
> >+  #foreach my $flag_type (@$flag_types) {
> >+  #    $flag_type->{'flags'} = 
> >+  #      Bugzilla::Flag::match({ 'bug_id'      => $self->{bug_id},
> >+  #                              'type_id'     => $flag_type->{'id'},
> >+  #                              'target_type' => 'bug' });
> >+  #}
> >+  #$self->{'flag_types'} = $flag_types;
> >+  #$self->{'any_flags_requesteeble'} = grep($_->{'is_requesteeble'},
@$flag_types);
>
> Why is the flag stuff all getting commented out?
>

Fixed in my latest code. PostgreSQL was failing, complaining about a missing JOINed
table in one of the LEFT JOIN clauses. Apparently MySQL was including it
automatically and Pg couldn't figure it out. I had the code commented out so
that I would come back and revisit the problem later.
 
> >Index: Bugzilla.pm
> >===================================================================
> >RCS file: /cvsroot/mozilla/webtools/bugzilla/Bugzilla.pm,v
> >retrieving revision 1.9
> >diff -u -r1.9 Bugzilla.pm
> >--- Bugzilla.pm	27 Nov 2003 01:00:59 -0000	1.9
> >+++ Bugzilla.pm	27 Feb 2004 23:08:45 -0000
> >@@ -29,6 +29,7 @@
> > use Bugzilla::Config;
> > use Bugzilla::Constants;
> > use Bugzilla::DB;
> >+use Bugzilla::DBcompat;
> > use Bugzilla::Template;
> > use Bugzilla::User;
> 
> Why are we adding Bugzilla::DBcompat here if we're not adding any references to
> it?

Removed. Not necessary.  

> >Index: CGI.pl
> >===================================================================
> >RCS file: /cvsroot/mozilla/webtools/bugzilla/CGI.pl,v
> >retrieving revision 1.209
> >diff -u -r1.209 CGI.pl
> >--- CGI.pl	30 Oct 2003 01:31:57 -0000	1.209
> >+++ CGI.pl	27 Feb 2004 23:08:45 -0000
> >@@ -313,21 +313,22 @@
> >     die "Invalid id: $id" unless $id=~/^\s*\d+\s*$/;
> > 
> >     if (defined $starttime) {
> >-        $datepart = "and bugs_activity.bug_when > " . SqlQuote($starttime);
> >+        $datepart = "and bugs_activity.bug_when > TO_TIMESTAMP(" .
SqlQuote($starttime) . ", 'YYYYMMDDHH24MISS') ";
> >     }
> 
> Won't this break MySQL?  This should be a call into one of the compatibility
> functions.  In fact, the entire date comparison should probably be a
> compatibility function, since you have to do DATEDIFF() and check if the result
> is positive or not on Sybase.
> 

Removed the TO_TIMESTAMP. Pg seems to work happily without it.

> >@@ -157,10 +157,10 @@
> > sub CleanTokenTable {
> >-    &::SendSQL("LOCK TABLES tokens WRITE");
> >-    &::SendSQL("DELETE FROM tokens 
> >-                WHERE TO_DAYS(NOW()) - TO_DAYS(issuedate) >= " . $maxtokenage);
> >-    &::SendSQL("UNLOCK TABLES");
> >+    Bugzilla::DB::LockTables("LOCK TABLES tokens WRITE");
> >+    &::SendSQL("DELETE FROM tokens WHERE " . Bugzilla::DB::ToDays('NOW()') .
" - " . 
> >+                Bugzilla::DB::ToDays('issuedate') . " >= " . $maxtokenage);
> >+    Bugzilla::DB::UnlockTables();
> 
> OK, I realize this was already here, but what's the point of a lock here? 
> Isn't a single DELETE statement atomic on its own?
> 
> >@@ -233,9 +233,9 @@
> >     close SENDMAIL;
> > 
> >     # Delete the token from the database.
> >-    &::SendSQL("LOCK TABLES tokens WRITE");
> >+    Bugzilla::DB::LockTables("LOCK TABLES tokens WRITE");
> >     &::SendSQL("DELETE FROM tokens WHERE token = $quotedtoken");
> >-    &::SendSQL("UNLOCK TABLES");
> >+    Bugzilla::DB::UnlockTables();
> > }
> 
> ditto here.
> 

Not sure about this one. Tried to find some real world cases supporting whether
it is necessary or not. I feel it may be necessary when dealing with tables that
have a primary key as one of the columns.

> >@@ -837,15 +843,23 @@
> >   $filename = SqlQuote($filename);
> >   my $description = SqlQuote($::FORM{'description'});
> >   my $contenttype = SqlQuote($::FORM{'contenttype'});
> >-  my $thedata = SqlQuote($data);
> >+  my $thedata = $::FORM{'data'};
> 
> Where did we lose $data from that we're going back to the FORM to get it?  If
> we need to do this, use $cgi->param('data').  $::FORM is going away, if it
> hasn't already, and you'll have less cvs conflicts :)
>

Fixed.
 
> >Index: buglist.cgi
> >@@ -220,12 +220,12 @@
> >     return if !$userid;
> > 
> >     SendSQL("
> >-        SELECT DISTINCT  groups.id, name, description, isactive
> >-                   FROM  groups, user_group_map
> >-                  WHERE  user_id = $userid AND NOT isbless
> >-                    AND  user_group_map.group_id = groups.id
> >-                    AND  isbuggroup
> >-               ORDER BY  description ");
> >+        SELECT  groups.id, name, description, isactive
> >+          FROM  groups, user_group_map
> >+         WHERE  user_id = $userid AND isbless = 0
> >+           AND  user_group_map.group_id = groups.id
> >+           AND  isbuggroup = 1
> >+      ORDER BY  description ");
> 
> Why did we lose the DISTINCT here?  Was it unnecessary to begin with?
>

Fixed. 


(In reply to comment #114)
> (In reply to comment #112)
> > This completely toasted us on Sybase...  what does DISTINCT bug_id do if you
> > use that at the beginning of the query in place of the GROUP BY?  That's what
> > we ended up having to do for Sybase.
> 
> This still doesn't work with PostgreSQL. Adding DISTINCT and removing the GROUP
> BY causes SQL error because of the sum() function around votes.count.

OK, I'd rather have MySQL and Postgres both working than Sybase. :)  Like I've
said before, Sybase is just too different - it'll never work out of the box.  We
can get it as close as we can though.
I'd be willing to bet anyone stuck using Sybase will have people on site capable
of completing the conversion. :)
dkl --

Do you think that you could possibly split DBCompat.pm and the standard MySQL
changes out into two other patches? I think it'd be way easier to push this in
as three patches (1st DBCompat.pm, 2nd MySQL changes, 3rd PostgreSQL changes).

Of course, that doesn't mean that patch 2 can't modify patch 1, etc., but it'd
just be nice to be able to push *something* in. :-)

If we could get DBCompat.pm in 2.18, that'd be pretty nice. (I can always dream...)
Blocks: 189388
This is close enough that it might make it, and I'd love to see it get in. 
Putting in the blocking queue for a later decision after seeing how far we get. :)
Flags: blocking2.18?
(In reply to comment #116)
> dkl --
> 
> Do you think that you could possibly split DBCompat.pm and the standard MySQL
> changes out into two other patches? I think it'd be way easier to push this in
> as three patches (1st DBCompat.pm, 2nd MySQL changes, 3rd PostgreSQL changes).
> 
> Of course, that doesn't mean that patch 2 can't modify patch 1, etc., but it'd
> just be nice to be able to push *something* in. :-)
> 
> If we could get DBCompat.pm in 2.18, that'd be pretty nice. (I can always
dream...)


Not sure how easy this would be, since most of the PostgreSQL changes require the
DBcompat.pm module to be there. But we could just include the DBcompat.pm module
with 2.18 once it is stabilized; that way people can use it for new
enhancements as needed, and then I can get the other changes in post-2.18. I plan
on putting the new PG changes into production here at RH once 2.18 is out, so it
should get some good testing past that point. Also, if we get DBcompat.pm in
there before, it shouldn't cause any problems; it would just be an extra module
lying around. Should I file a separate bug and attach DBcompat.pm to it, or is
there something similar already out there?
I think that bug 211769 is almost what we're looking for there, but I think that
DBCompat.pm should probably just get its own bug. File it, and ask for review &
approval, and we'll see if we can get it in for 2.18. I think that just the
existence of the file in Bugzilla should be easy to get approval for.
(In reply to comment #119)
> I think that bug 211769 is almost what we're looking for there, but I think that
> DBCompat.pm should probably just get its own bug. File it, and ask for review &
> approval, and we'll see if we can get it in for 2.18. I think that just the
> existence of the file in Bugzilla should be easy to get approval for.

Created bug 237862 specifically for the DBcompat.pm module. Please put further
comments regarding that code in that report. Hopefully we can at least get
review going on that one for possible inclusion into next release.
Adding bug 237862 to blocker list. I will submit a new patch to this bug that
has DBcompat.pm removed since it will be reviewed in the separate bug.
Depends on: 534, bz-dbcompat
:: sigh ::  :)
Target Milestone: Bugzilla 2.18 → Bugzilla 2.20
An interesting thing I just discovered on my PgSQL Bugzilla installation --

You must CREATE DATABASE bugs WITH ENCODING = 'SQL_ASCII' (which is the default
if you just type CREATE DATABASE bugs;). If you use 'UNICODE', Bugzilla will die
when you try to insert Unicode characters higher than 0x10000.

I thought I should note this here.
Surely it only dies on invalid unicode bits? You'd want UTF-8 if we were going
to do this.

I have this feeling that postgres disables certain optimisations for non-ascii
dbs, though. I know it does if you have a non-default sort order.
According to http://www.postgresql.com/docs/7.4/static/multibyte.html, UNICODE
for the charset given to Postgres means utf-8.
If a miracle happens I'll be happy to change my mind, but I'm not gonna hold
2.18 for this ;)
Flags: blocking2.18? → blocking2.18-
Depends on: bz-dbschema
No longer depends on: 534
Depends on: 247998, 248001
No longer depends on: 247998
Depends on: bz-dbinstall
Blocks: bz-oracle
Depends on: 250832
Depends on: 250547
Depends on: 251960
Depends on: 250591
Depends on: 182136
Depends on: 253357
Depends on: 253360
Bugzilla 2.20 feature set is now frozen as of 15 Sept 2004.  Anything flagged
enhancement that hasn't already landed is being pushed out.  If this bug is
otherwise ready to land, we'll handle it on a case-by-case basis, please set the
blocking2.20 flag to '?' if you think it qualifies.
Target Milestone: Bugzilla 2.20 → Bugzilla 2.22
Flags: documentation?
Flags: documentation?
Blocks: 274266
So people.

Where are we standing with this? : )
(In reply to comment #128)
> So people.
> 
> Where are we standing with this? : )

Well... I have Bugzilla running on Postgres for a while now in production
environment, with just minor problems I'm just working on to resolve (e-mails
are case sensitive, full text search is broken etc.). For most of the other
changes, the patches are submitted, waiting for review and bit-rotting... Have
some time for reviews? ;-)
Depends on: 280493
Attachment #141854 - Attachment is obsolete: true
Attachment #142476 - Attachment is obsolete: true
No longer depends on: bz-custres
Depends on: 281550
> Well... I have Bugzilla running on Postgres for a while now in production
> environment, with just minor problems I'm just working on to resolve (e-mails
> are case sensitive, full text search is broken etc.). For most of the other
> changes, the patches are submitted, waiting for review and bit-rotting... Have
> some time for reviews? ;-)

The patches seem to have bitrotted - does anyone have any that apply against 2.18?
(In reply to comment #130)
> 
> The patches seem to have bitrotted - does anyone have any that apply against 2.18?
> 

If you mean the patches on this bug, yes, they are very bitrotted, and also
obsolete. Look at the blockers of this bug; you will find newer ones. But do not
expect them to apply against 2.18, they are against the cvs tip. And they are
still not complete, if you are thinking about getting bz running on postgres...
No longer blocks: 189388
Depends on: 283076
Assignee: dkl → mkanat
Summary: Make Bugzilla's SQL compatible with PostgreSQL (PgSQL) → Make Bugzilla's SQL fully compatible with PostgreSQL (PgSQL)
Depends on: 284125
Depends on: 284244
Depends on: 285397
Depends on: 285401
Depends on: 285403
Depends on: 285407
Depends on: 285411
Depends on: 285523
No longer depends on: 285523
Depends on: 285532
Depends on: 285534
Depends on: 285552
Depends on: 285555
Depends on: 285678
Depends on: 285692
Depends on: 285695
Depends on: 285705
No longer depends on: 283076
Depends on: 285740
Depends on: 285748
Depends on: 286235
Depends on: 286360
Depends on: 286392
Depends on: 286490
Depends on: 286686
Depends on: 286695
Depends on: 287138
Depends on: 289042
Depends on: 289043
OK, the focus of this bug will now be to have *basic* PostgreSQL compatibility
for 2.20. That is, to the best of our ability as the Bugzilla developer and
testing community, we should have a version of Bugzilla that works on PostgreSQL.

Once bug 286695 is fixed, PostgreSQL support should be ready for general testing.
Status: NEW → ASSIGNED
Summary: Make Bugzilla's SQL fully compatible with PostgreSQL (PgSQL) → Make Bugzilla's SQL compatible with PostgreSQL (PgSQL)
Target Milestone: Bugzilla 2.22 → Bugzilla 2.20
Attachment #135422 - Attachment is obsolete: true
Attachment #130979 - Attachment is obsolete: true
Attachment #130925 - Attachment is obsolete: true
Attachment #100290 - Attachment is obsolete: true
Attachment #107199 - Attachment is obsolete: true
Depends on: 289012
Depends on: 291803
[Also posted to the mozilla-webtools newsgroup and the developers list.]

        As of today's CVS version, Bugzilla's PostgreSQL support is
ready for beta testing. There are still some known bugs -- see the
blockers on this for details.

        If you encounter any bugs while using Bugzilla on PostgreSQL, please
report them to bugzilla.mozilla.org and make them block this bug. Also
CC me on them. 

        *Don't* add comments to this bug about problems you encounter. File a
new bug instead, per the above instructions.

        The steps for setting up a Bugzilla installation on PostgreSQL are very
simple:

        (0) Connect to the template1 database in PostgreSQL as a PostgreSQL
superuser. (That's usually the "postgres" user.)

        (1) Create a 'bugs' user:

        CREATE USER bugs PASSWORD 'password' CREATEDB;

        (2) Make sure that your pg_hba.conf allows md5 login from the local
machine, from the 'bugs' user:

        host all bugs 127.0.0.1 255.255.255.255 md5

        (3) Make sure that your postgresql.conf allows TCP/IP connections from
the local machine:

        tcpip_socket = true

        (4) Stop PostgreSQL, and then start it again.

        (5) Set up Bugzilla normally, editing localconfig and so forth per the
instructions in the Bugzilla Guide.
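        For step (5), the database bits of localconfig would look roughly like
this (a sketch only -- check the comments in your generated localconfig for the
exact variable names):

        # localconfig (sketch)
        $db_driver = 'Pg';           # instead of 'mysql'
        $db_host   = 'localhost';
        $db_port   = 5432;
        $db_name   = 'bugs';
        $db_user   = 'bugs';
        $db_pass   = 'password';     # the password from the CREATE USER step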

        Have fun!
(In reply to comment #133)
> [Also posted to the mozilla-webtools newsgroup and the developers list.]
> 
>         As of today's CVS version, Bugzilla's PostgreSQL support is
> ready for beta testing. 

woohoo!!!


> 
>         (2) Make sure that your pg_hba.conf allows md5 login from the local
> machine, from the 'bugs' user:
> 
>         host all bugs 127.0.0.1 255.255.255.255 md5

Or from postgres 7.4 on:

          host all bugs 127.0.0.1/32 md5

> 
>         (3) Make sure that your postgresql.conf allows TCP/IP connections from
> the local machine:
> 
>         tcpip_socket = true

This setting disappeared in postgres version 8.0, which in any case listens to
localhost by default, unlike previous versions.

Depends on: 292119
Depends on: 292715
Depends on: 292718
Depends on: 292768
Depends on: 293015
Blocks: majorbugs
No longer blocks: majorbugs
Depends on: 296039
Depends on: 296054
Depends on: 296075
Component: Bugzilla-General → Database
Depends on: 301062
Depends on: 301067
Depends on: 301141
As far as I'm concerned, this is actually done. There are a few features that
don't yet work on PostgreSQL (new charts and whining), but the major effort to
port Bugzilla to PostgreSQL is complete. All of the most-important features
(creating bugs, searching for bugs, and editing bugs) work very well, or at
least well enough to use in a production environment. PostgreSQL support has
started to be considered a regular part of Bugzilla development, and within a
few months should be completely up to par with MySQL support.

Future bugs filed against PostgreSQL, instead of blocking this bug, should be
filed in the Database component and should have their summary start with
"[PostgreSQL]".

I'd like to thank everybody who ever helped work on this; this was one of the
most significant projects in Bugzilla's history, and now that it's in place,
future cross-DB work will be so much easier. :-)

Well done, everybody. :-)
Status: ASSIGNED → RESOLVED
Closed: 19 years ago
No longer depends on: 301062, 301067, 301141
Resolution: --- → FIXED
Summary: Make Bugzilla's SQL compatible with PostgreSQL (PgSQL) → Make Bugzilla's SQL generally compatible with PostgreSQL (PgSQL)
QA Contact: matty_is_a_geek → default-qa
You need to log in before you can comment on or make changes to this bug.