Closed Bug 146679 (bz-dbschema) Opened 22 years ago Closed 19 years ago

Reusable, structured, database-independent schema

Categories

(Bugzilla :: Bugzilla-General, enhancement, P2)

2.15
enhancement

Tracking

()

RESOLVED FIXED
Bugzilla 2.20

People

(Reporter: CodeMachine, Assigned: edwardjsabol)

References

(Blocks 3 open bugs)

Details

Attachments

(4 files, 9 obsolete files)

The current schema in checksetup.pl would be useful for other things.

For example, foreign key references could be used by sanity check, as well as
bug deletion, to automatically work out subsidiary tables.

To do this we need to include foreign key references.  That should not be
difficult, you add "PRIMARY KEY(fieldname)" to the tables, and then elsewhere
REFERENCES tablename to the end of the field line.

Now they may not be enforced in MySQL (that is bug #109473), but it would mean
these references would be centralised.for the various purposes they can be used for.

Now to get this information back it would be necessary to parse the schema, and
that's too complicated.  Therefore it would be better for the schema to be
structured and deparsed when used in checksetup.

An additional benefit would be you wouldn't need to specify the field type
information when adding a field during an upgrade, just the field name, and you
look up the field type from the schema (and avoid potential bugs if it was
inconsistent between the schema and the upgrade code).
We can't do that with forieng keys, because then if we ever change them, how
would we specifiy that? What if we add them to the schema, then someone updates
toa  version of mysql which supports them. How do we 'know' which one of those
the db knows about?

I don't want to add foreign key stuff until we support a db which handles them.
Note that postgres can't modify foreign keys (its alter table stuff sucks), so
we don't really want these to start with.
To my knowledge MySQL will silently allow but ignore referential constraints
that have been specified.  I have used foreign key specifications in MySQL
before.

The point is the presence of foreign key specifications is useful for sanity
check and so on without touching the database and therefore caring whether the
database enforces referential integrity.

At the moment the foreign key information is already in sanity check and also
the bug table references in bug deletion (although incomplete there), and
eventually will also be in the schema once we support referential integrity.  So
why not just have them in the schema?
Okay, I thought I'd bring some ideas from bug 98304 over here, and expand the
scope of this bug just a little to fulfill the needs of that other bug. 

The idea is that as we add more and more databases, one of the great hurdles is
creating the actual tables in checksetup. We don't want to have three
checksetups for three different databases, and we don't want checksetup to grow
by thousands of lines for every new DB.

The current idea is to create a system of schema-generation that is
database-independent. Then we'd have a structured schema that was useful as an
object inside of Bugzilla, as well as having a way to create schemas across
databases. I'll move some attachments and some blockers over here in the next
few comments.

-M
Summary: Reusable (structured?) schema. → Reusable (structured?) database-independent schema.
Jeroen Ruigrok van der Werven was the original poster of this attachment in bug
98304 comment 67. This is a pgsetup.pl file, similar to checksetup.pl, but it
supports PgSQL. The differences between at least these two databases are seen
more easily by comparing this file to checksetup.pl as it exists.
This is the best example we have, so far, of what we're moving toward.

The primary difference between this and what would be best to implement is that
the implementation should be more object-oriented, so that any part of bugzilla
can get information about the schema.
Depends on: bz-postgres
This bug has certain dependencies that were originally part of bug 98304. -M
Depends on: bz-enums
No longer depends on: bz-postgres
(In reply to comment #5)
> This is the best example we have, so far, of what we're moving toward.
> 
> The primary difference between this and what would be best to implement is that
> the implementation should be more object-oriented, so that any part of bugzilla
> can get information about the schema.

I agree that a more object-oriented approach would be a good idea.

I suggest someone take a look at the CPAN module Dimedis::Ddl. It's quite unfortunate that the 
documentation is written in German, but, if you look at the code and the examples, it sure seems a 
lot like what we want here. It's object-oriented and even supports Sybase, Oracle, and Mysql out of 
the box. Even if Dimedis::Ddl doesn't offer the exact functionality or have the exact interface that 
we want, it should provide a lot of inspiration for someone.

http://search.cpan.org/~jred/Dimedis-Ddl-0.35/
Blocks: bz-sybase
(In reply to comment #5)
> Created an attachment (id=140047)
> Proof of Concept code for DB independent schema generation
> 
> This is the best example we have, so far, of what we're moving toward.
> 

I'm glad you like it :-)

> The primary difference between this and what would be best to implement is that
> the implementation should be more object-oriented, so that any part of bugzilla
> can get information about the schema.

I agree that it should be OOified. However, I don't think that exposing schema
specifics to the whole of the Bugzilla code is the best approach,
architecturally. I would rather see a move in the reverse direction - to remove
all knowledge of the database from most of the code and simply present a
procedural interface. That interface code would know about schema specifics, but
the rest of the code would not. Perhaps that's a bit too big a change, though.
(In reply to comment #7)
> 
> I suggest someone take a look at the CPAN module Dimedis::Ddl. It's quite
unfortunate that the 
> documentation is written in German, but, if you look at the code and the
examples, it sure seems a 
> lot like what we want here. It's object-oriented and even supports Sybase,
Oracle, and Mysql out of 
> the box. Even if Dimedis::Ddl doesn't offer the exact functionality or have
the exact interface that 
> we want, it should provide a lot of inspiration for someone.
> 

It doesn't appear to have a PostgreSQL driver.

(In reply to comment #9)
> It doesn't appear to have a PostgreSQL driver.

Based on perusing the DDL drivers for Sybase, Oracle, Mysql, MS SQL, I would hazard to say that it 
wouldn't be difficult for someone to make a PostgreSQL driver for it, utilizing much of what has 
already been learned with your proof-of-concept schema generation code and DBCompat.pm as a 
knowledge base. Besides, I was more suggesting that the Dimedis::Ddl module be used as 
inspiration for the Bugzilla implementation. It's got a nice OO design. And for the record, support 
for Oracle and Sybase is equally important as PostgreSQL in my mind and none of the other 
proposed solutions have even begun to address those.
I would be open to the Ddl library as a basis for something that we could ship
with Bugzilla, provided that its license is amenable to that. (We couldn't just
use the CPAN version, since we'd have to depend on our changes getting there,
something that isn't good for speed of release or ease of support.)

Also, I agree with Andrew from comment 8 -- the ideal is that BZ should not even
know that it's running on a database, just that it has some sort of "persistence
layer." However, that's in the future -- what I think would be good now is PgSQL
support ASAP, and if that comes with Oracle + Sybase support, all the better. :-)

-M
(In reply to comment #11)
> I would be open to the Ddl library as a basis for something that we could ship
> with Bugzilla, provided that its license is amenable to that.

The Dimedis::Ddl README says:

    This library is free software; you can redistribute it and/or
    modify it under the same terms as Perl itself.

Since Perl is distributed jointly under either the Artistic or GPL licenses, I think that means we're 
safe to use it as a basis for Bugzilla.

> (We couldn't just use the CPAN version, since we'd have to depend on our changes
> getting there, something that isn't good for speed of release or ease of support.)

I agree totally. Plus the module's namespace just isn't very aesthetically pleasing... :-)
Bugzilla::DDL anyone?

> Also, I agree with Andrew from comment 8 -- the ideal is that BZ should not even
> know that it's running on a database, just that it has some sort of "persistence
> layer."

This discussion is probably for a different bug, but has anyone looked at Class::DBI? You'd 
practically have to rewrite most of Bugzilla, but, man, that would be sweet. Getting back on topic to  
database-independent schema gneration (somewhat tangentially), someone recently released a 
CPAN module called Class::DBI::DDL. Unfortunately, it doesn't abstract the column types and 
attributes in a way that is needed for true database-independence, so it would need a fair amount 
of work (more than Dimedis::Ddl), but someone might want to take a glance at that too. It can be 
used independently of Class::DBI, FWIW.

> what I think would be good now is PgSQL support ASAP, and if
> that comes with Oracle + Sybase support, all the better. :-)

I can live with that approach.
I spent a little while to make the proof of conept code I wrote more
object-oriented. Extra methods should be able to be added easily for schema
discovery etc. if required. Right now the only significant method is the
generation of all the DLL required for a particular DB - the 2 examples being
MySQL and PostgreSQL. Demo code use at the bottom.
(In reply to comment #13)
> Created an attachment (id=140325)
> OOified proof of concept code

It's barely OO. I still think the Dimedis::Ddl object design is more impressive, but this is admittedly 
simpler and may be easier to maintain and extend to other DBMS's.

Make decimal(5,2) a DB-specific data type. Might want to add INT2 and INT4 types as well.

Maybe change package name from Bugzilla::DBSchema to Bugzilla::DB::Schema?

Anyone have any ideas on what methods this object class should support? Besides the method for 
creating the schema, of course. Someone mentioned methods for introspection, but I find it hard to 
believe
Attached file A slightly more OO schema class design (obsolete) —
I took Andrew's OO proof-of-concept and fleshed it out a little more, made it a
little more object-oriented, IMHO.
I'm OK with the changes Ed made, although I'm not certain they add much.

It just occurred to me that we might be able to reduce the number of type fixups
required by using type aliases, via Postgres's (SQL standard) CREATE DOMAIN.

e.g. 

  CREATE DOMAIN longblob AS bytea;

Just a thought.
If you feel that this is ready and useful, I would suggest that you ask bbaetz
to review it.

I think it looks like a good idea, and is generally in the direction of where we
were going with discussion on bug 98304.
Blocks: bz-postgres
We have several bugs discussing this sort of thing. I'd prefer that we just had
one... The other one has more discussion
(In reply to comment #18)
> We have several bugs discussing this sort of thing. I'd prefer that we just had
> one... The other one has more discussion

Could you be more specific? Nothing comes up when searching for either of the
keywords "schema" or "DDL".
Blocks: bz-oracle
Comment on attachment 142116 [details]
A slightly more OO schema class design


Oooh, I like this. :)

>      UNIQUE => [qw(user_id group_id isderived isbless)],

Unique indexes should probably have names, just like normal indexes do.
I would define these the same way you do regular indexes.  Regular indexes can
also be multi-column, and you can have more than one unique index, also.

I think it would be nice to get references included in here somehow, too...

>     longdescs => 
>     {
>      FIELDS => [
>		 bug_id => 	{TYPE => 'integer',  NOTNULL => 1,
				 REFERENCES => ['bugs','bug_id'] },
>		 who =>		{TYPE => 'integer', NOTNULL => 1,
				 REFERENCES => ['profiles','userid'] },

etc. ?

>		 bug_when => 	{TYPE => 'DATETIME', NOTNULL => 1 },
>		 work_time =>	{TYPE => 'decimal(5,2)', 
>				 NOTNULL => 1,  DEFAULT => '0'},
>		 thetext =>	{TYPE => 'text'},
>		 isprivate => 	{TYPE => 'smallint',
>				 NOTNULL => 1, DEFAULT => '0' }

MySQL has no booleans...  a lot of these "smallint"s should be booleans.
(probably every field that starts with "is").

>      INDEXES => {
>		  longdescs_bugid_idx => ['bug_id'],
>		  longdescs_who_idx => ['who'],
>		  longdescs_bugwhen_idx => ['bug_when'],
>		 },

Here's a good example of your indexes, for comparison with the uniques...
Unique is just a special kind of index. :)

>sub ddl_statements {

>	$crst .= "\n\tUNIQUE \(" . join(", ",@$unique) . "\)\n"  
>	  if ($unique);

This should be like the CREATE INDEX below:

>	while (my($iname,$ifields) = each %$indexes) {
>	    push(@ddl, 
>		 "CREATE INDEX $iname ON $tname \(" .
>		 join(", ",@$ifields) . "\);\n");
>	}

	while (my($iname,$ifields) = each %$unique) {
	    push(@ddl, 
		 "CREATE UNIQUE INDEX $iname ON $tname \(" .
		 join(", ",@$ifields) . "\);\n");
	}
(In reply to comment #20)
> (From update of attachment 142116 [details])
> 
> Oooh, I like this. :)

Glad to hear it!

> >      UNIQUE => [qw(user_id group_id isderived isbless)],
> 
> Unique indexes should probably have names, just like normal indexes do.
> I would define these the same way you do regular indexes.  Regular indexes can
> also be multi-column, and you can have more than one unique index, also.

I agree unique indexes should have names. However, can we formalize the naming
of the unique indexes to be tablename_unique_idx? All of the current indexes
have a name of the form tablename_column(s)_idx, so that seems natural.

> I think it would be nice to get references included in here somehow, too...

There's already support for it, and, as you saw, a few tables have REFERENCES
specified. What more would you like to see or what you like to see changed with
regard to this?

> >		 isprivate => 	{TYPE => 'smallint',
> >				 NOTNULL => 1, DEFAULT => '0' }
> 
> MySQL has no booleans...  a lot of these "smallint"s should be booleans.
> (probably every field that starts with "is").

OK, I've changed isprivate to a BOOLEAN. All the other fields that start with
"is" were already declared that way. I also went through all of the smallint
fields, and I think that was the only one that should have been BOOLEAN.

> >      INDEXES => {
> >		  longdescs_bugid_idx => ['bug_id'],
> >		  longdescs_who_idx => ['who'],
> >		  longdescs_bugwhen_idx => ['bug_when'],
> >		 },
> 
> Here's a good example of your indexes, for comparison with the uniques...
> Unique is just a special kind of index. :)
> 
> >sub ddl_statements {
> 
> >	$crst .= "\n\tUNIQUE \(" . join(", ",@$unique) . "\)\n"  
> >	  if ($unique);
>
> This should be like the CREATE INDEX below:
> 
> >	while (my($iname,$ifields) = each %$indexes) {
> >	    push(@ddl, 
> >		 "CREATE INDEX $iname ON $tname \(" .
> >		 join(", ",@$ifields) . "\);\n");
> >	}
> 
> 	while (my($iname,$ifields) = each %$unique) {
> 	    push(@ddl, 
> 		 "CREATE UNIQUE INDEX $iname ON $tname \(" .
> 		 join(", ",@$ifields) . "\);\n");
> 	}

Yeah, I'm not really clear why Andrew implemented UNIQUE in that way. Perhaps he
can comment. I assumed it was a MySQL thing. I'm not actually familiar with
MySQL. (My experience is mostly with Sybase and INGRES/OpenINGRES, plus some
Oracle and PostgreSQL.) So MySQL does support "CREATE UNIQUE INDEX"? If so, I'll
change it and upload a new patch.
The choice of column types between this bug and bug 248001 seem to be on a
collision course.  If booleans become booleans, bug 248001 ceases to be the
right thing to do.

(In reply to comment #22)
> The choice of column types between this bug and bug 248001 seem to be on a
> collision course.  If booleans become booleans, bug 248001 ceases to be the
> right thing to do.

Well, there's no collision at the moment. All current subclasses of the Schema
module implement the BOOLEAN "database-specific data-type alias" (for lack of
better terminology) using the smallest integer type supported by each database.
But that is worth noting for any databases supported in the future.
(In reply to comment #21)

> 
> Yeah, I'm not really clear why Andrew implemented UNIQUE in that way. Perhaps he
> can comment. I assumed it was a MySQL thing. I'm not actually familiar with
> MySQL. (My experience is mostly with Sybase and INGRES/OpenINGRES, plus some
> Oracle and PostgreSQL.) So MySQL does support "CREATE UNIQUE INDEX"? If so, I'll
> change it and upload a new patch.

I don't recall - probably no very deep reason. I'm OK with whatever works. I
agree the indexes should be explicitly named.

(In reply to comment #20
> Unique is just a special kind of index. :)

I think what you're suggesting is to change the schema structure to look like this:

      UNIQUE => { components_unique_idx => [qw(product_id name)],
      INDEXES => { components_name_idx => ['name'] },

However, one could also take the "unique is just a special kind of index"
statement further and just have INDEXES and flag an index to be UNIQUE, like this:

       INDEXES => {
                  components_unique_idx => {FIELDS => [qw(product_id name)],
					    UNIQUE => 1},
                  components_name_idx => {FIELDS => ['name']},
                 },

This form has the added advantage in that you could define multiple unique
indexes per table. Is there a need for that?

Personally, I think I prefer the way it is now and just formalize the naming of
all unique indexes tablename_unique_idx:

      UNIQUE => [qw(product_id name)],
      INDEXES => { components_name_idx => ['name'] },

Anyone want to state their preference?
(In reply to comment #23)
> (In reply to comment #22)
> > The choice of column types between this bug and bug 248001 seem to be on a
> > collision course.  If booleans become booleans, bug 248001 ceases to be the
> > right thing to do.
> 
> Well, there's no collision at the moment. All current subclasses of the Schema
> module implement the BOOLEAN "database-specific data-type alias" (for lack of
> better terminology) using the smallest integer type supported by each database.
> But that is worth noting for any databases supported in the future.

Sorry for the noise but I was wrong. The current PostgreSQL Schema subclass does
indeed implement BOOLEAN as boolean and TRUE as true and FALSE as false. Does
bug 248001 represent the current thinking of those leading the efforts to port
Bugzilla to PostgreSQL?
(In reply to comment #26)
> (In reply to comment #23)
> > (In reply to comment #22)
> > > The choice of column types between this bug and bug 248001 seem to be on a
> > > collision course.  If booleans become booleans, bug 248001 ceases to be the
> > > right thing to do.
> > 
> > Well, there's no collision at the moment. All current subclasses of the Schema
> > module implement the BOOLEAN "database-specific data-type alias" (for lack of
> > better terminology) using the smallest integer type supported by each database.
> > But that is worth noting for any databases supported in the future.
> 
> Sorry for the noise but I was wrong. The current PostgreSQL Schema subclass does
> indeed implement BOOLEAN as boolean and TRUE as true and FALSE as false. Does
> bug 248001 represent the current thinking of those leading the efforts to port
> Bugzilla to PostgreSQL?

Postgresql doesnt like it when they are used within the WHERE clause of the SQL
but they can be returned just fine as a column value. DBD::Pg will convert the
't' and 'f' values that PostgreSQL stores the values as to 1 and 0 respectively
so Perl will not complain. But smallints work as well for most of these. Is
there any reason not to use smallints to represent simple 1 or 0 values?
(In reply to comment #27)
> But smallints work as well for most of these. Is
> there any reason not to use smallints to represent simple 1 or 0 values?

Perhaps space? Obviously, a boolean only needs one bit for storage, although I would guess most 
DBMSes probably use at least one byte for speed reasons. A smallint is typically two bytes.
(In reply to comment #25)
>        INDEXES => {
>                   components_unique_idx => {FIELDS => [qw(product_id name)],
> 					    UNIQUE => 1},
>                   components_name_idx => {FIELDS => ['name']},
>                  },

  I much prefer this form, as it allows us to also specify other index options
as needed in the future.
Attached patch Revised schema modules (obsolete) — Splinter Review
Attachment #142116 - Attachment is obsolete: true
(In reply to comment #29)
> (In reply to comment #25)
> >        INDEXES => {
> >                   components_unique_idx => {FIELDS => [qw(product_id name)],
> >                                           UNIQUE => 1},
> >                   components_name_idx => {FIELDS => ['name']},
> >                  },
> 
>   I much prefer this form, as it allows us to also specify other index options
> as needed in the future.

Unfortunately, I already went with Dave's suggestion in the latest patch. That form was a bit "wordy" for 
my tastes, and I couldn't think of any scenario where one would want multiple separate unique indexes. 
But if the consensus agrees with you, I'll certainly switch.

Unrelated: What's the deal with the lastused column of the logincookies table? It's MySQL type is 
timestamp. What should it be for PostgreSQL? Other MySQL timestamp fields were translated to interval 
fields for PostgreSQL except for this one.
Also, I suspect that many of the parties involved in this bug will have comments
on bug 248175.
> Yeah, I'm not really clear why Andrew implemented UNIQUE in that way. Perhaps 
he
> can comment. I assumed it was a MySQL thing. I'm not actually familiar with
> MySQL. (My experience is mostly with Sybase and INGRES/OpenINGRES, plus some
> Oracle and PostgreSQL.) So MySQL does support "CREATE UNIQUE INDEX"? If so, 
I'll
> change it and upload a new patch.

Please see http://www.in-nomine.org/~asmodai/mysql-to-pgsql.txt for some porting 
notes I was/am documenting for the PostgreSQL project.  I also have, in the 
pipeline, plans to document MS-SQL to PostgreSQL and Oracle to PostgreSQL.  
Perhaps Sybase some day too.
All indexes must have names, so that we can more portably drop them later, if
required. That includes primary keys.
(In reply to comment #31)

> Unrelated: What's the deal with the lastused column of the logincookies table?
It's MySQL type is 
> timestamp. What should it be for PostgreSQL? Other MySQL timestamp fields were
translated to interval 
> fields for PostgreSQL except for this one.

This is stretching my recollection a bit. Postgresql doesn't have MySQL's
autoupdate timestamp gadget, so you have to do that with triggers. I think this
is the only field that actually uses the autoupdate feature. (Not sure that this
answers the question, though).

I don't vouch for the accuracy of what I did - there could be some mistakes,
which is why I labeled it as a proof of concept.
Blocks: 247560
In light of bug 248001 being resolved, shall I change the PostgreSQL schema contained herein to 
implement BOOLEANs as a smallint?
Why? Would we gain something?
(In reply to comment #37)
> Why? Would we gain something?

I just want to keep abreast of the changes to the schema. Will booleans still
work in light of the changes committed in bug 248001?
(In reply to comment #38)
> I just want to keep abreast of the changes to the schema. Will booleans still
> work in light of the changes committed in bug 248001?

  Yeah. AFAIK, DBI handles all the conversions back and forth from "Y" to "1",
etc. And I think that where we can, it's best to stick to the type most like our
"conceptual" type, if it exists in the backend DB.
Comment on attachment 151500 [details] [diff] [review]
Revised schema modules

  I'm going to take a crack at this, since I'm getting very familiar with
checksetup and what it needs for DB schema.

>diff -rNu Bugzilla.HEAD/Schema.pm Bugzilla/Schema.pm

  In the future, unless there's some reason not to, it's a good idea to use
"cvs diff -u" instead of "diff -Nru."

>+package Bugzilla::Schema;

  I think this should be DBSchema, to be consistent with DBCompat. Although
maybe we should move all of those, and make it DB::Schema and DB::Compat.

>+use Carp qw(croak);

  We use Bugzilla::Error::ThrowCodeError for code errors, not Carp, usually.

>+     user_group_map =>
>+     {

  Is there a reason that the schema is initialized inside of a function instead
of inside of an "our" or "my" variable? Or even read in from an entirely
separate file? I think maybe Bugzilla::DB::Schema::Constants might be a good
place, or just Bugzilla::Constants or Bugzilla::DB::Schema::ANSI.

>+      UNIQUE => { user_group_map_unique_idx =>
>+                   [qw(user_id group_id isderived isbless)] },

  OK, I do like this method. There could be reasons to have more than one
unique index per table, though, but I think this allows it. However, some day
we might want to also be able to set other index options. We need a form where
we can set *any* index option, where "unique" is just one option.

>+                 work_time =>	{TYPE => 'decimal(5,2)',

  While we're at it, wouldn't it be better to have TYPE => 'decimal', PRECISION
=> [5,2]? Or OPTIONS => '(5,2)'?

>+                 thetext =>	{TYPE => 'text'},

  We definitely need documentation for some of these types, to say what they
*need* to do in order for Bugzilla to work. For example, not all DBs have a
"text," type, but they all have something that would work exactly the same, for
Bugzilla's purposes. They all have different limitations, though.

>+                 id =>	{TYPE => 'SMALLSERIAL', NOTNULL => 1,

  I'm not the ANSI master... is "smallserial" actually an ANSI type? I'd like
to stick with ANSI types in the Schema module.

>+                 name =>	{TYPE => 'varchar(64)', NOTNULL => 1 },

  Similar to the above, we could do TYPE => 'varchar', LENGTH => 64 (or OPTIONS
=> '(64)').

>+                 dup => 	{TYPE => 'integer', NOTNULL => 1 ,
>+                                 DISPLAYWIDTH => 9, PRIMARYKEY => 1},

  Would it make more sense to have a PRIMARY => 'dup', since there can only be
one Primary Key anyway?

>+                 id => 		{TYPE => 'INTSERIAL', NOTNULL => 1,

  Same question again about the type. Also, all the ids should be the same
type, shouldn't they? I can imagine there existing a DB where there might be
problems if we didn't have the same type for the different ids in our
JOIN/WHEREs.

>+                 userregexp =>	{TYPE => 'TINYTEXT', NOTNULL => 1},

  'tinytext' is a weird thing. I mean, maybe we use it now, but... is it ANSI?
(It might be, I just don't know.)

>+     # These next six are the "enumerated" fields used in bugs.
>+     # We might need another column for sort order so that could be
>+     # played around with...

  These are all now actually in existence in bug 17453's patch.

>+                                         REFERENCES =>
>+                                         'bug_severity_values(value)'},

  I like this. :-) I forget... are there any DBs that take important options to
making a Foreign Key (besides just the field name)?

>+                 resolution => 	{TYPE => 'varchar(255)', NOTNULL => 1,
>+                                 REFERENCES =>
>+                                 'bug_resoltion_values(value)'},

  Typo.

>+      INDEXES => {
>+                  bugs_assigned_to_idx => ['assigned_to'],
>+                  bugs_creation_ts_idx => ['creation_ts'],
>+                  bugs_delta_ts_idx => ['delta_ts'],
>+                  bugs_bug_severity_idx => ['bug_severity'],
>+                  bugs_bug_status_idx => ['bug_status'],
>+                  bugs_op_sys_idx => ['op_sys'],
>+                  bugs_priority_idx => ['priority'],
>+                  bugs_product_id_idx => ['product_id'],
>+                  bugs_reporter_idx => ['reporter'],
>+                  bugs_version_idx => ['version'],
>+                  bugs_component_id_idx => ['component_id'],
>+                  bugs_resolution_idx => ['resolution'],
>+                  bugs_target_milestone_idx => ['target_milestone'],
>+                  bugs_qa_contact_idx => ['qa_contact'],
>+                  bugs_votes_idx => ['votes'],
>+                 },

  I'm almost certain that some of the indexes on the bug table need to be
multi-column... are they not, now?

>+                 thedata =>	{TYPE => 'LONGBLOB', NOTNULL => 1},

  Hrm, don't LongBlobs need some options for some DBs in order to work the same
way they do for Bugzilla in MySQL? (I'm not certain here, once again, I just
have this vague memory...)

>+sub adjust_schema {
>+    # PRIVATE method to alter the abstract schema to be DB-specific.

  WHOA! This whole function needs major commenting; it's pretty mysterious, at
this point. :-)

>+sub ddl_statements {
>+    # PUBLIC method to generate DDL from the concrete schema.
>+    # Returns an array of DDL strings.

  You might want say "SQL statements to create the table/indexes" instead of
DDL for us Mere Mortal Perl Coders. :-)

>+    my $self = shift;
?+    my @ddl = ();

  I think that ddl should be a hash, so that we could just do $ddl{'create'}.
Also, you might want to have a separate function to just generate and return
the CREATE TABLE statement on the fly, just returning a string. I think that
would be much simpler for the Installation script.

>+            $create_table .= "\t$fname\t$type";
>+            $create_table .= " NOT NULL" if ($finfo->{NOTNULL});
>+            $create_table .= " DEFAULT $default" if (defined($default));
>+            $create_table .= " PRIMARY KEY" if ($finfo->{PRIMARYKEY});

  I think that some of this may need to go into the DB-specific stuff, wouldn't
you think?

>+                 "CREATE UNIQUE INDEX $iname ON $tname \(" .

  This command also could possibly differ across DBs.

>+#------------------------------------------------------------------------------
>+1;

  OK, here's what I *really* need. :-) A function that reads in the *current*
schema, and compares it to the required schema. Of course, that's the ENTIRE
purpose of checksetup at this point, so that change could possibly go into bug
248175.

>+	BOOLEAN =>	'boolean',
>+	FALSE =>	'false', 
>+	TRUE =>		'true',

  Hrm, to save DBI the work, wouldn't it be better to do 0 and 1?

>+	LONGBLOB =>	'bytea',

  Are these actually equivalent? I recall having to do something strange to get
them to work equivalently in the RH Bugzilla fork.

>+    $self->{db_extras} =
>+      [
>+      " CREATE RULE logincookies_insert AS\n" .
>+	" ON INSERT TO logincookies WHERE NEW.lastused IS NULL\n" .
>+	" DO INSTEAD INSERT INTO logincookies VALUES \(\n" .
>+	" NEW.cookie, NEW.userid, NEW.ipaddr, localtimestamp\);\n",

  Oh, this is somewhat "bad," because it's not abstractly tied to the schema.
That is, if I update the schema and it requires one of these, then I'd have to
go and know to update each subclass that requires this, which defeats the
purpose of subclassing. We should be able to modify the schema somehow that we
automatically know to make these from certain options passed in the schema.

  Those are my comments to start with. :-)
Reassigning to patch writer.
Assignee: zach → edwardjsabol
I'm pretty sure that this is a P2, since it blocks a P2 and a P1.
Priority: -- → P2
Target Milestone: --- → Bugzilla 2.22
(In reply to comment #40)
>   In the future, unless there's some reason not to, it's a good idea to use
> "cvs diff -u" instead of "diff -Nru."

  Clarification: I just meant that you should use "cvs diff" instead of "diff"
when possible. :-) The -Nru is of course always very important if you're using a
normal "diff" command.
(In reply to comment #40)
> I think this should be DBSchema, to be consistent with DBCompat. Although
> maybe we should move all of those, and make it DB::Schema and DB::Compat.

Fine with me. I'll make it DBSchema in the next patch. If DBCompat is renamed, I'll follow suit.

> We use Bugzilla::Error::ThrowCodeError for code errors, not Carp, usually.

OK, I've switched to ThrowCodeError().

>   Is there a reason that the schema is initialized inside of a function instead
> of inside of an "our" or "my" variable?

The schema is modified at object instantiation time. If the schema structure is class data instead of 
instance data, you might have a scenario in which multiple object instances are modifying the same 
structure with conflicting results. It's just safer if the schema is instance-based data, I think. I suppose 
you could "my" the abstract schema and then copy it to instance data at instantiation time, but then you 
have an extra copy of the schema structure taking up memory.

> Or even read in from an entirely separate file?
> I think maybe Bugzilla::DB::Schema::Constants might be a good
> place, or just Bugzilla::Constants or Bugzilla::DB::Schema::ANSI. 

I have no strong feelings on this and whatever the consensus is is fine with me, but I think it's simpler if 
the schema is actually in the schema module.

> >+                 work_time => {TYPE => 'decimal(5,2)',
> 
>   While we're at it, wouldn't it be better to have TYPE => 'decimal', PRECISION
> => [5,2]? Or OPTIONS => '(5,2)'?

Seems like overkill, especially since decimal is an ANSI type supported by all the DBMSes that Bugzilla 
will likely ever support.

> >+                 thetext =>   {TYPE => 'text'},
> 
>   We definitely need documentation for some of these types, to say what they
> *need* to do in order for Bugzilla to work. For example, not all DBs have a
> "text," type, but they all have something that would work exactly the same, for
> Bugzilla's purposes. They all have different limitations, though.

Contributions welcome. Unfortunately, I don't feel qualified to do that myself. My experience with the 
Bugzilla code outside of this module is minimal.

> >+                 id =>        {TYPE => 'SMALLSERIAL', NOTNULL => 1,
> 
>   I'm not the ANSI master... is "smallserial" actually an ANSI type? I'd like
> to stick with ANSI types in the Schema module.

The uppercased types like INTSERIAL are "abstract" types. It's a requirement that the database-specific 
subclasses define them. (Yes, this needs to be documented...) I don't think there is an ANSI auto-
incrementing, serialized integer type, is there?

> >+                 name =>      {TYPE => 'varchar(64)', NOTNULL => 1 },
> 
>   Similar to the above, we could do TYPE => 'varchar', LENGTH => 64 (or OPTIONS
> => '(64)').

I don't see how that improves things. It seems like an unnecessary complication.

> >+                 dup =>       {TYPE => 'integer', NOTNULL => 1 ,
> >+                                 DISPLAYWIDTH => 9, PRIMARYKEY => 1},
> 
>   Would it make more sense to have a PRIMARY => 'dup', since there can only be
> one Primary Key anyway?

Maybe. Multi-column primary keys are possible, though perhaps arguably indicative of a poor database 
design.

> >+                 id =>                {TYPE => 'INTSERIAL', NOTNULL => 1,
> 
>   Same question again about the type. Also, all the ids should be the same
> type, shouldn't they? I can imagine there existing a DB where there might be
> problems if we didn't have the same type for the different ids in our
> JOIN/WHEREs.

I believe the current MySQL schema has smallint auto_increment ids, mediumint auto_increment ids, 
and regular integer auto_increment ids. The goal was to reproduce the current schema used for MySQL, 
so the abstract base schema needs to differentiate between these different auto_increment types in 
order to do that.

> >+     # These next six are the "enumerated" fields used in bugs.
> >+     # We might need another column for sort order so that could be
> >+     # played around with...
> 
>   These are all now actually in existence in bug 17453's patch.

So what needs to be changed?

> I forget... are there any DBs that take important options to
> making a Foreign Key (besides just the field name)?

I'm not aware of any.

> >+                 resolution =>        {TYPE => 'varchar(255)', NOTNULL => 1,
> >+                                 REFERENCES =>
> >+                                 'bug_resoltion_values(value)'},
> 
>   Typo.

Fixed in the forthcoming patch.

> I'm almost certain that some of the indexes on the bug table need to be
> multi-column... are they not, now?

I just checked checksetup.pl and it doesn't look like there are any multi-column indexes on the bugs 
table to me, but maybe I'm reading the code wrong?
 
>   I think that ddl should be a hash, so that we could just do $ddl{'create'}.

What other keys would you have in such a hash? An abbreviated example of the structure you 
are suggesting that the ddl_statements() method return would be useful.

> Also, you might want to have a separate function to just generate and return
> the CREATE TABLE statement on the fly, just returning a string. I think that
> would be much simpler for the Installation script.

Um, that's what it does now, doesn't it?

> >+            $create_table .= "\t$fname\t$type";
> >+            $create_table .= " NOT NULL" if ($finfo->{NOTNULL});
> >+            $create_table .= " DEFAULT $default" if (defined($default));
> >+            $create_table .= " PRIMARY KEY" if ($finfo->{PRIMARYKEY});
> 
>   I think that some of this may need to go into the DB-specific stuff, wouldn't
> you think?

You mean the primary key part? Yeah, I guess some refactoring here would probably be appropriate...
 
>   OK, here's what I *really* need. :-) A function that reads in the *current*
> schema, and compares it to the required schema. Of course, that's the ENTIRE
> purpose of checksetup at this point, so that change could possibly go into bug
> 248175.

Well, what can be done in this class to facilitate that goal?

> >+      BOOLEAN =>      'boolean',
> >+      FALSE =>        'false', 
> >+      TRUE =>         'true',
> 
>   Hrm, to save DBI the work, wouldn't it be better to do 0 and 1?

I don't know. I'll change it if that's the consensus.
 
> >+      LONGBLOB =>     'bytea',
> 
>   Are these actually equivalent? I recall having to do something strange to get
> them to work equivalently in the RH Bugzilla fork.

Beats me. I don't use blobs.
 
> >+    $self->{db_extras} =
> >+      [
> >+      " CREATE RULE logincookies_insert AS\n" .
> >+      " ON INSERT TO logincookies WHERE NEW.lastused IS NULL\n" .
> >+      " DO INSTEAD INSERT INTO logincookies VALUES \(\n" .
> >+      " NEW.cookie, NEW.userid, NEW.ipaddr, localtimestamp\);\n",
> 
>   Oh, this is somewhat "bad," because it's not abstractly tied to the schema.
> That is, if I update the schema and it requires one of these, then I'd have to
> go and know to update each subclass that requires this, which defeats the
> purpose of subclassing. We should be able to modify the schema somehow that we
> automatically know to make these from certain options passed in the schema.

How exactly do you suggest generalizing this? This "CREATE RULE" is rather DBMS-specific, isn't it?
(In reply to comment #43)
> (In reply to comment #40)
> >   In the future, unless there's some reason not to, it's a good idea to use
> > "cvs diff -u" instead of "diff -Nru."
> 
>   Clarification: I just meant that you should use "cvs diff" instead of "diff"
> when possible. :-) The -Nru is of course always very important if you're using a
> normal "diff" command.

Sorry, I can't seem to figure out how to use "cvs diff" with files that haven't yet been added to the CVS 
repository. Could you elaborate? "cvs diff -uN" doesn't work, at least not with the version of cvs I have 
installed.
Attached patch Latest DBSchema patch (obsolete) — Splinter Review
Partially addresses comment#40.
Attachment #151500 - Attachment is obsolete: true
(In reply to comment #44)
> (In reply to comment #40)
> > >+    $self->{db_extras} =
> > >+      [
> > >+      " CREATE RULE logincookies_insert AS\n" .
> > >+      " ON INSERT TO logincookies WHERE NEW.lastused IS NULL\n" .
> > >+      " DO INSTEAD INSERT INTO logincookies VALUES \(\n" .
> > >+      " NEW.cookie, NEW.userid, NEW.ipaddr, localtimestamp\);\n",
> > 
> >   Oh, this is somewhat "bad," because it's not abstractly tied to the schema.
> > That is, if I update the schema and it requires one of these, then I'd have to
> > go and know to update each subclass that requires this, which defeats the
> > purpose of subclassing. We should be able to modify the schema somehow that we
> > automatically know to make these from certain options passed in the schema.
> 
> How exactly do you suggest generalizing this? This "CREATE RULE" is rather
DBMS-specific, isn't it?

Hmmm, if I understand correctly what the query does, it's to cope with timestamp
type of MySQL, which does not exists in PostgreSQL.
If yes, then it's not neccessary to do this for logincookies since bug 257303
has been checked in. It's still neccessary for bug table (but I can't find it in
this patch). If someone can review patch at bug 257315, then this 'hack' can be
dropped completely, which would be IMHO a good thing.
(In reply to comment #44)
> The schema is modified at object instantiation time. [snip]

  OK. I agree. The variable is good where it is.

> >   While we're at it, wouldn't it be better to have TYPE => 'decimal', PRECISION
> > => [5,2]? Or OPTIONS => '(5,2)'?
> 
> Seems like overkill, especially since decimal is an ANSI type supported by all 
> the DBMSes that Bugzilla will likely ever support.

  It just seemed like a good idea as far as abstraction went. If it turns out to
be easier to code this way, then that's all good. :-)

> >   We definitely need documentation for some of these types, [snip]
> 
> Contributions welcome. Unfortunately, I don't feel qualified to do that 
> myself. My experience with the Bugzilla code outside of this module is 
> minimal.

  OK. I *might* be able to be of help, but I think that only myk, bbaetz, or
justdave could really say for certain what the requirements of each DB field are.

> >   I'm not the ANSI master... is "smallserial" actually an ANSI type? I'd 
> > like  to stick with ANSI types in the Schema module.
> 
> The uppercased types like INTSERIAL are "abstract" types. It's a requirement 
> that the database-specific subclasses define them. (Yes, this needs to be 
> documented...) I don't think there is an ANSI auto-incrementing, serialized 
> integer type, is there?

  Yeah, I'm pretty sure there's no ANSI auto-increment type. As long as this is
documented that classes need to implement this, that's fine. :-) (Is there no
way in perl to programmatically enforce that requirement?)

> > >+                 dup =>       {TYPE => 'integer', NOTNULL => 1 ,
> > >+                                 DISPLAYWIDTH => 9, PRIMARYKEY => 1},
> > 
> >   Would it make more sense to have a PRIMARY => 'dup', since there can only 
> > be one Primary Key anyway?
> 
> Maybe. Multi-column primary keys are possible, though perhaps arguably 
> indicative of a poor database design.

  Oh, that's true. OK, maybe we'd better keep it the way it is, then.

> I believe the current MySQL schema has smallint auto_increment ids, mediumint 
> auto_increment ids, and regular integer auto_increment ids. The goal was to 
> reproduce the current schema used for MySQL, so the abstract base schema needs 
> to differentiate between these different auto_increment types in order to do 
> that.

  OK. Let's keep those as they are now, then. We don't want to change the
current schema and abstract it at the same time.

> > >+     # These next six are the "enumerated" fields used in bugs.
> > >+     # We might need another column for sort order so that could be
> > >+     # played around with...
> > 
> >   These are all now actually in existence in bug 17453's patch.
> 
> So what needs to be changed?

  I'll send you the schema changes in a direct email, since they're sort of
long. Others who are concerned can look at the checksetup.pl changes in the bug
17453 patch.

> > I'm almost certain that some of the indexes on the bug table need to be
> > multi-column... are they not, now?
> 
> I just checked checksetup.pl and it doesn't look like there are any 
> multi-column indexes on the bugs table to me, but maybe I'm reading the code 
> wrong?

  Oh yeah, you are indeed correct. No multi-column indexes at all.

> > Also, you might want to have a separate function to just generate and return
> > the CREATE TABLE statement on the fly, just returning a string. I think that
> > would be much simpler for the Installation script.
> 
> Um, that's what it does now, doesn't it?

  Oh. I didn't realize that. :-) Just document what it returns more exactly, and
then nobody will ever complain again. :-)

> 
> > >+            $create_table .= "\t$fname\t$type";
> > >+            $create_table .= " NOT NULL" if ($finfo->{NOTNULL});
> > >+            $create_table .= " DEFAULT $default" if (defined($default));
> > >+            $create_table .= " PRIMARY KEY" if ($finfo->{PRIMARYKEY});
> > 
> >   I think that some of this may need to go into the DB-specific stuff, 
> wouldn't you think?
> 
> You mean the primary key part? Yeah, I guess some refactoring here would
probably be appropriate...

  Yeah. I'm not certain that every DB does primary keys the same way. Of course,
at the worst, we could just refactor it later. All I'm concerned about to start
with is Pg and MySQL support.

> >   OK, here's what I *really* need. :-) A function that reads in the 
> > *current* schema, and compares it to the required schema. Of course, that's 
> > the ENTIRE purpose of checksetup at this point, so that change could 
> > possibly go into bug 248175.
> 
> Well, what can be done in this class to facilitate that goal?

  A function that could read the current, actual schema into a DBSchema object. :-)

> > >+      LONGBLOB =>     'bytea',
> > 
> >   Are these actually equivalent? I recall having to do something strange to 
> > get them to work equivalently in the RH Bugzilla fork.
> 
> Beats me. I don't use blobs.

byeta: http://www.postgresql.org/docs/7.3/static/datatype-binary.html

"The SQL standard defines a different binary string type, called BLOB or BINARY
LARGE OBJECT. The input format is different compared to bytea, but the provided
functions and operators are mostly the same."

longblob: http://dev.mysql.com/doc/mysql/en/String_type_overview.html#IDX1144

So I think they're pretty much the same, for our purposes, and DBI should handle
any differences.


> > >+    $self->{db_extras} =
> > >+      [
> > >+      " CREATE RULE logincookies_insert AS\n" .
> > >+      " ON INSERT TO logincookies WHERE NEW.lastused IS NULL\n" .
> > >+      " DO INSTEAD INSERT INTO logincookies VALUES \(\n" .
> > >+      " NEW.cookie, NEW.userid, NEW.ipaddr, localtimestamp\);\n",
> > 
> >   Oh, this is somewhat "bad," because it's not abstractly tied to the 
> > schema. That is, if I update the schema and it requires one of these, then 
> > I'd have to go and know to update each subclass that requires this, which 
> > defeats the purpose of subclassing. We should be able to modify the schema 
> > somehow that we automatically know to make these from certain options passed 
> > in the schema.
> 
> How exactly do you suggest generalizing this? This "CREATE RULE" is rather 
> DBMS-specific, isn't it?

  Yes. We would need a TIMESTAMP abstract field type, that let us know that we
had to do something like this. If timestamps go away before this patch is
checked-in, then we're OK and don't need it, but in the mean time it might be a
good idea.
Oh, also, we need a way to specify that certain indexes are FULLTEXT, like
bugs.short_desc. :-(
Blocks: 249400
OK, I've now started to actually work on cross-DB installation.

I have discovered that *all* I actually need, in order to make Bugzilla cross-DB
compatible for installation, is something that maps field types to SQL.

So that part of this patch is actually all I need. :-) I just need to be able to
put in a variable in all of the "ChangeFieldType" calls and so forth in
checksetup.pl.

Could you re-submit a patch that does just that? :-) (Also, I'm now a real
reviewer, so I can review your patch on this.)
Status: NEW → ASSIGNED
Oh, and the schema should fit inside the structure of bug 237862's new
Bugzilla::DB heirarchy.

You can just put the schema information directly into the Mysql.pm and Pg.pm.
Depends on: bz-dbcompat
Target Milestone: Bugzilla 2.22 → Bugzilla 2.20
(In reply to comment #50)
> I have discovered that *all* I actually need, in order to make Bugzilla cross-DB
> compatible for installation, is something that maps field types to SQL.

Honestly, that doesn't sound right to me. There's something wrong with the code
design if that's the case. The SQL that is used to create a table should be
coming from this class.

> So that part of this patch is actually all I need. :-) I just need to be able to
> put in a variable in all of the "ChangeFieldType" calls and so forth in
> checksetup.pl.
> 
> Could you re-submit a patch that does just that? :-) (Also, I'm now a real
> reviewer, so I can review your patch on this.)

Hunh? Give an example of the arguments you would pass to such a method and what
you think the return values of the method should be for those arguments.

Lately, I've been thinking about redoing this whole class around DBIx::DBSchema.
I'm not happy with this code at all really. If you read this whole bug, you'll
see I've been advocating a more object-oriented approach from the beginning. A
database schema object should be a collection of table objects which should be a
collection of column objects and index objects, and that's exactly what
DBIx::DBSchema does. Is it feasible to make DBIx:DBSchema a prequisite for Bugzilla?

http://search.cpan.org/~ivan/DBIx-DBSchema-0.23/DBSchema.pm
(In reply to comment #51)
> Oh, and the schema should fit inside the structure of bug 237862's new
> Bugzilla::DB heirarchy.
> 
> You can just put the schema information directly into the Mysql.pm and Pg.pm.

Sorry, I don't agree at all with that.
Oh. :-) Actually, thinking about it, you're right! :-) The alter table syntax
and create table syntax should come from this. :-) Look at my work on bug 277617
-- those methods that I moved are the various methods of Bugzilla::DB that need
to be made cross-platform.

As far as where the code goes, I think I may have not been totally clear. :-)
There is a new structure in Bugzilla:

Bugzilla::DB::Mysql and Bugzilla::DB::Pg -- these contain the Bugzilla-specific
cross-db compatibility code. They work really well, they're very cool. They are
created in bug 237862.

The schema stuff should be a part of those modules, because that way we can get
the information out of $dbh.
(In reply to comment #52)
> Is it feasible to make DBIx:DBSchema a prequisite for Bugzilla?
> 
> http://search.cpan.org/~ivan/DBIx-DBSchema-0.23/DBSchema.pm

  Well, the only problem with that is it limits us to supporting, for all of
time,  only the databases that DBIx::DBSchema supports. At this time, that's
Mysql and Postgres. So that's great. :-) But I think we do want to support
Oracle at some point, too, and I know that there is a small camp who want Sybase
support.

  Really, I'm OK with adding more prerequisites to Bugzilla. And I'm happy to
have code that other people have written and tested.

  My most important goal with this abstract schema is to support cross-database
installation. If you'll look at things like Bugzilla::DB::bz_change_field_def
(in bug 277617), that's pretty much all that has to work.

  Any elegant, future-proof way of making that work is good by me. :-)
(In reply to comment #54)
> As far as where the code goes, I think I may have not been totally clear. :-)
> There is a new structure in Bugzilla:
> 
> Bugzilla::DB::Mysql and Bugzilla::DB::Pg -- these contain the Bugzilla-specific
> cross-db compatibility code. They work really well, they're very cool. They are
> created in bug 237862.
> 
> The schema stuff should be a part of those modules, because that way we can get
> the information out of $dbh.

Yeah, I've been following that bug, but I haven't looked at the code yet. I
believe the Schema should be its own object class. I have no problems calling
the class Bugzilla::DB::Schema though. But initializing and carrying around the
whole database schema is a waste of cycles and memory. The only time the schema
is used is at installation time really. I suppose we could add a method to
Bugzilla::DB which instantiates a schema object and returns it, but I firmly
believe the schema should be a separate object class.
(In reply to comment #56)
> > The schema stuff should be a part of those modules, because that way we can get
> > the information out of $dbh.
> 
> Yeah, I've been following that bug, but I haven't looked at the code yet. I
> believe the Schema should be its own object class. I have no problems calling
> the class Bugzilla::DB::Schema though. But initializing and carrying around the
> whole database schema is a waste of cycles and memory. The only time the schema
> is used is at installation time really. I suppose we could add a method to
> Bugzilla::DB which instantiates a schema object and returns it, but I firmly
> believe the schema should be a separate object class.

I second that. We need the schema when creating/upgrading the database
(checksetup.pl) and we will need it for consistency checks (sanity_check.pl). It
should be separate module, instatiated on request.
Also, we should try to put to the DB specific modules as little information as
possible, otherwise it will be a nightmare to keep it in sync.
The idea of a module with DB schema description in some generic, DB agnostic
way, and then DB specific functions mapping this schema to particular database
is so far the best I have seen here. What I am missing there is just how to work
with things like upgrades - do you still need all the conversion code we have in
checksetup.pl now, or can we somehow automagically detect that the DB schema and
actual DB are different and fix the differences? Maybe a DB version number
stored somewhere in the database would help here?
(In reply to comment #56)
> But initializing and carrying around the
> whole database schema is a waste of cycles and memory. The only time the schema
> is used is at installation time really.

  Yeah, OK, I agree with that.

> I suppose we could add a method to
> Bugzilla::DB which instantiates a schema object and returns it, but I firmly
> believe the schema should be a separate object class.

  Hrm. Well, the reason that I like the idea of having it in the Bugzilla::DB
subclasses is that's already our structure for determining which database we're
running on. That is, it's a very elegant OO solution that "just works."
Basically, it picks the right module based on the $db_driver in localconfig. I'd
hate to duplicate that code.

  I think it's definitely not a good idea to init the schema unless we ask for
it, though. So perhaps users could get at the schema with
$dbh->bz_schema->method, and then we'd only init the schema if we were using it,
and we'd already know (even if the schema code was in another class) what
database we were using.

  Oh, and another small note re: DBIx::DBSchema: it's a bit concerning that the
most recent version of DBIx::DBSchema on CPAN is from 23 October 2003.
(In reply to comment #57)
> do you still need all the conversion code we have in
> checksetup.pl now, or can we somehow automagically detect that the DB schema and
> actual DB are different and fix the differences? Maybe a DB version number
> stored somewhere in the database would help here?

  I've thought about this for quite some time, myself. :-) Unfortunately, such a
tool is not possible. We need checksetup to do various different things when the
schema changes, not just change the schema itself. Sometimes we have to do
migration and then a schema change, a schema change and then a migration, or a
mixture of migration code and schema changes in complex sequences.

  Although some schema changes are simple and don't involve migration code, most
changes do.
(In reply to comment #59)
> (In reply to comment #57)
> > do you still need all the conversion code we have in
> > checksetup.pl now, or can we somehow automagically detect that the DB schema and
> > actual DB are different and fix the differences? Maybe a DB version number
> > stored somewhere in the database would help here?
> 
>   I've thought about this for quite some time, myself. :-) Unfortunately, such a
> tool is not possible. We need checksetup to do various different things when the
> schema changes, not just change the schema itself. Sometimes we have to do
> migration and then a schema change, a schema change and then a migration, or a
> mixture of migration code and schema changes in complex sequences.
> 
>   Although some schema changes are simple and don't involve migration code, most
> changes do.

Yep, I was already afraid that's the case, I was just hoping someone would have
that magical wand :-).
Which leaves us with a schema usable for new installations and (maybe)
consistency checks. Upgrade code will be hardcoded in checksetup as it is now
and will be another place with DB schema information. Although I don't like
duplicating information all around the code, this seems to be acceptable (as it
correspond to current implementation :-) ).
So what we basically needs is:
1) set of DB independent functions for inspecting and modifying existing
database exposed to checksetup.pl - bug 277617
2) DB independent function for creating a new database, which will take DB
schema as a parameter (in DB agnostic format), or set of functions and then do
the DB structure parsing in checksetup, based on how complex and DB dependant it is.
3) modify sanity_check to use the schema module (low priority ATM).

Any takers to implement 2? :-) (considering the patch at this bug has already
pretty good DB agnostic description).
(In reply to comment #58)
>   Hrm. Well, the reason that I like the idea of having it in the Bugzilla::DB
> subclasses is that's already our structure for determining which database we're
> running on. That is, it's a very elegant OO solution that "just works."
> Basically, it picks the right module based on the $db_driver in localconfig. I'd
> hate to duplicate that code.

It's hardly rocket science. The current schema class constructor already does
mostly that, except for the $db_driver part. But if we just use Bugzilla::DB to
instantiate the Schema object, then that simplifies things further.

>   I think it's definitely not a good idea to init the schema unless we ask for
> it, though. So perhaps users could get at the schema with
> $dbh->bz_schema->method, and then we'd only init the schema if we were using it,
> and we'd already know (even if the schema code was in another class) what
> database we were using.

Yes, I like that idea very much. On the first call to the bz_schema() method,
the schema object would be instantiated and stored as class data. Any subsequent
 call would simply return the schema object.

Personally, I would move all those bz_* methods you've added in bug 277617 to
the Schema object class. Instead of $dbh->bz_add_field(), it would be
$dbh->bz_schema->add_field(). I suppose that's a matter of taste though and
there are pros and cons either way.

>   Oh, and another small note re: DBIx::DBSchema: it's a bit concerning that the
> most recent version of DBIx::DBSchema on CPAN is from 23 October 2003.

That's version 0.22. The current version is 0.23 which was released on 16
February 2004. Not much better, I suppose. Oh, it mostly already supports
Sybase. But I'll probably stick with the current implementation, as I've
suddenly started to feel kind of lazy. :-) If we were starting over from scratch
though, I'd definitely go with DBIx::DBSchema or something similar.
(In reply to comment #60)
> So what we basically needs is:
> 1) set of DB independent functions for inspecting and modifying existing
> database exposed to checksetup.pl - bug 277617
> 2) DB independent function for creating a new database, which will take DB
> schema as a parameter (in DB agnostic format), or set of functions and then do
> the DB structure parsing in checksetup, based on how complex and DB dependant
it is.
> 3) modify sanity_check to use the schema module (low priority ATM).
> 
> Any takers to implement 2? :-) (considering the patch at this bug has already
> pretty good DB agnostic description).

We already mostly have 2. The ddl_statements() method returns the SQL statements
used to create the database. All we need is a method that loops over those SQL
statements, passes them to $dbh->do(), and checks for errors.

So the question is should that go in Bugzilla::DB or Bugzilla::DB::Schema? I
think it should go in the latter (along with bug 277617's methods for altering
the database), but, like I wrote in a previous comment, there are pros and cons
to doing it either way.

The inspecting portion of 1 is a lot harder, I think. DBI has some methods for
doing that stuff, but the level of support varies from DBMS to DBMS.
> Personally, I would move all those bz_* methods you've added in bug 277617 to
> the Schema object class. Instead of $dbh->bz_add_field(), it would be
> $dbh->bz_schema->add_field(). I suppose that's a matter of taste though and
> there are pros and cons either way.

  Yeah, I was thinking about that, too.

  I was sort of waffling as to whether the Schema classes should just be a
hidden implementation detail of Bugzilla::DB, or actually used by callers.

  Also, it's kind of nice to have all the functions that a caller needs to know
about for the database in any way inside of one module.

  But if we can come up with some pros of having the functions in the Schema
object, I definitely wouldn't object.

> That's version 0.22. The current version is 0.23 which was released on 16
> February 2004. Not much better, I suppose. Oh, it mostly already supports
> Sybase. But I'll probably stick with the current implementation, as I've
> suddenly started to feel kind of lazy. :-)

  Hahaha, OK. :-)

> If we were starting over from  scratch though, I'd definitely go with 
> DBIx::DBSchema or something similar.

  Interesting. Yeah, I hear that there's a lot of interesting code in this area,
like AnyDBD or things like that. But I agree, I think it's better to go with
what we have. The Schema stuff that you've already designed is pretty cool.

(In reply to comment #62)
> We already mostly have 2. The ddl_statements() method returns the SQL 
> statements used to create the database. All we need is a method that loops 
> over those SQL statements, passes them to $dbh->do(), and checks for errors.

  Yeah. Which is totally great for creating the database, the first time. So I
wouldn't object to using that method for that.

  Right now, most of checksetup is upgrade code, though.

  And I would be slightly concerned to have a database created using one code
path, but upgraded using another code path. Of course, as long as they were all
modular in some way, and it worked, I wouldn't object. :-) My goal right now is
working PostgreSQL support in the installer, and anything that gets me that
makes me happy. :-)

> The inspecting portion of 1 is a lot harder, I think. DBI has some methods for
> doing that stuff, but the level of support varies from DBMS to DBMS.

  Well, we can sort of create a generic interface for it. We can just parse the
results of some custom commands (or use the DBI stuff, as you mention), and
return a hash/array that gives us certain information about a table/column.

  We do need some way of generating cross-platform versions of all of the bug
277617 functions, though -- re-writing checksetup entirely to use a different
method would delay cross-platform support for possibly six months or a year, and
only provided that I could actually get enough time at my company to work that
out, and then actually find somebody to review it. :-)

  The functions that we have now work, and they're all we need to implement it.
If we have to fudge it for PostgreSQL support, I won't object. If we can get it
elegant and perfect for PostgreSQL support, I also won't object. :-)
(In reply to comment #62)
> We already mostly have 2. The ddl_statements() method returns the SQL statements
> used to create the database. All we need is a method that loops over those SQL
> statements, passes them to $dbh->do(), and checks for errors.

I know, I have looked at the code (looks very good, btw :-) ). It just needs to
be finished and brought up to date with the latest development on bug 237862 :-).

> 
> So the question is should that go in Bugzilla::DB or Bugzilla::DB::Schema? I
> think it should go in the latter (along with bug 277617's methods for altering
> the database), but, like I wrote in a previous comment, there are pros and cons
> to doing it either way.

I would personally vote for the schema plus DB independent support functions to
go to DB::Schema, DB independent code which may be used without the schema to
Bugzilla::DB, and DB specific functions to Bugzilla::DB overrides (Mysql and Pg
ATM).
The idea behind is that if you need the schema (and methods to work with it),
you include the DB::Schema module, but not otherwise (I think right now, only
checksetup.pl will need to include the Schema module, maybe sanity_check later).

> 
> The inspecting portion of 1 is a lot harder, I think. DBI has some methods for
> doing that stuff, but the level of support varies from DBMS to DBMS.

Yes, but as we can implement it separatelly on every DB through the DB specific
modules, and do not have to rely on DBI (although we should use it whenewer
possible), it should be doable.
You dont have to rewrite much of checksetup at all.  Old installations will
always be on MySQL.  It is only new installations and upgrades from this point
forward that need a compatibility layer.
Yeah. Actually (this is for another bug, though), I'm going to re-write
checksetup to migrate as much as possible starting with 2.17.1, which was when
Red Hat started their PostgreSQL fork. But I might also just write a separate
script to handle those migrations.
(In reply to comment #63)
>   Also, it's kind of nice to have all the functions that a caller needs to know
> about for the database in any way inside of one module.
> 

On one hand, it's nice to have them together, on the other, it may be tempting
then to use functions meant to be used for DB setup by runtime as well (I can
see a code checking for column type somewhere in search.pm and it scares me). If
we separate "DB management" and "DB use" into two modules, it's more obvious how
they were meant to be used.

>   And I would be slightly concerned to have a database created using one code
> path, but upgraded using another code path. Of course, as long as they were all
> modular in some way, and it worked, I wouldn't object. :-)

That worries me too, but is basically what we have now, only it's on two places
in one module (checksetup.pl, if we forget about sanity_check for now). We are
proposing to put it to two separate modules.
Maybe if we move DB upgrade code from checksetup to schema module as well? That
would be closer to what we have now. It would also mean that checksetup will be
involved only in OS and perl checks, file permisions and stuff like that, which
I personally sort of like :-).
We can also add one sanity check at the end of the setup phase, which would
compare the current database after upgrade with our schema, ensuring that we got
by the upgrade what we need.

>   We do need some way of generating cross-platform versions of all of the bug
> 277617 functions, though -- re-writing checksetup entirely to use a different
> method would delay cross-platform support for possibly six months or a year, and
> only provided that I could actually get enough time at my company to work that
> out, and then actually find somebody to review it. :-)
> 
>   The functions that we have now work, and they're all we need to implement it.
> If we have to fudge it for PostgreSQL support, I won't object. If we can get it
> elegant and perfect for PostgreSQL support, I also won't object. :-)

The major rule here is "small steps". We can easily move the functions from
checksetup to the DB layer (that's what your current patch does), then we will
have to make them DB agnostic, and then we can continue improving the interface.
But redesigning it from scratch is very seldom a way to success...
(In reply to comment #63)
>   Yeah, I was thinking about that, too.
> 
>   I was sort of waffling as to whether the Schema classes should just be a
> hidden implementation detail of Bugzilla::DB, or actually used by callers.
> 
>   Also, it's kind of nice to have all the functions that a caller needs to know
> about for the database in any way inside of one module.

Yes, that's the most compelling reason to keep those methods in Bugzilla::DB.

> But if we can come up with some pros of having the functions in the Schema
> object, I definitely wouldn't object.

Well, the only argument I can come up with is one based on object design
principles, but it's not holding water upon further thought. I was thinking that
the methods that are schema-related obviously should be in the schema class. But
while those methods for altering the database *utilize* the schema object, but
they don't actually *act* on the schema object. So I think I've convinced myself
that they belong in Bugzilla::DB. But whatever.

> > We already mostly have 2. The ddl_statements() method returns the SQL 
> > statements used to create the database. All we need is a method that loops 
> > over those SQL statements, passes them to $dbh->do(), and checks for errors.
> 
>   Yeah. Which is totally great for creating the database, the first time. So I
> wouldn't object to using that method for that.

OK, so do we need a create_db() method in Bugzilla::DB or would just looping
over the $dbh->bz_schema->ddl_statements in checksetup be sufficient? Since
you're working on bug 27716, I'll leave that up to you, Max.

>   Right now, most of checksetup is upgrade code, though.
> 
>   And I would be slightly concerned to have a database created using one code
> path, but upgraded using another code path. Of course, as long as they were all
> modular in some way, and it worked, I wouldn't object. :-)

I think that's always going to be a danger. The only solution that is 100%
effective and perfectly cross-DBMS is to dump the database to text files, drop
the database, recreate the database from scratch, and populate the new database
using the database dump from before the upgrade. I suppose that wouldn't be a
very popular method though.
Max, that brings us back to here, I think:

(In reply to comment #52)
> (In reply to comment #50)
> > I have discovered that *all* I actually need, in order to make Bugzilla cross-DB
> > compatible for installation, is something that maps field types to SQL.
> 
> > So that part of this patch is actually all I need. :-) I just need to be able to
> > put in a variable in all of the "ChangeFieldType" calls and so forth in
> > checksetup.pl.
> > 
> > Could you re-submit a patch that does just that? :-) (Also, I'm now a real
> > reviewer, so I can review your patch on this.)
>
> Give an example of the arguments you would pass to such a method and what you
> think the return values of the method should be for those arguments.

How would you be using this method exactly? A usage example would be very
helpful in implementing what you want.
(In reply to comment #68)
> So I think I've convinced myself that they belong in Bugzilla::DB. But 
> whatever.

  OK. They are there now (with my bug), so that sounds good to me. :-)

> OK, so do we need a create_db() method in Bugzilla::DB or would just looping
> over the $dbh->bz_schema->ddl_statements in checksetup be sufficient? Since
> you're working on bug 27716, I'll leave that up to you, Max.

  I think a create_db() method would be great. The more code that I can move out
of checksetup, the happier I am.

> [snip] I suppose that wouldn't be a very popular method though.

  Hahahaha, yeah. That certainly wouldn't be very popular. :-)

(In reply to comment #69)
> Max, that brings us back to here, I think:
> [snip]
> How would you be using this method exactly? A usage example would be very
> helpful in implementing what you want.

  OK, here's some checksetup code that exists, now (in bug 277617 form):

>  $dbh->bz_add_field('products', 'disallownew', 'tinyint not null');

  That's the simplest -- me adding a field to the products table, called
disallownew, with that type. Everything but the type is already cross-DB, there.

>  if ($dbh->bz_get_field_def('products', 'product')) {
>    $dbh->bz_change_field_type ('bugs',       'product', 'varchar(64) not null');

  There's something a little more complicated -- we check for the existence of a
field, and if it exists, we redefine some fields. Basically, we see if the
schema is old, and we change it to the new schema.

  When it gets more complicated than this, we:

  (1) Check for the existence of a field
  (2) Do some migration code
  (3) Change a field definition
  (4) Possibly repeat 2/4 a few times

  That's the general idea. In the same way, I need to be able to add indexes,
drop indexes, and check if tables exist. For the most part, functions for all of
these things already exist, and are handled by bug 277617. If, as I work on
checksetup, I discover I need another function, that can definitely be arranged. :-)
Oh, and as far as return values, look at the pod docs for bz_get_field_def and
bz_get_index_def in bug 277617.
>   OK, here's some checksetup code that exists, now (in bug 277617 form):
> 
> >  $dbh->bz_add_field('products', 'disallownew', 'tinyint not null');

<snip>

What I would also like to see here, and that seems to fit into the scope of this
bug, is some abstraction of DB types, probably using constants. So the line
above would become something like:

$dbh->bz_add_field('products', 'disallownew', 'BOOLEAN NOT NULL');

or something in that sense. As we don't want to use DB specific type for DB
schema, we do not want to use them for upgrade code also...
(In reply to comment #72)
> >   OK, here's some checksetup code that exists, now (in bug 277617 form):
> > 
> > >  $dbh->bz_add_field('products', 'disallownew', 'tinyint not null');
> 
> <snip>
> 
> What I would also like to see here, and that seems to fit into the scope of this
> bug, is some abstraction of DB types, probably using constants. So the line
> above would become something like:
> 
> $dbh->bz_add_field('products', 'disallownew', 'BOOLEAN NOT NULL');
> 
> or something in that sense. As we don't want to use DB specific type for DB
> schema, we do not want to use them for upgrade code also...

Just to clarify this further, that means that the we need something like
_adjust_schema method in this patch which will be generalised to be able to map
abstract DB type to DB specific type on demand, and it will be used internally
by all these functions which takes types as arguments.
(In reply to comment #73)
> (In reply to comment #72)
> > >   OK, here's some checksetup code that exists, now (in bug 277617 form):
> > > 
> > > >  $dbh->bz_add_field('products', 'disallownew', 'tinyint not null');

Max, how about if we make it

$dbh->bz_add_field('products','disallownew')

and then change bz_add_field() to look up the type via the Schema object?

Otherwise, the code will end up looking like this:

$dbh->bz_add_field('products','disallownew', $dbh->bz_schema-
>get_table_column_type('products','disallownew'))

which seems awfully redundant.

Everyone happy with the method name get_table_column_type()? It's kind of verbose... If so, a patch will 
be forthcoming which adds this to the DB::Schema class.
(In reply to comment #74)
> (In reply to comment #73)
> > (In reply to comment #72)
> > > >   OK, here's some checksetup code that exists, now (in bug 277617 form):
> > > > 
> > > > >  $dbh->bz_add_field('products', 'disallownew', 'tinyint not null');
> 
> Max, how about if we make it
> 
> $dbh->bz_add_field('products','disallownew')
> 
> and then change bz_add_field() to look up the type via the Schema object?

No, unfortunatelly you can't do that. If you happen to be upgrading from really
old BZ installation to a new one, and it happens that some column changed type
twice (e.g. text->varchar->blob), you would immediatelly convert to the last
type via the first call, possibly breaking the upgrade path.

> 
> Otherwise, the code will end up looking like this:
> 
> $dbh->bz_add_field('products','disallownew', $dbh->bz_schema-
> >get_table_column_type('products','disallownew'))
> 
> which seems awfully redundant.
> 
> Everyone happy with the method name get_table_column_type()? It's kind of
verbose... If so, a patch will 
> be forthcoming which adds this to the DB::Schema class.

Which is exactly what I was targeting by requesting the functions to take
abstract types and doing the conversion internally. You will call it e.g.:

$dbh->bz_add_field('products', 'disallownew', BOOLEAN);

and the bz_add_field function will look up what type it should use for BOOLEAN
on e.g. MySQL.

Sorry, but as we already agreed, the schema knowledge just has to be on two
places, one for upgrade, and one for new install.
(In reply to comment #74)
> $dbh->bz_add_field('products','disallownew')

  Believe it or not, sometimes I may need to change a field to something that
isn't its current definition, and then change it again. Migration is nuts.
(In reply to comment #75)
> No, unfortunatelly you can't do that. If you happen to be upgrading from really
> old BZ installation to a new one, and it happens that some column changed type
> twice (e.g. text->varchar->blob), you would immediatelly convert to the last
> type via the first call, possibly breaking the upgrade path.

Tom, I think you're getting bz_add_field() mixed up with bz_change_field_type(). What you are 
describing is not a problem with bz_add_field(), but it is definitely a problem with how 
bz_change_field_type() is utilized. I agree there needs to be a Schema method for converting the DB-
generic types like BOOLEAN to its corresponding DB-specific type. I'll implement that as well. Any 
suggestions for what to name that method?
(In reply to comment #77)

  Just to clarify: I think I may also need to sometimes add fields that don't
have their modern definition, and then change them later in the code.
Alias: bz-dbschema
(In reply to comment #77)
> Any suggestions for what to name that method?

sql_column_type, maybe? That would be consistent.

(In reply to comment #77)
> Tom, I think you're getting bz_add_field() mixed up with
bz_change_field_type(). What you are 
> describing is not a problem with bz_add_field(), but it is definitely a
problem with how 
> bz_change_field_type() is utilized.

Yes, I was talking more about change_field_type than add_field, but as Max
already pointed out, the same holds true for add_field as well. You can have
cases where you add a field with certain type and change it later (and you can't
just add the field with the new type). Yes, DB upgrading sucks :-).

(In reply to comment #79)
> sql_column_type, maybe? That would be consistent.

Where are we going to use this? I would almost say it should be private, as it
should be called only by other Bugzilla::DB functions, am I right here?
(In reply to comment #78)
>   Just to clarify: I think I may also need to sometimes add fields that don't
> have their modern definition, and then change them later in the code.

Ugh. That seems kind of nuts, if you ask me. Kind of defeats the whole purpose of the Schema class!  I 
suppose that's the only feasible way to implement upgrading the database from really old versions to 
the current version. Well, I'll make one last suggestion: If bz_add_field() is passed only two arguments, I 
think it should still look up the (current) type via Schema. If bz_add_field() is passed three arguments, 
then it could use that third argument as the type of the column to be added. What do you think?
Alias: bz-dbschema
(In reply to comment #81)
> Well, I'll make one last suggestion: If bz_add_field() is passed only two 
> arguments, I think it should still look up the (current) type via Schema. If 
> bz_add_field() is passed three arguments, then it could use that third argument 
> as the type of the column to be added. What do you think?

  It's a fine idea, but checksetup is "time-based" code. That is, at a certain
time in the history of Bugzilla I want to change a field to a specific type, and
then later I want to change it to another specific type. checksetup code is
supposed to be "write once, ignore forever." :-)
(In reply to comment #81)
> (In reply to comment #78)
> >   Just to clarify: I think I may also need to sometimes add fields that don't
> > have their modern definition, and then change them later in the code.
> 
> Ugh. That seems kind of nuts, if you ask me. Kind of defeats the whole purpose
of the Schema class!  I 
> suppose that's the only feasible way to implement upgrading the database from
really old versions to 
> the current version.

Yep, it's nuts, but that's life :-). As it is now, schema will be used ONLY for
new installs (and maybe consistency checks etc.), but not for upgrades.

> Well, I'll make one last suggestion: If bz_add_field() is passed only two
arguments, I 
> think it should still look up the (current) type via Schema. If bz_add_field()
is passed three arguments, 
> then it could use that third argument as the type of the column to be added.
What do you think?

No. bz_add_field() should be used only by checksetup, nothing else. If we
encourage (or even allow) use with two parameters, it will cause trouble later
on when you find you really need to change that type.

Think about the whole upgrade process as a journal, or DB transaction - you are
always only adding new conversion statements, but never modify or remove what's
already there. That's the only way to guarantee upgrade from anything to latest
version.
If you allow to use Schema for some of the calls, you will effectively change
it's parameters when the schema changes, which means breaking the transaction
record and (possibly) breaking the upgrade...
Attached patch V3 (obsolete) — Splinter Review
New patch which adds sql_type() method, which I think addresses our recent
discussion with regards to supporting abstract data types in checksetup.pl.

This patch also addresses comment #49 by assuming that Mysql will always need
to create a FULLTEXT index whenever one of the fields being indexed is of type
text.

I just noticed that checksetup.pl and the Mysql schema generated by this object
class exhibit a fair number of differences. Most of the Mysql integer data
types are "mediumint", but the Schema object has "integer". Some of the "text"
fields are "mediumtext" in checksetup.pl. Do these differences need to be
addressed?
Attachment #168834 - Attachment is obsolete: true
(In reply to comment #84)
> I just noticed that checksetup.pl and the Mysql schema generated by this object
> class exhibit a fair number of differences. Most of the Mysql integer data
> types are "mediumint", but the Schema object has "integer". Some of the "text"
> fields are "mediumtext" in checksetup.pl. Do these differences need to be
> addressed?

  Yeah. Unfortunately, MySQL will either refuse to do, or fail to use correct
indexes for, or at least spit out a warning about, comparing integer fields of
different sizes. So what is now a mediumint needs to stay a mediumint.

  If I could go back in time, yes, I'd almost definitely change all the
"integer" fields to something more sane. But for now, if it's "mediumint" we
need a way to keep it "mediumint."

  For PostgreSQL, I don't care, as long as it's all consistent. :-) If we can
keep all the integer types consistent for Pg, then I'm really happy. :-)

  Perhaps later, we can put in a checksetup change to standardize all integer
fields to the same type. In fact, if somebody would file a bug for that and
assign it to me, I'd be happy to do it (after this patch, here).
Comment on attachment 173714 [details] [diff] [review]
V3

  As always, this is very well-designed. :-)

>+sub initialize {

  We usually call this sort of thing _create, in Bugzilla modules. I suppose
it's not technically private, here, but "protected" instead. Still, if it's not
meant to be used by callers, I'd give it an underscore. Or some other
convention.

>+    my($self,$tname,$iname,$ifields,$itype) = @_;

  Bugzilla style is usually to put a space between those.

  Also, use more descriptive variable names. We don't object to having
longer_names, for Bugzilla. :-) That would make it a lot clearer to people who
aren't you and I what this code is doing. :-)

>diff -rNu Bugzilla.HEAD/DB/Schema/Pg.pm Bugzilla/DB/Schema/Pg.pm
> [snip]
>+    $self->{db_extras} =

  As I've mentioned before, I would prefer if somehow this could be done by the
create_db() function (or whatever it will be called) when it runs into a
"timestamp" type. Of course, we're also trying to eliminate all timestamp
fields, so we'll probably get to that before we actually go live with this
code. (Still, I'd rather that not block this patch.)

  It's only a nit, though, and since this *works*, I'd be happy to check this
part in as-is.

>diff -rNu Bugzilla.HEAD/DB/Schema/Sybase.pm Bugzilla/DB/Schema/Sybase.pm

  Wow! :-) Cool. :-)

  Let's keep Sybase out of this patch for now, though. We can add it in later,
for the Sybase bug, or in some other bug.

>+    $self->{end_of_statement} = "\ngo\n";

  Oh, I kept forgetting to mention this -- DBI is supposed to handle this. We
don't need to handle this. At the least, this should be in Bugzilla::DB, not
here.

>--- Bugzilla.HEAD/DB/Schema.pm	Wed Dec 31 19:00:00 1969
>+++ Bugzilla/DB/Schema.pm	Tue Feb  8 02:31:30 2005

>+use strict;
>+use Bugzilla::Config qw(:db);

  You shouldn't need that information here, technically, because Bugzilla::DB
will always be your caller, and will supply you with something useful.

  Perhaps new() should just take an $dbh handle created by Bugzilla::DB, or
something (just idle thoughts).

>+    my $this = shift;
>+    my $class = ref($this) || $this;
>+    my $driver = @_ ? shift : $db_driver;

  Yeah, this is what we shouldn't be doing here. (It has to do with the fact
that every time we call $db_driver, we have to do error-checking to output
sensible error messages to the user, and I'd like to keep that all in the
Buzilla::DB framework.)

  I think that we can get rid of the mechanism, and expect the caller to always
know what type of DB they will be instantiating on (since the caller will
always be Bugzilla::DB, in our case).

>+sub _adjust_schema {
>+    # PRIVATE method to alter the abstract schema at instantiation-time to be
>+    # DB-specific. It's a generic enough routine that it can be defined here in
>+    # the base class. Subclasses may override or extend this method if needed.
>+
>+    my $self = shift;
>+    my $db_specific = $self->{db_specific};
>+    foreach my $table (keys %{ $self->{schema} }) {
>+        my %fields = (@{ $self->{schema}{$table}{FIELDS} });
>+        foreach my $field_def (values %fields) {
>+            if (exists($db_specific->{$field_def->{TYPE}})) {
>+                $field_def->{TYPE} = $db_specific->{$field_def->{TYPE}};
>+            }
>+            if (exists($field_def->{DEFAULT})
>+                && exists($db_specific->{$field_def->{DEFAULT}})) {
>+                $field_def->{DEFAULT} = $db_specific->{$field_def->{DEFAULT}};

  As I mentioned before, this needs extensive line-by-line comments.

>+sub ddl_statements {
>+    # PUBLIC method to generate the SQL statements needed to create the
>+    # the Bugzilla database's tables and indexes. Subclasses may override or
>+    # extend this method if needed.
>+    # Returns an array of strings containing the SQL statements.

  Hrm... I'd rather that this were private, and a create_db() call was public.
Then we could just call create_db() in checksetup, and eliminate a lot of
nonsense from that file. :-)

  Look at the part of checksetup.pl that starts with:

>########################################################################
># Create tables
>########################################################################

  A lot of that can be moved to inside this class, in a create_db() function.

  So generally, instead, what should happen is a lot of looping $dbh->do
statements. If that shows up as Bugzilla::DB::bz_create_db(), I'm totally fine
with that. If it shows up here, because it requires intimate knowledge of the
internals of the Schema, I'm also OK with that.

>+    my $self = shift;
>+    my @ddl = ();
>+
>+    foreach my $tname (keys %{ $self->{schema} }) {
>+        my $thash = $self->{schema}{$tname};

  Another point where longer_variable_names would help.

>+    ThrowCodeError("According to the database schema, the table $table does not contain a field named $field.") unless ($finfo);

  This line is a bit too long, so it'll just need to be cut up into some
appended strings.

>+    my $type = $finfo->{TYPE};
>+    my $default = $finfo->{DEFAULT};
>+    my $fkref = $self->{no_references} ? undef : $finfo->{REFERENCES};

  I could use a comment for that line. :-)

>+    my $sql = $type;
>+    $sql .= " NOT NULL" if ($finfo->{NOTNULL});
>+    $sql .= " DEFAULT $default" if (defined($default));
>+    $sql .= " PRIMARY KEY" if ($finfo->{PRIMARYKEY});
>+    $sql .= "\n\t\t\t\tREFERENCES $fkref" if $fkref;

  That REFERENCES has to not be in the code for MySQL 3. Do we handle that?

>+sub sql_type ($) {
>+    # PUBLIC method to convert abstract (database-generic) field types to
>+    # database-specific field types.
>+
>+    my($self,$type) = @_;
>+
>+    return($self->{db_specific}{$type} || $type);

  And this looks perfect. :-) I wouldn't object to having it in Bugzilla::DB,
too.

  OK, here's what we'll actually need in order for this to get checked-in:

  (1) Important comments addressed
  (2) Actual use in the codebase inside of this patch. That is, the "create
tables" section of checksetup should be replaced with a call to bz_create_db()
or bz_create_tables() or whatever you want to call it. :-) We will then need to
do regression testing -- I can do this, because I have access to landfill,
where we have a lot of old Bugzilla DBs.
  (3) Full integration into the Bugzilla::DB structure (which will be required
for 2 above, anyway).
Attachment #173714 - Flags: review-
Attached patch V4 (obsolete) — Splinter Review
(In reply to comment #86)
> >+sub initialize {
> 
>   We usually call this sort of thing _create, in Bugzilla modules. I suppose
> it's not technically private, here, but "protected" instead. Still, if it's
not
> meant to be used by callers, I'd give it an underscore. Or some other
> convention.

Right. Renamed method to _initialize.

> >+	my($self,$tname,$iname,$ifields,$itype) = @_;
> 
>   Bugzilla style is usually to put a space between those.

OK.

>   Also, use more descriptive variable names. We don't object to having
> longer_names, for Bugzilla. :-) That would make it a lot clearer to people
who
> aren't you and I what this code is doing. :-)

OK.

> >diff -rNu Bugzilla.HEAD/DB/Schema/Pg.pm Bugzilla/DB/Schema/Pg.pm
> > [snip]
> >+	$self->{db_extras} =
> 
>   As I've mentioned before, I would prefer if somehow this could be done by
the
> create_db() function (or whatever it will be called) when it runs into a
> "timestamp" type. Of course, we're also trying to eliminate all timestamp
> fields, so we'll probably get to that before we actually go live with this
> code. (Still, I'd rather that not block this patch.)
> 
>   It's only a nit, though, and since this *works*, I'd be happy to check this

> part in as-is.

I don't really understand what you want here. If you want me to delete both of
these
"CREATE RULE" statements from the Pg class, just say the word.

Or is the whole db_extras thing that you don't like?

>   Let's keep Sybase out of this patch for now, though. We can add it in
later,
> for the Sybase bug, or in some other bug.

Well, I expected that.

> >+	$self->{end_of_statement} = "\ngo\n";
> 
>   Oh, I kept forgetting to mention this -- DBI is supposed to handle this. We

> don't need to handle this. At the least, this should be in Bugzilla::DB, not
> here.

Yes, you're absolutely right. This has been removed.

> >+use strict;
> >+use Bugzilla::Config qw(:db);
> 
>   You shouldn't need that information here, technically, because Bugzilla::DB

> will always be your caller, and will supply you with something useful.
>
>   Perhaps new() should just take an $dbh handle created by Bugzilla::DB, or
> something (just idle thoughts).
> 
> >+	my $this = shift;
> >+	my $class = ref($this) || $this;
> >+	my $driver = @_ ? shift : $db_driver;
> 
>   Yeah, this is what we shouldn't be doing here. (It has to do with the fact
> that every time we call $db_driver, we have to do error-checking to output
> sensible error messages to the user, and I'd like to keep that all in the
> Buzilla::DB framework.)

It doesn't make the constructor any shorter or simpler to remove it, but OK.

>   I think that we can get rid of the mechanism, and expect the caller to
always
> know what type of DB they will be instantiating on (since the caller will
> always be Bugzilla::DB, in our case).

Yeah, sure.

> >+sub _adjust_schema {
>   As I mentioned before, this needs extensive line-by-line comments.

Done.

> >+sub ddl_statements {
> >+	# PUBLIC method to generate the SQL statements needed to create the
> >+	# the Bugzilla database's tables and indexes. Subclasses may override
or
> >+	# extend this method if needed.
> >+	# Returns an array of strings containing the SQL statements.
> 
>   Hrm... I'd rather that this were private, and a create_db() call was
public.
> Then we could just call create_db() in checksetup, and eliminate a lot of
> nonsense from that file. :-)

I thought we decided to put create_db() in Bugzilla::DB? 

Aside: Looking at checksetup.pl, I see it's creating the tables with "TYPE =
MYISAM" appended to all of the "CREATE TABLE" statements. The current
implmentation of the Schema class does not do this. Should it?

> >+	ThrowCodeError("According to the database schema, the table $table does
not contain a field named $field.") unless ($finfo);
> 
>   This line is a bit too long, so it'll just need to be cut up into some
> appended strings.

Done.

> >+	my $type = $finfo->{TYPE};
> >+	my $default = $finfo->{DEFAULT};
> >+	my $fkref = $self->{no_references} ? undef : $finfo->{REFERENCES};
> 
>   I could use a comment for that line. :-)

I basically just put that in there so that subclasses can eliminate references
without overriding the whole method. I'm still thinking it over.
 
>   That REFERENCES has to not be in the code for MySQL 3. Do we handle that?

Not currently. I could add a Mysql3.pm subclass which ISA Mysql with the only
difference being that it's _initialize method sets $self->{no_references} to
true after calling SUPER::_initialize. It would be up to Bugzilla::DB to
instantiate either the Mysql or Mysql3 schema. What do you think?

>   OK, here's what we'll actually need in order for this to get checked-in:
> 
>   (1) Important comments addressed

Mostly done?
Attachment #173714 - Attachment is obsolete: true
(In reply to comment #87)
> I don't really understand what you want here. If you want me to delete both of
> these "CREATE RULE" statements from the Pg class, just say the word.

  Nah. Don't change anything for now. The CREATE RULE statements will eventually
disappear anyway, when the timestamp fields disappear.

> I thought we decided to put create_db() in Bugzilla::DB? 

  Oh yeah, we did. You can add it there, though, as a part of this patch. (At
least, after the required patch is checked-in.)

> Aside: Looking at checksetup.pl, I see it's creating the tables with "TYPE =
> MYISAM" appended to all of the "CREATE TABLE" statements. The current
> implmentation of the Schema class does not do this. Should it?

  Yes, it definitely should. That was a bugfix; I forget the bug number.

> I basically just put that in there so that subclasses can eliminate references
> without overriding the whole method. I'm still thinking it over.

  OK. It needs to be documented somewhere, then, if you keep it.

> >   That REFERENCES has to not be in the code for MySQL 3. Do we handle that?
> 
> Not currently. I could add a Mysql3.pm subclass which ISA Mysql with the only
> difference being that it's _initialize method sets $self->{no_references} to
> true after calling SUPER::_initialize. It would be up to Bugzilla::DB to
> instantiate either the Mysql or Mysql3 schema. What do you think?

  Instead, there should be a server_version sub in Bugzilla::DB that will tell
you the information you need inside of Schema::Mysql to disable references or not.

> Mostly done?

  I'll check the new patch itself over more, later. :-) It does sound like it's
mostly done, though. :-)
(In reply to comment #88)
>   Nah. Don't change anything for now. The CREATE RULE statements will eventually
> disappear anyway, when the timestamp fields disappear.
> 

They already did, last patch was checked in today. There shouldn't be any column
with timestamp type anymore.
Oh, I just saw this:

Take a look at DBI::type_info_all and DBI::type_info. I think it could be really
useful for this module, perhaps... Perhaps we could override type_info for types
that it doesn't support or that we want to change.
Attached patch V5 (obsolete) — Splinter Review
I went through all the table definitions in checksetup.pl and fixed any
discrepancies with the abstract schema. There were a lot of changes made to the
schema since Andrew first created the abstract schema! Please look over these
changes. This version should generate mediumint and mediumtext fields in all
the same places that the current checksetup.pl schema does. I also appended
"TYPE = MYISAM" to all MySQL CREATE TABLE statements. I removed the CREATE RULE
statements from the Pg schema, as per comment #89.
Attachment #173835 - Attachment is obsolete: true
Comment on attachment 174459 [details] [diff] [review]
V5

Great! This actually basically looks good.

It needs some POD docs, though.

And before I can actually review+ it, I need to see some use of it or other.
Perhaps a create_db function in the new Bugzilla::DB structure. And also patch
the new Bugzilla::DB structure to know about the Schema objects.

But this part of it looks fundamentally good.
Attachment #174459 - Flags: review-
And note that there have been some schema checkins since this patch was posted, too:

http://bonsai.mozilla.org/cvslog.cgi?file=mozilla/webtools/bugzilla/checksetup.pl

cryptpassword varchar(128) is the only one that I see, but there might have been
others before that, that I'm not catching.
FWIW, when you are at that :-), I would also like to see some more integration
with the db compat modules. I think we don't need new modules for DB specific
code, you can now add it to the existing ones. We can also remove the driver
dependent instantiation from here to avoid duplicating effort.
I would probably go for a function in the DB.pm module to return/create the db,
which would get the generic schema from either DB.pm, or, preferably, from
Schema.pm, and then call DB specific (abstract) method to adjust it with DB
specific rules. How does that sound? :-)
Otherwise I really like the direction this patch is going...
(In reply to comment #94)
> FWIW, when you are at that :-), I would also like to see some more integration
> with the db compat modules.

  There will be. However, the Schema objects should stay as they are. They will
be objects privately held by the Bugzilla::DB object, and accessed that way.
Nobody outside of Bugzilla::DB will ever know about Schema.

> I would probably go for a function in the DB.pm module to return/create the db,

  Already planned. :-) That's what I just mentioned in my review. :-)
Comment on attachment 174459 [details] [diff] [review]
V5

>+} #eosub--new
>+#------------------------------------------------------------------------------

  Oh, and don't do those -- it's not standard Bugzilla style.
(In reply to comment #96)
> (From update of attachment 174459 [details] [diff] [review] [edit])
> >+} #eosub--new
> >+#------------------------------------------------------------------------------
> 
>   Oh, and don't do those -- it's not standard Bugzilla style.
> 

It should be.
(In reply to comment #97)
> >   Oh, and don't do those -- it's not standard Bugzilla style.
> > 
> 
> It should be.

  It won't be, though. It clutters the space and nobody will ever remember to
always do it. :-) It's a nice thing, particularly when reading in a highlighting
editor, but it's just not what we do at this point. I prefer to be as consistent
as possible with the current standards.
Actually, I thought about it, and it does make things more readable.

But make the lines 70 characters, not 80, so we can quote them more easily.
(In reply to comment #99)
> Actually, I thought about it, and it does make things more readable.
> But make the lines 70 characters, not 80, so we can quote them more easily.

  I should clarify. :-) I've decided I like the #---- things. But make them 70
characters long instead of 80 characters long. :-)
As stated at bug 283076, we need to specify precision of dates for Postgres to
be zero to get rid of the fraction.
Also, you will need to add "without time zone" to Postgres date types. See the
(outdated) pgsetup.pl at bug 98304
(https://bugzilla.mozilla.org/attachment.cgi?id=130925) for details.
Ed: All I need for this patch to get review+ is to have a section that adds a
"bz_schema" object to the Bugzilla::DB instance.

Then I can code the actual installation code.
Alias: bz-dbschema
Blocks: 284348
(In reply to comment #102)
> Ed: All I need for this patch to get review+ is to have a section that adds a
> "bz_schema" object to the Bugzilla::DB instance.
> 
> Then I can code the actual installation code.

I'll post a new patch in a few days, Max.
OK. Feature freeze for 2.20 is March 15, so I've been hacking away at checksetup
pretty rapidly. I might get around to it first, so we'll see.
OK, now that I'm actually implementing this for bug 284348, I've realized that I
need get_table_ddl, and I don't need ddl_statements.

It needs to be the Bugzilla::DB class that runs the check for the tables --
because I don't always have to create *all* the tables; I just need to create
some of them, sometimes.

So get_table_ddl should become public, and ddl_statements should be removed.
To give you an idea, here's what I'm working on, now:

sub bz_setup_database {
    my ($self) = @_;

    # Get a list of the existing tables (if any) in the database
    my $table_sth = $self->table_info(undef, undef, undef, "TABLE");
    my @tables = @{$self->selectcol_arrayref($table_sth, { Columns => [3] })};

    # go through our %table hash and create missing tables
    while (my ($table_name) = each %table) {
        next if grep /^$table_name$/, @tables;
        print "Creating table $tabname ...\n";

    $self->do($self->_bz_schema->get_table_ddl($table_name))
        or die "Could not create table '$tabname'. " .
          "Please check your " . $self->PROGRAM_NAME . " access rights.\n";
}
(In reply to comment #105)
> OK, now that I'm actually implementing this for bug 284348, I've realized that I
> need get_table_ddl, and I don't need ddl_statements.
> 
> It needs to be the Bugzilla::DB class that runs the check for the tables --
> because I don't always have to create *all* the tables; I just need to create
> some of them, sometimes.
> 
> So get_table_ddl should become public, and ddl_statements should be removed.

Surely you need both? I'm thinking of the case where the database is created
from scratch (fresh installation). Or at least a method to get a list of all
tables in the schema. The latter sounds like it could be useful in any case.
Well, actually, when I create all the tables in the database, I do it in the
loop above. So it's Bugzilla::DB that needs to do the loop.

There's actually never a time when I say "create this database from scratch" --
I just say, "create for me any tables that don't already exist."
Also, I just realized that the patch has a bunch of tabs in it. Use four spaces
instead of tabs.
(In reply to comment #108)
> Well, actually, when I create all the tables in the database, I do it in the
> loop above. So it's Bugzilla::DB that needs to do the loop.
> 
> There's actually never a time when I say "create this database from scratch" --
> I just say, "create for me any tables that don't already exist."

OK, but what is %table and where does it come from in the code snippet above?
I maintain that the list of tables should come from Bugzilla::DB::Schema, so
I think there should be a get_tables() method (or get_table_list() method?) to
supplement the requested get_table_ddl() method.
OK, this schema is up-to-date as of today with exactly what checksetup creates.


This is the file I'm using to work on bug 284348, where I've tested it and I
know that it works.

There are a few things that need to be addressed, though, even with this patch:


  + The table definitions need to be in a consistent order. Why are they
ordered like they are now? If there's some reason, put a comment in the file.
In any case, put a comment in the file noting how the table definitions should
be ordered, so they don't become chaotic.

  + There are tabs in the file, and there's also strange spacing. All indents
should be four characters, except where something else would be clearer. The
table definitions should try to have all their "=>" aligned in each table.

  + I've left a few XXX comments -- they should be addressed.

  + Index names: Right now, we technically *do* have index names in MySQL. They
are named after the first column in the index. I've changed a few of the
existing ones to match that, just so that NO changes whatsoever happen to the
schema. If we do decide to go this way, then the other ones also need to be
changed.
    When they are created by _get_index_ddl, they can have their name changed
to table_name_$name_idx, if we want. I'm not sure if we should do that only for
non-MySQL databases, or for all databases.
    If Bugzilla really does NOT currently depend on index names, then we can
change them. However, we should also then put code in checksetup to drop all
the current indexes, or to rename them to the proper names, so that in the
future we *can* use the index names if we want.
Attachment #174459 - Attachment is obsolete: true
Oh, and yes, I'd love a get_table_list method. :-) That would be great! :-)
(In reply to comment #111)
>   + The table definitions need to be in a consistent order. Why are they
> ordered like they are now?

There's no reason for the order as far as I know. It's in the same order as
when Andrew Dunstan created the abstract schema in his initial prototype.
What order do you prefer?

>   + There are tabs in the file, and there's also strange spacing. All indents
> should be four characters, except where something else would be clearer.

I'll get rid of the tabs.

> The table definitions should try to have all their "=>" aligned in each table.

Fine.

>   + I've left a few XXX comments -- they should be addressed.

The FULLTEXT issue is already addressed in the Mysql Schema module, dude.
Nothing needs to be done here.

The DISPLAYWIDTH attributes are not used. I think Andrew intended them for the
future, but they can be removed.

>   + Index names: Right now, we technically *do* have index names in MySQL. They
> are named after the first column in the index. I've changed a few of the
> existing ones to match that, just so that NO changes whatsoever happen to the
> schema. If we do decide to go this way, then the other ones also need to be
> changed.
>     When they are created by _get_index_ddl, they can have their name changed
> to table_name_$name_idx, if we want. I'm not sure if we should do that only for
> non-MySQL databases, or for all databases.
>     If Bugzilla really does NOT currently depend on index names, then we can
> change them. However, we should also then put code in checksetup to drop all
> the current indexes, or to rename them to the proper names, so that in the
> future we *can* use the index names if we want.

I disagree with this change. First off, I am extremely doubtful that Bugzilla depends
on the index names. Secondly, the index naming scheme was discussed nearly a
year ago and all comments indicate tacit approval of the naming scheme that has been
employed. It was implicitly approved by Dave, I believe. If you want a specific index
naming scheme for Mysql, then it can addressed in the Mysql module (perhaps
in the _create_index_ddl method) or you rename existing indexes in checksetup.
Status: ASSIGNED → NEW
Hrm, I'm not sure what order I prefer. :-) I think that related tables should be
grouped together. So all the groups tables should be in one place, and all the
flags tables should be in another, and so on.

Eventually I want to have POD docs for the schema, for each table. That's why I
moved all the checksetup comments into the module, so that we can eventually
convert them to POD docs. So the tables should be in some order where if we make
the POD docs inline then related tables will be grouped together in the docs.

As far as the index naming, that's OK, then. I talked to Dave, and he agrees
that we should be using named indexes. So you can change back the things I
changed -- my apologies on that, then! :-)

I *thought* that we were doing something about the FULLTEXT stuff in the MySQL
module. :-) My apologies. :-) I would like to *somehow* note that certain fields
*ought* to have fulltext indexes, though, in the primary schema. The "Simple Bug
Search" page depends on fulltext indexes, so even though we're not going to
implement them in PostgreSQL for Bugzilla 2.20, it would be nice to note that
they *ought* to be there *if possible* in every DB.
Status: NEW → ASSIGNED
Oh, by the way, you might want to check out my work so far in bug 284348, where
I'm using (and very much enjoying) your module. I created the hooks in
Bugzilla::DB, so you don't need to do that for this patch if you don't want. :-)
(In reply to comment #114)
> I *thought* that we were doing something about the FULLTEXT stuff in the MySQL
> module. :-) My apologies. :-) I would like to *somehow* note that certain fields
> *ought* to have fulltext indexes, though, in the primary schema. The "Simple Bug
> Search" page depends on fulltext indexes, so even though we're not going to
> implement them in PostgreSQL for Bugzilla 2.20, it would be nice to note that
> they *ought* to be there *if possible* in every DB.

Yep, I concur to that. I would prefer to add some flag or field to the generic
schema stating that we want full text index on specific column. DB which support
that would use that flag, other DBs can ignore it, but it should be specified in
the schema explicitly.
Do I understand correctly, that we are currently adding fulltext index to every
text field, which has index associated with it? If yes, it's different from what
we have in checksetup - see e.g. table components, column name - it's indexed,
but not fulltext. Having fulltext on every indexed text column is a bit waste,
isn't it?
(In reply to comment #116)
> Do I understand correctly, that we are currently adding fulltext index to every
> text field, which has index associated with it? If yes, it's different from what
> we have in checksetup - see e.g. table components, column name - it's indexed,
> but not fulltext. Having fulltext on every indexed text column is a bit waste,
> isn't it?

Sorry, my mistake, you are not checking varchar, but only tiny/medium/text, so
it will work correctly. But anyway, with this approach we are losing the
flexibility to have text column as index without full text. I don't know if it
makes sense to create indexed text without fulltext index, but I just wanted to
point that out :-).
(In reply to comment #117)
> But anyway, with this approach we are losing the
> flexibility to have text column as index without full text. I don't know if it
> makes sense to create indexed text without fulltext index, but I just wanted to
> point that out :-).

I'm aware of that and that was my belief. I don't think it makes sense to ever
create a non-fulltext index on a text field.

There are two additional reasons why I am resistant to the idea of an index
having an attributes other than UNIQUE:
1. It's not ANSI/ISO-standard SQL, as far as I know. Philosophically,
anything non-standard should be implemented in the DB-specific subclasses.
2. The schema data structure would have to be restructured (refer to
comment #25), and I'm lazy.

I think I can overcome these with sufficient persuasion.

Just out of curiosity, what DBMSs besides MySQL support FULLTEXT indexes?
(In reply to comment #118)
> I'm aware of that and that was my belief. I don't think it makes sense to ever
> create a non-fulltext index on a text field.

  It does (any index is better than no index), but FULLTEXT indexes are usually
more useful.

> There are two additional reasons why I am resistant to the idea of an index
> having an attributes other than UNIQUE:
> 1. It's not ANSI/ISO-standard SQL, as far as I know. Philosophically,
> anything non-standard should be implemented in the DB-specific subclasses.

  Yeah. But we need the functionality in Bugzilla, so I'd like to at least have
the option. We should just document in the Schema class that the FULLTEXT
indexes may be implemented by subclasses as normal indexes, so Caveat Emptor.

> 2. The schema data structure would have to be restructured (refer to
> comment #25), and I'm lazy.

  Hahahaha. :-) You could also create a FULLTEXT section. It's up to you.
 
> I think I can overcome these with sufficient persuasion.

  /me makes persuasive hand motions. :-)
 
> Just out of curiosity, what DBMSs besides MySQL support FULLTEXT indexes?

  I know that there's an add-on module for PostgreSQL called tsearch that does
it. We may use it in Bugzilla in the future.

  I think that there are a few other DBMSes that do, but I don't know off the
top of my head. I'll bet MS-SQL does in some way or another, but I haven't used
it in a while. Does Oracle not have any fulltext support? (Oracle is likely our
next implementation after PostgreSQL.)
(In reply to comment #114)
> Hrm, I'm not sure what order I prefer. :-) I think that related tables should be
> grouped together. So all the groups tables should be in one place, and all the
> flags tables should be in another, and so on.

I think you'll need to be more specific. I could do alphabetical. I could mimic the
order in checksetup.pl. But if you want a specific order, maybe you should
e-mail it to me.
OK, I emailed the proposed order.

Oh, and I forgot to mention that I removed all the REFERENCES stuff -- let's
leave that for bug 109473, because it would actually change the schema at this
point, and I'm trying to avoid any functionality changes while we re-arch this.
Attached patch V7 (obsolete) — Splinter Review
Changes:

1. Removed tabs.
2. Re-ordered tables.
3. Removed ddl_statements() method.
4. Added get_table_ddl() and get_table_list() methods.
5. Renamed some protected methods.
6. Added some pod, but more pod needed, I'm sure.
7. Reverted index names.
8. Merged in Max's schema changes.

Didn't address recent FULLTEXT comments yet. Will probably
restructure the schema data structure a la comment #25 in
the next version.
Comment on attachment 176022 [details] [diff] [review]
V7

This is awesome. The POD is great. :-) And the spacing is way better. This
makes me happy.

Try to put the POD inline in the file -- that's the new style that we're moving
toward. That way the function docs end up above the functions, and the module
description ends up at the top of the file. It's nice for people who are
reading the code.

You can use =cut to end sections.

Note somewhere in the module that the entire Schema module is package-private.
That is, only Bugzilla::DB should ever interact with the Schema modules -- CGI
callers should only interact with Bugzilla::DB.

Oh, and instead of sql_type, how about a method that takes the standard Schema
format for a field, and returns me some SQL? That is, {TYPE => 'INT1', NOTNULL
=> 1}. I think that would be the best. Then we could use that format in
checksetup for the future, and have another function that translates the raw
SQL definitions into that format for backwards-compatibility functions.

>diff -rNu Bugzilla.HEAD/DB/Schema/Mysql.pm Bugzilla/DB/Schema/Mysql.pm
> [snip]
>+        BOOLEAN =>      'tinyint',

  I'm not sure if we've discussed this, but should we do tinyint(1)? I think it
probably doesn't matter, actually.

>+        INTERVAL =>     'timestamp',

  I think we don't need that type anymore. :-)

>+    $self->{create_table_postamble} = ' TYPE = MYISAM';

  I think it would be better to override _get_create_table_ddl and append that
to the end of the returned data from $self->SUPER::_get_create_table_ddl.


  I look forward to the new FULLTEXT thing! :-)
(In reply to comment #123)

> Oh, and instead of sql_type, how about a method that takes the standard Schema
> format for a field, and returns me some SQL? That is, {TYPE => 'INT1', NOTNULL
> => 1}. I think that would be the best. Then we could use that format in
> checksetup for the future, and have another function that translates the raw
> SQL definitions into that format for backwards-compatibility functions.

Sounds like _get_field_ddl(), but it's not clear to me what exactly return
value(s) you want.

> >+    $self->{create_table_postamble} = ' TYPE = MYISAM';
> 
>   I think it would be better to override _get_create_table_ddl and append that
> to the end of the returned data from $self->SUPER::_get_create_table_ddl.

Perhaps. I'll consider it.
(In reply to comment #124)
> Sounds like _get_field_ddl(), but it's not clear to me what exactly return
> value(s) you want.

  Yeah, it does sound like that. (I haven't looked over the existing functions
in detail. I want it to preferably return SQL that I can pass to ALTER TABLE as
the column definition.

  Or a function that would generate an ALTER TABLE statement to change a column
definition (it would be the backend for bz_change_field_def). That would also be
acceptable. But I think the thing to generate the SQL for the column definition
would be more immediately useable.
(In reply to comment #125)
>> Sounds like _get_field_ddl(), but it's not clear to me what exactly return
>> value(s) you want.
>   Yeah, it does sound like that. (I haven't looked over the existing functions
> in detail. I want it to preferably return SQL that I can pass to ALTER TABLE as
> the column definition.

I'm just dense when it comes to these things... If I'm understanding you
correctly, you pass {TYPE => 'INT1', NOTNULL => 1} to some method and you want
it to return the string "tinyint not null" for MySQL. Presumably it should grok
DEFAULT and PRIMARYKEY as well. OK, no problem. I'll probably call this method
get_type_ddl().
OK, a few things that I have discovered:

(1) We should use smallint for booleans. DBD::Pg only converts 0 to f and 1 to t
if you use placeholders. Bugzilla uses 0 and 1 outside of placeholders ALL OVER.
So it's to smallint we go.

HOWEVER:

From http://www.postgresql.org/docs/7.3/interactive/datatype.html#DATATYPE-INT
------------------------------------------------------------------------------
Note:  If you have a column of type smallint or bigint with an index, you may
encounter problems getting the system to use that index. For instance, a clause
of the form

... WHERE smallint_column = 42

will not use an index, because the system assigns type integer to the constant
42, and PostgreSQL currently cannot use an index when two different data types
are involved. A workaround is to single-quote the constant, thus:

... WHERE smallint_column = '42'
-------------------------------------------------------------------------------

Which means that all our integer types for PostgreSQL should be "integer," I think.
We could also use bit(1) for the booleans. That might be better, but I'm not
certain.
(In reply to comment #127)
> HOWEVER:
> 
> From http://www.postgresql.org/docs/7.3/interactive/datatype.html#DATATYPE-INT
> ------------------------------------------------------------------------------
> Note:  If you have a column of type smallint or bigint with an index, you may
> encounter problems getting the system to use that index. For instance, a clause
> of the form
> 
> ... WHERE smallint_column = 42
> 
> will not use an index, because the system assigns type integer to the constant
> 42, and PostgreSQL currently cannot use an index when two different data types
> are involved. A workaround is to single-quote the constant, thus:
> 
> ... WHERE smallint_column = '42'
> -------------------------------------------------------------------------------
> 
> Which means that all our integer types for PostgreSQL should be "integer," I
think.

You should use the latest version of the docs.
http://www.postgresql.org/docs/8.0/interactive/datatype.html#DATATYPE-INT shows
that issue is gone in 8.x.

As for booleans/bits, it's possible that these types will eventually support
combining multiple boolean/bit fields into a single word, which would provide
far more efficient storage than a smallint would. I don't believe this is
currently done, though.
(In reply to comment #129)
> You should use the latest version of the docs.
> http://www.postgresql.org/docs/8.0/interactive/datatype.html#DATATYPE-INT 
> shows that issue is gone in 8.x.

  We definitely won't be requiring PostgreSQL 8.x. Having PostgreSQL not use
indexes when we used a raw int would kill us.

  For the booleans, I've done some testing, and I think that we should use integer.
Blocks: 284529
Well, another possibility is to add "::text" at the end of the parameter. I'm
not sure what caused this issue in pre-8.0, but it's likely it could be fixed by
adding a few default casts to the bugzilla database as well. Of course this is
extra work; I don't know how important small/big ints are to bugzilla.

Did you test boolean v. int or boolean v. bit? BTW, you could also get booleans
to cast automatically to ints just by adding a few casts to the database schema.
You could also change the default boolean->text casting so it returned 1 or 0.
No longer blocks: 284529
(In reply to comment #129)
> You should use the latest version of the docs.
> http://www.postgresql.org/docs/8.0/interactive/datatype.html#DATATYPE-INT shows
> that issue is gone in 8.x.
> 
> As for booleans/bits, it's possible that these types will eventually support
> combining multiple boolean/bit fields into a single word, which would provide
> far more efficient storage than a smallint would. I don't believe this is
> currently done, though.

That's very cool and sexy :-).
But we are currently aiming at Pg 7.3, we can't force people to use
bleeding-edge if they don't want to :-).
I would like to use booleans too, it's almost always better to use correct type
:-), but it would currently break too many things and we don't have time to fix
them all. Let's get this working with ints first (the same as we are using with
MySQL), and then we can work out if we can switch to bools later (and if we gain
anything with that ;-) ).
Blocks: 284529
(In reply to comment #132)

If you want to avoid 'bleeding edge' (which isn't really available in released
versions of PostgreSQL), then you should target 7.4.x. 7.3 is about 2 years old,
and 7.4 has substantial performance improvements (among other things).
(In reply to comment #133)
> (In reply to comment #132)
> 
> If you want to avoid 'bleeding edge' (which isn't really available in released
> versions of PostgreSQL), then you should target 7.4.x. 7.3 is about 2 years old,
> and 7.4 has substantial performance improvements (among other things).

We know that 7.2 is not working without hacks. We thing that 7.3 should work. So
that's the minimal version required. We are not encouraging people to run old
versions, but we are trying to run on as much setups as possible.
If we hit a problem which would be solved by bumping up the version, we'll
consider that, but AFAIK there is no reason to do it yet.
There's a field called 'oneemailperbug' in the attached schema that needs to be
called 'onemailperbug'. (One Mail Per Bug, not One Email Per Bug).
Attached patch V8 (obsolete) — Splinter Review
Changes:
1. Many corrections to the abstract schema.
2. Integrated POD into the code.
3. Eliminated create_table_postamble cruft. Mysql.pm overrides
_get_create_table_ddl() instead to
specify the MYISAM storage engine.
4. Eliminated the sql_type() and _get_field_ddl() methods.
5. Added the following public methods: get_type_ddl(), get_column_info(), and
get_table_columns().
Max, you asked for get_type_ddl(). The last one is for bug #247560. The second
one has no current
usefulness, but it just seemed like a good idea.

I just noticed that there are similar methods in Bugzilla::DB ... Maybe these
latter two new methods
should be called get_field_info() and get_table_fields() instead?

I'm once again tempted to create a "column" object class...

The schema data structure re-structuring will have to wait another version. I
wanted to get the
schema fixes and API changes out there ASAP.
Attachment #176022 - Attachment is obsolete: true
Yeah, the new API looks good.

Don't worry about the similar functions -- the functions in Bugzilla::DB return
the current state of the DB on the disk, while the functions in
Bugzilla::DB::Schema only return the "ideal" state.

And yeah, I can't currently imagine a need for get_column_info.

If you have any suggestions about the patch on bug 284529, though, that would be
appreciated. Right now DBI's type_info is somewhat unreliable. column_info is
also not good for figuring out if a field is a serial/auto-increment field.
However, they are the only methods that I have, at the moment, and they work
fairly decently other than those two problems.
OK, as far as comment 137 goes, I'll handle that. I have a lot of experience now
with DBI's type_info, and I know how to handle this.

If you just post a new version with the updated index stuff, it's review+, and
it goes in.
Blocks: 284845
Blocks: 284850
Blocks: 285111
Blocks: 285113
OK, based on my work in bug 285113, it would be really nice if the abstract
schema was a Class variable using our or use constant (use constant would
probably be the best, so that it's definitely read-only), and then was just
copied into the $self instance.
Attached patch V9Splinter Review
The constructor now dies instead of ThrowCodeError(). Max, please clean up the
_bz_schema() instantiation in Bugzilla::DB as we discussed.

The abstract schema is now a constant, and it has been documented. I threw in
Max's SCHEMA_VERSION constant as well.

Indexes are defined slightly differently in this version, allowing you to
specify an index TYPE for those fans of FULLTEXT indexes. And since unique
indexes are simply a type of index as well, the UNIQUEs are gone. The table
hash contains only FIELDS and INDEXES.

Storable's dclone() routine is used to copy the abstract schema before it is
made database-specific. A version of the abstract schema structure is retained
as object data as well.

All indentation nits should be addressed now, I hope.
Attachment #176112 - Attachment is obsolete: true
Comment on attachment 176707 [details] [diff] [review]
V9

Excellent. :-)

However, you changed a few fields from NULL to "NOT NULL default 0",
apparently:

bugs_activity.attach_id
components.initialowner
components.initialqacontact

I can fix that on checkin.

Also, after we check-in this patch, we should have another one to make the
current MySQL fully compliant with this one -- the only change (besides
renaming the indexes, which I've already filed a bug for) is to fix all the
tinyint(1) fields to be normally-sized tinyint fields.
Attachment #176707 - Flags: review+
Flags: approval?
Attachment #175992 - Attachment is obsolete: true
Blocks: 285345
Flags: approval? → approval+
(In reply to comment #141)
> (From update of attachment 176707 [details] [diff] [review] [edit])
> However, you changed a few fields from NULL to "NOT NULL default 0",
> apparently:
> 
> bugs_activity.attach_id
> components.initialowner
> components.initialqacontact

I don't remember changing them. I suspect they've been like that for at least a
couple versions....

> I can fix that on checkin.

OK, fine with me.
Comment on attachment 176707 [details] [diff] [review]
V9

>diff -rNu Bugzilla.HEAD/DB/Schema.pm Bugzilla/DB/Schema.pm
>--- Bugzilla.HEAD/DB/Schema.pm	Wed Dec 31 19:00:00 1969
>+++ Bugzilla/DB/Schema.pm	Tue Mar  8 02:46:03 2005

>+=item C<ABSTRACT_SCHEMA>
>+
>+The abstract database schema structure consists is a hash reference
>+in which each key is the name of a table in the Bugzilla database.

s/constists is/consists of/

>+            target_milestone    => {TYPE => 'varchar(20)',
>+                                    NOTNULL => 1, DEFAULT => "'---'"},

I'm worried with this because it's possible that not every database engine is
going to use the same quoting style (more a problem if we put something in a
default value that needs to be escaped).  But don't let that stop this... we
can cross that bridge when we come to it I guess.
Thank you so much, Ed. :-) This is a wonderful enhancement to Bugzilla.

I addressed the checkin comments.

RCS file: /cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Schema.pm,v
done
Checking in Bugzilla/DB/Schema.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Schema.pm,v  <--  Schema.pm
initial revision: 1.1
done
RCS file: /cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Schema/Mysql.pm,v
done
Checking in Bugzilla/DB/Schema/Mysql.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Schema/Mysql.pm,v  <--  Mysql.pm
initial revision: 1.1
done
RCS file: /cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Schema/Pg.pm,v
done
Checking in Bugzilla/DB/Schema/Pg.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Schema/Pg.pm,v  <--  Pg.pm
initial revision: 1.1
done
Status: ASSIGNED → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
Summary: Reusable (structured?) database-independent schema. → Reusable, structured, database-independent schema
Blocks: 285380
Blocks: 287483
QA Contact: matty_is_a_geek → default-qa
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: