Bugzilla::DB::Schema needs a way to serialize and store its abstract schema

RESOLVED FIXED in Bugzilla 2.20

Status

()

enhancement
P2
normal
RESOLVED FIXED
15 years ago
15 years ago

People

(Reporter: mkanat, Assigned: mkanat)

Tracking

2.19.2
Bugzilla 2.20
Dependency tree / graph
Bug Flags:
approval +

Details

Attachments

(1 attachment, 2 obsolete attachments)

We need a way to serialize the "abstract" schema in Bugzilla::DB::Schema.

That is, we need to be able to get at it before _adjust_schema is called (or
copy it to some other variable before that happens -- that's probably the best
solution), so that I can write that to the database and use it as the
authoritative guide to what fields are currently what type, etc.

I think that we also ought to keep a record of some "version" number of the
schema. I think that we should probably start at 1.00 and go on to 1.01, etc.
Either that, or we could use the CVS version number of the file, though I think
that's less reliable when we're writing patches.
Oh, I think we also ought to be able to serialize the specific schema (that is,
the schema translated into real stuff for this database, after _adjust_schema),
because it might come in handy some day. That's not as big of a priority, though.
Depends on: bz-dbschema
Severity: normal → enhancement
Target Milestone: --- → Bugzilla 2.20
OK, I think we should use Storable for this, since that's its purpose --
serializing and deserializing data structures.
Honestly, I think you've jumped the gun on opening this bug. I'm not sure I even
agree that there's a problem, let alone with your proposed solution.
Well, feel free to respond on the developers list or on the parent bug, if you'd
like.

There are definitely various problems that I've run into when actually
implementing things.
Summary: Bugzilla::DB::Schema needs a way to serialize its *abstract* schema and its specifc schema → Bugzilla::DB::Schema needs a way to serialize and store its abstract schema
OK, here's a way to serialize and deserialize a Schema, so that we can read off
the "current Schema" into Bugzilla::DB.

This will make the rest of dbinstall MUCH easier.
Attachment #176686 - Flags: review?(bugzilla)
This is, as you can imagine, a work-in-progress, somewhat.

store_abstract_schema() should not be in Bugzilla::DB::Schema, anymore. :-)
Status: NEW → ASSIGNED
Comment on attachment 176686 [details] [diff] [review]
Methods to serialize, store, and deserialize a schema

So, this is a little less dense than some of my other patches, Tomas, if you
want to look at it... :-D
Attachment #176686 - Flags: review?(Tomas.Kopal)
Comment on attachment 176686 [details] [diff] [review]
Methods to serialize, store, and deserialize a schema

Yeah, I love this stuff :-). I was thinking about including some version number
in the database for a long time now, but I never thought about storing the
whole schema :-). Great idea.

>+=item C<SCHEMA_VERSION>
>+
>+The 'version' of the internal schema structure. This version number
>+is incremented every time the the fundamental structure of Schema
>+internals changes. 
>+
>+This is NOT changed every time a table or a column is added. This 
>+number is incremented only if the internal structures of this 
>+Schema would be incompatible with the internal structures of a 
>+previous Schema version.
>+
>+=begin undocumented
>+
>+As a guideline for whether or not to change this version number,
>+you should think, "Will my changes make this structure fundamentally
>+incompatible with the old structure?" Think about serialization
>+of the data structures, because that's what this version number
>+is used for.
>+
>+You should RARELY need to increment this version number.
>+
>+=end undocumented

It looks you really tried to explain it, but I am still not clear when I need
to update the version number and when not. This needs to be crystal clear to
everyone (including me ;-) ), otherwise we are in trouble. Please, try a bit
harder :-). Maybe some examples would help?

>     my $self = shift;
>+    my $schema = shift;
> 
>     # Define the Bugzilla abstract database schema.
>-    $self->{schema} = {
>+    $schema ||= {
> 
>      # BUG-RELATED TABLES
>      # ------------------

Nit: Indentation.

>+sub deserialize_abstract {
>+    my ($serialized, $version) = @_;
>+
>+    my %thawed_hash = thaw($serialized);
>+
>+    # At this point, we have no backwards-compatibility
>+    # code to write, so $version is ignored.

What do you intend to do here? Do you want to move the DB upgrade code here? If
yes, we need BIG delimiters and comments around :-). How do you plan to use the
version number?

>+=item C<store_abstract_schema()>
>+
>+ Description: Stores an abstract database schema in the database
>+              for later recovery.
>+ Params:      $schema (optional) - hashref. The schema that we should
>+              store. If omitted, we store the current schema.
>+ Returns:     nothing
>+
>+=cut
>+
>+sub store_abstract_schema {
>+    my ($self, $schema) = @_;
>+
>+    $schema ||= $self->{_abstract_schema};
>+
>+    my $store_me = _serialize_abstract(
>+}

Something missing here? ;-)

>+=begin undocumented
>+
>+=item C<_bz_real_schema()>
>+
>+ Description: Returns the current Schema for the real database
>+              on the disk, as opposed to the "ideal" schema
>+              returned by _bz_schema. Note that only the "abstract"
>+              parts of _bz_real_schema are guaranteed to be accurate --
>+              the "database-specific" parts may not work properly.

I do not like this comment. What other database we have than a real one on the
disk :-)? Maybe something like "curently used database we are going to upgrade"
or something in that direction? Better yet, why not say directly that this will
recover the abstract schema of the database which was used for the current
installation?

>+=item C<_bz_store_real_schema()>
>+
>+ Description: Stores the _bz_real_schema structures in the database
>+              for later recovery. Call this function whenever you make
>+              a change to the _bz_real_schema.

Again, I don't understand who and when should be calling this function.
Wouldn't it be better to say something like "Call this function whenewer you do
a change to the database schema (both in the actual database and the abstract
schema structure)"?
I am probably just missing the big picture of how all this fits together. Maybe
doing some short top-level description of the overall design would be helpful?
(Maybe on devel mailing list for broader audience and review).

>+sub _bz_store_real_schema {
>+    my ($self) = @_;
>+
>+    # We want to store the current object, not one
>+    # that we read from the database. So we use the actual hash
>+    # member instead of the subroutine call.
>+    my $store_me = $self->{_bz_real_schema}->serialize_abstract();
>+    $self->do("UPDATE bz_schema SET version = ?, schema_data = ?",
>+              undef, Bugzilla::DB::Schema::SCHEMA_VERSION, $store_me);

If the schema does not exists yet, this will not update anything. Better would
be DELETE followed by INSERT.
Attachment #176686 - Flags: review?(Tomas.Kopal) → review-
OK, now that I've explained the whole plan in bug 285111 comment 5, I'll explain
here how serialization works:

  Basically, we store the $self->{schema} hash as a binary object into a blob
field. Along with it, we store a version number. We store this number because
one should ALWAYS store a version number along with anything you serialize --
you'll need it in case you need to deserialize in a strange new world where
everything is different.

  An example of the use of the version number:

  Today, we store all individual columns like this:

  column_name => { TYPE => 'TYPE', NOTNULL => 1 }

  Imagine that someday we decide that NOTNULL is bad, and we want to change it
to NULL => 0.

  But we have a bunch of Bugzilla installations around the world with a
serialized schema that has NOTNULL in it! When we deserialize that structure, it
just WILL NOT WORK properly inside of our new Schema object. So, immediately
after deserializing, we need to go through the hash and change all NOTNULLs to
NULLs and so on.

  We know that we need to do that because we know that version 1.00 used
NOTNULL. Having made the change to NULL, we would now be version 1.01.

  To serialize and deserialize, we use Storable, the mechanism in perl intended
for exactly this purpose.
(In reply to comment #8)

  Yeah, thanks for the feedback on my docs. That's really what I needed, was to
write them and then have somebody explain to me how they needed to be clarified.
It was difficult for me, with all this serialization magic in my head, to
clearly explain to somebody unless they told me what about it they didn't
understand.

> >+    $self->do("UPDATE bz_schema SET version = ?, schema_data = ?",
> >+              undef, Bugzilla::DB::Schema::SCHEMA_VERSION, $store_me);
> 
> If the schema does not exists yet, this will not update anything. Better would
> be DELETE followed by INSERT.

  Yeah, I've thought about that. I was just worried as to what would happen if
for some reason we could DELETE but we couldn't INSERT; then we'd lose the whole
thing. Too bad that we can't do it inside of a transaction... :-)

  I suppose that I could go for DELETE/INSERT now, with an XXX comment saying
that we should use transactions when we support them.
(In reply to comment #10)
>   Yeah, I've thought about that. I was just worried as to what would happen if
> for some reason we could DELETE but we couldn't INSERT; then we'd lose the whole
> thing. Too bad that we can't do it inside of a transaction... :-)
> 
>   I suppose that I could go for DELETE/INSERT now, with an XXX comment saying
> that we should use transactions when we support them.

If you add some unique column (either version number, but then you need to
increment it on every update, which is probably not what you want, or you can
add something like serial number, last modified date etc.), then you can do it
in the opposite order - first insert the new record, and if and only if
successful, you delete the old one. Then if the delete fails, you can make the 
code reading from the table to get always the latest, newest record (ORDER BY +
LIMIT 1). That should be bullet proof :-).
Which comes to the bigger picture - what if you modify the actual database
structure and then the serailisation fails (e.g. can't insert)? Will you somehow
"undo" the change? Again, no transactions :-(. I think it's very difficult make
this completely safe...
Attachment #176686 - Flags: review?(bugzilla)
Posted patch v2 (obsolete) — Splinter Review
OK, how about this one?

I've also fixed some of the internal comments in the Schema that got
checked-in, because they need it, and this is an ideal time to do it.
Attachment #176686 - Attachment is obsolete: true
Attachment #176827 - Flags: review?(Tomas.Kopal)
Priority: -- → P2
Depends on: 285443
By the way, I thought I'd point out that Bugzilla::DB::Schema apparently no longer needs to
"use Bugzilla::Error", so you might want to nix that in the patch you are working on here even
though it has nothing to do with serialization, I guess.
Comment on attachment 176827 [details] [diff] [review]
v2

>Index: Bugzilla/DB.pm
>===================================================================
>RCS file: /cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB.pm,v
>retrieving revision 1.27
>diff -u -r1.27 DB.pm
>--- Bugzilla/DB.pm	8 Mar 2005 23:23:30 -0000	1.27
>+++ Bugzilla/DB.pm	9 Mar 2005 04:01:17 -0000
>@@ -521,9 +521,69 @@
>     return $self;
> }
> 
>+#####################################################################
>+# Private Methods
>+#####################################################################
>+
>+=begin private
>+
>+=head1 PRIVATE METHODS
>+
>+These methods really are private. Do not override them in subclasses.
>+
>+=over 4

Nit: I haven't found any matching =back for this.

>+sub _bz_store_real_schema {
>+    my ($self) = @_;
>+
>+    # We want to store the current object, not one
>+    # that we read from the database. So we use the actual hash
>+    # member instead of the subroutine call. If the hash
>+    # member is not defined, we will (and should) fail.
>+    my $store_me = $self->{_bz_real_schema}->serialize_abstract();
>+    $self->do("UPDATE bz_schema SET schema_data = ?, version = ?",
>+              undef, $store_me, Bugzilla::DB::Schema::SCHEMA_VERSION);
>+}

We really need to change this to work on new installations, with this table
empty. This is a showstopper for this patch, sorry.

>Index: Bugzilla/DB/Schema.pm
>===================================================================
>RCS file: /cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Schema.pm,v
>retrieving revision 1.1
>diff -u -r1.1 Schema.pm
>--- Bugzilla/DB/Schema.pm	9 Mar 2005 01:46:29 -0000	1.1
>+++ Bugzilla/DB/Schema.pm	9 Mar 2005 04:01:17 -0000
>@@ -978,15 +1012,25 @@
>               define the database-specific implementation of the all
>               abstract data types), and then call the C<_adjust_schema>
>               method.
>- Parameters:  none
>+ Parameters:  $abstract_schema (optional) - A reference to a hash. If 
>+                  provided, this hash will be used as the internal
>+                  representation of the abstract schema instead of our
>+                  default abstract schema. This is intended for internal 
>+                  use only by deserialize_abstract.
>  Returns:     the instance of the Schema class
> 
> =cut
> 
>     my $self = shift;
>+    my $abstract_schema = shift;

Nit: I would prefer syntax "my ($self, $abstract_schema) = @_;" here.

>+A field hash reference should must contain the key C<TYPE>. Optional field
>+keys include C<PRIMARYKEY>, C<NOTNULL>, and C<DEFAULT>. 

Should, or must? :-)

>+ Description: Takes any format ever returned by any version
>+              of serialize_abstract and returns a Schema object
>+              representing the serialized data instead
>+              of the modern data.

Do not talk about modern data here, the function does not have anything to do
with modern data. What about "...serialized data as stored for the schema for
current installation." or something in that sense?

Otherwise, it looks good :-).
Attachment #176827 - Flags: review?(Tomas.Kopal) → review-
Posted patch v3Splinter Review
Attachment #176827 - Attachment is obsolete: true
Attachment #177112 - Flags: review?(Tomas.Kopal)
Attachment #177112 - Flags: review?(edwardjsabol)
Blocks: 285713
Blocks: 285723
Depends on: 285740
Comment on attachment 177112 [details] [diff] [review]
v3

>+        die "Attempted to initialize the schema but there are already "
>+            . " $table_size copies of it stored.\nThis should never happen.\n"
>+            . " Compare the two rows of the bz_schema table and delete the "
>+            . "newer one.";

Nit: what if there are three ;-)? "...newer one(s)."?

>+=head1 SERIALIZATION/DESERIALIZATION
>+
>+=item C<serialize_abstract()>

Nit: =over 4 between these two lines (and =back later)?

Otherwise, it's perfect :-).
Attachment #177112 - Flags: review?(Tomas.Kopal) → review+
Blocks: 286527
Blocks: 285748
OK, here's what I want to do. I want to check in this code, but I want to
effectively disable it by commenting-out the call to _bz_init_schema_storage().

The reason that I want to check it in now is that I'm getting REALLY far into
this branch, and I'm starting to worry a lot about bitrot.

This code *has* to go in for 2.20 anyhow.
Flags: approval?
Attachment #177112 - Flags: review?(edwardjsabol)
This appears to have a dependency on Storable that we're not checking for in
checksetup.pl.  We still support Perl 5.6.1, and it doesn't ship with Storable
as part of core.
Flags: approval? → approval+
er, that approval was accidental, but this is okay I guess, since that
dependency was already there before this patch.  Let's get a new bug to make
sure checksetup.pl is checking for Storable.
Storable issue filed as bug 286669.

Checking in Bugzilla/DB.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB.pm,v  <--  DB.pm
new revision: 1.34; previous revision: 1.33
done
Checking in Bugzilla/DB/Schema.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Schema.pm,v  <--  Schema.pm
new revision: 1.9; previous revision: 1.8
done
Checking in Bugzilla/DB/Schema/Mysql.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Schema/Mysql.pm,v  <--  Mysql.pm
new revision: 1.2; previous revision: 1.1
done
Checking in Bugzilla/DB/Schema/Pg.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Schema/Pg.pm,v  <--  Pg.pm
new revision: 1.4; previous revision: 1.3
done
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Oh, I forgot to fix the nits. Did it:

Checking in Bugzilla/DB.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB.pm,v  <--  DB.pm
new revision: 1.35; previous revision: 1.34
done
Checking in Bugzilla/DB/Schema.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Schema.pm,v  <--  Schema.pm
new revision: 1.10; previous revision: 1.9
done
You need to log in before you can comment on or make changes to this bug.