Closed Bug 531243 Opened 15 years ago Closed 12 years ago

Bugzilla crashes on show_bug if it's hit while a custom field is being added

Categories

(Bugzilla :: Database, defect)

3.4.4
defect
Not set
normal

Tracking

()

RESOLVED FIXED
Bugzilla 4.2

People

(Reporter: justdave, Assigned: LpSolit)

Details

(Whiteboard: [wanted-bmo])

Attachments

(1 file, 3 obsolete files)

Users were getting the following error trying to view show_bug.cgi during the 2 or 3 minutes it took to add the new cf_status_thunderbird30 column to the bugs table:

DBD::mysql::db selectrow_hashref failed: Unknown column 'cf_status_thunderbird30' in 'field list' [for Statement "
            SELECT alias,assigned_to,bug_file_loc,bug_id,bug_severity,bug_status,cclist_accessible,component_id,delta_ts,estimated_time,everconfirmed,op_sys,priority,product_id,qa_contact,remaining_time,rep_platform,reporter_accessible,resolution,short_desc,status_whiteboard,target_milestone,version,reporter    AS reporter_id,DATE_FORMAT(creation_ts, '%Y.%m.%d %H:%i') AS creation_ts,DATE_FORMAT(deadline, '%Y-%m-%d') AS deadline,cf_blocking_fennec,cf_blocking_193,cf_status_192,cf_blocking_191,cf_status_191,cf_blocking_thunderbird30,cf_status_thunderbird30 FROM bugs
             WHERE bug_id = ?"] at /data/www/bugzilla.mozilla.org/Bugzilla/Object.pm line 79

I suspect this means the field was getting added to the field list/fielddefs before the column was added to the bugs table?  Should be the other way around, right?
No, we need a field in order to determine the tables to create and so on. Schema changes cannot be wrapped in a transaction, they cause an automatic commit.

This may be something to document, instead of to fix (because we can't, really--at least not very easily), and it would only be noticeable on very large installations (which is the only reason I'm reducing it to normal).
Severity: major → normal
Perhaps we may set shutdownhtml to something relevant before non-transactional schema changes and clear it afterwards?

By 'set' I mean field adding code, not editparams.cgi
(In reply to comment #2)
> Perhaps we may set shutdownhtml to something relevant before non-transactional
> schema changes and clear it afterwards?
> 
> By 'set' I mean field adding code, not editparams.cgi

  No, because then all of Bugzilla would stop, and users would get logged out. :-)

  We could do a "SELECT 1 FOR UPDATE FROM bugs", though, maybe? No, that wouldn't work either, because we'd still have to be within a transaction, and the schema changes would commit the transaction.
  The architectural way to fix it would be to do run_create_validators on the parameters, in Field->create, and then see if it would be possible to just pass around the values to all the functions that need it, but I think that might be architecturally tough.
Note that you don't add custom fields every day, (not even every month!) and so the probability to fall into this race condition is really slim. Not a big deal, IMO (and this shouldn't require any heavy hack to work around this).
(In reply to comment #5)
> Note that you don't add custom fields every day, (not even every month!) and so
> the probability to fall into this race condition is really slim. Not a big
> deal, IMO (and this shouldn't require any heavy hack to work around this).

  That's basically my thought as well. If it's too architecturally hacky to work around this, then we might just want to document the fact and make sure that people add custom fields at low-use times.
Looking at the code briefly, it seems like this would be architecturally clean to fix if we had a way to create a Bugzilla::Object in memory and then tell it to commit to the database at a later time.

Bugzilla::Field->create() could then call SUPER->create() with the flag for "create in memory only" set, and then tell it to commit to the database after it's done the rest of the field setup.
Yeah, this should work, because the code in Bug.pm that's crashing is actually going to the fielddefs table in the database to get the list of custom fields, and the code doing the creates doesn't actually go back to the fielddefs table in the database for anything (it references the in-memory copy of the Field object everywhere) so as long as it hasn't been committed to the fielddefs table yet, it shouldn't crash anyone while it runs.
Even easier, show_bug.cgi only loads *active* custom fields.  If it's initially created with the obsolete flag set, the crash is avoided.  Patch coming up in a moment.
Attached patch Patch v1 (obsolete) — Splinter Review
This works around the issue by stashing a copy of the obsolete value passed on field creation, initially creating it in the database with obsolete forced to 1, then restoring the correct obsolete value after the rest of the setup is complete.

This patch is made against 3.6 branch, but applies cleanly (with offset) to both 3.4 and trunk.

This has been successfully tested on bmo's bugzilla-stage-tip
Assignee: database → justdave
Attachment #441409 - Flags: review?(mkanat)
Attached patch Patch v2 (obsolete) — Splinter Review
Updated patch per review comments from LpSolit on IRC.

> <@LpSolit> justdave: $fieldvalues->{'obsolete'} = $original_obsolete; is useless as you don't use $fieldvalues later

Should be $field->set_obsolete() instead.  The idea was to make the real value available in case of the of other setup routines made use of it.  Went about restoring it the wrong way.

> <@LpSolit> justdave: also, ->set_foo has no effect till you call ->update

Which should be where I originally had the set_obsolete() call.
Attachment #441409 - Attachment is obsolete: true
Attachment #441411 - Flags: review?(mkanat)
Attachment #441409 - Flags: review?(mkanat)
(In reply to comment #12)
> The idea was to make the real value available in case of the of other setup
> routines made use of it.

Er, "in case any of the other".  I believe none of them actually do at the moment, it was just an attempt to future-proof it.
Comment on attachment 441411 [details] [diff] [review]
Patch v2

  This is a tremendously clever idea! :-)

  There are some other areas that pull obsolete fields (search, maybe?) but I think that either they won't be affected or they just aren't as important.

>=== modified file 'Bugzilla/Field.pm'
>--- Bugzilla/Field.pm	2010-04-07 01:01:41 +0000
>+++ Bugzilla/Field.pm	2010-04-26 00:38:42 +0000
>@@ -794,10 +794,23 @@
> 
> sub create {
>     my $class = shift;
>-    my $field = $class->SUPER::create(@_);
>+    my $fieldvalues = shift;

  Nit: We usually call this $params.

>+    my $original_obsolete;

  You actually don't need to store this--you can't set obsolete to 1 on field creation (and there wouldn't be any sensible reason to do so).

>+
>+    if ($fieldvalues->{'custom'}) {
>+        $original_obsolete = $fieldvalues->{'obsolete'};
>+        # If the field is active in the fields list before all of the data
>+        # structures are created, anything accessing Bug.pm  will crash.  So

  Add an explanation that on large databases, adding the field takes a long time, so this is in fact a worthwhile piece of code.

>+        # restore the obsolete value that got stashed earlier (in memory)
>+        $field->set_obsolete($original_obsolete);

  Instead of doing set_ and then update() later, just add a manual UPDATE statement using $dbh at the end, like we do for creation_ts in Bug::create. It's somewhat architecturally odd to call update() from within create(), and it will also probably confuse some extensions.
Attachment #441411 - Flags: review?(mkanat) → review-
(In reply to comment #14)
>   You actually don't need to store this--you can't set obsolete to 1 on field
> creation (and there wouldn't be any sensible reason to do so).

That's untrue. You can mark the custom field as obsolete on field creation. And there is a sensible reason to do so: when you need to populate the field values before enabling it to other users.
Oh, okay. In that case, keep the stuff that stores the old obsolete value. :-)
(In reply to comment #14)
>   Instead of doing set_ and then update() later, just add a manual UPDATE
> statement using $dbh at the end, like we do for creation_ts in Bug::create.
> It's somewhat architecturally odd to call update() from within create(), and it
> will also probably confuse some extensions.

Hmm, ok.  Is there something besides $field->set_obsolete() that it should be doing also, then?  Seems like that would theoretically have the same "re-entrant" problem as $field->update().
See how Bugzilla::Bug::create() deals with creation_ts--we basically want to do the exact same thing here.
Whiteboard: [wanted-bmo]
This apparently got missed and didn't get committed.  Is there any other followup that needs to be done here or can this be committed as-is?  I'm guessing we only want 3.6 and forward at this point and forget about 3.4 (maybe forget about 3.6? It's not a security issue, but then again, it is a crash).

Some other tidbits about this learned from subsequent trial and error on bmo in recent days:

MySQL by default provides a 50-second automatic lock-break on InnoDB.  That is, if you read-for-update-lock a table, your lock will get broken 50 seconds after the next person asks for a read-for-update-lock on the same table.  This will happen on the bugs table any time someone submits a change to a bug.  This means if you create a custom field, and it takes 3 minutes to update the bugs table for the field, and 1 minute into your update, someone makes a change to a bug, the user making the bug change sits there for 50 seconds, then their change goes through and yours gets killed with a "lock wait timeout exceeded" error.

My *very* hacky workaround to make sure the field add completes on bmo was:
1) pull one webhead out of the pool
2) configure that webhead to filter POST requests to process_bug.cgi, post_bug.ci and attachment.cgi and send them to a downtime page instead of to the cgi (note only POST, not GET)
>        RewriteCond %{REQUEST_METHOD} =POST
>        RewriteRule ^/(attachment|process_bug|post_bug)\.cgi$ /data/www/bugzilla.mozilla.org/addfielddowntime.cgi
3) switch all traffic to that webhead on the load balancer
4) run the add field script
5) switch traffic back to the other webheads on the load balancer
6) restore the original config on the special webhead and add it back to the pool.
Attached file addfielddowntime.cgi (example) (obsolete) —
Here's the addfielddowntime.cgi file referenced in the above apache config snippet, just as an example of the type of response a user who triggered it would get.
Hey Dave. As far as I can see, this didn't get committed because the patch is r- and needs a new revision?
Yeah, I didn't read closely when I made that comment, and thought I'd dropped the ball somewhere else.  A clean fix for this really needs to be a little more thorough than what I originally did here anyway.  Let me see what I can cook up here.
Flags: blocking4.4+
Target Milestone: --- → Bugzilla 4.2
glob/dkl: what's bmo doing about this currently?
We added code to Bugzilla::Field::create to update a param that disallows and bug updates during the time a custom field is being created. After, it sets the param to false and updates are allowed again.

http://bzr.mozilla.org/bmo/4.0/annotate/head:/Bugzilla/Field.pm#L980

There are alot of if (Bugzilla->params->{disabled_bug_updates}) conditions spread around the core code where updates are made.

Example:
http://bzr.mozilla.org/bmo/4.0/annotate/head:/process_bug.cgi#L99

dkl
Attached patch patch, v3Splinter Review
The original idea from justdave was the right way to go. This will fix the problem for most installations. Only very large installations will be affected by this problem. But for these large installations, maybe a shutdown makes sense. At least, I don't think we want something more complex than that for this issue.
Assignee: justdave → LpSolit
Attachment #441411 - Attachment is obsolete: true
Attachment #530454 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #671816 - Flags: review?(justdave)
Comment on attachment 671816 [details] [diff] [review]
patch, v3

Review of attachment 671816 [details] [diff] [review]:
-----------------------------------------------------------------

Yeah, this looks good.  This'll still trip if it takes too long to run, but the number of sites big enough to hit that is small enough they can probably use bmo's patch for that.
Attachment #671816 - Flags: review?(justdave) → review+
Flags: approval4.4+
Flags: approval4.2+
Flags: approval+
Committing to: bzr+ssh://lpsolit%40gmail.com@bzr.mozilla.org/bugzilla/trunk/
modified Bugzilla/Field.pm
Committed revision 8442.

Committing to: bzr+ssh://lpsolit%40gmail.com@bzr.mozilla.org/bugzilla/4.4/
modified Bugzilla/Field.pm
Committed revision 8429.

Committing to: bzr+ssh://lpsolit%40gmail.com@bzr.mozilla.org/bugzilla/4.2/
modified Bugzilla/Field.pm
Committed revision 8156.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: