Last Comment Bug 531243 - Bugzilla crashes on show_bug if it's hit while a custom field is being added
: Bugzilla crashes on show_bug if it's hit while a custom field is being added
Status: RESOLVED FIXED
[wanted-bmo]
:
Product: Bugzilla
Classification: Server Software
Component: Database (show other bugs)
: 3.4.4
: All All
: -- normal (vote)
: Bugzilla 4.2
Assigned To: Frédéric Buclin
: default-qa
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2009-11-26 01:09 PST by Dave Miller [:justdave] (justdave@bugzilla.org)
Modified: 2012-10-18 16:24 PDT (History)
8 users (show)
LpSolit: approval+
LpSolit: approval4.4+
LpSolit: blocking4.4+
LpSolit: approval4.2+
See Also:
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
Patch v1 (1.33 KB, patch)
2010-04-25 17:20 PDT, Dave Miller [:justdave] (justdave@bugzilla.org)
no flags Details | Diff | Splinter Review
Patch v2 (1.37 KB, patch)
2010-04-25 17:41 PDT, Dave Miller [:justdave] (justdave@bugzilla.org)
mkanat: review-
Details | Diff | Splinter Review
addfielddowntime.cgi (example) (2.06 KB, text/plain)
2011-05-05 16:14 PDT, Dave Miller [:justdave] (justdave@bugzilla.org)
no flags Details
patch, v3 (1.06 KB, patch)
2012-10-16 06:25 PDT, Frédéric Buclin
justdave: review+
Details | Diff | Splinter Review

Description Dave Miller [:justdave] (justdave@bugzilla.org) 2009-11-26 01:09:13 PST
Users were getting the following error trying to view show_bug.cgi during the 2 or 3 minutes it took to add the new cf_status_thunderbird30 column to the bugs table:

DBD::mysql::db selectrow_hashref failed: Unknown column 'cf_status_thunderbird30' in 'field list' [for Statement "
            SELECT alias,assigned_to,bug_file_loc,bug_id,bug_severity,bug_status,cclist_accessible,component_id,delta_ts,estimated_time,everconfirmed,op_sys,priority,product_id,qa_contact,remaining_time,rep_platform,reporter_accessible,resolution,short_desc,status_whiteboard,target_milestone,version,reporter    AS reporter_id,DATE_FORMAT(creation_ts, '%Y.%m.%d %H:%i') AS creation_ts,DATE_FORMAT(deadline, '%Y-%m-%d') AS deadline,cf_blocking_fennec,cf_blocking_193,cf_status_192,cf_blocking_191,cf_status_191,cf_blocking_thunderbird30,cf_status_thunderbird30 FROM bugs
             WHERE bug_id = ?"] at /data/www/bugzilla.mozilla.org/Bugzilla/Object.pm line 79

I suspect this means the field was getting added to the field list/fielddefs before the column was added to the bugs table?  Should be the other way around, right?
Comment 1 Max Kanat-Alexander 2009-11-26 04:55:16 PST
No, we need a field in order to determine the tables to create and so on. Schema changes cannot be wrapped in a transaction, they cause an automatic commit.

This may be something to document, instead of to fix (because we can't, really--at least not very easily), and it would only be noticeable on very large installations (which is the only reason I'm reducing it to normal).
Comment 2 Vitaly Fedrushkov 2009-11-26 05:23:28 PST
Perhaps we may set shutdownhtml to something relevant before non-transactional schema changes and clear it afterwards?

By 'set' I mean field adding code, not editparams.cgi
Comment 3 Max Kanat-Alexander 2009-11-26 05:40:39 PST
(In reply to comment #2)
> Perhaps we may set shutdownhtml to something relevant before non-transactional
> schema changes and clear it afterwards?
> 
> By 'set' I mean field adding code, not editparams.cgi

  No, because then all of Bugzilla would stop, and users would get logged out. :-)

  We could do a "SELECT 1 FOR UPDATE FROM bugs", though, maybe? No, that wouldn't work either, because we'd still have to be within a transaction, and the schema changes would commit the transaction.
Comment 4 Max Kanat-Alexander 2009-11-26 05:42:17 PST
  The architectural way to fix it would be to do run_create_validators on the parameters, in Field->create, and then see if it would be possible to just pass around the values to all the functions that need it, but I think that might be architecturally tough.
Comment 5 Frédéric Buclin 2009-11-26 09:04:52 PST
Note that you don't add custom fields every day, (not even every month!) and so the probability to fall into this race condition is really slim. Not a big deal, IMO (and this shouldn't require any heavy hack to work around this).
Comment 6 Max Kanat-Alexander 2009-11-26 15:42:10 PST
(In reply to comment #5)
> Note that you don't add custom fields every day, (not even every month!) and so
> the probability to fall into this race condition is really slim. Not a big
> deal, IMO (and this shouldn't require any heavy hack to work around this).

  That's basically my thought as well. If it's too architecturally hacky to work around this, then we might just want to document the fact and make sure that people add custom fields at low-use times.
Comment 7 Byron Jones ‹:glob› [PTO until 2017-01-09] 2009-11-26 16:34:09 PST
note this is largely a mysql limitation:

http://wiki.postgresql.org/wiki/Transactional_DDL_in_PostgreSQL:_A_Competitive_Analysis
Comment 8 Dave Miller [:justdave] (justdave@bugzilla.org) 2010-04-25 16:32:02 PDT
Looking at the code briefly, it seems like this would be architecturally clean to fix if we had a way to create a Bugzilla::Object in memory and then tell it to commit to the database at a later time.

Bugzilla::Field->create() could then call SUPER->create() with the flag for "create in memory only" set, and then tell it to commit to the database after it's done the rest of the field setup.
Comment 9 Dave Miller [:justdave] (justdave@bugzilla.org) 2010-04-25 16:36:26 PDT
Yeah, this should work, because the code in Bug.pm that's crashing is actually going to the fielddefs table in the database to get the list of custom fields, and the code doing the creates doesn't actually go back to the fielddefs table in the database for anything (it references the in-memory copy of the Field object everywhere) so as long as it hasn't been committed to the fielddefs table yet, it shouldn't crash anyone while it runs.
Comment 10 Dave Miller [:justdave] (justdave@bugzilla.org) 2010-04-25 17:12:51 PDT
Even easier, show_bug.cgi only loads *active* custom fields.  If it's initially created with the obsolete flag set, the crash is avoided.  Patch coming up in a moment.
Comment 11 Dave Miller [:justdave] (justdave@bugzilla.org) 2010-04-25 17:20:51 PDT
Created attachment 441409 [details] [diff] [review]
Patch v1

This works around the issue by stashing a copy of the obsolete value passed on field creation, initially creating it in the database with obsolete forced to 1, then restoring the correct obsolete value after the rest of the setup is complete.

This patch is made against 3.6 branch, but applies cleanly (with offset) to both 3.4 and trunk.

This has been successfully tested on bmo's bugzilla-stage-tip
Comment 12 Dave Miller [:justdave] (justdave@bugzilla.org) 2010-04-25 17:41:22 PDT
Created attachment 441411 [details] [diff] [review]
Patch v2

Updated patch per review comments from LpSolit on IRC.

> <@LpSolit> justdave: $fieldvalues->{'obsolete'} = $original_obsolete; is useless as you don't use $fieldvalues later

Should be $field->set_obsolete() instead.  The idea was to make the real value available in case of the of other setup routines made use of it.  Went about restoring it the wrong way.

> <@LpSolit> justdave: also, ->set_foo has no effect till you call ->update

Which should be where I originally had the set_obsolete() call.
Comment 13 Dave Miller [:justdave] (justdave@bugzilla.org) 2010-04-25 17:46:14 PDT
(In reply to comment #12)
> The idea was to make the real value available in case of the of other setup
> routines made use of it.

Er, "in case any of the other".  I believe none of them actually do at the moment, it was just an attempt to future-proof it.
Comment 14 Max Kanat-Alexander 2010-04-26 17:05:25 PDT
Comment on attachment 441411 [details] [diff] [review]
Patch v2

  This is a tremendously clever idea! :-)

  There are some other areas that pull obsolete fields (search, maybe?) but I think that either they won't be affected or they just aren't as important.

>=== modified file 'Bugzilla/Field.pm'
>--- Bugzilla/Field.pm	2010-04-07 01:01:41 +0000
>+++ Bugzilla/Field.pm	2010-04-26 00:38:42 +0000
>@@ -794,10 +794,23 @@
> 
> sub create {
>     my $class = shift;
>-    my $field = $class->SUPER::create(@_);
>+    my $fieldvalues = shift;

  Nit: We usually call this $params.

>+    my $original_obsolete;

  You actually don't need to store this--you can't set obsolete to 1 on field creation (and there wouldn't be any sensible reason to do so).

>+
>+    if ($fieldvalues->{'custom'}) {
>+        $original_obsolete = $fieldvalues->{'obsolete'};
>+        # If the field is active in the fields list before all of the data
>+        # structures are created, anything accessing Bug.pm  will crash.  So

  Add an explanation that on large databases, adding the field takes a long time, so this is in fact a worthwhile piece of code.

>+        # restore the obsolete value that got stashed earlier (in memory)
>+        $field->set_obsolete($original_obsolete);

  Instead of doing set_ and then update() later, just add a manual UPDATE statement using $dbh at the end, like we do for creation_ts in Bug::create. It's somewhat architecturally odd to call update() from within create(), and it will also probably confuse some extensions.
Comment 15 Frédéric Buclin 2010-04-26 17:10:34 PDT
(In reply to comment #14)
>   You actually don't need to store this--you can't set obsolete to 1 on field
> creation (and there wouldn't be any sensible reason to do so).

That's untrue. You can mark the custom field as obsolete on field creation. And there is a sensible reason to do so: when you need to populate the field values before enabling it to other users.
Comment 16 Max Kanat-Alexander 2010-04-26 17:14:07 PDT
Oh, okay. In that case, keep the stuff that stores the old obsolete value. :-)
Comment 17 Dave Miller [:justdave] (justdave@bugzilla.org) 2010-04-26 19:16:05 PDT
(In reply to comment #14)
>   Instead of doing set_ and then update() later, just add a manual UPDATE
> statement using $dbh at the end, like we do for creation_ts in Bug::create.
> It's somewhat architecturally odd to call update() from within create(), and it
> will also probably confuse some extensions.

Hmm, ok.  Is there something besides $field->set_obsolete() that it should be doing also, then?  Seems like that would theoretically have the same "re-entrant" problem as $field->update().
Comment 18 Max Kanat-Alexander 2010-04-26 20:11:27 PDT
See how Bugzilla::Bug::create() deals with creation_ts--we basically want to do the exact same thing here.
Comment 19 Dave Miller [:justdave] (justdave@bugzilla.org) 2011-05-05 16:09:55 PDT
This apparently got missed and didn't get committed.  Is there any other followup that needs to be done here or can this be committed as-is?  I'm guessing we only want 3.6 and forward at this point and forget about 3.4 (maybe forget about 3.6? It's not a security issue, but then again, it is a crash).

Some other tidbits about this learned from subsequent trial and error on bmo in recent days:

MySQL by default provides a 50-second automatic lock-break on InnoDB.  That is, if you read-for-update-lock a table, your lock will get broken 50 seconds after the next person asks for a read-for-update-lock on the same table.  This will happen on the bugs table any time someone submits a change to a bug.  This means if you create a custom field, and it takes 3 minutes to update the bugs table for the field, and 1 minute into your update, someone makes a change to a bug, the user making the bug change sits there for 50 seconds, then their change goes through and yours gets killed with a "lock wait timeout exceeded" error.

My *very* hacky workaround to make sure the field add completes on bmo was:
1) pull one webhead out of the pool
2) configure that webhead to filter POST requests to process_bug.cgi, post_bug.ci and attachment.cgi and send them to a downtime page instead of to the cgi (note only POST, not GET)
>        RewriteCond %{REQUEST_METHOD} =POST
>        RewriteRule ^/(attachment|process_bug|post_bug)\.cgi$ /data/www/bugzilla.mozilla.org/addfielddowntime.cgi
3) switch all traffic to that webhead on the load balancer
4) run the add field script
5) switch traffic back to the other webheads on the load balancer
6) restore the original config on the special webhead and add it back to the pool.
Comment 20 Dave Miller [:justdave] (justdave@bugzilla.org) 2011-05-05 16:14:21 PDT
Created attachment 530454 [details]
addfielddowntime.cgi (example)

Here's the addfielddowntime.cgi file referenced in the above apache config snippet, just as an example of the type of response a user who triggered it would get.
Comment 21 Max Kanat-Alexander 2011-05-07 22:57:08 PDT
Hey Dave. As far as I can see, this didn't get committed because the patch is r- and needs a new revision?
Comment 22 Dave Miller [:justdave] (justdave@bugzilla.org) 2011-05-07 23:01:55 PDT
Yeah, I didn't read closely when I made that comment, and thought I'd dropped the ball somewhere else.  A clean fix for this really needs to be a little more thorough than what I originally did here anyway.  Let me see what I can cook up here.
Comment 23 Dave Miller [:justdave] (justdave@bugzilla.org) 2012-08-17 08:40:29 PDT
glob/dkl: what's bmo doing about this currently?
Comment 24 David Lawrence [:dkl] 2012-08-17 09:04:47 PDT
We added code to Bugzilla::Field::create to update a param that disallows and bug updates during the time a custom field is being created. After, it sets the param to false and updates are allowed again.

http://bzr.mozilla.org/bmo/4.0/annotate/head:/Bugzilla/Field.pm#L980

There are alot of if (Bugzilla->params->{disabled_bug_updates}) conditions spread around the core code where updates are made.

Example:
http://bzr.mozilla.org/bmo/4.0/annotate/head:/process_bug.cgi#L99

dkl
Comment 25 Frédéric Buclin 2012-10-16 06:25:13 PDT
Created attachment 671816 [details] [diff] [review]
patch, v3

The original idea from justdave was the right way to go. This will fix the problem for most installations. Only very large installations will be affected by this problem. But for these large installations, maybe a shutdown makes sense. At least, I don't think we want something more complex than that for this issue.
Comment 26 Dave Miller [:justdave] (justdave@bugzilla.org) 2012-10-18 16:14:35 PDT
Comment on attachment 671816 [details] [diff] [review]
patch, v3

Review of attachment 671816 [details] [diff] [review]:
-----------------------------------------------------------------

Yeah, this looks good.  This'll still trip if it takes too long to run, but the number of sites big enough to hit that is small enough they can probably use bmo's patch for that.
Comment 27 Frédéric Buclin 2012-10-18 16:24:52 PDT
Committing to: bzr+ssh://lpsolit%40gmail.com@bzr.mozilla.org/bugzilla/trunk/
modified Bugzilla/Field.pm
Committed revision 8442.

Committing to: bzr+ssh://lpsolit%40gmail.com@bzr.mozilla.org/bugzilla/4.4/
modified Bugzilla/Field.pm
Committed revision 8429.

Committing to: bzr+ssh://lpsolit%40gmail.com@bzr.mozilla.org/bugzilla/4.2/
modified Bugzilla/Field.pm
Committed revision 8156.

Note You need to log in before you can comment on or make changes to this bug.