Last Comment Bug 286625 - ./collectstats.pl --regenerate is enormously slow
: ./collectstats.pl --regenerate is enormously slow
Status: RESOLVED FIXED
: perf
Product: Bugzilla
Classification: Server Software
Component: Query/Bug List (show other bugs)
: 2.19.2
: All All
: -- normal (vote)
: Bugzilla 3.4
Assigned To: Max Kanat-Alexander
: default-qa
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2005-03-17 13:28 PST by Shane H. W. Travis
Modified: 2010-10-18 14:36 PDT (History)
6 users (show)
LpSolit: approval+
LpSolit: approval3.4+
myk: blocking2.20-
justdave: blocking2.18.1-
See Also:
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
Code patch for tip (2.26 KB, patch)
2005-03-17 14:33 PST, Shane H. W. Travis
bugreport: review+
LpSolit: review-
mkanat: review-
Details | Diff | Splinter Review
Code patch for 2.18 (1.08 KB, patch)
2005-03-17 14:33 PST, Shane H. W. Travis
bugreport: review+
LpSolit: review-
Details | Diff | Splinter Review
additional patch to fix breakage for 2.18 (1.32 KB, patch)
2005-03-22 15:19 PST, Joel Peshkin
no flags Details | Diff | Splinter Review
additional patch to fox breakage of 2.19 (1.30 KB, patch)
2005-03-22 15:31 PST, Joel Peshkin
no flags Details | Diff | Splinter Review
Work In Progress (4.74 KB, patch)
2009-08-19 00:20 PDT, Max Kanat-Alexander
no flags Details | Diff | Splinter Review
v1 (7.18 KB, patch)
2009-08-19 03:52 PDT, Max Kanat-Alexander
LpSolit: review+
Details | Diff | Splinter Review

Description Shane H. W. Travis 2005-03-17 13:28:30 PST
Doing most anything with the bugs_activity table is interminably slow, because 
the existing indices don't cover the added/removed fields, which are what 
really need help the most.

For example: running 'collectstats.pl --regenerate' on my local database with 
these indices took under 5 hours; before putting the indices on, I let it run 
for 35 hours before I stopped it, and it did not yet look to be more than 
halfway done.

Patch to follow.
Comment 1 Shane H. W. Travis 2005-03-17 14:33:15 PST
Created attachment 177773 [details] [diff] [review]
Code patch for tip

Adds indices to existing and new tables; tip version
Comment 2 Shane H. W. Travis 2005-03-17 14:33:49 PST
Created attachment 177774 [details] [diff] [review]
Code patch for 2.18

Adds indices to existing and new tables; 2.18 version
Comment 3 Joel Peshkin 2005-03-17 19:24:48 PST
Comment on attachment 177773 [details] [diff] [review]
Code patch for tip

r=joel by inspection
Comment 4 Joel Peshkin 2005-03-17 19:25:25 PST
Comment on attachment 177774 [details] [diff] [review]
Code patch for 2.18

r=joel by inspection
Comment 5 Frédéric Buclin 2005-03-22 13:02:56 PST
travis, assuming you are on vacation this week, I'm commiting this patch myself. ;)


Tip:

Checking in checksetup.pl;
/cvsroot/mozilla/webtools/bugzilla/checksetup.pl,v  <--  checksetup.pl
new revision: 1.376; previous revision: 1.375
done
Checking in Bugzilla/DB/Schema.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Schema.pm,v  <--  Schema.pm
new revision: 1.15; previous revision: 1.14
done


2.18 branch:

Checking in checksetup.pl;
/cvsroot/mozilla/webtools/bugzilla/checksetup.pl,v  <--  checksetup.pl
new revision: 1.289.2.29; previous revision: 1.289.2.28
done
Comment 6 Joel Peshkin 2005-03-22 15:19:49 PST
Created attachment 178297 [details] [diff] [review]
additional patch to fix breakage for 2.18
Comment 7 Frédéric Buclin 2005-03-22 15:23:56 PST
DBD::mysql::db do failed: Duplicate key name 'bugs_activity_idfield' at
./checksetup.pl line 3873
Comment 8 Joel Peshkin 2005-03-22 15:31:48 PST
Created attachment 178299 [details] [diff] [review]
additional patch to fox breakage of 2.19
Comment 9 Frédéric Buclin 2005-03-22 15:53:33 PST
Both patche have been backed out. checksetup.pl fails with the following error
messages:

Tip:
Adding more indexes for bugs_activity table.
DBD::mysql::db do failed: Duplicate key name 'bugs_activity_idfield' at
./checksetup.pl line 3873


2.18:

Adding more indexes for bugs_activity tables.
DBD::mysql::db do failed: Duplicate key name 'bugs_activity_idfield' at
./checksetup.pl line 4531.
DBD::mysql::db do failed: Duplicate key name 'bugs_activity_added' at
./checksetup.pl line 4532.
DBD::mysql::db do failed: Duplicate key name 'bugs_activity_removed' at
./checksetup.pl line 4533.
DBD::mysql::db do failed: Duplicate key name 'bugs_activity_added_removed' at
./checksetup.pl line 4534.


Moreover, I observe a index name inconsistency:

+$dbh->do("ALTER TABLE bugs_activity ADD INDEX bugs_activity_idfield (bug_id,
fieldid)");
+            bugs_activity_bug_field_idx      => [qw(bug_id fieldid)],

Comment 10 Frédéric Buclin 2005-03-22 15:54:34 PST
Comment on attachment 177773 [details] [diff] [review]
Code patch for tip

per my previous comment
Comment 11 Max Kanat-Alexander 2005-03-22 16:15:40 PST
Comment on attachment 177773 [details] [diff] [review]
Code patch for tip

Also, could you explain why you're adding three more fulltext indexes?

Fulltext indexes will cause us problems in the future, because we can't have
transactional (InnoDB) tables with fulltext indexes in them.

I don't fully understand why we would be *searching* on added and removed, but
I'm sure there's some good reason?
Comment 12 Joel Peshkin 2005-03-22 16:34:44 PST
I can see why you would want an index on (fieldid, added) and (fieldid,
removed), but I cannot see why those would need to be fulltext.

Please try some benchmarks using (fieldid, added) and (fieldid, removed) or,
possibly, (fieldid, added, bugid)(fieldid, removed, bugid) 
or (fieldid, leftmost 10 chars of added), etc....

Comment 13 Max Kanat-Alexander 2005-03-22 23:14:47 PST
You know, looking over the general Bugzilla code, and the regenerate option of
collectstats, it looks like we just need the following multi-column index to
handle the regenerate stuff:

bugs_activity(bug_id, bug_when, fieldid)

We could then also drop the bug_id index, or just replace it with that one.

An examination of Bugzilla code shows me that we should get performance gains
elsewhere in Bugzilla with that multi-column index.

I don't think that anything else is actually necessary for 2.18.1, at the least.

Also, the indexes in the patch on this bug don't include _idx on the end of
their name, which they should.
Comment 14 Max Kanat-Alexander 2005-03-22 23:21:44 PST
Oh, actually, actual experimentation with EXPLAIN shows that the index created
in this bug is actually correct, not the index that I just stated. So the idea
index is:

bugs_activity(bug_id, fieldid)

And we can eliminate the bug_id index, because we can always use that one instead.

But that's the only index needed to fix the collectstats problem.
Comment 15 Myk Melez [:myk] [@mykmelez] 2005-04-21 15:11:47 PDT
"If it's not a regression from 2.18 and it's not a critical problem with
something that's already landed, let's push it off." - Dave
Comment 16 Frédéric Buclin 2005-09-13 06:13:57 PDT
Not a security bug, retargetting to 2.20.
Comment 17 Frédéric Buclin 2005-11-17 09:03:48 PST
travis has gone, and nothing is broken here, just slow => retargetting
Comment 18 Frédéric Buclin 2006-10-17 13:09:50 PDT
We are freezing the code for 3.0 in two weeks and we don't expect this bug to be fixed on time.
Comment 19 Max Kanat-Alexander 2009-08-19 00:20:44 PDT
Created attachment 395272 [details] [diff] [review]
Work In Progress

This is a work in progress, including some debugging code.

As usual, what is ACTUALLY slow is not what you might think.
Comment 20 Max Kanat-Alexander 2009-08-19 03:52:20 PDT
Created attachment 395288 [details] [diff] [review]
v1

This makes --regenerate take minutes instead of hours or days.
Comment 21 Max Kanat-Alexander 2009-08-19 03:55:30 PDT
This patch also fixes a problem where --regenerate wasn't correctly counting the empty resolution. (I believe normal collectstats.pl also has this bug, and we can address it separately somewhere.)
Comment 22 Frédéric Buclin 2009-08-31 04:23:33 PDT
Comment on attachment 395288 [details] [diff] [review]
v1

Nice catch about the empty resolution. Old data is indeed wrong as the number of closed bugs doesn't match the number of bugs having a resolution. Also, the perf improvement is huge. IMO, both are valid reasons to take this patch for 3.4. r=LpSolit
Comment 23 Frédéric Buclin 2009-08-31 04:25:28 PDT
Let's take it on the 3.4 branch as it fixes a bug. And the huge perf problem can also be seen as a bug.
Comment 24 Max Kanat-Alexander 2009-08-31 14:17:49 PDT
tip:

Checking in collectstats.pl;
/cvsroot/mozilla/webtools/bugzilla/collectstats.pl,v  <--  collectstats.pl
new revision: 1.70; previous revision: 1.69
done

3.4:

Checking in collectstats.pl;
/cvsroot/mozilla/webtools/bugzilla/collectstats.pl,v  <--  collectstats.pl
new revision: 1.68.2.2; previous revision: 1.68.2.1
done
Comment 25 Max Kanat-Alexander 2010-10-18 14:36:31 PDT
Added to the release notes in bug 604256.

Note You need to log in before you can comment on or make changes to this bug.