Bug 577754 - Updating bugs_fulltext for a 3.6 upgrade is really, really slow
Status: RESOLVED FIXED
Product: Bugzilla
Classification: Server Software
Component: Installation & Upgrading
Version: 3.6
Hardware: All All
Importance: -- normal
Target Milestone: Bugzilla 3.6
Assigned To: Max Kanat-Alexander
QA Contact: default-qa
Mentors:
Depends on:
Blocks:
Reported: 2010-07-09 14:09 PDT by Max Kanat-Alexander
Modified: 2010-07-10 12:06 PDT (History)
4 users
mkanat: approval+
mkanat: approval4.0+
mkanat: approval3.6+
See Also:
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
v1 (1.57 KB, patch)
2010-07-09 18:29 PDT, Max Kanat-Alexander
mkanat: review+

Description Max Kanat-Alexander 2010-07-09 14:09:04 PDT
For some reason, it's WAY faster to truncate the entire bugs_fulltext table and repopulate it than to do a DELETE/INSERT of just the affected rows when we update the attachment comments. I'm going to figure out why and put a fix into 3.6 and above.
Comment 1 Max Kanat-Alexander 2010-07-09 14:44:26 PDT
BTW, if anybody wants *really* fast bugs_fulltext updating, then around the bugs_fulltext stuff in Install::DB, put DISABLE KEYS and ENABLE KEYS. Then do a myisamchk -q -p afterward, which is super fast. (Of course, you have to shut down MySQL to do it.)
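A minimal sketch of what the MyISAM trick above would look like; the exact placement inside Install::DB is an assumption, and note that DISABLE KEYS only defers non-unique index maintenance:

```sql
-- Sketch only (MyISAM tables): defer index maintenance during the bulk load.
ALTER TABLE bugs_fulltext DISABLE KEYS;
-- ... run the bugs_fulltext population from Install::DB here ...
ALTER TABLE bugs_fulltext ENABLE KEYS;  -- rebuild the non-unique indexes in bulk
```

The myisamchk -q -p pass mentioned above then runs against the table files directly, which is why the server has to be shut down first.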
Comment 2 Max Kanat-Alexander 2010-07-09 14:47:03 PDT
  Oh, although that last recommendation may not actually work--I'm not entirely sure. I'm having some troubles on my local bmo clone that are unrelated to this bug.
Comment 3 Michael Thomas (Mockodin) 2010-07-09 16:19:52 PDT
Under mssql, adding two indexes helps as well. They may be useful for other databases too:

For the first index: longdescs could easily have millions of rows when you have a few hundred thousand bugs, and the schema defines no index on isprivate at all, resulting in a potentially massive scan.

CREATE UNIQUE INDEX custom_longdescs1_idx ON longdescs
(
	[isprivate] ASC,
	[comment_id] ASC,
	[bug_id] ASC
)

CREATE INDEX custom_longdescs2_idx ON longdescs
(
	[bug_id] ASC
)

Note that there is an existing index on bug_id in the schema, but it also includes bug_when; at least on mssql, the optimizer prefers the index with only bug_id during population.

Granted, this doesn't help during insert; comment 1 points out the best solution for performance while inserting. Though under mssql you could just disable the ftindex and re-enable it afterward, which would result in a full repopulation of the ftindex.
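A hedged T-SQL sketch of the mssql variant described above, assuming the full-text index lives on bugs_fulltext:

```sql
-- Sketch: stop full-text index maintenance during the bulk population.
ALTER FULLTEXT INDEX ON bugs_fulltext DISABLE;
-- ... bulk DELETE/INSERT of the bugs_fulltext rows here ...
ALTER FULLTEXT INDEX ON bugs_fulltext ENABLE;  -- leads to a full repopulation
```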
Comment 4 Michael Thomas (Mockodin) 2010-07-09 16:24:17 PDT
For example, an instance with 200k bugs took 5.5 minutes to populate, vs. about 20-30 minutes before.
Comment 5 Michael Thomas (Mockodin) 2010-07-09 16:34:22 PDT
(In reply to comment #3)
> Granted this doesn't help during insert

To correct myself: it does help on the select side of the query; however, combining it with the method in comment 1 will speed it up that much more.
Comment 6 Max Kanat-Alexander 2010-07-09 17:05:02 PDT
  It's true that there's no isprivate index. Good point. It's not the primary time-consumer under MySQL (the primary time-consumer is rebuilding the fulltext index), but it's still something that could possibly help--though probably not under MySQL, given the restrictions on index usage.
Comment 7 Max Kanat-Alexander 2010-07-09 18:29:20 PDT
Created attachment 456654 [details] [diff] [review]
v1

Okay, this makes the updates *considerably* faster, in my testing. It should be even better than the TRUNCATE TABLE hack that justdave and I worked out, because it doesn't have to copy as much data.

I tried adding an index on isprivate, and it didn't make any difference in my tests. MySQL never used it, and there was no benchmark difference. (However, it did take quite a long time to add the index to the longdescs table.)
Comment 8 Max Kanat-Alexander 2010-07-09 18:33:37 PDT
BTW, for those curious, the other methods I tested were:

* TRUNCATE TABLE before calling _populate_bugs_fulltext() and always populating the whole table.
* Wrapping _populate_bugs_fulltext in DISABLE KEYS and ENABLE KEYS.

Both of those solutions were equally fast, and faster than what's currently in Install::DB. (Not too surprising, when you think about the fact that the time is being taken up by building the indexes, which actually happens the same way with both those methods.)

However, it's even faster to just do the REPLACE INTO, because that makes everything into a single statement, and copies over less data. So it's much like doing a new INSERT in terms of index-building performance, but with half the data (in bmo's case).

I still have to test this against a full upgrade, locally, but I've tested it against subsets of the longdescs data from bmo and it seems to have worked just fine.
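A hedged sketch of what a REPLACE INTO rebuild like the one described above could look like; the column list and the SELECT are illustrative assumptions, not the actual patch (see attachment 456654 for the real change):

```sql
-- Sketch: rebuild the comment text for the affected bugs in one statement.
-- REPLACE INTO is MySQL-specific: it deletes any existing row with the same
-- unique key and inserts the new one, collapsing the DELETE/INSERT pair.
REPLACE INTO bugs_fulltext (bug_id, comments)
     SELECT bug_id, GROUP_CONCAT(thetext SEPARATOR '\n')
       FROM longdescs
      GROUP BY bug_id;
```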
Comment 9 Max Kanat-Alexander 2010-07-09 22:14:38 PDT
  Okay, I checked and verified that this works and helps quite a bit. I replaced 50,000 rows in the table in 1 minute and 30 seconds on my local machine.

  One caution, though: you need to make sure that your innodb_buffer_pool is large enough, or you're in for a VERY slow upgrade. For example, when I tried to add 200,000 rows, it was WAY slower. I watched, and the InnoDB stats plummeted when it ran out of innodb_buffer_pool space for the data it was reading. (It went from 10,000+ reads per second to, like, 40.)
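A quick way to check the setting mentioned above before a large upgrade; the 1G figure below is an illustrative value, not a recommendation from this bug:

```sql
-- Value is reported in bytes; compare it against the size of the
-- bugs_fulltext and longdescs data being copied.
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
-- To raise it, set something like this in my.cnf and restart mysqld:
--   [mysqld]
--   innodb_buffer_pool_size = 1G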
Comment 10 Max Kanat-Alexander 2010-07-09 22:20:49 PDT
Committing to: bzr+ssh://bzr.mozilla.org/bugzilla/trunk/
modified Bugzilla/Install/DB.pm
Committed revision 7323.

Committing to: bzr+ssh://bzr.mozilla.org/bugzilla/4.0/
modified Bugzilla/Install/DB.pm
Committed revision 7304.

Committing to: bzr+ssh://bzr.mozilla.org/bugzilla/3.6/
modified Bugzilla/Install/DB.pm
Committed revision 7129.
Comment 11 Dave Miller [:justdave] (justdave@bugzilla.org) 2010-07-10 12:06:16 PDT
Remind me when I get back from traveling and I can make a test run of this on the bugzilla staging server, too.
