User wants to search for all bugs they've commented on in a particular date range so they enter their email address in search by people, tick the commenter box and add a couple of custom search terms "Comment" "changed after" and "Comment" "changed before". The resulting search takes quite a while and show up in the mysql slow-query log. Running the slow query directly gives a time of 7m26s. Mysqladmin processlist shows "Copying to tmp table." I notice that there is a sub SELECT statement that returns a record containing bugid and isprivate for each time the user has commented. A lot of this is duplicate info. Changing from SELECT to SELECT DISTINCT for the subquery (Bugzilla/Search.pm line 2266) dramatically reduces the amount of data returned and the query completes in less than 4 seconds. The user says this query used to be much faster in 3.0.x but I've not yet created a test environment to compare the SQL queries.
I can confirm that this makes a huge difference on my local test installation.
Although not a massive improvement as others have reported, it does improve things somewhat on my test install of BMO 4.2 with DISTINCT: 1.89s 1.94s 1.89s ---- 1.90s average without DISTINCT: 2.07s 2.09s 2.10s ----- 2.08s average
I ran the following search on a Bugzilla installation with only 1250 bugs: In the "Search By People" section: a commenter contains lpsolit@.... In the "Custom Search" section: comment changed before 2012-12-12 + comment changed after 2003-05-19 + flag setter is equal to lpsolit@... With the patch applied, it returns 293 bugs in 3 seconds. Without the patch, I had to kill the query in MySQL directly because it didn't complete after 4 minutes. This is significant enough to be backported.
Created attachment 688503 [details] [diff] [review] patch, v1
Comment on attachment 688503 [details] [diff] [review] patch, v1 Review of attachment 688503 [details] [diff] [review]: ----------------------------------------------------------------- r=dkl
Fascinating! I tried this on our production data set with help from Frédéric: [11:50:15] <sheeri> slow query took 1 min 11.48 sec the first time, I'm trying it again. [11:50:19] <sheeri> then I'll try it with DISTINCT :D [11:52:25] <sheeri> 1 min 4.79 seconds the 2nd time [11:53:22] <sheeri> 27.02 seconds the 1st time using DISTINCT [11:53:42] <sheeri> 2nd time using DISTINCT, 25.74 secods Avg time for the first query (without distinct) is 68.135 seconds. Avg time for the second query (with distinct) is 26.38 seconds. That's a 60% improvement. Mr Aardvark, I'm curious as to what made you try DISTINCT, since even the EXPLAIN looks like it would be worse with DISTINCT in there.
It appears that the older query is faster because the MySQL Optimizer chose to put the bugs table first, and not one of the derived tables as in the slower query. If the Optimizer is allowed to choose either of the derived tables for the 1st table by cost, the query will take significantly longer (although the optimizer thinks that will be faster). For MySQL Versions only, (and this makes me wonder if this occurs in Oracle or PGSQL installations) the solution is to structure the join order and tell the optimizer what to use instead of letting it decide for itself. The way we can do this is by using SELECT STRAIGHT_JOIN ...and ensuring all the tables are in the optimal order in the "FROM" portion of the query. This may take a few tries and the use of EXPLAIN to identify proper order for mapping, but will significantly speed up the query. Note that the indicator of DISTINCT speeding up a large result set query is a clear identifier that the Optimizer is choosing table order poorly.
Sheeri, I chose DISTINCT because it was one of the few spanners in my limited SQL toolkit :) I ran the sub select on its own and it returned 6000+ rows with loads of duplicates. I'm think it returned bugid & isprivate for each time the user commented on the bug which seemed unnecessary. Adding the DISTINCT dropped it to ~1000 rows. That probably made the difference between writing the tmp table to disk and having it in heap.
I tested this patch on PostgreSQL, and the perf win is also significant. Results are similar to what I got in comment 3. Committing to: bzr+ssh://email@example.com/bugzilla/trunk/ modified Bugzilla/Search.pm Committed revision 8511. Committing to: bzr+ssh://firstname.lastname@example.org/bugzilla/4.4/ modified Bugzilla/Search.pm Committed revision 8479. Committing to: bzr+ssh://email@example.com/bugzilla/4.2/ modified Bugzilla/Search.pm Committed revision 8176.
Added to relnotes for 4.4rc2.
gogogogog why so slow...