Last Comment Bug 818007 - Searching by commenter is slow
: Searching by commenter is slow
Status: RESOLVED FIXED
: perf
Product: Bugzilla
Classification: Server Software
Component: Query/Bug List (show other bugs)
: 4.2.3
: All All
: -- major (vote)
: Bugzilla 4.2
Assigned To: Frédéric Buclin
: default-qa
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2012-12-04 04:57 PST by Mr Aardvark
Modified: 2013-04-28 14:00 PDT (History)
7 users (show)
LpSolit: approval+
LpSolit: approval4.4+
LpSolit: blocking4.4+
LpSolit: approval4.2+
LpSolit: blocking4.2.5+
See Also:
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
patch, v1 (836 bytes, patch)
2012-12-04 15:35 PST, Frédéric Buclin
dkl: review+
Details | Diff | Splinter Review

Description Mr Aardvark 2012-12-04 04:57:58 PST
User wants to search for all bugs they've commented on in a particular date range so they enter their email address in search by people, tick the commenter box and add a couple of custom search terms "Comment" "changed after" and "Comment" "changed before".

The resulting search takes quite a while and show up in the mysql slow-query log. Running the slow query directly gives a time of 7m26s. Mysqladmin processlist shows "Copying to tmp table."

I notice that there is a sub SELECT statement that returns a record containing bugid and isprivate for each time the user has commented. A lot of this is duplicate info.

Changing from SELECT to SELECT DISTINCT for the subquery (Bugzilla/Search.pm line 2266) dramatically reduces the amount of data returned and the query completes in less than 4 seconds.

The user says this query used to be much faster in 3.0.x but I've not yet created a test environment to compare the SQL queries.
Comment 1 Frédéric Buclin 2012-12-04 12:30:53 PST
I can confirm that this makes a huge difference on my local test installation.
Comment 2 David Lawrence [:dkl] 2012-12-04 14:07:59 PST
Although not a massive improvement as others have reported, it does improve things somewhat on my test install of BMO 4.2

with DISTINCT:
1.89s
1.94s
1.89s
----
1.90s average

without DISTINCT:
2.07s
2.09s
2.10s
-----
2.08s average
Comment 3 Frédéric Buclin 2012-12-04 15:29:40 PST
I ran the following search on a Bugzilla installation with only 1250 bugs:

In the "Search By People" section: a commenter contains lpsolit@....
In the "Custom Search" section: comment     changed before 2012-12-12
                              + comment     changed after  2003-05-19
                              + flag setter is equal to    lpsolit@...

With the patch applied, it returns 293 bugs in 3 seconds. Without the patch, I had to kill the query in MySQL directly because it didn't complete after 4 minutes.

This is significant enough to be backported.
Comment 4 Frédéric Buclin 2012-12-04 15:35:40 PST
Created attachment 688503 [details] [diff] [review]
patch, v1
Comment 5 David Lawrence [:dkl] 2012-12-04 19:00:47 PST
Comment on attachment 688503 [details] [diff] [review]
patch, v1

Review of attachment 688503 [details] [diff] [review]:
-----------------------------------------------------------------

r=dkl
Comment 6 Sheeri Cabral [:sheeri] 2012-12-05 08:59:49 PST
Fascinating! I tried this on our production data set with help from Frédéric:
 

[11:50:15] <sheeri> slow query took 1 min 11.48 sec the first time, I'm trying it again.
[11:50:19] <sheeri> then I'll try it with DISTINCT :D
[11:52:25] <sheeri> 1 min 4.79 seconds the 2nd time
[11:53:22] <sheeri> 27.02 seconds the 1st time using DISTINCT
[11:53:42] <sheeri> 2nd time using DISTINCT, 25.74 secods

Avg time for the first query (without distinct) is 68.135 seconds.
Avg time for the second query (with distinct) is 26.38 seconds.

That's a 60% improvement.

Mr Aardvark, I'm curious as to what made you try DISTINCT, since even the EXPLAIN looks like it would be worse with DISTINCT in there.
Comment 7 Brandon Johnson 2012-12-05 10:39:40 PST
It appears that the older query is faster because the MySQL Optimizer chose to put the bugs table first, and not one of the derived tables as in the slower query. If the Optimizer is allowed to choose either of the derived tables for the 1st table by cost, the query will take significantly longer (although the optimizer thinks that will be faster).

For MySQL Versions only, (and this makes me wonder if this occurs in Oracle or PGSQL installations) the solution is to structure the join order and tell the optimizer what to use instead of letting it decide for itself.

The way we can do this is by using SELECT STRAIGHT_JOIN ...and ensuring all the tables are in the optimal order in the "FROM" portion of the query. This may take a few tries and the use of EXPLAIN to identify proper order for mapping, but will significantly speed up the query.

Note that the indicator of DISTINCT speeding up a large result set query is a clear identifier that the Optimizer is choosing table order poorly.
Comment 8 Mr Aardvark 2012-12-05 13:01:09 PST
Sheeri,
I chose DISTINCT because it was one of the few spanners in my limited SQL toolkit :)

I ran the sub select on its own and it returned 6000+ rows with loads of duplicates. I'm think it returned bugid & isprivate for each time the user commented on the bug which seemed unnecessary. Adding the DISTINCT dropped it to ~1000 rows.

That probably made the difference between writing the tmp table to disk and having it in heap.
Comment 9 Frédéric Buclin 2012-12-07 05:09:57 PST
I tested this patch on PostgreSQL, and the perf win is also significant. Results are similar to what I got in comment 3.

Committing to: bzr+ssh://lpsolit%40gmail.com@bzr.mozilla.org/bugzilla/trunk/
modified Bugzilla/Search.pm
Committed revision 8511.

Committing to: bzr+ssh://lpsolit%40gmail.com@bzr.mozilla.org/bugzilla/4.4/
modified Bugzilla/Search.pm
Committed revision 8479.

Committing to: bzr+ssh://lpsolit%40gmail.com@bzr.mozilla.org/bugzilla/4.2/
modified Bugzilla/Search.pm
Committed revision 8176.
Comment 10 Frédéric Buclin 2013-02-17 16:36:16 PST
Added to relnotes for 4.4rc2.
Comment 11 fredly1988 2013-04-28 14:00:43 PDT
gogogogog why so slow...

Note You need to log in before you can comment on or make changes to this bug.