Last Comment Bug 819432 - Execute queries in two steps to improve performance
: Execute queries in two steps to improve performance
Status: RESOLVED FIXED
: perf
Product: Bugzilla
Classification: Server Software
Component: Query/Bug List (show other bugs)
: 4.5
: All All
: -- normal (vote)
: Bugzilla 4.4
Assigned To: Frédéric Buclin
: default-qa
:
Mentors:
http://dev.mysql.com/doc/refman/5.5/e...
: 179322 (view as bug list)
Depends on:
Blocks: 545795
  Show dependency treegraph
 
Reported: 2012-12-07 10:15 PST by Frédéric Buclin
Modified: 2013-06-27 07:17 PDT (History)
9 users (show)
LpSolit: approval+
LpSolit: approval4.4+
LpSolit: blocking4.4+
See Also:
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
patch, v1 (13.69 KB, patch)
2012-12-25 16:04 PST, Frédéric Buclin
no flags Details | Diff | Splinter Review
patch, v1.1 (16.87 KB, patch)
2012-12-27 17:11 PST, Frédéric Buclin
dkl: review+
Details | Diff | Splinter Review

Description Frédéric Buclin 2012-12-07 10:15:54 PST
Proof of concept:

Experience 1:
------------

In the Advanced Search page, unselect everything (in case something is selected by default), and in boolean charts, set the two following criteria, joined by AND:

 Comment is changed before 2014-12-31
 Comment is changed after 1980-01-01

The goal is to get all bugs! For the buglist, go to the Change Columns page and unselect everything. This way, only the bug ID will be displayed, nothing more.

Result: the SQL query only takes 0.52 second on my machine, with 1250 bugs.


Experience 2:
------------

Now type "ALL" in the QuickSearch box to get all bugs. When the list is displayed, append &columnlist=all to the URL to display all columns. Despite the large number of columns, the SQL query only takes 0.43 second, slightly faster than the query above.


Experience 3:
------------

Use again the same criteria as in experience 1, but in the Change Columns page, you only select the "Number of comments" column. Result: timeout! No results are returned and you have to manually kill the process in MySQL and PostgreSQL (yes, I tested on both).


Analyze:
-------

In this last experience, we could expect the execution time to be of the same order of magnitude as the first two experiences, but it doesn't! So we are definitely wasting time by doing everything in the same query. So we have two choices:

a) Understand how the DB optimizer works to better join our tables (good luck!), or
b) Split the query into two separate pieces: first get bug IDs only using the criteria set by the user, and then get column data by passing only the bug IDs we got in the first run. This should hugely improve things as the WHERE clause would simply get a hardcoded list of bug IDs.
Comment 1 Reed Loden [:reed] (use needinfo?) 2012-12-07 10:36:52 PST
When doing your calculations, remember to take into account the cost of two round-trip calls to the database. That can be overhead in itself, especially with databases that aren't localhost.
Comment 2 Frédéric Buclin 2012-12-07 10:41:47 PST
If I drop the "Number of comments" column and display all the other ones, the SQL query takes 117 seconds to execute. This is still much more than doing experiences 1 + 2 in two separate steps.
Comment 3 Frédéric Buclin 2012-12-07 10:44:23 PST
(In reply to Reed Loden [:reed] from comment #1)
> When doing your calculations, remember to take into account the cost of two
> round-trip calls to the database. That can be overhead in itself, especially
> with databases that aren't localhost.

I think that's minor compared to the time spent to execute the queries and compared to the number of SQL queries already executed to display a page like the buglist one (including the page header and footer).
Comment 4 Frédéric Buclin 2012-12-07 11:39:50 PST
(In reply to Frédéric Buclin from comment #2)
> If I drop the "Number of comments" column and display all the other ones,
> the SQL query takes 117 seconds to execute.

If I replace all INNER JOIN by STRAIGHT_JOIN (MySQL-specific thing), the time spent is now 7 seconds only! So the DB optimizer is very bad at guessing which table to consider first.
Comment 5 Frédéric Buclin 2012-12-07 12:21:48 PST
The slow part is to copy data to a temporary table:

SHOW PROFILE;

+----------------------+-----------+
| Status               | Duration  |
+----------------------+-----------+
| Copying to tmp table | 59.162165 |
| Sorting result       | 27.910422 |
| Sending data         | 21.927981 |
+----------------------+-----------+

For comparison, with STRAIGHT_JOIN, there is no temporary table, which is why things are way faster:

SHOW PROFILE;

+----------------------+----------+
| Status               | Duration |
+----------------------+----------+
| Sorting result       | 0.000008 |
| Sending data         | 5.261249 |
+----------------------+----------+
Comment 6 Byron Jones ‹:glob› [PTO until 2016-10-10] 2012-12-07 15:22:31 PST
frédéric, what are your thoughts on doing per-db optimisation when building the search query?

ie. in bugzilla::search doing:
  if (current db is mysql)
    use straight_join
  else
    use normal join

this breaks our db abstraction layer from a purist's perspective, however there's significant performance gains, and mysql is the most common db for bugzilla.
Comment 7 Frédéric Buclin 2012-12-07 15:27:14 PST
(In reply to Byron Jones ‹:glob› from comment #6)
> frédéric, what are your thoughts on doing per-db optimisation when building
> the search query?

We would need much more data than what I pasted in the few comments above to guarantee that it's always better to bypass the DB optimizer logic by forcing it to read DB tables in the order they appear in the FROM part of the query. Do not forget that we join tables pretty randomly.
Comment 8 Dave Miller [:justdave] (justdave@bugzilla.org) 2012-12-07 15:55:56 PST
Wonder if Sheeri has any insight on this...
Comment 9 Frédéric Buclin 2012-12-11 15:00:34 PST
I just did some more testing, and it's not systematic that STRAIGHT_JOIN improves performance. In some long queries, STRAIGHT_JOIN doesn't help at all. So I still think that splitting the monolithic SQL query into two separate queries is a better way to go.
Comment 10 Frédéric Buclin 2012-12-12 06:57:46 PST
Splitting queries into two distinct steps as described in comment 0 helps a lot. A query which currently takes 65 seconds on my local test installation with 1250 bugs in it now only takes 0.01 second to complete (cumulative time of both queries!!). I assume the perf win on larger DB will be large too as joining tables with many rows is what makes MySQL slow, per http://dev.mysql.com/doc/refman/5.5/en/subquery-restrictions.html

"Subquery optimization for IN is not as effective as for the = operator or for the IN(value_list) operator.

A typical case for poor IN subquery performance is when the subquery returns a small number of rows but the outer query returns a large number of rows to be compared to the subquery result.

The problem is that, for a statement that uses an IN subquery, the optimizer rewrites it as a correlated subquery.

[...]

If the inner and outer queries return M and N rows, respectively, the execution time becomes on the order of O(M×N), rather than O(M+N) as it would be for an uncorrelated subquery.

An implication is that an IN subquery can be much slower than a query written using an IN(value_list) operator that lists the same values that the subquery would return."

So this confirms what I saw in comment 0 and is a valid enough reason to implement it as two distinct queries.

The improvement is significant enough to take it for 4.4. This will involve a 4.4rc2 for more testing coverage, but it worths it (instead of waiting 9+ more months for 5.0).
Comment 11 Sheeri Cabral [:sheeri] 2012-12-12 11:12:00 PST
/me clicks the "like" button for Comment 10.
Comment 12 David Marshall 2012-12-17 17:53:52 PST
(In reply to Frédéric Buclin from comment #0)

> 
> a) Understand how the DB optimizer works to better join our tables (good
> luck!), or

This is extraordinarily difficult when we don't know which/how many tables will be included in the query. I spent a lot of 2007 investigating this part and got nowhere, except to decide that MySQL (4.0 FWIW) will stop investigating possible query plans and just pick one when the number of possible plans exceeds some number.

> b) Split the query into two separate pieces: first get bug IDs only using
> the criteria set by the user, and then get column data by passing only the
> bug IDs we got in the first run. This should hugely improve things as the
> WHERE clause would simply get a hardcoded list of bug IDs.

This is exactly what we do at Y! for some queries. I think the guts of Search.pm have changed so much that my description doesn't apply anymore, but when building a predicate from the charts, a table is either identifying or informational. We determine the smallest query possible to figure out which bugs are in the result and then do another query to get all the requested information.
Comment 13 Frédéric Buclin 2012-12-25 16:04:14 PST
Created attachment 695685 [details] [diff] [review]
patch, v1

Improvements depend on the queries being executed and on the columns being displayed, but the performance is always visible.

Here is an example of a significant win (on MySQL) using GCC Bugzilla. In both cases are all columns displayed in the buglist, and the DB used in MySQL and PostgreSQL is exactly the same (same custom fields, same data). The search criteria are:

Product: (is equal to any of the strings) Bugzilla, gcc
Commenter: (contains the string) lpsolit
Comment: (contains all of the strings) testing bug
Comment: (changed before) 2012-12-12
Comment: (changed after) 2001-01-01

The first result is for changed before/after being against independent comments, while the 2nd result is against the same comment. This makes a huge difference on MySQL, but is negligible on PostgreSQL:

Without my patch (MySQL): 59.9 seconds / 22.0 seconds
With my patch (MySQL):    20.8 seconds / 19.1 seconds

Without my patch (PostgreSQL): 7.5 seconds / 7.5 seconds
With my patch (PostgreSQL):    6.9 seconds / 6.3 seconds

This is exactly the same DB on the same (local) machine, but we can see that PostgreSQL 9.1.6 is much faster than MariaDB 5.5.25 (shame on MariaDB). The win with PostgreSQL is not impressive, but the win with MySQL is huge. Note that both installations are patched with patches from bug 824262 and bug 824361 to make things a bit faster.
Comment 14 Frédéric Buclin 2012-12-26 07:25:50 PST
(In reply to Frédéric Buclin from comment #0)
> Experience 3:
> ------------
> 
> Use again the same criteria as in experience 1, but in the Change Columns
> page, you only select the "Number of comments" column. Result: timeout! No
> results are returned and you have to manually kill the process in MySQL and
> PostgreSQL (yes, I tested on both).

For the record, with my patch applied, it now takes 0.83 second to return data, with *all* columns displayed, including "Number of comments". Much better than the timeout described above. :)
Comment 15 Frédéric Buclin 2012-12-27 17:11:18 PST
Created attachment 696177 [details] [diff] [review]
patch, v1.1

Same patch, but with POD included for new() and data().
Comment 16 David Lawrence [:dkl] 2013-01-16 08:46:53 PST
Comment on attachment 696177 [details] [diff] [review]
patch, v1.1

Review of attachment 696177 [details] [diff] [review]:
-----------------------------------------------------------------

Verified boost in performance and code looks good to me. r=dkl
Comment 17 Frédéric Buclin 2013-01-16 10:11:10 PST
Committing to: bzr+ssh://lpsolit%40gmail.com@bzr.mozilla.org/bugzilla/trunk/
modified buglist.cgi
modified collectstats.pl
modified report.cgi
modified whine.pl
modified Bugzilla/Search.pm
modified template/en/default/list/list.html.tmpl
modified template/en/default/reports/report.html.tmpl
modified xt/lib/Bugzilla/Test/Search/FieldTest.pm
Committed revision 8558.


Committing to: bzr+ssh://lpsolit%40gmail.com@bzr.mozilla.org/bugzilla/4.4/
modified buglist.cgi
modified collectstats.pl
modified report.cgi
modified whine.pl
modified Bugzilla/Search.pm
modified template/en/default/list/list.html.tmpl
modified template/en/default/reports/report.html.tmpl
modified xt/lib/Bugzilla/Test/Search/FieldTest.pm
Committed revision 8506.
Comment 18 Frédéric Buclin 2013-02-17 16:37:00 PST
Added to relnotes for 4.4rc2.
Comment 19 Denis Roy 2013-05-22 13:46:11 PDT
Nice work here.  We've also often found improvements when splitting complex queries, and definitely avoid subqueries.
Comment 20 Frédéric Buclin 2013-06-27 07:17:31 PDT
*** Bug 179322 has been marked as a duplicate of this bug. ***

Note You need to log in before you can comment on or make changes to this bug.