1183492 - Optimize the SQL query for get_enterable_products()

Assignee

Description

•

10 years ago

Attached patch optimize_products — Details — Splinter Review

The SQL used in get_enterable_products could be faster when dealing with a large number of products and components. Attached is a patch to optimize this query. Without the patch MySQL: 66ms (0.06681315) Postgresql: 55ms With the patch MySQL: 14ms (0.01404838) Postgresql: 2.5ms

Attachment #8633288 - Flags: review?(gerv)

Frédéric Buclin

Comment 1

•

10 years ago

Wouldn't an INNER JOIN be more efficient instead of using a correlated subquery? You would automatically enforce products.id = components.product_id and products.id = versions.product_id this way. The MySQL documentation seems to say that a join would be more efficient than a subquery.

Severity: normal → enhancement

Gervase Markham [:gerv]

Comment 2

•

10 years ago

Matt: Thanks for this! I'd happily check your patch in as-is, but are you interested in investigating LpSolit's suggestion? Gerv

Matt Tyson

Assignee

Comment 3

•

10 years ago

Do you mean something like this? SELECT DISTINCT p.id FROM products p JOIN components c ON (c.product_id = p.id) JOIN versions v ON (v.product_id = p.id) WHERE v.isactive = 1 AND c.isactive = 1 AND p.id IN (); I get 162ms for Postgres and 31ms for MySQL. Generally MySQL is better at joins than subqueries, although that doesn't appear to be the case here. In our use case we have >75,000 active components. The subquery method cuts down the search space significantly. The postgres query analyzer seems to get indicate that it's getting bogged down joining the products and components table.

Status: NEW → ASSIGNED

Frédéric Buclin

Comment 4

•

10 years ago

(In reply to Matt Tyson from comment #3) > Do you mean something like this? Yes, that's what I meant. I'm surprised your patch makes things better, because it's basically a clone of "INNER JOIN". And AFAIK, correlated subqueries force the inner query to be re-evaluated several times.

Matt Tyson

Assignee

Comment 5

•

10 years ago

(In reply to Frédéric Buclin from comment #4) > correlated subqueries force the inner query to be re-evaluated several times. Interesting, Perhaps someone else can benchmark this query on their database?

Gervase Markham [:gerv]

Comment 6

•

10 years ago

I don't want to let the best be the enemy of the good here. The patch seems at very worst harmless, and I have no reason to doubt Matt that it's an improvement on his workloads. LpSolit: let me know if you are actively pursuing even better options; if not, I'll r+ this version. Gerv

Frédéric Buclin

Comment 7

•

10 years ago

Comment on attachment 8633288 [details] [diff] [review] optimize_products >diff --git a/Bugzilla/User.pm b/Bugzilla/User.pm >index 8df1316..d6c1f12 100644 >--- a/Bugzilla/User.pm >+++ b/Bugzilla/User.pm >@@ -1431,10 +1431,10 @@ sub get_enterable_products { > WHERE ' . $dbh->sql_in('products.id', $enterable_ids) . > ' AND products.id IN (SELECT DISTINCT components.product_id > FROM components >- WHERE components.isactive = 1) >+ WHERE components.isactive = 1 AND products.id = components.product_id) If you replace AND products.id = components.product_id by AND $dbh->sql_in('components.product_id', $enterable_ids) (with the concatenation correctly written, of course), are things better? Maybe the SQL optimizer already does this for us, automatically.

Matt Tyson

Assignee

Comment 8

•

10 years ago

This seems to perform the same as the unmodified query SELECT DISTINCT products.id FROM products WHERE products.id IN () AND products.id IN (SELECT DISTINCT components.product_id FROM components WHERE components.isactive = 1 AND components.product_id IN () ) AND products.id IN (SELECT DISTINCT versions.product_id FROM versions WHERE versions.isactive = 1); Postgres: 55ms MySQL: 75ms

Frédéric Buclin

Comment 9

•

10 years ago

(In reply to Matt Tyson from comment #8) > This seems to perform the same as the unmodified query I meant to do this for both components and versions, not components only. :) But anyway, I did my own testing on MySQL (MariaDB 10.0.19), Pg 9.4.4 and Oracle 10.2. For MySQL and Pg, I used exactly the same DB thanks to contrib/bzdbcopy.pl (I cloned my MySQL DB into Pg). My DB has 2000 components and 2000 versions (only). For Oracle, I used a vanilla installation; the point is just to make sure your query works on Oracle. - Using INNER JOIN as proposed in comment 1 shows similar performance as your patch on MySQL, but is *much* worse on Pg. 10-20 times slower. So forget INNER JOIN. - Using IN () as proposed in comment 7 shows no perf win, neither on MySQL nor on Pg. So forget IN (). - Using your patch shows an interesting perf win on MySQL (~3 times faster), but only a moderate perf win on Pg (~25%). So my results for MySQL are compatible with what you found in comment 0, but are pretty different from what you found for Pg. Maybe that's due to https://rt.cpan.org/Public/Bug/Display.html?id=105492 which you reported upstream, as you have much more components than me. But anyway, your patch is an improvement in all cases. :) Now, could you test something for me? I found that get_enterable_products() is *much* slower for logged out users than for logged in users. This is due to the first query of the method, the one which collects $enterable_ids. The problem is not $self->groups_as_string because I cached the data before starting my timer (see below). Before running the command lines below, I added "return 1;" right after the first query is executed in Bugzilla::User::get_enterable_products(), so that the time given below is the time taken to execute the first query only: # perl -Mlib=lib -MTime::HiRes -MBugzilla -MBugzilla::User -wE 'my $user = Bugzilla::User->new(1); $user->groups_as_string; my $t0 = Time::HiRes::time(); my $groups = $user->get_enterable_products; my $t1 = Time::HiRes::time(); say $t1 - $t0' 0.000322818756103516 # perl -Mlib=lib -MTime::HiRes -MBugzilla -MBugzilla::User -wE 'my $user = Bugzilla::User->new(0); $user->groups_as_string; my $t0 = Time::HiRes::time(); my $groups = $user->get_enterable_products; my $t1 = Time::HiRes::time(); say $t1 - $t0' 0.0216419696807861 User ID 1 is an admin account. User ID 0 is a logged out user. As you can see, the time for the logged out user (21.6ms) is *much* slower than for the logged in account (0.3ms), 70 times slower! I can reproduce on MySQL, Pg and Oracle (despite Oracle has only one product, with only one component and only one version). So we must do something very wrong here.

Matt Tyson

Assignee

Comment 10

•

10 years ago

I can replicate your results and it looks pretty strange. I'm not sure what the problem is though. The actual generated SQL from both versions of the query run at the same speed when copy/pasted into psql. I'll have to do some profiling with NYTProf at a later date. I don't really have time right now, but I do have a longer term plan to do some more optimizing of the bugzilla codebase. I'd prefer to treat this as a separate bug. Feel free to assign to me and I'll look at it when I can.

Frédéric Buclin

Comment 11

•

10 years ago

Comment on attachment 8633288 [details] [diff] [review] optimize_products r=LpSolit

Attachment #8633288 - Flags: review?(gerv) → review+

Frédéric Buclin

Comment 12

•

10 years ago

(In reply to Matt Tyson from comment #10) > I can replicate your results and it looks pretty strange. I'm not sure what > the problem is though. I suspect this is due to us passing -1 as a group ID when no other group applies. If logged out, we should short-circuit this query and run one which only looks for (NOT) NULL. > I'd prefer to treat this as a separate bug. I'm fine with that.

Flags: approval?

Target Milestone: --- → Bugzilla 6.0

Frédéric Buclin

Updated

•

10 years ago

Summary: Optimize sql for get_enterable_products → Optimize the SQL query for get_enterable_products()

Simon Green

Updated

•

10 years ago

Flags: approval? → approval+

Frédéric Buclin

Comment 13

•

10 years ago

To ssh://gitolite3@git.mozilla.org/bugzilla/bugzilla.git bdc889f..61a971c master -> master

Status: ASSIGNED → RESOLVED

Closed: 10 years ago

Resolution: --- → FIXED

Frédéric Buclin

Comment 14

•

10 years ago

> I'd prefer to treat this as a separate bug. bug 1188451

Gervase Markham [:gerv]

Comment 15

•

10 years ago

Thanks for this, Matt :-) Gerv

Bugzilla

Optimize the SQL query for get_enterable_products()

Categories

(Bugzilla :: Database, enhancement)

Tracking

()

People

(Reporter: mtyson, Assigned: mtyson)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(1 file)

Description

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Comment 11

Comment 12

Updated

Updated

Comment 13

Comment 14

Comment 15

Attachment

General

Description

File Name

Content Type