Closed Bug 824987 Opened 12 years ago Closed 12 years ago

query analysis for bugzilla query checksum 281345723352719199

Categories

(Data & BI Services Team :: DB: MySQL, task)

x86
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 825263

People

(Reporter: scabral, Assigned: mreid)

References

Details

(Whiteboard: change from subquery to join)

SELECT bug_id , UNIX_TIMESTAMP(a.creation_ts)*1000 AS modified_ts , login_name AS modified_by , UNIX_TIMESTAMP(a.creation_ts)*1000 AS created_ts , login_name AS created_by , ispatch AS 'attachments.ispatch' , isobsolete AS 'attachments.isobsolete' , isprivate AS 'attachments.isprivate' , attach_id FROM attachments a JOIN profiles p ON a.submitter_id = p.userid WHERE bug_id IN ( SELECT bug_id FROM bugs WHERE UNIX_TIMESTAMP(delta_ts)*1000 > 1346050769000 AND true ) ORDER BY bug_id, attach_id, a.creation_ts
Here's the EXPLAIN on MySQL 5.1 (current running bugzilla). It's not great, because of the dependent subquery on the bugs table and the resulting full table scan on the attachments table. *************************** 1. row *************************** id: 1 select_type: PRIMARY table: a type: ALL possible_keys: attachments_submitter_id_idx key: NULL key_len: NULL ref: NULL rows: 679625 Extra: Using where; Using filesort *************************** 2. row *************************** id: 1 select_type: PRIMARY table: p type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 3 ref: bugs.a.submitter_id rows: 1 Extra: *************************** 3. row *************************** id: 2 select_type: DEPENDENT SUBQUERY table: bugs type: unique_subquery possible_keys: PRIMARY key: PRIMARY key_len: 3 ref: func rows: 1 Extra: Using where 3 rows in set (0.00 sec)
In MariaDB 5.5, however, the query optimization is much better - the same query's plan still has the full table scan on attachments, but only a primary key lookup on the bugs table, not a dependent subquery. *************************** 1. row *************************** id: 1 select_type: PRIMARY table: a type: ALL possible_keys: attachments_bug_id_idx,attachments_submitter_id_idx key: NULL key_len: NULL ref: NULL rows: 689913 Extra: Using filesort *************************** 2. row *************************** id: 1 select_type: PRIMARY table: p type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 3 ref: bugs.a.submitter_id rows: 1 Extra: *************************** 3. row *************************** id: 1 select_type: PRIMARY table: bugs type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 3 ref: bugs.a.bug_id rows: 1 Extra: Using where
Running it on MySQL 5.1 3 times in a row, I'm getting results in 3.58, 3.71 and 3.61 seconds. Running it on MariaDB 5.5 3 times in a row, I'm getting results in 3.34, 3.32 and 3.31 seconds. So it's a bit faster on MariaDB 5.5, but not tons faster. Here are the FROM/WHERE clauses: FROM attachments a JOIN profiles p ON a.submitter_id = p.userid WHERE bug_id IN (SELECT bug_id FROM bugs WHERE UNIX_TIMESTAMP(delta_ts)*1000 > 1346050769000 AND true) ORDER BY bug_id, attach_id, a.creation_ts\G And from the fingerprint: from attachments a join profiles p on a.submitter_id = p.userid where bug_id in ( select bug_id from bugs where unix_timestamp(delta_ts)*? > ? and true ) order by bug_id, attach_id, a.creation_ts From what I can see, this could easily be turned into a JOIN - there's no grouping or other filtering that would prevent this from being a JOIN like this: FROM attachments a JOIN profiles p ON a.submitter_id = p.userid JOIN bugs ON a.bug_id = bugs.bug_id WHERE UNIX_TIMESTAMP(delta_ts)*1000 > 1346050769000 ORDER BY a.bug_id, attach_id, a.creation_ts\G SELECT a.bug_id, UNIX_TIMESTAMP(a.creation_ts)*1000 AS modified_ts, login_name AS modified_by, UNIX_TIMESTAMP(a.creation_ts)*1000 AS created_ts, login_name AS created_by, ispatch AS 'attachments.ispatch', isobsolete AS 'attachments.isobsolete', isprivate AS 'attachments.isprivate', attach_id FROM attachments a JOIN profiles p ON a.submitter_id = p.userid JOIN bugs ON a.bug_id = bugs.bug_id WHERE UNIX_TIMESTAMP(delta_ts)*1000 > 1346050769000 ORDER BY a.bug_id, attach_id, a.creation_ts\G This results in: 3.14, 3.18, 3.19 seconds on MySQL 5.1 3.30, 3.45, 3.31 seconds on MariaDB 5.5 So not much of a difference on MariaDB 5.5, but definitely a difference on MySQL 5.1. Is there any way you can change this from a subquery to a JOIN? There's almost a half-second difference on the current running MySQL.
Blocks: 732354
No longer blocks: 783545
I don't know where this code comes from, but it doesn't exist upstream. Must come from a bmo extension.
Whiteboard: change from subquery to join
Assignee: server-ops-database → scabral
That Unix time (1346050769000) is Mon, 27 Aug 2012 06:59:29 GMT. What date are the logs being analysed? I wonder if this is a notification extension of some sort. Gerv
Nope, it's just that we were using the logs from then: https://bugzilla.mozilla.org/show_bug.cgi?id=783545#c10
(In reply to Frédéric Buclin from comment #4) > I don't know where this code comes from, but it doesn't exist upstream. Must > come from a bmo extension. this isn't generated by bugzilla/bmo code. my bet would be this is metrics polling the db for changes. (In reply to Gervase Markham [:gerv] from comment #5) > That Unix time (1346050769000) is Mon, 27 Aug 2012 06:59:29 GMT. What date > are the logs being analysed? well spotted! hopefully these are old logs, because that time span would cover a lot of bugs.
(In reply to Byron Jones ‹:glob› (away until 2nd Jan) from comment #7) > well spotted! hopefully these are old logs, because that time span would > cover a lot of bugs. d'oh somehow i missed sheeri's response to this in comment 6 :)
These are queries from Metrics. The subquery was originally written with the idea of avoiding a large read on the big activities table and was carried over to all the others. If a join is better across the board, we can definitely change the queries around. Mark, can you please work with Sheeri on that?
Assignee: scabral → mreid
Sure. I've actually got a development version that uses JOINs instead of subqueries since it sped things up for local testing. Sheeri, I got good results using something like this - can you confirm that it gives the desired speedup for you too? SELECT a.bug_id, UNIX_TIMESTAMP(a.creation_ts)*1000 AS modified_ts, login_name AS modified_by, UNIX_TIMESTAMP(a.creation_ts)*1000 AS created_ts, login_name AS created_by, ispatch AS 'attachments.ispatch', isobsolete AS 'attachments.isobsolete', isprivate AS 'attachments.isprivate', attach_id FROM attachments a INNER JOIN (SELECT bug_id FROM bugs WHERE UNIX_TIMESTAMP(delta_ts)*1000 > 1346050769000) q ON q.bug_id = a.bug_id JOIN profiles p ON a.submitter_id = p.userid ORDER BY bug_id, attach_id, a.creation_ts
You do need to specify which "bug_id" in the ORDER BY. The EXPLAIN for this is: *************************** 1. row *************************** id: 1 select_type: PRIMARY table: <derived2> type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 78827 Extra: Using temporary; Using filesort *************************** 2. row *************************** id: 1 select_type: PRIMARY table: a type: ref possible_keys: attachments_bug_id_idx,attachments_submitter_id_idx key: attachments_bug_id_idx key_len: 3 ref: q.bug_id rows: 1 Extra: *************************** 3. row *************************** id: 1 select_type: PRIMARY table: p type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 3 ref: bugs.a.submitter_id rows: 1 Extra: *************************** 4. row *************************** id: 2 select_type: DERIVED table: bugs type: index possible_keys: NULL key: bugs_delta_ts_idx key_len: 8 ref: NULL rows: 840702 Extra: Using where; Using index 4 rows in set (0.81 sec) There's still a subquery there. Testing it out, it's: MySQL 5.1 - 4.86, 4.63, 4.68 seconds MariaDB 5.5 - 3.88, 3.83, 3.84 seconds So the query I offered in comment 3 is still much faster, in both MySQL 5.1 and MariaDB 5.5.
When I try to run the JOIN without a subquery on my dev machine, it doesn't seem to want to use an index. Here's the explain I get: mysql> explain SELECT a.bug_id, UNIX_TIMESTAMP(a.creation_ts)*1000 AS modified_ts, login_name AS modified_by, UNIX_TIMESTAMP(a.creation_ts)*1000 AS created_ts, login_name AS created_by, ispatch AS 'attachments.ispatch', isobsolete AS 'attachments.isobsolete', isprivate AS 'attachments.isprivate', attach_id FROM attachments a JOIN profiles p ON a.submitter_id = p.userid JOIN bugs ON a.bug_id = bugs.bug_id WHERE UNIX_TIMESTAMP(delta_ts)*1000 > 1330450269000 ORDER BY a.bug_id, attach_id, a.creation_ts\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: a type: ALL possible_keys: attachments_bug_id_idx,attachments_submitter_id_idx key: NULL key_len: NULL ref: NULL rows: 582654 Extra: Using filesort *************************** 2. row *************************** id: 1 select_type: SIMPLE table: p type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 3 ref: bugs.a.submitter_id rows: 1 Extra: *************************** 3. row *************************** id: 1 select_type: SIMPLE table: bugs type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 3 ref: bugs.a.bug_id rows: 1 Extra: Using where 3 rows in set (0.01 sec) The query ends up taking >10 seconds to run, while the join-on-subquery version runs in about 1 second (using the same plan as you saw in comment 11). Do you know how I can get it to choose the attachments_bug_id_idx index?
This is the same general problem as https://bugzilla.mozilla.org/show_bug.cgi?id=825263, solution already implemented in new version
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → DUPLICATE
Product: mozilla.org → Data & BI Services Team
You need to log in before you can comment on or make changes to this bug.