query analysis for bugzilla query checksum 281345723352719199

RESOLVED DUPLICATE of bug 825263

Status

RESOLVED DUPLICATE of bug 825263
6 years ago
4 years ago

People

(Reporter: scabral, Assigned: mreid)

Tracking

Details

(Whiteboard: change from subquery to join)

(Reporter)

Description

6 years ago
SELECT bug_id
        , UNIX_TIMESTAMP(a.creation_ts)*1000 AS modified_ts
        , login_name AS modified_by
        , UNIX_TIMESTAMP(a.creation_ts)*1000 AS created_ts
        , login_name AS created_by
        , ispatch AS 'attachments.ispatch'
        , isobsolete AS 'attachments.isobsolete'
        , isprivate AS 'attachments.isprivate'
        , attach_id
FROM
    attachments a
        JOIN profiles p ON a.submitter_id = p.userid
WHERE bug_id IN (
        SELECT bug_id FROM bugs WHERE 
    UNIX_TIMESTAMP(delta_ts)*1000 >  1346050769000
    AND true
        )
ORDER BY
    bug_id,
        attach_id,
        a.creation_ts
(Reporter)

Comment 1

6 years ago
Here's the EXPLAIN on MySQL 5.1 (current running bugzilla). It's not great, because of the dependent subquery on the bugs table and the resulting full table scan on the attachments table.

*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: a
         type: ALL
possible_keys: attachments_submitter_id_idx
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 679625
        Extra: Using where; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: p
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 3
          ref: bugs.a.submitter_id
         rows: 1
        Extra: 
*************************** 3. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: bugs
         type: unique_subquery
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 3
          ref: func
         rows: 1
        Extra: Using where
3 rows in set (0.00 sec)
(Reporter)

Comment 2

6 years ago
In MariaDB 5.5, however, the query optimization is much better - the same query's plan still has the full table scan on attachments, but only a primary key lookup on the bugs table, not a dependent subquery.

*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: a
         type: ALL
possible_keys: attachments_bug_id_idx,attachments_submitter_id_idx
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 689913
        Extra: Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: p
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 3
          ref: bugs.a.submitter_id
         rows: 1
        Extra: 
*************************** 3. row ***************************
           id: 1
  select_type: PRIMARY
        table: bugs
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 3
          ref: bugs.a.bug_id
         rows: 1
        Extra: Using where
(Reporter)

Comment 3

6 years ago
Running it on MySQL 5.1 3 times in a row, I'm getting results in 3.58, 3.71 and 3.61 seconds.

Running it on MariaDB 5.5 3 times in a row, I'm getting results in 3.34, 3.32 and 3.31 seconds.

So it's a bit faster on MariaDB 5.5, but not tons faster.


Here are the FROM/WHERE clauses:
FROM attachments a JOIN profiles p ON a.submitter_id = p.userid 
WHERE bug_id IN (SELECT bug_id FROM bugs WHERE UNIX_TIMESTAMP(delta_ts)*1000 >  1346050769000 AND true) ORDER BY bug_id, attach_id, a.creation_ts\G

And from the fingerprint:
from attachments a join profiles p on a.submitter_id = p.userid where bug_id in ( select bug_id from bugs where unix_timestamp(delta_ts)*? > ? and true ) order by bug_id, attach_id, a.creation_ts

From what I can see, this could easily be turned into a JOIN - there's no grouping or other filtering that would prevent this from being a JOIN like this:
FROM attachments a
 JOIN profiles p ON a.submitter_id = p.userid 
 JOIN bugs ON a.bug_id = bugs.bug_id
WHERE UNIX_TIMESTAMP(delta_ts)*1000 > 1346050769000 ORDER BY a.bug_id, attach_id, a.creation_ts\G


SELECT a.bug_id, UNIX_TIMESTAMP(a.creation_ts)*1000 AS modified_ts, login_name AS modified_by, UNIX_TIMESTAMP(a.creation_ts)*1000 AS created_ts, login_name AS created_by, ispatch AS 'attachments.ispatch', isobsolete AS 'attachments.isobsolete', isprivate AS 'attachments.isprivate', attach_id 
FROM attachments a
 JOIN profiles p ON a.submitter_id = p.userid 
 JOIN bugs ON a.bug_id = bugs.bug_id
WHERE UNIX_TIMESTAMP(delta_ts)*1000 > 1346050769000 ORDER BY a.bug_id, attach_id, a.creation_ts\G

This results in:
3.14, 3.18, 3.19 seconds on MySQL 5.1

3.30, 3.45, 3.31 seconds on MariaDB 5.5

So not much of a difference on MariaDB 5.5, but definitely a difference on MySQL 5.1.

Is there any way you can change this from a subquery to a JOIN? There's almost a half-second difference on the current running MySQL.
(Reporter)

Updated

6 years ago
Blocks: 732354
No longer blocks: 783545

Comment 4

6 years ago
I don't know where this code comes from, but it doesn't exist upstream. Must come from a bmo extension.
(Reporter)

Updated

6 years ago
Whiteboard: change from subquery to join
(Reporter)

Updated

6 years ago
Assignee: server-ops-database → scabral
That Unix time (1346050769000) is Mon, 27 Aug 2012 06:59:29 GMT. What date are the logs being analysed?

I wonder if this is a notification extension of some sort.

Gerv
(Reporter)

Comment 6

6 years ago
Nope, it's just that we were using the logs from then:

https://bugzilla.mozilla.org/show_bug.cgi?id=783545#c10
(In reply to Frédéric Buclin from comment #4)
> I don't know where this code comes from, but it doesn't exist upstream. Must
> come from a bmo extension.

this isn't generated by bugzilla/bmo code.
my bet would be this is metrics polling the db for changes.

(In reply to Gervase Markham [:gerv] from comment #5)
> That Unix time (1346050769000) is Mon, 27 Aug 2012 06:59:29 GMT. What date
> are the logs being analysed?

well spotted!  hopefully these are old logs, because that time span would cover a lot of bugs.
(In reply to Byron Jones ‹:glob› (away until 2nd Jan) from comment #7)
> well spotted!  hopefully these are old logs, because that time span would
> cover a lot of bugs.

d'oh somehow i missed sheeri's response to this in comment 6  :)
These are queries from Metrics.  The subquery was originally written with the idea of avoiding a large read on the big activities table and was carried over to all the others.  If a join is better across the board, we can definitely change the queries around.

Mark, can you please work with Sheeri on that?
Assignee: scabral → mreid
(Assignee)

Comment 10

6 years ago
Sure.  I've actually got a development version that uses JOINs instead of subqueries since it sped things up for local testing.

Sheeri, I got good results using something like this - can you confirm that it gives the desired speedup for you too?

SELECT a.bug_id, UNIX_TIMESTAMP(a.creation_ts)*1000 AS modified_ts, login_name AS modified_by, UNIX_TIMESTAMP(a.creation_ts)*1000 AS created_ts, login_name AS created_by, ispatch AS 'attachments.ispatch', isobsolete AS 'attachments.isobsolete', isprivate AS 'attachments.isprivate', attach_id FROM attachments a INNER JOIN (SELECT bug_id FROM bugs WHERE UNIX_TIMESTAMP(delta_ts)*1000 > 1346050769000) q ON q.bug_id = a.bug_id JOIN profiles p ON a.submitter_id = p.userid ORDER BY bug_id, attach_id, a.creation_ts
(Reporter)

Comment 11

6 years ago
You do need to specify which "bug_id" in the ORDER BY. The EXPLAIN for this is:

*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived2>
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 78827
        Extra: Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: a
         type: ref
possible_keys: attachments_bug_id_idx,attachments_submitter_id_idx
          key: attachments_bug_id_idx
      key_len: 3
          ref: q.bug_id
         rows: 1
        Extra: 
*************************** 3. row ***************************
           id: 1
  select_type: PRIMARY
        table: p
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 3
          ref: bugs.a.submitter_id
         rows: 1
        Extra: 
*************************** 4. row ***************************
           id: 2
  select_type: DERIVED
        table: bugs
         type: index
possible_keys: NULL
          key: bugs_delta_ts_idx
      key_len: 8
          ref: NULL
         rows: 840702
        Extra: Using where; Using index
4 rows in set (0.81 sec)


There's still a subquery there. Testing it out, it's:

MySQL 5.1 - 4.86, 4.63, 4.68 seconds
MariaDB 5.5 - 3.88, 3.83, 3.84 seconds

So the query I offered in comment 3 is still much faster, in both MySQL 5.1 and MariaDB 5.5.
(Assignee)

Comment 12

6 years ago
When I try to run the JOIN without a subquery on my dev machine, it doesn't seem to want to use an index.  Here's the explain I get:

mysql> explain SELECT a.bug_id, UNIX_TIMESTAMP(a.creation_ts)*1000 AS modified_ts,
 login_name AS modified_by, UNIX_TIMESTAMP(a.creation_ts)*1000 AS created_ts,
 login_name AS created_by, ispatch AS 'attachments.ispatch',
 isobsolete AS 'attachments.isobsolete', isprivate AS 'attachments.isprivate',
 attach_id  FROM attachments a  JOIN profiles p ON a.submitter_id = p.userid 
  JOIN bugs ON a.bug_id = bugs.bug_id WHERE UNIX_TIMESTAMP(delta_ts)*1000 > 1330450269000 ORDER BY a.bug_id, attach_id, a.creation_ts\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: a
         type: ALL
possible_keys: attachments_bug_id_idx,attachments_submitter_id_idx
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 582654
        Extra: Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: p
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 3
          ref: bugs.a.submitter_id
         rows: 1
        Extra: 
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: bugs
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 3
          ref: bugs.a.bug_id
         rows: 1
        Extra: Using where
3 rows in set (0.01 sec)

The query ends up taking >10 seconds to run, while the join-on-subquery version runs in about 1 second (using the same plan as you saw in comment 11).

Do you know how I can get it to choose the attachments_bug_id_idx index?
This is the same general problem as  https://bugzilla.mozilla.org/show_bug.cgi?id=825263, solution already implemented in new version
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 825263
Product: mozilla.org → Data & BI Services Team
You need to log in before you can comment on or make changes to this bug.