Closed Bug 1446418 Opened 6 years ago Closed 4 years ago

slow mysql query if `repository` table gets used, timeout

Categories

(Tree Management :: Treeherder, defect, P3)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: aryx, Unassigned)

Details

Running the following queries on sql.telemetry.mozilla.org has significant differences for the run time and these occur if one more table ("repository") gets joined because it has rather few rows and there is only match per row.

"Fast" (35-50s): https://sql.telemetry.mozilla.org/queries/52051/source
SELECT job.start_time, job.id, job.repository_id, text_log_error.line
FROM text_log_step
JOIN text_log_error
  ON text_log_step.id = text_log_error.step_id
JOIN job
  ON text_log_step.job_id = job.id
WHERE text_log_error.line LIKE '%getpocket.cdn.mozilla.net%'
ORDER BY text_log_step.id

Slow, times out (limit 120s?): https://sql.telemetry.mozilla.org/queries/52052/source
SELECT job.start_time, job.id, repository.name, text_log_error.line
FROM text_log_step
JOIN text_log_error
  ON text_log_step.id = text_log_error.step_id
JOIN job
  ON text_log_step.job_id = job.id
JOIN repository
  ON job.repository_id = repository.id
WHERE text_log_error.line LIKE '%getpocket.cdn.mozilla.net%'
ORDER BY text_log_step.id

Any idea why this is slow? There are other, succeeding queries similar to that which involve more tables including "repository" but don't use WHERE ... LIKE ...
I'd imagine the answer lies in the way MySQL's query optimizer is working, and something in:
https://dev.mysql.com/doc/refman/5.7/en/join.html
https://dev.mysql.com/doc/refman/5.7/en/nested-join-optimization.html

This is something a DBA might need to answer. I'd ask them to look at the explains below:

# First query:

> EXPLAIN SELECT job.start_time, job.id, job.repository_id, text_log_error.line
FROM text_log_step
JOIN text_log_error
  ON text_log_step.id = text_log_error.step_id
JOIN job
  ON text_log_step.job_id = job.id
WHERE text_log_error.line LIKE '%getpocket.cdn.mozilla.net%'
ORDER BY text_log_step.id

******************** 1. row *********************
           id: 1
  select_type: SIMPLE
        table: text_log_error
   partitions: 
         type: ALL
possible_keys: text_log_error_step_id_88397b6b7d4c02b_uniq
          key: 
      key_len: 
          ref: 
         rows: 19344836
     filtered: 11.11
        Extra: Using where; Using temporary; Using filesort
******************** 2. row *********************
           id: 1
  select_type: SIMPLE
        table: text_log_step
   partitions: 
         type: eq_ref
possible_keys: PRIMARY,text_log_step_job_id_433c8ebb7b438ffc_uniq
          key: PRIMARY
      key_len: 8
          ref: treeherder.text_log_error.step_id
         rows: 1
     filtered: 100.00
        Extra: 
******************** 3. row *********************
           id: 1
  select_type: SIMPLE
        table: job
   partitions: 
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 8
          ref: treeherder.text_log_step.job_id
         rows: 1
     filtered: 100.00
        Extra: 
3 rows in set


# Second (slower) query:

> EXPLAIN SELECT job.start_time, job.id, repository.name, text_log_error.line
FROM text_log_step
JOIN text_log_error
  ON text_log_step.id = text_log_error.step_id
JOIN job
  ON text_log_step.job_id = job.id
JOIN repository
  ON job.repository_id = repository.id
WHERE text_log_error.line LIKE '%getpocket.cdn.mozilla.net%'
ORDER BY text_log_step.id

******************** 1. row *********************
           id: 1
  select_type: SIMPLE
        table: repository
   partitions: 
         type: index
possible_keys: PRIMARY
          key: repository_name_13592bc8829e9cbe_uniq
      key_len: 152
          ref: 
         rows: 91
     filtered: 100.00
        Extra: Using index; Using temporary; Using filesort
******************** 2. row *********************
           id: 1
  select_type: SIMPLE
        table: job
   partitions: 
         type: ref
possible_keys: PRIMARY,job_repository_id_54467c6e_idx,job_repository_id_87ce756c_idx,job_repository_id_9811246d_idx,job_repository_id_ed9a47c7_idx,job_repository_id_21096eb3_idx
          key: job_repository_id_9811246d_idx
      key_len: 4
          ref: treeherder.repository.id
         rows: 5234
     filtered: 100.00
        Extra: Using index
******************** 3. row *********************
           id: 1
  select_type: SIMPLE
        table: text_log_step
   partitions: 
         type: ref
possible_keys: PRIMARY,text_log_step_job_id_433c8ebb7b438ffc_uniq
          key: text_log_step_job_id_433c8ebb7b438ffc_uniq
      key_len: 8
          ref: treeherder.job.id
         rows: 1
     filtered: 100.00
        Extra: Using index
******************** 4. row *********************
           id: 1
  select_type: SIMPLE
        table: text_log_error
   partitions: 
         type: ref
possible_keys: text_log_error_step_id_88397b6b7d4c02b_uniq
          key: text_log_error_step_id_88397b6b7d4c02b_uniq
      key_len: 8
          ref: treeherder.text_log_step.id
         rows: 4
     filtered: 11.11
        Extra: Using where
4 rows in set
Component: Treeherder: Infrastructure → DB: MySQL
Product: Tree Management → Data & BI Services Team
QA Contact: mpressman
Version: --- → other
Changing the "JOIN repository" to "LEFT JOIN repository" seems to allow mysql to optimize the query better. The EXPLAIN changes to look more like the "quicker" query. When I ran the changed query I got results back in about two minutes.

It's not what you'd call fast by any means, but that's more the fault of the "WHERE text_log_error.line LIKE '%getpocket.cdn.mozilla.net%'" which causes mysql to have to scan all the rows in text_log_error. You could maybe improve that by using FULLTEXT indexes on that column if speed is an issue.
Chris, many thanks for your help :-)
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED

Armen, I've re-opened this and put it in your triage list, but have not assigned it to you. As part of your performance analysis, could you and :aryx find out if this is still a problem, and whether we could make the sheriffs' jobs easier by modifying the query or adding a fulltext index?

Thanks very much.

Status: RESOLVED → REOPENED
Type: task → defect
Component: DB: MySQL → Treeherder: Infrastructure
Product: Data & BI Services Team → Tree Management
QA Contact: mpressman → kthiessen
Resolution: FIXED → ---
Version: other → ---
Component: Treeherder: Infrastructure → Treeherder
Priority: -- → P3

:aryx IF there are slow-SQL problems, you can send them to me; hopefully we can move to BigQuery shortly

Status: REOPENED → RESOLVED
Closed: 6 years ago4 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.