Bug 862889 (Closed, RESOLVED FIXED)
Opened 12 years ago · Closed 12 years ago
Optimize top 10 by count and top 10 by slowness addons production queries
Categories: Data & BI Services Team :: DB: MySQL, task, P1
Tracking: Not tracked
People: Reporter: scabral; Assigned: scabral
Whiteboard: [2013q3] August
Updated • 12 years ago
Priority: -- → P1
Whiteboard: [2013q3] → [2013q3] August

Comment 1 • 12 years ago
Set long_query_time to 0 on db1 and db2.mktdb.services.phx1.mozilla.com. I'll come back tomorrow, set it back to 2 on each server, and analyze the logs. If you get a monitoring alert for configuration differences, this is why (we no longer monitor the servers...).
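A minimal sketch of the capture toggle described above (the variable is standard MySQL/MariaDB; the exact method used on these servers is assumed):

```sql
-- Capture every query in the slow log for the sampling window:
SET GLOBAL long_query_time = 0;
-- ...roughly 24 hours later, restore the normal 2-second threshold:
SET GLOBAL long_query_time = 2;
```

Note that only new connections pick up the new global value; sessions already connected keep the long_query_time they started with.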
Updated • 12 years ago
Assignee: server-ops-database → scabral

Comment 2 • 12 years ago
We had 42G of slow-log data on db2 before the 24 hours were up, so we stopped collection early (just under 500M on db1). Processing the files now.
Comment 3 • 12 years ago
I processed the files in order of longest total response time. I am also processing them by count in the background, so we'll have both.
Starting with db1, I'll put each query in its own comment so they're easy to reference by comment number.
Query 1 ran a total of only 5 times, but it's a long query: total exec time was 961 seconds.
The query is:
SELECT
personas.persona_id AS id,
translations.localized_string AS name,
users.display_name AS author,
personas.addon_id
FROM personas
INNER JOIN addons ON personas.addon_id = addons.id
LEFT OUTER JOIN translations ON addons.name = translations.id
LEFT OUTER JOIN addons_users ON addons.id = addons_users.addon_id
LEFT OUTER JOIN users ON addons_users.user_id = users.id;
It's run by the metrics user; since it's read-only, it should move to a slave.
The EXPLAIN shows an index being used for every table except personas, which gets a full table scan. That is appropriate here: the query has no filter and genuinely needs every row in personas.
# Query 1: 0.00 QPS, 0.07x concurrency, ID 0x9E5E14F18DEFC48B at byte 5895087714
# This item is included in the report because it matches --limit.
# Scores: Apdex = 0.00 [1.0]*, V/M = 1.52
# Query_time sparkline: | ^|
# Time range: 2013-08-19 15:23:14 to 19:22:59
# Attribute pct total min max avg 95% stddev median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count 0 5
# Exec time 16 961s 167s 216s 192s 213s 17s 184s
# Lock time 0 733us 93us 285us 146us 273us 67us 119us
# Rows sent 3 1.74M 357.12k 357.15k 357.13k 345.04k 0 345.04k
# Rows examine 0 8.70M 1.74M 1.74M 1.74M 1.69M 0 1.69M
# Query size 0 3.10k 634 634 634 634 0 634
# String:
# Databases addons_mozilla_org
# Hosts
# Users metrics
Comment 4 • 12 years ago
db1, query 2 by total query time ran 303 times, for a total of 581s of running time.
Recommendation (can be done in real time, since adding/dropping indexes is non-blocking): eliminate the duplicate index on name and add an index on (name, status, addontype_id):
alter table addons_mozilla_org.addons add index(name,status,addontype_id), drop index addons_ibfk_2;
Note: even if you don't add the composite index, there is no reason to keep the extra index on name, since there is already a UNIQUE index on name.
----
the query itself:
SELECT DISTINCT `addons`.`id` FROM `addons` INNER JOIN `translations` ON (`addons`.`name` = `translations`.`id`) WHERE ("en-us"="en-us" AND NOT (`addons`.`status` = 11 ) AND `translations`.`localized_string` = 'SmartTemplate4' AND NOT (`addons`.`addontype_id` = 11 ))\G
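The swap can be confirmed and applied in one pass. A sketch, with the index name taken from the ALTER above and the rest assumed:

```sql
-- Confirm two indexes currently start with `name` (one UNIQUE, one redundant):
SHOW INDEX FROM addons_mozilla_org.addons;

-- Replace the redundant single-column index with the composite one:
ALTER TABLE addons_mozilla_org.addons
  ADD INDEX (name, status, addontype_id),
  DROP INDEX addons_ibfk_2;
```

Afterwards, an EXPLAIN of the query in this comment should show the composite index being chosen on addons.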
Comment 5 • 12 years ago
db1, query 3 ran almost a million times, total exec time 395s
# Time range: 2013-08-19 14:48:31 to 20:19:16
# Attribute pct total min max avg 95% stddev median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count 6 969939
# Exec time 6 395s 172us 12ms 407us 690us 178us 348us
# Lock time 17 144s 63us 3ms 148us 273us 66us 125us
# Rows sent 4 1.92M 0 512 2.07 6.98 5.62 0.99
# Rows examine 0 8.61M 0 2.20k 9.30 31.70 27.05 3.89
# Query size 25 1.25G 1.34k 4.34k 1.35k 1.33k 34.83 1.33k
This query is already optimized and uses appropriate keys; the longest single run was 12 ms. Being called almost a million times in just under six hours seems like a lot, but if that frequency is expected, there is nothing left to optimize in the query itself.
SELECT `applications_versions`.`id`, `applications_versions`.`application_id`, `applications_versions`.`version_id`, `applications_versions`.`min`, `applications_versions`.`max`, `applications`.`id`, `applications`.`created`, `applications`.`modified`, `applications`.`guid`, `applications`.`supported`, `versions`.`id`, `versions`.`created`, `versions`.`modified`, `versions`.`addon_id`, `versions`.`license_id`, `versions`.`releasenotes`, `versions`.`approvalnotes`, `versions`.`version`, `versions`.`version_int`, `versions`.`nomination`, `versions`.`reviewed`, `versions`.`has_info_request`, `versions`.`has_editor_comment`, `versions`.`deleted`, `versions`.`supported_locales`, `versions`.`_developer_name`, `appversions`.`id`, `appversions`.`created`, `appversions`.`modified`, `appversions`.`application_id`, `appversions`.`version`, `appversions`.`version_int`, T5.`id`, T5.`created`, T5.`modified`, T5.`application_id`, T5.`version`, T5.`version_int` FROM `applications_versions`
INNER JOIN `versions` ON (`applications_versions`.`version_id` = `versions`.`id`)
INNER JOIN `applications` ON (`applications_versions`.`application_id` = `applications`.`id`)
INNER JOIN `appversions` ON (`applications_versions`.`min` = `appversions`.`id`)
INNER JOIN `appversions` T5 ON (`applications_versions`.`max` = T5.`id`)
WHERE `applications_versions`.`version_id` IN (1452032, 127489, 1244163, 1445390, 1332737, 1467402, 1419790, 1341955, 1462290, 129043, 113178, 688475, 1467240, 38493, 1245232, 1369137, 119866, 129084, 1295943, 1295944/*... omitted 187 items ...*/)\G
Comment 6 • 12 years ago
db1, query 4 has a profile similar to query 3: it ran half a million times, and its 381 seconds of total execution time come from frequency, not per-query cost.
# Time range: 2013-08-19 14:48:32 to 20:19:16
# Attribute pct total min max avg 95% stddev median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count 3 499690
# Exec time 6 381s 479us 9ms 762us 1ms 240us 690us
# Lock time 9 79s 87us 4ms 157us 273us 52us 138us
# Rows sent 0 277.77k 0 1 0.57 0.99 0.49 0.99
# Rows examine 0 10.30M 1 3.40k 21.62 51.63 159.07 4.96
# Query size 14 750.47M 1.53k 1.54k 1.54k 1.53k 0.76 1.53k
It joins 8 tables, including a dependent subquery, but it hits primary keys (at most 1 row) on 6 of the 8; of the remaining two, the dependent subquery examines only 1 row and the other examines 5, which is why this is a speedy query.
No further optimization recommended.
Comment 7 • 12 years ago
db1, query 5 is similar to queries 3 and 4 - almost a million calls, 308s total. No further optimization is recommended; this query uses appropriate indexes.
# Attribute pct total min max avg 95% stddev median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count 6 970046
# Exec time 5 308s 123us 7ms 317us 541us 151us 273us
# Lock time 13 113s 43us 6ms 116us 260us 66us 93us
# Rows sent 3 1.77M 0 390 1.92 5.75 4.68 0.99
# Rows examine 0 2.72M 0 770 2.94 11.95 9.32 0.99
# Query size 17 909.15M 972 3.95k 982.76 964.41 35.34 964.41
SELECT `files`.`id`, `files`.`created`, `files`.`modified`, `files`.`version_id`, `files`.`platform_id`, `files`.`filename`, `files`.`size`, `files`.`hash`, `files`.`codereview`, `files`.`jetpack_version`, `files`.`builder_version`, `files`.`status`, `files`.`datestatuschanged`, `files`.`no_restart`, `files`.`strict_compatibility`, `files`.`requires_chrome`, `files`.`reviewed`, `files`.`binary`, `files`.`binary_components`, `files`.`uses_flash`, `versions`.`id`, `versions`.`created`, `versions`.`modified`, `versions`.`addon_id`, `versions`.`license_id`, `versions`.`releasenotes`, `versions`.`approvalnotes`, `versions`.`version`, `versions`.`version_int`, `versions`.`nomination`, `versions`.`reviewed`, `versions`.`has_info_request`, `versions`.`has_editor_comment`, `versions`.`deleted`, `versions`.`supported_locales`, `versions`.`_developer_name` FROM `files` INNER JOIN `versions` ON (`files`.`version_id` = `versions`.`id`) WHERE `files`.`version_id` IN (100355, 1429508, 1300486, 1282058, 1237003, 61455, 1234966, 1445912, 1325082, 1325085, 22559, 72152, 1464326, 45095, 1437736, 1378354, 1282101, 53315, 20554, 127054/*... omitted 360 items ...*/)\G
Comment 8 • 12 years ago
I'm going to skip the rest on db1 for now, because past this point total execution time isn't a useful signal; most of these queries are fine. I'm also processing the files by average time and will use those figures once they're done.
Next I'll check the first few queries on db2, ordered by total time.
Comment 9 • 12 years ago
Recommendation: add 2 more slaves so you are n+1.
db2, query 1: average query time 6 seconds.
This query is slow because it has to read rows from disk. When the rows were cached, the query took about 1.5 seconds; when they were not, it took 6 seconds. On an unloaded server, it took 1.7 seconds uncached and 0.7 seconds cached.
It looks like Marketplace has 4 slaves, in contrast to the old addons cluster's 6. With 6 slaves we were n+1; with 4 slaves you are n-1, so the slaves are slower than they ought to be, by roughly 3.5 times.
# Time range: 2013-08-19 14:50:32 to 20:14:08
# Attribute pct total min max avg 95% stddev median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count 0 8913
# Exec time 55 57931s 787ms 25s 6s 15s 5s 6s
# Lock time 0 724ms 45us 425us 81us 98us 20us 76us
# Rows sent 0 8.70k 1 1 1 1 0 1
# Rows examine 38 5.61G 660.32k 660.37k 660.34k 650.62k 0 650.62k
# Query size 0 3.98M 464 470 468.34 463.90 0 463.90
SELECT COUNT(*) FROM `addons` INNER JOIN `versions` ON (`addons`.`current_version` = `versions`.`id`) WHERE ("ru"="ru" AND NOT (`addons`.`status` = 11 ) AND (`versions`.`id` IS NOT NULL AND `addons`.`inactive` = 0 AND `addons`.`status` IN (4)) AND `addons`.`addontype_id` = 9 AND NOT (`addons`.`id` IN (9449, 199424, 258053, 261458, 263347, 316928, 324462, 326593, 328076, 337547, 339842, 346226, 354399, 361313, 361551, 365177, 366659, 371268, 429014, 450340)))\G
The relevant index on addons:
KEY `visible_idx` (`addontype_id`,`status`,`inactive`,`current_version`),
Comment 10 • 12 years ago
db2, query 2 has a *wide* variation in run time: min 263us, avg 817ms, median 53ms, max 27s. This is likely due to the wide variance in rows examined. It does use appropriate indexes, but the query behaves very differently on a loaded server than on an unloaded one, even when the data is cached.
In my tests, cached runs took 2.5-6 seconds on a loaded server and 1-1.5 seconds on an unloaded one; uncached, the loaded server took 1.5-2 minutes and the unloaded server 30 seconds to 1.5 minutes.
So again, I'd recommend more slaves to offload the servers.
# Attribute pct total min max avg 95% stddev median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count 0 9899
# Exec time 7 8088s 263us 27s 817ms 4s 2s 53ms
# Lock time 0 881ms 52us 608us 88us 108us 24us 84us
# Rows sent 0 94.35k 1 15 9.76 14.52 5.35 11.95
# Rows examine 6 1019.78M 3 804.51k 105.49k 399.43k 157.24k 4.07k
# Query size 0 5.15M 535 566 545.76 537.02 6.40 537.02
SELECT `addons`.`id` FROM `addons` USE INDEX (`downloads_type_idx`) INNER JOIN `versions` ON (`addons`.`current_version` = `versions`.`id`) INNER JOIN `addons_collections` ON (`addons`.`id` = `addons_collections`.`addon_id`) WHERE ("ga-ie"="ga-ie" AND NOT (`addons`.`status` = 11 ) AND (`versions`.`id` IS NOT NULL AND `addons`.`inactive` = 0 AND `addons`.`status` IN (1, 3, 4, 6, 7, 8, 9, 10)) AND "ga-ie"="ga-ie" AND NOT (`addons`.`status` = 11 ) AND `addons_collections`.`collection_id` = 5196082 ) ORDER BY `addons`.`weeklydownloads` DESC LIMIT 6\G
Comment 11 • 12 years ago
Recommendation: see if this query can be run less frequently. It currently runs about once every 23 seconds.
db2, query 3 by total time: the query ran fewer than 1000 times, with an average time of 8 seconds (minimum 2 seconds).
# Time range: 2013-08-19 14:50:57 to 19:54:39
# Attribute pct total min max avg 95% stddev median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count 0 747
# Exec time 6 6262s 2s 42s 8s 26s 8s 5s
# Lock time 0 51ms 41us 1ms 68us 84us 53us 63us
# Rows sent 0 747 1 1 1 1 0 1
# Rows examine 7 1.14G 1.56M 1.56M 1.56M 1.53M 0 1.53M
# Query size 0 181.15k 246 253 248.32 246.02 3.50 246.02
SELECT COUNT(*) FROM `collections` LEFT OUTER JOIN `applications` ON (`collections`.`application_id` = `applications`.`id`) WHERE ("pt-pt"="pt-pt" AND `collections`.`listed` = 1 AND (`collections`.`application_id` = 1 OR `applications`.`id` IS NULL))\G
The EXPLAIN shows the primary key being used on the applications table and the "listed" key on the collections table. The "listed" key is a reasonable choice here, but unfortunately it only narrows the filter to about half the table, because the split is roughly even:
MariaDB [addons_mozilla_org]> select count(*),listed from collections group by listed;
+----------+--------+
| count(*) | listed |
+----------+--------+
| 1558490 | 0 |
| 1519169 | 1 |
+----------+--------+
2 rows in set (1.10 sec)
Unfortunately, this query is about as good as it is going to get without major changes (say, splitting into a collections_listed and a collections_unlisted table and changing the code to match, so the query reads only the relevant table).
The query looks for collections whose application_id is either NULL or 1, so it is very likely looking for one special kind of application. Perhaps it could run once every 5 minutes into a cache, with the once-every-23-seconds query reading from that cache instead (whether the cache is a separate db table, memcached, or something else).
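One way to sketch the cache-table variant of that idea (table and column names here are hypothetical, not from the original schema):

```sql
-- Refreshed by a cron job every 5 minutes:
CREATE TABLE collections_listed_count_cache (
  id           TINYINT NOT NULL PRIMARY KEY,
  cnt          INT UNSIGNED NOT NULL,
  refreshed_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

REPLACE INTO collections_listed_count_cache (id, cnt)
SELECT 1, COUNT(*)
  FROM collections
  LEFT OUTER JOIN applications ON collections.application_id = applications.id
 WHERE collections.listed = 1
   AND (collections.application_id = 1 OR applications.id IS NULL);

-- The once-every-23-seconds caller then reads a single cached row:
SELECT cnt FROM collections_listed_count_cache WHERE id = 1;
```

The same shape works with memcached in place of the cache table; the point is that the 1.5M-row scan runs 12 times an hour instead of ~150.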
Comment 12 • 12 years ago
db2, query 4: the recommendation is again to reduce the query load on this server.
# Attribute pct total min max avg 95% stddev median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count 0 6157
# Exec time 3 3501s 4ms 18s 569ms 2s 982ms 230ms
# Lock time 0 697ms 71us 530us 113us 138us 27us 108us
# Rows sent 0 19.94k 1 30 3.32 20.43 6.10 0.99
# Rows examine 18 2.69G 492 1.39M 457.76k 1.03M 270.64k 345.04k
# Query size 0 4.22M 708 741 719.15 719.66 14.69 685.39
In my tests this takes 1-3 seconds depending on load, even when cached, but it uses appropriate indexes. Note the rows-examined column: sometimes it looks at an enormous number of rows.
Comment 13 • 12 years ago
I'm now looking at queries sorted by average time (pt-query-digest cannot do this, but mysqldumpslow can).
The first query on db1 is the same as in comment 3.
db1, query 2 based on average time is an UPDATE with no WHERE clause:
UPDATE collections SET weekly_subscribers = N, monthly_subscribers = N
It ran once, taking 34 seconds.
# User@Host: marketplace_prod[marketplace_prod] @ [10.32.126.20]
# Thread_id: 5024886 Schema: addons_mozilla_org QC_hit: No
# Query_time: 34.144670 Lock_time: 0.000092 Rows_sent: 0 Rows_examined: 3075083
SET timestamp=1376926240;
UPDATE collections SET weekly_subscribers = 0, monthly_subscribers = 0;
Assuming this "reset" is appropriate, it has to update 3 million rows with no filter, so no index can help; I think a 34-second query time is acceptable.
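If replication lag or long row locks ever become a concern, the same reset could be done in primary-key chunks. This is a generic DBA pattern, not something recommended in this bug; the batch size and the loop driving the id range are assumed:

```sql
-- Repeat, advancing the id range by 10000 each pass, until the whole
-- table has been covered; each transaction stays short:
UPDATE collections
   SET weekly_subscribers = 0, monthly_subscribers = 0
 WHERE id BETWEEN 1 AND 10000
   AND (weekly_subscribers <> 0 OR monthly_subscribers <> 0);
```

The extra <> 0 predicate skips rows that are already zero, so reruns are cheap.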
Comment 14 • 12 years ago
db1, query 3 based on average time is another query that ran only once.
After testing, it looks like changing the addon_id index to a composite index gives at least 2x performance:
alter table stats_share_counts drop index addon_id, add index(addon_id,service,count);
Here's the query:
# Thread_id: 5044985 Schema: addons_mozilla_org QC_hit: No
# Query_time: 11.784338 Lock_time: 0.000131 Rows_sent: 0 Rows_examined: 739233
SET timestamp=1376933417;
REPLACE INTO stats_share_counts_totals (addon_id, service, count)
(SELECT addon_id, service, SUM(count)
FROM stats_share_counts
RIGHT JOIN addons ON addon_id = addons.id
WHERE service IN ('digg','facebook','delicious','myspace','friendfeed','twitter')
GROUP BY addon_id, service);
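The composite index (addon_id, service, count) contains every column this statement reads from stats_share_counts, so the aggregation can be satisfied from the index alone, without touching table rows. A sketch of verifying that (the expected plan detail is how MySQL normally reports a covering index, not output captured from this server):

```sql
EXPLAIN SELECT addon_id, service, SUM(count)
  FROM stats_share_counts
 WHERE service IN ('digg','facebook','delicious','myspace','friendfeed','twitter')
 GROUP BY addon_id, service;
-- If the index is covering, the Extra column should include "Using index".
```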
Comment 15 • 12 years ago
db1, query 4 was the same as comment 11.
Comment 16 • 12 years ago
db1, query 5 is similar to comments 11 and 15. It ran only twice, with an average runtime of 2.54 seconds, so I'm not sure it's worth caching.
Count: 2  Time=2.54s (5s)  Lock=0.00s (0s)  Rows=20.0 (40), marketplace_prod[marketplace_prod]@[10.32.126.20]
SELECT `collections`.`id` FROM `collections` LEFT OUTER JOIN `applications` ON (`collections`.`application_id` = `applications`.`id`) WHERE ("S"="S" AND `collections`.`listed` = N AND (`collections`.`application_id` = N OR `applications`.`id` IS NULL)) ORDER BY `collections`.`modified` DESC LIMIT N;
Comment 17 • 12 years ago
Went to the services meeting yesterday and presented the grand finding: the databases are overloaded. Jason Thomas said he'd work with the Services folks to order more hardware. (4 slaves instead of 6 were originally chosen because the Marketplace hardware is bigger/better than what Addons had.)
Comment 18 • 12 years ago
Resolving; the August goal of optimizing the addons queries has been met, and the resulting actions are in the child bugs.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updated • 11 years ago
Product: mozilla.org → Data & BI Services Team