Closed
Bug 775133
Opened 13 years ago
Closed 13 years ago
Look at developer_mozilla_org queries
Categories
(Data & BI Services Team :: DB: MySQL, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: scabral, Assigned: scabral)
Details
(Whiteboard: [due jul 22nd])
make sure the top 10 longest queries are optimized and make sure the top 10 most frequent queries are optimized. Deadline: before Sunday Jun 22nd.
| Assignee | ||
Comment 1•13 years ago
|
||
In my homedir on boris I have the 4G of logfiles parsed and ready to be looked at.
Assignee: server-ops-database → scabral
| Assignee | ||
Comment 2•13 years ago
|
||
long_query_time has been set back to 2 seconds on developer1 and developer2
| Assignee | ||
Comment 3•13 years ago
|
||
Late to the party, the deadline has passed, but I took a look at the digests, and there is a noticeable query that makes up 89% of the queries on developer1...here's the details.
This chart shows there were 23,481 calls, 89.0% of queries, with a total response time of over 227,271 seconds.
# Profile
# Rank Query ID Response time Calls R/Call Apdx V/M Ite
# ==== ================== ================= ======= ======= ==== ===== ===
# 1 0x85A9AF617FB985E0 227271.6719 89.0% 23481 9.6790 0.11 2.78 SELECT pages restrictions
This chart is far more interesting - notice the "Exec time" line, and how the average is 10 seconds, median is 9 seconds. Notice also the "rows examined" line, which has an average of 98,400 rows examined.
# Query 1: 0.00 QPS, 0.04x concurrency, ID 0x85A9AF617FB985E0 at byte 48530463
# This item is included in the report because it matches --limit.
# Scores: Apdex = 0.11 [1.0], V/M = 2.78
# Query_time sparkline: | ^_|
# Time range: 2012-05-08 13:33:50 to 2012-07-19 07:39:56
# Attribute pct total min max avg 95% stddev median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count 0 23481
# Exec time 89 227272s 0 33s 10s 18s 5s 9s
# Lock time 47 463s 96us 9s 20ms 348us 223ms 194us
# Rows sent 9 13.10M 4 2.71k 585.12 1.78k 684.98 271.23
# Rows examine 76 2.20G 26.07k 106.56k 98.40k 101.89k 5.13k 97.04k
# Rows affecte 0 0 0 0 0 0 0 0
# Rows read 9 13.10M 4 2.71k 585.12 1.78k 684.98 271.23
# Bytes sent 0 897.28M 655 178.95k 39.13k 117.95k 44.58k 15.96k
# Tmp tables 0 1 0 1 0.00 0 0.01 0
# Tmp disk tbl 0 0 0 0 0 0 0 0
# Tmp tbl size 0 0 0 0 0 0 0 0
# Query size 1 17.46M 758 829 779.72 793.42 8.29 755.64
# String:
# Databases wiki_mdc_deki
Here's a sample query:
SELECT page_id, page_namespace, page_title, page_display_name, pages.page_parent as page_parent, restriction_perm_flags,
(SELECT count(C.page_id) FROM pages C WHERE C.page_parent = pages.page_id AND C.page_id != 35 AND C.page_parent != 35 AND C.page_namespace = 0 AND C.page_is_redirect = 0 GROUP BY C.page_parent) AS page_children
FROM pages
LEFT JOIN restrictions ON page_restriction_id = restriction_id
WHERE pages.page_id = 35 OR pages.page_id IN (26415,10114,117) OR (pages.page_is_redirect = 0 AND pages.page_parent = 26415 AND pages.page_namespace = 0) OR (pages.page_is_redirect = 0 AND pages.page_parent = 10114 AND pages.page_id != 26415 AND pages.page_namespace = 0)\G
And the EXPLAIN. Note here that the pages table doesn't use an index, that is, does a full table scan:
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: pages
type: ALL
possible_keys: PRIMARY,name_title,page_parent,namespace_redirect_timestamp
key: NULL
key_len: NULL
ref: NULL
rows: 97340
Extra: Using where
*************************** 2. row ***************************
id: 1
select_type: PRIMARY
table: restrictions
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: wiki_mdc_deki.pages.page_restriction_id
rows: 1
Extra:
*************************** 3. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: C
type: ref
possible_keys: PRIMARY,name_title,page_parent,namespace_redirect_timestamp
key: page_parent
key_len: 4
ref: wiki_mdc_deki.pages.page_id
rows: 39
Extra: Using where
As the number of pages increases, this query is going to get much slower.
| Assignee | ||
Comment 4•13 years ago
|
||
I'm not sure why this isn't in the same WHERE clause:
pages.page_id = 35 OR pages.page_id IN (26415,10114,117)
Why not just
pages.page_id IN (35,26415,10114,117) ?
That doesn't affect the query optimization, but it's something to note.
################################################################################
In this case, I'd recommend splitting it up into 2 queries. There's an index on the "parent" field (called page_parent) that is used if you do only the non page_id filters. The query can be split into 2 for better performance:
SELECT page_id, page_namespace, page_title, page_display_name, pages.page_parent as page_parent, restriction_perm_flags,
(SELECT count(C.page_id) FROM pages C WHERE C.page_parent = pages.page_id AND C.page_id != 35 AND C.page_parent != 35 AND C.page_namespace = 0 AND C.page_is_redirect = 0 GROUP BY C.page_parent) AS page_children
FROM pages
LEFT JOIN restrictions ON page_restriction_id = restriction_id
WHERE pages.page_id = 35 OR pages.page_id IN (26415,10114,117);
and
SELECT page_id, page_namespace, page_title, page_display_name, pages.page_parent as page_parent, restriction_perm_flags,
(SELECT count(C.page_id) FROM pages C WHERE C.page_parent = pages.page_id AND C.page_id != 35 AND C.page_parent != 35 AND C.page_namespace = 0 AND C.page_is_redirect = 0 GROUP BY C.page_parent) AS page_children
FROM pages
LEFT JOIN restrictions ON page_restriction_id = restriction_id
WHERE (pages.page_is_redirect = 0 AND pages.page_parent = 26415 AND pages.page_namespace = 0) OR (pages.page_is_redirect = 0 AND pages.page_parent = 10114 AND pages.page_id != 26415 AND pages.page_namespace = 0)\G
################################################################################
Note that before the WHERE clause, both queries are the same. If you want it to still be one query, you can do a UNION, which removes duplicates by default:
SELECT page_id, page_namespace, page_title, page_display_name, pages.page_parent as page_parent, restriction_perm_flags,
(SELECT count(C.page_id) FROM pages C WHERE C.page_parent = pages.page_id AND C.page_id != 35 AND C.page_parent != 35 AND C.page_namespace = 0 AND C.page_is_redirect = 0 GROUP BY C.page_parent) AS page_children
FROM pages
LEFT JOIN restrictions ON page_restriction_id = restriction_id
WHERE pages.page_id = 35 OR pages.page_id IN (26415,10114,117);
UNION
SELECT page_id, page_namespace, page_title, page_display_name, pages.page_parent as page_parent, restriction_perm_flags,
(SELECT count(C.page_id) FROM pages C WHERE C.page_parent = pages.page_id AND C.page_id != 35 AND C.page_parent != 35 AND C.page_namespace = 0 AND C.page_is_redirect = 0 GROUP BY C.page_parent) AS page_children
FROM pages
LEFT JOIN restrictions ON page_restriction_id = restriction_id
WHERE (pages.page_is_redirect = 0 AND pages.page_parent = 26415 AND pages.page_namespace = 0) OR (pages.page_is_redirect = 0 AND pages.page_parent = 10114 AND pages.page_id != 26415 AND pages.page_namespace = 0)\G
################################################################################
| Assignee | ||
Comment 5•13 years ago
|
||
Here's an EXPLAIN of the UNION, with no full table scans any more!
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: pages
type: range
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: NULL
rows: 4
Extra: Using where
*************************** 2. row ***************************
id: 1
select_type: PRIMARY
table: restrictions
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: wiki_mdc_deki.pages.page_restriction_id
rows: 1
Extra:
*************************** 3. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: C
type: ref
possible_keys: PRIMARY,name_title,page_parent,namespace_redirect_timestamp
key: page_parent
key_len: 4
ref: wiki_mdc_deki.pages.page_id
rows: 39
Extra: Using where
*************************** 4. row ***************************
id: 3
select_type: UNION
table: pages
type: range
possible_keys: PRIMARY,name_title,page_parent,namespace_redirect_timestamp
key: page_parent
key_len: 4
ref: NULL
rows: 38
Extra: Using where
*************************** 5. row ***************************
id: 3
select_type: UNION
table: restrictions
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: wiki_mdc_deki.pages.page_restriction_id
rows: 1
Extra:
*************************** 6. row ***************************
id: 4
select_type: DEPENDENT SUBQUERY
table: C
type: ref
possible_keys: PRIMARY,name_title,page_parent,namespace_redirect_timestamp
key: page_parent
key_len: 4
ref: wiki_mdc_deki.pages.page_id
rows: 39
Extra: Using where
*************************** 7. row ***************************
id: NULL
select_type: UNION RESULT
table: <union1,3>
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: NULL
Extra:
7 rows in set (0.00 sec)
And I'll note there's still the dependent subquery, which isn't the best, but once we're not doing a full table scan on the outer pages table, a dependent subquery on such a small set is less of an intrusion.
| Assignee | ||
Updated•13 years ago
|
Whiteboard: [due jul 22nd]
| Assignee | ||
Comment 6•13 years ago
|
||
10:50:26] <@lorchard> sheeri: Actually, that query on wiki_mdc_deki belongs to MindTouch and is or should be no longer very relevant
10:50:34] <sheeri> lorchard w00t!
[10:50:38] <sheeri> then I'll close the ticket.
[10:51:02] <sheeri> It's still a good idea to check queries regularly, but until the DB team is somewhere past half-staffed, that's just not something we can get to.
[10:51:09] <@lorchard> I imagine we *might* see some queries against that DB if we ever need to remigrate some pages or etc, but it shouldn't be a part of daily traffic
[10:51:18] <@lorchard> Yeah, fair enough
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Updated•11 years ago
|
Product: mozilla.org → Data & BI Services Team
You need to log in
before you can comment on or make changes to this bug.
Description
•