Closed Bug 775133 Opened 13 years ago Closed 13 years ago

Look at developer_mozilla_org queries

Categories

(Data & BI Services Team :: DB: MySQL, task)

x86
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: scabral, Assigned: scabral)

Details

(Whiteboard: [due jul 22nd])

make sure the top 10 longest queries are optimized and make sure the top 10 most frequent queries are optimized. Deadline: before Sunday Jun 22nd.
In my homedir on boris I have the 4G of logfiles parsed and ready to be looked at.
Assignee: server-ops-database → scabral
long_query_time has been set back to 2 seconds on developer1 and developer2
Late to the party, the deadline has passed, but I took a look at the digests, and there is a noticeable query that makes up 89% of the queries on developer1...here's the details. This chart shows there were 23,481 calls, 89.0% of queries, with a total response time of over 227,271 seconds. # Profile # Rank Query ID Response time Calls R/Call Apdx V/M Ite # ==== ================== ================= ======= ======= ==== ===== === # 1 0x85A9AF617FB985E0 227271.6719 89.0% 23481 9.6790 0.11 2.78 SELECT pages restrictions This chart is far more interesting - notice the "Exec time" line, and how the average is 10 seconds, median is 9 seconds. Notice also the "rows examined" line, which has an average of 98,400 rows examined. # Query 1: 0.00 QPS, 0.04x concurrency, ID 0x85A9AF617FB985E0 at byte 48530463 # This item is included in the report because it matches --limit. # Scores: Apdex = 0.11 [1.0], V/M = 2.78 # Query_time sparkline: | ^_| # Time range: 2012-05-08 13:33:50 to 2012-07-19 07:39:56 # Attribute pct total min max avg 95% stddev median # ============ === ======= ======= ======= ======= ======= ======= ======= # Count 0 23481 # Exec time 89 227272s 0 33s 10s 18s 5s 9s # Lock time 47 463s 96us 9s 20ms 348us 223ms 194us # Rows sent 9 13.10M 4 2.71k 585.12 1.78k 684.98 271.23 # Rows examine 76 2.20G 26.07k 106.56k 98.40k 101.89k 5.13k 97.04k # Rows affecte 0 0 0 0 0 0 0 0 # Rows read 9 13.10M 4 2.71k 585.12 1.78k 684.98 271.23 # Bytes sent 0 897.28M 655 178.95k 39.13k 117.95k 44.58k 15.96k # Tmp tables 0 1 0 1 0.00 0 0.01 0 # Tmp disk tbl 0 0 0 0 0 0 0 0 # Tmp tbl size 0 0 0 0 0 0 0 0 # Query size 1 17.46M 758 829 779.72 793.42 8.29 755.64 # String: # Databases wiki_mdc_deki Here's a sample query: SELECT page_id, page_namespace, page_title, page_display_name, pages.page_parent as page_parent, restriction_perm_flags, (SELECT count(C.page_id) FROM pages C WHERE C.page_parent = pages.page_id AND C.page_id != 35 AND C.page_parent != 35 AND C.page_namespace = 0 AND C.page_is_redirect = 0 GROUP BY C.page_parent) AS page_children FROM pages LEFT JOIN restrictions ON page_restriction_id = restriction_id WHERE pages.page_id = 35 OR pages.page_id IN (26415,10114,117) OR (pages.page_is_redirect = 0 AND pages.page_parent = 26415 AND pages.page_namespace = 0) OR (pages.page_is_redirect = 0 AND pages.page_parent = 10114 AND pages.page_id != 26415 AND pages.page_namespace = 0)\G And the EXPLAIN. Note here that the pages table doesn't use an index, that is, does a full table scan: *************************** 1. row *************************** id: 1 select_type: PRIMARY table: pages type: ALL possible_keys: PRIMARY,name_title,page_parent,namespace_redirect_timestamp key: NULL key_len: NULL ref: NULL rows: 97340 Extra: Using where *************************** 2. row *************************** id: 1 select_type: PRIMARY table: restrictions type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: wiki_mdc_deki.pages.page_restriction_id rows: 1 Extra: *************************** 3. row *************************** id: 2 select_type: DEPENDENT SUBQUERY table: C type: ref possible_keys: PRIMARY,name_title,page_parent,namespace_redirect_timestamp key: page_parent key_len: 4 ref: wiki_mdc_deki.pages.page_id rows: 39 Extra: Using where As the number of pages increases, this query is going to get much slower.
I'm not sure why this isn't in the same WHERE clause: pages.page_id = 35 OR pages.page_id IN (26415,10114,117) Why not just pages.page_id IN (35,26415,10114,117) ? That doesn't affect the query optimization, but it's something to note. ################################################################################ In this case, I'd recommend splitting it up into 2 queries. There's an index on the "parent" field (called page_parent) that is used if you do only the non page_id filters. The query can be split into 2 for better performance: SELECT page_id, page_namespace, page_title, page_display_name, pages.page_parent as page_parent, restriction_perm_flags, (SELECT count(C.page_id) FROM pages C WHERE C.page_parent = pages.page_id AND C.page_id != 35 AND C.page_parent != 35 AND C.page_namespace = 0 AND C.page_is_redirect = 0 GROUP BY C.page_parent) AS page_children FROM pages LEFT JOIN restrictions ON page_restriction_id = restriction_id WHERE pages.page_id = 35 OR pages.page_id IN (26415,10114,117); and SELECT page_id, page_namespace, page_title, page_display_name, pages.page_parent as page_parent, restriction_perm_flags, (SELECT count(C.page_id) FROM pages C WHERE C.page_parent = pages.page_id AND C.page_id != 35 AND C.page_parent != 35 AND C.page_namespace = 0 AND C.page_is_redirect = 0 GROUP BY C.page_parent) AS page_children FROM pages LEFT JOIN restrictions ON page_restriction_id = restriction_id WHERE (pages.page_is_redirect = 0 AND pages.page_parent = 26415 AND pages.page_namespace = 0) OR (pages.page_is_redirect = 0 AND pages.page_parent = 10114 AND pages.page_id != 26415 AND pages.page_namespace = 0)\G ################################################################################ Note that before the WHERE clause, both queries are the same. If you want it to still be one query, you can do a UNION, which removes duplicates by default: SELECT page_id, page_namespace, page_title, page_display_name, pages.page_parent as page_parent, restriction_perm_flags, (SELECT count(C.page_id) FROM pages C WHERE C.page_parent = pages.page_id AND C.page_id != 35 AND C.page_parent != 35 AND C.page_namespace = 0 AND C.page_is_redirect = 0 GROUP BY C.page_parent) AS page_children FROM pages LEFT JOIN restrictions ON page_restriction_id = restriction_id WHERE pages.page_id = 35 OR pages.page_id IN (26415,10114,117); UNION SELECT page_id, page_namespace, page_title, page_display_name, pages.page_parent as page_parent, restriction_perm_flags, (SELECT count(C.page_id) FROM pages C WHERE C.page_parent = pages.page_id AND C.page_id != 35 AND C.page_parent != 35 AND C.page_namespace = 0 AND C.page_is_redirect = 0 GROUP BY C.page_parent) AS page_children FROM pages LEFT JOIN restrictions ON page_restriction_id = restriction_id WHERE (pages.page_is_redirect = 0 AND pages.page_parent = 26415 AND pages.page_namespace = 0) OR (pages.page_is_redirect = 0 AND pages.page_parent = 10114 AND pages.page_id != 26415 AND pages.page_namespace = 0)\G ################################################################################
Here's an EXPLAIN of the UNION, with no full table scans any more! *************************** 1. row *************************** id: 1 select_type: PRIMARY table: pages type: range possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: NULL rows: 4 Extra: Using where *************************** 2. row *************************** id: 1 select_type: PRIMARY table: restrictions type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: wiki_mdc_deki.pages.page_restriction_id rows: 1 Extra: *************************** 3. row *************************** id: 2 select_type: DEPENDENT SUBQUERY table: C type: ref possible_keys: PRIMARY,name_title,page_parent,namespace_redirect_timestamp key: page_parent key_len: 4 ref: wiki_mdc_deki.pages.page_id rows: 39 Extra: Using where *************************** 4. row *************************** id: 3 select_type: UNION table: pages type: range possible_keys: PRIMARY,name_title,page_parent,namespace_redirect_timestamp key: page_parent key_len: 4 ref: NULL rows: 38 Extra: Using where *************************** 5. row *************************** id: 3 select_type: UNION table: restrictions type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: wiki_mdc_deki.pages.page_restriction_id rows: 1 Extra: *************************** 6. row *************************** id: 4 select_type: DEPENDENT SUBQUERY table: C type: ref possible_keys: PRIMARY,name_title,page_parent,namespace_redirect_timestamp key: page_parent key_len: 4 ref: wiki_mdc_deki.pages.page_id rows: 39 Extra: Using where *************************** 7. row *************************** id: NULL select_type: UNION RESULT table: <union1,3> type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: NULL Extra: 7 rows in set (0.00 sec) And I'll note there's still the dependent subquery, which isn't the best, but once we're not doing a full table scan on the outer pages table, a dependent subquery on such a small set is less of an intrusion.
Whiteboard: [due jul 22nd]
10:50:26] <@lorchard> sheeri: Actually, that query on wiki_mdc_deki belongs to MindTouch and is or should be no longer very relevant 10:50:34] <sheeri> lorchard w00t! [10:50:38] <sheeri> then I'll close the ticket. [10:51:02] <sheeri> It's still a good idea to check queries regularly, but until the DB team is somewhere past half-staffed, that's just not something we can get to. [10:51:09] <@lorchard> I imagine we *might* see some queries against that DB if we ever need to remigrate some pages or etc, but it shouldn't be a part of daily traffic [10:51:18] <@lorchard> Yeah, fair enough
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Product: mozilla.org → Data & BI Services Team
You need to log in before you can comment on or make changes to this bug.