Bug 783545 - Metrics for defining a q4 goal optimizing bugzilla queries
Status: RESOLVED INCOMPLETE
Whiteboard: [2012q3]
Product: Data & BI Services Team
Classification: Other
Component: DB: MySQL
Version: other
Hardware: x86 Mac OS X
Importance: P2 major
Target Milestone: ---
Assigned To: Gervase Markham [:gerv]
QA Contact: Corey Shields [:cshields]
Mentors:
Depends on: 775663
Blocks: 732354
 
Reported: 2012-08-17 06:49 PDT by Sheeri Cabral [:sheeri]
Modified: 2014-10-17 12:46 PDT
CC: 16 users
See Also:
Due Date:
Mozilla Project: ---
QA Whiteboard:
Iteration: ---
Points: ---
Cab Review: ServiceNow Change Request (use flag)



Description Sheeri Cabral [:sheeri] 2012-08-17 06:49:09 PDT
We will have a measurable, definable q4 goal for optimizing bugzilla queries. For that, we need some metrics, which we can get from the logs.

We'll need to set the slow query log to capture all queries (that finish) taking >0 seconds, for a certain amount of time (at least one day). This should be done on all machines in the active cluster (currently 1 master, 3 slaves).
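For reference, a minimal sketch of that logging change on MySQL/Percona 5.1, where both variables are dynamic and can be set at runtime (the value to restore afterwards is an assumption; a threshold of 0 is not supported on 5.0):

```sql
-- Sketch, assuming MySQL/Percona 5.1+ so the variables are dynamic.
-- Capture every completed query for the measurement window:
SET GLOBAL slow_query_log = 1;
SET GLOBAL long_query_time = 0;
-- ...collect for at least one full day, then restore a sane threshold, e.g.:
SET GLOBAL long_query_time = 2;
```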

From these, we should be able to find in a q4 goal:
- slowest queries, by total/average time
  - these we should optimize

- most frequent queries
  - for these, we should determine why there are so many of them
  - can any of these be cached? query_cache_type = DEMAND in MySQL, or semi-dynamic data?
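If demand-mode caching is explored, note the MySQL value is `DEMAND` and only statements that opt in with `SQL_CACHE` get cached; a hedged sketch (the cache size and the sample query are placeholders, not anything from this bug):

```sql
-- Sketch: demand-mode query cache on MySQL 5.1. Only statements tagged
-- SQL_CACHE are cached; everything else bypasses the cache.
SET GLOBAL query_cache_type = DEMAND;
SET GLOBAL query_cache_size = 64 * 1024 * 1024;  -- size is an arbitrary example
-- hypothetical candidate: semi-dynamic data that rarely changes
SELECT SQL_CACHE name FROM products ORDER BY name;
```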

- standard deviation of query times
  - Find out how many of our queries finish in certain key times (or key % of queries and how soon they finish - e.g. 57.6% of our queries finish in 4 seconds or less, or 50% of our queries finish in 5.275 seconds or less)
  - From this, we can optimize the most frequently used slow queries
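The percentile metric above can be prototyped with sort and awk once the Query_time values have been extracted from the slow log; the file contents here are toy stand-ins, not real measurements:

```shell
# Toy sketch: given one Query_time value per line (times.txt stands in for
# values pulled out of the slow log), report the p50 and p95 cutoffs.
printf '0.01\n0.02\n4.0\n5.275\n' > times.txt
sort -n times.txt | awk '
  { t[NR] = $1 }
  END {
    # percentile index = ceil(NR * p)
    p50 = int(NR * 0.50); if (p50 < NR * 0.50) p50++
    p95 = int(NR * 0.95); if (p95 < NR * 0.95) p95++
    printf "p50=%s p95=%s\n", t[p50], t[p95]
  }'
```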

We should also turn on the general log for a day or so, and cross-reference the killed queries (process IDs in the log files), to see if we can avoid long-running queries somehow (by optimization, or by not allowing certain queries).
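That cross-reference could look something like the sketch below. The file names and formats are assumptions: killed_ids.txt stands in for the kill cron's output (one connection id per line), and the sample log mimics a timestamped 5.1 general-log line, where the connection id is the third field.

```shell
# Toy stand-ins for the real kill-script log and general log:
cat > killed_ids.txt <<'EOF'
42
EOF
cat > general.sample.log <<'EOF'
120827 11:59:23    42 Query  SELECT bug_id FROM bugs WHERE resolution = ''
120827 11:59:23    43 Query  SELECT 1
EOF
# Print the statements whose connection id appears in the killed list.
awk 'NR == FNR { killed[$1]; next } $3 in killed' killed_ids.txt general.sample.log
```

This only handles timestamped log lines; multi-line statements and untimestamped continuation lines would need extra handling.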
Comment 1 Gervase Markham [:gerv] 2012-08-18 01:33:51 PDT
Per your IRC message: yes, this is a fine summary of the task :-) I look forward to seeing the data.

Gerv
Comment 2 Frédéric Buclin 2012-08-19 05:05:27 PDT
How is this bug different from bug 732354?
Comment 3 Gervase Markham [:gerv] 2012-08-20 03:14:24 PDT
This bug is the result of a conversation between me and sheeri about how exactly to proceed and what analytics need to be done. You can either mark them as duplicate, or mark that one as dependent on this one.

Gerv
Comment 4 Sheeri Cabral [:sheeri] 2012-08-22 10:01:40 PDT
So, since tp-bugs01-master01 is still MySQL 5.0, the minimum long_query_time (for logging to the slow query log file) is 1 second.

Bug 731665 is for the upgrade, and failing over from tp-bugs01-master01 as the master is in bug 775663. We should probably do the failover sooner rather than later, I can pick a slow time in the evening or weekend to get this done.
Comment 5 Sheeri Cabral [:sheeri] 2012-08-22 11:35:42 PDT
Midnight Eastern US time tonight has been proposed and accepted by myself, Byron and David.
Comment 6 Sheeri Cabral [:sheeri] 2012-08-23 15:47:52 PDT
tp-bugs01-master01 is now a slave, and has been upgraded to Percona 5.1.
Comment 7 Gervase Markham [:gerv] 2012-08-24 05:44:13 PDT
OK. So the next step is to turn logging on for a day (probably a week day, to get a "normal" sample) and then run the logs through an analysing program?

Do you have a particular day in mind?

Gerv
Comment 8 Sheeri Cabral [:sheeri] 2012-08-24 10:48:24 PDT
I could turn it on first thing Monday morning, how's that?
Comment 9 Gervase Markham [:gerv] 2012-08-27 01:42:42 PDT
Sounds good to me :-)

Gerv
Comment 10 Sheeri Cabral [:sheeri] 2012-08-27 06:56:47 PDT
It's Monday morning, and I'll set the slow query logging down to 0 and turn general logging on for the bugzilla production servers.
Comment 11 Sheeri Cabral [:sheeri] 2012-08-27 07:07:01 PDT
All 4 production bugzilla machines have the general log on and slow query logging on and set to log queries over 0 seconds.
Comment 12 Sheeri Cabral [:sheeri] 2012-08-28 13:03:27 PDT
general log has been turned off, slow query logging turned back to logging queries over 2 seconds, and the logs have been copied to /root/logs on each machine.

Next step: process the logs to get some metrics.
Comment 13 Sheeri Cabral [:sheeri] 2012-09-04 13:04:39 PDT
Running this on tp-bugs01-slave01 now to get the slowest queries by total time:

[root@tp-bugs01-slave01 logs]# time pt-query-digest --create-review-table --limit 100% --order-by Query_time:sum --sample 1 --user checksum --password **ELIDED** --type slowlog --review h=localhost,D=percona,t=query_review  mysql-slow.log  > review_slow_log1.log

(this will take about 2-3 hours, pt-query-digest is good about not using too many resources for this).

To find the most frequent queries, we should parse *each* general log using:

time pt-query-digest --create-review-table --limit 100% --order-by Query_time:cnt --sample 1 --user checksum --password **ELIDED** --type genlog --review h=localhost,D=percona,t=query_review  mysql-slow.log  > review_genlogX.log

(we can also use the slow logs and --order-by Query_time:cnt to see the most frequent queries that finished, without being killed by a cron script).
Comment 14 Sheeri Cabral [:sheeri] 2012-09-11 09:55:41 PDT
processing the general logs with this script:

mkdir processed logs
for j in tp-bugs01*.log.gz ; do
    echo $j
    gunzip $j
    for i in tp-bugs01*.log ; do
        echo $i
        time pt-query-digest --create-review-table --limit 100% \
            --order-by Query_time:cnt --sample 1 --user checksum \
            --password ******* --type genlog \
            --review h=localhost,D=percona,t=query_review \
            $i > processed/$i.processed.log \
          && gzip $i && mv $j logs/
    done
done

Processed general logs on tp-bugs01-slave01, and running this on tp-bugs01-master01 now.
Comment 15 Sheeri Cabral [:sheeri] 2012-09-12 09:32:46 PDT
general logs are processed on all but tp-bugs01-slave03, which is running now. I'm working on processing the slow query logs on the machines other than slave01 now.

time pt-query-digest --create-review-table --limit 100% --order-by Query_time:sum --sample 1 --user checksum --password **ELIDED** --type slowlog --review h=localhost,D=percona,t=query_review  mysql-slow.log  > review_slow_log1.log

Running this on the slow logs for master01, slave02 and slave03.
Comment 16 Dustin J. Mitchell [:dustin] 2012-09-28 12:13:05 PDT
http://code.google.com/p/mysql-slow-query-log-visualizer/
Comment 17 Sheeri Cabral [:sheeri] 2012-09-28 12:23:50 PDT
Dustin's going to take a look at http://www.neurofuzzy.net/2011/12/24/mysql-slow-query-log-visualizer/ to get a pretty graph so we can easily see some peaks to flatten out.
Comment 18 Dustin J. Mitchell [:dustin] 2012-09-28 13:22:04 PDT
That app definitely isn't suited to looking at 12G of data.  It's designed for slow query logs, not general query logs.

I'm moving the data to a single host, where I'll run pt-query-digest with various parameters until I find a nice short list.
Comment 19 Dustin J. Mitchell [:dustin] 2012-09-28 14:27:32 PDT
General logs have no timing information, so it will be difficult to correlate those with the slow-query logs.  Also, if we do a gazillion of a query that's so fast it doesn't show up in the slow-query logs (time is measured as 0), then .. does that tree make a sound?

At any rate, I think that there are two objectives here, and each can be answered by a suitable invocation of pt-query-digest run over the appropriate logs.

For query cachability:

  "Examine the most frequent queries, grouped by fingerprint. Optimize the frequency of the top 10% of all queries down by 50%".

Phrased like this, I'm not sure it's worth it?  That would only mean a 5% reduction in number of queries, and except where those overlapped with the slow queries work, that wouldn't even be a 5% reduction in total load.


For slow queries:

  "Examine queries, grouped by fingerprint, and ordered by total time consumed during the log capture.  Optimize these queries to save an estimated 10% of total query time."

Note that this does not imply which queries get optimized - if it's more fruitful to optimize lots of less-time-consuming queries than to tackle the worst offenders, so be it.  It does require a bit of modeling to calculate total query time and the effects of optimizations of a single query.  That might be a bit complex, but I think some reasonable approximations can make it tractable.
Comment 20 Dustin J. Mitchell [:dustin] 2012-09-28 14:48:57 PDT
I've got all of the data on backup1.db.phx1, in ~root/bug783545/{genlogs,slowlogs}.

For query cachability:

> pt-query-digest --group-by fingerprint --fingerprint  --limit 20 --type genlog genlogs/*.log

On a test subset of the logs, that gave me

# 1.1s user time, 10ms system time, 22.46M rss, 192.59M vsz
# Current date: Fri Sep 28 14:33:37 2012
# Hostname: backup1.db.phx1.mozilla.com
# Files: test.log
# Overall: 4.09k total, 109 unique, 510.88 QPS, 0x concurrency ___________
# Time range: 2012-08-27 11:59:23 to 11:59:31
# Attribute          total     min     max     avg     95%  stddev  median
# ============     ======= ======= ======= ======= ======= ======= =======
# Exec time              0       0       0       0       0       0       0
# Query size         2.08M       5   2.83k  532.42   2.76k  858.54  234.30

# Profile
# Rank Query ID           Response time Calls R/Call Apdx V/M   Item
# ==== ================== ============= ===== ====== ==== ===== ==========
#    1 0xFFEBB4E4D3CD6CD6  0.0000  0.0%   478 0.0000 1.00  0.00 SELECT bugs
#    2 0x14D21F7050EC2C07  0.0000  0.0%   295 0.0000 1.00  0.00 SELECT profiles
#    3 0x888F0EC503C28937  0.0000  0.0%   239 0.0000 1.00  0.00 SELECT bugs cc bug_group_map
#    4 0x086E25DC08D8F612  0.0000  0.0%   228 0.0000 1.00  0.00 SELECT fielddefs
#    5 0xDE5B4ABC5FB01A9B  0.0000  0.0%   186 0.0000 1.00  0.00 SELECT attachments
#    6 0xFFD1911EFCDB1CD2  0.0000  0.0%   108 0.0000 1.00  0.00 SELECT flags
#    7 0x2A11171E7A191FE4  0.0000  0.0%   103 0.0000 1.00  0.00 SELECT products
#    8 0x95E43EB346990335  0.0000  0.0%   103 0.0000 1.00  0.00 SELECT products
#    9 0xDA7B97356061A8BB  0.0000  0.0%   103 0.0000 1.00  0.00 SELECT keywords
#   10 0x08CCA1F951DE9BF1  0.0000  0.0%   103 0.0000 1.00  0.00 SELECT components
#   11 0xB4ACC8D26EC2F4F0  0.0000  0.0%   103 0.0000 1.00  0.00 SELECT bug_see_also
#   12 0xA2743C3B622960FA  0.0000  0.0%   103 0.0000 1.00  0.00 SELECT classifications
#   13 0x5B7C4DB42B70EA61  0.0000  0.0%   102 0.0000 1.00  0.00 SELECT cc profiles
#   14 0x19F283636D8C1E6D  0.0000  0.0%   102 0.0000 1.00  0.00 SELECT dependencies bugs bug_status
#   15 0x33D8A2A3D2872A06  0.0000  0.0%   102 0.0000 1.00  0.00 SELECT dependencies bugs bug_status
#   16 0x7CCC6FD35D6EF395  0.0000  0.0%   100 0.0000 1.00  0.00 SELECT attachments
#   17 0xB3497C9D653738DC  0.0000  0.0%   100 0.0000 1.00  0.00 SELECT bug_cf_locale
#   18 0x1C7815E867A52497  0.0000  0.0%   100 0.0000 1.00  0.00 SELECT flags
#   19 0x2C3ACFED3031A047  0.0000  0.0%    99 0.0000 1.00  0.00 SELECT groups bug_group_map group_control_map
#   20 0xC195F27406BA79BD  0.0000  0.0%    98 0.0000 1.00  0.00 SELECT groups
# MISC 0xMISC              0.0000  0.0%  1132 0.0000   NS   0.0 <89 ITEMS>

where the highest 10% is, in fact, captured by the first query (total count is in the "Overall" line, and 0xFFEBB4E4D3CD6CD6 is responsible for 478 calls).  We'll see if that's the case for the whole corpus - it may be necessary to adjust the --limit.  Also, note that if we could reduce 0x14D21F7050EC2C07 and 0x888F0EC503C28937 by 50% but leave 0xFFEBB4E4D3CD6CD6 alone, we'd have achieved the same goal (5% reduction in total queries executed).
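As a quick sanity check on that "highest 10%" claim, using the counts from the profile above (478 calls out of the roughly 4.09k total reported on the Overall line):

```shell
# Share of all queries attributable to the top fingerprint 0xFFEBB4E4D3CD6CD6.
# 4090 approximates the "4.09k total" from the report header.
awk 'BEGIN { printf "%.1f%%\n", 478 / 4090 * 100 }'
```

So the top fingerprint alone covers about 11.7% of calls, comfortably past the 10% mark.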

As for slow queries:

> pt-query-digest --fingerprint --group-by fingerprint --order-by Query_time:sum --limit 10% --type slowlog slowlogs/*.log

seems a decent place to start.

# 12s user time, 50ms system time, 32.86M rss, 202.98M vsz
# Current date: Fri Sep 28 14:43:40 2012
# Hostname: backup1.db.phx1.mozilla.com
# Files: STDIN
# Overall: 33.88k total, 191 unique, 222.88 QPS, 0.78x concurrency _______
# Time range: 2012-08-28 04:00:02 to 04:02:34
# Attribute          total     min     max     avg     95%  stddev  median
# ============     ======= ======= ======= ======= ======= ======= =======
# Exec time           118s     4us     11s     3ms     1ms   125ms   167us
# Lock time             2s       0     1ms    72us   204us    52us    54us
# Rows sent        603.17k       0   2.90k   18.23  124.25  114.81    0.99
# Rows examine      28.74M       0   3.76M  889.59  246.02  34.34k    0.99
# Rows affecte           0       0       0       0       0       0       0
# Rows read          1.41M       0 767.43k   43.70  124.25   4.10k    0.99
# Bytes sent       148.27M       0 589.91k   4.48k  10.80k  32.38k  793.42
# Tmp tables         5.02k       0       5    0.15    0.99    0.37       0
# Tmp disk tbl         208       0       2    0.01       0    0.08       0
# Tmp tbl size       5.25G       0   7.95M 162.35k   1.03M 445.93k       0
# Query size        20.06M       6  17.20k  620.87   2.76k  946.83  234.30

# Profile
# Rank Query ID           Response time Calls R/Call  Apdx V/M   Item
# ==== ================== ============= ===== ======= ==== ===== =========
#    1 0xF76B44FAAC8DBF8B 57.6702 48.9%    12  4.8058 0.00  0.01 SELECT cc bugs profiles products classifications components bug_cf_locale flags flagtypes keywords keyworddefs bug_status priority bug_group_map
#    2 0x6EBEBB46B650A9DF 10.5381  8.9%     1 10.5381 0.00  0.00 SELECT cc profiles bugs
#    3 0xFDA4566562CD18C8  9.5970  8.1%     1  9.5970 0.00  0.00 SELECT bugs profiles products components cc profiles longdescs profiles bug_status priority bug_group_map cc profiles
#    4 0x9FF3278C98B77093  4.3107  3.7%     1  4.3107 0.00  0.00 SELECT cc bugs profiles products classifications components bug_cf_locale flags flagtypes keywords keyworddefs bug_status priority bug_group_map
#    5 0x76FE71F870A6D9AD  4.0483  3.4%     1  4.0483 0.50  0.00 SELECT bugs profiles products components cc profiles longdescs profiles bug_group_map cc profiles
#    6 0xBADB96D7F151AF06  3.2592  2.8%     1  3.2592 0.50  0.00 SELECT bugs profiles products components
#    7 0x7D7AB96FF2A558DD  2.9113  2.5%     2  1.4557 0.50  0.00 SELECT bugs profiles products bug_group_map cc profiles
#    8 0x63322F08414EA78E  2.7629  2.3%     1  2.7629 0.50  0.00 SELECT bugs profiles products components cc profiles bug_group_map cc profiles
#    9 0xFFEBB4E4D3CD6CD6  2.7211  2.3%  4829  0.0006 1.00  0.00 SELECT bugs
#   10 0x03E78A71DB839F5F  2.3793  2.0%     1  2.3793 0.50  0.00 SELECT attachments profiles bugs
# MISC 0xMISC             17.7249 15.0% 29027  0.0006   NS   0.0 <181 ITEMS>

Those ten fingerprints represent 85% of the total time spent in the 33k queries in this corpus, making them a pretty good place to start hacking away.

I've put these two commands into a Makefile in ~root/bug783545 and left them running in a screen session.  I'll check back tomorrow.  Note that the full reports contain lots of potentially sensitive information (each Query ID links to a pretty detailed report including a sample query replete with strings), so we'll need to treat it with due caution.  I won't post them here, anyway!
Comment 21 Dustin J. Mitchell [:dustin] 2012-09-28 15:26:32 PDT
I'm moving the data to backup2, since backup1 could only *just* barely hold the uncompressed logs (125G).  Yeah, 125G of logs.  This is going to run over the weekend.
Comment 22 Sheeri Cabral [:sheeri] 2012-09-28 16:28:31 PDT
For Melissa - I'd call this bug 90% done, we just have to look at the graphs on Monday and come up with final numbers.
Comment 23 Dustin J. Mitchell [:dustin] 2012-09-29 15:01:17 PDT
Well, there won't be any graphs, but yeah.
Comment 24 Dustin J. Mitchell [:dustin] 2012-09-30 08:19:38 PDT
- It took 831 minutes to do the genlog processing.
- Slowlog processing is about half-finished, and should be done in 6-8 hours
Comment 25 Dustin J. Mitchell [:dustin] 2012-10-01 05:42:15 PDT
373 minutes for the slowlog processing.  Here are the summaries (no sensitive info here).  The actual reports are in ~/bug783545 on backup2.db.phx1.

> Overall: 148.73M total, 4.02k unique, 1.48k QPS, 0x concurrency ________
>
> Rank Query ID           Response time Calls    R/Call Apdx V/M   Item
> ==== ================== ============= ======== ====== ==== ===== =======
>    1 0x888F0EC503C28937  0.0000  0.0% 32037096 0.0000 1.00  0.00 SELECT bugs cc bug_group_map
>    2 0xC4C898BE47EDEA1A  0.0000  0.0% 26043265 0.0000 1.00  0.00 SELECT bugs
>    3 0xFFEBB4E4D3CD6CD6  0.0000  0.0% 13310900 0.0000 1.00  0.00 SELECT bugs
>    4 0x2340A01D8D520D87  0.0000  0.0%  6106533 0.0000 1.00  0.00 INSERT votes
>    5 0xDC1910702A0E2513  0.0000  0.0%  6102016 0.0000 1.00  0.00 UPDATE bugs
>    6 0xC195F27406BA79BD  0.0000  0.0%  4922970 0.0000 1.00  0.00 SELECT groups
>    7 0xDE5B4ABC5FB01A9B  0.0000  0.0%  3731168 0.0000 1.00  0.00 SELECT attachments
>    8 0x120BBAFEF1C0409E  0.0000  0.0%  3531670 0.0000 1.00  0.00 SELECT group_group_map
>    9 0x4A15ABFB03ED10B2  0.0000  0.0%  3531668 0.0000 1.00  0.00 SELECT user_group_map
>   10 0x14D21F7050EC2C07  0.0000  0.0%  3288977 0.0000 1.00  0.00 SELECT profiles
>   11 0x95E43EB346990335  0.0000  0.0%  2503843 0.0000 1.00  0.00 SELECT products
>   12 0x36C06FD702E2565D  0.0000  0.0%  2282629 0.0000 1.00  0.00 SELECT classifications
>   13 0xEEC5B23B5DA070C4  0.0000  0.0%  1945696 0.0000 1.00  0.00 SELECT votes
>   14 0x85FFF5AA78E5FF6A  0.0000  0.0%  1618007 0.0000 1.00  0.00 BEGIN
>   15 0x9617DE062D3FFA8D  0.0000  0.0%  1548120 0.0000 1.00  0.00 COMMIT Xid_log_event
>   16 0x11F64398A2780FC3  0.0000  0.0%  1360071 0.0000 1.00  0.00 SELECT groups user_group_map
>   17 0x3AEAAD0E15D725B5  0.0000  0.0%  1091112 0.0000 1.00  0.00 SET
>   18 0x36AB25870320157C  0.0000  0.0%   932108 0.0000 1.00  0.00 SELECT fielddefs
>   19 0x19F283636D8C1E6D  0.0000  0.0%   918474 0.0000 1.00  0.00 SELECT dependencies bugs bug_status
>   20 0x5D51E5F01B88B79E  0.0000  0.0%   809656 0.0000 1.00  0.00 ADMIN CONNECT
>   21 0xAA353644DE4C4CB4  0.0000  0.0%   795522 0.0000 1.00  0.00 ADMIN QUIT
>   22 0xCC47B42511EA22DD  0.0000  0.0%   780779 0.0000 1.00  0.00 SET
>   23 0x7DD5F6760F2D2EBB  0.0000  0.0%   780387 0.0000 1.00  0.00 SHOW VARIABLES
>   24 0x836A6EA883156E9D  0.0000  0.0%   780362 0.0000 1.00  0.00 SET
>   25 0xB31DAB9B93AD77A1  0.0000  0.0%   769037 0.0000 1.00  0.00 SELECT attach_data
>   26 0xF86D5C3DF2320F8C  0.0000  0.0%   763917 0.0000 1.00  0.00 SELECT flags
>   27 0xFFD1911EFCDB1CD2  0.0000  0.0%   680272 0.0000 1.00  0.00 SELECT flags
>   28 0x33D8A2A3D2872A06  0.0000  0.0%   656391 0.0000 1.00  0.00 SELECT dependencies bugs bug_status
>   29 0x9340FC9BBB44C9B9  0.0000  0.0%   625903 0.0000 1.00  0.00 UPDATE ts_job
>   30 0x42A425C6B59EF876  0.0000  0.0%   606605 0.0000 1.00  0.00 SELECT fielddefs
>   31 0x2B1AD53FAA2DD620  0.0000  0.0%   606605 0.0000 1.00  0.00 SELECT fielddefs
>   32 0x8A7D16477C3BB176  0.0000  0.0%   603688 0.0000 1.00  0.00 INSERT ts_job
>   33 0x28FE62860D7CB674  0.0000  0.0%   603683 0.0000 1.00  0.00 DELETE ts_job
>   34 0x96763E81BB96E172  0.0000  0.0%   562222 0.0000 1.00  0.00 SELECT products
>   35 0x08CCA1F951DE9BF1  0.0000  0.0%   553981 0.0000 1.00  0.00 SELECT components
>   36 0x2A11171E7A191FE4  0.0000  0.0%   547886 0.0000 1.00  0.00 SELECT products
>   37 0xA2743C3B622960FA  0.0000  0.0%   547886 0.0000 1.00  0.00 SELECT classifications
>   38 0x086E25DC08D8F612  0.0000  0.0%   547832 0.0000 1.00  0.00 SELECT fielddefs
>   39 0x57DDD75A33267F7E  0.0000  0.0%   547492 0.0000 1.00  0.00 SELECT fielddefs
>   40 0xDA7B97356061A8BB  0.0000  0.0%   543949 0.0000 1.00  0.00 SELECT keywords
> MISC 0xMISC              0.0000  0.0% 18210240 0.0000   NS   0.0 <3981 ITEMS>

> Overall: 70.60M total, 1.69k unique, 2.51k QPS, 2.22x concurrency ______
>
> Rank Query ID           Response time    Calls    R/Call  Apdx V/M   Ite
> ==== ================== ================ ======== ======= ==== ===== ===
>    1 0x888F0EC503C28937  9013.3389 14.4% 27126357  0.0003 1.00  0.58 SELECT bugs cc bug_group_map
>    2 0x9ACDEC76E9BEB1F7  4783.0476  7.6%      146 32.7606 0.00  3.68 SELECT bugs profiles products bugs_fulltext bug_status priority bug_group_map
>    3 0xC4C898BE47EDEA1A  3236.9999  5.2% 25920894  0.0001 1.00  0.69 SELECT bugs
>    4 0xF76B44FAAC8DBF8B  2959.8613  4.7%      657  4.5051 0.17  0.08 SELECT cc bugs profiles products classifications components bug_cf_locale flags flagtypes keywords keyworddefs bug_status priority bug_group_map
>    5 0xFDA4566562CD18C8  1916.1385  3.1%      205  9.3470 0.00  0.04 SELECT bugs profiles products components cc profiles longdescs profiles bug_status priority bug_group_map cc profiles
>    6 0xFFEBB4E4D3CD6CD6  1715.4005  2.7%  2739881  0.0006 1.00  0.01 SELECT bugs
>    7 0xD524217B1B17D568  1462.3101  2.3%       49 29.8431 0.00  2.04 SELECT bugs profiles products bugs_fulltext bug_status priority bug_group_map
>    8 0xE8B5E2A28F8968A6  1446.9571  2.3%       44 32.8854 0.01  3.18 SELECT bugs profiles products bugs_fulltext bug_status priority bug_group_map
>    9 0x5488922CB0DD98A2  1333.4416  2.1%      576  2.3150 0.85 13.75 SELECT bugs bugs_fulltext
>   10 0x9340FC9BBB44C9B9  1329.7089  2.1%    43469  0.0306 1.00  1.46 UPDATE ts_job
>   11 0x28FE62860D7CB674  1191.3526  1.9%    41492  0.0287 1.00  1.62 DELETE ts_job
>   12 0xBEF98AB9DC360124  1051.4439  1.7%       82 12.8225 0.00  0.06 SELECT bugs profiles products components cc profiles longdescs profiles bug_status priority bug_group_map cc profiles
>   13 0x17529A6F568B3FDC   958.3152  1.5%       30 31.9438 0.00  1.20 SELECT bugs profiles products bugs_fulltext bug_status priority bug_group_map
>   14 0xC349581680738662   903.0830  1.4%    82761  0.0109 1.00  0.01 SELECT bz_schema
>   15 0x120BBAFEF1C0409E   825.8025  1.3%   747225  0.0011 1.00  0.00 SELECT group_group_map
>   16 0x9BC513EE1F68591A   677.0905  1.1%      195  3.4723 0.82 18.61 SELECT bugs profiles products bugs_fulltext bug_group_map
>   17 0xFD6C1550181F3C47   605.9063  1.0%       33 18.3608 0.00  0.18 SELECT bugs products bugs_activity priority bug_severity bug_group_map
>   18 0xDB77235BC049D9EA   566.9032  0.9%       21 26.9954 0.00  0.85 SELECT bugs profiles products bugs_fulltext bug_status priority bug_group_map
>   19 0x63B5684D097B8C22   537.6666  0.9%     1226  0.4386 1.00  0.03 SELECT profiles
>   20 0xD589403ADD556F0F   503.9018  0.8%    16851  0.0299 1.00  2.63 UPDATE logincookies
>   21 0x60BFFFA0AD192698   456.8404  0.7%      194  2.3548 0.50  0.13 SELECT bugs profiles products bug_status priority bug_group_map profiles
>   22 0x465EBD1B34D1D6F7   448.8139  0.7%      112  4.0073 0.27  0.07 SELECT cc bugs profiles products classifications components bug_cf_locale flags flagtypes keywords keyworddefs bug_status priority bug_group_map profiles
>   23 0x2C3ACFED3031A047   390.9027  0.6%   113308  0.0034 1.00  0.00 SELECT groups bug_group_map group_control_map
>   24 0xCA164384D0C1EED9   389.1624  0.6%       15 25.9442 0.00  0.05 SELECT bugs profiles products cc profiles longdescs profiles priority bug_severity bug_group_map cc profiles
>   25 0x6EBEBB46B650A9DF   384.9132  0.6%       42  9.1646 0.00  0.21 SELECT cc profiles bugs
>   26 0x8A7D16477C3BB176   367.9291  0.6%    39969  0.0092 1.00  0.77 INSERT ts_job
>   27 0xD66AF0EE14E12E94   353.8701  0.6%       14 25.2764 0.00  0.03 SELECT bugs profiles products bugs_fulltext bug_status priority bug_group_map cc
>   28 0x3D7CDAEE8CA789F4   353.3219  0.6%    11263  0.0314 1.00  0.02 SELECT groups user_group_map group_group_map
>   29 0x3D6CB8140BFF2F7E   345.2803  0.6%       13 26.5600 0.00  0.10 SELECT bugs profiles products components bugs_fulltext bug_status priority bug_group_map
>   30 0x6A5805478E0E2691   338.8452  0.5%       98  3.4576 0.20  0.49 SELECT cc bugs profiles products classifications components bug_cf_locale flags flagtypes keywords keyworddefs bug_status priority bug_group_map
>   31 0x76FE71F870A6D9AD   327.9010  0.5%       83  3.9506 0.41  0.02 SELECT bugs profiles products components cc profiles longdescs profiles bug_group_map cc profiles
>   32 0x3D749CCCFE9BDA45   324.1403  0.5%        9 36.0156 0.00  2.31 SELECT bugs profiles products bugs_fulltext bug_status priority bug_group_map
>   33 0xF62C0499755506DE   320.6598  0.5%       71  4.5163 0.80 18.18 SELECT bugs profiles products bugs_fulltext bug_group_map
>   34 0x85E90B6BDBD62523   316.2919  0.5%       82  3.8572 0.79 16.92 SELECT bugs profiles products bugs_fulltext bug_group_map
>   35 0xA218D88CF3CCD034   303.0826  0.5%      130  2.3314 0.50  0.10 SELECT bugs profiles products bug_status priority bug_group_map cc
>   36 0xB72D7BEE5253ACD9   293.7942  0.5%      164  1.7914 0.91 18.91 SELECT bugs profiles products bugs_fulltext bug_group_map
>   37 0xDB8630E77D64F40C   266.8190  0.4%       98  2.7226 0.50  0.02 SELECT bugs bug_group_map
>   38 0xD509CA49D6A0F4BC   256.8444  0.4%        7 36.6921 0.00  0.26 SELECT sanitized_bugs_activity fielddefs profiles attachments
>   39 0xA9A4E588F11C4441   252.5276  0.4%       83  3.0425 0.84 17.41 SELECT bugs profiles products bugs_fulltext bug_group_map
>   40 0x7D7AB96FF2A558DD   247.1740  0.4%      160  1.5448 0.50  0.01 SELECT bugs profiles products bug_group_map cc profiles
>   41 0x0FC7B18F561F60F6   237.3983  0.4%       97  2.4474 0.54  1.37 SELECT cc bugs profiles products classifications components bug_cf_locale flags flagtypes keywords keyworddefs bug_status priority bug_group_map
>   42 0x9328A7EA16E86277   231.6979  0.4%      112  2.0687 0.50  0.15 SELECT bugs profiles products bug_status priority bug_group_map
>   43 0xF6C637CA156B47A3   219.4801  0.4%       45  4.8773 0.21  0.23 SELECT bugs profiles products components cc profiles longdescs profiles bug_status priority bug_group_map cc profiles
>   44 0x93C7788E14E842CB   218.5257  0.3%      157  1.3919 0.50  0.02 SELECT bugs bug_group_map
>   46 0x63322F08414EA78E   197.9990  0.3%       71  2.7887 0.50  0.01 SELECT bugs profiles products components cc profiles bug_group_map cc profiles
>   48 0x0ECB0BADF3C9521B   194.1769  0.3%       34  5.7111 0.00  0.15 SELECT bugs profiles products components cc profiles longdescs profiles bug_status priority bug_group_map cc profiles
>   50 0x35AAC71503957920   184.0636  0.3%       32  5.7520 0.61 14.61 FLUSH
>   58 0xCA9BA5BBC37181C2   135.5664  0.2%       31  4.3731 0.27  0.92 SELECT cc bugs profiles products classifications components bug_cf_locale flags flagtypes keywords keyworddefs bug_status priority bug_group_map keywords
>   60 0xBADB96D7F151AF06   133.8165  0.2%       39  3.4312 0.49  0.01 SELECT bugs profiles products components
>   68 0x0B105355BC6A177F   108.7623  0.2%       67  1.6233 0.50  0.01 SELECT bugs profiles products components bug_status priority bug_group_map cc profiles
>   70 0x435ABBE649E8A8AE   104.4298  0.2%       21  4.9728 0.40  5.26 SELECT bugs profiles products bugs_fulltext bug_group_map
>   71 0x03E78A71DB839F5F   103.6961  0.2%       42  2.4690 0.50  0.02 SELECT attachments profiles bugs
>   79 0x5316013A2F525932    91.6121  0.1%       53  1.7285 0.50  0.13 SELECT bugs profiles products bug_status priority bug_group_map
>   80 0xD907EE4DF1B28EB5    90.3931  0.1%       17  5.3172 0.00  0.01 SELECT bugs profiles products components cc profiles longdescs profiles bug_group_map cc profiles
>   82 0x65269CB0B07DDF32    87.4674  0.1%       21  4.1651 0.17  0.29 SELECT bugs attachments flags flagtypes bug_group_map
>   84 0x15572DB386BDC5FC    85.3265  0.1%       21  4.0632 0.26  0.07 SELECT cc bugs profiles products classifications components bug_cf_locale flags flagtypes keywords keyworddefs bug_status priority bug_group_map
>   85 0x6F49547BF1B833CD    84.8891  0.1%       22  3.8586 0.45  0.05 SELECT bugs profiles products components cc profiles bug_status priority bug_group_map
>   88 0xFE5C7D3126ACC374    81.4058  0.1%       12  6.7838 0.00  0.23 SELECT cc bugs profiles products classifications components bug_cf_locale flags flagtypes keywords keyworddefs bug_status priority bug_group_map keywords
>   92 0xA1FB4CE96CDDCC51    75.2073  0.1%       54  1.3927 0.50  0.01 SELECT bugs bug_group_map
>   94 0x11D6116AC9B3170D    74.5636  0.1%       42  1.7753 0.50  0.02 SELECT bugs bug_group_map
>   98 0xC571703BAD8BDC1C    71.5647  0.1%       41  1.7455 0.90 16.19 SELECT bugs profiles products bugs_fulltext bug_group_map
>  100 0x302EB5FD3E8F5518    69.4257  0.1%       20  3.4713 0.72  9.10 SELECT bugs profiles products bugs_fulltext bug_group_map
Comment 26 Dustin J. Mitchell [:dustin] 2012-10-01 05:45:58 PDT
It's probably not surprising that two fingerprints appear at the top of both lists: 0x888F0EC503C28937 and 0xC4C898BE47EDEA1A.

As for numbers, I'll leave the stake in the ground that I set in comment 19:

"Examine the most frequent queries, grouped by fingerprint. Optimize the frequency of the top 10% of all queries down by 50%" and "Examine queries, grouped by fingerprint, and ordered by total time consumed during the log capture.  Optimize these queries to save an estimated 10% of total query time."

Although given the numbers above, it may be possible to set more aggressive goals? Gerv, my read of the bug says that's for you to decide.
Comment 27 Byron Jones ‹:glob› 2012-10-01 09:28:50 PDT
in order for this information to be useful, we'll need the complete sql for each query.

as we don't have access to backup2.db.phx1, would it be possible to provide the full report via another mechanism?
Comment 28 Gervase Markham [:gerv] 2012-10-01 09:53:47 PDT
Not sure it's up to me to be setting goals for the DBA team! But as glob said, we need the SQL for these top queries (suitably placeholdered) in order to work out where in the code they are coming from, why they are so slow, and what we might do about it. Is it possible to get that info?

Gerv
Comment 29 Gervase Markham [:gerv] 2012-10-01 09:54:13 PDT
Oh, and... thanks very much for doing all this! :-)

Gerv
Comment 30 Dustin J. Mitchell [:dustin] 2012-10-01 10:00:48 PDT
Sure, who-all should I email that to, and at what GPG keys?
Comment 31 Gervase Markham [:gerv] 2012-10-01 11:24:17 PDT
You can email it to me: http://www.gerv.net/GervPGPKey.asc
Glob and dkl presumably have keys as well.

Gerv
Comment 32 Byron Jones ‹:glob› 2012-10-01 11:32:29 PDT
http://drop.glob.com.au/gpg.asc
Comment 33 David Lawrence [:dkl] 2012-10-01 12:22:17 PDT
http://dl.dropbox.com/u/204930/gpg.asc
Comment 34 Dustin J. Mitchell [:dustin] 2012-10-01 12:30:06 PDT
GPG'd analysis sent to all three of you.
Comment 35 Sheeri Cabral [:sheeri] 2012-11-02 11:17:31 PDT
Well, it's already 1 month into the quarter. Let's just go with going through the top 10 queries by time and top 10 queries by count, and then if we do more than that it's gravy. Sound good?
Comment 36 David Lawrence [:dkl] 2012-11-02 13:28:55 PDT
(In reply to Sheeri Cabral [:sheeri] from comment #35)
> Well, it's already 1 month into the quarter. Let's just go with going
> through the top 10 queries by time and top 10 queries by count, and then if
> we do more than that it's gravy. Sound good?

That works for us as a start. The main thing we need to find out, from the logs provided, is what the full SQL queries are that take the most time, so we can look at the current schema those queries touch and make sure everything is configured optimally. If a query needs to be restructured, that's one thing; if we need to adjust the indexes for better performance, that's another. For queries that run with the same attributes over and over again, we can possibly look at caching that information at the application level. One thing we talked about was using memcached inside the cluster to store information that is requested frequently but changes little or not at all.

dkl
Comment 37 Dustin J. Mitchell [:dustin] 2012-11-02 13:35:46 PDT
The logs I sent along should contain that information, unless I misunderstand what you're looking for.
Comment 38 Sheeri Cabral [:sheeri] 2012-11-02 13:44:06 PDT
This is a q4 goal that the DB team was going to work on, so we'll be looking at each of the top 10 by time and by count, and offering suggestions. We'll also note the individual queries so you can determine if memcached is a better solution.

The idea was we were going to figure out a better metric than "top 10" but since it's already November, let's just pick something and run with it, so we optimize SOMETHING, rather than nothing.
Comment 39 Sheeri Cabral [:sheeri] 2012-11-13 14:21:22 PST
marking this incomplete, because we're just going with "optimize the top 10 queries".
