Closed
Bug 1333156
Opened 8 years ago
Closed 8 years ago
Categories
(Tree Management :: Treeherder: API, defect)
Tree Management
Treeherder: API
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: erahm, Assigned: wlach)
References
()
Details
Attachments
(5 files)
I'm seeing 503's when trying to query jobs for a given resultset with the following params:
- count
- platform
- result_set_id
- option_collection_hash
This will eventually work after repeated attempts, wlach indicated these were timeouts and we need a new index to speed the queries up.
Assignee | ||
Comment 1•8 years ago
|
||
It seems like almost all the time (nearly 30 seconds, which is our limit) of this failing query is being spent in MySQL specifically on a query to the treeherder `job` table.. One example of slow query in action: https://rpm.newrelic.com/accounts/677903/applications/14179757/datastores#/overview/All/trace/?id=3662f688-e1a3-11e6-9c7c-b82a72d22a14_0_3930&metric=Datastore%252Fstatement%252FMySQL%252Fjob%252Fselect
Which is roughly:
SELECT `build_platform`.`architecture`, `build_platform`.`os_name`, `build_platform`.`platform`, `job`.`build_platform_id`, `reference_data_signatures`.`build_system_type`, `job`.`end_time`, `job`.`failure_classification_id`, `job`.`id`, `job`.`coalesced_to_guid`, `job_group`.`description`, `job_type`.`job_group_id`, `job_group`.`name`, `job_group`.`symbol`, `job`.`guid`, `job_type`.`description`, `job`.`job_type_id`, `job_type`.`name`, `job_type`.`symbol`, `job`.`last_modified`, `machine`.`name`, `machine_platform`.`architecture`, `machine_platform`.`os_name`, `job`.`option_collection_hash`, `machine_platform`.`platform`, `job`.`push_id`, `job`.`reason`, `reference_data_signatures`.`name`, `job`.`result`, `job`.`push_id`, `job`.`running_eta`, `reference_data_signatures`.`signature`, `job`.`start_time`, `job`.`state`, `job`.`submit_time`, `job`.`tier`, `job`.`who` FROM `job` INNER JOIN `machine_platform` ON (`job`.`machine_platform_id` = `machine_platform`.`id`) INNER JOIN `reference_data_signatures` ON (`job`.`signature_id` = `reference_data_signatures`.`id`) INNER JOIN `build_platform` ON (`job`.`build_platform_id` = `build_platform`.`id`) INNER JOIN `job_type` ON (`job`.`job_type_id` = `job_type`.`id`) LEFT OUTER JOIN `job_group` ON (`job_type`.`job_group_id` = `job_group`.`id`) INNER JOIN `machine` ON (`job`.`machine_id` = `machine`.`id`) WHERE (`job`.`repository_id` = %s AND `job`.`option_collection_hash` = %s AND `job`.`push_id` = %s AND `machine_platform`.`platform` = %s AND NOT (`reference_data_signatures`.`signature` IN (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s))) LIMIT 2000
Looking at the details, it seems like the query is using index `job_084f3fc9` which corresponds to machine_platform_id. Why would it not use the `job_e13f0cea` index, which corresponds to push_id and should have a much smaller selectivity? When I run `explain select * from job where machine_platform_id=9 and option_collection_hash='32faaecac742100f7753f0c1d0aa0add01b4046b' and push_id=149652;` (which is virtually the same query), it uses the `job_e13f0cea` index as expected.
Pythian, do you know why this might be the case? I'll attach the output of show create table in case it helps.
Flags: needinfo?(team73)
Assignee | ||
Comment 2•8 years ago
|
||
Comment 3•8 years ago
|
||
[beebe.pythian]
Hi we have received the page and have created https://secure.pythian.com/track/cr/1121453 to track the issue internally. We will take a look and let you know here what we are able to find out and suggestions if any.
Updated•8 years ago
|
Assignee: nobody → team73
Flags: needinfo?(team73)
Comment 4•8 years ago
|
||
Hello
I will check this now.
Comment 5•8 years ago
|
||
I worked with Willian in IRC.
This is treeherder in AWS RDS and we do not have access yet.
But we can work with it with the commands bellow.
We need:
The whole query, with values instead %s. The explain of it and tables details as bellow:
EXPLAIN EXTENDED
SELECT `build_platform`.`architecture`, `build_platform`.`os_name`, `build_platform`.`platform`, `job`.`build_platform_id`, `reference_data_signatures`.`build_system_type`, `job`.`end_time`, `job`.`failure_classification_id`, `job`.`id`, `job`.`coalesced_to_guid`, `job_group`.`description`, `job_type`.`job_group_id`, `job_group`.`name`, `job_group`.`symbol`, `job`.`guid`, `job_type`.`description`, `job`.`job_type_id`, `job_type`.`name`, `job_type`.`symbol`, `job`.`last_modified`, `machine`.`name`, `machine_platform`.`architecture`, `machine_platform`.`os_name`, `job`.`option_collection_hash`, `machine_platform`.`platform`, `job`.`push_id`, `job`.`reason`, `reference_data_signatures`.`name`, `job`.`result`, `job`.`push_id`, `job`.`running_eta`, `reference_data_signatures`.`signature`, `job`.`start_time`, `job`.`state`, `job`.`submit_time`, `job`.`tier`, `job`.`who` FROM `job` INNER JOIN `machine_platform` ON (`job`.`machine_platform_id` = `machine_platform`.`id`) INNER JOIN `reference_data_signatures` ON (`job`.`signature_id` = `reference_data_signatures`.`id`) INNER JOIN `build_platform` ON (`job`.`build_platform_id` = `build_platform`.`id`) INNER JOIN `job_type` ON (`job`.`job_type_id` = `job_type`.`id`) LEFT OUTER JOIN `job_group` ON (`job_type`.`job_group_id` = `job_group`.`id`) INNER JOIN `machine` ON (`job`.`machine_id` = `machine`.`id`) WHERE (`job`.`repository_id` = 1 AND `job`.`option_collection_hash` = '32faaecac742100f7753f0c1d0aa0add01b4046b' AND `job`.`push_id` = 149652 AND `machine_platform`.`platform` = 'linux64' AND NOT (`reference_data_signatures`.`signature` IN ('1866e74b7ea5e689b7c4422358ff20fc033ddce9', 'ee795b14272590ad5583d7665074cdcbf109f076', 'b6bb6996a53eb7ed26fa46b38bb5179996ba6730', '98e9cd9901d9852353e56ecb0a905c46a7a20a46', '7da5ca2040922f566a4dfbbd2a87c8e3df2b8fc1', '007841610bf7d1816290de7dd7d2989b075d29d0', 'cf8aa2f786ae557071385f9d5fc69d3c51c8e087', '039bb207007684a43ca04cf2348daa4094e530f7', '871119ce275970cafe3de0aa34b9899c931876e1', 'c1b40b291989bb6efe349c154cf7c5c0013dbfea', 'd10acd9c22d4df0b29811b8bb0c01cd8b2ae3e53', '6b8f49050e2bd7ad5bcfc470225d3bbf154f9c89', '13e6126d6a8b1b9e5662d65f8f8ae9b4a930abef', '1aef0af9c2068fdfba493dec72bc3941bfae484f', '7484b9c8cc01ff1018b4d1caf615624126a3fe7e', '9d6c6e92bceeafbde082ee4b0d27fc02a5b8ceb1', '0427ed39858e4ea4e6c38158f15c58c2db1ea7b3', 'c2d151ddcff65fc5c54152d6adcc0dff25bc0c4a', '1879740c9217f1e9200c54b7e712708512f73b9c', '7eeb5b9e2dcfa1f90ece3e172000328c55b4ecec', 'e08100c7a8847e0e5aaa1238f197f5c89fb7fa21', 'f838c7385c9dcf54440d86ef0d54fa6f6cd6d0cf', '41daad18f9d2530c58ba009d3c056b59c61bcc20', '6aa9c7eb1db876ec588660695898fd625a9fa397', '96cf092f14a8456a74d5c52eba536b34c4e47ceb', 'c9f6badf9e1807132759e0cf48323eb63ad15db8', '169ddebdd4645d31caff0597a8c481eb35b96612', '47d76a264d20eacf09ebb82fba59b2be06d45eb7', '7de94eacf32c217fb63677a8897635b318d0d9df', '28fe53bcabd3db69b02803b130f7952534165588', '9417d5aa7cb458f9435321ee351b9af5c19b24a5', '675243a7d89e5ca118e8eef2817e3908082997c7', '7c9c8231d81440c92b8bcf0d7bde560a71ef9e58', '401b8124ca4787ec0b59df59acf6ff04ae21037b', '631cff2236ee3620f0223318610e2f3b9614973e', 'd2ad40fe55567257850534f5b84a743c948dfe6f', '9e850b2f7a559f650627b0f5d8dedf9f2f896902', '1f692a3d7f3639cee34fc316c9d6fac64df38c70', '13eb594e8fc76a6b1f17f70a436bfe6335acc546', 'c80e79f726647a81025e68939004c7ceca91cf4e', 'feaee4790e3b711087f01785bf8d8f0366b464b2', '12c64899024073242237d958eb49983de470a889', 'd2639e70dadf5ee512a65ca42afcf1505ea6ecae', 'd20ed2eaa63bbb388b8841d72dedbeb1d9807b44', '35e6802055e3acd6f5d81488caf1fa3378e75cb3', '55bcb080c3a7cbe3c6e9a2a75ccc73ca86010a4c', 'c941dbeb0728bfed421f1472851dab6fee41de6c', 'b66fab4fd043f1522b172de57b24e334999b34e9', '7e43474eca35b9d069181966eba4a3014b15141c', '4bbb15e2275e80f866a552f778a8277bfbcdd476', '0c71d23cd33292af80ee7da5a7636baf8ea86c04', '68aca0d7a54860897561fceae9965268979df6d9', 'b05ffc5396d224303d88ed13c13227748e205829', '22189572647c81b305b7c8f1a4aa634062953f24', 'f6bf8bb6a3263f7a2dc0fcdb6c517f95c29ecae3', '321d309600dfc313fc51a4954b0840754f2ce8c7', '425110eb7c28c63e4c7ec3955735c67311251f68', '4732f93e95c9d42b082e4c085a57e59b2dbca60f', 'c9a7f5907b3291aeff8ad3f8648f94f250e4f03e', '5fac0d324a0cb1f35e89599c8be55cb971b4ad54', 'e837f5101ed1c5ad93c4fd2ef1490322df33acca', '3af1aca4e6bd7be124416ae2ac3192701d1ad66d', 'f834afe147c1a31589a9fca14bcb3dd5a1c033bd', 'bc9e05a1cb8b3882b58ada568430404c36046b53', '92bad2d8f4dde22a6314e0b23bbc58216eceb150', 'f921c5ec240836c8c3c8c216edf7753a64785c30', '102e586e73be6c2034643e7419f3d8bba972ed1f', '1ab7b7da75bf2e5ae15e4993f7b59660a0cf8d74', '613651097453de2d5288555e00161e73801c8e21', '4afbeff7b00a79e0a1bd42d80874cc52ec589adf', '460a444bd2e112a82684eb6a76e6fb712fde7f80', '07731be9af5567fc2c76da0772c9aa1cb5e19e4a', 'f8ca283541265463f1731f9de75e9f40188b1c1d', 'ab0ea526473f61a621f4f793cef4d2a5c758b937', 'ca5209d7c28df0157fc6a54b9a802c7e4dfecb81', '1213cbdddedba1dac522ab882b40b45f7182964c', 'c1158f8434513b9ead9966a5a6fe7d9de07cc5b0', '4d576c0a16576bde9d3a063f5e13ebe369d09a12', 'e5dba17a27d135df1a296c09ce2509ae70f28b4f', '785c8f092f77bbae1204b00d9a8a375d4e0ed098', 'b0c6fbf7250a9e87de50ff144d5eb6055eb95d3d', '1434056109865e764f592bf1def1d59e10018322', '564f53fce5027211516c2c027aea75dd1e8bef89', '109d4b752419c93972e63d58271301fba8d72b59', 'e0928fbfb779481c8fe7ea02562040bc7989848c', 'ba5b3b671fdd55200ed1f585acf479d126fea065', '724b5fd8fe3e3c34b7011bb7a962e52da5a98029', '1d5b9d8ee70f4511d04b8c12805dc1e7b2bd18f5', '5f66682ad0806b89d51d307600641b40a42c983a', 'da1395a0a936dd93d2dfdc4b47eeed73efbc6fd2', '5752bc2f4918c9e91be79746607b8fe4ce754292', 'fd0b1d6ee070aecaec5e3ff0854c20494a211d11', 'b7330a0037bf015b168e81af12dc13bd6829df39', '183c584367dbf3df158c793f3b2514fd36915882', '234d18ad6b04f48c18a83b9f53f8ca572c5add41', 'fd07f610cbcf85618801dfc44104e56fb3f3d06a', '647b7a568fb8a70225d319b2e2b774fc6a2e19cf', '23ef50128a9057e05c6b51b7d301c5d8caf5cae8', '9d20b39ad418aba146c55d5f9b8ccf1f5877f19c', '0932dc7296525455f967f48b6bbc7a0469d060f2', 'bb996b80f36fb42030c28c8a80f062a1bfb1191d', '8fcdb4a0a4ab98a2346289c4f0c7c385ea128a55', '23b2e58a8556f7b681ca44641ced2fce8878724b', '94e75e203a0bebe791cf42221b7a9e5d9716c26e', 'ecd18d394289b0906197cf2597424c97dba4ceea', '4cd245a1b13c48a774ba564bcab04ac27af8f50f', '5b5f5422addff15f3506ee45657cf9408c57bd93', '72128887e65a8c3bb077dd5010842c2cb30bbb46', '0ecb431c05bb358b8c7ef339067b1fa93cb86185', 'b22ecc5a3023ea488c920e0ab2e901bc1841a8d4', '680b20f98189abed1bff97655b6c8af559a8b6b0', '1a34630c7e6df7b4b5f20c20b5ec6d3d7c98327b', 'bc5c22318de8a44fa4bcf5404b1a2fc868f40df4', '84075881155353506baab7232388c7d3d8f0c3e9', '882d7be69c94a690d0d5e11c693f9a0eee5ab440', '453736e6b6e5344e68652a4447d08f3119ae72e3', '1119df57fab2702139ef27e0cc56e3ecfcf31f7d', '92efc1443126126e8f2163d2d2cb69889ab95e19', 'd02e880d4df195cb08dfd73a41e637d65517cdac', '49b9e0315f1bc351696d1a76c66f0a04c39fde2b', '6230bccb1b94c2c0c702fa1181f24d6d632fdea6', 'db9dfc94a2bd5700fc81efc2fac078dc9ad7fa86', '8fddcfc02120feb9e750be7781fee4c4b67da57b', '3ac004d6562c7b355392f5028830edd23982d7f9', '3da319ff4c54bd1252c1b53abc71cfdc835b81b0', '398f43b5d9c84e205e76b4fb973e95ba25848511', '277b8afeb7375c08ddfc743377b93a2a987cdaf1', '7c6aefeaf65c3e6b0b60a849cea6ee69d3450b18', 'd2f9439b342674b68c4d070e6d265235bdef941e', '1ec1a420eb6d7ea5f66a42ee3cfd59df80424232', '0ed50f8774487daadb1f7e292c7d4788c10f6c46', '8b45698b9d349fc9194b7f1df786136fcd5ca48b', '34d6dbcb173877afef52d68b1b4d96b772c452cf', 'bde8782e3839c47db5f04fe5dcde6bb67ad99c5d', 'c1481e33d728fca794b7cffe2283f3a9fd000e9f', '3f262b0151f118b54e6f927660fcab0754753d04', '84c69e58886f56c5c9e2f029903d27d4a14e5e6c', '58030bf827fc4dba7a02617afaca38672af597aa', '10dd03e3b56708f24da60840fdf6de4ada32bd33', '87ea8fdca4f0d375b85dfc7316cb1fc46da7caa2', '2c07f444248be423197b94953344e87d1cccd1fc', 'c8565df87730911d78e20d9090f3476d43d5a966', '3b95fb59fc991d2053c671f622a1468ad2e39348', '86f6bd00b46316cfc01f3e34f98d31bf2316a329', 'cb09b58a8293353407cc8e95d9f77b68e842ad93', 'db314ee52dfe95cd616a6941a5ce04f34ecc5b0e', '9b7a50f0b40228508154e3649c5dcaa0ffedeb6c', '5fb95afc7ad7f88f97618ebf81e2c51150fcec47', '5b986f612d292c1a5537db50b81b579b1c279011', '7e2694dfcf5cbbbcc8cf463f0e51d37c464ee816', '41c8c5e627ee2240f2417f11fe828d469ea19154', '9efa70d580552ac1b347563f378d5b38c70f7b8c', '33b4292140fc8542e6bcc43dd28d9e3bda69e6fd', '94a1f7cc6bb66d3b733d1db0916a39305408a02a', '52a7a3b9c1839687f34e72e752e0b5d880c4df43', '09c63f0fa3b0212f5c60f26a0e09e07ba501414b', 'c96c023944ac6d3d92f9541385b6960e9467fa7b', '1771671ab3acc343c550cc4b2da5e92ae78561f9', '350949218e9a1067b8236e25769c5756ddacf096', '73ea0c7de78d8a7c5d30ec467359943fdd974dae', '6871c70035f90494cc024389166500422c915034', '9d92dad5008cacec2a53081204df6c5740e2940a', '9f9c008ef6b934c8307fca7f04dd1d188e419ec0', '8317cf4ac1948c27bfb754bae3011f3d9beb6efb', '0a0c479330778eb0debcd5ed38e0856c0e4cb7c6', '02e3d67b6148e922f9fb8551f775788e4e14a225', 'f057433c2c1fb02fc206b9593ced9c45bbcd04b2', 'e8ce002bc206108310562d66308c2ca4660c2ae2', '7884a03308431f0b5df4ae85af8d5c8504bba52f', '2450f87c8342057dee8bc620cf083cdc99e54a03', 'cdccba6f0088aa4c3570e110628c14fd924c22b5', 'c3b912773f247c01f0937f51dff80c6a7c4b65ba', '845d782f9d3f7d9bb91ee9acb3baab7f4f866be3', '33f9b3f168e11bcf88db54d0f9a575dcd8bbd353'))) LIMIT 2000;
show table status like 'job';
show table status like 'machine_platform';
show table status like 'reference_data_signatures';
show table status like 'build_platform';
show table status like 'job_type';
show table status like 'job_group';
show table status like 'machine';
show create table job\G
show create table machine_platform\G
show create table reference_data_signatures\G
show create table build_platform\G
show create table job_type\G
show create table job_group\G
show create table machine\G
show indexes from job;
show indexes from machine_platform;
show indexes from reference_data_signatures;
show indexes from build_platform;
show indexes from job_type;
show indexes from job_group;
show indexes from machine;
Assignee | ||
Comment 6•8 years ago
|
||
Comment 7•8 years ago
|
||
First try of altered query
Assignee | ||
Comment 8•8 years ago
|
||
tl;dr of my conversation with pythian: machine_platform has a lower cardinality than push_id, so mysql wants to use it as an index in this case even though the query is much less optimal.
I'm not sure of a general solution here. Options:
1. Add an index for this specific case.
2. Try to give special hints to mysql to get it to plan queries better for this table.
3. Try to coerce the queries generated by django to use specific indexes (via http://django-mysql.readthedocs.io/)
4. Give up and get people to use either treeherder's read-only mysql replica (perhaps via redash) or activedata for these kinds of custom queries.
(2) and (3) sound horrible, but (1) might be worth a try. Eric, can you give more detail on what you're trying to do here? If you just want to explore the data, you can now query the treeherder database directly at https://sql.telemetry.mozilla.org Here's an example:
https://sql.telemetry.mozilla.org/queries/1802
Comment 10•8 years ago
|
||
Corrected query with plan
Comment 11•8 years ago
|
||
I updated the altered query that can be used always when push_id is available for where clause (testing is needed).
Other suggestions:
Pythian 1) ANALYZE TABLE job; ANALYZE TABLE machine_platform;
This may not help for sure, but can reduce the not optimal plans
Pythian 2) When push_id is available in Where clause, use the inner query
Pythian 3) Use hints when possible, when know for IGNORE INDEX or USE INDEX as described in https://dev.mysql.com/doc/refman/5.7/en/show-index.html
I know all are hard to generalize and the sub optimal plans are a pain the queries with filters in different tables and joins. This is a process of improving the model.
A new index MAYBE will help, but need to be tested with EXPLAIN PLAN and real testing.
CREATE INDEX idx_pushid_repositoryid ON job(push_id,repository_id);
ANALYZE TABLE job;
This need to be tested as well.
I will ask another DBA to look at it to see if second eyes can be more helpful.
Thanks
Marcelo
goncalves@pythian.com
Comment 12•8 years ago
|
||
I have opened a Pythian CR for that: https://secure.pythian.com/track/cr/1121468
Marcelo
goncalves@pythian.com
Reporter | ||
Comment 13•8 years ago
|
||
(In reply to William Lachance (:wlach) from comment #8)
> Eric, can you give
> more detail on what you're trying to do here?
I'm using the treeherder API to retrieve a list of jobs associated with a given push that was run on a given platform (so all the jobs for rev 123 from mozilla-central that were run on linux64 debug). I then retrieve the URLs of the log files for those jobs so I can download them. This is for the log spam project [1], this specific piece is is the log retrieval portion [2]. It's reasonably sane.
If there's a straightforward alternative I'm happy to implement it.
> If you just want to explore
> the data, you can now query the treeherder database directly at
> https://sql.telemetry.mozilla.org
I'd much rather use a supported and stable API, no SQL please :)
[1] https://github.com/EricRahm/log-spam-hell/
[2] https://github.com/EricRahm/log-spam-hell/blob/8f7de7121866f310c4f5e7a4c89359d82d848ab4/logspam/logs.py#L170-L203
Flags: needinfo?(erahm)
Assignee | ||
Comment 14•8 years ago
|
||
Ok, let's try and just create an index for erahm's use case. If this happens over and over we can consider another strategy. Experimenting on stage...
Assignee: team73 → wlachance
Comment 15•8 years ago
|
||
Assignee | ||
Comment 16•8 years ago
|
||
Comment on attachment 8830410 [details] [review]
[treeherder] wlach:1333156 > mozilla:master
I tested this on stage and it seemed to work well with the query above (it returned in well under a second).
This PR also removes an index that we don't need anymore, to maintain balance in the force. ;)
Attachment #8830410 -
Flags: review?(emorley)
Updated•8 years ago
|
Attachment #8830410 -
Flags: review?(emorley) → review+
Comment 17•8 years ago
|
||
Commits pushed to master at https://github.com/mozilla/treeherder
https://github.com/mozilla/treeherder/commit/5e09a3daae94978f4e2d88d1fbe0952244cbe18b
Bug 1333156 - Remove unnecessary job indexes/index specifications
* No need to have a special one for project_specific_id (the composite index
on repository and project_specific_id covers the one legitimate use case)
* Remove duplicate index definition for guid (unique implies index)
https://github.com/mozilla/treeherder/commit/89a327d1fe5a53071741636fc5514210af13132b
Bug 1333156 - Add an index to speed up job requests w/ option collection, push, machine platform
Reporter | ||
Comment 18•8 years ago
|
||
Confirmed on my end on the stage server, definitely seems to be < 1s.
Updated•8 years ago
|
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•