Closed Bug 1207648 Opened 9 years ago Closed 9 years ago

Getting list of performance signatures is rather slow

Categories

(Tree Management :: Perfherder, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: wlach, Assigned: wlach)

References

Details

Attachments

(2 files)

The endpoint to get performance signatures for a particular platform is rather slow. Example:

https://treeherder.mozilla.org/api/project/mozilla-inbound/performance/signatures/?platform=linux64

After some basic profiling, this boils down to a query like this:

```
SELECT DISTINCT `performance_signature`.`signature_hash`,
       `option_collection`.`option_collection_hash`,
       `machine_platform`.`platform`,
       `performance_signature`.`suite`,
       `performance_signature`.`test`,
       `performance_signature`.`extra_properties`
FROM `performance_datum`
INNER JOIN `performance_signature` ON (`performance_datum`.`signature_id` = `performance_signature`.`id`)
INNER JOIN `machine_platform` ON (`performance_signature`.`platform_id` = `machine_platform`.`id`)
INNER JOIN `option_collection` ON (`performance_signature`.`option_collection_id` = `option_collection`.`id`)
WHERE (`performance_datum`.`repository_id` = 2
       AND `performance_datum`.`push_timestamp` >= '2015-09-16 08:57:26'
       AND (`performance_signature`.`platform_id`) IN (SELECT U0.`id` FROM `machine_platform` U0 WHERE U0.`platform` = 'linux64'))
```

I'm not 100% sure why this is slow. Basically we're doing a join from the performance datum data to the performance signature data to get signatures within a time interval, which is a little roundabout, but I believe everything is indexed, so shouldn't the query be fast?

Pythian: if you could tell me why the above query takes a few seconds (as opposed to the few hundred ms or less that I'd prefer), I would appreciate it. :) This is on the treeherder production database (the one you looked at in bug 1192976).
Flags: needinfo?(team73)
Acked. We will take a look and get back to you. Thanks, Rohit Kumar
Hi, I have collected the stats for this query and am sending them to our Lead Database Consultant for senior advice. Thanks, Rohit Kumar
Hi wlach,

I took a look at the SQL code and the DDL of the involved tables, and I found the culprit of the performance issue. It is not in the SQL code itself; instead it is related to the index definitions on the table.

I also rewrote the query to get rid of the subquery over the machine_platform table, and gained some performance improvement, though not as much as I expected.

I ran an ANALYZE TABLE on performance_datum, compared the before and after values, and the stats look healthy. The estimated row count on that table is 39270590, and the estimated selectivity of the query over the filtered values is 19635854 rows (far beyond what I expected). After examining the push_timestamp distribution I realized that the culprit of such a bad estimate was in fact the index definition: the actual count over the existing rows was far lower than the estimate.

The issue is that none of the indexes declare their columns in the correct order. None of the current indexes prioritize the push_timestamp column, which is the best filter on the table. There is actually one index over the push_timestamp column; however, it is not suitable for this query, as it doesn't declare any of the other columns used for filtering.
Checked distribution of the table:

```
mysql> select count(*), date(push_timestamp) from performance_datum where repository_id = 2 and push_timestamp > '2015-09-15 00:00:00' group by 2;
+----------+----------------------+
| count(*) | date(push_timestamp) |
+----------+----------------------+
|   233968 | 2015-09-15           |
|   188960 | 2015-09-16           |
|   286753 | 2015-09-17           |
|   234844 | 2015-09-18           |
|    77190 | 2015-09-19           |
|    94022 | 2015-09-20           |
|   249304 | 2015-09-21           |
|   303299 | 2015-09-22           |
|   288898 | 2015-09-23           |
|   202522 | 2015-09-24           |
|   105795 | 2015-09-25           |
+----------+----------------------+
11 rows in set (10.78 sec)

mysql> select min(push_timestamp) from performance_datum;
+----------------------------+
| min(push_timestamp)        |
+----------------------------+
| 2014-12-22 15:19:44.000000 |
+----------------------------+
1 row in set (0.00 sec)
```

The index column order should be changed to (push_timestamp, signature_id, repository_id), going from the column with the best cardinality to the worst.

Original indexes:

```
KEY `performance_datum_repository_id_795c16d99ae557e2_idx` (`repository_id`,`signature_id`,`push_timestamp`),
UNIQUE KEY `performance_datum_repository_id_7c997e2277477458_uniq` (`repository_id`,`job_id`,`result_set_id`,`signature_id`,`push_timestamp`),
KEY `performance_datum_ce25c56a` (`push_timestamp`)
```

These are the expected rows when running an EXPLAIN using the different indexes:

```
Index                                                  Exp. Rows
performance_datum_repository_id_795c16d99ae557e2_idx   19058298
performance_datum_ce25c56a                              7405816
```

I increased innodb_stats_transient_sample_pages to 16, but no substantial changes were observed after re-analyzing the table. I expect that the expected rows with a correct index definition, as suggested in this post, will be far better than those estimated by the `performance_datum_ce25c56a` index.

Regards,
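For reference, the suggested column order could be expressed as DDL roughly like the following (the index name here is illustrative, not from the actual schema; this is a sketch of the recommendation, not the exact statement we ran):

```
-- Hypothetical sketch: composite index leading with push_timestamp,
-- the column identified above as the best filter on the table.
ALTER TABLE performance_datum
  ADD KEY `performance_datum_push_ts_sig_repo_idx`
    (`push_timestamp`, `signature_id`, `repository_id`);
```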
Flags: needinfo?(team73)
Also, the rewritten query is here (same number of records returned; slightly better performance, but nothing substantial):

```
mysql> explain SELECT DISTINCT `performance_signature`.`signature_hash`,
    -> `option_collection`.`option_collection_hash`,
    -> `machine_platform`.`platform`,
    -> `performance_signature`.`suite`,
    -> `performance_signature`.`test`,
    -> `performance_signature`.`extra_properties`
    -> FROM `performance_datum`
    -> INNER JOIN `performance_signature` ON (`performance_datum`.`signature_id` = `performance_signature`.`id`)
    -> INNER JOIN `machine_platform` ON (`performance_signature`.`platform_id` = `machine_platform`.`id`)
    -> INNER JOIN `option_collection` ON (`performance_signature`.`option_collection_id` = `option_collection`.`id`)
    -> WHERE (`performance_datum`.`repository_id` = 2
    -> AND `performance_datum`.`push_timestamp` >= '2015-09-16 08:57:26'
    -> AND machine_platform.platform = 'linux64')\G
```
I added two indexes to the table on stage with the structure you suggested:

```
KEY `signature_repo` (`signature_id`,`repository_id`),
KEY `timestamp_signature_repo` (`push_timestamp`,`signature_id`,`repository_id`),
```

(The former I used to experiment with a query unrestricted by time.)

...but I am not seeing substantial performance improvements. The query still takes on the order of 5 seconds to complete, which is not really acceptable. I am most interested in knowing whether it is even possible for this type of query to be fast (i.e. I want a sub-second query); I'm completely open to changing the table schema. If it isn't, then I need to look into other types of caching outside of MySQL.
Flags: needinfo?(team73)
Hi wlach,

If we want to keep this query fast, we may need to reduce the number of examined rows by narrowing the filters over performance_datum. One way to do this is to enclose the time filtering between two dates and iterate through the days, forcing fewer examined rows per query.

Another way would be to look for an additional filter over that table (a new column filter in the query).

About changing the table schema: we can totally help with that. Can you point us to documentation about this schema?

Regards,
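The day-by-day iteration idea can be sketched in Python (the function and parameter names are hypothetical, not from the Treeherder codebase):

```python
from datetime import datetime, timedelta

def day_windows(start, end):
    """Yield (window_start, window_end) pairs covering [start, end)
    in one-day steps, so each query only has to examine at most a
    day's worth of performance_datum rows."""
    current = start
    while current < end:
        next_day = min(current + timedelta(days=1), end)
        yield (current, next_day)
        current = next_day

# Each window would then become a bounded push_timestamp filter
# (push_timestamp >= window_start AND push_timestamp < window_end),
# with the per-day result sets merged on the client side.
```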
Flags: needinfo?(team73)
(In reply to Pythian Team73 from comment #6)
> Hi wlach,
>
> If we want to keep this query fast, we may need to reduce the examined rows
> by narrowing the filters over performance_datum. One way to do this is to
> enclose the time filtering between 2 dates, and iterate through the days,
> forcing less examined rows per query.
>
> Other way will be looking for another filter over that table (a new column
> filter on the query).

So after some pondering, I think the best thing might be to just keep a separate table of performance signature update information; that way we can limit the query to just the performance signature data. I have a proof of concept already.
Attached file PR
Hey Mauro, can you give me some feedback on this approach? I think using another database table is better than the previous strategy of etag caching (which got killed with the performance refactor), but maybe there's something I'm missing. You can ignore all the commits in the PR except for the last. I'm basing this off of unlanded work in bug 1175295, so I don't have to redo it later.
Assignee: nobody → wlachance
Attachment #8668679 - Flags: feedback?(mdoglio)
Comment on attachment 8668679 [details] [review]
PR

:wlach can we have a chat about this? I want to be sure I get the right context of this change.
Flags: needinfo?(wlachance)
Attachment #8668679 - Flags: feedback?(mdoglio)
We discussed over vidyo. I'm going to redo the patch to extend PerformanceSignature to include date/branch information (and remove the uniqueness constraint on the signature hash), since that avoids the need for a new table.
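The shape of this approach can be sketched in plain Python (a toy stand-in for the Django models; all class and method names here are illustrative, not the actual Treeherder code):

```python
from datetime import datetime

class SignatureStore:
    """Toy model of a denormalized PerformanceSignature table: each
    signature carries its repository and a last_updated timestamp, so
    listing recently-active signatures never has to scan the (huge)
    performance datum table."""

    def __init__(self):
        # (repository, signature_hash) -> last_updated
        self.signatures = {}

    def ingest_datum(self, repository, signature_hash, push_timestamp):
        # On every ingested datum, bump the signature's last_updated
        # so the signature row stays current.
        key = (repository, signature_hash)
        prev = self.signatures.get(key)
        if prev is None or push_timestamp > prev:
            self.signatures[key] = push_timestamp

    def recent_signatures(self, repository, since):
        # Fast path: filter over signatures only, not millions of datums.
        return sorted(
            sig for (repo, sig), updated in self.signatures.items()
            if repo == repository and updated >= since
        )
```

Note that dropping the uniqueness constraint on signature_hash is what allows the same hash to appear once per repository with its own last_updated value.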
Flags: needinfo?(wlachance)
Comment on attachment 8668679 [details] [review]
PR

Ok, so this is adjusted as you suggest, to incorporate the repository / update information inside the series itself. I also added a migration script to move over / update old data.

There are a few things that will need to be cleaned up after the migration is finished:

1. We allow repository/last_updated to be null and shouldn't.
2. We should remove the old migration script.
Attachment #8668679 - Flags: review?(mdoglio)
Attachment #8668679 - Flags: review?(mdoglio) → review+
Commits pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/818c955bbe62d8e6eeeeb4860b1c293afc2b3984
Bug 1207648 - Test timestamp of perf datums after adaptation

We were checking that the value was correct, but not the timestamp.

https://github.com/mozilla/treeherder/commit/50a879fbfdc256aaf9492cc5b3ca3159ebfca87b
Bug 1207648 - Speed up getting performance signatures

We used to determine which performance signatures were in a repository by looking at that repository's performance datums. Unfortunately, that method was rather slow, as the query had to work over millions of data points. So instead, store repository and last modified information in the signature table itself so it can be looked up quickly.
Depends on: 1215269
This is live now; waiting on bug 1215269 to be fixed before resolving this, though.
Depends on: 1216153
Commit pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/74c97147faf0a3b343399f26b92ea39f4521132a
Bug 1207648 - Fix bugs in performance signature optimization

This mostly worked fine except for a few cases where you're importing existing data:

* import_perf_data was broken
* handle the case where we don't have an initial value for the signature last_updated field (probably only happens in the case of import_perf_data, but still)
Depends on: 1217630
I think we're done here.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Forgot to remove some temporary code to support a migration after this landed, reopening to take care of that.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment on attachment 8727975 [details] [review]
[treeherder] wlach:1207648-followup > mozilla:master

Cam, could you do a quick sanity check on this?
Attachment #8727975 - Flags: review?(cdawson)
Attachment #8727975 - Flags: review?(cdawson) → review+
Commit pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/f19d24fba92fb0b16741ccd2bf422c8910c086bd
Bug 1207648 - Remove temporary migration code and schemas

* We needed to allow the repository and last_updated fields to be null temporarily while we waited to run a migration, but this is no longer necessary
* Also remove the temporary migration script; it's no longer needed
Status: REOPENED → RESOLVED
Closed: 9 years ago9 years ago
Resolution: --- → FIXED