Closed
Bug 1207648
Opened 9 years ago
Closed 9 years ago
Getting list of performance signatures is rather slow
Categories
(Tree Management :: Perfherder, defect)
Tree Management
Perfherder
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: wlach, Assigned: wlach)
References
Details
Attachments
(2 files)
The endpoint to get performance signatures for a particular platform is rather slow. Example:
https://treeherder.mozilla.org/api/project/mozilla-inbound/performance/signatures/?platform=linux64
On some basic profiling, this boils down to a query like this:
SELECT DISTINCT `performance_signature`.`signature_hash`, `option_collection`.`option_collection_hash`, `machine_platform`.`platform`, `performance_signature`.`suite`, `performance_signature`.`test`, `performance_signature`.`extra_properties` FROM `performance_datum` INNER JOIN `performance_signature` ON ( `performance_datum`.`signature_id` = `performance_signature`.`id` ) INNER JOIN `machine_platform` ON ( `performance_signature`.`platform_id` = `machine_platform`.`id` ) INNER JOIN `option_collection` ON ( `performance_signature`.`option_collection_id` = `option_collection`.`id` ) WHERE (`performance_datum`.`repository_id` = 2 AND `performance_datum`.`push_timestamp` >= '2015-09-16 08:57:26' AND (`performance_signature`.`platform_id`) IN (SELECT U0.`id` FROM `machine_platform` U0 WHERE U0.`platform` = 'linux64'))
I'm not 100% sure why this is slow. Basically we're doing a join from the performance datum data to performance signature data to get signatures within a time interval, which is a little roundabout, but I believe everything is indexed, so shouldn't the query be fast?
Pythian: If you could tell me why the above query takes a few seconds (as opposed to the few hundred ms or less that I'd prefer), I would appreciate it. :) This is on the treeherder production database (the one you looked at in bug 1192976)
Flags: needinfo?(team73)
Comment 1•9 years ago
|
||
Acked.
We will take a look and get back to you.
Thanks,
Rohit Kumar
Comment 2•9 years ago
|
||
Hi,
I have collected the stats for this query and sending it to Lead Database Consultant for senior advice.
Thanks,
Rohit Kumar
Comment 3•9 years ago
|
||
Hi wlach,
I took a look to the SQL code and DDL of the involved tables and I found the culprit
of the performance issue. Is not in the SQL code itself, instead is related with
the index definition on the table.
I also rewrote the query and get rid off the subquery over the machine_platorm table
and I gain some performance enhacement, not much as I expected.
I ran an ANALYZE TABLE on performance_datum, compared the before and after values
and stats looks healthy. The estimated rows is 39270590 on that table, and the
estimated selectivity of the query over the filtered values on that table is
19635854 rows (far beyond what I expected). After examine the push_timestamp
distribution I realized, that the culprit of such bad estimation was in fact on
the index definition. So far, the count over the existent rows was far lower
than the estimation.
The issue is that none of the indexes are declared in the correct order in their
column definitions. None of the current indexes prioritize the push_timestamp
column, which is the best filter on the table. There is actually one index over
push_timestamp column, however is not situable for the query as it doesn't have
declared any other of the columns used for the filtering.
Checked distribution of the table:
```
mysql> select count(*), date(push_timestamp)
from performance_datum where repository_id = 2 and push_timestamp > '2015-09-15 00:00:00' group by 2;
+----------+----------------------+
| count(*) | date(push_timestamp) |
+----------+----------------------+
| 233968 | 2015-09-15 |
| 188960 | 2015-09-16 |
| 286753 | 2015-09-17 |
| 234844 | 2015-09-18 |
| 77190 | 2015-09-19 |
| 94022 | 2015-09-20 |
| 249304 | 2015-09-21 |
| 303299 | 2015-09-22 |
| 288898 | 2015-09-23 |
| 202522 | 2015-09-24 |
| 105795 | 2015-09-25 |
+----------+----------------------+
11 rows in set (10.78 sec)
mysql> select min(push_timestamp) from performance_datum;
+----------------------------+
| min(push_timestamp) |
+----------------------------+
| 2014-12-22 15:19:44.000000 |
+----------------------------+
1 row in set (0.00 sec)
```
The index order should be changed to push_timestamp, signature_id, repository_id
in the order from the column wiht best cardinality to the worst:
Original indexes:
```
KEY `performance_datum_repository_id_795c16d99ae557e2_idx`
(`repository_id`,`signature_id`,`push_timestamp`),
UNIQUE KEY `performance_datum_repository_id_7c997e2277477458_uniq`
(`repository_id`,`job_id`,`result_set_id`,`signature_id`
,`push_timestamp`),
KEY `performance_datum_ce25c56a` (`push_timestamp`)
```
This is the expected rows when running an explain using different indexes.
| Index | Exp. Rows
| performance_datum_repository_id_795c16d99ae557e2_idx | 19058298
| performance_datum_ce25c56a | 7405816
I increased innodb_stats_transient_sample_pages to 16, but no substantial changes
were found after the table analyze.
I expect that the expected rows with a correct index definition as suggested on
this post, will be far better than the estimated by `performance_datum_ce25c56a`
index.
Regards,
Flags: needinfo?(team73)
Comment 4•9 years ago
|
||
Also, the rewrote query is here (same amount of records returned, slight better performance not substantial):
mysql> explain SELECT DISTINCT `performance_signature`.`signature_hash`,
-> `option_collection`.`option_collection_hash`,
-> `machine_platform`.`platform`,
-> `performance_signature`.`suite`,
-> `performance_signature`.`test`,
-> `performance_signature`.`extra_properties`
-> FROM `performance_datum`
-> INNER JOIN `performance_signature` ON (`performance_datum`.`signature_id` = `performance_signature`.`id`)
-> INNER JOIN `machine_platform` ON (`performance_signature`.`platform_id` = `machine_platform`.`id`)
-> INNER JOIN `option_collection` ON (`performance_signature`.`option_collection_id` = `option_collection`.`id`)
-> WHERE (`performance_datum`.`repository_id` = 2
-> AND `performance_datum`.`push_timestamp` >= '2015-09-16 08:57:26'
-> AND machine_platform.platform = 'linux64')\G
Assignee | ||
Comment 5•9 years ago
|
||
I added two indexes to the table on stage with the structure you suggested:
KEY `signature_repo` (`signature_id`,`repository_id`),
KEY `timestamp_signature_repo` (`push_timestamp`,`signature_id`,`repository_id`),
(the former I used to experiment with a query unrestricted by time)
... but am not seeing substantial performance improvements. The query still takes on the order of 5 seconds to complete, which is not really acceptable.
I am most interested in knowing if it is even possible for this type of query to be fast (i.e. I want a sub-second query). I'm completely open to changing the table schema. But if it isn't, then I need to look into other types of caching outside of mysql.
Flags: needinfo?(team73)
Comment 6•9 years ago
|
||
Hi wlach,
If we want to keep this query fast, we may need to reduce the examined rows by narrowing the filters over performance_datum. One way to do this is to enclose the time filtering between 2 dates, and iterate through the days, forcing less examined rows per query.
Other way will be looking for another filter over that table (a new column filter on the query).
About changing the table schema, we can totally help with that. Can you point us to a documentation about this schema?
Regards,
Flags: needinfo?(team73)
Assignee | ||
Comment 7•9 years ago
|
||
(In reply to Pythian Team73 from comment #6)
> Hi wlach,
>
> If we want to keep this query fast, we may need to reduce the examined rows
> by narrowing the filters over performance_datum. One way to do this is to
> enclose the time filtering between 2 dates, and iterate through the days,
> forcing less examined rows per query.
>
> Other way will be looking for another filter over that table (a new column
> filter on the query).
So after some pondering, I think the best thing might be to just keep a seperate table of performance signature update information, that way we can limit the query to just the performance signature data. I have a proof of concept already.
Assignee | ||
Comment 8•9 years ago
|
||
Hey Mauro, can you give me some feedback on this approach? I think using another database table is better than the previous strategy of etag caching (which got killed with the performance refactor), but maybe there's something I'm missing.
You can ignore all the commits in the PR except for the last. I'm basing this off of unlanded work in bug 1175295, so I don't have to redo it later.
Assignee: nobody → wlachance
Attachment #8668679 -
Flags: feedback?(mdoglio)
Comment 9•9 years ago
|
||
Comment on attachment 8668679 [details] [review]
PR
:wlach can we have a chat about this? I want to be sure I get the right context of this change.
Flags: needinfo?(wlachance)
Attachment #8668679 -
Flags: feedback?(mdoglio)
Assignee | ||
Comment 10•9 years ago
|
||
We discussed over vidyo. I'm going to redo the patch to extend PerformanceSignature to include date/branch information (and remove the uniqueness constraint on the signature hash), since that avoids the need for a new table.
Flags: needinfo?(wlachance)
Assignee | ||
Comment 13•9 years ago
|
||
Comment on attachment 8668679 [details] [review]
PR
Ok, so this is adjusted as you suggest, to incorporate the repository / update information inside the series itself. I also added a migration script to move over / update old data.
There's a few things that will need to be cleaned up after the migration is finished:
1. We allow repository/last_updated to be null and shouldn't.
2. We should remove the old migration script.
Attachment #8668679 -
Flags: review?(mdoglio)
Updated•9 years ago
|
Attachment #8668679 -
Flags: review?(mdoglio)
Assignee | ||
Comment 14•9 years ago
|
||
Comment on attachment 8668679 [details] [review]
PR
Fixed save() override, and added a test for it: https://github.com/mozilla/treeherder/pull/1028/files#diff-9ef5663d9b42ca443d9e352ce4905ecbR148
Attachment #8668679 -
Flags: review?(mdoglio)
Updated•9 years ago
|
Attachment #8668679 -
Flags: review?(mdoglio) → review+
Comment 15•9 years ago
|
||
Commits pushed to master at https://github.com/mozilla/treeherder
https://github.com/mozilla/treeherder/commit/818c955bbe62d8e6eeeeb4860b1c293afc2b3984
Bug 1207648 - Test timestamp of perf datums after adaptation
We were checking that the value was correct, but not the timestamp.
https://github.com/mozilla/treeherder/commit/50a879fbfdc256aaf9492cc5b3ca3159ebfca87b
Bug 1207648 - Speed up getting performance signatures
We used to determine which performance signatures were in a repository by
looking at that repository's performance datums. Unfortunately, that method
was rather slow as the query had to work over millions of data points. So
instead, store repository and last modified information in the signature
table itself so it can be looked up quickly.
Assignee | ||
Comment 16•9 years ago
|
||
This is live now, waiting on bug 1215269 being fixed to resolve this though.
Comment 17•9 years ago
|
||
Commit pushed to master at https://github.com/mozilla/treeherder
https://github.com/mozilla/treeherder/commit/74c97147faf0a3b343399f26b92ea39f4521132a
Bug 1207648 - Fix bugs in performance signature optimization
This mostly worked fine except for a few cases where you're importing
existing data:
* import_perf_data was broken
* handle case where we don't have an initial value for the signature
last_updated field (probably only happens in the case of
import_perf_data, but still)
Assignee | ||
Comment 18•9 years ago
|
||
I think we're done here.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Comment 19•9 years ago
|
||
Assignee | ||
Comment 20•9 years ago
|
||
Forgot to remove some temporary code to support a migration after this landed, reopening to take care of that.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 21•9 years ago
|
||
Comment on attachment 8727975 [details] [review]
[treeherder] wlach:1207648-followup > mozilla:master
Cam, could you do a quick sanity check on this?
Attachment #8727975 -
Flags: review?(cdawson)
Updated•9 years ago
|
Attachment #8727975 -
Flags: review?(cdawson) → review+
Comment 22•9 years ago
|
||
Commit pushed to master at https://github.com/mozilla/treeherder
https://github.com/mozilla/treeherder/commit/f19d24fba92fb0b16741ccd2bf422c8910c086bd
Bug 1207648 - Remove temporary migration code and schemas
* We needed to allow the respository and last_updated fields to be null
temporarily while we waited to run a migration, but this is no longer
necessary
* Also remove temporary migration script, it's no longer needed
Assignee | ||
Updated•9 years ago
|
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•