Closed
Bug 1246196
Opened 9 years ago
Closed 9 years ago
loading 'Similar jobs' is slow/can take very long
Categories
(Tree Management :: Treeherder, defect, P2)
Tree Management
Treeherder
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: aryx, Assigned: camd)
Details
Attachments
(2 files)
After 'Similar jobs' has been clicked, it takes in general very long until the job list has been retrieved, e.g. tried it for a job: 1.89 KB got sent after 14 seconds.
Is this an indexing issue and can this be fixed?
Same platform
Same options
Same machine
allow 7 different non-empty combinations.
| Assignee | ||
Updated•9 years ago
|
Priority: -- → P2
| Assignee | ||
Updated•9 years ago
|
Assignee: nobody → cdawson
Comment 1•9 years ago
|
||
Any news on this?
I also find it very frustrating when looking at oranges and, at least in theory, you'd have thought that returning this information ought to be pretty fast.
Thanks.
Flags: needinfo?(cdawson)
| Assignee | ||
Comment 2•9 years ago
|
||
I have not been able to address this yet. Other work has crept up higher in the priority list. But I will take a look next week and see what I can do.
Flags: needinfo?(cdawson)
Comment 3•9 years ago
|
||
(In reply to Cameron Dawson [:camd] from comment #2)
> I have not been able to address this yet. Other work has crept up higher in
> the priority list. But I will take a look next week and see what I can do.
I had a quick look at this. I tried running this query manually:
SELECT j.id, j.`job_guid`, j.`signature`, j.`job_coalesced_to_guid`, j.`build_platform_id`, j.`option_collection_hash`, j.failure_classification_id, m.`name` AS machine_name, mp.`platform` AS platform, mp.`os_name` AS machine_platform_os, mp.`architecture` AS machine_platform_architecture, bp.`platform` AS build_platform, bp.`os_name` AS build_os, bp.`architecture` AS build_architecture, j.`job_type_id` AS job_type_id, jt.`name` AS job_type_name, jt.`symbol` AS job_type_symbol, jt.`description` AS job_type_description, jg.`name` AS job_group_name, jt.`job_group_id` AS job_group_id, jg.`symbol` AS job_group_symbol, jg.`description` AS job_group_description, j.`who`, j.`result_set_id`, j.`result`, j.`state`, j.`reason`, j.`start_timestamp`, j.`end_timestamp`, j.`submit_timestamp`, j.`running_eta`, j.`last_modified`, j.`tier`, rds.`name` AS ref_data_name, rds.`build_system_type` AS build_system_type FROM `job` AS j LEFT JOIN `treeherder`.`machine` AS m ON j.`machine_id` = m.id LEFT JOIN `treeherder`.`machine_platform` AS mp ON j.`machine_platform_id` = mp.id LEFT JOIN `treeherder`.`build_platform` AS bp ON j.`build_platform_id` = bp.id LEFT JOIN `treeherder`.`job_type` AS jt ON j.`job_type_id` = jt.id LEFT JOIN `treeherder`.`job_group` AS jg ON jt.`job_group_id` = jg.id LEFT JOIN `treeherder`.reference_data_signatures rds ON j.signature = rds.signature LEFT JOIN result_set rs ON rs.id = j.result_set_id WHERE 1 AND j.option_collection_hash = '32faaecac742100f7753f0c1d0aa0add01b4046b' AND j.build_platform_id = 106 AND j.id <> 28033815 AND j.job_type_id = 4542 GROUP BY j.id ORDER BY rs.push_timestamp DESC LIMIT 21 OFFSET 0
It's pretty slow (5-10 secs) to run. However, if we change `ORDER BY rs.push_timestamp` to `ORDER BY j.result_set_id` it gets a lot faster. Do we really need to be that precise about which result sets we're getting jobs from? It stands to reason to me that result_set_id is normally incrementing, so should be good enough for a view like this.
Comment 4•9 years ago
|
||
(In reply to William Lachance (:wlach) from comment #3)
> (In reply to Cameron Dawson [:camd] from comment #2)
> > I have not been able to address this yet. Other work has crept up higher in
> > the priority list. But I will take a look next week and see what I can do.
>
> I had a quick look at this. I tried running this query manually:
I used to do a lot of database development in a previous life and I'm intrigued by the huge query.
I assume we're actually using all those files, there are a lot more fields than I see in the Similar Jobs UI.
Do we need that GROUP BY j.id?
Also, out of interest, what database is this?
Flags: needinfo?(wlachance)
Comment 5•9 years ago
|
||
(In reply to Bob Owen (:bobowen) from comment #4)
> (In reply to William Lachance (:wlach) from comment #3)
> > (In reply to Cameron Dawson [:camd] from comment #2)
> > > I have not been able to address this yet. Other work has crept up higher in
> > > the priority list. But I will take a look next week and see what I can do.
> >
> > I had a quick look at this. I tried running this query manually:
>
> I used to do a lot of database development in a previous life and I'm
> intrigued by the huge query.
> I assume we're actually using all those files, there are a lot more fields
> than I see in the Similar Jobs UI.
I don't think we really do, but I don't think that part is really worth optimizing at this point (it's not the bottleneck, and we're going to be completely rewriting this code shortly)
> Do we need that GROUP BY j.id?
Almost certainly not. :)
> Also, out of interest, what database is this?
It's the treeherder database.
Flags: needinfo?(wlachance)
Comment 6•9 years ago
|
||
(In reply to William Lachance (:wlach) from comment #3)
> (In reply to Cameron Dawson [:camd] from comment #2)
> > I have not been able to address this yet. Other work has crept up higher in
> > the priority list. But I will take a look next week and see what I can do.
>
> I had a quick look at this. I tried running this query manually:
>
> ...
>
> It's pretty slow (5-10 secs) to run. However, if we change `ORDER BY
> rs.push_timestamp` to `ORDER BY j.result_set_id` it gets a lot faster. Do we
> really need to be that precise about which result sets we're getting jobs
> from? It stands to reason to me that result_set_id is normally incrementing,
> so should be good enough for a view like this.
So after some testing on stage, I realized that things were still rather slow. It turns out that what we really need is a composite index on all the columns we're indexing on.
The combination of (1) the suggested change to use result_set_id and (2) bobowen's suggestion of not bothering with group by and (3) adding this index to the mozilla inbound jobs table on stage: `create index wlach_test_idx_jobs_all_the_things on job (result_set_id, option_collection_hash, build_platform_id, job_type_id)` seems to make things perform quite nicely. I'll post a patch!
Comment 7•9 years ago
|
||
Comment 8•9 years ago
|
||
Comment on attachment 8753496 [details] [review]
[treeherder] wlach:1246196 > mozilla:master
I *think* this should be enough. We'll have to manually create the indexes ourselves, of course.
Attachment #8753496 -
Flags: review?(cdawson)
| Assignee | ||
Comment 9•9 years ago
|
||
Comment on attachment 8753496 [details] [review]
[treeherder] wlach:1246196 > mozilla:master
lgtm. :)
Attachment #8753496 -
Flags: review?(cdawson) → review+
Comment 10•9 years ago
|
||
So I forgot to mention this, but in further testing I wasn't 100% convinced I had actually fixed the problem.
I am testing another approach made possible by some data model changes we've made, hopefully this will finally fix the problem.
Comment 11•9 years ago
|
||
Updated•9 years ago
|
Attachment #8816538 -
Flags: review?(cdawson)
| Assignee | ||
Comment 12•9 years ago
|
||
Comment on attachment 8816538 [details] [review]
[treeherder] wlach:similar_jobs_speedup > mozilla:master
I haven't tested this myself, but the reasoning seems sound. :)
Attachment #8816538 -
Flags: review?(cdawson) → review+
Comment 13•9 years ago
|
||
Commit pushed to master at https://github.com/mozilla/treeherder
https://github.com/mozilla/treeherder/commit/d3827f2e8c31e440c83c09a97b34994a51b4eb25
Bug 1246196 - Speed up loading similar jobs (#2015)
* Instead of ordering by push time (which is a foreign key), order by job
start time.
* Add composite indexes to speed up the query
Comment 14•9 years ago
|
||
Commit pushed to master at https://github.com/mozilla/treeherder
https://github.com/mozilla/treeherder/commit/5141dada8fa81abb555a222c848e88dd0e8e2a96
Bug 1246196 - Update migrations
Comment 15•9 years ago
|
||
Deployed change to stage and loading similar jobs is now pretty snappy there. Production should see this on the next deploy (I'll probably announce/blog when that happens)
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•