[scheduling] Consider changing SETA to run the full set of tasks every 10th push or 2 hours
Categories
(Firefox Build System :: Task Configuration, task)
Tracking
(firefox73 fixed)
| Version | Tracking | Status |
|---|---|---|
| firefox73 | --- | fixed |
People
(Reporter: ahal, Assigned: ahal)
References
(Blocks 1 open bug)
Details
Attachments
(2 files)
We currently do this every 5th push or 1 hour.
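For illustration, here is a minimal sketch of an "every Nth push or every T minutes" rule. It assumes the full task set runs whenever either threshold has been reached since the last full run; the class and parameter names are hypothetical, and the actual SETA implementation may decide this differently.

```python
from dataclasses import dataclass


@dataclass
class FullRunPolicy:
    """Hypothetical model of a 'full run every N pushes or T minutes' rule."""

    push_interval: int  # run everything at least every N pushes
    time_interval: int  # ... or once this many minutes have elapsed

    def should_run_all(self, pushes_since_full: int, minutes_since_full: float) -> bool:
        # Either threshold being reached triggers a full run (an assumption).
        return (
            pushes_since_full >= self.push_interval
            or minutes_since_full >= self.time_interval
        )


current = FullRunPolicy(push_interval=5, time_interval=60)
proposed = FullRunPolicy(push_interval=10, time_interval=120)

# Five pushes in ten minutes: a full run today, but not under the proposal.
print(current.should_run_all(5, 10))   # True
print(proposed.should_run_all(5, 10))  # False
```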
I have a recipe that analyzes the effects of this change. Results below:
| Scheduler | Total Tasks | Primary Backouts | Secondary Backouts | Secondary Backout Rate | Scheduler Efficiency |
|---|---|---|---|---|---|
| seta_10_120 | 447160 | 68 | 42 | 0.38 | 0.59 |
| baseline | 677090 | 77 | 43 | 0.36 | 0.41 |
These numbers still have a few bugs I need to iron out*, but essentially, had this change been in place since Aug 29th, it would have reduced the total number of tasks we run on autoland/inbound by around 34% while only very slightly increasing the secondary backout rate.
The secondary backout rate is the percentage of backouts where none of the failing tasks ran on the offending push (i.e., a sheriff needed to backfill them to find the regression).
In other words, this change could provide a substantial decrease in the number of tasks we run, at the cost of slightly more work to the sheriffs.
* The bug is that downloading the seta_10_120 artifact sometimes fails for some reason, so its task count is artificially low and the actual decrease is smaller than 34% (but still substantial). This only happens on a small number of pushes. I'll update the numbers once I have this fixed.
** There may also be bugs in the methodology here (e.g., how a secondary backout is calculated). This is all very new and unproven, though I do think there is at least a signal here that making this change might be a good idea.
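The headline figures can be reproduced from the table with a short sketch. It assumes the secondary backout rate is secondary / (primary + secondary); that formula is inferred from the table values rather than stated by the recipe, and the function names are hypothetical.

```python
# Values copied from the results table above.
rows = {
    "seta_10_120": {"total_tasks": 447160, "primary": 68, "secondary": 42},
    "baseline": {"total_tasks": 677090, "primary": 77, "secondary": 43},
}


def secondary_backout_rate(primary: int, secondary: int) -> float:
    # Assumed definition: share of all backouts that needed a backfill.
    return secondary / (primary + secondary)


def task_reduction(baseline_tasks: int, candidate_tasks: int) -> float:
    # Fraction of baseline tasks that the candidate scheduler avoids.
    return (baseline_tasks - candidate_tasks) / baseline_tasks


for name, row in rows.items():
    rate = secondary_backout_rate(row["primary"], row["secondary"])
    print(f"{name}: secondary backout rate {rate:.2f}")

reduction = task_reduction(
    rows["baseline"]["total_tasks"], rows["seta_10_120"]["total_tasks"]
)
print(f"task reduction: {reduction:.2f}")
```

Under that assumption the rates come out to 0.38 and 0.36, and the task reduction to 0.34, matching the numbers quoted above.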
Comment 1•6 years ago (Assignee)
Hi Sebastian, this change isn't imminent or anything, but I wanted to get your opinion on it. Aside from fixing the data, do you have any reservations about this change? Is there anything you'd like to see before it lands?
Comment 2•6 years ago
Sorry for the delayed reply. It looks good to ship.
Concerns:
- Merge candidates will be less frequent because fewer pushes run the full set of tasks, so some changes will miss a Nightly and ship with the next one instead.
- Sometimes the frequency with which a failure can be observed contributes to its detection (e.g. it is intermittent, or it existed as an intermittent before).
Actions:
- Treeherder's Backfill and Custom Action > Backfill commands need the default values changed to run on the 10 (better: 9) previous jobs
Questions:
- Is there an estimate of how much the total testing costs will decrease as a result?
Thank you for working on the load reduction.
Comment 3•6 years ago (Assignee)
Comment 4•6 years ago (Assignee)
We'll need to fix the Treeherder backfill button before landing this, but I wanted to get a patch ready in the meantime.
Comment 5•6 years ago
:armenzg, can you coordinate getting the treeherder backfill button to do 9 jobs (instead of 4 or 5 now)?
Comment 6•6 years ago (Assignee)
This should cover all the pushes between the ones that scheduled all tasks.
Depends on D55020
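To show why a depth of 9 covers the gap, here is a rough sketch that assumes pushes are numbered consecutively and every 10th push ran the full task set. The function name is hypothetical and this is not Treeherder's actual backfill logic.

```python
def pushes_to_backfill(failing_push: int, full_run_pushes: set[int]) -> list[int]:
    """Return the pushes strictly between the previous full run and the failing push."""
    earlier_full = [p for p in full_run_pushes if p < failing_push]
    if not earlier_full:
        # No earlier full run is known, so there is no bounded range to backfill.
        return []
    last_full = max(earlier_full)
    return list(range(last_full + 1, failing_push))


# Assumed example: full runs on pushes 0, 10 and 20; a failure first shows up on push 20.
full_runs = {0, 10, 20}
candidates = pushes_to_backfill(20, full_runs)
print(candidates)       # [11, 12, 13, 14, 15, 16, 17, 18, 19]
print(len(candidates))  # 9
```

A full run triggered by the time limit would only shorten the gap, so 9 is the most pushes that can sit between two full runs under these assumptions.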
Comment 7•6 years ago
This is an in-tree request. The patch has been reviewed and will be landed next week.
Comment 9•6 years ago (bugherder)
https://hg.mozilla.org/mozilla-central/rev/ad90e3772a31
https://hg.mozilla.org/mozilla-central/rev/b487505e9ebe