Use a build optimizer for |mach try auto|
Categories
(Firefox Build System :: Task Configuration, enhancement, P3)
Tracking
(firefox78 fixed)
Tracking | Status | |
---|---|---|
firefox78 | --- | fixed |
People
(Reporter: ahal, Assigned: marco)
References
(Blocks 1 open bug)
Details
Attachments
(1 file)
On autoland we run every build on every push. This is done so that sheriffs can have quick backfills when investigating failures, ensuring we don't close the trees longer than necessary.
With ./mach try auto, we figured that running every build on people's try pushes was wasteful. So instead we opted to not select any builds at all, leaving only the builds that are dependencies of people's tests to be filled in.
However, feedback is already coming in that devs expect certain builds to run on their pushes and are surprised when they don't. So far this has been responsible for at least one backout, on a build that doesn't contain tests.
So it seems like we need a build optimizer that gets us somewhere in between the behaviour of "everything" and "nothing". Maybe we can simply use a modified skip-unless-schedules (I think the current build-related rules would schedule too much). Or maybe we need to invent something more clever.
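To make the "in between" idea concrete, here is a minimal sketch of one possible selection rule. It is not the taskgraph optimizer API and not what was eventually landed: the function name, the task labels, and the shape of the `build_schedules` mapping are all hypothetical, chosen only for illustration. It keeps a build if a selected test depends on it, or if the push touches a component the build is (hypothetically) scheduled for.

```python
from typing import Dict, Iterable, Set


def select_builds(
    all_builds: Iterable[str],
    test_dependencies: Dict[str, Set[str]],
    selected_tests: Iterable[str],
    touched_components: Set[str],
    build_schedules: Dict[str, Set[str]],
) -> Set[str]:
    """Return the builds to keep for a push (illustrative only)."""
    # Builds required by the selected tests: the current "nothing" baseline,
    # where builds are only filled in as dependencies.
    kept: Set[str] = set()
    for test in selected_tests:
        kept |= test_dependencies.get(test, set())

    # Additionally keep builds whose (hypothetical) schedules overlap with
    # the components touched by the push.
    for build in all_builds:
        if build_schedules.get(build, set()) & touched_components:
            kept.add(build)
    return kept


# Hypothetical labels and schedules, for illustration only.
builds = ["build-linux64/opt", "build-win64/opt", "build-android/opt"]
deps = {"test-linux64/opt-mochitest": {"build-linux64/opt"}}
schedules = {
    "build-linux64/opt": {"linux"},
    "build-win64/opt": {"windows"},
    "build-android/opt": {"android"},
}
kept = select_builds(
    builds, deps, ["test-linux64/opt-mochitest"], {"android"}, schedules
)
print(sorted(kept))
```

With these made-up inputs, the Linux build is kept because a selected test depends on it, the Android build is kept because the push touches an Android component, and the Windows build is dropped.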
I believe the long-term goal is to have the ML also select builds, but that's likely a long way off.
Comment 1 • 5 years ago (Assignee)
(In reply to Andrew Halberstadt [:ahal] from comment #0)
> I believe the long-term goal is to have the ML also select builds, but that's likely a long way off.
It shouldn't be too hard as the whole pipeline for test tasks is almost the same as what is required for build tasks (the model will likely require some changes to make it more effective for builds). Some initial results for builds (without changes to the features considered by the model):
Testing on 1557 (47 with failures) out of 15569. 51 schedulable tasks.
For confidence threshold 0.3: scheduled 14.571612074502248 tasks on average (min 0, max 38). In 89.36170212765957% of pushes we caught at least one failure. On average, we caught 79.577178045559% of all seen failures.
For confidence threshold 0.5: scheduled 11.355812459858702 tasks on average (min 0, max 35). In 87.23404255319149% of pushes we caught at least one failure. On average, we caught 74.89332783988785% of all seen failures.
For confidence threshold 0.7: scheduled 8.364161849710984 tasks on average (min 0, max 33). In 82.97872340425532% of pushes we caught at least one failure. On average, we caught 64.59131961635926% of all seen failures.
For confidence threshold 0.8: scheduled 6.452793834296725 tasks on average (min 0, max 32). In 78.72340425531915% of pushes we caught at least one failure. On average, we caught 56.297566431391246% of all seen failures.
For confidence threshold 0.85: scheduled 5.318561335902376 tasks on average (min 0, max 31). In 76.59574468085107% of pushes we caught at least one failure. On average, we caught 51.05994275634404% of all seen failures.
For confidence threshold 0.9: scheduled 4.03082851637765 tasks on average (min 0, max 31). In 70.2127659574468% of pushes we caught at least one failure. On average, we caught 42.682776485513926% of all seen failures.
For confidence threshold 0.95: scheduled 2.317276814386641 tasks on average (min 0, max 31). In 59.57446808510638% of pushes we caught at least one failure. On average, we caught 29.926171886439697% of all seen failures.
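The three numbers reported per threshold can be read as follows. The "at least one failure" percentages are consistent with a denominator of the 47 pushes that had failures (for example, 42/47 = 89.36% at threshold 0.3 and 28/47 = 59.57% at threshold 0.95), while the average number of scheduled tasks appears to be taken over all 1557 test pushes. The sketch below shows how such metrics could be computed. It is an illustration under assumptions, not the actual evaluation code: the data layout, the function name, and the use of `>=` for the threshold comparison are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical layout: per push, the model's confidence for each schedulable
# task, plus the set of tasks that actually failed on that push.
Push = Tuple[Dict[str, float], Set[str]]


def evaluate(pushes: List[Push], threshold: float) -> Tuple[float, float, float]:
    """Return (avg tasks scheduled, % of failing pushes with at least one
    failure caught, avg % of failures caught per failing push)."""
    scheduled_counts = []
    caught_any = []
    caught_fraction = []
    for confidences, failures in pushes:
        # Assumption: a task is scheduled when its confidence meets the threshold.
        scheduled = {task for task, conf in confidences.items() if conf >= threshold}
        scheduled_counts.append(len(scheduled))
        if failures:  # only pushes with failures count towards the catch rates
            caught = scheduled & failures
            caught_any.append(1 if caught else 0)
            caught_fraction.append(len(caught) / len(failures))

    avg_scheduled = sum(scheduled_counts) / len(scheduled_counts) if pushes else 0.0
    if not caught_any:
        return avg_scheduled, 0.0, 0.0
    pct_any = 100.0 * sum(caught_any) / len(caught_any)
    pct_caught = 100.0 * sum(caught_fraction) / len(caught_fraction)
    return avg_scheduled, pct_any, pct_caught


# Made-up data: three pushes, two of which have failures.
example = [
    ({"a": 0.9, "b": 0.4}, {"a"}),
    ({"a": 0.2, "b": 0.6}, {"a", "b"}),
    ({"a": 0.1, "b": 0.1}, set()),
]
for t in (0.5, 0.7):
    print(t, evaluate(example, t))
```

Raising the threshold schedules fewer tasks but catches fewer failures, which is the trade-off visible in the results above.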
Comment 2 • 5 years ago (Assignee)
Depends on D75621
Comment 4 • 5 years ago (bugherder)