run opt builds by default on autoland to save some money and time
Categories
(Testing :: General, task)
Tracking
(firefox81 fixed)
Version | Tracking | Status |
---|---|---|
firefox81 | --- | fixed |
People
(Reporter: jmaher, Assigned: bhearsum)
References
(Blocks 1 open bug)
Details
Attachments
(1 file)
:erahm asked why we do shippable builds on autoland and not opt. opt builds are faster (and cheaper), but perf tests are on shippable.
I calculated the savings we would get by running opt builds instead of shippable builds to be $1.14/push.
I then calculated the cost of running a full set of shippable builds to be $2.50/push.
the average push load on autoland the last 3 months is ~3000 pushes/month.
Querying the number of talos and raptor jobs that we have on autoland over that same period of time, it turns out that we would need enough builds to cover ~1/3 of the pushes.
so $1.14 * 3000 - $2.50 * 1000 = ~$1K/month.
I say $1K as we would run shippable builds on the backstop pushes already, so those 10 pushes/day would not be subtracted.
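The estimate above can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes the formula is (per-push savings × monthly pushes) minus (cost of a full shippable build set × the pushes that still need shippable builds for perf tests). The function and parameter names are illustrative and do not come from any Mozilla tool.

```python
def monthly_savings(savings_per_push, pushes_per_month,
                    shippable_cost_per_push, perf_push_fraction):
    """Net monthly savings from building opt instead of shippable.

    Assumes every push saves `savings_per_push`, while the fraction of
    pushes that run perf tests still pays for a full shippable build set.
    """
    gross = savings_per_push * pushes_per_month
    perf_builds = shippable_cost_per_push * pushes_per_month * perf_push_fraction
    return gross - perf_builds


# Figures from this comment: $1.14/push saved, ~3000 pushes/month,
# $2.50/push for shippable builds, needed on ~1/3 of the pushes.
estimate = monthly_savings(1.14, 3000, 2.50, 1 / 3)
print(f"~${estimate:,.0f}/month")
```

With these inputs the result is a little over $900, which is where the ~$1K/month figure comes from.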
1) I do wonder why not just optimize our instr and run tasks to be faster? That is unknown in terms of feasibility and engineering effort; worth answering though.
2) how difficult would it be to adjust our scheduling to be opt by default on autoland, but ensure that perf tests are only run on shippable and that on every backstop push we only run shippable?
3) what is the risk of a shippable-only unittest regression showing up and adding a lot of load? (this could be confusing for the code sheriffs on the backstop pushes when backfilling)
4) what is the risk for the perf sheriffs, as builds wouldn't be available and backfilling would have an extra 1+ hour delay?
Reporter
Comment 1•4 years ago
:glandium can you help answer #1
:marco/:bhearsum can you help answer #2
:aryx: can you help answer #3
:igoldan can you help answer #4
Assignee
Comment 2•4 years ago
how difficult would it be to adjust our scheduling to be opt by default on autoland, but ensure that perf tests are only run on shippable and every backstop we only run shippable.
I'm going to defer this to Tom.
Assignee
Comment 3•4 years ago
Somewhat related bugs: https://bugzilla.mozilla.org/show_bug.cgi?id=1650083, https://bugzilla.mozilla.org/show_bug.cgi?id=1648292#c14. We probably only want to do one of the three of them.
Comment 4•4 years ago
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #0)
:erahm asked why we do shippable builds on autoland and not opt. opt builds are faster (and cheaper), but perf tests are on shippable.
I calculated the savings we would get by running opt builds instead of shippable builds to be $1.14/push.
I then calculated the cost of running a full set of shippable builds to be $2.50/push.
These numbers don't feel right. Looking at a recent m-c landing I can see a "Linux x64 shippable opt" build taking:
- instr - 36 min
- run - 7 min
- B - 50 min
So 93 minutes total build time.
For a "Linux x64 opt" build we have:
- B - 36 min
So 36 minutes total build time.
If shippable builds are $2.50/push, I'd expect opt to be $0.96/push. That puts us closer to saving ~$5K/month. Of course I'm just looking at one data point and one platform, so the averages could be different. Do we have raw numbers on total time for all shippable (instr + run + B) builds over June vs opt (just B)? I might be completely missing something here, of course; if you could elaborate on how you came to your numbers, that would be helpful.
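The reasoning in this comment scales the known shippable cost by the ratio of build minutes. A minimal sketch of that calculation, assuming cost is directly proportional to task minutes (which ignores differences in worker types and pricing); the helper name is made up for illustration:

```python
def scale_cost_by_time(reference_cost, reference_minutes, target_minutes):
    """Scale a known cost linearly by build time (assumes cost is proportional to minutes)."""
    return reference_cost * target_minutes / reference_minutes


shippable_minutes = 36 + 7 + 50  # instr + run + B
opt_minutes = 36                 # B only

opt_cost = scale_cost_by_time(2.50, shippable_minutes, opt_minutes)
saving_per_push = 2.50 - opt_cost

print(f"opt: ${opt_cost:.2f}/push, saving: ${saving_per_push:.2f}/push")
print(f"monthly at 3000 pushes: ${saving_per_push * 3000:,.0f}")
```

The opt cost comes out at roughly $0.97/push, in line with the ~$0.96 quoted, and the monthly figure lands a bit under $5K if every one of the ~3000 pushes switches to opt.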
Comment 5•4 years ago
Before reaching for this, we should finish bug 1637544 and improve the build times for the shippable build itself (the new dump_syms will help a lot, and there are other things in the pipe that will improve things too).
Reporter
Comment 6•4 years ago
:erahm, I didn't realize opt-B vs shippable-B were different; I will look at the difference. The savings is not $5K; it is $1K on my numbers, and with 14 more minutes saved that might be $1.2K/month.
Reporter
Comment 7•4 years ago
ok, looking more, we have a lot of opt builds that were included in the total cost which do not have shippable partners. I adjusted the parameters to focus on specific builds ( https://gist.github.com/jmaher/1b77b25928bcad397b978939828a1bb3 in bigquery )
average for Q2:
opt: $0.47
ship: $1.14
so that is a $0.67 savings on just the 'B' job and $1.14 for instr/run for a total of: $1.81/push.
This puts the savings closer to $3K/month.
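The comment does not spell out how the ~$3K figure follows from $1.81/push. One reading that reproduces it is sketched below, under the assumption that the netting from comment 0 still applies (~3000 pushes/month, with about 1000 of them still paying $2.50 for a shippable build set for perf tests):

```python
# Q2 averages for the 'B' job, taken from the results in this comment.
opt_b = 0.47
shippable_b = 1.14
instr_run = 1.14  # instr/run cost, only paid for shippable builds

per_push_saving = (shippable_b - opt_b) + instr_run

# Assumption carried over from comment 0: ~3000 pushes/month, and roughly
# 1000 of them still need a $2.50 shippable build set for perf tests.
monthly = per_push_saving * 3000 - 2.50 * 1000

print(f"${per_push_saving:.2f}/push, ~${monthly:,.0f}/month")
```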
Comment 9•4 years ago
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #0)
- how difficult would it be to adjust our scheduling to be opt by default on autoland, but ensure that perf tests are only run on shippable and every backstop we only run shippable.
For the smart scheduler, I just need to know whether the change is actually going to happen. I would then need to treat shippable the same as opt in the training data (otherwise we'll lose the entire training data) and schedule opt instead of shippable on autoland.
It's a relatively easy change, we just need to coordinate.
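To illustrate what "shippable is considered the same as opt" could look like for training data, here is a hypothetical sketch. It is not the smart scheduler's actual code: the label format (`...-shippable/opt-...`), the record shape, and both function names are assumptions made for the example.

```python
def normalize_label(label: str) -> str:
    """Map a shippable task label to its opt equivalent.

    Assumes labels look like 'test-linux1804-64-shippable/opt-mochitest-1';
    this naming pattern is an assumption for illustration.
    """
    return label.replace("-shippable/opt", "/opt")


def normalize_history(records):
    """Rewrite (label, failed) training records so shippable counts as opt."""
    return [(normalize_label(label), failed) for label, failed in records]


history = [
    ("test-linux1804-64-shippable/opt-mochitest-1", True),
    ("test-linux1804-64/debug-mochitest-1", False),
]
print(normalize_history(history))
```

Normalizing at read time like this would keep the historical shippable results usable once autoland starts scheduling opt instead.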
Comment 10•4 years ago
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #0)
- what is the risk of a shippable only unittest regression showing up and adding a lot of load (this could be confusing for the code sheriffs on the backstop pushes when backfilling)
Based on experience with previous switches from opt to shippable, the risk should be low. If autoland has shippable for backstop pushes and opt for the other pushes, this will make tracing issues across pushes in Treeherder problematic with current tooling. The 'Similar Jobs' tab won't show the other config, and sheriffs might request backfills and get confused about why it's not shown in 'Similar Jobs' or a filtered view.
Comment 11•4 years ago
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #0)
:erahm asked why we do shippable builds on autoland and not opt. opt builds are faster (and cheaper), but perf tests are on shippable.
[...]
4) what is the risk of the perf sheriffs as builds wouldn't be available and backfilling would have an extra 1+ hour delay.
I'm deferring this to Bebe & Alexandru, as they can provide the most precise answers.
Comment 12•4 years ago
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #0)
:erahm asked why we do shippable builds on autoland and not opt. opt builds are faster (and cheaper), but perf tests are on shippable.
...
4) what is the risk of the perf sheriffs as builds wouldn't be available and backfilling would have an extra 1+ hour delay.
I'm not very clear on what "1+ hour delay" means. Most of the time, to identify a regression we have to do some backfills, filling a gap with a rough average of 5 revisions. If the entire backfill is delayed by about 1h, I don't see a problem. But if every build is delayed by this amount of time, then given that we sometimes have to backfill several times, this can increase our performance metrics (time-to's).
Reporter
Comment 13•4 years ago
:alexandrui, every time you request a backfill on a revision with no existing perf jobs, you would have to wait for builds to be generated. That doesn't mean that backfilling 10 jobs would take 10 hours: the builds would run in parallel and take 1-2 hours, then the perf tests would start. If you retrigger an existing job, it would just run the job as normal.
How often do you have to backfill multiple times? I assume this means for a single alert, you backfill, then have to repeat to get more data points? Is this because it didn't backfill enough revisions to start with?
Comment 14•4 years ago
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #13)
How often do you have to backfill multiple times? I assume this means for a single alert, you backfill, then have to repeat to get more data points? Is this because it didn't backfill enough revisions to start with?
Not too often. Usually I do backfill+retrigger. But when I do backfill+backfill it means that the culprit wasn't covered by the initial backfill.
Reporter
Comment 15•4 years ago
I think the risk here is low, and tweaking our tools for backfill range will be able to help perf sheriffs out.
It sounds like we really need to wait for bug 1637544 to land before deciding on this, and also to determine which big picture route we want to take (:bhearsum's comment 3) between this, bug 1650083, and bug 1648292.
Comment 16•4 years ago
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #15)
I think the risk here is low, and tweaking our tools for backfill range will be able to help perf sheriffs out.
It sounds like we really need to wait for bug 1637544 to land before deciding on this, and also to determine which big picture route we want to take (:bhearsum's comment 3) between this, bug 1650083, and bug 1648292.
I'm not sure what we need to wait on. It might reduce the time of the "instr" build, but wouldn't affect the "run" or "B" tasks that are also required for shippable and take up a majority of the build time.
Reporter
Comment 17•4 years ago
I was echoing :glandium's comment, maybe we don't need to wait on that.
Assignee
Comment 18•4 years ago
Rereading, it sounds like there's no reason we've come up with not to go ahead and run opt for all pushes on autoland, and only run shippable for backstop and backfilling (please speak up if this is wrong). I'm going to work on this.
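The rule proposed here can be written down as a small decision function. This is a sketch of the policy as described in this thread, not the taskgraph change that eventually landed; the function name and the boolean inputs are hypothetical, and how a push is identified as a backstop or backfill is left out.

```python
def build_type_for_push(project: str, is_backstop: bool = False,
                        is_backfill: bool = False) -> str:
    """Pick the build flavour for a push under the rule proposed here."""
    if project != "autoland":
        # Assumption: other trees (e.g. mozilla-central) keep shippable builds.
        return "shippable"
    if is_backstop or is_backfill:
        # Backstop pushes and backfills still need shippable builds.
        return "shippable"
    return "opt"


print(build_type_for_push("autoland"))                    # regular push
print(build_type_for_push("autoland", is_backstop=True))  # backstop push
print(build_type_for_push("mozilla-central"))             # other tree
```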
Comment 19•4 years ago
Looks good to me!
many thanks!
Assignee
Comment 20•4 years ago
Comment 21•4 years ago
It seems like this didn't actually reduce the number of shippable builds on autoland. bhearsum, can you double-check that it's working as intended?
Reporter
Comment 22•4 years ago
I don't think this has landed yet
Comment 23•4 years ago
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #22)
I don't think this has landed yet
That'd explain it! Not sure how I missed that; I think I confused the dup'd bug being resolved with landing.
Assignee
Comment 24•4 years ago
(In reply to Eric Rahm [:erahm] from comment #23)
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #22)
I don't think this has landed yet
That'd explain it! Not sure how I missed that; I think I confused the dup'd bug being resolved with landing.
Soon! Really!
Comment 25•4 years ago
Pushed by bhearsum@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/a706413a07d0 run opt builds by default on autoland; shippable builds by default on central r=tomprince,ahal,marco
Comment 26•4 years ago
bugherder