1650208 - run opt builds by default on autoland to save some money and time

Reporter

Description

•

5 years ago

:erahm asked why we do shippable builds on autoland and not opt. opt builds are faster (and cheaper), but perf tests are on shippable.

I calculated the savings we would get by running opt builds instead of shippable builds to be $1.14/push.

I then calculated the cost of running a full set of shippable builds to be $2.50/push.

the average push load on autoland the last 3 months is ~3000 pushes/month.

Querying the number of talos and raptor jobs that we have on autoland over that same period of time, it turns out that we would need enough shippable builds to cover ~1/3 of the pushes (~1000 pushes/month).

so $1.14 * 3000 - $2.50 * 1000 = ~$1K/month.

I say $1K as we would run shippable builds on the backstop pushes already, so those 10 pushes/day would not be subtracted.
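
The estimate above can be written out as a short calculation. This is only a sketch of the arithmetic, not the query that produced the per-push figures; the function and parameter names are hypothetical, and the 1000 pushes/month figure is the ~1/3 of ~3000 pushes that still need shippable builds for perf tests.

```python
def monthly_savings(saved_per_push, pushes_per_month,
                    shippable_cost_per_push, shippable_pushes_per_month):
    """Estimated net monthly savings in dollars.

    Hypothetical helper: what is saved by building opt on every push,
    minus the cost of the shippable builds still needed for perf tests.
    """
    gross = saved_per_push * pushes_per_month
    still_needed = shippable_cost_per_push * shippable_pushes_per_month
    return gross - still_needed


# Figures from the description: $1.14 saved/push, ~3000 pushes/month,
# $2.50 per full set of shippable builds, needed on ~1000 pushes/month.
estimate = monthly_savings(1.14, 3000, 2.50, 1000)
print(f"${estimate:,.0f}/month")
```

The result is a little under $1K/month, which matches the rounded figure quoted above.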

1) I do wonder why not just optimize our instr and run tasks to be faster? That is unknown in terms of feasibility and engineering effort- worth answering though.
2) how difficult would it be to adjust our scheduling to be opt by default on autoland, but ensure that perf tests are only run on shippable and every backstop we only run shippable.
3) what is the risk of a shippable only unittest regression showing up and adding a lot of load (this could be confusing for the code sheriffs on the backstop pushes when backfilling)
4) what is the risk of the perf sheriffs as builds wouldn't be available and backfilling would have an extra 1+ hour delay.

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 1

•

5 years ago

:glandium can you help answer #1
:marco/:bhearsum can you help answer #2
:aryx: can you help answer #3
:igoldan can you help answer #4

Flags: needinfo?(mh+mozilla)

Flags: needinfo?(mcastelluccio)

Flags: needinfo?(igoldan)

Flags: needinfo?(aryx.bugmail)

bhearsum@mozilla.com (:bhearsum)

Assignee

Comment 2

•

5 years ago

how difficult would it be to adjust our scheduling to be opt by default on autoland, but ensure that perf tests are only run on shippable and every backstop we only run shippable.

I'm going to defer this to Tom.

Flags: needinfo?(mozilla)

bhearsum@mozilla.com (:bhearsum)

Assignee

Comment 3

•

5 years ago

Somewhat related bugs: https://bugzilla.mozilla.org/show_bug.cgi?id=1650083, https://bugzilla.mozilla.org/show_bug.cgi?id=1648292#c14. We probably only want to do one of the three of them.

Eric Rahm [:erahm]

Comment 4

•

5 years ago

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #0)

:erahm asked why we do shippable builds on autoland and not opt. opt builds are faster (and cheaper), but perf tests are on shippable.

I calculated the savings we would get by running opt builds instead of shippable builds to be $1.14/push.

I then calculated the cost of running a full set of shippable builds to be $2.50/push.

These numbers don't feel right. Looking at a recent m-c landing I can see a "Linux x64 shippable opt" build taking:

instr - 36 min
run - 7 min
B - 50 min

So 93 minutes total build time.

For a "Linux x64 opt" build we have:

B - 36 min

So 36 minutes total build time.

If shippable are $2.50/push I'd expect opt to be $0.96/push. That puts us closer to saving ~5K/month. Of course I'm just looking at 1 data point and platform so it could be the averages are different. Do we have raw numbers on total time for all shippable (instr + run + B) builds over June vs opt (just B)? I might be completely missing something here of course, if you could elaborate on how you came to your numbers that might be helpful.
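
A rough way to reproduce this estimate is to scale the shippable cost by the ratio of build minutes. The sketch below assumes that cost is proportional to total task time and that the single Linux x64 data point is representative; both are assumptions for illustration, not measured facts.

```python
# Task durations (minutes) quoted above for one Linux x64 push.
shippable_minutes = {"instr": 36, "run": 7, "B": 50}
opt_minutes = {"B": 36}

shippable_total = sum(shippable_minutes.values())
opt_total = sum(opt_minutes.values())

# Assumption: cost scales linearly with build minutes.
shippable_cost = 2.50
opt_cost = shippable_cost * opt_total / shippable_total
saved_per_push = shippable_cost - opt_cost

print(f"opt: ${opt_cost:.2f}/push, saved: ${saved_per_push:.2f}/push")
# Gross figure over ~3000 pushes/month, with no shippable builds subtracted.
print(f"gross: ${saved_per_push * 3000:,.0f}/month")
```

Under these assumptions opt comes out a little under $1/push and the gross saving is in the region of $4.6K/month, which is consistent with the "~5K" ballpark if the shippable builds still needed for perf tests are not subtracted.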

Flags: needinfo?(jmaher)

Mike Hommey [:glandium]

Comment 5

•

5 years ago

•

Edited

Before getting to this, we should finish bug 1637544, and improve the build times for the shippable build itself (the new dump_syms will help a lot, and there are other things in the pipeline that will improve things too).

Flags: needinfo?(mh+mozilla)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 6

•

5 years ago

:erahm, I didn't realize opt-B vs shippable-B were different; I will look at the difference. The savings is not $5K; it is $1K on my numbers, and with 14 more minutes saved that might be $1.2K/month.

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 7

•

5 years ago

ok, looking more, we have a lot of opt builds that were included in the total cost which do not have shippable partners. I adjusted the parameters to focus on specific builds ( https://gist.github.com/jmaher/1b77b25928bcad397b978939828a1bb3 in bigquery )

average for Q2:
opt: $0.47
ship: $1.14

so that is a $0.67 savings on just the 'B' job and $1.14 for instr/run for a total of: $1.81/push.

This puts the savings closer to $3K/month.
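
The revised numbers can be checked with the same style of calculation as in comment 0. This sketch assumes the comment 0 figures still apply (~3000 pushes/month, with ~1000 of them still needing a $2.50 set of shippable builds); the variable names are made up for the example.

```python
# Q2 averages from the query above (dollars per push).
opt_b = 0.47
ship_b = 1.14
instr_run = 1.14  # instr + run tasks, only needed for shippable builds

saved_b = ship_b - opt_b
saved_per_push = saved_b + instr_run

# Assumptions carried over from comment 0: ~3000 pushes/month, and
# ~1000 pushes/month that still need a $2.50 shippable build set.
net = saved_per_push * 3000 - 2.50 * 1000
print(f"saved/push: ${saved_per_push:.2f}, net: ${net:,.0f}/month")
```

With those assumptions the net figure lands just under $3K/month.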

Flags: needinfo?(jmaher)

Marco Castelluccio [:marco]

Updated

•

5 years ago

Blocks: cost-reduction

Flags: needinfo?(mcastelluccio)

Sylvestre Ledru [:Sylvestre]

Updated

•

5 years ago

Comment 9

•

5 years ago

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #0)

how difficult would it be to adjust our scheduling to be opt by default on autoland, but ensure that perf tests are only run on shippable and every backstop we only run shippable.

For the smart scheduler, I just need to be aware if the change is actually going to happen, and I need to make it so that in the training data shippable is considered the same as opt (otherwise we'll lose the entire training data) and to schedule opt instead of shippable on autoland.
It's a relatively easy change, we just need to coordinate.
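
One way to treat shippable the same as opt in the training data is to normalize task labels before they are used. The sketch below is hypothetical: it assumes shippable and opt labels differ only by a "-shippable" suffix on the platform part, which may not hold for every task, and it is not the scheduler's actual code.

```python
def normalize_label(label: str) -> str:
    """Map a shippable task label onto its opt equivalent.

    Assumes labels look like "test-linux1804-64-shippable/opt-xpcshell-1";
    labels without the "-shippable" suffix are returned unchanged.
    """
    return label.replace("-shippable/opt", "/opt")


history = [
    "test-linux1804-64-shippable/opt-xpcshell-1",
    "test-linux1804-64/opt-xpcshell-1",
]
# After normalization both entries refer to the same task.
print({normalize_label(label) for label in history})
```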

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 10

•

5 years ago

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #0)

what is the risk of a shippable only unittest regression showing up and adding a lot of load (this could be confusing for the code sheriffs on the backstop pushes when backfilling)

Based on experience with previous switches from opt to shippable, the risk should be low. If autoland has shippable for backstop pushes and opt for the other pushes, this will make tracing issues across pushes in Treeherder problematic with current tooling. The 'Similar Jobs' tab won't show the other config and sheriffs might request backfills and get confused why it's not shown in 'Similar Jobs' or a filtered view.

Flags: needinfo?(aryx.bugmail)

Ionuț Goldan [:igoldan]

Comment 11

•

5 years ago

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #0)

:erahm asked why we do shippable builds on autoland and not opt. opt builds are faster (and cheaper), but perf tests are on shippable.
[...]
4) what is the risk of the perf sheriffs as builds wouldn't be available and backfilling would have an extra 1+ hour delay.

I'm deferring this to Bebe & Alexandru, as they can provide the most precise answers.

Flags: needinfo?(igoldan)

Flags: needinfo?(fstrugariu)

Flags: needinfo?(aionescu)

Alexandru Ionescu (needinfo me) [:alexandrui]

Comment 12

•

5 years ago

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #0)

:erahm asked why we do shippable builds on autoland and not opt. opt builds are faster (and cheaper), but perf tests are on shippable.
...
4) what is the risk of the perf sheriffs as builds wouldn't be available and backfilling would have an extra 1+ hour delay.

I'm not very clear what "1+ hour delay" means. Most of the time, to identify a regression we have to do some backfills, filling a gap of roughly 5 revisions on average. If the entire backfill is delayed by about 1h, I don't see a problem. But if every build is delayed by this amount of time, then given that we sometimes have to backfill several times, this can increase our performance metrics (time-to's).

Flags: needinfo?(aionescu)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 13

•

5 years ago

:alexandrui, every time you request a backfill on a revision with no existing perf jobs, you would have to wait for builds to be generated. It doesn't mean that if you backfill 10 jobs it would take 10 hours; the builds would run in parallel and take 1-2 hours, and then the perf tests would start. If you retrigger an existing job, it would just run the job as normal.
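
The point about parallelism can be illustrated with a small model. This is a sketch only: the function name and the default durations are placeholders chosen for the example, and it assumes every backfilled revision gets its build and perf jobs scheduled at the same time.

```python
def backfill_wait_hours(revisions, build_hours=1.5, perf_hours=1.0,
                        has_build=False):
    """Rough wall-clock wait for a perf backfill.

    Assumes all revisions build and test in parallel, so the wait does
    not grow with the number of revisions. Durations are placeholders.
    """
    if revisions <= 0:
        return 0.0
    build_wait = 0.0 if has_build else build_hours
    return build_wait + perf_hours


# Backfilling 10 revisions waits as long as backfilling 1.
print(backfill_wait_hours(10), backfill_wait_hours(1))
# A retrigger on a revision that already has a build skips the build wait.
print(backfill_wait_hours(1, has_build=True))
```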

How often do you have to backfill multiple times? I assume this means for a single alert, you backfill, then have to repeat to get more data points? Is this because it didn't backfill enough revisions to start with?

Flags: needinfo?(aionescu)

Alexandru Ionescu (needinfo me) [:alexandrui]

Comment 14

•

5 years ago

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #13)

How often do you have to backfill multiple times? I assume this means for a single alert, you backfill, then have to repeat to get more data points? Is this because it didn't backfill enough revisions to start with?

Not too often. Usually I do backfill+retrigger. But when I do backfill+backfill it means that the culprit wasn't covered by the initial backfill.

Flags: needinfo?(aionescu)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 15

•

5 years ago

I think the risk here is low, and tweaking our tools for backfill range will be able to help perf sheriffs out.

It sounds like we really need to wait for bug 1637544 to land before deciding on this- also determining which big picture route we want to take (:bhearsum's comment 3) between this, bug 1650083, and bug 1648292

Flags: needinfo?(mozilla)

Flags: needinfo?(fstrugariu)

Eric Rahm [:erahm]

Comment 16

•

5 years ago

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #15)

I think the risk here is low, and tweaking our tools for backfill range will be able to help perf sheriffs out.

It sounds like we really need to wait for bug 1637544 to land before deciding on this- also determining which big picture route we want to take (:bhearsum's comment 3) between this, bug 1650083, and bug 1648292

I'm not sure what we need to wait on. It might reduce the time of the "instr" build, but wouldn't affect the "run" or "B" tasks that are also required for shippable and take up a majority of the build time.

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 17

•

5 years ago

I was echoing :glandium's comment, maybe we don't need to wait on that.

bhearsum@mozilla.com (:bhearsum)

Assignee

Comment 18

•

5 years ago

Rereading, it sounds like we haven't come up with any reason not to go ahead and run opt for all pushes on autoland, and only run shippable for backstop and backfilling (please speak up if this is wrong). I'm going to work on this.
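
The rule described here can be summarized as a small decision function. This is an illustrative sketch, not the taskgraph change itself; the function name and its boolean inputs are hypothetical, and how a push is classified as a backstop or a backfill is left to the caller.

```python
def autoland_build_kind(is_backstop: bool = False,
                        is_backfill: bool = False) -> str:
    """Pick the build flavour for an autoland push (illustrative only).

    opt by default; shippable on backstop pushes and when backfilling,
    so that perf tests keep running against shippable builds.
    """
    if is_backstop or is_backfill:
        return "shippable"
    return "opt"


print(autoland_build_kind())
print(autoland_build_kind(is_backstop=True))
print(autoland_build_kind(is_backfill=True))
```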

Assignee: nobody → bhearsum

Sylvestre Ledru [:Sylvestre]

Comment 19

•

5 years ago

Looks good to me!
many thanks!

bhearsum@mozilla.com (:bhearsum)

Assignee

Comment 20

•

5 years ago

Attached file Bug 1650208: run opt builds by default on autoland; shippable builds by default on central

Phabricator Automation

Updated

•

5 years ago

Attachment #9162922 - Attachment description: Bug 1650208: run opt builds by default on autoland to save some money and time → bug 1650208: run opt builds by default on autoland; shippable builds by default on central

Phabricator Automation

Updated

•

5 years ago

Attachment #9162922 - Attachment description: bug 1650208: run opt builds by default on autoland; shippable builds by default on central → Bug 1650208: run opt builds by default on autoland; shippable builds by default on central

Eric Rahm [:erahm]

Comment 21

•

5 years ago

It seems like this didn't actually reduce the number of shippable builds on autoland. bhearsum, can you double-check that it's working as intended?

Flags: needinfo?(bhearsum)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 22

•

5 years ago

I don't think this has landed yet

Eric Rahm [:erahm]

Comment 23

•

5 years ago

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #22)

I don't think this has landed yet

That'd explain it! Not sure how I missed that; I think I confused the dup'd bug being resolved with this landing.

Flags: needinfo?(bhearsum)

bhearsum@mozilla.com (:bhearsum)

Assignee

Comment 24

•

5 years ago

(In reply to Eric Rahm [:erahm] from comment #23)

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #22)

I don't think this has landed yet

That'd explain it! Not sure how I missed that; I think I confused the dup'd bug being resolved with this landing.

Soon! Really!

Pulsebot

Comment 25

•

5 years ago

Pushed by bhearsum@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/a706413a07d0 run opt builds by default on autoland; shippable builds by default on central r=tomprince,ahal,marco

Cristina Coroiu [:ccoroiu]

Comment 26

•

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/a706413a07d0

Status: NEW → RESOLVED

Closed: 5 years ago

status-firefox81: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → Future

Marco Castelluccio [:marco]

Updated

•

4 years ago

Blocks: 1709810