Closed Bug 1621764 Opened 4 years ago Closed 4 years ago

reduce build-plain to either every 10th push or m-c tier-2 only

Categories

(Testing :: General, task, P3)

Version 3
task

Tracking

(firefox77 fixed)

RESOLVED FIXED
mozilla77
Tracking Status
firefox77 --- fixed

People

(Reporter: jmaher, Assigned: bc)

References

(Regressed 1 open bug)

Details

(Whiteboard: [ci-costs-2020:done])

Attachments

(2 files, 2 obsolete files)

We run linux/windows debug build-plain on every push to autoland. Building these does not cost many CPU hours or much money, but there is no need to run them on every commit.

In the ~6 months of data we have in BigQuery, there are 272 revisions where Bp jobs fail (most fail on both linux and windows). Only 6 of those have no other build failing at the same time, and in all 6 the failures are intermittent (failure to download something).

The risk of reducing the frequency here is low, as we are not finding plain-only regressions.
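The risk argument rests on a simple ratio. A tiny sketch of that arithmetic, using only the counts quoted above (the variable names are illustrative):

```python
# Counts quoted in the first comment (~6 months of BigQuery data).
bp_failing_revisions = 272  # revisions where a build-plain (Bp) job failed
bp_only_failures = 6        # of those, revisions where no other build failed

# Share of Bp failures that no other build would also have flagged.
bp_only_fraction = bp_only_failures / bp_failing_revisions
print(f"{bp_only_fraction:.1%}")  # prints 2.2%
```

Since all 6 of those were intermittent, the number of real plain-only regressions in the sample is zero.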

We should consider valgrind builds as well.

Valgrind build jobs have found 3 regressions, all in the month of January, and those were the only ones in the 7 months of data I looked at.

Priority: -- → P3

In addition, we should reduce the win/aarch64* and linux64/aarch64 builds to every 10th push, as we are not running tests for those on autoland.

The icing on the cake would be the opt builds (as we only test on shippable), once we move the few remaining tests that depend on regular opt builds to shippable.

Assignee: nobody → bob
Status: NEW → ASSIGNED

jmaher: Can you clarify the meaning of "reduce ... to either every 10th push or m-c tier-2 only" for me?

We want every 10th push on autoland and want every push on mozilla-central but they should be tier 2 there?

Flags: needinfo?(jmaher)

We should treat all the builds referenced here like the fuzzing builds: forced SETA every 10th push. We might later reconsider running them every 25th push, like some tier-2 perf tests, but we don't have that implemented yet. Changing to every 10th push would be a short-term win with no risk.

Flags: needinfo?(jmaher)

I was initially looking into this with the idea that these would be very much like the SETA fuzzing changes I made earlier, but that requires simultaneous changes to Treeherder to support using SETA on the specific builds, and it is a pain to test. This morning I came up with a different idea: just use a normal schedule without involving SETA at all. The advantages are that no Treeherder changes are required and testing is very easy. The question is whether we want every 10th push on all trees (projects) or just on autoland. I'll assume autoland only and keep builds on every push on mozilla-central. I'll put up a phab in a bit to show what I'm talking about.

Yeah, every build for m-c, beta, release, esr and try; this would only apply to autoland.

PushIntervalStrategy is modeled on seta's approach to schedule tasks on
every Nth push. It is restricted to the autoland project.

Two strategies "push-interval-10" and "push-interval-25" are defined for
scheduling tasks for every 10th and every 25th push respectively.

Debugging output is available via the --verbose option to mach taskgraph optimized.
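A minimal Python sketch of the every-Nth-push rule described above. It assumes the decision task's parameters expose project and pushlog_id (pushlog_id as a string, hence the int()). The class shape, the simplified should_remove_task(label, params) signature and the STRATEGIES table are hypothetical illustrations, not the actual taskgraph code; only the debug message format is taken from the log lines in this comment.

```python
import logging

logger = logging.getLogger(__name__)


class PushIntervalStrategy:
    """Hypothetical sketch: keep a task only on every Nth push of selected projects."""

    def __init__(self, interval, projects=("autoland",)):
        self.interval = interval
        self.projects = set(projects)

    def should_remove_task(self, label, params):
        # Projects outside the list (m-c, beta, release, esr, try) keep every build.
        if params["project"] not in self.projects:
            return False
        # Assumption: pushlog_id is a string in the parameters, so convert it.
        # The task runs when the push id is a multiple of the interval.
        if int(params["pushlog_id"]) % self.interval == 0:
            return False
        # Same wording as the --verbose output quoted in this comment.
        logger.debug("PushIntervalStrategy: Removing task %s interval %d",
                     label, self.interval)
        return True


# Hypothetical registry for the two strategy names defined by the patch.
STRATEGIES = {
    "push-interval-10": PushIntervalStrategy(10),
    "push-interval-25": PushIntervalStrategy(25),
}

if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    strategy = STRATEGIES["push-interval-10"]
    autoland = {"project": "autoland", "pushlog_id": "123457"}
    central = {"project": "mozilla-central", "pushlog_id": "123457"}
    print(strategy.should_remove_task("build-linux64-plain/debug", autoland))  # True
    print(strategy.should_remove_task("build-linux64-plain/debug", central))   # False
```

Keying on the pushlog id rather than on wall-clock time keeps the decision deterministic for a given parameters.yml, which is what makes testing with a downloaded parameters file straightforward.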

This patch uses the new push-interval-10 strategy to schedule the linux and windows plain builds and the aarch64
builds on autoland every 10th push.

Tested with a local checkout, using a parameters.yml downloaded from the
Gecko Decision Task whose pushlog_id was not divisible by 10, by running
./mach taskgraph optimized --verbose --parameters /tmp/parameters.yml

The parameters.yml from autoland showed the following optimizations:

0:56.13 PushIntervalStrategy: Removing task build-linux64-aarch64/opt interval 10
0:56.13 PushIntervalStrategy: Removing task build-linux64-plain/debug interval 10
0:56.13 PushIntervalStrategy: Removing task build-signing-win64-aarch64/opt interval 10
0:56.13 PushIntervalStrategy: Removing task build-win64-aarch64/debug interval 10
0:56.13 PushIntervalStrategy: Removing task build-win64-plain/debug interval 10
0:56.18 PushIntervalStrategy: Removing task valgrind-linux64-valgrind/opt interval 10

while parameters.yml from mozilla-central did not show any PushIntervalStrategy
optimizations.

Depends on D70181

Feedback appreciated on the approach and the results before I formally ask for review.

Flags: needinfo?(jmaher)
Attachment #9139124 - Attachment description: Bug 1621764 - Define PushIntervalStrategy optimization strategy. → Bug 1621764 - Define PushIntervalStrategy optimization strategy, r=ahal.
Attachment #9139125 - Attachment description: Bug 1621764 - Apply push-interval strategies for linux, windows plain and aarch64 builds. → Bug 1621764 - Apply push-interval strategies for linux, windows plain and aarch64 builds, r=jmaher.
Flags: needinfo?(jmaher)
Attachment #9139124 - Attachment is obsolete: true
See Also: → 1626962, 1619233
Depends on: 1625200

Depends on D70182

Attachment #9142487 - Attachment description: Bug 1621764 - Define PushIntervalStrategy optimization strategy, r=ahal → Bug 1621764 - Define push-interval-{10,25} Backstop optimization strategies, r=ahal
Attachment #9142488 - Attachment description: Bug 1621764 - add debug output for Backstop optimizations, r=ahal. → Bug 1621764 - add debug output for Backstop optimizations
Pushed by bclary@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8808cb9cbff2
Define push-interval-{10,25} Backstop optimization strategies, r=ahal
Pushed by bclary@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2001c1f52aa0
Apply push-interval strategies for linux, windows plain and aarch64 builds, r=jmaher.
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla77
Regressions: 1633927
Whiteboard: [ci-costs-2020:todo] → [ci-costs-2020:done]

Does this mean that an unfortunately-timed patch may make it to m-c only to find that a push-interval job breaks later? Or do we have some way of letting the interval jobs catch up before selecting merge candidates?

Flags: needinfo?(jmaher)

Thanks for asking!

We already run most jobs every 10th push, so this is as safe as any other job. Running every 25th push would be closer to what we view as tier-2: a regression would most likely be caught before the merge, but it could miss the timing window and slip through. All the builds in this bug have been moved to every 10th push, and that push will be required to be green before merging to m-c.
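The 10th-push versus 25th-push trade-off can be made concrete. Assuming pushlog ids on autoland increase by one per push and the interval task runs when the id is a multiple of N (as in the patch description), a regression landing on push p is first exercised on the next multiple of N, so it can go unchecked for up to N - 1 pushes. A small sketch of that arithmetic (the function names are hypothetical):

```python
def next_interval_push(push_id, interval):
    """First push id >= push_id on which an every-Nth-push task runs."""
    # Round push_id up to the nearest multiple of interval.
    return -(-push_id // interval) * interval


def pushes_until_checked(push_id, interval):
    """How many pushes after push_id pass before the interval task runs."""
    return next_interval_push(push_id, interval) - push_id


for interval in (10, 25):
    # Worst case over a range of push ids: landing right after a scheduled push.
    worst = max(pushes_until_checked(p, interval) for p in range(1, 1001))
    print(interval, worst)  # prints "10 9", then "25 24"
```

This only models the scheduling gap. Whether a merge waits for the next interval push is a sheriffing practice, which the reply above covers by requiring the 10th push to be green before merging to m-c.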

Flags: needinfo?(jmaher)
Attachment #9142488 - Attachment is obsolete: true