Closed Bug 1636902 Opened 11 months ago Closed 11 months ago

Move base-toolchains builds to run only on backstop pushes (or only on mozilla-central)

Categories

(Testing :: General, enhancement)

Version 3
enhancement

Tracking

(firefox78 fixed)

RESOLVED FIXED
mozilla78
Tracking Status
firefox78 --- fixed

People

(Reporter: marco, Assigned: marco)

References

(Blocks 1 open bug)

Details

(Whiteboard: [ci-costs-2020:done])

Attachments

(1 file)

I've found 40 pushes, out of 23144 analyzed, where one of the four base-toolchain builds failed and where other builds didn't fail.

:bc, could you run your script to see how much these builds cost on autoland?

build-linux64-base-toolchains-clang/debug
build-linux64-base-toolchains-clang/opt
build-linux64-base-toolchains/debug
build-linux64-base-toolchains/opt

NOTE: if we are not comfortable moving all of them to mozilla-central (or to backstop pushes only), we could also just keep one for autoland and move the other three.

Flags: needinfo?(bob)

Hm. It might depend on what kind of changes you are making, but I ran quite frequently into failures of either build-linux64-base-toolchains-clang or build-linux64-base-toolchains. Removing them completely from autoland doesn't feel like a good idea to me, if we still care about their results. If we don't, we should increase the minimum requirement for building Firefox (but this would only resolve the need for the build-linux64-base-toolchains-clang builds). Fixing build issues after a delay makes that more complicated, and since this will require separate bugs to be filed, the delay might be significant. The alternative of backing out changes that already landed on mozilla-central would be even worse.

There's no real correlation between what the older gcc and clang compilers complain about; the failures are typically due to compiler bugs.

Restricting to only the debug builds might be acceptable. This would bear the risk of masking issues with non-debug-only code, but this is rather rare.

I am not sure what "backstop pushes" are, and if restricting these builds to such pushes might be more acceptable.

(In reply to Simon Giesecke [:sg] [he/him] from comment #2)

> Hm. It might depend on what kind of changes you are making, but I ran quite frequently into failures of either build-linux64-base-toolchains-clang or build-linux64-base-toolchains. Removing them completely from autoland doesn't feel like a good idea to me, if we still care about their results. If we don't, we should increase the minimum requirement for building Firefox (but this would only resolve the need for the build-linux64-base-toolchains-clang builds). Fixing build issues after a delay makes that more complicated, and since this will require separate bugs to be filed, the delay might be significant. The alternative of backing out changes that already landed on mozilla-central would be even worse.

I agree with you, but we have to weigh the cost of running these builds against the probability that they find unique regressions (e.g. if it's $50k per year, 40 failures out of 23144 pushes might not be enough to justify the expense).

> There's no real correlation between what the older gcc and clang compilers complain about; the failures are typically due to compiler bugs.
>
> Restricting to only the debug builds might be acceptable. This would bear the risk of masking issues with non-debug-only code, but this is rather rare.

There were 8 cases where a base-toolchain opt build failed and the equivalent debug didn't. So, one order of magnitude more rare.
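To put the counts quoted in this bug side by side, here is a minimal sketch that turns them into per-push rates. Only the three counts come from the comments; the variable and function names are made up for illustration.

```python
# Counts quoted in this bug.
TOTAL_PUSHES = 23144
UNIQUE_FAILURES = 40    # a base-toolchains build failed and no other build did
OPT_ONLY_FAILURES = 8   # an opt base-toolchains build failed, its debug twin did not


def rate(events: int, pushes: int) -> float:
    """Fraction of pushes on which the event occurred."""
    return events / pushes


unique_rate = rate(UNIQUE_FAILURES, TOTAL_PUSHES)
opt_only_rate = rate(OPT_ONLY_FAILURES, TOTAL_PUSHES)
pushes_per_unique_failure = TOTAL_PUSHES / UNIQUE_FAILURES

print(f"unique failures:   {unique_rate:.3%} of pushes")
print(f"opt-only failures: {opt_only_rate:.3%} of pushes")
print(f"roughly one unique failure every {pushes_per_unique_failure:.0f} pushes")
```

The opt-only cases are 8 of the 40 unique failures, i.e. a fifth of them.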

> I am not sure what "backstop pushes" are, and if restricting these builds to such pushes might be more acceptable.

Once every 10 pushes on autoland, we have a "backstop" push which runs everything. So moving to backstop, compared to moving to mozilla-central only, would decrease the delay between landing the patch and noticing the failure (at a higher cost as we'd be running the builds more often than if we moved to mozilla-central).
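As a rough illustration of the "once every 10 pushes" rule, the sketch below marks backstop pushes by a simple modulo and lists which pushes are suspects when a backstop-only job fails. The modulo rule, the push numbering and both function names are assumptions made for this example; it is not the actual scheduling code.

```python
BACKSTOP_INTERVAL = 10  # "once every 10 pushes", as described above


def is_backstop(push_index: int, interval: int = BACKSTOP_INTERVAL) -> bool:
    """Hypothetical rule: every `interval`-th push runs everything."""
    return push_index % interval == 0


def suspect_range(failing_backstop: int, interval: int = BACKSTOP_INTERVAL) -> range:
    """Pushes that could have introduced a failure first seen on a backstop.

    Assumes the job was green on the previous backstop, so the regression
    landed somewhere after it, up to and including the failing backstop.
    """
    return range(failing_backstop - interval + 1, failing_backstop + 1)


if __name__ == "__main__":
    print([p for p in range(1, 31) if is_backstop(p)])
    print(list(suspect_range(30)))
```

Under these assumptions a failure on a backstop push leaves at most 10 candidate pushes to narrow down, versus a much longer range if the job only ran on mozilla-central.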

I think we should:

  • at least move these builds to "backstop" pushes only, given their low failure frequency (and they have no tests associated with them, so no need to backfill);
  • if the cost is high, move the opt builds to mozilla-central only;
  • if the cost is very high, move both the debug builds and the opt builds to mozilla-central only.

> I agree with you, but we have to weigh the cost of running these builds against the probability that they find unique regressions (e.g. if it's $50k per year, 40 failures out of 23144 pushes might not be enough to justify the expense).

Sure.

I am also quite serious about "if we still care about their results". What somewhat puzzles me is that we run these builds, but do not run any tests based on them. Assuming that code generation is free from bugs, while we (reasonably) assume there might be bugs in the parser, would be somewhat inconsistent. But if we don't assume that, then it's not really clear why we run these builds at all. If we wanted to assert that distributions that use these older compilers won't run into problems, we would need to do more. If we rely on them reporting any issues (and maybe require them to supply fixes themselves?), we could stop running these builds altogether.

Out of curiosity, has there been any analysis done of how many issues have been first seen on backstop pushes, how much work was required to determine where the actual problem occurred, and whether the issue would have been found more cheaply by just running the jobs all the time?

Backfilling is a simple 5-second operation for sheriffs, who then wait for the jobs to complete, which is baked into their workflow. Possibly we could save some human time.

If we have 120 pushes per day on autoland and run a job 12 times with 1 backfill, that would be 21 runs (maybe 19 given build breakage) vs. 120 (probably 100 given build breakage) if we ran it on every push. As you can see, we reduce the machine execution time significantly; the remaining concern is more the human cost.
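The 21-versus-120 figure can be reproduced with a short sketch. It assumes a backstop every 10 pushes and that one backfill reruns the job on the 9 pushes between two backstops; the function name is hypothetical and build breakage is ignored.

```python
def runs_per_day(pushes: int, interval: int, backfills: int) -> int:
    """Job runs per day when the job is scheduled only on backstop pushes.

    pushes    -- pushes landing per day
    interval  -- one backstop push every `interval` pushes
    backfills -- number of times a failure triggers a backfill, each of
                 which reruns the job on the `interval - 1` skipped pushes
    """
    backstop_runs = pushes // interval
    backfill_runs = backfills * (interval - 1)
    return backstop_runs + backfill_runs


backstop_total = runs_per_day(pushes=120, interval=10, backfills=1)
every_push_total = 120
reduction = 1 - backstop_total / every_push_total

print(f"backstop + 1 backfill: {backstop_total} runs")
print(f"every push:            {every_push_total} runs")
print(f"reduction:             {reduction:.1%}")
```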

For the Firefox 75 to 76 cycle (2020-04-07 to 2020-05-05):

provisionerId  workerType  project   symbol  collection  cost  label
gecko-3        b-linux     autoland  Bbc     debug       515   build-linux64-base-toolchains-clang/debug
gecko-3        b-linux     autoland  Bbc     opt         538   build-linux64-base-toolchains-clang/opt
gecko-3        b-linux     autoland  Bb      debug       455   build-linux64-base-toolchains/debug
gecko-3        b-linux     autoland  Bb      opt         467   build-linux64-base-toolchains/opt
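For a quick read of the table above, this sketch sums the four cost figures overall and per collection. The numbers are copied from the table; the unit of the cost column is not stated in the comment, so the totals are in that same unstated unit.

```python
from collections import defaultdict

# (label, collection, cost) rows copied from the table above.
COSTS = [
    ("build-linux64-base-toolchains-clang/debug", "debug", 515),
    ("build-linux64-base-toolchains-clang/opt", "opt", 538),
    ("build-linux64-base-toolchains/debug", "debug", 455),
    ("build-linux64-base-toolchains/opt", "opt", 467),
]

total = sum(cost for _, _, cost in COSTS)

# Group the costs by collection (debug vs. opt).
by_collection = defaultdict(int)
for _, collection, cost in COSTS:
    by_collection[collection] += cost

print(f"total for the cycle: {total}")
for collection, cost in sorted(by_collection.items()):
    print(f"{collection}: {cost}")
```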
Flags: needinfo?(bob)
Whiteboard: [ci-costs-2020:todo]
Assignee: nobody → mcastelluccio
Status: NEW → ASSIGNED
Pushed by mcastelluccio@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/0f9156096e13
Run opt base-toolchains builds only on mozilla-central, and debug ones only on backstop pushes. r=jmaher
Status: ASSIGNED → RESOLVED
Closed: 11 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla78
Whiteboard: [ci-costs-2020:todo] → [ci-costs-2020:done]
See Also: → 1649416