Closed Bug 1268187 Opened 8 years ago Closed 8 years ago

Enable taskcluster build coalescing

Categories

(Release Engineering :: General, defect)

defect
Not set
normal

Tracking

(firefox50 fixed)

RESOLVED FIXED

People

(Reporter: dividehex, Assigned: dividehex)

Attachments

(1 file)

The coalescing service is just about ready to be enabled for builds under taskcluster.

This means adding the route.coalesce.v1.<key> route and the supersederUrl: 'https://coalesce.mozilla-releng.net/v1/list/<key>' to the build YAML files in mozilla-central.
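
As a rough illustration of the two per-key strings mentioned above, here is a minimal Python sketch. The route prefix and the URL are copied from this comment; the helper name coalesce_settings and the example key are hypothetical, and where exactly these values go in the build YAML is not shown here.

```python
# Values copied from the comment above; everything else is illustrative.
ROUTE_PREFIX = "route.coalesce.v1."
SUPERSEDER_BASE = "https://coalesce.mozilla-releng.net/v1/list/"


def coalesce_settings(key):
    """Return the route and supersederUrl strings for one coalescing key."""
    if not key:
        raise ValueError("coalescing key must be non-empty")
    return {
        "route": ROUTE_PREFIX + key,
        "supersederUrl": SUPERSEDER_BASE + key,
    }


# Hypothetical key; the real key format is still an open question in this bug.
example = coalesce_settings("builds.mozilla-inbound.opt_linux32")
print(example["route"])
print(example["supersederUrl"])
```

Both strings share the same key, so deriving them from one value avoids a route and a supersederUrl that point at different lists.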

For each key, we will also need to set a size and age threshold in config.py.  Size is the number of pending tasks and age is the age of the oldest task in the list (in seconds).
https://github.com/mozilla/tc-coalesce/blob/master/config/config.py

I'd like some input/guidance on which YAML build files this should be added to, what the keys should consist of, and what thresholds should be set for each key.
https://dxr.mozilla.org/mozilla-central/source/testing/taskcluster/tasks/builds
The coalesce service is ready to be put to use.  I'm looking for guidance on which builds to enable this on and what the thresholds should be for triggering coalescing.
Blocks: 1213039
Flags: needinfo?(sdeckelmann)
Flags: needinfo?(catlee)
Coop? Joel?
Flags: needinfo?(sdeckelmann)
Flags: needinfo?(jmaher)
Flags: needinfo?(coop)
woohoo!  I am not sure of the best guidance here- I think we coalesce in buildbot when the backlog is higher.  SETA assumes we are not coalescing builds so we can quickly backfill test jobs that are coalesced.

If we want to coalesce task jobs, we could get something in place until SETA is online to feed us the right data.  The typical rules are based on the time since the last job ran and/or how many jobs to skip.

So take for example job A with 60 minute, 5 skip parameters:
60 minutes - if a new job is scheduled and it has been >60 minutes since the last time jobA was scheduled, do not coalesce
5 skip - assuming we don't meet criteria 1 for time, if we have skipped this job <5 times, skip it, but if we have skipped it 5 times already, run the job

If we want, I would be fine skipping every other test job (1 skip) with a 30 minute timeout.
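
The timeout/skip rule described in this comment can be sketched in Python roughly as follows. This is only an illustration of the two criteria as worded here, not code from SETA or buildbot; the function and parameter names are made up for the example.

```python
def should_run(minutes_since_last_run, skips_so_far, timeout_minutes, max_skips):
    """Decide whether a newly scheduled job runs (True) or is skipped (False)."""
    # Criterion 1: too long since the job last ran, so do not coalesce.
    if minutes_since_last_run > timeout_minutes:
        return True
    # Criterion 2: run once the job has already been skipped max_skips times.
    return skips_so_far >= max_skips


# "1 skip, 30 minute timeout" with jobs arriving 5 minutes after the last run:
skips = 0
decisions = []
for _ in range(4):
    run = should_run(5, skips, timeout_minutes=30, max_skips=1)
    decisions.append(run)
    skips = 0 if run else skips + 1
print(decisions)  # [False, True, False, True]
```

With one allowed skip and a steady stream of pushes, every other job runs, which matches the "skipping every other test job" suggestion.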
Flags: needinfo?(jmaher)
(In reply to Joel Maher (:jmaher) from comment #3)
> woohoo!  I am not sure of the best guidance here- I think we coalesce in
> buildbot when the backlog is higher.  SETA assumes we are not coalescing
> builds so we can quickly backfill test jobs that are coalesced.

We don't coalesce most builds. If you look at the buildbot-configs for builds and tests, we use the 'merge_builds' flag to disable this for integration and release branches, e.g.:

https://hg.mozilla.org/build/buildbot-configs/file/3a7c1d3dbace/mozilla/config.py#l2239
https://hg.mozilla.org/build/buildbot-configs/file/3a7c1d3dbace/mozilla/project_branches.py#l6

The short answer is that we can turn on coalescing on almost any project branch with impunity if we want to test thresholds, etc.
  
> If we want, I would be fine skipping every other test job (1 skip) with a 30
> minute timeout.
 
I'm fine with this as a starting point. If we have the ability to rewind in the case of an actual error or regression, I'm also fine with being more aggressive.
Flags: needinfo?(coop)
Flags: needinfo?(catlee)
I think someone has confused SETA with coalescing.  The coalescer doesn't have a number of skips or a timeout -- that's SETA.

The coalescer looks at the time a task has been pending, and the number of tasks of that type (as defined by the coalescing key) that are pending, and when either of those grows over a pre-defined limit, it begins allowing newer tasks to "supersede" older tasks, so the older tasks are not performed.

In Buildbot, as coop has suggested, this is known as build request merging; releng generally refers to it as queue collapsing.
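
The coalescer behaviour described above (coalescing starts when either the pending count or the pending age goes over its limit) might look like this in Python. It is a sketch under assumptions, not the tc-coalesce implementation: the function names are hypothetical, and letting the newest task supersede all older ones is an assumption made for the example.

```python
def coalescing_active(pending_count, oldest_age_secs, size_threshold, age_threshold_secs):
    """True when either the pending count or the oldest age exceeds its limit."""
    return pending_count > size_threshold or oldest_age_secs > age_threshold_secs


def tasks_to_supersede(pending_task_ids, oldest_age_secs, size_threshold, age_threshold_secs):
    """Return the older task IDs that the newest pending task would supersede.

    pending_task_ids is ordered oldest first. Assumption: only the newest
    task for the key is kept once coalescing is active.
    """
    if not pending_task_ids:
        return []
    if not coalescing_active(len(pending_task_ids), oldest_age_secs,
                             size_threshold, age_threshold_secs):
        return []
    return list(pending_task_ids[:-1])


print(tasks_to_supersede(["a", "b", "c"], 10, 2, 3600))  # ['a', 'b']
print(tasks_to_supersede(["a", "b", "c"], 10, 5, 3600))  # []
```

Whether the real service treats the two thresholds as "either" or "both" should be confirmed against the service itself; this sketch follows the "either" wording of this comment.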
Jake, coop - What are next steps to get this moving?
Flags: needinfo?(jwatkins)
Flags: needinfo?(coop)
If we're just looking to run a test case, then I'd say any non-tier 1 build on a non-release branch is good. 

As a specific example, I'd choose Linux 32-bit opt and debug builds on mozilla-inbound.
Flags: needinfo?(coop)
See Also: → 1274310
Jake -- is that enough to go on?
See Also: → 1275972
(In reply to Chris Cooper [:coop] from comment #7)
> If we're just looking to run a test case, then I'd say any non-tier 1 build
> on a non-release branch is good. 
> 
> As a specific example, I'd choose Linux 32-bit opt and debug builds on
> mozilla-inbound.

Thanks :coop.  Any suggestions on what the thresholds should be set to, or at least something to start with that we can tweak?  Coalescing won't take place unless the queue size and age of the oldest task in the queue exceed those two settings.  :mtabara has enabled coalescing for linux64_pgo on inbound in bug 1275972.  You can look at https://coalesce.mozilla-releng.net/v1/threshold as an example.
Flags: needinfo?(jwatkins)
Flags: needinfo?(coop)
We've turned off most/all of the Linux32 testing, so I'm OK with being more aggressive here with Linux32 builds (all types). Let's try 5 builds and a 1 hour timeout.
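
Expressed as size and age thresholds with age in seconds, "5 builds and a 1 hour timeout" works out as below. The dictionary layout and key names are hypothetical and are not taken from the actual config.py.

```python
SECONDS_PER_HOUR = 60 * 60

# Hypothetical layout: one entry per coalescing key.
# "size" is the number of pending tasks, "age" is in seconds.
THRESHOLDS = {
    "opt_linux32": {"size": 5, "age": 1 * SECONDS_PER_HOUR},
    "dbg_linux32": {"size": 5, "age": 1 * SECONDS_PER_HOUR},
}

print(THRESHOLDS["opt_linux32"]["age"])  # 3600
```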
Flags: needinfo?(coop)
Jake -- what's next here?
Flags: needinfo?(jwatkins)
Enables taskcluster coalescing on opt_linux32 and dbg_linux32 builds
Thresholds for each build type are configured under the coalesce service

Review commit: https://reviewboard.mozilla.org/r/58854/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/58854/
Attachment #8761747 - Flags: review?(dustin)
Attachment #8761747 - Flags: review?(coop)
Assignee: nobody → jwatkins
Flags: needinfo?(jwatkins)
Attachment #8761747 - Flags: review?(dustin) → review+
Comment on attachment 8761747 [details]
Bug 1268187 - Enable taskcluster coalesce for linxu32 builds

https://reviewboard.mozilla.org/r/58854/#review55806

I can't speak to the taskcluster specifics, but this is targeting the builds we want.
Attachment #8761747 - Flags: review?(coop) → review+
Keywords: checkin-needed
Pushed by cbook@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/196943fc2c0d
Enable taskcluster coalesce for linxu32 builds. r=coop, r=dustin
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/196943fc2c0d
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Component: General Automation → General