Closed Bug 1633866 (manifest-scheduling) Opened 7 months ago Closed 6 months ago

Dynamically select which tests run during taskgraph generation

Categories

(Firefox Build System :: Task Configuration, enhancement)

enhancement

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: ahal, Assigned: ahal)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [manifest-scheduling])

Attachments

(18 files, 1 obsolete file)

47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review

Currently how we schedule tests in ./mach try auto goes something like this:

  1. Define how many chunks N should exist for each configuration/suite combo in the .yml files.
  2. Create N tasks for each configuration/suite.
  3. Take all tests and distribute them somewhat evenly across those N tasks.
  4. Later on, determine which tests are important and run only the tasks that contain those tests.

This scheme isn't ideal because we determine which tests run in which tasks right at the outset, before we know whether the tests are relevant to the push or not. So when we schedule the tasks containing "important" tests, we also run all the other tests that just happen to be chunked in that task as well.

A much better process would look something like:

  1. Determine which tests are important.
  2. Compute how many chunks N it would take to run those tests in a reasonable amount of time.
  3. Distribute only the important tests evenly across those chunks.
  4. Don't optimize further (based on test containment at least)

This way we only run the important tests. There is no collateral damage.

This will be a pretty large departure and break many workflows and assumptions we have in our tooling. Including but not limited to:

  • Push Health
  • Backfills
  • Sheriff / developer treeherder workflows
  • Cost analysis data
  • Task selection with ./mach try

To start, this will likely only be enabled when using ./mach try auto. Though eventually this mode will make its way to autoland itself.

We use the term 'tests' to refer to 'tasks' in the tests.py transforms. Imo,
this makes things very hard to follow as the term 'test' is also used for all
kinds of other contexts in that file. Let's just call them what they are:
tasks.

I decided to land this as part of this series as I will be adding further uses
of the word 'test' later on.

Pushed by ahalberstadt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b8f00e815023
[taskgraph] Call tasks 'tasks' rather than 'tests' in tests.py transforms, r=jmaher

With dynamic-test-selection, we'll also need to query the bugbug service from
the transforms. Let's move the querying logic to a utility file to share it
more easily.

Pushed by ahalberstadt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b387530c523f
[taskgraph] Move logic to set test-verify chunks to a new transform, r=jmaher
https://hg.mozilla.org/integration/autoland/rev/a4552592ac13
[taskgraph.optimize] Move logic to query 'bugbug' service to a utility file, r=marco
Whiteboard: [manifest-scheduling]
Duplicate of this bug: 1603454
See Also: → 1632946

Small improvement with respect to the single responsibility principle. The
taskgraph diff is identical.

This is instead of tests and will make it easier to re-use in the taskgraph.
This commit is a straight refactor and results in zero differences in the
taskgraph.

Depends on D74448

In the 'chunk_by_runtime' algorithm a 'get_manifest' helper function is used to
determine what "manifest" a test belongs to. The logic is basically, if
'ancestor-manifest' exists then use that. Otherwise use 'manifest_relpath'.

We need to do this because in some cases a "shared" manifest can be included
multiple times from parent manifests, each with a different configuration.

However, when we calculate the "skipped" manifests in chunking.py, we were
simply using 'manifest_relpath' and ignoring 'ancestor-manifest'. I believe
this meant we were mis-reporting which manifests were skipped in the task logs.
It possibly even meant we were double-scheduling some tests (i.e, if the
'skip-if' was in the ancestor, not the shared one). I'm not sure if the double
scheduling was actually happening in practice, but it's certainly theoretically
possible.

Afaict, this only affected a handful of xpcshell manifests on Windows and
Android.

Depends on D74449

This sets things up to be a little bit easier and cleaner to modify going
forward.

Depends on D74451

Pushed by ahalberstadt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/de52ed9b5676
[taskgraph] Pull logic to determine test manifests into its own transform, r=egao
https://hg.mozilla.org/integration/autoland/rev/4ab2a3ed953a
[manifestparser] Modify 'chunk_by_runtime.get_chunked_manifests' to accept a list of manifests, r=egao
https://hg.mozilla.org/integration/autoland/rev/f67e18eedad5
[taskgraph] Take 'ancestor-manifest' into account when computing skipped manifests, r=egao
Alias: dynamic-test-selection → manifest-scheduling

When performing diffs, identical parameters files need to be used. This makes
it possible to wget the exact same parameters used by a previous |mach
taskgraph| run.

This moves the creation of WPT groups into the TestResolver, where all kinds of
other consumers will be able to access them.

Pushed by ahalberstadt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4904625d5db9
[taskgraph] Log parameters url when using 'project=' or 'task-id=', r=tomprince

Comment on attachment 9150346 [details]
Bug 1633866 - [taskgraph] Move WPT group assignment into the TestResolver, r?egao

Revision D76085 was moved to bug 1634551. Setting attachment 9150346 [details] to obsolete.

Attachment #9150346 - Attachment is obsolete: true
Attachment #9150346 - Attachment is obsolete: false

Comment on attachment 9150346 [details]
Bug 1633866 - [taskgraph] Move WPT group assignment into the TestResolver, r?egao

Revision D76085 was moved to bug 1634551. Setting attachment 9150346 [details] to obsolete.

Attachment #9150346 - Attachment is obsolete: true
Attachment #9146872 - Attachment description: Bug 1633866 - [taskgraph] Refactor logic around manifest chunking → Bug 1633866 - [taskgraph] Refactor logic around manifest chunking, r?egao
See Also: → 1639873
Pushed by ahalberstadt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e63cc7a1cbaa
[taskgraph] Refactor logic around manifest chunking, r=egao

This allows tasks to opt into dynamic chunking. This means rather than define
the chunks ahead of time, in-tree manifest runtime data is used to guess how
many chunks are needed for a reasonable average task runtime (currently 20
min).

Only suites that are chunked in the taskgraph, and therefore whose manifests
are known ahead of time, can opt into this feature.

Currently test manifests are loaded by instantiating a TestResolver and
traversing moz.build files to find manifests.

With 'manifest-scheduling', we'll want to grab the manifests directly from the
bugbug service instead. Initially we'll want to enable manifest-scheduling with
|mach try auto|, but not on autoland yet.

This patch will allow |mach try auto| to set the parameter that causes bugbug
to be used (see future commits in this bug).

Depends on D76497

Pushed by ahalberstadt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4685792a4e57
[taskgraph] Implement the ability for tasks to dynamically specify chunks, r=gbrown,jmaher
Pushed by ahalberstadt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/52716d101bc7
[taskgraph] Create a system to choose how test manifests are loaded via a parameter, r=tomprince

This change is a straight refactor and results in identical full taskgraphs.

Attachment #9151122 - Attachment description: Bug 1633866 - [taskgraph] Create a 'bugbug' based test manifest loader → Bug 1633866 - [taskgraph] Create a 'bugbug' based test manifest loader, r?marco
Attachment #9151124 - Attachment description: Bug 1633866 - [tryselect] Use 'bugbug' loader with |mach try auto| → Bug 1633866 - [tryselect] Use 'bugbug' loader with |mach try auto|, r?marco

Here's my latest progress:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=06a60a11960d2d645c25c9d25f46742a1bb713ad

Combined with the fixes to stop scheduling shippable builds, I'd say this is looking really good!

A couple issues to resolve:

  1. Why is the mochitest-plain symbol "unknown" ?
  2. Why did the Linux xpcshell tasks result in only one manifest per task? (Notably the Android ones didn't)
  3. Do we need to add more configuration specific runtime data to get balanced chunks everywhere? Or is what we have good enough for now.

(In reply to Andrew Halberstadt [:ahal] from comment #33)

  1. Why is the mochitest-plain symbol "unknown" ?

This one is easy. The default symbols for mochitest-plain are just numbers and nothing else. When we split chunks, the numbers are appended to the existing symbol here:
https://searchfox.org/mozilla-central/source/taskcluster/taskgraph/transforms/tests.py#1495

But since we only have one chunk we never enter that if statement and we likely assign the symbol an null value, which treeherder then defaults to "unknown".

(In reply to Andrew Halberstadt [:ahal] from comment #33)

  1. Why did the Linux xpcshell tasks result in only one manifest per task? (Notably the Android ones didn't)

The algorithm is working as expected. This was happening because there is one manifest that takes 100+ minutes apparently (5x the desired length). So this causes the computed number of dynamic chunks to be artificially high. For example, assume we have three manifests:

foo -> 98 minutes
bar -> 1 minute
baz -> 1 minute

And let's say we want average chunk lengths of ~20 minutes. The calculation would currently say we need 5 chunks ((98 + 1 + 1) / 20). Even if set the max chunks to the # of manifests, it would still return 3. Clearly this only needs 2 chunks to run.

I think if we truncated all runtimes that exceed the desired chunk duration, it would balance things out nicer.

  1. Do we need to add more configuration specific runtime data to get balanced chunks everywhere? Or is what we have good enough for now.

I think it's ok to not block on this for |mach try auto|. Though we'll likely need better runtime data before we make it live on autoland.

In my try push, truncating task runtimes would have reduced xpcshell chunks from 13 -> 7. So fix seems to work.

For example, assume we have three manifests:

foo -> 98 minutes
bar -> 1 minute
baz -> 1 minute

And let's say we want average chunk lengths of ~20 minutes. The calculation
would currently say we need 5 chunks ((98 + 1 + 1) / 20). Even if set the max
chunks to the # of manifests, it would still return 3. Clearly this only needs
at most 2 chunks to run.

By truncating the runtime, we'd have:

foo -> 20 minutes (truncated)
bar -> 1 minute
baz -> 1 minute

So we'd actually only assign one chunk (round((20 + 1 + 1) / 20)). Despite
having a single 100 minute chunk, this outcome is desirable as we now avoid the
overhead of a second chunk that only runs 2 minutes worth of tests.

Depends on D77871

Attachment #9153525 - Attachment description: Bug 1633866 - [taskgraph] Truncate runtimes that exceed DYNAMIC_CHUNK_DURATION so they don't artificially increadse chunks, r?gbrown → Bug 1633866 - [taskgraph] Truncate runtimes that exceed DYNAMIC_CHUNK_DURATION so they don't artificially increase chunks, r?gbrown

(In reply to Andrew Halberstadt [:ahal] from comment #35)

  1. Do we need to add more configuration specific runtime data to get balanced chunks everywhere? Or is what we have good enough for now.

I think it's ok to not block on this for |mach try auto|. Though we'll likely need better runtime data before we make it live on autoland.

I might have spoke too soon. The runtime data for xpcshell seems particularly bad. It's scheduling ~4 minute tasks and the full set of manifests take 20+ chunks. I'll need to either fix the data (preferable) or create some kind of max-chunk mechanism. Though the latter solution will still result in very short chunks when we're under the max, so it's a last resort.

A better "last resort" might be to artificially scale all xpcshell runtimes down. But I'll definitely spend a lot of time investigating why the runtime data doesn't seem to match reality.

I think the xpcshell issue is due to a combination of problems:

  1. A bug in the writeruntimes script, which is adding up the tests of included manifests multiple times. For example, if xpcshell-common.ini is included by three different parent manifests, then writeruntimes will collate each instance up into the ancestor manifest here:
    https://searchfox.org/mozilla-central/source/testing/runtimes/writeruntimes#187

So though those tests are counted 3 times under the same manifest.

  1. Insufficient mozinfo guessing. I see some values missing (processor, bits, toolkit) that are commonly used to skip entire manifests in xpcshell.

Nvm, xpcshell chunks are way off because they run in parallel ><

It looks like they run on 8 cores in CI (at least for Linux). I'll try dividing the total computed runtime by that.

This is needed because xpcshell runs in parallel. Therefore we need to know the number
of threads are used (which varies by platform) in order to accurately determine how
many chunks are needed.

This allows us to specify a platform / suite specific multiplier to compensate via trial
and error.

Ensure runtimes are up to date prior to enabling manifest-scheduling
anywhere.

Depends on D78153

Adds a few properties to the mozinfo guess that were previously missing.

Depends on D78155

Pushed by ahalberstadt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/acace34eb88d
[taskgraph] Simplify how 'suites' are handled in manifest resolving and chunking, r=egao
https://hg.mozilla.org/integration/autoland/rev/69790da5ec2c
[taskgraph] Create a 'bugbug' based test manifest loader, r=marco
https://hg.mozilla.org/integration/autoland/rev/b39e4d67af92
[taskgraph] Always use chunk number as symbol if it would otherwise be empty, r=jmaher
https://hg.mozilla.org/integration/autoland/rev/fd095da6e40d
[taskgraph] Truncate runtimes that exceed DYNAMIC_CHUNK_DURATION so they don't artificially increase chunks, r=gbrown
https://hg.mozilla.org/integration/autoland/rev/828a49b4b520
[taskgraph] Implement a multiplier to tweak the duration for certain platforms / suites, r=gbrown
https://hg.mozilla.org/integration/autoland/rev/2dcae3ff17c1
[runtimes] Update all test runtime data, r=egao
https://hg.mozilla.org/integration/autoland/rev/941d4125e9c3
[taskgraph] Improve 'guess_mozinfo_from_task' for more balanced chunks, r=gbrown
https://hg.mozilla.org/integration/autoland/rev/275d082f51b3
[tryselect] Use 'bugbug' loader with |mach try auto|, r=marco

I'm going to call this fixed. I filed bug 1643689 to track enabling manifest-scheduling on autoland.

Status: ASSIGNED → RESOLVED
Closed: 6 months ago
Keywords: leave-open
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.