Closed Bug 1583353 Opened 5 years ago Closed 5 years ago

Test path chunking in the taskgraph

Categories

(Firefox Build System :: Task Configuration, enhancement)

enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: ahal, Assigned: ahal)

References

(Blocks 3 open bugs)

Details

Attachments

(27 files, 5 obsolete files)

47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review

A major problem we've had with our CI for as long as I can remember (predating taskcluster), is that the tests that run in a task aren't known until task runtime. Instead the chunking information (thisChunk / totalChunks) is defined in the taskgraph and then an algorithm uses that information to compute which tests should run within the test harness itself.

This is a problem for several reasons:

  1. Tests can jump around chunks between pushes.
  2. Given a test, hard to find which task it ran in.
  3. Failure bisection might yield a false positive (e.g test moved chunks rather than fixed).
  4. Scheduling is difficult, advanced techniques like ccov or machine learning won't work.
  5. Makes getting rid of chunks (from a developer UX perspective) impossible.

There are probably many more problems I could come up with if pressed. Over the years we've tried many solutions to several of these problems, all with minimal levels of success. There's a hard barrier that we keep running up against.

We need to define which tests run in which tasks within the taskgraph itself. In other words, rather than setting chunks in the taskgraph and performing the chunking operation later, we do this chunking operation as part of task generation. So the decision task knows exactly which tests will run in which tasks.

This is a major shift in how we think about CI. It will upend workflows, break assumptions and cause all sorts of new problems. But I'm firmly convinced it will be well worth the effort.

This bug is being filed as a consequence of the following document:
https://gist.github.com/ahal/5b66ec2cf981d1398c05335bbe44633b

It will act as a meta bug to collect smaller work items as dependencies.

Depends on: 1583360
Depends on: 1583363
Depends on: 1583364
Blocks: 1507108
See Also: → 1434914

Expanding scope a bit here. The goal will be to perform chunking in the taskgraph. At first we'll simply attempt to chunk the full set of tasks.

Summary: [meta] Test paths in the taskgraph → [meta] Test path chunking in the taskgraph
See Also: → 1580893

On second thought I don't think this warrants a meta bug. I'll just keep tasks separated by commit.

Summary: [meta] Test path chunking in the taskgraph → Test path chunking in the taskgraph

This key doesn't appear to be used by anything. Let's get rid of it to remove
some of the complexity around chunking.

Summary: Test path chunking in the taskgraph → [meta] Test path chunking in the taskgraph
Keywords: meta
Summary: [meta] Test path chunking in the taskgraph → Test path chunking in the taskgraph
Assignee: nobody → ahal
Status: NEW → ASSIGNED
Keywords: leave-open
Pushed by ahalberstadt@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/78cc1e5a395f [taskgraph] Remove 'mozharness.chunking-args' key from the test_description_schema, r=gbrown

Since TestResolver is a subclass of MozbuildObject, there's no need to create
separate repository object. It already has one.

Depends on D49761

Encapsulates all the logic around generating and loading the build backend
metadata on the TestMetadata class. Previously the TestResolver would trigger
the generation if necessary, and TestMetadata would load it. Now both
generation and loading happens in TestMetadata.load_tests.

This change also adds some convenience properties to make it easier to query
the loaded data.

Depends on D49764

Similar to the vcs change, the MozbuildObject already has a reader attribute
available. So we can re-use that instead of creating our own.

Depends on D49765

A minor cleanup. Re-write paths will now automatically be joined to
self.topobjdir.

Depends on D49766

Previously there was a somewhat strange setup where we had both TestResolver
and TestMetadata classes. Both had 'resolve_tests' function and the separation
of concerns between the two were not clear.

With this change, all of the logic that is related to manipulating and
resolving the loaded tests has been moved to the TestResolver class. Also, the
TestMetadata class has been renamed to TestLoader, and it is solely responsible
for loading the metadata (from the build backend).

Future commits will add other types of TestLoaders.

Depends on D49767

This prevents us from adding the puppeteer tests over and over again. It
follows the wpt example.

Pushed by ahalberstadt@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/8f80a93d486b [moztest.resolve] Remove unused 'tests_with_flavor' function from metadata class r=gbrown https://hg.mozilla.org/integration/autoland/rev/0c7112ed3da7 [moztest.resolve] Use 'MozbuildObject.repository' in the TestResolver class r=gbrown https://hg.mozilla.org/integration/autoland/rev/ed42501638e4 [moztest.resolve] Create an 'is_puppeteer_loaded' flag r=gbrown https://hg.mozilla.org/integration/autoland/rev/aaeaea784276 [moztest.resolve] Move all mozbuild backend logic into the TestMetadata class r=gbrown https://hg.mozilla.org/integration/autoland/rev/588f0a9ac709 [moztest.resolve] Use MozbuildObject.mozbuild_reader to resolve outgoing files r=gbrown https://hg.mozilla.org/integration/autoland/rev/391773a3c2e6 [moztest.resolve] Move test_rewrites to a class property r=gbrown https://hg.mozilla.org/integration/autoland/rev/74b431af3f3f [moztest.resolve] Move everything from TestMetadata into the TestResolver class (except the load_tests function) r=gbrown
Regressions: 1590680
Regressions: 1591512

Having a distinction between -chunked and not adds unnecessary complexity. It's
possible to simply remove them because:

  1. The mozharness definitions for 'jittest' and 'jittest-chunked' are
    identical, so it is literally not serving any purpose.

  2. The definitions for 'mochitest' only add either '--chunk-by-dir' or
    '--chunk-by-runtime'. Both of these are no-ops in the mochitest harness
    unless '--total-chunks' is also supplied. Therefore, if we ever use these
    suites without chunking (which I don't think we do anyway), then it'll
    still work fine as those options won't have any affect.

This allows subclasses of MozbuildObject to define additional instance
arguments and still use 'from_environment'.

Depends on D51173

I'm making this refactor now because a future commit is going to completely re-write the
test data (so that it matches the data that a future TestManifestLoader is going to use).

With this method, updating the data will be a lot less tedious.

Pushed by ahalberstadt@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/b989eb460f44 [mozharness] Remove '-chunked' mochitest and jittest suite definitions, r=gbrown https://hg.mozilla.org/integration/autoland/rev/bca7369cc0d3 [mozbuild] Forward unrecognized kwargs to underlying class in MozbuildObject.from_environment(), r=firefox-build-system-reviewers,mshal https://hg.mozilla.org/integration/autoland/rev/f6d096dd754d [moztest] Convert test_resolve.py to the pytest format, r=egao https://hg.mozilla.org/integration/autoland/rev/c7b6e1aef3a0 [moztest] Use a fixture to generate the test data in test_resolve.py, r=egao

Despite what the comment says, the finder does pick up the root moz.build. So
we end up processing it twice. This was never caught before because we only ever
used this function to read Sphinx related variables, of which the root moz.build
contains none.

Test test data looks like it was pulled from a live all_tests.pkl file. There
are way more path components than necessary.

This simplifies the test paths so they are easy to manipulate/add/inspect. It
will also make it easier to craft a fake "sourcedir" to test the
TestManifestLoader in the next commit such that the data from both matches.

I decided to use a fruit theme for directories because:

  1. Using real directories will pollute grep/searchfox/etc queries with junk.
  2. Using a 'dirA', 'dirB', 'dirC' scheme is hard to read.
  3. Why not?

This change does not functionally modify what is being tested.

Depends on D51832

This loader uses 'reader.find_variables_from_ast' to parse all *_MANIFESTS variables from
moz.build files using the abstract syntax tree. This means it will find all such variables
regardless of the current buildconfig.

Depends on D51833

Blocks: 1582516

Allows 'paths' passed into the pathprefix filter to be manifests. Any path that
ends with '.ini' is considered a manifest.

Depends on D51899

This gives us the ability to retrieve all browser-chrome tests (no flavor) but
not devtools-chrome (have a flavor).

While not strictly necessary for this series, this patch allows:
./mach test path/to/manifest.ini

Which will be especially useful to have now that we are switching to running
tasks by manifest.

Depends on D52241

Pushed by ahalberstadt@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/8d28c3108fc0 [mozbuild.reader] Don't return root moz.build twice in 'all_mozbuild_paths', r=froydnj https://hg.mozilla.org/integration/autoland/rev/b96f1cf028b6 [moztest] Simplify the test data in test_resolve.py, r=egao https://hg.mozilla.org/integration/autoland/rev/cc2575c34202 [moztest.resolve] Implement a TestLoader that doesn't rely on the build system, r=gbrown https://hg.mozilla.org/integration/autoland/rev/ca54f70e0297 [manifestparser] Convert test_filters.py to the pytest format, r=egao https://hg.mozilla.org/integration/autoland/rev/98bbf6967fd6 [manifestparser] Support manifests in the 'pathprefix' filter, r=egao https://hg.mozilla.org/integration/autoland/rev/d741ca0e07b1 [moztest.resolve] Add ability to resolve only tests *without* a subsuite, r=gbrown https://hg.mozilla.org/integration/autoland/rev/7ea00823df0a [moztest.resolve] Add ability to resolve manifest paths, r=gbrown
Depends on: 1600314

Depends on D52730

Attachment #9112555 - Attachment is obsolete: true
Pushed by ahalberstadt@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/6740ae92e125 [taskgraph] Implement test chunking in transforms r=gbrown https://hg.mozilla.org/integration/autoland/rev/ce77c1c7dd9f [taskgraph.decision] Create an artifact mapping tasks to the test manifests that are run within r=tomprince https://hg.mozilla.org/integration/autoland/rev/a76095565161 [ci] Chunk mochitest-media and devtools-chrome in the taskgraph, r=gbrown
Pushed by geoff@darktrojan.net: https://hg.mozilla.org/integration/autoland/rev/27df7ff304df Add mochitest-browser-chrome-thunderbird and mozmill to chunked test blacklist r=egao
Pushed by malexandru@mozilla.com: https://hg.mozilla.org/mozilla-central/rev/d6d7d85a1b56 Add mochitest-browser-chrome-thunderbird and mozmill to chunked test blacklist DONTBUILD
Blocks: 1603459
Regressions: 1603561
Depends on: 1603844

Previously, when running |mach mochitest path/to/manifest.ini|, any tests in
manifests that that one includes would not be run.

This fixes that behaviour.

Pushed by ahalberstadt@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/fd2b064f3c17 [manifestparser] Take 'ancestor_manifest' into account in the pathprefix filter, r=gbrown https://hg.mozilla.org/integration/autoland/rev/721f68718966 [ci] Chunk mochitest-chrome, gpu and remote in the taskgraph, r=gbrown
Depends on: 1603561
Pushed by ahalberstadt@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/355cde565ce1 [ci] Chunk xpcshell in the taskgraph, r=gbrown
Depends on: 1606271
Regressions: 1606271

Looks like one of the ccov tasks happened to get only a single manifest that was skipped on that configuration. So it didn't run any tests and the harness complained. I'll need to try and filter out skipped tests before running the chunking algorithm.

Flags: needinfo?(ahal)

Previously we simply ignored skipped tests when performing test chunking in the
taskgraph. This could sometimes result in a task that runs a single manifest
that happens to be skipped on that task's configuration. This results in an error
that no tests were run from the harness.

This patch attempts to guess a mozinfo based on a task's definition. Then it
uses those values to take skipped tests into account when running the chunking
algorithm. That way the chances of scheduling a manifest that would be skipped
anyway are much lower.

Depends on D58987

Pushed by ahalberstadt@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/1044ce0c5962 [taskgraph] Move test chunking logic to a utility file, r=gbrown https://hg.mozilla.org/integration/autoland/rev/2c7ec324fc15 [taskgraph] Run test task chunking through manifestparser's skip-if filter, r=gbrown

Depends on D59559

I also noticed that xpcshell ccov chunking had one chunk that ran much longer than the others, such that it was hitting a timeout. The latest two patches (adding ccov to mozinfo guess and updating runtimes) seems to fix this.

Hm, something is still not working with the xpcshell chunking. Now ccov is green, but Windows opt has one chunk hitting the max runtime (90+ min) and the other only taking 14 min.

Originally I was hoping to just use chunk_by_runtime for everything, but I think I'll have to add the ability for tasks to define which chunking algorithm they use.

Relatedly, I also think this bug has run its course. I'm going to close it out and file follow-ups for the remaining suites.

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Keywords: leave-open
Blocks: 1608833
Blocks: 1608834
Blocks: 1608836
Blocks: 1608837

Comment on attachment 9120160 [details]
Bug 1583353 - [taskgraph] Add 'ccov' to the mozinfo guess when chunking tests, r?gbrown

Revision D59559 was moved to bug 1608833. Setting attachment 9120160 [details] to obsolete.

Attachment #9120160 - Attachment is obsolete: true

Comment on attachment 9120161 [details]
Bug 1583353 - Update test runtimes files, r?gbrown

Revision D59560 was moved to bug 1608833. Setting attachment 9120161 [details] to obsolete.

Attachment #9120161 - Attachment is obsolete: true

Comment on attachment 9116138 [details]
Bug 1583353 - [ci] Chunk xpcshell in the taskgraph, r?gbrown

Revision D57326 was moved to bug 1608833. Setting attachment 9116138 [details] to obsolete.

Attachment #9116138 - Attachment is obsolete: true

Comment on attachment 9117354 [details]
Bug 1583353 - [ci] Chunk mochitest-a11y and plain in the taskgraph, r?gbrown

Revision D58039 was moved to bug 1608834. Setting attachment 9117354 [details] to obsolete.

Attachment #9117354 - Attachment is obsolete: true
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: