Open Bug 1583353 Opened 2 months ago Updated 7 days ago

Test path chunking in the taskgraph

Categories

(Firefox Build System :: Task Configuration, enhancement)

enhancement
Not set

Tracking

(Not tracked)

ASSIGNED

People

(Reporter: ahal, Assigned: ahal)

References

(Depends on 1 open bug, Blocks 3 open bugs)

Details

(Keywords: leave-open)

Attachments

(21 files)

47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review

A major problem we've had with our CI for as long as I can remember (predating taskcluster), is that the tests that run in a task aren't known until task runtime. Instead the chunking information (thisChunk / totalChunks) is defined in the taskgraph and then an algorithm uses that information to compute which tests should run within the test harness itself.

This is a problem for several reasons:

  1. Tests can jump around chunks between pushes.
  2. Given a test, hard to find which task it ran in.
  3. Failure bisection might yield a false positive (e.g test moved chunks rather than fixed).
  4. Scheduling is difficult, advanced techniques like ccov or machine learning won't work.
  5. Makes getting rid of chunks (from a developer UX perspective) impossible.

There are probably many more problems I could come up with if pressed. Over the years we've tried many solutions to several of these problems, all with minimal levels of success. There's a hard barrier that we keep running up against.

We need to define which tests run in which tasks within the taskgraph itself. In other words, rather than setting chunks in the taskgraph and performing the chunking operation later, we do this chunking operation as part of task generation. So the decision task knows exactly which tests will run in which tasks.

This is a major shift in how we think about CI. It will upend workflows, break assumptions and cause all sorts of new problems. But I'm firmly convinced it will be well worth the effort.

This bug is being filed as a consequence of the following document:
https://gist.github.com/ahal/5b66ec2cf981d1398c05335bbe44633b

It will act as a meta bug to collect smaller work items as dependencies.

Depends on: 1583360
Depends on: 1583363
Depends on: 1583364
Blocks: 1507108
See Also: → 1434914

Expanding scope a bit here. The goal will be to perform chunking in the taskgraph. At first we'll simply attempt to chunk the full set of tasks.

Summary: [meta] Test paths in the taskgraph → [meta] Test path chunking in the taskgraph
See Also: → 1580893

On second thought I don't think this warrants a meta bug. I'll just keep tasks separated by commit.

Summary: [meta] Test path chunking in the taskgraph → Test path chunking in the taskgraph

This key doesn't appear to be used by anything. Let's get rid of it to remove
some of the complexity around chunking.

Summary: Test path chunking in the taskgraph → [meta] Test path chunking in the taskgraph
Keywords: meta
Summary: [meta] Test path chunking in the taskgraph → Test path chunking in the taskgraph
Assignee: nobody → ahal
Status: NEW → ASSIGNED
Keywords: leave-open
Pushed by ahalberstadt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/78cc1e5a395f
[taskgraph] Remove 'mozharness.chunking-args' key from the test_description_schema, r=gbrown

Since TestResolver is a subclass of MozbuildObject, there's no need to create
separate repository object. It already has one.

Depends on D49761

Encapsulates all the logic around generating and loading the build backend
metadata on the TestMetadata class. Previously the TestResolver would trigger
the generation if necessary, and TestMetadata would load it. Now both
generation and loading happens in TestMetadata.load_tests.

This change also adds some convenience properties to make it easier to query
the loaded data.

Depends on D49764

Similar to the vcs change, the MozbuildObject already has a reader attribute
available. So we can re-use that instead of creating our own.

Depends on D49765

A minor cleanup. Re-write paths will now automatically be joined to
self.topobjdir.

Depends on D49766

Previously there was a somewhat strange setup where we had both TestResolver
and TestMetadata classes. Both had 'resolve_tests' function and the separation
of concerns between the two were not clear.

With this change, all of the logic that is related to manipulating and
resolving the loaded tests has been moved to the TestResolver class. Also, the
TestMetadata class has been renamed to TestLoader, and it is solely responsible
for loading the metadata (from the build backend).

Future commits will add other types of TestLoaders.

Depends on D49767

This prevents us from adding the puppeteer tests over and over again. It
follows the wpt example.

Pushed by ahalberstadt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8f80a93d486b
[moztest.resolve] Remove unused 'tests_with_flavor' function from metadata class r=gbrown
https://hg.mozilla.org/integration/autoland/rev/0c7112ed3da7
[moztest.resolve] Use 'MozbuildObject.repository' in the TestResolver class r=gbrown
https://hg.mozilla.org/integration/autoland/rev/ed42501638e4
[moztest.resolve] Create an 'is_puppeteer_loaded' flag r=gbrown
https://hg.mozilla.org/integration/autoland/rev/aaeaea784276
[moztest.resolve] Move all mozbuild backend logic into the TestMetadata class r=gbrown
https://hg.mozilla.org/integration/autoland/rev/588f0a9ac709
[moztest.resolve] Use MozbuildObject.mozbuild_reader to resolve outgoing files r=gbrown
https://hg.mozilla.org/integration/autoland/rev/391773a3c2e6
[moztest.resolve] Move test_rewrites to a class property r=gbrown
https://hg.mozilla.org/integration/autoland/rev/74b431af3f3f
[moztest.resolve] Move everything from TestMetadata into the TestResolver class (except the load_tests function) r=gbrown
Regressions: 1590680
Regressions: 1591512

Having a distinction between -chunked and not adds unnecessary complexity. It's
possible to simply remove them because:

  1. The mozharness definitions for 'jittest' and 'jittest-chunked' are
    identical, so it is literally not serving any purpose.

  2. The definitions for 'mochitest' only add either '--chunk-by-dir' or
    '--chunk-by-runtime'. Both of these are no-ops in the mochitest harness
    unless '--total-chunks' is also supplied. Therefore, if we ever use these
    suites without chunking (which I don't think we do anyway), then it'll
    still work fine as those options won't have any affect.

This allows subclasses of MozbuildObject to define additional instance
arguments and still use 'from_environment'.

Depends on D51173

I'm making this refactor now because a future commit is going to completely re-write the
test data (so that it matches the data that a future TestManifestLoader is going to use).

With this method, updating the data will be a lot less tedious.

Pushed by ahalberstadt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b989eb460f44
[mozharness] Remove '-chunked' mochitest and jittest suite definitions, r=gbrown
https://hg.mozilla.org/integration/autoland/rev/bca7369cc0d3
[mozbuild] Forward unrecognized kwargs to underlying class in MozbuildObject.from_environment(), r=firefox-build-system-reviewers,mshal
https://hg.mozilla.org/integration/autoland/rev/f6d096dd754d
[moztest] Convert test_resolve.py to the pytest format, r=egao
https://hg.mozilla.org/integration/autoland/rev/c7b6e1aef3a0
[moztest] Use a fixture to generate the test data in test_resolve.py, r=egao

Despite what the comment says, the finder does pick up the root moz.build. So
we end up processing it twice. This was never caught before because we only ever
used this function to read Sphinx related variables, of which the root moz.build
contains none.

Test test data looks like it was pulled from a live all_tests.pkl file. There
are way more path components than necessary.

This simplifies the test paths so they are easy to manipulate/add/inspect. It
will also make it easier to craft a fake "sourcedir" to test the
TestManifestLoader in the next commit such that the data from both matches.

I decided to use a fruit theme for directories because:

  1. Using real directories will pollute grep/searchfox/etc queries with junk.
  2. Using a 'dirA', 'dirB', 'dirC' scheme is hard to read.
  3. Why not?

This change does not functionally modify what is being tested.

Depends on D51832

This loader uses 'reader.find_variables_from_ast' to parse all *_MANIFESTS variables from
moz.build files using the abstract syntax tree. This means it will find all such variables
regardless of the current buildconfig.

Depends on D51833

Blocks: 1582516

Allows 'paths' passed into the pathprefix filter to be manifests. Any path that
ends with '.ini' is considered a manifest.

Depends on D51899

This gives us the ability to retrieve all browser-chrome tests (no flavor) but
not devtools-chrome (have a flavor).

While not strictly necessary for this series, this patch allows:
./mach test path/to/manifest.ini

Which will be especially useful to have now that we are switching to running
tasks by manifest.

Depends on D52241

Pushed by ahalberstadt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8d28c3108fc0
[mozbuild.reader] Don't return root moz.build twice in 'all_mozbuild_paths', r=froydnj
https://hg.mozilla.org/integration/autoland/rev/b96f1cf028b6
[moztest] Simplify the test data in test_resolve.py, r=egao
https://hg.mozilla.org/integration/autoland/rev/cc2575c34202
[moztest.resolve] Implement a TestLoader that doesn't rely on the build system, r=gbrown
https://hg.mozilla.org/integration/autoland/rev/ca54f70e0297
[manifestparser] Convert test_filters.py to the pytest format, r=egao
https://hg.mozilla.org/integration/autoland/rev/98bbf6967fd6
[manifestparser] Support manifests in the 'pathprefix' filter, r=egao
https://hg.mozilla.org/integration/autoland/rev/d741ca0e07b1
[moztest.resolve] Add ability to resolve only tests *without* a subsuite, r=gbrown
https://hg.mozilla.org/integration/autoland/rev/7ea00823df0a
[moztest.resolve] Add ability to resolve manifest paths, r=gbrown
You need to log in before you can comment on or make changes to this bug.