Test path chunking in the taskgraph
Categories
(Firefox Build System :: Task Configuration, enhancement)
Tracking
(Not tracked)
People
(Reporter: ahal, Assigned: ahal)
References
(Blocks 3 open bugs)
Details
Attachments
(27 files, 5 obsolete files)
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review |
A major problem we've had with our CI for as long as I can remember (predating taskcluster), is that the tests that run in a task aren't known until task runtime. Instead the chunking information (thisChunk / totalChunks) is defined in the taskgraph and then an algorithm uses that information to compute which tests should run within the test harness itself.
This is a problem for several reasons:
- Tests can jump around chunks between pushes.
- Given a test, hard to find which task it ran in.
- Failure bisection might yield a false positive (e.g test moved chunks rather than fixed).
- Scheduling is difficult, advanced techniques like ccov or machine learning won't work.
- Makes getting rid of chunks (from a developer UX perspective) impossible.
There are probably many more problems I could come up with if pressed. Over the years we've tried many solutions to several of these problems, all with minimal levels of success. There's a hard barrier that we keep running up against.
We need to define which tests run in which tasks within the taskgraph itself. In other words, rather than setting chunks in the taskgraph and performing the chunking operation later, we do this chunking operation as part of task generation. So the decision task knows exactly which tests will run in which tasks.
This is a major shift in how we think about CI. It will upend workflows, break assumptions and cause all sorts of new problems. But I'm firmly convinced it will be well worth the effort.
This bug is being filed as a consequence of the following document:
https://gist.github.com/ahal/5b66ec2cf981d1398c05335bbe44633b
It will act as a meta bug to collect smaller work items as dependencies.
Updated•5 years ago
|
Assignee | ||
Comment 1•5 years ago
|
||
Expanding scope a bit here. The goal will be to perform chunking in the taskgraph. At first we'll simply attempt to chunk the full set of tasks.
Assignee | ||
Comment 2•5 years ago
|
||
On second thought I don't think this warrants a meta bug. I'll just keep tasks separated by commit.
Assignee | ||
Comment 3•5 years ago
|
||
This key doesn't appear to be used by anything. Let's get rid of it to remove
some of the complexity around chunking.
Updated•5 years ago
|
Updated•5 years ago
|
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 5•5 years ago
|
||
Assignee | ||
Comment 6•5 years ago
|
||
Since TestResolver is a subclass of MozbuildObject, there's no need to create
separate repository object. It already has one.
Depends on D49761
Assignee | ||
Comment 7•5 years ago
|
||
Encapsulates all the logic around generating and loading the build backend
metadata on the TestMetadata class. Previously the TestResolver would trigger
the generation if necessary, and TestMetadata would load it. Now both
generation and loading happens in TestMetadata.load_tests.
This change also adds some convenience properties to make it easier to query
the loaded data.
Depends on D49764
Assignee | ||
Comment 8•5 years ago
|
||
Similar to the vcs change, the MozbuildObject already has a reader attribute
available. So we can re-use that instead of creating our own.
Depends on D49765
Assignee | ||
Comment 9•5 years ago
|
||
A minor cleanup. Re-write paths will now automatically be joined to
self.topobjdir.
Depends on D49766
Assignee | ||
Comment 10•5 years ago
|
||
Previously there was a somewhat strange setup where we had both TestResolver
and TestMetadata classes. Both had 'resolve_tests' function and the separation
of concerns between the two were not clear.
With this change, all of the logic that is related to manipulating and
resolving the loaded tests has been moved to the TestResolver class. Also, the
TestMetadata class has been renamed to TestLoader, and it is solely responsible
for loading the metadata (from the build backend).
Future commits will add other types of TestLoaders.
Depends on D49767
Assignee | ||
Comment 11•5 years ago
|
||
This prevents us from adding the puppeteer tests over and over again. It
follows the wpt example.
Comment 12•5 years ago
|
||
bugherder |
Comment 13•5 years ago
|
||
Comment 14•5 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/8f80a93d486b
https://hg.mozilla.org/mozilla-central/rev/0c7112ed3da7
https://hg.mozilla.org/mozilla-central/rev/ed42501638e4
https://hg.mozilla.org/mozilla-central/rev/aaeaea784276
https://hg.mozilla.org/mozilla-central/rev/588f0a9ac709
https://hg.mozilla.org/mozilla-central/rev/391773a3c2e6
https://hg.mozilla.org/mozilla-central/rev/74b431af3f3f
Assignee | ||
Comment 15•5 years ago
|
||
Having a distinction between -chunked and not adds unnecessary complexity. It's
possible to simply remove them because:
-
The mozharness definitions for 'jittest' and 'jittest-chunked' are
identical, so it is literally not serving any purpose. -
The definitions for 'mochitest' only add either '--chunk-by-dir' or
'--chunk-by-runtime'. Both of these are no-ops in the mochitest harness
unless '--total-chunks' is also supplied. Therefore, if we ever use these
suites without chunking (which I don't think we do anyway), then it'll
still work fine as those options won't have any affect.
Assignee | ||
Comment 16•5 years ago
|
||
This allows subclasses of MozbuildObject to define additional instance
arguments and still use 'from_environment'.
Depends on D51173
Assignee | ||
Comment 17•5 years ago
|
||
Depends on D51174
Assignee | ||
Comment 18•5 years ago
|
||
I'm making this refactor now because a future commit is going to completely re-write the
test data (so that it matches the data that a future TestManifestLoader is going to use).
With this method, updating the data will be a lot less tedious.
Comment 19•5 years ago
|
||
Comment 20•5 years ago
|
||
bugherder |
Assignee | ||
Comment 21•5 years ago
|
||
Despite what the comment says, the finder does pick up the root moz.build. So
we end up processing it twice. This was never caught before because we only ever
used this function to read Sphinx related variables, of which the root moz.build
contains none.
Assignee | ||
Comment 22•5 years ago
|
||
Test test data looks like it was pulled from a live all_tests.pkl file. There
are way more path components than necessary.
This simplifies the test paths so they are easy to manipulate/add/inspect. It
will also make it easier to craft a fake "sourcedir" to test the
TestManifestLoader in the next commit such that the data from both matches.
I decided to use a fruit theme for directories because:
- Using real directories will pollute grep/searchfox/etc queries with junk.
- Using a 'dirA', 'dirB', 'dirC' scheme is hard to read.
- Why not?
This change does not functionally modify what is being tested.
Depends on D51832
Assignee | ||
Comment 23•5 years ago
|
||
This loader uses 'reader.find_variables_from_ast' to parse all *_MANIFESTS variables from
moz.build files using the abstract syntax tree. This means it will find all such variables
regardless of the current buildconfig.
Depends on D51833
Assignee | ||
Comment 24•5 years ago
|
||
Assignee | ||
Comment 25•5 years ago
|
||
Allows 'paths' passed into the pathprefix filter to be manifests. Any path that
ends with '.ini' is considered a manifest.
Depends on D51899
Assignee | ||
Comment 26•5 years ago
|
||
This gives us the ability to retrieve all browser-chrome tests (no flavor) but
not devtools-chrome (have a flavor).
Assignee | ||
Comment 27•5 years ago
|
||
While not strictly necessary for this series, this patch allows:
./mach test path/to/manifest.ini
Which will be especially useful to have now that we are switching to running
tasks by manifest.
Depends on D52241
Comment 28•5 years ago
|
||
Assignee | ||
Comment 29•5 years ago
|
||
Assignee | ||
Comment 30•5 years ago
|
||
Depends on D52729
Comment 31•5 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/8d28c3108fc0
https://hg.mozilla.org/mozilla-central/rev/b96f1cf028b6
https://hg.mozilla.org/mozilla-central/rev/cc2575c34202
https://hg.mozilla.org/mozilla-central/rev/ca54f70e0297
https://hg.mozilla.org/mozilla-central/rev/98bbf6967fd6
https://hg.mozilla.org/mozilla-central/rev/d741ca0e07b1
https://hg.mozilla.org/mozilla-central/rev/7ea00823df0a
Assignee | ||
Comment 32•5 years ago
|
||
Depends on D52730
Updated•5 years ago
|
Assignee | ||
Comment 33•5 years ago
|
||
Depends on D52730
Assignee | ||
Updated•5 years ago
|
Comment 34•5 years ago
|
||
Comment 35•5 years ago
|
||
bugherder |
Comment 36•5 years ago
|
||
Comment 37•5 years ago
|
||
Comment 38•5 years ago
|
||
Comment 39•5 years ago
|
||
bugherder |
Assignee | ||
Comment 40•5 years ago
|
||
Previously, when running |mach mochitest path/to/manifest.ini|, any tests in
manifests that that one includes would not be run.
This fixes that behaviour.
Assignee | ||
Comment 41•5 years ago
|
||
Depends on D57164
Comment 42•5 years ago
|
||
Assignee | ||
Comment 43•5 years ago
|
||
Comment 44•5 years ago
|
||
bugherder |
Comment 45•5 years ago
|
||
Assignee | ||
Comment 46•5 years ago
|
||
Comment 47•5 years ago
|
||
bugherder |
Comment 48•5 years ago
|
||
Backed out as requested by gbrown in https://bugzilla.mozilla.org/show_bug.cgi?id=1581345#c20 for causing xpc failures.
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=282273447&repo=autoland
Backout: https://hg.mozilla.org/integration/autoland/rev/a8bcde65765354e3c4f5d4e8c5f7fc5ff288960c
Updated•5 years ago
|
Assignee | ||
Comment 49•5 years ago
|
||
Looks like one of the ccov tasks happened to get only a single manifest that was skipped on that configuration. So it didn't run any tests and the harness complained. I'll need to try and filter out skipped tests before running the chunking algorithm.
Assignee | ||
Comment 50•5 years ago
|
||
Assignee | ||
Comment 51•5 years ago
|
||
Previously we simply ignored skipped tests when performing test chunking in the
taskgraph. This could sometimes result in a task that runs a single manifest
that happens to be skipped on that task's configuration. This results in an error
that no tests were run from the harness.
This patch attempts to guess a mozinfo based on a task's definition. Then it
uses those values to take skipped tests into account when running the chunking
algorithm. That way the chances of scheduling a manifest that would be skipped
anyway are much lower.
Depends on D58987
Comment 52•5 years ago
|
||
Comment 53•5 years ago
|
||
bugherder |
Assignee | ||
Comment 54•5 years ago
|
||
Assignee | ||
Comment 55•5 years ago
|
||
Depends on D59559
Assignee | ||
Comment 56•5 years ago
|
||
I also noticed that xpcshell ccov chunking had one chunk that ran much longer than the others, such that it was hitting a timeout. The latest two patches (adding ccov to mozinfo guess and updating runtimes) seems to fix this.
Assignee | ||
Comment 57•5 years ago
|
||
Hm, something is still not working with the xpcshell chunking. Now ccov is green, but Windows opt has one chunk hitting the max runtime (90+ min) and the other only taking 14 min.
Originally I was hoping to just use chunk_by_runtime
for everything, but I think I'll have to add the ability for tasks to define which chunking algorithm they use.
Relatedly, I also think this bug has run its course. I'm going to close it out and file follow-ups for the remaining suites.
Assignee | ||
Updated•5 years ago
|
Comment 58•5 years ago
|
||
Comment on attachment 9120160 [details]
Bug 1583353 - [taskgraph] Add 'ccov' to the mozinfo guess when chunking tests, r?gbrown
Revision D59559 was moved to bug 1608833. Setting attachment 9120160 [details] to obsolete.
Comment 59•5 years ago
|
||
Comment on attachment 9120161 [details]
Bug 1583353 - Update test runtimes files, r?gbrown
Revision D59560 was moved to bug 1608833. Setting attachment 9120161 [details] to obsolete.
Comment 60•5 years ago
|
||
Comment on attachment 9116138 [details]
Bug 1583353 - [ci] Chunk xpcshell in the taskgraph, r?gbrown
Revision D57326 was moved to bug 1608833. Setting attachment 9116138 [details] to obsolete.
Comment 61•5 years ago
|
||
Comment on attachment 9117354 [details]
Bug 1583353 - [ci] Chunk mochitest-a11y and plain in the taskgraph, r?gbrown
Revision D58039 was moved to bug 1608834. Setting attachment 9117354 [details] to obsolete.
Description
•