Enable chunking in the taskgraph for all web-platform-test suites
Categories
(Firefox Build System :: Task Configuration, task, P2)
Tracking
(Not tracked)
People
(Reporter: ahal, Assigned: egao)
References
(Blocks 1 open bug)
Details
Attachments
(12 files, 3 obsolete files)
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review |
+++ This bug was initially created as a clone of Bug #1608836 +++
In bug 1583353 we started chunking test suites in the taskgraph (rather than at test runtime). Though I only enabled it for a subset of suites.
This bug tracks getting all web-platform-test suites enabled. To enable taskgraph chunking, remove the suite from this config:
https://searchfox.org/mozilla-central/rev/dd1dafd5c9c05640e76af30b58749076e0199704/taskcluster/taskgraph/transforms/tests.py#1280
I normally like to test that we still run the same set of tests before and after. To do this I:
- Push to try scheduling all tests in the suite, both with and without this change.
- Clone https://github.com/mozilla/ci-recipes and follow installation instructions.
- Run
poetry run adr compare_pushes --branch try -r1 <rev1> -r2 <rev2>
- It will take awhile (~10-20min).
- Make sure the output says all tasks are the same. If the output shows one task ran more manifests than another, then we need to investigate.
Getting WPT supported here will likely be a significant effort. Mainly because we rely on manifests pretty heavily. However WPT doesn't really use manifests. We'll need to figure out how to group tests by directory perhaps and update our logic everywhere to support directories instead of manifests. Or perhaps we can pretend that directories are manifests.. More investigation required.
Reporter | ||
Comment 1•5 years ago
•
|
||
Edwin and I had a chat about this bug, here's a summary of the meeting..
Relevant parts of the code:
- The split chunks transform
- The chunking.py utility file
- The TestResolver
Potential problems with WPT:
-
WPT does not have manifests, can maybe use top-level dirs as stand-ins. Could either handle this in the
TestResolver
by setting themanifest_relpath
to these values there, or by using an if statement somewhere inchunking.py
. -
WPT does not have runtime data in
testing/runtimes
. We'll either need to add WPT to these data files (by modifying thewriteruntimes
script) or use a different chunking algorithm. -
WPT does not load test objects at the same time as others. May just need to call this function manually. Though, it might also just work. I'm not sure.
How to test:
./mach taskgraph full
(see https://ahal.ca/casts/2018/taskgraph-like-a-pro/). Remember to log to stderr or taskgraph will get messed up. Look for thetest_manifests
attribute in the full task- Pushing to try and inspecting task (look for MOZHARNESS_TEST_PATHS in the env)
- Should verify we aren't missing tests (I have a script for this but we can cross that bridge later)
Assignee | ||
Comment 2•5 years ago
|
||
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 3•5 years ago
|
||
Assignee | ||
Comment 4•5 years ago
|
||
Updated•5 years ago
|
Assignee | ||
Comment 5•5 years ago
|
||
Changes:
Perform the chunking of web-platform-tests in the decision task.
Changes:
- add the directories under
testing/web-platform/tests
andtesting/web-platform/mozilla/tests
to the taskgraph - implement a new method to parse the directory structure at the top level and add them as web-platform-tests manifests
Updated•5 years ago
|
Updated•5 years ago
|
Assignee | ||
Comment 6•5 years ago
•
|
||
Summary of things investigated/tried so far.
One of the issues with performing chunking of web-platform-tests in the decision task has been how to build the list of tests available for web-platform-tests.
Currently I explored two approaches:
- add the test directories into
taskgraph
sparse-profile, then retrieving the directories under tests. This is further subdivided into two different sub-approaches:
a. use the top-level test directories (per feature) eg.testing/web-platform/tests/2dcontext/
b. use the directories immediately above the leaf nodes eg.testing/web-platform/tests/2dcontext/imagebitmap/
- incorporate
manifestupdate.py
and associated dependencies into thetaskgraph
sparse-profile.
Both approaches have their problems:
- this leads to sparse-profile containing an order of magnitude or more files in the checkout than it does currently (~300 files).
Furthermore, both sub-approaches have issues in that it does not distinguish between various sub-flavors of the web-platform-tests (crashtest, reftest). These sub-flavors are all stored in the same directory as vanilla web-platform-tests and thus the only way to distinguish is to refer to a manifest. manifestupdate.py
in its current, in-tree form requires various dependencies that can luckily be included in the sparse profile, but run into a particular problem at line 34 inmanifestdownload.py
since it attempts to find a mercurial repo (which does not exist in the sparse profile):
[task 2020-03-26T19:49:25.904Z] File "/builds/worker/checkouts/gecko/testing/web-platform/manifestupdate.py", line 115, in run
[task 2020-03-26T19:49:25.904Z] force=force_download)
[task 2020-03-26T19:49:25.904Z] File "/builds/worker/checkouts/gecko/testing/web-platform/manifestdownload.py", line 177, in download_from_taskcluster
[task 2020-03-26T19:49:25.904Z] force)
[task 2020-03-26T19:49:25.904Z] File "/builds/worker/checkouts/gecko/testing/web-platform/manifestdownload.py", line 132, in download_manifest
[task 2020-03-26T19:49:25.904Z] url = url_func(logger, commits)
[task 2020-03-26T19:49:25.904Z] File "/builds/worker/checkouts/gecko/testing/web-platform/manifestdownload.py", line 85, in taskcluster_url
[task 2020-03-26T19:49:25.904Z] for revision in commits:
[task 2020-03-26T19:49:25.904Z] File "/builds/worker/checkouts/gecko/testing/web-platform/manifestdownload.py", line 34, in hg_commits
[task 2020-03-26T19:49:25.904Z] "testing/web-platform/mozilla/tests").splitlines():
[task 2020-03-26T19:49:25.904Z] File "/builds/worker/checkouts/gecko/testing/web-platform/vcs.py", line 13, in hg
[task 2020-03-26T19:49:25.904Z] return subprocess.check_output(full_cmd, cwd=repo_root)
[task 2020-03-26T19:49:25.904Z] File "/usr/lib/python2.7/subprocess.py", line 574, in check_output
[task 2020-03-26T19:49:25.904Z] raise CalledProcessError(retcode, cmd, output=output)
[task 2020-03-26T19:49:25.904Z] CalledProcessError: Command '['hg', 'log', '-fl50', '--template={node}\n', 'testing/web-platform/tests', 'testing/web-platform/mozilla/tests']' returned non-zero exit status 255
In https://phabricator.services.mozilla.com/D68228 as it stands currently (2020/03/26) :ahal has reviewed the patch and has suggested using approach 2 as this helps the chunking algorithm retain important sub-flavor information.
In order to successfully use approach 2, I suspect either of the following has to be done:
- update
manifestupdate.py
andmanifestdownload.py
to download from the latest revision available (as returned by https://searchfox.org/mozilla-central/source/testing/web-platform/manifestdownload.py#74) aslastpushid
. This will involve also modifying the relevant VCS methods to not attempt to call VCS methods as this will not work in sparse profile. - add the web-platform-test manifest files to tree under some directory and add a conditional at https://searchfox.org/mozilla-central/source/testing/mozbase/moztest/moztest/resolve.py#656-658 to read the local file if available. This means the manifest files will have to be manually updated every so often, similar to how
manifest-runtimes-{platform}.json
is manually updated in-tree.
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Comment 8•5 years ago
|
||
bugherder |
Updated•5 years ago
|
Updated•5 years ago
|
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 9•5 years ago
|
||
Summary of things investigated/tried so far, part 2
It was decided to use manifestupdate
and dependencies to gather a better overall picture of the available web-platform-tests. This method will also preserve metadata such as subsuite
and expected outcomes; these metadata are not available if the directory was parsed.
Including the manifestupdate
and supporting scripts as dependency worked fine in the decision task as long as the directory structure of files under testing/web-platform/
was also included. This does not appear to have increased the processing time of the sparse checkout by too much.
The new issue that was encountered and had to be resolved was the lack of runtime data for all tests as ActiveData has a limit of 10,000 results. This means that not all test runs have data recorded, or results obtained using writeruntimes
.
Several approaches were tried:
- use the default configuration of
chunk_by_runtime
, permitting the algorithm to fill in the missing runtimes with a default value. - using an adapted version of
DirectoryHashChunker
fromwptrunner.testloader
. - using a hybrid approach where tests with runtime is chunked using
chunk_by_runtime
and remaining test without runtime information are chunked using a modifiedDirectoryHashChunker
approach.
I am pursuing approach 3 since approach 1 would lead to very unbalanced chunks due to the straight-up application of an assumed average runtime value. Approach 2 would also hypothetically lead to unbalanced chunks since some tests take longer than others.
Issues I've run into with all three approaches is that:
- some chunks appear to take much longer to complete.
- some tests appear to fail consistently when such failures do not occur in the parent push.
These are issues that need to be sorted out.
Reporter | ||
Comment 10•5 years ago
|
||
I'm not super keen on the hybrid approach tbh, I feel like it will make things more complicated. There's a 4th approach which is to modify the writeruntimes
script to use ActiveData's destination: url
feature so we can get all the data required.
I'd vote we either do that or stick to approach #2. While #2 may have less balanced chunks, it has the (pretty large) benefit of not suffling everything around and reducing the chance of things going wrong. We can always switch over to approach #4 after we have taskgraph chunking working if we deem the chunks to be too unbalanced. There's something to be said for minimizing the amount of change we land all at once.
Comment 11•5 years ago
|
||
I vote for #2 as well.
Assignee | ||
Comment 12•5 years ago
|
||
(In reply to Andrew Halberstadt [:ahal] from comment #10)
I'm not super keen on the hybrid approach tbh, I feel like it will make things more complicated. There's a 4th approach which is to modify the
writeruntimes
script to use ActiveData'sdestination: url
feature so we can get all the data required.I'd vote we either do that or stick to approach #2. While #2 may have less balanced chunks, it has the (pretty large) benefit of not suffling everything around and reducing the chance of things going wrong. We can always switch over to approach #4 after we have taskgraph chunking working if we deem the chunks to be too unbalanced. There's something to be said for minimizing the amount of change we land all at once.
:ekyle informed me of the method to obtain the larger result set after I wrote the update, and I took that and ran with it.
With the larger data set, it appears there are still 59 'manifests' that are still missing runtimes, which is a much more manageable number.
I have two pushes going on at the moment:
- relies entirely on
chunk_by_runtime
, relying on that approach to fill in missing runtime information with an average.
- uses
chunk_by_runtime
for manifests that have data; for the 59 manifests missing runtime data, modifiedDirectoryHashChunker
is used.
Assignee | ||
Comment 13•5 years ago
|
||
The results for the two pushes are in.
Both pushes produced all green results for web-platform-tests.
Given that push 1 (https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=483394b1cddf181390b6886edf56c99aa3ebcbe3) has a simpler algorithm of just using the runtimes to calculate the chunking, I think option 1 gets to be the patch that I will put up for review after some more polishing.
Assignee | ||
Comment 14•5 years ago
|
||
Modify writeruntimes
script to support the querying and writing of web-platform-test related runtimes. These are already stored on ActiveData, so it is simply a matter of querying them.
Changes:
- obtain the full list of web-platform-tests runtimes using ActiveData's destination/output clause.
- add clause to handle the case above where output format is slightly different.
- normalize the paths by prefixing
testing/web-platform
in front of the returned test paths. - record runtimes for tests using the immediate parent node as the stand-in for manifest.
Depends on D68069
Updated•5 years ago
|
Assignee | ||
Comment 15•5 years ago
|
||
Changes:
- if web-platform-tests is specified, call the
add_wpt_manifest_data
method to add web-platform-test information.
Updated•5 years ago
|
Updated•5 years ago
|
Assignee | ||
Comment 16•5 years ago
|
||
Updated•5 years ago
|
Assignee | ||
Comment 17•5 years ago
|
||
Changes:
- add build_flavor attribute in order to support writeruntimes querying web-platform-tests
- add subsuite attribute for web-platform-tests
Assignee | ||
Comment 18•5 years ago
•
|
||
Few more updates since my discussion with :ahal this past Monday (04/13):
- it was agreed to split the patches into smaller sizes.
- it was agreed to find the simplest solution that works reasonably well, then iterate.
For item 2, the perpetual problem has been that if tests are grouped at the highest level (wpt terminology: groups) then we lose the ability to provide some semblance of balanced chunks. On the other hand, using any additional granularity beyond groups has the unintended effect of double or triple-scheduling tests depending on how many subdirectories exist.
Since the discussion on Monday I have attempted to work with a couple of approaches:
2a. split groups simply by the number of chunks
2b. split groups using test count
2c. split groups using test count but isolate larger groups
For 2a. the approach showed some promise but could not overcome the wildly imbalanced chunk issue:
https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&resultStatus=pending%2Crunning%2Csuperseded%2Cusercancel%2Cretry%2Csuccess%2Ctestfailed%2Cbusted%2Cexception&classifiedState=unclassified&revision=61c9779cb5bead66d4161276445cc7b512b21835
Note that linux1804-64/debug chunk 6 has a runtime of 137min.
For 2b. the basic premise is that if for larger groups (eg. html, css) the number of tests contained in that directory will be much higher than groups that are smaller (eg. portals).
The base idea appeared sound, and produced more balanced chunks than before:
https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&resultStatus=superseded%2Cusercancel%2Cretry%2Cpending%2Crunning%2Csuccess%2Ctestfailed%2Cbusted%2Cexception&classifiedState=unclassified&revision=5d12dcb7942d4a59063167e695a7d11b7e2f0633
Note that linux1804-64/debug
chunk 18 has a runtime of 103min, which is an improvement.
However platform combinations like linux1804-64-asan/opt
or linux1804-64-qr/debug
still produce consistent timeouts so this was not going to work.
In 2c which is my current approach I'm trying to work out is to chunk tests using groups, but to smartly isolate the longest running groups (as reported by runtime information) and split them into smaller list of subgroups if permissible.
Comment 19•5 years ago
|
||
Comment 20•5 years ago
|
||
bugherder |
Assignee | ||
Comment 21•5 years ago
|
||
Perform the chunking of web-platform-tests in the decision task using a new algorithm for web-platform-tests.
Changes:
- group tests using the list of
groups
used by web-platform-tests, represented in the test object'smanifest
attribute. - for groups that are identified as long-running, attempt to group tests into sub-groups where possible while ensuring that double-scheduling does not occur.
- populate chunks based on the the following criteria, in order of importance:
- test count must not exceed a certain threshold.
- groups must stay together, or in adjascent chunks.
- test count is to be balanced as much as possible.
Assignee | ||
Comment 22•5 years ago
|
||
Changes:
- pull web-platform-tests runtime data for all platforms.
Assignee | ||
Comment 23•5 years ago
|
||
Changes:
- remove the web-platform-tests blacklist to enable chunking in the decision task.
Comment 24•5 years ago
|
||
Comment 25•5 years ago
|
||
Comment 26•5 years ago
|
||
bugherder |
Comment 27•5 years ago
|
||
Comment 28•5 years ago
|
||
bugherder |
Assignee | ||
Comment 29•5 years ago
|
||
Updated•5 years ago
|
Comment 30•5 years ago
|
||
Comment 31•5 years ago
|
||
bugherder |
Updated•5 years ago
|
Assignee | ||
Comment 32•5 years ago
|
||
Comment 33•5 years ago
|
||
Comment 34•5 years ago
|
||
bugherder |
Assignee | ||
Comment 35•5 years ago
|
||
Comment 36•5 years ago
|
||
Updated•5 years ago
|
Comment 37•5 years ago
|
||
bugherder |
Comment 38•5 years ago
|
||
Comment 39•5 years ago
|
||
bugherder |
Comment 40•5 years ago
|
||
Comment 42•5 years ago
|
||
Assignee | ||
Comment 43•5 years ago
|
||
(In reply to Pulsebot from comment #42)
Pushed by archaeopteryx@coole-files.de:
https://hg.mozilla.org/integration/autoland/rev/f340bbb582d1
Adjust assertion count for one and disable another wpt test because
permafailed after rechunking. CLOSED TREE
Thanks for dealing with this one. I'm surprised this test is now a permafail after rechunking.
Comment 44•5 years ago
|
||
bugherder |
Reporter | ||
Updated•4 years ago
|
Reporter | ||
Comment 45•4 years ago
|
||
We need to figure out what's going on in bug 1608837 before landing the reftest parts of this bug.
Reporter | ||
Updated•4 years ago
|
Reporter | ||
Comment 46•4 years ago
|
||
I was looking at this and it seems the only suites missing are print reftests and the various backlog ones. I'm not even sure if it makes sense or is worth the effort to support manifest chunking in these. I'm going to close this bug and we can file new ones for the remaining suites should we desire.
Remaining suites can easily be found here:
https://searchfox.org/mozilla-central/search?q=test-manifest-loader%3A+null&path=
Reporter | ||
Updated•4 years ago
|
Updated•4 years ago
|
Updated•4 years ago
|
Description
•