Open Bug 1608837 Opened 7 months ago Updated 1 month ago

Enable chunking in the taskgraph for all web-platform-test suites

Categories

(Firefox Build System :: Task Configuration, task, P2)

task

Tracking

(Not tracked)

People

(Reporter: ahal, Assigned: egao)

References

(Depends on 3 open bugs, Blocks 1 open bug)

Details

(Keywords: leave-open)

Attachments

(12 files, 3 obsolete files)

47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review

+++ This bug was initially created as a clone of Bug #1608836 +++

In bug 1583353 we started chunking test suites in the taskgraph (rather than at test runtime). Though I only enabled it for a subset of suites.

This bug tracks getting all web-platform-test suites enabled. To enable taskgraph chunking, remove the suite from this config:
https://searchfox.org/mozilla-central/rev/dd1dafd5c9c05640e76af30b58749076e0199704/taskcluster/taskgraph/transforms/tests.py#1280

I normally like to test that we still run the same set of tests before and after. To do this I:

  1. Push to try scheduling all tests in the suite, both with and without this change.
  2. Clone https://github.com/mozilla/ci-recipes and follow installation instructions.
  3. Run poetry run adr compare_pushes --branch try -r1 <rev1> -r2 <rev2>
  4. It will take awhile (~10-20min).
  5. Make sure the output says all tasks are the same. If the output shows one task ran more manifests than another, then we need to investigate.

Getting WPT supported here will likely be a significant effort. Mainly because we rely on manifests pretty heavily. However WPT doesn't really use manifests. We'll need to figure out how to group tests by directory perhaps and update our logic everywhere to support directories instead of manifests. Or perhaps we can pretend that directories are manifests.. More investigation required.

Edwin and I had a chat about this bug, here's a summary of the meeting..

Relevant parts of the code:

  1. The split chunks transform
  2. The chunking.py utility file
  3. The TestResolver

Potential problems with WPT:

  1. WPT does not have manifests, can maybe use top-level dirs as stand-ins. Could either handle this in the TestResolver by setting the manifest_relpath to these values there, or by using an if statement somewhere in chunking.py.

  2. WPT does not have runtime data in testing/runtimes. We'll either need to add WPT to these data files (by modifying the writeruntimes script) or use a different chunking algorithm.

  3. WPT does not load test objects at the same time as others. May just need to call this function manually. Though, it might also just work. I'm not sure.

How to test:

  1. ./mach taskgraph full (see https://ahal.ca/casts/2018/taskgraph-like-a-pro/). Remember to log to stderr or taskgraph will get messed up. Look for the test_manifests attribute in the full task
  2. Pushing to try and inspecting task (look for MOZHARNESS_TEST_PATHS in the env)
  3. Should verify we aren't missing tests (I have a script for this but we can cross that bridge later)
Assignee: nobody → egao
Attachment #9135474 - Attachment description: Bug 1608837 - fix incorrect suite name for web-platform-test reftests/crashtests → Bug 1608837 - fix incorrect suite name for web-platform-test reftests/crashtests in resolve.py

Changes:

Perform the chunking of web-platform-tests in the decision task.

Changes:

  • add the directories under testing/web-platform/tests and testing/web-platform/mozilla/tests to the taskgraph
  • implement a new method to parse the directory structure at the top level and add them as web-platform-tests manifests
Attachment #9135292 - Attachment is obsolete: true
Attachment #9135474 - Attachment description: Bug 1608837 - fix incorrect suite name for web-platform-test reftests/crashtests in resolve.py → Bug 1608837 - remove trailing s from reftest/crashtest variants of web-platform-tests from definition

Summary of things investigated/tried so far.

One of the issues with performing chunking of web-platform-tests in the decision task has been how to build the list of tests available for web-platform-tests.

Currently I explored two approaches:

  1. add the test directories into taskgraph sparse-profile, then retrieving the directories under tests. This is further subdivided into two different sub-approaches:
    a. use the top-level test directories (per feature) eg. testing/web-platform/tests/2dcontext/
    b. use the directories immediately above the leaf nodes eg. testing/web-platform/tests/2dcontext/imagebitmap/
  2. incorporate manifestupdate.py and associated dependencies into the taskgraph sparse-profile.

Both approaches have their problems:

  1. this leads to sparse-profile containing an order of magnitude or more files in the checkout than it does currently (~300 files).
    Furthermore, both sub-approaches have issues in that it does not distinguish between various sub-flavors of the web-platform-tests (crashtest, reftest). These sub-flavors are all stored in the same directory as vanilla web-platform-tests and thus the only way to distinguish is to refer to a manifest.
  2. manifestupdate.py in its current, in-tree form requires various dependencies that can luckily be included in the sparse profile, but run into a particular problem at line 34 in manifestdownload.py since it attempts to find a mercurial repo (which does not exist in the sparse profile):
[task 2020-03-26T19:49:25.904Z]   File "/builds/worker/checkouts/gecko/testing/web-platform/manifestupdate.py", line 115, in run
[task 2020-03-26T19:49:25.904Z]     force=force_download)
[task 2020-03-26T19:49:25.904Z]   File "/builds/worker/checkouts/gecko/testing/web-platform/manifestdownload.py", line 177, in download_from_taskcluster
[task 2020-03-26T19:49:25.904Z]     force)
[task 2020-03-26T19:49:25.904Z]   File "/builds/worker/checkouts/gecko/testing/web-platform/manifestdownload.py", line 132, in download_manifest
[task 2020-03-26T19:49:25.904Z]     url = url_func(logger, commits)
[task 2020-03-26T19:49:25.904Z]   File "/builds/worker/checkouts/gecko/testing/web-platform/manifestdownload.py", line 85, in taskcluster_url
[task 2020-03-26T19:49:25.904Z]     for revision in commits:
[task 2020-03-26T19:49:25.904Z]   File "/builds/worker/checkouts/gecko/testing/web-platform/manifestdownload.py", line 34, in hg_commits
[task 2020-03-26T19:49:25.904Z]     "testing/web-platform/mozilla/tests").splitlines():
[task 2020-03-26T19:49:25.904Z]   File "/builds/worker/checkouts/gecko/testing/web-platform/vcs.py", line 13, in hg
[task 2020-03-26T19:49:25.904Z]     return subprocess.check_output(full_cmd, cwd=repo_root)
[task 2020-03-26T19:49:25.904Z]   File "/usr/lib/python2.7/subprocess.py", line 574, in check_output
[task 2020-03-26T19:49:25.904Z]     raise CalledProcessError(retcode, cmd, output=output)
[task 2020-03-26T19:49:25.904Z] CalledProcessError: Command '['hg', 'log', '-fl50', '--template={node}\n', 'testing/web-platform/tests', 'testing/web-platform/mozilla/tests']' returned non-zero exit status 255

In https://phabricator.services.mozilla.com/D68228 as it stands currently (2020/03/26) :ahal has reviewed the patch and has suggested using approach 2 as this helps the chunking algorithm retain important sub-flavor information.

In order to successfully use approach 2, I suspect either of the following has to be done:

  1. update manifestupdate.py and manifestdownload.py to download from the latest revision available (as returned by https://searchfox.org/mozilla-central/source/testing/web-platform/manifestdownload.py#74) as lastpushid. This will involve also modifying the relevant VCS methods to not attempt to call VCS methods as this will not work in sparse profile.
  2. add the web-platform-test manifest files to tree under some directory and add a conditional at https://searchfox.org/mozilla-central/source/testing/mozbase/moztest/moztest/resolve.py#656-658 to read the local file if available. This means the manifest files will have to be manually updated every so often, similar to how manifest-runtimes-{platform}.json is manually updated in-tree.
Attachment #9135474 - Attachment description: Bug 1608837 - remove trailing s from reftest/crashtest variants of web-platform-tests from definition → Bug 1608837 - fix incorrect names of reftest/crashtest variants of web-platform-tests
Attachment #9135474 - Attachment description: Bug 1608837 - fix incorrect names of reftest/crashtest variants of web-platform-tests → Bug 1608837 - remove trailing s from reftest/crashtest variants of web-platform-tests from definition
Attachment #9135474 - Attachment description: Bug 1608837 - remove trailing s from reftest/crashtest variants of web-platform-tests from definition → Bug 1608837 - fix incorrect names of reftest/crashtest variants of web-platform-tests
Pushed by egao@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a5c3001dceee
fix incorrect names of reftest/crashtest variants of web-platform-tests r=ahal
Status: NEW → RESOLVED
Closed: 4 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla76
Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Summary of things investigated/tried so far, part 2

It was decided to use manifestupdate and dependencies to gather a better overall picture of the available web-platform-tests. This method will also preserve metadata such as subsuite and expected outcomes; these metadata are not available if the directory was parsed.

Including the manifestupdate and supporting scripts as dependency worked fine in the decision task as long as the directory structure of files under testing/web-platform/ was also included. This does not appear to have increased the processing time of the sparse checkout by too much.

The new issue that was encountered and had to be resolved was the lack of runtime data for all tests as ActiveData has a limit of 10,000 results. This means that not all test runs have data recorded, or results obtained using writeruntimes.

Several approaches were tried:

  1. use the default configuration of chunk_by_runtime, permitting the algorithm to fill in the missing runtimes with a default value.
  2. using an adapted version of DirectoryHashChunker from wptrunner.testloader.
  3. using a hybrid approach where tests with runtime is chunked using chunk_by_runtime and remaining test without runtime information are chunked using a modified DirectoryHashChunker approach.

I am pursuing approach 3 since approach 1 would lead to very unbalanced chunks due to the straight-up application of an assumed average runtime value. Approach 2 would also hypothetically lead to unbalanced chunks since some tests take longer than others.

Issues I've run into with all three approaches is that:

  • some chunks appear to take much longer to complete.
  • some tests appear to fail consistently when such failures do not occur in the parent push.

These are issues that need to be sorted out.

I'm not super keen on the hybrid approach tbh, I feel like it will make things more complicated. There's a 4th approach which is to modify the writeruntimes script to use ActiveData's destination: url feature so we can get all the data required.

I'd vote we either do that or stick to approach #2. While #2 may have less balanced chunks, it has the (pretty large) benefit of not suffling everything around and reducing the chance of things going wrong. We can always switch over to approach #4 after we have taskgraph chunking working if we deem the chunks to be too unbalanced. There's something to be said for minimizing the amount of change we land all at once.

I vote for #2 as well.

(In reply to Andrew Halberstadt [:ahal] from comment #10)

I'm not super keen on the hybrid approach tbh, I feel like it will make things more complicated. There's a 4th approach which is to modify the writeruntimes script to use ActiveData's destination: url feature so we can get all the data required.

I'd vote we either do that or stick to approach #2. While #2 may have less balanced chunks, it has the (pretty large) benefit of not suffling everything around and reducing the chance of things going wrong. We can always switch over to approach #4 after we have taskgraph chunking working if we deem the chunks to be too unbalanced. There's something to be said for minimizing the amount of change we land all at once.

:ekyle informed me of the method to obtain the larger result set after I wrote the update, and I took that and ran with it.

With the larger data set, it appears there are still 59 'manifests' that are still missing runtimes, which is a much more manageable number.

I have two pushes going on at the moment:

  1. https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=483394b1cddf181390b6886edf56c99aa3ebcbe3
  • relies entirely on chunk_by_runtime, relying on that approach to fill in missing runtime information with an average.
  1. https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=d4a85237e2b809adb37aa597f5a63d38e5deb283
  • uses chunk_by_runtime for manifests that have data; for the 59 manifests missing runtime data, modified DirectoryHashChunker is used.

The results for the two pushes are in.

Both pushes produced all green results for web-platform-tests.

Given that push 1 (https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=483394b1cddf181390b6886edf56c99aa3ebcbe3) has a simpler algorithm of just using the runtimes to calculate the chunking, I think option 1 gets to be the patch that I will put up for review after some more polishing.

Modify writeruntimes script to support the querying and writing of web-platform-test related runtimes. These are already stored on ActiveData, so it is simply a matter of querying them.

Changes:

  • obtain the full list of web-platform-tests runtimes using ActiveData's destination/output clause.
  • add clause to handle the case above where output format is slightly different.
  • normalize the paths by prefixing testing/web-platform in front of the returned test paths.
  • record runtimes for tests using the immediate parent node as the stand-in for manifest.

Depends on D68069

Attachment #9137891 - Attachment is obsolete: true

Changes:

  • if web-platform-tests is specified, call the add_wpt_manifest_data method to add web-platform-test information.
Attachment #9134531 - Attachment description: Bug 1608837 - modify writeruntimes to output web-platform-tests related runtime metrics → Bug 1608837 - modify writeruntimes to output web-platform-tests runtime information
Attachment #9134531 - Attachment description: Bug 1608837 - modify writeruntimes to output web-platform-tests runtime information → Bug 1608837 - modify writeruntimes to output web-platform-tests related runtime metrics
Attachment #9134531 - Attachment description: Bug 1608837 - modify writeruntimes to output web-platform-tests related runtime metrics → Bug 1608837 - modify writeruntimes to output web-platform-tests runtime metrics

Changes:

  • add build_flavor attribute in order to support writeruntimes querying web-platform-tests
  • add subsuite attribute for web-platform-tests

Few more updates since my discussion with :ahal this past Monday (04/13):

  1. it was agreed to split the patches into smaller sizes.
  2. it was agreed to find the simplest solution that works reasonably well, then iterate.

For item 2, the perpetual problem has been that if tests are grouped at the highest level (wpt terminology: groups) then we lose the ability to provide some semblance of balanced chunks. On the other hand, using any additional granularity beyond groups has the unintended effect of double or triple-scheduling tests depending on how many subdirectories exist.

Since the discussion on Monday I have attempted to work with a couple of approaches:

2a. split groups simply by the number of chunks
2b. split groups using test count
2c. split groups using test count but isolate larger groups

For 2a. the approach showed some promise but could not overcome the wildly imbalanced chunk issue:
https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&resultStatus=pending%2Crunning%2Csuperseded%2Cusercancel%2Cretry%2Csuccess%2Ctestfailed%2Cbusted%2Cexception&classifiedState=unclassified&revision=61c9779cb5bead66d4161276445cc7b512b21835

Note that linux1804-64/debug chunk 6 has a runtime of 137min.

For 2b. the basic premise is that if for larger groups (eg. html, css) the number of tests contained in that directory will be much higher than groups that are smaller (eg. portals).

The base idea appeared sound, and produced more balanced chunks than before:
https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&resultStatus=superseded%2Cusercancel%2Cretry%2Cpending%2Crunning%2Csuccess%2Ctestfailed%2Cbusted%2Cexception&classifiedState=unclassified&revision=5d12dcb7942d4a59063167e695a7d11b7e2f0633

Note that linux1804-64/debug chunk 18 has a runtime of 103min, which is an improvement.

However platform combinations like linux1804-64-asan/opt or linux1804-64-qr/debug still produce consistent timeouts so this was not going to work.

In 2c which is my current approach I'm trying to work out is to chunk tests using groups, but to smartly isolate the longest running groups (as reported by runtime information) and split them into smaller list of subgroups if permissible.

Pushed by egao@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8ba34bc805aa
add build_flavor and subsuite information to web-platform-tests r=ahal

Perform the chunking of web-platform-tests in the decision task using a new algorithm for web-platform-tests.

Changes:

  • group tests using the list of groups used by web-platform-tests, represented in the test object's manifest attribute.
  • for groups that are identified as long-running, attempt to group tests into sub-groups where possible while ensuring that double-scheduling does not occur.
  • populate chunks based on the the following criteria, in order of importance:
    • test count must not exceed a certain threshold.
    • groups must stay together, or in adjascent chunks.
    • test count is to be balanced as much as possible.

Changes:

  • pull web-platform-tests runtime data for all platforms.

Changes:

  • remove the web-platform-tests blacklist to enable chunking in the decision task.
Pushed by egao@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ed9b827a4672
modify writeruntimes to output web-platform-tests runtime metrics r=ahal
Depends on: 1633001
Pushed by egao@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/129843155fa7
add updated runtime files with web-platform-tests data r=ahal
Pushed by egao@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ae97a2314ad8
add tests from web-platform-tests manifests into list of supported tests r=ahal
Attachment #9144463 - Attachment description: Bug 1608837 - change the name attribute of web-platform-test object to use test.id → Bug 1608837 - change the name attribute of web-platform-test test object to use test.id from the manifest
Pushed by egao@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ecde25ae664f
change the name attribute of web-platform-test test object to use test.id from the manifest r=ahal
Blocks: 1634551
Blocks: 1634554
Status: REOPENED → NEW
Target Milestone: mozilla76 → ---
Pushed by egao@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/073717ec4eba
do not normalize web-platform-test paths in writeruntimes r=ahal
Depends on: 1636600
Depends on: 1636602
Depends on: 1637011
Pushed by egao@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4729e4a1fe6c
perform task chunking in decision task for web-platform-tests r=ahal,jgraham
Attachment #9147553 - Attachment is obsolete: true
Pushed by egao@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/93ad253ab519
add necessary files to taskgraph sparse-profile for web-platform-tests chunking r=ahal
Pushed by egao@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/940a99ae6cfe
enable decision task chunking codepath for web-platform-tests r=ahal
Pushed by archaeopteryx@coole-files.de:
https://hg.mozilla.org/integration/autoland/rev/f340bbb582d1
Adjust assertion count for one and disable another wpt test because permafailed after rechunking. CLOSED TREE

(In reply to Pulsebot from comment #42)

Pushed by archaeopteryx@coole-files.de:
https://hg.mozilla.org/integration/autoland/rev/f340bbb582d1
Adjust assertion count for one and disable another wpt test because
permafailed after rechunking. CLOSED TREE

Thanks for dealing with this one. I'm surprised this test is now a permafail after rechunking.

Regressions: 1638492
Depends on: 1640580
Regressions: 1640580

We need to figure out what's going on in bug 1608837 before landing the reftest parts of this bug.

No longer regressions: 1640580
No longer blocks: 1634554
Depends on: 1634554
Depends on: 1638430
Depends on: 1644593
Depends on: 1645867
Depends on: 1646813
Depends on: 1647383
You need to log in before you can comment on or make changes to this bug.