Open Bug 1907919 Opened 8 months ago Updated 6 months ago

Add support for runtime based chunking to WPT

Categories

(Testing :: web-platform-tests, enhancement)

Tracking

(Not tracked)

People

(Reporter: teoxoy, Unassigned)

References

(Blocks 1 open bug)

Details

While working on bug 1907637 we noticed that the current round-robin chunking for WPT can assign many long-running tests to the same chunk, causing that chunk/job to time out (for a concrete example see this run). It would be great if tests were chunked based on previous runtimes, as is already supported for other suites (e.g. mochitests).
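To illustrate the failure mode, here is a minimal sketch of round-robin assignment. The helper `round_robin_chunks` and the runtimes are made up for illustration; this is not the code used in CI.

```python
def round_robin_chunks(items, total_chunks):
    """Assign item i to chunk i % total_chunks, ignoring runtimes."""
    chunks = [[] for _ in range(total_chunks)]
    for index, item in enumerate(items):
        chunks[index % total_chunks].append(item)
    return chunks


# Hypothetical (name, runtime in seconds) pairs; every third test is slow.
tests = [("slow-a", 60), ("fast-a", 1), ("fast-b", 1),
         ("slow-b", 60), ("fast-c", 1), ("fast-d", 1)]

chunks = round_robin_chunks(tests, 3)
chunk_runtimes = [sum(runtime for _, runtime in chunk) for chunk in chunks]
print(chunk_runtimes)  # [120, 2, 2]
```

Both slow tests land in the first chunk simply because of where they sit in the list, so one job does almost all of the work.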

I doubt that the runtime of individual tests is available at the time of chunking. We would have to pull recorded runtimes from CI to determine that, and those would differ by platform, build type, etc.

What about chunking the tests by splitting them equally across all the included folders? That would already be a great improvement, as the following example shows:

https://treeherder.mozilla.org/jobs?repo=try&revision=09ad5c60a36074eab763c0f5e5550272bd10cdd4&searchStr=canvas%2Casan

Here we have 3 jobs for canvas tests, as run for TSAN and ASAN. I would expect a similar runtime for all of them, but that is not the case. Instead we see the following chunking:

  • canvas1: 3 tests (html/canvas/[historical.any.html, historical.any.worker.html, historical.window.html]) and 3 minutes runtime
  • canvas2: 1101 tests (html/canvas/element/*) and 38 minutes runtime
  • canvas3: 1520 tests (html/canvas/offscreen/*) and task timeout after 45min

Allowing an equal split of tests across all three chunks would make better use of resources (no need to set up a worker for just 3 tests) and avoid task timeouts. It would also help us keep an eye on the runtime and increase the number of chunks whenever we get close to the task timeout.
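As a rough sketch of what an equal split by test count could look like (the helper `split_evenly` and the placeholder test IDs are hypothetical, and this ignores test groups entirely):

```python
def split_evenly(tests, total_chunks):
    """Split tests into contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(tests), total_chunks)
    chunks, start = [], 0
    for index in range(total_chunks):
        # The first `extra` chunks take one additional test each.
        size = base + (1 if index < extra else 0)
        chunks.append(tests[start:start + size])
        start += size
    return chunks


# Placeholder IDs standing in for the 3 + 1101 + 1520 canvas tests above.
tests = [f"test-{n}" for n in range(3 + 1101 + 1520)]
sizes = [len(chunk) for chunk in split_evenly(tests, 3)]
print(sizes)  # [875, 875, 874]
```

Equal test counts do not guarantee equal runtimes, but for the canvas example this would replace a 3 / 1101 / 1520 split with roughly 875 tests per job.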

Sasha and James, what do you think?

Flags: needinfo?(james)
Flags: needinfo?(aborovova)

There are two levels of grouping going on here:

  • Test groups, which are sets of tests run in a given browser instance in a defined order without a restart (in other suites these correspond to manifests).
  • Chunks, which are sets of test groups run in the same job in CI.

There are multiple different factors to consider when deciding how to chunk tests into different jobs/tasks:

  • Chunks should be of similar size
  • Each test group has a fixed overhead from restarting the browser
  • Test groups should remain similar between runs to avoid intermittents that depend on test grouping
  • Ease of reproducing runs between different environments (local vs CI)
  • Implementation complexity

Unlike other test suites, wpt doesn't hardcode test groups using manifest files, but has runtime logic to decide how to group tests.

Currently we don't actually implement any grouping or chunking in wptrunner for CI; the groups are defined by the CI system and we use whatever is passed in for a given chunk. However, the CI system uses a relatively simple grouping setup, which also happens to be pretty easy to implement for local runs: we group tests by subdirectory, usually up to three levels deep (with some exceptions where we know the tests have a deeper directory structure). Then, as noted above, the assignment from groups to chunks is rather simplistic.
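The directory-based grouping described above can be sketched roughly as follows. This is an illustration only, not the CI implementation: `group_by_directory` and `max_depth` are hypothetical names, the file names under element/ and offscreen/ are invented, and the exceptions for deeper directory structures are not modeled.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_directory(test_paths, max_depth=3):
    """Group test paths by their directory, truncated to max_depth components."""
    groups = defaultdict(list)
    for path in test_paths:
        directory_parts = PurePosixPath(path).parts[:-1]  # drop the file name
        key = "/".join(directory_parts[:max_depth])
        groups[key].append(path)
    return dict(groups)


paths = [
    "html/canvas/historical.any.html",
    "html/canvas/element/drawing/a.html",    # hypothetical file name
    "html/canvas/element/text/b.html",       # hypothetical file name
    "html/canvas/offscreen/text/c.html",     # hypothetical file name
]
groups = group_by_directory(paths)
for name in sorted(groups):
    print(name, len(groups[name]))
```

With a depth of three, everything under html/canvas/element/ collapses into a single group regardless of how many tests it contains, which is how groups of very different sizes arise.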

So there are now two separate issues being discussed in this bug:

  • Can we make test groups more similar in size, especially to help with pathological cases like canvas where we have ~1 test group per chunk?
  • Can we make chunks more similar in size by adjusting the groups that they contain?

Of course both of those things are possible, but they should be separate issues. The latter, which the bug was originally filed for, would only require adjusting the decision task code that decides which test groups make up a single chunk. I think you probably want to talk to jmaher about that.
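One common way to balance chunks is a greedy heuristic: sort groups by weight, heaviest first, and always add the next group to the lightest chunk so far. The sketch below assumes each group has a known weight (a test count or a recorded runtime); `assign_groups_to_chunks` and the weights are hypothetical and this is not the existing decision task code.

```python
import heapq


def assign_groups_to_chunks(group_weights, total_chunks):
    """Greedy: place each group, heaviest first, into the lightest chunk so far."""
    # Heap entries are (current weight, chunk index, group names).
    heap = [(0, index, []) for index in range(total_chunks)]
    heapq.heapify(heap)
    ordered = sorted(group_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, weight in ordered:
        total, index, names = heapq.heappop(heap)
        names.append(name)
        heapq.heappush(heap, (total + weight, index, names))
    # Return (total weight, group names) per chunk, in chunk order.
    return [(total, names) for total, _, names in sorted(heap, key=lambda e: e[1])]


# Hypothetical group weights, e.g. minutes of recorded runtime.
weights = {"a": 40, "b": 30, "c": 20, "d": 10, "e": 10, "f": 10}
result = assign_groups_to_chunks(weights, 3)
print(result)  # [(40, ['a']), (40, ['b', 'e']), (40, ['c', 'd', 'f'])]
```

Because groups are kept whole, this cannot help when a single group is larger than a chunk's time budget, as in the canvas case; that is the separate group-size question.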

The former (adjusting test group sizes) should be filed as a separate bug, but since I'm here, a few thoughts:

  • We have code in wptrunner that was designed to chunk the tests by maximum possible runtime (using timeout long vs standard). In practice this worked less well than you'd expect and was complex to maintain. Moving to chunking at random turned out to be simpler and work better.
  • Dividing directories at a deeper level is fine, but dividing tests within a directory makes it even harder to figure out how to rerun the tests locally in the same group as was used in CI. In theory we could make this easier to manage, e.g. if each test job had an easy way to report which tests ran in each group, but that is probably hard to implement.