Open Bug 1641592 Opened 5 years ago Updated 2 years ago

Better propagation of test paths

Categories

(Firefox Build System :: Task Configuration, enhancement)

enhancement

Tracking

(Not tracked)

People

(Reporter: egao, Unassigned)

References

(Blocks 1 open bug)

Details

Currently, all test paths scheduled for the chunk are passed to mozharness using environment variables.

They are structured like so:

"MOZHARNESS_TEST_PATHS": {
  'web-platform-tests': [
     'test',
     'test-2'
  ]
}

This is problematic for two reasons:

  1. it is unwieldy as the test paths grow.
  2. environment variables have a finite character limit and this is rather quickly reached for windows platforms.

Addressing this issue with some sort of an artifact generated by the decision task that can be downloaded then read by the chunks would result in significantly better chunk balance as it would unlock ability to specific individual tests to be run.

:ahal - I am looking to making this change as the panacea for my web-platform-test chunk imbalance issues.

My work on web-platform-tests-reftest so far leads me to believe that best permanent solution for chunking issues for web-platform-tests is to be able to specify individual tests, thus avoiding both problems: double-scheduling and imbalanced chunks.

Specifically, what led to this bug is that web-platform-tests-reftest on android-em experience permanent chunk timeout anywhere the /css/CSS2 directory was run, since this is quite a large directory. In the current paradigm, web-platform-tests harness splits the /css/CSS2 directory across two or more chunks. I could not replicate this behavior with the current chunking algorithm.

Am I going in the right direction for this? Is this what you had in mind in https://bugzilla.mozilla.org/show_bug.cgi?id=1634554#c4?

Flags: needinfo?(ahal)

Some pushes to back up the statements:

try push: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&selectedTaskRun=YQoe1jTYTv2G4_8wGy_1tw-0&revision=a37ed848ff5f09e0af284dfe8bdaa1453c0cac68

try push with timeout extended to 180min: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&selectedTaskRun=YQoe1jTYTv2G4_8wGy_1tw-0&revision=9148eed0e7c1cc8732afb01502a2fd0f2633c77d

Both pushes see chunk 6 of web-platform-tests-reftest time out on android-em.

As an experiment I resurrected the path splitting algorithm that I devised for web-platform-tests, but android-em still times out after 90min.

push with path split: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=c843568b85140061cb2e2037b1c38fc91cbc2bcd

This leads me to believe that web-platform-tests-reftest is irreconcilable with the current chunking algorithm.

Instead of passing in individual tests, maybe start in conjunction with Bug 1634554 so continue passing in groups.

Yes, we'll want to keep passing in groups. I do agree that MOZHARNESS_TEST_PATHS feels like a hack and a more robust method of passing down tests would be valuable.

That said I can think of two cons to the artifact approach:

  1. The task definitions for each task would be identical (other than the label name). I don't know if this is a problem or not.
  2. Setting an env is "free", whereas downloading an artifact adds overhead.

I think this is fairly tangential to bug 1634554 (i.e, we can solve that without this). Though it's definitely worth keeping this one on file too.

Flags: needinfo?(ahal)

Note: Taskcluster does have the ability to "mount" files on the workers:
https://docs.taskcluster.net/docs/reference/workers/generic-worker/docker-posix-payload

Though looks like if we're writing the contents explicitly there's a 64Kb limit. So would have a similar limitation in size as the env. It's also possible to do a url or artifact mount, so could directly mount the file on the worker so mozharness wouldn't need to handle it.

(In reply to Andrew Halberstadt [:ahal] from comment #5)

Note: Taskcluster does have the ability to "mount" files on the workers:
https://docs.taskcluster.net/docs/reference/workers/generic-worker/docker-posix-payload

Note that while I'd really like to have this feature on docker workers, generic-worker's docker support is not complete (and certainly not deployed everywhere.

Severity: -- → N/A
You need to log in before you can comment on or make changes to this bug.