Bug 1583353 Comment 0 Edit History

A major problem we've had with our CI for as long as I can remember (predating taskcluster) is that the tests that run in a task aren't known until task runtime. Instead, the chunking information (thisChunk / totalChunks) is defined in the taskgraph, and an algorithm within the test harness itself then uses that information to compute which tests should run.

This is a problem for several reasons:
1. Tests can jump around chunks between pushes.
2. Given a test, it's hard to find which task it ran in.
3. Failure bisection might yield a false positive (e.g. the test moved chunks rather than being fixed).
4. Scheduling is difficult; advanced techniques like ccov or machine learning won't work.
5. It makes getting rid of chunks (from a developer UX perspective) impossible.
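
To illustrate the first point, here is a minimal sketch of position-based chunking. It is not the actual harness algorithm; `chunk_by_position`, `chunk_of` and the test names are hypothetical, and the sketch simply assumes tests are sorted and split into contiguous slices. Even under that simple assumption, adding one test in a later push moves an unrelated test into a different chunk.

```python
def chunk_by_position(tests, this_chunk, total_chunks):
    """Return the contiguous slice of sorted tests for a 1-based chunk index.

    Hypothetical stand-in for runtime chunking, not the real harness code.
    """
    ordered = sorted(tests)
    size, extra = divmod(len(ordered), total_chunks)
    # The first `extra` chunks each receive one additional test.
    start = (this_chunk - 1) * size + min(this_chunk - 1, extra)
    end = start + size + (1 if this_chunk <= extra else 0)
    return ordered[start:end]


def chunk_of(test, tests, total_chunks):
    """Find which chunk a test lands in by trying every chunk."""
    for n in range(1, total_chunks + 1):
        if test in chunk_by_position(tests, n, total_chunks):
            return n
    return None


# Two pushes that differ only by one newly added test (made-up names).
push_a = ["test_a.js", "test_c.js", "test_d.js", "test_e.js", "test_f.js"]
push_b = push_a + ["test_b.js"]

before = chunk_of("test_d.js", push_a, 2)
after = chunk_of("test_d.js", push_b, 2)
print(f"test_d.js: chunk {before} on push A, chunk {after} on push B")
```

Nothing about `test_d.js` changed between the two pushes, yet it ran in a different chunk, which is the kind of movement that can confuse bisection.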

There are probably many more problems I could come up with if pressed. Over the years we've tried many solutions to several of these problems, all with minimal levels of success. There's a hard barrier that we keep running up against.

**We need to define which tests run in which tasks within the taskgraph itself**. In other words, rather than setting chunks in the taskgraph and performing the chunking operation later, we do this chunking operation *as part of task generation*. That way, the decision task knows exactly which tests will run in which tasks.
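
As a rough sketch of what this could look like (not the actual taskgraph code), the snippet below assigns tests to tasks while the tasks are being generated and stores the list in each task definition. The function names, the `tests` key, the label format and the round-robin strategy are all assumptions made for illustration.

```python
def generate_test_tasks(suite, tests, total_chunks):
    """Build one task definition per chunk, each listing its tests explicitly.

    Hypothetical sketch: the task shape and label format are assumptions.
    """
    ordered = sorted(tests)
    tasks = {}
    for n in range(total_chunks):
        label = f"{suite}-{n + 1}"
        # Round-robin assignment, chosen only to keep the example short.
        tasks[label] = {"label": label, "tests": ordered[n::total_chunks]}
    return tasks


def index_tests_by_task(tasks):
    """Invert the mapping so a test can be looked up without running anything."""
    return {
        test: label
        for label, task in tasks.items()
        for test in task["tests"]
    }


tests = ["test_a.js", "test_b.js", "test_c.js", "test_d.js", "test_e.js"]
tasks = generate_test_tasks("example-suite", tests, total_chunks=2)
index = index_tests_by_task(tasks)
print(index["test_d.js"])
```

Because the assignment lives in the generated task definitions, answering "which task ran this test?" becomes a dictionary lookup at decision time instead of something that can only be reconstructed from runtime logs. The sketch does not by itself keep tests from moving between pushes; it only makes the assignment known up front, which is what a smarter scheduling strategy would build on.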

This is a major shift in how we think about CI. It will upend workflows, break assumptions and cause all sorts of new problems. But I'm firmly convinced it will be well worth the effort.

This bug is being filed as a consequence of the following document:
https://gist.github.com/ahal/5b66ec2cf981d1398c05335bbe44633b

It will act as a meta bug to collect smaller work items as dependencies.
