Open Bug 1432287 Opened 2 years ago Updated 4 months ago
[meta] Run tests from source checkouts in CI
We have a number of bugs floating around for this already. But we're lacking a tracking bug...

Essentially, we want to run test tasks from source checkouts. Today, we ship zip or compressed tar files around to different machines. This has a number of problems:

1) Creating archives creates overhead
2) Extracting archives creates overhead
3) Tasks don't have full source context and thus can't easily do things like implement task logic as mach commands
4) Managing the test environment is more complicated than it needs to be

Expanding on these...

Creating an archive of files adds overhead to the task producing that archive. It takes time to read files, compress them, and upload them. This increases end-to-end time. While we've spent many engineering hours optimizing this process, no matter how you slice it, it is still overhead.

Extracting archives to the local filesystem obviously takes time. You need to download the archive. Spend CPU to decompress it. Incur I/O to write out the files. But the big problem here is that naive extraction of archives is wasteful. You can't just extract an archive over an existing directory because files orphaned since the last extraction may be left behind. So, we tend to blow away the destination directory first. Or we use separate destination directories for each source archive. Either way, CI redundantly extracts the same files over and over. To make matters worse, many files don't change often from archive to archive. e.g. in the archives of test files, typically only a few files change. Assuming you could use a cache, ~100% of the file extractions are redundant.

Yes, we could implement "smart" archive extraction. It would know how to delete orphaned files. How to look for unmodified files and skip them. But at the point you do this, you are reinventing version control. And version control is highly optimized to solve this problem of incrementally updating a working directory. So why not use version control?
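To make the "smart" extraction idea concrete, here is a minimal Python sketch of what such a tool would have to do: delete orphaned files and skip unmodified ones. The function name `sync_tree` and the in-memory `files` mapping (standing in for a decoded archive) are hypothetical and only for illustration; this is not code from Firefox CI.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def sync_tree(dest: Path, files: dict[str, bytes]) -> dict[str, list[str]]:
    """Make `dest` contain exactly `files` (relative path -> content).

    `files` is a hypothetical stand-in for the contents of an archive.
    Empty directories left behind by deletions are not cleaned up.
    """
    dest.mkdir(parents=True, exist_ok=True)
    report: dict[str, list[str]] = {"written": [], "skipped": [], "deleted": []}

    # Delete orphans: files on disk that the new archive no longer contains.
    for path in sorted(p for p in dest.rglob("*") if p.is_file()):
        rel = path.relative_to(dest).as_posix()
        if rel not in files:
            path.unlink()
            report["deleted"].append(rel)

    # Write only new or changed files; skip files whose content already matches.
    for rel, data in sorted(files.items()):
        target = dest / rel
        if target.is_file() and target.read_bytes() == data:
            report["skipped"].append(rel)
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
        report["written"].append(rel)

    return report


def demo() -> list[dict[str, list[str]]]:
    """Apply two successive 'archives' to the same directory."""
    with tempfile.TemporaryDirectory() as tmp:
        dest = Path(tmp) / "tests"
        first = sync_tree(dest, {"a.js": b"1", "sub/b.js": b"2"})
        second = sync_tree(dest, {"a.js": b"1", "sub/c.js": b"3"})
        return [first, second]
```

Even this toy version has to track deletions, compare content, and manage directories, and it still reads every existing file in full to decide whether to skip it. Doing that efficiently (without rereading unchanged files) is exactly the working directory update problem that version control tools already solve.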
Using version control also gives tasks potential access to the contents of the full repo. This means you can do things like run `mach` without jumping through hoops. Things like mozharness (which was invented to bridge the gap between mozilla-central and CI) wouldn't need to exist (although mozharness is in tree today so that gap doesn't really exist as much). Having access to the full repo contents would allow more test logic to exist in repo. It would enable code used by people and machines to converge.

Finally, by not leaning on source archives and by using version control checkouts, we remove a number of problems around managing the task state. If most of the files we care about are in the version control checkout, we can use version control to manage these files for us. We have a lot of code around ensuring the state of caches, extracting archives to the correct places, etc. If your goal is "ensure a pristine state of files is in a directory," version control solves that quite well and eliminates a whole lot of code from CI land.

This work is currently blocked on getting efficient partial clones rolled out to Firefox CI. That is tracked in bug 1428470. That work is staffed for 2018 and should hopefully be deployed and ready to go sometime in Q3.
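As a rough illustration of the "ensure a pristine state of files is in a directory" goal described above, the following Python sketch builds the Mercurial commands a task could run against an existing clone. It assumes `dest` is already a clone and that the bundled `purge` extension is available; the function names, repository URL, path, and revision are hypothetical, and this is not the tooling Firefox CI actually uses.

```python
from __future__ import annotations

import subprocess


def pristine_checkout_commands(repo_url: str, dest: str, revision: str) -> list[list[str]]:
    """Return hg commands that bring an existing clone at `dest` to `revision`."""
    return [
        # Fetch the wanted revision (and its ancestors) from the remote.
        ["hg", "--cwd", dest, "pull", "--rev", revision, repo_url],
        # Update the working directory, discarding any local modifications.
        ["hg", "--cwd", dest, "update", "--clean", "--rev", revision],
        # Remove untracked and ignored files left behind by earlier tasks.
        ["hg", "--cwd", dest, "--config", "extensions.purge=", "purge", "--all"],
    ]


def ensure_pristine_checkout(repo_url: str, dest: str, revision: str) -> None:
    """Run the commands in order, stopping at the first failure."""
    for cmd in pristine_checkout_commands(repo_url, dest, revision):
        subprocess.check_call(cmd)
```

Three commands replace the cache validation and archive placement logic, and only files that differ between the old and new revision are touched on disk.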