Open Bug 1421834 Opened 6 years ago Updated 1 year ago

Use something faster than xz for transferring artifacts between tasks


(Firefox Build System :: Task Configuration, task)



(Not tracked)


(Reporter: gps, Unassigned)


(Blocks 1 open bug)


We currently use xz archives in CI for transferring artifacts between tasks, e.g. for toolchains.

xz archives are notoriously slow to create and extract.

There is a compelling reason to use xz archives: users with slow Internet connections will benefit from the smaller file sizes. Plus xz is pretty ubiquitous.

But for Firefox CI, the use of xz archives slows down tasks since they are waiting on decompression. We have plentiful bandwidth in CI and the benefits of xz (small file sizes) are not needed.

We should replace xz for artifact consumption in CI with something faster.

zstd is an obvious alternative. That will require zstd support on various Docker images and in various utilities.

But using straight archives has another problem: you either have redundant I/O or you have extra files on the filesystem.

When extracting archives, we want the filesystem state to match the archive contents exactly. If you just "extract to path," a previous extracted archive would leave files on the filesystem that aren't in the new archive. The recourse here is to remove the destination directory then extract the new archive. This yields redundant I/O. And for our use cases, it is common for archive content to be very similar, so all this extra work is wasted.

One solution is to extract archives to directories that are keyed to their hash. You won't have the overhead of deleting the destination directory. But you will end up extracting N copies of a file (once for each archive it lives in).
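A minimal sketch of the hash-keyed approach, to make the trade-off concrete. The names `extract_keyed` and `cache_root` are hypothetical, and it assumes the archive is trusted (no path sanitizing is done):

```python
import hashlib
import shutil
import tarfile
from pathlib import Path


def extract_keyed(archive: Path, cache_root: Path) -> Path:
    """Extract `archive` into a directory named after its SHA-256 digest."""
    digest = hashlib.sha256(archive.read_bytes()).hexdigest()
    dest = cache_root / digest
    if dest.is_dir():
        # These exact archive bytes were extracted before: no delete, no re-extract.
        return dest
    # Extract next to the final location, then rename, so a crashed run
    # never leaves a half-populated directory under the final name.
    partial = cache_root / (digest + ".partial")
    shutil.rmtree(partial, ignore_errors=True)
    partial.mkdir(parents=True)
    with tarfile.open(archive) as tar:  # compression format is auto-detected
        tar.extractall(partial)  # assumption: archive contents are trusted
    partial.rename(dest)
    return dest
```

Two archives that differ by a single file still get two full extractions here, which is the N-copies cost described above.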

When you squint hard enough, this problem of synchronizing a source with the filesystem has been solved before. rsync is one solution (although you can't rsync from an archive last I checked). Version control tools also solve this general problem. Version control tools are essentially virtual filesystems that have the ability to realize specific revisions on a real filesystem (as a checkout/working directory). When they update between revisions on the filesystem, they only touch files that need to be touched. And they can purge unwanted/old files.
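To illustrate "only touch what changed, then purge the rest" without any version control tool, here is a rough sketch over a plain tar archive. `sync_from_tar` is a hypothetical helper; it handles regular files only, trusts the archive's paths, and leaves empty directories behind:

```python
import tarfile
from pathlib import Path


def sync_from_tar(archive: Path, dest: Path) -> dict:
    """Make the regular files under `dest` match those in `archive`."""
    stats = {"written": 0, "unchanged": 0, "purged": 0}
    wanted = set()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive) as tar:
        for member in tar:
            if not member.isfile():
                continue  # sketch: skip directories, symlinks, devices
            target = dest / member.name
            wanted.add(target)
            data = tar.extractfile(member).read()
            if target.is_file() and target.read_bytes() == data:
                stats["unchanged"] += 1  # identical content: no write
                continue
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(data)
            stats["written"] += 1
    # Purge files left over from a previously extracted archive.
    for path in list(dest.rglob("*")):
        if path.is_file() and path not in wanted:
            path.unlink()
            stats["purged"] += 1
    return stats
```

This avoids redundant writes but still decompresses the whole archive and reads every existing file to compare contents. A version control tool can skip that by comparing hashes recorded in its manifests.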

I had a crazy idea of using Mercurial bundles for the artifact exchange. Essentially, the artifacts would be self-contained Mercurial bundles containing a single changeset. On the destination machine, we would `hg init` an empty repository (if needed), `hg unbundle` to import the contents of the bundle, then `hg purge --all` and `hg up <rev>` to the revision inside the bundle. The bulk of the bundle would be file data. Mercurial would de-dupe this as part of importing the bundle. And when you `hg update` the working directory, Mercurial would essentially do a manifest diff and perform a minimal, incremental update. As a benefit, you would get multi-threaded working directory updates, which would be faster than a naive archiving tool extracting an archive.
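The command sequence described above could be driven from a task with something like the following sketch. `hg_sync_commands`, `hg_sync`, and the `repo`/`bundle`/`rev` parameters are hypothetical, and `--config extensions.purge=` is my addition on the assumption that the purge extension may not be enabled on the worker:

```python
import subprocess
from pathlib import Path


def hg_sync_commands(repo: Path, bundle: Path, rev: str) -> list:
    """Build the hg invocations that realize `rev` from `bundle` in `repo`."""
    hg = ["hg", "--config", "extensions.purge=", "-R", str(repo)]
    cmds = []
    if not (repo / ".hg").is_dir():
        cmds.append(["hg", "init", str(repo)])  # first use on this machine
    cmds.append(hg + ["unbundle", str(bundle)])  # import (and de-dupe) file data
    cmds.append(hg + ["purge", "--all"])  # drop untracked and ignored files
    cmds.append(hg + ["update", "-r", rev])  # incremental working dir update
    return cmds


def hg_sync(repo: Path, bundle: Path, rev: str) -> None:
    """Run the commands in order, stopping at the first failure."""
    for cmd in hg_sync_commands(repo, bundle, rev):
        subprocess.run(cmd, check=True)
```

How the consuming task learns `rev` (for example, from metadata published next to the bundle) is left open here.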

While the Mercurial bundle idea is interesting, let's not dwell on it. Swapping in even gzip for xz would save many seconds in automation.
I noticed this while filing bug 1421734. We spend ~35s unpacking gtk3.tar.xz in the linux64-opt build log I linked there. I would assume that all of our build environments already have zstd available, given that we use that for hg bundle compression, but perhaps I'm mistaken.
The time spent on gtk3.tar.xz is not from decompression. The archive is small. The problem is the setup execution, which will go away with bug 1399679.
As for zstd, hg uses its own copy, so hg using it says nothing about whether it's available to anything else. Also, zstd would mean the various commands that use mach artifact toolchain (like mach clang-tidy) would have to cope with the fact that developers don't necessarily have zstd on their machine or, worse, don't necessarily have the right version (e.g. the last Ubuntu LTS, 16.04, only has an old version that's not compatible with the current zstd format).
zstd on developer machines is a bit scary, as glandium says.

I think whatever we do here, we'll want to continue producing xz archives and using them in certain situations. I think we should produce a supplemental and faster artifact for use in CI (and it can be used elsewhere where supported).
Note that the production of xz artifacts does take a long time for clang and gcc. Like, multiple minutes (5?). On already long jobs. Those jobs don't happen often, but when they do, they delay everything else, so it's actually a problem.
Yes, xz creation time is an issue.

A few ideas for fixing this:

1. Use parallel compression. We'll sacrifice some compression ratio, but the loss should be minimal.
2. Have secondary tasks produce the additional artifacts. e.g. the "main" task creates a "fast" artifact. That artifact is consumed by other tasks, and a secondary task produces and uploads the "slow" artifact(s).
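For the parallel compression idea, here is one way it can work, sketched with Python's standard `lzma` module rather than any particular tool. `compress_parallel` and its parameters are hypothetical. The input is split into chunks, each chunk becomes its own xz stream, and the streams are concatenated. This relies on concatenated xz streams being decodable as one file, and it assumes CPython's `lzma` releases the GIL while compressing so the threads actually overlap:

```python
import lzma
from concurrent.futures import ThreadPoolExecutor


def compress_parallel(data: bytes, chunk_size: int = 8 << 20, workers: int = 4) -> bytes:
    """Compress `data` as a series of independent, concatenated xz streams."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    if not chunks:
        chunks = [b""]  # still emit one valid (empty) xz stream
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so the streams join up correctly.
        streams = list(pool.map(lzma.compress, chunks))
    return b"".join(streams)
```

Each chunk is compressed without seeing the others, so redundancy that spans chunk boundaries is lost. That is where the compression ratio sacrifice comes from, and larger chunks reduce it at the cost of less parallelism.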
Product: TaskCluster → Firefox Build System
Severity: normal → S3