Closed Bug 1392370 Opened 3 years ago Closed 2 years ago

Investigate and potentially use multi-threaded xz compression

Categories

(Firefox Build System :: General, enhancement)


Tracking

(firefox59 fixed)

RESOLVED FIXED
mozilla59

People

(Reporter: gps, Assigned: glandium)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

Toolchain tasks can spend >7 minutes in xz compression. Modern versions of xz-utils support multi-threaded compression via -T. Archives produced with multi-threaded compression can still be read by legacy xz tools: essentially, the input is split into blocks and each block is compressed independently. This costs some compression ratio; how much depends on the block size and the input to the compressor.

Given that we spend minutes in xz, let's investigate parallel compression. If the compression ratio loss is minimal, we should probably use `xz -T 0` everywhere so we automatically scale to the number of available threads, or at least something like `xz -T 4` to get some compression speed-up.
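For reference, a minimal sketch of what the parallel invocation could look like (using xz's documented `-T`/`--threads` option; the archive name is just an example):

```shell
# Compress with as many threads as xz detects cores (-T0 = autodetect).
# Multi-threaded mode splits the input into blocks, so legacy
# single-threaded xz can still decompress the result.
tar -cf - gcc | xz -T0 > gcc.tar.xz

# Or cap the thread count for a more predictable CPU footprint:
tar -cf - gcc | xz -T4 > gcc.tar.xz
```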
Why not use zstd?
I'm not opposed to zstd. We already use zstd (parallel compression, even) for Docker images. But zstd isn't packaged as widely yet, so the main downside is we'd have to install binaries in various places. Not the hardest thing to do. Also, tooltool may need to be taught zstd for cases where we use "unpack": true.
Ah, gah, it would make life miserable for mach bootstrap. For instance, Ubuntu 16.04 (LTS and released a few months ago) comes with an old version of zstd, so even if it's installed, it can't read files produced by zstd 1.x.
While the xz-utils provided in Debian 7 is not modern enough, I added pxz to the toolchain-build docker image in bug 1427326.

Comparing the time spent compressing an archive of GCC 6:

$ time tar -Jcf gcc.tar.xz gcc
real    6m17.828s
user    6m17.500s
sys     0m1.964s
$ time tar -cf - gcc | pxz --compress -T $(nproc) > gcc.tar.pxz
real    0m55.424s
user    9m50.416s
sys     0m1.824s
$ ls -l gcc.tar.xz gcc.tar.pxz
-rw-r--r-- 1 root root 160037852 Dec 29 08:52 gcc.tar.pxz
-rw-r--r-- 1 root root 156507952 Dec 29 08:50 gcc.tar.xz

Marginally larger, significantly faster.
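To put numbers on "marginally larger, significantly faster", a quick back-of-the-envelope from the sizes and wall times above (plain shell integer arithmetic, results rounded down):

```shell
xz_size=156507952   # gcc.tar.xz, from ls -l above
pxz_size=160037852  # gcc.tar.pxz
xz_ms=377828        # 6m17.828s real
pxz_ms=55424        # 0m55.424s real

# Size overhead in hundredths of a percent: ~2.25% larger.
overhead=$(( (pxz_size - xz_size) * 10000 / xz_size ))
# Wall-clock speedup in tenths: ~6.8x faster.
speedup=$(( xz_ms * 10 / pxz_ms ))
echo "pxz: +${overhead}/100 % size, ${speedup}/10 x speedup"
```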
Depends on: 1427326
FWIW:

# time tar -cf - gcc | zstd -o gcc.tar.zst
/*stdin*\            : 29.60%   (860518400 => 254714054 bytes, gcc.tar.zst)   
real    0m7.734s
user    0m7.380s
sys     0m0.728s

Even with max compression and max time, we don't get near xz/pxz in terms of size:
# time tar -cf - gcc | zstd -19 -o gcc.tar.zst
/*stdin*\            : 21.21%   (860518400 => 182553689 bytes, gcc.tar.zst2)   
real    4m1.778s
user    4m1.168s
sys     0m1.176s
# time tar -cf - gcc | zstd -19 -T$(nproc) -o gcc.tar.zst
/*stdin*\            : 21.55%   (860518400 => 185462606 bytes, gcc.tar.zst2)   
real    0m26.783s
user    5m29.768s
sys     0m1.016s
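The same back-of-the-envelope for the zstd runs, measured against the single-threaded gcc.tar.xz size above (shell integer arithmetic, rounded down):

```shell
xz_size=156507952      # gcc.tar.xz, from the earlier comparison
zstd19_size=182553689  # zstd -19
zstd1_size=254714054   # zstd, default level

# Size gap versus xz, in hundredths of a percent:
gap19=$(( (zstd19_size - xz_size) * 10000 / xz_size ))  # ~16.6% larger
gap1=$(( (zstd1_size - xz_size) * 10000 / xz_size ))    # ~62.7% larger
echo "zstd -19: +${gap19}/100 %, zstd default: +${gap1}/100 % vs xz"
```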
zstd level 21 or 22 should get pretty close to xz. But probably still a bit larger. Those modes allocate a very large window and thus require a lot of memory to decompress. We actually had to stop using the higher levels for Mercurial bundles because 32-bit Python processes couldn't allocate enough memory. See bug 1344790.
Depends on: 1431297
Assignee: nobody → mh+mozilla
Comment on attachment 8943513 [details]
Bug 1392370 - Enable xz parallel compression on Debian-based docker images.

https://reviewboard.mozilla.org/r/213852/#review220054

I'll likely autoland this later once the trees are reopened. Don't want to land it now in case tons of stuff piles up behind it.
Attachment #8943513 - Flags: review?(gps) → review+
Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/78e74514176f
Enable xz parallel compression on Debian-based docker images. r=gps
https://hg.mozilla.org/mozilla-central/rev/78e74514176f
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
Blocks: 1430878
Product: Core → Firefox Build System