Linux: Switch from bz2 to xz for Firefox releases on https://ftp.mozilla.org/
Categories
(Release Engineering :: General, enhancement)
Tracking
(relnote-firefox ?, firefox135 fixed)
People
(Reporter: aros, Assigned: hneiva)
References
(Blocks 1 open bug)
Details
Attachments
(4 files)
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0
Steps to reproduce:
Consider these:
-rw-r--r--. 1 user user 75181163 May 11 11:01 firefox-88.0.1.tar.bz2
-rw-r--r--. 1 user user 64302774 May 11 11:03 firefox-88.0.1.tar.zst
I.e. about 14.5% smaller.
Time to unpack:
time bzip2 -t firefox-88.0.1.tar.bz2
real 0m7.095s
user 0m7.072s
sys 0m0.018s
time zstd -t firefox-88.0.1.tar.zst
firefox-88.0.1.tar.zst : 230144000 bytes
real 0m0.318s
user 0m0.311s
sys 0m0.007s
I.e. 22 times faster.
Compressed using --ultra -22 --long
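The ratios in this comment can be re-derived from the raw numbers. A minimal Python sketch, using only the byte counts and `real` timings quoted above; the constant and helper names are made up for illustration:

```python
# Byte counts and wall-clock ("real") times copied from the listing above.
BZ2_BYTES = 75_181_163
ZST_BYTES = 64_302_774
BZ2_TEST_SECONDS = 7.095
ZST_TEST_SECONDS = 0.318


def percent_smaller(old_size: int, new_size: int) -> float:
    """Return how much smaller new_size is than old_size, in percent."""
    return (old_size - new_size) / old_size * 100


def speedup(slow_seconds: float, fast_seconds: float) -> float:
    """Return how many times faster the fast run is than the slow run."""
    return slow_seconds / fast_seconds


if __name__ == "__main__":
    print(f"size reduction: {percent_smaller(BZ2_BYTES, ZST_BYTES):.1f}%")
    print(f"test speedup: {speedup(BZ2_TEST_SECONDS, ZST_TEST_SECONDS):.1f}x")
```

Note that `-t` only tests archive integrity, so these timings cover decompression without writing the files to disk.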
Comment 1•4 years ago
There are compatibility reasons to keep bz2, but last time I think we discussed this, zstd was still young. Maybe time to reconsider?
Comment 2•4 years ago
(In reply to Mike Hommey [:glandium] from comment #1)
There are compatibility reasons to keep bz2, but last time I think we discussed this, zstd was still young. Maybe time to reconsider?
All the main distros support zstd out of the box now as far as I know, so I don't see a compatibility reason why we shouldn't do this. The only old bug I can find on this is https://bugzilla.mozilla.org/show_bug.cgi?id=1303190.
Our builds are much less used than distro builds, though, so the benefits are more marginal than one might think. Given that we're very, very understaffed at the moment, I doubt we'd be able to prioritize this right now.
Reporter | ||
Comment 3•4 years ago
- ZSTD nowadays is supported out of the box by all major distros
- Most distro users use Firefox bundled by the distro regardless, so they don't use the official packages, so whatever format you're distributing Firefox in doesn't affect them
- Those who actually download Firefox from your website I'm sure can install ZSTD on their system even if their distro is really old.
You'll save a lot of space by converting to ZSTD.
Reporter | ||
Comment 4•2 years ago
This is still relevant.
Comment 5•2 years ago
(In reply to Artem S. Tashkinov from comment #3)
You'll save a lot of space by converting to ZSTD.
Thank you for bringing this up, Artem! I know cloud storage is usually cheap. That said, I don't know how much bandwidth we could save if we publish Linux archives that are 15% smaller. Tom, is this number easy to estimate for the CloudOps team? If not, what team should we reach out to?
Reporter | ||
Comment 6•2 years ago
It's not just about saving space. As I've shown earlier, zstd is about 22 times faster at decompression than bzip2.
Comment 7•2 years ago
(In reply to Johan Lorenzo [:jlorenzo] from comment #5)
(In reply to Artem S. Tashkinov from comment #3)
You'll save a lot of space by converting to ZSTD.
Thank you for bringing this up, Artem! I know cloud storage is usually cheap. That said, I don't know how much bandwidth we could save if we publish Linux archives that are 15% smaller. Tom, is this number easy to estimate for the CloudOps team? If not, what team should we reach out to?
Yes, cloud storage is on the cheaper side, but the bandwidth and CDN costs of distributing it come into play as well. It's not a quick estimate, as we'd have to segment it out from the other items sharing the bandwidth and CDN. Honestly, if the level of development effort is not that big, I don't see a downside to saving costs and speeding up decompression.
Comment 8•3 months ago
downloading a recent nightly:
$ wget https://archive.mozilla.org/pub/firefox/nightly/2024/09/2024-09-08-21-18-00-mozilla-central-l10n/firefox-132.0a1.fr.linux-x86_64.tar.bz2
# size = 97805542
$ tar jxvf ../firefox-132.0a1.fr.linux-x86_64.tar.bz2
I just tried with:
$ ZSTD_CLEVEL=19 tar -I zstd -cvpf firefox-132.0a1.fr.linux-x86_64.tar.zst firefox
# size = 84456956
xz is doing better:
$ XZ_OPT=-9 tar -cJf firefox-132.0a1.fr.linux-x86_64.tar.xz firefox
# size = 76992276
Reporter | ||
Comment 9•3 months ago
XZ is very slow to unpack, in fact it's more than ten times slower and offers only a marginal ~5% compression ratio improvement.
In fact a year ago or so I convinced NVIDIA to switch from XZ to ZSTD and they actually liked it.
Reporter | ||
Comment 10•3 months ago
Also, you did not use the better compression options for ZSTD. Please use --ultra -22 --long.
The default options are meant for serving web content and are not so good for offline compression.
Reporter | ||
Comment 11•3 months ago
And with proper options:
zstd --ultra -22 --long *.tar
xz -9e *.tar
bz2
-rw-r--r--. 1 birdie birdie 97805542 Sep 8 23:49 firefox-132.0a1.fr.linux-x86_64.tar.bz2
-rw-r--r--. 1 birdie birdie 77070748 Sep 8 23:49 firefox-132.0a1.fr.linux-x86_64.tar.xz
-rw-r--r--. 1 birdie birdie 82182583 Sep 8 23:49 firefox-132.0a1.fr.linux-x86_64.tar.zst
And time to decompress:
time zstd -t *.zst
firefox-132.0a1.fr.linux-x86_64.tar.zst: 316395520 bytes
real 0m0.303s
user 0m0.296s
sys 0m0.029s
time xz -t *xz
real 0m2.601s
user 0m2.579s
sys 0m0.017s
time bzip2 -t *bz2
real 0m5.486s
user 0m5.454s
sys 0m0.020s
bzip2 is the absolute worst.
ZSTD is 8.7 times faster than XZ and 18 times faster than bzip2.
Assignee | ||
Comment 12•3 months ago
Ran a few tests to see what the average gain would be across different locales:
ach linux-x86_64: 83.38mb -> 69.84mb (16.23% smaller)
ach linux-i686 : 84.37mb -> 72.75mb (13.77% smaller)
en-CA linux-i686 : 84.63mb -> 72.88mb (13.89% smaller)
en-US linux-x86_64: 83.18mb -> 69.94mb (15.91% smaller)
fr linux-i686 : 84.68mb -> 73.21mb (13.55% smaller)
en-US linux-i686 : 84.50mb -> 72.85mb (13.78% smaller)
en-CA linux-x86_64: 83.27mb -> 69.96mb (15.98% smaller)
fr linux-x86_64: 83.67mb -> 70.28mb (16.01% smaller)
fi linux-i686 : 84.38mb -> 72.82mb (13.69% smaller)
fi linux-x86_64: 83.19mb -> 69.90mb (15.97% smaller)
es-ES linux-x86_64: 83.68mb -> 70.11mb (16.22% smaller)
bs linux-x86_64: 83.16mb -> 69.86mb (15.99% smaller)
he linux-x86_64: 83.27mb -> 69.90mb (16.07% smaller)
pt-BR linux-x86_64: 83.59mb -> 70.06mb (16.19% smaller)
bs linux-i686 : 84.34mb -> 72.79mb (13.69% smaller)
es-ES linux-i686 : 84.63mb -> 73.03mb (13.70% smaller)
he linux-i686 : 84.32mb -> 72.83mb (13.62% smaller)
pt-BR linux-i686 : 84.53mb -> 72.99mb (13.65% smaller)
Average reduction: 14.88%
I used what we'd use in CI for zstd:
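import tarfile
from contextlib import chdir  # assumed: Python 3.11+ helper
import zstandard as zstd  # assumed: the third-party python-zstandard package
# Stream the tar archive straight through a level-22 zstd compressor.
cctx = zstd.ZstdCompressor(level=22)
with open(output_file, "wb") as f, cctx.stream_writer(f) as z:
    with tarfile.open(mode="w|", fileobj=z) as tf:
        with chdir(input_dir):
            tf.add("firefox")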
Comment 13•2 months ago
What's the impact on memory usage for decompression, when using level=22 vs lower levels (and vs bzip2, I guess)?
Comment 14•2 months ago
Also, what's the impact on the time spent compressing the archive (vs xz, I guess).
Assignee | ||
Comment 15•2 months ago
Was curious about xz, so switched my script to use xz/lzma compression:
en-US linux-x86_64: 83.18mb -> 65.46mb (21.30% smaller)
fr linux-i686 : 84.68mb -> 68.52mb (19.09% smaller)
fr linux-x86_64: 83.67mb -> 65.79mb (21.37% smaller)
pt-BR linux-i686 : 84.53mb -> 68.32mb (19.17% smaller)
ach linux-x86_64: 83.38mb -> 65.36mb (21.60% smaller)
en-US linux-i686 : 84.50mb -> 68.18mb (19.31% smaller)
ach linux-i686 : 84.37mb -> 68.09mb (19.30% smaller)
en-CA linux-i686 : 84.63mb -> 68.21mb (19.41% smaller)
fi linux-i686 : 84.38mb -> 68.15mb (19.23% smaller)
fi linux-x86_64: 83.19mb -> 65.43mb (21.35% smaller)
pt-BR linux-x86_64: 83.59mb -> 65.59mb (21.53% smaller)
en-CA linux-x86_64: 83.27mb -> 65.49mb (21.35% smaller)
es-ES linux-x86_64: 83.68mb -> 65.63mb (21.58% smaller)
bs linux-x86_64: 83.16mb -> 65.40mb (21.36% smaller)
he linux-x86_64: 83.27mb -> 65.43mb (21.43% smaller)
es-ES linux-i686 : 84.63mb -> 68.36mb (19.22% smaller)
bs linux-i686 : 84.34mb -> 68.11mb (19.24% smaller)
he linux-i686 : 84.32mb -> 68.16mb (19.16% smaller)
Average reduction: 20.33%
Also ran some tests to find memory usage + time spent.
(keep in mind this is running on my local computer, YMMV)
Using build: en-US linux-x86_64 firefox 130.0 (84MB in tar.bz2 format)
Compressing
zstd with -22: ~1GB of memory in 171 seconds -> file size: 71MB
xz with -9: ~730MB of memory in 143 seconds -> file size: 65MB
bzip2 with -9: ~20MB of memory in 20 seconds -> file size: 84MB (ran via CLI, not Python)
Decompressing
(all via cli)
zstd: 140MB of memory in 5 seconds
xz: 73MB of memory in 6 seconds
bzip2: 12MB of memory in 10 seconds
By looking at these numbers, it seems xz is a more sensible option than zstd?
It does use more memory than bzip2 to decompress, but I don't think <100MB of ram usage is a huge concern?
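The xz version of the script is not shown in the thread. A minimal sketch of what it might look like using only the standard library, assuming `tarfile`'s built-in `w:xz` mode; the function name `make_tar_xz` and its parameters are made up for illustration:

```python
import os
import tarfile


def make_tar_xz(input_dir: str, output_file: str, preset: int = 9) -> None:
    """Pack input_dir/firefox into an xz-compressed tarball."""
    source = os.path.join(input_dir, "firefox")
    # "w:xz" makes tarfile compress through the standard-library lzma module;
    # preset is the xz level (0-9), so 9 corresponds to `xz -9`.
    with tarfile.open(output_file, mode="w:xz", preset=preset) as tf:
        # arcname keeps "firefox/" as the top-level directory in the archive.
        tf.add(source, arcname="firefox")
```

Passing `arcname` avoids changing the working directory, which the original script does with `chdir`.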
Comment 16•2 months ago
By looking at these numbers, it seems xz is a more sensible option than zstd?
I think we should test on other hardware, like an old SSD, before making a decision.
You should also use hyperfine for benchmarking
Assignee | ||
Comment 17•2 months ago
Ran some decompression benchmarks with hyperfine as suggested by :Sylvestre (thanks for the tool suggestion BTW!)
Note: VMs running the same version of Debian on GCP
2 Core - 2GB ram - VM with balanced SSD disk
$ hyperfine --runs 10 --prepare 'rm -rf firefox/; sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' \
"tar xf firefox-130.0.tar.xz" \
"tar xf firefox-130.0.tar.zst" \
"tar xf firefox-130.0.tar.bz2"
Benchmark 1: tar xf firefox-130.0.tar.xz
Time (mean ± σ): 6.273 s ± 0.251 s [User: 6.057 s, System: 1.118 s]
Range (min … max): 6.055 s … 6.690 s 10 runs
Benchmark 2: tar xf firefox-130.0.tar.zst
Time (mean ± σ): 970.2 ms ± 59.9 ms [User: 876.4 ms, System: 697.9 ms]
Range (min … max): 867.2 ms … 1090.6 ms 10 runs
Benchmark 3: tar xf firefox-130.0.tar.bz2
Time (mean ± σ): 20.612 s ± 1.315 s [User: 20.216 s, System: 2.002 s]
Range (min … max): 19.011 s … 22.862 s 10 runs
Summary
'tar xf firefox-130.0.tar.zst' ran
6.47 ± 0.48 times faster than 'tar xf firefox-130.0.tar.xz'
21.25 ± 1.89 times faster than 'tar xf firefox-130.0.tar.bz2'
2 Core - 2GB ram - VM with basic HDD disk
$ hyperfine --runs 10 --prepare 'rm -rf firefox/; sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' \
"tar xf firefox-130.0.tar.xz" \
"tar xf firefox-130.0.tar.zst" \
"tar xf firefox-130.0.tar.bz2"
Benchmark 1: tar xf firefox-130.0.tar.xz
Time (mean ± σ): 7.420 s ± 0.058 s [User: 6.950 s, System: 1.107 s]
Range (min … max): 7.340 s … 7.515 s 10 runs
Benchmark 2: tar xf firefox-130.0.tar.zst
Time (mean ± σ): 1.942 s ± 0.119 s [User: 1.176 s, System: 0.758 s]
Range (min … max): 1.813 s … 2.202 s 10 runs
Benchmark 3: tar xf firefox-130.0.tar.bz2
Time (mean ± σ): 20.234 s ± 0.500 s [User: 19.183 s, System: 1.370 s]
Range (min … max): 19.793 s … 21.289 s 10 runs
Summary
'tar xf firefox-130.0.tar.zst' ran
3.82 ± 0.24 times faster than 'tar xf firefox-130.0.tar.xz'
10.42 ± 0.69 times faster than 'tar xf firefox-130.0.tar.bz2'
Thoughts
zstd is the fastest option, but uses a bit more RAM than xz (~140MB vs ~75MB)
xz has the best compression ratio (~20% reduction vs bzip2; zstd has ~15% reduction)
Either one of those options is a great improvement over bzip2.
I'd be happy to run other scenarios if you need.
Comment 18•2 months ago
We can tweak the compression effort of those, but xz will always beat zstd in compression ratio, and zstd will always beat everything in speed. bz2 shouldn't be measured at this point.
In general, it's important to take into account the number of downloads and decompression of a file when deciding on a compression format, and the projected cost of bandwidth.
For CI, zstd can be nice: we don't pay for transfer, we pay per second. zstd compresses faster and decompresses faster, and bandwidth is extremely high.
For releases, which I believe to be the focus here, xz shaves 6MB off the size. We should do further tests with the --x86 flag for binaries (see the man page). That can improve compression by a few percentage points (10-ish?). We don't really care about decode speed (not our problem, and we can do it in the background), because we care about storage cost and egress cost. I believe compression duration isn't important, considering the relative amount of compression vs. decompression.
Assignee | ||
Comment 19•2 months ago
I tried adding the x86 filter like the docs suggested:
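import lzma
import tarfile
from contextlib import chdir  # assumed: Python 3.11+ helper
# BCJ x86 filter first, then LZMA2 at the extreme -9 preset.
filters = [
    {"id": lzma.FILTER_X86},
    {"id": lzma.FILTER_LZMA2, "preset": 9 | lzma.PRESET_EXTREME},
]
with lzma.open(output_file, "wb", filters=filters) as f:
    with tarfile.open(mode="w|", fileobj=f) as tf:
        with chdir(input_dir):
            tf.add("firefox")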
This shaves another 0.55mb off the compressed file:
Original XZ -9: 65.46MB
XZ with filters: 64.91MB
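One practical point about the custom filter chain: the .xz container records which filters were used, so a reader should not need to name them. A small round-trip sketch under that assumption, using only the standard library; the helper names (`x86_filters`, `pack`, `unpack`) and the member name are hypothetical:

```python
import io
import lzma
import tarfile


def x86_filters(preset: int = 9 | lzma.PRESET_EXTREME) -> list:
    """Build the filter chain from the comment above: BCJ x86, then LZMA2."""
    return [
        {"id": lzma.FILTER_X86},
        {"id": lzma.FILTER_LZMA2, "preset": preset},
    ]


def pack(payload: bytes, filters: list, member_name: str = "firefox/libxul.so") -> bytes:
    """Return a .tar.xz (as bytes) holding one member, written with filters."""
    buf = io.BytesIO()
    with lzma.open(buf, "wb", filters=filters) as xz:
        with tarfile.open(mode="w|", fileobj=xz) as tf:
            info = tarfile.TarInfo(member_name)
            info.size = len(payload)
            tf.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def unpack(archive: bytes, member_name: str = "firefox/libxul.so") -> bytes:
    """Read the member back without specifying any filters."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:xz") as tf:
        return tf.extractfile(member_name).read()
```

If this holds for the real archives, users extracting with a plain `tar xf` would be unaffected by the extra filter.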
Comment 20•2 months ago
I think there are two factors, somehow touched by Paul, and what I'm going to talk about is broader than what this single bug is about.
- We care about the size of what we ship to users because it impacts how much external bandwidth we use.
- We care how long our CI pipeline takes, and the longer compressing the archive/installer takes, the worse it is for us. That is even compounded by the fact that we do that twice(!), and I think we should change that.
What happens now is that we build Firefox and, in the same job, create an archive/installer. Then, in a separate task, we unpack that archive/installer, sign things, and recreate an archive/installer. There is, in fact, no reason for us to keep creating an efficient archive/installer in the first step. There isn't even a reason to keep creating a .dmg or a .exe in the first step. The output of the build tasks could be a tar.zst with not even the best compression level. What matters is that the repack jobs do their best.
Now, as to whether to choose xz or zstd specifically for what we ship to users on Linux, all things considered, I would pick xz.
Comment 21•2 months ago
Assignee | ||
Comment 22•2 months ago
While I see the appeal of using zstd in CI for its speed, I also think that sticking with xz across the board makes the most sense for simplicity and consistency. Juggling between zstd for CI and xz for shipping would introduce more complexity, especially when ensuring that xz works correctly for final delivery. Keeping things straightforward by using only xz would reduce potential issues, while still achieving our goals for efficient shipping and pipeline optimization. Ideally, we can revisit zstd once things are more aligned, but for now, focusing on one method seems like the most efficient approach.
Comment 23•2 months ago
We should do a back-of-the-envelope calculation of the number of seconds we'd win by speeding up compression and decompression by 5-ish, and translate this to $, especially in light of what glandium says. It probably isn't negligible, and we already seem to use zst on various artifacts, so that would be a win for consistency, not a regression.
Assignee | ||
Comment 24•2 months ago
Comment 25•2 months ago
Assignee | ||
Comment 26•1 months ago
Here are some rough estimates based on a release graph, which I estimated at ~2300 linux tasks
Speed gain zst vs xz (range is with* and without SSD VMs)
- Decompression: 5.2* ~ 5.5 seconds
- Compression: 49 ~ 79* seconds
Total: from 124660 to 194350 seconds
Converted estimate: 35 to 54 hours
Based on c2-standard-8 machine @ $0.07 hourly (before discounts)
ZST over XZ potential savings: $2.45 to $3.78 per release.
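The estimate above can be reproduced with a few lines of arithmetic. A sketch using only the figures quoted in this comment; the names are made up for illustration. The thread appears to round the hours up before pricing them, so the unrounded low end comes out a few cents lower here:

```python
LINUX_TASKS = 2300        # rough task count from the release graph
HOURLY_RATE_USD = 0.07    # c2-standard-8 rate quoted above, before discounts

# Seconds saved per task by zstd over xz, as listed above (low, high).
DECOMPRESSION_GAIN_S = (5.2, 5.5)
COMPRESSION_GAIN_S = (49, 79)


def savings(decompress_s: float, compress_s: float) -> tuple:
    """Return (total seconds, hours, USD) saved per release."""
    total_seconds = LINUX_TASKS * (decompress_s + compress_s)
    hours = total_seconds / 3600
    return total_seconds, hours, hours * HOURLY_RATE_USD


if __name__ == "__main__":
    for d, c in zip(DECOMPRESSION_GAIN_S, COMPRESSION_GAIN_S):
        seconds, hours, usd = savings(d, c)
        print(f"{seconds:.0f} s = {hours:.1f} h = ${usd:.2f}")
```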
All that being said, supporting ZST for in-between tasks and XZ for final product would require us to add steps and intentionally switch the format (likely a repackage job, which currently isn't used by linux builds).
I'll leave that idea as a potential future improvement, unless anyone objects.
Compression benchmarks:
SSD:
hneiva@hneiva-compression-study-ssd:~/study2$ hyperfine --runs 10 --prepare 'rm -rf ./firefox.tar.*; sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' \
"tar -I 'zstd -22 -T0' -cf firefox-130.0.tar.zst firefox/" \
"tar -I 'xz -9 -T0' -cf firefox-130.0.tar.xz firefox"
Benchmark 1: tar -I 'zstd -22 -T0' -cf firefox-130.0.tar.zst firefox/
Time (mean ± σ): 90.025 s ± 1.301 s [User: 176.509 s, System: 0.719 s]
Range (min … max): 88.407 s … 92.032 s 10 runs
Benchmark 2: tar -I 'xz -9 -T0' -cf firefox-130.0.tar.xz firefox
Time (mean ± σ): 169.736 s ± 6.836 s [User: 169.435 s, System: 0.988 s]
Range (min … max): 165.144 s … 183.177 s 10 runs
Summary
'tar -I 'zstd -22 -T0' -cf firefox-130.0.tar.zst firefox/' ran
1.89 ± 0.08 times faster than 'tar -I 'xz -9 -T0' -cf firefox-130.0.tar.xz firefox'
Non-ssd:
hneiva@hneiva-crompression-study:~/study2$ hyperfine --runs 10 --prepare 'rm -rf ./firefox.tar.*; sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' \
"tar -I 'zstd -22 -T0' -cf firefox-130.0.tar.zst firefox/" \
"tar -I 'xz -9 -T0' -cf firefox-130.0.tar.xz firefox"
Benchmark 1: tar -I 'zstd -22 -T0' -cf firefox-130.0.tar.zst firefox/
Time (mean ± σ): 159.919 s ± 0.892 s [User: 312.455 s, System: 1.173 s]
Range (min … max): 158.797 s … 161.456 s 10 runs
Benchmark 2: tar -I 'xz -9 -T0' -cf firefox-130.0.tar.xz firefox
Time (mean ± σ): 209.398 s ± 3.043 s [User: 206.060 s, System: 1.209 s]
Range (min … max): 202.577 s … 213.635 s 10 runs
Summary
'tar -I 'zstd -22 -T0' -cf firefox-130.0.tar.zst firefox/' ran
1.31 ± 0.02 times faster than 'tar -I 'xz -9 -T0' -cf firefox-130.0.tar.xz firefox'
Comment 27•1 month ago
Comment 28•1 month ago
Jason, this change will likely also require changes in fuzzfetch, correct?
Comment 29•1 month ago
Nope. Fuzzfetch only pulls builds from taskcluster. Assuming this is limited to just ftp.mozilla.org, we don't need to do anything.
Comment 30•1 month ago
The artifacts on taskcluster will also become target.tar.xz.
Comment 31•27 days ago
If the files are changing from bz2 to xz, should we rename the bug? ;-)
Comment 32•17 days ago
(In reply to Julien Cristau [:jcristau] from comment #30)
The artifacts on taskcluster will also become target.tar.xz.
Support for this has been added to fuzzfetch.
Comment 33•17 days ago
Heitor -- I don't see any discussion of mozregression on this ticket. Does it need any changes to accommodate this change?
Comment 35•17 days ago
We will need to support two extensions (.bz2 and .xz), either in a fall-back fashion or in a more deterministic fashion, since mozregression currently looks for builds with the .bz2 extension in a hard-coded fashion.
Edit: The build extensions are part of a regex, so this work should not be too complex.
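As an illustration of the regex approach, here is a hypothetical pattern that accepts both extensions. It is not copied from mozregression; the pattern, the group names, and the helper `archive_extension` are assumptions made for this sketch:

```python
import re
from typing import Optional

# Hypothetical pattern: match a Linux build archive whether it ends in
# .tar.bz2 (old format) or .tar.xz (new format).
LINUX_BUILD_RE = re.compile(
    r"^firefox-(?P<version>.+)\.(?P<locale>[A-Za-z-]+)"
    r"\.linux-(?P<arch>x86_64|i686)\.tar\.(?P<ext>bz2|xz)$"
)


def archive_extension(filename: str) -> Optional[str]:
    """Return "bz2" or "xz" for a recognised build name, otherwise None."""
    match = LINUX_BUILD_RE.match(filename)
    return match.group("ext") if match else None
```

A single alternation such as `(bz2|xz)` lets one pattern cover builds published before and after the switch, which matters for a tool that has to bisect across both.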
Comment 36•17 days ago
It sounds like we need a blocking ticket for the mozregression changes, since we can't (shouldn't?) break that tool while rolling this out. I'll leave that to the team to manage.
Comment 37•16 days ago
Should be able to merge bug 1931405 today, and it will go out with the next mozregression release which I can deploy shortly thereafter.
Assignee | ||
Comment 38•16 days ago
Looks like we have all parts covered.
I'm planning on landing this on Monday, November 18th, 2024.
Please post any questions or concerns before landing.
Assignee | ||
Comment 39•13 days ago
New landing date: November 25th after central->beta merge.
Landing on this date should maximize the time it stays in Nightly.
Comment 40•5 days ago
Assignee | ||
Comment 41•5 days ago
[Why is this notable]: Should be added to release notes for 134
[Affects Firefox for Android]: No
[Suggested wording]: "Linux binaries are now provided in XZ format, replacing the previous BZ2 format, offering faster unpacking and smaller file sizes."
[Links (documentation, blog post, etc)]: Will add nightly blog post link once it's published
Comment 42•5 days ago
Backed out for causing mochitest failures
Backout link: https://hg.mozilla.org/integration/autoland/rev/c462120b56496dc32aa7753549002f9df90218e5
Failure log -> self.installer_url was found but symbols_url could not be determined
Comment 43•5 days ago
Comment 44•5 days ago
Backed out for causing Slimyet failures.
Backout link: https://hg.mozilla.org/integration/autoland/rev/bcb3b60f75ba099feaa0566039049de7c755aeeb
Push where failures started: https://treeherder.mozilla.org/jobs?repo=autoland&selectedTaskRun=SEopbd6SQdi2yY8WOWGEag.0&resultStatus=testfailed%2Cbusted%2Cexception%2Cretry%2Cusercancel&revision=337d4c8f793e8f4ef5a8efbca1f9d02edb56b409
Failure log: https://treeherder.mozilla.org/logviewer?job_id=484197209&repo=autoland&lineNumber=686
Comment 45•4 days ago
Comment 46•4 days ago
Assignee | ||
Comment 47•1 day ago