Use xz (lzma) for compressing release tarballs instead of bzip2 : smaller archive and faster decompression

RESOLVED FIXED

Status

defect
P4
normal
RESOLVED FIXED
7 years ago
Last year

People

(Reporter: jerome.bouat, Unassigned)

Tracking

(Depends on 1 bug)

Firefox Tracking Flags

(firefox41 fixed)

Details

(Whiteboard: [release])

Attachments

(1 attachment, 1 obsolete attachment)

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:16.0) Gecko/20100101 Firefox/16.0
Build ID: 20121025205329

Steps to reproduce:

I downloaded the French Linux tar.bz2 archive here:
https://www.mozilla.org/fr/thunderbird/


Actual results:

The bzip2-compressed archive is about 15% larger than the same archive recompressed with xz (lzma):
-----
j@dt:~$ bunzip2 -c thunderbird-16.0.2.tar.bz2 | xz -9e > thunderbird-16.0.2.tar.xz
j@dt:~$ ls -lk thunderbird-16.0.2.tar.*
-rw-r--r-- 1 j j 20455 2012-11-01 12:23 thunderbird-16.0.2.tar.bz2
-rw-r--r-- 1 j j 17820 2012-11-01 12:26 thunderbird-16.0.2.tar.xz
j@dt:~$
-----
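
For reference, the 15% figure can be re-derived from the two sizes in the listing. This is only a small sketch of the arithmetic; the function name is made up for illustration and the sizes are the KiB values printed by `ls -lk` above.

```python
def size_overhead_percent(larger_kib, smaller_kib):
    """Return how much bigger `larger_kib` is than `smaller_kib`, in percent."""
    return (larger_kib - smaller_kib) / smaller_kib * 100.0


# Sizes in KiB, copied from the `ls -lk` output in this comment.
BZ2_KIB = 20455
XZ_KIB = 17820

if __name__ == "__main__":
    overhead = size_overhead_percent(BZ2_KIB, XZ_KIB)
    print(f"bzip2 archive is {overhead:.1f}% larger than the xz archive")
```

This gives roughly 14.8%, which matches the "15%" quoted above once rounded.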

Note that decompression is also faster with xz than with bzip2:
-----
j@dt:~/tmp$ time { for i in $(seq 1 9) ; do bunzip2 -c thunderbird-16.0.2.tar.bz2 > /dev/null ; done ; }

real	0m48.445s
user	0m48.160s
sys	0m0.320s
j@dt:~/tmp$ time { for i in $(seq 1 9) ; do unxz -c thunderbird-16.0.2.tar.xz > /dev/null ; done ; }

real	0m19.643s
user	0m18.750s
sys	0m0.710s
-----
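
Anyone who wants to repeat this kind of measurement without the command-line tools could use something like the sketch below. It relies only on Python's standard `bz2` and `lzma` modules; the function names and the sample payload are hypothetical, and the result depends heavily on the data and the machine, so it should not be read as confirming the numbers above.

```python
import bz2
import lzma
import time


def time_decompress(decompress, blob, rounds=9):
    """Return the wall-clock seconds needed to decompress `blob` `rounds` times."""
    start = time.perf_counter()
    for _ in range(rounds):
        decompress(blob)
    return time.perf_counter() - start


def compare_decompression(payload, rounds=9):
    """Compress `payload` with both codecs, then time decompression of each."""
    bz2_blob = bz2.compress(payload, compresslevel=9)
    # Preset 6 is the xz default. `xz -9e` corresponds to
    # preset=9 | lzma.PRESET_EXTREME, which needs far more memory to compress.
    xz_blob = lzma.compress(payload, preset=6)
    return {
        "bz2_bytes": len(bz2_blob),
        "xz_bytes": len(xz_blob),
        "bz2_seconds": time_decompress(bz2.decompress, bz2_blob, rounds),
        "xz_seconds": time_decompress(lzma.decompress, xz_blob, rounds),
    }


if __name__ == "__main__":
    # Placeholder data; a real test should read the actual tarball instead.
    sample = b"hypothetical placeholder payload\n" * 50_000
    print(compare_decompression(sample))
```

The loop of 9 rounds mirrors the `seq 1 9` loop in the shell transcript.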


Expected results:

I think all tar archives should be compressed with xz in order to save bandwidth for servers, users and all intermediate networks, as well as storage (on servers or in users' temporary directories).

Moreover, when decompressing the archive on a laptop, unxz (unlzma) will use less energy than bunzip2. On a laptop, battery life could be extended with lzma, especially if you perform a lot of software installations and updates.
Is unxz installed by default on Linux distros?
This is more a release engineering or core change rather than being specific to Thunderbird.
Component: Installer → Release Engineering: Releases
Product: Thunderbird → mozilla.org
QA Contact: bhearsum
Version: 16 → other
(In reply to Ludovic Hirlimann [:Usul] from comment #1)
> Is unxz installed by default on Linux distros?

It usually is, I think. At least in openSUSE it has been for a few distribution releases. Our RPM payload is lzma/xz compressed as well.
GNU tar seems to have supported xz compression since 2009-03-05
(http://www.gnu.org/software/tar/)
On the current stable release of Debian :
- the xz-utils package has "required" priority
- xz-utils is a strong dependency of dpkg (which handles the Debian archives to be installed)

See the details on the stable release of Debian :
-----
j@d64:~$ dpkg-query -S /usr/bin/unxz
xz-utils: /usr/bin/unxz
j@d64:~$ dpkg-query -s xz-utils | head -5
Package: xz-utils
Status: install ok installed
Priority: required
Section: utils
Installed-Size: 460
j@d64:~$ dpkg-query -s dpkg | grep -i depends
Pre-Depends: libbz2-1.0, libc6 (>= 2.6), libselinux1 (>= 1.32), zlib1g (>= 1:1.1.4), coreutils (>= 5.93-1), xz-utils
j@d64:~$ lsb_release -a
No LSB modules are available.
Distributor ID:	Debian
Description:	Debian GNU/Linux 6.0.6 (squeeze)
Release:	6.0.6
Codename:	squeeze
-----
In Ubuntu 10.04, it's not installed by default, but I believe it is in Ubuntu 12.04
The only drawback of lzma compression is the memory required for decompression (the man page says 65 MiB for archives compressed with the -9 option). However, Thunderbird already requires at least this much memory at runtime, so requiring the same amount at installation time shouldn't be an issue.

The man page says xz requires 674 MiB of memory for compression (and a lot of time). However, compression is done only once, whereas decompression will be performed thousands of times.
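
To illustrate the decompression memory point, here is a rough sketch using the `memlimit` argument of Python's `lzma.decompress`. The helper name is hypothetical, and the demo uses preset 6 (8 MiB dictionary) rather than -9 so that it stays light to run; the limits chosen are assumptions for the demo, not measured requirements.

```python
import lzma

MIB = 1024 * 1024


def fits_in_memory(xz_blob, limit_bytes):
    """Return True if `xz_blob` can be decompressed within `limit_bytes`.

    Note: lzma.LZMAError is also raised for corrupt input, so this sketch
    assumes the blob itself is valid.
    """
    try:
        lzma.decompress(xz_blob, memlimit=limit_bytes)
    except lzma.LZMAError:
        return False
    return True


if __name__ == "__main__":
    blob = lzma.compress(b"hypothetical payload " * 1000, preset=6)
    for limit in (1 * MIB, 64 * MIB):
        print(f"limit {limit // MIB} MiB: fits = {fits_in_memory(blob, limit)}")
```

The decoder's memory need is driven by the dictionary size recorded in the archive, not by the size of the data, which is why even a small payload is refused under a limit below the preset's requirement.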
In reply to comment #5: the current long-term support release of Ubuntu is 12.04, although 10.04 is supported free of charge until April 2013.

For older distributions, a note on the download page could explain how to install xz if it is missing:
-----
Run the appropriate command below in a terminal:

For Ubuntu-like distributions:
"sudo apt-get install xz-utils"

For Debian-like distributions:
(as root) "apt-get install xz-utils"

For Red Hat-like distributions:
(as root) "yum install xz"
-----
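
A download page or installer script could also detect the missing tool before showing such a hint. The sketch below is only an illustration: the function name, the distribution-family keys and the fallback text are hypothetical, while the commands are the ones listed above.

```python
import shutil

# Commands taken from the note above; the dictionary keys are assumptions.
INSTALL_HINTS = {
    "ubuntu": "sudo apt-get install xz-utils",
    "debian": "apt-get install xz-utils  (as root)",
    "redhat": "yum install xz  (as root)",
}


def xz_install_hint(distro_family, which=shutil.which):
    """Return None if `xz` is on PATH, else an install hint for the family."""
    if which("xz") is not None:
        return None
    return INSTALL_HINTS.get(
        distro_family.lower(),
        "install the xz package with your package manager",
    )


if __name__ == "__main__":
    hint = xz_install_hint("debian")
    print("xz is available" if hint is None else f"xz is missing: {hint}")
```

The `which` parameter exists only so the lookup can be replaced in tests.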
Maybe the development releases of Thunderbird could start using xz compression now, and the stable releases could switch once support for Ubuntu 10.04 has ended?
Component: Release Engineering: Releases → Release Engineering: Automation (General)
QA Contact: bhearsum → catlee
Priority: -- → P4
Summary: xz (lzma) versus bzip2 : smaller archive and faster decompression → Use xz (lzma) for compressing release tarballs instead of bzip2 : smaller archive and faster decompression
Whiteboard: [release]
Product: mozilla.org → Release Engineering
In general I'm in favour of doing this, but I do worry about what other impact this change will have on downstream consumers of these files.

Please start a newsgroup thread advocating for this change so that we can have some discussion there instead of here in the bug.
Please consider also to use lzip. lzip is another user of the lzma algorithm but has some advantages over xz:
- plzip supports multithreaded compression/decompression. xz doesn't support this option yet.
- the lzip project has different compressors and decompressors written in C and C++. So Mozilla would always have an alternative plan if something goes wrong.
- lunzip is really tiny and the next version will have an option to limit the memory use. This is perfect for memory-constrained environments. BTW, the standard version uses only 33MB of RAM for files compressed with -9.
- the lzip project uses the GPLv3 license for every project but also has an alternative compressor/decompressor with a public domain license (no complaints from i-dont-like-the-gpl environments).
- clzip and lunzip support any OS with a C compiler, no external dependencies. There is also a Windows version.
- lzip uses the same interface and the same exit status numbers (https://lists.gnu.org/archive/html/bug-tar/2013-05/msg00001.html) as bzip2.
- the format is focused on long term archiving and high compression ratios.

 622M firefox-25.0.source.tar
 122M firefox-25.0.source.tar.bz2
98.0M firefox-25.0.source.tar.lz
97.9M firefox-25.0.source.tar.xz

http://lzip.nongnu.org/
(In reply to Juan Francisco Cantero Hurtado from comment #10)
> Please consider also to use lzip. lzip is another user of the lzma algorithm
> but has some advantages over xz:
> - plzip supports multithreaded compression/decompression. xz doesn't support
> this option yet.

There is pxz.

> - the lzip project uses the GPLv3 license for every project but also has an
> alternative compressor/decompressor with a public domain license (no
> complaints from i-dont-like-the-gpl environments).

The bug only concerns GNU/Linux packages.

> - the format is focused on long term archiving and high compression ratios.

This bug is about *release* tarballs.  Nightly builds would benefit from compression more, but release builds need recognizability first.

Numbers of packages installed and run recently on Debian popcon users' computers according to http://popcon.debian.org/:
           Inst  Vote
xz-utils 145439 39044
lzip       1272   168
xzdec       169    24
plzip       258    42
lunzip      122    24
pxz          86    10
clzip        60    20
bzip2    138838 66950
pbzip2     3926   499

Weird, bzip2 has fewer installs than xz-utils does?
(In reply to Aleksej [:Aleksej] from comment #11)
> (In reply to Juan Francisco Cantero Hurtado from comment #10)
> > Please consider also to use lzip. lzip is another user of the lzma algorithm
> > but has some advantages over xz:
> > - plzip supports multithreaded compression/decompression. xz doesn't support
> > this option yet.
> 
> There is pxz.

Good to know.

> 
> > - the lzip project uses the GPLv3 license for every project but also has an
> > alternative compressor/decompressor with a public domain license (no
> > complaints from i-dont-like-the-gpl environments).
> 
> The bug only concerns GNU/Linux packages.

I was thinking of the source tarballs, my bad :). Yes, xz makes more sense in this context. I'll open another bug report.

> 
> > - the format is focused on long term archiving and high compression ratios.
> 
> This bug is about *release* tarballs.  Nightly builds would benefit from
> compression more, but release builds need recognizability first.
> 
> Numbers of packages installed and run recently on Debian popcon users'
> computers according to http://popcon.debian.org/:
>            Inst  Vote
> xz-utils 145439 39044
> lzip       1272   168
> xzdec       169    24
> plzip       258    42
> lunzip      122    24
> pxz          86    10
> clzip        60    20
(In reply to Juan Francisco Cantero Hurtado from comment #10)
> - plzip supports multithreaded compression/decompression. xz doesn't support
> this option yet.

Maybe you should first consider using a pipe, as in the command below:
---
tar cf - [file 1] [file 2] | xz -9e > archive.tar.xz
---
Basically, this ensures that two separate processes (tar and xz) run simultaneously. With multiple cores, you get actual parallelism.
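
As a side note, the same tar-then-xz archive can be produced from Python's standard `tarfile` module, as in the rough sketch below. It runs in a single process, so it does not give the two-process overlap of the shell pipe; the helper names and file contents are hypothetical, and preset 6 is used instead of 9 to keep memory use low.

```python
import io
import tarfile


def make_tar_xz(files, preset=6):
    """Build an in-memory .tar.xz from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:xz", preset=preset) as tar:
        for name, data in sorted(files.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def list_tar_xz(blob):
    """Return the member names stored in an in-memory .tar.xz archive."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:xz") as tar:
        return tar.getnames()


if __name__ == "__main__":
    archive = make_tar_xz({"a.txt": b"hello", "b.txt": b"world"})
    print(len(archive), list_tar_xz(archive))
```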


> - lunzip is really tiny and the next version will have an option to limit
> the memory use. This is perfect for memory-constrained environments. BTW, the
> standard version uses only 33MB of RAM for files compressed with -9.

The memory used for extracting Thunderbird isn't an issue, since at least 256 MB of physical memory (Windows and Mac) is already expected in order to run the software:
http://www.mozilla.org/en-US/thunderbird/system-requirements/
I just sent a message to the dev-builds mailing list.
Depends on: 733528
Duplicate of this bug: 1015139
Hi mshal,

Here is a first patch for enabling xz archives; it switches source-packages from bz2 to xz compression.

time make -f client.mk source-package

* firefox-35.0a1.source.tar.bz2 
  real    2m30.527s
  user    1m52.656s
  sys     0m9.690s
  size:   169M 

* firefox-35.0a1.source.tar.xz
  real    10m28.532s
  user    10m32.638s
  sys     0m7.321s
  size:   137M
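
To put the two runs side by side, the tradeoff can be worked out from the figures listed here. This is just a sketch of the arithmetic; the helper name is made up and the values are the `real` times and sizes reported in this comment.

```python
import re


def parse_time(text):
    """Convert a `time` style duration such as '2m30.527s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# Figures copied from the two source-package runs in this comment.
BZ2_REAL, BZ2_SIZE_MB = "2m30.527s", 169
XZ_REAL, XZ_SIZE_MB = "10m28.532s", 137

if __name__ == "__main__":
    slowdown = parse_time(XZ_REAL) / parse_time(BZ2_REAL)
    saving = (BZ2_SIZE_MB - XZ_SIZE_MB) / BZ2_SIZE_MB * 100.0
    print(f"xz packaging takes {slowdown:.1f}x as long")
    print(f"xz archive is {saving:.1f}% smaller")
```

With these numbers the xz run takes about 4.2 times as long and produces an archive about 18.9% smaller.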


To have it working on my local machine (OSX 10.9) I had to change the --exclude option to: --exclude='*/dist' 
(it was: --exclude='$(MOZILLA_DIR)/dist'), without this, the final archive is bigger than 1GB.
Our release process runs source-package on linux boxes and we don't have the huge tarball issue there.

I don't know if this is a problem of my local setup and/or macosx in general.
Attachment #8497467 - Flags: feedback?(mshal)
(In reply to Massimo Gervasini [:mgerva] from comment #17)
> To have it working on my local machine (OSX 10.9) I had to change the
> --exclude option to: --exclude='*/dist' 
> (it was: --exclude='$(MOZILLA_DIR)/dist'), without this, the final archive
> is bigger than 1GB.
> Our release process runs source-package on linux boxes and we don't have the
> huge tarball issue there.
> 
> I don't know if this is a problem of my local setup and/or macosx in general.

I mentioned this in an email (re-pasting below) -

I think the problem is this block:

  --exclude='$(MOZILLA_DIR)/dist'
ifdef MOZ_OBJDIR
SRC_TAR_EXCLUDE_PATHS += --exclude='$(MOZ_OBJDIR)'
endif

In release builds, we have MOZ_OBJDIR set, so it gets an '--exclude=/path/to/release/obj-firefox'. Since dist is under obj-firefox, dist is also implicitly ignored. This line:

  --exclude='$(MOZILLA_DIR)/dist'

points to a non-existent directory, since it's really $(MOZILLA_DIR)/objdir/dist that we want to ignore. I think it's just a bug from the original implementation that was missed because the MOZ_OBJDIR exclude effectively hides it. However, when you build locally, you probably don't set MOZ_OBJDIR explicitly, so that exclude doesn't show up. We should probably fix this instead to be:

-  --exclude='$(MOZILLA_DIR)/dist'
-ifdef MOZ_OBJDIR
-SRC_TAR_EXCLUDE_PATHS += --exclude='$(MOZ_OBJDIR)'
-endif
+  --exclude='$(notdir $(MOZ_BUILD_ROOT))'

Can you try this instead and see if it helps? If so, you should probably make this a separate bug to fix the exclude paths rather than include it as part of the xz changes.
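
To make the exclude problem concrete, here is a rough sketch of why the absolute pattern misses while `*/dist` hits. It only approximates GNU tar's matching with `fnmatchcase` (where `*` also matches `/`), and all paths are hypothetical examples rather than the real release layout.

```python
from fnmatch import fnmatchcase


def is_excluded(member, patterns):
    """Approximate tar --exclude: a member is dropped if the member itself
    or any of its parent directories matches one of the patterns."""
    parts = member.split("/")
    prefixes = ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]
    return any(fnmatchcase(prefix, pat) for prefix in prefixes for pat in patterns)


if __name__ == "__main__":
    member = "mozilla-central/obj-firefox/dist/bin/firefox"
    # $(MOZILLA_DIR)/dist expands to a path that no archive member lives under.
    print(is_excluded(member, ["/home/user/mozilla-central/dist"]))
    # The wildcard form matches the dist directory inside the objdir.
    print(is_excluded(member, ["*/dist"]))
```

Note that under this approximation `*/dist` would also drop any source directory that happens to be named `dist`, which is one more reason to fix the exclude paths separately.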
Comment on attachment 8497467 [details] [diff] [review]
xz compression for souce-package

>diff --git a/toolkit/mozapps/installer/packager.mk b/toolkit/mozapps/installer/packager.mk
>--- a/toolkit/mozapps/installer/packager.mk
>+++ b/toolkit/mozapps/installer/packager.mk
>@@ -957,14 +957,14 @@ SRC_TAR_EXCLUDE_PATHS += \
>   --exclude='.mozconfig*' \
>   --exclude='*.pyc' \
>   --exclude='$(MOZILLA_DIR)/Makefile' \
>-  --exclude='$(MOZILLA_DIR)/dist'
>+  --exclude='*/dist'

See #c18

> ifdef MOZ_OBJDIR
> SRC_TAR_EXCLUDE_PATHS += --exclude='$(MOZ_OBJDIR)'
> endif
> CREATE_SOURCE_TAR = $(TAR) -c --owner=0 --group=0 --numeric-owner \
>   --mode=go-w $(SRC_TAR_EXCLUDE_PATHS) -f
> 
>-SOURCE_TAR = $(DIST)/$(PKG_SRCPACK_PATH)$(PKG_SRCPACK_BASENAME).tar.bz2
>+SOURCE_TAR = $(DIST)/$(PKG_SRCPACK_PATH)$(PKG_SRCPACK_BASENAME).tar.xz
> HG_BUNDLE_FILE = $(DIST)/$(PKG_SRCPACK_PATH)$(PKG_BUNDLE_BASENAME).bundle
> SOURCE_CHECKSUM_FILE = $(DIST)/$(PKG_SRCPACK_PATH)$(PKG_SRCPACK_BASENAME).checksums
> SOURCE_UPLOAD_FILES = $(SOURCE_TAR)
>@@ -993,7 +993,7 @@ endif
> source-package:
> 	@echo 'Packaging source tarball...'
> 	$(MKDIR) -p $(DIST)/$(PKG_SRCPACK_PATH)
>-	(cd $(MOZ_PKG_SRCDIR) && $(CREATE_SOURCE_TAR) - $(DIR_TO_BE_PACKAGED)) | bzip2 -vf > $(SOURCE_TAR)
>+	(cd $(MOZ_PKG_SRCDIR) && $(CREATE_SOURCE_TAR) - $(DIR_TO_BE_PACKAGED)) | xz -9e > $(SOURCE_TAR)
> 	$(SIGN_SOURCE_TAR_CMD)
> 
> hg-bundle:

Do we have any interest/need to maintain both bz2 and xz? If not, this looks fine to me!
Attachment #8497467 - Flags: feedback?(mshal) → feedback+
(In reply to Michael Shal [:mshal] from comment #18)

> +  --exclude='$(notdir $(MOZ_BUILD_ROOT))'

This assumes MOZ_BUILD_ROOT is a subdirectory of the current directory.

It's preferable not to make such assumptions.
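
The assumption can be spelled out with a small sketch. `notdir` below mimics GNU make's `$(notdir ...)` for a single path, and the second helper checks the condition under which excluding only that last component points at the real objdir. Both helper names and all paths are hypothetical.

```python
import posixpath


def notdir(path):
    """Return the last path component, like $(notdir ...) for one path."""
    return posixpath.basename(path)


def objdir_is_child_of(srcdir, objdir):
    """True only when `objdir` sits directly inside `srcdir`."""
    return posixpath.dirname(posixpath.normpath(objdir)) == posixpath.normpath(srcdir)


if __name__ == "__main__":
    src = "/home/user/mozilla-central"
    for obj in ("/home/user/mozilla-central/obj-firefox", "/builds/obj-firefox"):
        print(notdir(obj), objdir_is_child_of(src, obj))
```

In the second case the exclude pattern is still `obj-firefox`, but the build root is outside the source tree, so the pattern no longer refers to it.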
(In reply to Mike Hommey [:glandium] from comment #20)
> (In reply to Michael Shal [:mshal] from comment #18)
> 
> > +  --exclude='$(notdir $(MOZ_BUILD_ROOT))'
> 
> This assumes MOZ_BUILD_ROOT is a subdirectory of the current directory.
> 
> It's preferable not to make such assumptions.

Ahh, ok. What should we use here then?
Depends on: 1075422
I would like to reply to comments #10 to #14 for later reference: parallel compression gives a worse compression ratio than single-threaded compression.

Since the compression speed isn't an issue when building the release tarball, I think the "tar ... | xz -9e" command is a good tradeoff.

If you want to speed up the compression, maybe you could try the "tar ... | xz -9" command (without "-e" extreme option) which provides a good compression ratio.
OS: Linux → All
Hardware: x86_64 → All
Do we need to block on the local OSX case here?
Attachment #8497467 - Attachment is patch: true
Attachment #8497467 - Attachment mime type: text/x-patch → text/plain
Attachment #8497467 - Attachment is obsolete: true
Attachment #8623190 - Flags: review?(mshal)
Comment on attachment 8623190 [details] [diff] [review]
use xz for source archives

Looks fine, as long as try is happy.
Attachment #8623190 - Flags: review?(mshal) → review+
Comment on attachment 8623190 [details] [diff] [review]
use xz for source archives

there is no try
Attachment #8623190 - Flags: checked-in+
https://hg.mozilla.org/mozilla-central/rev/834ad47007f2
Status: UNCONFIRMED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Duplicate of this bug: 935704
Component: General Automation → General