Closed Bug 1352595 Opened 7 years ago Closed 4 years ago

Switch to !zlib for omni.ja compression

Categories: (Core :: Networking: JAR, enhancement, P5)

Status: RESOLVED WONTFIX

People: (Reporter: gps, Assigned: glandium)

Whiteboard: [necko-would-take]

Attachments: (2 files)

I wrote a very long blog post on the Zstandard compression algorithm a few weeks back: http://gregoryszorc.com/blog/2017/03/07/better-compression-with-zstandard/

Buried in that post are some promising results from compressing omni.ja with zstandard instead of zlib. The total size of an omni.ja with various compression settings are as follows:

zlib:      9,783,749 bytes
zstd l=12: 9,177,410 bytes
zstd dict: 8,073,958 bytes

So, using zstandard with dictionary compression yields a total compressed size over 1.5 MB smaller than zlib's. On top of that, zstandard decompresses faster than zlib. If we switched omni.ja from zlib to zstandard, Firefox downloads would be smaller and reading performance might improve thanks to the faster decompression.
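For the curious, here is roughly how numbers in this ballpark can be reproduced with the python-zstandard bindings. This is a minimal sketch: the input directory and the dictionary size are illustrative assumptions, not the exact settings behind the measurements above.

    import os
    import zstandard

    # Gather the files that would land in omni.ja (path is illustrative).
    samples = []
    for root, _, files in os.walk('dist/bin/modules'):
        for name in files:
            with open(os.path.join(root, name), 'rb') as fh:
                samples.append(fh.read())

    # Train a shared dictionary over all entries (dictionary size is a guess).
    dict_data = zstandard.train_dictionary(131072, samples)

    # Compress each entry independently, but against the shared dictionary.
    cctx = zstandard.ZstdCompressor(level=12, dict_data=dict_data)
    total = sum(len(cctx.compress(data)) for data in samples)
    print('total compressed size with dictionary: %d bytes' % total)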

This came up in #developers a few minutes ago and bholley - amused by the 1.5 MB reduction - said "that is huge... we should totally do that yesterday." He encouraged me to file a bug, which I'm doing.

As I noted in the blog post and I'll note here, we don't currently ship zstandard in Gecko. But we do ship brotli. And brotli's performance across the board (ratio, compression speed, and decompression speed) is often very similar to zstandard and I suspect that the promising results I achieved with zstandard could be replicated approximately with brotli.

glandium noted that there's nothing requiring us to use zlib in omni.ja. omni.ja just so happens to be a mostly compliant zip file today. But that can change.

If we were to change omni.ja to !zlib, we'd need to change the code for loading it (libjar), the code producing it (in the build system), and possibly code in the updater (wherever that lives). But for a 1+ MB download size reduction, it might be worth the work.

The big wins in size come from the use of dictionary compression. I recommend reading that section in my blog post for more on the topic. If we went with dictionary compression for omni.ja, it would slow down omni.ja generation significantly (probably dozens of seconds to multiple minutes) because "training" a dictionary is CPU intensive. There are workflows where dictionary training could be prohibitively expensive. So we may want to make dictionary compression optional so it doesn't get in the way of e.g. developers doing local builds/testing.
Since we're talking about omni.ja compression and download sizes, I'd like to explicitly point out that the compression strategy used by omni.ja is intrinsically inefficient.

The way omni.ja works is each entry in the archive is compressed independently. Contrast with something like a tar archive, where you feed the raw input into a single compression stream. Using a single compression context yields smaller archives because the compressor can "share" common data across discrete entries. But the flip side - and the reason we use independent compression contexts - is that read operations are fast: you don't have to decompress "extra" data in order to access a particular entry/file. (Dictionary compression is a 3rd approach where you preserve the fast independent access property while yielding smaller compressed sizes via shared data in the dictionary. But dictionary compression won't yield as small an archive as a single compression stream will.)
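A toy illustration of the trade-off using Python's zlib module (the file contents are placeholders):

    import zlib

    # Stand-ins for the files stored in the archive.
    files = [b'function foo() { return 1; }\n' * 200,
             b'function bar() { return 2; }\n' * 200]

    # omni.ja style: every entry gets its own compression context,
    # so data common to several entries is compressed again each time.
    independent = sum(len(zlib.compress(data, 9)) for data in files)

    # tar.gz style: one context shared across all entries, which lets
    # the compressor reuse matches across entry boundaries...
    c = zlib.compressobj(9)
    single = len(b''.join(c.compress(data) for data in files) + c.flush())

    # ...at the cost of random access: reading the last entry means
    # decompressing everything stored before it in the stream.
    print(independent, single)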

I'm not sure how the installer works, but if we're shipping omni.ja as-is - as a bucket of discretely compressed files - we could make the installer significantly smaller by compressing all the files into a single stream and having the installer produce the omni.ja file during installation on the target machine. Of course, this involves writing code to do that. And there would be overhead to recompress all the files into an omni.ja, slowing down installation. But this should make the download size smaller than merely swapping out zlib for <insert other algorithm>, even with dictionary compression involved.

Another advantage to doing on-device omni.ja creation is you could target the install device. For example, if we could detect we're on a beefy machine with fast I/O, we could opt for no compression inside omni.ja. Firefox could mmap() omni.ja and file access would be insanely fast. Or if you have slow I/O, you could continue to use compression, possibly choosing between {zlib, brotli, zstd, lz4, etc} as we deem appropriate.
CCing some people who care about download size - 1.5MB is a lot.

Also, comment 0 indicates that this may improve startup time. Ehsan, how much (if at all) is QF focused on startup?
Flags: needinfo?(ehsan)
Bugs 772868 and 1231379 have more background & thinking about how to improve omni.ja compression.

Another thing to keep in mind is update sizes - keeping omni.ja uncompressed resulted in remarkably smaller installers and updates. The installers and complete updates get smaller because they compress over the complete set of files, and the partial updates get smaller because diffing uncompressed data generally yields better results than diffing compressed data.

If we want to assemble a local version of omni.ja on installation, we'd have to add the same support for updates. Historically we've been reluctant to add this kind of complexity to the updater.

Perhaps we could have an omni.ja with two sections: the first section of 'preloaded' files, and the second section of everything else. Each could be compressed independently. Firefox would load and decompress the first section on start in order to speed up start time.
Some specific numbers for disabling omni.ja compression from our release sizes dashboard[1]:
* Win64 en-US complete update with disabled compression: 52.9MB
* Win64 en-US complete update with regular zip compression in omni.ja: 57.9MB (+5MB)

Partial updates also jumped from about 8MB to about 12MB.

[1] https://www.hostedgraphite.com/da5c920d/86a8384e-d9cf-4208-989b-9538a1a53e4b/grafana/dashboard/db/release-sizes
(In reply to Chris AtLee [:catlee] from comment #3)
> Perhaps we could have an omni.ja with two sections: the first section of
> 'preloaded' files, and the second section of everything else. Each could be
> compressed independently. Firefox would load and decompress the first
> section on start in order to speed up start time.

That's a fantastic idea. I can easily see this reducing omni.ja size by dozens if not a few hundred kilobytes. It should also make startup faster, since decompressing a single stream with N discrete elements is faster than performing N discrete decompression operations, and less overall I/O is needed to read the compressed data from storage.
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #2)
> CCing some people who care about download size - 1.5MB is a lot.
> 
> Also, comment 0 indicates that this may improve startup time. Ehsan, how
> much (if at all) is QF focused on startup?

I'm not as confident that !zlib will noticeably improve startup time. Yes, !zlib should be faster on paper. But for small inputs into the decompressor, execution time can be dominated by per-item overhead. (It takes time/data for a decompressor to "warm up" and attain peak speed.) There's a substantial chance we won't notice a speedup switching away from zlib: I just don't know how well a faster decompression algorithm will translate to wall-time savings given all the other variables in play. Given its importance, it is probably worth investigating.
If anyone is looking for a micro-optimization in libjar, we're currently using inflateEnd() in nsZipCursor::~nsZipCursor() and nsJARInputStream::Close(). So, for each item we decompress, we call inflateInit2() + inflateEnd().

There is an inflateReset() that logically does the same thing, but without incurring a malloc()+free(). From my work optimizing python-zstandard, eliminating extra malloc()+free() for frequent compress/decompress operations was good for several MB/s in throughput. And that was with Python in play. This could matter for startup perf.
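The zlib C API isn't exposed at that level from Python, but the same reuse pattern can be sketched with python-zstandard, where one decompressor context can serve many items (the entry count and contents below are made up):

    import zstandard

    # Stand-ins for the entries we'd decompress out of omni.ja.
    cctx = zstandard.ZstdCompressor(write_content_size=True)
    frames = [cctx.compress((b'entry %d ' % i) * 500) for i in range(100)]

    # Anti-pattern, analogous to inflateInit2() + inflateEnd() per item:
    # a fresh context, with its allocations, for every single entry.
    slow = [zstandard.ZstdDecompressor().decompress(f) for f in frames]

    # The inflateReset() analog: one context reused across all items,
    # so the allocations happen once and get recycled.
    dctx = zstandard.ZstdDecompressor()
    fast = [dctx.decompress(f) for f in frames]
    assert slow == fast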
See Also: → 1351071
I grabbed an ETW trace for Nightly with my daily profile. MOZ_Z_inflate_fast() was the most common function at the bottom of the stack from xul.dll in the statistical startup profile. We spent more time in that function than in what looks to be JS parsing/loading (js::frontend::TokenStream::getTokenInternal, the second most common function).

While MOZ_Z_inflate_fast() did account for >1% of CPU profiled time, I think there are bigger fish to fry. For example, mozilla::safebrowsing::LookupCacheV4::VerifyChecksum was in the stack for like 5x more time than MOZ_Z_inflate_fast(). Although it was in a separate thread, so it might not be in the critical path. It is using SHA-256 for that checksum verification though. Not sure why we need a cryptographic hash for file integrity in the critical path of startup - one would think a fast hash like CRC would be sufficient for this use case. I dunno.

Also, everything JavaScript seems to dominate the stack profiles on startup. js::jit::DoCallFallback is on the stack for ~25% of samples in the main thread. It is hard for me to make sense of what all is happening. But it certainly looks like if we want to make startup faster we should be optimizing the JS that runs on startup (probably by running less of it).

Another random observation: XPIs used by add-ons also use the zip/zlib file formats. So any performance issues we identify with omni.ja also apply to some extent to add-ons. Not sure if we have the flexibility to change the archive format used by add-ons, however.
So, building a brotli-based omni.ja by doing only the straightforward thing (replacing zlib with brotli) made the package step take 1.5 minutes on my machine, and yielded the following sizes when repacking the last linux64 nightly:
                     zlib      brotli     diff
  omni.ja           5770429   4904625   -865804
  browser/omni.ja  13059559  11606962  -1452597
  total            18829988  16511587  -2318401

(with the defaults used by the python brotli module, which are apparently quality 11 (max), log window size 22, and no dictionary).

That's an easy 2.3MB win (assuming I didn't mess things up), without doing more than switching the compression algorithm. I'll go ahead and implement the necessary bits for this. I expect the most problematic part will be building the python module...

Other ideas mentioned in comments would require much more work, and should be filed separately if deemed worth pursuing.
Assignee: nobody → mh+mozilla
(In reply to Gregory Szorc [:gps] from comment #8)
> For example, mozilla::safebrowsing::LookupCacheV4::VerifyChecksum was in the stack for
> like 5x more time than MOZ_Z_inflate_fast().

You should definitely file a bug on that, because it doesn't make sense that we'd spend time doing safebrowsing checks during startup, except if you have a lot of open tabs that are being restored.
Component: General → Networking: JAR
Product: Firefox → Core
Depends on: 1340910
(In reply to Mike Hommey [:glandium] from comment #9)
> (with the defaults used by the python brotli module, which are apparently
> quality 11 (max), log window size 22, and no dictionary).

That window size seems a bit large for our data. You may want to make it smaller for better compatibility with memory-constrained devices. Also, the compressor may want to take the original file size into account and clip the window size so it isn't larger than that: if the compressor's internals don't already do this, the decompressor just wastes memory allocating space for a window that will never be filled. Of course, there are advantages to using a single allocation for the window on the decompressor. So we may want to use a single window size determined by the largest input file.

(Window sizes aren't really a problem with zlib because the max window size is 32kB and it is really hard to get into trouble with 32kB in 2017.)
(In reply to Mike Hommey [:glandium] from comment #10)
> (In reply to Gregory Szorc [:gps] from comment #8)
> > For example, mozilla::safebrowsing::LookupCacheV4::VerifyChecksum was in the stack for
> > like 5x more time than MOZ_Z_inflate_fast().
> 
> You should definitely file a bug on that, because it doesn't make sense that
> we'd spend time doing safebrowsing checks during startup, except if you have
> a lot of open tabs that are being restored.

Bug 1353956
No longer depends on: 1340910
Depends on: 1355661
Depends on: 1355671
Whiteboard: [necko-would-take]
Comment on attachment 8899726 [details]
Bug 1352595 - Add basic support for brotli compression to the packager.

https://reviewboard.mozilla.org/r/171054/#review177032

This generally seems to be on the right track.

The thing that bothers me most about this patch is trying to shoehorn ``False``, ``True``, and string values to mean different compression settings. That's a recipe for bugs. I'd feel much better if we used strings (or non-bool "constants") everywhere. But the commit message implies this is meant for experimental testing. So I'm inclined to defer to a follow-up if it unblocks initial Brotli experimentation.

::: python/mozbuild/mozpack/copier.py:582
(Diff revision 1)
> -
> -                if path in old_contents:
> +                # Because l10n repacks can't handle brotli just yet, but need
> +                # to be able to decompress those files, per UnpackFinder and
> +                # formatters, we force deflate on them.
> +                if compress == JAR_BROTLI and (
> +                        isinstance(file, (ManifestFile, XPTFile)) or
> +                        mozpath.basename(path) == 'install.rdf'):
> +                    compress = True

This is a bit hacky. Per commit message, this is intended as temporary. Could you please reinforce that assertion with an in-line comment?

::: python/mozbuild/mozpack/mozjar.py:700
(Diff revision 1)
> +                    compress_level, zlib.DEFLATED, -MAX_WBITS)
> +            else:
> +                self._deflater = BrotliCompress()
>              self._deflated = BytesIO()
>          else:
> +            assert compress == False

There are ``is`` checks elsewhere. Should this be ``compress is False``?

::: python/mozbuild/mozpack/mozjar.py:801
(Diff revision 1)
> +        proc = subprocess.Popen([Brotli.brotli_tool()] + args,
> +                                stdin=subprocess.PIPE,
> +                                stdout=subprocess.PIPE)
> +        (stdout, _) = proc.communicate(input)
> +        proc.wait()
> +        return stdout

Can `subprocess.check_output()` be used?

Also, should this check the return code?
Attachment #8899726 - Flags: review?(gps) → review-
Comment on attachment 8899727 [details]
Bug 1352595 - Use brotli for omni.ja and xpis on nightly builds.

https://reviewboard.mozilla.org/r/171056/#review177040

This seems fine. I'll conduct final review once I see the next version of the first patch.
Attachment #8899727 - Flags: review?(gps)
(In reply to Gregory Szorc [:gps] from comment #15)
> Can `subprocess.check_output()` be used?

How do you feed stdin to check_output?
(In reply to Gregory Szorc [:gps] from comment #15)
> Comment on attachment 8899726 [details]
> Bug 1352595 - Add basic support for brotli compression to the packager.
> 
> https://reviewboard.mozilla.org/r/171054/#review177032
> 
> This generally seems to be on the right track.
> 
> The thing that bothers me most about this patch is trying to shoehorn
> ``False``, ``True``, and string values to mean different compression
> settings.

Huh? There are no string values involved except in packager.py.
Comment on attachment 8899726 [details]
Bug 1352595 - Add basic support for brotli compression to the packager.

https://reviewboard.mozilla.org/r/171054/#review178248

This looks better. Thanks for addressing feedback.

::: python/mozbuild/mozpack/mozjar.py:814
(Diff revision 2)
> +            raise Exception("Brotli compression failed")
> +        return stdout
> +
> +    @staticmethod
> +    def compress(data):
> +        return Brotli.run_brotli_tool(['--window', '17'], data)

This corresponds to a window size of ``(1 << 17) - 16 == 131056`` bytes. This is ~4x the size of zlib's 32kB window.

This seems like a reasonable value to me. The decompressor will allocate this much memory, so you don't want it to be too large. And I doubt many assets in these files are larger than ~128kB. If anything, we may want to decrease the window because assets aren't that large. But we can experiment with this later.
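For illustration, the equivalent of this setting through the Python brotli module (the input file name is a placeholder):

    import brotli

    data = open('omni-entry.bin', 'rb').read()  # placeholder input file

    # lgwin=17 caps the sliding window at (1 << 17) - 16 = 131,056 bytes,
    # which is roughly what the decompressor allocates per stream.
    compressed = brotli.compress(data, quality=11, lgwin=17)
    assert brotli.decompress(compressed) == data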
Attachment #8899726 - Flags: review?(gps) → review+
Comment on attachment 8899726 [details]
Bug 1352595 - Add basic support for brotli compression to the packager.

https://reviewboard.mozilla.org/r/171054/#review177032

> Can `subprocess.check_output()` be used?
> 
> Also, should this check the return code?

To answer your question, I believe you can pass an ``io.BytesIO`` as ``stdin`` to any ``subprocess`` function.
Comment on attachment 8899727 [details]
Bug 1352595 - Use brotli for omni.ja and xpis on nightly builds.

https://reviewboard.mozilla.org/r/171056/#review178252
Attachment #8899727 - Flags: review?(gps) → review+
(In reply to Gregory Szorc [:gps] from comment #22)
> To answer your question, I believe you can pass an ``io.BytesIO`` as
> ``stdin`` to any ``subprocess`` function.

FWIW, according to the subprocess source, it only accepts PIPE, an int (a file descriptor), or a file-like object with a fileno() method that returns a file descriptor. I thought that might be something new in Python 3, so I checked Python 3.5, and it's the same, except that it now also allows DEVNULL.
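For the record, a sketch of the two options (the brotli command line here is illustrative; the input= keyword only exists on Python 3.4+):

    import subprocess

    data = b'data to compress'          # placeholder input
    cmd = ['brotli', '--window', '17']  # illustrative command line

    # Python 3.4+: check_output() grew an input= keyword and raises
    # CalledProcessError on a non-zero exit code for free.
    out = subprocess.check_output(cmd, input=data)

    # Python 2.7: no input= keyword, so it has to be Popen + communicate(),
    # with a manual return code check.
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE)
    out, _ = proc.communicate(data)
    if proc.returncode:
        raise Exception('brotli failed with code %d' % proc.returncode)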
Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/autoland/rev/fb235372e172
Add basic support for brotli compression to the packager. r=gps
https://hg.mozilla.org/integration/autoland/rev/9b80c62b2cbd
Use brotli for omni.ja and xpis on nightly builds. r=gps
In bug 1362377, we ended up disabling the zlib omni.ja compression to address ts_paint regressions. We also got substantial download size improvements.

I expect that using brotli will regress installer and update size to some extent.
Depends on: 1395067
Depends on: 1395008
This appears to have given the webextensions Talos test an improvement:
== Change summary for alert #9122 (as of August 28 2017 22:48 UTC) ==

Improvements:

  7%  ts_paint_webext linux64 opt e10s          1,028.36 -> 958.92
  2%  tp5o_webext Main_RSS linux64 opt e10s    191,190,122.47 -> 187,239,309.78

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=9122
(In reply to tabmix.onemen from comment #30)
> After this patch I'm unable to extract omni.ja with 7-Zip and winrar

I consider this a feature not a bug. omni.ja is an implementation detail of how Firefox is built and distributed. It should be considered a black box by everything except Firefox's build tooling and Firefox itself.

If you feel you lost some needed functionality, please tell us what that is so we may consider workarounds.
(In reply to Gregory Szorc [:gps] from comment #31)
> If you feel you lost some needed functionality, please tell us what that is
> so we may consider workarounds.

I've just stumbled upon this. I always unpack/repack omni.ja to do quick tests on strings or code, without having a full build system configured.

Is a follow-up bug the right place to discuss this? I don't mind omni.ja using a different compression system, I just need a tool to unpack and repack it.
(In reply to tabmix.onemen from comment #32)
> One of my needs is the ability to open the file omni.ja; as an open source
> project, omni.ja was not designed as a way of hiding the code in a black box.

Nothing is preventing you from opening omni.ja: it is essentially a zip file but with entries using brotli compression instead of deflate.

This change does not undermine tenets of open source software: the contents of omni.ja are still readable and the original source code is open source. It is no worse (in fact, less so) than compiling C/C++/Rust into .so files, where you can't recover the original source code from the Firefox distribution at all.

> I hope that you can provide an easy way to open and inflate both omni.ja
> files includes with Firefox.

Using off-the-shelf software like 7-zip, probably not. Using the Python code we have in python/mozbuild/mozpack, it isn't too difficult. ...

(In reply to Francesco Lodolo [:flod] from comment #33)
> Is a follow-up bug the right place to discuss this? I don't mind omni.ja
> using a different compression system, I just need a tool to unpack and
> repack it.

Yes, a follow-up requesting tools for specific workflows/scenarios is the appropriate place to discuss this. It's probably no more than a few dozen lines of Python to come up with a minor tool to do unpacking and repacking for simple scenarios like replacing contents of a single file or "merging" in a directory.
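To give an idea of the scale, here is a rough, untested sketch assuming mozpack's JarReader/JarWriter interfaces (run with python/mozbuild on sys.path; a real tool would also need to preserve the preload section and per-entry compression settings):

    import os
    from mozpack.mozjar import JarReader, JarWriter

    def unpack(jar_path, dest):
        # JarReader knows how to decompress both deflate and brotli entries.
        for entry in JarReader(jar_path):
            path = os.path.join(dest, entry.filename)
            if not os.path.isdir(os.path.dirname(path)):
                os.makedirs(os.path.dirname(path))
            with open(path, 'wb') as fh:
                fh.write(entry.read())

    def repack(src, jar_path):
        # Recreate the archive from the extracted tree.
        with JarWriter(jar_path) as writer:
            for root, _, files in os.walk(src):
                for name in files:
                    path = os.path.join(root, name)
                    with open(path, 'rb') as fh:
                        writer.add(os.path.relpath(path, src), fh.read())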
Depends on: 1396203
Couple of questions:

I've lost track a bit in this bug: do we have the success criteria for brotli laid out here or somewhere else?

Second question, how will we do brotli for artifact builds? In particular for mac, where the upstream builds are done on linux machines. (I hit that platform problem the other day when trying to find a pre-built mar tool.)
This reduced the installer size for the Windows 64 bit pgo installer by 720 KB or one 5¼-inch QD disk :).
(In reply to Kevin Brosnan [:kbrosnan] from comment #37)
> This reduced the installer size for the Windows 64 bit pgo installer by 720
> KB or one 5¼-inch QD disk :).

except... see comment 27.

(In reply to Axel Hecht [:Pike] from comment #35)
> Couple of questions:
> 
> I'm lost track a bit in this bug, do we have the success criteria for brotli
> laid out here or somewhere else?

Not regressing performance like zlib compression does and moving the needle on slower machines.

> Second question, how will we do brotli for artifact builds? In particular
> for mac, where the upstream builds are done on linux machines. (I hit that
> platform problem the other day when trying to find a pre-built mar tool.)

Artifact builds don't care about the contents of omni.ja.
(In reply to Gregory Szorc [:gps] from comment #31)
> If you feel you lost some needed functionality, please tell us what that is
> so we may consider workarounds.

The qbrt project lost the ability to extract devtools from omni.ja for use with qbrt apps.  That was a hacky way to integrate devtools anyway, so it isn't a reason to do something different here.  And while I could presumably add brotli decompression support to the Node module that qbrt is using to expand ZIP archives, a better approach would be to integrate devtools source code into qbrt directly, which may become more feasible once devtools is a system addon.
Backed out the mozconfig changes, but left the packager code in.
https://hg.mozilla.org/integration/mozilla-inbound/rev/b21ad1316ba6

What this means in practice is that the in-tree code can still produce brotli-compressed omni.ja, but doesn't do that on nightlies anymore.

There are two main reasons why I did this:
- Since this wasn't meant to ride the trains in the first place, this solves the immediate issues that brotli-compressed omni.ja causes (cf. the comments in this bug) because we actually address them with code.
- This will allow us to see whether some changes I observed in telemetry were really due to this or to something else. For example, there's a discernible drop in the 95th percentile of SIMPLE_MEASURES_SESSIONRESTORED around when this landed. If that's real, we should see a regression after this backout (once telemetry data piles up, so after a few days).
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to Mike Hommey [:glandium] from comment #40)
> (...) because we actually address them with code.

s/because/before/
Depends on: 1399427
Status: REOPENED → ASSIGNED
Target Milestone: mozilla57 → ---
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P5
The backout canceled the improvements; the regressions below mirror the improvements from comment 29:

== Change summary for alert #9396 (as of September 12 2017 08:30 UTC) ==

Regressions:

  7%  tp5o_webext Main_RSS windows7-32 pgo e10s     126,083,304.46 -> 134,552,952.29
  2%  tp5o_webext Main_RSS linux64 pgo e10s         183,752,696.11 -> 187,947,160.48

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=9396
The backout also restored the build times we had before this bug landed:

== Change summary for alert #9389 (as of September 12 2017 08:30 UTC) ==

Improvements:

  6%  build times summary windows2012-32 pgo taskcluster-c4.4xlarge     5,169.98 -> 4,876.37

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=9389
The backout also left an AWSY regression:

== Change summary for alert #9397 (as of September 12 2017 08:30 UTC) ==

Regressions:

  3%  Resident Memory summary windows10-64 pgo      450,113,394.16 -> 462,843,503.24
  2%  Resident Memory summary windows10-64 opt      460,784,518.92 -> 471,259,319.81

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=9397
This makes sense, given that when this bug landed on August 29, the following improvement landed:

== Change summary for alert #9439 (as of September 14 2017 12:36 UTC) ==

Improvements:

  2%  Resident Memory summary windows10-64 opt      453,595,412.78 -> 444,709,181.15

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=9439
(In reply to Bobby Holley (parental leave - send mail for anything urgent) from comment #2)
> CCing some people who care about download size - 1.5MB is a lot.
> 
> Also, comment 0 indicates that this may improve startup time. Ehsan, how
> much (if at all) is QF focused on startup?

It hasn't been a primary focus of QF but I know some people are watching it closely. I feel like mconley is involved with that effort or would know who was focusing on startup time.
Flags: needinfo?(ehsan) → needinfo?(mconley)
Redirecting to florian, who has been the one grinding down on start-up time:
Flags: needinfo?(mconley) → needinfo?(florian)
We care about startup time, yes. But I don't see a specific question, so removing the needinfo for now.
Flags: needinfo?(florian)

We switched to not compressing omni.ja at all a while ago, so this is now WONTFIX.

Status: ASSIGNED → RESOLVED
Closed: 7 years ago → 4 years ago
Resolution: --- → WONTFIX