Closed Bug 772868 Opened 8 years ago Closed 7 months ago

partial updates for omni.ja are inefficient

Categories

(Toolkit :: Application Update, defect)

defect
Not set

Tracking

()

RESOLVED WONTFIX
blocking-kilimanjaro +
blocking-basecamp -

People

(Reporter: catlee, Unassigned, NeedInfo)

References

(Blocks 1 open bug)

Details

(Keywords: b2g-testdriver, unagi)

Attachments

(2 files)

We use bzip2 compressed bsdiff patches for all contents of partial updates. This generally works well for executable files, but turns out not to work well at all for omni.ja. For Firefox 12.0 -> 13.0.1 win32 en-US the partial update is 10,244,590 bytes, 3,471,633 of which is omni.ja.patch - around 34%.

This makes sense as bsdiff is trying to generate a binary patch between two compressed archives, and does a bad job of it. Any small change in one of the original uncompressed files would result in a large change to the compressed version.
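To illustrate the point about compressed data, here is a minimal sketch using Python's zlib. The sample "file" contents and the `differing_fraction` helper are made up for illustration, and a position-by-position byte comparison is only a crude proxy for what bsdiff actually sees, so treat the output as indicative rather than as a measurement of omni.ja.

```python
import zlib

def differing_fraction(a: bytes, b: bytes) -> float:
    """Fraction of byte positions that differ (extra length counts as differing)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    same = sum(x == y for x, y in zip(a, b))
    return 1 - same / longest

# Hypothetical stand-in for a text file stored inside omni.ja.
old = "".join(
    f"pref('example.setting.{i}', {i * 7 % 13});\n" for i in range(2000)
).encode()
# A single-character edit near the start of the file (same length).
new = old.replace(b"example.setting.3'", b"example.setting.X'", 1)

raw_diff = differing_fraction(old, new)
deflated_diff = differing_fraction(zlib.compress(old, 9), zlib.compress(new, 9))

print(f"uncompressed: {raw_diff:.4%} of byte positions differ")
print(f"deflated:     {deflated_diff:.4%} of byte positions differ")
```

The uncompressed inputs differ in exactly one byte, while the deflated streams differ in a larger share of their bytes, which is the kind of input a binary differ handles poorly.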

To measure the approximate amount of changed data, I unpacked omni.ja from both 12.0 and 13.0.1 and ran bsdiff for each pair of files. Removed files are added to a manifest, and new files are copied into the patch directory directly. I then tar/bzip2 the resulting directory of new and patched files. The resulting .tar.bz2 file is 1,361,770 bytes - 2.1MB smaller than the current compressed bsdiff patch. Put another way, the current compressed bsdiff patch is at least 2.5x larger than it has to be.

For 13.0 -> 13.0.1 the results aren't as striking but still give a 100k reduction in overall size:
omni.ja.patch: 194,763 bytes
tar.bz2: 93,768 bytes
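
The script itself is not attached here, so the following is only a rough sketch of the classification step described above (removed files go to a manifest, new files are copied, changed files get a per-file diff). It uses Python's zipfile; `classify_members` and `make_zip` are hypothetical names, and it assumes ordinary zip archives, so the optimized omni.ja layout may need different tooling to read.

```python
import io
import zipfile

def classify_members(old_zip: bytes, new_zip: bytes):
    """Return (removed, added, changed) member names, comparing inflated contents."""
    with zipfile.ZipFile(io.BytesIO(old_zip)) as old, \
         zipfile.ZipFile(io.BytesIO(new_zip)) as new:
        old_names, new_names = set(old.namelist()), set(new.namelist())
        removed = sorted(old_names - new_names)   # would go into the manifest
        added = sorted(new_names - old_names)     # would be copied as-is
        changed = sorted(                         # would each get a bsdiff patch
            n for n in old_names & new_names if old.read(n) != new.read(n)
        )
    return removed, added, changed

def make_zip(files: dict) -> bytes:
    """Build a small in-memory zip from {name: bytes}, for the example only."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()

old_archive = make_zip({"a.js": b"var a = 1;", "b.js": b"var b = 2;", "gone.js": b"x"})
new_archive = make_zip({"a.js": b"var a = 1;", "b.js": b"var b = 3;", "fresh.js": b"y"})
print(classify_members(old_archive, new_archive))
```

Only the `changed` list needs binary patches; unchanged members contribute nothing to the update in this scheme.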
Can you attach your script here please?
I wonder if bug 772841 might have affected these numbers.

I *think* we'll want to implement courgette and then re-evaluate this bug.
Using -0 for zip compression yields some very good results here.

omni.ja.patch is reduced to 1,083,439 bytes and the overall partial mar is reduced to 7,790,845 bytes (vs. 3,471,633 for the patch and 10,244,590 for the whole mar when omni.ja is compressed).

In addition, the complete mars are also slightly smaller: the 13.0.1 mar is reduced to 19,094,358 bytes from 20,645,936.
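
For anyone who wants to try the stored-versus-deflated comparison on a small scale, here is a sketch using Python's zipfile and bz2 modules. `zipfile.ZIP_STORED` corresponds to `zip -0`; the file names and contents are invented, and `build_zip` is a hypothetical helper, so the sizes it prints say nothing about real omni.ja numbers.

```python
import bz2
import io
import zipfile

def build_zip(files: dict, method: int) -> bytes:
    """Build an in-memory zip of {name: bytes} with the given compression method."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", method) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()

# Invented, highly repetitive sample contents.
files = {
    f"chrome/module{i}.js": (f"function f{i}() {{ return {i}; }}\n" * 200).encode()
    for i in range(20)
}

stored = build_zip(files, zipfile.ZIP_STORED)      # like `zip -0`
deflated = build_zip(files, zipfile.ZIP_DEFLATED)  # per-entry deflate

for label, blob in (("stored", stored), ("deflated", deflated)):
    # The outer bz2 pass stands in for the bzip2 compression applied in the mar.
    print(f"{label:9s} zip: {len(blob):7d} bytes, after bz2: {len(bz2.compress(blob)):7d} bytes")
```

With a stored archive the outer compressor and the differ both see the original file bytes rather than a deflate stream, which is presumably why the patch shrinks so much.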
(In reply to comment #4)
> Using -0 for zip compression yields some very good results here.
> 
> omni.ja.patch is reduced to 1,083,439 bytes and the overall partial mar is
> reduced to 7,790,845 bytes (vs. 3,471,633 for the patch and 10,244,590 for the
> whole mar when omni.ja is compressed).
> 
> In addition, the complete mars are also slightly smaller as well: the 13.0.1
> mar is reduced to 19,094,358 bytes from 20,645,936

Hmm, wait a second, are you talking about us not compressing omni.ja?  Have you measured how much that regresses Ts and friends? ;-)
CCing the people who may know how much of a regression we're talking about here.
(In reply to Ehsan Akhgari [:ehsan] from comment #5)
> (In reply to comment #4)
> > Using -0 for zip compression yields some very good results here.
> > 
> > omni.ja.patch is reduced to 1,083,439 bytes and the overall partial mar is
> > reduced to 7,790,845 bytes (vs. 3,471,633 for the patch and 10,244,590 for the
> > whole mar when omni.ja is compressed).
> > 
> > In addition, the complete mars are also slightly smaller as well: the 13.0.1
> > mar is reduced to 19,094,358 bytes from 20,645,936
> 
> Hmm, wait a second, are you talking about us not compressing omni.ja?  Have
> you measured how much that regresses Ts and friends? ;-)
See bug 711811 comment #5
We could ship an uncompressed omni.ja and it would actually be a speedup if we could also recompress it according to the user profile to speed things up (i.e. reorder), decrease installation size (by shipping it uncompressed), compress using filesystem compression on mac, etc.
Simply leaving omni.ja uncompressed adds megabytes of IO to startup which can translate into seconds.
(In reply to Taras Glek (:taras) from comment #8)
> We could ship an uncompressed omni.ja and it would actually be a speedup if
> we could also recompress it according to the user profile to speed things
> up(ie reorder), decrease installation size(by shipping it uncompressed),
> compress using filesystem compression on mac, etc. 

There are plenty of cases where omni.ja is not writable by firefox. The only reliable way to do these optimizations is to do them in the profile.

That being said, our best bet is to split zips open, inflate all its parts, and bsdiff them. The update process could then split zips open, inflate what it has, apply bsdiffs, deflate and recreate a zip, possibly in a different order.
(In reply to Taras Glek (:taras) from comment #8)
> We could ship an uncompressed omni.ja and it would actually be a speedup if
> we could also recompress it according to the user profile to speed things
> up(ie reorder), decrease installation size(by shipping it uncompressed),
> compress using filesystem compression on mac, etc. 
> Simply leaving omni.ja uncompressed adds megabytes of IO to startup which
> can translate into seconds.
Are the files in omni.ja fastloaded so this would only apply when the fastload file needs to be regenerated?
(In reply to Mike Hommey [:glandium] from comment #9)
> That being said, our best bet is to split zips open, inflate all its parts,
> and bsdiff them. The update process could then split zips open, inflate what
> it has, apply bsdiffs, deflate and recreate a zip, possibly in a different
> order.

Reordering scares me a bit; it makes it harder to verify that an omnijar is legit, and (as a general principle) I'd want users to be running the same bits that we test against. But we need not solve this here. :)

Now that we're able to pre-stage updates, it would be reasonable to extract omnijar to a temp dir (or a big flat file?), apply bdiffs to that, and then recompress the patched omnijar. Would need to fiddle with the timestamps, but otherwise it should be wholly deterministic, with an identical checksum / signature (compared to a complete-update).
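
A minimal sketch of the "fiddle with the timestamps" part, assuming Python's zipfile: rebuilding the archive from an ordered list of entries with a fixed timestamp gives the same bytes every time on the same machine. `rebuild` and `FIXED_TIME` are hypothetical names, and matching the bytes of an archive produced by a different tool would additionally require the same deflate implementation and settings, which this sketch does not address.

```python
import hashlib
import io
import zipfile

FIXED_TIME = (2010, 1, 1, 0, 0, 0)  # arbitrary fixed timestamp, an assumption

def rebuild(entries, timestamp=FIXED_TIME, level=9) -> bytes:
    """Recreate a zip from an ordered list of (name, data) with fixed metadata."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in entries:
            info = zipfile.ZipInfo(name, date_time=timestamp)
            info.compress_type = zipfile.ZIP_DEFLATED
            zf.writestr(info, data, compresslevel=level)
    return buf.getvalue()

entries = [
    ("chrome.manifest", b"manifest chrome/chrome.manifest\n"),
    ("components/example.js", b"var example = true;\n" * 50),
]

first = hashlib.sha256(rebuild(entries)).hexdigest()
second = hashlib.sha256(rebuild(entries)).hexdigest()
print(first == second)
```

Entry order is part of the input here, so the updater would have to carry the intended order along with the patches.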
I can't help but wonder if there isn't a better way. We could just fastload omni.ja, and if it already is fastloaded, then what we are talking about is a one-time startup hit after an update, solved by a large amount of additional complexity on the updater side, when it could likely be solved in other ways.
(In reply to Robert Strong [:rstrong] (do not email) from comment #12)
> I can't help but wonder if there isn't a better way since we could just
> fastload omni.ja and if it already is then what we are talking about is a
> one time startup hit after an update being solved by a large amount of
> additional complexity on the updater side that could likely be solved in
> other ways.

By fastload, you mean keeping a copy of omni.jar in the profile directory? That's an option, assuming we can find the profile directory without loading anything from omnijar.
By fastload I am referring to whatever has taken over for XUL.mfl.
(In reply to Robert Strong [:rstrong] (do not email) from comment #14)
> By fastload I am referring to whatever has taken over for XUL.mfl.

omni.ja is supposed to ship with fastloaded files (though the perf benefits of fastload given recent js improvements are not clear).
For the record, what data is there for the perf improvement provided by the compression? It would be handy to know what the perf difference would be with having a single uncompressed fastload file outside of the omni.ja along with the omni.ja uncompressed. I highly suspect that there are other ways to skin this fish without additional one-off complexity to how updating works.
I haven't kept up with how we cache code for faster startup since it has changed so please forgive my ignorance.

There also appear to be resources stored in startupCache.4.little. Does this mean that the main usage of omni.ja and the fastload files in omni.ja is to populate startupCache.4.little? Basically, I'm trying to find out whether omni.ja is heavily read during every startup or only after startupCache.4.little is invalidated. If the latter, could startupCache.4.little be better used during normal startup, so that the main purpose of compressing omni.ja would be for startup when startupCache.4.little is invalidated?
(In reply to Justin Dolske [:Dolske] from comment #11)
> (In reply to Mike Hommey [:glandium] from comment #9)
> > That being said, our best bet is to split zips open, inflate all its parts,
> > and bsdiff them. The update process could then split zips open, inflate what
> > it has, apply bsdiffs, deflate and recreate a zip, possibly in a different
> > order.
> 
> Reordering scares me a bit

The different order part was referring to the fact that old omni.ja and updated omni.ja may not have the same ordering, so that needs to be dealt with in the upgrade process.
(In reply to Robert Strong [:rstrong] (do not email) from comment #17)
> I haven't kept up with how we cache code for faster startup since it has
> changed so please forgive my ignorance.
> 
> There also appears to be resources stored in startupCache.4.little. Does
> this mean that the main usage of omni.ja and the fastload files in omni.ja
> is to populate startupCache.4.little? Basically, I'm trying to find out
> whether omni.ja is heavily read during every startup or only after
> startupCache.4.little is invalidated? If the latter then could
> startupCache.4.little be better used during normal startup so the main
> purpose of compressing the omni.ja would be for startup when
> startupCache.4.little is invalidated?

startupCache.4.little is only filled with data from omni.ja that doesn't already have a startupcache version in the omni.ja (under jsloader and jssubloader)

As for decompression performance, it is faster to read compressed data and decompress it than to read the same uncompressed data. This is especially true on mobile, where reading is at least an order of magnitude slower than on desktop.

Speaking of mobile, anything that would make the profile bigger than what it already is has to be considered very carefully.
With the info in comment #19, other possible alternatives could be to extract the files on upgrade and recompress them in a file alongside the startupcache on version change, to lessen the size of the patch from what it currently is by zipping the omni.ja with better compression, or possibly to use a better compression algorithm than zip.

Also, Mobile is not an issue for the updater since it doesn't use the updater.
AIUI, the updater may be used on mobile in the future.
Whatever the case, I'm busy, as is the rest of my team, with a bunch of other work and will be busy for the foreseeable future. Before I'd even consider doing anything with this bug I'd first want bug 504624 fixed. In the meantime it would be a good thing to have a way to simply check startup perf with and without the omni.ja compressed, as well as to consider other ways of implementing caching so we don't have to one-off the updater for specific files.
For b2g, we'll now be using partial updates for gaia applications as well.  These applications are packaged as jars so suffer from a similar problem as omni.ja.

Since the initial target market will be quite bandwidth sensitive, this is important product work.
blocking-basecamp: --- → ?
(In reply to Chris Jones [:cjones] [:warhammer] from comment #23)
> For b2g, we'll now be using partial updates for gaia applications as well. 
> These applications are packaged as jars so suffer from a similar problem as
> omni.ja.
> 
> Since the initial target market will be quite bandwidth sensitive, this is
> important product work.

Do we have the flexibility to change the packaging of B2G system apps instead of changing the way we update them? Given the work Rob's team already has, we may need to look at alternatives to this bug.
(In reply to comment #24)
> (In reply to Chris Jones [:cjones] [:warhammer] from comment #23)
> > For b2g, we'll now be using partial updates for gaia applications as well. 
> > These applications are packaged as jars so suffer from a similar problem as
> > omni.ja.
> > 
> > Since the initial target market will be quite bandwidth sensitive, this is
> > important product work.
> 
> Do we have the flexibility to change the packaging of B2G system apps instead
> of changing the way we update them? Given the work Rob's team already has, we
> may need to look at alternatives to this bug.

This bug is really about the fact that doing diffs on any compressed file format will be non-optimal.
This patch decompresses any zip archive before handing off to bsdiff to generate or apply a patch. Zip archives are then recompressed after patching.

I tested this on Linux x86-64 builds on the update from 14.0.1 to 15.0 - ftp://ftp.mozilla.org/pub/firefox/releases/15.0/update/linux-x86_64/en-US/firefox-14.0.1-15.0.partial.mar , which is 10596323 bytes. With this, the incremental update mar is 6605509 bytes.
> I tested this on Linux x86-64 builds on the update from 14.0.1 to 15.0 -
> ftp://ftp.mozilla.org/pub/firefox/releases/15.0/update/linux-x86_64/en-US/
> firefox-14.0.1-15.0.partial.mar , which is 10596323 bytes. With this, the
> incremental update mar is 6605509 bytes.

fwiw, that's almost a 40% reduction - 37.66% to be exact. That's a great optimization.
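
For reference, the percentage can be rechecked from the two sizes quoted above; the variable names are just for this snippet.

```python
old_size = 10_596_323  # published 14.0.1 -> 15.0 partial mar, bytes
new_size = 6_605_509   # partial mar with the patch applied, bytes

saved = old_size - new_size
reduction = saved / old_size
print(f"saved {saved:,} bytes ({reduction:.2%})")
```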

How about other .ja files, if any? Or increasing the compression?
(In reply to Gary Kwong [:gkw, :nth10sd] from comment #27)
> How about other .ja files, if any?

omni.ja is the only zip archive in the Linux x86-64 build.

> Or increasing the compression?

Compression of what?

Entries in the omnijar are compressed at 9 (max compression) and this patch also uses 9. The result is actually slightly worse after recompressing - omni.ja ends up about 4k bigger. Not sure where the difference comes from. Our jar readahead code may need to be tuned to take this into account.

As for the mar itself, we probably can't increase the compression there by much without changing the file format.
> Compression of what?
> 
> Entries in the omnijar are compressed at 9 (max compression) and this patch
> also uses 9.

You've answered the question that I had, thanks. :)
(In reply to Michael Wu [:mwu] from comment #28)
> (In reply to Gary Kwong [:gkw, :nth10sd] from comment #27)
> > How about other .ja files, if any?
> 
> omni.ja is the only zip archive in the Linux x86-64 build.
> 
> > Or increasing the compression?
> 
> Compression of what?
> 
> Entries in the omnijar are compressed at 9 (max compression) and this patch
> also uses 9. The result is actually slightly worse after recompressing -
> omni.ja ends up about 4k bigger. Not sure where the difference comes from.

Are you recompressing all files? zip only compresses if the result is actually smaller.
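
The rule described above (keep the deflated form only when it is actually smaller) can be sketched like this with Python's zlib. `deflate_raw` and `choose_method` are hypothetical helpers for illustration, not code from the patch or from any zip tool.

```python
import zlib

def deflate_raw(data: bytes, level: int = 9) -> bytes:
    """Raw deflate stream (no zlib header), as stored inside zip entries."""
    compressor = zlib.compressobj(level, zlib.DEFLATED, -15)
    return compressor.compress(data) + compressor.flush()

def choose_method(data: bytes, level: int = 9):
    """Return ("deflate", payload) only when deflating saves space, else ("store", data)."""
    deflated = deflate_raw(data, level)
    if len(deflated) < len(data):
        return "deflate", deflated
    return "store", data

for sample in (b"a" * 1000, b"ab"):
    method, payload = choose_method(sample)
    print(f"{len(sample):5d} bytes in -> {method:7s} {len(payload):5d} bytes out")
```

If the recompression step deflates every entry unconditionally, small or already-compressed entries can grow slightly, which would be one possible source of a size difference after repacking.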

> Our jar readahead code may need to be tuned to take this into account.

In any case, it would be better if the resulting omni.ja from the repack was the exact same as the original one.
(In reply to Michael Wu [:mwu] from comment #28)
> (In reply to Gary Kwong [:gkw, :nth10sd] from comment #27)
> > How about other .ja files, if any?
> 
> omni.ja is the only zip archive in the Linux x86-64 build.

There will soon be other omni.ja files. Actually, there are already, in the metro builds.
As part of the dogfooding process, one frustration I face is long download times for updates if I do not have a fast broadband connection over wifi.

I don't know how large the updates are, since the size does not show up in the UI, but if this helps by reducing download size by 1/3 or more it may be a huge win, especially if the wifi is unreliable.
My Git commit info currently shows:

2012-10-24 11:07:05
fcfa1857bed6596e992263206451c6814e4b2... (I see ellipsis at the end)
basecamp- as we can ship with this particular inefficiency however we will want to reduce update size going forward.
blocking-basecamp: ? → -
blocking-kilimanjaro: --- → +
Just 2 comments that might be useful to consider ... from a developer who isn't really a Mozilla developer.

1) One way to greatly reduce download size is to have delta-by-file updates.
Make the delta updates relative to a certain release, and only change the reference release from time to time.  Full releases would still be done (but not necessarily every time).  If it is done right, an update release could be used for any full release on or after the reference release.  (but not before, evidently)
Since relatively few files change between releases, this should result in a considerable gain in download time, for a very moderate loss in start time.

2) Before omni.ja* arrived, I would patch messenger.jar (a workaround for a bug nobody wanted to fix), recompressing by deflation (default on my system) to a lot smaller size.  I would also modify the localisation file, with similar size gains.  That gave me a considerable load time gain.
When omni.jar arrived, there was no noticeable load time gain. (It actually seemed a little slower on average.)  And it was much more difficult to make my patches.
Blocks: 1303172
catlee, is this still worth fixing now that omni.ja is no longer compressed?
Flags: needinfo?(catlee)
Not for zlib. Depends on what ends up happening with the proposed brotli support (bug 1352595)
Flags: needinfo?(catlee) → needinfo?(mh+mozilla)

Closing per comment #37 since this isn't worth fixing for our current state with zlib and since there hasn't been any movement on bug 1352595. If things get further along please reopen. Thanks!

Status: NEW → RESOLVED
Closed: 7 months ago
Resolution: --- → WONTFIX