Investigate compressing omni.ja's (with lz4)

Status: NEW, Unassigned
Type: task, Priority: P3, Severity: normal
Reported: 2 months ago, Modified: 14 days ago

People
Reporter: dthayer, Assignee: Unassigned

Tracking
Blocks: 1 bug
Keywords: main-thread-io, perf
Firefox Tracking Flags: Not tracked

Details
Whiteboard: [fxperf:p2]

Attachments: 1 attachment

Description (reporter), 2 months ago

Posted image lz4.png

In bug 1362377 we decided to stop compressing omni.ja with deflate, because doing so increased the compressed installer size and performed worse on ts_paint. However, ts_paint is measured on warm startups, so it disproportionately weights CPU time over I/O time. Since lz4 is an order of magnitude faster to decode, I'm proposing we compress our omni.ja files with it, so that we can have our cake (no significant CPU cost on decode) and eat it too (smaller omni.ja -> less I/O time).
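Much of lz4's decode-speed advantage comes from the shape of its block format: each sequence is a token, a run of literal bytes to copy, and a back-reference to copy from earlier output, with no Huffman or other entropy stage to undo (deflate has one). The pure-Python sketch below decodes a raw LZ4 block to show how little work that is. It is illustrative only: it is not the proof-of-concept code from this bug, it ignores the LZ4 frame format, and it assumes nothing about how the patch actually stored lz4 data inside omni.ja (which isn't described here). The helper name `_read_extended_length` is hypothetical.

```python
def _read_extended_length(src: bytes, i: int, length: int):
    """Extend a 4-bit length of 15 with LZ4's continuation bytes.

    Each following byte is added to the length; a byte below 255 ends the run.
    Returns the final length and the index just past the bytes consumed.
    """
    if length == 15:
        while True:
            extra = src[i]
            i += 1
            length += extra
            if extra != 255:
                break
    return length, i


def lz4_block_decompress(src: bytes) -> bytes:
    """Decode one raw LZ4 block (no frame header, no checksums)."""
    out = bytearray()
    i = 0
    while i < len(src):
        token = src[i]
        i += 1
        # High nibble of the token: number of literal bytes that follow.
        literal_len, i = _read_extended_length(src, i, token >> 4)
        out += src[i:i + literal_len]
        i += literal_len
        if i >= len(src):
            break  # the final sequence carries literals only
        # Two-byte little-endian distance back into the output decoded so far.
        offset = src[i] | (src[i + 1] << 8)
        i += 2
        if offset == 0 or offset > len(out):
            raise ValueError("invalid LZ4 match offset")
        # Low nibble: match length minus LZ4's 4-byte minimum match.
        match_len, i = _read_extended_length(src, i, token & 0x0F)
        match_len += 4
        # Copy byte by byte so an overlapping match (offset < match_len)
        # correctly repeats bytes it has just written.
        start = len(out) - offset
        for k in range(match_len):
            out.append(out[start + k])
    return bytes(out)


# Hand-built block: literals "abc", then a 9-byte match at distance 3,
# then a final literal-only sequence "hello".
block = bytes([0x35]) + b"abc" + b"\x03\x00" + bytes([0x50]) + b"hello"
print(lz4_block_decompress(block))  # b'abcabcabcabchello'
```

The real liblz4 decoder performs the same copies in optimized C with wide memory moves, whereas an inflate implementation must first decode Huffman-coded symbols from a bitstream before it can copy anything.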

I'm attaching a file with measurements of cold startups on 2017 reference hardware. The lighter values are the control build, and the darker values are the build with omni.ja files compressed with lz4. Green is delayedStartupFinished, red is firstPaint, and blue is blankWindowShown. The improvement is around 10%.

An added benefit: if this pans out, we can look at replacing deflate with lz4 in other places too, like the StartupCache, which takes up ~25ms of main-thread CPU on startup on my (relatively beefy) development machine.

Updated by the reporter, 2 months ago
Type: defect → enhancement

Comment 1 (Mike Hommey [:glandium])

Installer size was not the main benefit. Another was the update sizes. Yet another is memory use. All these need to be considered.

See also bug 1352595 where using brotli was attempted (and the code still lies around in both the packager and nsZipArchive).

zstd is something else that is both fast and provides good compression and that might be worth comparing too.

Comment 2 (reporter), 2 months ago

(In reply to Mike Hommey [:glandium] from comment #1)
> Installer size was not the main benefit. Another was the update sizes. Yet another is memory use. All these need to be considered.
>
> See also bug 1352595 where using brotli was attempted (and the code still lies around in both the packager and nsZipArchive).
>
> zstd is something else that is both fast and provides good compression and that might be worth comparing too.

Yeah, I used the Brotli code as a reference for the proof of concept. The reason I was interested in lz4 over both Brotli and zstd is that its decode speed is an order of magnitude better than either. lz4hc keeps this decode-speed advantage while coming close to zlib and zstd compression ratios, and with dictionary compression it should yield improvements similar to those mentioned in bug 1352595 (a separate dictionary does reduce lz4's decompression speed, but it's still substantially faster than the other algorithms).
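Dictionary compression matters for an archive like omni.ja because each zip member is compressed on its own, so a small file cannot back-reference content that lives in other files. A shared dictionary primes the compressor's window with common content so that even a tiny member can refer to it. Python's standard library has no lz4 binding, so the sketch below swaps in zlib's preset-dictionary support (`zdict`) to show the mechanism; lz4 has its own dictionary support, which is not shown here. The dictionary content (the MPL 2.0 header common to Mozilla source files) and the sample member are illustrative assumptions, not a trained dictionary or real omni.ja contents, and `deflate`/`inflate` are hypothetical helper names.

```python
import zlib
from typing import Optional

# Illustrative dictionary: the MPL 2.0 license header that prefixes most
# Mozilla source files. A real dictionary would be trained on the archive.
DICTIONARY = (
    b"/* This Source Code Form is subject to the terms of the Mozilla Public\n"
    b" * License, v. 2.0. If a copy of the MPL was not distributed with this\n"
    b" * file, You can obtain one at http://mozilla.org/MPL/2.0/. */\n"
)


def deflate(data: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Compress one archive member, optionally priming the window with zdict."""
    kwargs = {} if zdict is None else {"zdict": zdict}
    comp = zlib.compressobj(level=9, **kwargs)
    return comp.compress(data) + comp.flush()


def inflate(blob: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Decompress one member; the same dictionary must be supplied to decode it."""
    kwargs = {} if zdict is None else {"zdict": zdict}
    decomp = zlib.decompressobj(**kwargs)
    return decomp.decompress(blob) + decomp.flush()


# A small hypothetical member: the shared header plus a few lines of its own.
member = DICTIONARY + b'\n"use strict";\n\nvar EXPORTED_SYMBOLS = ["Example"];\n'

plain = deflate(member)
primed = deflate(member, DICTIONARY)
print(f"raw={len(member)} plain={len(plain)} with_dictionary={len(primed)}")
```

The trade-off noted above shows up here too: the decoder needs the dictionary loaded before it can decode any member, which adds setup work on every decompression.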

Type: enhancement → task
Priority: -- → P3

Comment 3

(In reply to Mike Hommey [:glandium] from comment #1)

> Installer size was not the main benefit. Another was the update sizes. Yet another is memory use. All these need to be considered.

Hm? From what I can tell, memory use regressed (i.e., increased) when compression was disabled, based on bug 1362377 comment 30 / comment 31. Am I misreading the bug?

It's true that partial update sizes improved, though it would be useful to see numbers for a release update (as opposed to Nightly), since that's where most of our users are.

In any case, fxperf:p2 given the potential for improvement here.

Blocks: 1543096
Whiteboard: [fxperf] → [fxperf:p2]

Comment 4, 14 days ago

Consider using zstd. It is from the author of lz4, and it has some nice features. It can be almost as fast as lz4, and it also supports an --rsyncable mode, so it will probably help with update sizes.
https://github.com/facebook/zstd/releases - the release notes there describe zstd's new features.
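The idea behind "rsyncable" compression (as in gzip's and zstd's --rsyncable options) is to restart the compressor at content-defined positions, so a local change to the input only disturbs the nearby compressed output instead of everything after it, which gives delta-based updates more unchanged bytes to reuse. The sketch below is a simplified illustration of that idea, not zstd's actual algorithm: it cuts the input wherever a rolling sum over the last `WINDOW` bytes is a multiple of `MODULUS` (both values are arbitrary assumptions), then compresses each chunk independently with zlib. Whether this would actually shrink Firefox's update files is untested here; the function names are hypothetical.

```python
import random
import zlib
from typing import List

# Illustrative parameters (assumptions, not zstd's or gzip's actual values).
WINDOW = 32   # number of trailing bytes the rolling sum covers
MODULUS = 64  # cut a chunk when the rolling sum is a multiple of this


def content_defined_chunks(data: bytes) -> List[bytes]:
    """Split data at boundaries decided only by the last WINDOW bytes.

    Because a boundary depends only on nearby content, an insertion or
    deletion shifts boundaries only until the window moves past the edit.
    """
    chunks = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= WINDOW:
            rolling -= data[i - WINDOW]  # drop the byte leaving the window
        if rolling % MODULUS == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def rsyncable_compress(data: bytes) -> List[bytes]:
    """Compress each chunk on its own so unchanged chunks give identical output."""
    return [zlib.compress(chunk) for chunk in content_defined_chunks(data)]


rng = random.Random(0)
original = bytes(rng.randrange(256) for _ in range(20000))
edited = original[:100] + b"X" + original[100:]  # one-byte insertion near the start

before = rsyncable_compress(original)
after = set(rsyncable_compress(edited))
unchanged = sum(1 for blob in before if blob in after)
print(f"{unchanged} of {len(before)} compressed chunks are byte-identical after the edit")
```

Compressing chunks independently costs some ratio, since matches cannot cross a boundary; real rsyncable modes flush at the boundaries within a single stream, trading a smaller amount of ratio for the same resynchronization property.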
