Closed Bug 1550108 Opened 6 years ago Closed 5 years ago

Reduce time spent decompressing StartupCache entries

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla71

Tracking Flags:

Tracking

Status

firefox68

wontfix

firefox71

---

fixed

People

(Reporter: alexical, Assigned: alexical)

References

(Blocks 2 open bugs, Regressed 1 open bug)

Details

(Keywords: main-thread-io, perf, Whiteboard: [fxperf:p1])

Attachments

(10 files, 2 obsolete files)

Bug 1550108 - Pull in secondary lz4 libraries r?glandium 6 years ago Alex Thayer [:alexical] (she/her) 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1550108 - Switch StartupCache's compression to lz4 r?glandium 6 years ago Alex Thayer [:alexical] (she/her) 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1550108 - Update usages of nsIZipWriter to include method r?glandium 6 years ago Alex Thayer [:alexical] (she/her) 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1550108 - Avoid decompressing entries just to check if they exist r?kmag 6 years ago Alex Thayer [:alexical] (she/her) 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1550108 - Don't read from app/gre caches in StartupCache r?kmag 6 years ago Alex Thayer [:alexical] (she/her) 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1550108 - Split out Input/OutputBuffer into their own file r?kmag 5 years ago Alex Thayer [:alexical] (she/her) 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1550108 - Change StartupCache format from zip to custom r?froydnj 5 years ago Alex Thayer [:alexical] (she/her) 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1550108 - Eliminate large buffer copies from StartupCache r?froydnj 5 years ago Alex Thayer [:alexical] (she/her) 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1550108 - Compact the StartupCache if it is bloated r?froydnj 5 years ago Alex Thayer [:alexical] (she/her) 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1550108 - Prefetch StartupCache off main thread r?froydnj 5 years ago Alex Thayer [:alexical] (she/her) 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1550108 - Reduce stack size on StartupCache threads r?froydnj 5 years ago Alex Thayer [:alexical] (she/her) 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1550108 - Add mapErr method to Result r?froydnj 5 years ago Alex Thayer [:alexical] (she/her) 47 bytes, text/x-phabricator-request		Details \| Review

Alex Thayer [:alexical] (she/her)

Assignee

Description

•

6 years ago

On reference hardware I see anywhere from 600ms to 3000ms spent in inflate under StartupCache::GetBuffer. I think we can do better with lz4. Currently we compress at compression level 1 with deflate, I imagine because compressing with deflate at max compression is too expensive. With lz4 I'm able to compress at its level 6 while taking less time, meaning the overall filesize is about the same, while taking an order of magnitude less time.

Dave Townsend [:mossop]

Updated

•

6 years ago

status-firefox68: --- → affected

tracking-firefox68: --- → -

Priority: -- → P3

Alex Thayer [:alexical] (she/her)

Assignee

Comment 1

•

6 years ago

[fxperf:p1] because I'm currently working on it.

Assignee: nobody → dothayer

Status: NEW → ASSIGNED

Keywords: main-thread-io, perf

Whiteboard: [fxperf:p1]

Alex Thayer [:alexical] (she/her)

Assignee

Comment 2

•

6 years ago

Talos results: https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=d1dca85363be0e1e1eb6d48c4deaa73b0712887d&selectedTimeRange=172800

Alex Thayer [:alexical] (she/her)

Assignee

Comment 3

•

6 years ago

Attached file Bug 1550108 - Pull in secondary lz4 libraries r?glandium — Details

I opted to go with what I perceived as the more expedient route
of leaving lz4 roughly where it is and just adding to that. The
biggest complication was xxhash, which is included elsewhere.
I'm not generally proficient with build-related things though so
my solution may be wrong and not just ugly.

Alex Thayer [:alexical] (she/her)

Assignee

Comment 4

•

6 years ago

Attached file Bug 1550108 - Switch StartupCache's compression to lz4 r?glandium (obsolete) — Details

Edits to all of the other locations that use nsIZipWriter to
follow. This only switches StartupCache. Regarding the substance
of the change, though, I opted for lz4 because it's the fastest
by a long shot on decompress, so we'll pay the smallest cost for
warm startups, and at compression level 6 it's able to match the
compression ratio of deflate at level 1, while still being
faster to compress. We could potentially opt for a higher
compression ratio alg, but I think the shutdown cost of doing so
may not be worth paying, considering we already opted to compress
with deflate at 1.

Depends on D30640

Alex Thayer [:alexical] (she/her)

Assignee

Comment 5

•

6 years ago

Attached file Bug 1550108 - Update usages of nsIZipWriter to include method r?glandium (obsolete) — Details

I split this out to hopefully make it easier to digest the more
substantive changes in the previous commit. The changes here
should just leave the existing behavior unchanged.

Depends on D30641

Alex Thayer [:alexical] (she/her)

Assignee

Comment 6

•

6 years ago

Attached file Bug 1550108 - Avoid decompressing entries just to check if they exist r?kmag — Details

This will not behave exactly the same if we had previously written bad
data for the entry that would fail to decompress. I imagine this is rare
enough, and the consequences are not severe enough, that this should be
fine.

Depends on D30642

Alex Thayer [:alexical] (she/her)

Assignee

Comment 7

•

6 years ago

Attached file Bug 1550108 - Don't read from app/gre caches in StartupCache r?kmag — Details

I thought I had already written out the patch to remove these, but
apparently not. Per discussion in the startup cache telemetry bug,
there should be no reason for doing this.

Depends on D30640

Phabricator Automation

Updated

•

6 years ago

Attachment #9065458 - Attachment description: Bug 1550108 - Don't both reading omnijars from StartupCache r?kmag → Bug 1550108 - Don't read from app/gre caches in StartupCache r?kmag

Alex Thayer [:alexical] (she/her)

Assignee

Comment 8

•

5 years ago

Attached file Bug 1550108 - Split out Input/OutputBuffer into their own file r?kmag — Details

This just splits out the InputBuffer and OutputBuffer helper classes
to make it cleaner for the StartupCache to include them.

Depends on D30643

Alex Thayer [:alexical] (she/her)

Assignee

Comment 9

•

5 years ago

Attached file Bug 1550108 - Change StartupCache format from zip to custom r?froydnj — Details

I am not aware of anything that depends on StartupCache being a
zip file, and since I want to use lz4 compression because inflate
is showing up quite a lot in profiles, it's simplest to just use
a custom format. This loosely mimicks the ScriptPreloader code,
with a few diversions:

Obviously the contents of the cache are compressed. I used lz4
for this as I hit the same file size as deflate at a compression
level of 1, which is what the StartupCache was using previously,
while decompressing an order of magnitude faster. Seemed like
the most conservative change to make. I think it's worth
investigating what the impact of slower algs with higher ratios
would be, but for right now I settled on this. We'd probably
want to look at zstd next.
I use streaming compression for this via lz4frame. This is not
strictly necessary, but has the benefit of not requiring as
much memory for large buffers, as well as giving us a built-in
checksum, rather than relying on the much slower CRC that we
were doing with the zip-based approach.
I coded the serialization of the headers inline, since I had to
jump back to add the offset and compressed size, which would
make the nice Code(...) method for the ScriptPreloader stuff
rather more complex. Open to cleaner solutions, but moving it
out just felt like extra hoops for the reader to jump through
to understand without the benefit of being more concise.

Depends on D34651

Alex Thayer [:alexical] (she/her)

Assignee

Comment 10

•

5 years ago

Attached file Bug 1550108 - Eliminate large buffer copies from StartupCache r?froydnj — Details

We should be able to just let consumers borrow the buffer, as long
as we can guarantee that we'll keep it around. We shouldn't be
invalidating the cache with any significant frequency, so we
should be able to just hold onto old items indefinitely.

Depends on D34652

Alex Thayer [:alexical] (she/her)

Assignee

Comment 11

•

5 years ago

Attached file Bug 1550108 - Compact the StartupCache if it is bloated r?froydnj — Details

The first run loads more things into the StartupCache than are
used on the second and subsequent runs. This just ensures that if
the StartupCache diverges too far from its actual use that we will
rebuild it.

Depends on D34653

Phabricator Automation

Updated

•

5 years ago

Attachment #9064033 - Attachment is obsolete: true

Phabricator Automation

Updated

•

5 years ago

Attachment #9064034 - Attachment is obsolete: true

Alex Thayer [:alexical] (she/her)

Assignee

Comment 12

•

5 years ago

I updated the patch to just change the StartupCache's underlying format entirely, rather than extending nsIZipWriter to support lz4, as it's nonstandard. This means that D30640 is no longer strictly necessary, but I would like to keep it for reasons mentioned in the description of D34652.

This also facilitated eliminating some copies and making it easier to compact the cache if it's larger than it needs to be. Removing unused items from the cache means that every page that straddles the boundary between two cache items is fully utilized.

Talos seems rather happy with the changes overall. In my original approach, I compressed the cache as one big chunk and decompressed the whole thing on initialize. This had more dramatic results, but strangely it improved WebRender-enabled ts_paint while degrading WR-disabled ts_paint. I couldn't get this effect to go away, so I went with the per-item compression.

Alex Thayer [:alexical] (she/her)

Assignee

Comment 13

•

5 years ago

Attached file Bug 1550108 - Prefetch StartupCache off main thread r?froydnj — Details

Does what it says on the tin. Once we have a central scheduling
system this should likely just consume that.

Depends on D34654

Alex Thayer [:alexical] (she/her)

Assignee

Comment 14

•

5 years ago

Kris, could you give an update on the status of the reviews of this? Does it look like it will be generally acceptable or do you foresee major changes required?

Flags: needinfo?(kmaglione+bmo)

Mike Conley (:mconley) (:⚙️)

Comment 15

•

5 years ago

Hey kmag, any chance you'll have a chance to look at this soon? Or would it be best to find a different reviewer?

Mike Conley (:mconley) (:⚙️)

Comment 16

•

5 years ago

Hey froydnj, would you be a good candidate to review these last few patches?

Flags: needinfo?(nfroyd)

Nathan Froyd [:froydnj]

Comment 17

•

5 years ago

(In reply to Mike Conley (:mconley) (:⚙️) from comment #16)

Hey froydnj, would you be a good candidate to review these last few patches?

Sure, I can look at these. Doug, throw these into my phabricator queue, please?

Flags: needinfo?(nfroyd)

Flags: needinfo?(kmaglione+bmo)

Flags: needinfo?(dothayer)

Nathan Froyd [:froydnj]

Comment 18

•

5 years ago

Reviewed (most of) the patches.

Flags: needinfo?(dothayer)

Phabricator Automation

Updated

•

5 years ago

Attachment #9071478 - Attachment description: Bug 1550108 - Change StartupCache format from zip to custom r?kmag → Bug 1550108 - Change StartupCache format from zip to custom r?froydnj

Phabricator Automation

Updated

•

5 years ago

Attachment #9071479 - Attachment description: Bug 1550108 - Eliminate large buffer copies from StartupCache r?kmag → Bug 1550108 - Eliminate large buffer copies from StartupCache r?froydnj

Phabricator Automation

Updated

•

5 years ago

Attachment #9071480 - Attachment description: Bug 1550108 - Compact the StartupCache if it is bloated r?kmag → Bug 1550108 - Compact the StartupCache if it is bloated r?froydnj

Phabricator Automation

Updated

•

5 years ago

Attachment #9073067 - Attachment description: Bug 1550108 - Prefetch StartupCache off main thread r?kmag → Bug 1550108 - Prefetch StartupCache off main thread r?froydnj

Alex Thayer [:alexical] (she/her)

Assignee

Comment 19

•

5 years ago

Attached file Bug 1550108 - Reduce stack size on StartupCache threads r?froydnj — Details

Depends on D35454

Alex Thayer [:alexical] (she/her)

Assignee

Comment 20

•

5 years ago

Quick update on where this is: I ran it through Talos to check that the numbers were still good, and somehow a ts_paint regression has crept in, so I'm trying to diagnose where this was introduced, as ts_paint was something like 5% green in the earlier versions of the patches.

Alex Thayer [:alexical] (she/her)

Assignee

Comment 21

•

5 years ago

I am almost certain that the above result is spurious at this point. ts_paint performs better locally, startup profiles better locally (less time spend in the StartupCache), and ts_paint on try profiles better all with these patches applied. I think stuff is just being shuffled around by this performing faster such that it happens to be a little worse on the hardware we're running this on.

Waiting for profiles on try from shippable builds, but assuming those come in looking the same my plan is just to button this up and take the nominal loss.

Alex Thayer [:alexical] (she/her)

Assignee

Comment 22

•

5 years ago

Attached file Bug 1550108 - Add mapErr method to Result r?froydnj — Details

Adds an explicit error mapping function to Result, to simplify
using MOZ_TRY... macros.

Depends on D34651

Pulsebot

Comment 23

•

5 years ago

Cosmin Sabou [:CosminS]

Comment 24

•

5 years ago

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&group_state=expanded&resultStatus=testfailed%2Cbusted%2Cexception&classifiedState=classified&revision=cbadfa2bbd7edefab2596c2fa7661fe04cc65ad6

Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=268934700&repo=autoland
https://treeherder.mozilla.org/logviewer.html#?job_id=268934680&repo=autoland

Backout link: https://hg.mozilla.org/integration/autoland/rev/f43fd233453d0ffdf1a866e0d4061a2c5a9e7f46

Flags: needinfo?(dothayer)

Pulsebot

Comment 25

•

5 years ago

Bogdan Tara[:bogdan_tara | bogdant]

Comment 26

•

5 years ago

bugherder

Status: ASSIGNED → RESOLVED

Closed: 5 years ago

status-firefox71: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla71

Marian Raiciof [:marauder]

Updated

•

5 years ago

Regressions: 1586744

Alexandre Poirot [:ochameau]

Comment 27

•

5 years ago

Thanks a lot, it looks like it improved DevTools startup quite significantly!

== Change summary for alert #23353 (as of Sat, 05 Oct 2019 21:21:33 GMT) ==

Improvements:

43% damp console.openwithcache.open.DAMP linux64-shippable opt e10s stylo 506.01 -> 287.60
7% damp simple.webconsole.open.DAMP linux64-shippable opt e10s stylo 295.81 -> 275.02
7% damp complicated.webconsole.open.DAMP windows10-64-shippable-qr opt e10s stylo 322.21 -> 300.79
6% damp cold.webconsole.open.DAMP windows10-64-shippable opt e10s stylo 306.30 -> 287.18
6% damp cold.webconsole.open.DAMP windows10-64-shippable-qr opt e10s stylo 305.04 -> 286.02
...

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=23353

:Harald Kirschner :digitarald

Updated

•

5 years ago

Blocks: devtools-performance

Ryan VanderMeulen [:RyanVM]

Updated

•

5 years ago

status-firefox68: affected → wontfix

Daniel Holbert [:dholbert]

Updated

•

5 years ago

Depends on: 1587567

Daniel Holbert [:dholbert]

Updated

•

5 years ago

No longer depends on: 1587567

Regressions: 1587567

Alex Thayer [:alexical] (she/her)

Assignee

Updated

•

5 years ago

Flags: needinfo?(dothayer)

Regressions: 1587112, 1587107
No longer regressions: 1587567

Alex Thayer [:alexical] (she/her)

Assignee

Comment 29

•

5 years ago

Update on this for any interested parties: it looks like we improved median time to first paint by something like 5-10% with this change. Here is the raw data - med_val is the median time to first paint, and the change landed on the 20191005 build:

----- before change -----
build: 20190917, median_first_paint: 2507
build: 20190918, median_first_paint: 2385
build: 20190919, median_first_paint: 2254
build: 20190920, median_first_paint: 2381
build: 20190921, median_first_paint: 2416
build: 20190922, median_first_paint: 2361
build: 20190923, median_first_paint: 2395
build: 20190924, median_first_paint: 2407
build: 20190925, median_first_paint: 2492
build: 20190926, median_first_paint: 2507
build: 20190927, median_first_paint: 2383
build: 20190928, median_first_paint: 1991
build: 20190929, median_first_paint: 2431
build: 20190930, median_first_paint: 2364
build: 20191001, median_first_paint: 2349
build: 20191002, median_first_paint: 2359
build: 20191003, median_first_paint: 2352
build: 20191004, median_first_paint: 2401
mean value: 2374

----- after change -----
build: 20191005, median_first_paint: 2246
build: 20191006, median_first_paint: 2223
build: 20191007, median_first_paint: 2182
mean value: 2217

6.61% improvement

The builds after 20191007 show even lower (better) numbers, but it's too early to tell if they are valid.

Jonathan Kew [:jfkthame]

Comment 30

•

5 years ago

Just FTR, as it may be too late to be worth doing anything about it: I think this bug should have updated the CLOBBER file -- at least, I've had broken builds on a couple of machines recently, complaining about "No rule to make target '..../mfbt/lz4.c', needed by 'lz4.obj'", which I believe is a result of the file moves here (lz4.{c,h} moved into a subdirectory). Clobbering the objdir (where that out-of-date dependency is recorded) fixes this.

Jonathan Watt [:jwatt]

Updated

•

5 years ago

Blocks: clobber

(No longer employed by Mozilla) Aaron Klotz

Updated

•

5 years ago

Regressions: 1591172

(No longer employed by Mozilla) Aaron Klotz

Comment 31

•

5 years ago

(In reply to Jonathan Kew (:jfkthame) from comment #30)

Just FTR, as it may be too late to be worth doing anything about it: I think this bug should have updated the CLOBBER file -- at least, I've had broken builds on a couple of machines recently, complaining about "No rule to make target '..../mfbt/lz4.c', needed by 'lz4.obj'", which I believe is a result of the file moves here (lz4.{c,h} moved into a subdirectory). Clobbering the objdir (where that out-of-date dependency is recorded) fixes this.

I filed bug 1591172 about this, as the issue reappeared for me even after clobbering.

ISHIKAWA, Chiaki (I will be tied up until Dec 14 with day job events.)

Comment 32

•

5 years ago

Have you by any chance relaxed the test to print out the warning result.isErr() in the following code a bit?

https://dxr.mozilla.org/mozilla-central/source/startupcache/StartupCache.cpp?q=StartupCache.cpp&redirect_type=direct#173

I am trying to figure out why I see many errors in locally executed |mach xpcshell-tests| of C-C TB (FULL DEBUG version) in the last month (it ran OK in August.).
https://bugzilla.mozilla.org/show_bug.cgi?id=1589779

While I am analyzing the many cases of indexedDB failures now observed in the local log, I noticed many warnings of the following form:
77:57.81 pid:29914 [29914, Main Thread] WARNING: 'result.isErr()', file /NEW-SSD/NREF-COMM-CENTRAL/mozilla/startupcache/StartupCache.cpp, line 173

I am not sure if the warning is related to the indexedDB error I observe or not.
But it could be that maybe xpcshell-tests harness may not populate the startup cache correctly or something.

I just wanted to be sure that the warning could be safely ignored and unrelated to the indexedDB failures I observe.
At the same time, it may be nice if we can reduce the warning to a single line instead of producing it many times (maybe very difficult to do.)

Anyway, just FYI.

TIA

[:philipp]

Updated

•

5 years ago

Regressions: 1594051

[:philipp]

Updated

•

5 years ago

Regressions: 1594056

[:philipp]

Updated

•

5 years ago

Regressions: 1595596

Andreas Tolfsen ❲:ato❳

Comment 33

•

5 years ago

I’ve also noticed an increase in these warnings since this patch
landed. It can be reproduced outside of xpcshell just by using the
-h flag:

% ./mach run -- -h
 0:00.47 /home/ato/src/gecko/obj-x86_64-pc-linux-gnu/dist/bin/firefox -h -no-remote -profile /home/ato/src/gecko/obj-x86_64-pc-linux-gnu/tmp/profile-default
Usage: /home/ato/src/gecko/obj-x86_64-pc-linux-gnu/dist/bin/firefox [ options ... ] [URL]
       where options include:

X11 options
  --display=DISPLAY  X display to use
  --sync             Make X calls synchronous
  --g-fatal-warnings Make all warnings fatal

Firefox options
  -h or --help       Print this message.
  -v or --version    Print Firefox version.
  -P <profile>       Start with <profile>.
  --profile <path>   Start with profile at <path>.
  --migration        Start with migration wizard.
  --ProfileManager   Start with ProfileManager.
  --no-remote        Do not accept or send remote commands; implies
                     --new-instance.
  --new-instance     Open new instance, not a new window in running instance.
  --UILocale <locale> Start with <locale> resources as UI Locale.
  --safe-mode        Disables extensions and themes for this session.
  --allow-downgrade  Allows downgrading a profile.
  --MOZ_LOG=<modules> Treated as MOZ_LOG=<modules> environment variable,
                     overrides it.
  --MOZ_LOG_FILE=<file> Treated as MOZ_LOG_FILE=<file> environment variable,
                     overrides it. If MOZ_LOG_FILE is not specified as an
                     argument or as an environment variable, logging will be
                     written to stdout.
  --headless         Run without a GUI.
  --save-recordings  Save recordings for all content processes to a directory.
[622, Main Thread] WARNING: 'result.isErr()', file /home/ato/src/gecko/startupcache/StartupCache.cpp, line 173
[622, Main Thread] WARNING: 'result.isErr()', file /home/ato/src/gecko/startupcache/StartupCache.cpp, line 173
[622, Main Thread] WARNING: 'result.isErr()', file /home/ato/src/gecko/startupcache/StartupCache.cpp, line 173
[622, Main Thread] WARNING: 'result.isErr()', file /home/ato/src/gecko/startupcache/StartupCache.cpp, line 173
[622, Main Thread] WARNING: 'result.isErr()', file /home/ato/src/gecko/startupcache/StartupCache.cpp, line 173
[622, Main Thread] WARNING: 'result.isErr()', file /home/ato/src/gecko/startupcache/StartupCache.cpp, line 173
[622, Main Thread] WARNING: 'result.isErr()', file /home/ato/src/gecko/startupcache/StartupCache.cpp, line 173
[622, Main Thread] WARNING: 'result.isErr()', file /home/ato/src/gecko/startupcache/StartupCache.cpp, line 173
[622, Main Thread] WARNING: 'result.isErr()', file /home/ato/src/gecko/startupcache/StartupCache.cpp, line 173
[622, Main Thread] WARNING: 'result.isErr()', file /home/ato/src/gecko/startupcache/StartupCache.cpp, line 173
[622, Main Thread] WARNING: 'result.isErr()', file /home/ato/src/gecko/startupcache/StartupCache.cpp, line 173
[622, Main Thread] WARNING: 'result.isErr()', file /home/ato/src/gecko/startupcache/StartupCache.cpp, line 173
[622, Main Thread] WARNING: 'result.isErr()', file /home/ato/src/gecko/startupcache/StartupCache.cpp, line 173
[622, Main Thread] WARNING: 'result.isErr()', file /home/ato/src/gecko/startupcache/StartupCache.cpp, line 173

Dave Townsend [:mossop]

Updated

•

5 years ago

Regressions: 1617092

Cristian Brindusan [:cbrindusan]

Updated

•

5 years ago

Regressions: 1623107

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Updated

•

5 years ago

No longer regressions: 1623107

You need to log in before you can comment on or make changes to this bug.