1362377 - omni.ja vs ts_paint trade off

Reporter

Description

•

8 years ago

Bug 1340157 landed Feb 17 and caused a 2-3% regression in start up time as seen in 1340873. I'm not sure how much we win on the download size, but given that start up time (at least user perception of it) is an important focus for Quantum, we should be clear about the tradeoff we are making.

Chris AtLee [:catlee]

Assignee

Comment 1

•

8 years ago

So the end result of that bug is that we left omni.ja compressed, which un-intuitively increases the download size. In https://bugzilla.mozilla.org/show_bug.cgi?id=1340157#c6 I summarized the talos / telemetry results. It appeared as though the ts_paint regression wasn't corroborated by telemetry, and the advice at the time was that telemetry numbers are a better indicator of the user's experience. Other impacts of the uncompressed omni.ja files are that there is a small increase to memory usage. I'd be happy to re-revert this patch and get much smaller downloads, since that's what I was initially after!

Chris AtLee [:catlee]

Assignee

Comment 2

•

8 years ago

To summarize some of the information from other bugs: Using uncompressed omni.ja files significantly reduces download size for partial updates and initial install. It also seems to improve startup time as measured by talos' ts_paint, but that is harder to verify with telemetry. It also increases memory usage by ~8MB Right now we are using compressed omni.ja files everywhere. More background: * omni.ja files were originally compressed, and were always compressed on aurora, beta and release channels. * bug 1231379 landed in 2015-12-17 to disable compression on all desktop platforms. This change never rode the trains past nightly. I estimated we would see an improvement of ~15% for Windows installer size, and up to 40% improvement in partial update size. * bug 1233214 indicated some talos regressions / improvements. We saw an improvement in ts_paint, and a regression in memory usage No clear resolution on the impact, and telemetry didn't seem to confirm the talos data * I wanted to be be able to make a decision about whether to let this ride the trains past nightly, or to back out the change from nightly. Unfortunately, telemetry and talos data was no longer available for the original change, so I filed bug 1340157 to track the impact of re-enabling compression for omni.ja * bug 1340157 landed on 2017-02-16 and re-enabled compression for omni.ja on nightly * bug 1340873 indicated the new set of talos regressions/improvements * I summed up the talos and telemetry data here: https://bugzilla.mozilla.org/show_bug.cgi?id=1340157#c6 Based on that data I decided to leave the omni.ja files compressed. At the time it appeared as though the telemetry data contradicted the talos data. Looking back now, that trend isn't clear any more to me.

Benjamin Smedberg

Comment 3

•

8 years ago

Is the Talos number warm-start or cold-start number? If I remember the past corrctly, the tradeoff about omnijar compression has been about cold I/O: compressing makes cold start faster because cold start is highly gates on I/O. It makes warm start slower because CPU becomes more important and we can't do direct memory mapping of omnijar. Telemetry for startup time is *very* difficult to model, because it aggregates both warm and cold and other starts all together, so analyzing it for small signal is hard. I actually trust our talos numbers rather more strongly. The links you provide at https://bugzilla.mozilla.org/show_bug.cgi?id=1340157#c6 show the 95th percentile, which would in general be the much higher cold-start times. I'm not sure I see any improvement at all, but look at the same graphs with 25/50/75/95 percentile charted: session restored: https://mzl.la/2pgV5vX firstpaint: https://mzl.la/2pICQm2

Benjamin Smedberg

Comment 4

•

8 years ago

Would it be helpful to have Dominik repeat the warm and cold startup time metrics documented at https://docs.google.com/a/mozilla.com/document/d/1jyZ-GCQwR2EGYsbb_fTADYnpFJyxrmsRjRpUvf6GBAw/edit?usp=sharing against equivalent builds with compression on and off, to see more precisely how it affects cold versus warm?

Flags: needinfo?(catlee)

Joel Maher ( :jmaher ) (UTC -8)

Comment 5

•

8 years ago

talos is warm start as we create a profile warm up it on a first launch, then start measuring the startup time for 20 iterations. We do have a session restore test that doesn't warmup the profile. In addition, we do not do a cold start of the OS, so that plays into real cold start times.

Gregory Szorc [:gps]

Comment 6

•

8 years ago

See also bug 1352595 where glandium is looking to use brotli for compression inside omni.ja. This should reduce omni.ja size *and* CPU usage for decompression. catlee also suggested a good idea in that bug, which would be to put all omni.ja files accessed during startup in a single compression context that was bulk decompressed on startup. That will result in a smaller omni.ja *and* faster decompression since decompressors get much faster when you feed them chunks larger than a few dozen kb. The idea could be extended to related files. e.g. you could group sync files, devtools files, etc. Of course, this would require a refactoring to omni.ja reading since there would no longer be a single index (the jar file itself).

Chris AtLee [:catlee]

Assignee

Comment 7

•

8 years ago

(In reply to Benjamin Smedberg [:bsmedberg] from comment #4) > Would it be helpful to have Dominik repeat the warm and cold startup time > metrics documented at > https://docs.google.com/a/mozilla.com/document/d/1jyZ- > GCQwR2EGYsbb_fTADYnpFJyxrmsRjRpUvf6GBAw/edit?usp=sharing against equivalent > builds with compression on and off, to see more precisely how it affects > cold versus warm? Yes, I've already asked him to test out https://archive.mozilla.org/pub/firefox/nightly/2017/05/2017-05-03-03-02-12-mozilla-central/firefox-55.0a1.en-US.win64.zip vs https://www.dropbox.com/s/xwcv4swmwgv33th/firefox-55.0a1.en-US.win64-new.zip?dl=0 They should be identical except for omni.ja compression.

Flags: needinfo?(catlee)

Benjamin Smedberg

Comment 8

•

8 years ago

Setting a flag to track next steps.

Flags: needinfo?(dstrohmeier)

:Dominik Strohmeier [:dstrohmeier]

Comment 9

•

8 years ago

(In reply to Chris AtLee [:catlee] from comment #7) > Yes, I've already asked him to test out > https://archive.mozilla.org/pub/firefox/nightly/2017/05/2017-05-03-03-02-12- > mozilla-central/firefox-55.0a1.en-US.win64.zip > vs > https://www.dropbox.com/s/xwcv4swmwgv33th/firefox-55.0a1.en-US.win64-new. > zip?dl=0 > > They should be identical except for omni.ja compression. Here's the outcome of comparing these builds: https://metrics.mozilla.com/protected/shiny/dstrohmeier/omnija/ Results show that the build without omni.ja compression is faster for both First Paint (64ms in median) and Hero Element (161ms in median). First Paint is application appears on fullscreen, Hero Element is the "search" placeholder in the search box of the content. In addition, the test runs for the build without compression have a much lower standard deviation resulting in a more streamlined experience for users when doing multiple start ups.

Flags: needinfo?(dstrohmeier)

:Dominik Strohmeier [:dstrohmeier]

Comment 10

•

8 years ago

I re-ran the comparison again on today's Nightly as the previous numbers were off to what we had measured last week for Nightly. There is also drop for ts_paint on May 4, 2017: https://archive.mozilla.org/pub/firefox/nightly/2017/05/2017-05-08-03-02-04-mozilla-central/firefox-55.0a1.en-US.win64.zip vs https://www.dropbox.com/s/xwcv4swmwgv33th/firefox-55.0a1.en-US.win64-new-20170508.zip?dl=0 Based on a comparison of just 5 runs, the results for these two builds are the same. In median, both reach First Paint at 1133ms and Hero Element at 1650ms.

(No longer employed by Mozilla) Aaron Klotz

Comment 11

•

8 years ago

Don't forget about the PGO specific optimizations that are done on omni.ja. We probably should be comparing ts_paint on PGO builds only.

rweiss@mozilla.com

Comment 12

•

8 years ago

Can we see the results of an inference test between these two builds, not just links to boxplots? We should track the significance of the difference rather than eyeballing charts for these types of tests. I expect that the conclusions will be the same, but we need to keep the evidence of testing observable within the context of this bug. Reporting the inference test used, the difference in median, and the p-value of the difference should be sufficient.

Flags: needinfo?(dstrohmeier)

:Dominik Strohmeier [:dstrohmeier]

Comment 13

•

8 years ago

As outlined above, the current testing is done based on 5 runs per build. Find the data here: https://docs.google.com/a/mozilla.com/spreadsheets/d/1LW3Skd2RhTTtypV5_rIv09jAnU5iu7UUHZ-c-3our_U/edit?usp=sharing. I think it's pretty obvious that the numbers are comparable. The ones in the chart are outdated as per comment 10. I understand that this is far off from being scientific, but given the time constraints and running these tests manually, this is what we currently can afford to do. Once we get into the range of running these tests 100 times (or even in an automated way allowing us to collect even larger sets of data), it will definitely make sense for us to follow the recommended procedure of reporting results.

Flags: needinfo?(dstrohmeier)

rweiss@mozilla.com

Comment 14

•

8 years ago

This is a public bug, and your link is behind LDAP and thus not accessible for non-Mozilla employees (and neither is that spreadsheet). Assume it is possible that metrics.mozilla.com could disappear or that we lost that link; we would never be able to replicate your conclusion. That is why we should record the results of the comparison in this bug; at least note the differences in medians, and ideally the significance of that difference *even if the p-value is strange due to small samples*. At a minimum, please record the observable numbers (the boxplot cutoff values) for each group that you report in your charts here.

Flags: needinfo?(dstrohmeier)

Mike Conley (:mconley) (:⚙️)

Comment 15

•

8 years ago

Is there enough information in here to make a call on this? The trade-off is download size vs first paint. I think somebody just needs to choose which one we care about more. jpr, do you want to make a call?

Flags: needinfo?(jpr)

Whiteboard: [qf] → [qf:p1]

Mike Conley (:mconley) (:⚙️)

Comment 16

•

8 years ago

From catlee: just to be clear in https://bugzilla.mozilla.org/show_bug.cgi?id=1362377#c15 - this latest regression is a regression in both ts paint AND file size if we disable compression in omni.ja again, we will decrease file size and improve ts paint times in talos anyway So this kinda sounds like a no-brainer. We should disable compression. catlee, are you willing to do this?

Flags: needinfo?(jpr) → needinfo?(catlee)

Masatoshi Kimura [:emk]

Comment 17

•

8 years ago

The bug summary should be changed because this is not a trade-off.

Mike Hommey [:glandium]

Comment 18

•

8 years ago

Note that omni.ja was uncompressed on nightly only. It has never been uncompressed on other branches. So while there may be a regression for nightly, things have never actually changed for users. Bug 1352595 is meant to make omni.ja both smaller and faster to read.

JP Rosevear [:jpr]

Reporter

Comment 19

•

8 years ago

Originally it was thought to be a tradeoff because of telemetry data, but that data ended up being neutral according to :catlee. I suspect we should just do it unless we have any concerns about stability, and let it ride the 55 train.

Chris AtLee [:catlee]

Assignee

Comment 20

•

8 years ago

(In reply to Mike Hommey [:glandium] from comment #18) > Note that omni.ja was uncompressed on nightly only. It has never been > uncompressed on other branches. So while there may be a regression for > nightly, things have never actually changed for users. > > Bug 1352595 is meant to make omni.ja both smaller and faster to read. Although it makes the omni.ja smaller, it doesn't do as much to improve download sizes. e.g. for two nightly builds, partial update sizes are: 10,114,084 original partial 10,018,540 partial between mars with omni.tar files with brotli compressed contents 8,497,367 partial between mars with uncompressed omni.ja files complete updates are a bit better: 54,756,449 original 52,314,206 brotli 51,196,058 uncompressed omni.ja and finally full installers: 45,395,514 original 43,202,792 brotli 40,142,349 uncompressed omni.ja Do we know the performance implications of the broli compressed omni.ja files yet?

Flags: needinfo?(catlee)

Chris AtLee [:catlee]

Assignee

Comment 21

•

8 years ago

Perhaps we should land the decompressed omni.ja patch now, and then evaluate brotli when it's ready?

Mike Conley (:mconley) (:⚙️)

Comment 22

•

8 years ago

Sounds like a fine plan to me.

Chris AtLee [:catlee]

Assignee

Comment 23

•

8 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=c37f65b6245d

(no longer active)

Comment 24

•

8 years ago

(In reply to Chris AtLee [:catlee] from comment #21) > Perhaps we should land the decompressed omni.ja patch now, and then evaluate > brotli when it's ready? Yes. FWIW switching to brotli or other compression schemes is totally beyond the scope of this bug.

Comment hidden (mozreview-request)

Chris AtLee [:catlee]

Assignee

Updated

•

8 years ago

Attachment #8868558 - Flags: review?(mconley)

Mike Conley (:mconley) (:⚙️)

Comment 26

•

8 years ago

Comment on attachment 8868558 [details] Bug 1362377: Disable omni.ja compression Let's do it!

Attachment #8868558 - Flags: review?(mconley) → review+

Mike Conley (:mconley) (:⚙️)

Comment 27

•

8 years ago

mozreview-review

Comment on attachment 8868558 [details] Bug 1362377: Disable omni.ja compression https://reviewboard.mozilla.org/r/140168/#review143650 Whoops - should probably have done this through MozReview.

Pulsebot

Comment 28

•

8 years ago

Pushed by catlee@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/05c7e4f31712 Disable omni.ja compression r=mconley

Iris Hsiao [:ihsiao]

Comment 29

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/05c7e4f31712

Status: NEW → RESOLVED

Closed: 8 years ago

status-firefox55: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → Firefox 55

Eric Rahm [:erahm]

Comment 30

•

8 years ago

Please flag bugs where you plan to increase memory with the MemShrink whiteboard tag. Chris what kind of measurements did you do? I see a reference to ~8MB in comment 2, was that just for the parent process or is it across the board? Generally speaking an 8MB regression even in one process is enough for me to encourage a backout.

Flags: needinfo?(catlee)

Whiteboard: [qf:p1] → [qf:p1][MemShrink]

Chris AtLee [:catlee]

Assignee

Comment 31

•

8 years ago

The 8MB reference was from when this originally landed in 2015-12-17. The only talos alert about RSS right now is on Linux64, where we increased about 3.5MB, although it looks like it's come back down since then: https://treeherder.mozilla.org/perf.html#/graphs?timerange=1209600&series=%5Bmozilla-inbound,d22e336c3614c2b16d5e9b238ba67c087a6eeb85,1%5D&series=%5Bautoland,d22e336c3614c2b16d5e9b238ba67c087a6eeb85,0%5D&selected=%5Bmozilla-inbound,d22e336c3614c2b16d5e9b238ba67c087a6eeb85%5D Windows looks like it's increased about 2MB: https://treeherder.mozilla.org/perf.html#/graphs?series=%5Bmozilla-central,fbf1fa83bd56a1c780d6af9cf42aca3e2e3fa63b,1,1%5D&series=%5Bmozilla-inbound,fbf1fa83bd56a1c780d6af9cf42aca3e2e3fa63b,1,1%5D&series=%5Bautoland,fbf1fa83bd56a1c780d6af9cf42aca3e2e3fa63b,1,1%5D Telemetry for MEMORY_TOTAL looks unaffected right now: https://telemetry.mozilla.org/new-pipeline/evo.html#!aggregates=median!5th-percentile!95th-percentile&cumulative=0&end_date=null&keys=&max_channel_version=nightly%252F55&measure=MEMORY_TOTAL&min_channel_version=nightly%252F55&processType=*&product=Firefox&sanitize=1&sort_keys=submissions&start_date=null&trim=1&use_submission_date=0

Flags: needinfo?(catlee)

Joel Maher ( :jmaher ) (UTC -8)

Comment 32

•

8 years ago

a lot of installer size improvements: == Change summary for alert #6741 (as of May 19 2017 15:04 UTC) == Improvements: 8% installer size summary windows2012-32 opt 53,621,828.92 -> 49,236,118.42 8% installer size summary windowsxp opt 53,684,430.42 -> 49,296,799.25 8% installer size summary windows2012-32 pgo 55,071,287.67 -> 50,652,881.67 7% installer size summary windows2012-64 pgo 59,212,856.92 -> 54,777,425.25 7% installer size summary windows2012-64 opt 58,863,314.75 -> 54,476,106.17 7% installer size summary windows8-64 opt 58,921,135.67 -> 54,534,357.75 6% installer size summary linux64 opt static-analysis57,762,442.42 -> 54,464,938.92 6% installer size summary linux64 opt artifact 58,120,355.83 -> 54,816,022.42 6% installer size summary linux64 opt 58,134,365.25 -> 54,836,783.83 6% installer size summary linux64-stylo opt 59,805,760.83 -> 56,506,713.17 6% installer size summary linux32 opt 58,818,503.00 -> 55,579,628.08 5% installer size summary linux64 pgo 63,603,808.25 -> 60,307,555.75 5% installer size summary linux32 pgo 63,809,020.67 -> 60,619,178.58 5% installer size summary linux64 debug static-analysis70,141,543.83 -> 66,873,595.83 2% build times summary windows2012-32 pgo taskcluster-c4.4xlarge4,156.88 -> 4,067.01 2% installer size summary linux64 asan 214,984,531.00 -> 211,676,488.08 For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=6741

:Dominik Strohmeier [:dstrohmeier]

Comment 33

•

8 years ago

(In reply to :Dominik Strohmeier [:dstrohmeier] from comment #13) > As outlined above, the current testing is done based on 5 runs per build. > Find the data here: > https://docs.google.com/a/mozilla.com/spreadsheets/d/ > 1LW3Skd2RhTTtypV5_rIv09jAnU5iu7UUHZ-c-3our_U/edit?usp=sharing. I think it's > pretty obvious that the numbers are comparable. The ones in the chart are > outdated as per comment 10. To complete, here's the data for the five runs that are also captured in the linked spreadsheet: With omni.ja compression (first paint / Hero Element in frames at 60fps): (68 / 99), (68 / 99), (68 / 99), (68 / 100), (68 / 100) Without omni.ja compression: (68 / 99), (69 / 99), (68 / 99), (68 / 100), (67 / 99)

Flags: needinfo?(dstrohmeier)

Florian Quèze [:florian]

Comment 34

•

8 years ago

As indicated in comment 3, compression was meant to improve cold startup. Dominik, are any of the measurements you provided here for cold startup?

Flags: needinfo?(dstrohmeier)

:Dominik Strohmeier [:dstrohmeier]

Comment 35

•

8 years ago

No, I haven't done any other measurements on cold startup as this wasn't defined to be in focus for 57. However, we do have a measurement setup available at SF AllHands. Shall we run one round of measurements together? I'll only have some time to do it this week as then I'll be out for 2 months and then join a new team.

Flags: needinfo?(dstrohmeier)

u601173

Comment 36

•

7 years ago

Bug 1352595 - Switch to !zlib for omni.ja compression Again regressed the improvements made by this Bug in terms of size.

Chris AtLee [:catlee]

Assignee

Updated

•

6 years ago

Blocks: 1303172

Sylvestre Ledru [:Sylvestre]

Updated

•

6 years ago

Assignee: nobody → catlee

Nobody; OK to take it and work on it

Updated

•

6 years ago

Component: Build Config → General

Product: Firefox → Firefox Build System

Dylan Hardison [:dylan] (he/him)

Updated

•

6 years ago

Target Milestone: Firefox 55 → mozilla55

Dave Hunt [:davehunt] [he/him] ⌚BST

Updated

•

3 years ago

Performance Impact: --- → P1

Whiteboard: [qf:p1][MemShrink] → [MemShrink]