omni.ja vs ts_paint trade off

RESOLVED FIXED in Firefox 55



Build Config


(Reporter: jpr, Unassigned, NeedInfo)


Firefox 55

Firefox Tracking Flags

(firefox55 fixed)


(Whiteboard: [qf:p1][MemShrink])



(1 attachment)



19 days ago
Bug 1340157 landed Feb 17 and caused a 2-3% regression in startup time, as seen in bug 1340873.

I'm not sure how much we win on download size, but given that startup time (or at least the user's perception of it) is an important focus for Quantum, we should be clear about the tradeoff we are making.
The end result of that bug is that we left omni.ja compressed, which counter-intuitively increases the download size.

In I summarized the talos / telemetry results. It appeared as though the ts_paint regression wasn't corroborated by telemetry, and the advice at the time was that telemetry numbers are a better indicator of the user's experience.

Another impact of uncompressed omni.ja files is a small increase in memory usage.

I'd be happy to re-revert this patch and get much smaller downloads, since that's what I was initially after!
To summarize some of the information from other bugs:

* Using uncompressed omni.ja files significantly reduces download size for partial updates and initial installs.
* It also seems to improve startup time as measured by talos' ts_paint, but that is harder to verify with telemetry.
* It also increases memory usage by ~8MB.

Right now we are using compressed omni.ja files everywhere.

More background:

* omni.ja files were originally compressed, and were always compressed on aurora, beta and release channels.

* bug 1231379 landed on 2015-12-17 to disable compression on all desktop platforms.
This change never rode the trains past nightly. I estimated we would see an improvement of ~15% in Windows installer size, and up to a 40% improvement in partial update size.

* bug 1233214 indicated some talos regressions / improvements.
We saw an improvement in ts_paint and a regression in memory usage.
There was no clear resolution on the impact, and telemetry didn't seem to confirm the talos data.

* I wanted to be able to make a decision about whether to let this ride the trains past nightly, or to back out the change from nightly. Unfortunately, telemetry and talos data were no longer available for the original change, so I filed bug 1340157 to track the impact of re-enabling compression for omni.ja.

* bug 1340157 landed on 2017-02-16 and re-enabled compression for omni.ja on nightly

* bug 1340873 indicated the new set of talos regressions/improvements

* I summed up the talos and telemetry data here:
Based on that data I decided to leave the omni.ja files compressed.
At the time it appeared as though the telemetry data contradicted the talos data. Looking back now, that trend isn't clear any more to me.
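For context on the mechanics: omni.ja is a ZIP-format archive, and the change being debated here only toggles whether its entries are stored or deflated. A minimal sketch using Python's zipfile module (synthetic payload and hypothetical file names, not the real omni.ja contents):

```python
import io
import zipfile

# ~96 kB of repetitive text standing in for a chrome resource.
payload = b"// mock chrome resource\n" * 4000

def build(compression):
    """Build an in-memory ZIP with ten copies of the payload."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=compression) as zf:
        for i in range(10):
            zf.writestr(f"chrome/file{i}.js", payload)
    return buf.getvalue()

stored = build(zipfile.ZIP_STORED)      # like an uncompressed omni.ja
deflated = build(zipfile.ZIP_DEFLATED)  # like a compressed omni.ja

print(f"stored:   {len(stored):>9,} bytes")
print(f"deflated: {len(deflated):>9,} bytes")
```

Stored entries can be memory-mapped directly (no CPU cost at read time, smaller page-cache duplication), while deflated entries must be decompressed into separate heap buffers — which is the memory/CPU side of the tradeoff discussed above.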
Is the Talos number a warm-start or a cold-start number? If I remember the past correctly, the tradeoff around omnijar compression has been about cold I/O: compressing makes cold start faster because cold start is heavily gated on I/O. It makes warm start slower because CPU becomes more important and we can't do direct memory mapping of the omnijar.

Telemetry for startup time is *very* difficult to model, because it aggregates warm, cold, and other starts all together, so analyzing it for a small signal is hard. I actually trust our talos numbers rather more strongly. The links you provide show the 95th percentile, which would in general be the much higher cold-start times. I'm not sure I see any improvement at all; look at the same graphs with the 25/50/75/95th percentiles charted:

session restored:
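The difficulty of reading a mixed warm/cold aggregate can be illustrated with synthetic numbers (these are made-up times, not telemetry data): the median tracks the warm-start cluster, while the 95th percentile is dominated entirely by cold starts.

```python
import random
import statistics

random.seed(0)
# Synthetic startup times (ms): 90% warm starts, 10% much slower cold starts.
warm = [random.gauss(1100, 60) for _ in range(900)]
cold = [random.gauss(4500, 800) for _ in range(100)]
times = warm + cold

q = statistics.quantiles(times, n=100)  # q[k-1] is the k-th percentile
print(f"p25={q[24]:.0f}  p50={q[49]:.0f}  p75={q[74]:.0f}  p95={q[94]:.0f}")
```

An optimization that only helps warm starts moves p25/p50/p75 but barely touches p95, and vice versa — which is why looking at a single percentile can hide the effect entirely.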
Would it be helpful to have Dominik repeat the warm and cold startup time metrics documented at against equivalent builds with compression on and off, to see more precisely how it affects cold versus warm?
Flags: needinfo?(catlee)
talos is warm start: we create a profile, warm it up on a first launch, then measure the startup time for 20 iterations. We do have a session restore test that doesn't warm up the profile.

In addition, we do not do a cold start of the OS, so that plays into real cold start times.

Comment 6

19 days ago
See also bug 1352595 where glandium is looking to use brotli for compression inside omni.ja. This should reduce omni.ja size *and* CPU usage for decompression.

catlee also suggested a good idea in that bug, which would be to put all omni.ja files accessed during startup in a single compression context that was bulk decompressed on startup. That will result in a smaller omni.ja *and* faster decompression since decompressors get much faster when you feed them chunks larger than a few dozen kb. The idea could be extended to related files. e.g. you could group sync files, devtools files, etc. Of course, this would require a refactoring to omni.ja reading since there would no longer be a single index (the jar file itself).
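The "shared compression context" effect is easy to demonstrate with any DEFLATE-style compressor; a sketch using zlib with synthetic file contents (not real omni.ja data):

```python
import zlib

# ~200 small "files" sharing vocabulary, like chrome JS resources.
files = [
    (f"function init{i}() {{ return document.getElementById('panel-{i}'); }}\n"
     .encode()) * 30
    for i in range(200)
]

# Compressing each file in its own context pays per-stream overhead
# and can never exploit redundancy *between* files.
per_file = sum(len(zlib.compress(f, 9)) for f in files)

# One shared context lets the compressor match patterns across files.
bulk = len(zlib.compress(b"".join(files), 9))

print(f"per-file contexts: {per_file:,} bytes")
print(f"single context:    {bulk:,} bytes")
```

This is the size half of the argument; the speed half (decompressors are faster on large chunks) follows from the same amortization of per-stream setup cost.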
(In reply to Benjamin Smedberg [:bsmedberg] from comment #4)
> Would it be helpful to have Dominik repeat the warm and cold startup time
> metrics documented at
> GCQwR2EGYsbb_fTADYnpFJyxrmsRjRpUvf6GBAw/edit?usp=sharing against equivalent
> builds with compression on and off, to see more precisely how it affects
> cold versus warm?

Yes, I've already asked him to test out

They should be identical except for omni.ja compression.
Flags: needinfo?(catlee)
Setting a flag to track next steps.
Flags: needinfo?(dstrohmeier)
(In reply to Chris AtLee [:catlee] from comment #7)
> Yes, I've already asked him to test out 
> mozilla-central/
> vs
> zip?dl=0
> They should be identical except for omni.ja compression.

Here's the outcome of comparing these builds:

Results show that the build without omni.ja compression is faster for both First Paint (64ms faster in median) and Hero Element (161ms faster in median). First Paint is when the application appears on full screen; Hero Element is when the "search" placeholder appears in the search box of the content.

In addition, the test runs for the build without compression have a much lower standard deviation, meaning a more consistent experience for users across multiple startups.
Flags: needinfo?(dstrohmeier)
I re-ran the comparison on today's Nightly, as the previous numbers were off from what we had measured last week for Nightly. There is also a drop for ts_paint on May 4, 2017: vs

Based on a comparison of just 5 runs, the results for these two builds are the same. In median, both reach First Paint at 1133ms and Hero Element at 1650ms.
Don't forget about the PGO specific optimizations that are done on omni.ja. We probably should be comparing ts_paint on PGO builds only.

Comment 12

14 days ago
Can we see the results of an inference test between these two builds, not just links to boxplots? We should track the significance of the difference rather than eyeballing charts for these types of tests.  I expect that the conclusions will be the same, but we need to keep the evidence of testing observable within the context of this bug.  Reporting the inference test used, the difference in median, and the p-value of the difference should be sufficient.
Flags: needinfo?(dstrohmeier)
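For reference, a significance test of the kind requested can be done in a few lines of stdlib Python with a permutation test; the run times below are hypothetical placeholders, not the measured data from the spreadsheet:

```python
import random
import statistics

def perm_test_median(a, b, n_iter=10000, seed=1):
    """Two-sided permutation test for a difference in medians.

    Returns (observed median difference, approximate p-value).
    """
    observed = abs(statistics.median(a) - statistics.median(b))
    pooled = a + b
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        x, y = pooled[:len(a)], pooled[len(a):]
        if abs(statistics.median(x) - statistics.median(y)) >= observed:
            hits += 1
    return observed, hits / n_iter

# Hypothetical first-paint times (ms) for two builds, 5 runs each.
compressed   = [1197, 1165, 1210, 1188, 1172]
uncompressed = [1129, 1140, 1121, 1135, 1148]

diff, p = perm_test_median(compressed, uncompressed)
print(f"median difference = {diff} ms, p = {p:.3f}")
```

With only 5 runs per group the p-value is coarse (there are just 252 distinct splits), which is exactly the small-sample caveat raised above — but even reporting "difference in medians = X ms, p = Y" keeps the evidence replayable inside the bug.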
As outlined above, the current testing is done based on 5 runs per build. Find the data here: I think it's pretty obvious that the numbers are comparable. The ones in the chart are outdated as per comment 10.

I understand that this is far from scientific, but given the time constraints and the fact that these tests are run manually, this is what we can currently afford to do.

Once we get into the range of running these tests 100 times (or even in an automated way allowing us to collect even larger sets of data), it will definitely make sense for us to follow the recommended procedure of reporting results.
Flags: needinfo?(dstrohmeier)

Comment 14

14 days ago
This is a public bug, and your link is behind LDAP and thus not accessible to non-Mozilla employees (and neither is that spreadsheet). Assume it is possible that that link could disappear or be lost; we would then never be able to replicate your conclusion. That is why we should record the results of the comparison in this bug: at least note the differences in medians, and ideally the significance of that difference *even if the p-value is odd due to small samples*.

At a minimum, please record the observable numbers (the boxplot cutoff values) for each group that you report in your charts here.
Flags: needinfo?(dstrohmeier)
Is there enough information in here to make a call on this? The trade-off is download size vs first paint.

I think somebody just needs to choose which one we care about more. jpr, do you want to make a call?
Flags: needinfo?(jpr)
Whiteboard: [qf] → [qf:p1]
From catlee:

just to be clear - this latest regression is a regression in both ts_paint AND file size

if we disable compression in omni.ja again, we will decrease file size and improve ts_paint times in talos anyway

So this kinda sounds like a no-brainer. We should disable compression.

catlee, are you willing to do this?
Flags: needinfo?(jpr) → needinfo?(catlee)
The bug summary should be changed because this is not a trade-off.
Note that omni.ja was uncompressed on nightly only. It has never been uncompressed on other branches. So while there may be a regression for nightly, things have never actually changed for users.

Bug 1352595 is meant to make omni.ja both smaller and faster to read.

Comment 19

8 days ago
Originally it was thought to be a tradeoff because of telemetry data, but that data ended up being neutral according to :catlee.  I suspect we should just do it unless we have any concerns about stability, and let it ride the 55 train.
(In reply to Mike Hommey [:glandium] from comment #18)
> Note that omni.ja was uncompressed on nightly only. It has never been
> uncompressed on other branches. So while there may be a regression for
> nightly, things have never actually changed for users.
> Bug 1352595 is meant to make omni.ja both smaller and faster to read.

Although it makes the omni.ja smaller, it doesn't do as much to improve download sizes.

e.g. for two nightly builds, partial update sizes are:
10,114,084 original partial
10,018,540 partial between mars with omni.tar files with brotli compressed contents
 8,497,367 partial between mars with uncompressed omni.ja files

complete updates are a bit better:
54,756,449 original
52,314,206 brotli
51,196,058 uncompressed omni.ja

and finally full installers:
45,395,514 original
43,202,792 brotli
40,142,349 uncompressed omni.ja
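Expressed as relative savings (computed directly from the byte counts above):

```python
# Byte counts from the comment above; "original" is compressed omni.ja.
sizes = {
    "partial update":  {"original": 10_114_084, "brotli": 10_018_540, "uncompressed": 8_497_367},
    "complete update": {"original": 54_756_449, "brotli": 52_314_206, "uncompressed": 51_196_058},
    "full installer":  {"original": 45_395_514, "brotli": 43_202_792, "uncompressed": 40_142_349},
}

for kind, s in sizes.items():
    for variant in ("brotli", "uncompressed"):
        saving = 100 * (s["original"] - s[variant]) / s["original"]
        print(f"{kind:>15}: {variant:<12} saves {saving:4.1f}%")
```

The uncompressed variant wins most where it matters for distribution: roughly 16% on partial updates versus under 1% for brotli-inside-omni.ja.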

Do we know the performance implications of the brotli-compressed omni.ja files yet?
Flags: needinfo?(catlee)
Perhaps we should land the decompressed omni.ja patch now, and then evaluate brotli when it's ready?
Sounds like a fine plan to me.
(In reply to Chris AtLee [:catlee] from comment #21)
> Perhaps we should land the decompressed omni.ja patch now, and then evaluate
> brotli when it's ready?

Yes.  FWIW switching to brotli or other compression schemes is totally beyond the scope of this bug.
Comment hidden (mozreview-request)
Attachment #8868558 - Flags: review?(mconley)
Comment on attachment 8868558 [details]
Bug 1362377: Disable omni.ja compression

Let's do it!
Attachment #8868558 - Flags: review?(mconley) → review+

Comment 27

6 days ago
Comment on attachment 8868558 [details]
Bug 1362377: Disable omni.ja compression

Whoops - should probably have done this through MozReview.

Comment 28

6 days ago
Pushed by
Disable omni.ja compression r=mconley

Comment 29

6 days ago
Last Resolved: 6 days ago
status-firefox55: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → Firefox 55
Please flag bugs where you plan to increase memory with the MemShrink whiteboard tag.

Chris what kind of measurements did you do? I see a reference to ~8MB in comment 2, was that just for the parent process or is it across the board? Generally speaking an 8MB regression even in one process is enough for me to encourage a backout.
Flags: needinfo?(catlee)
Whiteboard: [qf:p1] → [qf:p1][MemShrink]
The 8MB reference was from when this originally landed on 2015-12-17.

The only talos alert about RSS right now is on Linux64, where we increased about 3.5MB, although it looks like it's come back down since then:,d22e336c3614c2b16d5e9b238ba67c087a6eeb85,1%5D&series=%5Bautoland,d22e336c3614c2b16d5e9b238ba67c087a6eeb85,0%5D&selected=%5Bmozilla-inbound,d22e336c3614c2b16d5e9b238ba67c087a6eeb85%5D

Windows looks like it's increased about 2MB:,fbf1fa83bd56a1c780d6af9cf42aca3e2e3fa63b,1,1%5D&series=%5Bmozilla-inbound,fbf1fa83bd56a1c780d6af9cf42aca3e2e3fa63b,1,1%5D&series=%5Bautoland,fbf1fa83bd56a1c780d6af9cf42aca3e2e3fa63b,1,1%5D

Telemetry for MEMORY_TOTAL looks unaffected right now:!aggregates=median!5th-percentile!95th-percentile&cumulative=0&end_date=null&keys=&max_channel_version=nightly%252F55&measure=MEMORY_TOTAL&min_channel_version=nightly%252F55&processType=*&product=Firefox&sanitize=1&sort_keys=submissions&start_date=null&trim=1&use_submission_date=0
Flags: needinfo?(catlee)