Closed Bug 1351071 Opened 3 years ago Closed 3 years ago
Disable startup cache generation
59 bytes, text/x-review-board-request
Jeff and I were looking at why the 52.0 -> 52.0.1 partial updates were 11MB large (for win64 en-US) when very little code had changed between the builds. One of the largest contributors to the partial update size was the omni.ja diff. We know that diffing omni.ja can be inefficient (bug 772868) because you're diffing compressed contents. The startup cache files in jsloader/ are particularly bad, though. If you unpack omni.ja for 52.0 and 52.0.1, run bsdiff on each pair of raw files, and compare that to bsdiff of the deflated files, you can calculate a ratio of patch size for deflated content vs. patch size for raw content. The jsloader files have patch sizes up to 200x larger than patches of the raw files. Looking closely at the differences in these files, we can see that in the case of 52.0 -> 52.0.1, the bulk of each file is unchanged. However, the buildid of the build is encoded in the first 18 bytes of the file. Because those initial bytes differ, all of the following compressed output also differs. By resetting the compression stream after processing the initial 18 bytes containing the buildid, we can ensure that the bulk of the compressed contents are the same. Doing this reduces the size of the win64 partial update from 52.0 to 52.0.1 from 11MB to 8MB.
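The stream-reset idea above can be sketched with zlib's full-flush. This is a minimal illustration under assumptions from the comment (an 18-byte buildid prefix; raw deflate rather than the actual mozpack/JAR packaging code): after Z_FULL_FLUSH the compressor's history is cleared, so the compressed bytes for an identical remainder are identical regardless of the buildid.

```python
import zlib

BUILDID_PREFIX_LEN = 18  # per the analysis above: the buildid lives in the first 18 bytes

def deflate_with_reset(data, prefix_len=BUILDID_PREFIX_LEN):
    """Deflate `data`, resetting the compression state after `prefix_len` bytes.

    Raw deflate (wbits=-15) is used so the output carries no zlib checksum
    trailer; only the prefix-derived bytes can then differ between files.
    """
    c = zlib.compressobj(9, zlib.DEFLATED, -15)
    head = c.compress(data[:prefix_len]) + c.flush(zlib.Z_FULL_FLUSH)
    tail = c.compress(data[prefix_len:]) + c.flush(zlib.Z_FINISH)
    return head, tail

# Two files identical except for the buildid in the first 18 bytes:
body = b"JS bytecode payload " * 200
a = b"20170327000000xxxx" + body
b = b"20170328000000xxxx" + body

head_a, tail_a = deflate_with_reset(a)
head_b, tail_b = deflate_with_reset(b)

# The compressed tails are byte-identical, so a bsdiff between the two
# archives only has to encode the tiny buildid difference.
assert tail_a == tail_b
# Round-trip still works when head and tail are concatenated:
assert zlib.decompress(head_a + tail_a, -15) == a
```

Without the flush, a single differing byte in the prefix would perturb the compressor's dictionary and make the entire remaining output diverge, which is exactly the bsdiff blow-up described above.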
Attachment #8851767 - Flags: review?(mh+mozilla)
I pushed this to try here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=6fc736f8e88fcdee756f331cbdb5e91e1ff7a41e It doesn't seem to have Talos impact (although all the runs haven't finished yet): https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-inbound&originalRevision=7d3e8986a676&newProject=try&newRevision=6fc736f8e88fcdee756f331cbdb5e91e1ff7a41e&framework=1
Can't we just kill the startupcache already? There's been talk about that for a long while, and all we keep doing is trying to work around it here and there instead of just doing it. If we don't remove the startup cache, then ISTM we should instead try to remove the buildid from those files, instead of doing horrible contortions at the packaging level.
I'd love to. Doing that results in huge startup IO regressions on Talos though. Who can make the call on whether we are willing to take the hit?
(In reply to Chris AtLee [:catlee] from comment #4) > I'd love to. > > Doing that results in huge startup IO regressions on Talos though. That doesn't make much sense. Small startup IO regressions might be expected, not huge ones.
Weird, the first time I tested, I got a huge tp5n nonmain_startup_fileio regression on win32 opt. Looks like that could have been bug 1274018. I did a new try push https://treeherder.mozilla.org/#/jobs?repo=try&revision=e2a4155360d36010c95b9bcc7fee731430ee4de7 to completely get rid of the startup cache. I also see the nonmain_startup_fileio regression. https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&originalRevision=9577ddeaafd8&newProject=try&newRevision=e2a4155360d36010c95b9bcc7fee731430ee4de7&framework=1
(In reply to Mike Hommey [:glandium] from comment #3) > If we don't remove the startup cache, then ISTM we should instead try to > remove the buildid from those files, instead of doing horrible contorsions > at the packaging level. The buildid is encoded as part of a version check operation, presumably to ensure that the bytecode is only used by the correct version of the JS VM.
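The version check described above can be sketched as follows. This is a hedged illustration only: the constant and function names are invented for this example, not the actual JS VM code, and the 18-byte prefix length is taken from comment 0. Cached bytecode is accepted only if the embedded buildid matches the running build.

```python
# Hypothetical sketch of the buildid version check described above;
# names are illustrative, not the real SpiderMonkey/XDR code.
BUILDID_LEN = 18

def load_cached_bytecode(blob, current_buildid):
    """Return the bytecode payload if the embedded buildid matches, else None."""
    embedded = blob[:BUILDID_LEN]
    if embedded != current_buildid:
        return None  # stale cache: the caller recompiles from source
    return blob[BUILDID_LEN:]

buildid = b"20170327000000ab00"
blob = buildid + b"\x00bytecode..."
assert load_cached_bytecode(blob, buildid) == b"\x00bytecode..."
assert load_cached_bytecode(blob, b"20170328000000ab00") is None
```

This is why the buildid sits at the front of every jsloader file: a single version bump invalidates (and thus rewrites) the leading bytes of every cached file, which in turn perturbs the whole compressed stream.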
Try push here to measure our baseline performance: https://treeherder.mozilla.org/#/jobs?repo=try&revision=9a67ebd76fcb267fb8a12fd5362d8f768e19976a
(In reply to Chris AtLee [:catlee] from comment #7) > (In reply to Mike Hommey [:glandium] from comment #3) > > If we don't remove the startup cache, then ISTM we should instead try to > > remove the buildid from those files, instead of doing horrible contortions > > at the packaging level. > > The buildid is encoded as part of a version check operation, presumably to > ensure that the bytecode is only used by the correct version of the JS VM. ... and that's only really necessary for the startup cache in the user profile, not for the files shipped in omni.ja.
Can you get the JS VM to stop emitting these bytecodes?
Nothing is impossible. But how about we only start thinking about this if we can't disable startupcache?
No significant regressions detected: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=e2a4155360d3&newProject=try&newRevision=9a67ebd76fcb267fb8a12fd5362d8f768e19976a&framework=1&showOnlyImportant=0 I think we should disable this on Nightly for all platforms and then look at Telemetry data.
Assignee: nobody → catlee
Summary: Specialize compression for startup cache files inside omni.ja → Disable startup cache generation
Comment on attachment 8851767 [details] Bug 1351071: Get rid of pre-generated startup cache https://reviewboard.mozilla.org/r/123994/#review128806 ::: commit-message-9577d:1 (Diff revision 2) > +Bug 1351071: Get rid of startup cache r=glandium We're not actually removing the startup cache, so let's go with something like "Get rid of pre-generated startup cache" ::: python/mozbuild/mozpack/test/test_packager_formats.py:411 (Diff revision 2) > - self.assertTrue( > + self.assertFalse( > is_resource(base, 'jsloader/resource/gre/modules/foo.jsm')) You should just remove this assert completely.
Attachment #8851767 - Flags: review?(mh+mozilla) → review+
Pushed by firstname.lastname@example.org: https://hg.mozilla.org/integration/autoland/rev/0f320509eddb Get rid of pre-generated startup cache r=glandium
:catlee Could you confirm this also reduces build time? There also seem to be improvements here: https://treeherder.mozilla.org/perf.html#/alerts?id=5846
Yeah, I'm not surprised it slightly improves build times.
We're trying to find the root cause of a 1.7ms regression in the 95th percentile of the input event responsiveness telemetry probe in the parent process (bug 1362094), and this patch is one of the three culprits we currently suspect. Chris, do you think this could be the cause of bug 1362094? Would you be OK with a temporary backout of this bug to see if the regression is reclaimed on telemetry?
AIUI not providing a startup cache means that we will pay a one-time cost per version of Firefox where we generate the precompiled versions of these files locally. I would have expected that to show up in startup times rather than input event responsiveness, but anything is possible. Please feel free to backout to verify via telemetry. A few notes: * We were planning on disabling this for the cross-compiled OSX builds anyway (bug 1314154) * If we do back this out permanently, I would like to revisit my idea for customizing the compression of these files to get more efficient partial updates (comment #0 on this bug).
It seems unlikely that this could cause input event responsiveness delay. Startup cache generation is a one-time thing that this bug moves from build time to run time. I don't see how a one-time thing could have an impact on recurring events.
It's true that generation of the startup cache is a one-time thing, but its presence affects things quite a bit because of conditions like this: <https://searchfox.org/mozilla-central/rev/224cc663d54085994a4871ef464b7662e0721e83/js/xpconnect/loader/mozJSComponentLoader.cpp#686> which make us do almost no work when the startup cache is present and Cu.import() is called, which our chrome code does a lot. So if, hypothetically, generation of the startup cache fails on our users' machines for whatever reason at startup, they will always take the slow path here, and that may be a theory that could explain the slowness we have been measuring. Chris, if this makes sense, I'd appreciate it if you could back out at your convenience. Thanks!
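The fast-path effect described above can be sketched as a hedged illustration (the function and variable names are invented for this example, not the mozJSComponentLoader API): with a populated cache, an import skips the expensive compile entirely, so a cache that silently fails to populate means every import pays the compile cost.

```python
# Illustrative sketch (hypothetical names, not the real loader API):
# when cached bytecode is present, importing a module skips compilation.
compile_calls = 0

def compile_script(uri):
    global compile_calls
    compile_calls += 1          # stands in for the expensive compile path
    return f"bytecode({uri})"

startup_cache = {}

def import_module(uri):
    cached = startup_cache.get(uri)
    if cached is not None:
        return cached               # fast path: no compile work at all
    bytecode = compile_script(uri)  # slow path, taken on every cache miss
    startup_cache[uri] = bytecode
    return bytecode

import_module("resource://gre/modules/Foo.jsm")
import_module("resource://gre/modules/Foo.jsm")
assert compile_calls == 1  # the second import hit the cache
```

In the hypothetical failure mode above, `startup_cache` would stay empty across runs, and chrome code that imports many modules at startup would take the slow branch every time.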
How would it fail on users' machines?
Also note there are two levels of cache, in-memory and on-disk. And the on-disk cache has two levels of its own, one of which was removed in this bug.
(In reply to Mike Hommey [:glandium] from comment #25) > How would it fail on users' machines? I don't know.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to :Ehsan Akhgari (super long backlog, slow to respond) from comment #24) > It's true that generation of the startup cache is a one-time thing, but its > presence affects things quite a bit because of conditions like this: > <https://searchfox.org/mozilla-central/rev/ > 224cc663d54085994a4871ef464b7662e0721e83/js/xpconnect/loader/ > mozJSComponentLoader.cpp#686> which make us do almost no work when the > startup cache is present and Cu.import() is called, which our chrome code does > a lot. So if, hypothetically, generation of the startup cache fails on > our users' machines for whatever reason at startup, they will always be > taking the slow path here, and that may be a theory that could explain the > slowness we have been measuring. FTR, as bug 1363384 comment 1 shows, if this hypothetical happens, apparently we can end up doing main thread IO. :-(
I don't think this backout has shown any change in the telemetry data; we should reland this patch.
Pushed by email@example.com: https://hg.mozilla.org/integration/autoland/rev/68cdf62dd865 Get rid of pre-generated startup cache r=glandium
Status: REOPENED → RESOLVED
Closed: 3 years ago → 3 years ago
Resolution: --- → FIXED
Component: Build Config → General
Product: Firefox → Firefox Build System