GeckoView unconditionally clears the startup cache of new profiles (less than 6 hours old)
Categories
(GeckoView :: General, defect)
Tracking
(firefox94 fixed)
Tracking | Status | |
---|---|---|
firefox94 | --- | fixed |
People
(Reporter: mcomella, Assigned: robwu)
References
(Blocks 1 open bug)
Details
(Whiteboard: [addons-jira])
Attachments
(1 file)
On the Moto G5 in fenix, we spend 231ms (11% of time to navigation start in the parent process) in InterpretGeneratorResume
, which does things like _getNormalizedManifest
and extensionById
https://share.firefox.dev/3mUQ5y1: this seems like a lot of time to parse the JSON associated with the manifests from web extensions. We identified a perf issue fenix#20934 and further fixed it up bug 1727272 but I'd be curious if we could further improve the performance.
Notes:
- :agi mentioned they want to move the fenix WebExtensions into C++ to avoid the start up cost. In the mean time, maybe we can make these improvements
- web extensions are one theory for why fenix is 300ms slower than GeckoView Example on cold view page load https://github.com/mozilla-mobile/fenix/issues/20936
- curiously, in the regressed version, we noticed the
parseManifest
code too longer in nightly builds than my local nightly builds: https://github.com/mozilla-mobile/fenix/issues/18753#issuecomment-908780397
Comment 1•3 years ago
|
||
I doubt there's something different that we're doing in GV that makes this particularly slow, so I'm moving this to the Extension component. This likely also has nothing really to do with Android (but it's a lot more important there where cold start of Gecko is pretty frequent).
Feel free to move back if needed.
Comment 2•3 years ago
|
||
We don't parse extension manifests at startup, though, so if we are, it likely is something specific to GV.
Comment 3•3 years ago
|
||
How much earlier on startup the android-components may end up calling the GeckoView's WebExtensionController method ensureBuiltin?
Comment 4•3 years ago
|
||
(In reply to Kris Maglione [:kmag] from comment #2)
We don't parse extension manifests at startup, though, so if we are, it likely is something specific to GV.
That's interesting, does that include the case where extensions need to be started up before the first network request has executed (e.g. for uBlock)?
I'm assuming what's causing us to parse the extension manifest is that the app calls ensureBuiltIn
at startup every time, but I thought that this change from Rob was supposed to avoid parsing the manifest: https://hg.mozilla.org/mozilla-central/rev/19fe145c3b6214d038e053f7f95b1114bb50f03f is that not the case?
We're toyed with the idea of having the app know when the built in extension is installed and not do anything instead of trying to install it again, it sounds like that might help here? but I'm still confused why the difference since we don't explicitly parse the manifest anymore.
How much earlier on startup the android-components may end up calling the GeckoView's WebExtensionController method ensureBuiltin?
(In reply to Luca Greco [:rpl] [:luca] [:lgreco] from comment #3)
How much earlier on startup the android-components may end up calling the GeckoView's WebExtensionController method ensureBuiltin?
For this discussion we can assume that the call is made as early as possible at startup.
Comment 5•3 years ago
|
||
(In reply to Agi Sferro | :agi | ni? for questions | ⏰ PST | he/him from comment #4)
(In reply to Kris Maglione [:kmag] from comment #2)
We don't parse extension manifests at startup, though, so if we are, it likely is something specific to GV.
That's interesting, does that include the case where extensions need to be started up before the first network request has executed (e.g. for uBlock)?
On desktop, we handle that case with what we call "persistent listeners" and "delayed background page startup",
we basically delay starting the background page until after the first paint, the addon is technically started a bit ealier than
that but after having installed it we should not be parsing the manifest again but use what we parsed before at install time.
I'm wondering if one of the "builtin" extensions installed from the android-components (e.g. maybe webcompat?) may be getting a startupReason that isn't set to "APP_STARTUP" and we end up starting the background page sooner than we do on desktop, see the check here:
Assignee | ||
Comment 6•3 years ago
|
||
In Github, you wrote:
Could this be related to the start up cache? I think I cleared data for the app though before I took the additional measurements...
What did you do exactly? Clearing the cache or really wiping all app data?
The parsed extension manifests are saved in ProfD/StartupCache/webext.sc.lz4
(source, note ProfLD = ProfD on Android). This directory is cleared on version changes, but other than that it will be there as long as the profile exists.
(In reply to Michael Comella (:mcomella) [needinfo or I won't see it] from comment #0)
On the Moto G5 in fenix, we spend 231ms (11% of time to navigation start in the parent process) in
InterpretGeneratorResume
, which does things like_getNormalizedManifest
andextensionById
https://share.firefox.dev/3mUQ5y1
Some remarks (after analyzing the full range of the profile):
- Most code in the extension framework is asynchronous. A block in the flame chart does not necessary cover the full duration (or even the start or end!) of a specific method call. Multiple extensions may be starting in parallel.
- The
Schemas.normalize
call at the end of_getNormalizedManifest
is synchronous though, which may contribute to the visibility of this particular method in the flame chart.
- The
- The presence of
_getNormalizedManifest
suggests that an extension was loaded whose manifest hasn't been cached yet. Schema parsing is apparently relatively expensive for the given sample. - At least one other extension is loading; at the start of the flame graph I see evidence that an extension manifest starts to be parsed (10.0ms
_getNormalizedManifest
) and even the initialization of the background script (24.2msonManifestEntry
inext-backgroundPage.js
)).- The extension uses the webRequest API. Only one built-in extension uses the API, i.e. the webcompat add-on: https://github.com/mozilla-mobile/android-components/blob/v92.0.13/components/feature/webcompat/src/main/assets/extensions/webcompat/manifest.json
- The initialization of the background script (which triggered the preliminary webRequest event registration) implies that manifest of the webcompat add-on has already been loaded (potentially from the cache, it's difficult to tell as the
_getNormalizedManifest
call in the chart could also be from a different extension).
- The presence of
readJSONAndBlobbify
(including the call toblobbify
) suggests thatSchemas.load
was called for a schema that wasn't found in the startup cache, at https://searchfox.org/mozilla-central/rev/ac7da6c7306d86e2f86a302ce1e170ad54b3c1fe/toolkit/components/extensions/Schemas.jsm#3674.- The caller of
Schemas.load
is the mainapiManager
'slazyInit
method, which is invoked as part of Extension startup, in parallel with loading the manifest (parsed or cached) - The call to
readJSONAndBlobbify
happens immediately before 135ms is spent in_getNormalizedManifest
. - The completion of
readJSONAndBlobbify
is marked by the call toblobbify
, which appears in the flame graph a few seconds after the completion of_getNormalizedManifest
. There is another earlier 15.0msblobbify
call, which is likely the result of loading the schema for an experimental API.
- The caller of
P.S. I typed this comment at the start of my day, but had to drop it in favor of another priority. This comment doesn't relate to the discussion since comment 3, but I wanted to leave the comment anyway in case it offers more context.
Assignee | ||
Comment 7•3 years ago
|
||
See question at the top of my previous comment.
Reporter | ||
Comment 8•3 years ago
|
||
(Haven't had a chance to thoroughly understand the other comments but to respond to the NI...)
(In reply to Rob Wu [:robwu] from comment #6)
In Github, you wrote:
Could this be related to the start up cache? I think I cleared data for the app though before I took the additional measurements...
What did you do exactly? Clearing the cache or really wiping all app data?
The process I try to follow for taking start up profiles and benchmarks is:
- Clear all app data:
adb shell pm clear org.mozilla.fenix
- Launch the app once to ensure we're not measuring a first run. I only let the app idle for a few seconds
- Force-stop the app:
adb shell am force-stop org.mozilla.fenix
- Launch the app, taking a profile/benchmark
It sounds like this process might not give the app enough time to populate the start up cache.
:robwu, how long does the app need to be running before the start up cache is populated? I can try waiting for that and taking another profile.
Assignee | ||
Comment 9•3 years ago
|
||
(In reply to Rob Wu [:robwu] from comment #6)
In Github, you wrote:
Could this be related to the start up cache? I think I cleared data for the app though before I took the additional measurements...
What did you do exactly? Clearing the cache or really wiping all app data?
The process I try to follow for taking start up profiles and benchmarks is:
- Clear all app data:
adb shell pm clear org.mozilla.fenix
- Launch the app once to ensure we're not measuring a first run. I only let the app idle for a few seconds
- Force-stop the app:
adb shell am force-stop org.mozilla.fenix
- Launch the app, taking a profile/benchmark
It sounds like this process might not give the app enough time to populate the start up cache.
:robwu, how long does the app need to be running before the start up cache is populated? I can try waiting for that and taking another profile.
In the profile that you've shared, it appears that all extensions have finished initializing after 3-4 seconds (not counting the load of the background script, which takes another second).
The extension's StartupCache is populated with a delay, 5 seconds after the first attempt to write to the StartupCache: https://searchfox.org/mozilla-central/rev/49b6e60550243b4b4d71d6ab35a3ff2b9a9f7c69/toolkit/components/extensions/ExtensionParent.jsm#1753,1757
... all together, if you wait for 10 seconds, that should be plenty to have initialized the extension's StartupCache.
Comment 10•3 years ago
|
||
Changing severity to S3 because this looks like a testing method issue.
Reporter | ||
Comment 11•3 years ago
|
||
... all together, if you wait for 10 seconds, that should be plenty to have initialized the extension's StartupCache.
I took a new profile that I think should have hit the start up cache but still has the same problem: https://share.firefox.dev/3DWSSNd Here were my steps:
- I took some perf tests on yesterdays nightly (25 replicates), left the device screen open for many minutes while I did other things
- Hit back to background the app
- Waited about 20 seconds (no stopwatch)
- Force-stopped the app
- Took a start up profile
And I followed similar steps (excluding the perf test) to take this one and see it too: https://share.firefox.dev/3hb01PX
Changing severity to S3 because this looks like a testing method issue.
Since I can reproduce after fixing the methodology, I'm reverting the severity.
Assignee | ||
Comment 12•3 years ago
|
||
:_:
I discovered what's happening. The startup caches are purged at startup by logic from bug 1422087, because the application never exits gracefully on Android. Bug 1624724 recognized the issue and attempted to fix it, but the logic doesn't work as explained in more detail below. This doesn't only affect extensions, but anything that's stored in the startup cache.
The "good" news is that this issue lasts for 6 hours; any profile whose age is older than 6 hours won't suffer from this bug.
Cause of purged caches
At startup, the existence of a dummy canary file called .startup-incomplete
is checked.
When the file exists and is younger than 6 hours, the logic assumes that a crash has happened, which triggers the removal of the caches (including the extension's startup cache).
The file is created if it doesn't exist (which is the expected result). This created file is removed when the application exits gracefully (which doesn't happen on mobile), or removed "shortly after startup", which is triggered via browser-idle-startup-tasks-finished
in GeckoView). The latter logic was introduced in bug 1624724, but it doesn't seem to work (as seen below, the files still exist and are many years old).
* I just opened about:debugging
on desktop, and connected the debugger to two of the many old Firefox browsers that I have on my phone, and can confirm that the file is never actually removed, based on the observation that the files are from 2018 (when the feature was introduced in bug 1422087) and 2019 (when I started using Fenix Nightly):
new Date((await IOUtils.stat(Services.dirsvc.get("ProfLD", Ci.nsIFile).path + "/.startup-incomplete")).lastModified)
Startup canary not removed via browser-idle-startup-tasks-finished
As mentioned before, the nsAppStartup::trackStartupCrashEnd
method is supposedly called by a browser-idle-startup-tasks-finished
observer in GeckoViewStartup.jsm.
This observer was introduced in bug 1624724 and is not triggered however. In the profile (e.g. https://share.firefox.dev/3hb01PX from comment 11), you can try to search for "crash" in the Marker Table. startupCrashDetectionBegin
(from TrackStartupCrashBegin
) shows up, but startupCrashDetectionEnd
(from TrackStartupCrashEnd
does not. This is despite the fact that the notification itself is triggered by geckoview.js (seen in the Marker Table as NotifyObservers - browser-idle-startup-tasks-finished
). So the issue is that the observer in GeckoViewStartup.jsm is not registered.
The only sign of an attempt at registering the observer is in GeckoView.manifest
, which contains the following:
# GeckoViewStartup.js
category app-startup GeckoViewStartup service,@mozilla.org/geckoview/startup;1
category profile-after-change GeckoViewStartup @mozilla.org/geckoview/startup;1 process=main
category browser-idle-startup-tasks-finished GeckoViewStartup @mozilla.org/geckoview/startup;1 process=main
- Observers for the
app-startup
category are initialized and triggered at https://searchfox.org/mozilla-central/rev/aa46c2dcccbc6fd4265edca05d3d00cccdfc97b9/toolkit/xre/nsXREDirProvider.cpp#976-977 - Observers for the
profile-after-change
category are initialized and triggered at https://searchfox.org/mozilla-central/rev/aa46c2dcccbc6fd4265edca05d3d00cccdfc97b9/toolkit/xre/nsAppRunner.cpp#5022 - ... the
browser-idle-startup-tasks-finished
category isn't read anywhere. The observer is never registered nor triggered.
... therefore the patch for bug 1624724 never worked.
I'll check if I can create a patch with a regression test.
Updated•3 years ago
|
Comment 13•3 years ago
|
||
(In reply to Rob Wu [:robwu] from comment #12)
... therefore the patch for bug 1624724 never worked.
It's been awhile since I worked on this, but I'm pretty sure I manually verified the patch was working. Isn't browser-idle-startup-tasks-finished
triggered from https://searchfox.org/mozilla-central/rev/aa46c2dcccbc6fd4265edca05d3d00cccdfc97b9/mobile/android/chrome/geckoview/geckoview.js#792
Assignee | ||
Comment 14•3 years ago
|
||
(In reply to Brendan Dahl [:bdahl] from comment #13)
(In reply to Rob Wu [:robwu] from comment #12)
... therefore the patch for bug 1624724 never worked.
It's been awhile since I worked on this, but I'm pretty sure I manually verified the patch was working. Isn't
browser-idle-startup-tasks-finished
triggered from https://searchfox.org/mozilla-central/rev/aa46c2dcccbc6fd4265edca05d3d00cccdfc97b9/mobile/android/chrome/geckoview/geckoview.js#792
How did you manually verify the patch?
That line indeed notifies any listening observers, and I confirmed that the call appears in the profiler from comment 11 (and also a local profile after waiting for 5 minutes; the notification was triggered after 2 seconds).
The file is however not removed (see second part of comment 12), and when I look at the implementation of nsAppStartup::TrackStartupCrashEnd
the file should have been removed along with recording a profiler marker.
In the last part of comment 12 I explained why I think that the observer listener is not registered (hence not triggered).
Comment 15•3 years ago
|
||
Here is a profile collected from a local Fenix build where the startup cache isn't discarded at every GeckoView startup (as a quick test I did just add an explicit call to Services.startup.trackStartupCrashEnd
from here in geckoview.js):
the profile data collected on the second startup is definitely quicker (and looking closely I can see that in this profile the API schema data has been loaded from the startupCache as expected).
It would be nice if we could add an automated test case that verifies that the startupCache is retained when is expected to along with the fix, I'm not sure if we can run marionette (or marionette-like) tests on the GeckoView example app that is part of mozilla-central, but that could be something to also double-check (either as part of fixing this issue, or as a follow up).
Comment 16•3 years ago
|
||
Note that because the gecko profiler is a sampling profiler, just because a function doesn't appear in the profile it doesn't mean it wasn't called. You need a profile marker if you want to be certain of that.
Assignee | ||
Comment 17•3 years ago
|
||
(In reply to Kris Maglione [:kmag] from comment #16)
Note that because the gecko profiler is a sampling profiler, just because a function doesn't appear in the profile it doesn't mean it wasn't called. You need a profile marker if you want to be certain of that.
Thanks for emphasizing that. I did rely on markers for the analysis:
(In reply to Rob Wu [:robwu] from comment #12)
In the profile (e.g. https://share.firefox.dev/3hb01PX from comment 11), you can try to search for "crash" in the Marker Table.
startupCrashDetectionBegin
(fromTrackStartupCrashBegin
) shows up, butstartupCrashDetectionEnd
(fromTrackStartupCrashEnd
does not. This is despite the fact that the notification itself is triggered by geckoview.js (seen in the Marker Table asNotifyObservers - browser-idle-startup-tasks-finished
).
(In reply to Luca Greco [:rpl] [:luca] [:lgreco] from comment #15)
It would be nice if we could add an automated test case that verifies that the startupCache is retained when is expected to along with the fix, I'm not sure if we can run marionette (or marionette-like) tests on the GeckoView example app that is part of mozilla-central, but that could be something to also double-check (either as part of fixing this issue, or as a follow up).
I'll attach a patch that confirms the bug, and verifies the fix.
Assignee | ||
Comment 18•3 years ago
|
||
See https://bugzilla.mozilla.org/show_bug.cgi?id=1728461#c12 for
the analysis of the problem.
Assignee | ||
Updated•3 years ago
|
Comment 19•3 years ago
|
||
Comment 20•3 years ago
|
||
bugherder |
Reporter | ||
Comment 21•3 years ago
|
||
Perf analysis: I ran our cold view (app link) time to nav start start up test, modifying it to sleep 60 seconds after the "warm up" run before beginning the test. From these nightlies:
- 9/15: 2441ms
- 9/20 (5am): 1933ms
- delta: 508ms
So it looks like it worked! :) To be fair, there may have been improvements other than the start up cache that happened in this time frame. Here are profiles:
- 9/15 nightly: https://share.firefox.dev/2XIc9l5
- 9/20 nightly (9/18 gecko), 5-10s sleep before profile: https://share.firefox.dev/3CzkEhf
- 9/20 nightly (9/18 gecko), 60+s sleep before profile: https://share.firefox.dev/2VUt7vB
We still see some manifest work being sampled but it's not as long. Thank you for identifying this and addressing it!
Comment 22•3 years ago
|
||
Thanks Michael for linking the new profiles.
As a side note: the 3 new profiles linked from comment 21 do now include the "ExtensionSchemas" profiler marker added from Bug 1728245,
that marker may be a useful one to look at in the future startup perf analysis (especially if we suspect there may be a regression due to impact of extensions during the startup path):
-
the "ExtensionSchemas" markers from profile 9/15 nightly shows that the Schemas.load was loading all the data from file (and as an additional side effect of that the "TelemetryStopwatch" marker for WEBEXT_EXTENSION_STARTUP_MS shows that starting all the builtin addons did take longer than the other two profiles, as expected due to the additional work needed to load all the schema data from files)
-
the "ExtensionSchemas" markers from both the profiles 9/20 nightly shows that the Schemas.load was loading all the data from file (and as an additional side effect the "TelemetryStopwatch" marker for WEBEXT_EXTENSION_STARTUP_MS shows that starting all the builtin addons has been faster)
Description
•