Open Bug 1778472 Opened 2 years ago Updated 3 months ago

Firefox startup extremely slow after reboot (24 seconds wait in Web Extensions process from mozilla::dom::LSObject::EnsureDatabase)

Categories

(Core :: Storage: Quota Manager, defect, P2)

defect

Tracking

()

Performance Impact medium

People

(Reporter: emilio, Unassigned)

References

(Depends on 2 open bugs)

Details

(4 keywords)

+++ This bug was initially created as a clone of Bug #1778374 +++

See bug 1778374 comment 9 for the relevant profile. It seems "Notes by Firefox" add-on is trying to use local-storage and that hangs for ~24 seconds, which is fairly bad.

https://share.firefox.dev/3NKmDnT is the profile, for reference.

Jan, I assume this would benefit from bug 1671932, too?

Flags: needinfo?(jvarga)

(In reply to Jens Stutte [:jstutte] from comment #2)

Jan, I assume this would benefit from bug 1671932, too?

Yes, and also bug 1778496 which a just filed.

Flags: needinfo?(jvarga)
Depends on: 1671932, 1778496
Performance Impact: --- → ?
Keywords: perf:startup

jrmuizel and I took a look at the profile in comment 1, and it seems like the background thread in the parent process is busy and that's what the WebExtension process is waiting on a response from for localStorage initialization.

It'd be super helpful if we could get the background thread sampled as well.

Can you give me a hint how to create a profile containing the information you need?

(In reply to Christoph Rabel from comment #5)

Can you give me a hint how to create a profile containing the information you need?

In addition to the MOZ_PROFILER_STARTUP environment variable, you also need to set the MOZ_PROFILER_STARTUP_FILTERS environment variable.

For example, you could set it to: MOZ_PROFILER_STARTUP_FILTERS="GeckoMain,Compositor,Renderer,Worker,Background,Quota" (Mike asked for a 'background' thread; I'm personally interested in a thread named "QuotaManager IO" so I added 'Quota' to the list).

If you think your machine is fast enough to handle it, you can also try profiling all threads, with MOZ_PROFILER_STARTUP_FILTERS="*", but this may slow things down.

I tried with * but I could not upload it (too large). So, here's the profile with the listed filters:
https://share.firefox.dev/3ImKDfZ

Thanks, the activity is on the QuotaManager IO thread: https://share.firefox.dev/3InF6pg

Mike, do you want to have a look at this new profile?

Flags: needinfo?(mconley)

You're right - the QuotaManager IO thread is much more interesting.

https://share.firefox.dev/3uMRgT7

Seems like we're doing a lot of work to "ensure temporary storage is initialized", within mozilla::dom::quota::QuotaManager::InitializeOrigin. Moving this bug to the Storage Quota component.

Component: Storage: localStorage & sessionStorage → Storage: Quota Manager
Flags: needinfo?(mconley)

The profile seems to confirm that from second 8 to 27 most of the time is spent inside mozilla::dom::quota::QuotaManager::EnsureTemporaryStorageIsInitialized. In this profile, the most impacting initialization is the mozilla::dom::cache::CacheQuotaClient::InitOrigin, but that might just be a case.

We are prioritizing work on bug 1671932 and bug 1778496 for this half.

Severity: -- → S3
Priority: -- → P2

Solving this problem without a major overhaul of the quota counting mechanism is not straightforward (quote 1563023:origin dirs that were touched in the previous FF session need to be scanned despite clean shutdown) but maybe at least restore telemetry like QM_REPOSITORIES_INITIALIZATION_TIME. Especially since the time dom.quotaManager.checkQuotaInfoLoadTime is counted anyway. It would help to investigate the prevalence of the problem because since 2019, sites storing thousands of files in the dom cache have become more common. Even without telemetry just based on the archiving mechanism using SSD and so in my case at least twice the QuotaInfoLoadTime was greater than 21 s. This points to a potentially serious problem in Firefox startup latency even ignoring HDD users.

Performance Impact: ? → P1

The Performance Priority Calculator has determined this bug's performance priority to be P2. If you'd like to request re-triage, you can reset the Performance flag to "?" or needinfo the triage sheriff.

Impact on browser UI: Causes noticeable startup delay
Impact on site: Renders site effectively unusable
Page load impact: Severe
[x] Able to reproduce locally

Performance Impact: P1 → P2

We got an internal repro of this case with >60s Firefox startup times.

This should probably be S2? ("Bugs that could reasonably be expected to cause a Firefox user to switch browsers")

Flags: needinfo?(jvarga)

(In reply to Gian-Carlo Pascutto [:gcp] from comment #13)

We got an internal repro of this case with >60s Firefox startup times.

This should probably be S2? ("Bugs that could reasonably be expected to cause a Firefox user to switch browsers")

The bug 1671932 this depends on is already S2. I do not think we would be able to give this more priority than we already do (that is: very high) by raising it also here.

Note also that this does not happen on every restart, most of the times we just read cached information which is much faster.

(In reply to Jens Stutte [:jstutte] from comment #14)

(In reply to Gian-Carlo Pascutto [:gcp] from comment #13)

We got an internal repro of this case with >60s Firefox startup times.

This should probably be S2? ("Bugs that could reasonably be expected to cause a Firefox user to switch browsers")

The bug 1671932 this depends on is already S2. I do not think we would be able to give this more priority than we already do (that is: very high) by raising it also here.

Note also that this does not happen on every restart, most of the times we just read cached information which is much faster.

Firefox at each restart has a list of pages that were visited in the previous browsing session and for these QM fully performs a folder scan operation at browser startup. If among these pages there is one that has, for example, several thousand files in the morgue folder (e.g. web.spotify.com in my case has more than 19 thousand), then even on an SSD the waiting time for the simplest page to load after the browser starts increases by about 5 seconds multiplied by the number of such pages. With a number of such pages, which have a lot of files in morgue and are used by the user in each session, the mentioned cache mechanism is therefore of little use, because QM initialisation even on the SSD approaches and exceeds the limit of dom.quotaManager.longQuotaInfoLoadTimeThresholdMs several times a month. And on HDD it is safe to assume that this limit is always exceeded in such cases.

(In reply to Jens Stutte [:jstutte] from comment #14)

Note also that this does not happen on every restart, most of the times we just read cached information which is much faster.

Well, it hits me on every reboot. Restarting does not matter, it's fast. But when I reboot, I have to wait. No idea, why that would invalidate the cache for me, but it does. I didn't measure it for a while, but I have to wait quite some time. The profile is located on HDD.

I saw that people work hard on bug 1671932, so, thanks to all those developers, I hope you are successful soon.

(In reply to Gian-Carlo Pascutto [:gcp] from comment #13)

We got an internal repro of this case with >60s Firefox startup times.

This should probably be S2? ("Bugs that could reasonably be expected to cause a Firefox user to switch browsers")

The regression has been fixed in bug 1867095 and we also fixed 1866240 which should help improve startup performance when users have pinned sites like Slack or other similar sites with intensive use of Cache API.

Flags: needinfo?(jvarga)

I believe I've been having the same issue or a closely related one for a while now.

With my profile and all extensions disabled, after dropping VM caches (with echo 3 > /proc/sys/vm/drop_caches) and SIGKILL-ing Firefox, LocalStorage-using websites take ~8 minutes before they start loading, while other websites load within a minute. Startup profile: https://share.Firefox.dev/3uswRWx. (With extensions, all websites load after ~8.5 minutes; https://share.Firefox.dev/3HSkFkP)

Same setup, just quitting Firefox from the menu instead of a SIGKILL, yet still dropping VM caches, all websites load within ~1 minute. Startup profile: https://share.Firefox.dev/3usm2DO. (With extensions, all websites load within ~1.2 minutes; https://share.Firefox.dev/3upkDht)

The profile's storage/default folder is relatively large and located on a HDD. It currently has 4k folders that take ~10 seconds to ls uncached, and a tar -c . > /dev/null takes ~110 seconds. Exploring it with Filelight reveals a total of ~1.2GB spread over 66k files, of which three files are ~60MB and the rest are tiny.

I experimented with keeping the VM caches, which brought startup down to ~3 minutes (https://share.firefox.dev/49vn1BU); and with moving storage/default to tmpfs, which had the webpages loading in under a minute. Since the 8 minutes are a lot more than the 2 minutes tar needs to read the files, and since Firefox doesn't seemed slowed down by any heavy processing in the tmpfs case, I'm guessing it has something to do with fsync latency, and indeed, that 3-minute runs spends a good 90% in fsync calls.

Either way, I hope this data helps with debugging the issue somehow.. if at all. Apologies for the long write-up, I only noticed the issue's being worked on after I had already collected the data. (Tho, if anyone directs me to the relevant parts of the code, I would be happy to poke around and see what I can do XD)

You need to log in before you can comment on or make changes to this bug.