Stuck WDBA holds profile lock, delays startup by 5s
Categories
(Firefox :: Installer, defect, P3)
Tracking
()
Performance Impact | low |
People
(Reporter: gcp, Unassigned)
References
Details
(Keywords: perf:startup, Whiteboard: [fidedi])
Attachments
(9 files)
I have a Firefox 132.0.2 release install on an AMD 5800X, Windows 10, 32G RAM RTX3050, SSD with plenty of space. There are no addons installed, browser is up to date.
Launching Firefox from the taskbar takes over 5 seconds.
This seems extremely slow: on the same machine Chrome launches in much less than a second. I checked and even Firefox Nightly launches instantly on the same machine.
Reporter | ||
Updated•3 months ago
|
Reporter | ||
Updated•3 months ago
|
Reporter | ||
Comment 1•3 months ago
|
||
Comment 2•3 months ago
|
||
Can you capture a startup profile as outlined here: https://profiler.firefox.com/docs/#/./guide-startup-shutdown
Reporter | ||
Comment 3•3 months ago
|
||
Reporter | ||
Comment 4•3 months ago
|
||
This seems to be
PR_Sleep(unsigned int) [nsprpub/pr/src/threads/prcthr.c]
nsRemoteService::LockStartup() [toolkit/components/remote/nsRemoteService.cpp]
XREMain::XRE_mainStartup(bool*) [toolkit/xre/nsAppRunner.cpp]
We did related changes in 133 (https://bugzilla.mozilla.org/show_bug.cgi?id=1915216), but not 132, confusingly?
Comment 5•3 months ago
|
||
(In reply to Gian-Carlo Pascutto [:gcp] from comment #4)
This seems to be
PR_Sleep(unsigned int) [nsprpub/pr/src/threads/prcthr.c]
nsRemoteService::LockStartup() [toolkit/components/remote/nsRemoteService.cpp]
XREMain::XRE_mainStartup(bool*) [toolkit/xre/nsAppRunner.cpp]We did related changes in 133 (https://bugzilla.mozilla.org/show_bug.cgi?id=1915216), but not 132, confusingly?
Even those changes shouldn't have really changed the way we do things on startup. The question is why the attempt to acquire a startup lock is failing. Is there anything interesting about the filesystem your temporary directory is stored on?
Reporter | ||
Comment 6•3 months ago
|
||
It's a normal Windows install, so NTFS. As said, Nightly on the same machine launches instantly.
Comment 7•3 months ago
|
||
Can you try running Process Monitor and see what Firefox is doing during those 5 seconds?
Comment 8•3 months ago
|
||
Bug 1933261 sounds somewhat similar.
Reporter | ||
Comment 9•3 months ago
•
|
||
If I drag the "nothing happens" period in the profiler, it looks to me like it's exactly 5 seconds, which is https://searchfox.org/mozilla-central/source/toolkit/components/remote/nsRemoteService.cpp#35
So I don't think this is computer performance related. That lock isn't getting cleaned up or something?
Reporter | ||
Comment 10•3 months ago
|
||
Task manager indicates there are 3 Firefox processes hanging around. They don't stop a new Firefox from launching, but I'm guessing they are stopping it from getting that lock. So that are likely at least 2 bugs here: a sort of shutdown hang where we fail to terminate the process, and old processes affecting this lock. but not the actual Firefox.
Comment 11•3 months ago
|
||
(In reply to Gian-Carlo Pascutto [:gcp] from comment #10)
Task manager indicates there are 3 Firefox processes hanging around. They don't stop a new Firefox from launching, but I'm guessing they are stopping it from getting that lock. So that are likely at least 2 bugs here: a sort of shutdown hang where we fail to terminate the process, and old processes affecting this lock. but not the actual Firefox.
We only hold this lock during startup so old processes that haven't shutdown shouldn't impact it. Process monitor could at least tell us what Windows is returning from the CreateFileW call.
Reporter | ||
Comment 12•3 months ago
|
||
Reporter | ||
Comment 13•3 months ago
•
|
||
I investigated a bit more: two of the Firefox.exe processes are the background update agent, so I suppose that's normal? The other is a content process that seems stuck in a Windows DLL init (interestingly, this is an issue we've seen before in telemetry/crashes). I got WinDbg set up here and got a stack for that.
However, killing the stuck content process didn't resolve the issue, and process monitor says Firefox still hangs on the parent.lock on startup. So the background update agent is what is not freeing the lock correctly?
Reporter | ||
Comment 14•3 months ago
|
||
Reporter | ||
Comment 15•3 months ago
|
||
Reporter | ||
Comment 16•3 months ago
|
||
Comment 17•3 months ago
|
||
(In reply to Gian-Carlo Pascutto [:gcp] from comment #13)
I investigated a bit more: two of the Firefox.exe processes are the background update agent, so I suppose that's normal?
I don't think that is normal. Nick?
Comment 18•3 months ago
|
||
(In reply to Dave Townsend [:mossop] from comment #17)
(In reply to Gian-Carlo Pascutto [:gcp] from comment #13)
I investigated a bit more: two of the Firefox.exe processes are the background update agent, so I suppose that's normal?
I don't think that is normal. Nick?
Generally, that's not normal; I'd love to get some state on those tasks. There is a window in which background update tasks take the profile lock on the default profile, but it should be short (absent bad things happening) -- see https://searchfox.org/mozilla-central/rev/dfaf02d68a7cb018b6cad7e189f450352e2cde04/toolkit/mozapps/update/BackgroundTask_backgroundupdate.sys.mjs#253-281.
Oh -- looking at those tasks, I see the defaultagent
-- i.e., the WDBA -- and not the background update task. That's possibly even stranger, because the default agent doesn't interact with browsing profiles at all. And it really should never be running twice concurrently.
nrishel: any thoughts?
Comment 19•3 months ago
|
||
Is it possible to reproduce the apparent(?) default agent hang by manually re-running it in the Task Scheduler?
At a glance I would be surprised if the default agent-specific code is the issue here, but if there's some startup resource that isn't being freed then perhaps there's an issue with it's profile? I'm not intimately familiar with those systems so this is just supposition.
Reporter | ||
Comment 20•3 months ago
|
||
This is a screenshot of task manager. This machine has Nightly and Release installed. Are there entries here that are not supposed to be there?
Reporter | ||
Updated•3 months ago
|
Reporter | ||
Comment 21•3 months ago
|
||
If I interpret this correctly, this WBDA task invokes another firefox.exe with the same arguments (is this the launcher process or smth), which then somewhere along the way spawns a socket process that hangs. This hung socket process is never terminated and holds the profile lock alive? Is this a reasonable theory?
Reporter | ||
Comment 22•3 months ago
•
|
||
Reporter | ||
Comment 23•3 months ago
|
||
Stack of the second firefox.exe in the call tree.
Reporter | ||
Comment 24•3 months ago
|
||
spawns a socket process that hangs
Or maybe it's the main process that is just not doing anything, and the socket process is just something that accidentally gets spawned by default on startup?
Comment 25•3 months ago
|
||
(In reply to Gian-Carlo Pascutto [:gcp] from comment #23)
Created attachment 9440473 [details]
stackof2.pngStack of the second firefox.exe in the call tree.
That seems to be stuck trying to show a modal window stemming from here suggesting that there was some error reading the proxy autoconfig file. That is indeed in the block that the startup lock is held alive for so that is probably the culprit.
Reporter | ||
Comment 26•3 months ago
|
||
If I understand the code and bugs correctly, that code path shouldn't even be hit in release, only on ESR, which could explain why Firefox itself doesn't pop up the error. The warning in the sandboxed path points to a section of our website where I couldn't find the relevant explanation.
I'm not sure if the modal dialog is expected to work in case it's being invoked from Task Scheduler.
Is there some autoconfig file that is messing up things? I'll investigate this part.
Reporter | ||
Comment 27•3 months ago
|
||
I'm not sure what to look for. Network settings appear the same (use system proxy settings) between Nightly and the Release build.
Comment 28•3 months ago
|
||
(In reply to Gian-Carlo Pascutto [:gcp] from comment #26)
If I understand the code and bugs correctly, that code path shouldn't even be hit in release, only on ESR, which could explain why Firefox itself doesn't pop up the error. The warning in the sandboxed path points to a section of our website where I couldn't find the relevant explanation.
Looks like it would be hit on anything that isn't "release" or "beta" so that would mean nightly or esr at least.
(In reply to Gian-Carlo Pascutto [:gcp] from comment #27)
I'm not sure what to look for. Network settings appear the same (use system proxy settings) between Nightly and the Release build.
We only hit this codepath if there is a value for the preference general.config.filename
. But the default agent uses its own profile so presumably the pref would be set there. The background task profiles are stored in AppData\Roaming\Mozilla\Firefox\Background Tasks Profiles
.
Though I think it is also possible to force the use of an autoconfig file via an enterprise policy: https://mozilla.github.io/policy-templates/#proxy
Reporter | ||
Comment 29•3 months ago
|
||
Though I think it is also possible to force the use of an autoconfig file via an enterprise policy: https://mozilla.github.io/policy-templates/#proxy
Can I check this somehow, i.e. whether this is active? I assume I want to find this config file and understand why we're breaking on it (which then causes the hang because we can't pop up the error).
Reporter | ||
Comment 30•3 months ago
|
||
There's a Telemetry.FailedProfileLocks.txt
file in those profiles, but I guess that's not unexpected at this point.
Comment 31•3 months ago
|
||
(In reply to Gian-Carlo Pascutto [:gcp] from comment #29)
Though I think it is also possible to force the use of an autoconfig file via an enterprise policy: https://mozilla.github.io/policy-templates/#proxy
Can I check this somehow, i.e. whether this is active?
Mike can answer this better than me
Comment 32•2 months ago
|
||
You can go to about:policies and see if one is active.
Comment 33•1 months ago
|
||
The Performance Impact Calculator has determined this bug's performance impact to be low. If you'd like to request re-triage, you can reset the Performance Impact flag to "?" or needinfo the triage sheriff.
Impact on browser: Causes noticeable startup delay
Configuration: Rare
Moving to Fx:General, not sure which component to pick here.
Updated•1 month ago
|
Comment 34•1 month ago
|
||
The severity field is not set for this bug.
:mpohle, could you have a look please?
For more information, please visit BugBot documentation.
Comment 35•29 days ago
|
||
Hi Nicholas! With the mirrored preferences to the background task: Are these preferences stored in a file which could be used to debug mismatches in general.config.filename
(see comment 28) between Firefox and its background task?
Comment 36•9 days ago
•
|
||
Not likely. IIRC mirrored prefs for WDBA are a small set of telemetry-related prefs mirrored to the Windows registry.
Updated•7 days ago
|
Updated•7 days ago
|
Updated•7 days ago
|
Description
•