Crash in [@ shutdownhang | mozilla::ipc::GeckoChildProcessHost::WaitUntilConnected]
Categories
(Core Graveyard :: Plug-ins, defect, P2)
Tracking
(firefox-esr60 unaffected, firefox-esr68 fixed, firefox67 wontfix, firefox68+ wontfix, firefox69+ fixed, firefox70+ fixed)
People
(Reporter: skywalker333, Assigned: baku)
References
Details
(4 keywords)
Crash Data
Attachments
(1 file)
47 bytes,
text/x-phabricator-request
|
RyanVM
:
approval-mozilla-beta+
RyanVM
:
approval-mozilla-esr68+
|
Details | Review |
This bug is for crash report bp-fb87e717-efa3-44c1-961c-8cb480190402.
Top 10 frames of crashing thread:
0 ntdll.dll NtWaitForKeyedEvent
1 ntdll.dll RtlSleepConditionVariableSRW
2 kernel32.dll SleepConditionVariableSRW
3 mozglue.dll mozilla::detail::ConditionVariableImpl::wait_for mozglue/misc/ConditionVariable_windows.cpp:78
4 xul.dll mozilla::CondVar::Wait xpcom/threads/CondVar.h:70
5 xul.dll mozilla::ipc::GeckoChildProcessHost::WaitUntilConnected ipc/glue/GeckoChildProcessHost.cpp:390
6 xul.dll mozilla::plugins::PluginProcessParent::WaitUntilConnected dom/plugins/ipc/PluginProcessParent.cpp:129
7 xul.dll mozilla::plugins::PluginModuleChromeParent::LoadModule dom/plugins/ipc/PluginModuleParent.cpp:487
8 xul.dll GetNewPluginLibrary dom/plugins/base/nsNPAPIPlugin.cpp:214
9 xul.dll nsNPAPIPlugin::CreatePlugin dom/plugins/base/nsNPAPIPlugin.cpp:230
Updated•5 years ago
|
Updated•5 years ago
|
Comment 1•5 years ago
|
||
The priority flag is not set for this bug.
:jimm, could you have a look please?
Updated•5 years ago
|
Comment 2•5 years ago
|
||
sorry for hijacking the bug - this signature is regressing in volume in firefox 68 (since nightly build 20190408193006) and is currently accounting for 1.5% of browser crashes in 68.0b; i'm going to mark it accordingly.
Comment 4•5 years ago
|
||
Seems plugins are having issues getting launched.
Jed, could you check to see if this correlates with any of the async or other ipc process work? This could also be other things, like a flash change.
Comment 5•5 years ago
•
|
||
Using the crash graph widget to filter by release channel, we're looking for something that landed on Nightly around 2019-04-08 and merged into Beta with 68, and would affect NPAPI plugins. Bug 1541130 would qualify; it's also the last thing that landed in ipc/
before build 20190409095652, which is the first Nightly build where we see this. But it's a relatively simple refactoring, so I don't see how it would have caused this. Also,
The other thing is: the IPC I/O thread and IPC Launch thread are idle, which means that the process has been launched and we're waiting for it to connect to the named pipe (this seems to be Windows-only) for the initial channel. If it freezes or crashes early in startup, we'll wait for the entire timeout.
Looking at other “core” modules, bug 1541792 also landed at that time and made nontrivial changes to XPCOM component registration. It's also the only other thing I can see that's in build 20190409095652
but not its predecessor 20190408193006
(in Hg, 98b223de0543 % 395a65d512e3
) that doesn't look incapable of causing this kind of problem.
However: the plugin launch timeout defaults to 45 seconds, but the shutdown hang timeout defaults to 60 seconds. So this shouldn't be causing a shutdown hang in any case… but it is.
Reporter | ||
Comment 6•5 years ago
|
||
Top Crashers for Firefox 69.0a1 (Nightly) - 7 days ago
#5 1.05% 0.43% shutdownhang | mozilla::ipc::GeckoChildProcessHost::WaitUntilConnected 72 72 0 0 69 0 2017-09-27
Signature report for shutdownhang | mozilla::ipc::GeckoChildProcessHost::WaitUntilConnected
Showing results from 7 days ago
Operating System
Windows 10 474 82.0%
Windows 7 58 10.0%
Windows 8.1 29 5.0%
Windows Vista 15 2.6%
Windows 8 2 0.3%
Product
Firefox 69.0a1 72 12.5% 66
Firefox 68.0b9 32 5.5% 37
Firefox 68.0b8 147 25.4% 99
Firefox 68.0b7 119 20.6% 99
Firefox 68.0b6 29 5.0% 18
Firefox 68.0b5 12 2.1% 14
Firefox 68.0b4 3 0.5% 3
Firefox 68.0b3 1 0.2% 1
Firefox 68.0b1 1 0.2% 1
Firefox 67.0b3 1 0.2% 1
Firefox 67.0.2 7 1.2% 8
Firefox 67.0.1 27 4.7% 38
Firefox 67.0 8 1.4% 6
Firefox 66.0.5 6 1.0% 6
Firefox 66.0.2 2 0.3% 2
Firefox 62.0.2 1 0.2% 1
Firefox 62.0 1 0.2% 1
Firefox 60.7.0esr 22 3.8% 16
Firefox 60.6.3esr 7 1.2% 8
Firefox 60.6.2esr 1 0.2% 1
Firefox 60.6.1esr 2 0.3% 2
Firefox 60.5.2esr 48 8.3% 46
Firefox 60.4.0esr 2 0.3% 2
Firefox 60.2.2esr 1 0.2% 1
Firefox 60.2.0esr 1 0.2% 1
Firefox 60.0 1 0.2% 1
Firefox 58.0.2 1 0.2% 1
Firefox 52.9.0esr 18 3.1% 15
Firefox 52.6.0esr 1 0.2% 1
Firefox 52.5.0esr 1 0.2% 1
Firefox 52.3.0esr 3 0.5% 3
Architecture
amd64 399 69.0%
x86 179 31.0%
Updated•5 years ago
|
Comment 7•5 years ago
|
||
Hrm, I got confused. Per comment 2 and following this is tracking a regression in 68.
Updated•5 years ago
|
Comment 8•5 years ago
|
||
Discussed during triage. We are averaging around 200 crashes per beta build (and this is barely visible on 67 release) - Jed is there anything else we can do here to investigate or improve the crash situation before we ship 68?
Comment 9•5 years ago
|
||
I can't see anything useful that I didn't already mention in comment #5. There's something going on here that's specific to NPAPI plugins, though, so maybe :jimm can take another look.
Updated•5 years ago
|
Comment 11•5 years ago
|
||
Comment 12•5 years ago
|
||
Comment 13•5 years ago
|
||
Comment 14•5 years ago
|
||
Still digging, but this might be a flash bug. They released a new rev on 4/9/2019.
Comment 15•5 years ago
|
||
(In reply to Jim Mathies [:jimm] from comment #14)
Still digging, but this might be a flash bug. They released a new rev on
4/9/2019.
Hmm, so may be flash related since the start of the crash ties back to a Flash release on 4/9/2019 for 32.0.0.172. However it doesn't impact 68, and impacts 69, so maybe some sort of incompatibility with changes we made. Which means we should be able to track this down on our end if we can figure out how.
Comment 16•5 years ago
|
||
FYI to Chris on this, I wonder if Adobe might have some input.
Comment 17•5 years ago
|
||
(In reply to Jim Mathies [:jimm] from comment #15)
(In reply to Jim Mathies [:jimm] from comment #14)
Still digging, but this might be a flash bug. They released a new rev on
4/9/2019.Hmm, so may be flash related since the start of the crash ties back to a
Flash release on 4/9/2019 for 32.0.0.172. However it doesn't impact 68, and
impacts 69, so maybe some sort of incompatibility with changes we made.
Which means we should be able to track this down on our end if we can figure
out how.
correction, Impacts 68, and not 67.
Comment 18•5 years ago
|
||
Hi Jeromie, we see a recent increase in Flash shutdown hangs on Windows. Firefox is waiting for the Flash plugin process to shut down and gives up (force crash) after a minute. The shutdown hangs might be related to some change in Firefox 68 (which will be released on 2019-07-09 next week).
Can an Adobe engineer try testing with Firefox 68 (Beta) for any shutdown problems apparent from the Flash plugin process's side? Thanks. Unfortunately, we don't have steps to reproduce the shutdown hangs, just the crash reports.
Comment 19•5 years ago
|
||
This is Adobe's annual shutdown week. We won't be back in the office until Monday.
What you're describing sounds like the majority of the latent crashes that we see in Firefox. Without minidumps, I don't hold put a lot of hope for resolving this. We're always happy to arrange to send someone to your office to collaboratively debug, and IMHO, that's the only good path forward. We're also always happy to share symbols (and potentially relevant source) under the existing NDA, etc.
Firefox is a first class citizen in our testing, so it's already getting the full battery of tests. These issues definitely persist, but infrequently that they're rare in lab conditions. That's always been one of the gating factors for progress on this front, and was one of the big initial drivers for establishing a public beta program years ago.
Comment 20•5 years ago
|
||
(In reply to Jeromie Clark from comment #19)
What you're describing sounds like the majority of the latent crashes that we see in Firefox. Without minidumps, I don't hold put a lot of hope for resolving this. We're always happy to arrange to send someone to your office to collaboratively debug, and IMHO, that's the only good path forward. We're also always happy to share symbols (and potentially relevant source) under the existing NDA, etc.
Thanks. Unfortunately, we don't have steps to reproduce. We just have Firefox hang (crash) reports with a stack trace pointing to the NPAPI loading code in Firefox.
Without minidumps, I don't hold put a lot of hope for resolving this.
@ Jim:
-
Can we get minidumps from the Socorro crash reports? Would they include the plugin processes?
-
Also, this bug is labeled a "shutdownhang", but the stack trace in comment 0 points to
nsNPAPIPlugin::CreatePlugin
. So the hang appears to be happening during plugin instantiation, not shutdown?
Comment 21•5 years ago
|
||
(In reply to Chris Peterson [:cpeterson] from comment #20)
(In reply to Jeromie Clark from comment #19)
What you're describing sounds like the majority of the latent crashes that we see in Firefox. Without minidumps, I don't hold put a lot of hope for resolving this. We're always happy to arrange to send someone to your office to collaboratively debug, and IMHO, that's the only good path forward. We're also always happy to share symbols (and potentially relevant source) under the existing NDA, etc.
Thanks. Unfortunately, we don't have steps to reproduce. We just have
Firefox hang (crash) reports with a stack trace pointing to the NPAPI
loading code in Firefox.Without minidumps, I don't hold put a lot of hope for resolving this.
@ Jim:
- Can we get minidumps from the Socorro crash reports? Would they include
the plugin processes?
I'll look, but I doubt it since it's a browser side hang.
- Also, this bug is labeled a "shutdownhang", but the stack trace in
comment 0 points tonsNPAPIPlugin::CreatePlugin
. So the hang appears to be
happening during plugin instantiation, not shutdown?
The plugin fails to connect, then some time later the browser shuts down and we get a hang on the plugin startup.
Comment 22•5 years ago
|
||
Cristian - Since this is fairly prevalent in 69 beta, can you or someone in SV try to reproduce this issue using Win 10 - (84.93% in signature vs 30.76% overall) platform_pretty_version = Windows 10.
Comment 23•5 years ago
|
||
Hello Marcia,
I tried several builds (Fx 70.0a1, 69.0a1, 69.0b3, 68.0a1 and 68.0b12) with the links and details from the comment section, but to no avail.
Some of the crashes have been reported on about:home/on stand-by, and others while trying to shut down the PC, none of the options reproduced this crash on my station.
Comment 24•5 years ago
|
||
After the OOM signature, this is the top overall crash on 68 release.
Comment 25•5 years ago
|
||
Tracking for 68 since this is such a high volume crash.
Comment 26•5 years ago
|
||
Since this is a hang in the plugin process, we don't have plugin version or a dump from the pluign process. Gabriel, wondering if you might have some ideas on how we could pull more information here, maybe through changes to the way we catch this.
Comment 27•5 years ago
|
||
67 is impacted but to a lesser extent. Really feels like something we landed since we didn't see a jump in 67 (hence, not a flash issue).
Comment 28•5 years ago
•
|
||
could bug 1532948 play a part here - it landed in the timeframe where the volume of this began to rise in nightly and the patch would tick the boxes shutdown and plugins?
Comment 29•5 years ago
|
||
(In reply to [:philipp] from comment #28)
could bug 1532948 play a part here - it landed in the timeframe where the volume of this began to rise in nightly and the patch would tick the boxes shutdown and plugins?
Oh that's interesting. Clearing plugin data maybe but the plugin process isn't running, so we try to fire it up in shutdown?
Updated•5 years ago
|
Comment 30•5 years ago
|
||
Almost 10K in crashes on 68 release and this bug is very evident in 69 beta so far. Since we are already at 69.0b6, can we get answers to the needinfos? Thanks in advance.
Updated•5 years ago
|
Comment 31•5 years ago
|
||
I've glanced at some of the minidumps and there's an important bit that must be taken into account: we're crashing in a very specific spot of the shutdown terminator that gets hit only if shutdown commenced but never made any progress.
While it's true that the main thread is stuck waiting for the Flash plugin to come up in nsNPAPIPlugin::CreatePlugin()
what's a few frames up in the stack is more interesting. We're attempting to launch the Flash plugin in nsPluginHost::ClearSiteData()
. To my knowledge that method is only called from ClearDataService.jsm
as part of the cleaner service.
So what seems to be happening is that we're trying to clear plugin data during shutdown (maybe in response to shutdown?) and to do so on the Flash plugin we need it to start in the first place. But it's not starting or it's taking too long. Since we're waiting for it to start on the main thread then shutdown cannot proceed (at all) and the terminator kicks in after a minute and kills Firefox.
ClearDataService.jsm
contains a comment that's rather illuminating WRT this issue:
As evidenced in bug 1253204, clearing plugin data can sometimes be
very, very long, for mysterious reasons. Unfortunately, this is not
something actionable by Mozilla, so crashing here serves no purpose.
For this reason, instead of waiting for sanitization to always
complete, we introduce a soft timeout. Once this timeout has
elapsed, we proceed with the shutdown of Firefox.
So it seems we were aware of this problem and we put a workaround in place but either it's not working or the assumptions it was making are not valid anymore.
Assignee | ||
Comment 32•5 years ago
|
||
I'm not familiar with the nsIPluginHost interface. Does it make sense to clean up data only if the plugin is active?
Or maybe just delete data if nsIPluginHost::siteHasData() returns true is enough.
Comment 33•5 years ago
|
||
I don't know enough about how plugins work to be sure of that. We can certainly try and deploy a fix that only cleans the data if the plugin is running and see if it helps. After all, if we crash we're not cleaning the data either.
Assignee | ||
Comment 34•5 years ago
|
||
Comment 35•5 years ago
|
||
While reviewing your patch I noticed that nsIPluginHost::SiteHasData()
also tries to load the plugin if it's not running yet. So it's likely to get stuck in the same place during shutdown.
Would it be A very bad thing™ to clear the data only if the plugin has already been loaded during shutdown? I also wonder if this is only happening during shutdown or if users are also getting freezes at runtime.
Comment 36•5 years ago
|
||
Pushed by amarchesini@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/225c129a3f3e Delete plugin-data only for the sites with data,r=gsvelto
Comment 37•5 years ago
|
||
bugherder |
Updated•5 years ago
|
Comment 38•5 years ago
|
||
the patch seems to have worked well so far on nightly. could you please request an uplift to beta if it applies there as well and you deem fit to do so?
thank you
Assignee | ||
Comment 39•5 years ago
|
||
Comment on attachment 9083030 [details]
Bug 1541344 - Delete plugin-data only for the sites with data,r?gsvelto
Beta/Release Uplift Approval Request
- User impact if declined: A crash can occur during the shutdown, when cleaning plugin data if the plugin process has not been executed yet.
- Is this code covered by automated tests?: No
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): This patch checks if the plugin has been activated before starting the data cleanup. The active/inactive state is a boolean check, added in ClearDataService.jsm. Low risk.
- String changes made/needed:
Comment 40•5 years ago
|
||
Comment on attachment 9083030 [details]
Bug 1541344 - Delete plugin-data only for the sites with data,r?gsvelto
Will hopefully help with one of our top crashes at the moment. Approved for 69.0b15.
Comment 41•5 years ago
|
||
bugherder uplift |
Updated•5 years ago
|
Comment 42•5 years ago
|
||
This is looking good on Nightly and Beta so far, should we uplift this to ESR68 also?
Assignee | ||
Comment 43•5 years ago
|
||
Comment on attachment 9083030 [details]
Bug 1541344 - Delete plugin-data only for the sites with data,r?gsvelto
ESR Uplift Approval Request
- If this is not a sec:{high,crit} bug, please state case for ESR consideration: This patch fixes a major crash in nightly/beta, just avoiding the cleanup of data for unloaded plugins.
- User impact if declined:
- Fix Landed on Version: 70
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): See the previous beta-approval comments
- String or UUID changes made by this patch: none
Comment 44•5 years ago
|
||
Comment on attachment 9083030 [details]
Bug 1541344 - Delete plugin-data only for the sites with data,r?gsvelto
Fixes a frequent shutdown crash. Approved for 68.1esr.
Comment 45•5 years ago
|
||
bugherder uplift |
Updated•2 years ago
|
Description
•