Crash in [@ shutdownhang | mozilla::DataStorage::WaitForReady | mozilla::DataStorage::GetAll]
Categories
(Core :: Security: PSM, defect, P1)
Tracking
Version | Tracking | Status
---|---|---|
firefox-esr68 | --- | unaffected |
firefox76 | --- | unaffected |
firefox77 | --- | unaffected |
firefox78 | blocking | verified disabled |
firefox79 | - | wontfix |
People
(Reporter: calixte, Unassigned)
References
(Regression)
Details
(Keywords: crash, regression, topcrash, Whiteboard: [psm-assigned])
Crash Data
Attachments
(2 files)
This bug is for crash report bp-11c5043b-7416-4bac-8a30-e6c9b0200601.
Top 10 frames of crashing thread:
0 ntdll.dll NtWaitForAlertByThreadId
1 ntdll.dll RtlSleepConditionVariableSRW
2 kernelbase.dll SleepConditionVariableSRW
3 mozglue.dll mozilla::detail::ConditionVariableImpl::wait mozglue/misc/ConditionVariable_windows.cpp:50
4 xul.dll mozilla::DataStorage::WaitForReady security/manager/ssl/DataStorage.cpp:734
5 xul.dll mozilla::DataStorage::GetAll security/manager/ssl/DataStorage.cpp:792
6 xul.dll static mozilla::DataStorage::GetAllChildProcessData security/manager/ssl/DataStorage.cpp:257
7 xul.dll mozilla::dom::ContentParent::InitInternal dom/ipc/ContentParent.cpp:2586
8 xul.dll mozilla::dom::ContentParent::LaunchSubprocessResolve dom/ipc/ContentParent.cpp:2298
9 xul.dll mozilla::dom::ContentParent::LaunchSubprocessAsync::<unnamed-tag>::operator const dom/ipc/ContentParent.cpp:2369
There are 30 crashes in nightly 78.
The moz_crash_reason is mainly MOZ_CRASH(Shutdown hanging before starting).
Comment 1•4 years ago (Reporter)
There is a spike in 20200530211958, the pushlog for this build is:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=8aaca63ec5c6&tochange=548ffce7ad57
Comment 2•4 years ago
Seeing a number of crashes in 78.0b1 now, which just started rolling out.
Comment 4•4 years ago
This is the #5 overall topcrash on Nightly also.
Comment 5•4 years ago
This patch is known to change timings, which can aggravate existing race conditions or expose existing bugs. (Of course it could have a bug of its own, but given the history of dealing with oranges to land it, I'd start with those possibilities.)
There appears to have been a steady low incidence of this before landing. I'll look at possible causes. My apologies for the late-ish response; I've been without power for 3 days and just got it back.
Comment 6•4 years ago
I'm wondering if we can back the change out of beta to buy some time?
Comment 8•4 years ago
We can back it out. Julien, do you want to do it or should I? I landed one quick followup patch as well.
Warning: kmag landed (I assume) a patch against this code. If it did land, the backout will require rebasing (or we back both out and re-land his pre-rebase patch that I r+'d).
Comment 9•4 years ago
If you can take care of it that'd be great. Thanks!
Comment 10•4 years ago
Comment 11•4 years ago
Backout landed on Beta for 78.0b5.
https://hg.mozilla.org/releases/mozilla-beta/rev/6aafb2261c55f300ba9289ad196d6d344686bd89
Comment 12•4 years ago
I suspect this may be due to us continuing to prestart processes during shutdown (until final-CC); the patch for bug 1642491 stops us from re-creating the Preallocator during shutdown once we destroy it. Moving back to clearing it on normal shutdown (instead of post-CC) may fix this.
This, however, would merely return us to the original low-intermittent state. I suspect this is fundamentally due to the async-launch landing: we're resolving an async launch from within shutdown for DataStorage, and resolving a process launch requires setting up DataStorage, so we effectively deadlock.
Yoric: what do you think?
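To illustrate the hang hypothesis in this comment, here is a minimal sketch in standard C++, not the actual Gecko code. `ToyStorage`, `MarkReady`, `BeginShutdown`, and `WaitForReadyOrShutdown` are hypothetical names invented for illustration. It assumes the waiter blocks on a condition variable until a "ready" flag is set, as the stack in the description suggests. The sketch shows one possible way to let such a wait give up once shutdown has begun. It is not a proposed patch.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for a storage object whose data is loaded by a
// background task. None of these names are the real DataStorage API.
class ToyStorage {
 public:
  // Called by the (hypothetical) background loader once data is available.
  void MarkReady() {
    {
      std::lock_guard<std::mutex> lock(mMutex);
      mReady = true;
    }
    mCond.notify_all();
  }

  // Called when shutdown begins; wakes any waiter so it can give up.
  void BeginShutdown() {
    {
      std::lock_guard<std::mutex> lock(mMutex);
      mShuttingDown = true;
    }
    mCond.notify_all();
  }

  // Blocks until the data is ready or shutdown has begun.
  // Returns true only if the data is ready.
  // With a predicate of just `mReady`, this would never return if the
  // loader can no longer run, which is the kind of hang discussed here.
  bool WaitForReadyOrShutdown() {
    std::unique_lock<std::mutex> lock(mMutex);
    mCond.wait(lock, [this] { return mReady || mShuttingDown; });
    return mReady;
  }

 private:
  std::mutex mMutex;
  std::condition_variable mCond;
  bool mReady = false;
  bool mShuttingDown = false;
};

// Usage example: a waiter gives up during shutdown, and succeeds once the
// data has been marked ready.
inline void ExampleUsage() {
  ToyStorage pending;
  pending.BeginShutdown();
  assert(!pending.WaitForReadyOrShutdown());

  ToyStorage loaded;
  loaded.MarkReady();
  assert(loaded.WaitForReadyOrShutdown());
}
```

A caller that gets `false` back would have to abandon the process launch rather than use the data; whether that is acceptable for the real launch path is an open question here.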
Comment 13•4 years ago
Post-CC is no longer needed given the landing of bug 1642491.
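As a rough illustration of the guard comment 12 attributes to bug 1642491 (no re-creating the Preallocator once it has been destroyed during shutdown), here is a minimal sketch. `ToyPreallocator` and `ToyProcess` are hypothetical names, not the real Gecko classes, and the sketch assumes all calls happen on a single thread.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a prestarted content process.
struct ToyProcess {};

// Hypothetical stand-in for a preallocator; assumes single-threaded use.
class ToyPreallocator {
 public:
  // Returns a prestarted process, creating one on demand, or nullptr once
  // shutdown has begun so that nothing new is launched during shutdown.
  ToyProcess* GetOrCreate() {
    if (mShutdownStarted) {
      return nullptr;
    }
    if (!mProcess) {
      mProcess = std::make_unique<ToyProcess>();
    }
    return mProcess.get();
  }

  // Drops any prestarted process and latches the shutdown flag. The flag is
  // never cleared, so later calls cannot re-create the process.
  void Shutdown() {
    mShutdownStarted = true;
    mProcess.reset();
  }

 private:
  bool mShutdownStarted = false;
  std::unique_ptr<ToyProcess> mProcess;
};
```

With a latch like this, the point at which the process is cleared (normal shutdown or later) no longer affects whether a new one can be started afterwards.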
Comment 14•4 years ago
:jesup, since this bug is a regression, could you fill in (if possible) the regressed_by field?
For more information, please visit auto_nag documentation.
Comment 15•4 years ago
Comment 16•4 years ago
bugherder
Comment 17•4 years ago
No crashes in 78.0b5, looks like the backout worked there.
Comment 18•4 years ago
Crashes seem to be continuing since the checkin (not sure if the frequency has changed); reopening.
Comment 19•4 years ago
The severity field is not set for this bug.
:keeler, could you have a look please?
For more information, please visit auto_nag documentation.
Comment 20•4 years ago
Jesup: Is this issue going to be addressed for 79?
Comment 21•4 years ago
kmag landed a patch to this code a few days ago which might help with this. There were crashes here without my code; my code appears to have just aggravated them. Based on the debugging I've done, the underlying cause was likely the Async ProcessLaunch code.
Since Fission isn't supposed to go beyond Nightly yet, we could back out of beta again for 79. However, I'll see if I can find a fix before that happens.
Comment 22•4 years ago
Hi Randell, were we going to move forward with the backout patch for Beta79?
Comment 23•4 years ago
We were... but we're not seeing the spike in beta that we did in 78b. This seems to have gone down to around the level in 78b after we landed the backout in 78.0b5. I think if we don't see a spike we shouldn't back out.
Comment 24•4 years ago
Resetting the priority and severity given the change in frequency for 79+. This bug is still an issue, but not at the level it was for 78 when we landed the backout.
Comment 25•4 years ago
[Tracking Requested - why for this release]:
Per Ryan, making it fix-optional and moving tracking back to "?".
Comment 26•3 years ago
These crashes haven't appeared in any releases since the 84 timeframe.