Closed Bug 1642290 Opened 4 years ago Closed 3 years ago

Crash in [@ shutdownhang | mozilla::DataStorage::WaitForReady | mozilla::DataStorage::GetAll]

Categories

(Core :: Security: PSM, defect, P1)

78 Branch
Unspecified
Windows 10
defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox-esr68 --- unaffected
firefox76 --- unaffected
firefox77 --- unaffected
firefox78 blocking verified disabled
firefox79 - wontfix

People

(Reporter: calixte, Unassigned)

References

(Regression)

Details

(Keywords: crash, regression, topcrash, Whiteboard: [psm-assigned])

Crash Data

Attachments

(2 files)

This bug is for crash report bp-11c5043b-7416-4bac-8a30-e6c9b0200601.

Top 10 frames of crashing thread:

0 ntdll.dll NtWaitForAlertByThreadId 
1 ntdll.dll RtlSleepConditionVariableSRW 
2 kernelbase.dll SleepConditionVariableSRW 
3 mozglue.dll mozilla::detail::ConditionVariableImpl::wait mozglue/misc/ConditionVariable_windows.cpp:50
4 xul.dll mozilla::DataStorage::WaitForReady security/manager/ssl/DataStorage.cpp:734
5 xul.dll mozilla::DataStorage::GetAll security/manager/ssl/DataStorage.cpp:792
6 xul.dll static mozilla::DataStorage::GetAllChildProcessData security/manager/ssl/DataStorage.cpp:257
7 xul.dll mozilla::dom::ContentParent::InitInternal dom/ipc/ContentParent.cpp:2586
8 xul.dll mozilla::dom::ContentParent::LaunchSubprocessResolve dom/ipc/ContentParent.cpp:2298
9 xul.dll mozilla::dom::ContentParent::LaunchSubprocessAsync::<unnamed-tag>::operator const dom/ipc/ContentParent.cpp:2369

There are 30 crashes in nightly 78.
The moz_crash_reason is mainly MOZ_CRASH(Shutdown hanging before starting).

There is a spike in build 20200530211958; the pushlog for this build is:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=8aaca63ec5c6&tochange=548ffce7ad57

Seeing a number of crashes in 78.0b1 now, which just started rolling out.

Randell, could bug 1602757 explain this somehow?

Flags: needinfo?(rjesup)

This is the #5 overall topcrash on Nightly also.

This patch is known to change timings, which can aggravate existing race conditions or expose existing bugs. (Of course, it could have a bug of its own, but given the history of dealing with oranges to land it, I'd start with those possibilities.)

There appears to have been a steady low incidence of this before landing. I'll look at possible causes. My apologies for the late-ish response; I've been without power for 3 days and just got it back.

I'm wondering if we can back the change out of beta to buy some time?

We can back it out. Julien, do you want to do it or should I? I landed one quick followup patch as well.

Warning: kmag (I assume) landed a patch against this code. If it did land, the backout will require rebasing (or back both out and re-land his pre-rebase patch, which I r+'d).

Flags: needinfo?(rjesup) → needinfo?(jcristau)

If you can take care of it that'd be great. Thanks!

Flags: needinfo?(jcristau)
Assignee: nobody → rjesup
Status: NEW → ASSIGNED

I suspect this may be due to us continuing to prestart processes during shutdown (until final-CC); the patch for bug 1642491 stops us from re-creating the Preallocator during shutdown once we destroy it. Moving back to clearing it on normal shutdown (instead of post-CC) may fix this.

However, this would merely return us to the original low-level intermittent state. I suspect this is fundamentally due to the async-launch landing: we're resolving an async launch from within shutdown for DataStorage, and resolving a process launch requires setting up DataStorage, so we effectively deadlock.
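The hang described above (frames 3 to 6: `GetAll` blocking in `WaitForReady` on a condition variable) can be illustrated with a minimal sketch. It is not the real `mozilla::DataStorage` code: `FakeDataStorage`, `MarkReady`, `WaitForReadyFor`, and `LaunchNeedsStorage` are hypothetical names, and the sketch only models a reader that blocks until a background load marks the storage ready. If shutdown prevents that load from ever completing, nothing wakes the waiter. A timeout is added here (the real frame is an untimed wait) so the demonstration terminates.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for storage that must finish an asynchronous
// load before readers may call GetAll().
class FakeDataStorage {
 public:
  // Called by the background loader once the data is available.
  void MarkReady() {
    {
      std::lock_guard<std::mutex> lock(mMutex);
      mReady = true;
    }
    mCondVar.notify_all();
  }

  // Models WaitForReady(), but bounded by a timeout so this demo cannot
  // hang forever. Returns true if the storage became ready in time.
  bool WaitForReadyFor(std::chrono::milliseconds aTimeout) {
    std::unique_lock<std::mutex> lock(mMutex);
    return mCondVar.wait_for(lock, aTimeout, [this] { return mReady; });
  }

 private:
  std::mutex mMutex;
  std::condition_variable mCondVar;
  bool mReady = false;
};

// Simulates a child-process launch that needs the storage contents.
// When shutdownStarted is true, the loader never runs (assumption: this
// stands in for shutdown having stopped the work that would signal).
bool LaunchNeedsStorage(bool shutdownStarted) {
  FakeDataStorage storage;
  std::thread loader;
  if (!shutdownStarted) {
    loader = std::thread([&storage] {
      std::this_thread::sleep_for(std::chrono::milliseconds(10));
      storage.MarkReady();
    });
  }
  bool ready = storage.WaitForReadyFor(std::chrono::milliseconds(200));
  if (loader.joinable()) {
    loader.join();
  }
  return ready;
}
```

`LaunchNeedsStorage(false)` returns true once the loader signals; `LaunchNeedsStorage(true)` returns false after the 200 ms timeout. In the crash, the wait is unbounded, so the main thread stays blocked until the shutdown hang is reported as `MOZ_CRASH(Shutdown hanging before starting)`.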

Yoric: what do you think?

Blocks: 1602757
Flags: needinfo?(dteller)

:jesup, since this bug is a regression, could you fill in the regressed_by field (if possible)?
For more information, please visit auto_nag documentation.

Flags: needinfo?(rjesup)
Flags: needinfo?(rjesup)
Regressed by: 1602757
Has Regression Range: --- → yes
Pushed by rjesup@wgate.com: https://hg.mozilla.org/integration/autoland/rev/3a6ed2262ba4 stop the process preallocator during normal shutdown, not post-CC r=nika
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla79
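The pushed fix ("stop the process preallocator during normal shutdown, not post-CC") follows a common guard pattern: once shutdown begins, the preallocator is destroyed and never re-created, so no new processes are prestarted. A minimal sketch of that pattern, assuming hypothetical names (`Preallocator`, `PreallocatorManager`, `GetOrCreate`, `Shutdown`, `TryPrestart`) and single-threaded (main-thread) use; it is not the Firefox implementation.

```cpp
#include <memory>

// Hypothetical stand-in for the object that prestarts content processes.
struct Preallocator {
  int prestartedCount = 0;
  void PrestartProcess() { ++prestartedCount; }
};

// Owns the preallocator and refuses to re-create it after shutdown starts.
// Assumed to be used from a single (main) thread, so no locking.
class PreallocatorManager {
 public:
  // Returns the live preallocator, creating it lazily, or nullptr once
  // shutdown has begun so callers cannot prestart new processes.
  Preallocator* GetOrCreate() {
    if (mShutdown) {
      return nullptr;
    }
    if (!mPreallocator) {
      mPreallocator = std::make_unique<Preallocator>();
    }
    return mPreallocator.get();
  }

  // Called at the start of normal shutdown (rather than post-CC):
  // latches the shutdown flag and destroys the preallocator.
  void Shutdown() {
    mShutdown = true;
    mPreallocator.reset();
  }

 private:
  bool mShutdown = false;
  std::unique_ptr<Preallocator> mPreallocator;
};

// Tries to prestart one process; returns whether it did.
bool TryPrestart(PreallocatorManager& aManager) {
  Preallocator* p = aManager.GetOrCreate();
  if (!p) {
    return false;
  }
  p->PrestartProcess();
  return true;
}
```

The latched flag is the important part: merely destroying the preallocator would let a later `GetOrCreate()` call lazily rebuild it mid-shutdown, which is the re-creation behavior the comment above describes.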

No crashes in 78.0b5, looks like the backout worked there.

Crashes seem to be continuing since the checkin (not sure whether the frequency has changed); reopening.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Flags: needinfo?(dteller)

The severity field is not set for this bug.
:keeler, could you have a look please?


Flags: needinfo?(dkeeler)
Severity: -- → S3
Flags: needinfo?(dkeeler)
Priority: -- → P1
Whiteboard: [psm-assigned]

Jesup: Is this issue going to be addressed for 79?

Flags: needinfo?(rjesup)

kmag landed a patch to this code a few days ago that might help. There were crashes here before my code landed; my code appears to have just aggravated them. From the debugging I've done, the likely cause is the async process-launch code.

Since Fission isn't supposed to go beyond Nightly yet, we could back out of beta again for 79. However, I'll see if I can find a fix before that happens.

Flags: needinfo?(rjesup)
Status: REOPENED → NEW
Target Milestone: mozilla79 → ---
Severity: S3 → S1
Status: NEW → ASSIGNED
Fission Milestone: --- → M6a
Fission Milestone: M6a → ---

Hi Randell, were we going to move forward with the backout patch for Beta79?

Flags: needinfo?(rjesup)

We were, but we're not seeing the spike in beta that we saw in 78b. The rate seems to have dropped to around the level it reached in 78b after we landed the backout in 78b5. If we don't see a spike, I don't think we should back out.

Flags: needinfo?(rjesup)

Resetting the priority and severity given the change in frequency for 79+. This bug is still an issue, but not at the level it was for 78 when we landed the backout.

Severity: S1 → --
Priority: P1 → --

[Tracking Requested - why for this release]:
Per Ryan making it fix-optional, moving tracking back to ?.

Priority: -- → P1
Severity: -- → S3
Crash Signature: [@ shutdownhang | mozilla::DataStorage::WaitForReady | mozilla::DataStorage::GetAll] → [@ shutdownhang | mozilla::DataStorage::GetAll] [@ shutdownhang | mozilla::DataStorage::WaitForReady | mozilla::DataStorage::GetAll]
Assignee: rjesup → nobody
Status: ASSIGNED → NEW

These crashes haven't appeared in any releases since the 84 timeframe.

Status: NEW → RESOLVED
Closed: 4 years ago → 3 years ago
Resolution: --- → WORKSFORME