Closed Bug 1745043 Opened 2 years ago Closed 2 years ago

Shutdown Crash in [@ mozilla::dom::serviceWorkerScriptCache::(anonymous namespace)::CompareNetwork::Abort]

Categories

(Core :: DOM: Service Workers, defect)

defect

Tracking

()

RESOLVED FIXED
97 Branch
Tracking Status
firefox-esr91 --- unaffected
firefox95 --- unaffected
firefox96 + fixed
firefox97 + fixed

People

(Reporter: aryx, Assigned: valentin)

References

(Regression)

Details

(Keywords: crash, regression)

Crash Data

New crash in Firefox 97.0a1 20211208095339, 4 crashes from 4 different installations. One crash reporter mentioned this hit during shutdown. Did this hit the light of day after bug 1738984 got resolved?

ShutdownProgress profile-before-change
XPCOMSpinEventLoopStack default: nsThread::Shutdown: URL Classifier

Crash report: https://crash-stats.mozilla.org/report/index/2dddfc15-ded7-4d47-8383-3340c0211208

Reason: EXCEPTION_ACCESS_VIOLATION_READ

Top 10 frames of crashing thread:

0 xul.dll mozilla::dom::serviceWorkerScriptCache::`anonymous namespace'::CompareNetwork::Abort dom/serviceworkers/ServiceWorkerScriptCache.cpp:803
1 xul.dll mozilla::dom::serviceWorkerScriptCache::`anonymous namespace'::CompareManager::Cleanup dom/serviceworkers/ServiceWorkerScriptCache.cpp:1419
2 xul.dll mozilla::dom::serviceWorkerScriptCache::`anonymous namespace'::CompareManager::ResolvedCallback dom/serviceworkers/ServiceWorkerScriptCache.cpp:1361
3 xul.dll mozilla::dom::`anonymous namespace'::PromiseNativeHandlerShim::ResolvedCallback dom/promise/Promise.cpp:385
4 xul.dll mozilla::dom::NativeHandlerCallback dom/promise/Promise.cpp:338
5 xul.dll js::Call js/src/vm/Interpreter.cpp:552
6 xul.dll PromiseReactionJob js/src/builtin/Promise.cpp:2211
7 xul.dll js::Call js/src/vm/Interpreter.cpp:552
8 xul.dll JS::Call js/src/vm/CallAndConstruct.cpp:117
9 xul.dll mozilla::PromiseJobRunnable::Run xpcom/base/CycleCollectedJSContext.cpp:213

I hit this crash when shutting down the browser. For what it is worth, I was shutting down my browser in preparation to restart my computer to update to OSX Monterey.

[Tracking Requested - why for this release]: #3 top crash on 96.0b3, and a clear regression in 96.0b3.

It looks like this is happening on this line in CompareNetwork::Abort:
mChannel->Cancel(NS_BINDING_ABORTED);

It looks like this also started happening on 96.0b3. If I'm interpreting the log correctly, suspicious bugs that landed in b3 are bug 1650214 (which involves workers), bug 1744473 (which involves Stencil, so maybe it is related to script caching? But it just adds some null checks), bug 1544127 (which involves nulling out some fields on a LoadInfo, so kind of networking related?), as well as bug 1738984 (mentioned by Aryx: maybe we fail to create a channel now so mChannel is null?).

On beta, these all have a shutdown progress value, so Aryx's guess sounds plausible. Valentin, could you please look at this possible regression? Or failing that, maybe Jens can find somebody to look into this. Thanks.

Flags: needinfo?(valentin.gosu)
Flags: needinfo?(jstutte)
Keywords: regression

Here is the regression window for the build Aryx mentioned in comment 0.

Bug 1738984 and bug 1544127 look like they are the only patches that landed in both 96.0b3 and Nightly 97.0a1 20211208095339.

Bug 1738984 does make it possible for channel creation to fail.
From what I can tell, the initialization in ServiceWorkerScriptCache does check the error code, but the failure could be happening somewhere else.
The patch that landed for bug 1544127 shouldn't cause any behaviour change.

This makes it increasinly likely that Bug 1738984 is the cause. We could try to backout just attachment 9254073 [details], and see if that fixes this crash.

Flags: needinfo?(valentin.gosu)

(In reply to Valentin Gosu [:valentin] (he/him) from comment #4)

Bug 1738984 does make it possible for channel creation to fail.
This makes it increasinly likely that Bug 1738984 is the cause. We could try to backout just attachment 9254073 [details], and see if that fixes this crash.

Yes, I think so, too. I see the CompareManager::Cleanup() on the stack, which is called by some guards like the one failing here which makes me think that those cleanup functions should be more resilient against uninitialized variables than just assert?

Flags: needinfo?(jstutte) → needinfo?(valentin.gosu)
Has Regression Range: --- → yes

(In reply to Valentin Gosu [:valentin] (he/him) from comment #4)

We could try to backout just attachment 9254073 [details], and see if that fixes this crash.

Backed out from m-c for the next round of nightlies starting in a couple hours. We'll decide what to do for 96.0b5 depending on how things look around this time tomorrow.
https://hg.mozilla.org/mozilla-central/rev/a2e05f530d49fc02680eafeabaaf8acbde6c4d7b

See Also: → 1745925

The Beta backout went into 96.0b5 and we haven't seen any reports of this crash since. We can track fixing this crash in bug 1745925 as part of re-landing the reverted change.

Assignee: nobody → valentin.gosu
Status: NEW → RESOLVED
Closed: 2 years ago
Flags: needinfo?(valentin.gosu)
Resolution: --- → FIXED
Target Milestone: --- → 97 Branch
You need to log in before you can comment on or make changes to this bug.