Crash in [@ std::_Func_impl_no_alloc<T>::_Do_call]
Categories
(Core :: DOM: Service Workers, defect, P1)
Tracking
()
Release | Tracking | Status |
---|---|---|
firefox-esr60 | --- | unaffected |
firefox-esr68 | --- | unaffected |
firefox67 | --- | unaffected |
firefox68 | --- | unaffected |
firefox69 | --- | unaffected |
firefox70 | --- | unaffected |
firefox71 | blocking | verified |
People
(Reporter: pascalc, Assigned: asuth)
References
(Regression)
Details
(Keywords: crash, regression, topcrash, Whiteboard: [rca - Coding Error])
Crash Data
Attachments
(2 files)
This bug is for crash report bp-045e7748-7d88-4c82-b52b-69f290191009.
Top 10 frames of crashing thread:
0 xul.dll void std::_Func_impl_no_alloc<`lambda at z:/task_1570615541/build/src/dom/serviceworkers/ServiceWorkerRegistration.cpp:241:7', void, const mozilla::dom::ServiceWorkerRegistrationDescriptor&>::_Do_call
1 xul.dll void std::_Func_impl_no_alloc<`lambda at z:/task_1570615541/build/src/dom/serviceworkers/RemoteServiceWorkerRegistrationImpl.cpp:60:7', void, mozilla::dom::IPCServiceWorkerRegistrationDescriptorOrCopyableErrorResult&&>::_Do_call
2 xul.dll mozilla::dom::PServiceWorkerRegistrationChild::OnMessageReceived ipc/ipdl/PServiceWorkerRegistrationChild.cpp:283
3 xul.dll mozilla::ipc::PBackgroundChild::OnMessageReceived ipc/ipdl/PBackgroundChild.cpp:5876
4 xul.dll mozilla::ipc::MessageChannel::DispatchMessage ipc/glue/MessageChannel.cpp:2109
5 xul.dll mozilla::ipc::MessageChannel::RunMessage ipc/glue/MessageChannel.cpp:1954
6 xul.dll mozilla::ipc::MessageChannel::MessageTask::Run ipc/glue/MessageChannel.cpp:1985
7 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1225
8 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:486
9 xul.dll mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:110
This signature has been exploding since build 20191009103354. Setting it as a blocker for 71 and P1.
Comment 1•5 years ago
Moving to DOM:Service Workers.
Comment 2•5 years ago
Some correlations:
(64.29% in signature vs 10.59% overall) Module "imagehlp.dll" = true [83.33% vs 18.97% if platform_version = 10.0.18362]
(42.86% in signature vs 00.02% overall) moz_crash_reason = MOZ_DIAGNOSTIC_ASSERT(bc->GetOpenerId() == parentBC)
(42.86% in signature vs 02.89% overall) Module "winsta.dll" = true
(35.71% in signature vs 00.02% overall) moz_crash_reason = MOZ_DIAGNOSTIC_ASSERT(target->GetInFlightProcessId() == inFlightProcessId)
This crash was averaging 200-300 crashes per nightly build, which is very high, and in total it has eclipsed 2600 crashes. In the last few days it has tapered down. Since we are coming up on the merge next week, can someone please take a look at this and try to figure out what is going on?
In terms of URLs, it does look as if based on the install time users were crashing consistently on these sites:
Comment 3•5 years ago
I guess this also might be related to Bug 1456995, since there are several other SW bugs that occurred after that landing.
Comment 4•5 years ago
Adding perry since he worked on the bug mentioned in comment 3. This continues to average around 250 crashes per nightly build.
Reporter | ||
Comment 5•5 years ago
Andrew, could you help us find somebody to fix our recent #1 top crasher? Thanks!
Comment 6•5 years ago
This tab crash is basically instantly reproducible for me in a new profile on Windows when visiting https://www.sammobile.com/ and accepting their privacy terms (not sure whether the popup is only shown in the EU).
Running mozregression on this confirms the hunch from comment #3 that this regressed with bug 1456995:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=720c1e5a8dd3f5bda4ae32137d1c624a1ad55301&tochange=be9a6289486a6f366e431782b84a0c0633f8fec2
At this point 98% of crashes come with MOZ_DIAGNOSTIC_ASSERT(global), which was originally added in bug 1456466 and/or bug 1454646.
Assignee | ||
Comment 7•5 years ago
This makes sense if the DOMEventTargetHelper has been disconnected from its owning global. It probably makes sense to turn this into an early return but I want to make sure the actor lifecycles are correct. The most likely explanation is a MozPromise adding an event loop turn.
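To illustrate the difference between asserting on the global and returning early, here is a minimal standalone sketch. It is not the Gecko code: `Global`, `Descriptor`, `Registration`, and both `Make*Callback` helpers are hypothetical stand-ins, and a plain `assert` stands in for MOZ_DIAGNOSTIC_ASSERT.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins for illustration only; these are NOT Gecko classes.
struct Global {
  std::string name;
};

struct Descriptor {
  int version;
};

class Registration {
 public:
  explicit Registration(std::shared_ptr<Global> global)
      : mGlobal(std::move(global)) {}

  // Models the binding being disconnected from its owning global.
  void DisconnectFromOwner() { mGlobal.reset(); }

  Global* GetParentObject() const { return mGlobal.get(); }

  // Callback shape that crashes: it assumes the global is always present.
  std::function<void(const Descriptor&)> MakeAssertingCallback() {
    return [this](const Descriptor& d) {
      assert(GetParentObject() && "global must exist");
      mLastVersion = d.version;
    };
  }

  // Early-return shape: a late result for a disconnected binding is ignored.
  std::function<void(const Descriptor&)> MakeTolerantCallback() {
    return [this](const Descriptor& d) {
      if (!GetParentObject()) {
        return;  // Disconnected; nothing useful can be done with the result.
      }
      mLastVersion = d.version;
    };
  }

  int LastVersion() const { return mLastVersion; }

 private:
  std::shared_ptr<Global> mGlobal;
  int mLastVersion = 0;
};
```

With the tolerant callback, a result that arrives after `DisconnectFromOwner()` leaves the object untouched, whereas the asserting callback would abort at that point in a build with assertions enabled.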
Assignee | ||
Comment 9•5 years ago
It appears the scenario is this:
- Originally we handled the global being null because of disconnection.
- Bug 1456466 and patch P5 changed this to a diagnostic assert at https://hg.mozilla.org/mozilla-central/rev/89c938649297#l1.39 because we were introducing DOMMozPromiseRequestHolder which disconnects the promise when the binding gets disconnected from the global (via DETH). In that case, it should have been impossible to not have the global if the promise was resolved/rejected.
- Bug 1466681 and its patch P5 removed the use of DOMMozPromiseRequestHolder at https://hg.mozilla.org/mozilla-central/rev/ba3ec8751abc#l1.12 in favor of direct callbacks because of event ordering problems when IPC is used.
(Perry also recognized this in the other signature variant of this bug https://bugzilla.mozilla.org/show_bug.cgi?id=1588149#c1.)
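The sequence above can be modelled with a small standalone sketch. Everything here is hypothetical and simplified: `EventLoop`, `RequestHolder`, and `Binding` are illustration-only stand-ins, not DOMMozPromiseRequestHolder or any real Gecko or IPC type. The point is only that a tracked callback is dropped on disconnect, while a direct callback still runs and therefore has to cope with a missing global.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical model of an event loop: a queue of pending tasks.
class EventLoop {
 public:
  void Dispatch(std::function<void()> task) {
    mTasks.push_back(std::move(task));
  }

  void RunAll() {
    // Index-based loop so tasks may safely dispatch further tasks.
    for (std::size_t i = 0; i < mTasks.size(); ++i) {
      auto task = std::move(mTasks[i]);
      task();
    }
    mTasks.clear();
  }

 private:
  std::vector<std::function<void()>> mTasks;
};

// Stand-in for a request holder: once disconnected, tracked callbacks
// become no-ops instead of running.
class RequestHolder {
 public:
  std::function<void(int)> Track(std::function<void(int)> cb) {
    auto alive = mAlive;
    return [alive, cb = std::move(cb)](int value) {
      if (*alive) {
        cb(value);
      }
    };
  }

  void Disconnect() { *mAlive = false; }

 private:
  std::shared_ptr<bool> mAlive = std::make_shared<bool>(true);
};

// Stand-in for a DOM binding that can lose its global.
struct Binding {
  bool hasGlobal = true;
  RequestHolder holder;
  int ranWithGlobal = 0;
  int ranWithoutGlobal = 0;

  void OnResult(int /*value*/) {
    if (hasGlobal) {
      ++ranWithGlobal;
    } else {
      ++ranWithoutGlobal;  // The case the diagnostic assert declared impossible.
    }
  }

  void DisconnectFromOwner() {
    hasGlobal = false;
    holder.Disconnect();
  }
};
```

If one tracked and one direct callback are queued and the binding is disconnected before the loop runs, only the direct callback fires, and it fires without a global. That is the situation a direct callback needs to handle with a null check rather than an assert.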
Assignee | ||
Comment 10•5 years ago
Assignee | ||
Comment 11•5 years ago
This is effectively a reversion of the change made in
https://hg.mozilla.org/mozilla-central/rev/89c938649297#l1.39 when
DOMMozPromiseRequestHolder was introduced. I've tried to add some
comments to contextualize what's happening there and why it differs
from other similar callsites.
Longer term we might move to just deleting the underlying actor when
we are disconnected. Those actors were written assuming an
execution model where letting either end delete the actor would result
in intentional process crashes when a message was received for a
destroyed actor. That is no longer the case.
Assignee | ||
Comment 12•5 years ago
Because this seems fairly Gecko-specific, I've put this in our mozilla-specific
WPT dir. This is as opposed to writing this as a mochitest.
If I run this test without the fix, the browser crashes. If I run it with the fix, we
pass.
I run the test via
./mach wpt testing/web-platform/mozilla/tests/service-workers/update_completes_in_disconnected_global.https.html
Note that currently because of https://bugzilla.mozilla.org/show_bug.cgi?id=1587463#c13
I had to comment out the lsan_dir lines I refer to there locally to get the
test to run.
Depends on D49671
Comment 13•5 years ago
Comment 14•5 years ago
bugherder |
https://hg.mozilla.org/mozilla-central/rev/1ab263e4b763
https://hg.mozilla.org/mozilla-central/rev/9ed7fe36e723
Comment 15•5 years ago
This bug has a clear regressor in Fx71. Looking at crash reports from older releases, they appear to be unrelated to this specific issue.
Comment 16•5 years ago
Hello! Reproduced the issue using websites from comment 2 with Firefox 71.0a1 (20191010214019) on Windows 10 x64.
The issue is verified fixed using Firefox 71.0a1 (20191020214712) on Windows 10 x64, macOS 10.14 and Ubuntu 18.04. No tab crashes were encountered while randomly navigating on the affected websites.
Comment 17•5 years ago
This bug has been identified as part of a pilot on determining root causes of blocking and dot release drivers.
It needs a root-cause set for it. Please see the list at https://docs.google.com/document/d/1FFEGsmoU8T0N8R9kk-MXWptOPtXXXRRIe4vQo3_HgMw/.
Add the root cause as a whiteboard tag in the form [rca - <cause> ] and remove the rca-needed keyword.
If you have questions, please contact :tmaity.