Closed Bug 1587794 Opened 5 years ago Closed 5 years ago

Crash in [@ std::_Func_impl_no_alloc<T>::_Do_call]

Tracking

Status:

VERIFIED FIXED

Milestone:

mozilla71

Tracking Flags:

Flag             Tracking   Status
firefox-esr60    ---        unaffected
firefox-esr68    ---        unaffected
firefox67        ---        unaffected
firefox68        ---        unaffected
firefox69        ---        unaffected
firefox70        ---        unaffected
firefox71        blocking   verified

People

(Reporter: pascalc, Assigned: asuth)

References

(Regression)

Details

(Keywords: crash, regression, topcrash, Whiteboard: [rca - Coding Error])

Attachments

(2 files)

Bug 1587794 - Do not assume global exists. r=perry; 5 years ago; Andrew Sutherland [:asuth] (he/him); 47 bytes, text/x-phabricator-request
Bug 1587794 - Add a test that verifies we don't crash on a disconnected global. r=perry; 5 years ago; Andrew Sutherland [:asuth] (he/him); 47 bytes, text/x-phabricator-request

Pascal Chevrel:pascalc

Reporter

Description

5 years ago

This bug is for crash report bp-045e7748-7d88-4c82-b52b-69f290191009.

Top 10 frames of crashing thread:

0 xul.dll void std::_Func_impl_no_alloc<`lambda at z:/task_1570615541/build/src/dom/serviceworkers/ServiceWorkerRegistration.cpp:241:7', void, const mozilla::dom::ServiceWorkerRegistrationDescriptor&>::_Do_call 
1 xul.dll void std::_Func_impl_no_alloc<`lambda at z:/task_1570615541/build/src/dom/serviceworkers/RemoteServiceWorkerRegistrationImpl.cpp:60:7', void, mozilla::dom::IPCServiceWorkerRegistrationDescriptorOrCopyableErrorResult&&>::_Do_call 
2 xul.dll mozilla::dom::PServiceWorkerRegistrationChild::OnMessageReceived ipc/ipdl/PServiceWorkerRegistrationChild.cpp:283
3 xul.dll mozilla::ipc::PBackgroundChild::OnMessageReceived ipc/ipdl/PBackgroundChild.cpp:5876
4 xul.dll mozilla::ipc::MessageChannel::DispatchMessage ipc/glue/MessageChannel.cpp:2109
5 xul.dll mozilla::ipc::MessageChannel::RunMessage ipc/glue/MessageChannel.cpp:1954
6 xul.dll mozilla::ipc::MessageChannel::MessageTask::Run ipc/glue/MessageChannel.cpp:1985
7 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1225
8 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:486
9 xul.dll mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:110
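Frames 0 and 1 are both lambdas invoked through std::function: `_Func_impl_no_alloc<...>::_Do_call` is how MSVC's standard library implements that call, and the signatures added later for other platforms (`std::_Function_handler<T>::_M_invoke` and `std::__1::__function::__func<T>::operator()`) are the libstdc++ and libc++ equivalents. The sketch below models that two-hop shape under stated assumptions: `Descriptor`, `DescriptorOrError`, and `MakeReplyCallback` are hypothetical stand-ins for `ServiceWorkerRegistrationDescriptor`, `IPCServiceWorkerRegistrationDescriptorOrCopyableErrorResult`, and the real callback wiring, which may well differ.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-ins for the types named in frames 0 and 1; these are not
// the real Gecko definitions.
struct Descriptor {
  std::string scope;
};

struct DescriptorOrError {
  bool ok = false;
  Descriptor descriptor;
};

// Callback shapes mirroring the two std::function signatures in the stack.
using SuccessCallback = std::function<void(const Descriptor&)>;
using ReplyCallback = std::function<void(DescriptorOrError&&)>;

// Models the lambda in frame 1: it unpacks the IPC reply and forwards the
// descriptor to the caller's callback (frame 0). Each hop is a type-erased
// std::function call, so the MSVC frames read
// std::_Func_impl_no_alloc<...>::_Do_call.
ReplyCallback MakeReplyCallback(SuccessCallback onSuccess) {
  return [onSuccess = std::move(onSuccess)](DescriptorOrError&& reply) {
    if (reply.ok) {
      onSuccess(reply.descriptor);
    }
  };
}
```

Because every hop goes through std::function, the crash signature names the std::function machinery rather than a Gecko function, which is why it looks so generic.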

This signature has been exploding since build 20191009103354. Setting it as a blocker for 71 and P1.

Hiroyuki Ikezoe (:hiro)

Comment 1

5 years ago

Moving to DOM:Service Workers.

Component: DOM: Animation → DOM: Service Workers

Pascal Chevrel:pascalc

Reporter

Updated

5 years ago

status-firefox69: --- → affected

status-firefox70: --- → affected

status-firefox71: --- → affected

status-firefox-esr60: --- → affected

tracking-firefox71: --- → blocking

Jan Varga [:janv]

Updated

5 years ago

Priority: -- → P1

Marcia Knous [:marcia]

Comment 2

5 years ago

Some correlations:

(64.29% in signature vs 10.59% overall) Module "imagehlp.dll" = true [83.33% vs 18.97% if platform_version = 10.0.18362]
(42.86% in signature vs 00.02% overall) moz_crash_reason = MOZ_DIAGNOSTIC_ASSERT(bc->GetOpenerId() == parentBC)
(42.86% in signature vs 02.89% overall) Module "winsta.dll" = true
(35.71% in signature vs 00.02% overall) moz_crash_reason = MOZ_DIAGNOSTIC_ASSERT(target->GetInFlightProcessId() == inFlightProcessId)

This crash was averaging 200-300 crashes per nightly build, which is very high, and in total it has passed 2600 crashes. In the last few days it has tapered off. Since we are coming up on the merge next week, can someone please take a look at this and try to figure out what is going on?

In terms of URLs, based on the install time it does look as if users were crashing consistently on these sites:

Flags: needinfo?(jstutte)

Marcia Knous [:marcia]

Comment 3

5 years ago

I guess this might also be related to Bug 1456995, since several other SW bugs appeared after that landed.

Hiroyuki Ikezoe (:hiro)

Updated

5 years ago

OS: Windows 7 → Windows

Marcia Knous [:marcia]

Comment 4

5 years ago

Adding perry since he worked on the bug mentioned in Comment 3. This continues to average around 250 crashes per nightly build.

Flags: needinfo?(perry)

Pascal Chevrel:pascalc

Reporter

Comment 5

5 years ago

Andrew, could you help us find somebody to fix our recent #1 top crasher? Thanks!

Flags: needinfo?(overholt)

[:philipp]

Comment 6

5 years ago

This tab crash is basically instantly reproducible for me in a new profile on Windows when visiting https://www.sammobile.com/ and accepting their privacy terms (not sure if the popup is only shown in the EU).

Running mozregression on this confirms the hunch from comment #3 that this regressed with bug 1456995:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=720c1e5a8dd3f5bda4ae32137d1c624a1ad55301&tochange=be9a6289486a6f366e431782b84a0c0633f8fec2

At this point, 98% of crashes come with MOZ_DIAGNOSTIC_ASSERT(global), which was originally added in bug 1456466 and/or bug 1454646.

Keywords: regression

Regressed by: 1456995

Comment 7

5 years ago

This makes sense if the DOMEventTargetHelper has been disconnected from its owning global. It probably makes sense to turn this into an early return, but I want to make sure the actor lifecycles are correct. The most likely explanation is a MozPromise adding an event loop turn.
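A minimal sketch of the "extra event loop turn" explanation, assuming a simplified single-threaded model: `Global`, `Binding`, and `EventLoop` are hypothetical stand-ins (not Gecko classes) for the window global, a DOMEventTargetHelper-style binding, and the thread's event queue. Resolution only queues the callback; if the binding is disconnected before that task runs, the callback sees a null global.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical stand-in for a window global; not a Gecko class.
struct Global {};

// Hypothetical stand-in for a DOMEventTargetHelper-style binding: it keeps a
// non-owning pointer to its global and clears it when disconnected.
struct Binding {
  Global* owner = nullptr;
  void DisconnectFromOwner() { owner = nullptr; }
};

// Minimal single-threaded event loop. Dispatch() defers a task to a later
// turn, roughly like a MozPromise delivering its Then() callbacks as runnables.
class EventLoop {
 public:
  void Dispatch(std::function<void()> task) { mTasks.push_back(std::move(task)); }

  void RunUntilIdle() {
    while (!mTasks.empty()) {
      std::function<void()> task = std::move(mTasks.front());
      mTasks.pop_front();
      task();
    }
  }

 private:
  std::deque<std::function<void()>> mTasks;
};

// Returns true if the deferred callback observed a null global.
bool DeferredCallbackSeesNullGlobal() {
  Global global;
  Binding binding{&global};
  EventLoop loop;
  bool sawNull = false;

  // The reply arrives and the promise resolves, but the callback is only queued.
  loop.Dispatch([&binding, &sawNull] { sawNull = (binding.owner == nullptr); });

  // Before that turn runs, the page goes away and the binding is disconnected.
  binding.DisconnectFromOwner();

  loop.RunUntilIdle();
  return sawNull;
}
```

The sketch records the null global instead of asserting on it; that is the condition under which `MOZ_DIAGNOSTIC_ASSERT(global)` would fail.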

Assignee: nobody → bugmail

Status: NEW → ASSIGNED

Flags: needinfo?(perry) (cleared)

Flags: needinfo?(overholt) (cleared)

Flags: needinfo?(jstutte) (cleared)

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 9

5 years ago

It appears the scenario is this:

Originally, we handled the global being null because of disconnection.
Bug 1456466 and its patch P5 changed this to a diagnostic assert at https://hg.mozilla.org/mozilla-central/rev/89c938649297#l1.39 because we were introducing DOMMozPromiseRequestHolder, which disconnects the promise when the binding gets disconnected from the global (via DETH, i.e. DOMEventTargetHelper). With that in place, it should have been impossible for the global to be missing when the promise was resolved or rejected.
Bug 1466681 and its patch P5 then removed the use of DOMMozPromiseRequestHolder at https://hg.mozilla.org/mozilla-central/rev/ba3ec8751abc#l1.12 in favor of direct callbacks, because of event ordering problems when IPC is used. The direct callbacks are not disconnected along with the global, so the diagnostic assert can now fire.

(Perry also recognized this in the other signature variant of this bug https://bugzilla.mozilla.org/show_bug.cgi?id=1588149#c1.)
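The scenario above can be modeled with a toy comparison, assuming simplified semantics: `PendingRequest`, `Binding`, and `CallbacksWithoutGlobal` are hypothetical names, and the holder model only approximates what DOMMozPromiseRequestHolder does (disconnecting the request when the binding loses its global). With the holder, the completion callback never runs without a global, so an assert was safe; with direct callbacks it can, so the callback needs an early return.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for a window global; not a Gecko class.
struct Global {};

// Hypothetical pending request whose completion invokes a callback unless it
// was disconnected first.
struct PendingRequest {
  std::function<void()> onComplete;
  bool disconnected = false;

  void Complete() {
    if (!disconnected && onComplete) {
      onComplete();
    }
  }
};

// Hypothetical binding. With a holder, disconnecting from the global also
// disconnects the held request (the DOMMozPromiseRequestHolder idea); with
// direct callbacks, nothing cancels the pending completion.
struct Binding {
  Global* owner = nullptr;
  PendingRequest* heldRequest = nullptr;  // non-null only in the holder model

  void DisconnectFromOwner() {
    owner = nullptr;
    if (heldRequest) {
      heldRequest->disconnected = true;
    }
  }
};

// Returns how many times the completion callback ran without a global.
int CallbacksWithoutGlobal(bool useHolder) {
  Global global;
  Binding binding{&global};
  int count = 0;

  PendingRequest request;
  request.onComplete = [&binding, &count] {
    if (!binding.owner) {
      ++count;  // MOZ_DIAGNOSTIC_ASSERT(global) would fire at this point;
      return;   // an early return tolerates the disconnected global instead.
    }
    // ... otherwise update the registration from the descriptor ...
  };
  if (useHolder) {
    binding.heldRequest = &request;
  }

  binding.DisconnectFromOwner();  // the global goes away first
  request.Complete();             // then the IPC response is delivered
  return count;
}
```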

BugBot [:suhaib / :marco/ :calixte]

Updated

5 years ago

Crash Signature: [@ std::_Func_impl_no_alloc<T>::_Do_call] → [@ std::_Func_impl_no_alloc<T>::_Do_call] [@ std::_Function_handler<T>::_M_invoke]

[:philipp]

Updated

5 years ago

Crash Signature: [@ std::_Func_impl_no_alloc<T>::_Do_call] [@ std::_Function_handler<T>::_M_invoke] → [@ std::_Func_impl_no_alloc<T>::_Do_call] [@ std::_Function_handler<T>::_M_invoke] [@ std::__1::__function::__func<T>::operator()]

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 10

5 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=1f9abe48c066f026d96df5adb896c5cad26da482

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 11

5 years ago

Attached file Bug 1587794 - Do not assume global exists. r=perry

This is effectively a reversion of the change made in
https://hg.mozilla.org/mozilla-central/rev/89c938649297#l1.39 when
DOMMozPromiseRequestHolder was introduced. I've tried to add some
comments to contextualize what's happening there and why it differs
from other similar callsites.

Longer term we might move to just deleting the underlying actor when
we are disconnected. Those actors were written assuming an
execution model where letting either end delete the actor would result
in intentional process crashes when a message was received for a
destroyed actor. That is no longer the case.
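The longer-term idea above can be sketched with a toy router, assuming that a message for an already-destroyed actor is simply dropped, as the comment says is now the case. `ActorRouter`, `Register`, `Destroy`, and `Deliver` are hypothetical names for illustration, not the IPDL API.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical message router keyed by actor id; not the IPDL API. It models
// the behavior described above: a message for an actor that has already been
// destroyed is dropped instead of crashing the process.
class ActorRouter {
 public:
  using Handler = std::function<void(const std::string&)>;

  void Register(int actorId, Handler handler) {
    mActors[actorId] = std::move(handler);
  }

  // Tearing down the actor when its binding is disconnected means no callback
  // that might touch a vanished global can run afterwards.
  void Destroy(int actorId) { mActors.erase(actorId); }

  // Returns false when the message was dropped because the actor is gone.
  bool Deliver(int actorId, const std::string& message) {
    auto it = mActors.find(actorId);
    if (it == mActors.end()) {
      return false;
    }
    it->second(message);
    return true;
  }

 private:
  std::map<int, Handler> mActors;
};
```

Destroying the actor on disconnection would remove the need for each callback to re-check the global, at the cost of relying on late messages being dropped rather than treated as fatal.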

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 12

5 years ago

Attached file Bug 1587794 - Add a test that verifies we don't crash on a disconnected global. r=perry

Because this seems fairly Gecko-specific, I've put this in our mozilla-specific
WPT dir rather than writing it as a mochitest.

If I run this test without the fix, the browser crashes. If I run it with the fix, we
pass.

I run the test via
./mach wpt testing/web-platform/mozilla/tests/service-workers/update_completes_in_disconnected_global.https.html

Note that currently, because of https://bugzilla.mozilla.org/show_bug.cgi?id=1587463#c13,
I had to locally comment out the lsan_dir lines I refer to there to get the
test to run.

Depends on D49671

Pulsebot

Comment 13

5 years ago

Pushed by bugmail@asutherland.org:
https://hg.mozilla.org/integration/autoland/rev/1ab263e4b763
Do not assume global exists. r=perry
https://hg.mozilla.org/integration/autoland/rev/9ed7fe36e723
Add a test that verifies we don't crash on a disconnected global. r=perry

Dorel Luca [:dluca]

Comment 14

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/1ab263e4b763
https://hg.mozilla.org/mozilla-central/rev/9ed7fe36e723

Status: ASSIGNED → RESOLVED

Closed: 5 years ago

status-firefox71: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla71

Ryan VanderMeulen [:RyanVM]

Comment 15

•

5 years ago

This bug has a clear regressor in Fx71. Looking at crash reports from older releases, they appear to be unrelated to this specific issue.

status-firefox67: --- → unaffected

status-firefox68: --- → unaffected

status-firefox69: affected → unaffected

status-firefox70: affected → unaffected

status-firefox-esr60: affected → unaffected

status-firefox-esr68: --- → unaffected

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Updated

5 years ago

Regressions: 1589699

Alexandru Trif, Desktop Test Engineering [:atrif]

Comment 16

5 years ago

Hello! Reproduced the issue using the websites from comment 2 with Firefox 71.0a1 (20191010214019) on Windows 10 x64.
The issue is verified fixed using Firefox 71.0a1 (20191020214712) on Windows 10 x64, macOS 10.14 and Ubuntu 18.04. No tab crashes were encountered while randomly navigating the affected websites.

Status: RESOLVED → VERIFIED

status-firefox71: fixed → verified

OS: Windows → All

Hardware: x86 → Desktop

Emma Humphries ☕️🎸🧞‍♀️✨ (she/they) [:emceeaich] (Pacific Time) use needinfo

Comment 17

5 years ago

This bug has been identified as part of a pilot on determining root causes of blocking and dot release drivers.

It needs a root cause set. Please see the list at https://docs.google.com/document/d/1FFEGsmoU8T0N8R9kk-MXWptOPtXXXRRIe4vQo3_HgMw/.

Add the root cause as a whiteboard tag in the form [rca - <cause> ] and remove the rca-needed keyword.

If you have questions, please contact :tmaity.

Keywords: rca-needed

Jens Stutte [:jstutte]

Updated

5 years ago

Keywords: rca-needed (removed)

Whiteboard: [rca - Coding Error]

Julien Cristau [:jcristau]

Updated

4 years ago

Updated

3 years ago

Has Regression Range: --- → yes
