FatalError Crashes under mozilla::dom::PURLClassifierParent::Send__delete__

RESOLVED FIXED in Firefox 53

Status

()

--
critical
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: mccr8, Assigned: Ehsan)

Tracking

({crash, regression})

Trunk
mozilla53
Unspecified
Windows 10
crash, regression
Points:
---

Firefox Tracking Flags

(firefox50 unaffected, firefox51 unaffected, firefox52 unaffected, firefox53+ fixed)

Details

(crash signature)

Attachments

(1 attachment, 1 obsolete attachment)

(Reporter)

Description

2 years ago
This bug was filed from the Socorro interface and is 
report bp-da4a0abe-1807-426e-bc63-d517b2161212.
=============================================================

Crashes with this signature persist after bug 1322204 landed, so it must be something else.

The top frames for these crashes all look like:
mozilla::ipc::FatalError
mozilla::dom::PContentChild::Write
mozilla::dom::PURLClassifierParent::Send__delete__
mozilla::dom::URLClassifierParent::OnClassifyComplete
nsUrlClassifierClassifyCallback::HandleEvent
nsUrlClassifierLookupCallback::HandleResults()
(Reporter)

Comment 1

2 years ago
Ehsan, could you take a look? Thanks.
Blocks: 1321868
status-firefox52: --- → unaffected
status-firefox53: --- → affected
Flags: needinfo?(ehsan)
(Reporter)

Comment 2

2 years ago
The IPCFatalErrorMsg for all of these is "actor has been |delete|d".
(Assignee)

Comment 3

2 years ago
Hmm, it's hard to say what's happening here.  The code is written with the assumption that the outparam to Classify() determines whether we will later receive an OnClassifyComplete() call, but clearly this assumption isn't holding in practice.  I'm not sure why.

I'm planning to write a debugging patch to help uncover what's going on here...
Flags: needinfo?(ehsan)
(Assignee)

Updated

2 years ago
Assignee: nobody → ehsan
Flags: needinfo?(ehsan)
Keywords: leave-open
(Assignee)

Comment 4

2 years ago
Created attachment 8818365 [details] [diff] [review]
Include some more diagnostic information in the crash reports
Attachment #8818365 - Flags: review?(continuation)
(Reporter)

Comment 5

2 years ago
Comment on attachment 8818365 [details] [diff] [review]
Include some more diagnostic information in the crash reports

Review of attachment 8818365 [details] [diff] [review]:
-----------------------------------------------------------------

Honestly this just feels like a generic IPDL shutdown issue, where you aren't checking for the shutdown happening out from under you, but I suppose this can't hurt.

::: dom/ipc/URLClassifierParent.cpp
@@ +27,5 @@
>    // should handle the service not being present gracefully.
>    nsCOMPtr<nsIURIClassifier> uriClassifier =
>      do_GetService(NS_URICLASSIFIERSERVICE_CONTRACTID, &rv);
>    if (NS_SUCCEEDED(rv)) {
> +    mStartClassifyLog |= 2;

Could you at least write these flags as 1 << 1 etc. please so it is a little easier to follow?

::: dom/ipc/URLClassifierParent.h
@@ +31,5 @@
>    ~URLClassifierParent() = default;
> +
> +  // Remember what happened in StartClassify in order to annotate the crash
> +  // reports later on.
> +  uint32_t mStartClassifyLog = 0;

Weird, I've never seen this initializer syntax before, but it sounds like it is supported by all compilers we use.
Attachment #8818365 - Flags: review?(continuation) → review+
(Reporter)

Comment 6

2 years ago
(In reply to Andrew McCreight [:mccr8] from comment #5)
> Could you at least write these flags as 1 << 1 etc. please so it is a little
> easier to follow?

Actually, never mind about that. I guess these are just "I got to line X of the code" so it doesn't really matter.

Comment 7

2 years ago
Pushed by eakhgari@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/8508fca15e2f
Include some more diagnostic information in the crash reports; r=mccr8
I also encountered this locally in today's build. It looks like the parent actor is destroyed before the URL classifier callback is notified.
Call stack:

(gdb) bt
#0  0x00007fe7c80578dd in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fe7c805782a in sleep () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007fe7bbc5b2fe in ah_crap_handler (signum=11) at mozilla-central/toolkit/xre/nsSigHandlers.cpp:103
#3  0x00007fe7bbc4262a in nsProfileLock::FatalSignalHandler (signo=11, info=0x7ffd58e710b0, context=0x7ffd58e70f80) at mozilla-central/toolkit/profile/nsProfileLock.cpp:191
#4  0x00007fe7bcb6cb56 in js::UnixExceptionHandler (signum=11, info=0x7ffd58e710b0, context=0x7ffd58e70f80) at mozilla-central/js/src/ds/MemoryProtectionExceptionHandler.cpp:264
#5  0x00007fe7bcde6880 in WasmFaultHandler<(Signal)0> (signum=<optimized out>, info=0x7ffd58e710b0, context=0x7ffd58e70f80) at mozilla-central/js/src/wasm/WasmSignalHandlers.cpp:1211
#6  <signal handler called>
#7  0x0000000000000000 in ?? ()
#8  0x00007fe7b88b9ca8 in mozilla::dom::PURLClassifierParent::Write (this=this@entry=0x7fe78e404788, v__=v__@entry=0x7fe78e404788, msg__=msg__@entry=0x7fe793e42300, nullable__=nullable__@entry=false) at objdir-nightly/ipc/ipdl/PURLClassifierParent.cpp:156
#9  0x00007fe7b88b9e7d in mozilla::dom::PURLClassifierParent::Send__delete__ (actor=actor@entry=0x7fe78e404788, status=...) at objdir-nightly/ipc/ipdl/PURLClassifierParent.cpp:52
#10 0x00007fe7baa9e88d in mozilla::dom::URLClassifierParent::OnClassifyComplete (this=0x7fe78e404780, aRv=nsresult::NS_OK) at mozilla-central/dom/ipc/URLClassifierParent.cpp:45
#11 0x00007fe7bbbe5d15 in nsUrlClassifierClassifyCallback::HandleEvent (this=0x7fe78ded1be0, tables=...) at mozilla-central/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:1158
#12 0x00007fe7bbbfe7bb in nsUrlClassifierLookupCallback::HandleResults (this=this@entry=0x7fe78e199f10) at mozilla-central/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:1127
#13 0x00007fe7bbbfefde in nsUrlClassifierLookupCallback::LookupComplete (this=<optimized out>, results=<optimized out>, this=<optimized out>) at mozilla-central/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:937
#14 0x00007fe7bbbdf86a in UrlClassifierLookupCallbackProxy::LookupCompleteRunnable::Run (this=0x7fe784910140) at mozilla-central/toolkit/components/url-classifier/nsUrlClassifierProxies.cpp:271
#15 0x00007fe7b7f9ed51 in nsThread::ProcessNextEvent (this=0x7fe7b4e47300, aMayWait=<optimized out>, aResult=0x7ffd58e71987) at mozilla-central/xpcom/threads/nsThread.cpp:1213
#16 0x00007fe7b7fea46c in NS_ProcessNextEvent (aThread=<optimized out>, aThread@entry=0x7fe7b4e47300, aMayWait=aMayWait@entry=false) at mozilla-central/xpcom/glue/nsThreadUtils.cpp:381

Where there are no active ContentParent instances. The linked list sentinel points back to itself, indicating that it's empty. There are no plugin-container processes shown in the system.
(gdb) p &mozilla::dom::ContentParent::sContentParents.mRawPtr.sentinel
$33 = (mozilla::LinkedListElement<mozilla::dom::ContentParent> *) 0x7fe792edc600
(gdb) p mozilla::dom::ContentParent::sContentParents.mRawPtr.sentinel                                                                                                                                          
$34 = {
  mNext = 0x7fe792edc600, 
  mPrev = 0x7fe792edc600, 
  mIsSentinel = true
}

URLClassifierParent's mChannel also points to garbled data:
(gdb) fr 8
#8  0x00007fe7b88b9ca8 in mozilla::dom::PURLClassifierParent::Write (this=this@entry=0x7fe78e404788, v__=v__@entry=0x7fe78e404788, msg__=msg__@entry=0x7fe793e42300, nullable__=nullable__@entry=false) at objdir-nightly/ipc/ipdl/PURLClassifierParent.cpp:156
156                 FatalError("actor has been |delete|d");
(gdb) p v__->mChannel
$40 = (mozilla::ipc::MessageChannel *) 0x7fe792ec40f8
(gdb) p *v__->mChannel
$41 = {
  <mozilla::ipc::HasResultCodes> = {<No data fields>},
  members of mozilla::ipc::MessageChannel:
  static kNoTimeout = -2147483648,
  mListener = 0xabababababababab,
  mChannelState = 2880154539,
  mMonitor = [(mozilla::ipc::RefCountedMonitor *) 0xabababababababab],
  mSide = (mozilla::ipc::ChildSide | mozilla::ipc::UnknownSide | unknown: 2880154536),
  mLink = 0xabababababababab,
  mWorkerLoop = 0x7fe792ec4140,
  mChannelErrorTask = [(mozilla::CancelableRunnable *) 0xabababababababab],
  mWorkerLoopID = -1414812757,
  mTimeoutMs = -1414812757,
  mInTimeoutSecondHalf = 171,
  mNextSeqno = -1414812757,
  static sIsPumpingMessages = false,
  mLastSendError = -1414812757,
  mDispatchingAsyncMessage = 171,
  mDispatchingAsyncMessageNestedLevel = -1414812757,
... (omitted for brevity)

I think you need to do something like this: https://dxr.mozilla.org/mozilla-central/rev/8103c612b79c2587ea4ca1b0a9f9f82db4b185b8/netwerk/protocol/websocket/WebSocketChannelParent.cpp#286
Crash Signature: [@ mozilla::ipc::FatalError | mozilla::dom::PContentChild::Write] → [@ mozilla::ipc::FatalError | mozilla::dom::PContentChild::Write] [@ mozilla::ipc::FatalError | mozilla::dom::PURLClassifierParent::Send__delete__ ]
(In reply to Cervantes Yu [:cyu] [:cervantes] from comment #9)
> I also encountered this locally in today's build. It looks like the parent
                                                                      ^^^^^^
I mean the manager, i.e. PContentParent.
> actor is destroyed before the URL classifier callback is notified.
(Assignee)

Comment 11

2 years ago
Thanks Cervantes.  Given your findings, I don't think we need the debugging patch any more.
Flags: needinfo?(ehsan)
(Assignee)

Updated

2 years ago
Attachment #8818365 - Attachment is obsolete: true
(Assignee)

Comment 12

2 years ago
Created attachment 8818570 [details] [diff] [review]
Don't attempt to delete the child actor twice in URLClassifierParent
Attachment #8818570 - Flags: review?(amarchesini)
(Assignee)

Updated

2 years ago
Keywords: leave-open
Attachment #8818570 - Flags: review?(amarchesini) → review+

Comment 13

2 years ago
Pushed by eakhgari@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/40d0c5bed75d
Don't attempt to delete the child actor twice in URLClassifierParent; r=baku
(Reporter)

Updated

2 years ago
Duplicate of this bug: 1323470
(Reporter)

Updated

2 years ago
Duplicate of this bug: 1323501
(Reporter)

Updated

2 years ago
Duplicate of this bug: 1323491
(Reporter)

Updated

2 years ago
Duplicate of this bug: 1323509
(Reporter)

Updated

2 years ago
Duplicate of this bug: 1323512

Updated

2 years ago
Duplicate of this bug: 1323511

Updated

2 years ago
Summary: Crash in mozilla::ipc::FatalError | mozilla::dom::PContentChild::Write → FatalError Crashes under mozilla::dom::PURLClassifierParent::Send__delete__

Comment 20

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/40d0c5bed75d
Status: NEW → RESOLVED
Last Resolved: 2 years ago
status-firefox53: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Crash Signature: [@ mozilla::ipc::FatalError | mozilla::dom::PContentChild::Write] [@ mozilla::ipc::FatalError | mozilla::dom::PURLClassifierParent::Send__delete__ ] → [@ mozilla::ipc::FatalError | mozilla::dom::PContentChild::Write] [@ mozilla::ipc::FatalError | mozilla::dom::PURLClassifierParent::Send__delete__ ] [@ mozilla::ipc::FatalError | mozilla::ipc::PTestShellCommandChild::Write]

Updated

2 years ago
status-firefox50: --- → unaffected
status-firefox51: --- → unaffected
(Assignee)

Comment 21

2 years ago
[Tracking Requested - why for this release]:
Crash regression in this release.
tracking-firefox53: --- → ?
tracking-firefox53: ? → +
Version: unspecified → Trunk
Comment on attachment 8818570 [details] [diff] [review]
Don't attempt to delete the child actor twice in URLClassifierParent


Approval Request Comment
[Feature/Bug causing the regression]: This feature is needed for the Shield study to be run on Release 52 (bug 1335232), which we'll use to study the effect of making flash click-to-play by default.
[User impact if declined]: Can't run the study as intended
[Is this code covered by automated tests?]: Yes
[Has the fix been verified in Nightly?]: Yes
[Needs manual test from QE? If yes, steps to reproduce]: Not for this feature independently. We'll do QE on the study as a whole to make sure all pieces work as expected
[List of other uplifts needed for the feature/fix]: Bug 1318768, Bug 1323220, Bug 1325255, Bug 1322204, Bug 1325651, Bug 1319571, Bug 1321377, Bug 1307604, Bug 1323064, Bug 1335549, Bug 1333303, Bug 1333483, Bug 1336714, Bug 1338287
[Is the change risky?]: No
[Why is the change risky/not risky?]: Fix for Bug 1318768, also requested for uplift
[String changes made/needed]: none
Attachment #8818570 - Flags: approval-mozilla-beta?
Attachment #8818570 - Flags: approval-mozilla-aurora?
Comment on attachment 8818570 [details] [diff] [review]
Don't attempt to delete the child actor twice in URLClassifierParent

It's been in FF53 already. Aurora53-.
Attachment #8818570 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora-
Comment on attachment 8818570 [details] [diff] [review]
Don't attempt to delete the child actor twice in URLClassifierParent

this was deemed too risky for beta
Attachment #8818570 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
You need to log in before you can comment on or make changes to this bug.