Closed Bug 1323220 Opened 4 years ago Closed 4 years ago

FatalError Crashes under mozilla::dom::PURLClassifierParent::Send__delete__

Categories

(Toolkit :: Safe Browsing, defect)

Unspecified
Windows 10
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla53
Tracking Status
firefox50 --- unaffected
firefox51 --- unaffected
firefox52 --- unaffected
firefox53 + fixed

People

(Reporter: mccr8, Assigned: ehsan)

References

Details

(Keywords: crash, regression)

Crash Data

Attachments

(1 file, 1 obsolete file)

This bug was filed from the Socorro interface and is 
report bp-da4a0abe-1807-426e-bc63-d517b2161212.
=============================================================

Crashes with this signature persist after bug 1322204 landed, so it must be something else.

The top frames for these crashes all look like:
mozilla::ipc::FatalError
mozilla::dom::PContentChild::Write
mozilla::dom::PURLClassifierParent::Send__delete__
mozilla::dom::URLClassifierParent::OnClassifyComplete
nsUrlClassifierClassifyCallback::HandleEvent
nsUrlClassifierLookupCallback::HandleResults()
Ehsan, could you take a look? Thanks.
Blocks: 1321868
Flags: needinfo?(ehsan)
The IPCFatalErrorMsg for all of these is "actor has been |delete|d".
Hmm, it's hard to say what's happening here.  The code is written with the assumption that the outparam to Classify() determines whether we will later receive an OnClassifyComplete() call, but clearly this assumption isn't holding in practice.  I'm not sure why.

I'm planning to write a debugging patch to help uncover what's going on here...
Flags: needinfo?(ehsan)
Assignee: nobody → ehsan
Flags: needinfo?(ehsan)
Keywords: leave-open
Attachment #8818365 - Flags: review?(continuation)
Comment on attachment 8818365 [details] [diff] [review]
Include some more diagnostic information in the crash reports

Review of attachment 8818365 [details] [diff] [review]:
-----------------------------------------------------------------

Honestly this just feels like a generic IPDL shutdown issue, where you aren't checking for the shutdown happening out from under you, but I suppose this can't hurt.

::: dom/ipc/URLClassifierParent.cpp
@@ +27,5 @@
>    // should handle the service not being present gracefully.
>    nsCOMPtr<nsIURIClassifier> uriClassifier =
>      do_GetService(NS_URICLASSIFIERSERVICE_CONTRACTID, &rv);
>    if (NS_SUCCEEDED(rv)) {
> +    mStartClassifyLog |= 2;

Could you at least write these flags as 1 << 1 etc. please so it is a little easier to follow?

::: dom/ipc/URLClassifierParent.h
@@ +31,5 @@
>    ~URLClassifierParent() = default;
> +
> +  // Remember what happened in StartClassify in order to annotate the crash
> +  // reports later on.
> +  uint32_t mStartClassifyLog = 0;

Weird, I've never seen this initializer syntax before, but it sounds like it is supported by all compilers we use.
Attachment #8818365 - Flags: review?(continuation) → review+
(In reply to Andrew McCreight [:mccr8] from comment #5)
> Could you at least write these flags as 1 << 1 etc. please so it is a little
> easier to follow?

Actually, never mind about that. I guess these are just "I got to line X of the code" so it doesn't really matter.
Pushed by eakhgari@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/8508fca15e2f
Include some more diagnostic information in the crash reports; r=mccr8
I also encountered this locally in today's build. It looks like the parent actor is destroyed before the URL classifier callback is notified.
Call stack:

(gdb) bt
#0  0x00007fe7c80578dd in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fe7c805782a in sleep () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007fe7bbc5b2fe in ah_crap_handler (signum=11) at mozilla-central/toolkit/xre/nsSigHandlers.cpp:103
#3  0x00007fe7bbc4262a in nsProfileLock::FatalSignalHandler (signo=11, info=0x7ffd58e710b0, context=0x7ffd58e70f80) at mozilla-central/toolkit/profile/nsProfileLock.cpp:191
#4  0x00007fe7bcb6cb56 in js::UnixExceptionHandler (signum=11, info=0x7ffd58e710b0, context=0x7ffd58e70f80) at mozilla-central/js/src/ds/MemoryProtectionExceptionHandler.cpp:264
#5  0x00007fe7bcde6880 in WasmFaultHandler<(Signal)0> (signum=<optimized out>, info=0x7ffd58e710b0, context=0x7ffd58e70f80) at mozilla-central/js/src/wasm/WasmSignalHandlers.cpp:1211
#6  <signal handler called>
#7  0x0000000000000000 in ?? ()
#8  0x00007fe7b88b9ca8 in mozilla::dom::PURLClassifierParent::Write (this=this@entry=0x7fe78e404788, v__=v__@entry=0x7fe78e404788, msg__=msg__@entry=0x7fe793e42300, nullable__=nullable__@entry=false) at objdir-nightly/ipc/ipdl/PURLClassifierParent.cpp:156
#9  0x00007fe7b88b9e7d in mozilla::dom::PURLClassifierParent::Send__delete__ (actor=actor@entry=0x7fe78e404788, status=...) at objdir-nightly/ipc/ipdl/PURLClassifierParent.cpp:52
#10 0x00007fe7baa9e88d in mozilla::dom::URLClassifierParent::OnClassifyComplete (this=0x7fe78e404780, aRv=nsresult::NS_OK) at mozilla-central/dom/ipc/URLClassifierParent.cpp:45
#11 0x00007fe7bbbe5d15 in nsUrlClassifierClassifyCallback::HandleEvent (this=0x7fe78ded1be0, tables=...) at mozilla-central/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:1158
#12 0x00007fe7bbbfe7bb in nsUrlClassifierLookupCallback::HandleResults (this=this@entry=0x7fe78e199f10) at mozilla-central/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:1127
#13 0x00007fe7bbbfefde in nsUrlClassifierLookupCallback::LookupComplete (this=<optimized out>, results=<optimized out>, this=<optimized out>) at mozilla-central/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:937
#14 0x00007fe7bbbdf86a in UrlClassifierLookupCallbackProxy::LookupCompleteRunnable::Run (this=0x7fe784910140) at mozilla-central/toolkit/components/url-classifier/nsUrlClassifierProxies.cpp:271
#15 0x00007fe7b7f9ed51 in nsThread::ProcessNextEvent (this=0x7fe7b4e47300, aMayWait=<optimized out>, aResult=0x7ffd58e71987) at mozilla-central/xpcom/threads/nsThread.cpp:1213
#16 0x00007fe7b7fea46c in NS_ProcessNextEvent (aThread=<optimized out>, aThread@entry=0x7fe7b4e47300, aMayWait=aMayWait@entry=false) at mozilla-central/xpcom/glue/nsThreadUtils.cpp:381

Where there are no active ContentParent instances. The linked list sentinel points back to itself, indicating that it's empty. There are no plugin-container processes shown in the system.
(gdb) p &mozilla::dom::ContentParent::sContentParents.mRawPtr.sentinel
$33 = (mozilla::LinkedListElement<mozilla::dom::ContentParent> *) 0x7fe792edc600
(gdb) p mozilla::dom::ContentParent::sContentParents.mRawPtr.sentinel                                                                                                                                          
$34 = {
  mNext = 0x7fe792edc600, 
  mPrev = 0x7fe792edc600, 
  mIsSentinel = true
}

URLClassifierParent's mChannel also points to garbled data:
(gdb) fr 8
#8  0x00007fe7b88b9ca8 in mozilla::dom::PURLClassifierParent::Write (this=this@entry=0x7fe78e404788, v__=v__@entry=0x7fe78e404788, msg__=msg__@entry=0x7fe793e42300, nullable__=nullable__@entry=false) at objdir-nightly/ipc/ipdl/PURLClassifierParent.cpp:156
156                 FatalError("actor has been |delete|d");
(gdb) p v__->mChannel
$40 = (mozilla::ipc::MessageChannel *) 0x7fe792ec40f8
(gdb) p *v__->mChannel
$41 = {
  <mozilla::ipc::HasResultCodes> = {<No data fields>},
  members of mozilla::ipc::MessageChannel:
  static kNoTimeout = -2147483648,
  mListener = 0xabababababababab,
  mChannelState = 2880154539,
  mMonitor = [(mozilla::ipc::RefCountedMonitor *) 0xabababababababab],
  mSide = (mozilla::ipc::ChildSide | mozilla::ipc::UnknownSide | unknown: 2880154536),
  mLink = 0xabababababababab,
  mWorkerLoop = 0x7fe792ec4140,
  mChannelErrorTask = [(mozilla::CancelableRunnable *) 0xabababababababab],
  mWorkerLoopID = -1414812757,
  mTimeoutMs = -1414812757,
  mInTimeoutSecondHalf = 171,
  mNextSeqno = -1414812757,
  static sIsPumpingMessages = false,
  mLastSendError = -1414812757,
  mDispatchingAsyncMessage = 171,
  mDispatchingAsyncMessageNestedLevel = -1414812757,
... (omitted for brevity)

I think you need to do something like this: https://dxr.mozilla.org/mozilla-central/rev/8103c612b79c2587ea4ca1b0a9f9f82db4b185b8/netwerk/protocol/websocket/WebSocketChannelParent.cpp#286
Crash Signature: [@ mozilla::ipc::FatalError | mozilla::dom::PContentChild::Write] → [@ mozilla::ipc::FatalError | mozilla::dom::PContentChild::Write] [@ mozilla::ipc::FatalError | mozilla::dom::PURLClassifierParent::Send__delete__ ]
(In reply to Cervantes Yu [:cyu] [:cervantes] from comment #9)
> I also encountered this locally in today's build. It looks like the parent
                                                                      ^^^^^^
I mean the manager, i.e. PContentParent.
> actor is destroyed before the URL classifier callback is notified.
Thanks Cervantes.  Given your findings, I don't think we need the debugging patch any more.
Flags: needinfo?(ehsan)
Attachment #8818365 - Attachment is obsolete: true
Keywords: leave-open
Attachment #8818570 - Flags: review?(amarchesini) → review+
Pushed by eakhgari@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/40d0c5bed75d
Don't attempt to delete the child actor twice in URLClassifierParent; r=baku
Duplicate of this bug: 1323470
Duplicate of this bug: 1323501
Duplicate of this bug: 1323491
Duplicate of this bug: 1323509
Duplicate of this bug: 1323512
Duplicate of this bug: 1323511
Summary: Crash in mozilla::ipc::FatalError | mozilla::dom::PContentChild::Write → FatalError Crashes under mozilla::dom::PURLClassifierParent::Send__delete__
https://hg.mozilla.org/mozilla-central/rev/40d0c5bed75d
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Crash Signature: [@ mozilla::ipc::FatalError | mozilla::dom::PContentChild::Write] [@ mozilla::ipc::FatalError | mozilla::dom::PURLClassifierParent::Send__delete__ ] → [@ mozilla::ipc::FatalError | mozilla::dom::PContentChild::Write] [@ mozilla::ipc::FatalError | mozilla::dom::PURLClassifierParent::Send__delete__ ] [@ mozilla::ipc::FatalError | mozilla::ipc::PTestShellCommandChild::Write]
[Tracking Requested - why for this release]:
Crash regression in this release.
Version: unspecified → Trunk
Comment on attachment 8818570 [details] [diff] [review]
Don't attempt to delete the child actor twice in URLClassifierParent


Approval Request Comment
[Feature/Bug causing the regression]: This feature is needed for the Shield study to be run on Release 52 (bug 1335232), which we'll use to study the effect of making flash click-to-play by default.
[User impact if declined]: Can't run the study as intended
[Is this code covered by automated tests?]: Yes
[Has the fix been verified in Nightly?]: Yes
[Needs manual test from QE? If yes, steps to reproduce]: Not for this feature independently. We'll do QE on the study as a whole to make sure all pieces work as expected
[List of other uplifts needed for the feature/fix]: Bug 1318768, Bug 1323220, Bug 1325255, Bug 1322204, Bug 1325651, Bug 1319571, Bug 1321377, Bug 1307604, Bug 1323064, Bug 1335549, Bug 1333303, Bug 1333483, Bug 1336714, Bug 1338287
[Is the change risky?]: No
[Why is the change risky/not risky?]: Fix for Bug 1318768, also requested for uplift
[String changes made/needed]: none
Attachment #8818570 - Flags: approval-mozilla-beta?
Attachment #8818570 - Flags: approval-mozilla-aurora?
Comment on attachment 8818570 [details] [diff] [review]
Don't attempt to delete the child actor twice in URLClassifierParent

It's been in FF53 already. Aurora53-.
Attachment #8818570 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora-
Comment on attachment 8818570 [details] [diff] [review]
Don't attempt to delete the child actor twice in URLClassifierParent

this was deemed too risky for beta
Attachment #8818570 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
You need to log in before you can comment on or make changes to this bug.