Closed Bug 1470306 Opened Last year Closed Last year

Crash in nsTArray_Impl<T>::Clear | mozilla::dom::MessagePort::CloseInternal

Categories

(Core :: DOM: Core & HTML, defect, P2, critical)

defect

Tracking

()

RESOLVED FIXED
mozilla63
Tracking Status
firefox-esr52 --- unaffected
firefox-esr60 --- unaffected
firefox61 --- unaffected
firefox62 + fixed
firefox63 --- fixed

People

(Reporter: marcia, Assigned: baku)

References

Details

(Keywords: crash, regression)

Crash Data

Attachments

(5 files)

This bug was filed from the Socorro interface and is
report bp-9215996a-9ed7-444d-bf88-dc97f0180617.
=============================================================

Seen while looking at nightly crash stats - the crash is present on Mac and Android in 62 nightly and in Linux on 62.0b0: https://bit.ly/2K4f77o and https://bit.ly/2IdXAb0

Possible regression range based on Build ID:  https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=0344af5522398276a98629b66df90cca6f395245&tochange=91db0c695f0272f00bf92c81c471a85101056d96

Not sure if Bug 1467179 is possibly involved.

Top 10 frames of crashing thread:

0 XUL nsTArray_Impl<RefPtr<mozilla::dom::SharedMessagePortMessage>, nsTArrayInfallibleAllocator>::Clear xpcom/ds/nsTArray.h:1292
1 XUL mozilla::dom::MessagePort::CloseInternal dom/messagechannel/MessagePort.cpp:510
2 XUL mozilla::dom::MessagePort::CloseInternal dom/messagechannel/MessagePort.cpp:521
3 XUL mozilla::dom::MessagePort::DisconnectFromOwner dom/messagechannel/MessagePort.cpp:501
4 XUL std::__1::__function::__func<nsIGlobalObject::DisconnectEventTargetObjects dom/base/nsIGlobalObject.cpp:178
5 XUL nsIGlobalObject::ForEachEventTargetObject const clang/include/c++/v1/functional:1916
6 XUL <name omitted> dom/base/nsIGlobalObject.cpp:177
7 XUL mozilla::dom::WorkerPrivate::NotifyInternal dom/workers/WorkerScope.cpp:145
8 XUL mozilla::dom::WorkerPrivate::DoRunLoop dom/workers/WorkerPrivate.cpp:3253
9 XUL mozilla::dom::workerinternals:: dom/workers/RuntimeService.cpp:2707

=============================================================
Flags: needinfo?(amarchesini)
The code definitely looks error prone. It is unclear to me what keeps the MessagePort alive while DisconnectFromOwner is being called.

hmm, but is this actually sort-of a regression from https://bug1438541.bmoattachments.org/attachment.cgi?id=8952759. We should keep DETH object alive when calling DisconnectFromOwner.
Flags: needinfo?(bkelly)
uncompiled patch, guess fix. (but I think we want something like this anyhow.)
Attachment #8986928 - Flags: review?(bkelly)
Comment on attachment 8986928 [details] [diff] [review]
DETH_disconnect.diff

Review of attachment 8986928 [details] [diff] [review]:
-----------------------------------------------------------------

I think arguably this is more of a bug in the MessagePort.  The DETH owner mechanism never claims to hold the DETH alive.  Still, it doesn't hurt to make nsIGlobalObject do the kungfugrip for the DETH object automatically.
Attachment #8986928 - Flags: review?(bkelly) → review+
Crash Signature: [@ nsTArray_Impl<T>::Clear | mozilla::dom::MessagePort::CloseInternal] → [@ nsTArray_Impl<T>::Clear | mozilla::dom::MessagePort::CloseInternal] [@ nsTArray_Impl<T>::ClearAndRetainStorage | nsTArray_Impl<T>::Clear | mozilla::dom::MessagePort::CloseInternal]
This is clearly a bug in nsIGlobalObject. I should have caught this when reviewing. Per COM rules caller should keep callee alive.
Pushed by opettay@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/15c95df467be
DOMEventTargetHelper object should be kept alive while calling DisconnectFromOwner, r=bkelly
https://hg.mozilla.org/mozilla-central/rev/15c95df467be
Status: NEW → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla62
Flags: needinfo?(amarchesini)
Duplicate of this bug: 1470112
Flags: needinfo?(bkelly)
Crash Signature: [@ nsTArray_Impl<T>::Clear | mozilla::dom::MessagePort::CloseInternal] [@ nsTArray_Impl<T>::ClearAndRetainStorage | nsTArray_Impl<T>::Clear | mozilla::dom::MessagePort::CloseInternal] → [@ nsTArray_Impl<T>::Clear | mozilla::dom::MessagePort::CloseInternal] [@ nsTArray_Impl<T>::ClearAndRetainStorage | nsTArray_Impl<T>::Clear | mozilla::dom::MessagePort::CloseInternal] [@ nsTArray_Impl<T>::ClearAndRetainStorage | mozilla::dom::MessagePor…
tab crashes with this signature are still occurring in firefox 62.0b and 63.0a1
Status: RESOLVED → REOPENED
Flags: needinfo?(bugs)
Resolution: FIXED → ---
ok, this was a possible fix based on stack traces.
Flags: needinfo?(bugs) → needinfo?(amarchesini)
Attached patch fix.patchSplinter Review
Assignee: bugs → amarchesini
Flags: needinfo?(amarchesini)
Attachment #8989038 - Flags: review?(bugs)
Attachment #8989038 - Flags: review?(bugs) → review+
Priority: -- → P2
Pushed by amarchesini@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/ecdbee46819c
MessagePort unlinked should start the cleanup/close/shutdown procedure of the object, r=smaug
https://hg.mozilla.org/mozilla-central/rev/ecdbee46819c
Status: REOPENED → RESOLVED
Closed: Last yearLast year
Resolution: --- → FIXED
sorry for bugging again, the signatures are still present on nightly after the latest patch has landed there.
Flags: needinfo?(amarchesini)
This is a fairly high number of crashes for beta, so I'm tracking this issue for 62.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Crash Signature: mozilla::dom::MessagePort::CloseInternal ] → mozilla::dom::MessagePort::CloseInternal ] [@ nsTArray_Impl<T>::ClearAndRetainStorage | nsTArray_base<T>::ShrinkCapacity | nsTArray_Impl<T>::Clear | mozilla::dom::MessagePort::CloseInternal]
Working on it. I hope to have a patch soon.
Attached patch msg.patchSplinter Review
This is not a fix. I want to add a MOZ_DIAGNOSTIC_ASSERT and an ScopeExit. This will increase our knowledge on what is going on here.
Flags: needinfo?(amarchesini)
Attachment #8991896 - Flags: review?(bugs)
Attachment #8991896 - Flags: review?(bugs) → review+
Pushed by amarchesini@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/e0f9fb876eec
Adding some MOZ_DIAGNOSTIC_ASSERT in MessagePort, r=smaug
Crash Signature: mozilla::dom::MessagePort::CloseInternal ] [@ nsTArray_Impl<T>::ClearAndRetainStorage | nsTArray_base<T>::ShrinkCapacity | nsTArray_Impl<T>::Clear | mozilla::dom::MessagePort::CloseInternal] → mozilla::dom::MessagePort::CloseInternal ] [@ nsTArray_Impl<T>::ClearAndRetainStorage | nsTArray_base<T>::ShrinkCapacity | nsTArray_Impl<T>::Clear | mozilla::dom::MessagePort::CloseInternal] [@ mozilla::dom::MessagePort::CloseInternal]
Probably, this is not the fix, but it's something important to have.
Attachment #8992319 - Flags: review?(kyle)
Attachment #8992319 - Flags: review?(kyle) → review+
Pushed by amarchesini@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/82aed14f5854
Ensure MessagePort is in the right state if the connection to PBackground fails, r=qdot
There are 4 crahses on android with nightly (20180715100113) and the moz_crash_reason is "MOZ_RELEASE_ASSERT(mUnshippedEntangledPort)".
Yes. That is my latest patch. I introduced that assertion to be 100% sure that the problem was a nullptr port.
I also pushed a patch yesterday about an additional secure check.
Attached patch more assertionsSplinter Review
Attachment #8994193 - Flags: review?(kyle)
Attachment #8994193 - Flags: review?(kyle) → review+
(In reply to Raul Gurzau (:RaulGurzau) from comment #26)
> https://hg.mozilla.org/mozilla-central/rev/2f59d7255981

Hey Baku, any updates after this landing?
Flags: needinfo?(amarchesini)
Yes, it seems that we don't have any more crashes in nightly after the last push. Am I right?
If yes, the problem was related to IPC initialization on workers and we should uplift that and the previous patch.

Can somebody confirm that we don't have any new crash-report in nightly? Thanks.
Flags: needinfo?(amarchesini)
yes, the most recent recorded crashes were from nightly build 20180724100052 - nothing after that...
Status: REOPENED → RESOLVED
Closed: Last yearLast year
Keywords: leave-open
OS: Mac OS X → All
Hardware: Unspecified → All
Resolution: --- → FIXED
Target Milestone: mozilla62 → mozilla63
could you take care of requesting approval for the right patches to be uplifted to 62? thanks
Flags: needinfo?(amarchesini)
Flags: needinfo?(amarchesini)
Attachment #8986928 - Flags: approval-mozilla-beta?
Attachment #8989038 - Flags: approval-mozilla-beta?
Attachment #8991896 - Flags: approval-mozilla-beta?
Attachment #8992319 - Flags: approval-mozilla-beta?
Comment on attachment 8994193 [details] [diff] [review]
more assertions

Approval Request Comment
[Feature/Bug causing the regression]: Bug 1467179, probably.
[User impact if declined]: a crash can occur.
[Is this code covered by automated tests?]: n/a
[Has the fix been verified in Nightly?]: yes
[Needs manual test from QE? If yes, steps to reproduce]: n/a
[List of other uplifts needed for the feature/fix]: just all the patches of this bug.
[Is the change risky?]: no..
[Why is the change risky/not risky?]: These patches introduce return value checks. No logical changes involved.
[String changes made/needed]: none
Attachment #8994193 - Flags: approval-mozilla-beta?
Comment on attachment 8994193 [details] [diff] [review]
more assertions

Crash fix that seems to have helped in Nightly, Beta62+
Attachment #8994193 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8986928 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8989038 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8991896 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8992319 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.