Closed
Bug 1331193
Opened 9 years ago
Closed 9 years ago
[e10s] Crash in mozilla::dom::TabChild::RecvAsyncMessage
Categories
(Core :: DOM: Core & HTML, defect)
Tracking
()
RESOLVED
FIXED
mozilla53
People
(Reporter: philipp, Assigned: bkelly)
Details
(Keywords: crash, regression)
Crash Data
Attachments
(2 files, 1 obsolete file)
3.35 KB,
patch
|
bkelly
:
review+
lizzard
:
approval-mozilla-aurora+
lizzard
:
approval-mozilla-beta+
lizzard
:
approval-mozilla-release+
|
Details | Diff | Splinter Review |
1.40 KB,
patch
|
billm
:
review+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is
report bp-e141f5ec-7648-4006-8f23-da7042170114.
=============================================================
Crashing Thread (0)
Frame Module Signature Source
0 xul.dll mozilla::dom::TabChild::RecvAsyncMessage(nsString const&, nsTArray<mozilla::jsipc::CpowEntry>&&, IPC::Principal const&, mozilla::dom::ClonedMessageData const&) dom/ipc/TabChild.cpp:2365
1 xul.dll mozilla::dom::PBrowserChild::OnMessageReceived(IPC::Message const&) obj-firefox/ipc/ipdl/PBrowserChild.cpp:2894
2 xul.dll mozilla::dom::PContentChild::OnMessageReceived(IPC::Message const&) obj-firefox/ipc/ipdl/PContentChild.cpp:6521
3 xul.dll mozilla::ipc::MessageChannel::DispatchAsyncMessage(IPC::Message const&) ipc/glue/MessageChannel.cpp:1662
4 xul.dll mozilla::ipc::MessageChannel::DispatchMessageW(IPC::Message&&) ipc/glue/MessageChannel.cpp:1600
5 xul.dll mozilla::ipc::MessageChannel::OnMaybeDequeueOne() ipc/glue/MessageChannel.cpp:1567
6 xul.dll mozilla::detail::RunnableMethodImpl<bool ( mozilla::ipc::MessageChannel::*)(void), 0, 1>::Run() obj-firefox/dist/include/nsThreadUtils.h:768
7 xul.dll mozilla::ipc::MessageChannel::DequeueTask::Run() obj-firefox/dist/include/mozilla/ipc/MessageChannel.h:559
8 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1067
this crash in the content process has been around for a while, but spiked up in volume in beta-nightly prerelease versions on 2017-01-13.
Reporter | ||
Comment 1•9 years ago
|
||
this would be the pushlog for 52.0a2 20170113004016 - 1 day: https://hg.mozilla.org/releases/mozilla-aurora/pushloghtml?fromchange=1abb4b193740653f1161d86f38dbb79287e7d69b&tochange=1e322cd9c7410917d4c5f1fa5b2cd78eb4a79ec6
and from 51.0b13 to 14: https://hg.mozilla.org/releases/mozilla-beta/pushloghtml?fromchange=FIREFOX_51_0b13_RELEASE&tochange=FIREFOX_51_0b14_RELEASE
Reporter | ||
Comment 2•9 years ago
|
||
[Tracking Requested - why for this release]:
this is a late regression in 51.0b14 and causing nearly 5% of content crashes there at the moment.
tracking-firefox51:
--- → ?
![]() |
||
Comment 3•9 years ago
|
||
#5 topcrash in Aurora 20170113004016. The crash address is usually (always?) 0x3e.
billm, any ideas?
Flags: needinfo?(wmccloskey)
Comment 5•9 years ago
|
||
The volume in beta14 is so high that I am afraid that it is going to reach unacceptable level in release. Updating the tracking value.
Comment 6•9 years ago
|
||
The only obvious DOM-looking change I see common to both pushlog ranges in #c1 is bug 1329989. Any chance it's related, Ben?
Flags: needinfo?(bkelly)
Comment 7•9 years ago
|
||
Blassey noted during the channel meeting today that this is strongly correlated to the addons cohort on 51 as well, so it could be an addon update causing this.
Assignee | ||
Comment 8•9 years ago
|
||
I don't think its related, but I can take a stab at fixing it. I don't think this code correctly handles the mMessageManager getting cycle collected.
Assignee: nobody → bkelly
Status: NEW → ASSIGNED
Flags: needinfo?(bkelly)
Assignee | ||
Comment 9•9 years ago
|
||
Olli, this tries to avoid nullptr de-refs in TabChild when mMessageManager has been cleared. I *think* this can happen if the mTabChildGlobal has been CC'd but the TabChild actor is still being held alive by IPC.
I also considered calling mMessageManager->Disconnect() in the TabChildGlobal CC unlink, but not sure if that is needed.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=842fa114f2c99df72747294e86929db2d3be3663
Attachment #8827566 -
Flags: review?(bugs)
Comment 10•9 years ago
|
||
adding rhelmer to look into comment 7....
who could look into a crash stat showing some correlation with add-ons? https://crash-stats.mozilla.com/report/index/e141f5ec-7648-4006-8f23-da7042170114#tab-correlation last line is (63.06% in signature vs 19.22% overall) e10s_cohort = addons-set51alladdons-test [63.06% vs 42.67% if process_type = content].
the "extensions" correlated are all system add-ons (pocket, automatic update service, webcompat, e10sroll-out, and default theme) and present with or without the experiment.
Flags: needinfo?(rhelmer)
Comment 11•9 years ago
|
||
So IPC keeps a reference to TabChild and DeallocPBrowserChild releases it, meaning that CC can't really unlink before DeallocPBrowserChild.
Comment 12•9 years ago
|
||
Comment on attachment 8827566 [details] [diff] [review]
Make TabChild handle a nullptr mMessageManager. r=smaug
Add MOZ_DIAGNOSTIC_ASSERT as discussed on IRC.
This still puzzles me.
Attachment #8827566 -
Flags: review?(bugs) → review+
Assignee | ||
Comment 13•9 years ago
|
||
Added diagnostic assertions.
Attachment #8827566 -
Attachment is obsolete: true
Attachment #8827604 -
Flags: review+
Assignee | ||
Updated•9 years ago
|
Keywords: checkin-needed
I can't think of why this would happen. I agree with Olli that the CC issue shouldn't be possible. We could take Ben's patch and it would probably make the crash go away, but it's a little scary that we don't know what's happening.
The only two possibilities I can think of are:
1. We've received a message after TabChild::ActorDestroy has run, or
2. We've received a message before the message manager has been initialized.
#1 seems pretty unlikely. If that happened, lots of things would be horribly broken. We could add a release assertion for it pretty easily though. We'd just need a state variable to track if ActorDestroy ran yet or not.
#2 seems a little more likely to me. We know we have an mTabChildGlobal, so InitTabChildGlobal must have run. If this line fails, it is possible that we could get into this situation:
http://searchfox.org/mozilla-central/rev/790b2cb423ea1ecb5746722d51633caec9bab95a/dom/ipc/TabChild.cpp#2509
We could add a release assertion there and see if it fires.
Assignee | ||
Updated•9 years ago
|
Keywords: checkin-needed
Assignee | ||
Comment 15•9 years ago
|
||
I'll add a MOZ_DIAGNOSTIC_ASSERT after InitTabChildGlobal(). Thanks.
Assignee | ||
Comment 16•9 years ago
|
||
(In reply to Ben Kelly [:bkelly] from comment #15)
> I'll add a MOZ_DIAGNOSTIC_ASSERT after InitTabChildGlobal(). Thanks.
After talking on IRC I think the plan is to keep the original patch, but also add a nightly P2 patch that tries to fix the init failure path. We will only save mTabChildGlobal if we know the initialization has succeeded. This will avoid inconsistent state.
Assignee | ||
Comment 17•9 years ago
|
||
As discussed on IRC this moves the assignment to mTabChildGlobal to take place after the conditional failure checks.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=33c30680013d5c7dcb751dcb69a6d7fc76612aae
Attachment #8827621 -
Flags: review?(wmccloskey)
Attachment #8827621 -
Flags: review?(wmccloskey) → review+
Flags: needinfo?(wmccloskey)
Comment 18•9 years ago
|
||
Pushed by bkelly@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d75f6cff73c8
Make TabChild handle a nullptr mMessageManager. r=smaug
https://hg.mozilla.org/integration/mozilla-inbound/rev/013e646a2f7b
P2 Don't leave TabChild in inconsistent state if init fails. r=billm
Assignee | ||
Comment 19•9 years ago
|
||
Comment on attachment 8827604 [details] [diff] [review]
Make TabChild handle a nullptr mMessageManager. r=smaug
This is for this patch only. Do not uplift P2 as its a higher risk.
Approval Request Comment
[Feature/Bug causing the regression]: Unclear what the cause is, but we are getting a high volume of nullptr de-ref crashes on beta.
[User impact if declined]: Crashes
[Is this code covered by automated tests?]: Our automated test suite does not trigger this crash.
[Has the fix been verified in Nightly?]: We don't have STR so we can't manually verify.
[Needs manual test from QE? If yes, steps to reproduce]: No
[List of other uplifts needed for the feature/fix]: None
[Is the change risky?]: Minimal risk
[Why is the change risky/not risky?]: This patch tries to mitigate the problem by checking for nullptr here.
[String changes made/needed]: None
Attachment #8827604 -
Flags: approval-mozilla-beta?
Attachment #8827604 -
Flags: approval-mozilla-aurora?
Comment 20•9 years ago
|
||
Comment on attachment 8827604 [details] [diff] [review]
Make TabChild handle a nullptr mMessageManager. r=smaug
Top crash, blocking issue, let's try this for 51 RC2.
We have to land the changes for 51 on m-r as well as m-b. (beta->release merge was yesterday)
Attachment #8827604 -
Flags: approval-mozilla-release+
Attachment #8827604 -
Flags: approval-mozilla-beta?
Attachment #8827604 -
Flags: approval-mozilla-beta+
Attachment #8827604 -
Flags: approval-mozilla-aurora?
Attachment #8827604 -
Flags: approval-mozilla-aurora+
Comment 22•9 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/d75f6cff73c8
https://hg.mozilla.org/mozilla-central/rev/013e646a2f7b
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Assignee | ||
Comment 23•9 years ago
|
||
Assignee | ||
Comment 24•9 years ago
|
||
Pushed to beta on closed tree:
https://hg.mozilla.org/releases/mozilla-beta/rev/df64cba2a9edf6b8046f9e7e63dc959bff63c419
Ryan, can you merge this to release? I think this is what you suggested.
Flags: needinfo?(bkelly) → needinfo?(ryanvm)
Comment 25•9 years ago
|
||
Wes, if you're around and ryan hasn't already done it, I'd like this on m-r as well so we can get it into the RC2 build.
Flags: needinfo?(wkocher)
Updated•9 years ago
|
Comment 27•9 years ago
|
||
bugherder uplift |
Assignee | ||
Updated•9 years ago
|
Crash Signature: [@ mozilla::dom::TabChild::RecvAsyncMessage] → [@ mozilla::dom::TabChild::RecvAsyncMessage]
[@ <name omitted> | mozilla::dom::TabChild::RecvAsyncMessage ]
Assignee | ||
Comment 28•9 years ago
|
||
Looks good so far. I don't see any RecvAsyncMessage() crashes in the RC2 build.
Updated•6 years ago
|
Component: DOM → DOM: Core & HTML
You need to log in
before you can comment on or make changes to this bug.
Description
•