Closed Bug 1269036 Opened 8 years ago Closed 8 years ago

crash in mozalloc_abort | NS_DebugBreak | mozilla::ipc::LogicError | mozilla::dom::PBrowser::Transition from ABORT: __delete__()d actor

Categories

(Core :: DOM: Content Processes, defect, P1)

Unspecified
Windows NT
defect

Tracking

()

RESOLVED FIXED
mozilla51
Tracking Status
e10s + ---
firefox47 --- wontfix
firefox48 + wontfix
firefox49 + fixed
firefox50 --- fixed
firefox51 --- fixed

People

(Reporter: dbaron, Assigned: kanru)

References

Details

(Keywords: crash, regression, Whiteboard: e10st?)

Crash Data

Attachments

(2 files)

[Tracking Requested - why for this release]:

[Tracking Requested - why for this release]:

This bug was filed from the Socorro interface and is 
report bp-2df9207b-3c07-494c-a804-ec4722160429.
=============================================================

Note that this bug's crash signature should change when bug 1269033 is fixed.

Crashes on nightly with the signature mozalloc_abort | NS_DebugBreak | mozilla::ipc::LogicError started happening on low frequency on nightly starting with build 20160408030212, although given the frequencies it's not certain that it was a regression from the previous nightly:

https://crash-stats.mozilla.com/signature/?product=Firefox&release_channel=nightly&platform=Windows&date=%3E%3D2016-01-01&signature=mozalloc_abort+%7C+NS_DebugBreak+%7C+mozilla%3A%3Aipc%3A%3ALogicError&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=abort_message&_columns=url&page=1

Rank 	Build id 	Count 	%
4 	20160408030212 	3 	8.82 %
1 	20160409030219 	6 	17.65 %
5 	20160410030224 	3 	8.82 %
2 	20160411030231 	5 	14.71 %
6 	20160413030239 	3 	8.82 %
8 	20160418030305 	2 	5.88 %
9 	20160420030213 	1 	2.94 %
7 	20160421030302 	3 	8.82 %
3 	20160423030220 	5 	14.71 %
10 	20160424030601 	1 	2.94 %
11 	20160427030215 	1 	2.94 %
12 	20160428030218 	1 	2.94 %


They've now merged over to aurora, and are showing up with somewhat higher numbers, although that's partly due to a single user crashing many times:

https://crash-stats.mozilla.com/signature/?product=Firefox&release_channel=aurora&platform=Windows&date=%3E%3D2016-01-01&signature=mozalloc_abort+%7C+NS_DebugBreak+%7C+mozilla%3A%3Aipc%3A%3ALogicError&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=abort_message&_columns=url&_columns=install_time&page=1

Rank 	Build id 	Count 	%
1 	20160429004052 	14 	82.35 %
2 	20160428004042 	3 	17.65 %


In all cases the abort message is:
[Parent NNNN] ###!!! ABORT: __delete__()d actor: file c:/builds/moz2_slave/m-cen-w64-ntly-000000000000000/build/src/ipc/glue/ProtocolUtils.cpp, line 377


The top of the stack of the report I linked to is:
Frame 	Module 	Signature 	Source
0 	mozglue.dll 	mozalloc_abort(char const* const) 	memory/mozalloc/mozalloc_abort.cpp
1 	xul.dll 	NS_DebugBreak 	xpcom/base/nsDebugImpl.cpp
2 	xul.dll 	mozilla::ipc::LogicError(char const*) 	ipc/glue/ProtocolUtils.cpp
3 	xul.dll 	mozilla::dom::PBrowser::Transition(mozilla::dom::PBrowser::State, mozilla::ipc::Trigger, mozilla::dom::PBrowser::State*) 	obj-firefox/ipc/ipdl/PBrowser.cpp
4 	xul.dll 	mozilla::dom::PBrowserParent::SendDestroy() 	obj-firefox/ipc/ipdl/PBrowserParent.cpp
5 	xul.dll 	mozilla::dom::TabParent::DestroyInternal() 	dom/ipc/TabParent.cpp
6 	xul.dll 	mozilla::dom::TabParent::Destroy() 	dom/ipc/TabParent.cpp
7 	xul.dll 	nsFrameLoader::DestroyComplete() 	dom/base/nsFrameLoader.cpp
8 	xul.dll 	mozilla::dom::TabParent::ActorDestroy(mozilla::ipc::IProtocolManager<mozilla::ipc::IProtocol>::ActorDestroyReason) 	dom/ipc/TabParent.cpp
though I haven't yet verified that the other crashes have the same stack.


I'd say this is borderline for being considered frequent enough to be a topcrash.


A one day regression window for when it started on nightly would be:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=b6683e141c47c022598c0caac3ea8ba8c6236d42&tochange=d9b1a9829c8ee2862955043f28183efa07de3d2b

A broader three day window would be:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=d9f50aa0a1aaf90499b85c31e0f329b762e80fdd&tochange=d9b1a9829c8ee2862955043f28183efa07de3d2b
since it's not clear that the frequency is high enough that we can be confident in a one day window.
The regression window points to the fact that it was actually just a change in crash signature from:
https://hg.mozilla.org/mozilla-central/rev/0e3178883b51
I'll wait to see what the signatures look like after bug 1269033 before trying to make further progress here.
Tracking for 48 and 49 - dbaron can you help find an owner for this or do you want to follow up per Comment 2?
Flags: needinfo?(dbaron)
I'll let the e10s team triage this to find an owner.
tracking-e10s: --- → ?
The thing I wanted to follow up on was confirming that this was the bulk of the mozilla::ipc::LogicError crashes, which it is.  (Although not all.)
Flags: needinfo?(dbaron)
(I filed bug 1269854 on the rest of the crashes.)
Summary: crash in mozalloc_abort | NS_DebugBreak | mozilla::ipc::LogicError → crash in mozalloc_abort | NS_DebugBreak | mozilla::ipc::LogicError | mozilla::dom::PBrowser::Transition from ABORT: __delete__()d actor
Flags: needinfo?(jmathies)
Blocks: e10s-crashes
Flags: needinfo?(jmathies)
Priority: -- → P1
Keywords: topcrash
Assignee: nobody → kchen
Crash Signature: [@ mozalloc_abort | NS_DebugBreak | mozilla::ipc::LogicError] [@ mozalloc_abort | NS_DebugBreak | mozilla::ipc::LogicError | mozilla::dom::PBrowser::Transition] → [@ mozalloc_abort | NS_DebugBreak | mozilla::ipc::LogicError] [@ mozalloc_abort | NS_DebugBreak | mozilla::ipc::LogicError | mozilla::dom::PBrowser::Transition] [@ mozalloc_abort | Abort | NS_DebugBreak | mozilla::dom::PBrowserParent::SendDestroy]
See Also: → 1276225
Whiteboard: e10st?
Version: unspecified → Trunk
Crash Signature: [@ mozalloc_abort | NS_DebugBreak | mozilla::ipc::LogicError] [@ mozalloc_abort | NS_DebugBreak | mozilla::ipc::LogicError | mozilla::dom::PBrowser::Transition] [@ mozalloc_abort | Abort | NS_DebugBreak | mozilla::dom::PBrowserParent::SendDestroy] → [@ mozalloc_abort | NS_DebugBreak | mozilla::ipc::LogicError] [@ mozalloc_abort | NS_DebugBreak | mozilla::ipc::LogicError | mozilla::dom::PBrowser::Transition] [@ mozalloc_abort | Abort | NS_DebugBreak | mozilla::dom::PBrowserParent::SendDestroy] [@ A…
This is a fairly high volume crash in early beta, so if we come up with a fix, I'm hoping we can uplift it.
A user reported via Twitter [1] that Firefox crashed with this signature when he clicked a phone number on some address card in Google's search result page. [2]

[1] https://twitter.com/dandelionspiral/status/765121569681125377 (Chinese)
[2] https://crash-stats.mozilla.com/report/index/514feb9d-85df-4688-9e0e-8cecc2160815
Did he/she clicked on the "phone number" in the card? The only clickable part I see is the label part which is a search link.
Crash Signature: Abort | __delete__()d actor | mozalloc_abort | NS_DebugBreak | mozilla::ipc::LogicError | mozilla::dom::PBrowser::Transition] → Abort | __delete__()d actor | mozalloc_abort | NS_DebugBreak | mozilla::ipc::LogicError | mozilla::dom::PBrowser::Transition] [@ Abort | __delete__()d actor | mozalloc_abort | NS_DebugBreak | mozilla::dom::PBrowser::Transition]
Is this related to the Windows option (or plugin/extension of some kind, I can't recall) that tries to make all phone numbers clickable? Philip does this look familiar to you?
Flags: needinfo?(madperson)
I talked to the user in comment 10. The clickable phone number is made by google to enable calling with hangouts. It's a javascript:void(0) link and it will open a new window when you click on it.

I traced the relevant code before and this confirmed by theory that when for some reason the window opener in browser.xul failed to create a new window and attach the remote browser to it, we crash with this signature.

I haven't traced to where we could fail to create a new window but I can make a band-aid patch that stops this crash. It will hide the real problem though.
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #12)
> Is this related to the Windows option (or plugin/extension of some kind, I
> can't recall) that tries to make all phone numbers clickable? Philip does
> this look familiar to you?

the skype click-to-play extension doesn't seem to be involved in those kind of crashes, in case you had that in mind.
Flags: needinfo?(madperson)
Comment on attachment 8784331 [details]
Bug 1269036 - Never call PBrowserChild::Send__delete__ in content process.

https://reviewboard.mozilla.org/r/73826/#review71820

::: dom/ipc/ContentParent.cpp:5008
(Diff revision 1)
>    }
>  
>    TabParent* newTab = TabParent::GetFrom(aNewTab);
>    MOZ_ASSERT(newTab);
>  
> +  auto destroyNewTabOnError = MakeScopeExit([&] {

(Got to say that this kind of lambdas make the code harder to read, but I see this is very useful here.)
Comment on attachment 8784331 [details]
Bug 1269036 - Never call PBrowserChild::Send__delete__ in content process.

https://reviewboard.mozilla.org/r/73826/#review71830

Yeah, I think this makes sense. Did you try artificially to make window opening to fail?
Attachment #8784331 - Flags: review?(bugs) → review+
(In reply to Olli Pettay [:smaug] (vacation Aug 25-28) from comment #18)
> Comment on attachment 8784331 [details]
> Bug 1269036 - Never call PBrowserChild::Send__delete__ in content process.
> 
> https://reviewboard.mozilla.org/r/73826/#review71830
> 
> Yeah, I think this makes sense. Did you try artificially to make window
> opening to fail?

Yes, I tried to make OpenWindowWithTabParent fail.
because tree closed..
Keywords: checkin-needed
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/7f1978918cb1
Never call PBrowserChild::Send__delete__ in content process. r=smaug
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/7f1978918cb1
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
hi, this crash is causing 0.5% of browser crashes on 49.0b at the moment. do you think the patch would be upliftable in terms of scope/risk?
Flags: needinfo?(kchen)
(In reply to [:philipp] from comment #23)
> hi, this crash is causing 0.5% of browser crashes on 49.0b at the moment. do
> you think the patch would be upliftable in terms of scope/risk?

I think it's rather safe to uplift.
Flags: needinfo?(kchen)
Comment on attachment 8784331 [details]
Bug 1269036 - Never call PBrowserChild::Send__delete__ in content process.

Approval Request Comment
[Feature/regressing bug #]: n/a
[User impact if declined]: user may experience browser crashes when websites try to open a new window
[Describe test coverage new/current, TreeHerder]: landed on m-c, baked on Nightly for several days
[Risks and why]: This patch moves TabParent deallocation from the content process to the parent process. If the TabParent is not deallocated correctly it will cause small amount leak. I've tested the fail to open window scenario manually and confirmed the TabParent is correctly destroyed so I think the risk of leak is small.
[String/UUID change made/needed]: n/a
Attachment #8784331 - Flags: approval-mozilla-aurora?
Can we uplift that to beta too? Thanks
Flags: needinfo?(kchen)
Comment on attachment 8784331 [details]
Bug 1269036 - Never call PBrowserChild::Send__delete__ in content process.

Approval Request Comment: see comment 25
Flags: needinfo?(kchen)
Attachment #8784331 - Flags: approval-mozilla-beta?
Comment on attachment 8784331 [details]
Bug 1269036 - Never call PBrowserChild::Send__delete__ in content process.

The 4th signature shows a number of occurrences on Beta49 and since the fix landed in Nightly, there are no reports of this crash. Let's uplift to Aurora50.
Attachment #8784331 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 8784331 [details]
Bug 1269036 - Never call PBrowserChild::Send__delete__ in content process.

Fix for top 20 browser crash in beta, let's uplift for beta 9.
Attachment #8784331 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Backed out from Beta for causing Linux timeouts in test_HighPriority.html and subsequent tests.
https://treeherder.mozilla.org/logviewer.html#?job_id=1568007&repo=mozilla-beta

https://hg.mozilla.org/releases/mozilla-beta/rev/089c3a6535cb
Flags: needinfo?(kchen)
(In reply to Ryan VanderMeulen [:RyanVM] from comment #32)
> Backed out from Beta for causing Linux timeouts in test_HighPriority.html
> and subsequent tests.
> https://treeherder.mozilla.org/logviewer.html#?job_id=1568007&repo=mozilla-
> beta
> 
> https://hg.mozilla.org/releases/mozilla-beta/rev/089c3a6535cb

We also need to destroy TabParent properly for mozbrowser iframes. This shouldn't affect beta user because mozbrowser is not exposed to content. IMO we should reland the fixes to beta after the tests are fixed.
Flags: needinfo?(kchen)
Comment on attachment 8787547 [details]
Bug 1269036 - Also destroy TabParent properly in case of mozbrowser.

https://reviewboard.mozilla.org/r/76264/#review74410

I assume you went through all the cases when opening window might fail.
Attachment #8787547 - Flags: review?(bugs) → review+
FYI Firefox 49 goes to release candidate builds next week. We'll need to weigh risks of re-landing on 49.
Fairly high number of crashes, I'm ok to re-land this on beta if you can do it before the beta to release merge. Kan-ru can you land it?  If not, hopefully a sheriff will see this Monday morning .
Flags: needinfo?(kchen)
e10s only bug. With beta 50% and release 100%; a basic assumption is double the crashes if not fixed for release.
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d75552d5fa0d
Also destroy TabParent properly in case of mozbrowser. r=smaug
Flags: needinfo?(kchen)
You need to log in before you can comment on or make changes to this bug.