crash in mozalloc_abort(char const* const) | NS_DebugBreak | mozilla::net::PNeckoChild::Write(mozilla::dom::PBrowserChild*, IPC::Message*, bool)

RESOLVED FIXED in Firefox 42

Status

Product: Core
Component: DOM
Priority: --
Severity: critical
Status: RESOLVED FIXED
Opened: 3 years ago
Last modified: 3 years ago

People

(Reporter: lizzard, Assigned: billm)

Tracking

Version: 37 Branch
Target Milestone: mozilla43
Hardware: Unspecified
OS: Windows NT
Keywords: crash, topcrash-win
Points: ---

Firefox Tracking Flags

(e10s?, firefox40- wontfix, firefox41 affected, firefox42+ fixed, firefox43+ fixed)

Details

(crash signature)

Attachments

(2 attachments)

This bug was filed from the Socorro interface and is report bp-b9d1044e-e0a1-4120-ab33-a3c9c2150511.
=============================================================
This is currently the #7 topcrash in Firefox 40 and is showing up on the explosiveness report for 40. There are a few crashes in 39 but at a much lower volume. 

Crashing thread:

0 	mozglue.dll 	mozalloc_abort(char const* const) 	memory/mozalloc/mozalloc_abort.cpp
1 	xul.dll 	NS_DebugBreak 	xpcom/base/nsDebugImpl.cpp
2 	xul.dll 	mozilla::net::PNeckoChild::Write(mozilla::dom::PBrowserChild*, IPC::Message*, bool) 	obj-firefox/ipc/ipdl/PNeckoChild.cpp
3 	xul.dll 	mozilla::net::PNeckoChild::Write(mozilla::dom::PBrowserOrId const&, IPC::Message*) 	obj-firefox/ipc/ipdl/PNeckoChild.cpp
4 	xul.dll 	mozilla::net::PNeckoChild::SendPHttpChannelConstructor(mozilla::net::PHttpChannelChild*, mozilla::dom::PBrowserOrId const&, IPC::SerializedLoadContext const&, mozilla::net::HttpChannelCreationArgs const&) 	obj-firefox/ipc/ipdl/PNeckoChild.cpp
5 	xul.dll 	mozilla::net::HttpChannelChild::ContinueAsyncOpen() 	netwerk/protocol/http/HttpChannelChild.cpp
6 	xul.dll 	mozilla::net::HttpChannelChild::AsyncOpen(nsIStreamListener*, nsISupports*) 	netwerk/protocol/http/HttpChannelChild.cpp
7 	xul.dll 	nsXMLHttpRequest::Send(nsIVariant*, mozilla::dom::Nullable<nsXMLHttpRequest::RequestBody> const&) 	dom/base/nsXMLHttpRequest.cpp
8 	xul.dll 	nsXMLHttpRequest::Send(nsIVariant*) 	dom/base/nsXMLHttpRequest.cpp
9 	xul.dll 	`anonymous namespace'::SendRunnable::MainThreadRun() 	dom/workers/XMLHttpRequest.cpp
10 	xul.dll 	`anonymous namespace'::WorkerThreadProxySyncRunnable::Run() 	dom/workers/XMLHttpRequest.cpp
11 	xul.dll 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp
12 	xul.dll 	mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp
13 	xul.dll 	mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp
14 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/message_loop.cc

Comment 1

3 years ago
Adding a tracking flag for FF40 as KaiRo mentioned this as a top issue on dev edition/Aurora.
tracking-firefox40: --- → +
Jason, can you help on this (or assign this to someone)? This is a top crash.
Flags: needinfo?(jduell.mcbugs)
This is deep in the guts of IPDL serialization.  That doesn't mean it's not somehow a necko bug, but looking over recent changes (since the bug started showing up, in 6/8/15 builds) I don't see any necko changes to PNecko or related IPDL files in /netwerk.  So I'm punting this over to :billm to see if he knows of any IPC changes that might be the cause here.
Component: Networking → IPC
Flags: needinfo?(jduell.mcbugs) → needinfo?(wmccloskey)

Comment 4

3 years ago
Liz, does this have a nightly regression range?

The two asserts in PNeckoChild::Write are "NULL actor value passed to non-nullable param" and "actor has been |delete|d".

So I'd bet that HttpChannelChild::ContinueAsyncOpen is passing a null or deleted PBrowserChild here: http://hg.mozilla.org/mozilla-central/annotate/cef11c3e86c3/netwerk/protocol/http/HttpChannelChild.cpp#l1690
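
As a rough illustration of the two checks named above, here is a minimal C++ sketch of how an actor serializer might guard against null and deleted actors. This is not the IPDL-generated code: `ActorModel`, `WriteActor`, `WriteStatus`, and the sentinel ids are all hypothetical, and the sketch returns a status where the real code aborts.

```cpp
#include <cstdint>

// Hypothetical stand-in for an IPDL actor; only the id matters here.
struct ActorModel {
    int32_t id;
};

// Assumed sentinel ids, chosen for illustration only.
constexpr int32_t kNullActorId = 0;   // written when a nullable actor is null
constexpr int32_t kFreedActorId = 1;  // marks an actor that has been deleted

enum class WriteStatus {
    Ok,
    NullForNonNullable,  // "NULL actor value passed to non-nullable param"
    ActorDeleted         // "actor has been |delete|d"
};

struct WriteResult {
    WriteStatus status;
    int32_t serializedId;  // meaningful only when status == Ok
};

// Returns the id that would be serialized, or the reason the write is refused.
// The real code aborts the process instead of returning an error.
WriteResult WriteActor(const ActorModel* actor, bool nullable) {
    if (actor == nullptr) {
        if (!nullable) {
            return {WriteStatus::NullForNonNullable, kNullActorId};
        }
        return {WriteStatus::Ok, kNullActorId};
    }
    if (actor->id == kFreedActorId) {
        return {WriteStatus::ActorDeleted, kFreedActorId};
    }
    return {WriteStatus::Ok, actor->id};
}
```

In this model, the crash in the stack above would correspond to one of the two failure branches being reached from the caller in HttpChannelChild::ContinueAsyncOpen.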

I don't think this is an IPC issue. There may be an ordering issue which exposed this bug more recently. Bumping back to necko.

Liz, is there a clear nightly regression range?
Component: IPC → Networking
Flags: needinfo?(wmccloskey) → needinfo?(lhenry)
(Reporter)

Comment 5

3 years ago
Florin, can your team look for a regression range? I'm out sick today and am not sure for how long.
Flags: needinfo?(lhenry) → needinfo?(florin.mezei)
I don't think we can provide a clear regression range without reliable steps to reproduce. Socorro indicates this may have started somewhere on March 31st/April 1st (https://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2015-03-30&enddate=2015-04-02), but I'm not very sure of this and there doesn't seem to be a clear point at which this spiked significantly.
Flags: needinfo?(florin.mezei)
With the exception of 1 report, in the last month I only see reports on Aurora and Nightly. Could this be e10s-related, or tied to some other config that is enabled only on these branches?

Given that this doesn't seem to affect Beta, I am marking this as wontfix for 40 and dropping tracking.
status-firefox40: affected → wontfix
tracking-firefox40: + → -

Comment 8

3 years ago
[Tracking Requested - why for this release]:
We should track this for 42 as long as it remains an e10s rollout candidate.
status-firefox42: --- → affected
tracking-firefox42: --- → ?
(Reporter)

Comment 9

3 years ago
Thanks Jim.  It is no longer a top crash but let's track it for now.
status-firefox41: --- → affected
tracking-firefox42: ? → +

Comment 10

3 years ago
(In reply to Liz Henry (:lizzard) from comment #9)
> Thanks Jim.  It is no longer a top crash but let's track it for now.

It's baaaack! :) #1 top crasher today in 43.

Updated

3 years ago
tracking-e10s: --- → ?
Flags: needinfo?(wmccloskey)
Created attachment 8649575 [details]
IRC chat jduell/billm about what's going on with this bug

Based on this IRC conversation with billm, I'm reclassifying this as a DOM bug.  The issue does not appear to be with getting a null PBrowser--the "union PBrowserOrId" type explicitly handles nullable PBrowsers.  Rather billm thinks that Manager->GetBrowserOrId() is returning a pointer to a deleted PBrowser:

  https://dxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/HttpChannelChild.cpp?from=HttpChannelChild.cpp&offset=0#1711

Neither of us knows how that could be happening, but we're hoping smaug may have ideas.
Flags: needinfo?(bugs)
Hmm, I think smaug is on PTO.  Maybe khuey?
Flags: needinfo?(khuey)
Component: Networking → DOM
I think the bug is in WorkerPrivate::InterfaceRequestor.  It maintains a list of weak pointers to TabChilds, but the TabChild object still being alive is a distinct state from the TabChild actor still being alive, and it doesn't check the latter.  GetAnyLiveTabChild needs to query IsDestroyed before handing out a TabChild.

I'll review a patch.
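
The distinction drawn above, between the object being alive and its actor being alive, can be sketched in a few lines of C++. This is a simplified model, not the actual patch: `TabChildModel`, `DestroyActor`, and the use of `std::weak_ptr` are assumptions made for illustration; only the names `GetAnyLiveTabChild` and `IsDestroyed` come from the comment.

```cpp
#include <memory>
#include <vector>

// Hypothetical model of a TabChild: the C++ object can outlive its IPC actor.
class TabChildModel {
public:
    // Simulates the actor being torn down while the object stays alive.
    void DestroyActor() { mDestroyed = true; }
    bool IsDestroyed() const { return mDestroyed; }

private:
    bool mDestroyed = false;
};

// Returns the first tab whose object is still alive AND whose actor has not
// been destroyed, or nullptr if there is none. Checking only that the weak
// pointer can be locked would hand out tabs whose actor is already gone.
std::shared_ptr<TabChildModel> GetAnyLiveTabChild(
    const std::vector<std::weak_ptr<TabChildModel>>& tabs) {
    for (const auto& weak : tabs) {
        if (auto tab = weak.lock()) {   // object still alive?
            if (!tab->IsDestroyed()) {  // actor still alive?
                return tab;
            }
        }
    }
    return nullptr;
}
```

The second check is the one the comment says was missing: a tab that passes `lock()` but fails `IsDestroyed()` is skipped instead of being handed to the caller.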
Flags: needinfo?(khuey)
Created attachment 8649596 [details] [diff] [review]
patch

Thanks Kyle. Sorry about the ugly cast. It just didn't feel right to add this method to nsITabChild.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=bf0ac582f4b6
Flags: needinfo?(wmccloskey)
Attachment #8649596 - Flags: review?(khuey)
Comment on attachment 8649596 [details] [diff] [review]
patch

Review of attachment 8649596 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with a better commit message
Attachment #8649596 - Flags: review?(khuey) → review+
https://hg.mozilla.org/mozilla-central/rev/8ae079610a10
Status: NEW → RESOLVED
Last Resolved: 3 years ago
status-firefox43: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla43
(Reporter)

Updated

3 years ago
tracking-firefox43: --- → +

Comment 19

3 years ago
Hey Bill, can we uplift this to Aurora?
Assignee: nobody → wmccloskey
Flags: needinfo?(wmccloskey)
Comment on attachment 8649596 [details] [diff] [review]
patch

Approval Request Comment
[Feature/regressing bug #]: unknown
[User impact if declined]: topcrash with e10s
[Describe test coverage new/current, TreeHerder]: on m-c for a few days
[Risks and why]: Low, but there is a chance of a regression.
[String/UUID change made/needed]: none
Flags: needinfo?(wmccloskey)
Attachment #8649596 - Flags: approval-mozilla-aurora?
(Reporter)

Comment 21

3 years ago
Comment on attachment 8649596 [details] [diff] [review]
patch

Looks good to me, let's uplift to aurora.
Attachment #8649596 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+