crash in mozilla::ipc::FatalError(char const*, char const*, unsigned long, bool) | mozilla::jsipc::PJavaScriptParent::Read(mozilla::jsipc::JSVariant*, IPC::Message const*, void**)

RESOLVED WORKSFORME

Status

()

Core
DOM: Content Processes
P1
critical
RESOLVED WORKSFORME
3 years ago
2 months ago

People

(Reporter: tracy, Assigned: billm)

Tracking

(Blocks: 1 bug, {crash, topcrash-win})

43 Branch
x86
Windows NT
crash, topcrash-win
Points:
---

Firefox Tracking Flags

(e10s+, firefox44+ wontfix, firefox45+ wontfix, firefox46- wontfix, firefox47- wontfix, firefox48+ wontfix, firefox49 wontfix, firefox50 wontfix)

Details

(crash signature)

Attachments

(1 attachment)

(Reporter)

Description

3 years ago
[Tracking Requested - why for this release]:

This bug was filed from the Socorro interface and is 
report bp-77951981-b7d9-42da-9c27-a73bc2151001.
=============================================================

This is a startup, or first page load, crash that is now at #2 on the topcrash volume list for the past 3 days. It first appeared on Nightly Fx44 in the build for 2015093015

Frame 	Module 	Signature 	Source
0 	xul.dll 	mozilla::ipc::FatalError(char const*, char const*, unsigned long, bool) 	ipc/glue/ProtocolUtils.cpp
1 	xul.dll 	mozilla::jsipc::PJavaScriptParent::Read(mozilla::jsipc::JSVariant*, IPC::Message const*, void**) 	obj-firefox/ipc/ipdl/PJavaScriptParent.cpp
2 	xul.dll 	mozilla::jsipc::PJavaScriptParent::SendGet(unsigned __int64 const&, mozilla::jsipc::JSVariant const&, mozilla::jsipc::JSIDVariant const&, mozilla::jsipc::ReturnStatus*, mozilla::jsipc::JSVariant*) 	obj-firefox/ipc/ipdl/PJavaScriptParent.cpp
3 	xul.dll 	mozilla::jsipc::JavaScriptBase<mozilla::jsipc::PJavaScriptParent>::SendGet(mozilla::jsipc::ObjectId const&, mozilla::jsipc::JSVariant const&, mozilla::jsipc::JSIDVariant const&, mozilla::jsipc::ReturnStatus*, mozilla::jsipc::JSVariant*) 	js/ipc/JavaScriptBase.h
4 	xul.dll 	mozilla::jsipc::WrapperOwner::get(JSContext*, JS::Handle<JSObject*>, JS::Handle<JS::Value>, JS::Handle<jsid>, JS::MutableHandle<JS::Value>) 	js/ipc/WrapperOwner.cpp
5 	xul.dll 	CPOWProxyHandler::get(JSContext*, JS::Handle<JSObject*>, JS::Handle<JS::Value>, JS::Handle<jsid>, JS::MutableHandle<JS::Value>) 	js/ipc/WrapperOwner.cpp
6 	xul.dll 	js::Proxy::get(JSContext*, JS::Handle<JSObject*>, JS::Handle<JS::Value>, JS::Handle<jsid>, JS::MutableHandle<JS::Value>) 	js/src/proxy/Proxy.cpp
7 	xul.dll 	js::DirectProxyHandler::get(JSContext*, JS::Handle<JSObject*>, JS::Handle<JS::Value>, JS::Handle<jsid>, JS::MutableHandle<JS::Value>) 	js/src/proxy/DirectProxyHandler.cpp
8 	xul.dll 	js::CrossCompartmentWrapper::get(JSContext*, JS::Handle<JSObject*>, JS::Handle<JS::Value>, JS::Handle<jsid>, JS::MutableHandle<JS::Value>) 	js/src/proxy/CrossCompartmentWrapper.cpp
9 	xul.dll 	xpc::AddonWrapper<js::CrossCompartmentWrapper>::get(JSContext*, JS::Handle<JSObject*>, JS::Handle<JS::Value>, JS::Handle<jsid>, JS::MutableHandle<JS::Value>) 	js/xpconnect/wrappers/AddonWrapper.cpp
10 	xul.dll 	js::Proxy::get(JSContext*, JS::Handle<JSObject*>, JS::Handle<JS::Value>, JS::Handle<jsid>, JS::MutableHandle<JS::Value>) 	js/src/proxy/Proxy.cpp
11 	xul.dll 	js::GetProperty(JSContext*, JS::Handle<JSObject*>, JS::Handle<JS::Value>, JS::Handle<jsid>, JS::MutableHandle<JS::Value>) 	js/src/vm/NativeObject.h
12 	xul.dll 	js::jit::ComputeGetPropResult 	js/src/jit/BaselineIC.cpp
13 	xul.dll 	js::jit::DoGetPropFallback 	js/src/jit/BaselineIC.cpp
14 		@0x2e39345339d
This looks like the same crash that somebody was getting with AdblockPlus.
(Reporter)

Comment 2

3 years ago
(In reply to Andrew McCreight [:mccr8] from comment #1)
> This looks like the same crash that somebody was getting with AdblockPlus.

I think that is coincidental.  I've looked at several crash reports and none of them have adblock plus. So I don't think this can be correlated to ABP.
(In reply to [:tracy] Tracy Walker - QA Mentor from comment #2)
> (In reply to Andrew McCreight [:mccr8] from comment #1)
> > This looks like the same crash that somebody was getting with AdblockPlus.
> 
> I think that is coincidental.  I've looked at several crash reports and none
> of them have adblock plus. So I don't think this can be correlated to ABP.

I'm seeing a spike in several ipc::FatalError signatures starting with the 9/30 build. In bug 1210547 the cause was apparently Adblock Plus, but we're also seeing other crashes without ABP.

Bill, any idea what's going on here?
Flags: needinfo?(wmccloskey)
If the backtrace line numbers are correct, the FatalError is https://dxr.mozilla.org/mozilla-central/source/obj-x86_64-unknown-linux-gnu/ipc/ipdl/PJavaScriptParent.cpp#2498 which is failing to read an int.  This got past the message framing/routing, so it wouldn't have been a low-level EOF from a child crash, I don't think.

So, if I'm reading this right, the parent blocked on a CPOW call to the child, and got back a message which seems to have the wrong type for the expected response.

Could this be related to bug 1191145, maybe?  It landed on m-c on 09-30 and it's about changing what can happen during CPOW handling.

Comment 5

3 years ago
I have just had three such crashes in quick succession.  All three on Ebay.
One thing that became obvious was on restart a previously closed tab was opened and the "We forgot your open tabs" Tab.

Tab bar order "we forgot your open tabs" then "Previously Closed Tab"

The page that was opened if it has any relevance at all was http://www.cnet.com/videos/little-known-facebook-tips/

Fourth start and no crash this time, so far.

Due to a short foray into the release version yesterday the only add-ons are signed

ADB Helper	0.8.1	true	adbhelper@mozilla.org
Add-on Compatibility Reporter	2.0.6.1-signed	true	compatibility@addons.mozilla.org
gtranslate	0.11.0.1-signed	true	{aff87fa2-a58e-4edd-b852-0a20203c1e17}
TitleCase	2.0.1-signed	true	TitleCase@htdsoftware.com
Valence	0.3.2	true	fxdevtools-adapters@mozilla.org
Video DownloadHelper	5.4.1	true	{b9db16a4-6edc-47ec-a1f4-b86292ed211d}

The Theme is standard for Win7, whatever that is.

In my crash of https://crash-stats.mozilla.com/report/index/be55bb8d-fa4f-45f7-8029-6c8382151002 Firefox had been running over night and the machine had resumed from sleep in the proceeding few minutes.  Not really a startup crash
Here is the regression range:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=891ee0d0ba3ec42b6484cf0205b3c95e21c58f74&tochange=9169f652fe5e69c2d77ac31929934a5bc3342e6e

Bug 1191143 or bug 1191143 is probably what caused this, although it's hard to see how. I landed a couple things on top of this, so I'd like to try to reproduce over the weekend. It sounds like I just need to install some add-ons like Adblock that use CPOWs. If that doesn't work, I'll back out.
Flags: needinfo?(wmccloskey)
Created attachment 8669484 [details] [diff] [review]
patch

I don't know if this will fix it, but it might. I wasn't able to reproduce.

Jed is right that this must be caused by an ordering issue where we receive a reply, but it's not for the message we sent. One way this might happen is as follows:

- Child sends high prio sync message up
- Parent sends CPOW (high prio sync message) down
- Child sends async urgent msg up
- Parent cancels transaction through CancelCurrentTransaction, returns to event loop
- Child replies with R to CPOW before it receives the cancel message
- Later, parent sends another CPOW down and sees the reply R to the canceled transaction

This patch fixes the issue by not cancelling the transaction when we're processing an async message. That's kind of unfortunate since it defeats the purpose of bug 1191143, at least in this one case. But at least we're not crashing anymore.
Assignee: nobody → wmccloskey
Status: NEW → ASSIGNED
Attachment #8669484 - Flags: review?(dvander)
Attachment #8669484 - Flags: review?(dvander) → review+
Keywords: leave-open

Comment 10

3 years ago
hey bill, should track m8?
Flags: needinfo?(wmccloskey)
Yes.
tracking-e10s: ? → m8+
Flags: needinfo?(wmccloskey)
I backed everything out. I'm not sure if the other FatalError signatures are related (the PLayerTransaction would surprise me, but it's possible). We'll find out based on the crash rate.
Flags: needinfo?(wmccloskey)

Comment 14

3 years ago
I see that bug 1211636 has another patch related to ipc::FatalError

Updated

3 years ago
Crash Signature: [@ mozilla::ipc::FatalError(char const*, char const*, unsigned long, bool) | mozilla::jsipc::PJavaScriptParent::Read(mozilla::jsipc::JSVariant*, IPC::Message const*, void**)] → [@ mozilla::ipc::FatalError(char const*, char const*, unsigned long, bool) | mozilla::jsipc::PJavaScriptParent::Read(mozilla::jsipc::JSVariant*, IPC::Message const*, void**)] [@ mozilla::ipc::FatalError | mozilla::jsipc::PJavaScriptParent::Read]
Tracked for FF44 as this was mentioned as a top crasher on Aurora44.
tracking-firefox44: ? → +
Bill, this bug was highlighted as a top crasher on Aurora44 last week and this week. I noticed that you did a patch back out as a resolution. Do we need to do the same on mozilla-aurora? Any help is appreciated. Thanks.
Flags: needinfo?(wmccloskey)
The bugs I backed out were bug 1191145 and bug 1191143. They were backed out long before the aurora merge. I thought I verified that the backouts fixed the crash, but maybe I forgot. It would really help if someone could look at the last month of crash data to figure out the ranges of dates where this happened. I'm having a hard time seeing whether it stopped and came back or if it has always been crashing since 9/30.

Comment 18

2 years ago
Topcrash = Accessibility Active
Neither bug 1198765 nor bug 1210821 mention this, guess I'm first to highlight it. No idea if down to one bug or more.
Blocks: 1029143
It looks like this is accessibility related. Brad and I talked about disabling e10s on nightly/aurora if a11y is enabled. The crash data we're getting really isn't very useful to us since the new strategy for a11y is going to be very different from what we're doing now. So we're just inconveniencing our testers for no reason right now. David, is it all right with you if we go that route?
Flags: needinfo?(wmccloskey) → needinfo?(dbolter)
(In reply to Bill McCloskey (:billm) from comment #19)
> It looks like this is accessibility related. Brad and I talked about
> disabling e10s on nightly/aurora if a11y is enabled. The crash data we're
> getting really isn't very useful to us since the new strategy for a11y is
> going to be very different from what we're doing now. So we're just
> inconveniencing our testers for no reason right now. David, is it all right
> with you if we go that route?

Yes.
Flags: needinfo?(dbolter)
(In reply to Bill McCloskey (:billm) from comment #19)
> It looks like this is accessibility related. Brad and I talked about
> disabling e10s on nightly/aurora if a11y is enabled. The crash data we're
> getting really isn't very useful to us since the new strategy for a11y is
> going to be very different from what we're doing now. So we're just
> inconveniencing our testers for no reason right now. David, is it all right
> with you if we go that route?

uh I'd really rather not (or at least on !windows).  Is there explanation of what the issue is?

Some of the crashes certainly are useful mostly pointing out bugs in the tree caching which will stay.  The ipc reentry ones aren't that useful other than to see what things are most commonly used, but I think it should be straight forward to avoid crashing in that case if we are willing to return failure to the a11y tools in use.
tracking-e10s: m8+ → m9+
Based on the previous comments this seems to be an issue with e10s + a11y only. Given that it's a wontfix for Fx44 as e10s is not enabled by default. Moved the tracking flag to Fx45.
status-firefox44: affected → wontfix
status-firefox45: --- → affected
tracking-firefox45: --- → +
tracking-e10s: m9+ → +
Priority: -- → P1
[Tracking Requested - why for this release]:
As we are not planning to ship e10s, wontfix it for 45. Tracking request for 46.
status-firefox45: affected → wontfix
status-firefox46: --- → affected
tracking-firefox46: --- → ?
Tracking since e10s is aimed for release in 46.
status-firefox47: --- → affected
tracking-firefox46: ? → +
tracking-firefox47: --- → +
Oh, wait a11y only. Untracking for 46 since we are not shipping e10s to a11y users in 46.
status-firefox46: affected → wontfix
tracking-firefox46: + → -
Moving tracking flag to 48 (we may need to move this tracking forward a few times) until e10s+a11y becomes a priority.
status-firefox48: --- → affected
tracking-firefox47: + → -
tracking-firefox48: --- → +
This does not seem to be happening anymore in recent releases.
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → WORKSFORME

Comment 28

2 years ago
There is no place where e10s + a11y is enabled.
I think it makes sense to close this; when we start enabling e10s + a11y the code base will be pretty different from last october when this signature started showing up.  We can reopen if it crops up again.
status-firefox47: affected → wontfix
status-firefox48: affected → wontfix
status-firefox49: --- → wontfix
status-firefox50: --- → wontfix
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open
You need to log in before you can comment on or make changes to this bug.