Closed Bug 637596 Opened 13 years ago Closed 7 years ago

crash in mozilla::ipc::RPCChannel::DebugAbort with abort message: "other side should have been blocked"

Categories

(Core Graveyard :: Plug-ins, defect)

defect
Not set
critical

Tracking

(firefox11-, firefox20-)

RESOLVED INCOMPLETE
Tracking Status
firefox11 - ---
firefox20 - ---

People

(Reporter: scoobidiver, Unassigned)

References

Details

(Keywords: crash)

Crash Data

It is a new crash signature that first appeared in 4.0b12pre/20110222.
It is currently #66 top crasher in 4.0b12.

Signature	mozalloc_abort(char const* const) | NS_DebugBreak_P | mozilla::ipc::RPCChannel::DebugAbort(char const*, int, char const*, char const*, char const*, bool)
UUID	d8ce41a6-b536-452f-a0e9-3d7f92110228
Time 	2011-02-28 23:13:53.816823
Uptime	1126
Last Crash	7356268 seconds (12.2 weeks) before submission
Install Age	17193 seconds (4.8 hours) since version was first installed.
Product	Firefox
Version	4.0b13pre
Build ID	20110227030400
Branch	2.0
OS	Windows NT
OS Version	5.1.2600 Service Pack 2
CPU	x86
CPU Info	GenuineIntel family 6 model 23 stepping 10
Crash Reason	EXCEPTION_BREAKPOINT
Crash Address	0x691a39
App Notes 	AdapterVendorID: 8086, AdapterDeviceID: 29c2, AdapterDriverVersion: 6.14.10.4864
xpcom_runtime_abort(###!!! ABORT: other side should have been blocked: file e:/builds/moz2_slave/cen-w32-ntly/build/ipc/glue/RPCChannel.cpp, line 679)

Frame 	Module 	Signature 	Source
0 	mozalloc.dll 	mozalloc_abort 	memory/mozalloc/mozalloc_abort.cpp:77
1 	xul.dll 	NS_DebugBreak_P 	xpcom/base/nsDebugImpl.cpp:350
2 	xul.dll 	mozilla::ipc::RPCChannel::DebugAbort 	ipc/glue/RPCChannel.cpp:679
3 	xul.dll 	mozilla::ipc::RPCChannel::Call 	ipc/glue/RPCChannel.cpp:251
4 	xul.dll 	mozilla::plugins::PPluginScriptableObjectParent::CallGetChildProperty 	obj-firefox/ipc/ipdl/PPluginScriptableObjectParent.cpp:548
5 	xul.dll 	mozilla::plugins::PluginScriptableObjectParent::GetPropertyHelper 	dom/plugins/PluginScriptableObjectParent.cpp:1279
6 	xul.dll 	NPObjWrapper_GetProperty 	modules/plugin/base/src/nsJSNPRuntime.cpp:1349
7 	mozjs.dll 	js::Shape::get 	js/src/jsscopeinlines.h:263
8 	mozjs.dll 	js_NativeGet 	js/src/jsobj.cpp:5270
9 	mozjs.dll 	js::Interpret 	js/src/jsinterp.cpp:4201
10 	mozjs.dll 	js::RunScript 	js/src/jsinterp.cpp:653
11 	mozjs.dll 	js::Execute 	js/src/jsinterp.cpp:1028
12 	mozjs.dll 	EvaluateUCScriptForPrincipalsCommon 	js/src/jsapi.cpp:5059
13 	mozjs.dll 	JS_EvaluateUCScriptForPrincipalsVersion 	js/src/jsapi.cpp:5075
14 	xul.dll 	nsJSContext::EvaluateString 	dom/base/nsJSEnvironment.cpp:1459

The regression range is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=df4d35ffa99f&tochange=1da3405c74fd

More reports at:
https://crash-stats.mozilla.com/report/list?range_value=4&range_unit=weeks&signature=mozalloc_abort%28char%20const*%20const%29%20|%20NS_DebugBreak_P%20|%20mozilla%3A%3Aipc%3A%3ARPCChannel%3A%3ADebugAbort%28char%20const*%2C%20int%2C%20char%20const*%2C%20char%20const*%2C%20char%20const*%2C%20bool%29
Are you sure the regression range isn't an artifact of when we added NS_DebugBreak_P to the appendlist? That happened 2011-02-24. If so, I suspect the previous signature was "mozalloc_abort(char const* const) | NS_DebugBreak_P" and this was likely uncovered when bug 631002 landed.
The relevant bit here is:

###!!! ABORT: other side should have been blocked: file
e:/builds/moz2_slave/cen-w32-ntly/build/ipc/glue/RPCChannel.cpp, line 679

cjones, any clues? I tried to load a few of the URLs mentioned in the reports, all of which have windowed Flash, but I couldn't reproduce the problem.
Summary: Crash [@ mozalloc_abort(char const* const) | NS_DebugBreak_P | mozilla::ipc::RPCChannel::DebugAbort(char const*, int, char const*, char const*, char const*, bool) ] → Crash [@ mozalloc_abort(char const* const) | NS_DebugBreak_P | mozilla::ipc::RPCChannel::DebugAbort(char const*, int, char const*, char const*, char const*, bool) ] with abort message: "###!!! ABORT: other side should have been blocked"
This is Bad Bad Bad.  That assertion failing means that the parent side dequeued a 'sync' message, *but* there were other messages remaining in the incoming queue.  For that to happen, the child side would have had to
 - send a 'sync' message
 - wake up from waiting for the reply (!!!)
 - send some more messages after waking up

Not being unblocked from 'sync' waits is a fundamental guarantee, which is why that assertion exists.
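The invariant described above can be sketched as a check on a single incoming queue. This is a minimal sketch assuming a simplified model in which every queued message comes from the child. The names Kind, Message and DequeueRespectsSyncInvariant are hypothetical and do not come from RPCChannel.cpp.

```cpp
#include <cassert>
#include <deque>
#include <string>

// Hypothetical message kinds for a simplified model; not Mozilla's IPDL types.
enum class Kind { Async, Sync };

struct Message {
    Kind kind;
    std::string name;
};

// Pops the front message and reports whether the invariant held.
// A child that sent a 'sync' message is blocked until it gets the reply,
// so nothing it sent can be queued behind that message.
bool DequeueRespectsSyncInvariant(std::deque<Message>& incoming) {
    if (incoming.empty()) {
        return true;
    }
    Message front = incoming.front();
    incoming.pop_front();
    if (front.kind == Kind::Sync && !incoming.empty()) {
        // The state rejected by "other side should have been blocked".
        return false;
    }
    return true;
}
```

An async message followed by a sync one passes. A sync message with anything queued behind it fails, which is only possible if the child woke up from its sync wait.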

If the windows deadlock-avoidance code makes that possible, then ... sigh.  Guess we'll need to |assert(windows || !badState)| for now.  (Aside: it's also time to start thinking about removing that code.)
I don't think the windows deadlock-avoidance code should make that possible. If it does, we should probably turn that off for sync messages (but not RPC).
If the browser process sends a synchronous windows message to the plugin process, and the plugin is blocked waiting for a 'sync' reply, then I think the plugin would need to wake up.  We may need to just convert all 'sync' messages to 'rpc' in dom/plugins until we have OOPP 2.0.
You mean if they race? Because we can know that the code which handles the sync call doesn't send windows messages. Argh.

Let's audit the list of sync messages and see if we need to do anything special to make them RPC.
IIRC most are for NPRuntime stuff where we know the Gecko code won't attempt to re-enter the plugin.  I think I saw one added for the recent redirect handling.

One option is weakening the assertion.  But if the plugin wakes up from a sync wait because of a SendMessage and sends another 'sync' message from the handler, then I think the plugin will end up processing the wrong reply, and things will go boom, possibly exploitably.  I think I'd rather not do that.
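The "wrong reply" hazard can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions: a reply is identified only by the sequence number of the request it answers, the parent answers requests in arrival order, and the waiter naively takes the next reply off the queue. ReplyQueue, NaiveWaitForReply and SimulateNestedSyncWaits are hypothetical names and do not describe the real channel's reply matching.

```cpp
#include <cassert>
#include <deque>
#include <utility>

// Hypothetical, simplified model: each reply carries only the sequence
// number of the request it answers.
using ReplyQueue = std::deque<int>;

// A naive waiter that assumes the next reply to arrive answers its own
// request. That assumption holds only while sync waits never nest.
int NaiveWaitForReply(ReplyQueue& replies) {
    int seqno = replies.front();
    replies.pop_front();
    return seqno;
}

// The child sends sync request `outer`, is woken by a re-entrant handler
// (e.g. for a SendMessage), and sends sync request `inner` from it. The
// parent answers in arrival order, so the innermost wait, which is the one
// currently running, is handed the reply meant for `outer`.
// Returns {seqno received by inner wait, seqno received by outer wait}.
std::pair<int, int> SimulateNestedSyncWaits(int outer, int inner) {
    ReplyQueue replies{outer, inner};
    int receivedByInner = NaiveWaitForReply(replies);
    int receivedByOuter = NaiveWaitForReply(replies);
    return {receivedByInner, receivedByOuter};
}
```

With requests 1 (outer) and 2 (inner), the inner wait receives reply 1 and the outer wait receives reply 2. Both callers then act on data meant for the other request.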

The one troubling new behavior that we'll introduce by switching these guys 'sync'->'rpc' is 'async' painting messages from the browser racing with these new 'rpc's.  Some of those messages can repaint immediately, and that would allow painting to happen possibly while the plugin is doing script-y things.  That would still convert this fatal browser-process assert into a plugin-process crash (of possibly higher volume), and we could hack around the paint re-entry if need be.
Sync methods in the plugins tree:

parent: sync PPluginModule.AppendNotesToCrashReport()
parent: sync PPluginInstance.Show()
parent: sync PPluginInstance.NegotiatedCarbon()

As long as we're ok with Show() re-entering (which I think should be ok), I think the others can be made RPC without a big deal.
Hmm, I must have been smoking crack.

Show() itself is incidentally safe from re-entry because of the guards in place for existing re-entry hazards when delivering SetWindow() to the plugin.  However, making Show() 'rpc' will allow other unrelated messages to re-enter that didn't previously.  Might cause some new plugin crashes, but I suspect they'd be OK since we only Show() after we're completely finished asking them to paint.  I.e., from the plugin's perspective we might as well have gone back to the event loop.
RPC means the method itself is sync, just that other messages can be handled in the interim, right? AppendNotesToCrashReport is definitely not reentrant-safe.
Yes. The caller isn't reentrant-safe, or the implementer isn't? The implementer in this case doesn't matter, it only matters if the caller is reentrant-safe.
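The difference being discussed can be sketched as a simplified model: in both cases the caller blocks until its reply, but under 'rpc' other incoming messages are dispatched while it waits. This sketch assumes that a 'sync' waiter simply leaves non-reply messages queued until after the reply. It ignores the Windows deadlock-avoidance path, which is what lets a 'sync' wait be interrupted in practice. Semantics and DispatchedWhileWaiting are hypothetical names, and "Paint"/"SetWindow" are just labels.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical labels for the two blocking message semantics.
enum class Semantics { Sync, Rpc };

// Given the messages that arrive while a caller is blocked on an outgoing
// call (ending with "reply"), returns the ones dispatched, i.e. allowed to
// re-enter the caller's process, before the reply is delivered.
std::vector<std::string> DispatchedWhileWaiting(
        Semantics semantics, const std::vector<std::string>& arrivals) {
    std::vector<std::string> dispatched;
    for (const std::string& message : arrivals) {
        if (message == "reply") {
            break;  // the caller's own reply ends the wait in both cases
        }
        if (semantics == Semantics::Rpc) {
            dispatched.push_back(message);  // handled now: re-entry
        }
        // Under Sync, the message stays queued until after the reply.
    }
    return dispatched;
}
```

Under this model, making a message 'rpc' does not change what the callee does. It changes what the caller may observe while waiting, which is why the caller's reentrancy safety is what matters.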
I really don't know where the caller is, so I couldn't say. I was only talking about the implementation.
Perhaps more importantly, if we made that message async, would that cause problems? It seems that the only caller is the X11 error handler, which does

appendnotes(...);
NS_RUNTIMEABORT(...);

I guess that making this async could mean that the message is not delivered, because the I/O thread may not have woken up before we aborted. Argh.

Since this is currently a windows-only problem, we can probably just solve the Show() message and leave the other two.
It is #13 top crasher in 4.0 over the last 3 days.
Keywords: topcrash
Crash Signature: [@ mozalloc_abort(char const* const) | NS_DebugBreak_P | mozilla::ipc::RPCChannel::DebugAbort(char const*, int, char const*, char const*, char const*, bool) ]
It is #11 top browser crasher in 6.0b2 and #13 in 5.0.
The signature is #8 on 5 yesterday, #6 on 6, #10 on 7, #9 on 8, this is definitely high-profile and rising.
So it's a bit higher right now on 6.0 but it's really been there through 5.0 and 6.0 in the top 25. It would be nice to fix if possible. bsmedberg, can we make any further headway on this?
Crash Signature: [@ mozalloc_abort(char const* const) | NS_DebugBreak_P | mozilla::ipc::RPCChannel::DebugAbort(char const*, int, char const*, char const*, char const*, bool) ] → [@ mozalloc_abort(char const* const) | NS_DebugBreak_P | mozilla::ipc::RPCChannel::DebugAbort(char const*, int, char const*, char const*, char const*, bool)] [@ mozalloc_abort(char const* const) | _RTC_Terminate]
Blocks: 696358
It's #1 top crasher in the latest nightly builds.
Appearing at #19 for Firefox 8.0. Still a valid top crash. We get 18K+ of these on all versions over 4 weeks.
Depends on: 711971
I am tracking this for 11 since there is an unexplained spike. Seeing this on other versions, so it might be external factors.
This shows up on the trunk explosive report as well: https://crash-analysis.mozilla.com/rkaiser/2012-01-02/2012-01-02.firefox.12.explosiveness.html

I will check manual correlations and modules to see if anything comes up there.
Keywords: qawanted
* Beta:    It's #2 top crasher with 7% of all crashes (mozalloc_abort(char const* const) | _RTC_Terminate crash signature).
* Aurora:  It's #1 top crasher with 15% of all crashes.
* Nightly: It's #1 top crasher with 5.5% of all crashes.
There's something wrong in Aurora.
FWIW, the plugin hang timeout has been set to 25 secs from 11.0a1 (see bug 705365).
This seems to have mostly gone away once we disable the child side abort on parent hang. Graphs have this dropping off on the same day.
(In reply to Jim Mathies [:jimm] from comment #24)
> This seems to have mostly gone away once we disable the child side abort on
> parent hang.
It's no longer a top crasher in 10.0b4, 11.0a2 and 12.0a1.
Depends on: 683967
Keywords: topcrash
(In reply to mha007 from comment #26)
> A user's Crash (Report ->
> https://crash-stats.mozilla.com/report/index/031f804b-9a7c-4aee-a50a-
> ea31c2120206) was related to Malware problem.
The "mozalloc_abort | _RTC_Terminate" crash signature is a meta one (see bug 696358).
The abort message of that user is different from the one of this bug: "ABORT: Recursive layout module initialization", so it's not this bug.
What do you need from QA to get this moving? Comment #25 seems to indicate this is not as big a problem as it used to be, but we are still tracking it with qawanted
Well, it's sitting at #4 on the top crash list for 10.0.2 and at #5 for 10.0.1. I think it's still a top crash (adding keyword).
Keywords: topcrash
Crash Signature: [@ mozalloc_abort(char const* const) | NS_DebugBreak_P | mozilla::ipc::RPCChannel::DebugAbort(char const*, int, char const*, char const*, char const*, bool)] [@ mozalloc_abort(char const* const) | _RTC_Terminate] → bool)] [@ mozalloc_abort(char const* const) | _RTC_Terminate] [@ mozalloc_abort(char const* const) | _RTC_Terminate | setvbuf] [@ mozalloc_abort(char const* const) | NS_DebugBreak_P | mozilla::ipc::RPCChannel::DebugAbort(char const*, int, char const*, c…
Do we have any correlations or crash data which QA can use for a starting point; I don't see much, if anything, to go on in this bug.
(In reply to Sheila Mooney from comment #29)
> Well, it's sitting at #4 on the top crash list for 10.0.2 and at #5 for
> 10.0.1. I think it's still a top crash (adding keyword).
It's about the meta mozalloc_abort(char const* const) | _RTC_Terminate crash signature that will be broken down into dozens of bugs after Socorro 2.4.4 is live on Feb 29.
Keywords: topcrash
So is there any point continuing to track this for FF11? Doesn't look like there is anything actionable here.
If this is no longer considered a top crasher, there's no need to track for FF11.
Removing qawanted since this is no longer being tracked. If there is something more we can do, please add it back.
Keywords: qawanted
Crash Signature: bool)] [@ mozalloc_abort(char const* const) | _RTC_Terminate] [@ mozalloc_abort(char const* const) | _RTC_Terminate | setvbuf] → bool)] [@ mozalloc_abort(char const* const) | _RTC_Terminate] [@ mozalloc_abort(char const* const) | _RTC_Terminate | setvbuf ]
Crash Signature: bool)] [@ mozalloc_abort(char const* const) | _RTC_Terminate] [@ mozalloc_abort(char const* const) | _RTC_Terminate | setvbuf ] → bool)] [@ mozalloc_abort(char const* const) | _RTC_Terminate] [@ mozalloc_abort(char const* const) | _RTC_Terminate | setvbuf ] [@ mozalloc_abort(char const*) | NS_DebugBreak_P | mozilla::ipc::RPCChannel::DebugAbort(char const*, int, char const* char c…
Keywords: regression
OS: Windows 7 → All
Hardware: x86 → All
Summary: Crash [@ mozalloc_abort(char const* const) | NS_DebugBreak_P | mozilla::ipc::RPCChannel::DebugAbort(char const*, int, char const*, char const*, char const*, bool) ] with abort message: "###!!! ABORT: other side should have been blocked" → crash in mozilla::ipc::RPCChannel::DebugAbort with abort message: "###!!! ABORT: other side should have been blocked"
Where are you seeing that and why would that relate to Java? JavaScriptCore is the WebKit JS engine, it's rather unlikely to be embedded in plugins.
(In reply to Georg Fritzsche [:gfritzsche] from comment #36)
> Where are you seeing that and why would that relate to Java?
I didn't know about Java; I saw its ranking in: https://crash-stats.mozilla.com/topcrasher/byos/Firefox/20.0a2/Mac%20OS%20X/7/browser/report
This is a top crasher for OS X with only ~7 crashes, hence not tracking at this point. Please renominate if volume goes up.
Keywords: topcrash
Crash Signature: , char const*, char const*, bool) const] → , char const*, char const*, bool) const] [@ mozalloc_abort(char const*) | Abort ]
Crash Signature: , char const*, char const*, bool) const] [@ mozalloc_abort(char const*) | Abort ] → , char const*, char const*, bool) const] [@ mozalloc_abort(char const*) | Abort ] [@ mozalloc_abort(char const* const) | NS_DebugBreak] [@ mozalloc_abort(char const* const) | NS_DebugBreak | mozilla::ipc::RPCChannel::DebugAbort(char const*, int, char c…
Crash Signature: , char const*, char const*, char const*, bool) ] → , char const*, char const*, char const*, bool) ] [@ mozalloc_abort(char const*) | Abort | NS_DebugBreak_P ]
Crash Signature: , char const*, char const*, char const*, bool) ] [@ mozalloc_abort(char const*) | Abort | NS_DebugBreak_P ] → , char const*, char const*, char const*, bool) ] [@ mozalloc_abort(char const*) | Abort | NS_DebugBreak_P ] [@ mozalloc_abort(char const*) | NS_DebugBreak | _ZZL11toHexStringPKhjR19nsACString_internalE6digits ]
Summary: crash in mozilla::ipc::RPCChannel::DebugAbort with abort message: "###!!! ABORT: other side should have been blocked" → crash in mozilla::ipc::RPCChannel::DebugAbort with abort message: "other side should have been blocked"
Crash Signature: , char const*, char const*, char const*, bool) ] [@ mozalloc_abort(char const*) | Abort | NS_DebugBreak_P ] [@ mozalloc_abort(char const*) | NS_DebugBreak | _ZZL11toHexStringPKhjR19nsACString_internalE6digits ] → , char const*, char const*, char const*, bool) ] [@ mozalloc_abort(char const*) | Abort | NS_DebugBreak_P ] [@ mozalloc_abort(char const*) | NS_DebugBreak | _ZZL11toHexStringPKhjR19nsACString_internalE6digits ] [@ mozalloc_abort(char const*) | NS_Debug…
Crash Signature: , bool) ] [@ mozalloc_abort(char const*) | Abort | NS_DebugBreak_P ] [@ mozalloc_abort(char const*) | NS_DebugBreak | _ZZL11toHexStringPKhjR19nsACString_internalE6digits ] [@ mozalloc_abort(char const*) | NS_DebugBreak ] → , bool) ] [@ mozalloc_abort(char const*) | Abort | NS_DebugBreak_P ] [@ mozalloc_abort(char const*) | NS_DebugBreak | _ZZL11toHexStringPKhjR19nsACString_internalE6digits ] [@ mozalloc_abort(char const*) | NS_DebugBreak ] [@ mozalloc_abort(char const*)…
I was able to crash Firefox 24 beta 4 using this steps:

1. Open http://silverlightgames.org/game/hamquest/play/
2. Play a bit ~2 minute.
3. In the same tab open http://www.javagameplay.com/offroadrally/rally.html

It may take a few tries, but I was able to crash FF. I was unable to crash the latest Nightly, though.

Signature:
[@ mozalloc_abort(char const* const) | NS_DebugBreak | mozilla::ipc::RPCChannel::DebugAbort(char const*, int, char const*, char const*, char const*, bool) ]

https://crash-stats.mozilla.com/report/index/42f7d3e3-4b38-4aa5-9d87-c7f752130822
https://crash-stats.mozilla.com/report/index/662d8470-cc58-4241-9fdb-e0e7a2130822
https://crash-stats.mozilla.com/report/index/8d6a8aed-d760-49d9-b24f-f817c2130822
https://crash-stats.mozilla.com/report/index/64e06475-d1ee-433a-87f9-141992130822
(In reply to menouda from comment #40)
> We develop a project with FF ESR 17 and we are able to crash Firefox.
> I deactivated all plugins except the QuickTime plugin 7.7.5 and we could
> reproduce it.

Firefox 17esr is no longer supported. Please update this bug if you can reproduce this with Firefox 24esr.
Crash Signature: const*) | NS_DebugBreak | mozilla::ipc::RPCChannel::DebugAbort(char const*, int, char const*, char const*, char const*, bool) const ] → const*) | NS_DebugBreak | mozilla::ipc::RPCChannel::DebugAbort(char const*, int, char const*, char const*, char const*, bool) const ] [@ mozalloc_abort | NS_DebugBreak_P | mozilla::ipc::RPCChannel::DebugAbort] [@ mozalloc_abort | _RTC_Terminate] [@ mo…
Resolving old bugs which are likely not relevant any more, since NPAPI plugins are deprecated.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INCOMPLETE
Product: Core → Core Graveyard