Closed Bug 680130 Opened 13 years ago Closed 8 years ago

Plugin child instance hang/crash in mozilla::plugins::PluginModuleChild::ShouldContinueFromReplyTimeout()

Categories

(Core Graveyard :: Plug-ins, defect)

x86
Windows 7
defect
Not set
critical

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: jimm, Unassigned)

References

Details

(Keywords: crash, Whiteboard: [read comment 12])

Crash Data

These and other similar crashes are "by design" due to the child timeout logic added in bug 677711. Causes vary, but basically this is a case where the parent is frozen and the child times out, causing it to kill itself to free the parent. This should probably serve as a meta bug, since in some of these cases we should be able to find a workaround for the parent hang.
Crash Signature: [@ mozalloc_abort(char const* const) | NS_DebugBreak_P | mozilla::plugins::PPluginScriptableObjectChild::FatalError(char const* const) ] → [@ mozalloc_abort(char const* const) | NS_DebugBreak_P | mozilla::plugins::PPluginScriptableObjectChild::FatalError(char const* const) ] [@ mozalloc_abort(char const* const) | NS_DebugBreak_P | mozilla::plugins::PluginModuleChild::ShouldContinueFromReply…
Most common offenders so far, based on a manual sampling, ordered from most to least common:

mozilla::plugins::PluginInstanceChild::ShowPluginFrame
  send: http://mxr.mozilla.org/mozilla-central/source/dom/plugins/ipc/PluginInstanceChild.cpp#2995
  recv: http://mxr.mozilla.org/mozilla-central/source/dom/plugins/ipc/PluginInstanceParent.cpp#500
mozilla::plugins::PPluginModuleChild::CallNPN_UserAgent
mozilla::plugins::PPluginInstanceChild::CallPStreamNotifyConstructor
mozilla::plugins::PPluginScriptableObjectChild::CallNPN_Evaluate
mozilla::plugins::PPluginScriptableObjectChild::CallInvokeDefault

About 60%-70% of the timeouts are in ShowPluginFrame.
I don't think ShowPluginFrame is a real hang, but I do think it might indicate a serious bottleneck in our plugin rendering code. This reminds me of all the slow performance bugs we had back when we released oopp.
More accurate stats from some simple screen scraping of crashstats over the last two days:

52 : 'mozilla::plugins::PPluginInstanceChild::SendShow(..)'
19 : 'mozilla::plugins::PluginModuleChild::GetUserAgent()'
7 : 'mozilla::plugins::PluginScriptableObjectChild::Evaluate(..)'
2 : 'mozilla::plugins::child::_posturlnotify'
2 : 'mozilla::plugins::PluginScriptableObjectChild::ScriptableInvokeDefault(..)'
2 : 'mozilla::plugins::child::_geturlnotify'
1 : 'mozilla::plugins::PluginInstanceChild::NPN_GetValue(NPNVariable,void*)'
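As a rough cross-check of these numbers, here is a minimal Python sketch that totals the scraped counts and computes the share held by the top frame. It is not the script used for the screen scraping (that isn't shown in this bug). The counts are copied from the comment above, and the helper name `share_of` is made up for illustration.

```python
from collections import Counter

# Counts copied from the comment above (two days of crash-stats reports).
frame_counts = Counter({
    "mozilla::plugins::PPluginInstanceChild::SendShow(..)": 52,
    "mozilla::plugins::PluginModuleChild::GetUserAgent()": 19,
    "mozilla::plugins::PluginScriptableObjectChild::Evaluate(..)": 7,
    "mozilla::plugins::child::_posturlnotify": 2,
    "mozilla::plugins::PluginScriptableObjectChild::ScriptableInvokeDefault(..)": 2,
    "mozilla::plugins::child::_geturlnotify": 2,
    "mozilla::plugins::PluginInstanceChild::NPN_GetValue(NPNVariable,void*)": 1,
})


def share_of(counts, frame):
    """Return the fraction of all sampled reports attributed to `frame`."""
    total = sum(counts.values())
    return counts[frame] / total


total = sum(frame_counts.values())
show_share = share_of(
    frame_counts, "mozilla::plugins::PPluginInstanceChild::SendShow(..)"
)
print(f"{total} reports, SendShow share: {show_share:.0%}")
```

That works out to 52 of 85 reports, roughly 61%, which is in line with the 60%-70% estimate from the earlier manual sampling.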
Blocks: 681385
Blocks: 683967
Note dependent bugs aren't necessarily critical since the timeout code in bug 677711 will be disabled when we merge. But we want to keep this going on mc so we can continue to diagnose. Related to this I'm trying to get bug 679238 finished up so we have more visibility on the parent side when this happens. Also it would be great if we could somehow generate an accurate list of the most common stacks leading up to ShouldContinueFromReplyTimeout() in the child, similar to what I generated with some screen scraping in bug comment 3.
Summary: Plugin child instance crash in mozalloc_abort(char const* const) → Plugin child instance crash in ShouldContinueFromReplyTimeout()
[@ mozalloc_abort(char const* const) | NS_DebugBreak_P | mozilla::plugins::PluginModuleChild::ShouldContinueFromReplyTimeout()] has been rising a lot in the last few days. Is there an underlying problem causing this?
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #5)
> [@ mozalloc_abort(char const* const) | NS_DebugBreak_P |
> mozilla::plugins::PluginModuleChild::ShouldContinueFromReplyTimeout()] has
> been rising a lot in the last few days. Is there an underlying problem
> causing this?

The crash-on-parent-hang code was disabled in aurora (9.0a1, 9.0a2) on 10/11:
http://hg.mozilla.org/releases/mozilla-aurora/rev/c4862aaec55b
https://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A9.0a2&platform=windows&query_search=signature&query_type=contains&reason_type=contains&date=10%2F14%2F2011%2009%3A23%3A46&range_value=1&range_unit=weeks&hang_type=any&process_type=plugin&plugin_field=filename&plugin_query_type=exact&do_query=1&signature=mozalloc_abort%28char%20const*%20const%29%20|%20msvcr80.dll%400xe456
which clearly shut these crashes down as intended.

As for 10.0a1, the particular stack you mention started showing up on the same date:
https://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A10.0a1&platform=windows&query_search=signature&query_type=contains&reason_type=contains&date=10%2F14%2F2011 09%3A23%3A46&range_value=1&range_unit=weeks&hang_type=any&process_type=plugin&plugin_field=filename&plugin_query_type=exact&do_query=1&signature=mozalloc_abort(char const* const) | NS_DebugBreak_P | mozilla%3A%3Aplugins%3A%3APluginModuleChild%3A%3AShouldContinueFromReplyTimeout()
This looks like it's caused by the crash code, which is an intended crash.
The other signatures related to this, for example:
https://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A10.0a1&platform=windows&query_search=signature&query_type=contains&reason_type=contains&date=10%2F14%2F2011%2009%3A23%3A46&range_value=1&range_unit=weeks&hang_type=any&process_type=plugin&plugin_field=filename&plugin_query_type=exact&do_query=1&signature=mozalloc_abort%28char%20const*%20const%29%20|%20_RTC_Terminate
stop on 10/10. So, I don't know, maybe the signature swap was caused by a Socorro change or something, and the dates between the two branches just happened to match up? I landed the aurora change when I was able to get around to it; it wasn't correlated with any other landings or changes. So I don't think the changes between the two branches are related.
Looks like:
https://bugzilla.mozilla.org/show_bug.cgi?id=691912
https://github.com/mozilla/socorro/commit/4209b3a5559150805a66d0e1ca98e3f3a92cff60
might be it. That added *abort to the skip list; I'm not sure when that went live on our production servers, though.
Hmm, there were a few skiplist changes in that Socorro release, right. Not sure this one was it, but if the others all went away and are replaced by this signature, it's alright anyhow. Just wondered about the rise, which was determined to be a false alarm, so everything's alright. :)
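To illustrate how a skip list change can swap one signature for another without any change in the crashes themselves, here is a minimal Python sketch. It is not Socorro's actual implementation or configuration. The pattern lists, the stack, and the function name `make_signature` are all assumptions made up for this example. The idea it assumes is that frames matching a skip pattern are dropped, frames matching a prefix pattern are kept while generation continues, and the first other frame ends the signature.

```python
import re

# Hypothetical pattern lists for illustration only; not Socorro's real config.
PREFIX = re.compile(r"^(mozalloc_abort.*|NS_DebugBreak_P)$")
SKIP = re.compile(r"^(_RTC_Terminate|abort)$")


def make_signature(frames, prefix=PREFIX, skip=None):
    """Join frames into a signature, dropping skipped frames and
    stopping after the first frame that is not a prefix frame."""
    parts = []
    for frame in frames:
        if skip is not None and skip.match(frame):
            continue  # skipped frames never appear in the signature
        parts.append(frame)
        if not prefix.match(frame):
            break  # first "interesting" frame ends the signature
    return " | ".join(parts)


# Hypothetical stack, top frame first.
stack = [
    "mozalloc_abort(char const* const)",
    "_RTC_Terminate",
    "NS_DebugBreak_P",
    "mozilla::plugins::PluginModuleChild::ShouldContinueFromReplyTimeout()",
    "some_caller",
]

before = make_signature(stack)             # no skip list applied
after = make_signature(stack, skip=SKIP)   # skip list applied
```

Under these assumptions the same stack is reported under a shorter signature before the skip list change and under the ShouldContinueFromReplyTimeout() signature after it, so a rise in one signature can simply mirror the disappearance of another.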
(In reply to Jim Mathies [:jimm] from comment #2)
> I don't think ShowPluginFrame is a real hang, but I do think it might
> indicate a serious bottleneck in our plugin rendering code. This reminds me
> of all the slow performance bugs we had back when we released oopp.

I have bp-af631221-5284-4d27-8ec1-74e322111025 and bp-3f3901d8-9cf5-43bf-9520-b31762111025 "[@ mozalloc_abort(char const* const) | NS_DebugBreak_P | mozilla::plugins::PluginModuleChild::ShouldContinueFromReplyTimeout() ]", which says that this Bug 680130 is related. It occurs when I restart Nightly with over 220 Tabs (using TM+). When I use Aurora or "Release" I seldom have trouble with that few Tabs and need to go up to 250 or more to crash on startup.
Severity: normal → critical
Keywords: crash
Hardware: x86_64 → x86
Blocks: 701739
No longer blocks: 681385
Blocks: 681385, 696358
Crash Signature: [@ mozalloc_abort(char const* const) | NS_DebugBreak_P | mozilla::plugins::PPluginScriptableObjectChild::FatalError(char const* const) ] [@ mozalloc_abort(char const* const) | NS_DebugBreak_P | mozilla::plugins::PluginModuleChild::ShouldContinueFromReply… → [@ mozalloc_abort(char const* const) | NS_DebugBreak_P | mozilla::plugins::PluginModuleChild::ShouldContinueFromReplyTimeout() ] [@ hang | mozalloc_abort(char const* const) | NS_DebugBreak_P | mozilla::plugins::PluginModuleChild::ShouldContinueFromReplyT…
Depends on: 711971
Summary: Plugin child instance crash in ShouldContinueFromReplyTimeout() → Plugin child instance hang/crash in mozilla::plugins::PluginModuleChild::ShouldContinueFromReplyTimeout()
These crashes are triggered by the code that landed in bug 677711 which aborts the child if the child detects a parent hang after 15 seconds. This code was disabled on all branches on 1/7/12. We'll re-enable this code once bug 679238 is fixed. Currently we don't get good parent side stacks, so the child aborts were pretty useless, aside from giving us an idea of how many parent hangs we experience.
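The behavior described in this comment can be sketched in Python as follows. This is only an illustration of the idea, not the code from bug 677711: the names `call_parent` and `ParentHangDetected` are hypothetical, and raising an exception stands in for the child aborting itself. The 15 second value is the one given in the comment.

```python
import threading

REPLY_TIMEOUT_SECONDS = 15.0  # timeout value mentioned in the comment above


class ParentHangDetected(RuntimeError):
    """Stands in for the child aborting itself when the parent never replies."""


def call_parent(send_request, reply_received, timeout=REPLY_TIMEOUT_SECONDS):
    """Send a request to the parent and wait for its reply.

    `send_request` is a callable that delivers the request;
    `reply_received` is a threading.Event set when the reply arrives.
    """
    send_request()
    if not reply_received.wait(timeout):
        # No reply within the timeout: treat the parent as hung.
        raise ParentHangDetected(f"no reply from parent after {timeout} seconds")
    return True
```

A responsive parent sets the event and the call returns normally. A frozen parent never sets it, so the child gives up after the timeout, which is the point at which the real child process aborts and produces the crash reports tracked here.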
Whiteboard: [read comment 12]
No longer blocks: 681385
No longer blocks: 701739
See Also: → 637596
This is on my Bug List so I am checking in.

(In reply to Jim Mathies [:jimm] (away 4/1-4/21) from comment #12)
> These crashes are triggered by the code that landed in bug 677711 which
> aborts the child if the child detects a parent hang after 15 seconds. This
> code was disabled on all branches on 1/7/12. We'll re-enable this code once
> bug 679238 is fixed. Currently we don't get good parent side stacks, so the
> child aborts were pretty useless, aside from giving us an idea of how many
> parent hangs we experience.

In Bug 697739 Comment 12 I used a Registry Modification to prevent BSODs that I was getting from Watchdog.sys -- I think that defeated (fixed, for me) that Code (parent hang).

I noticed that this BR was started 2011-08-18, most replies were made within 5 months, with the exception of Stephen Donner's "Depends" more than a year ago (also consisting of Comment more than one year old). ALL other Bugs mentioned on this Page are closed. I suggest closing this BR.
Crash Signature: mozilla::plugins::PluginModuleChild::ShouldContinueFromReplyTimeout() ] [@ mozalloc_abort(char const* const) | _RTC_Terminate ] [@ hang | mozalloc_abort(char const* const) | _RTC_Terminate ] → mozilla::plugins::PluginModuleChild::ShouldContinueFromReplyTimeout() ] [@ mozalloc_abort(char const* const) | _RTC_Terminate ] [@ hang | mozalloc_abort(char const* const) | _RTC_Terminate ] [@ mozalloc_abort | NS_DebugBreak_P | mozilla::plugins::Plug…
I'm marking this bug as WORKSFORME because this crash signature hasn't appeared in a long time (over half a year).
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WORKSFORME
Product: Core → Core Graveyard