crash in nsTArray_Impl<T>::Clear() called from nsPluginFrame::SetEmptyWidgetConfiguration

RESOLVED FIXED in Firefox 47

Status

()

defect
--
critical
RESOLVED FIXED
4 years ago
3 years ago

People

(Reporter: dmajor, Assigned: dbaron)

Tracking

({crash})

40 Branch
mozilla48
x86
Windows NT
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox39 wontfix, firefox40+ wontfix, firefox41+ wontfix, firefox42- affected, firefox47 fixed, firefox48 fixed)

Details

(crash signature)

Attachments

(1 attachment)

This bug was filed from the Socorro interface and is 
report bp-fb38aa53-a640-471c-91d4-e5d1a2150705.
=============================================================

mNextConfigurationClipRegion's array header is nullptr... which apparently should never ever happen?

xul!nsTArray_Impl<WeakMapping,nsTArrayInfallibleAllocator>::Clear
xul!nsPluginFrame::SetEmptyWidgetConfiguration
xul!SetPluginHidden
xul!nsTHashtable<nsPtrHashKey<mozilla::css::ImageValue> >::s_EnumStub
xul!nsTHashtable<nsRefPtrHashKey<nsIContent> >::EnumerateEntries
xul!nsRootPresContext::ComputePluginGeometryUpdates
xul!nsLayoutUtils::PaintFrame
[Tracking Requested - why for this release]:
1.3% of crashes on 40 beta. A very small number of hits on aurora. Unclear whether nightly is affected; it may be due to differences in the userbase.
Aaron, not really sure if this is your turf, but the stack has the word Plugin several times :-)
Flags: needinfo?(aklotz)
Adding a tracking flag for FF40 and 41.
I suspect that jimm is a better person to look at this one.
Flags: needinfo?(aklotz) → needinfo?(jmathies)
This looks pretty old, I see hits back to a 24esr build.

Looks like the nsObjectFrame we get here isn't valid for some reason.

http://hg.mozilla.org/releases/mozilla-esr31/annotate/06188a3eed1f/layout/base/nsPresContext.cpp#l2768
Flags: needinfo?(jmathies)
This may be old but it is still a high enough volume crash to track.

jimm - Are you the right person to look at this? Anyone else you can recommend so that we can try and make progress?
Weird crash in member variable that shouldn't be null. Also doesn't happen with e10s enabled which is a good sign. It also mildly correlates with accessibility but isn't exclusive to it.

I can try to take a look at some point over the next few weeks, but it's not going to be a priority.
Andrew, David: I am kind of shooting in the dark here. This seems like a high volume crash. Combined for 40.0.2 and 40.0.3, there are ~2000 crash instances. Would you be able to help find an owner who can investigate? This also came up on my radar as it's a FF41+ tracking bug.
Flags: needinfo?(dmajor)
Flags: needinfo?(continuation)
This is a really generic signature. It should be added to some kind of prefix list so that different issues are split into different signatures.

The comments in this bug talk about plugins, but it felt like most of the crash reports I looked at were under nsCaseTransformTextRunFactory::RebuildTextRun(), which is more like Layout: Text.

I also noticed that there were an unusually large number of comments about weather.com. Some are pretty specific, like "Crashed during an attempt to open a Firefox web page for the 10 day forcast from The Windows 7 x64 The Weather Channel gaget."
https://crash-stats.mozilla.com/report/index/6f2e6cb8-e10a-4207-887a-8443d2150831
Flags: needinfo?(continuation)
From my highly-unscientific spot check, I still see a majority of stacks similar to comment 0. I think Jim is still in the best position to investigate. The volume is in the "nice to fix but not a fire alarm" range so I'm willing to wait a while if his current work is more urgent.
Flags: needinfo?(dmajor)
Depends on: 1200416
Thanks Andrew and David for the follow up. I will not hold my breath for a fix here but wait patiently.
Note that after bug 1200416 reaches production (a few days I think) this crash will "disappear" and be replaced by more-detailed signatures. We can focus this bug on the largest bucket once we find out what it is.
Note that text-related code showing up in stacks does not necessarily indicate that the problem lies in that code, but simply that text code such as that is hit a lot by any page so any sort of memory-stomping bug will often show up with a higher number of stacks in glyph run or textrun code.
It is getting too late to fix this in FF41 given that we don't have a patch ready. Let's try to get this fixed in 42.
Here's our new signature post- bug 1200416. SetEmptyWidgetConfiguration is indeed the largest factor.
Crash Signature: [@ nsTArray_Impl<T>::Clear()] → [@ nsTArray_Impl<T>::Clear()] [@ nsTArray_Impl<T>::Clear() | nsPluginFrame::SetEmptyWidgetConfiguration() ]
Stop tracking this bug, we have been shipping several releases with this bug.
Crash Signature: [@ nsTArray_Impl<T>::Clear()] [@ nsTArray_Impl<T>::Clear() | nsPluginFrame::SetEmptyWidgetConfiguration() ] → [@ nsTArray_Impl<T>::Clear()] [@ nsTArray_Impl<T>::Clear() | nsPluginFrame::SetEmptyWidgetConfiguration() ] [@ nsTArray_Impl<T>::Clear] [@ nsTArray_Impl<T>::Clear | nsPluginFrame::SetEmptyWidgetConfiguration ]
Flags: needinfo?(jmathies)
It seems like one relatively straightforward way for things to go wrong here is for nsPluginFrame::EndSwapDocShells to be called on an nsPluginFrame whose mInstanceOwner is null, since having an mInstanceOwner seems to be what's associated with being registered.  If such an nsPluginFrame never gets a non-null mInstanceOwner, this could lead to a dangling nsPluginFrame in the nsRootPresContext's hash.
Though user comments for these crashes don't seem to mention dragging tabs.

But they do frequently mention plugin crashes, plugin hangs, and slow script dialogs.  (Many comments mention crashing after a popup offering to stop the plugin.  Some even mention killing flash in task manager.)
Actually, I was thinking EndSwapDocShells was related to tab dragging, but I now think it's just related to frame reconstruction.
Assignee: nobody → dbaron
Status: NEW → ASSIGNED
Summary: crash in nsTArray_Impl<T>::Clear() → crash in nsTArray_Impl<T>::Clear() called from nsPluginFrame::SetEmptyWidgetConfiguration
Having a non-null mInstanceOwner corresponds to when registration
normally happens (from nsPluginInstanceOwner::SetFrame), and it's
disconnecting the instance owner that leads to unregistration.

Review commit: https://reviewboard.mozilla.org/r/40947/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/40947/
Attachment #8731948 - Flags: review?(tnikkel)
Comment on attachment 8731948 [details]
MozReview Request: Bug 1182247 - Only RegisterPluginForGeometryUpdates in nsPluginFrame::EndSwapDocShells if mInstanceOwner is non-null.  r?tnikkel

https://reviewboard.mozilla.org/r/40947/#review39969

Sorry for taking so long, this code is complicated and I wanted to get it right.

It does seem like the code is more complicated then needed. If this doesn't work we could try simplifying the code.
Attachment #8731948 - Flags: review?(tnikkel) → review+
https://hg.mozilla.org/mozilla-central/rev/329a66b8e67c
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
Comment on attachment 8731948 [details]
MozReview Request: Bug 1182247 - Only RegisterPluginForGeometryUpdates in nsPluginFrame::EndSwapDocShells if mInstanceOwner is non-null.  r?tnikkel

Approval Request Comment
[Feature/regressing bug #]: unknown (though it could probably be worked out)
[User impact if declined]: crashes
[Describe test coverage new/current, TreeHerder]: none
[Risks and why]: I think it's a relatively safe patch that's likely (although not certain) to fix this crash.  We won't know until it reaches beta if the crash fix worked, given that we basically don't see this crash on nightly and aurora.  So I'd at least like to bring it forward to aurora.
[String/UUID change made/needed]: no
Attachment #8731948 - Flags: approval-mozilla-aurora?
Comment on attachment 8731948 [details]
MozReview Request: Bug 1182247 - Only RegisterPluginForGeometryUpdates in nsPluginFrame::EndSwapDocShells if mInstanceOwner is non-null.  r?tnikkel

Crash fix that hasn't shown up in Nightly/Aurora population in 47/48 but simple enough fix. Aurora47+
Attachment #8731948 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.