Closed Bug 1264530 Opened 8 years ago Closed 8 years ago

crash in nsPluginInstanceOwner::UseAsyncRendering

Categories

(Core Graveyard :: Plug-ins, defect)

Unspecified
Windows NT
defect
Not set
critical

Tracking

(firefox48 fixed, firefox49- fixed, relnote-firefox 48+, firefox50 fixed, firefox51 fixed)

RESOLVED FIXED
mozilla51
Tracking Status
firefox48 --- fixed
firefox49 - fixed
relnote-firefox --- 48+
firefox50 --- fixed
firefox51 --- fixed

People

(Reporter: n.nethercote, Assigned: BenWa)

References

Details

(4 keywords)

Crash Data

Attachments

(2 files)

This bug was filed from the Socorro interface and is 
report bp-091b5f59-428a-4a40-9f25-9fb662160413.
=============================================================

45 crashes with this signature in the past seven days.
bgirard, any ideas?
Flags: needinfo?(bgirard)
umm, best guess is that mPluginWindow isn't always initialized.
Attached patch patchSplinter Review
I meant to attach this patch but forgot.

I'm not sure that this will help, but it's probably a good idea anyways.
Attachment #8747105 - Flags: review?(n.nethercote)
Comment on attachment 8747105 [details] [diff] [review]
patch

Review of attachment 8747105 [details] [diff] [review]:
-----------------------------------------------------------------

I also don't see how this will help, but it can't hurt. I suggest leaving the bug open after you land this patch.
Attachment #8747105 - Flags: review?(n.nethercote) → review+
The crash addresses are all over the place:
https://crash-stats.mozilla.com/report/list?product=Firefox&signature=nsPluginInstanceOwner%3A%3AUseAsyncRendering#tab-reports

And we're de-referencing mPluginwindow. So I'd say this has a non trivial chance of addressing the issue.
Flags: needinfo?(bgirard)
Whiteboard: leave-open
Keywords: checkin-needed
I see: if the do_GetService(MOZ_PLUGIN_HOST_CONTRACTID) call fails then without this patch mPluginWindow will be uninitialized. Good to fix, then.
hi, crashes with this signature are still occurring on nightly & dev-edition after the patch has landed.
Currently #6 overall top crash in 48.
Keywords: topcrash
Adding a NI to make sure BenWa notices this is still alive. We need an owner.
Flags: needinfo?(bgirard)
Milan, do you know who else could help with this? Thanks.
We could take this as a ride along of a 48.0.1
Flags: needinfo?(milan)
This has now moved up to the #2 top overall rash in 48.
Johnny, any clues? Thanks
Flags: needinfo?(jst)
I can look on Wednesday if that's not too late.
It is a bit too late, this is blocking our capability to push to more than 25% of our release users and the creation of a dot release for 48.
From the looks of it this crash started to explode recently and most of the crashing addresses are at 0xfffffffff0de8023. I haven't seen something like this before. It could be that the upper word is getting corrupted with something like -1 but then why would the lower word always be same?

Is there's an easy way to get changelogs for the period where the crash rate started exploding?
Perhaps the crash explosion is linked with something such as the e10s role out?
Looks like we slice the object here:
https://dxr.mozilla.org/mozilla-central/source/dom/plugins/base/nsPluginInstanceOwner.cpp#3575

Probably not the problem since it's very old code however.
(In reply to Benoit Girard (:BenWa) from comment #19)
> Perhaps the crash explosion is linked with something such as the e10s role
> out?

Does not appear to be related, it's #33 in the e10s test group, #2 overall.
Adding regressionwindow-wanted since it was requested in Comment 18.
Not sure how I can help in this area, but I wonder if bug 132759 contains any useful hints.
Flags: needinfo?(milan)
[Tracking Requested - why for this release]: Should we block on this?
Keywords: regression
Flags: needinfo?(bugs)
This is happening often on Windows 10 (10.0.10586) [~60% of crashes with this signature compared to ~30% overall] and often with an URL starting with wyciwyg [~30% with this signature compared to less than 5% overall].
Another interesting data point is that most users that are experiencing this crash don't have AdBlock Plus installed [more than 95% of crashes with this signature compared to less than 80% overall].
Thanks for the data. I'll keep it in mind while looking however nothing jumps to mind. I'll spend some of today searching.
Could the crash explosion correlate with the landing/uplift of https://hg.mozilla.org/releases/mozilla-release/rev/28ba9cceda61 ?
Some progress:
From the minidump: We're crashing looking at memory point by using nsPluginFrame's mInstanceOwner->mInstance because mInstanceOwner is bad. 28ba9cceda61 added a plugin call which can destroy the nsPluginFrame I believe so it's very likely responsible for the increase.
Build ID 20160726073904 was normal for a few days (from 2016-07-27 to 2016-08-02) and started spiking from 2016-08-03 (see https://crash-stats.mozilla.com/signature/?date=%3E%3D2016-05-05&signature=nsPluginInstanceOwner%3A%3AUseAsyncRendering#graphs, if you select "build ID" in "Crashes per day, by ..."). So the spike seems due to a chenage in the external conditions (the finding from comment 26 would support this hypothesis).

If you look at the graph for 48.0a2, the number of crashes is pretty stable and basically the same before/after the uplift of bug 1269444 (although there is an interesting spike on June 1st): https://crash-stats.mozilla.com/signature/?date=%3E%3D2016-05-05&version=48.0a2&signature=nsPluginInstanceOwner%3A%3AUseAsyncRendering#graphs

N.B.: This doesn't mean that 28ba9cceda61 can't be the cause of the crash.
Ok cool, I think we got everything adding up.
WinDbg points the address to be a block of MEM_RESERVE memory which is something that frame poisoning. This would explain why nearly all the crashes are the same address. We end up deleting nsPluginFrame, we poison the values and we end up with a lot of crashes.

What changed is we added a plug-in call which means that now plugins can do whatever they please including destroying the plugin frame while we have an active call against it. Too bad' :marco data does not support this (nor does it reject it). But I'm sure enough to proceed with a patch.

Patch incoming.
Attachment #8779850 - Flags: review?(jmathies) → review+
Just a note on the patch. Without reproducing this bug I can't say for sure that it wont just move the crash somewhere else. But likely that will have a similar problem and similar crash signature and address clustering. But it's unlikely to make things worse... maybe?
Assignee: nobody → bgirard
Benoit, while I agree that it's hard to tell for sure, I think you're very likely looking at the real issue here. We have two options here, generally speaking... We either do what your patch does and continue executing this method after the plugin is dead, as your patch does. Or we bail, as we do a bit earlier in this function (on OSX).
Flags: needinfo?(jst)
Yes, I'm being very pessimistic in that statement. My first patch was a shot in the dark but this one is not and we will likely fix the issue. We should decide how far we want to uplift this. Sounds like we missed the point release already.

Let me know if you want me to change the patch to use bail out instead but I don't know which calls are destroying the frame. I believe it's not just ResolutionMayHaveChanged.
I'm not convinced there's a good way to tell which is better. What's the crash volume for this one on nightly/aurora/beta? Can we know if this was fixed in a few days on nightly I guess is the real question? If we can, and it's not, we should change the patch to the other alternative approach.
Since it's a plugin change I want to get at least one day in nightly. However ill request uplift immediately after to aurora and beta. The fix looks good and if it isn't with that population we will know in time to have another look.
Flags: needinfo?(bgirard)
(In reply to Johnny Stenback  (:jst, jst@mozilla.com) from comment #37)
> What's the crash volume for this one on nightly/aurora/beta?
on nightly there are generally 1-2 crashes a day, on aurora 3-10 per day & on beta around 100
Benoit, could you fill the uplift requests to aurora, beta & release? Thanks
Flags: needinfo?(bgirard)
Comment on attachment 8779850 [details]
Bug 1264530 - Hold on to Plugin Instance to survive frame poisoning.

I'm putting the request in but I'd like to have one day coverage in nightly before we uplift.

Approval Request Comment
[Feature/regressing bug #]: Bug 1269444 *might* of made it worse, we're not sure exactly
[User impact if declined]: Topcrasher due to deleted and sometimes poison framed.
[Describe test coverage new/current, TreeHerder]: There's some plugin and flash coverage but this bug path is not exercised.
[Risks and why]: Fairly low. This has a very high chance of fixing the bug. Worse case is likely we will see another signature pop-up with frame poisoning addresses. If so we can react fast but it's very unlikely.
[String/UUID change made/needed]: none
Attachment #8779850 - Flags: approval-mozilla-beta?
Attachment #8779850 - Flags: approval-mozilla-aurora?
Attachment #8779850 - Flags: approval-mozilla-release?
Comment on attachment 8779850 [details]
Bug 1264530 - Hold on to Plugin Instance to survive frame poisoning.

Taking it now to have it during this week end on aurora
Will decide on release next week
Attachment #8779850 - Flags: approval-mozilla-beta?
Attachment #8779850 - Flags: approval-mozilla-beta+
Attachment #8779850 - Flags: approval-mozilla-aurora?
Attachment #8779850 - Flags: approval-mozilla-aurora+
Flags: needinfo?(bgirard)
Comment on attachment 8779850 [details]
Bug 1264530 - Hold on to Plugin Instance to survive frame poisoning.

Top crash #2, taking it in release.
Attachment #8779850 - Flags: approval-mozilla-release? → approval-mozilla-release+
It's been on Nightly for a bit now and this hasn't broken plugins AFAIK. Uplifting now should be fine.
Flags: needinfo?(bugs)
Status: NEW → RESOLVED
Closed: 8 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Whiteboard: leave-open
Target Milestone: --- → mozilla51
Added in the 48.0.1 release notes: "Fix a top crash caused by plugin issues".
Looks like the volume went back to normal for 48.0 (which does not include the patch). Just one crash with 48.0.1, so I guess we can call it fixed.

I suppose the problem was due to some Flash advertisement, which bug 1269444 made worse.
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: