crash in nsPluginInstanceOwner::UseAsyncRendering

RESOLVED FIXED in Firefox 48

Status

()

defect
--
critical
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: njn, Assigned: BenWa)

Tracking

(4 keywords)

Trunk
mozilla51
Unspecified
Windows NT
Points:
---

Firefox Tracking Flags

(firefox48 fixed, firefox49- fixed, relnote-firefox 48+, firefox50 fixed, firefox51 fixed)

Details

(crash signature)

Attachments

(2 attachments)

Reporter

Description

3 years ago
This bug was filed from the Socorro interface and is 
report bp-091b5f59-428a-4a40-9f25-9fb662160413.
=============================================================

45 crashes with this signature in the past seven days.
Reporter

Comment 1

3 years ago
bgirard, any ideas?
Reporter

Updated

3 years ago
Flags: needinfo?(bgirard)
Assignee

Comment 2

3 years ago
umm, best guess is that mPluginWindow isn't always initialized.
Assignee

Comment 3

3 years ago
Posted patch patchSplinter Review
I meant to attach this patch but forgot.

I'm not sure that this will help, but it's probably a good idea anyways.
Attachment #8747105 - Flags: review?(n.nethercote)
Reporter

Comment 4

3 years ago
Comment on attachment 8747105 [details] [diff] [review]
patch

Review of attachment 8747105 [details] [diff] [review]:
-----------------------------------------------------------------

I also don't see how this will help, but it can't hurt. I suggest leaving the bug open after you land this patch.
Attachment #8747105 - Flags: review?(n.nethercote) → review+
Assignee

Comment 5

3 years ago
The crash addresses are all over the place:
https://crash-stats.mozilla.com/report/list?product=Firefox&signature=nsPluginInstanceOwner%3A%3AUseAsyncRendering#tab-reports

And we're de-referencing mPluginwindow. So I'd say this has a non trivial chance of addressing the issue.
Flags: needinfo?(bgirard)
Whiteboard: leave-open
Assignee

Updated

3 years ago
Keywords: checkin-needed
Reporter

Comment 6

3 years ago
I see: if the do_GetService(MOZ_PLUGIN_HOST_CONTRACTID) call fails then without this patch mPluginWindow will be uninitialized. Good to fix, then.

Comment 9

3 years ago
hi, crashes with this signature are still occurring on nightly & dev-edition after the patch has landed.

Updated

3 years ago
Duplicate of this bug: 1110093
Currently #6 overall top crash in 48.

Updated

3 years ago
Keywords: topcrash
Adding a NI to make sure BenWa notices this is still alive. We need an owner.
Flags: needinfo?(bgirard)
Milan, do you know who else could help with this? Thanks.
We could take this as a ride along of a 48.0.1
Flags: needinfo?(milan)
This has now moved up to the #2 top overall rash in 48.
Johnny, any clues? Thanks
Flags: needinfo?(jst)
Assignee

Comment 16

3 years ago
I can look on Wednesday if that's not too late.
It is a bit too late, this is blocking our capability to push to more than 25% of our release users and the creation of a dot release for 48.
Assignee

Comment 18

3 years ago
From the looks of it this crash started to explode recently and most of the crashing addresses are at 0xfffffffff0de8023. I haven't seen something like this before. It could be that the upper word is getting corrupted with something like -1 but then why would the lower word always be same?

Is there's an easy way to get changelogs for the period where the crash rate started exploding?
Assignee

Comment 19

3 years ago
Perhaps the crash explosion is linked with something such as the e10s role out?
Assignee

Comment 20

3 years ago
Looks like we slice the object here:
https://dxr.mozilla.org/mozilla-central/source/dom/plugins/base/nsPluginInstanceOwner.cpp#3575

Probably not the problem since it's very old code however.
(In reply to Benoit Girard (:BenWa) from comment #19)
> Perhaps the crash explosion is linked with something such as the e10s role
> out?

Does not appear to be related, it's #33 in the e10s test group, #2 overall.
Adding regressionwindow-wanted since it was requested in Comment 18.
Not sure how I can help in this area, but I wonder if bug 132759 contains any useful hints.
Flags: needinfo?(milan)
[Tracking Requested - why for this release]: Should we block on this?
Keywords: regression
Flags: needinfo?(bugs)
This is happening often on Windows 10 (10.0.10586) [~60% of crashes with this signature compared to ~30% overall] and often with an URL starting with wyciwyg [~30% with this signature compared to less than 5% overall].
Another interesting data point is that most users that are experiencing this crash don't have AdBlock Plus installed [more than 95% of crashes with this signature compared to less than 80% overall].
Assignee

Comment 27

3 years ago
Thanks for the data. I'll keep it in mind while looking however nothing jumps to mind. I'll spend some of today searching.
Assignee

Comment 28

3 years ago
Could the crash explosion correlate with the landing/uplift of https://hg.mozilla.org/releases/mozilla-release/rev/28ba9cceda61 ?
Assignee

Comment 29

3 years ago
Some progress:
From the minidump: We're crashing looking at memory point by using nsPluginFrame's mInstanceOwner->mInstance because mInstanceOwner is bad. 28ba9cceda61 added a plugin call which can destroy the nsPluginFrame I believe so it's very likely responsible for the increase.
Build ID 20160726073904 was normal for a few days (from 2016-07-27 to 2016-08-02) and started spiking from 2016-08-03 (see https://crash-stats.mozilla.com/signature/?date=%3E%3D2016-05-05&signature=nsPluginInstanceOwner%3A%3AUseAsyncRendering#graphs, if you select "build ID" in "Crashes per day, by ..."). So the spike seems due to a chenage in the external conditions (the finding from comment 26 would support this hypothesis).

If you look at the graph for 48.0a2, the number of crashes is pretty stable and basically the same before/after the uplift of bug 1269444 (although there is an interesting spike on June 1st): https://crash-stats.mozilla.com/signature/?date=%3E%3D2016-05-05&version=48.0a2&signature=nsPluginInstanceOwner%3A%3AUseAsyncRendering#graphs

N.B.: This doesn't mean that 28ba9cceda61 can't be the cause of the crash.
Assignee

Comment 31

3 years ago
Ok cool, I think we got everything adding up.
WinDbg points the address to be a block of MEM_RESERVE memory which is something that frame poisoning. This would explain why nearly all the crashes are the same address. We end up deleting nsPluginFrame, we poison the values and we end up with a lot of crashes.

What changed is we added a plug-in call which means that now plugins can do whatever they please including destroying the plugin frame while we have an active call against it. Too bad' :marco data does not support this (nor does it reject it). But I'm sure enough to proceed with a patch.

Patch incoming.
Comment hidden (mozreview-request)

Updated

3 years ago
Attachment #8779850 - Flags: review?(jmathies) → review+
Assignee

Comment 33

3 years ago
Just a note on the patch. Without reproducing this bug I can't say for sure that it wont just move the crash somewhere else. But likely that will have a similar problem and similar crash signature and address clustering. But it's unlikely to make things worse... maybe?
Assignee

Updated

3 years ago
Assignee: nobody → bgirard
Benoit, while I agree that it's hard to tell for sure, I think you're very likely looking at the real issue here. We have two options here, generally speaking... We either do what your patch does and continue executing this method after the plugin is dead, as your patch does. Or we bail, as we do a bit earlier in this function (on OSX).
Flags: needinfo?(jst)
Assignee

Comment 35

3 years ago
Yes, I'm being very pessimistic in that statement. My first patch was a shot in the dark but this one is not and we will likely fix the issue. We should decide how far we want to uplift this. Sounds like we missed the point release already.

Let me know if you want me to change the patch to use bail out instead but I don't know which calls are destroying the frame. I believe it's not just ResolutionMayHaveChanged.
I'm not convinced there's a good way to tell which is better. What's the crash volume for this one on nightly/aurora/beta? Can we know if this was fixed in a few days on nightly I guess is the real question? If we can, and it's not, we should change the patch to the other alternative approach.
Assignee

Comment 38

3 years ago
Since it's a plugin change I want to get at least one day in nightly. However ill request uplift immediately after to aurora and beta. The fix looks good and if it isn't with that population we will know in time to have another look.
Flags: needinfo?(bgirard)

Comment 39

3 years ago
(In reply to Johnny Stenback  (:jst, jst@mozilla.com) from comment #37)
> What's the crash volume for this one on nightly/aurora/beta?
on nightly there are generally 1-2 crashes a day, on aurora 3-10 per day & on beta around 100
Benoit, could you fill the uplift requests to aurora, beta & release? Thanks
Flags: needinfo?(bgirard)
Assignee

Comment 41

3 years ago
Comment on attachment 8779850 [details]
Bug 1264530 - Hold on to Plugin Instance to survive frame poisoning.

I'm putting the request in but I'd like to have one day coverage in nightly before we uplift.

Approval Request Comment
[Feature/regressing bug #]: Bug 1269444 *might* of made it worse, we're not sure exactly
[User impact if declined]: Topcrasher due to deleted and sometimes poison framed.
[Describe test coverage new/current, TreeHerder]: There's some plugin and flash coverage but this bug path is not exercised.
[Risks and why]: Fairly low. This has a very high chance of fixing the bug. Worse case is likely we will see another signature pop-up with frame poisoning addresses. If so we can react fast but it's very unlikely.
[String/UUID change made/needed]: none
Attachment #8779850 - Flags: approval-mozilla-beta?
Attachment #8779850 - Flags: approval-mozilla-aurora?
Assignee

Updated

3 years ago
Attachment #8779850 - Flags: approval-mozilla-release?
Comment on attachment 8779850 [details]
Bug 1264530 - Hold on to Plugin Instance to survive frame poisoning.

Taking it now to have it during this week end on aurora
Will decide on release next week
Attachment #8779850 - Flags: approval-mozilla-beta?
Attachment #8779850 - Flags: approval-mozilla-beta+
Attachment #8779850 - Flags: approval-mozilla-aurora?
Attachment #8779850 - Flags: approval-mozilla-aurora+
Assignee

Updated

3 years ago
Flags: needinfo?(bgirard)

Updated

3 years ago
Comment on attachment 8779850 [details]
Bug 1264530 - Hold on to Plugin Instance to survive frame poisoning.

Top crash #2, taking it in release.
Attachment #8779850 - Flags: approval-mozilla-release? → approval-mozilla-release+
Assignee

Comment 45

3 years ago
It's been on Nightly for a bit now and this hasn't broken plugins AFAIK. Uplifting now should be fine.
Flags: needinfo?(bugs)
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Whiteboard: leave-open
Target Milestone: --- → mozilla51
Added in the 48.0.1 release notes: "Fix a top crash caused by plugin issues".
Looks like the volume went back to normal for 48.0 (which does not include the patch). Just one crash with 48.0.1, so I guess we can call it fixed.

I suppose the problem was due to some Flash advertisement, which bug 1269444 made worse.
You need to log in before you can comment on or make changes to this bug.