Closed
Bug 1167069
Opened 9 years ago
Closed 9 years ago
Constant crashes in content [@ nsPluginInstanceOwner::GetPluginPort()]
Categories
(Core Graveyard :: Plug-ins, defect)
Tracking
(e10s-, firefox38 wontfix, firefox38.0.5+ wontfix, firefox39+ wontfix, firefox40+ verified, firefox41+ verified)
People
(Reporter: whimboo, Assigned: smichaud)
References
()
Details
(Keywords: crash)
Crash Data
Attachments
(5 files, 1 obsolete file)
681 bytes,
patch
|
jimm
:
review+
|
Details | Diff | Splinter Review |
10.97 KB,
application/x-gzip
|
Details | |
7.98 KB,
application/x-gzip
|
Details | |
4.39 KB,
patch
|
Details | Diff | Splinter Review | |
2.41 KB,
patch
|
jimm
:
review+
Sylvestre
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:41.0) Gecko/20100101 Firefox/41.0 ID:20150511122605 CSet: 502e1a5e722f Since yesterday I see constant crashes of all open tabs with the following stack (first 10 frames): 0 XUL nsPluginInstanceOwner::GetPluginPort() obj-firefox/x86_64/dist/include/nsCOMPtr.h 1 XUL nsPluginFrame::CallSetWindow(bool) layout/generic/nsPluginFrame.cpp 2 XUL nsPluginInstanceOwner::CallSetWindow() dom/plugins/base/nsPluginInstanceOwner.cpp 3 XUL nsPluginHost::InstantiatePluginInstance(nsACString_internal const&, nsIURI*, nsObjectLoadingContent*, nsPluginInstanceOwner**) dom/plugins/base/nsPluginHost.cpp 4 XUL nsObjectLoadingContent::InstantiatePluginInstance(bool) dom/base/nsObjectLoadingContent.cpp 5 XUL nsObjectLoadingContent::SyncStartPluginInstance() dom/base/nsObjectLoadingContent.cpp 6 XUL nsObjectLoadingContent::ScriptRequestPluginInstance(JSContext*, nsNPAPIPluginInstance**) dom/base/nsObjectLoadingContent.cpp 7 XUL nsObjectLoadingContent::SetupProtoChain(JSContext*, JS::Handle<JSObject*>) dom/base/nsObjectLoadingContent.cpp 8 XUL mozilla::dom::HTMLSharedObjectElement::WrapNode(JSContext*, JS::Handle<JSObject*>) dom/html/HTMLSharedObjectElement.cpp 9 XUL nsINode::WrapObject(JSContext*, JS::Handle<JSObject*>) dom/base/nsINode.cpp 10 XUL mozilla::dom::Element::WrapObject(JSContext*, JS::Handle<JSObject*>) dom/base/Element.cpp Here is the appropriate report: bp-1fe61b33-99c0-4171-adf8-29bea2150521. This looks to be an OS X only crash which can make it up as topcrasher for OS X: Statistic of last 7 days: OS X 10.10 43.48 % 2531 OS X 10.9 21.94 % 1277 OS X 10.6 20.79 % 1210 OS X 10.7 7.37 % 429 OS X 10.8 6.42 % 374
Reporter | ||
Comment 2•9 years ago
|
||
Yes, that looks like e10s related. Since yesterday I see a lot of the notification bars in tabs that a content process (website) is slow and might wanna be killed. When I select 'Kill Web Process' the same happens. Maybe there is a situatino when this is done automatically? Even if only a single web page in a single tab is slow, every tab is affected and only shows the spinning wheel. That makes Firefox totally unusable.
Updated•9 years ago
|
tracking-e10s:
--- → ?
Reporter | ||
Comment 4•9 years ago
|
||
So this crash should be the result from the slow running website (https://vc1.ops.scl3.mozilla.com:9443/vsphere-client/). It totally hung up my browser at this time, and in non-e10s mode I had to kill my browser. So the safe-guard is working fine. So I assume this crash is ok, but what would be more interesting is the reason for the hang, right? Sadly I do not see it anymore since I had to restart this Macbook. Before it was reproducible all the time.
Assignee | ||
Comment 5•9 years ago
|
||
As best I can tell, these crashes take place here, when nsCOMPtr::get() is called on mWidget: https://hg.mozilla.org/mozilla-central/annotate/ff2e07228041/dom/plugins/base/nsPluginInstanceOwner.cpp#l2846 These crashes *aren't* null-dereferences. But oddly they're also not dereferences of a memory-poisoned value. For bp-1fe61b33-99c0-4171-adf8-29bea2150521 the (bad) value being dereferenced is 0x7ffffffff0dea7ff. This crash is in the 2015-05-11-12-26-05-mozilla-central nightly. The crash takes place at offset 0x1a40464 in the XUL module (from the raw dump). Here's the assembly code for nsPluginInstanceOwner::GetPluginPort(), which contains this address: __ZN21nsPluginInstanceOwner13GetPluginPortEv: 0000000001a40460 pushq %rbp 0000000001a40461 movq %rsp, %rbp 0000000001a40464 movq 0x68(%rdi), %rdi 0000000001a40468 testq %rdi, %rdi 0000000001a4046b je 0x1a4047f 0000000001a4046d movq CONFIG_STATIC_MSVCRT(%rdi), %rax 0000000001a40470 movq 0x278(%rax), %rax 0000000001a40477 movl $vp8_extra_bit_struct_prob, %esi 0000000001a4047c popq %rbp 0000000001a4047d jmpq *%rax 0000000001a4047f xorl %eax, %eax 0000000001a40481 popq %rbp 0000000001a40482 ret Note that the crash takes place as %rdi is dereferenced. The value of %rdi (at crash time, from the raw dump) is 0x7ffffffff0dea7ff. I haven't yet been able to take my analysis any further ... but maybe I'll think of something later.
Flags: needinfo?(smichaud)
Assignee | ||
Comment 6•9 years ago
|
||
Oops, I just realized my analysis from comment #5 is partly wrong. The crashes take place here, as nsPluginInstanceOwner::GetPluginPort's "this" pointer is dereferenced (to get the value of nsPluginInstanceOwner::mWidget): https://hg.mozilla.org/mozilla-central/annotate/ff2e07228041/dom/plugins/base/nsPluginInstanceOwner.cpp#l2840 So it's the value of mPluginInstanceOwner that's wrong (that == 0x7ffffffff0dea7ff), here: https://hg.mozilla.org/mozilla-central/annotate/ff2e07228041/layout/generic/nsPluginFrame.cpp#l613 (And so Socorro's references to nsCOMPtr::get() are simply wrong.)
Assignee | ||
Comment 7•9 years ago
|
||
By the way, I got my assembly code listing (in comment #5) by running the following comment on XUL from the 2015-05-11-12-26-05-mozilla-central nightly: otool -t -V -p __ZN21nsPluginInstanceOwner13GetPluginPortEv XUL | less
Assignee | ||
Comment 8•9 years ago
|
||
> the following comment
the following command
Assignee | ||
Comment 9•9 years ago
|
||
These crashes happen back at least to FF 37.0.1.
Version: 41 Branch → unspecified
Assignee | ||
Comment 10•9 years ago
|
||
> So it's the value of mPluginInstanceOwner that's wrong (that == 0x7ffffffff0dea7ff), here:
>
> https://hg.mozilla.org/mozilla-central/annotate/ff2e07228041/layout/generic/nsPluginFrame.cpp#l613
Looking through nsPluginFrame.cpp, it's pretty obvious how this could happen -- we're probably trying to dereference mPluginInstanceOwner before it's been initialized (so it's neither null nor a valid value).
Assignee | ||
Comment 11•9 years ago
|
||
Let me look through nsPluginFrame.cpp, to make sure mPluginInstanceOwner gets initialized properly and we do all the null checks we should be doing.
Assignee | ||
Comment 12•9 years ago
|
||
I'm no longer so sure that these crashes are caused by trying to use an uninitialized nsPluginFrame::mInstanceOwner -- if so, then why don't we crash a line earlier, here? https://hg.mozilla.org/mozilla-central/annotate/ff2e07228041/layout/generic/nsPluginFrame.cpp#l611 But this patch is trivially correct, and so I think worth a try. There aren't many of these crashes on the 41 branch: https://crash-stats.mozilla.com/search/?version=41.0a1&platform=Mac+OS+X&signature=%3DnsPluginInstanceOwner%3A%3AGetPluginPort%28%29&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature But there are enough to tell, in a week, whether or not my patch has fixed this bug.
Attachment #8611369 -
Flags: review?(jmathies)
Comment 13•9 years ago
|
||
Comment on attachment 8611369 [details] [diff] [review] Possible fix Looks reasonable to me, lets see if it helps.
Attachment #8611369 -
Flags: review?(jmathies) → review+
Updated•9 years ago
|
Assignee: nobody → smichaud
Updated•9 years ago
|
Assignee | ||
Comment 15•9 years ago
|
||
Comment on attachment 8611369 [details] [diff] [review] Possible fix Landed on mozilla-inbound: https://hg.mozilla.org/integration/mozilla-inbound/rev/419898d4e087 If, in a week, this bug's crashes haven't been completely eliminated, I'll reopen this bug.
Comment 16•9 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/419898d4e087
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
Comment 17•9 years ago
|
||
Still seen on June 1's nightly: bp-34362289-2d80-460f-aaf9-99ac02150602 This crash is #12 on the weighted top-crash scores for 38.0.5.
Flags: needinfo?(smichaud)
Comment 18•9 years ago
|
||
0x7ffffffff0dea7ff is the "mozPoisonValue" used mostly by layout (different from the jemalloc poison value): https://dxr.mozilla.org/mozilla-central/source/mfbt/Poison.cpp#160
Assignee | ||
Comment 19•9 years ago
|
||
Reopening on the strength of comment #17.
Status: RESOLVED → REOPENED
Flags: needinfo?(smichaud)
Resolution: FIXED → ---
Assignee | ||
Comment 20•9 years ago
|
||
This bug is pretty far outside my area of expertise.
Assignee: smichaud → nobody
Comment 21•9 years ago
|
||
Aaron, is this something that you could help with, or redirect?
Flags: needinfo?(aklotz)
Comment 22•9 years ago
|
||
I'm pretty tied up with a bunch of stuff already; ideally it would be nice if somebody else could look into this.
Flags: needinfo?(aklotz)
Comment 23•9 years ago
|
||
When you say "someone else" it would be helpful to nominate some people, because I don't know who works on this stuff. [Tracking Requested - why for this release]: #1 crash on Mac 38.0.5 (19%!)
status-firefox38:
--- → wontfix
status-firefox38.0.5:
--- → affected
status-firefox39:
--- → affected
status-firefox40:
--- → affected
tracking-firefox39:
--- → ?
Assignee | ||
Comment 24•9 years ago
|
||
The person who takes this bug should know how the memory poisoning works that's referred to in comment #18.
Comment 25•9 years ago
|
||
(In reply to David Major [:dmajor] from comment #23) > When you say "someone else" it would be helpful to nominate some people, > because I don't know who works on this stuff.
Flags: needinfo?(jmathies)
Updated•9 years ago
|
Flags: needinfo?(joshmoz)
Assignee | ||
Comment 27•9 years ago
|
||
I can pick this up again at some point. But when I do I'll need to spend a lot of time learning how the memory poisoning works that's referred to in comment #18. I suspect that the most telling piece of information we have so far is that this kind of memory poisoning is involved, and not jemalloc memory poisoning.
Comment 28•9 years ago
|
||
Tracking in affected versions and in 38.0.5 in Mac because top crash.
Updated•9 years ago
|
Assignee | ||
Comment 29•9 years ago
|
||
I'm going to do what I can here. I'm also going to be working with Stephen Pohl on this bug (to pass along to him my debugging techniques).
Assignee: nobody → smichaud
Status: REOPENED → ASSIGNED
Flags: needinfo?(joshmoz)
Comment 30•9 years ago
|
||
"Frame poisoning" is used for arena from which layout frames are allocated http://robert.ocallahan.org/2010/10/mitigating-dangling-pointer-bugs-using_15.html That value should only show up for frames freed in an arena used only for frames (ns*Frame* layout objects) and nothing else. You mention nsPluginFrame above, it's quite likely allocated from one of these arenas. This mechanism was designed to turn use-after-free bugs from security vulnerabilities into harmless (wrt security) crashes by making sure all the pointers in the freed frame point at unmapped memory. To fix the stability problem you still have to find the cause of the UAF, but at least it's not a security vulnerability. An ASAN build might help pinpoint the source if you have a testcase that's even a little bit reliable. I suppose the other possibility is that we've torn down a page and freed an entire arena, and then the poison values end up being used via an uninitialized variable. Seems unlikely, and especially for a nsCOMPtr.
Assignee | ||
Comment 31•9 years ago
|
||
The kind of memory poisoning we see here is used by nsPresArena::Free() when it adds nsFrame objects (and their subclasses like nsPluginFrame) to "free lists", where they can be re-used by calls to nsPresArena::Allocate(). nsPresArena::Free() poisons each object it's called on (and nsPresArena::Allocate() unpoisons it). The objects in these free lists only actually get deallocated/freed (and poisoned by jemalloc) when their associated nsPresArena object is deleted/freed. So it makes sense to look at the behavior of nsPresArena::Free(), including what code calls it. That's what this interpose library does. As Dan says, it's fair to assume, at least initially, that it's an nsPluginFrame object that's been poisoned and added to a free list.
Assignee | ||
Comment 32•9 years ago
|
||
As I mentioned above, this bug's crashes happen here, when we try to deference a memory-poisoned mInstanceOwner: http://hg.mozilla.org/mozilla-central/annotate/9ebd530c5843/layout/generic/nsPluginFrame.cpp#l614 And as we have learned, it's most likely the current nsPluginFrame object (the one on which a call to nsPluginFrame::CallSetWindow() is currently being processed) which has gotten added to a free list (by nsPresArena::Free()). I also think it's most likely that whatever caused the current nsPluginFrame object to be "freelisted" took place on the previous line, during the call to mInstanceOwner::FixUpPluginWindow(). This in turn most likely happened via a call to nsPluginFrame::DestroyFrom(). This interpose library confirms my hunch. It hooks calls to nsPluginInstanceOwner::FixUpPluginWindow() and adds a call to nsPluginFrame::DestroyFrom() (on nsPluginInstanceOwner::mPluginFrame) immediately afterwards. With this interpose library, every attempt to load any plugin reproduces this bug's crashes exactly. For example: bp-c22d92fa-2c5b-4d83-9f67-dc6392150611 bp-36a0d042-0fd5-467e-bdf4-03bc72150611 From this we also know that this bug's crashes happen on the very first call to nsPluginInstanceOwner::FixUpPluginWindow() as the plugin is being loaded. I haven't yet worked out how FixUpPluginWindow() does this. But I strongly suspect restyling or reflowing is involved.
Assignee | ||
Comment 33•9 years ago
|
||
> on the very first call to nsPluginInstanceOwner::FixUpPluginWindow() as the plugin is being loaded
Just after the very first call to nsPluginInstanceOwner::FixUpPluginWindow() as the plugin is being loaded
Comment 34•9 years ago
|
||
According to [1]: "Calling SetWindow might destroy this frame. [...]" and [2]: "This will call pi->SetWindow and take care of window subclassing if needed, see bug 132759. Calling SetWindow can destroy this frame so check for that before doing anything else with this frame's memory." This turns out to be the same thing that we do in nsPluginInstanceOwner::FixUpPluginWindow()[3] if the clipRects have changed, but we fail to check whether or not the frame has been destroyed before calling nsPluginInstanceOwner::GetPluginPort()[4]. [1] http://hg.mozilla.org/mozilla-central/annotate/bfd82015df48/layout/generic/nsPluginFrame.cpp#l644 [2] http://hg.mozilla.org/mozilla-central/annotate/bfd82015df48/layout/generic/nsPluginFrame.cpp#l648 [3] http://hg.mozilla.org/mozilla-central/annotate/bfd82015df48/dom/plugins/base/nsPluginInstanceOwner.cpp#l3035 [4] http://hg.mozilla.org/mozilla-central/annotate/bfd82015df48/layout/generic/nsPluginFrame.cpp#l614
Assignee | ||
Comment 35•9 years ago
|
||
(In reply to comment #34) Yes, I agree this is the key to fixing this bug. Here's a patched based on this idea. In fact it just extends the fix already in use later in nsPluginFrame::CallSetWindow(), which was added to fix bug 562442. But we don't have STR for this bug, and without that we lack a good way to test this patch. You might think we could use the interpose library from comment #32 (which triggers this bug's crash loading any plugin). But when you try that on a build containing this bug's patch (e.g. http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/smichaud@pobox.com-5d5a363228f1/try-macosx64/firefox-41.0a1.en-US.mac.dmg), you always crash later, in other code, dereferencing a (poisoned) pointer to the deleted frame. The problem is that the call to nsPluginFrame::DeleteFrom(), while it's enough to trigger this bug's crashes, isn't enough to accurately reproduce what happens when this bug's crashes are triggered "in the wild". We need to do more than just call nsPluginFrame::DeleteFrom(). If we only do that, we don't remove the (deleted) nsPluginFrame from the frame/view/content hierarchy -- which leads to crashes that *don't* happen "in the wild". More on this in my next comment.
Assignee | ||
Comment 36•9 years ago
|
||
If you play around with the interpose library from comment #31, using examples that take a while to load fully, you'll notice that many calls to nsPluginFrame::DeleteFrom() get logged. Here are some of the sites I tested with: http://webdemo.balsamiq.com/ https://trade.tdameritrade.com/ http://drudgereport.com/ Many of these happen as the page is restyled or reflowed. All of them have nsINode::RemoveChild(nsINode& aOldChild, ErrorResult& aError) somewhere in the stack. So this is a good candidate for something we can call from nsPluginInstanceOwner::FixUpPluginWindow(), which will delete the frame and its associated content and view, and also remove all of them from the frame/view/content hierarchy. In fact, as this patch shows, it does the trick. This patch adds a call to nsINode::RemoveChild() to the end of nsObjectInstanceOwner::FixUpPluginWindow() (to trigger a call to nsPluginFrame::DeleteFrom() and this bug's crashes). It also adds the possible fix from comment #35. Testing with a build made from this patch, I tried loading many different pages containing plugins (including all the ones mentioned above). I never crashed -- which I think shows that the patch from comment #35 will probably fix this bug.
Assignee | ||
Comment 37•9 years ago
|
||
Clearly something occasionally happens in nsPluginInstanceOwner::FixUpPluginWindow() to cause the current frame to be deleted. And, as Stephen mentioned in comment #34, it probably happens during that method's call to SetWindow(). I haven't been able to trigger this, or to figure out how it could happen during a call to SetWindow(). At this point my best guess is that, during a call to a plugin's implementation of NPP_SetWindow(), the plugin can call one or more NPN_... methods in the browser, at least one of which can trigger a reflow or restyle. Judging by the URLs in Socorro's crash reports for this bug, the plugin in question is most likely Flash. So to investigate this possibility one could write an interpose library (like the one at bug 1153145 comment #20) to trace all calls to NPN_... methods that happen during a call to Flash's implementation of NPP_SetWindow(). I'm not going to do this now, since I don't have time for it. But in case someone is curious ... :-)
Assignee | ||
Comment 38•9 years ago
|
||
This revision just changes the comment to mention that FixUpPluginWindow() calls SetWindow().
Attachment #8623306 -
Attachment is obsolete: true
Assignee | ||
Updated•9 years ago
|
Attachment #8623752 -
Flags: review?(jmathies)
Comment 39•9 years ago
|
||
Comment on attachment 8623752 [details] [diff] [review] More likely fix, rev1 Review of attachment 8623752 [details] [diff] [review]: ----------------------------------------------------------------- Seems plausible, lets see if it helps.
Attachment #8623752 -
Flags: review?(jmathies) → review+
Assignee | ||
Comment 41•9 years ago
|
||
Comment on attachment 8623752 [details] [diff] [review] More likely fix, rev1 I want to let this bake on m-c for a week or so -- to check that it actually fixes these crashes, and doesn't just shift them elsewhere.
Comment 42•9 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/1efbf4e6be47
Status: ASSIGNED → RESOLVED
Closed: 9 years ago → 9 years ago
Resolution: --- → FIXED
Comment 43•9 years ago
|
||
This sound promising, but too late for beta. Maybe still could be uplifted to 40 once we see the effect.
Assignee | ||
Comment 44•9 years ago
|
||
There haven't been any of these crashes on m-c since my patch landed there. Since there never were many of these on the 41 branch, it'll take a lot longer to be sure I haven't just moved the crashes elsewhere -- but I think that's *very* unlikely.
Assignee | ||
Comment 45•9 years ago
|
||
Comment on attachment 8623752 [details] [diff] [review] More likely fix, rev1 Approval Request Comment [Feature/regressing bug #]: Unknown [User impact if declined]: Users will see what's a Mac topcrasher on the more "popular" branches [Describe test coverage new/current, TreeHerder]: Baked on m-c for a week. [Risks and why]: Low risk -- the patch is very simple and straightforward. [String/UUID change made/needed]: None
Attachment #8623752 -
Flags: approval-mozilla-aurora?
Comment 46•9 years ago
|
||
Comment on attachment 8623752 [details] [diff] [review] More likely fix, rev1 Fix a top crash, taking it.
Attachment #8623752 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Assignee | ||
Comment 47•9 years ago
|
||
Comment on attachment 8623752 [details] [diff] [review] More likely fix, rev1 Landed on mozilla-aurora: https://hg.mozilla.org/releases/mozilla-aurora/rev/0eccaade78da
Assignee | ||
Updated•9 years ago
|
Comment 48•9 years ago
|
||
Socorro [1] shows zero crashes over the past 4 weeks in 40 and 41 builds after the fix landed. [1] - https://crash-stats.mozilla.com/report/list?range_unit=days&range_value=28&signature=nsPluginInstanceOwner%3A%3AGetPluginPort%28%29#tab-reports
Updated•2 years ago
|
Product: Core → Core Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•