Closed Bug 1155503 Opened 5 years ago Closed 5 years ago

crash in mozilla::plugins::PluginInstanceParent::NPP_DestroyStream(_NPStream*, short)

Categories

(Core :: Plug-ins, defect, critical)

defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla40
Tracking Status
firefox38 --- disabled
firefox39 + fixed
firefox40 + fixed

People

(Reporter: cpeterson, Assigned: aklotz)

References

Details

(Keywords: crash, reproducible)

Crash Data

Attachments

(2 files, 1 obsolete file)

This bug was filed from the Socorro interface and is 
report bp-1f09769b-29d8-427b-b23e-9611a2150417.
=============================================================

I hit this plugin crash three times in row when force reloading (SHIFT+CMD+R) cnn.com in a private browsing window.

bp-5914e8a2-9665-4976-9b42-12b8d2150417
bp-1f09769b-29d8-427b-b23e-9611a2150417
bp-bf8f191c-f870-4477-9a85-b180e2150417

Frame 	Module 	Signature 	Source
0 	XUL 	mozilla::plugins::PluginInstanceParent::NPP_DestroyStream(_NPStream*, short) 	dom/plugins/ipc/PluginInstanceParent.cpp
1 	XUL 	mozilla::plugins::PluginModuleParent::NPP_DestroyStream(_NPP*, _NPStream*, short) 	dom/plugins/ipc/PluginModuleParent.cpp
2 	XUL 	nsNPAPIPluginStreamListener::CleanUpStream(short) 	dom/plugins/base/nsNPAPIPluginStreamListener.cpp
3 	XUL 	nsNPAPIPluginInstance::Stop() 	dom/plugins/base/nsNPAPIPluginInstance.cpp
4 	XUL 	nsPluginHost::StopPluginInstance(nsNPAPIPluginInstance*) 	dom/plugins/base/nsPluginHost.cpp
5 	XUL 	nsPluginDestroyRunnable::Run() 	dom/plugins/base/nsPluginHost.cpp
6 	XUL 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp
7 	XUL 	NS_ProcessPendingEvents(nsIThread*, unsigned int) 	xpcom/glue/nsThreadUtils.cpp
8 	XUL 	nsBaseAppShell::NativeEventCallback() 	widget/nsBaseAppShell.cpp
9 	XUL 	nsAppShell::ProcessGeckoEvents(void*) 	widget/cocoa/nsAppShell.mm
Ø 10 	CoreFoundation 	CoreFoundation@0x80a00 	
Ø 11 	CoreFoundation 	CoreFoundation@0x72b8c 	
Ø 12 	CoreFoundation 	CoreFoundation@0x721be 	
Ø 13 	CoreFoundation 	CoreFoundation@0x71bd7 	
Ø 14 	HIToolbox 	HIToolbox@0x3256e 	
Ø 15 	HIToolbox 	HIToolbox@0x322e9 	
Ø 16 	HIToolbox 	HIToolbox@0x3212a 	
Ø 17 	AppKit 	AppKit@0x919ba 	
Ø 18 	AppKit 	AppKit@0x90f67 	
19 	XUL 	-[GeckoNSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] 	widget/cocoa/nsAppShell.mm
Ø 20 	AppKit 	AppKit@0x86bf2 	
21 	XUL 	nsAppShell::Run() 	widget/cocoa/nsAppShell.mm
22 	XUL 	nsAppStartup::Run() 	toolkit/components/startup/nsAppStartup.cpp
23 	XUL 	XREMain::XRE_mainRun() 	toolkit/xre/nsAppRunner.cpp
24 	XUL 	XREMain::XRE_main(int, char**, nsXREAppData const*) 	toolkit/xre/nsAppRunner.cpp
25 	XUL 	XRE_main 	toolkit/xre/nsAppRunner.cpp
26 	firefox 	main 	browser/app/nsBrowserApp.cpp
27 	firefox 	start
I don't see this bug in today's m-c nightly, testing on OS X 10.8.5, 10.9.5 and 10.10.2.  (I haven't yet upgraded from 10.10.2 to 10.10.3.)

Is this consistently reproducible?  Do you see it with a clean profile?

When you pressed cmd-shift-r, did you wait until the page had finished (re)loading?  I tried it both ways, but never crashed.
This crashes are reported as null-dereferences, and they really *are* null-derefences.  Note that rbx is null in the raw dumps.  I checked in the assembly code for the 2015-04-16 m-c nightly that these crashes take place as rbx is dereferenced.

So in the following code, 's' is null:

http://hg.mozilla.org/mozilla-central/annotate/a35163f83d22/dom/plugins/ipc/PluginInstanceParent.cpp#l1422

And we could trivially fix these crashes by doing a null check.  Though I suppose it's better to try to figure out *how* 's' can be null here.

As bug 1018360 comment #0 says, AMD64 processors have a design flaw that causes crash addresses > 0x7fffffffffff to be misreported (as null or 0xffffffffffffffff).  Memory poisoned addresses match this pattern.  So many (most?) Mac crashes that Socorro reports as null dereferences are really dereferences of memory poisoned values.
(In reply to Steven Michaud from comment #2)
> And we could trivially fix these crashes by doing a null check.  Though I
> suppose it's better to try to figure out *how* 's' can be null here.
> 

Indeed. That'll be how my Friday is spent!
Assignee: nobody → aklotz
Status: NEW → ASSIGNED
OS: Mac OS X → All
Interesting.  I assume the ASan builds don't use jemalloc.

If this really is a use-after-free and jemalloc were in play, the crash would be at a memory-poisoned address.  Chris seems to have used a standard m-c nightly (for 2015-04-16), which (as far as I know) all use jemalloc.  So I don't think Chris's crashes involved any use-after-free.
Maybe Chris's crashes are use-after-destruction-but-just-before-free ... it that's possible.
Attached file WinDbg log of crash
I can reproduce this reliably.

STR:

1. Install DivX Web Player (only Web Player is needed.  Disable all the other crap unless you need it): http://www.divx.com/en/software/web-player/features
2. Launch Firefox with a new profile
3. Enable dom.ipc.plugins.asyncInit
4. Disable e10s and restart browser
5. Visit www.stagevu.com
6. Click the featured video on the left and allow the plugin
7. Happy crashing.
Keywords: reproducible
Comment on attachment 8595036 [details] [diff] [review]
Null out NPStream::pdata and check for it

Canceling this as I'm still seeing some failures on try.
Attachment #8595036 - Flags: review?(jmathies)
[Tracking Requested - why for this release]:
This is 10% of all Dev Edition browser crashes right now.
OK, this one passes on non-e10s which is what we want.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=6489f9b59b4c
Attachment #8595036 - Attachment is obsolete: true
Attachment #8595513 - Flags: review?(jmathies)
Attachment #8595513 - Flags: review?(jmathies) → review+
From crash stats it looks like 38 and earlier are unaffected. Tracking 39+
Comment on attachment 8595513 [details] [diff] [review]
Null out NPStream::pdata and check for it (r2)

Approval Request Comment
[Feature/regressing bug #]: async plugin init
[User impact if declined]: instability when starting plugin
[Describe test coverage new/current, TreeHerder]: test_pluginstream_err.html when asyncInit is enabled. Tested locally and on try, but asyncInit is disabled on central.
[Risks and why]: Low. Patch is a simple fix and cause of crashing is well understood.
[String/UUID change made/needed]: None
Attachment #8595513 - Flags: approval-mozilla-aurora?
Comment on attachment 8595513 [details] [diff] [review]
Null out NPStream::pdata and check for it (r2)

Looks like a safe patch. Let's get this on Aurora to test. Aurora+
Attachment #8595513 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
https://hg.mozilla.org/mozilla-central/rev/815511fdfea6
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
I just received a crash [ bp-eecf1d46-4527-469c-990a-efbbf2150423 ] on a build that does not yet have this fix with the following slightly different signature:

[@ mozilla::plugins::PluginModuleParent::NPP_DestroyStream(_NPP*, _NPStream*, short) ]

Will the patch in this bug also cover this case?
Flags: needinfo?(aklotz)
I'm not sure.
Flags: needinfo?(aklotz)
(In reply to Aaron Klotz [:aklotz] (please use needinfo) from comment #19)
> I'm not sure.

Not knowing or understanding the code but looking at this logically...

Given that you've added null checking to:
dom/plugins/ipc/BrowserStreamParent.cpp

And made a relevant change to:
dom/plugins/ipc/PluginInstanceParent.cpp

I'm assuming it would make sense for you to make the equivalent change to:
dom/plugins/ipc/PluginModuleParent.cpp

Do you want me to open a new bug so that can be done or do we need to wait for another crash?
Flags: needinfo?(aklotz)
We haven't seen any reports for builds newer than April 22. I think we should hold off until we see any crashes with recent builds.
Flags: needinfo?(aklotz)
You need to log in before you can comment on or make changes to this bug.