Closed Bug 759788 Opened 12 years ago Closed 12 years ago

Java related Crash [@ nsPluginNativeWindow::GetPluginInstance(nsRefPtr<nsNPAPIPluginInstance>&) PluginWnd

Categories

(Core Graveyard :: Plug-ins, defect)

All
Windows XP
defect
Not set
normal

Tracking

(firefox12 unaffected, firefox13+ verified, firefox14+ verified, firefox15+ verified, firefox-esr10 unaffected)

VERIFIED FIXED
mozilla15
Tracking Status
firefox12 --- unaffected
firefox13 + verified
firefox14 + verified
firefox15 + verified
firefox-esr10 --- unaffected

People

(Reporter: bc, Assigned: benjamin)

References

()

Details

(Keywords: crash, regression, sec-critical, Whiteboard: [qa+:ashughes])

Crash Data

Attachments

(3 files)

1. http://www.planetminecraft.com/skin/dark-souls-knight/
2. Crash Windows Beta/13, Aurora/14, Nightly/15

Note deleted pointer in eax

Operating system: Windows NT
                  6.1.7601 Service Pack 1
CPU: x86
     GenuineIntel family 6 model 37 stepping 1
     1 CPU

Crash reason:  EXCEPTION_ACCESS_VIOLATION_READ
Crash address: 0xffffffffdddddddd

Thread 0 (crashed)
 0  xul.dll!nsRefPtr<nsNPAPIPluginInstance>::assign_with_AddRef(nsNPAPIPluginInstance *) [nsAutoPtr.h : 879 + 0x3]
    eip = 0x672fe760   esp = 0x002bc908   ebp = 0x002bc90c   ebx = 0x00000000
    esi = 0x00000090   edi = 0x002bcabc   eax = 0xdddddddd   ecx = 0x002bc9d4
    edx = 0x76fb7094   efl = 0x00210286
    Found by: given as instruction pointer in context
 1  xul.dll!nsRefPtr<nsNPAPIPluginInstance>::operator=(nsRefPtr<nsNPAPIPluginInstance> const &) [nsAutoPtr.h : 956 + 0xd]
    eip = 0x67abfeb5   esp = 0x002bc914   ebp = 0x002bc91c
    Found by: call frame info
 2  xul.dll!nsPluginNativeWindow::GetPluginInstance(nsRefPtr<nsNPAPIPluginInstance> &) [nsPluginNativeWindow.h : 78 + 0xe]
    eip = 0x67abf066   esp = 0x002bc924   ebp = 0x002bc92c
    Found by: call frame info

I have a saved test case that reproduces and am reducing it now. Will work on the regression range next.

I'm concerned about bug 719117 being the cause.
Attached file testcase.tar.gz
tar xvf testcase.tar.gz
firefox ./index.html

You may need to reload. Trying to inline the script appears to make the crash go away.

Note in windbg this shows eax=0xfeeefeee which is HeapFree.
Assignee: nobody → benjamin
I'm going to work on diagnosis, but I'd love somebody to work on a regression range.
looks like we're using deleted objects so guessing sec-critical.

IIRC we fixed a plugin instance related bug in Firefox 13 -- check that one?
Keywords: sec-critical
from the debug Nightly builds from ftp.

good 20120525121926
bad  20120526213524

definitely looking like bug 719117 comment 59. I'll do an actual bisection to make sure.
Having trouble with the minimized testcase, perhaps due to permissions issues, but on the URL:

>	xul.dll!nsRefPtr<nsNPAPIPluginInstance>::assign_with_AddRef(rawPtr=0xdddddddd)  Line 845	C++
 	xul.dll!nsRefPtr<nsNPAPIPluginInstance>::operator=(rhs={...})  Line 923	C++
 	xul.dll!nsPluginNativeWindow::GetPluginInstance(aPluginInstance={...})  Line 44	C++
 	xul.dll!PluginWndProcInternal(hWnd=0x0012040a, msg=0x00000090, wParam=0x00000000, lParam=0x00000000)  Line 206	C++
 	xul.dll!CallWindowProcCrashProtected(wndProc=0x57aa3280, hWnd=0x0012040a, msg=0x00000090, wParam=0x00000000, lParam=0x00000000)  Line 32	C++

 	xul.dll!PluginWndProc(hWnd=0x0012040a, msg=0x00000090, wParam=0x00000000, lParam=0x00000000)  Line 347	C++
 	user32.dll!_InternalCallWinProc@20() 	
 	user32.dll!_UserCallWinProcCheckWow@32() 	
 	user32.dll!_DispatchClientMessage@24() 	
 	user32.dll!___fnDWORD@4() 	
 	ntdll.dll!_KiUserCallbackDispatcher@12() 	
 	user32.dll!_NtUserDestroyWindow@4() 	
 	xul.dll!nsWindow::Destroy()  Line 646	C++
 	xul.dll!nsWindow::~nsWindow()  Line 395	C++
 	xul.dll!ChildWindow::~ChildWindow() 	C++
 	xul.dll!ChildWindow::`scalar deleting destructor'() 	C++
 	xul.dll!nsBaseWidget::Release()  Line 56	C++
 	xul.dll!nsWindow::Release()  Line 415	C++
 	xul.dll!nsCOMPtr<nsIWidget>::~nsCOMPtr<nsIWidget>()  Line 488	C++
 	xul.dll!nsPluginInstanceOwner::~nsPluginInstanceOwner()  Line 366	C++
 	xul.dll!nsPluginInstanceOwner::`scalar deleting destructor'() 	C++
 	xul.dll!nsPluginInstanceOwner::Release()  Line 373	C++
 	xul.dll!nsRefPtr<nsIDOMEventListener>::~nsRefPtr<nsIDOMEventListener>()  Line 875	C++
 	xul.dll!nsListenerStruct::~nsListenerStruct()  Line 53	C++
 	xul.dll!nsListenerStruct::`scalar deleting destructor'() 	C++
 	xul.dll!nsTArrayElementTraits<nsListenerStruct>::Destruct(e=0x08d33c28)  Line 349	C++
 	xul.dll!nsTArray<nsListenerStruct,nsTArrayDefaultAllocator>::DestructRange(start=0x00000000, count=0x00000001)  Line 1211	C++
 	xul.dll!nsTArray<nsListenerStruct,nsTArrayDefaultAllocator>::RemoveElementsAt(start=0x00000000, count=0x00000001)  Line 932	C++
 	xul.dll!nsTArray<nsListenerStruct,nsTArrayDefaultAllocator>::RemoveElementAt(index=0x00000000)  Line 938	C++
 	xul.dll!nsAutoTObserverArray<nsListenerStruct,2>::RemoveElementAt(index=0x00000000)  Line 186	C++
 	xul.dll!nsEventListenerManager::RemoveEventListener(aListener=0x0d83d870, aType=0x0000057e, aUserType=0x06b37d68, aFlags=0x00000004)  Line 400	C++
 	xul.dll!nsEventListenerManager::RemoveEventListenerByType(aListener=0x0d83d870, aType={...}, aFlags=0x00000004)  Line 446	C++
 	xul.dll!nsEventListenerManager::RemoveEventListener(aType={...}, aListener=0x0d83d870, aUseCapture=true)  Line 919	C++
 	xul.dll!nsINode::RemoveEventListener(aType={...}, aListener=0x0d83d870, aUseCapture=true)  Line 1095	C++
 	xul.dll!nsPluginInstanceOwner::Destroy()  Line 2712	C++
 	xul.dll!nsObjectLoadingContent::DoStopPlugin(aInstanceOwner=0x0d83d868, aDelayedStop=false, aForcedReentry=false)  Line 2207	C++
 	xul.dll!nsObjectLoadingContent::StopPluginInstance()  Line 2236	C++
 	xul.dll!InDocCheckEvent::Run()  Line 153	C++
 	xul.dll!nsBaseAppShell::RunSyncSectionsInternal(aStable=true, aThreadRecursionLevel=0x00000000)  Line 354	C++
 	xul.dll!nsBaseAppShell::RunSyncSections(stable=true, threadRecursionLevel=0x00000000)  Line 91	C++
 	xul.dll!nsBaseAppShell::AfterProcessNextEvent(thr=0x00557748, recursionDepth=0x00000000)  Line 409	C++
 	xul.dll!nsThread::ProcessNextEvent(mayWait=false, result=0x002bd32f)  Line 639	C++
 	xul.dll!NS_ProcessNextEvent_P(thread=0x00557748, mayWait=false)  Line 213	C++
 	xul.dll!mozilla::ipc::MessagePump::Run(aDelegate=0x00561ac8)  Line 82	C++
 	xul.dll!MessageLoop::RunInternal()  Line 209	C++
 	xul.dll!MessageLoop::RunHandler()  Line 202	C++
 	xul.dll!MessageLoop::Run()  Line 176	C++
 	xul.dll!nsBaseAppShell::Run()  Line 165	C++
 	xul.dll!nsAppShell::Run()  Line 232	C++
 	xul.dll!nsAppStartup::Run()  Line 256	C++
 	xul.dll!XREMain::XRE_mainRun()  Line 3786	C++
 	xul.dll!XREMain::XRE_main(argc=0x00000004, argv=0x00547ab0, aAppData=0x011cc864)  Line 3863	C++

So PluginWindProc is being called on a window which already has a deleted nsPluginNativeWindowWin, but ::GetProp(hWnd, NS_PLUGIN_WINDOW_PROPERTY_ASSOCIATION) is still returning the dead window.

The testcase uses an <applet> nested inside of a <canvas>, which already has me concerned.

Relevant debug-spew: 2x "WARNING: Plugin failed to subclass our window"
Various "Subdocument container has no frame" warnings
"WARNING: Unable to test style tree integrity -- no content node:"
It also happens that a 32-bit debug build exits because we hit the exception case in CallWindowProcCrashProtected and then just call TerminateProcess, which makes debugging this kinda painful.

When debugging, every time we reload we hit nsPluginNativeWindowWin::SubclassAndAssociateWindow, but we never hit nsPluginNativeWindowWin::UndoSubclassAndAssociateWindow

We destroy the nsPluginNativeWindow at this stack:

>	xul.dll!nsPluginNativeWindowWin::~nsPluginNativeWindowWin()  Line 506	C++
 	xul.dll!nsPluginNativeWindowWin::`scalar deleting destructor'() 	C++
 	xul.dll!PLUG_DeletePluginNativeWindow(aPluginNativeWindow=0x08217610)  Line 744	C++
 	xul.dll!nsPluginInstanceOwner::~nsPluginInstanceOwner()  Line 356	C++
 	xul.dll!nsPluginInstanceOwner::`scalar deleting destructor'() 	C++
 	xul.dll!nsPluginInstanceOwner::Release()  Line 373	C++
 	xul.dll!nsRefPtr<nsPluginInstanceOwner>::assign_assuming_AddRef(newPtr=0x00000000)  Line 863	C++
 	xul.dll!nsRefPtr<nsPluginInstanceOwner>::assign_with_AddRef(rawPtr=0x00000000)  Line 847	C++
 	xul.dll!nsRefPtr<nsPluginInstanceOwner>::operator=(rhs=0x00000000)  Line 931	C++
 	xul.dll!nsObjectLoadingContent::StopPluginInstance()  Line 2238	C++
 	xul.dll!InDocCheckEvent::Run()  Line 153	C++
We need to figure out whether this needs to block Firefox 13 or not, given that it is a sec-critical.
This appears to be a regression. Bug does not crash my Firefox 12 on http://www.planetminecraft.com/skin/dark-souls-knight/ on the same machine it crashes current nightly.
Part of bug 719117 (a potential regressor?) involves backing out bug 724781, a Java sec-critical bug fix. Maybe it didn't get re-fixed correctly by 719117?
It doesn't appear that this is a dup or directly related to bug 724781, those testcases still pass and the crash here is in a different place.
Hangs java until I quite the JRE console when I run http://www.planetminecraft.com/skin/dark-souls-knight/ in the 5/27 Aurora build from ftp://ftp.mozilla.org/pub/firefox/nightly/2012-05-27-04-20-07-mozilla-aurora/. This is from the day *before* the checkin on 5/28 at https://bugzilla.mozilla.org/show_bug.cgi?id=719117#c65. 

Since Firefox 12 is affected and later Firefox 13 builds are affected, I'm going to try to narrow this down.

3/1  Firefox 13 (Mozilla Central) does *not* show bug (ftp://ftp.mozilla.org/pub/firefox/nightly/2012/03/2012-03-01-03-11-35-mozilla-central/ and Built from http://hg.mozilla.org/mozilla-central/rev/1c3b291d0830).
3/2  Firefox 13 (Mozilla Central) shows bug (ftp://ftp.mozilla.org/pub/firefox/nightly/2012/03/2012-03-02-03-11-12-mozilla-central/ and Built from http://hg.mozilla.org/mozilla-central/rev/3a7b9e61c263).

When the problem happens, Java console says:

Java Plug-in 1.6.0_31
Using JRE version 1.6.0_31-b04-415-11M3635 Java HotSpot(TM) 64-Bit Server VM
User home directory = /Users/albillload: class skinpreviewapplet.AppletLauncher not found.
java.lang.ClassNotFoundException: skinpreviewapplet.AppletLauncher
	at sun.plugin2.applet.Applet2ClassLoader.findClass(Applet2ClassLoader.java:252)
	at sun.plugin2.applet.Plugin2ClassLoader.loadClass0(Plugin2ClassLoader.java:249)
	at sun.plugin2.applet.Plugin2ClassLoader.loadClass(Plugin2ClassLoader.java:179)
	at sun.plugin2.applet.Plugin2ClassLoader.loadClass(Plugin2ClassLoader.java:160)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
	at sun.plugin2.applet.Plugin2ClassLoader.loadCode(Plugin2ClassLoader.java:690)
	at sun.plugin2.applet.Plugin2Manager.createApplet(Plugin2Manager.java:3045)
	at sun.plugin2.applet.Plugin2Manager$AppletExecutionRunnable.run(Plugin2Manager.java:1497)
	at java.lang.Thread.run(Thread.java:680)
Exception: java.lang.ClassNotFoundException: skinpreviewapplet.AppletLauncher
Dveditz's guess is that maybe this checkin caused the issue:

https://bugzilla.mozilla.org/show_bug.cgi?id=724781#c14
So this is a Firefox 13 only issue. We should fix this, if possible, so we've never shipped this sec-critical bug to the public.
I reconfirmed on Windows XP that 

ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2012/05/2012-05-25-mozilla-central-debug/
buildid 20120525001956 does not crash

ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2012/05/2012-05-26-mozilla-central-debug/
buildid 20120525121926 does not crash

ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2012/05/2012-05-27-mozilla-central-debug/
buildid 20120526213524 does crash

bsmedberg mentioned that this might be due to webgl not being supported on vms and that he needed to disable webgl and the direct3d acceleration to reproduce on a physical windows machine.

Al mentioned that he was testing on OSX. I can confirm a hang in my nightly debug build. Breaking into gdb after the hang I see a stack involving JVM_MonitorWait, JVM_Lseek etc. 

I'm not sure Al and I are looking at the same bug.
Bob, the next time try the following, in this order:

thread apply all bt
call (void) pss()

This will output a gdb stack trace of all threads, and then a Java stack trace of all threads.

The output of pss() might go to the Console app, and not Terminal.
(Following up comment #17)

Actually don't bother.  Your hang, Bob, appears to be the same as in bug 750480.
(In reply to Steven Michaud from comment #18)
> (Following up comment #17)
> 
> Actually don't bother.  Your hang, Bob, appears to be the same as in bug
> 750480.

Is this a different hang than the one I see?
Also see bug 708461, which also involves a hang on a Java error.
See Also: → 750480, 708461
> Is this a different hang than the one I see?

I don't think so.  But if you have any doubts follow my instructions in comment #17.
By the way, I'm not at all sure the hangs various people have seen are related to the crash that was first reported.
(In reply to Steven Michaud from comment #22)
> By the way, I'm not at all sure the hangs various people have seen are
> related to the crash that was first reported.

Yeah, that is kind of what I was wondering.
bug 719117 still seems the most likely regressor in the range given in comment 15.

http://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2012-05-25+11:00&enddate=2012-05-26+22:00

Does not seem realistic to figure out the cause and fix in time for the Firefox 13 build
Attached patch wipSplinter Review
This seems to fix the crash at the URL in my trunk build.
If we're considering shipping Fx13 as is, then I think this
could be worth taking as a wallpaper at least. (I haven't
analyzed the crash enough to say if this is a complete fix or not)

The idea is to prevent nsPluginInstanceOwner::Release() to actually
delete the object, thus preventing ~nsPluginInstanceOwner() from
deleting the window, for the duration of DoStopPlugin.
The event for a delayed DoStopPlugin also has a strong ref so it
should work in that case too.
Comment on attachment 628565 [details] [diff] [review]
wip

I'd like to propose this as a low-risk wallpaper for all branches
while we continue investigating.  I think there's a good chance this
is the correct fix and I don't see how it can make things worse.
Attachment #628565 - Flags: review?(benjamin)
The first bad revision is:
changeset:   94997:8915b49bc1bc
user:        Mats Palmgren <matspal@gmail.com>
date:        Sat May 26 00:34:11 2012 +0200
summary:     Bug 719117 - Fix bug 724781 by preventing reentry to DoStopPlugin with a flag.  part 2/2, r=bsmedberg
Comment on attachment 628565 [details] [diff] [review]
wip

This does appear to fix the problem, and as far returning to the prior behavior it's correct. It's still a bit worrisome to me that we never call nsPluginNativeWin::UndoSubclassAndAssociateWindow, but I'll leave that for another bug.
Attachment #628565 - Flags: review?(benjamin) → review+
https://hg.mozilla.org/mozilla-central/rev/0aa7fc75cad5
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Comment on attachment 628565 [details] [diff] [review]
wip

Bug caused by (feature/regressing bug #): bug 719117
User impact if declined: likely crashes at planetminecraft and possibly other sites, exposure to critical security bug
Testing completed (on m-c, etc.): local testing, just landed on m-c
Risk to taking this patch (and alternatives if risky): doesn't seem especially risky, but we've said that before!
String or UUID changes made by this patch: None
Attachment #628565 - Flags: review-
Attachment #628565 - Flags: review+
Attachment #628565 - Flags: approval-mozilla-beta?
Attachment #628565 - Flags: approval-mozilla-aurora?
Attachment #628565 - Flags: review- → review+
Comment on attachment 628565 [details] [diff] [review]
wip

[Triage Comment]
The security team is concerned with shipping a new sg:crit regression, and Benjamin feels this fix is less risky than any of the alternatives, so we'll spin a Beta 7 with this.
Attachment #628565 - Flags: approval-mozilla-beta?
Attachment #628565 - Flags: approval-mozilla-beta+
Attachment #628565 - Flags: approval-mozilla-aurora?
Attachment #628565 - Flags: approval-mozilla-aurora+
Whiteboard: [qa+]
Blocks: 719117
Keywords: regression
I've tried several different builds but have been unable to reproduce this bug:
 * Firefox 13.0b6
 * Firefox 14.0a2 2012-05-30
 * Firefox 15.0a1 2012-05-30
 * Firefox 13.0a1 2012-03-02

Therefore, I am unable to verify this is fixed (verification blocks 13.0b7 sign-off). Please, can someone help me?
fwiw, I tested the url in crash automation with fresh builds from this morning and could not reproduce the crash.
Thanks to Bob via IRC, I am not able to reproduce this issue. Will now try to verify the fix.
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #36)
> Thanks to Bob via IRC, I am not able to reproduce this issue. Will now try
> to verify the fix.

Sorry, that's should read "I am able to reproduce".
Verified on latest debug builds for Nightly 15, Aurora 14, and Beta 13 (dated 2012-06-01).
Status: RESOLVED → VERIFIED
Whiteboard: [qa+] → [qa+:ashughes]
Depends on: 760745
Depends on: 760960
No longer depends on: 760745
Group: core-security
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: