Last Comment Bug 759788 - Java related Crash [@ nsPluginNativeWindow::GetPluginInstance(nsRefPtr<nsNPAPIPluginInstance>&) PluginWnd
: Java related Crash [@ nsPluginNativeWindow::GetPluginInstance(nsRefPtr<nsNPAP...
Status: VERIFIED FIXED
[qa+:ashughes]
: crash, regression, sec-critical
Product: Core
Classification: Components
Component: Plug-ins (show other bugs)
: Trunk
: All Windows XP
: -- normal (vote)
: mozilla15
Assigned To: Benjamin Smedberg [:bsmedberg]
:
: Benjamin Smedberg [:bsmedberg]
Mentors:
http://www.planetminecraft.com/skin/d...
Depends on: 760960
Blocks: 532972 719117
  Show dependency treegraph
 
Reported: 2012-05-30 09:14 PDT by Bob Clary [:bc:]
Modified: 2012-07-12 08:14 PDT (History)
11 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
unaffected
+
verified
+
verified
+
verified
unaffected


Attachments
testcase.tar.gz (14.55 KB, application/x-gzip)
2012-05-30 09:49 PDT, Bob Clary [:bc:]
no flags Details
os x stack after hang and ctrl-c (9.63 KB, text/plain)
2012-05-30 15:10 PDT, Bob Clary [:bc:]
no flags Details
wip (871 bytes, patch)
2012-05-30 19:17 PDT, Mats Palmgren (:mats)
benjamin: review+
akeybl: approval‑mozilla‑aurora+
akeybl: approval‑mozilla‑beta+
Details | Diff | Splinter Review

Description Bob Clary [:bc:] 2012-05-30 09:14:14 PDT
1. http://www.planetminecraft.com/skin/dark-souls-knight/
2. Crash Windows Beta/13, Aurora/14, Nightly/15

Note deleted pointer in eax

Operating system: Windows NT
                  6.1.7601 Service Pack 1
CPU: x86
     GenuineIntel family 6 model 37 stepping 1
     1 CPU

Crash reason:  EXCEPTION_ACCESS_VIOLATION_READ
Crash address: 0xffffffffdddddddd

Thread 0 (crashed)
 0  xul.dll!nsRefPtr<nsNPAPIPluginInstance>::assign_with_AddRef(nsNPAPIPluginInstance *) [nsAutoPtr.h : 879 + 0x3]
    eip = 0x672fe760   esp = 0x002bc908   ebp = 0x002bc90c   ebx = 0x00000000
    esi = 0x00000090   edi = 0x002bcabc   eax = 0xdddddddd   ecx = 0x002bc9d4
    edx = 0x76fb7094   efl = 0x00210286
    Found by: given as instruction pointer in context
 1  xul.dll!nsRefPtr<nsNPAPIPluginInstance>::operator=(nsRefPtr<nsNPAPIPluginInstance> const &) [nsAutoPtr.h : 956 + 0xd]
    eip = 0x67abfeb5   esp = 0x002bc914   ebp = 0x002bc91c
    Found by: call frame info
 2  xul.dll!nsPluginNativeWindow::GetPluginInstance(nsRefPtr<nsNPAPIPluginInstance> &) [nsPluginNativeWindow.h : 78 + 0xe]
    eip = 0x67abf066   esp = 0x002bc924   ebp = 0x002bc92c
    Found by: call frame info

I have a saved test case that reproduces and am reducing it now. Will work on the regression range next.

I'm concerned about bug 719117 being the cause.
Comment 1 Bob Clary [:bc:] 2012-05-30 09:49:01 PDT
Created attachment 628365 [details]
testcase.tar.gz

tar xvf testcase.tar.gz
firefox ./index.html

You may need to reload. Trying to inline the script appears to make the crash go away.

Note in windbg this shows eax=0xfeeefeee which is HeapFree.
Comment 2 Benjamin Smedberg [:bsmedberg] 2012-05-30 10:12:59 PDT
I'm going to work on diagnosis, but I'd love somebody to work on a regression range.
Comment 3 Daniel Veditz [:dveditz] 2012-05-30 10:20:17 PDT
looks like we're using deleted objects so guessing sec-critical.

IIRC we fixed a plugin instance related bug in Firefox 13 -- check that one?
Comment 4 Bob Clary [:bc:] 2012-05-30 10:59:37 PDT
from the debug Nightly builds from ftp.

good 20120525121926
bad  20120526213524

definitely looking like bug 719117 comment 59. I'll do an actual bisection to make sure.
Comment 5 Benjamin Smedberg [:bsmedberg] 2012-05-30 12:10:12 PDT
Having trouble with the minimized testcase, perhaps due to permissions issues, but on the URL:

>	xul.dll!nsRefPtr<nsNPAPIPluginInstance>::assign_with_AddRef(rawPtr=0xdddddddd)  Line 845	C++
 	xul.dll!nsRefPtr<nsNPAPIPluginInstance>::operator=(rhs={...})  Line 923	C++
 	xul.dll!nsPluginNativeWindow::GetPluginInstance(aPluginInstance={...})  Line 44	C++
 	xul.dll!PluginWndProcInternal(hWnd=0x0012040a, msg=0x00000090, wParam=0x00000000, lParam=0x00000000)  Line 206	C++
 	xul.dll!CallWindowProcCrashProtected(wndProc=0x57aa3280, hWnd=0x0012040a, msg=0x00000090, wParam=0x00000000, lParam=0x00000000)  Line 32	C++

 	xul.dll!PluginWndProc(hWnd=0x0012040a, msg=0x00000090, wParam=0x00000000, lParam=0x00000000)  Line 347	C++
 	user32.dll!_InternalCallWinProc@20() 	
 	user32.dll!_UserCallWinProcCheckWow@32() 	
 	user32.dll!_DispatchClientMessage@24() 	
 	user32.dll!___fnDWORD@4() 	
 	ntdll.dll!_KiUserCallbackDispatcher@12() 	
 	user32.dll!_NtUserDestroyWindow@4() 	
 	xul.dll!nsWindow::Destroy()  Line 646	C++
 	xul.dll!nsWindow::~nsWindow()  Line 395	C++
 	xul.dll!ChildWindow::~ChildWindow() 	C++
 	xul.dll!ChildWindow::`scalar deleting destructor'() 	C++
 	xul.dll!nsBaseWidget::Release()  Line 56	C++
 	xul.dll!nsWindow::Release()  Line 415	C++
 	xul.dll!nsCOMPtr<nsIWidget>::~nsCOMPtr<nsIWidget>()  Line 488	C++
 	xul.dll!nsPluginInstanceOwner::~nsPluginInstanceOwner()  Line 366	C++
 	xul.dll!nsPluginInstanceOwner::`scalar deleting destructor'() 	C++
 	xul.dll!nsPluginInstanceOwner::Release()  Line 373	C++
 	xul.dll!nsRefPtr<nsIDOMEventListener>::~nsRefPtr<nsIDOMEventListener>()  Line 875	C++
 	xul.dll!nsListenerStruct::~nsListenerStruct()  Line 53	C++
 	xul.dll!nsListenerStruct::`scalar deleting destructor'() 	C++
 	xul.dll!nsTArrayElementTraits<nsListenerStruct>::Destruct(e=0x08d33c28)  Line 349	C++
 	xul.dll!nsTArray<nsListenerStruct,nsTArrayDefaultAllocator>::DestructRange(start=0x00000000, count=0x00000001)  Line 1211	C++
 	xul.dll!nsTArray<nsListenerStruct,nsTArrayDefaultAllocator>::RemoveElementsAt(start=0x00000000, count=0x00000001)  Line 932	C++
 	xul.dll!nsTArray<nsListenerStruct,nsTArrayDefaultAllocator>::RemoveElementAt(index=0x00000000)  Line 938	C++
 	xul.dll!nsAutoTObserverArray<nsListenerStruct,2>::RemoveElementAt(index=0x00000000)  Line 186	C++
 	xul.dll!nsEventListenerManager::RemoveEventListener(aListener=0x0d83d870, aType=0x0000057e, aUserType=0x06b37d68, aFlags=0x00000004)  Line 400	C++
 	xul.dll!nsEventListenerManager::RemoveEventListenerByType(aListener=0x0d83d870, aType={...}, aFlags=0x00000004)  Line 446	C++
 	xul.dll!nsEventListenerManager::RemoveEventListener(aType={...}, aListener=0x0d83d870, aUseCapture=true)  Line 919	C++
 	xul.dll!nsINode::RemoveEventListener(aType={...}, aListener=0x0d83d870, aUseCapture=true)  Line 1095	C++
 	xul.dll!nsPluginInstanceOwner::Destroy()  Line 2712	C++
 	xul.dll!nsObjectLoadingContent::DoStopPlugin(aInstanceOwner=0x0d83d868, aDelayedStop=false, aForcedReentry=false)  Line 2207	C++
 	xul.dll!nsObjectLoadingContent::StopPluginInstance()  Line 2236	C++
 	xul.dll!InDocCheckEvent::Run()  Line 153	C++
 	xul.dll!nsBaseAppShell::RunSyncSectionsInternal(aStable=true, aThreadRecursionLevel=0x00000000)  Line 354	C++
 	xul.dll!nsBaseAppShell::RunSyncSections(stable=true, threadRecursionLevel=0x00000000)  Line 91	C++
 	xul.dll!nsBaseAppShell::AfterProcessNextEvent(thr=0x00557748, recursionDepth=0x00000000)  Line 409	C++
 	xul.dll!nsThread::ProcessNextEvent(mayWait=false, result=0x002bd32f)  Line 639	C++
 	xul.dll!NS_ProcessNextEvent_P(thread=0x00557748, mayWait=false)  Line 213	C++
 	xul.dll!mozilla::ipc::MessagePump::Run(aDelegate=0x00561ac8)  Line 82	C++
 	xul.dll!MessageLoop::RunInternal()  Line 209	C++
 	xul.dll!MessageLoop::RunHandler()  Line 202	C++
 	xul.dll!MessageLoop::Run()  Line 176	C++
 	xul.dll!nsBaseAppShell::Run()  Line 165	C++
 	xul.dll!nsAppShell::Run()  Line 232	C++
 	xul.dll!nsAppStartup::Run()  Line 256	C++
 	xul.dll!XREMain::XRE_mainRun()  Line 3786	C++
 	xul.dll!XREMain::XRE_main(argc=0x00000004, argv=0x00547ab0, aAppData=0x011cc864)  Line 3863	C++

So PluginWindProc is being called on a window which already has a deleted nsPluginNativeWindowWin, but ::GetProp(hWnd, NS_PLUGIN_WINDOW_PROPERTY_ASSOCIATION) is still returning the dead window.

The testcase uses an <applet> nested inside of a <canvas>, which already has me concerned.

Relevant debug-spew: 2x "WARNING: Plugin failed to subclass our window"
Various "Subdocument container has no frame" warnings
"WARNING: Unable to test style tree integrity -- no content node:"
Comment 6 Benjamin Smedberg [:bsmedberg] 2012-05-30 12:36:27 PDT
It also happens that a 32-bit debug build exits because we hit the exception case in CallWindowProcCrashProtected and then just call TerminateProcess, which makes debugging this kinda painful.

When debugging, every time we reload we hit nsPluginNativeWindowWin::SubclassAndAssociateWindow, but we never hit nsPluginNativeWindowWin::UndoSubclassAndAssociateWindow

We destroy the nsPluginNativeWindow at this stack:

>	xul.dll!nsPluginNativeWindowWin::~nsPluginNativeWindowWin()  Line 506	C++
 	xul.dll!nsPluginNativeWindowWin::`scalar deleting destructor'() 	C++
 	xul.dll!PLUG_DeletePluginNativeWindow(aPluginNativeWindow=0x08217610)  Line 744	C++
 	xul.dll!nsPluginInstanceOwner::~nsPluginInstanceOwner()  Line 356	C++
 	xul.dll!nsPluginInstanceOwner::`scalar deleting destructor'() 	C++
 	xul.dll!nsPluginInstanceOwner::Release()  Line 373	C++
 	xul.dll!nsRefPtr<nsPluginInstanceOwner>::assign_assuming_AddRef(newPtr=0x00000000)  Line 863	C++
 	xul.dll!nsRefPtr<nsPluginInstanceOwner>::assign_with_AddRef(rawPtr=0x00000000)  Line 847	C++
 	xul.dll!nsRefPtr<nsPluginInstanceOwner>::operator=(rhs=0x00000000)  Line 931	C++
 	xul.dll!nsObjectLoadingContent::StopPluginInstance()  Line 2238	C++
 	xul.dll!InDocCheckEvent::Run()  Line 153	C++
Comment 7 Al Billings [:abillings] 2012-05-30 13:14:49 PDT
We need to figure out whether this needs to block Firefox 13 or not, given that it is a sec-critical.
Comment 8 Al Billings [:abillings] 2012-05-30 13:31:13 PDT
This appears to be a regression. Bug does not crash my Firefox 12 on http://www.planetminecraft.com/skin/dark-souls-knight/ on the same machine it crashes current nightly.
Comment 9 Al Billings [:abillings] 2012-05-30 13:32:04 PDT
Bob thinks we regressed on 5/26 in comment 4 with https://bugzilla.mozilla.org/show_bug.cgi?id=719117#c59.
Comment 10 Daniel Veditz [:dveditz] 2012-05-30 13:36:57 PDT
Part of bug 719117 (a potential regressor?) involves backing out bug 724781, a Java sec-critical bug fix. Maybe it didn't get re-fixed correctly by 719117?
Comment 11 Benjamin Smedberg [:bsmedberg] 2012-05-30 13:58:10 PDT
It doesn't appear that this is a dup or directly related to bug 724781, those testcases still pass and the crash here is in a different place.
Comment 12 Al Billings [:abillings] 2012-05-30 14:06:04 PDT
Hangs java until I quite the JRE console when I run http://www.planetminecraft.com/skin/dark-souls-knight/ in the 5/27 Aurora build from ftp://ftp.mozilla.org/pub/firefox/nightly/2012-05-27-04-20-07-mozilla-aurora/. This is from the day *before* the checkin on 5/28 at https://bugzilla.mozilla.org/show_bug.cgi?id=719117#c65. 

Since Firefox 12 is affected and later Firefox 13 builds are affected, I'm going to try to narrow this down.

3/1  Firefox 13 (Mozilla Central) does *not* show bug (ftp://ftp.mozilla.org/pub/firefox/nightly/2012/03/2012-03-01-03-11-35-mozilla-central/ and Built from http://hg.mozilla.org/mozilla-central/rev/1c3b291d0830).
3/2  Firefox 13 (Mozilla Central) shows bug (ftp://ftp.mozilla.org/pub/firefox/nightly/2012/03/2012-03-02-03-11-12-mozilla-central/ and Built from http://hg.mozilla.org/mozilla-central/rev/3a7b9e61c263).

When the problem happens, Java console says:

Java Plug-in 1.6.0_31
Using JRE version 1.6.0_31-b04-415-11M3635 Java HotSpot(TM) 64-Bit Server VM
User home directory = /Users/albillload: class skinpreviewapplet.AppletLauncher not found.
java.lang.ClassNotFoundException: skinpreviewapplet.AppletLauncher
	at sun.plugin2.applet.Applet2ClassLoader.findClass(Applet2ClassLoader.java:252)
	at sun.plugin2.applet.Plugin2ClassLoader.loadClass0(Plugin2ClassLoader.java:249)
	at sun.plugin2.applet.Plugin2ClassLoader.loadClass(Plugin2ClassLoader.java:179)
	at sun.plugin2.applet.Plugin2ClassLoader.loadClass(Plugin2ClassLoader.java:160)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
	at sun.plugin2.applet.Plugin2ClassLoader.loadCode(Plugin2ClassLoader.java:690)
	at sun.plugin2.applet.Plugin2Manager.createApplet(Plugin2Manager.java:3045)
	at sun.plugin2.applet.Plugin2Manager$AppletExecutionRunnable.run(Plugin2Manager.java:1497)
	at java.lang.Thread.run(Thread.java:680)
Exception: java.lang.ClassNotFoundException: skinpreviewapplet.AppletLauncher
Comment 13 Al Billings [:abillings] 2012-05-30 14:09:37 PDT
Dveditz's guess is that maybe this checkin caused the issue:

https://bugzilla.mozilla.org/show_bug.cgi?id=724781#c14
Comment 14 Al Billings [:abillings] 2012-05-30 14:10:57 PDT
So this is a Firefox 13 only issue. We should fix this, if possible, so we've never shipped this sec-critical bug to the public.
Comment 15 Bob Clary [:bc:] 2012-05-30 15:07:24 PDT
I reconfirmed on Windows XP that 

ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2012/05/2012-05-25-mozilla-central-debug/
buildid 20120525001956 does not crash

ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2012/05/2012-05-26-mozilla-central-debug/
buildid 20120525121926 does not crash

ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2012/05/2012-05-27-mozilla-central-debug/
buildid 20120526213524 does crash

bsmedberg mentioned that this might be due to webgl not being supported on vms and that he needed to disable webgl and the direct3d acceleration to reproduce on a physical windows machine.

Al mentioned that he was testing on OSX. I can confirm a hang in my nightly debug build. Breaking into gdb after the hang I see a stack involving JVM_MonitorWait, JVM_Lseek etc. 

I'm not sure Al and I are looking at the same bug.
Comment 16 Bob Clary [:bc:] 2012-05-30 15:10:02 PDT
Created attachment 628499 [details]
os x stack after hang and ctrl-c
Comment 17 Steven Michaud [:smichaud] (Retired) 2012-05-30 17:16:30 PDT
Bob, the next time try the following, in this order:

thread apply all bt
call (void) pss()

This will output a gdb stack trace of all threads, and then a Java stack trace of all threads.

The output of pss() might go to the Console app, and not Terminal.
Comment 18 Steven Michaud [:smichaud] (Retired) 2012-05-30 17:26:26 PDT
(Following up comment #17)

Actually don't bother.  Your hang, Bob, appears to be the same as in bug 750480.
Comment 19 Al Billings [:abillings] 2012-05-30 17:35:19 PDT
(In reply to Steven Michaud from comment #18)
> (Following up comment #17)
> 
> Actually don't bother.  Your hang, Bob, appears to be the same as in bug
> 750480.

Is this a different hang than the one I see?
Comment 20 Steven Michaud [:smichaud] (Retired) 2012-05-30 17:42:20 PDT
Also see bug 708461, which also involves a hang on a Java error.
Comment 21 Steven Michaud [:smichaud] (Retired) 2012-05-30 17:43:06 PDT
> Is this a different hang than the one I see?

I don't think so.  But if you have any doubts follow my instructions in comment #17.
Comment 22 Steven Michaud [:smichaud] (Retired) 2012-05-30 17:49:20 PDT
By the way, I'm not at all sure the hangs various people have seen are related to the crash that was first reported.
Comment 23 Al Billings [:abillings] 2012-05-30 17:50:28 PDT
(In reply to Steven Michaud from comment #22)
> By the way, I'm not at all sure the hangs various people have seen are
> related to the crash that was first reported.

Yeah, that is kind of what I was wondering.
Comment 24 Daniel Veditz [:dveditz] 2012-05-30 18:08:43 PDT
bug 719117 still seems the most likely regressor in the range given in comment 15.

http://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2012-05-25+11:00&enddate=2012-05-26+22:00

Does not seem realistic to figure out the cause and fix in time for the Firefox 13 build
Comment 25 Mats Palmgren (:mats) 2012-05-30 19:17:46 PDT
Created attachment 628565 [details] [diff] [review]
wip

This seems to fix the crash at the URL in my trunk build.
If we're considering shipping Fx13 as is, then I think this
could be worth taking as a wallpaper at least. (I haven't
analyzed the crash enough to say if this is a complete fix or not)

The idea is to prevent nsPluginInstanceOwner::Release() to actually
delete the object, thus preventing ~nsPluginInstanceOwner() from
deleting the window, for the duration of DoStopPlugin.
The event for a delayed DoStopPlugin also has a strong ref so it
should work in that case too.
Comment 26 Mats Palmgren (:mats) 2012-05-31 03:36:21 PDT
Here's mozilla-beta Try builds with the above patch:
https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mpalmgren@mozilla.com-880a0d4c4fa3/
Comment 27 Mats Palmgren (:mats) 2012-05-31 03:46:56 PDT
Comment on attachment 628565 [details] [diff] [review]
wip

I'd like to propose this as a low-risk wallpaper for all branches
while we continue investigating.  I think there's a good chance this
is the correct fix and I don't see how it can make things worse.
Comment 28 Bob Clary [:bc:] 2012-05-31 06:05:56 PDT
The first bad revision is:
changeset:   94997:8915b49bc1bc
user:        Mats Palmgren <matspal@gmail.com>
date:        Sat May 26 00:34:11 2012 +0200
summary:     Bug 719117 - Fix bug 724781 by preventing reentry to DoStopPlugin with a flag.  part 2/2, r=bsmedberg
Comment 29 Benjamin Smedberg [:bsmedberg] 2012-05-31 07:03:58 PDT
Comment on attachment 628565 [details] [diff] [review]
wip

This does appear to fix the problem, and as far returning to the prior behavior it's correct. It's still a bit worrisome to me that we never call nsPluginNativeWin::UndoSubclassAndAssociateWindow, but I'll leave that for another bug.
Comment 30 Benjamin Smedberg [:bsmedberg] 2012-05-31 07:10:42 PDT
https://hg.mozilla.org/mozilla-central/rev/0aa7fc75cad5
Comment 31 Benjamin Smedberg [:bsmedberg] 2012-05-31 07:12:25 PDT
Comment on attachment 628565 [details] [diff] [review]
wip

Bug caused by (feature/regressing bug #): bug 719117
User impact if declined: likely crashes at planetminecraft and possibly other sites, exposure to critical security bug
Testing completed (on m-c, etc.): local testing, just landed on m-c
Risk to taking this patch (and alternatives if risky): doesn't seem especially risky, but we've said that before!
String or UUID changes made by this patch: None
Comment 32 Alex Keybl [:akeybl] 2012-05-31 10:57:37 PDT
Comment on attachment 628565 [details] [diff] [review]
wip

[Triage Comment]
The security team is concerned with shipping a new sg:crit regression, and Benjamin feels this fix is less risky than any of the alternatives, so we'll spin a Beta 7 with this.
Comment 34 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-06-01 10:58:00 PDT
I've tried several different builds but have been unable to reproduce this bug:
 * Firefox 13.0b6
 * Firefox 14.0a2 2012-05-30
 * Firefox 15.0a1 2012-05-30
 * Firefox 13.0a1 2012-03-02

Therefore, I am unable to verify this is fixed (verification blocks 13.0b7 sign-off). Please, can someone help me?
Comment 35 Bob Clary [:bc:] 2012-06-01 11:17:11 PDT
fwiw, I tested the url in crash automation with fresh builds from this morning and could not reproduce the crash.
Comment 36 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-06-01 13:11:22 PDT
Thanks to Bob via IRC, I am not able to reproduce this issue. Will now try to verify the fix.
Comment 37 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-06-01 13:21:04 PDT
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #36)
> Thanks to Bob via IRC, I am not able to reproduce this issue. Will now try
> to verify the fix.

Sorry, that's should read "I am able to reproduce".
Comment 38 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-06-01 14:00:19 PDT
Verified on latest debug builds for Nightly 15, Aurora 14, and Beta 13 (dated 2012-06-01).

Note You need to log in before you can comment on or make changes to this bug.