Closed Bug 347743 Opened 18 years ago Closed 17 years ago

(trunk) Crash when closing tab with windows media player with onblur handler that removes plugin object

Categories

(Core Graveyard :: Plug-ins, defect)

x86
Windows XP
defect
Not set
critical

Tracking

(Not tracked)

VERIFIED FIXED
mozilla1.9alpha6

People

(Reporter: martijn.martijn, Assigned: sharparrow1)

References

()

Details

(Keywords: crash, testcase, Whiteboard: [sg:critical?] trunk version (post 1.8-branch))

Attachments

(7 files, 7 obsolete files)

22.70 KB, patch
Details | Diff | Splinter Review
22.70 KB, patch
roc
: review+
Details | Diff | Splinter Review
15.50 KB, patch
Details | Diff | Splinter Review
24.04 KB, patch
Details | Diff | Splinter Review
1.17 KB, patch
roc
: review+
Details | Diff | Splinter Review
18.96 KB, patch
roc
: review+
Details | Diff | Splinter Review
2.46 KB, patch
jst
: review+
Details | Diff | Splinter Review
This is the trunk version of the crash as mentioned in bug 347742.
The testcase is also from that bug.

Talkback ID for trunk: TB21861563G
nsPropertyTable::PropertyList::Equals   nsPropertyTable::GetProperty   nsPluginInstanceOwner::Destroy   nsObjectFrame::StopPlugin   nsObjectFrame::Destroy   nsBlockFrame::RemoveFrame   nsFrameManager::RemoveFrame   0x0012e7bc
Some other way of crashing with the windows media player, might be useful for testing, since you're busy with bug 347662 anyway. See also bug 347742.
Depends on: 347662, 347742
Erm.... nsPluginInstanceOwner::Destroy doesn't call nsPropertyTable::GetProperty.

Does this still appear with the band-aid from bug 347662?  Can you reproduce this in a debug build?
Flags: blocking1.9?
Yes, this still happens with the band-aid from bug 347662 (although I had to more or less 'invent' how to apply the patch on trunk).
I think I am able to reproduce this in gdb.
Summary: Crash when closing tab with windows media player with onblur handler that removes plugin object → (trunk) Crash when closing tab with windows media player with onblur handler that removes plugin object
Assignee: nobody → jst
Flags: blocking1.9? → blocking1.9-
Jonas, is there a reason this crash isn't even wanted?
Put another way, why's a security bug not blocking and not wanted?  ;)
oops, didn't realize this was that bad. We'll look at it next triage again.
Flags: blocking1.9- → blocking1.9?
Blocks: 347742
No longer depends on: 347742
Flags: blocking1.9? → blocking1.9+
Whiteboard: [sg:critical?]
does this still look like its making 1.9?
This fixes the bug on Win32 (only, and AFAIK it's only a problem with the Windows Media Player plugin) by making the plugin get destroyed outside of layout frame destruction where it's not safe to call into plugin code. diff -w coming up for review.
Attached patch diff -w of the above. (obsolete) — Splinter Review
Attachment #268873 - Flags: superreview?(roc)
Attachment #268873 - Flags: review?(roc)
Comment on attachment 268873 [details] [diff] [review]
diff -w of the above.

This aint ready yet.
Attachment #268873 - Attachment is obsolete: true
Attachment #268873 - Flags: superreview?(roc)
Attachment #268873 - Flags: review?(roc)
Attachment #268872 - Attachment is obsolete: true
This is the same as the first attachment here with some fixes, mainly to reparent the plugin widget to the top-level firefox window (per roc's suggestion). To do that I needed to use the native Win32 API as there's no widget level API that lets me find the right native window. diff -w coming up for review.
Attachment #269011 - Flags: review?
Attached patch diff -w of the above. (obsolete) — Splinter Review
Attachment #269012 - Flags: superreview?(roc)
Attachment #269012 - Flags: review?(roc)
Attachment #269011 - Flags: review?
   virtual void StopPlugin();
+  void StopPlugin(PRBool aDelayedStop);

Slightly confusing. Call it StopPluginInternal, say, and document what aDelayedStop does.

doStopPlugin should be DoStopPlugin.

+    // Pass ownership of mInstanceOwner to nsStopPluginRunnable

Since it's refcounted, talking about "passing ownership" is slightly misleading. For a little while, both own it.

+  PRPackedBool                mDestroyWidget;

Can you document this?

+/*
+ * Copy of GetNSWindowPropName() from widget/src/windows/nsWindow.cpp
+ */

This is a bit nasty. Is there a way to avoid copying the code, perhaps by adding a new method to nsIWidget to get the nsIWidget from a void* native handle? A kind of reverse of GetNativeData. It could be stubbed out on non-Windows. Alternatively, an API to find the toplevel nsIWidget by scanning the native hierarchy.

nsIWidget lifetimes worry me. You're relying on the plugin's nsIWidget not holding/using a pointer to its parent. That seems to be OK but please document it.

+    // the top-level firefox widget so that we can safely tear down

"Gecko" not "firefox" :-)

I wonder if moving the plugin widget up could do weird things ... e.g. should we disable its window to avoid it getting focus?
Attached patch Updated fix.Splinter Review
Attachment #269011 - Attachment is obsolete: true
Attachment #269012 - Attachment is obsolete: true
Attachment #269012 - Flags: superreview?(roc)
Attachment #269012 - Flags: review?(roc)
Roc, this should address all those comments. Let me know if you come across something that doesn't, or you think of something else...
Attachment #269760 - Flags: superreview?(roc)
Attachment #269760 - Flags: review?(roc)
Comment on attachment 269760 [details] [diff] [review]
Updated fix. (diff -w)

+    /*
+     * Return the widget at the top of the widget hierarcy for the
+     * running application.
+     */
+    virtual nsIWidget* GetAppTopWidget(void) = 0;

I'm not super enthusiastic about this name. Perhaps call it GetTopLevelWindow() and explain that it's the nearest ancestor the widget which does not have a Gecko parent?
Attachment #269760 - Flags: superreview?(roc)
Attachment #269760 - Flags: superreview+
Attachment #269760 - Flags: review?(roc)
Attachment #269760 - Flags: review+
Done. This patch changes the name. To do that I had to rename the internal nsWindow::GetTopLevelWindow() method (which unfortunately doesn't do what I want) to GetTopLevelWindowInternal().
Fix checked in.
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9alpha6
Flags: wanted1.8.1.x-
Flags: wanted1.8.0.x-
Whiteboard: [sg:critical?] → [sg:critical?] trunk version (post 1.8-branch)
Verified fixed, using:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a6pre) Gecko/20070628 Minefield/3.0a6pre

Thanks for fixing this one!
Status: RESOLVED → VERIFIED
Depends on: 386332
Attached patch Backout patch (obsolete) — Splinter Review
This backs out jst's commit and fixes the crashing reported in bug 386332
Whoops, don't know how the DOS line endings got in there...
Attachment #270342 - Attachment is obsolete: true
Backed out so we don't ship in a6 with the crashing
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
The cause of the crash is that nsWindow::SetParent is broken: it reparents the native widget, but not itself (the nsIWidget).  This has extremely bad consequences because it means the nsIWidget is not guaranteed to be destroyed at the same time as its parent, which breaks the nsIWidget destruction code and ultimately causes us to access freed memory, leading to the crash in bug 386332.
It looks like the GTK2 SetParent is broken as well?
I think we might have to give up on this approach, though; even with SetParent fixed, I managed to trigger an internal crash in Flash once. (It's also possible that fixing SetParent isn't enough; I'm getting an extremely odd shutdown crash I haven't been able to figure out.)
(In reply to comment #24)
> It looks like the GTK2 SetParent is broken as well?

Probably; it doesn't matter at the moment because we don't call it, though.
Depends on: 386253
(In reply to comment #25)
> I think we might have to give up on this approach, though; even with SetParent
> fixed, I managed to trigger an internal crash in Flash once. (It's also
> possible that fixing SetParent isn't enough; I'm getting an extremely odd
> shutdown crash I haven't been able to figure out.)
> 

Are you getting that shutdown crash consistently? If so, any thoughts on how to reproduce this?
Comment on attachment 270349 [details] [diff] [review]
Fix for SetParent (for reference)

I've been asked to ask for review on this patch... I believe it's correct, and it's guaranteed not to break anything because we never call SetParent().   (If we did, we would have run into this problem a while ago.)

Another possibility worth considering would be switching to delayed destruction of views.  Essentially, the idea would be that we stick all the views we want to destroy into an array or something, and then destroy the views once we're out of frame construction.  That solution wouldn't require any fancy widget tricks, and it should fix this problem by not dropping into widget code at dangerous times.
Attachment #270349 - Flags: review?(roc)
Ugh, I really don't know what to do here...

The patch that was checked in causes crashes on shutdown, for some reason I haven't figured out.

I don't think there's any other feasible possibility, though: we can't delay widget destruction because of parent widgets, but we can't destroy the plugin's widget outside of an event specifically for that purpose because we can process arbitrary events from the event queue (like mouse events) during the destruction of the plugin's widget!
Attached patch Fixed patch (WIP) (obsolete) — Splinter Review
This version seems to work without crashing.

It is a little bit weird to be reparenting out plugin widgets to the desktop, but it works just fine as far as I can tell.
Assignee: jst → sharparrow1
Status: REOPENED → ASSIGNED
Attached patch Fixed PatchSplinter Review
Version without compile error.  I'd appreciate some testing on this, since plugin stuff tends to be sensitive.
Attachment #272734 - Attachment is obsolete: true
Attachment #272834 - Flags: review?(roc)
(Ignore the change to ForceRedraw.)
I tried the patch, but I crash even befor the profile manager comes up:
http://martijn.martijn.googlepages.com/bt.txt
I did a make -f client.mk build_all_depend, or should I have done a make -f client.mk distclean before that?
I have no clue how you're crashing there.
Comment on attachment 272834 [details] [diff] [review]
Fixed Patch

This looks alright, but we'd better find out what Martijn's issue is.
Attachment #272834 - Flags: superreview+
Attachment #272834 - Flags: review?(roc)
Attachment #272834 - Flags: review+
Ok, just applying the view part of the patch is enough to make my debug build crash, for some reason.
Sorry for the disturbance.
It's working fine now, after I have rebuilt my tree from scratch (I also had some stuff cleaned up, that might have caused it).
Ok, this is something that seems to crash because of this patch:
http://martijn.martijn.googlepages.com/flash_crash.htm
It crashes for me within 10 seconds normally after the iframe has disappeared.

I see this assertion in my debug build:
###!!! ASSERTION: dangling entry->mJSObj JSPrivate because we can't find cx: 'Er
ror', file c:/mozilla/mozilla/modules/plugin/base/src/nsJSNPRuntime.cpp, line 15
94

The iframe content is basically this:
<script>
function doe(i){
document.embeds[0].removeAttribute('style');
}
setTimeout(doe,200,0);
</script>
<embed type="application/x-shockwave-flash" style=" float: right;">
jst, would you mind looking at the above?  I'm not really familiar with the plugin code.

(Note that it crashes in cycle collection.)
Attached patch Fix for Martijn's crash. (obsolete) — Splinter Review
The reasons for the crash Martijn was seeing was that now the plugin destruction can be delayed long enough for the JS context not being reachable from the plugins npp by the time the plugin is destroyed. That led to us not being able to null out the plugin's NPObject wrappers private data, which then during GC caused a crash due to us touching the deleted NPObject. This patch fixes that.
Attachment #274244 - Flags: superreview?(jonas)
Attachment #274244 - Flags: review?(sharparrow1)
Attachment #274244 - Flags: superreview?(jonas) → superreview+
Comment on attachment 274244 [details] [diff] [review]
Fix for Martijn's crash.

Redirecting review request to someone who knows the plugin code.
Attachment #274244 - Flags: review?(sharparrow1) → review?(brendan)
Comment on attachment 274244 [details] [diff] [review]
Fix for Martijn's crash.

Redirecting to jst.

/be
Attachment #274244 - Flags: review?(brendan) → review?(jst)
(In reply to comment #44)
> (From update of attachment 274244 [details] [diff] [review])
> Redirecting to jst.

Since Johnny wrote the patch himself, I'm quite sure he'll approve it ;)

(In reply to comment #45)
> (In reply to comment #44)
> > (From update of attachment 274244 [details] [diff] [review] [details])
> > Redirecting to jst.
> 
> Since Johnny wrote the patch himself, I'm quite sure he'll approve it ;)

Suits me! ;-)

Ok, I missed that. Looking now.

/be
Comment on attachment 274244 [details] [diff] [review]
Fix for Martijn's crash.

>+  NppAndCx *nppcx = static_cast<NppAndCx *>(arg);

I'm not a C++ purist -- I don't even play one on TV -- but isn't reinterpret_cast more the "right thing" here?

>+  JSContext *cx = GetJSContext(npp);
>+  if (!cx) {
>+    nsCOMPtr<nsIThreadJSContextStack> stack =
>+      do_QueryInterface(sContextStack);
>+
>+    if (!stack) {
>+      stack = do_GetService("@mozilla.org/js/xpc/ContextStack;1");
>+      if (!stack) {
>+        // Nothing we can do if we get here.
>+        return;
>+      }
>+    }
>+
>+    stack->GetSafeJSContext(&cx);
>+    if (!cx) {
>+      // Nothing we can do if we get here.
>+    }
>+  }
>+
>   if (sNPObjWrappers.ops) {
>+    NppAndCx nppcx = { npp, cx };
>     PL_DHashTableEnumerate(&sNPObjWrappers,
>-                           NPObjWrapperPluginDestroyedCallback, npp);
>+                           NPObjWrapperPluginDestroyedCallback, &nppcx);

So is there a possibility that we'll have objects alive in the runtime with dangling private data pointers, but no ability to find a cx to use here? If so, and if the case does not involve running from the last GC when control flow reaches here, then I don't see how that bad case could arise. But it seems we could just always use the safe context, and shrink this code.

(Actually, we could also violate the JS API and use LOCKED_OBJ_SET_SLOT, which requires no cx at all.)

Thoughts? r=me in any event, but if you can shrink code please do.

/be
Attachment #274244 - Flags: review?(jst) → review+
(In reply to comment #47)
> (From update of attachment 274244 [details] [diff] [review])
> >+  NppAndCx *nppcx = static_cast<NppAndCx *>(arg);
> 
> I'm not a C++ purist -- I don't even play one on TV -- but isn't
> reinterpret_cast more the "right thing" here?

Sure.

> So is there a possibility that we'll have objects alive in the runtime with
> dangling private data pointers, but no ability to find a cx to use here? If so,

Well, yes, we can run into a situation where we're unable to find the cx associated with the plugin due to its document already having been torn down (especially with the other patch in this bug applied).

> and if the case does not involve running from the last GC when control flow
> reaches here, then I don't see how that bad case could arise. But it seems we
> could just always use the safe context, and shrink this code.

This code rarely (if ever) runs from GC. But yeah, I could always use the safe context here (but the code doesn't really shrink much if I do).

Updated patch to that effect coming up.
Attachment #274244 - Attachment is obsolete: true
Attachment #274700 - Flags: superreview+
Attachment #274700 - Flags: review+
The patches in this bug can land, not? Or should I do some more testing before landing?
I landed "Updated fix for Martijn's crash"; it apparently caused Bug 391182.
Ok, "Updated fix for Martijn's crash" landed at 2007-08-06 10:19.
Shouldn't https://bugzilla.mozilla.org/attachment.cgi?id=272834 ("Fixed patch") also land at some point?
Well, I don't know how important the assertion in bug 391182 is; if it's important enough to get this fix backed out, we obviously don't want to land the other part.
Depends on: 391182
Eli, I'm fixing bug 391182, it's a bogus assertion after the fix you landed that I didn't catch in my testing. Please go ahead and land the fix for this bug.
Checked in.
Status: ASSIGNED → RESOLVED
Closed: 17 years ago17 years ago
Resolution: --- → FIXED
Group: security
Flags: in-testsuite?
Thanks to all people who helped fixing this!

Verified fixed, using:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a8pre) Gecko/2007082111 Minefield/3.0a8pre
Status: RESOLVED → VERIFIED
Depends on: 401393
Depends on: 430219
Depends on: 434593
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: