(trunk) Crash when closing tab with windows media player with onblur handler that removes plugin object

VERIFIED FIXED in mozilla1.9alpha6

Status

()

--
critical
VERIFIED FIXED
12 years ago
10 years ago

People

(Reporter: martijn.martijn, Assigned: sharparrow1)

Tracking

({crash, testcase})

Trunk
mozilla1.9alpha6
x86
Windows XP
crash, testcase
Points:
---
Dependency tree / graph
Bug Flags:
blocking1.9 +
wanted1.8.1.x -
wanted1.8.0.x -
in-testsuite ?

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [sg:critical?] trunk version (post 1.8-branch), URL)

Attachments

(7 attachments, 7 obsolete attachments)

22.70 KB, patch
Details | Diff | Splinter Review
22.70 KB, patch
roc
: review+
Details | Diff | Splinter Review
15.50 KB, patch
Details | Diff | Splinter Review
24.04 KB, patch
Details | Diff | Splinter Review
1.17 KB, patch
roc
: review+
Details | Diff | Splinter Review
18.96 KB, patch
roc
: review+
Details | Diff | Splinter Review
2.46 KB, patch
jst
: review+
Details | Diff | Splinter Review
(Reporter)

Description

12 years ago
This is the trunk version of the crash as mentioned in bug 347742.
The testcase is also from that bug.

Talkback ID for trunk: TB21861563G
nsPropertyTable::PropertyList::Equals   nsPropertyTable::GetProperty   nsPluginInstanceOwner::Destroy   nsObjectFrame::StopPlugin   nsObjectFrame::Destroy   nsBlockFrame::RemoveFrame   nsFrameManager::RemoveFrame   0x0012e7bc
(Reporter)

Comment 1

12 years ago
Some other way of crashing with the windows media player, might be useful for testing, since you're busy with bug 347662 anyway. See also bug 347742.
Erm.... nsPluginInstanceOwner::Destroy doesn't call nsPropertyTable::GetProperty.

Does this still appear with the band-aid from bug 347662?  Can you reproduce this in a debug build?
Flags: blocking1.9?
(Reporter)

Comment 3

12 years ago
Yes, this still happens with the band-aid from bug 347662 (although I had to more or less 'invent' how to apply the patch on trunk).
I think I am able to reproduce this in gdb.
Summary: Crash when closing tab with windows media player with onblur handler that removes plugin object → (trunk) Crash when closing tab with windows media player with onblur handler that removes plugin object
Assignee: nobody → jst
Flags: blocking1.9? → blocking1.9-
Jonas, is there a reason this crash isn't even wanted?
Put another way, why's a security bug not blocking and not wanted?  ;)
oops, didn't realize this was that bad. We'll look at it next triage again.
Flags: blocking1.9- → blocking1.9?
Blocks: 347742
No longer depends on: 347742
Flags: blocking1.9? → blocking1.9+
Whiteboard: [sg:critical?]

Comment 7

11 years ago
does this still look like its making 1.9?
Created attachment 268872 [details] [diff] [review]
Fix, move plugin destruction out of frame destruction code.

This fixes the bug on Win32 (only, and AFAIK it's only a problem with the Windows Media Player plugin) by making the plugin get destroyed outside of layout frame destruction where it's not safe to call into plugin code. diff -w coming up for review.
Created attachment 268873 [details] [diff] [review]
diff -w of the above.
Attachment #268873 - Flags: superreview?(roc)
Attachment #268873 - Flags: review?(roc)
Comment on attachment 268873 [details] [diff] [review]
diff -w of the above.

This aint ready yet.
Attachment #268873 - Attachment is obsolete: true
Attachment #268873 - Flags: superreview?(roc)
Attachment #268873 - Flags: review?(roc)

Updated

11 years ago
Attachment #268872 - Attachment is obsolete: true
Created attachment 269011 [details] [diff] [review]
Fix, move plugin destruction out of frame destruction code.

This is the same as the first attachment here with some fixes, mainly to reparent the plugin widget to the top-level firefox window (per roc's suggestion). To do that I needed to use the native Win32 API as there's no widget level API that lets me find the right native window. diff -w coming up for review.
Attachment #269011 - Flags: review?
Created attachment 269012 [details] [diff] [review]
diff -w of the above.
Attachment #269012 - Flags: superreview?(roc)
Attachment #269012 - Flags: review?(roc)

Updated

11 years ago
Attachment #269011 - Flags: review?
   virtual void StopPlugin();
+  void StopPlugin(PRBool aDelayedStop);

Slightly confusing. Call it StopPluginInternal, say, and document what aDelayedStop does.

doStopPlugin should be DoStopPlugin.

+    // Pass ownership of mInstanceOwner to nsStopPluginRunnable

Since it's refcounted, talking about "passing ownership" is slightly misleading. For a little while, both own it.

+  PRPackedBool                mDestroyWidget;

Can you document this?

+/*
+ * Copy of GetNSWindowPropName() from widget/src/windows/nsWindow.cpp
+ */

This is a bit nasty. Is there a way to avoid copying the code, perhaps by adding a new method to nsIWidget to get the nsIWidget from a void* native handle? A kind of reverse of GetNativeData. It could be stubbed out on non-Windows. Alternatively, an API to find the toplevel nsIWidget by scanning the native hierarchy.

nsIWidget lifetimes worry me. You're relying on the plugin's nsIWidget not holding/using a pointer to its parent. That seems to be OK but please document it.

+    // the top-level firefox widget so that we can safely tear down

"Gecko" not "firefox" :-)

I wonder if moving the plugin widget up could do weird things ... e.g. should we disable its window to avoid it getting focus?
Created attachment 269759 [details] [diff] [review]
Updated fix.
Attachment #269011 - Attachment is obsolete: true
Attachment #269012 - Attachment is obsolete: true
Attachment #269012 - Flags: superreview?(roc)
Attachment #269012 - Flags: review?(roc)
Created attachment 269760 [details] [diff] [review]
Updated fix. (diff -w)

Roc, this should address all those comments. Let me know if you come across something that doesn't, or you think of something else...
Attachment #269760 - Flags: superreview?(roc)
Attachment #269760 - Flags: review?(roc)
Comment on attachment 269760 [details] [diff] [review]
Updated fix. (diff -w)

+    /*
+     * Return the widget at the top of the widget hierarcy for the
+     * running application.
+     */
+    virtual nsIWidget* GetAppTopWidget(void) = 0;

I'm not super enthusiastic about this name. Perhaps call it GetTopLevelWindow() and explain that it's the nearest ancestor the widget which does not have a Gecko parent?
Attachment #269760 - Flags: superreview?(roc)
Attachment #269760 - Flags: superreview+
Attachment #269760 - Flags: review?(roc)
Attachment #269760 - Flags: review+
Created attachment 270096 [details] [diff] [review]
Patch that was checked in.

Done. This patch changes the name. To do that I had to rename the internal nsWindow::GetTopLevelWindow() method (which unfortunately doesn't do what I want) to GetTopLevelWindowInternal().
Fix checked in.
Status: NEW → RESOLVED
Last Resolved: 11 years ago
Resolution: --- → FIXED

Updated

11 years ago
Target Milestone: --- → mozilla1.9alpha6
Flags: wanted1.8.1.x-
Flags: wanted1.8.0.x-
Whiteboard: [sg:critical?] → [sg:critical?] trunk version (post 1.8-branch)
(Reporter)

Comment 19

11 years ago
Verified fixed, using:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a6pre) Gecko/20070628 Minefield/3.0a6pre

Thanks for fixing this one!
Status: RESOLVED → VERIFIED
Depends on: 386332
Created attachment 270342 [details] [diff] [review]
Backout patch

This backs out jst's commit and fixes the crashing reported in bug 386332
Created attachment 270343 [details] [diff] [review]
Backout patch round 2

Whoops, don't know how the DOS line endings got in there...
Attachment #270342 - Attachment is obsolete: true
Backed out so we don't ship in a6 with the crashing
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
(Assignee)

Comment 23

11 years ago
The cause of the crash is that nsWindow::SetParent is broken: it reparents the native widget, but not itself (the nsIWidget).  This has extremely bad consequences because it means the nsIWidget is not guaranteed to be destroyed at the same time as its parent, which breaks the nsIWidget destruction code and ultimately causes us to access freed memory, leading to the crash in bug 386332.
It looks like the GTK2 SetParent is broken as well?
(Assignee)

Comment 25

11 years ago
I think we might have to give up on this approach, though; even with SetParent fixed, I managed to trigger an internal crash in Flash once. (It's also possible that fixing SetParent isn't enough; I'm getting an extremely odd shutdown crash I haven't been able to figure out.)
(Assignee)

Comment 26

11 years ago
(In reply to comment #24)
> It looks like the GTK2 SetParent is broken as well?

Probably; it doesn't matter at the moment because we don't call it, though.
(Assignee)

Comment 27

11 years ago
Created attachment 270349 [details] [diff] [review]
Fix for SetParent (for reference)
(Reporter)

Updated

11 years ago
Depends on: 386253
(In reply to comment #25)
> I think we might have to give up on this approach, though; even with SetParent
> fixed, I managed to trigger an internal crash in Flash once. (It's also
> possible that fixing SetParent isn't enough; I'm getting an extremely odd
> shutdown crash I haven't been able to figure out.)
> 

Are you getting that shutdown crash consistently? If so, any thoughts on how to reproduce this?
Duplicate of this bug: 386735
(Assignee)

Comment 30

11 years ago
Comment on attachment 270349 [details] [diff] [review]
Fix for SetParent (for reference)

I've been asked to ask for review on this patch... I believe it's correct, and it's guaranteed not to break anything because we never call SetParent().   (If we did, we would have run into this problem a while ago.)

Another possibility worth considering would be switching to delayed destruction of views.  Essentially, the idea would be that we stick all the views we want to destroy into an array or something, and then destroy the views once we're out of frame construction.  That solution wouldn't require any fancy widget tricks, and it should fix this problem by not dropping into widget code at dangerous times.
Attachment #270349 - Flags: review?(roc)
Attachment #270349 - Flags: superreview+
Attachment #270349 - Flags: review?(roc)
Attachment #270349 - Flags: review+
(Assignee)

Comment 31

11 years ago
Ugh, I really don't know what to do here...

The patch that was checked in causes crashes on shutdown, for some reason I haven't figured out.

I don't think there's any other feasible possibility, though: we can't delay widget destruction because of parent widgets, but we can't destroy the plugin's widget outside of an event specifically for that purpose because we can process arbitrary events from the event queue (like mouse events) during the destruction of the plugin's widget!
(Assignee)

Comment 32

11 years ago
Created attachment 272734 [details] [diff] [review]
Fixed patch (WIP)

This version seems to work without crashing.

It is a little bit weird to be reparenting out plugin widgets to the desktop, but it works just fine as far as I can tell.
Assignee: jst → sharparrow1
Status: REOPENED → ASSIGNED
(Assignee)

Comment 33

11 years ago
Created attachment 272834 [details] [diff] [review]
Fixed Patch

Version without compile error.  I'd appreciate some testing on this, since plugin stuff tends to be sensitive.
Attachment #272734 - Attachment is obsolete: true
Attachment #272834 - Flags: review?(roc)
(Assignee)

Comment 34

11 years ago
(Ignore the change to ForceRedraw.)
(Reporter)

Comment 35

11 years ago
I tried the patch, but I crash even befor the profile manager comes up:
http://martijn.martijn.googlepages.com/bt.txt
I did a make -f client.mk build_all_depend, or should I have done a make -f client.mk distclean before that?
(Assignee)

Comment 36

11 years ago
I have no clue how you're crashing there.
Comment on attachment 272834 [details] [diff] [review]
Fixed Patch

This looks alright, but we'd better find out what Martijn's issue is.
Attachment #272834 - Flags: superreview+
Attachment #272834 - Flags: review?(roc)
Attachment #272834 - Flags: review+
(Reporter)

Comment 38

11 years ago
Ok, just applying the view part of the patch is enough to make my debug build crash, for some reason.
(Reporter)

Comment 39

11 years ago
Sorry for the disturbance.
It's working fine now, after I have rebuilt my tree from scratch (I also had some stuff cleaned up, that might have caused it).
(Reporter)

Comment 40

11 years ago
Ok, this is something that seems to crash because of this patch:
http://martijn.martijn.googlepages.com/flash_crash.htm
It crashes for me within 10 seconds normally after the iframe has disappeared.

I see this assertion in my debug build:
###!!! ASSERTION: dangling entry->mJSObj JSPrivate because we can't find cx: 'Er
ror', file c:/mozilla/mozilla/modules/plugin/base/src/nsJSNPRuntime.cpp, line 15
94

The iframe content is basically this:
<script>
function doe(i){
document.embeds[0].removeAttribute('style');
}
setTimeout(doe,200,0);
</script>
<embed type="application/x-shockwave-flash" style=" float: right;">
(Assignee)

Comment 41

11 years ago
jst, would you mind looking at the above?  I'm not really familiar with the plugin code.

(Note that it crashes in cycle collection.)
Created attachment 274244 [details] [diff] [review]
Fix for Martijn's crash.

The reasons for the crash Martijn was seeing was that now the plugin destruction can be delayed long enough for the JS context not being reachable from the plugins npp by the time the plugin is destroyed. That led to us not being able to null out the plugin's NPObject wrappers private data, which then during GC caused a crash due to us touching the deleted NPObject. This patch fixes that.
Attachment #274244 - Flags: superreview?(jonas)
Attachment #274244 - Flags: review?(sharparrow1)
Attachment #274244 - Flags: superreview?(jonas) → superreview+
(Assignee)

Comment 43

11 years ago
Comment on attachment 274244 [details] [diff] [review]
Fix for Martijn's crash.

Redirecting review request to someone who knows the plugin code.
Attachment #274244 - Flags: review?(sharparrow1) → review?(brendan)
Comment on attachment 274244 [details] [diff] [review]
Fix for Martijn's crash.

Redirecting to jst.

/be
Attachment #274244 - Flags: review?(brendan) → review?(jst)
(Reporter)

Comment 45

11 years ago
(In reply to comment #44)
> (From update of attachment 274244 [details] [diff] [review])
> Redirecting to jst.

Since Johnny wrote the patch himself, I'm quite sure he'll approve it ;)

(In reply to comment #45)
> (In reply to comment #44)
> > (From update of attachment 274244 [details] [diff] [review] [details])
> > Redirecting to jst.
> 
> Since Johnny wrote the patch himself, I'm quite sure he'll approve it ;)

Suits me! ;-)

Ok, I missed that. Looking now.

/be
Comment on attachment 274244 [details] [diff] [review]
Fix for Martijn's crash.

>+  NppAndCx *nppcx = static_cast<NppAndCx *>(arg);

I'm not a C++ purist -- I don't even play one on TV -- but isn't reinterpret_cast more the "right thing" here?

>+  JSContext *cx = GetJSContext(npp);
>+  if (!cx) {
>+    nsCOMPtr<nsIThreadJSContextStack> stack =
>+      do_QueryInterface(sContextStack);
>+
>+    if (!stack) {
>+      stack = do_GetService("@mozilla.org/js/xpc/ContextStack;1");
>+      if (!stack) {
>+        // Nothing we can do if we get here.
>+        return;
>+      }
>+    }
>+
>+    stack->GetSafeJSContext(&cx);
>+    if (!cx) {
>+      // Nothing we can do if we get here.
>+    }
>+  }
>+
>   if (sNPObjWrappers.ops) {
>+    NppAndCx nppcx = { npp, cx };
>     PL_DHashTableEnumerate(&sNPObjWrappers,
>-                           NPObjWrapperPluginDestroyedCallback, npp);
>+                           NPObjWrapperPluginDestroyedCallback, &nppcx);

So is there a possibility that we'll have objects alive in the runtime with dangling private data pointers, but no ability to find a cx to use here? If so, and if the case does not involve running from the last GC when control flow reaches here, then I don't see how that bad case could arise. But it seems we could just always use the safe context, and shrink this code.

(Actually, we could also violate the JS API and use LOCKED_OBJ_SET_SLOT, which requires no cx at all.)

Thoughts? r=me in any event, but if you can shrink code please do.

/be
Attachment #274244 - Flags: review?(jst) → review+
(In reply to comment #47)
> (From update of attachment 274244 [details] [diff] [review])
> >+  NppAndCx *nppcx = static_cast<NppAndCx *>(arg);
> 
> I'm not a C++ purist -- I don't even play one on TV -- but isn't
> reinterpret_cast more the "right thing" here?

Sure.

> So is there a possibility that we'll have objects alive in the runtime with
> dangling private data pointers, but no ability to find a cx to use here? If so,

Well, yes, we can run into a situation where we're unable to find the cx associated with the plugin due to its document already having been torn down (especially with the other patch in this bug applied).

> and if the case does not involve running from the last GC when control flow
> reaches here, then I don't see how that bad case could arise. But it seems we
> could just always use the safe context, and shrink this code.

This code rarely (if ever) runs from GC. But yeah, I could always use the safe context here (but the code doesn't really shrink much if I do).

Updated patch to that effect coming up.
Created attachment 274700 [details] [diff] [review]
Updated fix for Martijn's crash.
Attachment #274244 - Attachment is obsolete: true
Attachment #274700 - Flags: superreview+
Attachment #274700 - Flags: review+
(Reporter)

Comment 50

11 years ago
The patches in this bug can land, not? Or should I do some more testing before landing?
(Assignee)

Comment 51

11 years ago
I landed "Updated fix for Martijn's crash"; it apparently caused Bug 391182.
(Reporter)

Comment 52

11 years ago
Ok, "Updated fix for Martijn's crash" landed at 2007-08-06 10:19.
Shouldn't https://bugzilla.mozilla.org/attachment.cgi?id=272834 ("Fixed patch") also land at some point?
(Assignee)

Comment 53

11 years ago
Well, I don't know how important the assertion in bug 391182 is; if it's important enough to get this fix backed out, we obviously don't want to land the other part.
Eli, I'm fixing bug 391182, it's a bogus assertion after the fix you landed that I didn't catch in my testing. Please go ahead and land the fix for this bug.
(Assignee)

Comment 55

11 years ago
Checked in.
Status: ASSIGNED → RESOLVED
Last Resolved: 11 years ago11 years ago
Resolution: --- → FIXED
Group: security
Flags: in-testsuite?
(Reporter)

Comment 56

11 years ago
Thanks to all people who helped fixing this!

Verified fixed, using:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a8pre) Gecko/2007082111 Minefield/3.0a8pre
Status: RESOLVED → VERIFIED

Updated

11 years ago
Depends on: 401393
Depends on: 430219
(Reporter)

Updated

10 years ago
Depends on: 434593
You need to log in before you can comment on or make changes to this bug.