Closed Bug 420886 Opened 16 years ago Closed 16 years ago

Crash when reloading page with RealPlayer plugin that has modal child window

Categories

(Core Graveyard :: Plug-ins, defect, P1)

x86
Windows XP
defect

Tracking

(Not tracked)

RESOLVED FIXED
mozilla1.9beta5

People

(Reporter: MatsPalmgren_bugz, Assigned: MatsPalmgren_bugz)

References

()

Details

(Keywords: crash, regression, testcase)

Attachments

(2 files, 2 obsolete files)

STEPS TO REPRODUCE
1. load URL (testcase 2 from bug 420148)
2. When clicking on the 'i' in the RealPlayer plugin, Mozilla crashes within
   100ms after that action.

Firefox trunk on Windows XP (probably on other platforms as well)

Regression from bug 389931 simply because before that we didn't process
any thread events from nested native event loops so the reload didn't
occur until the modal window is dismissed.
Attached file stack
A breakpoint at PLUG_DeletePluginNativeWindow results in this stack,
which looks like the plugin instance being deleted from the plugin's
native event loop.  It crashes when it unwinds to pngu3267.dll!60f7bd18().
This bug looks like the same problem as in bug 410946.
Still occurs in current Windows nightly 2008030405.
Flags: blocking1.9?
(In reply to comment #2)
> This bug looks like the same problem as in bug 410946.

Yes, more precisely, it would be bug 420884: hang there, crash here...
Marking dependent until we know if it's a duplicate.
Depends on: 420884
This is unrelated to bug 420884, that's a trivial hang due to an infinite loop in recently landed code, this is a crash due to a native window going away while a modal dialog pareted at that window is still alive.

Marking this a P1 blocker, but I don't really see how this is fixable in our code w/o backing out bug 389931. If there's a way for us to stick some kind of guard object on the stack that would stay on the stack while the modal dialog is open, we could maybe delay the destruction of the plugin window (note that this is unrelated to the guard objects introduced in bug 410946 which delay teardown of the plugin, not its window), i.e. in PluginWndProc(), but looking at the stack in this bug it doesn't seem like that's the case here :(
Assignee: nobody → mats.palmgren
No longer depends on: 420884
Flags: blocking1.9? → blocking1.9+
Priority: -- → P1
(In reply to comment #4)
> This is unrelated to bug 420884

My bad: I acted too quickly :-(

> I don't really see how this is fixable in our code w/o backing out bug 389931.

(Backout which would get many "ThreadManager" regressions back ... and require backing out dependent patches since then :-<)
Yeah, this doesn't look pretty no matter how you look at it.
Before the patch in bug 389931 went in, the modal dialog of RealPlayer stopped executing stuff in Firefox just like an alert dialog. Isn't there a way to make the RealPlayer modal dialog to act like an alert dialog?
alert's don't stop network activity, i.e. location="..."; alert(); will load "..." while the alert() is open.
One thing we *might* be able to hack up here is to make nsThread expose a message processing nesting depth or somesuch, and make nsStopPluginRunnable::Run() re-dispatch itself if the depth is greater than 1, or something. We might peg the CPU if we hit this case, but we should stop that once the modal window is dismissed. Then again, if might never be dismissed since we hide the plugin native window when we do the reparenting of it prior to doing the delayed destroying of the widget, which we need to do to prevent the plugin from paining elsewhere, likely outside the firefox window (due to the reparenting, which changes its coordinate base etc etc).
if venkman's around, a depth can easily be >1 assuming things still work as they have in the past...
How are we doing here?  Making progress, Mats?
(Just in case: Did bug 420148 improve/fix this too ?)
I have something in my tree that appears to fix this...  I bumped
into some other bugs while working on it so I'm going to wait
for bug 422926 to see what that brings...
Depends on: 422926
Target Milestone: --- → mozilla1.9beta5
(In reply to comment #13)
> I have something in my tree that appears to fix this...  I bumped
> into some other bugs while working on it so I'm going to wait
> for bug 422926 to see what that brings...
> 

Can we get that posted here?
Attached patch wip8b (obsolete) — Splinter Review
This patch is the starting point, it roughly implements what Johnny
suggested above.  I track the nesting level in the appshell.
Also, as the testcase for this bug shows, the reload is processed
from the nested native event loop and the new plugin startup is an
async event in itself (nsAsyncInstantiateEvent), so one needs to allow
the target nesting level to "sink" as well.  The patch fixes the crash
for the given STR, however it doesn't fix these additional steps:
(1 and 2 as above)
3. visit any other page (this puts the plugin page in bfcache)
4. Quit  ==> crash stack: attachment 311013 [details]
The damage calculations and invalidation stuff seems unnecessary
since pres shell is frozen at this point (filed bug 424395), but
fixing that only moves the crash slightly later...
Attached patch wip8g (obsolete) — Splinter Review
So, I fixed the remaining crash by allowing calls through StopPlugin()
to take the delay path as well, with the exception of the
"doCallSetWindowAfterDestroy" case which ignores the requested delay
so we need to avoid the disown/reparent stuff on the widget...

Test builds are available here:
https://build.mozilla.org/tryserver-builds/2008-03-21_08:11-mpalmgren@mozilla.com-420886_wip8g/
Attachment #311017 - Flags: superreview?(jst)
Attachment #311017 - Flags: review?(jst)
Mats, have you tested that this doesn't regress bug 422926 (and thus also bug 423260 and bug 423180)? At least run this through the first one, seems like this essentially reverts the change (while also adding the timer and nesting checking code etc) that fixed that bug, IIRC.
(In reply to comment #17)
> Mats, have you tested that this doesn't regress bug 422926 (and thus
> also bug 423260 and bug 423180)?

Yes, I tested those, on Windows, Linux and OSX.  I also tested most
other URLs that caused problems since bug 389931, plus a bunch of Java
applets etc, on all three platforms.  Everything works fine AFAICT.

> At least run this through the first one, seems like
> this essentially reverts the change that fixed that bug, IIRC.

Let's analyze what the patch in bug 422926 does:
https://bugzilla.mozilla.org/attachment.cgi?id=310374
The first few hunks are a NOP.  The last hunk is a NOP if 'aDelayedStop'
is false.  There are no other calls to DoStopPlugin() other than these
two.  So, the only change is for "StopPluginInternal(PR_TRUE)" and
there is only one such caller: nsObjectFrame::Destroy().

My patch does not change that path (other than the timer thing).
It makes nsObjectFrame::StopPlugin() take the same path (with the one
exception noted above).  FWIW, the remaining last call site to 
StopPluginInternal() is PrepareInstanceOwner() which calls it with
an unconditional PR_FALSE and thus should not be affected by this patch.
Hmm, I see now that bug 422926 affects "StopPluginInternal(PR_FALSE)" too.
It seems the call that was mostly affected by that change is the one
from PrepareInstanceOwner(), which this patch does not change.
Comment on attachment 311017 [details] [diff] [review]
wip8g

+  virtual ~nsStopPluginRunnable()
+  {
+   if (mTimer)
+     mTimer->Cancel();
   }

We can't get here if mTimer non-null, or if we do, we failed to initialize it so it never fired. Assert here rather than check and cancel.

- In nsStopPluginRunnable::Run():

+      nsresult rv;
+      if (!mTimer) {
+        mTimer = do_CreateInstance("@mozilla.org/timer;1", &rv);
+        NS_ENSURE_SUCCESS(rv, rv);
+        this->AddRef();
+      }
+      rv = mTimer->InitWithCallback(this, 3000, nsITimer::TYPE_ONE_SHOT);
+      if (NS_SUCCEEDED(rv)) {
+        return rv;
+      }
+      NS_ERROR("Failed to setup a timer to stop the plugin later (at a safe "
+               "time). Stopping the plugin now, this might crash.");
+    }

Seems like we should do this if we fail to create the timer too? And no need to do manual reference counting of |this| here, the timer code will hold |this| alive as long as there's a pending timer here. You would however need to hold a strong reference to |this| if you're re-initializing an existing timer as nsTimerImpl::InitWithCallback() releases the callback before it addrefs the new one, but that way you wouldn't need any manual reference counting at all.

+  if (mTimer) {
+    mTimer = nsnull;
+    this->Release();
+  }

And then this could just unconditionally assign mTimer to null.

- In nsObjectFrame::StopPlugin()

+#ifdef XP_WIN
+  PRBool delayedStop = PR_TRUE;
+  nsCOMPtr<nsIPluginInstance> inst;
+  if (mInstanceOwner)
+    mInstanceOwner->GetInstance(*getter_AddRefs(inst));
+  if (inst) {
+    PRBool doCache = PR_TRUE;
+    inst->GetValue(nsPluginInstanceVariable_DoCacheBool, (void *)&doCache);
+    if (!doCache) {
+      PRBool doCallSetWindowAfterDestroy = PR_FALSE;
+      inst->GetValue(nsPluginInstanceVariable_CallSetWindowAfterDestroyBool, 
+                     (void *)&doCallSetWindowAfterDestroy);
+      if (doCallSetWindowAfterDestroy) {
+        // Because DoStopPlugin ignores its 'aDelayedStop' arg in this case.
+        delayedStop = PR_FALSE;
+      }
+    }

Please add a comment here to clean this up once there are no other plugin instances than ns4xPluginInstance, per the XXXjst comment around the similar code in DoStopPlugin.

r+sr=jst with that.
Attachment #311017 - Flags: superreview?(jst)
Attachment #311017 - Flags: superreview+
Attachment #311017 - Flags: review?(jst)
Attachment #311017 - Flags: review+
Comment on attachment 311017 [details] [diff] [review]
wip8g

a1.9b5=beltzner with jst's comments addressed
Attachment #311017 - Flags: approval1.9b5+
We're holding beta for this fix: any ETA on when you'll be able to get to it, Mats?
Oh, I have a fix ready for checkin but Tinderboxes are orange so I'm not
allowed to checkin according to the rules for qm-win2k3-01...
Are you saying I can ignore that?
(In reply to comment #23)
> Oh, I have a fix ready for checkin but Tinderboxes are orange so I'm not
> allowed to checkin according to the rules for qm-win2k3-01...
> Are you saying I can ignore that?

Yes, because I don't believe that orange is real because I backed out the patch that caused the reftests to fail. Please check-in.
Fixed nits according to comment 20, except the request to assert on !mTimer
in the destructor - it fired when the object was released from
TimerThread::Shutdown(), so I just skipped the destructor...
Attachment #311016 - Attachment is obsolete: true
Attachment #311017 - Attachment is obsolete: true
mozilla/widget/src/xpwidgets/nsBaseAppShell.cpp 	1.10
mozilla/widget/src/xpwidgets/nsBaseAppShell.h 	1.10
mozilla/widget/public/nsIAppShell.idl 	1.13
mozilla/layout/generic/nsObjectFrame.cpp 	1.645 

-> FIXED
Status: NEW → RESOLVED
Closed: 16 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Blocks: 425289
Depends on: 493803
Depends on: 616787
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: