Closed Bug 570249 Opened 13 years ago Closed 13 years ago

if firefox hang detector fires after/at the same time as flash slow script dialog browser hangs

Categories

(Core Graveyard :: Plug-ins, defect)

x86
Windows XP
defect
Not set
normal

Tracking

(blocking2.0 final+, blocking1.9.2 .14+, status1.9.2 .14-fixed, status1.9.1 unaffected)

RESOLVED FIXED
Tracking Status
blocking2.0 --- final+
blocking1.9.2 --- .14+
status1.9.2 --- .14-fixed
status1.9.1 --- unaffected

People

(Reporter: chofmann, Assigned: jimm)

References

()

Details

(Keywords: verified1.9.2, Whiteboard: [oopp-watchlist])

Attachments

(3 files, 2 obsolete files)

spin off of https://bugzilla.mozilla.org/show_bug.cgi?id=569493#c8

I set  dom.ipc.plugins.timeoutSecs=16
loaded the page at http://garlstedt.myminicity.es/
flash unresponsive dialog pops
then browser window pops back on top to inform that the plugin has crashed

at that point the browser becomes unresponsive (maybe due to a flash modal
dialog that can't be reached?)

the only solution at this point is to kill the browser with the task manager.

if we ever advance pref("dom.ipc.plugins.timeoutSecs", to something greater
than the flash slow script time out,  or the make a change on the flash side to
speed up the time out notification it will create a world of hurt for firefox
users.

if there is a way to communicate and coordinate these timeouts between browser and plugin that could avoid some serious hang problems.

the more simple solution would be to just add a comment in the source to make sure we don't ever advance pref("dom.ipc.plugins.timeoutSecs", to something greater.   adding some source comments on the flash side would also avoid the flash team also setting up the conditions for the problem, but we have less control over that.
than the flash slow script time out.
blocking1.9.2: --- → ?
blocking2.0: --- → ?
I also wonder:
- does the adobe reader and other plugins have something similar to the slow script detection and warning?
- and what the timeouts are on those plugins?
- are the warning dialogs that allow the user to control interrupting run-away scripts modal?
the test case above suggests a plugin app possibly popping a modal dialog that leaves firefox in hung state when the hang detector kicks in.

if that turns out to be the cause of the firefox hang, do we also need protection against user script plugin dialogs that might be modal as well?

http://www.acrobatusers.com/forums/aucbb/viewtopic.php?pid=68619 suggests it possible for an acrobat user script to pop a dialog and we might have problems with if that plugin script dialog pops, and then we get into a run away plugin situation.   this might be still another bug to file.
Depends on: 561075
(In reply to comment #2)
> the test case above suggests a plugin app possibly popping a modal dialog that
> leaves firefox in hung state when the hang detector kicks in.
> 
> if that turns out to be the cause of the firefox hang, do we also need
> protection against user script plugin dialogs that might be modal as well?
> 
> http://www.acrobatusers.com/forums/aucbb/viewtopic.php?pid=68619 suggests it
> possible for an acrobat user script to pop a dialog and we might have problems
> with if that plugin script dialog pops, and then we get into a run away plugin
> situation.   this might be still another bug to file.

I think the user script case is bug 561075
blocking1.9.2: ? → needed
Component: Plug-ins → IPC
QA Contact: plugins → ipc
No, that's the case where *Firefox* is presenting a dialog. We're talking about Flash user scripts here, which can present modal Flash dialogs (maybe).

In any case, this is plugins, not IPC!
Component: IPC → Plug-ins
QA Contact: ipc → plugins
Assignee: nobody → felipc
sounds like we are considering changing the hang time out value for a 3.6.6 release.

we need to make sure we test these cases with what ever value we set for what ever slowscript warnings that versions of flash, quicktime and silverlight set.
bsmedberg: is there anything more known here?  Did Adobe say anything?  I just tried to reproduce it with a test profile on the URL above, and it didn't happen (I was able to reload fine), but I may not be doing it properly.  What version of Flash were you using, chofmann?
I think it would be considered a bug if the hang detector kicked in after Flash presented the modal dialog (modal dialogs are supposed to kill the hang detector), although there might be a little race condition if the timeout for the two is roughly identical. In any case, I don't think this should hold up 3.6.6: none of the common issues from the support forums involve the Flash slow-script dialog.
I'm sure it would be considered a bug: do we know why the hang detector is triggering in this case?  Juan is seeing this with a 45s plugin timeout, after saying "yes, continue" to the Flash slow-script dialog.
the windows vista vm  I orginally tested this on is trashed so I can't retest on that.

I just tried this on the home pc with Build identifier: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2.4) Gecko/20100611 Firefox/3.6.4 ( .NET CLR 3.5.30729; .NET4.0C)

and it seems I periodically see the problem even on what we are shipping with 3.6.4.  

some times the slow script dialog seems to be firing before the hang detector kicks in even when  dom.ipc.plugins.timeoutSecs=10 on this HP laptop I'm testing on.
    File: NPSWF32.dll    Version: 10.0.32.18    Shockwave Flash 10.0 r32
is the flash version for comment 9
in the first 10 times running the test url on the laptop in comment 9 with 3.6.4 I had to kill the browser about 50% of the time.  In the second 10 times I only had to kill it twice,  so it seems pretty non-deterministic about when flash will fire the slow script dialog ahead of the hang detector.  Maybe server loading or server communications comes into play?
when the hang detector does fire before the slowscript warning here are examples of the report pairs I see.

Report ID
        	Date Submitted
bp-cb2e1525-65ef-4812-8c17-77a942100626	6/26/2010	10:39 AM
10519c6f-6fb8-4a4a-a1d8-0e0e471f68f4	6/26/2010	10:39 AM
54d481e9-d1ec-431c-9209-9ffd61d93039	6/26/2010	10:38 AM
bf204ad2-5edc-48ff-b86d-7ae68546ad97	6/26/2010	10:38 AM
65f6d09d-94a7-4e95-bb12-a0ec2f5bd1d0	6/26/2010	10:37 AM
9802411a-9916-45e1-98ea-a626a36f299a	6/26/2010	10:37 AM
4418e089-9260-492e-baad-25675561ef16	6/26/2010	10:37 AM
de7e2bbc-9665-4fea-a7df-4b3a6f6f232f	6/26/2010	10:37 AM
bp-c6ef7508-ae59-4823-8225-4d68e2100626	6/26/2010	10:37 AM
bp-b1cf5316-946a-478c-ba6a-daff22100626	6/26/2010	10:37 AM
137a00b3-dd3a-455c-8484-1f0155c0df37	6/26/2010	10:36 AM
8c1e9204-916c-435b-9f4c-921bb4a4b687	6/26/2010	10:36 AM
bp-052c6f92-2d96-4b18-b513-ff1832100626	6/26/2010	10:36 AM
bp-8fde22f5-b304-4667-9d16-a9ed82100626	6/26/2010	10:36 AM
1de1fbf4-3819-45df-865f-f47047d385d3	6/26/2010	10:36 AM
f23b68b6-407b-4b28-9b2c-0e605910fd92	6/26/2010	10:36 AM
bp-9ac9fede-09bb-4b33-94e3-b7c802100626	6/26/2010	10:35 AM
bp-bf1a64ed-38c6-4ec6-9b39-928112100626	6/26/2010	10:35 AM
bp-5dbd2b4f-4ed5-400c-9dca-560182100626	6/26/2010	10:35 AM
bp-d3b49ff8-0e81-45cc-aace-f9a282100626	6/26/2010	10:35 AM
bp-3753628c-71e3-45bf-8bd6-3b0312100626	6/26/2010	10:35 AM
bp-6c116f64-fc65-43db-82e7-9a62b2100626	6/26/2010	10:35 AM
11880c7f-115d-4837-84ca-cfd893d768c9	6/26/2010	10:32 AM
d81a9b67-4907-449a-a6fc-2d51e8e6b6eb	6/26/2010	10:32 AM
23e3019b-1185-4d03-ad3c-c7149af9a6ac	6/26/2010	10:17 AM
9cdda3ca-22f2-412e-8db5-89435378c8ef	6/26/2010	10:17 AM
bp-131e5d3a-1866-4bc0-861a-4de622100626	6/26/2010	10:15 AM
bp-ce73f234-e7fa-42ab-a03e-0e7c82100626	6/26/2010	10:15 AM
bp-25f66bdb-d081-4596-80f1-e168d2100626	6/26/2010	10:14 AM
bp-25f72456-cc07-4480-a239-008d52100626	6/26/2010	10:14 AM
92453fc6-9ba7-4cde-99d8-fab564076fbd	6/26/2010	10:12 AM
cbb300b2-e7b3-4649-a2da-6194207693a5	6/26/2010	10:12 AM
Please: the hang detector reports are irrelevant to this bug, because they are intentional.

I have this in recording, though I'm having trouble breaking at the relevant spot.
What happens when you disable the hang detector entirely, by setting the pref to -1? I'm getting a behavior where the Flash slow-script dialog shows up but is unresponsive and you can't do anything.
In this case, it looks like:

* Flash is chugging along doing it's own thing (in this case, processing asynchronously-sent stream data). Note that it is *not* currently responsing to any RPC call from the browser, even though one is pending. During this processing, it pops up a modal dialog using MessageBox (the flash slow-script dialog).
* Firefox has sent an RPC call to the plugin, which it has not yet processed (in this case, a paint request).

The normal "detect a nested event loop" logic is not in play because that is only used when we are inside an RPC call, not when the RPC call is pending. We normally assume that nested loops not in the RPC frame will process all pending events, so the pending RPC call would be processed within the loop. Apparently Windows filters out messages so this doesn't happen.

So far, so good: I'd be fine with killing the plugin in this case; but it doesn't explain why the message box doesn't receive input. I think this is probably because it is parented in a strange way: jmathies, don't we have a special HWND we use for windowless drawing which isn't hooked up to the normal widget tree? Is it possible that the message box is being parented to this window and therefore isn't getting focus?
Spy++ window tree:

	Window 000309A6 "Adobe Flash Player 10" #32770 (Dialog)
		Window 0007090E "&Yes" Button
		Window 000B09F0 "&No" Button
		Window 002E0362 "" Static
		Window 002A024E "A script in this movie is causing Adobe Flash Player 10 to run slowly.  If it continues to run, your computer may become unresponsive.  Do you want to abort the script?" Static


	Window 000209AE "garlstedt - MyMiniCity - Minefield" MozillaUIWindowClass
		Window 000E0634 "" MozillaContentWindowClass
			Window 000608FE "" MozillaWindowClass
				Window 00050A42 "" MozillaWindowClass
					Window 000908A8 "" GeckoPluginWindow
				Window 00050A44 "" MozillaWindowClass
					Window 00050A3E "" GeckoPluginWindow
				Window 00090980 "" Static

The MessageBox call hWnd being passed in is 000908A8, but as you can see it's not actually parenting to that window.
Actually, and even more interesting: in the Spy properties, the parent and owner of 000309A6 (the modal dialog) are both listed as 000209AE, which is a window for another process. Seems quite likely that would screw things up!
Attachment #454287 - Attachment is patch: false
Hey Benjamin: any sense continuing to hold Firefox 3.6.6 for this?
I did not think that you were holding it for this...
(In reply to comment #14)
I can't repro this behavior.  The slow script dialog pops up and the browser is intermittently unresponsive but you can still click the dialog to kill it and you can still kill plugin-container entirely.
Also, as an aside, when you set the two close to each other, if the Flash dialog fires and then the hang protection kicks in, you get really stuck (killing plugin-container.exe doesn't help acutally since it's already kicked in).  However in 3.6.6, we're setting it to 45 seconds, giving users more time to click one of the two buttons in the flash slow script dialog.  Also we seem to have killed the site.
Whiteboard: [oopp-watchlist]
Sorry about the duplicate, wasn't sure if this was exactly the same bug. This might help people reproduce...

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.6)
Gecko/20100625 Firefox/3.6.6 (.NET CLR 3.5.30729)
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.6)
Gecko/20100625 Firefox/3.6.6 (.NET CLR 3.5.30729)

Since plugin crash detection was added to Firefox I have been experiencing the
following issue when using the debug version of Flash (required for my job):

If Flash pops up a dialog (i.e. a warning or a notice about a crash) and I take
too long to dismiss it (say because I am reading the notice), this makes
Firefox think that the plugin has crashed. At that point the page switches to a
sad face LEGO piece, but the browser itself becomes completely unresponsive to
clicks or keyboard events. It seems to think there is still a modal dialog box
in front of it, but it is in the foreground. At that point the only recourse is
to kill Firefox and start it again.

I don't mind so much that the crash detection is misfiring, but I would really
like if it didn't take the whole browser down with it.

Reproducible: Always

Steps to Reproduce:
1. Install debug version of Flash player plugin
2. Visit a website that triggers an alert dialog from Flash
3. Wait until Firefox decides that the plugin has crashed
Actual Results:  
Firefox hangs and will not respond to clicks or keyboard activity.

Expected Results:  
Firefox works as expected and it's possible to close or reload the "crashed"
tab.
Jim, this is the one that I'm worried about. See also bug 561075, which has similar symptoms, but Firefox is showing the dialog, not Flash.
Assignee: felipc → jmathies
Looks like spin loop didn't trigger in the parent - it's stuck in WaitForNotify while the child is in a message box procedure.
Which would make this very similar to bug 555699 (java signature verifier dialog).
(In reply to comment #26)
> Which would make this very similar to bug 555699 (java signature verifier
> dialog).

(In reply to comment #26)
> Which would make this very similar to bug 555699 (java signature verifier
> dialog).

Slight difference though - here the browser creates the window while the child waits for an rpc response. In the java security warning case java is creating the window, and the browser is waiting on an rpc response.
Also, according to felipe's testing(In reply to comment #27)
> (In reply to comment #26)
> > Which would make this very similar to bug 555699 (java signature verifier
> > dialog).
> 
> (In reply to comment #26)
> > Which would make this very similar to bug 555699 (java signature verifier
> > dialog).
> 
> Slight difference though - here the browser creates the window while the child
> waits for an rpc response. In the java security warning case java is creating
> the window, and the browser is waiting on an rpc response.

Oh woops! I was thinking this was bug 561075. This is the same, but in this case it's a message box, so I'm sure our hook picks that up. This bug the bigger mystery.
blocking2.0: ? → final+
Attached patch patch v.1 (obsolete) — Splinter Review
This addresses the browser and dialog hang.  Although it seems flash tears some stuff down when you select "yes" on the dialog, which causes the container to crash. Probably a separate bug / patch, but I'll see if I can track it down. It has something to do with a stream that's been destroyed getting a delayed task message.
Attachment #459122 - Flags: review?(benjamin)
Also, oddly enough, clicking "yes" will reliably reproduce bug 545892.
Comment on attachment 459122 [details] [diff] [review]
patch v.1

As noted on IRC, I don't think this is the right solution, for several reasons:

1) It only catches the case where the Flash hang happens during stream processing. If the Flash hang happens while it's processing an OS event (for a windowed plugin), we won't catch it. We probably need to integrate the nested event loop detection directly into the chromium event loop.

2) EnteredCall has a fairly strict invariant that there is actually an RPC call pending on the parent. In the current patch there is no guarantee that there would actually be an RPC call pending (the common case is that there isn't a call pending). I'm pretty sure this should/will hit the assertion at http://mxr.mozilla.org/mozilla-central/source/ipc/glue/WindowsMessageLoop.cpp#610 when there is no RPC call pending.
Attachment #459122 - Flags: review?(benjamin) → review-
Attached patch patch v.2 (obsolete) — Splinter Review
Kinda hackish, but it works. I put this in the host post a complete shutdown of the plugin module, which may be overkill.  But I figured it'd probably be better to do this after every instance is destroyed.
Attachment #460617 - Flags: review?(benjamin)
There might be a slightly better version of this if I can retrieve a browser hwnd or widget interface of the window the frozen plugin instance sits from within PluginModuleParent. Will do a little more research looking for that.
Comment on attachment 460617 [details] [diff] [review]
patch v.2

jimm says this is too broad a brush on irc
Attachment #460617 - Flags: review?(benjamin) → review-
Attachment #459122 - Attachment is obsolete: true
Attachment #460617 - Attachment is obsolete: true
Attached patch patchSplinter Review
This is safer in that the code check to make sure the disabled top level window isn't displaying any dialogs. In which case it's normal for the window to be in that state.
Attachment #480164 - Flags: review?(benjamin)
Attachment #480164 - Flags: review?(benjamin) → review+
This is for all plugins right? 

I think just had something that sounds the same, but with Silverlight - Silverlight shows a dialog, Firefox thinks the plugin has hung, kills the plugin process, then stops accepting mouse input. (http://crash-stats.mozilla.com/report/index/b7774ab7-629f-4088-9169-6a9db2101105 )

Bug 555699 is different because that's related to the fact that java uses a third process to display dialogs.
(In reply to comment #37)
> This is for all plugins right? 
> 

yes.

http://hg.mozilla.org/mozilla-central/rev/93b2997b8785
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
We'd like to take this on branch. Can we get a branch patch? Do you think this is appropriate to take?
blocking1.9.2: needed → .14+
(In reply to comment #39)
> We'd like to take this on branch. Can we get a branch patch? Do you think this
> is appropriate to take?

Should be pretty innocuous. I'll work up a branch patch.
I've confirmed this fixes the issue on 192.
Attachment #504781 - Flags: approval1.9.2.14?
Comment on attachment 504781 [details] [diff] [review]
192 flash dialog hang patch

a=LegNeato for 1.9.2.14
Attachment #504781 - Flags: approval1.9.2.14? → approval1.9.2.14+
Verified fixed for 1.9.2 using steps in comment 0. Verified problem with 1.9.2.13 on XP and fix in nightly build (Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.14pre) Gecko/20110120 Namoroka/3.6.14pre ( .NET CLR 3.5.30729)).
Keywords: verified1.9.2
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.