Closed Bug 1133351 Opened 9 years ago Closed 9 years ago

Flash hangs on right-click on arbitrary SWFs

Categories

(Core Graveyard :: Plug-ins, defect)

37 Branch
All
Windows 8.1
defect
Not set
normal

Tracking

(firefox38 wontfix, firefox38.0.5 wontfix, firefox39+ verified, firefox40 verified, firefox-esr31 unaffected, firefox-esr3840+ verified)

VERIFIED FIXED
mozilla40
Tracking Status
firefox38 --- wontfix
firefox38.0.5 --- wontfix
firefox39 + verified
firefox40 --- verified
firefox-esr31 --- unaffected
firefox-esr38 40+ verified

People

(Reporter: till, Assigned: bugzilla)

References

Details

User Story

STR:

* Using Windows 8.1
* Open C:\Program Files\Common Files\microsoft shared\ink\tabtip.exe
* Open a page containing windowless Flash: e.g. http://benjamin.smedbergs.us/tests/ctptests/flash-solo-wless.html
* Right-click the Flash element

Actual:
Flash deadlocks for 45 seconds

Attachments

(3 files, 3 obsolete files)

I can produce immediate plugin hangs when clicking on all and any embedded SWFs by simply right-clicking on them in current beta:

https://crash-stats.mozilla.com/report/index/e3850bd6-ed92-4f47-b5fd-013792150215
https://crash-stats.mozilla.com/report/index/11d1c79a-aa08-4ba6-a9b8-746442150215
https://crash-stats.mozilla.com/report/index/e39ef536-a518-4de1-ba4f-7839e2150215


With beta 8 that had the sandbox experiment active, the hang only occurred on about 25% of all clicks:

https://crash-stats.mozilla.com/report/index/da206d69-ea07-4c75-b458-8e8942150215


In current Developer Edition, the hang also occurred, but only after 50 or so clicks:

https://crash-stats.mozilla.com/report/index/41ddf124-5570-40a0-9230-723272150215

I took a full memory dump of a hang in beta 9 with Process Explorer:
https://www.dropbox.com/s/3236hlwum6esdlu/flash-hang-dump-2015-02-15.zip?dl=0

Note that CPU usage of the plugin-container.exe process was closer to or at times above 1. I wasn't able to contact bsmedberg for now, so I'm still posting the bug.
Attached file about:support content
Does this happen on both windowed and windowless SWFs, or just one kind?

This is definitely accessibility-related: CARET::UpdateMSAAEditFieldState ends up calling into COM which is running a modal loop and blocking up the works. Do you know of utilities or laptop drivers that might be using the accessibility API?
Can you run a debug build and see if there is any interesting spew about "received 'nonqueued'" message and if so what the message ID is?
Flags: needinfo?(till)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #3)
> Does this happen on both windowed and windowless SWFs, or just one kind?

It only happens on windowless SWFs. All wmodes except for "window" seem to be affected.

> Can you run a debug build and see if there is any interesting spew about
> "received 'nonqueued'" message and if so what the message ID is?

I didn't get a message about receiving nonqueued messages, but a failed assertion about sending one:

[15404] ###!!! ASSERTION: Failed to prevent a nonqueued message from running!: '!mozilla::ipc::MessageChannel::IsPumpingMessages()', file c:\builds\moz2_slave\m-beta-w32-d-00000000000000000\build\widget\windows\nsWindow.cpp, line 4252

I found out that terminating the TabTip.exe screen keyboard process makes the hang go away. Restarting it makes the hang reappear. So yes, it's accessibility related all right.

One thing that might or might not be interesting is that, with TabTip.exe disabled, the following message is spewed 4 times for each opening of the context menu:

[3108] WARNING: An unhandled ISMEX_SEND message was received during spin loop! (7F): file c:\builds\moz2_slave\m-beta-w32-d-00000000000000000\build\widget\windows\nsWindow.cpp, line 4242
Flags: needinfo?(till)
Can you break at the assertion and tell me what message code is running, and maybe what window the message is being delivered to?

0x7F is WM_GETICON, which is interesting. We probably need to add that to nsWindow::IsAsyncResponseEvent. But I don't think that has much to do with the hang.
On a hunch I just tried running TabTip.exe in my Windows 8.1 instance in VMWare, and sure enough, I can reproduce the hang.

AFAICT, that means two things: debugging this doesn't require buying a specific machine, and it probably affects each and every Windows machine with a touch display, because TabTip.exe is running by default on those.

Benjamin, do you want me to continue analyzing this, or do you have a Windows installation you can do it on directly?
Flags: needinfo?(benjamin)
I'll take this.
Assignee: nobody → benjamin
Flags: needinfo?(benjamin)
User Story: (updated)
Depends on: 1138078
According to the data collected in bug 1138078, this bug accounts for 2-3% of all Flash hangs.
Comment on attachment 8583381 [details] [diff] [review]
Make Windows IPC play nicely with COM STA marshaling

This is the patch that bent and I came up with. We decided to avoid going with CoWaitForMultipleHandles and instead add COM-aware smarts to our existing code. To make this work we need to find the hidden COM window for our thread, then pump any messages that are destined for that window.

We're a bit unsure as to the lifetime of this hidden window so we have some assertions in there that will fail if the window is destroyed.

We also intentionally included an unbalanced CoInitialize call in an effort to prevent the hidden COM window from being destroyed.
Attachment #8583381 - Flags: review?(benjamin)
Missed some file removals when I imported the patch. Here's the revised one.
Attachment #8583381 - Attachment is obsolete: true
Attachment #8583381 - Flags: review?(benjamin)
Attachment #8583584 - Flags: review?(benjamin)
Comment on attachment 8583584 [details] [diff] [review]
Make Windows IPC play nicely with COM STA marshaling (r2)

Review of attachment 8583584 [details] [diff] [review]:
-----------------------------------------------------------------

::: ipc/glue/WindowsMessageLoop.cpp
@@ +650,5 @@
>    // on startup.
>    if (!gUIThreadId) {
>      gUIThreadId = GetCurrentThreadId();
> +
> +    CoInitialize(nullptr);

Why do we need to trick COM into keeping itself alive longer than it normally would?

Unbalanced calls like this on the main thread are kinda gross.. at least add a comment warning people that we have unmatched coinit calls, and that this is a purposeful omission.
Comment on attachment 8583584 [details] [diff] [review]
Make Windows IPC play nicely with COM STA marshaling (r2)

Review of attachment 8583584 [details] [diff] [review]:
-----------------------------------------------------------------

::: ipc/glue/WindowsMessageLoop.cpp
@@ +650,5 @@
>    // on startup.
>    if (!gUIThreadId) {
>      gUIThreadId = GetCurrentThreadId();
> +
> +    CoInitialize(nullptr);

Our fear was that if the COM counter ever goes to 0 then Windows might try to kill the hidden window, and we don't want that hidden window to ever go away to avoid the cost of having to find it again every time we send an IPC message.
(In reply to Ben Turner <unresponsive until 3/30> (use the needinfo flag!) from comment #13)
> Our fear was that if the COM counter ever goes to 0 then Windows might try
> to kill the hidden window

Of course none of this is documented so we're just being paranoid.
I guess that one option is that we could make the COM HWND a member of MessageChannel and include balanced CoInitialize/CoUninitialize calls there... my concern in that case would be mainly that CoInitialize can be an expensive call to make.
We already have OleInitialize (which implies CoInitialized) at plugin startup, so I don't think there's anything necessary to do here:

http://mxr.mozilla.org/mozilla-central/source/dom/plugins/ipc/PluginProcessChild.cpp#98
This is needed in the parent process though.
Attachment #8583584 - Flags: review?(benjamin) → review+
This needs dedicated testing, both of the current tests as well as the old testcases that led to the original message filter, and around general usage of accessibility tools.
I'm going to have to back this out: our worst fears have come true by this assertion failure:
https://bugzilla.mozilla.org/show_bug.cgi?id=622272#c227

It looks like the HWND of the COM window changed :-(

https://hg.mozilla.org/integration/mozilla-inbound/rev/7bcbabfb63fe
https://hg.mozilla.org/mozilla-central/rev/0cc8abe4e2bb
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
Sorry, missed seeing the backout, and then for extra clownshoes hg's internal merge made lovely decisions while merging the no-backout m-c back to the backout m-i. Should be properly backed out everywhere now, I hope.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla39 → ---
https://hg.mozilla.org/mozilla-central/rev/7bcbabfb63fe
Status: REOPENED → RESOLVED
Closed: 9 years ago9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
(In reply to Phil Ringnalda (:philor) from comment #23)
> https://hg.mozilla.org/mozilla-central/rev/7bcbabfb63fe

Hm, this needs another backout now?
Flags: needinfo?(philringnalda)
(In reply to Aaron Klotz [:aklotz] (please use needinfo) from comment #20)
> It looks like the HWND of the COM window changed :-(

I guess we could try subclassing the COM window and watch for WM_DESTROY?
Poor worn out bug is going to wind up with whiplash. No, 7bcbabfb63fe was the backout, but since e9769c80ee59 was too, mcMerge somehow decided that 7bcbabfb63fe must not have been a backout despite saying it was, and called it a fix.
Status: RESOLVED → REOPENED
Flags: needinfo?(philringnalda)
Resolution: FIXED → ---
Target Milestone: mozilla39 → ---
(In reply to Ben Turner <unresponsive until 3/30> (use the needinfo flag!) from comment #25)
> (In reply to Aaron Klotz [:aklotz] (please use needinfo) from comment #20)
> > It looks like the HWND of the COM window changed :-(
> 
> I guess we could try subclassing the COM window and watch for WM_DESTROY?

I'd like to try monitoring the main thread for window creation and destruction using SetWinEventHook. That way we wouldn't need to subclass and we wouldn't need to mess around with CoInitialize/CoUninitialize. I'll throw together a patch and see if it works with message-only windows.
This patch applies on top of the first one. It uses SetWinEventHook to watch for window creation and destruction events and updates the COM HWND accordingly. We still need FindCOMWindow for the initial start up in case it has been created prior to setting the hook.
Attachment #8585651 - Flags: review?(bent.mozilla)
Fixed something I wasn't happy with.
Attachment #8585651 - Attachment is obsolete: true
Attachment #8585651 - Flags: review?(bent.mozilla)
Attachment #8585709 - Flags: review?(bent.mozilla)
Comment on attachment 8585709 [details] [diff] [review]
Part 2: Use SetWinEventHook to watch for the COM window (r2)

Review of attachment 8585709 [details] [diff] [review]:
-----------------------------------------------------------------

I can't believe this works!

Also, r+ :)

::: ipc/glue/WindowsMessageLoop.cpp
@@ +104,5 @@
>  HHOOK gDeferredGetMsgHook = nullptr;
>  HHOOK gDeferredCallWndProcHook = nullptr;
>  
>  DWORD gUIThreadId = 0;
>  HWND gCOMWindow = 0;

Nit: Rename to gOLEWindow maybe?

@@ +105,5 @@
>  HHOOK gDeferredCallWndProcHook = nullptr;
>  
>  DWORD gUIThreadId = 0;
>  HWND gCOMWindow = 0;
> +HWINEVENTHOOK gWinEventHook = nullptr;

Might want to comment that we intentionally never Unhook this, we just save the value to see if we need to install a hook or not.

@@ +106,5 @@
>  
>  DWORD gUIThreadId = 0;
>  HWND gCOMWindow = 0;
> +HWINEVENTHOOK gWinEventHook = nullptr;
> +const wchar_t OleWindowClassName[] = L"OleMainThreadWndClass";

Nit: s/OleWindowClassName/kOleWindowClassName/

@@ +116,5 @@
> +WinEventHook(HWINEVENTHOOK aWinEventHook, DWORD aEvent, HWND aHwnd,
> +             LONG aIdObject, LONG aIdChild, DWORD aEventThread,
> +             DWORD aMsEventTime)
> +{
> +  MOZ_ASSERT(aWinEventHook == gWinEventHook);

Also |MOZ_ASSERT(gUIThreadId == aEventThread)| too?

@@ +127,5 @@
> +      wchar_t classBuf[256] = {0};
> +      int result = ::GetClassNameW(aHwnd, classBuf,
> +                                   MOZ_ARRAY_LENGTH(classBuf));
> +      if (result != (MOZ_ARRAY_LENGTH(OleWindowClassName) - 1) ||
> +          wcsncmp(OleWindowClassName, classBuf, MOZ_ARRAY_LENGTH(classBuf))) {

The last param should probably be |result| I think, though it doesn't really matter I guess.

@@ +131,5 @@
> +          wcsncmp(OleWindowClassName, classBuf, MOZ_ARRAY_LENGTH(classBuf))) {
> +        // Not a class we're interested in
> +        return;
> +      }
> +      gCOMWindow = aHwnd;

Maybe |MOZ_ASSERT(FindCOMWindow() == aHwnd)| here?

@@ +136,5 @@
> +      break;
> +    }
> +    case EVENT_OBJECT_DESTROY: {
> +      if (aHwnd == gCOMWindow && aIdObject == OBJID_WINDOW &&
> +          aIdChild == CHILDID_SELF) {

Is the |aIdChild == CHILDID_SELF| condition something that could be asserted rather than checked in release builds? I hope these windows never get child windows...

@@ +712,5 @@
> +
> +  if (!gWinEventHook) {
> +    gWinEventHook = SetWinEventHook(EVENT_OBJECT_CREATE, EVENT_OBJECT_DESTROY,
> +                                    NULL, &WinEventHook, GetCurrentProcessId(),
> +                                    gUIThreadId, WINEVENT_OUTOFCONTEXT);

Hm, does it work if you pass 0 as the last param?
Attachment #8585709 - Flags: review?(bent.mozilla) → review+
(In reply to Aaron Klotz [:aklotz] (please use needinfo) from comment #15)
> I guess that one option is that we could make the COM HWND a member of
> MessageChannel and include balanced CoInitialize/CoUninitialize calls
> there... my concern in that case would be mainly that CoInitialize can be an
> expensive call to make.

If CoInit has been called before on the same thread I'm pretty sure it's basically a no-op. All it does is up a counter.

I wouldn't expect the com window to go away as long as we have an open ended CoInit. You could add one of these down in widget or similar (with a matching uninit).
(In reply to Ben Turner [:bent] (use the needinfo flag!) from comment #30)
> Comment on attachment 8585709 [details] [diff] [review]
> Part 2: Use SetWinEventHook to watch for the COM window (r2)
> 
> Review of attachment 8585709 [details] [diff] [review]:
> -----------------------------------------------------------------
> 

All comments fixed as requested except where noted.

> 
> ::: ipc/glue/WindowsMessageLoop.cpp
> @@ +104,5 @@
> >  HHOOK gDeferredGetMsgHook = nullptr;
> >  HHOOK gDeferredCallWndProcHook = nullptr;
> >  
> >  DWORD gUIThreadId = 0;
> >  HWND gCOMWindow = 0;
> 
> Nit: Rename to gOLEWindow maybe?

I thought about this for a bit and decided to standardize on "COM" in the variable/function names. All of our modifications in this bug now use "COM" for consistency.

> @@ +712,5 @@
> > +
> > +  if (!gWinEventHook) {
> > +    gWinEventHook = SetWinEventHook(EVENT_OBJECT_CREATE, EVENT_OBJECT_DESTROY,
> > +                                    NULL, &WinEventHook, GetCurrentProcessId(),
> > +                                    gUIThreadId, WINEVENT_OUTOFCONTEXT);
> 
> Hm, does it work if you pass 0 as the last param?

I looked at the header file for those flags, and it turns out that WINEVENT_OUTOFCONTEXT is #defined as 0 anyway.
(In reply to Jim Mathies [:jimm] from comment #31)
> (In reply to Aaron Klotz [:aklotz] (please use needinfo) from comment #15)
> > I guess that one option is that we could make the COM HWND a member of
> > MessageChannel and include balanced CoInitialize/CoUninitialize calls
> > there... my concern in that case would be mainly that CoInitialize can be an
> > expensive call to make.
> 
> If CoInit has been called before on the same thread I'm pretty sure it's
> basically a no-op. All it does is up a counter.

Our concern was that from the various invocations that we found in the tree, it was unclear that there would always be a previous CoInitialize.

> 
> I wouldn't expect the com window to go away as long as we have an open ended
> CoInit.

This turns out to be a false assumption: we added some assertions in the initial patch to catch this scenario and they fired during testing on inbound. Hence the backout and my follow-up patch.

> You could add one of these down in widget or similar (with a
> matching uninit).

My follow up patch eliminates the need to tangle with CoInitialize, so we should be OK now. Thanks for the suggestions.
Patch including review comments. Carrying forward r+.
Attachment #8585709 - Attachment is obsolete: true
Attachment #8586209 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/f4c346b5cf5a
https://hg.mozilla.org/mozilla-central/rev/829195e67537
Status: REOPENED → RESOLVED
Closed: 9 years ago9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
Depends on: 1151318
Flags: qe-verify+
Blocks: 1165189
Comment on attachment 8583584 [details] [diff] [review]
Make Windows IPC play nicely with COM STA marshaling (r2)

Approval Request Comment
[Feature/regressing bug #]: NPAPI plugins on windows
[User impact if declined]: Plugin hangs particularly with accessibility or screen keyboards
[Describe test coverage new/current, TreeHerder]: Plugin code coverage, already landed on aurora and central.
[Risks and why]: Low, this has been baking on 40 for quite a while and our plugin hang situation is very good there.
[String/UUID change made/needed]: None
Attachment #8583584 - Flags: approval-mozilla-beta?
Comment on attachment 8586209 [details] [diff] [review]
Part 2: Use SetWinEventHook to watch for the COM window (r3)

Approval Request Comment
[Feature/regressing bug #]: NPAPI plugins on windows
[User impact if declined]: Plugin hangs particularly with accessibility or screen keyboards
[Describe test coverage new/current, TreeHerder]: Plugin code coverage, already landed on aurora and central.
[Risks and why]: Low, this has been baking on 40 for quite a while and our plugin hang situation is very good there.
[String/UUID change made/needed]: None
Attachment #8586209 - Flags: approval-mozilla-beta?
Marking 38+ as affected since this looks like it's been a regression since beta 37.
Comment on attachment 8586209 [details] [diff] [review]
Part 2: Use SetWinEventHook to watch for the COM window (r3)

Approved for uplift to beta. This would be great to fix for 39!
Attachment #8586209 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment on attachment 8583584 [details] [diff] [review]
Make Windows IPC play nicely with COM STA marshaling (r2)

Approved for uplift to beta.
Attachment #8583584 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Verified as fixed using:

FF 39.0b6
Build Id: 20150615125213
OS: Win 8.1 x86
Flags: qe-verify+
[Tracking Requested - why for this release]: hang see 1165189, it is worth to fix in esr38
Any plans for uplifting these patches to ESR channel?
Flags: needinfo?(aklotz)
Possibly. I'd like to let everything sit on 39 release for a bit before we do so.
Flags: needinfo?(aklotz)
Hi, it has been 3 weeks- any plans for ESR?
Flags: needinfo?(aklotz)
Comment on attachment 8583584 [details] [diff] [review]
Make Windows IPC play nicely with COM STA marshaling (r2)

We can do that, but they'll need to land alongside bug 1151318. They come as a package deal. :-)

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:
Reduction in plugin (especially Flash) hangs.
User impact if declined:
Higher rate of flash hangs.
Fix Landed on Version:
Released on 39.
Risk to taking this patch (and alternatives if risky): 
Pretty minimal at this point. These patches have ridden the trains and have been doing well on release.
String or UUID changes made by this patch: None

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
Flags: needinfo?(aklotz)
Attachment #8583584 - Flags: approval-mozilla-esr38?
Comment on attachment 8586209 [details] [diff] [review]
Part 2: Use SetWinEventHook to watch for the COM window (r3)

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:
Reduction in plugin (especially Flash) hangs.
User impact if declined:
Higher rate of flash hangs.
Fix Landed on Version:
Released on 39.
Risk to taking this patch (and alternatives if risky): 
Pretty minimal at this point. These patches have ridden the trains and have been doing well on release.
String or UUID changes made by this patch: None

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
Attachment #8586209 - Flags: approval-mozilla-esr38?
Comment on attachment 8583584 [details] [diff] [review]
Make Windows IPC play nicely with COM STA marshaling (r2)

This is a pretty severe Flash regression. We'll take this in ESR 38.2.0 - the first release that will see automatic updates from ESR31. ESR38+

aklotz - Can you please write a draft release note for this change?
Flags: needinfo?(aklotz)
Attachment #8583584 - Flags: approval-mozilla-esr38? → approval-mozilla-esr38+
Attachment #8586209 - Flags: approval-mozilla-esr38? → approval-mozilla-esr38+
"Fixed an issue where Firefox would hang after right-clicking Flash content on Windows 8."
Flags: needinfo?(aklotz)
Thanks Aaron. I've edited as

Fixed: Firefox may become unresponsive after right-clicking Flash content on Windows 8
Flags: qe-verify+
Reproduced with specified steps in Firefox 38.0.5, on Windows 7 x64. The issue no longer reproduced using Firefox ESR 38.2.0 on Windows 7 x64, Windows 8.1 x64, and Windows 10 x64. Right-clicking the SWF, while tabtip was open, worked just fine.
Status: RESOLVED → VERIFIED
Depends on: 1254969
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: