Windows IPC message loop needs more frequent message pumping

RESOLVED FIXED in Firefox 41

Status

()

Core
IPC
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: aklotz, Assigned: aklotz)

Tracking

unspecified
mozilla42
Unspecified
Windows
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox41+ verified, firefox42 fixed, relnote-firefox 41+)

Details

Attachments

(1 attachment, 3 obsolete attachments)

Right now our Windows IPC loop only pumps messages when the loop is waiting for an incoming message. This is insufficient in certain circumstances. For example, if there are multiple async messages in the pending queue, we won't pump until the queue has been drained. We should also be pumping message in between consecutive event dispatches.
Created attachment 8637520 [details] [diff] [review]
Patch

https://treeherder.mozilla.org/#/jobs?repo=try&revision=775633997ffa
IPDL test suite looks good too.

Right now I've added some RAII wrappers around neutering. I think in the long term this code could use some cleanup and refactoring but I want to minimize its footprint right now.

Too bad bent isn't with us anymore to review this. Jim, are you comfortable reviewing Window neutering code?
Attachment #8637520 - Flags: review?(jmathies)
(Also I think this might help with some of your mysterious e10s hangs in user32 APIs)
Created attachment 8637522 [details] [diff] [review]
Patch

Whoops, uploaded the wrong file.
Attachment #8637520 - Attachment is obsolete: true
Attachment #8637520 - Flags: review?(jmathies)
Attachment #8637522 - Flags: review?(jmathies)

Comment 4

2 years ago
Comment on attachment 8637522 [details] [diff] [review]
Patch

Review of attachment 8637522 [details] [diff] [review]:
-----------------------------------------------------------------

::: ipc/glue/MessageChannel.cpp
@@ +1034,5 @@
>              return false;
>          }
>  
> +#ifdef OS_WIN
> +        neuteredRgn.PumpOnce();

comment me

::: ipc/glue/Neutering.h
@@ +4,5 @@
> + * License, v. 2.0. If a copy of the MPL was not distributed with this
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> +
> +#ifndef mozilla_ipc_Neutering_h
> +#define mozilla_ipc_Neutering_h

new file / new class(s), lets add commenting in here for things like public methods, and the classes. What's a NeuteredWindowRegion? What the heck is a DeneuteredWindowRegion? :)

::: ipc/glue/WindowsMessageLoop.cpp
@@ +1115,2 @@
>        }
> +      DeneuteredWindowRegion deneuteredRgn;

This one is confusing to me. If gWindowHook is set then neutering is active and mReneuter in DeneuteredWindowRegion is true.

When we create this object we turn neutering off, and when we destroy it we will turn neutering behavior back on?
(In reply to Jim Mathies [:jimm] from comment #4)
> Comment on attachment 8637522 [details] [diff] [review]
> Patch
> 
> Review of attachment 8637522 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: ipc/glue/MessageChannel.cpp
> @@ +1034,5 @@
> >              return false;
> >          }
> >  
> > +#ifdef OS_WIN
> > +        neuteredRgn.PumpOnce();
> 
> comment me

Will do

> 
> ::: ipc/glue/Neutering.h
> @@ +4,5 @@
> > + * License, v. 2.0. If a copy of the MPL was not distributed with this
> > + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> > +
> > +#ifndef mozilla_ipc_Neutering_h
> > +#define mozilla_ipc_Neutering_h
> 
> new file / new class(s), lets add commenting in here for things like public
> methods, and the classes. What's a NeuteredWindowRegion? What the heck is a
> DeneuteredWindowRegion? :)

OK

> 
> ::: ipc/glue/WindowsMessageLoop.cpp
> @@ +1115,2 @@
> >        }
> > +      DeneuteredWindowRegion deneuteredRgn;
> 
> This one is confusing to me. If gWindowHook is set then neutering is active
> and mReneuter in DeneuteredWindowRegion is true.
> 
> When we create this object we turn neutering off, and when we destroy it we
> will turn neutering behavior back on?


Yeah, DeneuteredWindowRegion is supposed to be analogous to MutexAutoUnlock; we're in a region that already has neutering and we temporarily need to suspend it for the remaining duration of the block.

Comment 6

2 years ago
(In reply to Aaron Klotz [:aklotz] (please use needinfo) from comment #2)
> (Also I think this might help with some of your mysterious e10s hangs in
> user32 APIs)

why is that?

Comment 7

2 years ago
(In reply to Aaron Klotz [:aklotz] (please use needinfo) from comment #5)
> Yeah, DeneuteredWindowRegion is supposed to be analogous to MutexAutoUnlock;
> we're in a region that already has neutering and we temporarily need to
> suspend it for the remaining duration of the block.

Ok so we set up the neutering much sooner now, at the top of MessageChannel::Send. I'm not sure what the side effects of that will be. For example there's a send on mLink that gets wrapped by this now. We should be on the lookout for regressions and not be aggressive in uplifting this work.

Comment 8

2 years ago
Comment on attachment 8637522 [details] [diff] [review]
Patch

Review of attachment 8637522 [details] [diff] [review]:
-----------------------------------------------------------------

::: ipc/glue/WindowsMessageLoop.cpp
@@ +916,5 @@
> +  if (gCOMWindow && ::PeekMessageW(&msg, gCOMWindow, 0, 0, PM_REMOVE)) {
> +    ::TranslateMessage(&msg);
> +    ::DispatchMessageW(&msg);
> +  }
> +  ::PeekMessageW(&msg, nullptr, 0, 0, PM_NOREMOVE);

What's this PeekMessage call doing? Expunging non-queued messages?
Attachment #8637522 - Flags: review?(jmathies) → review+
(In reply to Jim Mathies [:jimm] from comment #6)
> (In reply to Aaron Klotz [:aklotz] (please use needinfo) from comment #2)
> > (Also I think this might help with some of your mysterious e10s hangs in
> > user32 APIs)
> 
> why is that?

If one side of a connection has Recv* or Answer* handlers that call user32 APIs, and the other side's MessageChannel has multiple IPC messages pending, messages won't be pumped without this patch.

(In reply to Jim Mathies [:jimm] from comment #8)
> Comment on attachment 8637522 [details] [diff] [review]
> Patch
> 
> Review of attachment 8637522 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: ipc/glue/WindowsMessageLoop.cpp
> @@ +916,5 @@
> > +  if (gCOMWindow && ::PeekMessageW(&msg, gCOMWindow, 0, 0, PM_REMOVE)) {
> > +    ::TranslateMessage(&msg);
> > +    ::DispatchMessageW(&msg);
> > +  }
> > +  ::PeekMessageW(&msg, nullptr, 0, 0, PM_NOREMOVE);
> 
> What's this PeekMessage call doing? Expunging non-queued messages?

Yes. I've added comments for that.
Created attachment 8639391 [details] [diff] [review]
Patch (r2)

Added comments. Carrying forward r+.
Attachment #8637522 - Attachment is obsolete: true
Attachment #8639391 - Flags: review+

Comment 11

2 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/b2d0e28fe539
https://hg.mozilla.org/mozilla-central/rev/b2d0e28fe539
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
status-firefox42: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla42
Backed out for Windows test_plugin_focus.html timeouts.
https://treeherder.mozilla.org/logviewer.html#?job_id=12159460&repo=mozilla-inbound

https://hg.mozilla.org/mozilla-central/rev/c0a3d6f72a63
Status: RESOLVED → REOPENED
status-firefox42: fixed → ---
Resolution: FIXED → ---
Target Milestone: mozilla42 → ---
Created attachment 8639977 [details] [diff] [review]
Patch (r3)

Whoops, something made it into patch r2 that wasn't supposed to be there. My bad.
Attachment #8639391 - Attachment is obsolete: true
Attachment #8639977 - Flags: review+
Keywords: checkin-needed

Comment 15

2 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/f290fbb44af6
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/f290fbb44af6
Status: REOPENED → RESOLVED
Last Resolved: 2 years ago2 years ago
status-firefox42: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla42

Updated

2 years ago
Depends on: 1189709
This appears to have caused regression 1189709
Comment on attachment 8639977 [details] [diff] [review]
Patch (r3)

Approval Request Comment
[Feature/regressing bug #]: NPAPI plugins
[User impact if declined]: Hangs, see bug 1207766 and bug 1209235
[Describe test coverage new/current, TreeHerder]: Plugin test suite
[Risks and why]: Low at this point, this has ridden the trains on 42 to beta. Note that the patch in bug 1189709 must be landed alongside this one.
[String/UUID change made/needed]: None
Attachment #8639977 - Flags: approval-mozilla-release?

Updated

2 years ago
Depends on: 1209464
Comment on attachment 8639977 [details] [diff] [review]
Patch (r3)

Approving for uplift to moz-release to be included in 41.0.1 build2. This patch is needed for fix the recent flash hangs (bug 1207766).
Attachment #8639977 - Flags: approval-mozilla-release? → approval-mozilla-release+

Updated

2 years ago
status-firefox41: --- → affected
tracking-firefox41: --- → +
https://hg.mozilla.org/releases/mozilla-release/rev/e758e34f3790
status-firefox41: affected → fixed
During our Firefox 41.0.1 build 2 smoke testing we ran a number of plugin tests that included Flash across platforms (Windows 10 32bit, Windows 7 64bit, Mac OS X 10.10.5 and Ubuntu 14.04 32-bit) and we can say that we did not encounter any hangs or crashes. Marking this verified fixed on FF 41.
status-firefox41: fixed → verified
Added to the release notes with "Fix potential hangs with Flash plugins" as wording.
relnote-firefox: --- → 41+

Comment 23

2 years ago
After some of our Windows computers updated to FireFox 40.0.1 we noticed that FireFox has started to hang when accessing BigBlueButton.  We are able to reproduce the hang with the following configuration

Windows 10
FireFox 40.0.1
Flash 19.0.0.185

Steps to reproduce on demo.bigbluebutton.org (our public demo server)

1.  Open http://demo.bigbluebutton.org/demo/demo2.jsp
2.  Enter any name 
3.  Click 'Join'
4.  Wait until the BigBlueButton client loads.
5.  Click 'Listen Only'

** FireFox may hang at this point **

6.  Refresh page go to Step 4

You might have to repeat steps (4)-(6) a few times to get FireFox to hang.  I've uploaded a video to YouTube that demonstrates

  https://youtu.be/NzryyxQyvHo

Most of the times when it hangs, FireFox locks up for good and I have to close it from the task bar.  This bug is bad for us as it potentially prevents a large number of students and teachers from using BigBlueButton with the latest version of FireFox.

Regards,... Fred


P.S.  BigBlueButton is an open source web conferencing system for on-line learning.  We are one of the Mozilla WebFWD graduates.  See: https://webfwd.org/portfolio/.

Comment 24

2 years ago
The correct configuration to comment 23 is

Windows 10
FireFox 41.0.1
Flash 19.0.0.185

We just had our computers update to FireFox 41.0.1.

Comment 25

2 years ago
A bit more information.

We downgraded to FireFox 41.0.0 and tested on same computer that was hanging with 41.0.1.  After 20 minutes of aggressive testing I couldn't get FireFox to hang when joining a BigBlueButton session.

Upgraded again to 41.0.1 and it hangs almost immediately.

BigBlueButton is open source and localized into 25+ languages.  The project has been under heavy development for over seven years.  We have a very large community (tens of thousands) of teachers and students world-wide that use BigBlueButton every day for on-line classes.

This is going to be painful for us as many of them use FireFox.  FireFox 41.0.1 hangs -- hard.

Is there anyway to back out this change or quickly triage? 

Let us know if there is anything we can do to help.

Regards,... Fred
Let's take this discussion to bug 1210665.
You need to log in before you can comment on or make changes to this bug.