Closed Bug 1060738 Opened 10 years ago Closed 10 years ago

WebRTC window (and probably screen) sharing uses Windows HWNDs/etc on non-ui/message-loop threads

Categories

(Core :: WebRTC: Audio/Video, defect)

x86
Windows 8.1
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla35
Tracking Status
firefox33 + fixed
firefox34 + fixed
firefox35 --- fixed

People

(Reporter: jesup, Assigned: jimm)

References

(Blocks 1 open bug)

Details

Attachments

(5 files, 20 obsolete files)

1.90 KB, text/plain
Details
11.46 KB, patch
jesup
: review+
Details | Diff | Splinter Review
14.28 KB, patch
Details | Diff | Splinter Review
14.93 KB, patch
TimAbraldes
: review+
Details | Diff | Splinter Review
37.37 KB, patch
Details | Diff | Splinter Review
One Windows 8.1 machine (mine) was seen to be unable to window/screen share, and after trying unable to access the camera.

Investigation pointed to WindowsEnumerationHandler(), which appeared to be deadlocking and blocking the MediaManager thread.

jimm noted that we shouldn't be doing operations like this (using an HWND) on a non-ui/non-windows-messageloop thread.  Primary option here is to proxy to the ui (Main) thread.

Note: the Capture() thread is a high-priority native thread, but doesn't use the windows message loop that I know of.  We can't do this reasonably from Main, so we'll likely want to find a way to make this Capture thread be windows event-loop-driven - if in fact the operations it's doing are unsafe.

This has not been seen thus far on other machines beyond my one.  I had a number of other people try it on 8.1 with no problems.
Attached file stack
I did a little testing to see if some of these calls were safe - 

http://mxr.mozilla.org/mozilla-central/source/media/webrtc/trunk/webrtc/modules/desktop_capture/window_capturer_win.cc#178

turns out none of them are, every call causes Windows to create a message queue for background threads. As soon as you get a message queue, you risk deadlocks at random times since the message queue will never get processed. You can use the win32 call IsGUIThread(false) to test for a message queue.

I don't know the webrtc code at all, can we itemize here all the code areas that manipulate win32 ui resources on background threads? I see a few areas, how much of this code runs on a non-gui ipc thread, and how much of it runs on a webrtc background thread?

http://mxr.mozilla.org/mozilla-central/source/media/webrtc/trunk/webrtc/modules/desktop_capture/app_capturer_win.cc
http://mxr.mozilla.org/mozilla-central/source/media/webrtc/trunk/webrtc/modules/desktop_capture/mouse_cursor_monitor_win.cc
http://mxr.mozilla.org/mozilla-central/source/media/webrtc/trunk/webrtc/modules/desktop_capture/screen_capturer_win.cc
http://mxr.mozilla.org/mozilla-central/source/media/webrtc/trunk/webrtc/modules/desktop_capture/window_capturer_win.cc
[Tracking Requested - why for this release]:
This code is incompatible with the Windows gui model. It must be addressed before we roll these features out to release channel. Risks involve random deadlocks on Windows when the code in question is in use.
Blocks: 1040061
Work in progress.  It adds a message queue for capture threads, but does not yet deal with enumeration
Assignee: nobody → linuxwolf
Status: NEW → ASSIGNED
Comment on attachment 8481810 [details] [diff] [review]
WebRTC window (and probably screen) sharing uses Windows HWNDs/etc on non-ui/message-loop threads

Review of attachment 8481810 [details] [diff] [review]:
-----------------------------------------------------------------

::: media/webrtc/trunk/webrtc/video_engine/desktop_capture_impl.cc
@@ +487,5 @@
>    memset(_incomingFrameTimes, 0, sizeof(_incomingFrameTimes));
>  }
>  
>  DesktopCaptureImpl::~DesktopCaptureImpl() {
> +  terminate_ = true;

Also, post a thread message (WM_NULL?) here to get the thread out of PeekMessage below.

@@ +800,5 @@
> +  // capture needs to happen in GUI thread on Windows.
> +  // Should this be done in ThreadWindows::Run?
> +  MSG msg;
> +  if (!queue_ready_) {
> +    PeekMessage(&msg, NULL, 0, 0, PM_NOREMOVE);

Does this block properly or just fail? I think a more proper way to do this would be to use an event (CreateEvent) that you can signal (SetEvent), and call WaitForSingleObject(event, INFINITE) here.
Comment on attachment 8481810 [details] [diff] [review]
WebRTC window (and probably screen) sharing uses Windows HWNDs/etc on non-ui/message-loop threads

Review of attachment 8481810 [details] [diff] [review]:
-----------------------------------------------------------------

::: media/webrtc/trunk/webrtc/video_engine/desktop_capture_impl.cc
@@ +804,5 @@
> +    PeekMessage(&msg, NULL, 0, 0, PM_NOREMOVE);
> +    PostThreadMessage(GetCurrentThreadId(), WM_NULL, 0, 0);
> +    queue_ready_ = true;
> +  }
> +  if (!terminate_ && PeekMessage(&msg, NULL, 0, 0, PM_REMOVE)) {

Accessing terminate_ here violates thread-safety rules (TSAN, etc) since it's not under a lock.
Either make it atomic, put it under a lock, or use a message to tell the thread to break out of the loop and stop (perhaps best).
Do we even need queue_ready_ here and the associated code? The Windows message queue will automatically get created the first time you call PeekMessage. I think you can take all of this code out, unless you really need to wait for something before dropping into the dispatch loop below.
Comment on attachment 8481810 [details] [diff] [review]
WebRTC window (and probably screen) sharing uses Windows HWNDs/etc on non-ui/message-loop threads

Review of attachment 8481810 [details] [diff] [review]:
-----------------------------------------------------------------

::: media/webrtc/trunk/webrtc/video_engine/desktop_capture_impl.cc
@@ +813,4 @@
>    DesktopRect desktop_rect;
>    DesktopRegion desktop_region;
>  
>    desktop_capturer_cursor_composer_->Capture(DesktopRegion());

Looking at this a more broadly, does this code need to execute while we're running or is this shutdown code? This call to Capture won't execute until the thread is ready to exit. Maybe your comment above about moving this to ThreadWindows::Run is right?
Comment on attachment 8481810 [details] [diff] [review]
WebRTC window (and probably screen) sharing uses Windows HWNDs/etc on non-ui/message-loop threads

Review of attachment 8481810 [details] [diff] [review]:
-----------------------------------------------------------------

::: media/webrtc/trunk/webrtc/video_engine/desktop_capture_impl.cc
@@ +813,4 @@
>    DesktopRect desktop_rect;
>    DesktopRegion desktop_region;
>  
>    desktop_capturer_cursor_composer_->Capture(DesktopRegion());

>Looking at this a more broadly, does this code need to execute while we're running or is this shutdown code? This call to
>Capture won't execute until the thread is ready to exit. Maybe your comment above about moving this to ThreadWindows::Run
>is right?

The above messageloop code executes once per process(); process() is called by ::Run(), and ::Run() is called in an infinite loop by the system_wrappers/thread code used for this thread (until thread->Stop() is called).  So it will make one PeekMessage, then call Capture(), then loop around again.  This still leaves it as a busy-loop without the bug to limit the framerate of capture.

PeekMessage doesn't wait, unlike GetMessage
Interdiff over attachment 8481810 [details] [diff] [review] to address review comments from Randell Jesup and Jim Mathies.
This patch addresses all of the review comments from the previous revision.  Enumeration is still not dealt with yet.

Interdiff is attachment 8481842 [details] [diff] [review].
Attachment #8481810 - Attachment is obsolete: true
Attachment #8481843 - Flags: review?(rjesup)
Attachment #8481843 - Flags: review?(jmathies)
Attachment #8481843 - Attachment is patch: true
Comment on attachment 8481843 [details] [diff] [review]
v2 - WebRTC window (and probably screen) sharing uses Windows HWNDs/etc on non-ui/message-loop threads

Review of attachment 8481843 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with a while()

::: media/webrtc/trunk/webrtc/video_engine/desktop_capture_impl.cc
@@ +750,5 @@
>    }
>  
>    return nrOfFrames;
>  }
> +		

spaces

@@ +796,5 @@
> +#if XP_WIN
> +  // capture needs to happen in GUI thread on Windows.
> +  // Should this be done in ThreadWindows::Run?
> +  MSG msg;
> +  if (PeekMessage(&msg, NULL, 0, 0, PM_REMOVE)) {

This will only process one message per screen/window frame captured

Perhaps "while (PeekMessage(...)) {"
Drain the queue, but don't wait for more messages.
Attachment #8481843 - Flags: review?(rjesup) → review+
Comment on attachment 8481843 [details] [diff] [review]
v2 - WebRTC window (and probably screen) sharing uses Windows HWNDs/etc on non-ui/message-loop threads

Review of attachment 8481843 [details] [diff] [review]:
-----------------------------------------------------------------

This isn't going to work, you need to be in the dispatch loop when you call Capture.

I think what you'll need to do here is : 

1) in Run, create a hidden window, and set a timer (capture) callback (SetTimer)
2) drop into the dispatch loop, where this thread will stay for its entire lifetime
3) in the window procedure in response to WM_TIMER, call process() -> Capture(). You can store |this| as a property on the window using SetProp. There are lots of example of this in our code base.
4) figure out how to shut this all down when you want the thread to exit.
Attachment #8481843 - Flags: review?(jmathies) → review-
If you have a "signal" call that comes in from another thread that tells this thread a capture is needed needed, you can replace the timer with this signal using PostMessage.
If you like I can pick this up, but I won't be able to start work on it till mondayish. :)
I think I'll take you up on your offer (-:

If I don't get anywhere on my own, I'll assign it to you, Jim?
Matt -- Thanks for digging in on this!  I think at this point, it makes more sense for you to work on bug 1056213 (which got backed out earlier today) so it can re-land before uplift.  I don't think this bug has a chance of landing before Tuesday whereas bug 1056213 does.

Jim -- I think it makes sense for you to take over this bug as soon as you can.  Thanks for the help!  I'd love to get this fixed in time for the first Beta build of Fx33, if possible.  Randell and/or I will touch base with Sylvestre (the release driver for Fx33) and let him know that we'll be asking to uplift a fix into Beta 33 once it's ready.
Tracking, we don't want deadlocks.
A couple questions -

1) is there a good test case someplace for testing capture?
2) In the current implementation it looks like the ThreadWindows capture thread drops into a tight loop with no sleeps or waits. Is this correct?
Assignee: linuxwolf → jmathies
(In reply to Jim Mathies [:jimm] from comment #19)
> A couple questions -
> 
> 1) is there a good test case someplace for testing capture?

http://mozilla.github.io/webrtc-landing/gum_test.html (there are some others too)

> 2) In the current implementation it looks like the ThreadWindows capture
> thread drops into a tight loop with no sleeps or waits. Is this correct?

Yes, there's a bug filed on limiting the FPS by gcp; I assume in the windows messageloop impl we'd use a WM_TIMER as a base.  GCP has a fix (at least for linux/mac) that also takes into account CPU usage.  Though it doesn't stop the waste of CPU time, we also limit encoding of the captured frames to 30fps or less
Attached patch part 1 - capture thread (obsolete) — Splinter Review
This part should solve the problem with Capture() calls.

This does not solve the problems with cpu consumption, since I'm emulated the previous thread behavior. That work can be done in a follow up.

It also does not solve the problem associated with jesup's stack in comment 1. I haven't had a chance to dig into that issue yet.
Attachment #8481842 - Attachment is obsolete: true
Attachment #8481843 - Attachment is obsolete: true
Attached patch part 2 - media manager (obsolete) — Splinter Review
This solves the problem shown by jesup's stack. We create a background (non-ui) nsThread for polling work for devices. On Windows, this has to happen on a ui thread, the simplest to access being the main thread. I consider this more of a work around than anything else.
Attached patch part 3 - asserts (obsolete) — Splinter Review
Attachment #8482780 - Attachment description: wip - capture thread → part 1 - capture thread
Blocks: 1060796
I did an audit of MediaManager to understand which runnables might try to access ui resources on Windows. Here are the results:


ErrorCallbackRunnable (main thread enforced)
SuccessCallbackRunnable (main thread enforced)
DeviceSuccessCallbackRunnable (main thread enforced)
GetUserMediaListenerRemove (main thread enforced)
GetUserMediaStreamRunnable (main thread enforced)
GetUserMediaNotificationEvent (main thread enforced)
GetUserMediaRunnable (off main thread, unsafe)
GetUserMediaDevicesRunnable (off main thread, unsafe)
MediaOperationRunnable (off main thread, unknown - accesses media sources for controlling playback)
 http://mxr.mozilla.org/mozilla-central/source/dom/media/MediaManager.h#343

jesup was complaining of more lockups even with parts 1 and 2 applied. Looking at this I think its likely due to the other two off main thread runnables I didn't main thread in  part 2.

So I don't think that's the right fix. Here are two approaches I'm going to test - 

1) do a little surgery on nsThread so that it can invoke a different message pump here - 
http://mxr.mozilla.org/mozilla-central/source/xpcom/threads/nsThread.cpp#345

This actually doesn't look too hard to do, we can modify MainThreadFlag(s) to take an additional thread type. This change shouldn't break anything in the existing implementation. Then we check this flag in ThreadFunc and choose an appropriate message pump (TYPE_MOZILLA_NONMAINTHREAD/TYPE_MOZILLA_UI). The risk here is that this is a completely untested change to nsThread behavior, which presumably would have to land on *beta* sometime next week.

2) Use nsIThreadInternal observers to implement the native event dispatching in MediaManager.

This would be similar to the way we handle main thread dispatching today - except that the underlying message pump would be different. I'm not sure what the side effects of this will be. The two message pumps are very different:

http://mxr.mozilla.org/mozilla-central/source/ipc/glue/MessagePump.cpp#306
http://mxr.mozilla.org/mozilla-central/source/ipc/chromium/src/base/message_pump_win.cc#188

I'm leaning toward #1, but I might implement both for testing purposes.
I'm definitely going to start with option 1. With #2, we want the enumerations and other ux calls to happen while DispatchMessage is on the stack. AFAICT, there's no way to do that. We could dispatch events outside of these calls, but runnable->run calls would not get invoked via DispatchMessage.
Attached patch part 1 - capture thread (obsolete) — Splinter Review
Attachment #8482780 - Attachment is obsolete: true
Attachment #8482821 - Attachment is obsolete: true
Attachment #8482822 - Attachment is obsolete: true
Attachment #8484308 - Attachment is obsolete: true
(In reply to Jim Mathies [:jimm] from comment #30)
> https://tbpl.mozilla.org/?tree=Try&rev=0fb07239d8ca

Nils, can we schedule some test time with these builds? Looking for general regressions on mac and linux, and issues with hangs / odd window & screen capture behavior on Windows.
Flags: needinfo?(drno)
https://tbpl.mozilla.org/?tree=Try&rev=8dc5c1f7640e

Fresh try push with tests this time. Also, one change was made for linux/mac, I was requesting a TYPE_UI message pump for all platforms. Not sure what impact that might have on mac/linux, so I switched to the default type for those platforms.
Sorry Jim I'm traveling right now, so I only have access to one OS (Mac). As I'll be offline next week and I assume this should get done earlier, maybe Anthony knows someone who can help out here.
Flags: needinfo?(drno) → needinfo?(anthony.s.hughes)
This isn't ready for testing yet. The default chromium message pump types don't process xpcom events. I think we're going to need a new pump implementation for this on windows.
Flags: needinfo?(anthony.s.hughes)
This is a new rev of the chromium thread work, which works. I'll push to try to see what happens there on Windows.

Over the weekend though I want to take a step back and look again at option #2 from comment 24. Using a chromium thread/message pump is good in that we're reusing an existing gui thread message loop that is well tested and works with our code. However it turns out TYPE_UI pumps don't process xpcom events, and TYPE_MOZILLA_UI was really only meant for the main gecko thread. So I ended up needing a new pump implemented as a subclass of chromium's MessagePumpForUI with xpcom support. This works but involves more changes that I'd hoped.

Option two involves recreating the hackish approach we use in app shell on the main thread using a thread observer to process gui events. I'll see how that turns out and then make a decision on which approach is best.
Attachment #8484312 - Attachment is obsolete: true
Option 3:  Use a TYPE_UI thread, and dispatch any runnables to MediaThread using alternative mechanisms supported in TYPE_UI

Not sure if it's as good or better than #1 or #2, but worth mentioning
(In reply to Randell Jesup [:jesup] from comment #36)
> Option 3:  Use a TYPE_UI thread, and dispatch any runnables to MediaThread
> using alternative mechanisms supported in TYPE_UI
> 
> Not sure if it's as good or better than #1 or #2, but worth mentioning

That's exactly what I had here - 

(In reply to Jim Mathies [:jimm] from comment #32)
> https://tbpl.mozilla.org/?tree=Try&rev=8dc5c1f7640e
> 
> Fresh try push with tests this time. Also, one change was made for
> linux/mac, I was requesting a TYPE_UI message pump for all platforms. Not
> sure what impact that might have on mac/linux, so I switched to the default
> type for those platforms.

Hence the switch to posted tasks vs. runnables. The test failures were caused by a lack of xpcom support for objects like GetUserMediaCallbackMediaStreamListener.
(In reply to Jim Mathies [:jimm] from comment #37)
> (In reply to Randell Jesup [:jesup] from comment #36)
> > Option 3:  Use a TYPE_UI thread, and dispatch any runnables to MediaThread
> > using alternative mechanisms supported in TYPE_UI

> That's exactly what I had here - 

> > Fresh try push with tests this time. Also, one change was made for
> > linux/mac, I was requesting a TYPE_UI message pump for all platforms. Not
> > sure what impact that might have on mac/linux, so I switched to the default
> > type for those platforms.
> 
> Hence the switch to posted tasks vs. runnables. The test failures were
> caused by a lack of xpcom support for objects like
> GetUserMediaCallbackMediaStreamListener.

Sorry, I misunderstood what that try was for.  Too bad, that seemed promising.  What caused the failures?  Perhaps I can take another look at that in parallel.
(In reply to Randell Jesup [:jesup] from comment #38)
> (In reply to Jim Mathies [:jimm] from comment #37)
> > Hence the switch to posted tasks vs. runnables. The test failures were
> > caused by a lack of xpcom support for objects like
> > GetUserMediaCallbackMediaStreamListener.
> 
> Sorry, I misunderstood what that try was for.  Too bad, that seemed
> promising.  What caused the failures?  Perhaps I can take another look at
> that in parallel.

The media thread didn't handle xpcom events, so for example dispatch to thread calls didn't work. I guess the assumption has been made in the media code that the media thread processes xpcom events in various places, so we need to support that.

The good news is the changes I had to make to the TYPE_UI pump and the new custom pump are minimal, I've copy pasted the old DoRunLoop method for TYPE_UI and added a call to NS_ProcessNextEvent in the loop. This is very similar to the other mozilla pumps we have.
Attachment #8484307 - Attachment is obsolete: true
Attachment #8485253 - Attachment is obsolete: true
(In reply to Jim Mathies [:jimm] from comment #41)
> Created attachment 8485425 [details] [diff] [review]
> part 2 - MediaManager threading work

This is pretty close. I have one issue to sort out still: for some reason I'm not getting nsIThreadObserver calls for OnDispatchedEvent when an event is added to the thread's queue. This causes the media thread to get caught up in WaitForWork() on test runs, causing time outs.
new try push with various fixes:

https://tbpl.mozilla.org/?tree=Try&rev=46cda9e8a884
Attachment #8485425 - Attachment is obsolete: true
Attachment #8484313 - Attachment is obsolete: true
https://tbpl.mozilla.org/?tree=Try&rev=8e7ee79c5942

Hey jesup, can you shed any light on possible causes for these mochitest-3 test timeouts on linux? Not much has changed, we've just switched thread types.
Flags: needinfo?(rjesup)
latest
Attachment #8485428 - Attachment is obsolete: true
Attachment #8485444 - Attachment is patch: true
just linux opt m-3, with added test output:

https://tbpl.mozilla.org/?tree=Try&showall=0&rev=215f9524212e
While running these ipc tests, the first task posted to the media thread never gets invoked. Not sure why yet. Does anyone have a linux box they can debug this on?

https://tbpl.mozilla.org/php/getParsedLog.php?id=47573540&tree=Try&full=1#error0
https://hg.mozilla.org/try/rev/5d6dd8a3b328
I think I may have a fix to the final issue i was running into.

https://tbpl.mozilla.org/?tree=Try&rev=47785910974b
Flags: needinfo?(rjesup)
Attachment #8485444 - Attachment is obsolete: true
Attachment #8486405 - Flags: review?(benjamin)
Attachment #8485424 - Flags: review?(rjesup)
Please reassign if this isn't an appropriate request for you jesup.
Attachment #8485429 - Attachment is obsolete: true
Attachment #8486407 - Flags: review?(rjesup)
Attachment #8485430 - Flags: review?(rjesup)
Attachment #8486407 - Attachment is patch: true
Comment on attachment 8486405 [details] [diff] [review]
part 2 - add support for MessagePumpForNonMainUIThreads

So in the end, what this is implementing is a thread which responds to both XPCOM and UI event loops.

Currently the main XPCOM thread is the only thread which does this, and it is implemented by having the widget code be the thread observer and do the joint waiting.

This patch appears to implement an entirely separate system for other threads which uses chromium event pumps and its own observer mechanism. That seems like unfortunate technical debt, although it may be the best answer in this timeframe. However it's going to require really careful review of race conditions across the three event sources involved (chromium/UI/XPCOM), and I'm not sure I have the time this week for that review. I don't know if bent does, but he wrote the original XPCOM/chromium dispatch integration.

I also strongly object to the code pattern this is using:

sSingleton->mMediaThread = new base::Thread("MediaManager");

Since we're creating a thread that processes XPCOM events, this should absolutely be created as an XPCOM thread via the thread manager, rather than being magically promoted.
Attachment #8486405 - Flags: review?(benjamin) → review?(bent.mozilla)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #55)
> Comment on attachment 8486405 [details] [diff] [review]
> I also strongly object to the code pattern this is using:
> 
> sSingleton->mMediaThread = new base::Thread("MediaManager");
> 
> Since we're creating a thread that processes XPCOM events, this should
> absolutely be created as an XPCOM thread via the thread manager, rather than
> being magically promoted.

This works fine in practice. The default creation behavior for nsThread is actually very handy here. We could switch to an xpcom thread, but we don't get anything for it other than added complexity. We would need to expose a way to request a different chromium message pump down on the ThreadFunc, maybe through the use of MainThreadFlag in the constructor of nsThread. We would also need a new thread creation helper in nsThreadUtils. These changes can be made, but it'd probably be best to cover those in a follow up. I'd be happy to take that since I already wrote it once, but then abandoned it here when I realized I was making a lot of xpcom changes that I didn't need to make if I just went with Thread*. (I was actually trying to channel your review on this in advanced. :)
Comment on attachment 8485424 [details] [diff] [review]
part 1 - Add support for webrtc ThreadWindowsUI

Review of attachment 8485424 [details] [diff] [review]:
-----------------------------------------------------------------

We also need to add this to the webrtc_uplifts meta bug

::: media/webrtc/trunk/webrtc/system_wrappers/interface/thread_wrapper.h
@@ +59,5 @@
>  
> +  static ThreadWrapper* CreateUIThread(ThreadRunFunction func,
> +                                       ThreadObj obj,
> +                                       ThreadPriority prio = kNormalPriority,
> +                                       const char* thread_name = 0);

NULL

::: media/webrtc/trunk/webrtc/system_wrappers/source/thread_win.cc
@@ +23,5 @@
> +// For use in ThreadWindowsUI callbacks
> +static UINT static_reg_windows_msg = RegisterWindowMessageW(L"WebrtcWindowsUIThreadEvent");
> +// timer id used in delayed callbacks
> +static const UINT_PTR kTimerId = 1;
> +static const wchar_t kThisProperty[] = L"ThreadWindowsUIPtr"; 

trailing space

::: media/webrtc/trunk/webrtc/system_wrappers/source/thread_win.h
@@ +66,5 @@
> + public:
> +  ThreadWindowsUI(ThreadRunFunction func, ThreadObj obj, ThreadPriority prio,
> +                  const char* thread_name) :
> +  ThreadWindows(func, obj, prio, thread_name),
> +  hwnd_(nullptr),

webrtc.org uses NULL not nullptr

::: media/webrtc/trunk/webrtc/video_engine/desktop_capture_impl.cc
@@ +763,5 @@
>    unsigned int t_id =0;
>    capturer_thread_.Start(t_id);
> +
> +#if defined(_WIN32)
> +  capturer_thread_.RequestCallbackTimer(50); // 50 msec equals about 20fps

Unless there's a reason, use the same logic from the non-win32 version below

Start with 1000/_requestedCapability.maxFPS.  Avoiding CPU overload (like in non-windows) could be a followon, if need be.
Attachment #8485424 - Flags: review?(rjesup) → review+
Attachment #8485430 - Flags: review?(rjesup) → review+
Comment on attachment 8486407 [details] [diff] [review]
part 3 - MediaManager threading work

Review of attachment 8486407 [details] [diff] [review]:
-----------------------------------------------------------------

I presume the observer's mMediaThread->Shutdown() still works right and guarantees no leaked events
Attachment #8486407 - Flags: review?(rjesup) → review+
Comment on attachment 8486405 [details] [diff] [review]
part 2 - add support for MessagePumpForNonMainUIThreads

Review of attachment 8486405 [details] [diff] [review]:
-----------------------------------------------------------------

I *really* don't have time to review this now... Is there anyone else that could possibly take this?
Attachment #8486405 - Flags: review?(bent.mozilla)
- fixing some comment typos.
Attachment #8486405 - Attachment is obsolete: true
Attachment #8486777 - Flags: review?(bent.mozilla)
bsmedberg, any suggestions here? Looking at blame I see cjones, you, and bent doing most of the heavy lifting.
Flags: needinfo?(benjamin)
I can connect up on vidyo to go over the changes if it helps.
Note: The reason the Try in comment 32 failed was because MediaEngineDefault.cpp uses nsITimers on MediaManager thread to generate fake audio/video; these could easily be swapped or spun off to another thread.
(In reply to Randell Jesup [:jesup] from comment #63)
> Note: The reason the Try in comment 32 failed was because
> MediaEngineDefault.cpp uses nsITimers on MediaManager thread to generate
> fake audio/video; these could easily be swapped or spun off to another
> thread.

Please file a follow up. I've certainly set things up for you here, but I do not want the scope of this bug to creep.
Other possible reviewers here - dmajor, tabraldes, bbondy, dvander?
Flags: needinfo?(dmajor)
oops, didn't mean to tag david on that post.
Flags: needinfo?(dmajor)
Per irc with aoverholt (who is sitting next to bent and khuey at a work week), we could ask tabraldes for the review.
Attachment #8486777 - Flags: review?(tabraldes)
Comment on attachment 8486407 [details] [diff] [review]
part 3 - MediaManager threading work

Review of attachment 8486407 [details] [diff] [review]:
-----------------------------------------------------------------

For part3 a couple updates..

::: dom/media/MediaManager.cpp
@@ +62,5 @@
>  #include "mozilla/WindowsVersion.h"
>  #endif
>  
> +#include "base/thread.h"
> +#include "base/task.h"

nit - I was able to remove these includes.

@@ +1441,5 @@
> +    }
> +    // Note, this creates the nsThread for Thread.
> +    nsIThread* nsthread = NS_GetCurrentThread();
> +    MOZ_ASSERT(nsthread);
> +    NS_SetThreadName(nsthread, "MediaManager");

I removed this code block. It was supposed to be working on the new nsthread, but since I call NS_GetCurrentThread() here, its actually working on the main thread! The chromium naming is sufficient, we don't post xpcom events to this thread using the named thread utils.
Comment on attachment 8486777 [details] [diff] [review]
part 2 - add support for MessagePumpForNonMainUIThreads

Review of attachment 8486777 [details] [diff] [review]:
-----------------------------------------------------------------

::: ipc/glue/MessagePump.h
@@ +104,5 @@
> +public:
> +  // We don't want xpcom refing, chromium controls our lifetime via
> +  // RefCountedThreadSafe.
> +  NS_IMETHOD_(MozExternalRefCountType) AddRef(void) {
> +    return NS_OK;

Just a drive-by review here: the typical pattern is to return 2 here...

@@ +107,5 @@
> +  NS_IMETHOD_(MozExternalRefCountType) AddRef(void) {
> +    return NS_OK;
> +  }
> +  NS_IMETHOD_(MozExternalRefCountType) Release(void) {
> +    return NS_OK;

...and 1 here.  NS_OK is a...ungood value to return in these contexts.
Hi Chris -- Would you have time to look over "patch 2" of this bug ("add support for MessagePumpForNonMainUIThreads")?  We'd love as many knowledgeable eyes on it as we can.  We'll take a review or feedback -- whatever you can give us.  

This is an important patch for screen sharing that won't be getting much bake time because the bug wasn't found until late Aurora, and we'd really love to deliver WebRTC screen sharing in Fx33.  Thanks for any time you can give us!
Flags: needinfo?(cjones.bugs)
(In reply to Nathan Froyd (:froydnj) from comment #69)
> Comment on attachment 8486777 [details] [diff] [review]
> part 2 - add support for MessagePumpForNonMainUIThreads
> 
> Review of attachment 8486777 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: ipc/glue/MessagePump.h
> @@ +104,5 @@
> > +public:
> > +  // We don't want xpcom refing, chromium controls our lifetime via
> > +  // RefCountedThreadSafe.
> > +  NS_IMETHOD_(MozExternalRefCountType) AddRef(void) {
> > +    return NS_OK;
> 
> Just a drive-by review here: the typical pattern is to return 2 here...
> 
> @@ +107,5 @@
> > +  NS_IMETHOD_(MozExternalRefCountType) AddRef(void) {
> > +    return NS_OK;
> > +  }
> > +  NS_IMETHOD_(MozExternalRefCountType) Release(void) {
> > +    return NS_OK;
> 
> ...and 1 here.  NS_OK is a...ungood value to return in these contexts.

thanks for reviewing, nice catch!
I probably won't be able to do a thorough review until this weekend or later.  (Swamped with other time-sensitive work at the moment.)
Flags: needinfo?(cjones.bugs)
Comment on attachment 8486777 [details] [diff] [review]
part 2 - add support for MessagePumpForNonMainUIThreads

Review of attachment 8486777 [details] [diff] [review]:
-----------------------------------------------------------------

A couple questions and one issue. Let me know if you want to chat on IRC or anything!

::: ipc/glue/MessagePump.cpp
@@ +319,5 @@
>    }
>  
> +  // Calls to ScheduleWork will accomplish nothing if mThread isn't valid
> +  // at the time of the call. So we need to clear out the chromium event
> +  // queue here or risk having work notifications/tasks out of sync.

I don't understand this comment; how does clearing out the chromium event queue ensure that `mThread` is valid?

@@ +326,5 @@
> +  // using a runnable on the xpcom side, but some thread implementations
> +  // (dom workers) get cranky if we call ScheduleWork here (ScheduleWork
> +  // calls dispatch on mThread) before the thread processes an event. As
> +  // such, clear the queue manually.
> +  while (aDelegate->DoWork()) {

Do we really need to be completely clear of chromium events at some later point? What do we do if a chromium event enters our queue between now and whenever that point is? Do we have some kind of guard preventing chromium events from entering the queue in the meantime?

@@ +390,5 @@
> +  MOZ_ASSERT(ti);
> +  ti->SetObserver(this);
> +
> +  // Clear the chromium event queue, see above.
> +  while (state_->delegate->DoWork()) {

Like above, I'm not clear on why we need to do this here. Please let me know if I'm missing something.

@@ +423,5 @@
> +    if (didWork || hasWork) {
> +      continue;
> +    }
> +
> +    mInWait = true;

If an XPCOM event comes in before this call, but after the call to `NS_HasPendingEvents` above, it feels like we're going to sit in `WaitForWork` below event though we have an XPCOM event waiting. Does that seem accurate?

@@ +424,5 @@
> +      continue;
> +    }
> +
> +    mInWait = true;
> +    WaitForWork(); // Calls MsgWaitForMultipleObjectsEx(QS_ALLINPUT)

This doesn't ask the delegate whether it has events to process; it only checks whether there are Windows events to deal with. Does that matter? I'm not super familiar with the chromium events loop that we're dealing with here.

::: ipc/glue/MessagePump.h
@@ +5,5 @@
>  #ifndef __IPC_GLUE_MESSAGEPUMP_H__
>  #define __IPC_GLUE_MESSAGEPUMP_H__
>  
>  #include "base/message_pump_default.h"
> +#if defined(XP_WIN) 

nit: trailing space

@@ +104,5 @@
> +public:
> +  // We don't want xpcom refing, chromium controls our lifetime via
> +  // RefCountedThreadSafe.
> +  NS_IMETHOD_(MozExternalRefCountType) AddRef(void) {
> +    return NS_OK;

Let me test my understanding here:
  We're inheriting `void AddRef()` and `void Release` from `RefCountedThreadSafe` by joining the `MessagePump` inheritance hierarchy.
  We're also implementing our own `AddRef` and `Release` functions with non-void return values because they're required for XPCOM.

I'm not totally clear on how the compiler will know when it should call `RefCountedThreadSafe::AddRef` versus `nsISupports::AddRef` that we've overridden here, but if someone tells me that this "just works," I'll believe them.

@@ +131,5 @@
> +  ~MessagePumpForNonMainUIThreads()
> +  {
> +  }
> +
> +  bool mInWait;

This is accessed by multiple threads. Why are we using a naked bool instead of something that we can operate on using the `Interlocked*` functions?
> ::: ipc/glue/MessagePump.cpp
> @@ +319,5 @@
> >    }
> >  
> > +  // Calls to ScheduleWork will accomplish nothing if mThread isn't valid
> > +  // at the time of the call. So we need to clear out the chromium event
> > +  // queue here or risk having work notifications/tasks out of sync.
> 
> I don't understand this comment; how does clearing out the chromium event
> queue ensure that `mThread` is valid?

There's a call on MessagePump, ScheduleWork [1] which gets called when you call PostTask [2]. In ScheduleWork, mThread may be null if we're working with a chromium thread that hasn't entered Run yet since we set mThread at the start of Run. There's no assurance that the chromium thread will get down into Run before someone posts a new task to the thread. So we need to empty those tasks out before we drop into the wait loop. See comment 50 here for a try run that shows the side effects of this.

[1] http://mxr.mozilla.org/mozilla-central/source/ipc/glue/MessagePump.cpp#151
[2] http://mxr.mozilla.org/mozilla-central/source/ipc/chromium/src/base/message_loop.cc#294 

> 
> @@ +326,5 @@
> > +  // using a runnable on the xpcom side, but some thread implementations
> > +  // (dom workers) get cranky if we call ScheduleWork here (ScheduleWork
> > +  // calls dispatch on mThread) before the thread processes an event. As
> > +  // such, clear the queue manually.
> > +  while (aDelegate->DoWork()) {
> 
> Do we really need to be completely clear of chromium events at some later
> point? What do we do if a chromium event enters our queue between now and
> whenever that point is? Do we have some kind of guard preventing chromium
> events from entering the queue in the meantime?

chromium should take care of managing access to its queue for multiple threads. So the guarding of access to the queue should be covered. We need to clear anything out prior to entering Run and the setting of mThread. Since mThread is valid at this point, any ScheduleWork notifications will pick up new tasks. I considered blocking the main thread on entrance to Run for the media thread, but that sucks since it blocks main thread, and it doesn't fix other cases where someone might be doing the same thing.

> @@ +390,5 @@
> > +  MOZ_ASSERT(ti);
> > +  ti->SetObserver(this);
> > +
> > +  // Clear the chromium event queue, see above.
> > +  while (state_->delegate->DoWork()) {
> 
> Like above, I'm not clear on why we need to do this here. Please let me know
> if I'm missing something.

Same issue as above, covered for this run loop.

> 
> @@ +423,5 @@
> > +    if (didWork || hasWork) {
> > +      continue;
> > +    }
> > +
> > +    mInWait = true;
> 
> If an XPCOM event comes in before this call, but after the call to
> `NS_HasPendingEvents` above, it feels like we're going to sit in
> `WaitForWork` below event though we have an XPCOM event waiting. Does that
> seem accurate?

In this small corridor, if it was the last xpcom event this thread ever received and we aren't receiving any chromium task either, then yes. In practice this never happens.

> @@ +424,5 @@
> > +      continue;
> > +    }
> > +
> > +    mInWait = true;
> > +    WaitForWork(); // Calls MsgWaitForMultipleObjectsEx(QS_ALLINPUT)
> 
> This doesn't ask the delegate whether it has events to process; it only
> checks whether there are Windows events to deal with. Does that matter? I'm
> not super familiar with the chromium events loop that we're dealing with
> here.

If either an xpcom event or chromium event get posted, we call ScheduleWork, which breaks us out of WaitForWork().

> @@ +104,5 @@
> > +public:
> > +  // We don't want xpcom refing, chromium controls our lifetime via
> > +  // RefCountedThreadSafe.
> > +  NS_IMETHOD_(MozExternalRefCountType) AddRef(void) {
> > +    return NS_OK;
> 
> Let me test my understanding here:
>   We're inheriting `void AddRef()` and `void Release` from
> `RefCountedThreadSafe` by joining the `MessagePump` inheritance hierarchy.
>   We're also implementing our own `AddRef` and `Release` functions with
> non-void return values because they're required for XPCOM.
> 
> I'm not totally clear on how the compiler will know when it should call
> `RefCountedThreadSafe::AddRef` versus `nsISupports::AddRef` that we've
> overridden here, but if someone tells me that this "just works," I'll
> believe them.

it just works. :)

> 
> @@ +131,5 @@
> > +  ~MessagePumpForNonMainUIThreads()
> > +  {
> > +  }
> > +
> > +  bool mInWait;
> 
> This is accessed by multiple threads. Why are we using a naked bool instead
> of something that we can operate on using the `Interlocked*` functions?

yeah, I'm probably being lazy here. Use of something like MutexAutoLock is warranted.
Flags: needinfo?(benjamin)
Attachment #8486777 - Attachment is obsolete: true
Attachment #8486777 - Flags: review?(tabraldes)
Attachment #8486777 - Flags: review?(bent.mozilla)
Attachment #8487912 - Flags: superreview?(cjones.bugs)
Attachment #8487912 - Flags: review?(tabraldes)
Attachment #8486407 - Attachment is obsolete: true
Attachment #8487914 - Attachment description: part 3 - MediaManager threading work r=jesup → part 3 - MediaManager threading work (r=jesup)
Attachment #8487912 - Flags: review?(nfroyd)
Attachment #8488150 - Flags: review?(rjesup)
Blocks: 1066242
Comment on attachment 8487912 [details] [diff] [review]
part 2 - add support for MessagePumpForNonMainUIThreads

Review of attachment 8487912 [details] [diff] [review]:
-----------------------------------------------------------------

::: ipc/glue/MessagePump.cpp
@@ +319,5 @@
>    }
>  
> +  // Calls to ScheduleWork will accomplish nothing if mThread isn't valid
> +  // at the time of the call. So we need to clear out the chromium event
> +  // queue here or risk having work notifications/tasks out of sync.

After talking with jimm in IRC I feel like I actually understand what's going on here.

For my benefit, and for that of future readers, I would expand this comment to give more context:
  "Any chromium events that need to be processed should be received by this event loop as a `DoWorkRunnable`. The `NS_ProcessNextEvent` call at the bottom of the loop will wait until it encounters one of these or some other XPCOM event. Chromium events that were received before `mThread` became valid, however, will not be received in this way (check the implementation of `ScheduleWork`, which is called by the chromium event posting code, to understand why). We must be sure to process all of these before we enter this loop, or we will forever have unprocessed chromium messages in our queue."

I don't care about exact wording, but I think that extra context would have helped me out a lot on my first pass through this code.

@@ +391,5 @@
> +  ti->SetObserver(this);
> +
> +  // Clear the chromium event queue, see above.
> +  while (state_->delegate->DoWork()) {
> +  }

As discussed in IRC, this is probably not necessary for `MessagePumpForNonMainUIThreads`. For the sake of readability, we may want to remove this.

@@ +393,5 @@
> +  // Clear the chromium event queue, see above.
> +  while (state_->delegate->DoWork()) {
> +  }
> +
> +  base::ScopedNSAutoreleasePool autoReleasePool;

This is #ifdef XPWIN code. From looking at the implementation of `ScopedNSAutoreleasePool`, it seems like we don't need this.

@@ +426,5 @@
> +
> +    // Note, there is a slight risk here that we can get an xpcom event in this
> +    // corridor which we won't pick up in WaitForWork assuming its the last
> +    // xpcom event we get, and we never receive any chromium tasks. Generally this
> +    // doesn't happen in practice.

I think we can avoid this potential issue by calling `SetInWait()` before calling `NS_HasPendingEvents`

if (didWork) {
  continue;
}

SetInWait();
// Now any XPCOM events that come
// in will trigger a call to `ScheduleWork`
bool hasWork = NS_HasPendingEvents(mThread);
if (hasWork) {
  ClearInWait();
  continue;
}
WaitForWork();
ClearInWait();

...but I guess that could impose a lot of mutex locking overhead on our loop.

I just worry that this known race condition might lead to some super subtle perceived-performance issue down the road.

::: ipc/glue/MessagePump.h
@@ +5,5 @@
>  #ifndef __IPC_GLUE_MESSAGEPUMP_H__
>  #define __IPC_GLUE_MESSAGEPUMP_H__
>  
>  #include "base/message_pump_default.h"
> +#if defined(XP_WIN) 

nit: trailing space
Attachment #8487912 - Flags: review?(tabraldes) → review+
Comment on attachment 8488150 [details] [diff] [review]
webrtc hack to get around broken compositor code

Review of attachment 8488150 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with the suggestion I made

::: gfx/layers/d3d9/DeviceManagerD3D9.cpp
@@ +26,5 @@
>  const LPCWSTR kClassName       = L"D3D9WindowClass";
>  
> +// Our focus window - this is really a dummy window we can associate our
> +// device with. This was moved here temporarily due to bug 1066242.
> +HWND sD3D9FocusWnd;

So, this is read from another thread (MedaManager) with no locking...

I'd suggest leave mFocusWnd alone, and ALSO add sD3D9... which will only be read from MediaManager (and avoid theoretical TSAN issues, and simplifies the patch and removing it later).
Attachment #8488150 - Flags: review?(rjesup) → review+
(In reply to Jim Mathies [:jimm] from comment #78)
> Created attachment 8488150 [details] [diff] [review]
> webrtc hack to get around broken compositor code

For those following along, we are currently considering uplifting bug 1066242 with this rather than land this hack. That fix should land on inbound tonight.
Note that this wallpaper fix (In reply to Jim Mathies [:jimm] from comment #81)
> (In reply to Jim Mathies [:jimm] from comment #78)
> > Created attachment 8488150 [details] [diff] [review]
> > webrtc hack to get around broken compositor code
> 
> For those following along, we are currently considering uplifting bug
> 1066242 with this rather than land this hack. That fix should land on
> inbound tonight.

Note that this wallpaper fix misses an issue where the D2D9 window has a child IME window, which would also need to be skipped.  See options in bug 1066242; if we resuscitate this patch, we'll need to have it check for child windows of D3D9 as well.
This work has r+ for everything here, so I'm going to push to inbound as soon as bug 1066242 merges so we can get some testing done. I can address any nits later through follow up landings.
Attachment #8488150 - Attachment is obsolete: true
Comment on attachment 8487912 [details] [diff] [review]
part 2 - add support for MessagePumpForNonMainUIThreads

Review of attachment 8487912 [details] [diff] [review]:
-----------------------------------------------------------------

I was asked to look this over as "the XPCOM guy", but there's not a lot of XPCOM-ish stuff here and I think Tim's review is sufficient.
Attachment #8487912 - Flags: review?(nfroyd)
Blocks: 1066662
Depends on: 1066703
No longer blocks: 1066242
Depends on: 1066242
Comment on attachment 8487907 [details] [diff] [review]
part 1 - Add support for webrtc ThreadWindowsUI (r=jesup)

For parts 1-4:

Approval Request Comment
[Feature/regressing bug #]:
Issue with the original implementation of screen sharing on windows.
[User impact if declined]:
Lock ups in the media thread or main thread.
[Describe test coverage new/current, TBPL]:
On mc. 
[Risks and why]:
Low risk, well tested.
[String/UUID change made/needed]:
none.

bug 1066242 is a companion to this one.
Attachment #8487907 - Flags: approval-mozilla-beta?
Attachment #8487907 - Flags: approval-mozilla-aurora?
Attachment #8487907 - Flags: approval-mozilla-beta?
Attachment #8487907 - Flags: approval-mozilla-beta+
Attachment #8487907 - Flags: approval-mozilla-aurora?
Attachment #8487907 - Flags: approval-mozilla-aurora+
Sorry, I've been totally swamped lately.  It looks everything is landed; is sr still any use?
Flags: needinfo?(jmathies)
(In reply to Chris Jones [:cjones] mostly inactive; ni?/f?/r? if you need me from comment #90)
> Sorry, I've been totally swamped lately.  It looks everything is landed; is
> sr still any use?

I think we're ok, this landed without issue thus far. Note, we also had to switch the Windows compositor thread to TYPE_UI in another bug since it was also manipulating ui widgets.
Flags: needinfo?(jmathies)
Depends on: 1081010
Attachment #8487912 - Flags: superreview?(cjones.bugs)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: