Closed
Bug 1209987
Opened 9 years ago
Closed 9 years ago
webrtc.org Engine creation and destruction should happen on the WebRTC threads
Categories
(Core :: Audio/Video, defect)
Core
Audio/Video
Tracking
()
RESOLVED
FIXED
mozilla44
Tracking | Status | |
---|---|---|
firefox44 | --- | fixed |
People
(Reporter: gcp, Assigned: gcp)
Details
Attachments
(2 files, 3 obsolete files)
1.02 KB,
patch
|
jesup
:
review+
|
Details | Diff | Splinter Review |
30.49 KB,
patch
|
jesup
:
review+
|
Details | Diff | Splinter Review |
<jesup> gcp: The CamerasParent code blows up new thread-safety checks added in 43... :-( It verifies that you're calling AllocateCaptureDevice() from the ViECaptureThread... and we're calling it from a CamerasParent thread (VideoCapture)
<jesup> gcp: I think module_process_thread_ is created in ViESharedData::ViESharedData() (via ProcessThread::Create())
<jesup> yes, pbackground: CamerasParent.cpp:275 helper->mEngine = webrtc::VideoEngine::Create(helper->mConfig);
What this means is that the creation and destruction of the webrtc.org engines should be done on the same thread as the one the actual calls are dispatched to.
We currently don't do this, consequently they're on PBackground:
https://dxr.mozilla.org/mozilla-central/source/dom/media/systemservices/CamerasParent.cpp#236
Assignee | ||
Updated•9 years ago
|
Assignee: nobody → gpascutto
Assignee | ||
Comment 1•9 years ago
|
||
This also slightly simplifies the shutdown sequence to make it clear
what happens where. Quite a bit of thread and ref juggling but no way
around that.
Attachment #8670739 -
Flags: review?(rjesup)
Assignee | ||
Comment 2•9 years ago
|
||
So it looks like we can get rid of all the locking if we can run everything on the WebRTC thread.
Unfortunately there's this:
https://bugzilla.mozilla.org/show_bug.cgi?id=1070216#c32
I'm going to see if that is still needed.
Comment 3•9 years ago
|
||
Comment on attachment 8670739 [details] [diff] [review]
webrtc.org Engine creation and destruction should happen on the WebRTC threads
Review of attachment 8670739 [details] [diff] [review]:
-----------------------------------------------------------------
::: dom/media/systemservices/CamerasParent.cpp
@@ +513,5 @@
> CamerasParent::RecvGetCaptureDevice(const int& aCapEngine,
> const int& aListNumber)
> {
> LOG((__PRETTY_FUNCTION__));
> + LOG(("RecvGetCaptureDevice"));
redundant
@@ +575,3 @@
> int error = -1;
> + if (!self->EnsureInitialized(aCapEngine)) {
> + LOG(("Fails to initialize"));
How about (for all of these) LOG(("Failed to initialize in " __FUNCTION__)) or "Failed to initialize in CamerasParent"
@@ +744,5 @@
> + self->mEngines[aCapEngine].mEngineIsRunning = false;
> + }
> + }
> + MutexAutoLock lock(self->mCallbackMutex);
> + for (unsigned int i = 0; i < self->mCallbacks.Length(); i++) {
size_t perhaps
@@ +851,5 @@
> +
> + // We need to shut down the video capture thread, but in
> + // all likelyhood we're running on it (when the runnable
> + // in ActorDestroy derefs self). Dispatch the stop to the main
> + // thread (PBackground which started it is likely gone by now).
Thread shutdown has to occur from the starting thread I think, at least on some OSes (check with froyd)
Attachment #8670739 -
Flags: review?(rjesup) → review+
Assignee | ||
Comment 4•9 years ago
|
||
Assignee | ||
Comment 5•9 years ago
|
||
Removed locks. Addressed review comments.
Attachment #8670813 -
Flags: review?(rjesup)
Assignee | ||
Updated•9 years ago
|
Attachment #8670739 -
Attachment is obsolete: true
Assignee | ||
Comment 6•9 years ago
|
||
(In reply to Randell Jesup [:jesup] from comment #3)
> Thread shutdown has to occur from the starting thread I think, at least on
> some OSes (check with froyd)
The VideoCaptureThread (base::thread) is created when PBackground creates the CamerasParent class. I presume that (the creation) is on the PBackground thread. When ActorDestroy gets called, we need to shut down the WebRTC stuff on the VideoCaptureThread. To do this the relevant shutdown runnable needs to take a reference to CamerasParent, and when the shutdown sequence chair-dance ends, it's going to be the one holding the last ref to it because PBackground is gone by then. So ~CamerasParent is now invoked on the VideoCaptureThread and somehow has to get rid of itself. Oops.
This is tricky. I can think of 3 ways (that I dare post here):
1) Synchronously dispatch the WebRTC shutdown tasks to the VideoCaptureThread, while sitting on the PBackground thread in ActorDestroy, and then do the destruction of the thread on PBackground. This is a little evil, and I haven't tried if it works, but I suspect it might. This causes PBackground to block on WebRTC shutdown.
2) Do the thread creation on PBackground, and in the destructor Dispatch the thread stop/deletion to the mainthread. This is the cleanest. But as stated above jesup isn't sure it's allowed, I'm not either, and froydnj responded "dare I ask why you even want to know this?"
3) Somehow do the thread creation on mainthread. You really want to have the thread up by the time the CamerasParent ctr finishes though, because otherwise all the incoming PBackground traffic must check if that thread is up (and hold if it's not done yet? that brings us back to 1).
Assignee | ||
Comment 7•9 years ago
|
||
I ended up implementing (3). I think it's what I'll least regret later.
Attachment #8670839 -
Flags: review?(rjesup)
Assignee | ||
Updated•9 years ago
|
Attachment #8670813 -
Attachment is obsolete: true
Attachment #8670813 -
Flags: review?(rjesup)
Assignee | ||
Comment 8•9 years ago
|
||
Assignee | ||
Comment 9•9 years ago
|
||
Need to test how much (hopefully nothing) breaks after bug Bug 1070216 lands.
Attachment #8670865 -
Flags: review?(rjesup)
Comment 10•9 years ago
|
||
Comment on attachment 8670839 [details] [diff] [review]
webrtc.org Engine creation and destruction should happen on the WebRTC threads
Review of attachment 8670839 [details] [diff] [review]:
-----------------------------------------------------------------
r+ with the mVideoCaptureThread fully within the monitor lock
::: dom/media/systemservices/CamerasParent.cpp
@@ +629,2 @@
> cbh = self->mCallbacks.AppendElement(
> + new CallbackHelper(static_cast<CaptureEngine>(aCapEngine), capnum, self));
Is this just extra indent?
@@ +812,5 @@
> + if (mVideoCaptureThread) {
> + base::Thread *thread = mVideoCaptureThread;
> + {
> + MonitorAutoLock lock(mThreadMonitor);
> + mVideoCaptureThread = nullptr;
*thread = mVideoCaptureThread;
That should be within the lock, no? And the test.
If mVideoCaptureThread is referenced from multiple threads (and the lock says it is), then all access must be under lock or otherwise guaranteed not to have TSAN issues -- or make it Atomic, which might be a pain perhaps.
Attachment #8670839 -
Flags: review?(rjesup) → review+
Assignee | ||
Comment 11•9 years ago
|
||
(In reply to Randell Jesup [:jesup] from comment #10)
> @@ +812,5 @@
> > + if (mVideoCaptureThread) {
> > + base::Thread *thread = mVideoCaptureThread;
> > + {
> > + MonitorAutoLock lock(mThreadMonitor);
> > + mVideoCaptureThread = nullptr;
>
> *thread = mVideoCaptureThread;
> That should be within the lock, no? And the test.
> If mVideoCaptureThread is referenced from multiple threads (and the lock
> says it is), then all access must be under lock or otherwise guaranteed not
> to have TSAN issues -- or make it Atomic, which might be a pain perhaps.
Well, the thread that this is on is the only one writing to it, so I think the *reads* in the same thread can safely go outside the lock.
Assignee | ||
Comment 12•9 years ago
|
||
Failing on try:
10:21:34 INFO - [1882] ###!!! ASSERTION: Failed NS_DispatchToMainThread() in shutdown; leaking: 'false', file /builds/slave/try-lx-d-000000000000000000000/build/src/xpcom/glue/nsThreadUtils.cpp, line 192
10:21:34 INFO - #01: NS_DispatchToMainThread(already_AddRefed<nsIRunnable>&&, unsigned int) [xpcom/glue/nsThreadUtils.cpp:192]
10:21:34 INFO - #02: NS_DispatchToMainThread(nsIRunnable*, unsigned int) [xpcom/glue/nsThreadUtils.cpp:209]
10:21:34 INFO - #03: mozilla::camera::CamerasParent::~CamerasParent() [dom/media/systemservices/CamerasParent.cpp:826]
10:21:34 INFO - #04: mozilla::camera::CamerasParent::~CamerasParent() [memory/mozalloc/mozalloc.h:210]
10:21:34 INFO - #05: mozilla::camera::CamerasParent::Release() [dom/media/systemservices/CamerasParent.h:78]
10:21:34 INFO - #06: mozilla::media::LambdaRunnable<mozilla::camera::CamerasParent::ActorDestroy(mozilla::ipc::IProtocolManager<mozilla::ipc::IProtocol>::ActorDestroyReason)::<lambda()> >::~LambdaRunnable [xpcom/glue/nsThreadUtils.h:230]
The comments in nsThreadUtils.cpp seem to suggest that this Is A Bad Idea.
Updated•9 years ago
|
Attachment #8670865 -
Flags: review?(rjesup) → review+
Assignee | ||
Comment 13•9 years ago
|
||
Assignee | ||
Comment 14•9 years ago
|
||
Assignee | ||
Comment 15•9 years ago
|
||
Assignee | ||
Comment 16•9 years ago
|
||
Assignee | ||
Comment 17•9 years ago
|
||
Assignee | ||
Comment 18•9 years ago
|
||
Assignee | ||
Comment 19•9 years ago
|
||
Assignee | ||
Comment 20•9 years ago
|
||
Sorry for yet another review, but as you can see above all the
past approaches didn't actually work properly.
We now observe for xpcom-shutdown, dispatch the WebRTC engine
closing to the WebRTC thread, and sit on a monitor (i.e. it's a blocking,
sync dispatch) for that to finish, after which we dispatch the WebRTC
thread shutdown to the main thread (which might not be a real dispatch
if the shutdown came from xpcom-shutdown since we're in the main thread then,
but we must also handle the case where PBackground shuts down our actor).
We must block hard because once in xpcom-shutdown, the runnable in the WebRTC
thread might not get a chance to dispatch the thread close to the main thread
before we're in xpcom-shutdown-threads, at which point the main thread is no
longer accessible, causing an inevitable thread leak.
The proper machinery has been added to distinguish the two additional
states, i.e. "WebRTC thread still being spun up" and "WebRTC is being
shut down, thread may go away soon".
Attachment #8671955 -
Flags: review?(rjesup)
Assignee | ||
Updated•9 years ago
|
Attachment #8670839 -
Attachment is obsolete: true
Comment 21•9 years ago
|
||
Comment on attachment 8671955 [details] [diff] [review]
webrtc.org Engine creation and destruction should happen on the WebRTC threads
Review of attachment 8671955 [details] [diff] [review]:
-----------------------------------------------------------------
r+ assuming there's no issue with the booleans and RemoveObserver() is added (or point out that it's not needed for some reason), but we may be introducing races depending on the answer about the booleans. Likely that requires at least Atomic<> or a lock.
::: dom/media/systemservices/CamerasParent.cpp
@@ +151,5 @@
> + const char16_t *aData)
> +{
> + MOZ_ASSERT(!strcmp(aTopic, NS_XPCOM_WILL_SHUTDOWN_OBSERVER_ID));
> + StopVideoCapture();
> + return NS_OK;
Don't you need to remove the Observer here?
@@ +399,5 @@
> + LOG(("Being closed down while engine %d is running!", i));
> + }
> + if (mEngines[i].mPtrViERender) {
> + mEngines[i].mPtrViERender->Release();
> + mEngines[i].mPtrViERender = nullptr;
Perhaps not critical for this patch landing (and an existing issue), but wouldn't it be easier/safer to use ScopedCustomReleasePtr<webrtc::ViEBase>/etc like VideoConduit/AudioConduit do? Then these become ...mPtrViERender = nullptr;/etc
::: dom/media/systemservices/CamerasParent.h
@@ +142,5 @@
>
> // Shutdown handling
> bool mChildIsAlive;
> bool mDestroyed;
> + bool mWebRTCAlive;
What's the thread access story for these three? What threads can access? Is there locking? (I think no) Perhaps these should be some form of Atomic<bool> (not sure the level of consistency across the set needed)
Attachment #8671955 -
Flags: review?(rjesup) → review+
Assignee | ||
Comment 22•9 years ago
|
||
(In reply to Randell Jesup [:jesup] from comment #21)
> > // Shutdown handling
> > bool mChildIsAlive;
> > bool mDestroyed;
> > + bool mWebRTCAlive;
>
> What's the thread access story for these three? What threads can access? Is
> there locking? (I think no) Perhaps these should be some form of
> Atomic<bool> (not sure the level of consistency across the set needed)
mChildIsAlive and mDestroyed should be PBackground-only vars.
It looks like mWebRTCAlive needs a memory barrier in EnsureInitialized (the other access are under the monitor), so making it Atomic is probably easiest.
Comment 23•9 years ago
|
||
(In reply to Gian-Carlo Pascutto [:gcp] from comment #11)
> (In reply to Randell Jesup [:jesup] from comment #10)
> > @@ +812,5 @@
> > > + if (mVideoCaptureThread) {
> > > + base::Thread *thread = mVideoCaptureThread;
> > > + {
> > > + MonitorAutoLock lock(mThreadMonitor);
> > > + mVideoCaptureThread = nullptr;
> >
> > *thread = mVideoCaptureThread;
> > That should be within the lock, no? And the test.
> > If mVideoCaptureThread is referenced from multiple threads (and the lock
> > says it is), then all access must be under lock or otherwise guaranteed not
> > to have TSAN issues -- or make it Atomic, which might be a pain perhaps.
>
> Well, the thread that this is on is the only one writing to it, so I think
> the *reads* in the same thread can safely go outside the lock.
Actually, one of the horrors of TSAN is that this is wrong. Reads from another thread can corrupt the value. Really. https://blog.mozilla.org/nnethercote/2015/02/24/fix-your-damned-data-races/ and the lastr example from https://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
Assignee | ||
Comment 24•9 years ago
|
||
(In reply to Randell Jesup [:jesup] from comment #23)
> Actually, one of the horrors of TSAN is that this is wrong. Reads from
> another thread can corrupt the value. Really.
> https://blog.mozilla.org/nnethercote/2015/02/24/fix-your-damned-data-races/
> and the lastr example from
> https://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-
> could-possibly-go-wrong
The relevant ones only apply to stack vars, not in-memory ones, which is guaranteed here. Of course there's the catch-all undefined behavior issue.
Note that the last patch already got rid of that code.
Assignee | ||
Comment 25•9 years ago
|
||
Assignee | ||
Comment 26•9 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/fc7d1acf804f05db4c6cea3f480cbc6bccbce153
Bug 1209987 - webrtc.org Engine creation and destruction should happen on the WebRTC threads. r=jesup
Backed out for leaks like https://treeherder.mozilla.org/logviewer.html#?job_id=15554576&repo=mozilla-inbound
https://hg.mozilla.org/integration/mozilla-inbound/rev/730bd2d9fe31
Flags: needinfo?(gpascutto)
Hrm, this happened twice in a row, and backfilling the gap behind the first instance showed this was in fact the first instance. But then it didn't fail on a new push several pushes later (which didn't finish until I had already backed this out, assuming it was permafailing). So maybe this isn't at fault.
doing a bunch of retriggers to see if I can get it to happen earlier.
Assignee | ||
Comment 29•9 years ago
|
||
This code has a large risk of causing leaks (see all try pushes above), but the current implementation didn't leak on try, and the leak in the logs seems to originate from unrelated audio code.
Flags: needinfo?(gpascutto)
Assignee | ||
Comment 30•9 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/9c88cbf86ea362ddcfa0ae692caa538aaa120640
Bug 1209987 - webrtc.org Engine creation and destruction should happen on the WebRTC threads. r=jesup
Assignee | ||
Comment 31•9 years ago
|
||
Relanded in agreement with the sheriffs.
<nigelb|sheriffduty> Out of 8 test runs, only 2 were orange.
It's possible this modifies shutdown timing so it's exposing bugs in other code.
(In reply to Gian-Carlo Pascutto [:gcp] from comment #31)
> Relanded in agreement with the sheriffs.
> <nigelb|sheriffduty> Out of 8 test runs, only 2 were orange.
>
> It's possible this modifies shutdown timing so it's exposing bugs in other
> code.
I did a bunch of retriggers on pushes near where you landed, and it leaked just like that linked one well before you landed, so I think you're in the clear in this particular case.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla44
You need to log in
before you can comment on or make changes to this bug.
Description
•