Closed Bug 1209987 Opened 9 years ago Closed 9 years ago

webrtc.org Engine creation and destruction should happen on the WebRTC threads

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla44

Tracking Flags:

Tracking

Status

firefox44

---

fixed

People

(Reporter: gcp, Assigned: gcp)

Details

Attachments

(2 files, 3 obsolete files)

webrtc.org Engine creation and destruction should happen on the WebRTC threads 9 years ago Gian-Carlo Pascutto [:gcp] 22.17 KB, patch	jesup : review+	Details \| Diff \| Splinter Review
webrtc.org Engine creation and destruction should happen on the WebRTC threads 9 years ago Gian-Carlo Pascutto [:gcp] 21.50 KB, patch		Details \| Diff \| Splinter Review
webrtc.org Engine creation and destruction should happen on the WebRTC threads 9 years ago Gian-Carlo Pascutto [:gcp] 26.34 KB, patch	jesup : review+	Details \| Diff \| Splinter Review
Assert that Cameras are not used on the main thread 9 years ago Gian-Carlo Pascutto [:gcp] 1.02 KB, patch	jesup : review+	Details \| Diff \| Splinter Review
webrtc.org Engine creation and destruction should happen on the WebRTC threads 9 years ago Gian-Carlo Pascutto [:gcp] 30.49 KB, patch	jesup : review+	Details \| Diff \| Splinter Review

Gian-Carlo Pascutto [:gcp]

Assignee

Description

•

9 years ago

<jesup> gcp: The CamerasParent code blows up new thread-safety checks added in 43... :-( It verifies that you're calling AllocateCaptureDevice() from the ViECaptureThread... and we're calling it from a CamerasParent thread (VideoCapture) <jesup> gcp: I think module_process_thread_ is created in ViESharedData::ViESharedData() (via ProcessThread::Create()) <jesup> yes, pbackground: CamerasParent.cpp:275 helper->mEngine = webrtc::VideoEngine::Create(helper->mConfig); What this means is that the creation and destruction of the webrtc.org engines should be done on the same thread as the one the actual calls are dispatched to. We currently don't do this, consequently they're on PBackground: https://dxr.mozilla.org/mozilla-central/source/dom/media/systemservices/CamerasParent.cpp#236

Gian-Carlo Pascutto [:gcp]

Assignee

Updated

•

9 years ago

Assignee: nobody → gpascutto

Gian-Carlo Pascutto [:gcp]

Assignee

Comment 1

•

9 years ago

Attached patch webrtc.org Engine creation and destruction should happen on the WebRTC threads (obsolete) — Details — Splinter Review

This also slightly simplifies the shutdown sequence to make it clear what happens where. Quite a bit of thread and ref juggling but no way around that.

Attachment #8670739 - Flags: review?(rjesup)

Gian-Carlo Pascutto [:gcp]

Assignee

Comment 2

•

9 years ago

So it looks like we can get rid of all the locking if we can run everything on the WebRTC thread. Unfortunately there's this: https://bugzilla.mozilla.org/show_bug.cgi?id=1070216#c32 I'm going to see if that is still needed.

Randell Jesup [:jesup] (needinfo me)

Comment 3

•

9 years ago

Comment on attachment 8670739 [details] [diff] [review] webrtc.org Engine creation and destruction should happen on the WebRTC threads Review of attachment 8670739 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/media/systemservices/CamerasParent.cpp @@ +513,5 @@ > CamerasParent::RecvGetCaptureDevice(const int& aCapEngine, > const int& aListNumber) > { > LOG((__PRETTY_FUNCTION__)); > + LOG(("RecvGetCaptureDevice")); redundant @@ +575,3 @@ > int error = -1; > + if (!self->EnsureInitialized(aCapEngine)) { > + LOG(("Fails to initialize")); How about (for all of these) LOG(("Failed to initialize in " __FUNCTION__)) or "Failed to initialize in CamerasParent" @@ +744,5 @@ > + self->mEngines[aCapEngine].mEngineIsRunning = false; > + } > + } > + MutexAutoLock lock(self->mCallbackMutex); > + for (unsigned int i = 0; i < self->mCallbacks.Length(); i++) { size_t perhaps @@ +851,5 @@ > + > + // We need to shut down the video capture thread, but in > + // all likelyhood we're running on it (when the runnable > + // in ActorDestroy derefs self). Dispatch the stop to the main > + // thread (PBackground which started it is likely gone by now). Thread shutdown has to occur from the starting thread I think, at least on some OSes (check with froyd)

Attachment #8670739 - Flags: review?(rjesup) → review+

Gian-Carlo Pascutto [:gcp]

Assignee

Comment 4

•

9 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=1c6f62b302aa

Gian-Carlo Pascutto [:gcp]

Assignee

Comment 5

•

9 years ago

Attached patch webrtc.org Engine creation and destruction should happen on the WebRTC threads (obsolete) — Details — Splinter Review

Removed locks. Addressed review comments.

Attachment #8670813 - Flags: review?(rjesup)

Gian-Carlo Pascutto [:gcp]

Assignee

Updated

•

9 years ago

Attachment #8670739 - Attachment is obsolete: true

Gian-Carlo Pascutto [:gcp]

Assignee

Comment 6

•

9 years ago

(In reply to Randell Jesup [:jesup] from comment #3) > Thread shutdown has to occur from the starting thread I think, at least on > some OSes (check with froyd) The VideoCaptureThread (base::thread) is created when PBackground creates the CamerasParent class. I presume that (the creation) is on the PBackground thread. When ActorDestroy gets called, we need to shut down the WebRTC stuff on the VideoCaptureThread. To do this the relevant shutdown runnable needs to take a reference to CamerasParent, and when the shutdown sequence chair-dance ends, it's going to be the one holding the last ref to it because PBackground is gone by then. So ~CamerasParent is now invoked on the VideoCaptureThread and somehow has to get rid of itself. Oops. This is tricky. I can think of 3 ways (that I dare post here): 1) Synchronously dispatch the WebRTC shutdown tasks to the VideoCaptureThread, while sitting on the PBackground thread in ActorDestroy, and then do the destruction of the thread on PBackground. This is a little evil, and I haven't tried if it works, but I suspect it might. This causes PBackground to block on WebRTC shutdown. 2) Do the thread creation on PBackground, and in the destructor Dispatch the thread stop/deletion to the mainthread. This is the cleanest. But as stated above jesup isn't sure it's allowed, I'm not either, and froydnj responded "dare I ask why you even want to know this?" 3) Somehow do the thread creation on mainthread. You really want to have the thread up by the time the CamerasParent ctr finishes though, because otherwise all the incoming PBackground traffic must check if that thread is up (and hold if it's not done yet? that brings us back to 1).

Gian-Carlo Pascutto [:gcp]

Assignee

Comment 7

•

9 years ago

Attached patch webrtc.org Engine creation and destruction should happen on the WebRTC threads (obsolete) — Details — Splinter Review

I ended up implementing (3). I think it's what I'll least regret later.

Attachment #8670839 - Flags: review?(rjesup)

Gian-Carlo Pascutto [:gcp]

Assignee

Updated

•

9 years ago

Attachment #8670813 - Attachment is obsolete: true

Attachment #8670813 - Flags: review?(rjesup)

Gian-Carlo Pascutto [:gcp]

Assignee

Comment 8

•

9 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=a8bcb09cfac4

Gian-Carlo Pascutto [:gcp]

Assignee

Comment 9

•

9 years ago

Attached patch Assert that Cameras are not used on the main thread — Details — Splinter Review

Need to test how much (hopefully nothing) breaks after bug Bug 1070216 lands.

Attachment #8670865 - Flags: review?(rjesup)

Randell Jesup [:jesup] (needinfo me)

Comment 10

•

9 years ago

Comment on attachment 8670839 [details] [diff] [review] webrtc.org Engine creation and destruction should happen on the WebRTC threads Review of attachment 8670839 [details] [diff] [review]: ----------------------------------------------------------------- r+ with the mVideoCaptureThread fully within the monitor lock ::: dom/media/systemservices/CamerasParent.cpp @@ +629,2 @@ > cbh = self->mCallbacks.AppendElement( > + new CallbackHelper(static_cast<CaptureEngine>(aCapEngine), capnum, self)); Is this just extra indent? @@ +812,5 @@ > + if (mVideoCaptureThread) { > + base::Thread *thread = mVideoCaptureThread; > + { > + MonitorAutoLock lock(mThreadMonitor); > + mVideoCaptureThread = nullptr; *thread = mVideoCaptureThread; That should be within the lock, no? And the test. If mVideoCaptureThread is referenced from multiple threads (and the lock says it is), then all access must be under lock or otherwise guaranteed not to have TSAN issues -- or make it Atomic, which might be a pain perhaps.

Attachment #8670839 - Flags: review?(rjesup) → review+

Gian-Carlo Pascutto [:gcp]

Assignee

Comment 11

•

9 years ago

(In reply to Randell Jesup [:jesup] from comment #10) > @@ +812,5 @@ > > + if (mVideoCaptureThread) { > > + base::Thread *thread = mVideoCaptureThread; > > + { > > + MonitorAutoLock lock(mThreadMonitor); > > + mVideoCaptureThread = nullptr; > > *thread = mVideoCaptureThread; > That should be within the lock, no? And the test. > If mVideoCaptureThread is referenced from multiple threads (and the lock > says it is), then all access must be under lock or otherwise guaranteed not > to have TSAN issues -- or make it Atomic, which might be a pain perhaps. Well, the thread that this is on is the only one writing to it, so I think the *reads* in the same thread can safely go outside the lock.

Gian-Carlo Pascutto [:gcp]

Assignee

Comment 12

•

9 years ago

Failing on try: 10:21:34 INFO - [1882] ###!!! ASSERTION: Failed NS_DispatchToMainThread() in shutdown; leaking: 'false', file /builds/slave/try-lx-d-000000000000000000000/build/src/xpcom/glue/nsThreadUtils.cpp, line 192 10:21:34 INFO - #01: NS_DispatchToMainThread(already_AddRefed<nsIRunnable>&&, unsigned int) [xpcom/glue/nsThreadUtils.cpp:192] 10:21:34 INFO - #02: NS_DispatchToMainThread(nsIRunnable*, unsigned int) [xpcom/glue/nsThreadUtils.cpp:209] 10:21:34 INFO - #03: mozilla::camera::CamerasParent::~CamerasParent() [dom/media/systemservices/CamerasParent.cpp:826] 10:21:34 INFO - #04: mozilla::camera::CamerasParent::~CamerasParent() [memory/mozalloc/mozalloc.h:210] 10:21:34 INFO - #05: mozilla::camera::CamerasParent::Release() [dom/media/systemservices/CamerasParent.h:78] 10:21:34 INFO - #06: mozilla::media::LambdaRunnable<mozilla::camera::CamerasParent::ActorDestroy(mozilla::ipc::IProtocolManager<mozilla::ipc::IProtocol>::ActorDestroyReason)::<lambda()> >::~LambdaRunnable [xpcom/glue/nsThreadUtils.h:230] The comments in nsThreadUtils.cpp seem to suggest that this Is A Bad Idea.

Randell Jesup [:jesup] (needinfo me)

Updated

•

9 years ago

Attachment #8670865 - Flags: review?(rjesup) → review+

Gian-Carlo Pascutto [:gcp]

Assignee

Comment 13

•

9 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=a18586ee9e75

Gian-Carlo Pascutto [:gcp]

Assignee

Comment 14

•

9 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=f016ec97798e

Gian-Carlo Pascutto [:gcp]

Assignee

Comment 15

•

9 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=fda391a95a38

Gian-Carlo Pascutto [:gcp]

Assignee

Comment 16

•

9 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=2dfc0c051845

Gian-Carlo Pascutto [:gcp]

Assignee

Comment 17

•

9 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=5f433098d038

Gian-Carlo Pascutto [:gcp]

Assignee

Comment 18

•

9 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=f3827e276f44

Gian-Carlo Pascutto [:gcp]

Assignee

Comment 19

•

9 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=99d05c4dfbee

Gian-Carlo Pascutto [:gcp]

Assignee

Comment 20

•

9 years ago

Attached patch webrtc.org Engine creation and destruction should happen on the WebRTC threads — Details — Splinter Review

Sorry for yet another review, but as you can see above all the past approaches didn't actually work properly. We now observe for xpcom-shutdown, dispatch the WebRTC engine closing to the WebRTC thread, and sit on a monitor (i.e. it's a blocking, sync dispatch) for that to finish, after which we dispatch the WebRTC thread shutdown to the main thread (which might not be a real dispatch if the shutdown came from xpcom-shutdown since we're in the main thread then, but we must also handle the case where PBackground shuts down our actor). We must block hard because once in xpcom-shutdown, the runnable in the WebRTC thread might not get a chance to dispatch the thread close to the main thread before we're in xpcom-shutdown-threads, at which point the main thread is no longer accessible, causing an inevitable thread leak. The proper machinery has been added to distinguish the two additional states, i.e. "WebRTC thread still being spun up" and "WebRTC is being shut down, thread may go away soon".

Attachment #8671955 - Flags: review?(rjesup)

Gian-Carlo Pascutto [:gcp]

Assignee

Updated

•

9 years ago

Attachment #8670839 - Attachment is obsolete: true

Randell Jesup [:jesup] (needinfo me)

Comment 21

•

9 years ago

Comment on attachment 8671955 [details] [diff] [review] webrtc.org Engine creation and destruction should happen on the WebRTC threads Review of attachment 8671955 [details] [diff] [review]: ----------------------------------------------------------------- r+ assuming there's no issue with the booleans and RemoveObserver() is added (or point out that it's not needed for some reason), but we may be introducing races depending on the answer about the booleans. Likely that requires at least Atomic<> or a lock. ::: dom/media/systemservices/CamerasParent.cpp @@ +151,5 @@ > + const char16_t *aData) > +{ > + MOZ_ASSERT(!strcmp(aTopic, NS_XPCOM_WILL_SHUTDOWN_OBSERVER_ID)); > + StopVideoCapture(); > + return NS_OK; Don't you need to remove the Observer here? @@ +399,5 @@ > + LOG(("Being closed down while engine %d is running!", i)); > + } > + if (mEngines[i].mPtrViERender) { > + mEngines[i].mPtrViERender->Release(); > + mEngines[i].mPtrViERender = nullptr; Perhaps not critical for this patch landing (and an existing issue), but wouldn't it be easier/safer to use ScopedCustomReleasePtr<webrtc::ViEBase>/etc like VideoConduit/AudioConduit do? Then these become ...mPtrViERender = nullptr;/etc ::: dom/media/systemservices/CamerasParent.h @@ +142,5 @@ > > // Shutdown handling > bool mChildIsAlive; > bool mDestroyed; > + bool mWebRTCAlive; What's the thread access story for these three? What threads can access? Is there locking? (I think no) Perhaps these should be some form of Atomic<bool> (not sure the level of consistency across the set needed)

Attachment #8671955 - Flags: review?(rjesup) → review+

Gian-Carlo Pascutto [:gcp]

Assignee

Comment 22

•

9 years ago

(In reply to Randell Jesup [:jesup] from comment #21) > > // Shutdown handling > > bool mChildIsAlive; > > bool mDestroyed; > > + bool mWebRTCAlive; > > What's the thread access story for these three? What threads can access? Is > there locking? (I think no) Perhaps these should be some form of > Atomic<bool> (not sure the level of consistency across the set needed) mChildIsAlive and mDestroyed should be PBackground-only vars. It looks like mWebRTCAlive needs a memory barrier in EnsureInitialized (the other access are under the monitor), so making it Atomic is probably easiest.

Randell Jesup [:jesup] (needinfo me)

Comment 23

•

9 years ago

(In reply to Gian-Carlo Pascutto [:gcp] from comment #11) > (In reply to Randell Jesup [:jesup] from comment #10) > > @@ +812,5 @@ > > > + if (mVideoCaptureThread) { > > > + base::Thread *thread = mVideoCaptureThread; > > > + { > > > + MonitorAutoLock lock(mThreadMonitor); > > > + mVideoCaptureThread = nullptr; > > > > *thread = mVideoCaptureThread; > > That should be within the lock, no? And the test. > > If mVideoCaptureThread is referenced from multiple threads (and the lock > > says it is), then all access must be under lock or otherwise guaranteed not > > to have TSAN issues -- or make it Atomic, which might be a pain perhaps. > > Well, the thread that this is on is the only one writing to it, so I think > the *reads* in the same thread can safely go outside the lock. Actually, one of the horrors of TSAN is that this is wrong. Reads from another thread can corrupt the value. Really. https://blog.mozilla.org/nnethercote/2015/02/24/fix-your-damned-data-races/ and the lastr example from https://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong

Gian-Carlo Pascutto [:gcp]

Assignee

Comment 24

•

9 years ago

(In reply to Randell Jesup [:jesup] from comment #23) > Actually, one of the horrors of TSAN is that this is wrong. Reads from > another thread can corrupt the value. Really. > https://blog.mozilla.org/nnethercote/2015/02/24/fix-your-damned-data-races/ > and the lastr example from > https://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what- > could-possibly-go-wrong The relevant ones only apply to stack vars, not in-memory ones, which is guaranteed here. Of course there's the catch-all undefined behavior issue. Note that the last patch already got rid of that code.

Gian-Carlo Pascutto [:gcp]

Assignee

Comment 25

•

9 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=894263233ee3

Gian-Carlo Pascutto [:gcp]

Assignee

Comment 26

•

9 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/fc7d1acf804f05db4c6cea3f480cbc6bccbce153 Bug 1209987 - webrtc.org Engine creation and destruction should happen on the WebRTC threads. r=jesup

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 27

•

9 years ago

Backed out for leaks like https://treeherder.mozilla.org/logviewer.html#?job_id=15554576&repo=mozilla-inbound https://hg.mozilla.org/integration/mozilla-inbound/rev/730bd2d9fe31

Flags: needinfo?(gpascutto)

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 28

•

9 years ago

Hrm, this happened twice in a row, and backfilling the gap behind the first instance showed this was in fact the first instance. But then it didn't fail on a new push several pushes later (which didn't finish until I had already backed this out, assuming it was permafailing). So maybe this isn't at fault. doing a bunch of retriggers to see if I can get it to happen earlier.

Gian-Carlo Pascutto [:gcp]

Assignee

Comment 29

•

9 years ago

This code has a large risk of causing leaks (see all try pushes above), but the current implementation didn't leak on try, and the leak in the logs seems to originate from unrelated audio code.

Flags: needinfo?(gpascutto)

Gian-Carlo Pascutto [:gcp]

Assignee

Comment 30

•

9 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/9c88cbf86ea362ddcfa0ae692caa538aaa120640 Bug 1209987 - webrtc.org Engine creation and destruction should happen on the WebRTC threads. r=jesup

Gian-Carlo Pascutto [:gcp]

Assignee

Comment 31

•

9 years ago

Relanded in agreement with the sheriffs. <nigelb|sheriffduty> Out of 8 test runs, only 2 were orange. It's possible this modifies shutdown timing so it's exposing bugs in other code.

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 32

•

9 years ago

(In reply to Gian-Carlo Pascutto [:gcp] from comment #31) > Relanded in agreement with the sheriffs. > <nigelb|sheriffduty> Out of 8 test runs, only 2 were orange. > > It's possible this modifies shutdown timing so it's exposing bugs in other > code. I did a bunch of retriggers on pushes near where you landed, and it leaked just like that linked one well before you landed, so I think you're in the clear in this particular case.

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 33

•

9 years ago

https://hg.mozilla.org/mozilla-central/rev/9c88cbf86ea3

Status: NEW → RESOLVED

Closed: 9 years ago

status-firefox44: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla44

You need to log in before you can comment on or make changes to this bug.