Closed Bug 1304270 Opened 8 years ago Closed 8 years ago

Intermittent dom/media/tests/mochitest/test_peerConnection_addtrack_removetrack_events.html | Test timed out.

Categories

(Core :: Audio/Video: MediaStreamGraph, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla52
Tracking Status
firefox50 --- unaffected
firefox51 --- fixed
firefox52 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: mchiang)

References

Details

(Keywords: intermittent-failure)

Attachments

(1 file)

This looks like test_peerConnection_addSecondVideoStreamNoBundle.html causes the MediaStreamGraph to stall. A stalled MSG is how I interpret a gUM request that never resolves or rejects -- the TracksAvailableCallback never gets notified by the MSG.
Rank: 25
Priority: -- → P2
Component: WebRTC → Audio/Video: MediaStreamGraph
Here's a try run where achronop hit this issue with some logging activated (including MediaManager): https://treeherder.mozilla.org/#/jobs?repo=try&revision=ffe8095f8150

There's an obvious difference in the shutdown sequence of test_peerConnection_addSecondVideoStreamNoBundle.html between when this error occurs and when it passes.

When it passes the last thing before TEST-START is:
> 10:31:53     INFO -  [Main Thread]: D/MediaManager Listener removed by DOM Destroy(), mFinished = 1
> 10:31:53     INFO -  [Main Thread]: D/MediaManager Listener removed by DOM Destroy(), mFinished = 1
> 10:31:53     INFO -  [Main Thread]: D/MediaManager Listener removed by DOM Destroy(), mFinished = 1

However when it fails we see:
> 10:32:05     INFO -  [Main Thread]: D/MediaManager Listener removed by DOM Destroy(), mFinished = 1
> 10:32:05     INFO -  [Main Thread]: D/MediaManager Listener removed by DOM Destroy(), mFinished = 1
> 10:32:05     INFO -  [Main Thread]: D/MediaManager Listener removed by DOM Destroy(), mFinished = 0


This must be a clue!
I think the more important clue is this:

Working:
> 10:31:53     INFO -  [Unnamed thread 0x7f1fa6b15c80]: D/MediaManager virtual nsresult mozilla::MediaEngineRemoteVideoSource::Stop(mozilla::SourceMediaStream*, mozilla::TrackID)
> 10:31:53     INFO -  [Unnamed thread 0x7f1fa6b15c80]: D/MediaManager virtual nsresult mozilla::MediaEngineRemoteVideoSource::Deallocate(mozilla::MediaEngineSource::AllocationHandle*)
> 10:31:53     INFO -  [Main Thread]: D/MediaManager Listener removed on purpose, mFinished = 0
> 10:31:53     INFO -  [Main Thread]: D/MediaManager Sent recording-window-ended for window 243 (outer 12)
> 10:31:53     INFO -  [Unnamed thread 0x7f1fa6b15c80]: D/MediaManager Video device 4097 deallocated
> 10:31:53     INFO -  [Main Thread]: D/MediaManager OnNavigation for 243

Failing:
> 10:32:05     INFO -  [Unnamed thread 0x7f626ccd3c80]: D/MediaManager virtual nsresult mozilla::MediaEngineRemoteVideoSource::Stop(mozilla::SourceMediaStream*, mozilla::TrackID)
> 10:32:05     INFO -  [Unnamed thread 0x7f626ccd3c80]: D/MediaManager virtual nsresult mozilla::MediaEngineRemoteVideoSource::Deallocate(mozilla::MediaEngineSource::AllocationHandle*)
> 10:32:05     INFO -  [Main Thread]: D/MediaManager Listener removed on purpose, mFinished = 0
> 10:32:05     INFO -  [Main Thread]: D/MediaManager Sent recording-window-ended for window 243 (outer 12)
> 10:32:05     INFO -  [Main Thread]: D/MediaManager OnNavigation for 243

The video device is never deallocated, even though we call Deallocate()! Looking at the code this must mean we take this path: http://searchfox.org/mozilla-central/rev/767e1e9b118269f957ca1817138472146278a29e/dom/media/webrtc/MediaEngineRemoteVideoSource.cpp#149
Aha! There's a deadlock!

See https://archive.mozilla.org/pub/firefox/try-builds/achronop@gmail.com-ffe8095f815073a3dd2d34acf6a3f3f147696efd/try-linux64/try_ubuntu64_vm_test-mochitest-media-bm130-tests1-linux64-build630.txt.gz

When it times out and crashes we have the following two threads deadlocking each other:

Thread 29 (this explains comment 8) is holding the CamerasSingleton::Mutex() and waiting for the reply monitor at http://searchfox.org/mozilla-central/rev/767e1e9b118269f957ca1817138472146278a29e/dom/media/systemservices/CamerasChild.cpp#253:
> 10:55:59     INFO -  Thread 29
> 10:55:59     INFO -   0  libpthread-2.15.so + 0xbd84
> 10:55:59     INFO -      rax = 0xfffffffffffffe00   rdx = 0x00000000000006a7
> 10:55:59     INFO -      rcx = 0xffffffffffffffff   rbx = 0x00007f626ccd3c80
> 10:55:59     INFO -      rsi = 0x0000000000000080   rdi = 0x00007f626da2b9cc
> 10:55:59     INFO -      rbp = 0x00007f626bedd970   rsp = 0x00007f626bedd920
> 10:55:59     INFO -       r8 = 0x0000000000000000    r9 = 0x0000000000000353
> 10:55:59     INFO -      r10 = 0x0000000000000000   r11 = 0x0000000000000246
> 10:55:59     INFO -      r12 = 0x00007f626da2b9c0   r13 = 0x00000000ffffffff
> 10:55:59     INFO -      r14 = 0x0000000000000005   r15 = 0x00007f626beddc1f
> 10:55:59     INFO -      rip = 0x00007f62ae1f8d84
> 10:55:59     INFO -      Found by: given as instruction pointer in context
> 10:55:59     INFO -   1  libxul.so!mozilla::CondVar::Wait [CondVar.h:ffe8095f8150 : 79 + 0xb]
> 10:55:59     INFO -      rbp = 0x00007f626bedd9a0   rsp = 0x00007f626bedd980
> 10:55:59     INFO -      rip = 0x00007f62a3413e65
> 10:55:59     INFO -      Found by: previous frame's frame pointer
> 10:55:59     INFO -   2  libxul.so!mozilla::camera::CamerasChild::DispatchToParent [Monitor.h:ffe8095f8150 : 40 + 0xc]
> 10:55:59     INFO -      rbx = 0x00007f627675bae0   rbp = 0x00007f626bedd9d0
> 10:55:59     INFO -      rsp = 0x00007f626bedd9b0   rip = 0x00007f62a3415df1
> 10:55:59     INFO -      Found by: call frame info
> 10:55:59     INFO -   3  libxul.so!mozilla::camera::LockAndDispatch<int>::LockAndDispatch [CamerasChild.cpp:ffe8095f8150 : 219 + 0x5]
> 10:55:59     INFO -      rbx = 0x00007f626bedda20   rbp = 0x00007f626bedda00
> 10:55:59     INFO -      rsp = 0x00007f626bedd9e0   r12 = 0x00007f627675bae0
> 10:55:59     INFO -      rip = 0x00007f62a34161c5
> 10:55:59     INFO -      Found by: call frame info
> 10:55:59     INFO -   4  libxul.so!mozilla::camera::CamerasChild::ReleaseCaptureDevice [CamerasChild.cpp:ffe8095f8150 : 436 + 0x13]
> 10:55:59     INFO -      rbx = 0x00007f6270ac6ec0   rbp = 0x00007f626bedda80
> 10:55:59     INFO -      rsp = 0x00007f626bedda10   r12 = 0x00007f626bedda20
> 10:55:59     INFO -      rip = 0x00007f62a3416b0b 
> 10:55:59     INFO -      Found by: call frame info
> 10:55:59     INFO -   5  libxul.so!mozilla::camera::GetChildAndCall<int (mozilla::camera::CamerasChild::*)(mozilla::camera::CaptureEngine, int), mozilla::camera::CaptureEngine&, int&> [CamerasChild.h:ffe8095f8150 : 137 + 0xd]
> 10:55:59     INFO -      rbx = 0x00007f626beddae0   rbp = 0x00007f626beddad0
> 10:55:59     INFO -      rsp = 0x00007f626bedda90   r12 = 0x00007f6271a06de4
> 10:55:59     INFO -      r13 = 0x00007f6271a06d74   r14 = 0x00007f626beddb98
> 10:55:59     INFO -      rip = 0x00007f62a3453da7
> 10:55:59     INFO -      Found by: call frame info 
> 10:55:59     INFO -   6  libxul.so!mozilla::MediaEngineRemoteVideoSource::Deallocate [MediaEngineRemoteVideoSource.cpp:ffe8095f8150 : 153 + 0x2a]
> 10:55:59     INFO -      rbx = 0x00007f6271a06c00   rbp = 0x00007f626beddb00
> 10:55:59     INFO -      rsp = 0x00007f626beddae0   r12 = 0x00007f6291221800
> 10:55:59     INFO -      r13 = 0x0000000000000000   r14 = 0x00007f626beddb98
> 10:55:59     INFO -      rip = 0x00007f62a345ad66
> 10:55:59     INFO -      Found by: call frame info
> 10:55:59     INFO -   7  libxul.so!mozilla::MediaOperationTask::Run [MediaManager.cpp:ffe8095f8150 : 604 + 0x9]
> 10:55:59     INFO -      rbx = 0x00007f6273693800   rbp = 0x00007f626beddb60
> 10:55:59     INFO -      rsp = 0x00007f626beddb10   r12 = 0x00007f62773f5640
> 10:55:59     INFO -      r13 = 0x0000000000000000   r14 = 0x00007f626beddb98
> 10:55:59     INFO -      rip = 0x00007f62a33850ed
> 10:55:59     INFO -      Found by: call frame info

Thread 30 This is blocked trying to lock the CamerasSingleton::Mutex() at http://searchfox.org/mozilla-central/rev/767e1e9b118269f957ca1817138472146278a29e/dom/media/systemservices/CamerasChild.cpp#53.
My guess is that this is blocking the runnable that thread 29 has dispatched to CamerasSingleton::Thread(), since this DelayedRunnable runs on CamerasSingleton::Thread() as well.:
> 10:55:59     INFO -  Thread 30
> 10:55:59     INFO -   0  libpthread-2.15.so + 0xe89c
> 10:55:59     INFO -      rax = 0xfffffffffffffe00   rdx = 0x0000000000000002
> 10:55:59     INFO -      rcx = 0xffffffffffffffff   rbx = 0x00007f626ce68c10
> 10:55:59     INFO -      rsi = 0x0000000000000080   rdi = 0x00007f626ce68c10
> 10:55:59     INFO -      rbp = 0x00007f626b6dcb60   rsp = 0x00007f626b6dcaa8
> 10:55:59     INFO -       r8 = 0x00007f626ce68c10    r9 = 0x000000000000152a
> 10:55:59     INFO -      r10 = 0x0000000000000000   r11 = 0x0000000000000202
> 10:55:59     INFO -      r12 = 0x00007f626cce5380   r13 = 0x00007f626dab6e00
> 10:55:59     INFO -      r14 = 0x00000265b1651c08   r15 = 0x00007f626f38b548
> 10:55:59     INFO -      rip = 0x00007f62ae1fb89c
> 10:55:59     INFO -      Found by: given as instruction pointer in context
> 10:55:59     INFO -   1  libxul.so!mozilla::camera::FakeOnDeviceChangeEventRunnable::Run [Mutex.h:ffe8095f8150 : 164 + 0x5]
> 10:55:59     INFO -      rbp = 0x00007f626b6dcba0   rsp = 0x00007f626b6dcb70
> 10:55:59     INFO -      rip = 0x00007f62a341a2f4
> 10:55:59     INFO -      Found by: previous frame's frame pointer
> 10:55:59     INFO -   2  libxul.so!DelayedRunnable::DoRun [nsThread.cpp:ffe8095f8150 : 260 + 0x9]
> 10:55:59     INFO -      rbx = 0x00007f626f38b510   rbp = 0x00007f626b6dcbd0
> 10:55:59     INFO -      rsp = 0x00007f626b6dcbb0   r12 = 0x00007f626cdc8410
> 10:55:59     INFO -      r13 = 0x0000000000000001   rip = 0x00007f62a22e203f
> 10:55:59     INFO -      Found by: call frame info
> 10:55:59     INFO -   3  libxul.so!nsTimerImpl::Fire [nsTimerImpl.cpp:ffe8095f8150 : 524 + 0xd]
> 10:55:59     INFO -      rbx = 0x00007f626f38b510   rbp = 0x00007f626b6dcc40
> 10:55:59     INFO -      rsp = 0x00007f626b6dcbe0   r12 = 0x00007f626cdc8410
> 10:55:59     INFO -      r13 = 0x0000000000000001   rip = 0x00007f62a22ee800
> 10:55:59     INFO -      Found by: call frame info


Munro, FakeOnDeviceChangeEventRunnable appears to be your code. Could you take a look?
Flags: needinfo?(mchiang)
Assignee: nobody → mchiang
Flags: needinfo?(mchiang)
No longer blocks: 1286429
Depends on: 1286429
Comment on attachment 8796397 [details]
Bug 1304270 - Run FakeOnDeviceChangeEventRunnable on a dedicated thread rather than Camera IPC thread to prevent deadlock;

https://reviewboard.mozilla.org/r/82268/#review80914

LGTM
Attachment #8796397 - Flags: review+
Pushed by pehrsons@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/02b1f8c28483
Run FakeOnDeviceChangeEventRunnable on a dedicated thread rather than Camera IPC thread to prevent deadlock. r=pehrsons
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/02b1f8c28483
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
Depends on: 1299166
51 is affected for Mac. See bug 1299166. Munro can you request uplift?
Flags: needinfo?(mchiang)
Comment on attachment 8796397 [details]
Bug 1304270 - Run FakeOnDeviceChangeEventRunnable on a dedicated thread rather than Camera IPC thread to prevent deadlock;

Approval Request Comment
[Feature/regressing bug #]: Bug 1304270
[User impact if declined]: Several mochitests failed due to deadlock
[Describe test coverage new/current, TreeHerder]:Covered by existing unit tests.
[Risks and why]: Low risk. This patch only addresses the deadlock root cause.
[String/UUID change made/needed]: None.
Flags: needinfo?(mchiang)
Attachment #8796397 - Flags: approval-mozilla-aurora?
This is responsible for 4 or 5 of the top 20 oranges; we definitely want this.
Comment on attachment 8796397 [details]
Bug 1304270 - Run FakeOnDeviceChangeEventRunnable on a dedicated thread rather than Camera IPC thread to prevent deadlock;

Hi :mchiang, 
Usually, the Feature/regressing bug in uplift request template should not be the bug itself. I'll take the bug this time to make life easier. Next time, please fill the correct bug in the template.

Fix an intermittent test error. Take it in 51 aurora.
Flags: needinfo?(mchiang)
Attachment #8796397 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
oops!

Correct the request template.
Thanks for the reminding!

Approval Request Comment
[Feature/regressing bug #]: Bug 1299166
[User impact if declined]: Several mochitests failed due to deadlock
[Describe test coverage new/current, TreeHerder]:Covered by existing unit tests.
[Risks and why]: Low risk. This patch only addresses the deadlock root cause.
[String/UUID change made/needed]: None.
Flags: needinfo?(mchiang)
Bug 1286429 is the regressing bug.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: