Closed
Bug 1304270
Opened 8 years ago
Closed 8 years ago
Intermittent dom/media/tests/mochitest/test_peerConnection_addtrack_removetrack_events.html | Test timed out.
Categories
(Core :: Audio/Video: MediaStreamGraph, defect, P2)
Core
Audio/Video: MediaStreamGraph
Tracking
()
RESOLVED
FIXED
mozilla52
Tracking | Status | |
---|---|---|
firefox50 | --- | unaffected |
firefox51 | --- | fixed |
firefox52 | --- | fixed |
People
(Reporter: intermittent-bug-filer, Assigned: mchiang)
References
Details
(Keywords: intermittent-failure)
Attachments
(1 file)
58 bytes,
text/x-review-board-request
|
pehrsons
:
review+
gchang
:
approval-mozilla-aurora+
|
Details |
Comment 1•8 years ago
|
||
This looks like test_peerConnection_addSecondVideoStreamNoBundle.html causes the MediaStreamGraph to stall. A stalled MSG is how I interpret a gUM request that never resolves or rejects -- the TracksAvailableCallback never gets notified by the MSG.
Rank: 25
Priority: -- → P2
Updated•8 years ago
|
Component: WebRTC → Audio/Video: MediaStreamGraph
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 7•8 years ago
|
||
Here's a try run where achronop hit this issue with some logging activated (including MediaManager): https://treeherder.mozilla.org/#/jobs?repo=try&revision=ffe8095f8150
There's an obvious difference in the shutdown sequence of test_peerConnection_addSecondVideoStreamNoBundle.html between when this error occurs and when it passes.
When it passes the last thing before TEST-START is:
> 10:31:53 INFO - [Main Thread]: D/MediaManager Listener removed by DOM Destroy(), mFinished = 1
> 10:31:53 INFO - [Main Thread]: D/MediaManager Listener removed by DOM Destroy(), mFinished = 1
> 10:31:53 INFO - [Main Thread]: D/MediaManager Listener removed by DOM Destroy(), mFinished = 1
However when it fails we see:
> 10:32:05 INFO - [Main Thread]: D/MediaManager Listener removed by DOM Destroy(), mFinished = 1
> 10:32:05 INFO - [Main Thread]: D/MediaManager Listener removed by DOM Destroy(), mFinished = 1
> 10:32:05 INFO - [Main Thread]: D/MediaManager Listener removed by DOM Destroy(), mFinished = 0
This must be a clue!
Comment 8•8 years ago
|
||
I think the more important clue is this:
Working:
> 10:31:53 INFO - [Unnamed thread 0x7f1fa6b15c80]: D/MediaManager virtual nsresult mozilla::MediaEngineRemoteVideoSource::Stop(mozilla::SourceMediaStream*, mozilla::TrackID)
> 10:31:53 INFO - [Unnamed thread 0x7f1fa6b15c80]: D/MediaManager virtual nsresult mozilla::MediaEngineRemoteVideoSource::Deallocate(mozilla::MediaEngineSource::AllocationHandle*)
> 10:31:53 INFO - [Main Thread]: D/MediaManager Listener removed on purpose, mFinished = 0
> 10:31:53 INFO - [Main Thread]: D/MediaManager Sent recording-window-ended for window 243 (outer 12)
> 10:31:53 INFO - [Unnamed thread 0x7f1fa6b15c80]: D/MediaManager Video device 4097 deallocated
> 10:31:53 INFO - [Main Thread]: D/MediaManager OnNavigation for 243
Failing:
> 10:32:05 INFO - [Unnamed thread 0x7f626ccd3c80]: D/MediaManager virtual nsresult mozilla::MediaEngineRemoteVideoSource::Stop(mozilla::SourceMediaStream*, mozilla::TrackID)
> 10:32:05 INFO - [Unnamed thread 0x7f626ccd3c80]: D/MediaManager virtual nsresult mozilla::MediaEngineRemoteVideoSource::Deallocate(mozilla::MediaEngineSource::AllocationHandle*)
> 10:32:05 INFO - [Main Thread]: D/MediaManager Listener removed on purpose, mFinished = 0
> 10:32:05 INFO - [Main Thread]: D/MediaManager Sent recording-window-ended for window 243 (outer 12)
> 10:32:05 INFO - [Main Thread]: D/MediaManager OnNavigation for 243
The video device is never deallocated, even though we call Deallocate()! Looking at the code this must mean we take this path: http://searchfox.org/mozilla-central/rev/767e1e9b118269f957ca1817138472146278a29e/dom/media/webrtc/MediaEngineRemoteVideoSource.cpp#149
Comment 9•8 years ago
|
||
Aha! There's a deadlock!
See https://archive.mozilla.org/pub/firefox/try-builds/achronop@gmail.com-ffe8095f815073a3dd2d34acf6a3f3f147696efd/try-linux64/try_ubuntu64_vm_test-mochitest-media-bm130-tests1-linux64-build630.txt.gz
When it times out and crashes we have the following two threads deadlocking each other:
Thread 29 (this explains comment 8) is holding the CamerasSingleton::Mutex() and waiting for the reply monitor at http://searchfox.org/mozilla-central/rev/767e1e9b118269f957ca1817138472146278a29e/dom/media/systemservices/CamerasChild.cpp#253:
> 10:55:59 INFO - Thread 29
> 10:55:59 INFO - 0 libpthread-2.15.so + 0xbd84
> 10:55:59 INFO - rax = 0xfffffffffffffe00 rdx = 0x00000000000006a7
> 10:55:59 INFO - rcx = 0xffffffffffffffff rbx = 0x00007f626ccd3c80
> 10:55:59 INFO - rsi = 0x0000000000000080 rdi = 0x00007f626da2b9cc
> 10:55:59 INFO - rbp = 0x00007f626bedd970 rsp = 0x00007f626bedd920
> 10:55:59 INFO - r8 = 0x0000000000000000 r9 = 0x0000000000000353
> 10:55:59 INFO - r10 = 0x0000000000000000 r11 = 0x0000000000000246
> 10:55:59 INFO - r12 = 0x00007f626da2b9c0 r13 = 0x00000000ffffffff
> 10:55:59 INFO - r14 = 0x0000000000000005 r15 = 0x00007f626beddc1f
> 10:55:59 INFO - rip = 0x00007f62ae1f8d84
> 10:55:59 INFO - Found by: given as instruction pointer in context
> 10:55:59 INFO - 1 libxul.so!mozilla::CondVar::Wait [CondVar.h:ffe8095f8150 : 79 + 0xb]
> 10:55:59 INFO - rbp = 0x00007f626bedd9a0 rsp = 0x00007f626bedd980
> 10:55:59 INFO - rip = 0x00007f62a3413e65
> 10:55:59 INFO - Found by: previous frame's frame pointer
> 10:55:59 INFO - 2 libxul.so!mozilla::camera::CamerasChild::DispatchToParent [Monitor.h:ffe8095f8150 : 40 + 0xc]
> 10:55:59 INFO - rbx = 0x00007f627675bae0 rbp = 0x00007f626bedd9d0
> 10:55:59 INFO - rsp = 0x00007f626bedd9b0 rip = 0x00007f62a3415df1
> 10:55:59 INFO - Found by: call frame info
> 10:55:59 INFO - 3 libxul.so!mozilla::camera::LockAndDispatch<int>::LockAndDispatch [CamerasChild.cpp:ffe8095f8150 : 219 + 0x5]
> 10:55:59 INFO - rbx = 0x00007f626bedda20 rbp = 0x00007f626bedda00
> 10:55:59 INFO - rsp = 0x00007f626bedd9e0 r12 = 0x00007f627675bae0
> 10:55:59 INFO - rip = 0x00007f62a34161c5
> 10:55:59 INFO - Found by: call frame info
> 10:55:59 INFO - 4 libxul.so!mozilla::camera::CamerasChild::ReleaseCaptureDevice [CamerasChild.cpp:ffe8095f8150 : 436 + 0x13]
> 10:55:59 INFO - rbx = 0x00007f6270ac6ec0 rbp = 0x00007f626bedda80
> 10:55:59 INFO - rsp = 0x00007f626bedda10 r12 = 0x00007f626bedda20
> 10:55:59 INFO - rip = 0x00007f62a3416b0b
> 10:55:59 INFO - Found by: call frame info
> 10:55:59 INFO - 5 libxul.so!mozilla::camera::GetChildAndCall<int (mozilla::camera::CamerasChild::*)(mozilla::camera::CaptureEngine, int), mozilla::camera::CaptureEngine&, int&> [CamerasChild.h:ffe8095f8150 : 137 + 0xd]
> 10:55:59 INFO - rbx = 0x00007f626beddae0 rbp = 0x00007f626beddad0
> 10:55:59 INFO - rsp = 0x00007f626bedda90 r12 = 0x00007f6271a06de4
> 10:55:59 INFO - r13 = 0x00007f6271a06d74 r14 = 0x00007f626beddb98
> 10:55:59 INFO - rip = 0x00007f62a3453da7
> 10:55:59 INFO - Found by: call frame info
> 10:55:59 INFO - 6 libxul.so!mozilla::MediaEngineRemoteVideoSource::Deallocate [MediaEngineRemoteVideoSource.cpp:ffe8095f8150 : 153 + 0x2a]
> 10:55:59 INFO - rbx = 0x00007f6271a06c00 rbp = 0x00007f626beddb00
> 10:55:59 INFO - rsp = 0x00007f626beddae0 r12 = 0x00007f6291221800
> 10:55:59 INFO - r13 = 0x0000000000000000 r14 = 0x00007f626beddb98
> 10:55:59 INFO - rip = 0x00007f62a345ad66
> 10:55:59 INFO - Found by: call frame info
> 10:55:59 INFO - 7 libxul.so!mozilla::MediaOperationTask::Run [MediaManager.cpp:ffe8095f8150 : 604 + 0x9]
> 10:55:59 INFO - rbx = 0x00007f6273693800 rbp = 0x00007f626beddb60
> 10:55:59 INFO - rsp = 0x00007f626beddb10 r12 = 0x00007f62773f5640
> 10:55:59 INFO - r13 = 0x0000000000000000 r14 = 0x00007f626beddb98
> 10:55:59 INFO - rip = 0x00007f62a33850ed
> 10:55:59 INFO - Found by: call frame info
Thread 30 This is blocked trying to lock the CamerasSingleton::Mutex() at http://searchfox.org/mozilla-central/rev/767e1e9b118269f957ca1817138472146278a29e/dom/media/systemservices/CamerasChild.cpp#53.
My guess is that this is blocking the runnable that thread 29 has dispatched to CamerasSingleton::Thread(), since this DelayedRunnable runs on CamerasSingleton::Thread() as well.:
> 10:55:59 INFO - Thread 30
> 10:55:59 INFO - 0 libpthread-2.15.so + 0xe89c
> 10:55:59 INFO - rax = 0xfffffffffffffe00 rdx = 0x0000000000000002
> 10:55:59 INFO - rcx = 0xffffffffffffffff rbx = 0x00007f626ce68c10
> 10:55:59 INFO - rsi = 0x0000000000000080 rdi = 0x00007f626ce68c10
> 10:55:59 INFO - rbp = 0x00007f626b6dcb60 rsp = 0x00007f626b6dcaa8
> 10:55:59 INFO - r8 = 0x00007f626ce68c10 r9 = 0x000000000000152a
> 10:55:59 INFO - r10 = 0x0000000000000000 r11 = 0x0000000000000202
> 10:55:59 INFO - r12 = 0x00007f626cce5380 r13 = 0x00007f626dab6e00
> 10:55:59 INFO - r14 = 0x00000265b1651c08 r15 = 0x00007f626f38b548
> 10:55:59 INFO - rip = 0x00007f62ae1fb89c
> 10:55:59 INFO - Found by: given as instruction pointer in context
> 10:55:59 INFO - 1 libxul.so!mozilla::camera::FakeOnDeviceChangeEventRunnable::Run [Mutex.h:ffe8095f8150 : 164 + 0x5]
> 10:55:59 INFO - rbp = 0x00007f626b6dcba0 rsp = 0x00007f626b6dcb70
> 10:55:59 INFO - rip = 0x00007f62a341a2f4
> 10:55:59 INFO - Found by: previous frame's frame pointer
> 10:55:59 INFO - 2 libxul.so!DelayedRunnable::DoRun [nsThread.cpp:ffe8095f8150 : 260 + 0x9]
> 10:55:59 INFO - rbx = 0x00007f626f38b510 rbp = 0x00007f626b6dcbd0
> 10:55:59 INFO - rsp = 0x00007f626b6dcbb0 r12 = 0x00007f626cdc8410
> 10:55:59 INFO - r13 = 0x0000000000000001 rip = 0x00007f62a22e203f
> 10:55:59 INFO - Found by: call frame info
> 10:55:59 INFO - 3 libxul.so!nsTimerImpl::Fire [nsTimerImpl.cpp:ffe8095f8150 : 524 + 0xd]
> 10:55:59 INFO - rbx = 0x00007f626f38b510 rbp = 0x00007f626b6dcc40
> 10:55:59 INFO - rsp = 0x00007f626b6dcbe0 r12 = 0x00007f626cdc8410
> 10:55:59 INFO - r13 = 0x0000000000000001 rip = 0x00007f62a22ee800
> 10:55:59 INFO - Found by: call frame info
Munro, FakeOnDeviceChangeEventRunnable appears to be your code. Could you take a look?
Flags: needinfo?(mchiang)
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Updated•8 years ago
|
Assignee: nobody → mchiang
Flags: needinfo?(mchiang)
Updated•8 years ago
|
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 12•8 years ago
|
||
Comment hidden (mozreview-request) |
Assignee | ||
Comment 14•8 years ago
|
||
Comment hidden (mozreview-request) |
Comment 16•8 years ago
|
||
mozreview-review |
Comment on attachment 8796397 [details]
Bug 1304270 - Run FakeOnDeviceChangeEventRunnable on a dedicated thread rather than Camera IPC thread to prevent deadlock;
https://reviewboard.mozilla.org/r/82268/#review80914
LGTM
Attachment #8796397 -
Flags: review+
Assignee | ||
Updated•8 years ago
|
Keywords: checkin-needed
Comment 17•8 years ago
|
||
Pushed by pehrsons@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/02b1f8c28483
Run FakeOnDeviceChangeEventRunnable on a dedicated thread rather than Camera IPC thread to prevent deadlock. r=pehrsons
Keywords: checkin-needed
Comment hidden (Intermittent Failures Robot) |
Comment 19•8 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 8 years ago
status-firefox52:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
Updated•8 years ago
|
Comment hidden (Intermittent Failures Robot) |
Updated•8 years ago
|
Attachment #8796397 -
Flags: review?(jib)
Comment 26•8 years ago
|
||
51 is affected for Mac. See bug 1299166. Munro can you request uplift?
Flags: needinfo?(mchiang)
Assignee | ||
Comment 27•8 years ago
|
||
Comment on attachment 8796397 [details]
Bug 1304270 - Run FakeOnDeviceChangeEventRunnable on a dedicated thread rather than Camera IPC thread to prevent deadlock;
Approval Request Comment
[Feature/regressing bug #]: Bug 1304270
[User impact if declined]: Several mochitests failed due to deadlock
[Describe test coverage new/current, TreeHerder]:Covered by existing unit tests.
[Risks and why]: Low risk. This patch only addresses the deadlock root cause.
[String/UUID change made/needed]: None.
Flags: needinfo?(mchiang)
Attachment #8796397 -
Flags: approval-mozilla-aurora?
Comment 28•8 years ago
|
||
This is responsible for 4 or 5 of the top 20 oranges; we definitely want this.
Comment 29•8 years ago
|
||
Comment on attachment 8796397 [details]
Bug 1304270 - Run FakeOnDeviceChangeEventRunnable on a dedicated thread rather than Camera IPC thread to prevent deadlock;
Hi :mchiang,
Usually, the Feature/regressing bug in uplift request template should not be the bug itself. I'll take the bug this time to make life easier. Next time, please fill the correct bug in the template.
Fix an intermittent test error. Take it in 51 aurora.
Flags: needinfo?(mchiang)
Attachment #8796397 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Assignee | ||
Comment 30•8 years ago
|
||
oops!
Correct the request template.
Thanks for the reminding!
Approval Request Comment
[Feature/regressing bug #]: Bug 1299166
[User impact if declined]: Several mochitests failed due to deadlock
[Describe test coverage new/current, TreeHerder]:Covered by existing unit tests.
[Risks and why]: Low risk. This patch only addresses the deadlock root cause.
[String/UUID change made/needed]: None.
Flags: needinfo?(mchiang)
Comment 31•8 years ago
|
||
Bug 1286429 is the regressing bug.
Comment 32•8 years ago
|
||
bugherder uplift |
You need to log in
before you can comment on or make changes to this bug.
Description
•