Intermittent TEST-UNEXPECTED-TIMEOUT | dom/media/tests/mochitest/test_peerConnection_basicH264Video.html | application timed out after 330/370 seconds with no output

RESOLVED FIXED in Firefox 60

Status

Product: Core
Component: WebRTC
Priority: P2
Severity: normal
Rank: 16
Status: RESOLVED FIXED
Reported: a year ago
Last modified: 2 months ago

People

(Reporter: Treeherder Bug Filer, Assigned: bwc)

Tracking

Keywords: intermittent-failure
Version: unspecified
Target Milestone: mozilla61
Points: ---

Firefox Tracking Flags

(firefox59 wontfix, firefox60 fixed, firefox61 fixed)

Details

(Whiteboard: [stockwell fixed:product])

Perhaps fallout from bug 1341285?
Rank: 16
Priority: -- → P1

Comment 2

a year ago
1 failure in 892 pushes (0.001 failures/push) was associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-inbound: 1

Platform breakdown:
* linux64: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1375540&startday=2017-06-19&endday=2017-06-25&tree=all

Comment 3

a year ago
7 failures in 718 pushes (0.01 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* autoland: 3
* mozilla-inbound: 2
* try: 1
* pine: 1

Platform breakdown:
* linux64: 4
* linux32: 2
* linux64-qr: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1375540&startday=2017-06-26&endday=2017-07-02&tree=all

Comment 4

a year ago
2 failures in 656 pushes (0.003 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* autoland: 2

Platform breakdown:
* linux32: 2

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1375540&startday=2017-07-03&endday=2017-07-09&tree=all

Comment 5

11 months ago
8 failures in 720 pushes (0.011 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* mozilla-inbound: 5
* autoland: 3

Platform breakdown:
* linux64: 4
* linux64-qr: 2
* windows8-64: 1
* linux32: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1375540&startday=2017-07-10&endday=2017-07-16&tree=all

Comment 6

11 months ago
2 failures in 1008 pushes (0.002 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* mozilla-inbound: 2

Platform breakdown:
* linux64: 2

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1375540&startday=2017-07-24&endday=2017-07-30&tree=all

Comment 7

11 months ago
2 failures in 888 pushes (0.002 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* mozilla-inbound: 1
* autoland: 1

Platform breakdown:
* windows7-32: 1
* linux64-qr: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1375540&startday=2017-07-31&endday=2017-08-06&tree=all

Comment 8

10 months ago
4 failures in 949 pushes (0.004 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* autoland: 2
* mozilla-inbound: 1
* mozilla-beta: 1

Platform breakdown:
* linux64: 2
* linux64-stylo: 1
* linux64-qr: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1375540&startday=2017-08-14&endday=2017-08-20&tree=all

Comment 9

10 months ago
4 failures in 908 pushes (0.004 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* try: 2
* mozilla-inbound: 1
* mozilla-beta: 1

Platform breakdown:
* linux64: 2
* linux64-nightly: 1
* linux32: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1375540&startday=2017-08-21&endday=2017-08-27&tree=all

Comment 10

10 months ago
3 failures in 939 pushes (0.003 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* mozilla-beta: 2
* mozilla-central: 1

Platform breakdown:
* linux64-stylo: 1
* linux64: 1
* linux32-devedition: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1375540&startday=2017-08-28&endday=2017-09-03&tree=all

Comment 11

10 months ago
3 failures in 924 pushes (0.003 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* autoland: 2
* try: 1

Platform breakdown:
* windows8-64: 1
* linux64: 1
* linux32-stylo-disabled: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1375540&startday=2017-09-04&endday=2017-09-10&tree=all
Mass change P1->P2 to align with new Mozilla triage process
Priority: P1 → P2

Comment 13

9 months ago
7 failures in 1032 pushes (0.007 failures/push) were associated with this bug in the last 7 days.    

Repository breakdown:
* autoland: 3
* mozilla-inbound: 2
* mozilla-release: 1
* mozilla-central: 1

Platform breakdown:
* windows7-32: 2
* linux64: 2
* windows7-32-stylo-disabled: 1
* linux64-stylo-disabled: 1
* linux32-stylo-disabled: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1375540&startday=2017-09-11&endday=2017-09-17&tree=all

Comment 14

9 months ago
2 failures in 943 pushes (0.002 failures/push) were associated with this bug in the last 7 days.    

Repository breakdown:
* mozilla-inbound: 1
* autoland: 1

Platform breakdown:
* linux64: 1
* linux32-stylo-disabled: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1375540&startday=2017-09-18&endday=2017-09-24&tree=all

Comment 15

9 months ago
2 failures in 885 pushes (0.002 failures/push) were associated with this bug in the last 7 days.    

Repository breakdown:
* mozilla-inbound: 1
* mozilla-central: 1

Platform breakdown:
* linux64-ccov: 1
* linux32-stylo-disabled: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1375540&startday=2017-09-25&endday=2017-10-01&tree=all

Comment 16

9 months ago
2 failures in 824 pushes (0.002 failures/push) were associated with this bug in the last 7 days.    

Repository breakdown:
* mozilla-inbound: 1
* mozilla-beta: 1

Platform breakdown:
* windows7-32: 1
* linux64: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1375540&startday=2017-10-02&endday=2017-10-08&tree=all

Comment 17

8 months ago
4 failures in 947 pushes (0.004 failures/push) were associated with this bug in the last 7 days.    

Repository breakdown:
* mozilla-inbound: 1
* mozilla-central: 1
* mozilla-beta: 1
* autoland: 1

Platform breakdown:
* linux64: 2
* linux64-qr: 1
* linux32: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1375540&startday=2017-10-09&endday=2017-10-15&tree=all

Comment 18

8 months ago
3 failures in 864 pushes (0.003 failures/push) were associated with this bug in the last 7 days.    

Repository breakdown:
* mozilla-inbound: 3

Platform breakdown:
* windows10-64: 1
* linux64: 1
* linux32: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1375540&startday=2017-10-16&endday=2017-10-22&tree=all

Comment 19

8 months ago
3 failures in 912 pushes (0.003 failures/push) were associated with this bug in the last 7 days.    

Repository breakdown:
* autoland: 2
* mozilla-inbound: 1

Platform breakdown:
* linux64: 2
* linux32: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1375540&startday=2017-10-23&endday=2017-10-29&tree=all

Comment 20

8 months ago
6 failures in 857 pushes (0.007 failures/push) were associated with this bug in the last 7 days.    

Repository breakdown:
* autoland: 3
* try: 1
* mozilla-inbound: 1
* mozilla-beta: 1

Platform breakdown:
* linux64: 2
* linux32-stylo-disabled: 2
* linux64-stylo-disabled: 1
* linux32: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1375540&startday=2017-10-30&endday=2017-11-05&tree=all

Comment 21

7 months ago
3 failures in 849 pushes (0.004 failures/push) were associated with this bug in the last 7 days.    

Repository breakdown:
* autoland: 2
* mozilla-central: 1

Platform breakdown:
* linux32-stylo-disabled: 2
* windows10-64: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1375540&startday=2017-11-06&endday=2017-11-12&tree=all

Comment 22

7 months ago
8 failures in 744 pushes (0.011 failures/push) were associated with this bug in the last 7 days.    

Repository breakdown:
* mozilla-inbound: 7
* mozilla-release: 1

Platform breakdown:
* linux64: 4
* linux64-stylo-disabled: 1
* linux64-qr: 1
* linux32-nightly: 1
* linux32: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1375540&startday=2017-11-20&endday=2017-11-26&tree=all

Comment 23

7 months ago
24 failures in 792 pushes (0.03 failures/push) were associated with this bug in the last 7 days. 

This is the #50 most frequent failure this week.   

Repository breakdown:
* mozilla-inbound: 16
* mozilla-central: 4
* autoland: 4

Platform breakdown:
* windows10-64: 9
* linux64-stylo-disabled: 4
* linux64-qr: 3
* linux32-stylo-disabled: 3
* linux64: 2
* linux32: 2
* windows7-32: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1375540&startday=2017-11-27&endday=2017-12-03&tree=all

Comment 24

7 months ago
32 failures in 889 pushes (0.036 failures/push) were associated with this bug in the last 7 days. 

This is the #35 most frequent failure this week.  

** This failure happened more than 30 times this week! Resolving this bug is a high priority. **

** Try to resolve this bug as soon as possible. If unresolved for 2 weeks, the affected test(s) may be disabled. **  

Repository breakdown:
* mozilla-inbound: 10
* autoland: 10
* mozilla-central: 8
* try: 4

Platform breakdown:
* linux64-stylo-disabled: 7
* linux32-stylo-disabled: 7
* linux32: 6
* windows10-64: 4
* linux64-qr: 3
* linux64: 3
* windows7-32: 2

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1375540&startday=2017-12-04&endday=2017-12-10&tree=all

Comment 25

6 months ago
15 failures in 423 pushes (0.035 failures/push) were associated with this bug in the last 7 days.    

Repository breakdown:
* autoland: 8
* mozilla-central: 5
* try: 1
* mozilla-inbound: 1

Platform breakdown:
* linux64-qr: 8
* windows10-64: 3
* linux32: 2
* windows10-64-ccov: 1
* linux32-stylo-disabled: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1375540&startday=2017-12-11&endday=2017-12-17&tree=all

Comment 26

6 months ago
24 failures in 590 pushes (0.041 failures/push) were associated with this bug in the last 7 days. 

This is the #47 most frequent failure this week.   

Repository breakdown:
* autoland: 10
* try: 6
* mozilla-inbound: 6
* mozilla-central: 2

Platform breakdown:
* windows10-64: 7
* linux64-qr: 6
* linux64-stylo-disabled: 3
* linux64: 3
* linux32-stylo-disabled: 2
* linux32: 2
* windows10-64-ccov: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1375540&startday=2017-12-18&endday=2017-12-24&tree=all

Comment 27

6 months ago
9 failures in 147 pushes (0.061 failures/push) were associated with this bug in the last 7 days. 

This is the #33 most frequent failure this week.   

Repository breakdown:
* autoland: 6
* mozilla-central: 2
* mozilla-inbound: 1

Platform breakdown:
* linux64: 3
* linux64-qr: 2
* windows10-64-ccov: 1
* windows10-64: 1
* osx-10-10: 1
* linux32: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1375540&startday=2017-12-25&endday=2017-12-31&tree=all

Comment 28

6 months ago
24 failures in 462 pushes (0.052 failures/push) were associated with this bug in the last 7 days. 

This is the #41 most frequent failure this week.   

Repository breakdown:
* mozilla-central: 8
* mozilla-inbound: 6
* autoland: 6
* try: 4

Platform breakdown:
* windows10-64: 7
* linux64-qr: 4
* windows7-32: 3
* linux64: 3
* windows10-64-ccov: 2
* linux32: 2
* linux64-stylo-disabled: 1
* linux64-nightly: 1
* linux32-stylo-disabled: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1375540&startday=2018-01-01&endday=2018-01-07&tree=all
There have been 32 failures in the past week, according to OrangeFactor:

Occurrences per platform:

8 on windows10-64
5 on linux32-stylo-disabled
5 on linux64-qr
4 on Linux
4 on Linux x64
3 on Windows 7
2 on windows10-64-ccov
1 on linux64-stylo-disabled

Occurrences per build type:

24 on debug
5 on opt
2 on pgo
1 on asan

Here is a relevant log file and a snippet with the failure:
https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=155142552&lineNumber=54732

22:15:18     INFO -  2102 INFO TEST-PASS | dom/media/tests/mochitest/test_peerConnection_basicH264Video.html | PeerConnectionWrapper (pcRemote): legal ICE state transition from connected to closed
22:15:18     INFO -  Buffered messages finished
22:15:18  WARNING -  TEST-UNEXPECTED-TIMEOUT | dom/media/tests/mochitest/test_peerConnection_basicH264Video.html | application timed out after 370 seconds with no output
22:15:18     INFO -  2103 ERROR Force-terminating active process(es).
22:15:18     INFO -  Determining child pids from psutil...

:drno, could you please take a look? Thank you!
Flags: needinfo?(drno)
Whiteboard: [stockwell needswork]

Comment 30

5 months ago
39 failures in 788 pushes (0.049 failures/push) were associated with this bug in the last 7 days. 

This is the #41 most frequent failure this week.  

** This failure happened more than 30 times this week! Resolving this bug is a high priority. **

** Try to resolve this bug as soon as possible. If unresolved for 2 weeks, the affected test(s) may be disabled. **  

Repository breakdown:
* autoland: 15
* mozilla-inbound: 13
* try: 6
* mozilla-central: 5

Platform breakdown:
* linux64-qr: 8
* windows10-64: 7
* linux32-stylo-disabled: 7
* linux32: 6
* windows7-32: 4
* linux64: 4
* windows10-64-ccov: 2
* linux64-stylo-disabled: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1375540&startday=2018-01-08&endday=2018-01-14&tree=all
Comment 31

Over the last 7 days there have been 30 failures on this bug. These happen on Linux, Linux x64, linux32-stylo-disabled, linux64-qr, linux64-stylo-disabled, windows10-64 and windows10-64-ccov.

Here is the most recent log: https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=157380001&lineNumber=24705

Here is a relevant snippet from the aforementioned log:

[task 2018-01-19T08:21:58.457Z] 08:21:58     INFO - TEST-FAIL | dom/media/tests/mochitest/test_peerConnection_basicH264Video.html | The author of the test has indicated that flaky timeouts are expected.  Reason: WebRTC inherently depends on timeouts 
[task 2018-01-19T08:21:58.458Z] 08:21:58     INFO - Buffered messages finished
[task 2018-01-19T08:21:58.459Z] 08:21:58    ERROR - TEST-UNEXPECTED-TIMEOUT | dom/media/tests/mochitest/test_peerConnection_basicH264Video.html | application timed out after 370 seconds with no output
[task 2018-01-19T08:21:58.460Z] 08:21:58    ERROR - Force-terminating active process(es).
[task 2018-01-19T08:21:58.461Z] 08:21:58     INFO - Determining child pids from psutil...
[task 2018-01-19T08:21:58.462Z] 08:21:58     INFO - [3072, 3107, 4121]

Comment 32

5 months ago
32 failures in 657 pushes (0.049 failures/push) were associated with this bug in the last 7 days. 

This is the #28 most frequent failure this week.  

** This failure happened more than 30 times this week! Resolving this bug is a high priority. **

** Try to resolve this bug as soon as possible. If unresolved for 2 weeks, the affected test(s) may be disabled. **  

Repository breakdown:
* autoland: 13
* mozilla-central: 10
* mozilla-inbound: 7
* mozilla-beta: 2

Platform breakdown:
* linux64-qr: 11
* windows10-64: 7
* linux32: 5
* linux64: 3
* linux32-stylo-disabled: 3
* windows10-64-qr: 1
* windows10-64-ccov: 1
* linux64-stylo-disabled: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1375540&startday=2018-01-15&endday=2018-01-21&tree=all

Comment 33

5 months ago
21 failures in 129 pushes (0.163 failures/push) were associated with this bug yesterday.    

Repository breakdown:
* try: 18
* autoland: 2
* mozilla-central: 1

Platform breakdown:
* linux64-jsdcov: 4
* linux64: 3
* linux32-stylo-disabled: 3
* android-4-3-armv7-api16: 3
* windows7-32: 2
* linux64-qr: 2
* windows10-64-ccov: 1
* osx-10-10: 1
* linux64-stylo-disabled: 1
* linux32: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1375540&startday=2018-01-25&endday=2018-01-25&tree=all
Comment 34

Looks like some problem around ending tracks. Looking at OrangeFactor for the last three months, it looks to me like landing transceivers in bug 1290948 made this problem a lot worse.

Byron, any idea why transceivers could result in track ending events going missing?
Could you have a look at why the mochitest appears to fail to tear down the test properly here?
Depends on: 1290948
Flags: needinfo?(drno) → needinfo?(docfaraday)

Comment 35

5 months ago
43 failures in 701 pushes (0.061 failures/push) were associated with this bug in the last 7 days. 

This is the #23 most frequent failure this week.  

** This failure happened more than 30 times this week! Resolving this bug is a high priority. **

** Try to resolve this bug as soon as possible. If unresolved for 2 weeks, the affected test(s) may be disabled. **  

Repository breakdown:
* try: 21
* mozilla-inbound: 10
* autoland: 7
* mozilla-central: 4
* mozilla-beta: 1

Platform breakdown:
* linux64: 6
* windows7-32: 5
* windows10-64: 5
* linux64-qr: 5
* linux32-stylo-disabled: 5
* linux64-jsdcov: 4
* osx-10-10: 3
* linux32: 3
* android-4-3-armv7-api16: 3
* windows10-64-qr: 1
* windows10-64-ccov: 1
* linux64-stylo-disabled: 1
* linux64-ccov: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1375540&startday=2018-01-22&endday=2018-01-28&tree=all
(Assignee)

Comment 36

5 months ago
(In reply to Nils Ohlmeier [:drno] from comment #34)
> Looks like some problem around ending tracks. When looking at the Orange
> factor for the last three months it looks to me that landing transceivers in
> bug 1290948 made this problem a lot worse.
> 
> Byron, any idea why transceivers could result in track ending events going
> missing?
> Could you have look why the mochitest appears to fail to tear down the test
> properly here?

The transceivers landing changed the timing of a lot of stuff in subtle ways, because our timing was not spec compliant before. It may be that the event listeners aren't guaranteed to be installed soon enough? Let me give it a look.
(Assignee)

Comment 37

5 months ago
Ok, so this is worse than track ended events not firing. We are deadlocked due to the threading rat's nest of webrtc.org and GMP. Here's what I can tell:

webrtc.org is spinning the main thread
GMP is sync dispatching from two webrtc.org threads to the GMP thread (and both of those stacks have a mutex locked in webrtc.org, one of which is being waited on by yet another webrtc.org thread)
GMP is sync dispatching to main
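
To make the shape of that hang concrete, here is a minimal, self-contained C++ sketch (plain std::thread/std::mutex, not Mozilla code; names like receiveCrit are illustrative stand-ins for the real threads and locks). It shows the core pattern: blocking on a "sync dispatch" to an event thread while holding a lock that the dispatched work itself needs, so both threads wait on each other forever. Running it hangs, which is the point.

  #include <condition_variable>
  #include <functional>
  #include <mutex>
  #include <queue>
  #include <thread>

  // Stand-in for the GMP thread's event loop.
  class EventThread {
   public:
    void Start() { mThread = std::thread([this] { Run(); }); }

    // "Sync dispatch": queue a task and block the caller until it has run,
    // like SyncRunnable::DispatchToThread in the stacks below.
    void SyncDispatch(std::function<void()> aTask) {
      std::mutex doneMutex;
      std::condition_variable doneCv;
      bool done = false;
      {
        std::lock_guard<std::mutex> lock(mQueueMutex);
        mQueue.push([&] {
          aTask();
          std::lock_guard<std::mutex> doneLock(doneMutex);
          done = true;
          doneCv.notify_one();
        });
        mQueueCv.notify_one();
      }
      std::unique_lock<std::mutex> lock(doneMutex);
      doneCv.wait(lock, [&] { return done; });  // caller blocks here
    }

   private:
    void Run() {
      for (;;) {
        std::function<void()> task;
        {
          std::unique_lock<std::mutex> lock(mQueueMutex);
          mQueueCv.wait(lock, [this] { return !mQueue.empty(); });
          task = std::move(mQueue.front());
          mQueue.pop();
        }
        task();
      }
    }

    std::thread mThread;
    std::mutex mQueueMutex;
    std::condition_variable mQueueCv;
    std::queue<std::function<void()>> mQueue;
  };

  int main() {
    EventThread gmpThread;
    gmpThread.Start();

    std::mutex receiveCrit;  // stands in for VideoReceiver::receive_crit_

    // A "decoder" thread takes receiveCrit and then sync dispatches; the
    // dispatched work needs receiveCrit too, so both threads wait forever.
    std::thread decoder([&] {
      std::lock_guard<std::mutex> lock(receiveCrit);
      gmpThread.SyncDispatch([&] {
        std::lock_guard<std::mutex> inner(receiveCrit);  // never acquired
      });
    });

    decoder.join();  // never returns
  }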



We're deadlocking/spinning main, here: https://searchfox.org/mozilla-central/source/media/webrtc/trunk/webrtc/base/platform_thread.cc#261


GMP is sync dispatching to main here:

11:10:43     INFO -   7  xul.dll!mozilla::SyncRunnable::DispatchToThread(nsIEventTarget *,nsIRunnable *,bool) [SyncRunnable.h:996706d59171 : 98 + 0x15]
11:10:43     INFO -      eip = 0x5c09f714   esp = 0x109bf354   ebp = 0x109bf364
11:10:43     INFO -      Found by: call frame info
11:10:43     INFO -   8  xul.dll!mozilla::gmp::GMPServiceCreateHelper::GetOrCreate() [GMPService.cpp:996706d59171 : 85 + 0x11]
11:10:43     INFO -      eip = 0x5d8484b2   esp = 0x109bf36c   ebp = 0x109bf384
11:10:43     INFO -      Found by: call frame info

https://searchfox.org/mozilla-central/source/dom/media/gmp/GMPService.cpp#85


The MSG thread also appears to be stuck, possibly waiting on main to do something:

11:10:43     INFO -   6  xul.dll!mozilla::SystemClockDriver::WaitForNextIteration() [GraphDriver.cpp:996706d59171 : 398 + 0xf]
11:10:43     INFO -      eip = 0x5d7614ba   esp = 0x112ff48c   ebp = 0x112ff51c
11:10:43     INFO -      Found by: call frame info
11:10:43     INFO -   7  xul.dll!mozilla::MediaStreamGraphImpl::UpdateMainThreadState() [MediaStreamGraph.cpp:996706d59171 : 1305 + 0xd]
11:10:43     INFO -      eip = 0x5d7eb356   esp = 0x112ff524   ebp = 0x112ff530
11:10:43     INFO -      Found by: call frame info

https://searchfox.org/mozilla-central/source/dom/media/GraphDriver.cpp#398


A sync dispatch caused by GMP, from one of webrtc.org's threads to the GMP thread, while holding VideoReceiver::receive_crit_:

11:10:43     INFO -   7  xul.dll!mozilla::SyncRunnable::DispatchToThread(nsIEventTarget *,nsIRunnable *,bool) [SyncRunnable.h:996706d59171 : 98 + 0x15]
11:10:43     INFO -      eip = 0x5c09f714   esp = 0x13edf5d0   ebp = 0x13edf5e0
11:10:43     INFO -      Found by: call frame info
11:10:43     INFO -   8  xul.dll!mozilla::WebrtcGmpVideoDecoder::Decode(webrtc::EncodedImage const &,bool,webrtc::RTPFragmentationHeader const *,webrtc::CodecSpecificInfo const *,__int64) [WebrtcGmpVideoCodec.cpp:996706d59171 : 812 + 0x33]
11:10:43     INFO -      eip = 0x5c8befc0   esp = 0x13edf5e8   ebp = 0x13edf600
11:10:43     INFO -      Found by: call frame info

https://searchfox.org/mozilla-central/source/media/webrtc/signaling/src/media-conduit/WebrtcGmpVideoCodec.cpp#815


webrtc::VideoStreamDecoder is stuck waiting on VideoReceiver::receive_crit_ (see above):

11:10:43     INFO -   4  xul.dll!rtc::CritScope::CritScope(rtc::CriticalSection const *) [criticalsection.cc:996706d59171 : 183 + 0xf]
11:10:43     INFO -      eip = 0x5eba22bb   esp = 0x1242fd68   ebp = 0x1242fd70   ebx = 0x00000001
11:10:43     INFO -      Found by: call frame info
11:10:43     INFO -   5  xul.dll!webrtc::vcm::VideoReceiver::SetReceiveChannelParameters(__int64) [video_receiver.cc:996706d59171 : 165 + 0xc]
11:10:43     INFO -      eip = 0x5eb82f70   esp = 0x1242fd78   ebp = 0x1242fd84
11:10:43     INFO -      Found by: call frame info
11:10:43     INFO -   6  xul.dll!webrtc::VideoStreamDecoder::OnRttUpdate(__int64,__int64) [video_stream_decoder.cc:996706d59171 : 139 + 0x10]
11:10:43     INFO -      eip = 0x5eafe7f2   esp = 0x1242fd8c   ebp = 0x1242fd98
11:10:43     INFO -      Found by: call frame info

https://searchfox.org/mozilla-central/source/media/webrtc/trunk/webrtc/modules/video_coding/video_receiver.cc#165


Yet another sync dispatch caused by GMP, from one of webrtc.org's threads to the GMP thread (similar to above):

11:10:43     INFO -   7  xul.dll!mozilla::SyncRunnable::DispatchToThread(nsIEventTarget *,nsIRunnable *,bool) [SyncRunnable.h:996706d59171 : 98 + 0x15]
11:10:43     INFO -      eip = 0x5c09f714   esp = 0x1416eb88   ebp = 0x1416eb98
11:10:43     INFO -      Found by: call frame info
11:10:43     INFO -   8  xul.dll!mozilla::WebrtcGmpVideoEncoder::Encode(webrtc::VideoFrame const &,webrtc::CodecSpecificInfo const *,std::vector<webrtc::FrameType,std::allocator<webrtc::FrameType> > const *) [WebrtcGmpVideoCodec.cpp:996706d59171 : 322 + 0x29]
11:10:43     INFO -      eip = 0x5c8c0c64   esp = 0x1416eba0   ebp = 0x1416ebb4
11:10:43     INFO -      Found by: call frame info

https://searchfox.org/mozilla-central/source/media/webrtc/signaling/src/media-conduit/WebrtcGmpVideoCodec.cpp#325


IncomingVideoStream is also waiting forever it seems:

11:10:43     INFO -   5  xul.dll!webrtc::EventTimerWin::Wait(unsigned long) [event_timer_win.cc:996706d59171 : 41 + 0xc]
11:10:43     INFO -      eip = 0x5eb948fe   esp = 0x13b9f800   ebp = 0x13b9f808
11:10:43     INFO -      Found by: call frame info
11:10:43     INFO -   6  xul.dll!webrtc::IncomingVideoStream::IncomingVideoStreamProcess() [incoming_video_stream.cc:996706d59171 : 67 + 0xf]
11:10:43     INFO -      eip = 0x5eb14aaa   esp = 0x13b9f810   ebp = 0x13b9f910
11:10:43     INFO -      Found by: call frame info

https://searchfox.org/mozilla-central/source/media/webrtc/trunk/webrtc/common_video/incoming_video_stream.cc#67
Flags: needinfo?(docfaraday) → needinfo?(rjesup)
Dan recently landed a change in bug 1429390 to make some pieces around H264 GMP async. Is this possibly related?
See Also: → bug 1429390
From looking at OrangeFactor I don't see the number of timeouts actually going down since bug 1429390 landed on 1/25. So this looks like another problem. Maybe this is the decoding side, which is still sync?

Comment 40

5 months ago
41 failures in 735 pushes (0.056 failures/push) were associated with this bug in the last 7 days. 

This is the #22 most frequent failure this week.  

** This failure happened more than 30 times this week! Resolving this bug is a high priority. **

** Try to resolve this bug as soon as possible. If unresolved for 2 weeks, the affected test(s) may be disabled. **  

Repository breakdown:
* autoland: 15
* mozilla-inbound: 12
* mozilla-central: 7
* try: 5
* mozilla-beta: 2

Platform breakdown:
* linux32: 11
* linux64: 8
* windows10-64: 7
* linux32-stylo-disabled: 5
* linux64-qr: 4
* windows7-32: 3
* windows10-64-ccov: 1
* osx-10-10: 1
* linux64-stylo-disabled: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1375540&startday=2018-01-29&endday=2018-02-04&tree=all
There have been 31 failures in the last 7 days. 
For the failure pattern, see Comment 31.
Do you have any updates here?
Flags: needinfo?(drno)

Comment 42

4 months ago
35 failures in 702 pushes (0.05 failures/push) were associated with this bug in the last 7 days. 

This is the #35 most frequent failure this week.  

** This failure happened more than 30 times this week! Resolving this bug is a high priority. **

** Try to resolve this bug as soon as possible. If unresolved for 2 weeks, the affected test(s) may be disabled. **  

Repository breakdown:
* autoland: 17
* mozilla-inbound: 9
* try: 4
* mozilla-central: 3
* mozilla-beta: 2

Platform breakdown:
* linux32: 9
* windows7-32: 5
* linux64-qr: 5
* windows10-64: 4
* linux64: 4
* linux32-stylo-disabled: 4
* linux64-stylo-disabled: 2
* windows10-64-qr: 1
* windows10-64-devedition: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1375540&startday=2018-02-05&endday=2018-02-11&tree=all
Comment 43

Well... I wish I had more of the stacks from that analysis in comment 37, or a link to the log.  However...

It appears that we're trying to Stop the GMP at the same time we're trying to get a reference to it; the Encode() and Decode() sync calls appear to be red herrings - those are just blocked because the GMP thread has blocked.  Though it's odd that we're calling Stop() on the GMP thread (which is why I want the full stacks...).

In any case though, since sSingleton is a) threadsafe, and b) never nulled or changed until ClearOnShutdown runs, we can safely do (in GetOrCreate()):
   if (sSingleton) {
     // sSingleton is guaranteed to persist until ClearOnShutdown runs
     RefPtr<GeckoMediaPluginService> service(sSingleton.get());
     return service.forget();
   }

Sync dispatches are always dangerous, and one should (almost) never, ever sync dispatch to the main thread.  That sync dispatch to main appears to go back to the original code by josh, before we got involved for OpenH264.

Probably a fair bit of re-architecting would be needed to get rid of the dispatch entirely. However, if this is created once, it's hard for the main thread to be trying to do a synchronous operation like Stop() against it before it's created. The problem now is that every call to get a handle on it dispatches sync to main; my snippet above would remove that.
Flags: needinfo?(rjesup)

Updated

4 months ago
See Also: → bug 1379378

Comment 44

4 months ago
24 failures in 685 pushes (0.035 failures/push) were associated with this bug in the last 7 days.    

Repository breakdown:
* autoland: 13
* mozilla-inbound: 8
* mozilla-central: 3

Platform breakdown:
* linux32: 7
* windows10-64: 4
* linux64-qr: 4
* linux32-stylo-disabled: 3
* windows7-32: 2
* linux64-stylo-disabled: 2
* linux64: 2

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1375540&startday=2018-02-12&endday=2018-02-18&tree=all
Summary: Intermittent TEST-UNEXPECTED-TIMEOUT | dom/media/tests/mochitest/test_peerConnection_basicH264Video.html | application timed out after 330 seconds with no output → Intermittent TEST-UNEXPECTED-TIMEOUT | dom/media/tests/mochitest/test_peerConnection_basicH264Video.html | application timed out after 330/370 seconds with no output

Comment 45

4 months ago
22 failures in 675 pushes (0.033 failures/push) were associated with this bug in the last 7 days.    

Repository breakdown:
* mozilla-inbound: 9
* autoland: 9
* mozilla-central: 3
* mozilla-beta: 1

Platform breakdown:
* linux32: 8
* linux32-stylo-disabled: 5
* windows10-64: 3
* osx-10-10: 2
* linux64: 2
* windows7-32: 1
* linux64-stylo-disabled: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1375540&startday=2018-02-19&endday=2018-02-25&tree=all
(Assignee)

Comment 46

4 months ago
This test also uses the fake h264 encoder/decoder, which is broken. Might be the cause here too. See bug 1407653.
See Also: → bug 1407653

Comment 47

4 months ago
43 failures in 831 pushes (0.052 failures/push) were associated with this bug in the last 7 days. 

This is the #37 most frequent failure this week.  

** This failure happened more than 30 times this week! Resolving this bug is a high priority. **

** Try to resolve this bug as soon as possible. If unresolved for 2 weeks, the affected test(s) may be disabled. **  

Repository breakdown:
* autoland: 16
* mozilla-inbound: 12
* mozilla-central: 7
* try: 5
* mozilla-beta: 3

Platform breakdown:
* linux32-stylo-disabled: 12
* linux32: 11
* linux64-qr: 5
* linux64-stylo-disabled: 4
* windows10-64-ccov: 3
* windows10-64: 3
* linux64: 2
* windows7-32: 1
* osx-10-10: 1
* linux64-nightly: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1375540&startday=2018-02-26&endday=2018-03-04&tree=all
Update:
There have been 40 failures in the last week.

The failures occur mostly on Linux debug builds.

See also: bug 1407653, bug 1379378

Here is a relevant log file and a snippet with the failure:
https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=166813809&lineNumber=25225
[task 2018-03-08T19:42:35.960Z] 19:42:35     INFO - PeerConnectionWrapper (pcRemote): oniceconnectionstatechange fired, new state is: closed
[task 2018-03-08T19:42:35.962Z] 19:42:35     INFO - TEST-PASS | dom/media/tests/mochitest/test_peerConnection_basicH264Video.html | PeerConnectionWrapper (pcRemote): legal ICE state transition from connected to closed 
[task 2018-03-08T19:42:35.962Z] 19:42:35     INFO - Buffered messages finished
[task 2018-03-08T19:42:35.968Z] 19:42:35    ERROR - TEST-UNEXPECTED-TIMEOUT | dom/media/tests/mochitest/test_peerConnection_basicH264Video.html | application timed out after 370 seconds with no output
[task 2018-03-08T19:42:35.969Z] 19:42:35    ERROR - Force-terminating active process(es).
[task 2018-03-08T19:42:35.969Z] 19:42:35     INFO - Determining child pids from psutil...
[task 2018-03-08T19:42:35.969Z] 19:42:35     INFO - [3697, 2779, 2807]
[task 2018-03-08T19:42:35.969Z] 19:42:35     INFO - ==> process 2730 launched child process 2750
[task 2018-03-08T19:42:35.972Z] 19:42:35     INFO - ==> process 2730 launched child process 2779
[task 2018-03-08T19:42:35.972Z] 19:42:35     INFO - ==> process 2730 launched child process 2807
[task 2018-03-08T19:42:35.974Z] 19:42:35     INFO - ==> process 2730 launched child process 3697
[task 2018-03-08T19:42:35.976Z] 19:42:35     INFO - Found child pids: set([3697, 2779, 2750, 2807])
[task 2018-03-08T19:42:35.976Z] 19:42:35     INFO - Failed to get child procs
[task 2018-03-08T19:42:35.976Z] 19:42:35     INFO - Killing process: 3697

Comment 49

4 months ago
40 failures in 747 pushes (0.054 failures/push) were associated with this bug in the last 7 days. 

This is the #29 most frequent failure this week.  

** This failure happened more than 30 times this week! Resolving this bug is a high priority. **

** Try to resolve this bug as soon as possible. If unresolved for 2 weeks, the affected test(s) may be disabled. **  

Repository breakdown:
* mozilla-inbound: 14
* autoland: 13
* mozilla-central: 7
* try: 4
* mozilla-release: 1
* mozilla-beta: 1

Platform breakdown:
* linux32: 12
* linux64: 6
* windows10-64: 5
* linux64-qr: 5
* linux32-stylo-disabled: 4
* windows10-64-ccov: 3
* linux64-nightly: 3
* windows7-32: 1
* windows10-64-nightly: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1375540&startday=2018-03-05&endday=2018-03-11&tree=all

Comment 50

3 months ago
27 failures in 814 pushes (0.033 failures/push) were associated with this bug in the last 7 days.    

Repository breakdown:
* mozilla-inbound: 13
* autoland: 5
* try: 4
* mozilla-central: 3
* mozilla-release: 1
* mozilla-beta: 1

Platform breakdown:
* linux32: 7
* linux64-qr: 5
* linux64: 5
* windows10-64: 4
* linux64-stylo-disabled: 2
* linux32-stylo-disabled: 2
* windows10-64-ccov: 1
* osx-10-10: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1375540&startday=2018-03-12&endday=2018-03-18&tree=all
(Assignee)

Comment 51

3 months ago
(In reply to Randell Jesup [:jesup] from comment #43)
> Well... I wish I had more of the stacks from that analysis in comment 37, or
> a link to the log.  However...
> 

Here's a recent example of this happening.

https://taskcluster-artifacts.net/KvTDnRFPTAGkoxMAF9JzwA/0/public/logs/live_backing.log

> It appears that we're trying to Stop the GMP at the same time we're trying
> to get a reference to it; the Encode() and Decode() sync calls appear to be
> red herrings - those are just blocked because the gmp thread has blocked. 
> Though it's odd we're calling Stop() on the GMP thread (why I want the full
> stacks...).  

   I do not see any sign that we're trying to stop GMP. We are trying to stop the VideoReceiveStream webrtc.org thread though, synchronously on main (ugh), which sucks when this thread is deadlocked due to sync dispatches...

> In any case though, since sSingleton is a) threadsafe, and b) never nulled
> or changed until ClearOnShutdown runs, we can safely do (in GetOrCreate()):
>    if (sSingleton) {
>      // sSingleton is guaranteed to persist until ClearOnShutdown runs
>      RefPtr<GeckoMediaPluginService> service(sSingleton.get());
>      return service.forget();
>    }

   StaticRefPtr::mRawPtr is just a bare pointer, so simply doing if (sSingleton) will be racy.
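
   To illustrate that point (plain C++, not the real StaticRefPtr or GMP code; Service, sService and GetOrCreate below are all hypothetical names): an unlocked fast-path check is only safe if the pointer is published and read with proper ordering, for example through std::atomic.

  #include <atomic>
  #include <mutex>

  struct Service {};

  std::atomic<Service*> sService{nullptr};
  std::mutex sCreateMutex;

  Service* GetOrCreate() {
    // Race-free fast path: an acquire load rather than a read of a bare pointer.
    if (Service* s = sService.load(std::memory_order_acquire)) {
      return s;
    }
    // Slow path; in the real code this is where the dispatch to main happens.
    std::lock_guard<std::mutex> lock(sCreateMutex);
    if (!sService.load(std::memory_order_relaxed)) {
      // Deliberately never freed here; a real singleton lives for the process.
      sService.store(new Service(), std::memory_order_release);
    }
    return sService.load(std::memory_order_relaxed);
  }

  int main() { return GetOrCreate() ? 0 : 1; }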


   We need to do at least three things here.

1. rtc::PlatformThread::Stop either needs to stop blocking, or we need to ensure that we only stop webrtc threads from threads other than main. We are blocking main every time we stop a receive stream, at the very least. This is not ok.

2. We need to stop sync dispatching in mozilla::WebrtcGmpVideoDecoder::Decode (a non-blocking shape is sketched below).

3. We need to stop sync dispatching when creating the GMP service.
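
   As a hypothetical sketch for point 2 above (plain C++, not the actual fix; EncodedImage, DecodedFrame, DispatchToGmpThread and RunPluginDecode are illustrative stand-ins, not real APIs): Decode() could post work to the GMP thread and return immediately, delivering the result through a callback, so the calling webrtc.org thread never blocks while holding receive_crit_.

  #include <functional>
  #include <future>
  #include <thread>
  #include <utility>

  struct EncodedImage {};   // stand-in for webrtc::EncodedImage
  struct DecodedFrame {};   // stand-in for the decoded output

  // Stand-in for posting to the GMP thread; a real implementation would use a
  // serialized event target rather than a detached thread per task.
  void DispatchToGmpThread(std::function<void()> aTask) {
    std::thread(std::move(aTask)).detach();
  }

  DecodedFrame RunPluginDecode(const EncodedImage&) { return {}; }  // placeholder

  // Non-blocking Decode(): no lock is held across a wait on another thread.
  void Decode(EncodedImage aInput,
              std::function<void(DecodedFrame)> aOnDecoded) {
    DispatchToGmpThread([input = std::move(aInput),
                         onDecoded = std::move(aOnDecoded)]() {
      onDecoded(RunPluginDecode(input));
    });
  }

  int main() {
    std::promise<void> done;
    Decode(EncodedImage{}, [&done](DecodedFrame) { done.set_value(); });
    done.get_future().wait();  // only this demo waits; Decode() returned at once
  }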
(Assignee)

Updated

3 months ago
Depends on: 1448863

Comment 52

3 months ago
36 failures in 822 pushes (0.044 failures/push) were associated with this bug in the last 7 days.   

** This failure happened more than 30 times this week! Resolving this bug is a high priority. **

** Try to resolve this bug as soon as possible. If unresolved for 2 weeks, the affected test(s) may be disabled. **  

Repository breakdown:
* autoland: 13
* mozilla-central: 8
* mozilla-inbound: 7
* try: 3
* mozilla-beta: 3
* mozilla-release: 2

Platform breakdown:
* windows10-64-qr: 9
* linux64: 8
* windows10-64: 4
* windows7-32: 3
* linux32: 3
* windows10-64-ccov: 2
* linux64-qr: 2
* linux32-stylo-disabled: 2
* windows7-32-devedition: 1
* windows10-64-nightly: 1
* osx-10-10: 1

For more details, see:
https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?bug=1375540&startday=2018-03-20&endday=2018-03-26&tree=trunk

Comment 53

3 months ago
26 failures in 718 pushes (0.036 failures/push) were associated with this bug in the last 7 days.    

Repository breakdown:
* autoland: 12
* mozilla-inbound: 6
* mozilla-central: 3
* mozilla-beta: 3
* try: 2

Platform breakdown:
* linux32: 9
* linux64: 7
* windows10-64: 6
* windows10-64-qr: 2
* windows7-32: 1
* linux64-qr: 1

For more details, see:
https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?bug=1375540&startday=2018-03-26&endday=2018-04-01&tree=trunk

Comment 54

3 months ago
27 failures in 765 pushes (0.035 failures/push) were associated with this bug in the last 7 days.    

Repository breakdown:
* autoland: 14
* mozilla-beta: 6
* mozilla-inbound: 4
* try: 3

Platform breakdown:
* linux32: 7
* windows10-64-qr: 4
* windows10-64: 3
* linux64: 3
* linux64-stylo-disabled: 2
* linux64-qr: 2
* linux32-devedition: 2
* windows7-32-devedition: 1
* windows7-32: 1
* windows10-64-nightly: 1
* linux32-nightly: 1

For more details, see:
https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?bug=1375540&startday=2018-04-02&endday=2018-04-08&tree=trunk
(Assignee)

Comment 55

3 months ago
Looks like bug 1448863 fixed this.
Status: NEW → RESOLVED
Last Resolved: 3 months ago
Resolution: --- → FIXED
Assignee: nobody → docfaraday
status-firefox59: --- → wontfix
status-firefox60: --- → affected
status-firefox61: --- → fixed
Flags: needinfo?(drno)
Target Milestone: --- → mozilla61
Whiteboard: [stockwell needswork] → [stockwell fixed:product]

Comment 56

2 months ago
1 failure in 792 pushes (0.001 failures/push) was associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-beta: 1

Platform breakdown:
* windows10-64: 1

For more details, see:
https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?bug=1375540&startday=2018-04-09&endday=2018-04-15&tree=trunk
status-firefox60: affected → fixed