Closed Bug 1379378 Opened 7 years ago Closed 11 months ago

Intermittent dom/media/webrtc/tests/mochitest/test_peerConnection_basicH264Video.html stuck in 'E/mtransport Couldn't send media on' causing mda timeout

Categories

(Core :: WebRTC: Audio/Video, defect, P3)

defect

Tracking

()

RESOLVED INCOMPLETE
Tracking Status
firefox56 --- wontfix
firefox57 --- wontfix
firefox58 --- wontfix

People

(Reporter: aryx, Unassigned)

References

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell unknown])

https://treeherder.mozilla.org/logviewer.html#?job_id=112727894&repo=autoland

19:42:06     INFO - GECKO(2448) | [Socket Thread]: D/mtransport (ice/ERR) ICE(PC:1499481695299000 (id=4294967516 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_basicH264Video.h)/STREAM(0-1499481695299000 (id=4294967516 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_basicH264Video.h aLevel=0)/COMP(1): Refresh consent failed with 3
19:42:06     INFO - GECKO(2448) | [Socket Thread]: D/mtransport (ice/INFO) ICE(PC:1499481695299000 (id=4294967516 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_basicH264Video.h): peer (PC:1499481695299000 (id=4294967516 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_basicH264Video.h:default) Trickle grace period is over; marking every component with only failed pairs as failed.
19:42:06     INFO - GECKO(2448) | [Socket Thread]: D/mtransport (ice/WARNING) ICE(PC:1499481695299000 (id=4294967516 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_basicH264Video.h)/STREAM(0-1499481695299000 (id=4294967516 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_basicH264Video.h aLevel=0)/COMP(1): Consent refresh final time out
19:42:06     INFO - GECKO(2448) | [Socket Thread]: D/mtransport (ice/INFO) ICE(PC:1499481695299000 (id=4294967516 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_basicH264Video.h)/STREAM(0-1499481695299000 (id=4294967516 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_basicH264Video.h aLevel=0)/COMP(1): Consent refresh failed
19:42:06     INFO - GECKO(2448) | [Socket Thread]: D/mtransport (ice/INFO) ICE-PEER(PC:1499481695299000 (id=4294967516 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_basicH264Video.h:default)/CAND-PAIR(skX1): setting pair to state CANCELLED: skX1|IP4:10.26.41.223:55804/UDP|IP4:10.26.41.223:55802/UDP(host(IP4:10.26.41.223:55804/UDP)|candidate:0 1 UDP 2122252543 10.26.41.223 55802 typ host)
19:42:06     INFO - GECKO(2448) | [Socket Thread]: D/mtransport Flow[955462e557bf481e:0,rtp(none)]; Layer[ice]: ICE Failed(0-1499481695299000 (id=4294967516 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_basicH264Video.h aLevel=0,1)
19:42:06     INFO - GECKO(2448) | [Socket Thread]: E/mtransport c:/builds/moz2_slave/autoland-w64-00000000000000000/build/src/media/mtransport/transportlayerice.cpp:211: Flow[955462e557bf481e:0,rtp(none)]; Layer[ice]: state 3->5
19:42:06     INFO - GECKO(2448) | [Socket Thread]: E/mtransport Flow[955462e557bf481e:0,rtp(none)]; Layer[dtls]: Lower layer experienced an error
19:42:06     INFO - GECKO(2448) | [Socket Thread]: E/mtransport c:/builds/moz2_slave/autoland-w64-00000000000000000/build/src/media/mtransport/transportlayerdtls.cpp:867: Flow[955462e557bf481e:0,rtp(none)]; Layer[dtls]: state 4->5
19:42:06     INFO - GECKO(2448) | [Socket Thread]: E/mtransport Couldn't send media on '0-1499481695299000 (id=4294967516 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_basicH264Video.h aLevel=0'
19:42:06     INFO - GECKO(2448) | [Socket Thread]: E/mtransport Couldn't send media on '0-1499481695299000 (id=4294967516 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_basicH264Video.h aLevel=0'
19:42:06     INFO - GECKO(2448) | [Socket Thread]: E/mtransport Couldn't send media on '0-1499481695299000 (id=4294967516 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_basicH264Video.h aLevel=0'
Rank: 22
Priority: -- → P2
Mass change P2->P3 to align with new Mozilla triage process.
Priority: P2 → P3
In the last week, this has failed 51 times, mostly on debug Windows 10 (both stylo enabled and disabled), on Mochitests.

:jib Can you have a look?
Flags: needinfo?(jib)
Whiteboard: [stockwell needswork]
Seems like an ICE issue. Over to Nils.
Flags: needinfo?(jib) → needinfo?(drno)
Hmm something just appears to close one of the PeerConnections out of the blue:

05:34:12     INFO -  GECKO(6104) | [Main Thread]: I/signaling [main|PeerConnectionImpl] PeerConnectionImpl.cpp:3158: mozilla::PeerConnectionImpl::CloseInt: Closing PeerConnectionImpl ad510145678f1cb5; ending call
05:34:12     INFO -  GECKO(6104) | [Main Thread]: I/jsep [1507872851003000 (id=4294967520 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_basicH264Video.h]: stable -> closed

No wonder that the other side then can't send media and ICE consent eventually fails.
Flags: needinfo?(drno)
(In reply to Nils Ohlmeier [:drno] from comment #13)
> Hmm something just appears to close one of the PeerConnections out of the blue:

Hmm that does not match what I'm seeing e.g. in this log [1] around 09:55:14. There's no sign of PeerConnectionImpl::CloseInt anywhere near that time mark.

There are plenty of instances earlier, but not on test_peerConnection_basicH264Video.h - I don't think that in itself is unusual, as I believe CloseInt gets called regularly when documents are torn down in JS once a test ends. 

Can you have another look, or link me to the specific log you looked at in comment 13?

[1] https://public-artifacts.taskcluster.net/SNV_naN9Q1e_JeF7FePtow/0/public/logs/live_backing.log
Flags: needinfo?(drno)
(In reply to Jan-Ivar Bruaroey [:jib] (on PTO Oct 11) from comment #15)
> Can you have another look, or link me to the specific log you looked at in
> comment 13?
> 
> [1]
> https://public-artifacts.taskcluster.net/SNV_naN9Q1e_JeF7FePtow/0/public/
> logs/live_backing.log

So in that log I see the following:

09:54:42     INFO - GECKO(665) | [Socket Thread]: D/mtransport NrIceCtx(PC:1506963282039507 (id=2147483868 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_basicH264Video.h): state 1->2
09:54:42     INFO - GECKO(665) | [Socket Thread]: I/mtransport Flow[408e77e2a1e86840:0,rtp(none)]; Layer[dtls]: Handshake would have blocked
09:54:42     INFO - GECKO(665) | [Socket Thread]: I/mtransport Flow[a4edfcf5df189cbc:0,rtp(none)]; Layer[dtls]: Handshake would have blocked
09:54:42     INFO - GECKO(665) | [Socket Thread]: I/mtransport Flow[408e77e2a1e86840:0,rtp(none)]; Layer[dtls]: Handshake would have blocked
09:54:42     INFO - GECKO(665) | [Socket Thread]: I/mtransport Flow[a4edfcf5df189cbc:0,rtp(none)]; Layer[dtls]: ****** SSL handshake completed ******
09:54:42     INFO - GECKO(665) | [Socket Thread]: I/mtransport Flow[a4edfcf5df189cbc:0,rtp(none)]; Layer[dtls]: Selected ALPN string: webrtc
09:54:42     INFO - GECKO(665) | (ice/ERR) ICE(PC:1506963282032932 (id=2147483868 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_basicH264Video.h): peer (PC:1506963282032932 (id=2147483868 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_basicH264Video.h:default), stream(0-1506963282032932 (id=2147483868 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_basicH264Video.h aLevel=0) tried to trickle ICE in inappropriate state 4
09:54:42     INFO - GECKO(665) | [Socket Thread]: E/mtransport Trickle candidates are redundant for stream '0-1506963282032932 (id=2147483868 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_basicH264Video.h aLevel=0' because it is completed
09:54:42     INFO - GECKO(665) | [Socket Thread]: I/mtransport Flow[408e77e2a1e86840:0,rtp(none)]; Layer[dtls]: ****** SSL handshake completed ******
09:54:42     INFO - GECKO(665) | [Socket Thread]: I/mtransport Flow[408e77e2a1e86840:0,rtp(none)]; Layer[dtls]: Selected ALPN string: webrtc
09:54:42     INFO - GECKO(665) | ### XPCOM_MEM_BLOAT_LOG defined -- logging bloat/leaks to /var/folders/x4/ks5mf1yn31g3lx152w31w37400000w/T/tmpCyyTsZ.mozrunner/runtests_leaks_geckomediaplugin_pid678.log
09:54:42     INFO - GECKO(665) | [GMP 678, Main Thread] WARNING: NS_ENSURE_TRUE(InitStaticMembers()) failed: file /builds/worker/workspace/build/src/modules/libpref/Preferences.cpp, line 1638
09:54:42     INFO - GECKO(665) | [GMP 678, Main Thread] WARNING: NS_ENSURE_TRUE(InitStaticMembers()) failed: file /builds/worker/workspace/build/src/modules/libpref/Preferences.cpp, line 1638
09:54:43     INFO - GECKO(665) | [Main Thread]: I/signaling [main|PeerConnectionImpl] PeerConnectionImpl.cpp:3043: CloseInt: Closing PeerConnectionImpl a4edfcf5df189cbc; ending call
09:54:43     INFO - GECKO(665) | [Main Thread]: I/jsep [1506963282032932 (id=2147483868 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_basicH264Video.h]: stable -> closed


The "state 1->2" means ICE did actually connected. As you can see the log also shows the CloseInt thing I referred to earlier.

But one thing I overlooked before are the two lines with "GMP 678" in them. The line number appears to completely off, but that looks like GMP is trying to lookup some user pref and the failure of that lookup results in the PeerConnection getting closed?!

The preferences code seems to have thread assertions. Maybe that is randomly failing here?
But I have no idea where or why GMP would try to read user prefs.
Flags: needinfo?(drno)
Turns out GMP is in fact full of Preferences.

Anthony who would be a good GMP expert who could help us trying to figure out what is going wrong here?
Flags: needinfo?(ajones)
GMP 678, Main Thread -> I presume this is coming from the Main Thread in the GMP process.

I see one pref grabbed in a child: 
GMPChild.cpp:  info.shouldLog = Preferences::GetBool("security.sandbox.logging.enabled") ||
                   PR_GetEnv("MOZ_SANDBOX_LOGGING");
(That's looked at in Mac only)

There are many prefs in the Parent, but those wouldn't show up in a log with that prefix I think.

It's unclear to me if this can be accessed in a GMP process, or if that has changed.  And this is an intermittent, which makes it odder.
What makes me a little suspicious is this comment here http://searchfox.org/mozilla-central/source/dom/media/MediaPrefs.h#15

But then other code in GMP directly accessing the Prefs without going through MediaPrefs. Makes me wonder if depending on timing sometimes the stuff in MediaPrefs indeed set things up in the parent properly, before the GMP child accesses prefs. And in other cases the child is the first, and maybe the Prefs code got some new thread assertions back in July...
:drno Do you have any updates on this bug?
Flags: needinfo?(drno)
There have been 32 failures in the last 7 days.

:drno, any updates?
This is really weird. I'm seeing one of the PCs being closed right after ICE is connected? And then later we see consent refresh timing out.

23:00:16     INFO -  GECKO(4324) | [Socket Thread]: I/mtransport Flow[4b2afb416406f12e:0,rtp(none)]; Layer[dtls]: ****** SSL handshake completed ******
23:00:16     INFO -  GECKO(4324) | [Socket Thread]: I/mtransport Flow[4b2afb416406f12e:0,rtp(none)]; Layer[dtls]: Selected ALPN string: webrtc
23:00:16     INFO -  GECKO(4324) | [Socket Thread]: I/signaling [Socket Thread|MediaPipeline] MediaPipeline.cpp:760: Flow is ready
23:00:16     INFO -  GECKO(4324) | [Socket Thread]: I/signaling [Socket Thread|MediaPipeline] MediaPipeline.cpp:798: Transport ready for pipeline 000002860A291190 flow 4b2afb416406f12e| Receive video: RTP/RTCP mux
23:00:16     INFO -  GECKO(4324) | [Socket Thread]: I/signaling [Socket Thread|MediaPipeline] MediaPipeline.cpp:873: Listening for RTP/RTCP mux packets received on 000002860B890F50
23:00:16     INFO -  GECKO(4324) | [Socket Thread]: I/signaling [Socket Thread|MediaPipeline] MediaPipeline.cpp:760: Flow is ready
23:00:16     INFO -  GECKO(4324) | [Socket Thread]: I/signaling [Socket Thread|MediaPipeline] MediaPipeline.cpp:798: Transport ready for pipeline 000002860C226BB0 flow 4b2afb416406f12e| Transmit video[{660a8321-e6ea-4902-bfa5-e90dc14ac3a9}]: RTP/RTCP mux
23:00:16     INFO -  GECKO(4324) | [Main Thread]: W/signaling [main|PeerConnectionMedia] PeerConnectionMedia.cpp:1182: mozilla::PeerConnectionMedia::GetTransmitPipelinesMatching: none found for 00000286076AC800
23:00:17     INFO -  GECKO(4324) | [Main Thread]: W/signaling [main|PeerConnectionMedia] PeerConnectionMedia.cpp:1182: mozilla::PeerConnectionMedia::GetTransmitPipelinesMatching: none found for 00000286076AC800
23:00:17     INFO -  GECKO(4324) | [Main Thread]: W/signaling [main|PeerConnectionMedia] PeerConnectionMedia.cpp:1182: mozilla::PeerConnectionMedia::GetTransmitPipelinesMatching: none found for 000002860A267800
23:00:17     INFO -  GECKO(4324) | [Main Thread]: W/signaling [main|PeerConnectionMedia] PeerConnectionMedia.cpp:1182: mozilla::PeerConnectionMedia::GetTransmitPipelinesMatching: none found for 00000286076AC800
23:00:17     INFO -  GECKO(4324) | [Main Thread]: W/signaling [main|PeerConnectionMedia] PeerConnectionMedia.cpp:1182: mozilla::PeerConnectionMedia::GetTransmitPipelinesMatching: none found for 000002860A267800
23:00:17     INFO -  GECKO(4324) | [Main Thread]: I/signaling [main|PeerConnectionImpl] PeerConnectionImpl.cpp:2820: mozilla::PeerConnectionImpl::CloseInt: Closing PeerConnectionImpl fef8e87007d37685; ending call
23:00:17     INFO -  GECKO(4324) | [Main Thread]: I/jsep [1512514816499000 (id=4294967540 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_basicH264Video.h]: stable -> closed
23:00:17     INFO -  GECKO(4324) | [Main Thread]: I/signaling [main|MediaPipeline] MediaPipeline.cpp:607: fef8e87007d37685| Receive video in mozilla::MediaPipeline::Shutdown_m
23:00:17     INFO -  GECKO(4324) | [Main Thread]: I/signaling [main|MediaPipeline] MediaPipeline.cpp:607: fef8e87007d37685| Transmit video[{40a950c1-f674-4396-a944-a64cbfc3a3b6}] in mozilla::MediaPipeline::Shutdown_m
23:00:17     INFO -  GECKO(4324) | [Socket Thread]: I/signaling [Socket Thread|MediaPipeline] MediaPipeline.cpp:907: Transport closed for flow RTP/RTCP mux
23:00:17     INFO -  GECKO(4324) | [Socket Thread]: I/signaling [Socket Thread|MediaPipeline] MediaPipeline.cpp:907: Transport closed for flow RTP/RTCP mux
23:00:17     INFO -  GECKO(4324) | [Socket Thread]: E/signaling [Socket Thread|MediaPipeline] MediaPipeline.cpp:1013: Discarding incoming packet; pipeline not open
23:00:17     INFO -  GECKO(4324) | [Main Thread]: I/signaling [main|PeerConnectionImpl] PeerConnectionImpl.cpp:2820: mozilla::PeerConnectionImpl::CloseInt: Closing PeerConnectionImpl 4b2afb416406f12e; ending call
23:00:17     INFO -  GECKO(4324) | [Main Thread]: I/jsep [1512514816502000 (id=4294967540 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_basicH264Video.h]: stable -> closed
23:00:17     INFO -  GECKO(4324) | [Main Thread]: I/signaling [main|MediaPipeline] MediaPipeline.cpp:607: 4b2afb416406f12e| Receive video in mozilla::MediaPipeline::Shutdown_m
I'm going to leave this with Nils.
Flags: needinfo?(ajones)
Is this possibly related to bug 1375540 which also tests H264 and hangs?

But in this case something else happens in GMP land which tears down transport on end B. And the A side only realizes that the B side has disappeared through ICE consent timing out, because probably our mochitest don't catch too early closing of peer connection (and translate them into a test failure).
Flags: needinfo?(drno)
See Also: → 1375540
The failure on Apr 10 appears to have been mis-classified. The last actual failure seems to have happened on Apr 1. It is possible that bug 1448863 fixed this. It is also possible that whatever caused bug 1407653 to stop happening also caused this bug to stop happening.
Severity: normal → S3
Summary: Intermittent dom/media/tests/mochitest/test_peerConnection_basicH264Video.html stuck in 'E/mtransport Couldn't send media on' causing mda timeout → Intermittent dom/media/webrtc/tests/mochitest/test_peerConnection_basicH264Video.html stuck in 'E/mtransport Couldn't send media on' causing mda timeout
Status: NEW → RESOLVED
Closed: 11 months ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.