Closed Bug 1373123 Opened 8 years ago Closed 6 years ago

Intermittent dom/media/tests/mochitest/test_peerConnection_stats.html | Error in test execution: Error: Waiting for synced RTCP timed out after at least 15000ms waitForSyncedRtcp@http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_

Categories

(Core :: WebRTC: Signaling, defect, P2)

x86_64
Linux
defect

Tracking

()

RESOLVED FIXED
mozilla68
Tracking Status
firefox67 --- fixed
firefox68 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: bwc)

Details

(Keywords: bulk-close-intermittents, intermittent-failure, Whiteboard: [stockwell unknown])

Attachments

(1 file)

Rank: 25
Component: WebRTC → WebRTC: Signaling
OS: Unspecified → Linux
Priority: -- → P2
Hardware: Unspecified → x86_64
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → INCOMPLETE
Status: REOPENED → RESOLVED
Closed: 8 years ago8 years ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Status: REOPENED → RESOLVED
Closed: 8 years ago7 years ago
Resolution: --- → INCOMPLETE

There are 21 total failures on linux, osx and windows in the last 7 days.

Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=231760068&repo=autoland&lineNumber=11574

23:09:07 INFO - TEST-FAIL | dom/media/tests/mochitest/test_peerConnection_stats.html | The author of the test has indicated that flaky timeouts are expected. Reason: WebRTC inherently depends on timeouts
23:09:07 INFO - outbound_rtcp_audio_0 is missing roundTripTime: {"id":"outbound_rtcp_audio_0","timestamp":1551740946827,"type":"remote-inbound-rtp","kind":"audio","localId":"outbound_rtp_audio_0","mediaType":"audio","ssrc":848272873,"bytesReceived":176855,"jitter":0,"packetsLost":0,"packetsReceived":1098}
23:09:07 INFO - waitForSyncedRtcp: no synced RTCP on attempt30, retrying.
23:09:07 INFO -
23:09:07 INFO - TEST-FAIL | dom/media/tests/mochitest/test_peerConnection_stats.html | The author of the test has indicated that flaky timeouts are expected. Reason: WebRTC inherently depends on timeouts
23:09:07 INFO - Buffered messages finished
23:09:07 INFO - TEST-UNEXPECTED-FAIL | dom/media/tests/mochitest/test_peerConnection_stats.html | Error in test execution: Error: Waiting for synced RTCP timed out after at least 15000ms waitForSyncedRtcp@http://mochi.test:8888/tests/dom/media/tests/mochitest/pc.js:1674:11 ... asyncPC_LOCAL_TEST_LOCAL_STATS@http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_stats.html:782:23 ... execute/</<@http://mochi.test:8888/tests/dom/media/tests/mochitest/head.js:852:31 ... promise callbackexecute/<@http://mochi.test:8888/tests/dom/media/tests/mochitest/head.js:850:19 ... execute@http://mochi.test:8888/tests/dom/media/tests/mochitest/head.js:845:26 ... PeerConnectionTest.prototype.run@http://mochi.test:8888/tests/dom/media/tests/mochitest/pc.js:486:21 ... @http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_stats.html:806:8 ... runNetworkTest/<@http://mochi.test:8888/tests/dom/media/tests/mochitest/pc.js:2187:13 ... asyncrunTestWhenReady/<@http://mochi.test:8888/tests/dom/media/tests/mochitest/head.js:473:41 ... promise callbackrunTestWhenReady@http://mochi.test:8888/tests/dom/media/tests/mochitest/head.js:473:25 ... runNetworkTest@http://mochi.test:8888/tests/dom/media/tests/mochitest/pc.js:2169:9 ... async*@http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_stats.html:795:1 ...
23:09:07 INFO - SimpleTest.ok@SimpleTest/SimpleTest.js:275:18
23:09:07 INFO - execute/<@dom/media/tests/mochitest/head.js:856:14
23:09:07 INFO - promise callbackexecute@dom/media/tests/mochitest/head.js:855:8
23:09:07 INFO - PeerConnectionTest.prototype.run@dom/media/tests/mochitest/pc.js:486:21
23:09:07 INFO - @dom/media/tests/mochitest/test_peerConnection_stats.html:806:8
23:09:07 INFO - runNetworkTest/<@dom/media/tests/mochitest/pc.js:2187:13
23:09:07 INFO - async
runTestWhenReady/<@dom/media/tests/mochitest/head.js:473:41
23:09:07 INFO - promise callbackrunTestWhenReady@dom/media/tests/mochitest/head.js:473:25
23:09:07 INFO - runNetworkTest@dom/media/tests/mochitest/pc.js:2169:9
23:09:07 INFO - async
@dom/media/tests/mochitest/test_peerConnection_stats.html:795:1
23:09:07 INFO - TEST-FAIL | dom/media/tests/mochitest/test_peerConnection_stats.html | The author of the test has indicated that flaky timeouts are expected. Reason: WebRTC inherently depends on timeouts
23:09:07 INFO - Closing peer connections
23:09:07 INFO - Waiting for track {ad2e26a6-4860-4abd-a5c6-51086e82a257} (audio) to end.
23:09:07 INFO - TEST-FAIL | dom/media/tests/mochitest/test_peerConnection_stats.html | The author of the test has indicated that flaky timeouts are expected. Reason: WebRTC inherently depends on timeouts
23:09:07 INFO - Waiting for track {25128642-094e-4643-b047-85a54f16bfb2} (video) to end.
23:09:07 INFO - TEST-FAIL | dom/media/tests/mochitest/test_peerConnection_stats.html | The author of the test has indicated that flaky timeouts are expected. Reason: WebRTC inherently depends on timeouts
23:09:07 INFO - TEST-PASS | dom/media/tests/mochitest/test_peerConnection_stats.html | iceConnectionState should not be undefined
23:09:07 INFO - PeerConnectionWrapper (pcLocal): oniceconnectionstatechange fired, new state is: closed
23:09:07 INFO - TEST-PASS | dom/media/tests/mochitest/test_peerConnection_stats.html | PeerConnectionWrapper (pcLocal): legal ICE state transition from connected to closed

Flags: needinfo?(drno)
Whiteboard: [stockwell needswork:owner]

When playing with these stats locally, the rtt seems to come and go. When it is there, it is in the single digits. This may simply be a case of the RTT being effectively 0 ms, since the tests are running locally over loopback.

@ng, would it make sense to stop eliding this statistic when it is calculated to be 0? Do we need to have a separate test that introduces network delay and verifies that it eventually takes a positive value?

Flags: needinfo?(na-g)

I should also point out that we are legitimately seeing cases where it takes surprisingly long to see RTCP packets. Why would this be?

(In reply to Byron Campen [:bwc] from comment #36)

When playing with these stats locally, the rtt seems to come and go. When it is there, it is in the single digits. This may simply be a case of the RTT being effectively 0 ms, since the tests are running locally over loopback.

@ng, would it make sense to stop eliding this statistic when it is calculated to be 0? Do we need to have a separate test that introduces network delay and verifies that it eventually takes a positive value?

I think we should open a bug on this behavior.

The elision of received 0s is mandated by the spec: https://w3c.github.io/webrtc-stats/#dom-rtcremoteinboundrtpstreamstats if the DSLR is 0. I a brief look through the jumble of places that libwebrtc calculates and stores RTT doesn't make it immediately apparent as to if a reported rtt of 0 implies that DLSR was exactly zero (less than 1/2^16 of a second). All that said, I have never been fond of the elision, and I think we should report what was reported in the packet. If it is found that the DLSR is often zero in local connections I will open a bug on the spec.

The current tests do wait for RTT to appear at least once (and all appearances should be > 0). https://searchfox.org/mozilla-central/source/dom/media/tests/mochitest/pc.js#1638 I don't think it would be harmful to add a bit of latency to the existing test.

Flags: needinfo?(na-g)

(In reply to Byron Campen [:bwc] from comment #37)

I should also point out that we are legitimately seeing cases where it takes surprisingly long to see RTCP packets. Why would this be?

What length of time are you seeing? There is logic in libwebrtc that controls the random length of time we wait before sending the first RTCP packets. We have been maintaining a changeset that significantly shortened that time. It is entirely possible that something changed in the latest webrtc.org update that effected this.

In the cases where we see this intermittent failure, it takes about 14 seconds to get audio RTCP stats.

(In reply to Byron Campen [:bwc] from comment #40)

In the cases where we see this intermittent failure, it takes about 14 seconds to get audio RTCP stats.

That could very well be that we are never getting a non-zero round trip time.

That 14-second delay is not just rtt; no RTCP stats at all come in. When they do come in, there's no rtt.

The RTT could be held up on waiting for an SR. A receiver can't calculate a DSLR (delay since last SR) to put in a RR without first receiving a SR. The AudioConduit and VideoConduit both log when they send and receive RTP and RTCP, can we turn the logging of WebrtcAudioSessionConduit and WebrtcVideoSessionConduit up to DEBUG during the tests in question? We may want to elevate the RTCP DEBUG messages all to INFO, they shouldn't be too noisy.

receiver_only_ appears to always be false here [0] for audio. It may be a clue. The configuration object that ultimately determines its value is created here [1]. I am not sure how to read the comment at [0], I seem to remember that there is a trick by which the RTT can be estimated early.

[0] https://searchfox.org/mozilla-central/source/media/webrtc/trunk/webrtc/modules/rtp_rtcp/source/rtcp_receiver.cc#494
[1] https://searchfox.org/mozilla-central/source/media/webrtc/trunk/webrtc/voice_engine/channel.cc#949

Do we want to extend this timeout to 20 or 25 seconds for now, or do we want to try to get to the bottom of why this is taking so much time?

(In reply to Byron Campen [:bwc] from comment #45)

Do we want to extend this timeout to 20 or 25 seconds for now, or do we want to try to get to the bottom of why this is taking so much time?

I think this is worth digging into, extending the timeout wouldn't hurt though.

It looks like in bug 1376873 the audio and video conduits were changed to pull the round trip time off of the Call interface. This RTT is a "call wide" metric. I am sure this was done to eliminate a patch that allowed us to get the RTT from within the RTP guts of libwebrtc. This change was also tried in the previous libwebrtc update and the patch had to be restored. I have no idea as to if it worked this time.

test-verify jobs for this test look ok across the board, and I'm doing a bunch of retriggers on win32 debug. It is possible that this doesn't happen unless more of the test suite runs, but it looks promising.

Assignee: nobody → docfaraday
Status: REOPENED → RESOLVED
Closed: 7 years ago6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla68
Flags: needinfo?(drno)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: