Closed Bug 1460653 Opened 6 years ago Closed 5 years ago

Intermittent dom/media/tests/mochitest/test_peerConnection_trackless_sender_stats.html stuck in E/signaling [|WebrtcAudioSessionConduit] AudioConduit.cpp:803 causing timeout

Categories

(Core :: WebRTC: Signaling, defect, P3)

x86_64
macOS
defect

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: nataliaCs, Unassigned)

References

Details

(Keywords: regression, Whiteboard: [stockwell unknown])

Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=177865540&repo=mozilla-beta

07:58:56     INFO - TEST-START | dom/media/tests/mochitest/test_peerConnection_trackless_sender_stats.html
07:58:56     INFO - GECKO(1964) | TEST DEVICES: No test device found in media.audio_loopback_dev, using fake audio streams.
07:58:56     INFO - GECKO(1964) | TEST DEVICES: No test device found in media.video_loopback_dev, using fake video streams.
07:58:56     INFO - GECKO(1964) | Timecard created 1525964332.333895
07:58:56     INFO - GECKO(1964) |  Timestamp   | Delta       | Event                          | File                         | Function
07:58:56     INFO - GECKO(1964) | ======================================================================================================================
07:58:56     INFO - GECKO(1964) |     0.000012 |    0.000012 | Constructor Completed          | PeerConnectionImpl.cpp:355   | PeerConnectionImpl
07:58:56     INFO - GECKO(1964) |     0.000432 |    0.000420 | Initializing PC Ctx            | PeerConnectionImpl.cpp:646   | Initialize
07:58:56     INFO - GECKO(1964) |     0.030899 |    0.030467 | Create Offer                   | PeerConnectionImpl.cpp:1534  | CreateOffer
07:58:56     INFO - GECKO(1964) |     0.032130 |    0.001231 | Set Local Description          | PeerConnectionImpl.cpp:1739  | SetLocalDescription
07:58:56     INFO - GECKO(1964) |     0.048030 |    0.015900 | Set Remote Description         | PeerConnectionImpl.cpp:1842  | SetRemoteDescription
07:58:56     INFO - GECKO(1964) |     0.056234 |    0.008204 | Ice gathering state: gathering | PeerConnectionImpl.cpp:3187  | IceGatheringStateChange
07:58:56     INFO - GECKO(1964) |     0.061671 |    0.005437 | Ice gathering state: complete  | PeerConnectionImpl.cpp:3190  | IceGatheringStateChange
07:58:56     INFO - GECKO(1964) |     0.076590 |    0.014919 | Add Ice Candidate              | PeerConnectionImpl.cpp:2010  | AddIceCandidate
07:58:56     INFO - GECKO(1964) |     0.079461 |    0.002871 | Add Ice Candidate              | PeerConnectionImpl.cpp:2010  | AddIceCandidate
07:58:56     INFO - GECKO(1964) |     0.081404 |    0.001943 | Ice state: checking            | PeerConnectionImpl.cpp:3139  | IceConnectionStateChange
07:58:56     INFO - GECKO(1964) |     0.082910 |    0.001506 | Ice state: connected           | PeerConnectionImpl.cpp:3142  | IceConnectionStateChange
07:58:56     INFO - GECKO(1964) |     3.841996 |    3.759086 | Destructor Invoked             | PeerConnectionImpl.cpp:369   | ~PeerConnectionImpl
09:03:47     INFO - GECKO(1964) | [1965:Socket Thread]: I/signaling [Socket Thread|MediaPipeline] MediaPipeline.cpp:1114: RTCP sent packet count for 651c5db6fed02dcc| Receive video Pipeline 0x12db04b00 Flow: 0x12d3144c0: 3900
09:03:47     INFO - GECKO(1964) | [1965:Unnamed thread 0x12bc61d90]: E/signaling [|WebrtcAudioSessionConduit] AudioConduit.cpp:803: A/V sync: sync delta: 0ms, audio jitter delay 10ms, playout delay 0ms
09:03:48     INFO - GECKO(1964) | [1965:Unnamed thread 0x12bc61d90]: E/signaling [|WebrtcAudioSessionConduit] AudioConduit.cpp:803: A/V sync: sync delta: 0ms, audio jitter delay 10ms, playout delay 0ms
09:03:48     INFO - GECKO(1964) | [1965:Unnamed thread 0x12bc61d90]: E/signaling [|WebrtcAudioSessionConduit] AudioConduit.cpp:803: A/V sync: sync delta: 0ms, audio jitter delay 10ms, playout delay 0ms
09:03:49     INFO - GECKO(1964) | [1965:Unnamed thread 0x12bc61d90]: E/signaling [|WebrtcAudioSessionConduit] AudioConduit.cpp:803: A/V sync: sync delta: 0ms, audio jitter delay 10ms, playout delay 0ms
09:03:49     INFO - GECKO(1964) | [1965:Unnamed thread 0x12bc61d90]: E/signaling [|WebrtcAudioSessionConduit] AudioConduit.cpp:803: A/V sync: sync delta: 0ms, audio jitter delay 10ms, playout delay 0ms
09:03:50     INFO - GECKO(1964) | [1965:Unnamed thread 0x12bc61d90]: E/signaling [|WebrtcAudioSessionConduit] AudioConduit.cpp:803: A/V sync: sync delta: 0ms, audio jitter delay 10ms, playout delay 0ms
09:03:50     INFO - GECKO(1964) | (stun/INFO) STUN-CLIENT(consent): Received response; processing
09:03:50     INFO - GECKO(1964) | (ice/INFO) ICE(PC:1525964336234864 (id=2147484222 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_trackless_sender)/STREAM(0-1525964336234864 (id=2147484222 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_trackless_sender aLevel=0)/COMP(1): Consent refreshed
09:03:50     INFO - GECKO(1964) | [1965:Unnamed thread 0x12bc61d90]: E/signaling [|WebrtcAudioSessionConduit] AudioConduit.cpp:803: A/V sync: sync delta: 0ms, audio jitter delay 10ms, playout delay 0ms
09:03:51     INFO - GECKO(1964) | [1965:Unnamed thread 0x12bc61d90]: E/signaling [|WebrtcAudioSessionConduit] AudioConduit.cpp:803: A/V sync: sync delta: 0ms, audio jitter delay 10ms, playout delay 0ms
09:03:51     INFO - GECKO(1964) | [1965:Unnamed thread 0x12bc61d90]: E/signaling [|WebrtcAudioSessionConduit] AudioConduit.cpp:803: A/V sync: sync delta: 0ms, audio jitter delay 10ms, playout delay 0ms
09:03:52     INFO - GECKO(1964) | (stun/INFO) STUN-CLIENT(consent): Received response; processing
09:03:52     INFO - GECKO(1964) | (ice/INFO) ICE(PC:1525964336233398 (id=2147484222 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_trackless_sender)/STREAM(0-1525964336233398 (id=2147484222 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_trackless_sender aLevel=0)/COMP(1): Consent refreshed
09:03:52     INFO - GECKO(1964) | [1965:Unnamed thread 0x12bc61d90]: E/signaling [|WebrtcAudioSessionConduit] AudioConduit.cpp:803: A/V sync: sync delta: 0ms, audio jitter delay 10ms, playout delay 0ms
09:03:52     INFO - GECKO(1964) | [1965:Unnamed thread 0x12bc61d90]: E/signaling [|WebrtcAudioSessionConduit] AudioConduit.cpp:803: A/V sync: sync delta: 0ms, audio jitter delay 10ms, playout delay 0ms
09:03:53     INFO - GECKO(1964) | [1965:Unnamed thread 0x12bc61d90]: E/signaling [|WebrtcAudioSessionConduit] AudioConduit.cpp:803: A/V sync: sync delta: 0ms, audio jitter delay 10ms, playout delay 0ms
09:03:53     INFO - GECKO(1964) | [1965:Unnamed thread 0x12bc61d90]: E/signaling [|WebrtcAudioSessionConduit] AudioConduit.cpp:803: A/V sync: sync delta: 0ms, audio jitter delay 10ms, playout delay 0ms
09:03:54     INFO - GECKO(1964) | [1965:Unnamed thread 0x12bc61d90]: E/signaling [|WebrtcAudioSessionConduit] AudioConduit.cpp:803: A/V sync: sync delta: 0ms, audio jitter delay 10ms, playout delay 0ms
09:03:54     INFO - GECKO(1964) | [1965:Unnamed thread 0x12bc61d90]: E/signaling [|WebrtcAudioSessionConduit] AudioConduit.cpp:803: A/V sync: sync delta: 0ms, audio jitter delay 10ms, playout delay 0ms
09:03:55     INFO - GECKO(1964) | (stun/INFO) STUN-CLIENT(consent): Received response; processing
09:03:55     INFO - GECKO(1964) | (ice/INFO) ICE(PC:1525964336234864 (id=2147484222 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_trackless_sender)/STREAM(0-1525964336234864 (id=2147484222 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_trackless_sender aLevel=0)/COMP(1): Consent refreshed
09:03:55     INFO - GECKO(1964) | [1965:Unnamed thread 0x12bc61d90]: E/signaling [|WebrtcAudioSessionConduit] AudioConduit.cpp:803: A/V sync: sync delta: 0ms, audio jitter delay 10ms, playout delay 0ms
[taskcluster 2018-05-10T16:03:55.317Z] Aborting task - max run time exceeded!
[taskcluster 2018-05-10T16:03:55.325Z] Exit Code: -1
Rank: 25
Component: WebRTC → WebRTC: Signaling
OS: Unspecified → Mac OS X
Priority: -- → P3
Hardware: Unspecified → x86_64
In the past 7 days we had 41 failures. Most of the occurrences were on osx-10-10 opt, but we also encountered some examples on:
- windows7-32 pgo/opt
- macosx64-nightly opt
- linux64 asan
- windows10-64-qr opt 
Recent log failure: https://treeherder.mozilla.org/logviewer.html#?job_id=191991375&repo=autoland&lineNumber=98555

:drno Can you please take a look at this bug?
Flags: needinfo?(drno)
Why is none of the mochitest logging present?
Hey, jib, do you know why we aren't getting any mochitest logging here? Are we not failing in the "right" way to trigger it?
Flags: needinfo?(jib)
(In reply to Byron Campen [:bwc] from comment #17)
> Hey, jib, do you know why we aren't getting any mochitest logging here?

Unsure. I seem to recall a flag or something to not truncate mochitest output. Ted, do you know?
Flags: needinfo?(jib) → needinfo?(ted)
It isn't truncated. It simply is not there at all.
So, I think we must be deadlocking main right when we try to end the test. When we start closing the PeerConnections in the test in comment 15, only 10 seconds has elapsed since ICE connected; we wait 15 for stats to come back. I bet the test has actually succeeded, and is just wrapping up when the deadlock occurs. The usual MOZ_LOG logging that happens on main is missing, even though it is present on previous tests in the same run.
Unless I (or someone else) can repro, figuring out where we're deadlocking is going to be hard, because webrtc.org is riddled with deadlock bugs. It would be really useful if there were stack dumps when the mochitest harness had to kill the process...
We appear to be deadlocking inside StopPermanentlyAndGetRtpStates, called from DestroyVideoSendStream, called from WebrtcVideoConduit::DeleteSendStream, called from WebrtcVideoConduit::DeleteStreams, called from TransceiverImpl::Shutdown_m. 

This code was added by bug 1414829, and the failures started a few days after that landed.

I'm going to try to figure out more about where/how we're deadlocking.
Blocks: 1414829
Flags: needinfo?(ted)
Flags: needinfo?(drno)
Keywords: regression
Assignee: nobody → docfaraday
Ok, we are deadlocking here, because the task we post above is never run:

https://searchfox.org/mozilla-central/source/media/webrtc/trunk/webrtc/video/vie_encoder.cc#312

Not sure yet whether this is because the encoder_queue_ is no longer running, or it has itself deadlocked.
It looks like we might be deadlocking because ConfigureEncoderTask is called after we try to stop the encoder. Continuing to investigate.
Yeah, I see ConfigureEncoderTask in the logging every time we deadlock. It looks like we open ourselves up to a deadlock whenever we close the PC too soon after configuring an encoder. This is probably due to the replaceTrack we're doing right at the end of the test (we re-configure the encoder on replaceTrack because we might be switching to/from screen-sharing, which requires tweaking things a little).

This is definitely a webrtc.org bug. Dan, do you think the planned webrtc.org merge might fix this?
Flags: needinfo?(dminor)
(In reply to Byron Campen [:bwc] from comment #35)
> Yeah, I see ConfigureEncoderTask in the logging every time we deadlock. It
> looks like we open ourselves up to a deadlock whenever we close the PC too
> soon after configuring an encoder. This is probably due to the replaceTrack
> we're doing right at the end of the test (we re-configure the encoder on
> replaceTrack because we might be switching to/from screen-sharing, which
> requires tweaking things a little).
> 
> This is definitely a webrtc.org bug. Dan, do you think the planned
> webrtc.org merge might fix this?

In the update vie_encoder.cc has been renamed to video_stream_encoder.cc, but the code to run ConfigureEncoderTask looks pretty much the same as in the current version. What is looking different is where ReconfigureEncoder is called, but I can't tell offhand whether this would make things better or not. What I can do is spin up some try jobs and see if I can reproduce the problem on try with the branch update. It looks like OS X 10.10 is particularly bad, so I'll try that unless you have another suggestion.
Flags: needinfo?(dminor)
There are 30 occurrences for this failure in the last 7 days.

:drno can you take a look at this?
Flags: needinfo?(drno)
Whiteboard: [stockwell needswork]
drno: Have you had a chance to take a look at this bug yet?
In the last 7 days, there have been 34 failures.

:drno As you are the triage owner of this component, could you please take a look at this?
Thank you!
This is the same thing as bug 1481797, the last time I looked.

Perhaps we could decrease the frequency of this intermittent by adding a timer to delay the shutdown of this test, until the latest webrtc.org code lands?
This seems to not be happening on m-c now, and the root cause (bug 1481797) is blocked anyhow.
Assignee: docfaraday → nobody
Depends on: 1481797
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME
Flags: needinfo?(drno)
You need to log in before you can comment on or make changes to this bug.