This is a deadlock between starting conduits (which triggers a SyncRunnable from Call thread to main as part of video codec init), and GetRtpSources (which grabs a conduit Mutex on main). Fixed by D124373.
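The shape of that deadlock can be sketched in a few lines of Python. This is a toy model, not Gecko code: `conduit_mutex`, `main_thread_queue`, and `simulate_conduit_deadlock` are hypothetical names, and it assumes the Call thread holds the conduit Mutex while it waits on the SyncRunnable, which the comment above implies but does not state. Timeouts stand in for the real hang so the sketch terminates.

```python
import queue
import threading


def simulate_conduit_deadlock(main_timeout=0.1, call_timeout=0.5):
    """Report which side made progress; timeouts stand in for a real hang."""
    conduit_mutex = threading.Lock()   # stand-in for the conduit Mutex
    main_thread_queue = queue.Queue()  # runnables waiting for "main" to run them
    call_has_mutex = threading.Event()
    outcome = {}

    def call_thread():
        # Assumption: the mutex is held across the synchronous dispatch.
        with conduit_mutex:
            call_has_mutex.set()
            done = threading.Event()
            main_thread_queue.put(done.set)  # the "SyncRunnable"
            # Block until "main" runs the runnable (it never does here).
            outcome["sync_runnable_ran"] = done.wait(call_timeout)

    worker = threading.Thread(target=call_thread)
    worker.start()
    call_has_mutex.wait()

    # "Main" enters a GetRtpSources-like path and blocks on the mutex
    # instead of returning to its event loop to run the queued runnable.
    acquired = conduit_mutex.acquire(timeout=main_timeout)
    outcome["main_got_mutex"] = acquired
    if acquired:
        conduit_mutex.release()

    worker.join()
    return outcome
```

Calling `simulate_conduit_deadlock()` reports that neither side made progress: each thread is waiting for something only the other can provide.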

Assignee: nobody → apehrson

Status: NEW → ASSIGNED

Michael Froman [:mjf]
Comment 2 • 3 years ago

Andreas, D124373 is in our stack now, but we're still seeing the timeouts on linux1804-64-tsan-qr opt builds:
https://treeherder.mozilla.org/jobs?repo=try&revision=df56c392371dfd991f9647ef4fc275ea2a3d6595&selectedTaskRun=LD1K2tWbRnaFqER4U2BQCg.0

Any thoughts?

Flags: needinfo?(apehrson)

Andreas Pehrson [:pehrsons] (Assignee)
Comment 3 • 3 years ago

Hmm. This cannot have been the deadlock I mention in comment 1 because these tests don't use getContributingSources or getSynchronizationSources.

This being TSAN, it just kinda looks like it's too slow. We could try to increase the size of the thread pool. Bug 1706925 is meant to follow up on this.

The call thread is a global (process-wide) TaskQueue which (among other things) routes all network packets (ugh, heavy), so may still be a bottleneck.
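For readers unfamiliar with the pattern, here is a minimal Python sketch of a serial task queue layered on a shared thread pool. It is not the Gecko `TaskQueue` implementation; `SerialTaskQueue` and its methods are hypothetical names used only for illustration. It shows why such a queue can become a bottleneck: tasks run strictly one at a time, and each drain still has to wait for a free thread in the shared pool.

```python
import collections
import threading
from concurrent.futures import ThreadPoolExecutor


class SerialTaskQueue:
    """Runs dispatched tasks in FIFO order, one at a time, on a shared pool."""

    def __init__(self, pool: ThreadPoolExecutor):
        self._pool = pool
        self._lock = threading.Lock()
        self._tasks = collections.deque()
        self._draining = False

    def dispatch(self, task):
        with self._lock:
            self._tasks.append(task)
            if self._draining:
                return  # the active drain will pick this task up
            self._draining = True
        # Competes with every other user of the pool for a free thread.
        self._pool.submit(self._drain)

    def _drain(self):
        while True:
            with self._lock:
                if not self._tasks:
                    self._draining = False
                    return
                task = self._tasks.popleft()
            task()  # run outside the lock so dispatch() never blocks on a task
```

If the pool's threads are all busy with other work, `_drain` does not start until one frees up, so everything queued behind it (network packets, in the call thread's case) waits as well.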

I'm not sure how many cores the machines running TSAN have, but I would like to see us setting a thread pool size that's a bit more adapted to the local machine's CPU. If we were saturating the thread pool this should help ease the pain for the call thread. It's at least worth trying to explore a higher number than 4 to see whether it has any effect on this test. We could also consider making the call thread a dedicated thread (or a single-thread thread-pool-backed TaskQueue, probably makes for a simpler patch) so it doesn't compete with the other TaskQueues over the threads in the pool. Do you have cycles to play with this Michael?
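Both ideas can be illustrated with a rough Python sketch using `concurrent.futures`. None of this is Gecko code: `pool_size_for_machine`, `call_task_latency`, and the upper bound of 16 are assumptions made for the example, and only the floor of 4 comes from the comment above. The second function measures how long a trivial "call thread" task waits when the shared pool is saturated, with and without a dedicated single-thread executor.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor


def pool_size_for_machine(minimum=4, maximum=16):
    """Pick a pool size from the local core count (bounds are assumptions)."""
    cores = os.cpu_count() or minimum
    return max(minimum, min(cores, maximum))


def call_task_latency(dedicated, pool_size=4, busy_seconds=0.3):
    """Seconds a trivial 'call thread' task waits while the shared pool is busy."""
    shared = ThreadPoolExecutor(max_workers=pool_size)
    call_executor = ThreadPoolExecutor(max_workers=1) if dedicated else shared
    try:
        # Saturate the shared pool with two rounds of blocking work.
        for _ in range(pool_size * 2):
            shared.submit(time.sleep, busy_seconds)
        start = time.monotonic()
        call_executor.submit(lambda: None).result()
        return time.monotonic() - start
    finally:
        shared.shutdown(wait=True)
        if dedicated:
            call_executor.shutdown(wait=True)
```

Comparing `call_task_latency(dedicated=False)` with `call_task_latency(dedicated=True)` shows the effect: on the shared pool the task queues behind the busy work, while on its own executor it runs right away.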

Flags: needinfo?(apehrson) → needinfo?(mfroman)

Andreas Pehrson [:pehrsons] (Assignee)
Comment 4 • 3 years ago • Edited

I've written a patch to improve this by making the call thread sit on a dedicated single-thread thread pool. Seems to help locally under rr. Checking tsan on try here.

Flags: needinfo?(mfroman)

Andreas Pehrson [:pehrsons] (Assignee)
Comment 5 • 3 years ago

See D127263 for this (on bug 1654112).

Michael Froman [:mjf]
Comment 6 • 3 years ago

This appears fixed by D127263. See here. I'm going to close this as fixed for now.

Status: ASSIGNED → RESOLVED

Closed: 3 years ago

Resolution: --- → FIXED

test_peerConnection_twoAudioVideoStreams.html and test_peerConnection_twoAudioVideoStreamsCombined.html are timing out

Categories

(Core :: WebRTC, defect, P2)

People

(Reporter: ng, Assigned: pehrsons)
