Open Bug 1419580 Opened 7 years ago Updated 2 years ago

significant audio lag on appear.in WebRTC call

Categories

(Core :: WebRTC: Audio/Video, defect, P2)

Unspecified
Linux
defect

Tracking

()

Tracking Status
firefox59 --- affected

People

(Reporter: dbaron, Unassigned)

References

Details

Attachments

(1 file)

For today's W3C TAG meeting:
https://github.com/w3ctag/meetings/blob/gh-pages/2017/telcons/11-21-agenda.md
which took place in the w3ctagpro room on appear.in, I was experiencing significant audio lag building up over time.

I joined the meeting about 13:00 local time, and by 13:50 I think the audio was about 8 seconds behind the video.  I think disconnected in appear.in (by using the leave button and the bottom and then the rejoin button on the resulting screen -- interestingly, when I did so, I heard the audio continue for about the 8 seconds after leaving the room and no longer seeing the video).

Then, over the next 20 minutes, the audio lag returned and was probably about 5 or 6 seconds behind the video by the end of the call.

I was running this webrtc call in a browser session with a decent number of tabs, and there was probably a little bit of jank.  It's a build I built myself based on https://hg.mozilla.org/mozilla-central/rev/e1d7427787f7 plus my local patches.

I'll attach a webrtc log, although given the audio continuing after I left the call the first time, I'm expecting the problem is mostly local...
Also, from #media, mostly during the call:
13:58:37 <dbaron> are there known issues in nightly lately with audio delays in webrtc calls?
13:59:05 <jib> I don't think so
13:59:11 <dbaron> I'm getting about a 5-8 second audio lag after being on an appear.in webrtc call
13:59:32 <jib> That's new to me
14:00:34 <jib> does it reproduce?
14:01:21 <dbaron> at least twice in the same call after leaving and rejoining
14:01:31 <dbaron> it was sync'd when I rejoined and then drifted out of sync again
14:01:38 <dbaron> might be related to browser being under load with a lot of tabs
14:04:25 <jib> dbaron: We haven't seen anything like that in quite a while, so if it reproduces at all please consider filing a bug with details
14:06:29 <jib> We have some audio patches coming in the next few days that may help, but they haven't landed yet
14:06:55 <jib> in bug 1397793
14:14:08 <jib> We'd hoped bug 1405258 would have taken care of most of the drift issues
We should check again once bug 1397793 lands.
Rank: 13
Priority: -- → P2
See Also: → 1397793
What platform and what cam/mic?
Flags: needinfo?(dbaron)
Linux, Thinkpad's builtin camera, microphone connected to the Laptop's headphone jack
Flags: needinfo?(dbaron)
I've been struck by this as well. It's an internal pulse buffer for output that becomes ridiculously long for unknown reasons. We reached out to pulse people a while ago on this but I'm not sure what happened after that. Paul, have you heard more?
Flags: needinfo?(padenot)
OS: Unspecified → Linux
We're still waiting for an answer from pulse people, that are as confused about this problem as we are. We know the cause, we have evidences of what is happening, we don't know the solution just yet. While we are actively discussing with upstream pulse audio maintainers to fix this, we are working on a number of patches for Firefox to mitigate this.

This is caused by missing real-time audio callback deadlines. PulseAudio then detects that the deadlines are missing, and increases the size of internal buffers, increasing latency, to give the CPU more time to work. We see growing numbers being reported by PulseAudio, when querying the audio server. This should _not_ grow that high, according to PulseAudio people, though, hence the confusion.

Missing those deadlines can be cause by two things, that are related:
- This call was busy, with 2+ participants, during a mesh call (p2p), or with an SFU (relayed media, but each participants have distinct streams). This causes us to have to decode N-1 Opus stream and run N-1 jitter buffer instances (those are expensive), with N the number of participants in this call. This can lead to an overloaded real-time callback, and cause under-runs (akin to frame drops, to make a graphics analogy). Bug 1404997 has been opened to fix this. jya is assigned, but is currently dealing with urgent post-57 release fixups.
- The machine was busy: because this is likely not a system with a real-time kernel, and even if the thread that runs the audio callback is (likely, depending on the distro) bumped in priority, there is no guarantee that the thread will make its deadlines each time. This we can address by simply optimizing our code a lot (it's likely that this WebRTC call was the task that was the  most demanding).

Bug 1397793 will help (it fixes a lot of inefficiencies in the code that runs in the real-time thread, inefficiencies caused mostly by simple technical debt), and bug 1404997 will help much more.

I hope to land 1397793 this week, and I see :achronop has reached out again to the PulseAudio maintainer this morning.
Flags: needinfo?(padenot)
Hello. I finally found out this bug matching what I experienced. I now run Debian Stretch with firefox 60.0.1 (64-bit). I downloaded binary firefox package, untar and run it manually.
According to your description I have found out a way to monitor the lag status, explained below in case it's useful:
- Open Firefox and go to https://talky.io/FF-test
- Grant required permissions, specially audio related. There's no need to join the call.
- On a console issue: "pacmd list-sink-inputs".
- From the previous output take a look at the number of sink inputs. Look the one whose client property matches what firefox requires, in my case "client: 1089 <AudioIPC Server>", this is the sink input with "index: 2157"
- Watch the "current latency" and "requested latency" values. You can automate the monitoring with the following command: watch  -c "pacmd list-sink-inputs |grep latency". This will show all sink input latencies so you need to know the position of the firefox oe in the full list.

In my case I have seen latency rise in excess to 6000ms, main trigger was heavy disk I/O. Feel free to request more information.
I'd re-check with 61 or 62, I've landed even more optimization recently.

Linux Desktop is special because we can't bump the priority of the audio thread (using a default configuration), so it's kind of bound to be terrible under load (unlike other OSes: on OSX, `stress --cpu 16` on a MacBook pro is unable to make a WebRTC call glitch when using Firefox).
This bug also affects me on FF 61.0.1 on Ubuntu 18.04.1. My audio device is a USB headset. The bug appears to show up during heavy CPU usage from other programs. My computer has persistent issues with thermal throttling.

I can confirm that https://bugzilla.mozilla.org/show_bug.cgi?id=1419580#c8 gives a reasonable-sounding measure of the latency.
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: