Closed Bug 1467962 Opened 6 years ago Closed 6 years ago

Bad audio quality in a group video call, CPU over 120%

Categories

(Core :: WebRTC: Audio/Video, defect, P2)

61 Branch
defect

RESOLVED DUPLICATE of bug 1423194

People

(Reporter: avasilko, Unassigned)

Details

Attachments

(10 files)

Attached image firefox cpu.png
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.79 Safari/537.36

Steps to reproduce:

1. Join a group video call (SFU) while sharing audio and video tracks
2. The call has 6 participants; 4 of them share video and audio tracks, the rest share only audio

Reproduced with the latest Firefox beta version.
The issue is not reproducible in Chrome with the same video call.
Please find attached the Firefox performance and memory snapshots/logs.



Actual results:

- Received audio is crackling and has occasional artifacts (not reproducible in Chrome in the same video room)
- CPU usage by Firefox spikes up to 120%


Expected results:

Clean audio
Attached file FF_memory.fxsnapshot
Component: Untriaged → WebRTC
Product: Firefox → Core
Hi Anna, thanks for filing.

For us to understand this a bit better, please answer a couple of questions:
Does this happen only on Mac, or on other platforms too?
What audio backend were you using when this happened? Find the one your Firefox uses on about:support, under "Media".

Do you have a link to a page I can use to reproduce this? This is to ensure we look at and debug the same things as you are reporting. Feel free to email me in private if it's something you can't share publicly.
Component: WebRTC → WebRTC: Audio/Video
Flags: needinfo?(avasilko)
Whiteboard: [need info reporter 2018-06-18]
Hi Andreas,

The "Media" section lists the audio backend as "audiounit". I am attaching a screenshot of the details to this ticket.
In the meantime, we are trying to open up our test app for your team to debug.

My team will try on Windows and update here shortly.
Flags: needinfo?(avasilko)
I see you do not use the built-in output device. Can you please try to reproduce with that?
Hi Andreas,

Today I checked the issue described by Anna on Windows and macOS. There is no crackling sound on Windows, while it is clearly audible on macOS. On Windows, for the same scenario, Firefox uses about 50% CPU. I used version 60.0.1 on both platforms.
Thanks for the information!

We do have some issues with a slow graphics route on Mac. My guess would be that that's what's causing the CPU spike. To try to verify that without having access to your service, could you try an experiment that doesn't render video (but still negotiates, sends and receives) and compare CPU usage for this on both Windows and Mac? I would expect Mac when not rendering video to drop much closer to what you see for Windows.

Whether that is then causing the audio glitches is another question, but one step at a time.

Of course if we had a page to test on I could also do some profiling to verify this and dig deeper. That might be the simpler path forward.
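A minimal sketch of the suggested experiment (the function name and element wiring are assumptions, not code from the reporter's service): the peer connection still negotiates, sends, and receives video, but only the audio tracks are handed to a media element, so nothing is painted.

```javascript
// Hypothetical no-render experiment: video tracks keep flowing over
// the peer connection, but only audio is attached for playback.
function attachAudioOnly(remoteStream, mediaElement) {
  const audioTracks = remoteStream.getAudioTracks();
  // Wrap just the audio tracks in a fresh stream; the video tracks
  // are still received and decoded but never rendered.
  mediaElement.srcObject = new MediaStream(audioTracks);
  return audioTracks.length;
}
```

Comparing CPU usage with and without this change, on both Windows and Mac, should isolate the rendering path as the source of the spike.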
Flags: needinfo?(avasilko)
Hi Andreas,
Thanks, we'll try the suggested experiment.

In the meantime, our test app has been shared with your team (see the private email group); we hope it helps with the investigation.
Flags: needinfo?(avasilko)
Anna, I have tried to reproduce your problems but I haven't been very successful. I do note some very occasional glitches under load but it doesn't seem like a major problem. One thing that could help is an audio recording of the glitches you hear, so I can tell whether I'm reproducing the right thing or not.

We are aware of some general issues in our audio pipeline that can cause this, but since they are not trivial to fix, it's an ongoing effort.

If what you hear is worse than what I hear, I think this could warrant deeper investigation. Otherwise, it should get fixed in the long term by our continuous improvements.
Flags: needinfo?(avasilko)
One thing I noted in my MSGTracing is that a call with 8 remote peers seems to have 20 SourceMediaStreams present (2 local tracks + 16 remote + 2 more; seems ok) and ~190 TrackUnionStreams. (Each original MediaStream, e.g. one given by an API like gUM, and each clone of a MediaStream mean 2 TrackUnionStreams; a `new MediaStream()` means 1 TrackUnionStream.)

These TrackUnionStreams consume a considerable share of the audio processing budget and are frequently the reason we overrun it (causing a glitch). While we are working to simplify and optimize this, a short-term fix you could attempt is to reduce the number of MediaStreams you use in your app. The same goes for MediaStreamTrack clones.
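As an illustration of that short-term fix (the caching helper below is an assumption, not code from the service), reusing one MediaStream per source instead of cloning it for every consumer keeps the internal stream count down:

```javascript
// Hypothetical helper: hand out one cached MediaStream per source id
// instead of calling stream.clone() for every consumer. Each clone
// adds TrackUnionStreams to Firefox's internal media graph.
const streamCache = new Map();

function sharedStream(sourceId, tracks) {
  let stream = streamCache.get(sourceId);
  if (!stream) {
    stream = new MediaStream(tracks); // created once per source
    streamCache.set(sourceId, stream);
  }
  return stream;
}
```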
Blocks: 1423194
Attached image Tracing 1
Here are some screenshots from the tracing to show what it looks like when we overrun the budget.

The boxes here that take a long time are probably due to a syscall caused by a Mutex on the audio thread. We are constantly working to reduce those, but we are not quite in an optimal place today.
Attached image Tracing 2
Attached image Tracing 3
Priority: -- → P2
I have more proof of my analysis in comment 10.

I also traced another service which uses an SFU, and it showed *a lot* fewer TrackUnionStreams being processed, and hence it glitched a lot less too. I'll attach a screenshot of Twilio and another of the other service to show the difference, and how close Twilio is to blowing the budget because of this.

This indicates there's a lot you can do on your end by reducing the number of MediaStreamTracks and MediaStreams to a minimum. And remember to stop and release the ones you don't need anymore.
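A minimal cleanup sketch along those lines (the function shape is an assumption):

```javascript
// Hypothetical cleanup: stop every track in a stream you no longer
// need and detach it from its element so it can be garbage collected.
function releaseStream(stream, element) {
  let stopped = 0;
  for (const track of stream.getTracks()) {
    track.stop(); // releases the underlying capture/decoder resources
    stopped += 1;
  }
  if (element) element.srcObject = null;
  return stopped;
}
```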
With this said, I'm gonna dupe this to bug 1423194, which will get rid of most of that internal processing in the long term. We are doing other things too that may help with the worst-case scenarios observed here, but bug 1423194 should have the most impact on your service as it stands today.
No longer blocks: 1423194
Status: UNCONFIRMED → RESOLVED
Closed: 6 years ago
Resolution: --- → DUPLICATE
Whiteboard: [need info reporter 2018-06-18]
Trying to identify problems on our side, I see in my tracing that each video track source (common across its clones) is rendered in three media elements. The local track is in addition sent over a peer connection, and one track is rendered in four media elements -- this must be the main view.

I'm gonna guess that each of these media elements has its own clone of the original stream or tracks, and that's why we see so much processing happening.

Rendering in multiple elements like this does not affect audio that much, but as mentioned before, all the clones do, as the number of internal streams blows up.
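If that guess is right, a fix on the app side could be as simple as assigning the same MediaStream to every element that renders it (a sketch; the element wiring is an assumption):

```javascript
// Sketch: render one stream in several media elements without
// cloning it. Assigning the same MediaStream to multiple elements
// is allowed and avoids per-clone processing in the media graph.
function renderInAll(stream, elements) {
  for (const el of elements) {
    el.srcObject = stream; // same object everywhere, no clone() per element
  }
  return elements.length;
}
```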

To follow up on this bug after we have landed a number of refactoring bits since I last debugged this -- I re-ran the same test as in comment 15 to get a decent comparison.

All bits are considerably faster, and the impact from the extra tracks the Twilio service is using is much smaller. This is attributable both to a simplified topology of tracks in our graph and to less overhead per processed track, though mostly the former.

To complete the follow-up, the same other SFU service as I looked at in comment 16 now looks like this.

To summarize, we've brought the audio callback duration down from ~10.5ms to ~2ms for Twilio, and from ~3.5ms to ~1ms for the other SFU service, when testing on a 2016 MacBook Pro with 8 remote peers.

This is a huge feat.

Flags: needinfo?(avasilko)