Bad audio quality in a group video call, CPU over 120%
Categories
(Core :: WebRTC: Audio/Video, defect, P2)
People
(Reporter: avasilko, Unassigned)
Attachments
(10 files)
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.79 Safari/537.36

Steps to reproduce:
1. Join a group video call (SFU) while sharing audio and video tracks
2. The call has 6 participants, 4 of them share video and audio tracks, the rest shares only audio

Reproduced with the latest Firefox beta version. The issue is not reproducible in Chrome browser with the same video call. Please find attached Firefox performance and memory snapshot/logs

Actual results:
- Received incoming audio is crackling, has occasional artifacts (not reproducible in Chrome in the same video room)
- CPU usage by Firefox spikes up to 120%

Expected results:
Clean audio
Comment 1•6 years ago (Reporter)
Comment 2•6 years ago
Hi Anna, thanks for filing. For us to understand this a bit better, please answer a couple of questions:
- Does this happen only on Mac, or on other platforms too?
- What audio backend were you using when this happened? Find the one your Firefox uses on about:support, under "Media".
- Do you have a link to a page I can use to reproduce this? This is to ensure we look at and debug the same things as you are reporting. Feel free to email me in private if it's something you can't share publicly.
Comment 3•6 years ago (Reporter)
Hi Andreas, The "Media" section lists the audio backend as "audiounit". I am attaching a screenshot of the details to this ticket. In the meantime, we are trying to open up our test app for your team to debug. My team will try on Windows and update here shortly.
Comment 4•6 years ago (Reporter)
Comment 5•6 years ago
I see you do not use the built-in output device. Can you please try to reproduce with that?
Comment 6•6 years ago
Hi Andreas, Today I checked the issue Anna described on both Windows and macOS. On Windows there is no crackling sound, whereas on macOS it is clearly audible. For the same scenario on Windows, Firefox uses about 50% CPU. I used version 60.0.1 on both platforms.
Comment 7•6 years ago
Thanks for the information! We do have some issues with a slow graphics route on Mac. My guess would be that that's what's causing the CPU spike. To try to verify that without having access to your service, could you try an experiment that doesn't render video (but still negotiates, sends and receives) and compare CPU usage for this on both Windows and Mac? I would expect Mac when not rendering video to drop much closer to what you see for Windows. Whether that is then causing the audio glitches is another question, but one step at a time. Of course if we had a page to test on I could also do some profiling to verify this and dig deeper. That might be the simpler path forward.
Comment 8•6 years ago (Reporter)
Hi Andreas, Thanks, we'll try the suggested experiment. In the meantime, our test app has been shared with your team (see the private email group); I hope it helps with the investigation.
Comment 9•6 years ago
Anna, I have tried to reproduce your problems but I haven't been very successful. I do note some very occasional glitches under load but it doesn't seem like a major problem. One thing that could help is an audio recording of the glitches you hear, so I can tell whether I'm reproducing the right thing or not. We are aware of some general issues that can cause this in our audio pipeline but since they are not trivial to fix it's an ongoing effort. If what you hear is worse than what I hear, I think this could warrant deeper investigation. Otherwise, it should get fixed in the long term by our continuous improvements.
Comment 10•6 years ago
One thing I noted in my MSGTracing is that a call with 8 remote peers seems to have 20 SourceMediaStreams present (2 local tracks + 16 remote + 2 more, which seems OK) and ~190 TrackUnionStreams. Each original MediaStream (given by an API, like gUM, i.e. getUserMedia) and each clone of a MediaStream means 2 TrackUnionStreams; a `new MediaStream()` means 1 TrackUnionStream. These TrackUnionStreams take up a considerable share of the audio budget to process and are frequently the reason we overrun this budget (causing a glitch). While we are working to simplify and optimize this, a short-term fix you could attempt is to reduce the number of MediaStreams you use in your app. The same goes for MediaStreamTrack clones.
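To make the counting rule in this comment concrete, here is a rough sketch in TypeScript. It only restates the rule of thumb above (2 per original or cloned MediaStream, 1 per `new MediaStream()`). The function and field names are made up for illustration, and this is not how Firefox accounts for streams internally.

```typescript
// Hypothetical helper, not a Firefox or Web API.
// Inputs describe how an app obtained its MediaStreams.
interface StreamUsage {
  apiStreams: number;         // streams handed out by an API such as getUserMedia
  clonedStreams: number;      // streams produced by MediaStream.clone()
  constructedStreams: number; // streams produced by `new MediaStream()`
}

// Rule of thumb from this comment: originals and clones cost 2
// TrackUnionStreams each, constructed streams cost 1 each.
function estimateTrackUnionStreams(usage: StreamUsage): number {
  return 2 * (usage.apiStreams + usage.clonedStreams) + usage.constructedStreams;
}

// Illustrative numbers only (assumed, not measured from the reported app).
const withClones = estimateTrackUnionStreams({
  apiStreams: 9,
  clonedStreams: 27, // each of the 9 streams cloned 3 times
  constructedStreams: 0,
});
const withoutClones = estimateTrackUnionStreams({
  apiStreams: 9,
  clonedStreams: 0,
  constructedStreams: 0,
});
```

Under this rule, the hypothetical app that clones each of its 9 streams three times ends up at 72, against 18 if it reuses the originals.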
Comment 11•6 years ago
Here are some screenshots from the tracing to show what it looks like when we overrun the budget. The boxes here that take a long time are probably due to a syscall because of a Mutex in the audio thread. We are constantly working to reduce those, but are not quite at an optimal place today.
Comment 12•6 years ago
Comment 13•6 years ago
Comment 14•6 years ago
I have more proof of my analysis in comment 10. I also traced another service which uses an SFU and it showed *a lot* fewer TrackUnionStreams being processed, and hence it glitched a lot less too. I'll attach a screenshot of Twilio and another of the other service to show the difference, and how close to blowing the budget Twilio is because of this. This indicates there's a lot you can do on your end by reducing the number of MediaStreamTracks and MediaStreams to a minimum. And remember to stop and release the ones you no longer need.
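One possible way to act on this in an app is to hand out a single shared stream per source instead of cloning it for every consumer, and to stop its tracks once the last consumer is gone. The sketch below is only an illustration under that assumption: `SharedStreamPool` and the two small interfaces are hypothetical names, written against the minimal shape that a browser `MediaStream` and `MediaStreamTrack` already satisfy (`getTracks()` and `stop()`).

```typescript
// Minimal shapes needed here; a browser MediaStream / MediaStreamTrack fits them.
interface StoppableTrack {
  stop(): void;
}
interface TrackContainer {
  getTracks(): StoppableTrack[];
}

// Hypothetical reference-counted pool: one stream per key, no clones.
class SharedStreamPool<S extends TrackContainer> {
  private entries = new Map<string, { stream: S; users: number }>();

  // Returns the existing stream for `key`, creating it only on first use.
  acquire(key: string, create: () => S): S {
    let entry = this.entries.get(key);
    if (!entry) {
      entry = { stream: create(), users: 0 };
      this.entries.set(key, entry);
    }
    entry.users += 1;
    return entry.stream;
  }

  // Drops one user; when none are left, stops the tracks and forgets the stream.
  release(key: string): void {
    const entry = this.entries.get(key);
    if (!entry) return;
    entry.users -= 1;
    if (entry.users <= 0) {
      entry.stream.getTracks().forEach((track) => track.stop());
      this.entries.delete(key);
    }
  }

  // Number of streams currently alive in the pool.
  get size(): number {
    return this.entries.size;
  }
}
```

In a browser, each media element that shows the same participant could set its `srcObject` to the stream returned by `acquire` and call `release` when it is removed, so no per-element clone is needed.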
Comment 15•6 years ago
Comment 16•6 years ago
Comment 17•6 years ago
With this said, I'm gonna dupe this to bug 1423194 which will get rid of most of that internal processing in the long term. We are doing other things too that may help with worst case scenarios observed here, but bug 1423194 should have the most impact on your service as it stands today.
Comment 18•6 years ago
Trying to identify problems on our side, I see in my tracing that each video track source (common across clones) is rendered in three media elements. In addition, the local track is sent over a peer connection, and one track is rendered in four media elements; this must be the main view. My guess is that each of these media elements has its own clone of the original stream or tracks, and that's why we see so much processing happening. Rendering in multiple elements like this does not affect audio much by itself, but, as mentioned before, all the clones do, because they make the number of internal streams blow up.
Comment 19•5 years ago
To follow up on this bug after we have landed a number of refactoring bits since I was debugging this last -- I re-ran the same test as in comment 15 to get a decent comparison.
All bits are considerably faster, and the impact from the extra tracks the Twilio service is using is much smaller. This is attributable both to a simplified topology of tracks in our graph and to less overhead per processed track, though mostly the former.
Comment 20•5 years ago
To complete the follow-up, the other SFU service I looked at in comment 16 now looks like this.
To summarize, we've brought down the audio callback duration from ~10.5ms to ~2ms for Twilio, and from ~3.5ms to ~1ms for the other SFU service, when testing on a 2016 MacBook Pro with 8 remote peers.
This is a huge feat.
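For a sense of scale, the numbers above can be put next to an audio callback budget. The sketch below is illustrative only: the buffer size and sample rate are assumptions (512 frames at 48 kHz), since neither value is stated in this bug, and the helper names are made up.

```typescript
// Time available to produce one audio buffer before the device needs it.
function callbackBudgetMs(framesPerCallback: number, sampleRateHz: number): number {
  return (framesPerCallback / sampleRateHz) * 1000;
}

// Fraction of the budget consumed by a callback lasting `durationMs`.
function budgetLoad(durationMs: number, budgetMs: number): number {
  return durationMs / budgetMs;
}

// How many times shorter the callback became.
function speedup(beforeMs: number, afterMs: number): number {
  return beforeMs / afterMs;
}

// ASSUMPTION: 512 frames per callback at 48 kHz; not taken from this bug.
const assumedBudgetMs = callbackBudgetMs(512, 48000); // roughly 10.67 ms

// Durations quoted in the summary above.
const twilioSpeedup = speedup(10.5, 2);  // 5.25
const otherSfuSpeedup = speedup(3.5, 1); // 3.5

// Share of the assumed budget used before and after, for the Twilio case.
const twilioLoadBefore = budgetLoad(10.5, assumedBudgetMs);
const twilioLoadAfter = budgetLoad(2, assumedBudgetMs);
```

Under that assumed budget, a ~10.5ms callback leaves almost no headroom before an overrun, while a ~2ms callback uses well under a quarter of it.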