Closed Bug 1444545 Opened 3 years ago Closed 2 years ago

Noise during 10 person appear.in call for media standup

Categories

(Core :: WebRTC, defect, P1)

Tracking

RESOLVED FIXED
Tracking Status
firefox60 --- affected

People

(Reporter: drno, Unassigned)

On 2018-02-07 the standup call for the playback team had up to 10 participants, and several (all?) participants heard a strange noise for extended periods when someone spoke.

The meeting had more participants than normal.

I spent some time looking at the logs from appear.in and found the following:
- there was probably some server issue around 02:30 after the full hour, because at that point several clients show very high packet loss
- pretty much all PeerConnections show packet loss at random times
- it looks like the packet loss always happened on a per PeerConnection level, which means the PeerConnection stats show loss of audio and video packets at the same time

But nothing in the logs from the transport explains the strange sound. rillian said it sounded like buffer under-runs to him.

I also gathered an Instruments profiler run for some time in that call, but failed to find anything obvious in that profile.
Assignee: nobody → drno
Rank: 8
Nils, please send me this profile. I trust that it will be exactly identical to the one I've been gathering as part of diagnosing 1443886, but I'd like to make sure. Thanks!
Flags: needinfo?(drno)
(In reply to Paul Adenot (:padenot) from comment #1)
> Nils, please send me this profile. I trust that it will be exactly identical
> to the one I've been gathering as part of diagnosing 1443886, but I'd like
> to make sure. Thanks!

Sent you a link to the Dropbox archive via email.
Flags: needinfo?(drno)
Today during the standup the sound quality was bad again. My Mac was basically utilized on all its cores. I noticed that when I turn off sending video the sound was perfect again, because the CPU usage dropped significantly. So apparently the noise in big conference calls is caused by the audio parts not getting enough CPU cycles.

This makes me wonder:
- has the CPU usage for encoding video gone up lately?
- should we have code which degrades the outgoing video in case audio is missing cycles?
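
For illustration only, here is a minimal sketch of what such a degradation policy could look like. Nothing in it comes from the Firefox code base: the class name, the thresholds, and the idea of feeding it an audio under-run count once per reporting interval are all assumptions. The returned factor is meant in the same spirit as a resolution scale-down factor (divide width and height by it).

```cpp
#include <algorithm>

// All names and thresholds below are hypothetical and chosen for illustration.
constexpr int kUnderrunThreshold = 2;        // under-runs per interval we tolerate
constexpr int kMaxScaleDownBy = 8;           // never shrink video more than 8x
constexpr int kCleanIntervalsToRecover = 5;  // quiet intervals before scaling back up

// Decides how much to shrink outgoing video based on audio under-runs.
class VideoDegradationPolicy {
public:
  // Call once per reporting interval with the number of audio under-runs
  // observed in that interval. Returns the factor by which the outgoing
  // video width and height should be divided (1 means full resolution).
  int Update(int aAudioUnderruns) {
    if (aAudioUnderruns > kUnderrunThreshold) {
      // Audio is starving: back off quickly by doubling the scale factor.
      mCleanIntervals = 0;
      mScaleDownBy = std::min(mScaleDownBy * 2, kMaxScaleDownBy);
    } else if (++mCleanIntervals >= kCleanIntervalsToRecover) {
      // Audio has been healthy for a while: recover one step at a time.
      mCleanIntervals = 0;
      mScaleDownBy = std::max(mScaleDownBy / 2, 1);
    }
    return mScaleDownBy;
  }

private:
  int mScaleDownBy = 1;
  int mCleanIntervals = 0;
};
```

The sketch backs off quickly and recovers slowly so that the video does not oscillate between resolutions. Whether under-run counts are the right signal, and what the thresholds should be, would need to be measured.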
(In reply to Nils Ohlmeier [:drno] from comment #3)
> Today during the standup the sound quality was bad again. My Mac was
> basically utilized on all its cores. I noticed that when I turn off sending
> video the sound was perfect again, because the CPU usage dropped
> significantly. So apparently the noise in big conference calls is caused
> by the audio parts not getting enough CPU cycles.
> 
> This makes me wonder:
> - has the CPU usage for encoding video gone up lately?

We regress constantly because we're not monitoring our performance. I'm currently working on having something that fixes this in bug 1444976, and will have something "soon".

> - should we have code which degrades the outgoing video in case audio is
> missing cycles?

This is a possibility that we should certainly explore.

However, we are doing quite a large number of very, very, very stupid things on the audio thread that are not very hard to fix. As I'm writing the tooling I'm talking about above, I'm also testing it via a series of performance patches (nothing crazy, just stop doing some very stupid things). Profiling using Instruments and my new tools agree that the patches help in a situation where:

- Your Firefox is doing things other than the WebRTC call where you're experiencing bad audio
- There is a large number of peers in the call
- Your machine is otherwise not idle

which seems like a rather common case that we really want to work well.
I'm wondering if thread priorities have anything to do with it. E.g. if the video encoding happens with the highest priority and if the audio decoding happens on a thread with lower priority and therefore gets fewer cycles, even though we should give audio encoding and decoding the highest priority.
(In reply to Nils Ohlmeier [:drno] from comment #5)
> I'm wondering if thread priorities have anything to do with it. E.g. if the
> video encoding happens with the highest priority and if the audio decoding
> happens on a thread with lower priority and therefore gets fewer cycles, even
> though we should give audio encoding and decoding the highest priority.

It is the opposite: audio has the highest priority (and is in fact in a different scheduling class altogether) and everything else has normal priority.

The problem here is that our code has not been written carefully, and we experience priority inversion very often during the processing of the audio callback: say a thread with a normal priority acquires a resource (such as a mutex) and does something very complex. Now, the audio thread wakes up and wants to do something, but in the process, tries to acquire this resource: it will block. Now we have the thread with the highest priority being blocked on a resource that is currently acquired by a thread with a normal priority. This results in a glitch.
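
One common way to avoid this kind of inversion for small pieces of shared state is to publish them through an atomic instead of a mutex, so that the audio callback never waits on another thread. The following is a minimal sketch of that idea, not Firefox code: `VolumeControl` and its methods are hypothetical names, and it assumes `std::atomic<float>` is lock-free on the target platform (which `IsLockFree()` reports).

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical example: a volume setting shared between a normal-priority
// control thread and the real-time audio callback.
class VolumeControl {
public:
  // Called from any normal-priority thread. This is a single atomic store,
  // so there is no lock for the audio thread to get stuck behind.
  void SetVolume(float aVolume) {
    mVolume.store(aVolume, std::memory_order_release);
  }

  // Called from the audio callback. It reads the latest published volume
  // once and applies it to the buffer without ever blocking.
  void Process(float* aSamples, std::size_t aCount) const {
    const float volume = mVolume.load(std::memory_order_acquire);
    for (std::size_t i = 0; i < aCount; ++i) {
      aSamples[i] *= volume;
    }
  }

  // True when the platform implements this atomic without a hidden lock.
  bool IsLockFree() const { return mVolume.is_lock_free(); }

private:
  std::atomic<float> mVolume{1.0f};
};
```

This only works for state that fits in a single atomic value. Larger or structured data needs a different hand-off mechanism, such as a lock-free queue.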

Code patterns that are causing this are:
- Locks
- Dispatches (even asynchronous, and clearly synchronous also, but we don't do those), this includes MozPromise, Pledges, etc.
- Allocations/Deallocations (this can take a lock underneath, and also do system calls)
- Condition variable waits (this includes MozPromise::AwaitAll)
- System calls
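
As a rough illustration of how data can reach or leave the audio thread without any of the patterns listed above, here is a sketch of a fixed-capacity single-producer/single-consumer ring buffer. It is not a Firefox class; `SpscRing` and its members are hypothetical names. It assumes exactly one producer thread and one consumer thread, and a trivially copyable element type so that copying an element cannot allocate.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <type_traits>

// Hypothetical single-producer/single-consumer queue. Storage is a
// fixed-size array, so Push and Pop never allocate, lock, wait on a
// condition variable, dispatch, or call into the kernel.
template <typename T, std::size_t Capacity>
class SpscRing {
  static_assert(std::is_trivially_copyable<T>::value,
                "copying T must not allocate or take locks");

public:
  // Producer thread only. Returns false instead of blocking when full.
  bool Push(const T& aItem) {
    const std::size_t head = mHead.load(std::memory_order_relaxed);
    const std::size_t next = (head + 1) % (Capacity + 1);
    if (next == mTail.load(std::memory_order_acquire)) {
      return false;  // full
    }
    mSlots[head] = aItem;
    mHead.store(next, std::memory_order_release);
    return true;
  }

  // Consumer thread only (for example the audio callback). Returns false
  // instead of blocking when there is nothing to read.
  bool Pop(T& aOut) {
    const std::size_t tail = mTail.load(std::memory_order_relaxed);
    if (tail == mHead.load(std::memory_order_acquire)) {
      return false;  // empty
    }
    aOut = mSlots[tail];
    mTail.store((tail + 1) % (Capacity + 1), std::memory_order_release);
    return true;
  }

private:
  // One extra slot stays unused so that "full" and "empty" are distinguishable.
  std::array<T, Capacity + 1> mSlots{};
  std::atomic<std::size_t> mHead{0};  // next slot to write; producer-owned
  std::atomic<std::size_t> mTail{0};  // next slot to read; consumer-owned
};
```

A `SpscRing<int, 2>` accepts two pushes, rejects a third until something is popped, and returns items in FIFO order. The caller has to decide what to do when `Push` or `Pop` returns false (drop, retry later, or output silence), since waiting is exactly what the audio thread must not do.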

Bumping the priority of the video encoding will achieve nothing.

In a few days, I'll be ready to land some patches and release some analysis tooling that will let us characterize and test for regressions (including proving my assertion above), as well as a few performance patches that will help with the issue at hand.
I don't have the capacity any more to work on this any time soon.
Assignee: drno → nobody
Since the sound these days is fine with a lot more participants, I think we can close this.
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED