Closed Bug 979716 Opened 6 years ago Closed 5 years ago

webrtc audio connection uses high cpu

Categories

(Core :: WebRTC: Audio/Video, defect)

ARM
Gonk (Firefox OS)
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla33
blocking-b2g 2.0+
Tracking Status
firefox31 --- wontfix
firefox32 --- fixed
firefox33 --- fixed
b2g-v2.0 --- fixed
b2g-v2.1 --- fixed

People

(Reporter: slee, Assigned: jesup)

References

Details

(Keywords: perf, Whiteboard: [ft:loop])

Attachments

(8 files, 5 obsolete files)

1.41 MB, application/zip
Details
1.16 MB, application/zip
Details
1.57 KB, patch
jesup
: review+
Details | Diff | Splinter Review
925.67 KB, application/zip
Details
3.12 KB, patch
jmvalin
: review+
Details | Diff | Splinter Review
3.12 KB, patch
jesup
: review+
Details | Diff | Splinter Review
1.79 KB, patch
slee
: review+
Details | Diff | Splinter Review
6.61 KB, patch
gcp
: review+
Details | Diff | Splinter Review
I found that audio-only connection uses about 38% of cpu in content process on peak. After profiling,there are 10.x% of cpu in opus encoding, 9.x% of cpu in echo-cancelation. 
For echo cancelation, it's resulted from that we use default setting rather than mobile settings, [1]. After setting "media.peerconnection.aec" to 4, cpu usage decreases to 34%. Then I apply the comment on [2] to reduce the complexity of opus(from default to 1), browser cpu usage is 32%. 

* build version: 171311:0085a162499f

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=861050#c39
[2] https://bugzilla.mozilla.org/show_bug.cgi?id=861050#c52
Keywords: perf
Attached file apply aec for mobile (obsolete) —
Attached file change opus complexity to 1 (obsolete) —
1. change opus complexity to 1
2. use aec mobile
Attached patch UseMobileAEC.patch (obsolete) — Splinter Review
Using aec mobile version on B2G. Should we also modify the pref on Android?
Attachment #8385864 - Flags: review?(rjesup)
Attached patch set opus complexity to 1 (obsolete) — Splinter Review
34% is still nuts. Is OPUS supported in hardware?
Can you please upload the profiles and link to them? Thanks.
(In reply to Andreas Gal :gal from comment #6)
> 34% is still nuts. Is OPUS supported in hardware?
I think not but I can try G711. It consumes less CPU. 

(In reply to Andreas Gal :gal from comment #7)
> Can you please upload the profiles and link to them? Thanks.
https://bugzilla.mozilla.org/attachment.cgi?id=8385863 is the profiling results of enabling ace-m and using complexity 1 to encode OPUS(the default value is 9 which uses more CPU).
Comment on attachment 8385864 [details] [diff] [review]
UseMobileAEC.patch

Review of attachment 8385864 [details] [diff] [review]:
-----------------------------------------------------------------

someone should profile that on Android or at least do a top with that setting first (before applying it to Android)
Attachment #8385864 - Flags: review?(rjesup) → review+
jmspeex/derf: Can you comment on setting Opus complexity to 1?  Quality and differences?

slee: we likely won't want to set the complexity to 1 on desktop.  Android will be an open question.

I don't think anything has hardware Opus encode/decode yet.
Flags: needinfo?(tterribe)
Flags: needinfo?(jmvalin)
Attached file enable aec for mobile and g711 (obsolete) —
This is the profiling result of enabling ace for mobile and g711. The CPU usage is about 22% of browser app.
Also, would it make sense to change any of the bandwidth params for Opus (per derf's and my IRC conversation a few days ago)?
FYI, before dropping opus complexity, opus encode was 11% (9.2% silk, 1.5% celt).  opus_decode was 1.8% (1/2 silk, 1/2 celt - note decode is driven by the complexity/bitrate chosen by the far end)
I don't think we have good measurements on the quality drop on voice content for using complexity 1. But it will certainly be much better than falling back to G.711.

Looking at the current rate thresholds in mono_voice_bandwidth_thresholds[] and mono_voice_bandwidth_thresholds[] in media/libopus/src/opus_encoder.c, I think we should be setting a bitrate around 16 kbps to keep things in WB. There is absolutely no point in going over WB with a 16 kHz input signal like we currently have.
Flags: needinfo?(tterribe)
I don't have much more to say about complexity 1, except that it may be worth trying complexity 3 or 5 to see what the CPU is like. But yeah, even complexity 1 should be OK. Also, is this Opus 1.1 with asm optimizations enabled?
Flags: needinfo?(jmvalin)
Steven, can you benchmark with the patch from bug 946021 and see if it makes any difference? I'd also appreciate if you could write-up how you test; that's what I was blocked on with that bug.
Depends on: 946021
Flags: needinfo?(slee)
(In reply to Ralph Giles (:rillian) from comment #16)
> Steven, can you benchmark with the patch from bug 946021 and see if it makes
> any difference? I'd also appreciate if you could write-up how you test;
> that's what I was blocked on with that bug.
Hi rillian,

I setup my gupshup to test audio chatting, [1]. And the profiling data is from miniperf. Here is some steps to enable miniperf on B2G. 

1. follow [2] to download the source code, export B2G_PROFILING=1 and config devices
2. clone git://github.com/jld/b2g-toolchain-prebuilt into B2G/prebuilt/alt-toolchain
3. replace B2G/gonk-misc with git://github.com/jld/gonk-misc if needed (ex, peak)
3-1. For peak, you should also replace the file, kernel/arch/arm/configs/C8680_defconfig, by https://wiki.mozilla.org/images/c/c7/C8680.tar
4. If you are encountering building error of B2G/external/skia/src/opts/SkBitmapProcState_opts_arm.cpp, please apply the patch, https://www.dropbox.com/s/sy22dit07fpmrw4/blt_fail.patch
5. run the program and profile by using “./run-perf.sh record-sps”. After getting the profiling data, you can upload to [3] to see the results.
6. Some devices cannot flash gonk or you will brick it. Before flash gonk, please check first.

[1] http://nightly-gupshup.herokuapp.com/login
[2] https://bugzilla.mozilla.org/show_bug.cgi?id=831611#c53
[3] http://people.mozilla.org/~bgirard/cleopatra/#
Flags: needinfo?(slee)
Bug 946021 enables the arm optimization of opus. I tried it with enabling mobile AEC and with opus default complexity. The CPU usage is lower than before.
This data applies,
* ARM optimization
* AEC mobile
* Low bit rate patch from jesup, https://bugzilla.mozilla.org/attachment.cgi?id=8388998
File a bug to hook up the Opus complexity control to the Load Manager?
Also we should expose more tuning parameters like this to prefs (even if we eventually disable them) - it makes investigating tuning (especially device-specific tuning) MUCH faster.

slee's profile shows the combination of the three patches drops Opus Encode to 8.7% - 7.3 silk, 1.3 celt.  (Was 10.8 total in the profile on bug 946021, though I don't think that had the complexity patch)  The numbers overall are confusing, however.

I'll note going back to the original profiles here from 3/5 with the mods - that showed actually lower opus_encode numbers (5.x % for opus_encode) though total CPU use was similar.  For example, does Opus encode time vary according to the input signal?  Could be; video encoders certainly can (especially for black input).

This highlights that the testing setup has considerable variation in the measured numbers - the other parts of the trace (MSG, which is mostly NetEq + decode; e10s, etc) vary by over 1% (which is a large percentage of the amount of time they use, perhaps 20% variation).  So it's not a good idea when looking at the result of a patch to look at "Browser" %.  That's an important measure, but we need to find a way to measure that more repeatably or reliably (or over a longer period to stabilize the measurement).

The best test of a specific patch or patchset is to profile with base, apply patch, profile with patch in the same physical environment (realizing that network especially wireless can vary) - and this assumes that the profiling is producing consistent numbers for the same code in consecutive runs; if it's not then we'll need to merge more than one run probably.  Network variance can especially affect the decode chain (NetEq and the jitter buffering).
Assignee: nobody → rjesup
It's strange that you'd see any CPU spent in CELT if you're indeed running at 16 kHz/16 kbps. In that configuration, Opus should be operating only with SILK. Can you give more details?
Attachment #8388998 - Flags: review?(jmvalin)
Attachment #8388998 - Flags: review?(jmvalin)
Noted that this patch was r+'d but never marked for checkin or landed
https://hg.mozilla.org/integration/mozilla-inbound/rev/dcf85de7c8cc
Whiteboard: [WebRTC] → [leave-open]
Comment on attachment 8385864 [details] [diff] [review]
UseMobileAEC.patch

Aurora approval is important since that's 2.0.

Approval Request Comment
[Feature/regressing bug #]:N/A

[User impact if declined]: greatly increased CPU use for AEC; this causes pops in video calls on a Flame.  The regular AEC is designed for more-powerful CPUs, and in particular assumes high-performance floating point support.

[Describe test coverage new/current, TBPL]: the AEC requires fake external devices to test, which is not supported by TBPL testing currently, especially on B2G emulator.

[Risks and why]: very low risk; Android is using the mobile AEC already by default, but because of the way things are ifdefed in the code, the forced use of AECM didn't apply to B2G.  This just flips the requested AEC type to AECM (mobile)

[String/UUID change made/needed]: none
Attachment #8385864 - Flags: approval-mozilla-aurora?
Comment on attachment 8443916 [details] [diff] [review]
Make Opus complexity configurable in WebRTC; default Gonk to complexity 1

Review of attachment 8443916 [details] [diff] [review]:
-----------------------------------------------------------------

r? to ted for build; to jmspeex for reducing complexity in general (exact number to be selected after looking at comparative profiles)
Attachment #8443916 - Flags: review?(ted)
Attachment #8443916 - Flags: review?(jmvalin)
Whiteboard: [leave-open] → [leave-open][webrtc-uplift]
(In reply to Randell Jesup [:jesup] from comment #27)
> [Describe test coverage new/current, TBPL]: the AEC requires fake external
> devices to test, which is not supported by TBPL testing currently,
> especially on B2G emulator.

Have you manually tested these changes on B2G?

> [Risks and why]: very low risk; Android is using the mobile AEC already by
> default, but because of the way things are ifdefed in the code, the forced
> use of AECM didn't apply to B2G.  This just flips the requested AEC type to
> AECM (mobile)

Am I correct in assuming that your comment above means that the mobile code path has been pretty well tested in fennec at this point? How long ago did fennec switch to using the mobile profile?
(In reply to Lawrence Mandel [:lmandel] from comment #29)
> (In reply to Randell Jesup [:jesup] from comment #27)
> > [Describe test coverage new/current, TBPL]: the AEC requires fake external
> > devices to test, which is not supported by TBPL testing currently,
> > especially on B2G emulator.
> 
> Have you manually tested these changes on B2G?

Yes, on the Flame (and done profiles); this helped resolve the "popping" in video calls due to overruns processing input audio. (Cut CPU use by ~4-5% in top -t for the opensl_rec_thread)

> 
> > [Risks and why]: very low risk; Android is using the mobile AEC already by
> > default, but because of the way things are ifdefed in the code, the forced
> > use of AECM didn't apply to B2G.  This just flips the requested AEC type to
> > AECM (mobile)
> 
> Am I correct in assuming that your comment above means that the mobile code
> path has been pretty well tested in fennec at this point? How long ago did
> fennec switch to using the mobile profile?

Fennec has always used the mobile AEC; apparently the #ifdef for android that made that the default didn't get triggered on B2G.  Switching the pref (as we have done here) is a simpler and safer way to convert b2g over.

The mobile AEC is also used by Google; we don't have any mods to it locally.
Comment on attachment 8443916 [details] [diff] [review]
Make Opus complexity configurable in WebRTC; default Gonk to complexity 1

Review of attachment 8443916 [details] [diff] [review]:
-----------------------------------------------------------------

This patch not only changes the Android behaviour, but AFAICT also the general one. Opus complexity in 1.1 defaults to 9, not 10. In practice, this would make a difference only on fixed-point platform, where the tone/speech/music analysis is enabled only at complexity 10. Is that what you intended?
This bug (especially AEC mobile, but other bits help too) should be 2.0+ since it causes nasty audio pops in webrtc video calls
blocking-b2g: --- → 2.0?
(In reply to Jean-Marc Valin (:jmspeex) from comment #31)
> This patch not only changes the Android behaviour, but AFAICT also the
> general one. Opus complexity in 1.1 defaults to 9, not 10. In practice, this
> would make a difference only on fixed-point platform, where the
> tone/speech/music analysis is enabled only at complexity 10. Is that what
> you intended?

Sorry I was told it was a default of 10, sorry.  I could revise it to not change the value unless explicity set.
Flags: needinfo?(jmvalin)
Comment on attachment 8385864 [details] [diff] [review]
UseMobileAEC.patch

Aurora approval granted. (Which somewhat negates the need for the 2.0 nom.) Thanks for the clarification Randell.
Attachment #8385864 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 8443916 [details] [diff] [review]
Make Opus complexity configurable in WebRTC; default Gonk to complexity 1

Review of attachment 8443916 [details] [diff] [review]:
-----------------------------------------------------------------

I'm good if the complexity is only set when we actually want to change it.
Attachment #8443916 - Flags: review?(jmvalin) → review+
>I'm good if the complexity is only set when we actually want to change it.

That's not what the patch does. I'd think the change has to go into opus.gypi conditional on toolkit_gonk or something, so it doesn't change the default on non-gonk.
I mean that jesup and I discussed this in IRC and I said I was OK if the patch was changed to not explicitly set the complexity when not needed.
Flags: needinfo?(jmvalin)
(In reply to Gian-Carlo Pascutto [:gcp] from comment #38)
> >I'm good if the complexity is only set when we actually want to change it.
> 
> That's not what the patch does. I'd think the change has to go into
> opus.gypi conditional on toolkit_gonk or something, so it doesn't change the
> default on non-gonk.

The changing of the default is conditional on Gonk in gyp.mozbuild
>The changing of the default is conditional on Gonk in gyp.mozbuild

The patch does the opus_encoder_ctl in all cases. If the underlying library changes the default, you're wrong. Comment 31 point out this is already the case. That was the point of jmspeex's comment.
FYI, to capture it here: I modified the patch in response to jean-marc's comments; it doesn't touch the complexity except on Gonk.  This was the basis of the r+ in IRC.  So we should be good.
Its not blocking but something that could go through approval, as done.
blocking-b2g: 2.0? → -
Attachment #8443916 - Flags: review?(ted)
Attached patch opus_complexitySplinter Review
This is the patch-as-landed

Approval Request Comment
[Feature/regressing bug #]: N/A

[User impact if declined]: Increased CPU use in webrtc on B2G; will push us closer to where we'd get 'pops' in the audio in video calls.

[Describe test coverage new/current, TBPL]: All normal webrtc tests that have media on B2G will be using this.  Landed in central.

[Risks and why]: very low risk; Opus (the project) is well tested at all complexity levels.

[String/UUID change made/needed]: none
Attachment #8385862 - Attachment is obsolete: true
Attachment #8385863 - Attachment is obsolete: true
Attachment #8385864 - Attachment is obsolete: true
Attachment #8385865 - Attachment is obsolete: true
Attachment #8385975 - Attachment is obsolete: true
Attachment #8449632 - Flags: review+
Attachment #8449632 - Flags: approval-mozilla-aurora?
Comment on attachment 8388998 [details] [diff] [review]
drop opus bitrate to 16000bps to reduce mobile cpu use

Approval Request Comment
[Feature/regressing bug #]: N/A

[User impact if declined]: Increased bandwidth use (most noticeable in low-bw cases).  Small chance of increased popping of audio in b2g video calls.  jmspeex indicates the quality difference for 16000 vs 32000bps at a 16KHz mono sampling rate is minimal.

[Describe test coverage new/current, TBPL]: on m-c.  All media tests are using this patch/bitrate.

[Risks and why]: almost no risk.

[String/UUID change made/needed]:  None.

marking the verbal r+ from jmspeex
Attachment #8388998 - Flags: review+
Attachment #8388998 - Flags: approval-mozilla-aurora?
Comment on attachment 8449632 [details] [diff] [review]
opus_complexity

Aurora+
Attachment #8449632 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 8388998 [details] [diff] [review]
drop opus bitrate to 16000bps to reduce mobile cpu use

Aurora+
Attachment #8388998 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8451070 - Flags: review?(slee)
Attachment #8451070 - Flags: review?(slee) → review+
https://hg.mozilla.org/releases/mozilla-aurora/rev/80e9afb69bad
Landed under existing a= since the earlier uplift was the wrong patch (didn't match what was on m-c).
Summary: webrtc audio connection uses high cpu on peak → webrtc audio connection uses high cpu
Whiteboard: [leave-open][webrtc-uplift] → [webrtc-uplift]
I think I have to bite the bullet and change the audio threading architecture.  Ever since we reduced delay by adding the 'bypass' to direct-connect audio gum/AppendToTrack to PeerConnection, we've been doing resampling and Opus encode/etc on the gUM reception thread (opensl_rec_thread on b2g).  The problem is that this thread is also handling AEC among other things, and combined with contention from other sources could blow it's deadline to go back for the next 10ms segment on B2G.  This patch breaks that link by proxying audio insertions from the record thread to a special-purpose thread than handles the insertion of the data into MSG (and direct-connected PeerConnections/Opus).  This makes it far more likely that the record thread will avoid blowing the deadline, especially on dual-core (or more) systems.  It also should help on Desktop under heavy load, and likely Android as well.
In tests on Talky.io or locally in a one-way call, opensl_rec_thread wen from 11-13% down to 5-6%, and I could hear no pops in the audio (even when top -t was running on adb, which previously would cause pops if we were close to the edge).
Attachment #8452503 - Flags: review?(gpascutto)
Comment on attachment 8452503 [details] [diff] [review]
Proxy incoming audio to another thread before calling AppendToTrack()

Review of attachment 8452503 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/webrtc/MediaEngineWebRTC.cpp
@@ +326,5 @@
>    mVoiceEngine = nullptr;
> +
> +  if (mThread) {
> +    mThread->Shutdown();
> +  }

We tend to like to null things (see lines above) for easier bug-finding, though not enough context to see if it's relevant here.

::: content/media/webrtc/MediaEngineWebRTCAudio.cpp
@@ +587,3 @@
>        // This is safe from any thread, and is safe if the track is Finished
>        // or Destroyed.
> +      RUN_ON_THREAD(mThread, WrapRunnable(mSources[i], &SourceMediaStream::AppendToTrack,

As discussed in IRC, this does some evil template magic and does a transfer of ownership of the AutoPtr.
Attachment #8452503 - Flags: review?(gpascutto) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/5c2246f4daa4

2.0? since this is probably the final fix to nail audio pops in video calls on Flame for 2.0.
blocking-b2g: - → 2.0?
Whiteboard: [webrtc-uplift] → [webrtc-uplift][ft:loop]
Target Milestone: --- → mozilla33
blocking-b2g: 2.0? → 2.0+
Whiteboard: [webrtc-uplift][ft:loop] → [ft:loop]
Depends on: 1130150
You need to log in before you can comment on or make changes to this bug.