Closed Bug 1702781 Opened 3 years ago Closed 2 years ago

webrtc audio processing is too CPU intensive

Categories

(Core :: WebRTC: Audio/Video, enhancement, P4)

Firefox 87
x86_64
Linux
enhancement

Tracking

()

RESOLVED DUPLICATE of bug 1654112

People

(Reporter: xm.koutny+bugzilla.mozilla.org, Unassigned)

Details

The CPU usage of webrtc applications is a recognized issue. I set out to look into it from very external view -- profiling the browser that participates in a call (Jitsi instance).

I simulated a call with two additional participants, i.e. local (profiled) browser received two video (+audio) streams and sent out one audio stream, local webcam was disabled. I got the following profile [1] (leaf functions that contributed more than 0.5% to CPU time:

   1.49%  AudioIPC0        libxul.so                   [.] webrtc::SparseFIRFilter::Filter
   1.20%  DecodingThread   libxul.so                   [.] vpx_mbpost_proc_down_sse2
   1.17%  DecodingThread   libxul.so                   [.] vpx_mbpost_proc_across_ip_sse2.nextcol4
   1.06%  DecodingThread   libxul.so                   [.] vp8dx_decode_bool.lto_priv.0
   0.90%  AudioIPC0        libxul.so                   [.] webrtc::SincResampler::Resample
   0.72%  DecodingThread   libxul.so                   [.] vp8_decode.lto_priv.0
   0.72%  IncomingVideoSt  libc-2.33.so                [.] __memmove_avx_unaligned_erms
   0.68%  DecodingThread   libxul.so                   [.] vpx_post_proc_down_and_across_mb_row_sse2.nextcol
   0.66%  AudioIPC0        libxul.so                   [.] webrtc::ThreeBandFilterBank::Analysis
   0.60%  AudioIPC0        libxul.so                   [.] webrtc::FilterFarSSE2
   0.60%  AudioIPC0        libxul.so                   [.] webrtc::OouraFft::InverseFft
   0.59%  AudioIPC0        libxul.so                   [.] webrtc::ProcessNearendBlock
   0.56%  AudioIPC0        libc-2.33.so                [.] __memmove_avx_unaligned_erms
   0.56%  Renderer         libc-2.33.so                [.] __memmove_avx_unaligned_erms
   0.55%  DecodingThread   libxul.so                   [.] CopyRow_ERMS
   0.52%  DecodingThread   libxul.so                   [.] vpx_post_proc_down_and_across_mb_row_sse2.acrossnextcol
   0.51%  DecodingThread   libxul.so                   [.] GetCoeffs

Quite surprisingly, substantial amount of CPU time is spent in audio processing function webrtc::SparseFIRFilter::Filter, this becomes even more pronounced when video transer is off.
Quickly checking the webrtc::SparseFIRFilter::Filter function, I learnt that the whole audio processing was optimized in upstream webrtc. The author reports reduced audio processing complexity by 30% to 50%. I wasn't able to check effect of these changes in my setup.
I tried a nightly build from firefox-89.0a1.en-US.linux-x86_64 and the CPU work seems to be still similarly high (I don't have full profile though).

Suggestion: Upgrade the libwebrtc in the Firefox codebase to version that includes commit 1883d3e231.

(I'm aware of running bug 1654112 and the complexity of webrtc update, I'm filing this bug in order to report and track particular reason where such an update (presumably) helps.)

[1] Using perf record --user-callchains --call-graph=dwarf,8 -a -F99 -o perf.data -ecpu-clock -G $browser_cgroup

The Bugbug bot thinks this bug should belong to the 'Core::WebRTC' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.

Component: General → WebRTC
Product: Firefox → Core

Setting this enhancement to New in order to gain more visibility.

Status: UNCONFIRMED → NEW
Ever confirmed: true

Bug 1654112 should handle the webrtc::SparseFIRFilter::Filter inefficiency. We can check once that is done.

Severity: -- → S4
Component: WebRTC → WebRTC: Audio/Video
Depends on: 1654112
Priority: -- → P4

Bug 1654112 has now landed. Michal, could you verify whether this has improved as you'd expect?

Flags: needinfo?(xm.koutny+bugzilla.mozilla.org)

Thanks for the heads-up. Before I get down to setting up by test bench -- could you (or anyone) clarify this for me -- the changes are now in the FF nightly (96.0a1)? (The other bug's milestone says 96 Branch.)

Flags: needinfo?(xm.koutny+bugzilla.mozilla.org)

Yes, they were in the first 96 Nightly.

(I don't have the full test setup as previously (also the Jitsi instance against that I tested changed), so I made a simplified measurement with a single participant in the call without video, i.e. to check only the (local) audio processing path.)

With MozillaFirefox-94.0.1-1.1.x86_64 from shipped on openSUSE Tumbleweed, I've got the following profile (>0.5% only)

10.53% Renderer libxul.so [.] glsl::gaussianBlurHorizontal<unsigned char, glsl::sampler2D_impl*>
6.60% Renderer libxul.so [.] commit_masked_solid_span<unsigned int, glsl::VectorType<unsigned short, 16> >
5.82% Renderer libxul.so [.] blend_pixels
2.99% GraphRunner libxul.so [.] webrtc::SparseFIRFilter::Filter
2.99% Renderer libxul.so [.] glsl::gaussianBlurVertical<unsigned char, glsl::sampler2D_impl*>
2.83% SwComposite libc.so.6 [.] __memmove_avx_unaligned_erms_rtm
2.20% Renderer libxul.so [.] cs_blur_ALPHA_TARGET_frag::draw_span_R8
2.04% Renderer libxul.so [.] cs_clip_rectangle_FAST_PATH_frag::swgl_drawSpanR8
1.89% Renderer libxul.so [.] glsl::textureLinearUnpackedR8<glsl::sampler2D_impl*>
1.73% Renderer libxul.so [.] cs_clip_rectangle_frag::swgl_drawSpanR8
1.42% GraphRunner libxul.so [.] webrtc::SincResampler::Resample
1.26% GraphRunner libxul.so [.] webrtc::OouraFft::Fft
1.26% GraphRunner libxul.so [.] webrtc::ThreeBandFilterBank::Synthesis
1.26% GraphRunner libxul.so [.] webrtc::cftmdl_128_SSE2
1.10% GraphRunner libxul.so [.] webrtc::OouraFft::InverseFft
1.10% GraphRunner libxul.so [.] webrtc::OouraFft::bitrv2_128
1.10% GraphRunner libxul.so [.] webrtc::cft1st_128_SSE2
1.10% Renderer libxul.so [.] commit_solid_span<true>
1.10% Renderer libxul.so [.] cs_clip_box_shadow_TEXTURE_2D_frag::swgl_drawSpanR8
1.10% Renderer libxul.so [.] draw_quad_spans<unsigned int>
0.94% Renderer libxul.so [.] draw_quad_spans<unsigned char>
0.79% GraphRunner libxul.so [.] webrtc::FilterFarSSE2
0.63% GraphRunner libc.so.6 [.] __memmove_avx_unaligned_erms_rtm
0.63% GraphRunner libxul.so [.] webrtc::FilterAdaptationSSE2
0.63% GraphRunner libxul.so [.] webrtc::ThreeBandFilterBank::Analysis
0.63% Renderer libc.so.6 [.] __memmove_avx_unaligned_erms_rtm

The browser cgroup consumed 49% of CPU time during the test.

(The webrtc::SparseFIRFilter::Filter is still pretty high but the profile is not quite comparable with what I captured for FF87 originally.)

Whereas the nightly build firefox-96.0a1 has the following profile

4.29% Renderer libxul.so [.] cs_blur_ALPHA_TARGET_frag::swgl_drawSpanR8
2.95% SwComposite libc.so.6 [.] __memmove_avx_unaligned_erms_rtm
2.14% Renderer libc.so.6 [.] __memmove_avx_unaligned_erms_rtm
1.61% GraphRunner libxul.so [.] webrtc::aec3::MatchedFilterCore_SSE2
1.61% Renderer libxul.so [.] cs_clip_box_shadow_TEXTURE_2D_frag::swgl_drawSpanR8
1.34% GraphRunner libxul.so [.] webrtc::SincResampler::Convolve_SSE
1.34% Renderer libxul.so [.] commit_masked_solid_span<unsigned int, unsigned short __vector(16)>
1.07% AudioIPC0 libxul.so [.] webrtc::SincResampler::Convolve_SSE
1.07% AudioIPC0 libxul.so [.] webrtc::ThreeBandFilterBank::Analysis
1.07% AudioIPent RPC libc.so.6 [.] syscall
0.80% AudioIP
ack RPC [kernel.kallsyms] [k] 0xffffffffa607605b
0.80% BHMgr Monitor [kernel.kallsyms] [k] 0xffffffffa607605b
0.80% BHMgr Monitor libc.so.6 [.] __GI___lll_lock_wake
0.80% GraphRunner [kernel.kallsyms] [k] 0xffffffffa607605b
0.80% GraphRunner libxul.so [.] webrtc::CascadedBiQuadFilter::ApplyBiQuad
0.80% Renderer libxul.so [.] cs_clip_rectangle_frag::swgl_drawSpanR8
0.80% Renderer libxul.so [.] draw_quad_spans<unsigned int>
0.80% Web Content [kernel.kallsyms] [k] 0xffffffffa56d046b
0.80% Web Content firefox-bin [.] free
0.80% Web Content firefox-bin [.] malloc
0.54% AudioIPC0 libxul.so [.] webrtc::(anonymous namespace)::FilterCore
0.54% AudioIPack RPC [kernel.kallsyms] [k] 0xffffffffa56d046b
0.54% AudioIP
ack RPC libxul.so [.] <&std::fs::File as std::io::Read>::read
0.54% AudioIPent RPC [kernel.kallsyms] [k] 0xffffffffa56d046b
0.54% BHMgr Monitor [kernel.kallsyms] [k] 0xffffffffa6066484
0.54% BHMgr Monitor libxul.so [.] mozilla::BackgroundHangManager::MonitorThread
0.54% Compositor libc.so.6 [.] __futex_abstimed_wait_common
0.54% Compositor libxul.so [.] mozilla::ipc::ReadIPDLParam<mozilla::Maybe<mozilla::layers::TransformData> >
0.54% GeckoMain libc.so.6 [.] __poll
0.54% GraphRunner libxul.so [.] nsThread::ProcessNextEvent
0.54% GraphRunner libxul.so [.] webrtc::(anonymous namespace)::FilterCore
0.54% GraphRunner libxul.so [.] webrtc::RmsLevel::Analyze
0.54% GraphRunner libxul.so [.] webrtc::aec3::AdaptPartitions_Sse2
0.54% IPC I/O Child firefox-bin [.] Allocator<MozJemallocBase>::malloc
0.54% Renderer libxul.so [.] clear_buffer<unsigned int>
0.54% Renderer libxul.so [.] cs_blur_ALPHA_TARGET_frag::run
0.54% Renderer libxul.so [.] cs_clip_rectangle_FAST_PATH_frag::swgl_drawSpanR8
0.54% StreamTrans #5 libc.so.6 [.] __vfscanf_internal
0.54% StreamTrans #6 [kernel.kallsyms] [k] 0xffffffffa59c4441
0.54% WRRende
ckend#1 firefox-bin [.] BaseAllocator::realloc
0.54% WRRendeckend#1 firefox-bin [.] free
0.54% WRRende
ckend#1 libxul.so [.] webrender::picture::PicturePrimitive::take_context
0.54% WRRendeckend#1 libxul.so [.] webrender::picture::TileCacheInstance::update_prim_dependencies
0.54% WRRende
ckend#1 libxul.so [.] webrender::picture::TileNode::add_prim
0.54% WRSceneilder#1 libxul.so [.] crossbeam_channel::context::Context::wait_until
0.54% WRScene
ilder#1 libxul.so [.] webrender::scene_building::SceneBuilder::build_item
0.54% Web Content [kernel.kallsyms] [k] 0xffffffffa6066484
0.54% Web Content firefox-bin [.] mozilla::TimeStamp::Now
0.54% Web Content libc.so.6 [.] __GI___pthread_mutex_unlock_usercnt
0.54% threaded-ml [kernel.kallsyms] [k] 0xffffffffa56d046b
0.54% threaded-ml [kernel.kallsyms] [k] 0xffffffffa607605b
0.54% threaded-ml libpulsecommon-15.0.so [.] pa_pdispatch_run
0.54% threaded-ml libpulsecommon-15.0.so [.] 0x000000000002b5e2

The browser cgroup consumed 29% of CPU time during the test.

(The profile is much flatter (longer tail, but it may be caused by different debuginfo or process mapping), the webrtc::SparseFIRFilter::Filter is replaced.)

I can't give proper end user feedback (convenience/CPU fans during ordinary calls) but using the profiling method I can see some improvement (take it with a grain of salt, this isn't statistically tested). AFAIC, this bug can be closed as resolved.

Thanks!

Status: NEW → RESOLVED
Closed: 2 years ago
No longer depends on: 1654112
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.