webrtc audio processing is too CPU intensive
Categories
(Core :: WebRTC: Audio/Video, enhancement, P4)
Tracking
()
People
(Reporter: xm.koutny+bugzilla.mozilla.org, Unassigned)
Details
The CPU usage of webrtc applications is a recognized issue. I set out to look into it from very external view -- profiling the browser that participates in a call (Jitsi instance).
I simulated a call with two additional participants, i.e. local (profiled) browser received two video (+audio) streams and sent out one audio stream, local webcam was disabled. I got the following profile [1] (leaf functions that contributed more than 0.5% to CPU time:
1.49% AudioIPC0 libxul.so [.] webrtc::SparseFIRFilter::Filter
1.20% DecodingThread libxul.so [.] vpx_mbpost_proc_down_sse2
1.17% DecodingThread libxul.so [.] vpx_mbpost_proc_across_ip_sse2.nextcol4
1.06% DecodingThread libxul.so [.] vp8dx_decode_bool.lto_priv.0
0.90% AudioIPC0 libxul.so [.] webrtc::SincResampler::Resample
0.72% DecodingThread libxul.so [.] vp8_decode.lto_priv.0
0.72% IncomingVideoSt libc-2.33.so [.] __memmove_avx_unaligned_erms
0.68% DecodingThread libxul.so [.] vpx_post_proc_down_and_across_mb_row_sse2.nextcol
0.66% AudioIPC0 libxul.so [.] webrtc::ThreeBandFilterBank::Analysis
0.60% AudioIPC0 libxul.so [.] webrtc::FilterFarSSE2
0.60% AudioIPC0 libxul.so [.] webrtc::OouraFft::InverseFft
0.59% AudioIPC0 libxul.so [.] webrtc::ProcessNearendBlock
0.56% AudioIPC0 libc-2.33.so [.] __memmove_avx_unaligned_erms
0.56% Renderer libc-2.33.so [.] __memmove_avx_unaligned_erms
0.55% DecodingThread libxul.so [.] CopyRow_ERMS
0.52% DecodingThread libxul.so [.] vpx_post_proc_down_and_across_mb_row_sse2.acrossnextcol
0.51% DecodingThread libxul.so [.] GetCoeffs
Quite surprisingly, substantial amount of CPU time is spent in audio processing function webrtc::SparseFIRFilter::Filter
, this becomes even more pronounced when video transer is off.
Quickly checking the webrtc::SparseFIRFilter::Filter
function, I learnt that the whole audio processing was optimized in upstream webrtc. The author reports reduced audio processing complexity by 30% to 50%. I wasn't able to check effect of these changes in my setup.
I tried a nightly build from firefox-89.0a1.en-US.linux-x86_64 and the CPU work seems to be still similarly high (I don't have full profile though).
Suggestion: Upgrade the libwebrtc in the Firefox codebase to version that includes commit 1883d3e231.
(I'm aware of running bug 1654112 and the complexity of webrtc update, I'm filing this bug in order to report and track particular reason where such an update (presumably) helps.)
[1] Using perf record --user-callchains --call-graph=dwarf,8 -a -F99 -o perf.data -ecpu-clock -G $browser_cgroup
Comment 1•3 years ago
|
||
The Bugbug bot thinks this bug should belong to the 'Core::WebRTC' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.
Comment 2•3 years ago
|
||
Setting this enhancement to New in order to gain more visibility.
Comment 3•3 years ago
|
||
Bug 1654112 should handle the webrtc::SparseFIRFilter::Filter inefficiency. We can check once that is done.
Comment 4•2 years ago
|
||
Bug 1654112 has now landed. Michal, could you verify whether this has improved as you'd expect?
Reporter | ||
Comment 5•2 years ago
|
||
Thanks for the heads-up. Before I get down to setting up by test bench -- could you (or anyone) clarify this for me -- the changes are now in the FF nightly (96.0a1)? (The other bug's milestone says 96 Branch
.)
Comment 6•2 years ago
|
||
Yes, they were in the first 96 Nightly.
Reporter | ||
Comment 7•2 years ago
|
||
(I don't have the full test setup as previously (also the Jitsi instance against that I tested changed), so I made a simplified measurement with a single participant in the call without video, i.e. to check only the (local) audio processing path.)
With MozillaFirefox-94.0.1-1.1.x86_64 from shipped on openSUSE Tumbleweed, I've got the following profile (>0.5% only)
10.53% Renderer libxul.so [.] glsl::gaussianBlurHorizontal<unsigned char, glsl::sampler2D_impl*>
6.60% Renderer libxul.so [.] commit_masked_solid_span<unsigned int, glsl::VectorType<unsigned short, 16> >
5.82% Renderer libxul.so [.] blend_pixels
2.99% GraphRunner libxul.so [.] webrtc::SparseFIRFilter::Filter
2.99% Renderer libxul.so [.] glsl::gaussianBlurVertical<unsigned char, glsl::sampler2D_impl*>
2.83% SwComposite libc.so.6 [.] __memmove_avx_unaligned_erms_rtm
2.20% Renderer libxul.so [.] cs_blur_ALPHA_TARGET_frag::draw_span_R8
2.04% Renderer libxul.so [.] cs_clip_rectangle_FAST_PATH_frag::swgl_drawSpanR8
1.89% Renderer libxul.so [.] glsl::textureLinearUnpackedR8<glsl::sampler2D_impl*>
1.73% Renderer libxul.so [.] cs_clip_rectangle_frag::swgl_drawSpanR8
1.42% GraphRunner libxul.so [.] webrtc::SincResampler::Resample
1.26% GraphRunner libxul.so [.] webrtc::OouraFft::Fft
1.26% GraphRunner libxul.so [.] webrtc::ThreeBandFilterBank::Synthesis
1.26% GraphRunner libxul.so [.] webrtc::cftmdl_128_SSE2
1.10% GraphRunner libxul.so [.] webrtc::OouraFft::InverseFft
1.10% GraphRunner libxul.so [.] webrtc::OouraFft::bitrv2_128
1.10% GraphRunner libxul.so [.] webrtc::cft1st_128_SSE2
1.10% Renderer libxul.so [.] commit_solid_span<true>
1.10% Renderer libxul.so [.] cs_clip_box_shadow_TEXTURE_2D_frag::swgl_drawSpanR8
1.10% Renderer libxul.so [.] draw_quad_spans<unsigned int>
0.94% Renderer libxul.so [.] draw_quad_spans<unsigned char>
0.79% GraphRunner libxul.so [.] webrtc::FilterFarSSE2
0.63% GraphRunner libc.so.6 [.] __memmove_avx_unaligned_erms_rtm
0.63% GraphRunner libxul.so [.] webrtc::FilterAdaptationSSE2
0.63% GraphRunner libxul.so [.] webrtc::ThreeBandFilterBank::Analysis
0.63% Renderer libc.so.6 [.] __memmove_avx_unaligned_erms_rtm
The browser cgroup consumed 49% of CPU time during the test.
(The webrtc::SparseFIRFilter::Filter
is still pretty high but the profile is not quite comparable with what I captured for FF87 originally.)
Whereas the nightly build firefox-96.0a1 has the following profile
4.29% Renderer libxul.so [.] cs_blur_ALPHA_TARGET_frag::swgl_drawSpanR8
2.95% SwComposite libc.so.6 [.] __memmove_avx_unaligned_erms_rtm
2.14% Renderer libc.so.6 [.] __memmove_avx_unaligned_erms_rtm
1.61% GraphRunner libxul.so [.] webrtc::aec3::MatchedFilterCore_SSE2
1.61% Renderer libxul.so [.] cs_clip_box_shadow_TEXTURE_2D_frag::swgl_drawSpanR8
1.34% GraphRunner libxul.so [.] webrtc::SincResampler::Convolve_SSE
1.34% Renderer libxul.so [.] commit_masked_solid_span<unsigned int, unsigned short __vector(16)>
1.07% AudioIPC0 libxul.so [.] webrtc::SincResampler::Convolve_SSE
1.07% AudioIPC0 libxul.so [.] webrtc::ThreeBandFilterBank::Analysis
1.07% AudioIPent RPC libc.so.6 [.] syscallack RPC [kernel.kallsyms] [k] 0xffffffffa607605b
0.80% AudioIP
0.80% BHMgr Monitor [kernel.kallsyms] [k] 0xffffffffa607605b
0.80% BHMgr Monitor libc.so.6 [.] __GI___lll_lock_wake
0.80% GraphRunner [kernel.kallsyms] [k] 0xffffffffa607605b
0.80% GraphRunner libxul.so [.] webrtc::CascadedBiQuadFilter::ApplyBiQuad
0.80% Renderer libxul.so [.] cs_clip_rectangle_frag::swgl_drawSpanR8
0.80% Renderer libxul.so [.] draw_quad_spans<unsigned int>
0.80% Web Content [kernel.kallsyms] [k] 0xffffffffa56d046b
0.80% Web Content firefox-bin [.] free
0.80% Web Content firefox-bin [.] malloc
0.54% AudioIPC0 libxul.so [.] webrtc::(anonymous namespace)::FilterCore
0.54% AudioIPack RPC [kernel.kallsyms] [k] 0xffffffffa56d046back RPC libxul.so [.] <&std::fs::File as std::io::Read>::read
0.54% AudioIP
0.54% AudioIPent RPC [kernel.kallsyms] [k] 0xffffffffa56d046bckend#1 firefox-bin [.] BaseAllocator::realloc
0.54% BHMgr Monitor [kernel.kallsyms] [k] 0xffffffffa6066484
0.54% BHMgr Monitor libxul.so [.] mozilla::BackgroundHangManager::MonitorThread
0.54% Compositor libc.so.6 [.] __futex_abstimed_wait_common
0.54% Compositor libxul.so [.] mozilla::ipc::ReadIPDLParam<mozilla::Maybe<mozilla::layers::TransformData> >
0.54% GeckoMain libc.so.6 [.] __poll
0.54% GraphRunner libxul.so [.] nsThread::ProcessNextEvent
0.54% GraphRunner libxul.so [.] webrtc::(anonymous namespace)::FilterCore
0.54% GraphRunner libxul.so [.] webrtc::RmsLevel::Analyze
0.54% GraphRunner libxul.so [.] webrtc::aec3::AdaptPartitions_Sse2
0.54% IPC I/O Child firefox-bin [.] Allocator<MozJemallocBase>::malloc
0.54% Renderer libxul.so [.] clear_buffer<unsigned int>
0.54% Renderer libxul.so [.] cs_blur_ALPHA_TARGET_frag::run
0.54% Renderer libxul.so [.] cs_clip_rectangle_FAST_PATH_frag::swgl_drawSpanR8
0.54% StreamTrans #5 libc.so.6 [.] __vfscanf_internal
0.54% StreamTrans #6 [kernel.kallsyms] [k] 0xffffffffa59c4441
0.54% WRRende
0.54% WRRendeckend#1 firefox-bin [.] freeckend#1 libxul.so [.] webrender::picture::PicturePrimitive::take_context
0.54% WRRende
0.54% WRRendeckend#1 libxul.so [.] webrender::picture::TileCacheInstance::update_prim_dependenciesckend#1 libxul.so [.] webrender::picture::TileNode::add_prim
0.54% WRRende
0.54% WRSceneilder#1 libxul.so [.] crossbeam_channel::context::Context::wait_untililder#1 libxul.so [.] webrender::scene_building::SceneBuilder::build_item
0.54% WRScene
0.54% Web Content [kernel.kallsyms] [k] 0xffffffffa6066484
0.54% Web Content firefox-bin [.] mozilla::TimeStamp::Now
0.54% Web Content libc.so.6 [.] __GI___pthread_mutex_unlock_usercnt
0.54% threaded-ml [kernel.kallsyms] [k] 0xffffffffa56d046b
0.54% threaded-ml [kernel.kallsyms] [k] 0xffffffffa607605b
0.54% threaded-ml libpulsecommon-15.0.so [.] pa_pdispatch_run
0.54% threaded-ml libpulsecommon-15.0.so [.] 0x000000000002b5e2
The browser cgroup consumed 29% of CPU time during the test.
(The profile is much flatter (longer tail, but it may be caused by different debuginfo or process mapping), the webrtc::SparseFIRFilter::Filter
is replaced.)
I can't give proper end user feedback (convenience/CPU fans during ordinary calls) but using the profiling method I can see some improvement (take it with a grain of salt, this isn't statistically tested). AFAIC, this bug can be closed as resolved.
Comment 8•2 years ago
|
||
Thanks!
Description
•