Open
Bug 1205458
Opened 9 years ago
Updated 2 years ago
improve perf on convolution reverb benchmark
Categories
(Core :: Web Audio, defect, P3)
Tracking
()
NEW
| Tracking | Status
---|---|---
firefox43 | --- | affected
People
(Reporter: karlt, Assigned: karlt)
References
(Depends on 1 open bug, Blocks 1 open bug)
Details
(Keywords: perf)
Chrome is considerably faster on the convolution reverb benchmark even though we are using the same fft code from ffmpeg on Linux at least (not sure what Chrome uses on Mac and Windows).
Need to investigate why this is.
Are we doing some unnecessary copying, or multiplying?
Would bug 877662 and bug 881587 help?
http://ouija.allizom.org/grafana/index.html#/dashboard/file/webaudio.json
https://github.com/padenot/webaudio-benchmark
Updated•9 years ago
Blocks: webaudioperf_parity
Updated•9 years ago
Rank: 12
Priority: -- → P1
Comment 1•9 years ago
Following up on Karl's initial comment: I'm looking for an assessment of why we're slower. Are we doing some unnecessary copying, or multiplying? Would bug 877662 and bug 881587 help?
Assignee: nobody → padenot
Comment 2•9 years ago
(In reply to Karl Tomlinson (back Oct 8 ni?:karlt) from comment #0)
> Chrome is considerably faster on the convolution reverb benchmark even
> though we are using the same fft code from ffmpeg on Linux at least (not
> sure what Chrome uses on Mac and Windows).
From the Chromium build config files, they are using the Accelerate framework on OS X. They told me they use the ffmpeg code on Windows as well.
> Need to investigate why this is.
> Are we doing some unnecessary copying, or multiplying?
> Would bug 877662 and bug 881587 help?
I don't see those functions in the profile.
I've got a local Chromium build that I can profile, but I don't see anything obvious: the time is spent where it should be spent.
We are roughly 30% slower on Windows and Linux (where we use the same FFT code), and twice as slow on Mac (where they use a supposedly faster FFT library).
Comment 3•9 years ago
Hey Karl -- Just wondering if you have any insights into why we're 30% slower. I talked with Paul a few minutes ago, and it's still a mystery; he's wondering if he's just missing something. (FYI: This is the last benchmark where we are significantly slower than Chrome.)
Paul is also getting ready for TPAC (which is happening at the end of this month), so he doesn't have as much time to look at this as he normally would. I'd love to improve this for Fx44, if we can, before Fx44 uplifts on November 2nd. Thanks!
Flags: needinfo?(karlt)
Comment 4•9 years ago
I don't know, but this is my next priority after the remainder for bug 1189562.
Assignee: padenot → karlt
Flags: needinfo?(karlt)
Comment 5•9 years ago
Modifying the convolution benchmark to include 10 instances of the convolver
makes it easy enough to run perf top on the processing thread.
Chromium 45.0.2454.15 scored 22200, while the Firefox version without any of
the patches in dependent bugs scored 30000. Lower is better.
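To put the two scores above on a common footing, the relative gap can be worked out directly. The snippet below is only a small sketch of that arithmetic; `PercentSlower` is a hypothetical helper name, not something from either code base, and it assumes a lower-is-better score as stated above.

```cpp
#include <cassert>

// Hypothetical helper (not from Gecko or Blink): how much slower `score` is
// than `baseline`, in percent, for a benchmark where lower is better.
double PercentSlower(double score, double baseline) {
  assert(baseline > 0.0);  // a zero or negative baseline has no meaning here
  return (score - baseline) / baseline * 100.0;
}
```

For the figures quoted here, `PercentSlower(30000, 22200)` comes out at roughly 35%.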
This is the output from perf top for the "Offline Audio R" thread in Chromium:
21.69% chrome [.] ff_fft_permute_sse
19.96% chrome [.] rdft_calc_c
16.80% chrome [.] ff_fft_calc_avx
10.29% libc-2.20.so [.] __memcpy_sse2_unaligned
7.47% chrome [.] ff_dct32_float_sse2
4.12% chrome [.] blink::VectorMath::vadd
3.92% chrome [.] blink::FFTFrame::doFFT
3.16% chrome [.] blink::VectorMath::zvmul
2.99% chrome [.] blink::DirectConvolver::process
2.93% chrome [.] blink::FFTFrame::getUpToDateComplexData
2.48% chrome [.] blink::VectorMath::vsmul
0.67% chrome [.] blink::FFTConvolver::process
0.46% chrome [.] ff_imdct_calc_sse
0.44% chrome [.] blink::ReverbAccumulationBuffer::accumulate
0.28% chrome [.] blink::ReverbConvolverStage::process
0.27% chrome [.] av_rdft_calc
0.16% chrome [.] blink::FFTFrame::multiply
0.13% chrome [.] blink::ReverbConvolver::process
0.11% libc-2.20.so [.] __memset_sse2
0.08% [kernel] [k] hpet_msi_next_event
0.08% chrome [.] WTF::fastMalloc
0.08% chrome [.] blink::AudioHandler::processIfNecessary
0.08% chrome [.] blink::FFTFrame::doInverseFFT
0.07% chrome [.] ff_fft_calc_sse
0.07% chrome [.] blink::AudioBus::zero
0.06% chrome [.] memcpy@plt
0.05% chrome [.] blink::Reverb::process
0.05% libpthread-2.20.so[.] pthread_mutex_trylock
For the Firefox MSG (MediaStreamGraph) thread it is:
15.67% liblgpllibs.so [.] rdft_calc_c
14.29% liblgpllibs.so [.] ff_fft_permute_sse.loop
10.23% liblgpllibs.so [.] pass_sse.loop
8.01% libxul.so [.] WebCore::FFTConvolver::process
7.21% libxul.so [.] mozilla::AudioBufferAddWithScale
6.31% libxul.so [.] mozilla::FFTBlock::GetInverseWithoutScaling
6.25% libxul.so [.] mozilla::BufferComplexMultiply
6.24% libxul.so [.] WebCore::DirectConvolver::process
4.23% libxul.so [.] mozilla::FFTBlock::PerformFFT
4.16% liblgpllibs.so [.] fft16_sse
3.25% liblgpllibs.so [.] pass_interleave_sse.loop
2.42% libxul.so [.] WebCore::ReverbConvolverStage::process
1.95% liblgpllibs.so [.] ff_fft_permute_sse.loopcopy
1.79% liblgpllibs.so [.] fft8_sse
1.49% libxul.so [.] AlignedTArray<float, 32>::Length
0.81% libxul.so [.] mozilla::AudioChannelsDownMix<float>
0.59% libxul.so [.] WebCore::ReverbAccumulationBuffer::accumulat
0.57% firefox [.] je_arena_dalloc_junk_large
0.33% firefox [.] je_arena_dalloc_junk_small
0.21% firefox [.] arena_miscelm_size_get
0.21% liblgpllibs.so [.] av_rdft_calc
0.20% libxul.so [.] WebCore::ReverbAccumulationBuffer::readAndClear
0.19% libxul.so [.] WebCore::ReverbInputBuffer::write
0.16% firefox [.] arena_dalloc_bin_locked_impl.isra.59
0.16% libpthread-2.20.so[.] pthread_mutex_lock
0.16% firefox [.] ifree.constprop.69
0.13% firefox [.] je_malloc
0.11% liblgpllibs.so [.] fft32_sse
Percentages are relative to each thread's total CPU cycles, so the numbers are
not directly comparable between browsers, but the proportions and function
names carry some information.
With the patches submitted to dependent bugs Firefox scored 23300:
19.59% liblgpllibs.so [.] ff_fft_permute_sse.loop
19.11% liblgpllibs.so [.] rdft_calc_c
12.53% liblgpllibs.so [.] pass_sse.loop
10.08% libxul.so [.] mozilla::AudioBufferAddWithScale
7.19% libxul.so [.] mozilla::BufferComplexMultiply
5.80% libxul.so [.] mozilla::AudioBufferCopyWithScale
5.08% liblgpllibs.so [.] fft16_sse
4.08% liblgpllibs.so [.] pass_interleave_sse.loop
3.88% libxul.so [.] mozilla::FFTBlock::PerformFFT
2.68% liblgpllibs.so [.] ff_fft_permute_sse.loopcopy
2.52% libxul.so [.] WebCore::FFTConvolver::process
2.23% liblgpllibs.so [.] fft8_sse
1.06% libxul.so [.] mozilla::AudioChannelsDownMix<float>
0.56% libxul.so [.] WebCore::ReverbAccumulationBuffer::accumulate
0.28% libxul.so [.] WebCore::ReverbConvolverStage::process
0.27% libxul.so [.] WebCore::ReverbInputBuffer::write
0.22% libxul.so [.] WebCore::ReverbAccumulationBuffer::readAndClear
0.19% liblgpllibs.so [.] av_rdft_calc
0.14% liblgpllibs.so [.] fft32_sse
0.10% liblgpllibs.so [.] pass_sse
0.09% libxul.so [.] WebCore::ReverbConvolver::process
0.08% liblgpllibs.so [.] ..@1825.branch_instr
0.08% libxul.so [.] mozilla::AudioNodeStream::ObtainInputBlock
0.08% liblgpllibs.so [.] fft64_sse
0.08% libxul.so [.] moz_speex_inner_product_single
0.08% firefox [.] arena_dalloc
0.06% [kernel] [k] hpet_msi_next_event
0.05% liblgpllibs.so [.] ff_fft_permute_sse
Looks like the best next step for further improvement is SIMD processing
for AudioBufferAddWithScale, BufferComplexMultiply, and
AudioBufferCopyWithScale.
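As a rough illustration of what SIMD processing for an add-with-scale loop might look like, here is a minimal sketch using SSE intrinsics. It assumes the operation is `output[i] += input[i] * scale`, as the name AudioBufferAddWithScale suggests. The function names `AddWithScaleScalar` and `AddWithScaleSimd` and the `SKETCH_HAVE_SSE` macro are hypothetical and do not reflect the actual Gecko code or any eventual patch.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define SKETCH_HAVE_SSE 1  // hypothetical macro, used only in this sketch
#endif

// Hypothetical scalar reference: output[i] += input[i] * scale.
void AddWithScaleScalar(const float* input, float scale, float* output,
                        std::size_t size) {
  for (std::size_t i = 0; i < size; ++i) {
    output[i] += input[i] * scale;
  }
}

// Hypothetical SSE variant: processes four floats per iteration where SSE is
// available, then finishes the remaining elements with the scalar loop.
void AddWithScaleSimd(const float* input, float scale, float* output,
                      std::size_t size) {
  assert(size == 0 || (input && output));
  std::size_t i = 0;
#ifdef SKETCH_HAVE_SSE
  const __m128 vscale = _mm_set1_ps(scale);  // broadcast scale to all lanes
  for (; i + 4 <= size; i += 4) {
    // Unaligned loads and stores, so no alignment is assumed for the buffers.
    const __m128 in = _mm_loadu_ps(input + i);
    const __m128 out = _mm_loadu_ps(output + i);
    _mm_storeu_ps(output + i, _mm_add_ps(out, _mm_mul_ps(in, vscale)));
  }
#endif
  // Scalar tail (or the whole buffer when SSE is not available).
  for (; i < size; ++i) {
    output[i] += input[i] * scale;
  }
}
```

If the buffers are known to be 16-byte aligned, the aligned load and store intrinsics (`_mm_load_ps`, `_mm_store_ps`) could be used instead; the unaligned forms are chosen here only to keep the sketch free of alignment assumptions.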
Comment 6•9 years ago
I'm changing this from P1 to P2 because Gecko performance is now comparable to Chrome, and bug 877662 and bug 881587 should put Gecko ahead.
Priority: P1 → P2
Comment 7•7 years ago
Mass change P2->P3 to align with new Mozilla triage process.
Priority: P2 → P3
Updated•2 years ago
Severity: normal → S3