improve perf on convolution reverb benchmark

Status: NEW
Product: Core
Component: Web Audio
Priority: P3
Severity: normal
Rank: 12
Opened: 3 years ago
Last modified: 5 months ago
Reporter: karlt
Assigned to: karlt
Depends on: 1 bug
Blocks: 1 bug
Keywords: perf
Version: 43 Branch
Tracking flags: firefox43 affected

(Assignee)

Description

3 years ago
Chrome is considerably faster on the convolution reverb benchmark, even though we use the same FFT code from FFmpeg, at least on Linux (I'm not sure what Chrome uses on Mac and Windows).

Need to investigate why this is.
Are we doing some unnecessary copying, or multiplying?
Would bug 877662 and bug 881587 help?

http://ouija.allizom.org/grafana/index.html#/dashboard/file/webaudio.json
https://github.com/padenot/webaudio-benchmark
(Assignee)

Updated

3 years ago
Keywords: perf
Blocks: 1189514
Rank: 12
Priority: -- → P1
Following up on Karl's initial comment: I'm looking for an assessment of why we're slower. Are we doing some unnecessary copying or multiplying? Would bug 877662 and bug 881587 help?
Assignee: nobody → padenot
(In reply to Karl Tomlinson (back Oct 8 ni?:karlt) from comment #0)
> Chrome is considerably faster on the convolution reverb benchmark even
> though we are using the same fft code from ffmpeg on Linux at least (not
> sure what Chrome uses on Mac and Windows).

According to the Chromium build config files, they use the Accelerate framework on OS X. They told me they use the FFmpeg code on Windows as well.

> Need to investigate why this is.
> Are we doing some unnecessary copying, or multiplying?
> Would bug 877662 and bug 881587 help?

I don't see those functions in the profile.

I've got a local Chromium build that I can profile, but I don't see anything obvious: the time is spent where it should be spent.

We are roughly 30% slower on Windows and Linux (where we use the same FFT code), and twice as slow on Mac (where they use a supposedly faster FFT library).
Hey Karl, just wondering if you have any insights into why we're 30% slower. I talked with Paul a few minutes ago, and it's still a mystery; he's wondering if he's just missing something. (FYI: This is the last benchmark where we are significantly slower than Chrome.)

Paul is also getting ready for TPAC (which is happening at the end of this month), so he doesn't have as much time to look at this as he normally would. I'd love to improve this for Fx44 (if we can) before Fx44 uplifts on November 2nd. Thanks!
Flags: needinfo?(karlt)
(Assignee)

Comment 4

2 years ago
I don't know, but this is my next priority after the remainder for bug 1189562.
Assignee: padenot → karlt
Flags: needinfo?(karlt)
(Assignee)

Updated

2 years ago
Depends on: 1220041, 1221831, 1221833, 1221836
(Assignee)

Comment 5

2 years ago
Modifying the convolution benchmark to include 10 instances of the convolver
makes it easy enough to run perf top on the processing thread.

Chromium 45.0.2454.15 scored 22200, while the Firefox version without any of
the patches in the dependent bugs scored 30000. Lower is better.

This is the output from perf top for the "Offline Audio R" thread in Chromium:

 21.69%  chrome            [.] ff_fft_permute_sse                         
 19.96%  chrome            [.] rdft_calc_c                                
 16.80%  chrome            [.] ff_fft_calc_avx                            
 10.29%  libc-2.20.so      [.] __memcpy_sse2_unaligned                    
  7.47%  chrome            [.] ff_dct32_float_sse2                        
  4.12%  chrome            [.] blink::VectorMath::vadd                    
  3.92%  chrome            [.] blink::FFTFrame::doFFT                     
  3.16%  chrome            [.] blink::VectorMath::zvmul                   
  2.99%  chrome            [.] blink::DirectConvolver::process            
  2.93%  chrome            [.] blink::FFTFrame::getUpToDateComplexData    
  2.48%  chrome            [.] blink::VectorMath::vsmul                   
  0.67%  chrome            [.] blink::FFTConvolver::process               
  0.46%  chrome            [.] ff_imdct_calc_sse                          
  0.44%  chrome            [.] blink::ReverbAccumulationBuffer::accumulate
  0.28%  chrome            [.] blink::ReverbConvolverStage::process       
  0.27%  chrome            [.] av_rdft_calc                               
  0.16%  chrome            [.] blink::FFTFrame::multiply                  
  0.13%  chrome            [.] blink::ReverbConvolver::process            
  0.11%  libc-2.20.so      [.] __memset_sse2                              
  0.08%  [kernel]          [k] hpet_msi_next_event                        
  0.08%  chrome            [.] WTF::fastMalloc                            
  0.08%  chrome            [.] blink::AudioHandler::processIfNecessary    
  0.08%  chrome            [.] blink::FFTFrame::doInverseFFT              
  0.07%  chrome            [.] ff_fft_calc_sse                            
  0.07%  chrome            [.] blink::AudioBus::zero                      
  0.06%  chrome            [.] memcpy@plt                                 
  0.05%  chrome            [.] blink::Reverb::process                     
  0.05%  libpthread-2.20.so[.] pthread_mutex_trylock                      

For Firefox MSG thread it is:

 15.67%  liblgpllibs.so    [.] rdft_calc_c                                 
 14.29%  liblgpllibs.so    [.] ff_fft_permute_sse.loop                     
 10.23%  liblgpllibs.so    [.] pass_sse.loop                               
  8.01%  libxul.so         [.] WebCore::FFTConvolver::process              
  7.21%  libxul.so         [.] mozilla::AudioBufferAddWithScale            
  6.31%  libxul.so         [.] mozilla::FFTBlock::GetInverseWithoutScaling 
  6.25%  libxul.so         [.] mozilla::BufferComplexMultiply              
  6.24%  libxul.so         [.] WebCore::DirectConvolver::process           
  4.23%  libxul.so         [.] mozilla::FFTBlock::PerformFFT               
  4.16%  liblgpllibs.so    [.] fft16_sse                                   
  3.25%  liblgpllibs.so    [.] pass_interleave_sse.loop                    
  2.42%  libxul.so         [.] WebCore::ReverbConvolverStage::process      
  1.95%  liblgpllibs.so    [.] ff_fft_permute_sse.loopcopy                 
  1.79%  liblgpllibs.so    [.] fft8_sse                                    
  1.49%  libxul.so         [.] AlignedTArray<float, 32>::Length            
  0.81%  libxul.so         [.] mozilla::AudioChannelsDownMix<float>        
  0.59%  libxul.so         [.] WebCore::ReverbAccumulationBuffer::accumulat
  0.57%  firefox           [.] je_arena_dalloc_junk_large
  0.33%  firefox           [.] je_arena_dalloc_junk_small
  0.21%  firefox           [.] arena_miscelm_size_get    
  0.21%  liblgpllibs.so    [.] av_rdft_calc              
  0.20%  libxul.so         [.] WebCore::ReverbAccumulationBuffer::readAndClear
  0.19%  libxul.so         [.] WebCore::ReverbInputBuffer::write   
  0.16%  firefox           [.] arena_dalloc_bin_locked_impl.isra.59
  0.16%  libpthread-2.20.so[.] pthread_mutex_lock                  
  0.16%  firefox           [.] ifree.constprop.69                  
  0.13%  firefox           [.] je_malloc                           
  0.11%  liblgpllibs.so    [.] fft32_sse                           

Percentages are relative to the total CPU cycles of each thread, so the
numbers are not directly comparable between the two browsers, but the
proportions and function names carry some information.
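
One rough way to put the two listings on a common scale is to multiply each
percentage by the benchmark score of that build. This is only a sketch: it
assumes the score is roughly proportional to the CPU time of the processing
thread, which has not been verified here, and ScaledCost is a hypothetical
helper, not a function from either code base.

```cpp
#include <cassert>

// Hypothetical helper: converts a perf-top percentage into benchmark-score
// units. ASSUMPTION: the score is proportional to the CPU time used by the
// profiled thread. If that does not hold, the result is not meaningful.
double ScaledCost(double score, double percentOfThread) {
  assert(percentOfThread >= 0.0 && percentOfThread <= 100.0);
  return score * percentOfThread / 100.0;
}
```

For example, ScaledCost(22200, 19.96) for rdft_calc_c in Chromium gives about
4431, and ScaledCost(30000, 15.67) for the same function in Firefox gives
about 4701, both in score units.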

With the patches submitted to the dependent bugs, Firefox scored 23300:

 19.59%  liblgpllibs.so  [.] ff_fft_permute_sse.loop                        
 19.11%  liblgpllibs.so  [.] rdft_calc_c                                    
 12.53%  liblgpllibs.so  [.] pass_sse.loop                                  
 10.08%  libxul.so       [.] mozilla::AudioBufferAddWithScale               
  7.19%  libxul.so       [.] mozilla::BufferComplexMultiply                 
  5.80%  libxul.so       [.] mozilla::AudioBufferCopyWithScale              
  5.08%  liblgpllibs.so  [.] fft16_sse                                      
  4.08%  liblgpllibs.so  [.] pass_interleave_sse.loop                       
  3.88%  libxul.so       [.] mozilla::FFTBlock::PerformFFT                  
  2.68%  liblgpllibs.so  [.] ff_fft_permute_sse.loopcopy                    
  2.52%  libxul.so       [.] WebCore::FFTConvolver::process                 
  2.23%  liblgpllibs.so  [.] fft8_sse                                       
  1.06%  libxul.so       [.] mozilla::AudioChannelsDownMix<float>           
  0.56%  libxul.so       [.] WebCore::ReverbAccumulationBuffer::accumulate  
  0.28%  libxul.so       [.] WebCore::ReverbConvolverStage::process         
  0.27%  libxul.so       [.] WebCore::ReverbInputBuffer::write              
  0.22%  libxul.so       [.] WebCore::ReverbAccumulationBuffer::readAndClear
  0.19%  liblgpllibs.so  [.] av_rdft_calc                                   
  0.14%  liblgpllibs.so  [.] fft32_sse                                      
  0.10%  liblgpllibs.so  [.] pass_sse                                       
  0.09%  libxul.so       [.] WebCore::ReverbConvolver::process              
  0.08%  liblgpllibs.so  [.] ..@1825.branch_instr                           
  0.08%  libxul.so       [.] mozilla::AudioNodeStream::ObtainInputBlock     
  0.08%  liblgpllibs.so  [.] fft64_sse                                      
  0.08%  libxul.so       [.] moz_speex_inner_product_single                 
  0.08%  firefox         [.] arena_dalloc                                   
  0.06%  [kernel]        [k] hpet_msi_next_event                            
  0.05%  liblgpllibs.so  [.] ff_fft_permute_sse                             

Looks like the best next step for further improvement is SIMD processing
for AudioBufferAddWithScale, BufferComplexMultiply, and
AudioBufferCopyWithScale.
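
As a minimal sketch of what SIMD processing could look like for the
add-with-scale case, assuming the operation is output[i] += input[i] * scale
(which the function name suggests but which is not confirmed here). The
function names below are hypothetical stand-ins, not the Gecko functions, and
the SSE path is only compiled where SSE is available; elsewhere the scalar
loop handles everything.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAVE_SSE 1
#endif

// Scalar reference (hypothetical name): output[i] += input[i] * scale.
void AddWithScaleScalar(const float* input, float scale, float* output,
                        std::size_t n) {
  assert(n == 0 || (input && output));
  for (std::size_t i = 0; i < n; ++i) {
    output[i] += input[i] * scale;
  }
}

// Same operation, four floats per iteration where SSE is available.
// Unaligned loads and stores are used so callers need no alignment guarantee.
void AddWithScaleSimd(const float* input, float scale, float* output,
                      std::size_t n) {
  assert(n == 0 || (input && output));
  std::size_t i = 0;
#ifdef SKETCH_HAVE_SSE
  const __m128 vscale = _mm_set1_ps(scale);
  for (; i + 4 <= n; i += 4) {
    __m128 in = _mm_loadu_ps(input + i);
    __m128 out = _mm_loadu_ps(output + i);
    out = _mm_add_ps(out, _mm_mul_ps(in, vscale));
    _mm_storeu_ps(output + i, out);
  }
#endif
  // Scalar tail for the remaining elements (or all of them without SSE).
  for (; i < n; ++i) {
    output[i] += input[i] * scale;
  }
}
```

If the buffers are known to be 16-byte aligned, aligned loads and stores could
be used instead; whether that helps would need to be measured.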
Depends on: 877662, 881587
(Assignee)

Updated

2 years ago
Depends on: 1221871, 1221875
(Assignee)

Comment 6

2 years ago
I'm changing this from P1 to P2 because Gecko performance is now comparable to Chrome, and bug 877662 and bug 881587 should put Gecko ahead.
Priority: P1 → P2
Mass change P2->P3 to align with new Mozilla triage process.
Priority: P2 → P3