Open Bug 1205458 Opened 9 years ago Updated 2 years ago

improve perf on convolution reverb benchmark

Categories

(Core :: Web Audio, defect, P3)

43 Branch
defect

Tracking

Tracking Status
firefox43 --- affected

People

(Reporter: karlt, Assigned: karlt)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Keywords: perf)

Chrome is considerably faster on the convolution reverb benchmark, even though, on Linux at least, we are using the same FFT code from ffmpeg (not sure what Chrome uses on Mac and Windows).

Need to investigate why this is.
Are we doing some unnecessary copying, or multiplying?
Would bug 877662 and bug 881587 help?

http://ouija.allizom.org/grafana/index.html#/dashboard/file/webaudio.json
https://github.com/padenot/webaudio-benchmark
Keywords: perf
Rank: 12
Priority: -- → P1
Following up on Karl's initial comment: I'm looking for an assessment of why we're slower. Are we doing some unnecessary copying, or multiplying? Would bug 877662 and bug 881587 help?
Assignee: nobody → padenot
(In reply to Karl Tomlinson (back Oct 8 ni?:karlt) from comment #0)
> Chrome is considerably faster on the convolution reverb benchmark even
> though we are using the same fft code from ffmpeg on Linux at least (not
> sure what Chrome uses on Mac and Windows).

According to the Chromium build config files, they are using the Accelerate framework on OS X. They told me they use the code from ffmpeg on Windows as well.

> Need to investigate why this is.
> Are we doing some unnecessary copying, or multiplying?
> Would bug 877662 and bug 881587 help?

I don't see those functions in the profile.

I've got a local Chromium build that I can profile, but I don't see anything obvious: the time is spent where it should be spent.

We are roughly 30% slower on Windows and Linux (where we use the same FFT code), and twice as slow on Mac (where Chrome uses a supposedly faster FFT library).
Hey Karl, just wondering if you have any insights into why we're 30% slower. I talked with Paul a few minutes ago, and it's still a mystery; he's wondering whether he's just missing something. (FYI: This is the last benchmark where we are significantly slower than Chrome.)

Paul is also getting ready for TPAC, which is happening at the end of this month, so he doesn't have as much time to look at this as he normally would. I'd love to improve this for Fx44, if we can, before Fx44 uplifts on November 2nd. Thanks!
Flags: needinfo?(karlt)
I don't know, but this is my next priority after the remaining work for bug 1189562.
Assignee: padenot → karlt
Flags: needinfo?(karlt)
Depends on: 1220041
Depends on: 1221831
Depends on: 1221833
Depends on: 1221836
Modifying the convolution benchmark to include 10 instances of the convolver
makes it easy enough to run perf top on the processing thread.

Chromium 45.0.2454.15 scored 22200, while the Firefox version without any of
the patches in dependent bugs scored 30000.  Lower is better.

This is the output from perf top for the "Offline Audio R" thread in Chromium:

 21.69%  chrome            [.] ff_fft_permute_sse                         
 19.96%  chrome            [.] rdft_calc_c                                
 16.80%  chrome            [.] ff_fft_calc_avx                            
 10.29%  libc-2.20.so      [.] __memcpy_sse2_unaligned                    
  7.47%  chrome            [.] ff_dct32_float_sse2                        
  4.12%  chrome            [.] blink::VectorMath::vadd                    
  3.92%  chrome            [.] blink::FFTFrame::doFFT                     
  3.16%  chrome            [.] blink::VectorMath::zvmul                   
  2.99%  chrome            [.] blink::DirectConvolver::process            
  2.93%  chrome            [.] blink::FFTFrame::getUpToDateComplexData    
  2.48%  chrome            [.] blink::VectorMath::vsmul                   
  0.67%  chrome            [.] blink::FFTConvolver::process               
  0.46%  chrome            [.] ff_imdct_calc_sse                          
  0.44%  chrome            [.] blink::ReverbAccumulationBuffer::accumulate
  0.28%  chrome            [.] blink::ReverbConvolverStage::process       
  0.27%  chrome            [.] av_rdft_calc                               
  0.16%  chrome            [.] blink::FFTFrame::multiply                  
  0.13%  chrome            [.] blink::ReverbConvolver::process            
  0.11%  libc-2.20.so      [.] __memset_sse2                              
  0.08%  [kernel]          [k] hpet_msi_next_event                        
  0.08%  chrome            [.] WTF::fastMalloc                            
  0.08%  chrome            [.] blink::AudioHandler::processIfNecessary    
  0.08%  chrome            [.] blink::FFTFrame::doInverseFFT              
  0.07%  chrome            [.] ff_fft_calc_sse                            
  0.07%  chrome            [.] blink::AudioBus::zero                      
  0.06%  chrome            [.] memcpy@plt                                 
  0.05%  chrome            [.] blink::Reverb::process                     
  0.05%  libpthread-2.20.so[.] pthread_mutex_trylock                      

For the Firefox MSG thread it is:

 15.67%  liblgpllibs.so    [.] rdft_calc_c                                 
 14.29%  liblgpllibs.so    [.] ff_fft_permute_sse.loop                     
 10.23%  liblgpllibs.so    [.] pass_sse.loop                               
  8.01%  libxul.so         [.] WebCore::FFTConvolver::process              
  7.21%  libxul.so         [.] mozilla::AudioBufferAddWithScale            
  6.31%  libxul.so         [.] mozilla::FFTBlock::GetInverseWithoutScaling 
  6.25%  libxul.so         [.] mozilla::BufferComplexMultiply              
  6.24%  libxul.so         [.] WebCore::DirectConvolver::process           
  4.23%  libxul.so         [.] mozilla::FFTBlock::PerformFFT               
  4.16%  liblgpllibs.so    [.] fft16_sse                                   
  3.25%  liblgpllibs.so    [.] pass_interleave_sse.loop                    
  2.42%  libxul.so         [.] WebCore::ReverbConvolverStage::process      
  1.95%  liblgpllibs.so    [.] ff_fft_permute_sse.loopcopy                 
  1.79%  liblgpllibs.so    [.] fft8_sse                                    
  1.49%  libxul.so         [.] AlignedTArray<float, 32>::Length            
  0.81%  libxul.so         [.] mozilla::AudioChannelsDownMix<float>        
  0.59%  libxul.so         [.] WebCore::ReverbAccumulationBuffer::accumulat
  0.57%  firefox           [.] je_arena_dalloc_junk_large
  0.33%  firefox           [.] je_arena_dalloc_junk_small
  0.21%  firefox           [.] arena_miscelm_size_get    
  0.21%  liblgpllibs.so    [.] av_rdft_calc              
  0.20%  libxul.so         [.] WebCore::ReverbAccumulationBuffer::readAndClear
  0.19%  libxul.so         [.] WebCore::ReverbInputBuffer::write   
  0.16%  firefox           [.] arena_dalloc_bin_locked_impl.isra.59
  0.16%  libpthread-2.20.so[.] pthread_mutex_lock                  
  0.16%  firefox           [.] ifree.constprop.69                  
  0.13%  firefox           [.] je_malloc                           
  0.11%  liblgpllibs.so    [.] fft32_sse                           

Percentages are relative to total thread CPU cycles, so the numbers are not
directly comparable between the browsers, but the proportions and function
names carry some information.

With the patches submitted to dependent bugs, Firefox scored 23300:

 19.59%  liblgpllibs.so  [.] ff_fft_permute_sse.loop                        
 19.11%  liblgpllibs.so  [.] rdft_calc_c                                    
 12.53%  liblgpllibs.so  [.] pass_sse.loop                                  
 10.08%  libxul.so       [.] mozilla::AudioBufferAddWithScale               
  7.19%  libxul.so       [.] mozilla::BufferComplexMultiply                 
  5.80%  libxul.so       [.] mozilla::AudioBufferCopyWithScale              
  5.08%  liblgpllibs.so  [.] fft16_sse                                      
  4.08%  liblgpllibs.so  [.] pass_interleave_sse.loop                       
  3.88%  libxul.so       [.] mozilla::FFTBlock::PerformFFT                  
  2.68%  liblgpllibs.so  [.] ff_fft_permute_sse.loopcopy                    
  2.52%  libxul.so       [.] WebCore::FFTConvolver::process                 
  2.23%  liblgpllibs.so  [.] fft8_sse                                       
  1.06%  libxul.so       [.] mozilla::AudioChannelsDownMix<float>           
  0.56%  libxul.so       [.] WebCore::ReverbAccumulationBuffer::accumulate  
  0.28%  libxul.so       [.] WebCore::ReverbConvolverStage::process         
  0.27%  libxul.so       [.] WebCore::ReverbInputBuffer::write              
  0.22%  libxul.so       [.] WebCore::ReverbAccumulationBuffer::readAndClear
  0.19%  liblgpllibs.so  [.] av_rdft_calc                                   
  0.14%  liblgpllibs.so  [.] fft32_sse                                      
  0.10%  liblgpllibs.so  [.] pass_sse                                       
  0.09%  libxul.so       [.] WebCore::ReverbConvolver::process              
  0.08%  liblgpllibs.so  [.] ..@1825.branch_instr                           
  0.08%  libxul.so       [.] mozilla::AudioNodeStream::ObtainInputBlock     
  0.08%  liblgpllibs.so  [.] fft64_sse                                      
  0.08%  libxul.so       [.] moz_speex_inner_product_single                 
  0.08%  firefox         [.] arena_dalloc                                   
  0.06%  [kernel]        [k] hpet_msi_next_event                            
  0.05%  liblgpllibs.so  [.] ff_fft_permute_sse                             
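Treating the benchmark score as a cost (lower is better), the ratios against Chromium's 22200 put the gap at about 35% before the dependent-bug patches and about 5% after them:

$$
\frac{30000}{22200} \approx 1.35, \qquad \frac{23300}{22200} \approx 1.05
$$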

Looks like the best next step for further improvement is SIMD processing
for AudioBufferAddWithScale, BufferComplexMultiply, and
AudioBufferCopyWithScale.
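A minimal sketch of what SSE versions of these three loops could look like. The names AddWithScaleSSE, CopyWithScaleSSE and ComplexMultiplySSE, and their signatures, are hypothetical stand-ins modeled on the function names in the profile, not Gecko's actual declarations. The complex multiply assumes plain interleaved (re, im) float pairs and ignores any packed DC/Nyquist layout the real FFT output may use. Scalar tails handle lengths that are not a multiple of the vector width, and builds without SSE fall back to the scalar loops.

```cpp
#include <cassert>
#include <cstdint>

// Use SSE only when the compiler targets it; otherwise only scalar loops run.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#define SKETCH_HAVE_SSE 1
#include <xmmintrin.h>  // SSE intrinsics
#else
#define SKETCH_HAVE_SSE 0
#endif

// Hypothetical stand-in for AudioBufferAddWithScale:
// aOutput[i] += aScale * aInput[i]
void AddWithScaleSSE(const float* aInput, float aScale, float* aOutput,
                     uint32_t aSize) {
  assert(aInput && aOutput);
  uint32_t i = 0;
#if SKETCH_HAVE_SSE
  const __m128 scale = _mm_set1_ps(aScale);
  for (; i + 4 <= aSize; i += 4) {  // four floats per iteration
    __m128 in = _mm_loadu_ps(aInput + i);
    __m128 out = _mm_loadu_ps(aOutput + i);
    _mm_storeu_ps(aOutput + i, _mm_add_ps(out, _mm_mul_ps(in, scale)));
  }
#endif
  for (; i < aSize; ++i) {  // scalar tail (whole buffer without SSE)
    aOutput[i] += aScale * aInput[i];
  }
}

// Hypothetical stand-in for AudioBufferCopyWithScale:
// aOutput[i] = aScale * aInput[i]
void CopyWithScaleSSE(const float* aInput, float aScale, float* aOutput,
                      uint32_t aSize) {
  assert(aInput && aOutput);
  uint32_t i = 0;
#if SKETCH_HAVE_SSE
  const __m128 scale = _mm_set1_ps(aScale);
  for (; i + 4 <= aSize; i += 4) {
    _mm_storeu_ps(aOutput + i, _mm_mul_ps(_mm_loadu_ps(aInput + i), scale));
  }
#endif
  for (; i < aSize; ++i) {
    aOutput[i] = aScale * aInput[i];
  }
}

// Hypothetical stand-in for BufferComplexMultiply. aSize counts complex
// elements stored as interleaved (re, im) pairs: aOutput[k] = aInput[k] * aScale[k].
void ComplexMultiplySSE(const float* aInput, const float* aScale,
                        float* aOutput, uint32_t aSize) {
  assert(aInput && aScale && aOutput);
  uint32_t k = 0;
#if SKETCH_HAVE_SSE
  // Lane signs for combining the partial products: (-, +, -, +).
  const __m128 sign = _mm_set_ps(1.0f, -1.0f, 1.0f, -1.0f);
  for (; k + 2 <= aSize; k += 2) {  // two complex numbers per iteration
    __m128 a = _mm_loadu_ps(aInput + 2 * k);  // ar0 ai0 ar1 ai1
    __m128 b = _mm_loadu_ps(aScale + 2 * k);  // br0 bi0 br1 bi1
    __m128 br = _mm_shuffle_ps(b, b, _MM_SHUFFLE(2, 2, 0, 0));  // br0 br0 br1 br1
    __m128 bi = _mm_shuffle_ps(b, b, _MM_SHUFFLE(3, 3, 1, 1));  // bi0 bi0 bi1 bi1
    __m128 aSwap = _mm_shuffle_ps(a, a, _MM_SHUFFLE(2, 3, 0, 1));  // ai0 ar0 ai1 ar1
    // re = ar*br - ai*bi, im = ai*br + ar*bi
    __m128 res = _mm_add_ps(_mm_mul_ps(a, br),
                            _mm_mul_ps(_mm_mul_ps(aSwap, bi), sign));
    _mm_storeu_ps(aOutput + 2 * k, res);
  }
#endif
  for (; k < aSize; ++k) {
    const float ar = aInput[2 * k], ai = aInput[2 * k + 1];
    const float br = aScale[2 * k], bi = aScale[2 * k + 1];
    aOutput[2 * k] = ar * br - ai * bi;
    aOutput[2 * k + 1] = ar * bi + ai * br;
  }
}
```

Unaligned loads and stores keep the sketch independent of buffer alignment; aligned or AVX variants would be natural follow-ups once the basic vectorization is measured against the profile.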
Depends on: 877662, 881587
Depends on: 1221871
Depends on: 1221875
I'm changing this from P1 to P2 because Gecko performance is now comparable to Chrome's, and bug 877662 and bug 881587 should put Gecko ahead.
Priority: P1 → P2
Mass change P2->P3 to align with new Mozilla triage process.
Priority: P2 → P3
See Also: → 1576059
Severity: normal → S3