Optimize allocation and de-allocation patterns of StreamBuffers

RESOLVED FIXED in Firefox 57



Audio/Video: MediaStreamGraph
3 years ago
9 months ago


(Reporter: padenot, Assigned: karlt)



Dependency tree / graph
Bug Flags:
in-testsuite -

Firefox Tracking Flags

(firefox57 fixed)


MozReview Requests


Submitter Diff Changes Open Issues Last Updated
Error loading review requests:


(5 attachments)



3 years ago
Created attachment 8617397 [details]

ForgetUpTo is always high in the profiles, because it does a lot of de-allocation: it prunes the StreamBuffers storage.

Attached is a picture of a profile, dominated by free(). Maybe we could devise some sort of ring buffer mechanism for stream to reduce allocations ? We should not keep a lot of audio in the graph, these days, anyways, maximum 40ms per stream or something in line with the latency of the audio output (or a fixed amount if we're running an OfflineAudioContext, off course).

This picture is a profile of "Simple gain test without resampling", with the rendering length bumped to 1000 seconds so I have some time to profile.


3 years ago
Blocks: 1169293
Assignee: nobody → amarchesini
Rank: 10
Priority: -- → P1
No longer blocks: 1169293

Comment 1

3 years ago
IIUC most AudioNodes shouldn't use StreamBuffers, but AudioDestinationNode does.

Comment 2

3 years ago
One way to get improvement here might be to keep the AudioChunks in the AudioSegment during ForgetUpTo() but null them in a way that doesn't release the memory for the channel pointer arrays.  Subsequently appended AudioChunks will likely have the same channel count and so can reuse the channel pointer arrays.
Baku -- I know you're busy for the next 2 weeks.  If you can get back to this bug then (around the end of Sept), that'd be great.  I'd really like to land this in the Fx44 cycle.  If landing this in Fx44 seems unlikely, then I'll look to find a new owner.  Thanks!
Rank: 10 → 15
Last Resolved: 2 years ago
Rank: 15 → 25
Priority: P1 → P2
Resolution: --- → INCOMPLETE

Comment 4

10 months ago
I'm guessing the free() in attachment 8617397 [details] is mostly destruction of the AudioChunk::nsTArray<const void*>, which has been inlined.

Now that we have the static analysis for bug 1159433, the compiler tells us what adjustments are necessary to use AutoTArray here.
Assignee: amarchesini → karlt
Component: Web Audio → Audio/Video: MediaStreamGraph
Resolution: INCOMPLETE → ---

Comment 5

10 months ago
Created attachment 8897140 [details]
profiling testcase with many streams

Since changes for bug 1197028, most Web Audio nodes don't create and destroy AudioChunks frequently, but this does still happen when sending output to external streams.


10 months ago
Depends on: 1390286
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)

Comment 9

10 months ago
The win from fewer allocations doesn't come for free.
CPU usage seen with attachment 8897140 [details] decreased by only just over 1%.

Comparing perf top output for the graph thread without AutoTArray,

     8.06%  libxul.so           [.] mozilla::MediaSegmentBase<mozilla::AudioSegment, mozilla::AudioChunk>::AppendFrom
     6.26%  libxul.so           [.] mozilla::MediaSegmentBase<mozilla::AudioSegment, mozilla::AudioChunk>::AppendSliceInternal
     6.02%  libxul.so           [.] mozilla::TrackUnionStream::ProcessInput                                   
     5.82%  libpthread-2.23.so  [.] __pthread_mutex_unlock_usercnt                                            
     5.02%  libxul.so           [.] mozilla::MediaStreamGraphImpl::ProcessChunkMetadataForInterval<mozilla::Au
     4.96%  libpthread-2.23.so  [.] pthread_mutex_lock                                                        
     3.42%  libxul.so           [.] nsTArray_Impl<mozilla::AudioChunk, nsTArrayInfallibleAllocator>::RemoveEle
     3.35%  libxul.so           [.] mozilla::TrackUnionStream::CopyTrackData                                  
     3.28%  libxul.so           [.] mozilla::AudioBlock::AllocateChannels                                     
     3.08%  firefox             [.] je_free                                                                   
     3.04%  [kernel]            [k] clear_page_c_e                                                            
     2.49%  libxul.so           [.] mozilla::AudioSegment::AppendAndConsumeChunk                              
     2.46%  [kernel]            [k] page_fault                                                                
     2.40%  libxul.so           [.] nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>::Ens
     1.94%  libxul.so           [.] mozilla::AudioNodeStream::ProcessInput                                    
     1.90%  firefox             [.] je_malloc                                                                 
     1.55%  libxul.so           [.] mozilla::AudioNodeStream::ObtainInputBlock                                
     1.44%  libxul.so           [.] mozilla::MediaStreamGraphImpl::UpdateCurrentTimeForStreams                
     1.29%  libxul.so           [.] mozilla::StreamTracks::FindTrack                                          
     1.13%  libxul.so           [.] mozilla::AudioBlockCopyChannelWithScale                                   
     1.12%  libxul.so           [.] mozilla::MediaStreamGraphImpl::ProcessChunkMetadata                       
     1.10%  [kernel]            [k] __rmqueue                                                                 
     1.08%  libxul.so           [.] nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>::Shi
     1.03%  libc-2.23.so        [.] __memset_avx2                                                             
     0.98%  libxul.so           [.] nsTArray_Impl<mozilla::AudioChunk, nsTArrayInfallibleAllocator>::AppendEle
     0.91%  libxul.so           [.] nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>::Shr

and with AutoTAray,

     7.97%  libxul.so           [.] mozilla::MediaSegmentBase<mozilla::AudioSegment, mozilla::AudioChunk>::App(endSliceInternal)
     7.77%  libxul.so           [.] mozilla::MediaSegmentBase<mozilla::AudioSegment, mozilla::AudioChunk>::App(endFrom)
     5.57%  libxul.so           [.] mozilla::MediaStreamGraphImpl::ProcessChunkMetadataForInterval<mozilla::Au
     4.67%  libxul.so           [.] mozilla::TrackUnionStream::ProcessInput                                   
     4.25%  libxul.so           [.] mozilla::AudioSegment::AppendAndConsumeChunk                              
     4.18%  libpthread-2.23.so  [.] __pthread_mutex_unlock_usercnt                                            
     3.77%  libxul.so           [.] mozilla::AudioBlock::AllocateChannels                                     
     3.46%  libxul.so           [.] mozilla::TrackUnionStream::CopyTrackData                                  
     3.33%  libxul.so           [.] nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>::Ens
     3.16%  [kernel]            [k] clear_page_c_e                                                            
     2.46%  [kernel]            [k] page_fault                                                                
     2.28%  libpthread-2.23.so  [.] pthread_mutex_lock                                                        
     2.22%  libxul.so           [.] nsTArray_Impl<mozilla::AudioChunk, nsTArrayInfallibleAllocator>::RemoveEle
     1.86%  firefox             [.] je_free                                                                   
     1.77%  libxul.so           [.] mozilla::AudioNodeStream::ProcessInput                                    
     1.66%  libxul.so           [.] mozilla::AudioNodeStream::ObtainInputBlock                                
     1.55%  libxul.so           [.] nsTArray_CopyWithConstructors<mozilla::AudioChunk>::MoveNonOverlappingRegi
     1.35%  libxul.so           [.] mozilla::MediaStreamGraphImpl::UpdateCurrentTimeForStreams                
     1.28%  libxul.so           [.] mozilla::AudioBlockCopyChannelWithScale                                   
     1.20%  libxul.so           [.] nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>::Shi
     1.11%  libxul.so           [.] nsTArray_Impl<mozilla::AudioChunk, nsTArrayInfallibleAllocator>::AppendEle
     1.09%  libxul.so           [.] mozilla::StreamTracks::FindTrack                                          
     1.08%  [kernel]            [k] __rmqueue                                                                 
     1.03%  firefox             [.] je_malloc                                                                 
     0.96%  libxul.so           [.] nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>::Swa
     0.95%  libc-2.23.so        [.] __memset_avx2                                                             

time spent in malloc/free and mutex lock/unlock has definitely improved, as
has time in TrackUnionStream::ProcessInput().

However, time spent in mozilla::MediaSegmentBase<mozilla::AudioSegment,
mozilla::AudioChunk>::AppendSliceInternal() and
AudioSegment::AppendAndConsumeChunk() has increased.

This is in a fresh browser instance without anything else running.
I'm expecting the benefits of avoiding the less predictable behavior of
malloc/free and locking may be more significant when there are more
allocations involved.

Comment 11

9 months ago
Comment on attachment 8897161 [details]
bug 1172979 correct mBufferIsDownstreamRef documentation

Attachment #8897161 - Flags: review?(padenot) → review+

Comment 12

9 months ago
Comment on attachment 8897162 [details]
bug 1172979 permit retrieving const channel data from const AudioChunk

Attachment #8897162 - Flags: review?(padenot) → review+

Comment 13

9 months ago
Comment on attachment 8897163 [details]
bug 1172979 use AutoTArray for AudioChunk::mChannelData to reduce allocations

Attachment #8897163 - Flags: review?(padenot) → review+

Comment 14

9 months ago
Regardless of the CPU usage win (which is always welcome), reducing locking is really important here, thanks a lot.

Comment 15

9 months ago
Pushed by ktomlinson@mozilla.com:
correct mBufferIsDownstreamRef documentation r=padenot
permit retrieving const channel data from const AudioChunk r=padenot
use AutoTArray for AudioChunk::mChannelData to reduce allocations r=padenot

Comment 16

9 months ago
Last Resolved: 2 years ago9 months ago
status-firefox57: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla57


9 months ago
Flags: in-testsuite-
Keywords: perf
You need to log in before you can comment on or make changes to this bug.