Optimize allocation and de-allocation patterns of StreamBuffers

RESOLVED FIXED in Firefox 57

Status

()

Core
Audio/Video: MediaStreamGraph
P2
normal
Rank:
25
RESOLVED FIXED
3 years ago
9 months ago

People

(Reporter: padenot, Assigned: karlt)

Tracking

({perf})

unspecified
mozilla57
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite -

Firefox Tracking Flags

(firefox57 fixed)

Details

MozReview Requests

()

Submitter Diff Changes Open Issues Last Updated
Loading...
Error loading review requests:

Attachments

(5 attachments)

(Reporter)

Description

3 years ago
Created attachment 8617397 [details]
updatecurrenttimeforstream.png

ForgetUpTo is always high in the profiles, because it does a lot of de-allocation: it prunes the StreamBuffers storage.

Attached is a picture of a profile, dominated by free(). Maybe we could devise some sort of ring buffer mechanism for stream to reduce allocations ? We should not keep a lot of audio in the graph, these days, anyways, maximum 40ms per stream or something in line with the latency of the audio output (or a fixed amount if we're running an OfflineAudioContext, off course).

This picture is a profile of "Simple gain test without resampling", with the rendering length bumped to 1000 seconds so I have some time to profile.
(Reporter)

Updated

3 years ago
Blocks: 1169293
Assignee: nobody → amarchesini
Rank: 10
Priority: -- → P1
No longer blocks: 1169293
(Assignee)

Comment 1

3 years ago
IIUC most AudioNodes shouldn't use StreamBuffers, but AudioDestinationNode does.
(Assignee)

Comment 2

3 years ago
One way to get improvement here might be to keep the AudioChunks in the AudioSegment during ForgetUpTo() but null them in a way that doesn't release the memory for the channel pointer arrays.  Subsequently appended AudioChunks will likely have the same channel count and so can reuse the channel pointer arrays.
Baku -- I know you're busy for the next 2 weeks.  If you can get back to this bug then (around the end of Sept), that'd be great.  I'd really like to land this in the Fx44 cycle.  If landing this in Fx44 seems unlikely, then I'll look to find a new owner.  Thanks!
Rank: 10 → 15
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Rank: 15 → 25
Priority: P1 → P2
Resolution: --- → INCOMPLETE
(Assignee)

Comment 4

10 months ago
I'm guessing the free() in attachment 8617397 [details] is mostly destruction of the AudioChunk::nsTArray<const void*>, which has been inlined.

Now that we have the static analysis for bug 1159433, the compiler tells us what adjustments are necessary to use AutoTArray here.
Assignee: amarchesini → karlt
Status: RESOLVED → REOPENED
Component: Web Audio → Audio/Video: MediaStreamGraph
Resolution: INCOMPLETE → ---
(Assignee)

Comment 5

10 months ago
Created attachment 8897140 [details]
profiling testcase with many streams

Since changes for bug 1197028, most Web Audio nodes don't create and destroy AudioChunks frequently, but this does still happen when sending output to external streams.
(Assignee)

Updated

10 months ago
Depends on: 1390286
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(Assignee)

Comment 9

10 months ago
The win from fewer allocations doesn't come for free.
CPU usage seen with attachment 8897140 [details] decreased by only just over 1%.

Comparing perf top output for the graph thread without AutoTArray,

     8.06%  libxul.so           [.] mozilla::MediaSegmentBase<mozilla::AudioSegment, mozilla::AudioChunk>::AppendFrom
     6.26%  libxul.so           [.] mozilla::MediaSegmentBase<mozilla::AudioSegment, mozilla::AudioChunk>::AppendSliceInternal
     6.02%  libxul.so           [.] mozilla::TrackUnionStream::ProcessInput                                   
     5.82%  libpthread-2.23.so  [.] __pthread_mutex_unlock_usercnt                                            
     5.02%  libxul.so           [.] mozilla::MediaStreamGraphImpl::ProcessChunkMetadataForInterval<mozilla::Au
     4.96%  libpthread-2.23.so  [.] pthread_mutex_lock                                                        
     3.42%  libxul.so           [.] nsTArray_Impl<mozilla::AudioChunk, nsTArrayInfallibleAllocator>::RemoveEle
     3.35%  libxul.so           [.] mozilla::TrackUnionStream::CopyTrackData                                  
     3.28%  libxul.so           [.] mozilla::AudioBlock::AllocateChannels                                     
     3.08%  firefox             [.] je_free                                                                   
     3.04%  [kernel]            [k] clear_page_c_e                                                            
     2.49%  libxul.so           [.] mozilla::AudioSegment::AppendAndConsumeChunk                              
     2.46%  [kernel]            [k] page_fault                                                                
     2.40%  libxul.so           [.] nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>::Ens
     1.94%  libxul.so           [.] mozilla::AudioNodeStream::ProcessInput                                    
     1.90%  firefox             [.] je_malloc                                                                 
     1.55%  libxul.so           [.] mozilla::AudioNodeStream::ObtainInputBlock                                
     1.44%  libxul.so           [.] mozilla::MediaStreamGraphImpl::UpdateCurrentTimeForStreams                
     1.29%  libxul.so           [.] mozilla::StreamTracks::FindTrack                                          
     1.13%  libxul.so           [.] mozilla::AudioBlockCopyChannelWithScale                                   
     1.12%  libxul.so           [.] mozilla::MediaStreamGraphImpl::ProcessChunkMetadata                       
     1.10%  [kernel]            [k] __rmqueue                                                                 
     1.08%  libxul.so           [.] nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>::Shi
     1.03%  libc-2.23.so        [.] __memset_avx2                                                             
     0.98%  libxul.so           [.] nsTArray_Impl<mozilla::AudioChunk, nsTArrayInfallibleAllocator>::AppendEle
     0.91%  libxul.so           [.] nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>::Shr

and with AutoTAray,

     7.97%  libxul.so           [.] mozilla::MediaSegmentBase<mozilla::AudioSegment, mozilla::AudioChunk>::App(endSliceInternal)
     7.77%  libxul.so           [.] mozilla::MediaSegmentBase<mozilla::AudioSegment, mozilla::AudioChunk>::App(endFrom)
     5.57%  libxul.so           [.] mozilla::MediaStreamGraphImpl::ProcessChunkMetadataForInterval<mozilla::Au
     4.67%  libxul.so           [.] mozilla::TrackUnionStream::ProcessInput                                   
     4.25%  libxul.so           [.] mozilla::AudioSegment::AppendAndConsumeChunk                              
     4.18%  libpthread-2.23.so  [.] __pthread_mutex_unlock_usercnt                                            
     3.77%  libxul.so           [.] mozilla::AudioBlock::AllocateChannels                                     
     3.46%  libxul.so           [.] mozilla::TrackUnionStream::CopyTrackData                                  
     3.33%  libxul.so           [.] nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>::Ens
     3.16%  [kernel]            [k] clear_page_c_e                                                            
     2.46%  [kernel]            [k] page_fault                                                                
     2.28%  libpthread-2.23.so  [.] pthread_mutex_lock                                                        
     2.22%  libxul.so           [.] nsTArray_Impl<mozilla::AudioChunk, nsTArrayInfallibleAllocator>::RemoveEle
     1.86%  firefox             [.] je_free                                                                   
     1.77%  libxul.so           [.] mozilla::AudioNodeStream::ProcessInput                                    
     1.66%  libxul.so           [.] mozilla::AudioNodeStream::ObtainInputBlock                                
     1.55%  libxul.so           [.] nsTArray_CopyWithConstructors<mozilla::AudioChunk>::MoveNonOverlappingRegi
     1.35%  libxul.so           [.] mozilla::MediaStreamGraphImpl::UpdateCurrentTimeForStreams                
     1.28%  libxul.so           [.] mozilla::AudioBlockCopyChannelWithScale                                   
     1.20%  libxul.so           [.] nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>::Shi
     1.11%  libxul.so           [.] nsTArray_Impl<mozilla::AudioChunk, nsTArrayInfallibleAllocator>::AppendEle
     1.09%  libxul.so           [.] mozilla::StreamTracks::FindTrack                                          
     1.08%  [kernel]            [k] __rmqueue                                                                 
     1.03%  firefox             [.] je_malloc                                                                 
     0.96%  libxul.so           [.] nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>::Swa
     0.95%  libc-2.23.so        [.] __memset_avx2                                                             

time spent in malloc/free and mutex lock/unlock has definitely improved, as
has time in TrackUnionStream::ProcessInput().

However, time spent in mozilla::MediaSegmentBase<mozilla::AudioSegment,
mozilla::AudioChunk>::AppendSliceInternal() and
AudioSegment::AppendAndConsumeChunk() has increased.

This is in a fresh browser instance without anything else running.
I'm expecting the benefits of avoiding the less predictable behavior of
malloc/free and locking may be more significant when there are more
allocations involved.
(Reporter)

Comment 11

9 months ago
mozreview-review
Comment on attachment 8897161 [details]
bug 1172979 correct mBufferIsDownstreamRef documentation

https://reviewboard.mozilla.org/r/168432/#review174442
Attachment #8897161 - Flags: review?(padenot) → review+
(Reporter)

Comment 12

9 months ago
mozreview-review
Comment on attachment 8897162 [details]
bug 1172979 permit retrieving const channel data from const AudioChunk

https://reviewboard.mozilla.org/r/168434/#review174444
Attachment #8897162 - Flags: review?(padenot) → review+
(Reporter)

Comment 13

9 months ago
mozreview-review
Comment on attachment 8897163 [details]
bug 1172979 use AutoTArray for AudioChunk::mChannelData to reduce allocations

https://reviewboard.mozilla.org/r/168436/#review174450
Attachment #8897163 - Flags: review?(padenot) → review+
(Reporter)

Comment 14

9 months ago
Regardless of the CPU usage win (which is always welcome), reducing locking is really important here, thanks a lot.

Comment 15

9 months ago
Pushed by ktomlinson@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/73ed1b91c4fe
correct mBufferIsDownstreamRef documentation r=padenot
https://hg.mozilla.org/integration/autoland/rev/367f82686e4d
permit retrieving const channel data from const AudioChunk r=padenot
https://hg.mozilla.org/integration/autoland/rev/6a8b37d47733
use AutoTArray for AudioChunk::mChannelData to reduce allocations r=padenot

Comment 16

9 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/73ed1b91c4fe
https://hg.mozilla.org/mozilla-central/rev/367f82686e4d
https://hg.mozilla.org/mozilla-central/rev/6a8b37d47733
Status: REOPENED → RESOLVED
Last Resolved: 2 years ago9 months ago
status-firefox57: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
(Assignee)

Updated

9 months ago
Flags: in-testsuite-
Keywords: perf
You need to log in before you can comment on or make changes to this bug.