Closed Bug 1172979 Opened 9 years ago Closed 7 years ago

Optimize allocation and de-allocation patterns of StreamBuffers

Categories

(Core :: Audio/Video: MediaStreamGraph, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla57
Tracking Status
firefox57 --- fixed

People

(Reporter: padenot, Assigned: karlt)

References

(Blocks 1 open bug)

Details

(Keywords: perf)

Attachments

(5 files)

ForgetUpTo is always high in the profiles, because it does a lot of de-allocation: it prunes the StreamBuffers storage. Attached is a picture of a profile, dominated by free(). Maybe we could devise some sort of ring buffer mechanism for stream to reduce allocations ? We should not keep a lot of audio in the graph, these days, anyways, maximum 40ms per stream or something in line with the latency of the audio output (or a fixed amount if we're running an OfflineAudioContext, off course). This picture is a profile of "Simple gain test without resampling", with the rendering length bumped to 1000 seconds so I have some time to profile.
Blocks: webaudioperf
Assignee: nobody → amarchesini
Rank: 10
Priority: -- → P1
No longer blocks: webaudioperf
IIUC most AudioNodes shouldn't use StreamBuffers, but AudioDestinationNode does.
One way to get improvement here might be to keep the AudioChunks in the AudioSegment during ForgetUpTo() but null them in a way that doesn't release the memory for the channel pointer arrays. Subsequently appended AudioChunks will likely have the same channel count and so can reuse the channel pointer arrays.
Baku -- I know you're busy for the next 2 weeks. If you can get back to this bug then (around the end of Sept), that'd be great. I'd really like to land this in the Fx44 cycle. If landing this in Fx44 seems unlikely, then I'll look to find a new owner. Thanks!
Rank: 10 → 15
Status: NEW → RESOLVED
Closed: 8 years ago
Rank: 15 → 25
Priority: P1 → P2
Resolution: --- → INCOMPLETE
I'm guessing the free() in attachment 8617397 [details] is mostly destruction of the AudioChunk::nsTArray<const void*>, which has been inlined. Now that we have the static analysis for bug 1159433, the compiler tells us what adjustments are necessary to use AutoTArray here.
Assignee: amarchesini → karlt
Status: RESOLVED → REOPENED
Component: Web Audio → Audio/Video: MediaStreamGraph
Resolution: INCOMPLETE → ---
Since changes for bug 1197028, most Web Audio nodes don't create and destroy AudioChunks frequently, but this does still happen when sending output to external streams.
Depends on: 1390286
The win from fewer allocations doesn't come for free. CPU usage seen with attachment 8897140 [details] decreased by only just over 1%. Comparing perf top output for the graph thread without AutoTArray, 8.06% libxul.so [.] mozilla::MediaSegmentBase<mozilla::AudioSegment, mozilla::AudioChunk>::AppendFrom 6.26% libxul.so [.] mozilla::MediaSegmentBase<mozilla::AudioSegment, mozilla::AudioChunk>::AppendSliceInternal 6.02% libxul.so [.] mozilla::TrackUnionStream::ProcessInput 5.82% libpthread-2.23.so [.] __pthread_mutex_unlock_usercnt 5.02% libxul.so [.] mozilla::MediaStreamGraphImpl::ProcessChunkMetadataForInterval<mozilla::Au 4.96% libpthread-2.23.so [.] pthread_mutex_lock 3.42% libxul.so [.] nsTArray_Impl<mozilla::AudioChunk, nsTArrayInfallibleAllocator>::RemoveEle 3.35% libxul.so [.] mozilla::TrackUnionStream::CopyTrackData 3.28% libxul.so [.] mozilla::AudioBlock::AllocateChannels 3.08% firefox [.] je_free 3.04% [kernel] [k] clear_page_c_e 2.49% libxul.so [.] mozilla::AudioSegment::AppendAndConsumeChunk 2.46% [kernel] [k] page_fault 2.40% libxul.so [.] nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>::Ens 1.94% libxul.so [.] mozilla::AudioNodeStream::ProcessInput 1.90% firefox [.] je_malloc 1.55% libxul.so [.] mozilla::AudioNodeStream::ObtainInputBlock 1.44% libxul.so [.] mozilla::MediaStreamGraphImpl::UpdateCurrentTimeForStreams 1.29% libxul.so [.] mozilla::StreamTracks::FindTrack 1.13% libxul.so [.] mozilla::AudioBlockCopyChannelWithScale 1.12% libxul.so [.] mozilla::MediaStreamGraphImpl::ProcessChunkMetadata 1.10% [kernel] [k] __rmqueue 1.08% libxul.so [.] nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>::Shi 1.03% libc-2.23.so [.] __memset_avx2 0.98% libxul.so [.] nsTArray_Impl<mozilla::AudioChunk, nsTArrayInfallibleAllocator>::AppendEle 0.91% libxul.so [.] nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>::Shr and with AutoTAray, 7.97% libxul.so [.] mozilla::MediaSegmentBase<mozilla::AudioSegment, mozilla::AudioChunk>::App(endSliceInternal) 7.77% libxul.so [.] mozilla::MediaSegmentBase<mozilla::AudioSegment, mozilla::AudioChunk>::App(endFrom) 5.57% libxul.so [.] mozilla::MediaStreamGraphImpl::ProcessChunkMetadataForInterval<mozilla::Au 4.67% libxul.so [.] mozilla::TrackUnionStream::ProcessInput 4.25% libxul.so [.] mozilla::AudioSegment::AppendAndConsumeChunk 4.18% libpthread-2.23.so [.] __pthread_mutex_unlock_usercnt 3.77% libxul.so [.] mozilla::AudioBlock::AllocateChannels 3.46% libxul.so [.] mozilla::TrackUnionStream::CopyTrackData 3.33% libxul.so [.] nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>::Ens 3.16% [kernel] [k] clear_page_c_e 2.46% [kernel] [k] page_fault 2.28% libpthread-2.23.so [.] pthread_mutex_lock 2.22% libxul.so [.] nsTArray_Impl<mozilla::AudioChunk, nsTArrayInfallibleAllocator>::RemoveEle 1.86% firefox [.] je_free 1.77% libxul.so [.] mozilla::AudioNodeStream::ProcessInput 1.66% libxul.so [.] mozilla::AudioNodeStream::ObtainInputBlock 1.55% libxul.so [.] nsTArray_CopyWithConstructors<mozilla::AudioChunk>::MoveNonOverlappingRegi 1.35% libxul.so [.] mozilla::MediaStreamGraphImpl::UpdateCurrentTimeForStreams 1.28% libxul.so [.] mozilla::AudioBlockCopyChannelWithScale 1.20% libxul.so [.] nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>::Shi 1.11% libxul.so [.] nsTArray_Impl<mozilla::AudioChunk, nsTArrayInfallibleAllocator>::AppendEle 1.09% libxul.so [.] mozilla::StreamTracks::FindTrack 1.08% [kernel] [k] __rmqueue 1.03% firefox [.] je_malloc 0.96% libxul.so [.] nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>::Swa 0.95% libc-2.23.so [.] __memset_avx2 time spent in malloc/free and mutex lock/unlock has definitely improved, as has time in TrackUnionStream::ProcessInput(). However, time spent in mozilla::MediaSegmentBase<mozilla::AudioSegment, mozilla::AudioChunk>::AppendSliceInternal() and AudioSegment::AppendAndConsumeChunk() has increased. This is in a fresh browser instance without anything else running. I'm expecting the benefits of avoiding the less predictable behavior of malloc/free and locking may be more significant when there are more allocations involved.
Attachment #8897161 - Flags: review?(padenot) → review+
Comment on attachment 8897162 [details] bug 1172979 permit retrieving const channel data from const AudioChunk https://reviewboard.mozilla.org/r/168434/#review174444
Attachment #8897162 - Flags: review?(padenot) → review+
Comment on attachment 8897163 [details] bug 1172979 use AutoTArray for AudioChunk::mChannelData to reduce allocations https://reviewboard.mozilla.org/r/168436/#review174450
Attachment #8897163 - Flags: review?(padenot) → review+
Regardless of the CPU usage win (which is always welcome), reducing locking is really important here, thanks a lot.
Pushed by ktomlinson@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/73ed1b91c4fe correct mBufferIsDownstreamRef documentation r=padenot https://hg.mozilla.org/integration/autoland/rev/367f82686e4d permit retrieving const channel data from const AudioChunk r=padenot https://hg.mozilla.org/integration/autoland/rev/6a8b37d47733 use AutoTArray for AudioChunk::mChannelData to reduce allocations r=padenot
Status: REOPENED → RESOLVED
Closed: 8 years ago7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Flags: in-testsuite-
Keywords: perf
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: