Closed
Bug 877662
Opened 11 years ago
Closed 8 years ago
Optimize the AudioNodeEngine.cpp routines
Categories
(Core :: Web Audio, defect, P1)
Tracking
()
RESOLVED
FIXED
mozilla48
Tracking | Status | |
---|---|---|
firefox48 | --- | fixed |
People
(Reporter: ehsan.akhgari, Assigned: dminor)
References
(Blocks 1 open bug)
Details
(Keywords: perf, Whiteboard: [blocking-webaudio-])
Attachments
(9 files, 2 obsolete files)
18.56 KB,
text/x-csrc
|
Details | |
127.96 KB,
image/png
|
Details | |
2.88 KB,
patch
|
ehsan.akhgari
:
review+
roc
:
review-
|
Details | Diff | Splinter Review |
26.80 KB,
patch
|
ehsan.akhgari
:
review+
|
Details | Diff | Splinter Review |
58 bytes,
text/x-review-board-request
|
padenot
:
review+
|
Details |
58 bytes,
text/x-review-board-request
|
padenot
:
review+
|
Details |
58 bytes,
text/x-review-board-request
|
padenot
:
review+
|
Details |
58 bytes,
text/x-review-board-request
|
padenot
:
review+
|
Details |
MozReview Request: Bug 877662 - Add an SSE2 implementation of AudioNodeEngine.cpp functions. r=ehsan
58 bytes,
text/x-review-board-request
|
padenot
:
review+
|
Details |
We need SSE2 versions of those routines. Also it would be nice to ensure that AllocateAudioBlock always allocates 16-byte aligned buffers where we have SSE2 available.
Comment 1•11 years ago
|
||
Here is a reimplementation in sse2 of all the functions we have in AudioNodeEngine.cpp (but not WriteZeroesToAudioBlock, which is essentially a memset), in the form of a independent microbenchmark with various flavor of C++, hand-unrolled C++, SSE2 intrinsics and hand unrolled SSE2 intrinsics, with some basic tests of the output.
Comment 2•11 years ago
|
||
And some encouraging results.
Reporter | ||
Comment 3•11 years ago
|
||
Nice! BTW, I forgot to ask, do you also have NEON experience? We should definitely optimize these routines for mobile as well...
Comment 4•11 years ago
|
||
No, but this is something I've been considering for some time. Now is the time look into it, I think.
Reporter | ||
Comment 5•11 years ago
|
||
Mass moving Web Audio bugs to the Web Audio component. Filter on duckityduck.
Component: Video/Audio → Web Audio
Comment 6•11 years ago
|
||
Attachment #758671 -
Flags: review?(ehsan)
Comment 7•11 years ago
|
||
derf, if you have a minute, could you have a look at the functions in AudioNodeEngineSSE.cpp?
Attachment #758675 -
Flags: review?(tterribe)
Attachment #758675 -
Flags: review?(ehsan)
Comment 8•11 years ago
|
||
I'm also planning on adding some form of compiled test for those functions.
Reporter | ||
Comment 9•11 years ago
|
||
Comment on attachment 758671 [details] [diff] [review] Align AudioBlock on 16 bytes when SSE2 is available. r= Review of attachment 758671 [details] [diff] [review]: ----------------------------------------------------------------- ::: content/media/AudioNodeEngine.cpp @@ +14,2 @@ > nsRefPtr<SharedBuffer> buffer = > SharedBuffer::Create(WEBAUDIO_BLOCK_SIZE*aChannelCount*sizeof(float)); Please MOZ_ASSERT the alignment here. ::: content/media/SharedBuffer.h @@ +28,5 @@ > * Typically you would allocate one of these, fill it in, and then treat it as > * immutable while it's shared. > * This only guarantees 4-byte alignment of the data. For alignment we > * simply assume that the refcount is at least 4-byte aligned and its size > * is divisible by 4. Please update this comment.
Attachment #758671 -
Flags: review?(ehsan) → review+
Reporter | ||
Comment 10•11 years ago
|
||
Comment on attachment 758675 [details] [diff] [review] Add an SSE2 implementation of AudioNodeEngine.cpp functions. r= Review of attachment 758675 [details] [diff] [review]: ----------------------------------------------------------------- Ted, can you please do the honors for the build system stuff here?
Attachment #758675 -
Flags: review?(ted)
Reporter | ||
Comment 11•11 years ago
|
||
Comment on attachment 758675 [details] [diff] [review] Add an SSE2 implementation of AudioNodeEngine.cpp functions. r= Review of attachment 758675 [details] [diff] [review]: ----------------------------------------------------------------- ::: content/media/AudioNodeEngineSSE2.cpp @@ +38,5 @@ > + > + vscaled0 = _mm_mul_ps(vin0, vgain); > + vscaled1 = _mm_mul_ps(vin1, vgain); > + vscaled2 = _mm_mul_ps(vin2, vgain); > + vscaled3 = _mm_mul_ps(vin3, vgain); It might make sense to have a separate path for the aScale==1.0 case, but it's fine if you want to do that in a follow-up. ::: content/media/AudioNodeStream.cpp @@ +331,5 @@ > AllocateAudioBlock(outputChannelCount, &aTmpChunk); > float silenceChannel[WEBAUDIO_BLOCK_SIZE] = {0.f}; > // The static storage here should be 1KB, so it's fine > + nsAutoTArray<float, GUESS_AUDIO_CHANNELS*WEBAUDIO_BLOCK_SIZE + 15> downmixBuffer; > + float* alignedDownmixBuffer = ALIGN16(downmixBuffer.Elements()); Please declare this variable below, as this value here will be invalid if we end up doing a dynamic allocation. Also, we don't need this value here anyway. @@ +358,2 @@ > for (uint32_t j = 0; j < outputChannelCount; ++j) { > + outputChannels[j] = &alignedDownmixBuffer[j * WEBAUDIO_BLOCK_SIZE]; So here we're assuming that WEBAUDIO_BLOCK_SIZE is a multiple of 16, which is fine, but please static assert that here to be explicit about it.
Attachment #758675 -
Flags: review?(ehsan) → review+
Comment 12•11 years ago
|
||
Comment on attachment 758675 [details] [diff] [review] Add an SSE2 implementation of AudioNodeEngine.cpp functions. r= Review of attachment 758675 [details] [diff] [review]: ----------------------------------------------------------------- r=me with nits fixed. ::: content/media/AudioNodeEngineSSE2.cpp @@ +8,5 @@ > + > +#ifdef DEBUG > + #define ASSERT_ALIGNED(ptr) \ > + MOZ_ASSERT((((uintptr_t)ptr + 15) & ~0x0F) == (uintptr_t)ptr, \ > + #ptr " has to be aligned on 16 bytes."); "16-byte aligned" @@ +54,5 @@ > + _mm_store_ps(&aOutput[i], vout0); > + _mm_store_ps(&aOutput[i + 4], vout1); > + _mm_store_ps(&aOutput[i + 8], vout2); > + _mm_store_ps(&aOutput[i + 12], vout3); > + } Did you look at the compiler's output on x86-32? I worry that it will spill lots with this unrolled 4 times (since it can't prove the loads don't alias with the stores, it needs at least 9 registers, plus one because register allocation is NP-hard). @@ +183,5 @@ > + _mm_store_ps(in, vout0); > + _mm_store_ps(in+4, vout1); > + _mm_store_ps(in+8, vout2); > + _mm_store_ps(in+12, vout3); > + in += 16; It's generally better for the compiler if you do array indexing (like your other functions) than if you do pointer arithmetic. ::: content/media/AudioNodeStream.cpp @@ +10,5 @@ > #include "ThreeDPoint.h" > > using namespace mozilla::dom; > > +#define ALIGN16(ptr) (float*)(((uintptr_t)ptr + 15) & ~0x0F); Parentheses around ptr, as well as the whole macro body, and remove the trailing ; @@ +330,5 @@ > > AllocateAudioBlock(outputChannelCount, &aTmpChunk); > float silenceChannel[WEBAUDIO_BLOCK_SIZE] = {0.f}; > // The static storage here should be 1KB, so it's fine > + nsAutoTArray<float, GUESS_AUDIO_CHANNELS*WEBAUDIO_BLOCK_SIZE + 15> downmixBuffer; You're allocating 15 extra floats when I think you want 15 extra bytes. @@ +352,5 @@ > } else if (channels.Length() > outputChannelCount) { > if (mChannelInterpretation == ChannelInterpretation::Speakers) { > nsAutoTArray<float*,GUESS_AUDIO_CHANNELS> outputChannels; > outputChannels.SetLength(outputChannelCount); > + downmixBuffer.SetLength(outputChannelCount * WEBAUDIO_BLOCK_SIZE + 15); See above. ::: content/media/Makefile.in @@ +26,5 @@ > CFLAGS += $(GSTREAMER_CFLAGS) > CXXFLAGS += $(GSTREAMER_CFLAGS) > + > +ifneq (,$(INTEL_ARCHITECTURE)) > +# gcc requires -msse2 on AudioNodeEngine.cpp since it uses SSE intrinsics. The comment says -msse2 but the flag you're actually adding is -msse. Since you're using XMM registers, you want the former. Also, shouldn't this be on AudioNodeEngineSSE2.$(OBJ_SUFFIX) instead? @@ +28,5 @@ > + > +ifneq (,$(INTEL_ARCHITECTURE)) > +# gcc requires -msse2 on AudioNodeEngine.cpp since it uses SSE intrinsics. > +ifdef GNU_CC > +AudioNodeEngine.$(OBJ_SUFFIX): CXXFLAGS+=-msse The comment says -msse2 but the flag you're actually adding is -msse. Since you're using XMM registers, you want the former. @@ +32,5 @@ > +AudioNodeEngine.$(OBJ_SUFFIX): CXXFLAGS+=-msse > +endif > + > +ifdef SOLARIS_SUNPRO_CXX > +AudioNodeEngine.$(OBJ_SUFFIX): CXXFLAGS+=-xarch=sse -xO4 Again, sse vs. sse2. ::: content/media/moz.build @@ +43,5 @@ > MODULE = 'content' > > +# Are we targeting x86 or x64? If so, build SSE2 files. > +if CONFIG['INTEL_ARCHITECTURE']: > + DEFINES += ['-DUSE_SSE'] Unless you're planning to use gfx2d's Factory::HasSSE2() for some bizarre reason, I think it makes more sense to use xpcom's SSE.h and MOZILLA_MAY_SUPPORT_SSE2/MOZILLA_PRESUME_SSE2 (which you don't have to define yourself here).
Comment 13•11 years ago
|
||
Comment on attachment 758675 [details] [diff] [review] Add an SSE2 implementation of AudioNodeEngine.cpp functions. r= Forgot to set the flag, too.
Attachment #758675 -
Flags: review?(tterribe) → review+
Comment 14•11 years ago
|
||
Comment on attachment 758675 [details] [diff] [review] Add an SSE2 implementation of AudioNodeEngine.cpp functions. r= Review of attachment 758675 [details] [diff] [review]: ----------------------------------------------------------------- Makefile/moz.build bits look fine modulo derf's concerns.
Attachment #758675 -
Flags: review?(ted) → review+
Comment 15•11 years ago
|
||
The previous patch did not include the SSE2 / scalar dispatch code, for some reason. Here it is, in AudioNodeEngine.cpp. I basically followed the recommendations at the top of SSE.h. The unity gain fastpath is handled outside of SSE2/scalar specific code.
Attachment #759249 -
Flags: review?(ehsan)
Updated•11 years ago
|
Attachment #758675 -
Attachment is obsolete: true
Comment 16•11 years ago
|
||
(In reply to Timothy B. Terriberry (:derf) from comment #12) > @@ +54,5 @@ > > + _mm_store_ps(&aOutput[i], vout0); > > + _mm_store_ps(&aOutput[i + 4], vout1); > > + _mm_store_ps(&aOutput[i + 8], vout2); > > + _mm_store_ps(&aOutput[i + 12], vout3); > > + } > > Did you look at the compiler's output on x86-32? I worry that it will spill > lots with this unrolled 4 times (since it can't prove the loads don't alias > with the stores, it needs at least 9 registers, plus one because register > allocation is NP-hard). http://www.pastebin.mozilla.org/2492550, so it looks like it's OK.
Reporter | ||
Updated•11 years ago
|
Attachment #759249 -
Flags: review?(ehsan) → review+
Comment 17•11 years ago
|
||
This makes sure that we get aligned pointers even when we could borrow data from the AudioBufferSourceNode buffer.
Attachment #760660 -
Flags: review?(ehsan)
Updated•11 years ago
|
Attachment #759249 -
Attachment is obsolete: true
Reporter | ||
Comment 18•11 years ago
|
||
Comment on attachment 760660 [details] [diff] [review] Add an SSE2 implementation of AudioNodeEngine.cpp functions. r= Review of attachment 760660 [details] [diff] [review]: ----------------------------------------------------------------- Nice catch! Note that I added a bunch of functions to AudioNodeEngine.cpp in bug 815643, sorry if it bitrots this patch. I filed bug 881587 to optimize those.
Attachment #760660 -
Flags: review?(ehsan) → review+
Comment on attachment 758671 [details] [diff] [review] Align AudioBlock on 16 bytes when SSE2 is available. r= Review of attachment 758671 [details] [diff] [review]: ----------------------------------------------------------------- ::: content/media/SharedBuffer.h @@ +35,5 @@ > public: > + void* Data() { > + if (mozilla::supports_sse()) { > + // return a pointer aligned on 16 bytes. > + void * storage_start = (uint8_t*)(this + 1); I think this should be unconditional. It's simpler that way, and there's pretty much no benefit to being less-aligned on non-SSE platforms.
Attachment #758671 -
Flags: review-
Blocks: 882171
Reporter | ||
Comment 20•11 years ago
|
||
Paul, can you please address comment 19 and land these patches?
Flags: needinfo?(paul)
Comment 21•11 years ago
|
||
Some code has landed that allocate new buffers (like in the convolver node), so I need to make all that aligned, and make sure it works. But yes, I'll get to it next week.
Flags: needinfo?(paul)
Reporter | ||
Updated•11 years ago
|
Whiteboard: [blocking-webaudio-]
Updated•11 years ago
|
Whiteboard: [blocking-webaudio-] → [blocking-webaudio-][webaudio-perf]
Updated•11 years ago
|
Keywords: perf
Whiteboard: [blocking-webaudio-][webaudio-perf] → [blocking-webaudio-]
Comment 22•10 years ago
|
||
Making a P1 since early results showed it would help with perf. If it's not going to have a big enough impact, we can drop it to P2.
Priority: -- → P1
Comment 23•10 years ago
|
||
Karl -- Paul is clearing his plate to focus on bug 848954. Can you pick this up? Also, see Comment 22. If you think this should be lowered to a P2, I'm fine with that. Thanks.
Assignee: paul → karlt
Comment 25•10 years ago
|
||
If we want to refocus on this (and its sister bug), we really should spend some time profiling. Last time I checked, it was not clear all the processing functions were the bottlenecks, and the code has moved a lot since then.
Comment 26•10 years ago
|
||
Yes, the only method I've seen over 1% of a busy profile was BufferComplexMultiply()
Blocks: 881587
Updated•9 years ago
|
Blocks: webaudioperf
Updated•9 years ago
|
Priority: P2 → P1
Updated•9 years ago
|
Rank: 15
Updated•9 years ago
|
Blocks: webaudioperf_parity
Updated•9 years ago
|
No longer blocks: webaudioperf
Comment 27•9 years ago
|
||
I talked with Karl today, and he and I agree that it'd be more efficient to find another person to work on this one. Same with Bug 881587.
Assignee: karlt → nobody
Assignee | ||
Updated•8 years ago
|
Assignee: nobody → dminor
Assignee | ||
Comment 28•8 years ago
|
||
To be able to use SSE2 routines, we need to audio buffers to be allocated on 16 byte boundaries. Review commit: https://reviewboard.mozilla.org/r/44347/diff/#index_header See other reviews: https://reviewboard.mozilla.org/r/44347/
Attachment #8738465 -
Flags: review?(padenot)
Attachment #8738466 -
Flags: review?(padenot)
Attachment #8738467 -
Flags: review?(padenot)
Assignee | ||
Comment 29•8 years ago
|
||
Review commit: https://reviewboard.mozilla.org/r/44349/diff/#index_header See other reviews: https://reviewboard.mozilla.org/r/44349/
Assignee | ||
Comment 30•8 years ago
|
||
Review commit: https://reviewboard.mozilla.org/r/44351/diff/#index_header See other reviews: https://reviewboard.mozilla.org/r/44351/
Assignee | ||
Comment 31•8 years ago
|
||
Recent try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=685abc5801f3edfe63b9b0e85ffa203e1fab62d5&selectedJob=19059201
Status: NEW → ASSIGNED
Comment 32•8 years ago
|
||
Comment on attachment 8738467 [details] MozReview Request: Bug 877662 - Use SSE2 versions of AudioNodeEngine functions r?padenot https://reviewboard.mozilla.org/r/44351/#review41969 ::: dom/media/webaudio/blink/ReverbAccumulationBuffer.cpp:100 (Diff revision 1) > MOZ_ASSERT(isSafe); > if (!isSafe) > return 0; > > - AudioBufferAddWithScale(source, 1.0f, destination + writeIndex, numberOfFrames1); > +#ifdef USE_SSE2 > + // It is unlikely either the source is aligned or the number of values This is not ideal, we should either: - fall back to scalar when we have a number of items that is not a multiple of 16 - fall back to scalar if the input aligned, or process the first few elements in scalar and switch to vector and back for the last few items. Can you file a followup for this?
Attachment #8738467 -
Flags: review?(padenot) → review+
Comment 33•8 years ago
|
||
Comment on attachment 8738465 [details] MozReview Request: Bug 877662 - Align audio buffer allocations to 16 byte boundaries r?padenot https://reviewboard.mozilla.org/r/44347/#review41965 ::: dom/media/webaudio/AudioBlock.cpp (Diff revision 1) > * buffer can reuse and modify its contents next iteration if other references > * are all downstream temporary references held by AudioBlock. > - * > - * This only guarantees 4-byte alignment of the data. For alignment we simply > - * assume that the memory from malloc is at least 4-byte aligned and that > - * AudioBlockBuffer's size is divisible by 4. Can you leave a note about alignment here ? ::: dom/media/webaudio/AudioNodeExternalInputStream.cpp:91 (Diff revision 1) > { > NS_ASSERTION(aSegment->GetDuration() == WEBAUDIO_BLOCK_SIZE, "Bad segment duration"); > > +// To be safe, let's copy data if we're using SSE2 to ensure everything is > +// aligned properly. Simply checking for channel alignment does not seem > +// to be sufficient here. Do we know why ? ::: dom/media/webaudio/BiquadFilterNode.cpp:141 (Diff revision 1) > const AudioBlock& aInput, > AudioBlock* aOutput, > bool* aFinished) override > { > - float inputBuffer[WEBAUDIO_BLOCK_SIZE]; > + float inputBuffer[WEBAUDIO_BLOCK_SIZE + 4]; > + float* alignedInputBuffer = (float* )(((uintptr_t)inputBuffer + 15) & ~0x0F); Maybe assert the alignment here ? Or maybe I'm just paranoid, do we have the guarantee that the stack is aligned in all the platforms we support ?
Attachment #8738465 -
Flags: review?(padenot) → review+
Comment 34•8 years ago
|
||
Comment on attachment 8738466 [details] MozReview Request: Bug 877662 - Update SSE2 versions of AudioNodeEngine functions r?padenot https://reviewboard.mozilla.org/r/44349/#review41967 ::: dom/media/webaudio/AudioNodeEngineSSE2.cpp:31 (Diff revision 1) > +void > +AudioBufferAddWithScale_SSE(const float* aInput, > + float aScale, > + float* aOutput, > + uint32_t aSize) > +{ How are all those different from my initial patch ? It's a bit hard to review, can you make this sit on top of the old patches?
Attachment #8738466 -
Flags: review?(padenot)
Assignee | ||
Comment 35•8 years ago
|
||
https://reviewboard.mozilla.org/r/44347/#review41965 > Do we know why ? I think I was accidentally only checking the alignment of the first channel. Works fine if I iterate over each channel.
Assignee | ||
Comment 36•8 years ago
|
||
Review commit: https://reviewboard.mozilla.org/r/46183/diff/#index_header See other reviews: https://reviewboard.mozilla.org/r/46183/
Attachment #8738466 -
Attachment description: MozReview Request: Bug 877662 - Add SSE2 versions of AudioNodeEngine functions r?padenot → MozReview Request: Bug 877662 - Update SSE2 versions of AudioNodeEngine functions r?padenot
Attachment #8741091 -
Flags: review?(padenot)
Attachment #8738466 -
Flags: review?(padenot)
Assignee | ||
Comment 37•8 years ago
|
||
Review commit: https://reviewboard.mozilla.org/r/46185/diff/#index_header See other reviews: https://reviewboard.mozilla.org/r/46185/
Assignee | ||
Comment 38•8 years ago
|
||
Comment on attachment 8738465 [details] MozReview Request: Bug 877662 - Align audio buffer allocations to 16 byte boundaries r?padenot Review request updated; see interdiff: https://reviewboard.mozilla.org/r/44347/diff/1-2/
Assignee | ||
Comment 39•8 years ago
|
||
Comment on attachment 8738466 [details] MozReview Request: Bug 877662 - Update SSE2 versions of AudioNodeEngine functions r?padenot Review request updated; see interdiff: https://reviewboard.mozilla.org/r/44349/diff/1-2/
Assignee | ||
Comment 40•8 years ago
|
||
Comment on attachment 8738467 [details] MozReview Request: Bug 877662 - Use SSE2 versions of AudioNodeEngine functions r?padenot Review request updated; see interdiff: https://reviewboard.mozilla.org/r/44351/diff/1-2/
Updated•8 years ago
|
Attachment #8741091 -
Flags: review?(padenot) → review+
Comment 41•8 years ago
|
||
Comment on attachment 8741091 [details] MozReview Request: Bug 877662 - Add AlignmentUtils.h r?padenot https://reviewboard.mozilla.org/r/46183/#review42969
Comment 42•8 years ago
|
||
Comment on attachment 8741092 [details] MozReview Request: Bug 877662 - Add an SSE2 implementation of AudioNodeEngine.cpp functions. r=ehsan https://reviewboard.mozilla.org/r/46185/#review42971 I've written this, but I remember it being reviewed by someone, so I'm setting it as r+.
Attachment #8741092 -
Flags: review+
Comment 43•8 years ago
|
||
Comment on attachment 8738466 [details] MozReview Request: Bug 877662 - Update SSE2 versions of AudioNodeEngine functions r?padenot https://reviewboard.mozilla.org/r/44349/#review42973
Attachment #8738466 -
Flags: review?(padenot) → review+
Assignee | ||
Comment 44•8 years ago
|
||
https://reviewboard.mozilla.org/r/46185/#review42971 ehsan did the review of your original patch.
Comment 45•8 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/826d16396107 https://hg.mozilla.org/integration/mozilla-inbound/rev/5ae01cbc5549 https://hg.mozilla.org/integration/mozilla-inbound/rev/6e96c35c78bd https://hg.mozilla.org/integration/mozilla-inbound/rev/5f1d898a440e https://hg.mozilla.org/integration/mozilla-inbound/rev/58f6d3815cac
Comment 46•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/826d16396107 https://hg.mozilla.org/mozilla-central/rev/5ae01cbc5549 https://hg.mozilla.org/mozilla-central/rev/6e96c35c78bd https://hg.mozilla.org/mozilla-central/rev/5f1d898a440e https://hg.mozilla.org/mozilla-central/rev/58f6d3815cac
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
status-firefox48:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
You need to log in
before you can comment on or make changes to this bug.
Description
•