Optimize the AudioNodeEngine.cpp routines for NEON

RESOLVED FIXED in Firefox 25

Status

()

Core
Web Audio
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: Away for a while, Assigned: jwwang)

Tracking

Trunk
mozilla26
x86
Mac OS X
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(blocking-b2g:koi+, firefox25 fixed, firefox26 fixed)

Details

(Whiteboard: [FT: Media Recording, Sprint 1])

Attachments

(4 attachments, 11 obsolete attachments)

5.30 KB, text/x-c++src
Details
17.31 KB, text/x-c++src
Details
13.29 KB, patch
glandium
: review+
Details | Diff | Splinter Review
5.75 KB, patch
jwwang
: review+
Details | Diff | Splinter Review
Comment hidden (empty)
Ben, would you share your crazy setup you use to write ARM assembly for Opus on your FxOS device with me ? Otherwise, I'm gonna try to find a Pandaboard somewhere.
Flags: needinfo?(ben)
Before we do this (or other optimization work) I think we need some benchmarks. Filed bug 882543 for that.
Depends on: 882543
The SSE2 work is in bug 877662. That bug also makes audio blocks be 16-byte aligned.
Assignee: paul → nobody
Depends on: 877662

Updated

5 years ago
Assignee: nobody → jwwang
(Assignee)

Comment 4

5 years ago
I am planning to use this library.
http://projectne10.github.io/Ne10/doc/index.html

Or is there a similar library integrated to gecko already?
This library does not seem to suit our needs, from what I read.

Working directly with NEON intrinsics should be easy and fast enough for what we have to do to write the code ourselves without having to rely on a library.

If we consider after profiling that the speedup is not that great and that we need more speed, we can then consider writing assembly directly, but this would be way slower to write.
(Reporter)

Comment 6

5 years ago
Yeah, I agree.  We should try to optimize Kiss FFT for NEON and then measure some test cases (ConvolverNode should be the most interesting consumer) and see if the FFT is the bottleneck or not.
(Assignee)

Comment 7

5 years ago
(In reply to Paul Adenot (:padenot) from comment #5)
> This library does not seem to suit our needs, from what I read.

I fail to see how this library doesn't fit our needs. Below is a list of functions that can be imiplemented in Ne10 functions.

void AudioBufferAddWithScale(const float* aInput, float aScale, float* aOutput, uint32_t aSize)
* ne10_add_float (when aScale == 1.0f)
* ne10_mlac_float

AudioBlockCopyChannelWithScale(const float* aInput, float aScale, float* aOutput)
* ne10_mulc_float

AudioBlockCopyChannelWithScale(const float aInput[WEBAUDIO_BLOCK_SIZE], const float aScale[WEBAUDIO_BLOCK_SIZE], float aOutput[WEBAUDIO_BLOCK_SIZE])
* ne10_mul_float
I'm sure you can do something with this lib, but I'm not sure it is worth importing it in the tree.

ne10_add_float is just a vector add, ne10_mlac_float is just a multiply and accumulate. Nothing difficult to write, or I'm missing something.
(Assignee)

Comment 9

5 years ago
Just a thought about not reinventing the wheel.
If it is not worth the porting, let roll our own.
(Reporter)

Comment 10

5 years ago
(Sorry, for some reason I previously commented on this bug thinking that it's bug 885496!)

(In reply to comment #9)
> Just a thought about not reinventing the wheel.
> If it is not worth the porting, let roll our own.

I believe that the code involved here should be fairly simple and localized, and I would prefer to not have to import a large library just so that we can use a few of their routines.
(Assignee)

Comment 11

5 years ago
Created attachment 770000 [details] [diff] [review]
NEON implementation of AudioNodeEngine.cpp part 1
Attachment #770000 - Flags: review?(ehsan)
(Assignee)

Comment 12

5 years ago
Created attachment 770001 [details] [diff] [review]
NEON implementation of AudioNodeEngine.cpp part 2
Attachment #770001 - Flags: review?(ehsan)
Comment on attachment 770000 [details] [diff] [review]
NEON implementation of AudioNodeEngine.cpp part 1

Review of attachment 770000 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/AudioNodeEngineNEON.cpp
@@ +29,5 @@
> +  for (unsigned i = 0; i < aSize; i+=4) {
> +    vin = vld1q_f32((float32_t*)aInput+i);
> +    vout = vld1q_f32((float32_t*)aOutput+i);
> +    vout = vmlaq_f32(vout, vin, vscale);
> +    vst1q_f32((float32_t*)aOutput+i, vout);

This doesn't properly handle the case where aSize is not a multiple of 4.

@@ +83,5 @@
> +
> +  for (uint32_t i = 0; i < aSize * aChannelCount; i+=4) {
> +    vin = vld1q_f32((float32_t*)aBlock+i);
> +    vout = vmulq_f32(vin, vscale);
> +    vst1q_f32((float32_t*)aBlock+i, vout);

Same here.
(Assignee)

Comment 14

5 years ago
Created attachment 770086 [details] [diff] [review]
NEON implementation of AudioNodeEngine.cpp part 1

Handle the case where array size is not a multiple of 4.
Attachment #770000 - Attachment is obsolete: true
Attachment #770000 - Flags: review?(ehsan)
Attachment #770086 - Flags: review?(roc)
Comment on attachment 770086 [details] [diff] [review]
NEON implementation of AudioNodeEngine.cpp part 1

Review of attachment 770086 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me.
Attachment #770086 - Flags: review?(roc) → review?(paul)
Comment on attachment 770086 [details] [diff] [review]
NEON implementation of AudioNodeEngine.cpp part 1

Review of attachment 770086 [details] [diff] [review]:
-----------------------------------------------------------------

Could you try hand unrolling a couple functions, to check if we can get some speedup? Just to check if it is worth it. For example, I could get 50% speedup by unrolling the inplace gain implementation four times when using SSE.

derf, could you sanity check this if you have a minute? I'm afraid I'm ignorant when it comes to ARM.

::: content/media/AudioNodeEngineNEON.cpp
@@ +30,5 @@
> +  aSize -= dif;
> +  unsigned i = 0;
> +  for (; i < aSize; i+=4) {
> +    vin = vld1q_f32((float32_t*)aInput+i);
> +    vout = vld1q_f32((float32_t*)aOutput+i);

I remember that derf told me it was preferable to use array indexing rather that pointer arithmetic, in terms of speed.

Anyways, I'd rather have static_casts at the beginning of the function that cluttering the inner loop.
Attachment #770086 - Flags: review?(tterribe)
Attachment #770086 - Flags: review?(paul)
Attachment #770086 - Flags: review+
(Reporter)

Comment 17

5 years ago
Comment on attachment 770001 [details] [diff] [review]
NEON implementation of AudioNodeEngine.cpp part 2

Review of attachment 770001 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/AudioNodeEngine.cpp
@@ +22,5 @@
>    aChunk->mDuration = WEBAUDIO_BLOCK_SIZE;
>    aChunk->mChannelData.SetLength(aChannelCount);
>    float* data = static_cast<float*>(buffer->Data());
>    for (uint32_t i = 0; i < aChannelCount; ++i) {
> +    float* check = data + i*WEBAUDIO_BLOCK_SIZE;

Nit: please call this channelData.
Attachment #770001 - Flags: review?(ehsan) → review+
(Assignee)

Comment 18

5 years ago
Created attachment 770587 [details] [diff] [review]
NEON implementation of AudioNodeEngine.cpp part 2
Attachment #770001 - Attachment is obsolete: true
Attachment #770587 - Flags: review+
(Assignee)

Comment 19

5 years ago
Created attachment 770588 [details] [diff] [review]
NEON implementation of AudioNodeEngine.cpp part 2
Attachment #770587 - Attachment is obsolete: true
Attachment #770588 - Flags: review+
(Assignee)

Comment 20

5 years ago
Created attachment 771175 [details]
Microbenchmark - AudioNodeEngineNEON.cpp with loop unrolling

test results:
AudioBufferAddWithScale                      : loop=1000000, time=1.553925
AudioBufferAddWithScale_NEON                 : loop=1000000, time=0.730693
AudioBufferAddWithScale_NEON_unroll          : loop=1000000, time=0.741942
AudioBufferAddWithScale_NEON_unroll2         : loop=1000000, time=0.701317
AudioBlockCopyChannelWithScale               : loop=1000000, time=0.788323
AudioBlockCopyChannelWithScale_NEON          : loop=1000000, time=0.635880
AudioBlockCopyChannelWithScale_NEON_unroll   : loop=1000000, time=0.495064
AudioBlockCopyChannelWithScale_NEON_unroll2  : loop=1000000, time=0.503057
AudioBlockCopyChannelWithScale               : loop=1000000, time=1.164082
AudioBlockCopyChannelWithScale_NEON          : loop=1000000, time=0.838090
AudioBlockCopyChannelWithScale_NEON_unroll   : loop=1000000, time=0.678961
AudioBlockCopyChannelWithScale_NEON_unroll2  : loop=1000000, time=0.663595
AudioBufferInPlaceScale                      : loop=1000000, time=0.923929
AudioBufferInPlaceScale_NEON                 : loop=1000000, time=0.535680
AudioBufferInPlaceScale_NEON_unroll          : loop=1000000, time=0.469168
AudioBufferInPlaceScale_NEON_unroll2         : loop=1000000, time=0.472063
AudioBlockPanStereoToStereo                  : loop=1000000, time=1.942470
AudioBlockPanStereoToStereo_NEON             : loop=1000000, time=1.273460
AudioBlockPanStereoToStereo_NEON_unroll      : loop=1000000, time=1.103122
AudioBlockPanStereoToStereo_NEON_unroll2     : loop=1000000, time=1.114045

Generally, unroll level 4 is better than those without unrolling. However, for some functions (ex: AudioBufferAddWithScale_NEON_unroll2), unroll level 2 is better than level 4. I think we might have to tweak functions case by case.

Btw, there is no performance difference whether to use pointer arithmetic or array indexing. I guess the compiler is smart enough to figure this out.
(Assignee)

Comment 21

5 years ago
Created attachment 771176 [details]
Implementation - AudioNodeEngineNEON.cpp with loop unrolling

The file that goes with the benchmark test.
(In reply to jwwang from comment #20)
> Btw, there is no performance difference whether to use pointer arithmetic or
> array indexing. I guess the compiler is smart enough to figure this out.

Well, the important point, I think, is whether you change the base pointer in the loop (vin = vld1q_f32((float32_t*)aInput); aInput+=4;) vs. accessing it with the loop index (as you do here). I don't think the exact syntax matters. This is just a general observation that compilers usually produce better code in the latter case, but it may not always apply. See, e.g., <http://support.amd.com/us/Processor_TechDocs/47414_15h_sw_opt_guide.pdf> Section 3.2.
(Assignee)

Comment 24

5 years ago
Created attachment 772519 [details] [diff] [review]
NEON implementation of AudioNodeEngine.cpp part 1

With loop unrolling.
Attachment #770086 - Attachment is obsolete: true
Attachment #770086 - Flags: review?(tterribe)
Attachment #772519 - Flags: review?(tterribe)
Comment on attachment 772519 [details] [diff] [review]
NEON implementation of AudioNodeEngine.cpp part 1

Review of attachment 772519 [details] [diff] [review]:
-----------------------------------------------------------------

I took a look at the actual asm this generates, and it is moderately awful, as one would expect from intrinsics (lots of redundant addressing calculations, unaligned load/stores, small load/stores instead of large ones, etc.). It all looks like it will work, though.

r=me once you fix the build issue.

::: content/media/AudioNodeEngineNEON.cpp
@@ +22,5 @@
> +                                  float* aOutput,
> +                                  uint32_t aSize)
> +{
> +  ASSERT_ALIGNED(aInput);
> +  ASSERT_ALIGNED(aOutput);

I don't think you actually get any benefit from this, because there are no intrinsics that specifically take aligned pointers. All the asm it generates uses unaligned loads.

@@ +33,5 @@
> +  aSize -= dif;
> +  unsigned i = 0;
> +  for (; i < aSize; i+=16) {
> +    vin0 = vld1q_f32(ADDRESS_OF(aInput, i));
> +    vin1 = vld1q_f32(ADDRESS_OF(aInput, i+4));

Ugh, there's no 4-register vld1 available from intrinsics? That's pretty poor.

::: content/media/AudioNodeEngineNEON.h
@@ +1,4 @@
> +/* -*- mode: c++; tab-width: 2; indent-tabs-mode: nil; c-basic-offset: 2 -*- */
> +/* this source code form is subject to the terms of the mozilla public
> + * license, v. 2.0. if a copy of the mpl was not distributed with this file,
> + * You can obtain one at http://mozilla.org/MPL/2.0/. */

This header has no multiple-include guard.

::: content/media/moz.build
@@ +119,5 @@
>  ]
> +
> +if CONFIG['CPU_ARCH'] == 'arm' and CONFIG['HAVE_ARM_NEON']:
> +    CPP_SOURCES += [
> +        'AudioNodeEngineNEON.cpp',

This won't build. HAVE_ARM_NEON merely asserts that the compiler can build NEON code. It doesn't actually force using -mfloat-abi=softfp -mfpu=neon, which is required for these intrinsics to work.

This was easy to fix in the old Make-based build system, but I have no idea how to do it with moz.build files.
Attachment #772519 - Flags: review?(tterribe) → review+
(In reply to Timothy B. Terriberry (:derf) from comment #25)
> Ugh, there's no 4-register vld1 available from intrinsics? That's pretty
> poor.

FWIW, since you don't actually care how this data is arranged in the registers, you could maybe use vld2q_f32() to load 32 bytes at a time? The cycle cost should be the same as a real 4-register vld1.32.
(Assignee)

Comment 27

5 years ago
(In reply to Timothy B. Terriberry (:derf) from comment #26)
> FWIW, since you don't actually care how this data is arranged in the
> registers, you could maybe use vld2q_f32() to load 32 bytes at a time? The
> cycle cost should be the same as a real 4-register vld1.32.

Using float32x4x2_t in c code will result in stack push/pop instructions which will decrease the speed. 

Lately I've been poking around source code and build flags to see if I can get a better output which includes:
1. -funroll-loop which adds further improvement slightly over code which is already manually unrolled.
2. memory barrier which prevents improper instruction reordering from compiler which generates code where vld1.32 is followed by vmla.f32 immediately and stalls the CPU pipeline.
3. __builtin_prefetch (available from GCC 4.7) which doesn't add visible improvement.
4. __builtin_assume_aligned which doesn't work well as intended.

I think if we need further optimization, we should go to assembly to get most out of it.
(Reporter)

Comment 28

5 years ago
(In reply to jwwang from comment #27)
> Lately I've been poking around source code and build flags to see if I can
> get a better output which includes:
> 1. -funroll-loop which adds further improvement slightly over code which is
> already manually unrolled.
> 2. memory barrier which prevents improper instruction reordering from
> compiler which generates code where vld1.32 is followed by vmla.f32
> immediately and stalls the CPU pipeline.
> 3. __builtin_prefetch (available from GCC 4.7) which doesn't add visible
> improvement.
> 4. __builtin_assume_aligned which doesn't work well as intended.

We should be able to do all of these later in separate bugs.
(Reporter)

Comment 29

5 years ago
(In reply to Timothy B. Terriberry (:derf) from comment #25)
> ::: content/media/moz.build
> @@ +119,5 @@
> >  ]
> > +
> > +if CONFIG['CPU_ARCH'] == 'arm' and CONFIG['HAVE_ARM_NEON']:
> > +    CPP_SOURCES += [
> > +        'AudioNodeEngineNEON.cpp',
> 
> This won't build. HAVE_ARM_NEON merely asserts that the compiler can build
> NEON code. It doesn't actually force using -mfloat-abi=softfp -mfpu=neon,
> which is required for these intrinsics to work.
> 
> This was easy to fix in the old Make-based build system, but I have no idea
> how to do it with moz.build files.

Perhaps gps can tell us how to do that.
Flags: needinfo?(gps)

Updated

5 years ago
Blocks: 894856
(Reporter)

Comment 30

5 years ago
Needinfo?ing some other build system peers.
Flags: needinfo?(ted)
Flags: needinfo?(mh+mozilla)
What exactly do you need to do? Add some compiler flags? You can OS_CXXFLAGS or CXXFLAGS += extra flags in Makefile.in (these variables aren't yet ported to moz.build).
Flags: needinfo?(ted)
Flags: needinfo?(mh+mozilla)
Flags: needinfo?(gps)
(In reply to Gregory Szorc [:gps] from comment #31)
> What exactly do you need to do? Add some compiler flags? You can OS_CXXFLAGS
> or CXXFLAGS += extra flags in Makefile.in (these variables aren't yet ported
> to moz.build).

Which needs to happen only on one file, so you need AudioNodeEngineNEON.$(OBJ_SUFFIX): CXXFLAGS += -mfpu=neon. We'll need a special syntax for thes things in mo.build, but we don't have one yet.
(Assignee)

Comment 33

5 years ago
You can add DEFINES+=['-mfpu=neon'] in moz.build. However, it won't work for it will be overwritten by default flags (-mfpu=vfp). Is this a bug of moz.build?
(Assignee)

Comment 34

5 years ago
Created attachment 778259 [details] [diff] [review]
NEON implementation of AudioNodeEngine.cpp part 1
Attachment #772519 - Attachment is obsolete: true
Attachment #778259 - Flags: review+
(Assignee)

Comment 35

5 years ago
It looks like part 2 can't be landed until Bug 877662 is resolved since it depends on the 16 byte alignment of SharedBuffer.
(In reply to jwwang from comment #33)
> You can add DEFINES+=['-mfpu=neon'] in moz.build. However, it won't work for
> it will be overwritten by default flags (-mfpu=vfp). Is this a bug of
> moz.build?

That's not a DEFINE, so it doesn't get put in the right place in the compile commandline. Use glandium's suggestion in comment 32.
(Assignee)

Comment 37

5 years ago
It looks like Bug 877662 is gonna take a while. Should I remove the dependency and 16-byte alignment assertion so that this patch can land first?

Updated

5 years ago
blocking-b2g: --- → koi+
Whiteboard: [FT: Media Recording, Sprint]

Updated

5 years ago
Whiteboard: [FT: Media Recording, Sprint] → [FT: Media Recording, Sprint 1]

Updated

5 years ago
Flags: needinfo?(ben)
(Reporter)

Comment 38

5 years ago
(In reply to jwwang from comment #37)
> It looks like Bug 877662 is gonna take a while. Should I remove the
> dependency and 16-byte alignment assertion so that this patch can land first?

Sure, it would be nice if you could do that!
(Assignee)

Comment 39

5 years ago
Created attachment 787285 [details] [diff] [review]
Part 1 - Add AudioNodeEngineNEON.cpp
Attachment #778259 - Attachment is obsolete: true
Attachment #787285 - Flags: review+
(Assignee)

Comment 40

5 years ago
Created attachment 787287 [details] [diff] [review]
Part 2 - call NEON functions.

Remove 16-byte alignment which is not required by NEON.
Attachment #770588 - Attachment is obsolete: true
Attachment #787287 - Flags: review+
(Assignee)

Updated

5 years ago
No longer depends on: 877662
(Assignee)

Updated

5 years ago
Keywords: checkin-needed
More specifically, armv6 bustage.
(Assignee)

Comment 44

5 years ago
/usr/bin/ccache /builds/slave/try-and-a6-0000000000000000000/build/android-ndk/toolchains/arm-linux-androideabi-4.7/prebuilt/linux-x86_64/bin/arm-linux-androideabi-g++ -o AudioNodeEngineNEON.o -c  -fvisibility=hidden -DMOZILLA_INTERNAL_API -DIMPL_LIBXUL  -DSTATIC_EXPORTABLE_JS_API -DNO_NSPR_10_SUPPORT -DOS_POSIX=1 -DOS_LINUX=1  -D_IMPL_NS_LAYOUT -I/builds/slave/try-and-a6-0000000000000000000/build/ipc/chromium/src -I/builds/slave/try-and-a6-0000000000000000000/build/ipc/glue -I../../ipc/ipdl/_ipdlheaders  -I/builds/slave/try-and-a6-0000000000000000000/build/content/media -I. -I../../dist/include  -I/builds/slave/try-and-a6-0000000000000000000/build/obj-firefox/dist/include/nspr -I/builds/slave/try-and-a6-0000000000000000000/build/obj-firefox/dist/include/nss     -I/builds/slave/try-and-a6-0000000000000000000/build/content/base/src -I/builds/slave/try-and-a6-0000000000000000000/build/layout/generic -I/builds/slave/try-and-a6-0000000000000000000/build/layout/xul/base/src   -fPIC -isystem /builds/slave/try-and-a6-0000000000000000000/build/android-ndk/platforms/android-9/arch-arm/usr/include  -Wall -Wpointer-arith -Woverloaded-virtual -Werror=return-type -Wtype-limits -Wempty-body -Wsign-compare -Wno-invalid-offsetof -mandroid -fno-short-enums -fno-exceptions -Wno-psabi -march=armv6 -mfpu=vfp -I/builds/slave/try-and-a6-0000000000000000000/build/build/stlport/stlport -I/builds/slave/try-and-a6-0000000000000000000/build/android-ndk/sources/cxx-stl/system/include -fno-exceptions -fno-strict-aliasing -fno-rtti -ffunction-sections -fdata-sections -fno-exceptions -std=gnu++0x -pipe  -DNDEBUG -DTRIMMED -g -Os -freorder-blocks -fno-reorder-functions -fomit-frame-pointer -Werror -Wno-error=uninitialized -Wno-error=deprecated-declarations  -mfpu=neon  -isystem /builds/slave/try-and-a6-0000000000000000000/build/android-ndk/platforms/android-9/arch-arm/usr/include  -DMOZILLA_CLIENT -include ../../mozilla-config.h -MD -MP -MF .deps/AudioNodeEngineNEON.o.pp  /builds/slave/try-and-a6-0000000000000000000/build/content/media/AudioNodeEngineNEON.cpp

There is the "-mfpu=neon" option when compiling AudioNodeEngineNEON.cpp.
However...

In file included from /builds/slave/try-and-a6-0000000000000000000/build/content/media/AudioNodeEngineNEON.cpp:7:0:
/builds/slave/try-and-a6-0000000000000000000/build/android-ndk/toolchains/arm-linux-androideabi-4.7/prebuilt/linux-x86_64/bin/../lib/gcc/arm-linux-androideabi/4.7/include/arm_neon.h:32:2: error: #error You must enable NEON instructions (e.g. -mfloat-abi=softfp -mfpu=neon) to use arm_neon.h

Will march=armv6 cause mfpu=neon to be ignored?
(Reporter)

Comment 45

5 years ago
NEON is not available in ARMv6 AFAIK, so we should only build AudioNodeEngineNEON.cpp for ARMv7, and just use the C++ implementation for ARMv6.  Sorry I did not catch this before. :/
(Assignee)

Comment 46

5 years ago
We have
if CONFIG['CPU_ARCH'] == 'arm' and CONFIG['HAVE_ARM_NEON']:
    CPP_SOURCES += [
        'AudioNodeEngineNEON.cpp',
    ]

in moz.build.

It looks like CONFIG['HAVE_ARM_NEON'] is true when building for ARMv6. We should have this fixed or have some other way to identify CPU arch.
Use BUILD_ARM_NEON instead of HAVE_ARM_NEON.
(Assignee)

Comment 48

5 years ago
It looks like AudioNodeEngineNEON.cpp is still included using BUILD_ARM_NEON.
https://tbpl.mozilla.org/?tree=Try&rev=6fe0abfe1724
(In reply to jwwang from comment #48)
> It looks like AudioNodeEngineNEON.cpp is still included using BUILD_ARM_NEON.
> https://tbpl.mozilla.org/?tree=Try&rev=6fe0abfe1724

Unfortunately, BUILD_ARM_NEON is 0 instead of nothing, so the test doesn't do what you think it does. Feel free to change that in build/autoconf/arch.m4 and js/src/build/autoconf/arch.m4.
Note that part 2 is still likely to fail on armv6 because it will still try to use the functions that are provided by that file you don't want to build.
(Assignee)

Comment 51

5 years ago
Created attachment 789401 [details] [diff] [review]
Part 1 - Add AudioNodeEngineNEON.cpp
Attachment #787285 - Attachment is obsolete: true
Attachment #789401 - Flags: review?(mh+mozilla)
(Assignee)

Comment 52

5 years ago
Created attachment 789404 [details] [diff] [review]
Part 2 - call NEON functions.
Attachment #787287 - Attachment is obsolete: true
Attachment #789404 - Flags: review?(mh+mozilla)
(Assignee)

Comment 53

5 years ago
It is tricky that CONFIG['BUILD_ARM_NEON'] is a string (which is '1') and it returns false when being compared with 1 (the integer).

It is confusing to have both HAVE_ARM_NEON and BUILD_ARM_NEON. I would like to have HAVE_ARM_NEON defined only when $ARM_ARCH >= 7 and remove BUILD_ARM_NEON if feasible.
Comment on attachment 789401 [details] [diff] [review]
Part 1 - Add AudioNodeEngineNEON.cpp

Review of attachment 789401 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/moz.build
@@ +125,5 @@
>      'VideoUtils.cpp',
>      'WebVTTLoadListener.cpp',
>  ]
> +
> +if CONFIG['CPU_ARCH'] == 'arm' and CONFIG['BUILD_ARM_NEON'] == '1':

Please just change the silliness in build/autoconf/arch.m4 (replace 0 with nothing), and just use CONFIG['BUILD_ARM_NEON'] here.
Attachment #789401 - Flags: review?(mh+mozilla) → review-
Attachment #789404 - Flags: review?(mh+mozilla) → review+
(Assignee)

Comment 55

5 years ago
Created attachment 789485 [details] [diff] [review]
Part 1 - Add AudioNodeEngineNEON.cpp
Attachment #789401 - Attachment is obsolete: true
Attachment #789485 - Flags: review?(mh+mozilla)
(Assignee)

Comment 56

5 years ago
Created attachment 789486 [details] [diff] [review]
Part 2 - call NEON functions.
Attachment #789404 - Attachment is obsolete: true
Attachment #789486 - Flags: review+
Attachment #789485 - Flags: review?(mh+mozilla) → review+
(Assignee)

Updated

5 years ago
Keywords: checkin-needed
(Reporter)

Updated

5 years ago
Whiteboard: [FT: Media Recording, Sprint 1] → [FT: Media Recording, Sprint 1][checkin-needed-aurora]
https://hg.mozilla.org/mozilla-central/rev/305facc074df
https://hg.mozilla.org/mozilla-central/rev/84ec68df9294
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla26
https://hg.mozilla.org/releases/mozilla-aurora/rev/141f5c87fbf7
https://hg.mozilla.org/releases/mozilla-aurora/rev/66856ecb3698
status-firefox25: --- → fixed
status-firefox26: --- → fixed
Whiteboard: [FT: Media Recording, Sprint 1][checkin-needed-aurora] → [FT: Media Recording, Sprint 1]
You need to log in before you can comment on or make changes to this bug.