Closed Bug 882171 Opened 12 years ago Closed 12 years ago

Optimize the AudioNodeEngine.cpp routines for NEON

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla26

Project Flags:

blocking-b2g

koi+

Tracking Flags:

Tracking

Status

firefox25

---

fixed

firefox26

---

fixed

People

(Reporter: ehsan.akhgari, Assigned: jwwang)

References

Details

(Whiteboard: [FT: Media Recording, Sprint 1])

Attachments

(4 files, 11 obsolete files)

NEON implementation of AudioNodeEngine.cpp part 1 12 years ago JW Wang [:jwwang] [:jw_wang] 6.68 KB, patch		Details \| Diff \| Splinter Review
NEON implementation of AudioNodeEngine.cpp part 2 12 years ago JW Wang [:jwwang] [:jw_wang] 5.28 KB, patch	ehsan.akhgari : review+	Details \| Diff \| Splinter Review
NEON implementation of AudioNodeEngine.cpp part 1 12 years ago JW Wang [:jwwang] [:jw_wang] 7.07 KB, patch	padenot : review+	Details \| Diff \| Splinter Review
NEON implementation of AudioNodeEngine.cpp part 2 12 years ago JW Wang [:jwwang] [:jw_wang] 5.36 KB, patch	jwwang : review+	Details \| Diff \| Splinter Review
NEON implementation of AudioNodeEngine.cpp part 2 12 years ago JW Wang [:jwwang] [:jw_wang] 5.37 KB, patch	jwwang : review+	Details \| Diff \| Splinter Review
Ｍicrobenchmark - AudioNodeEngineNEON.cpp with loop unrolling 12 years ago JW Wang [:jwwang] [:jw_wang] 5.30 KB, text/x-c++src		Details
Implementation - AudioNodeEngineNEON.cpp with loop unrolling 12 years ago JW Wang [:jwwang] [:jw_wang] 17.31 KB, text/x-c++src		Details
NEON implementation of AudioNodeEngine.cpp part 1 12 years ago JW Wang [:jwwang] [:jw_wang] 11.34 KB, patch	derf : review+	Details \| Diff \| Splinter Review
NEON implementation of AudioNodeEngine.cpp part 1 12 years ago JW Wang [:jwwang] [:jw_wang] 11.97 KB, patch	jwwang : review+	Details \| Diff \| Splinter Review
Part 1 - Add AudioNodeEngineNEON.cpp 12 years ago JW Wang [:jwwang] [:jw_wang] 11.96 KB, patch	jwwang : review+	Details \| Diff \| Splinter Review
Part 2 - call NEON functions. 12 years ago JW Wang [:jwwang] [:jw_wang] 4.53 KB, patch	jwwang : review+	Details \| Diff \| Splinter Review
Part 1 - Add AudioNodeEngineNEON.cpp 12 years ago JW Wang [:jwwang] [:jw_wang] 11.96 KB, patch	glandium : review-	Details \| Diff \| Splinter Review
Part 2 - call NEON functions. 12 years ago JW Wang [:jwwang] [:jw_wang] 5.74 KB, patch	glandium : review+	Details \| Diff \| Splinter Review
Part 1 - Add AudioNodeEngineNEON.cpp 12 years ago JW Wang [:jwwang] [:jw_wang] 13.29 KB, patch	glandium : review+	Details \| Diff \| Splinter Review
Part 2 - call NEON functions. 12 years ago JW Wang [:jwwang] [:jw_wang] 5.75 KB, patch	jwwang : review+	Details \| Diff \| Splinter Review

(no longer active)

Reporter

Description

•

12 years ago

No description provided.

Paul Adenot (:padenot)

Comment 1

•

12 years ago

Ben, would you share your crazy setup you use to write ARM assembly for Opus on your FxOS device with me ? Otherwise, I'm gonna try to find a Pandaboard somewhere.

Flags: needinfo?(ben)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 2

•

12 years ago

Before we do this (or other optimization work) I think we need some benchmarks. Filed bug 882543 for that.

Depends on: 882543

Robert O'Callahan (:roc) (email my personal email if necessary)

Updated

•

12 years ago

No longer depends on: 882543

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 3

•

12 years ago

The SSE2 work is in bug 877662. That bug also makes audio blocks be 16-byte aligned.

Assignee: paul → nobody

Depends on: 877662

u459114

Updated

•

12 years ago

Assignee: nobody → jwwang

JW Wang [:jwwang] [:jw_wang]

Assignee

Comment 4

•

12 years ago

I am planning to use this library. http://projectne10.github.io/Ne10/doc/index.html Or is there a similar library integrated to gecko already?

Paul Adenot (:padenot)

Comment 5

•

12 years ago

This library does not seem to suit our needs, from what I read. Working directly with NEON intrinsics should be easy and fast enough for what we have to do to write the code ourselves without having to rely on a library. If we consider after profiling that the speedup is not that great and that we need more speed, we can then consider writing assembly directly, but this would be way slower to write.

(no longer active)

Reporter

Comment 6

•

12 years ago

Yeah, I agree. We should try to optimize Kiss FFT for NEON and then measure some test cases (ConvolverNode should be the most interesting consumer) and see if the FFT is the bottleneck or not.

JW Wang [:jwwang] [:jw_wang]

Assignee

Comment 7

•

12 years ago

(In reply to Paul Adenot (:padenot) from comment #5) > This library does not seem to suit our needs, from what I read. I fail to see how this library doesn't fit our needs. Below is a list of functions that can be imiplemented in Ne10 functions. void AudioBufferAddWithScale(const float* aInput, float aScale, float* aOutput, uint32_t aSize) * ne10_add_float (when aScale == 1.0f) * ne10_mlac_float AudioBlockCopyChannelWithScale(const float* aInput, float aScale, float* aOutput) * ne10_mulc_float AudioBlockCopyChannelWithScale(const float aInput[WEBAUDIO_BLOCK_SIZE], const float aScale[WEBAUDIO_BLOCK_SIZE], float aOutput[WEBAUDIO_BLOCK_SIZE]) * ne10_mul_float

Paul Adenot (:padenot)

Comment 8

•

12 years ago

I'm sure you can do something with this lib, but I'm not sure it is worth importing it in the tree. ne10_add_float is just a vector add, ne10_mlac_float is just a multiply and accumulate. Nothing difficult to write, or I'm missing something.

JW Wang [:jwwang] [:jw_wang]

Assignee

Comment 9

•

12 years ago

Just a thought about not reinventing the wheel. If it is not worth the porting, let roll our own.

(no longer active)

Reporter

Comment 10

•

12 years ago

(Sorry, for some reason I previously commented on this bug thinking that it's bug 885496!) (In reply to comment #9) > Just a thought about not reinventing the wheel. > If it is not worth the porting, let roll our own. I believe that the code involved here should be fairly simple and localized, and I would prefer to not have to import a large library just so that we can use a few of their routines.

JW Wang [:jwwang] [:jw_wang]

Assignee

Comment 11

•

12 years ago

Attached patch NEON implementation of AudioNodeEngine.cpp part 1 (obsolete) — Details — Splinter Review

Attachment #770000 - Flags: review?(ehsan)

JW Wang [:jwwang] [:jw_wang]

Assignee

Comment 12

•

12 years ago

Attached patch NEON implementation of AudioNodeEngine.cpp part 2 (obsolete) — Details — Splinter Review

Attachment #770001 - Flags: review?(ehsan)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 13

•

12 years ago

Comment on attachment 770000 [details] [diff] [review] NEON implementation of AudioNodeEngine.cpp part 1 Review of attachment 770000 [details] [diff] [review]: ----------------------------------------------------------------- ::: content/media/AudioNodeEngineNEON.cpp @@ +29,5 @@ > + for (unsigned i = 0; i < aSize; i+=4) { > + vin = vld1q_f32((float32_t*)aInput+i); > + vout = vld1q_f32((float32_t*)aOutput+i); > + vout = vmlaq_f32(vout, vin, vscale); > + vst1q_f32((float32_t*)aOutput+i, vout); This doesn't properly handle the case where aSize is not a multiple of 4. @@ +83,5 @@ > + > + for (uint32_t i = 0; i < aSize * aChannelCount; i+=4) { > + vin = vld1q_f32((float32_t*)aBlock+i); > + vout = vmulq_f32(vin, vscale); > + vst1q_f32((float32_t*)aBlock+i, vout); Same here.

JW Wang [:jwwang] [:jw_wang]

Assignee

Comment 14

•

12 years ago

Attached patch NEON implementation of AudioNodeEngine.cpp part 1 (obsolete) — Details — Splinter Review

Handle the case where array size is not a multiple of 4.

Attachment #770000 - Attachment is obsolete: true

Attachment #770000 - Flags: review?(ehsan)

Attachment #770086 - Flags: review?(roc)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 15

•

12 years ago

Attachment #770086 - Flags: review?(roc) → review?(paul)

Paul Adenot (:padenot)

Comment 16

•

12 years ago

Comment on attachment 770086 [details] [diff] [review] NEON implementation of AudioNodeEngine.cpp part 1 Review of attachment 770086 [details] [diff] [review]: ----------------------------------------------------------------- Could you try hand unrolling a couple functions, to check if we can get some speedup? Just to check if it is worth it. For example, I could get 50% speedup by unrolling the inplace gain implementation four times when using SSE. derf, could you sanity check this if you have a minute? I'm afraid I'm ignorant when it comes to ARM. ::: content/media/AudioNodeEngineNEON.cpp @@ +30,5 @@ > + aSize -= dif; > + unsigned i = 0; > + for (; i < aSize; i+=4) { > + vin = vld1q_f32((float32_t*)aInput+i); > + vout = vld1q_f32((float32_t*)aOutput+i); I remember that derf told me it was preferable to use array indexing rather that pointer arithmetic, in terms of speed. Anyways, I'd rather have static_casts at the beginning of the function that cluttering the inner loop.

Attachment #770086 - Flags: review?(tterribe)

Attachment #770086 - Flags: review?(paul)

Attachment #770086 - Flags: review+

(no longer active)

Reporter

Comment 17

•

12 years ago

Comment on attachment 770001 [details] [diff] [review] NEON implementation of AudioNodeEngine.cpp part 2 Review of attachment 770001 [details] [diff] [review]: ----------------------------------------------------------------- ::: content/media/AudioNodeEngine.cpp @@ +22,5 @@ > aChunk->mDuration = WEBAUDIO_BLOCK_SIZE; > aChunk->mChannelData.SetLength(aChannelCount); > float* data = static_cast<float*>(buffer->Data()); > for (uint32_t i = 0; i < aChannelCount; ++i) { > + float* check = data + i*WEBAUDIO_BLOCK_SIZE; Nit: please call this channelData.

Attachment #770001 - Flags: review?(ehsan) → review+

JW Wang [:jwwang] [:jw_wang]

Assignee

Comment 18

•

12 years ago

Attached patch NEON implementation of AudioNodeEngine.cpp part 2 (obsolete) — Details — Splinter Review

Attachment #770001 - Attachment is obsolete: true

Attachment #770587 - Flags: review+

JW Wang [:jwwang] [:jw_wang]

Assignee

Comment 19

•

12 years ago

Attached patch NEON implementation of AudioNodeEngine.cpp part 2 (obsolete) — Details — Splinter Review

Attachment #770587 - Attachment is obsolete: true

Attachment #770588 - Flags: review+

JW Wang [:jwwang] [:jw_wang]

Assignee

Comment 20

•

12 years ago

Attached file Ｍicrobenchmark - AudioNodeEngineNEON.cpp with loop unrolling — Details

test results: AudioBufferAddWithScale : loop=1000000, time=1.553925 AudioBufferAddWithScale_NEON : loop=1000000, time=0.730693 AudioBufferAddWithScale_NEON_unroll : loop=1000000, time=0.741942 AudioBufferAddWithScale_NEON_unroll2 : loop=1000000, time=0.701317 AudioBlockCopyChannelWithScale : loop=1000000, time=0.788323 AudioBlockCopyChannelWithScale_NEON : loop=1000000, time=0.635880 AudioBlockCopyChannelWithScale_NEON_unroll : loop=1000000, time=0.495064 AudioBlockCopyChannelWithScale_NEON_unroll2 : loop=1000000, time=0.503057 AudioBlockCopyChannelWithScale : loop=1000000, time=1.164082 AudioBlockCopyChannelWithScale_NEON : loop=1000000, time=0.838090 AudioBlockCopyChannelWithScale_NEON_unroll : loop=1000000, time=0.678961 AudioBlockCopyChannelWithScale_NEON_unroll2 : loop=1000000, time=0.663595 AudioBufferInPlaceScale : loop=1000000, time=0.923929 AudioBufferInPlaceScale_NEON : loop=1000000, time=0.535680 AudioBufferInPlaceScale_NEON_unroll : loop=1000000, time=0.469168 AudioBufferInPlaceScale_NEON_unroll2 : loop=1000000, time=0.472063 AudioBlockPanStereoToStereo : loop=1000000, time=1.942470 AudioBlockPanStereoToStereo_NEON : loop=1000000, time=1.273460 AudioBlockPanStereoToStereo_NEON_unroll : loop=1000000, time=1.103122 AudioBlockPanStereoToStereo_NEON_unroll2 : loop=1000000, time=1.114045 Generally, unroll level 4 is better than those without unrolling. However, for some functions (ex: AudioBufferAddWithScale_NEON_unroll2), unroll level 2 is better than level 4. I think we might have to tweak functions case by case. Btw, there is no performance difference whether to use pointer arithmetic or array indexing. I guess the compiler is smart enough to figure this out.

JW Wang [:jwwang] [:jw_wang]

Assignee

Comment 21

•

12 years ago

Attached file Implementation - AudioNodeEngineNEON.cpp with loop unrolling — Details

The file that goes with the benchmark test.

u459114

Comment 22

•

12 years ago

http://en.wikipedia.org/wiki/Software_pipelining

Timothy B. Terriberry (:derf)

Comment 23

•

12 years ago

(In reply to jwwang from comment #20) > Btw, there is no performance difference whether to use pointer arithmetic or > array indexing. I guess the compiler is smart enough to figure this out. Well, the important point, I think, is whether you change the base pointer in the loop (vin = vld1q_f32((float32_t*)aInput); aInput+=4;) vs. accessing it with the loop index (as you do here). I don't think the exact syntax matters. This is just a general observation that compilers usually produce better code in the latter case, but it may not always apply. See, e.g., <http://support.amd.com/us/Processor_TechDocs/47414_15h_sw_opt_guide.pdf> Section 3.2.

JW Wang [:jwwang] [:jw_wang]

Assignee

Comment 24

•

12 years ago

Attached patch NEON implementation of AudioNodeEngine.cpp part 1 (obsolete) — Details — Splinter Review

With loop unrolling.

Attachment #770086 - Attachment is obsolete: true

Attachment #770086 - Flags: review?(tterribe)

Attachment #772519 - Flags: review?(tterribe)

Timothy B. Terriberry (:derf)

Comment 25

•

12 years ago

Comment on attachment 772519 [details] [diff] [review] NEON implementation of AudioNodeEngine.cpp part 1 Review of attachment 772519 [details] [diff] [review]: ----------------------------------------------------------------- I took a look at the actual asm this generates, and it is moderately awful, as one would expect from intrinsics (lots of redundant addressing calculations, unaligned load/stores, small load/stores instead of large ones, etc.). It all looks like it will work, though. r=me once you fix the build issue. ::: content/media/AudioNodeEngineNEON.cpp @@ +22,5 @@ > + float* aOutput, > + uint32_t aSize) > +{ > + ASSERT_ALIGNED(aInput); > + ASSERT_ALIGNED(aOutput); I don't think you actually get any benefit from this, because there are no intrinsics that specifically take aligned pointers. All the asm it generates uses unaligned loads. @@ +33,5 @@ > + aSize -= dif; > + unsigned i = 0; > + for (; i < aSize; i+=16) { > + vin0 = vld1q_f32(ADDRESS_OF(aInput, i)); > + vin1 = vld1q_f32(ADDRESS_OF(aInput, i+4)); Ugh, there's no 4-register vld1 available from intrinsics? That's pretty poor. ::: content/media/AudioNodeEngineNEON.h @@ +1,4 @@ > +/* -*- mode: c++; tab-width: 2; indent-tabs-mode: nil; c-basic-offset: 2 -*- */ > +/* this source code form is subject to the terms of the mozilla public > + * license, v. 2.0. if a copy of the mpl was not distributed with this file, > + * You can obtain one at http://mozilla.org/MPL/2.0/. */ This header has no multiple-include guard. ::: content/media/moz.build @@ +119,5 @@ > ] > + > +if CONFIG['CPU_ARCH'] == 'arm' and CONFIG['HAVE_ARM_NEON']: > + CPP_SOURCES += [ > + 'AudioNodeEngineNEON.cpp', This won't build. HAVE_ARM_NEON merely asserts that the compiler can build NEON code. It doesn't actually force using -mfloat-abi=softfp -mfpu=neon, which is required for these intrinsics to work. This was easy to fix in the old Make-based build system, but I have no idea how to do it with moz.build files.

Attachment #772519 - Flags: review?(tterribe) → review+

Timothy B. Terriberry (:derf)

Comment 26

•

12 years ago

(In reply to Timothy B. Terriberry (:derf) from comment #25) > Ugh, there's no 4-register vld1 available from intrinsics? That's pretty > poor. FWIW, since you don't actually care how this data is arranged in the registers, you could maybe use vld2q_f32() to load 32 bytes at a time? The cycle cost should be the same as a real 4-register vld1.32.

JW Wang [:jwwang] [:jw_wang]

Assignee

Comment 27

•

12 years ago

(In reply to Timothy B. Terriberry (:derf) from comment #26) > FWIW, since you don't actually care how this data is arranged in the > registers, you could maybe use vld2q_f32() to load 32 bytes at a time? The > cycle cost should be the same as a real 4-register vld1.32. Using float32x4x2_t in c code will result in stack push/pop instructions which will decrease the speed. Lately I've been poking around source code and build flags to see if I can get a better output which includes: 1. -funroll-loop which adds further improvement slightly over code which is already manually unrolled. 2. memory barrier which prevents improper instruction reordering from compiler which generates code where vld1.32 is followed by vmla.f32 immediately and stalls the CPU pipeline. 3. __builtin_prefetch (available from GCC 4.7) which doesn't add visible improvement. 4. __builtin_assume_aligned which doesn't work well as intended. I think if we need further optimization, we should go to assembly to get most out of it.

(no longer active)

Reporter

Comment 28

•

12 years ago

(In reply to jwwang from comment #27) > Lately I've been poking around source code and build flags to see if I can > get a better output which includes: > 1. -funroll-loop which adds further improvement slightly over code which is > already manually unrolled. > 2. memory barrier which prevents improper instruction reordering from > compiler which generates code where vld1.32 is followed by vmla.f32 > immediately and stalls the CPU pipeline. > 3. __builtin_prefetch (available from GCC 4.7) which doesn't add visible > improvement. > 4. __builtin_assume_aligned which doesn't work well as intended. We should be able to do all of these later in separate bugs.

(no longer active)

Reporter

Comment 29

•

12 years ago

(In reply to Timothy B. Terriberry (:derf) from comment #25) > ::: content/media/moz.build > @@ +119,5 @@ > > ] > > + > > +if CONFIG['CPU_ARCH'] == 'arm' and CONFIG['HAVE_ARM_NEON']: > > + CPP_SOURCES += [ > > + 'AudioNodeEngineNEON.cpp', > > This won't build. HAVE_ARM_NEON merely asserts that the compiler can build > NEON code. It doesn't actually force using -mfloat-abi=softfp -mfpu=neon, > which is required for these intrinsics to work. > > This was easy to fix in the old Make-based build system, but I have no idea > how to do it with moz.build files. Perhaps gps can tell us how to do that.

Flags: needinfo?(gps)

Ivan Tsay (:ITsay)

Updated

•

12 years ago

Blocks: 894856

(no longer active)

Reporter

Comment 30

•

12 years ago

Needinfo?ing some other build system peers.

Flags: needinfo?(ted)

Flags: needinfo?(mh+mozilla)

Gregory Szorc [:gps]

Comment 31

•

12 years ago

What exactly do you need to do? Add some compiler flags? You can OS_CXXFLAGS or CXXFLAGS += extra flags in Makefile.in (these variables aren't yet ported to moz.build).

Flags: needinfo?(ted)

Flags: needinfo?(mh+mozilla)

Flags: needinfo?(gps)

Mike Hommey [:glandium]

Comment 32

•

12 years ago

(In reply to Gregory Szorc [:gps] from comment #31) > What exactly do you need to do? Add some compiler flags? You can OS_CXXFLAGS > or CXXFLAGS += extra flags in Makefile.in (these variables aren't yet ported > to moz.build). Which needs to happen only on one file, so you need AudioNodeEngineNEON.$(OBJ_SUFFIX): CXXFLAGS += -mfpu=neon. We'll need a special syntax for thes things in mo.build, but we don't have one yet.

JW Wang [:jwwang] [:jw_wang]

Assignee

Comment 33

•

12 years ago

You can add DEFINES+=['-mfpu=neon'] in moz.build. However, it won't work for it will be overwritten by default flags (-mfpu=vfp). Is this a bug of moz.build?

JW Wang [:jwwang] [:jw_wang]

Assignee

Comment 34

•

12 years ago

Attached patch NEON implementation of AudioNodeEngine.cpp part 1 (obsolete) — Details — Splinter Review

Attachment #772519 - Attachment is obsolete: true

Attachment #778259 - Flags: review+

JW Wang [:jwwang] [:jw_wang]

Assignee

Comment 35

•

12 years ago

It looks like part 2 can't be landed until Bug 877662 is resolved since it depends on the 16 byte alignment of SharedBuffer.

(not currently active) Ted Mielczarek

Comment 36

•

12 years ago

(In reply to jwwang from comment #33) > You can add DEFINES+=['-mfpu=neon'] in moz.build. However, it won't work for > it will be overwritten by default flags (-mfpu=vfp). Is this a bug of > moz.build? That's not a DEFINE, so it doesn't get put in the right place in the compile commandline. Use glandium's suggestion in comment 32.

JW Wang [:jwwang] [:jw_wang]

Assignee

Comment 37

•

12 years ago

It looks like Bug 877662 is gonna take a while. Should I remove the dependency and 16-byte alignment assertion so that this patch can land first?

Ivan Tsay (:ITsay)

Updated

•

12 years ago

blocking-b2g: --- → koi+

Whiteboard: [FT: Media Recording, Sprint]

Ivan Tsay (:ITsay)

Updated

•

12 years ago

Whiteboard: [FT: Media Recording, Sprint] → [FT: Media Recording, Sprint 1]

Paul Adenot (:padenot)

Updated

•

12 years ago

Flags: needinfo?(ben)

(no longer active)

Reporter

Comment 38

•

12 years ago

(In reply to jwwang from comment #37) > It looks like Bug 877662 is gonna take a while. Should I remove the > dependency and 16-byte alignment assertion so that this patch can land first? Sure, it would be nice if you could do that!

JW Wang [:jwwang] [:jw_wang]

Assignee

Comment 39

•

12 years ago

Attached patch Part 1 - Add AudioNodeEngineNEON.cpp (obsolete) — Details — Splinter Review

Attachment #778259 - Attachment is obsolete: true

Attachment #787285 - Flags: review+

JW Wang [:jwwang] [:jw_wang]

Assignee

Comment 40

•

12 years ago

Attached patch Part 2 - call NEON functions. (obsolete) — Details — Splinter Review

Remove 16-byte alignment which is not required by NEON.

Attachment #770588 - Attachment is obsolete: true

Attachment #787287 - Flags: review+

JW Wang [:jwwang] [:jw_wang]

Assignee

Updated

•

12 years ago

No longer depends on: 877662

JW Wang [:jwwang] [:jw_wang]

Assignee

Updated

•

12 years ago

Keywords: checkin-needed

Ryan VanderMeulen [:RyanVM]

Comment 41

•

12 years ago

https://hg.mozilla.org/integration/b2g-inbound/rev/a232e4df28e5 https://hg.mozilla.org/integration/b2g-inbound/rev/6126987777f3

Keywords: checkin-needed

Ryan VanderMeulen [:RyanVM]

Comment 42

•

12 years ago

Backed out for Android bustage. https://hg.mozilla.org/integration/b2g-inbound/rev/21dcb96139ee https://tbpl.mozilla.org/php/getParsedLog.php?id=26316759&tree=B2g-Inbound

Ryan VanderMeulen [:RyanVM]

Comment 43

•

12 years ago

More specifically, armv6 bustage.

JW Wang [:jwwang] [:jw_wang]

Assignee

Comment 44

•

12 years ago

/usr/bin/ccache /builds/slave/try-and-a6-0000000000000000000/build/android-ndk/toolchains/arm-linux-androideabi-4.7/prebuilt/linux-x86_64/bin/arm-linux-androideabi-g++ -o AudioNodeEngineNEON.o -c -fvisibility=hidden -DMOZILLA_INTERNAL_API -DIMPL_LIBXUL -DSTATIC_EXPORTABLE_JS_API -DNO_NSPR_10_SUPPORT -DOS_POSIX=1 -DOS_LINUX=1 -D_IMPL_NS_LAYOUT -I/builds/slave/try-and-a6-0000000000000000000/build/ipc/chromium/src -I/builds/slave/try-and-a6-0000000000000000000/build/ipc/glue -I../../ipc/ipdl/_ipdlheaders -I/builds/slave/try-and-a6-0000000000000000000/build/content/media -I. -I../../dist/include -I/builds/slave/try-and-a6-0000000000000000000/build/obj-firefox/dist/include/nspr -I/builds/slave/try-and-a6-0000000000000000000/build/obj-firefox/dist/include/nss -I/builds/slave/try-and-a6-0000000000000000000/build/content/base/src -I/builds/slave/try-and-a6-0000000000000000000/build/layout/generic -I/builds/slave/try-and-a6-0000000000000000000/build/layout/xul/base/src -fPIC -isystem /builds/slave/try-and-a6-0000000000000000000/build/android-ndk/platforms/android-9/arch-arm/usr/include -Wall -Wpointer-arith -Woverloaded-virtual -Werror=return-type -Wtype-limits -Wempty-body -Wsign-compare -Wno-invalid-offsetof -mandroid -fno-short-enums -fno-exceptions -Wno-psabi -march=armv6 -mfpu=vfp -I/builds/slave/try-and-a6-0000000000000000000/build/build/stlport/stlport -I/builds/slave/try-and-a6-0000000000000000000/build/android-ndk/sources/cxx-stl/system/include -fno-exceptions -fno-strict-aliasing -fno-rtti -ffunction-sections -fdata-sections -fno-exceptions -std=gnu++0x -pipe -DNDEBUG -DTRIMMED -g -Os -freorder-blocks -fno-reorder-functions -fomit-frame-pointer -Werror -Wno-error=uninitialized -Wno-error=deprecated-declarations -mfpu=neon -isystem /builds/slave/try-and-a6-0000000000000000000/build/android-ndk/platforms/android-9/arch-arm/usr/include -DMOZILLA_CLIENT -include ../../mozilla-config.h -MD -MP -MF .deps/AudioNodeEngineNEON.o.pp /builds/slave/try-and-a6-0000000000000000000/build/content/media/AudioNodeEngineNEON.cpp There is the "-mfpu=neon" option when compiling AudioNodeEngineNEON.cpp. However... In file included from /builds/slave/try-and-a6-0000000000000000000/build/content/media/AudioNodeEngineNEON.cpp:7:0: /builds/slave/try-and-a6-0000000000000000000/build/android-ndk/toolchains/arm-linux-androideabi-4.7/prebuilt/linux-x86_64/bin/../lib/gcc/arm-linux-androideabi/4.7/include/arm_neon.h:32:2: error: #error You must enable NEON instructions (e.g. -mfloat-abi=softfp -mfpu=neon) to use arm_neon.h Will march=armv6 cause mfpu=neon to be ignored?

(no longer active)

Reporter

Comment 45

•

12 years ago

NEON is not available in ARMv6 AFAIK, so we should only build AudioNodeEngineNEON.cpp for ARMv7, and just use the C++ implementation for ARMv6. Sorry I did not catch this before. :/

JW Wang [:jwwang] [:jw_wang]

Assignee

Comment 46

•

12 years ago

We have if CONFIG['CPU_ARCH'] == 'arm' and CONFIG['HAVE_ARM_NEON']: CPP_SOURCES += [ 'AudioNodeEngineNEON.cpp', ] in moz.build. It looks like CONFIG['HAVE_ARM_NEON'] is true when building for ARMv6. We should have this fixed or have some other way to identify CPU arch.

Mike Hommey [:glandium]

Comment 47

•

12 years ago

Use BUILD_ARM_NEON instead of HAVE_ARM_NEON.

JW Wang [:jwwang] [:jw_wang]

Assignee

Comment 48

•

12 years ago

It looks like AudioNodeEngineNEON.cpp is still included using BUILD_ARM_NEON. https://tbpl.mozilla.org/?tree=Try&rev=6fe0abfe1724

Mike Hommey [:glandium]

Comment 49

•

12 years ago

(In reply to jwwang from comment #48) > It looks like AudioNodeEngineNEON.cpp is still included using BUILD_ARM_NEON. > https://tbpl.mozilla.org/?tree=Try&rev=6fe0abfe1724 Unfortunately, BUILD_ARM_NEON is 0 instead of nothing, so the test doesn't do what you think it does. Feel free to change that in build/autoconf/arch.m4 and js/src/build/autoconf/arch.m4.

Mike Hommey [:glandium]

Comment 50

•

12 years ago

Note that part 2 is still likely to fail on armv6 because it will still try to use the functions that are provided by that file you don't want to build.

JW Wang [:jwwang] [:jw_wang]

Assignee

Comment 51

•

12 years ago

Attached patch Part 1 - Add AudioNodeEngineNEON.cpp (obsolete) — Details — Splinter Review

Attachment #787285 - Attachment is obsolete: true

Attachment #789401 - Flags: review?(mh+mozilla)

JW Wang [:jwwang] [:jw_wang]

Assignee

Comment 52

•

12 years ago

Attached patch Part 2 - call NEON functions. (obsolete) — Details — Splinter Review

Attachment #787287 - Attachment is obsolete: true

Attachment #789404 - Flags: review?(mh+mozilla)

JW Wang [:jwwang] [:jw_wang]

Assignee

Comment 53

•

12 years ago

It is tricky that CONFIG['BUILD_ARM_NEON'] is a string (which is '1') and it returns false when being compared with 1 (the integer). It is confusing to have both HAVE_ARM_NEON and BUILD_ARM_NEON. I would like to have HAVE_ARM_NEON defined only when $ARM_ARCH >= 7 and remove BUILD_ARM_NEON if feasible.

Mike Hommey [:glandium]

Comment 54

•

12 years ago

Comment on attachment 789401 [details] [diff] [review] Part 1 - Add AudioNodeEngineNEON.cpp Review of attachment 789401 [details] [diff] [review]: ----------------------------------------------------------------- ::: content/media/moz.build @@ +125,5 @@ > 'VideoUtils.cpp', > 'WebVTTLoadListener.cpp', > ] > + > +if CONFIG['CPU_ARCH'] == 'arm' and CONFIG['BUILD_ARM_NEON'] == '1': Please just change the silliness in build/autoconf/arch.m4 (replace 0 with nothing), and just use CONFIG['BUILD_ARM_NEON'] here.

Attachment #789401 - Flags: review?(mh+mozilla) → review-

Mike Hommey [:glandium]

Updated

•

12 years ago

Attachment #789404 - Flags: review?(mh+mozilla) → review+

JW Wang [:jwwang] [:jw_wang]

Assignee

Comment 55

•

12 years ago

Attached patch Part 1 - Add AudioNodeEngineNEON.cpp — Details — Splinter Review

Attachment #789401 - Attachment is obsolete: true

Attachment #789485 - Flags: review?(mh+mozilla)

JW Wang [:jwwang] [:jw_wang]

Assignee

Comment 56

•

12 years ago

Attached patch Part 2 - call NEON functions. — Details — Splinter Review

Attachment #789404 - Attachment is obsolete: true

Attachment #789486 - Flags: review+

Mike Hommey [:glandium]

Updated

•

12 years ago

Attachment #789485 - Flags: review?(mh+mozilla) → review+

JW Wang [:jwwang] [:jw_wang]

Assignee

Updated

•

12 years ago

Keywords: checkin-needed

Ryan VanderMeulen [:RyanVM]

Comment 57

•

12 years ago

https://hg.mozilla.org/integration/b2g-inbound/rev/305facc074df https://hg.mozilla.org/integration/b2g-inbound/rev/84ec68df9294

Keywords: checkin-needed

(no longer active)

Reporter

Updated

•

12 years ago

Whiteboard: [FT: Media Recording, Sprint 1] → [FT: Media Recording, Sprint 1][checkin-needed-aurora]

Ryan VanderMeulen [:RyanVM]

Comment 58

•

12 years ago

https://hg.mozilla.org/mozilla-central/rev/305facc074df https://hg.mozilla.org/mozilla-central/rev/84ec68df9294

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla26

Ryan VanderMeulen [:RyanVM]

Comment 59

•

12 years ago

https://hg.mozilla.org/releases/mozilla-aurora/rev/141f5c87fbf7 https://hg.mozilla.org/releases/mozilla-aurora/rev/66856ecb3698

status-firefox25: --- → fixed

status-firefox26: --- → fixed

Whiteboard: [FT: Media Recording, Sprint 1][checkin-needed-aurora] → [FT: Media Recording, Sprint 1]

You need to log in before you can comment on or make changes to this bug.