Closed Bug 946021 Opened 10 years ago Closed 10 years ago

Enable ARM assembly in libopus ARM build

Categories

(Core :: Audio/Video, defect)

ARM
All
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla31

People

(Reporter: mark.hsj, Assigned: gcp)

References

Details

Attachments

(2 files, 3 obsolete files)

libopus includes assembly optimizations for ARM, but they are not enabled when
building gecko/media/libopus for ARM.  Instead the generic C code is built.
For improved performance, the assembly optimizations should be enabled when possible.
Assignee: nobody → giles
Summary: Enable assembly optimizations in libopus ARM build → Enable ARM assembly in libopus ARM build
Attached patch Hack for testing (obsolete) — Splinter Review
Blocks: 979716
Steven - can you try this on B2G and see how if it works, and how much difference it makes?
gcp/blassey: who should/can test this on Android?
Flags: needinfo?(slee)
Flags: needinfo?(gpascutto)
Flags: needinfo?(blassey.bugs)
Randall, can you post a try build and suggest what needs to be tested?
Flags: needinfo?(blassey.bugs)
Looking for a comparison for CPU time used in Opus encode and decode (audio-only calls).  NOTE: they run on webrtc threads that are not registered with the Gecko profiler.  A system or other external profiler will be needed.  (Even top wouldn't hurt to record avg CPU in an audio only webrtc call).
Can we wrap the webrtc calls in mozprofiler hooks? I experimented with that some, but the obvious places are C code and our hooks need to call C++.
media/webrtc/trunk/webrtc/modules/audio_coding/main/acm2/acm_opus.cc
or 
media/webrtc/trunk/webrtc/modules/audio_coding/main/source/acm_opus.cc

(not sure which one is being used now - hit both)
(In reply to Randell Jesup [:jesup] from comment #2)
> Steven - can you try this on B2G and see how if it works, and how much
> difference it makes?
I tried on my unagi(my peak is in the office). When applying the patch, the cpu usage of encoder is higher than before. I will try it on my peak on Monday again.

(In reply to Randell Jesup [:jesup] from comment #7)
> media/webrtc/trunk/webrtc/modules/audio_coding/main/acm2/acm_opus.cc
From the profiling data, it should use this version
The original patch does not really enable arm. This patch could tell the build system to include arm code. But it fails when compiling media/libopus/celt/arm/celt_pitch_xcorr_arm.s. I don't know how to read *.s files. Can someone help?

Here is part of the error.
/home/steven/workspace/b2g/mozilla-central/media/libopus/celt/arm/celt_pitch_xcorr_arm.s: Assembler messages:
/home/steven/workspace/b2g/mozilla-central/media/libopus/celt/arm/celt_pitch_xcorr_arm.s:1: Error: bad instruction `copyright (c)2007-2008 CSIRO'
/home/steven/workspace/b2g/mozilla-central/media/libopus/celt/arm/celt_pitch_xcorr_arm.s:2: Error: bad instruction `copyright (c)2007-2009 Xiph.Org Foundation'
/home/steven/workspace/b2g/mozilla-central/media/libopus/celt/arm/celt_pitch_xcorr_arm.s:3: Error: bad instruction `copyright (c)2013 Parrot'
/home/steven/workspace/b2g/mozilla-central/media/libopus/celt/arm/celt_pitch_xcorr_arm.s:4: Error: bad instruction `written by Aurélien Zanelli'
/home/steven/workspace/b2g/mozilla-central/media/libopus/celt/arm/celt_pitch_xcorr_arm.s:6: Error: bad instruction `redistribution and use in source and binary forms,with or without'
/home/steven/workspace/b2g/mozilla-central/media/libopus/celt/arm/celt_pitch_xcorr_arm.s:7: Error: bad instruction `modification, are permitted provided that the following conditions'
/home/steven/workspace/b2g/mozilla-central/media/libopus/celt/arm/celt_pitch_xcorr_arm.s:8: Error: bad instruction `are met:'
/home/steven/workspace/b2g/mozilla-central/media/libopus/celt/arm/celt_pitch_xcorr_arm.s:10: Error: junk at end of line, first unrecognized character is `-'
/home/steven/workspace/b2g/mozilla-central/media/libopus/celt/arm/celt_pitch_xcorr_arm.s:11: Error: bad instruction `notice, this list of conditions and the following disclaimer.'
/home/steven/workspace/b2g/mozilla-central/media/libopus/celt/arm/celt_pitch_xcorr_arm.s:13: Error: junk at end of line, first unrecognized character is `-'
/home/steven/workspace/b2g/mozilla-central/media/libopus/celt/arm/celt_pitch_xcorr_arm.s:14: Error: bad instruction `notice, this list of conditions and the following disclaimer in the'
/home/steven/workspace/b2g/mozilla-central/media/libopus/celt/arm/celt_pitch_xcorr_arm.s:15: Error: bad instruction `documentation and/or other materials provided with the distribution.'
/home/steven/workspace/b2g/mozilla-central/media/libopus/celt/arm/celt_pitch_xcorr_arm.s:17: Error: bad instruction `this SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS'
/home/steven/workspace/b2g/mozilla-central/media/libopus/celt/arm/celt_pitch_xcorr_arm.s:18: Error: junk at end of line, first unrecognized character is ``'
/home/steven/workspace/b2g/mozilla-central/media/libopus/celt/arm/celt_pitch_xcorr_arm.s:19: Error: bad instruction `limited TO,THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR'
/home/steven/workspace/b2g/mozilla-central/media/libopus/celt/arm/celt_pitch_xcorr_arm.s:20: Error: bad instruction `a PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER'
/home/steven/workspace/b2g/mozilla-central/media/libopus/celt/arm/celt_pitch_xcorr_arm.s:21: Error: bad instruction `or CONTRIBUTORS BE LIABLE FOR ANY DIRECT,INDIRECT,INCIDENTAL,SPECIAL,'
/home/steven/workspace/b2g/mozilla-central/media/libopus/celt/arm/celt_pitch_xcorr_arm.s:22: Error: bad instruction `exemplary, OR CONSEQUENTIAL DAMAGES(INCLUDING,BUT NOT LIMITED TO,'
/home/steven/workspace/b2g/mozilla-central/media/libopus/celt/arm/celt_pitch_xcorr_arm.s:23: Error: bad instruction `procurement OF SUBSTITUTE GOODS OR SERVICES'
/home/steven/workspace/b2g/mozilla-central/media/libopus/celt/arm/celt_pitch_xcorr_arm.s:23: Error: bad instruction `loss OF USE,DATA,OR'
/home/steven/workspace/b2g/mozilla-central/media/libopus/celt/arm/celt_pitch_xcorr_arm.s:24: Error: bad instruction `profits'
/home/steven/workspace/b2g/mozilla-central/media/libopus/celt/arm/celt_pitch_xcorr_arm.s:24: Error: bad instruction `or BUSINESS INTERRUPTION)HOWEVER CAUSED AND ON ANY THEORY OF'
/home/steven/workspace/b2g/mozilla-central/media/libopus/celt/arm/celt_pitch_xcorr_arm.s:25: Error: bad instruction `liability, WHETHER IN CONTRACT,STRICT LIABILITY,OR TORT(INCLUDING'
/home/steven/workspace/b2g/mozilla-central/media/libopus/celt/arm/celt_pitch_xcorr_arm.s:26: Error: bad instruction `negligence OR OTHERWISE)ARISING IN ANY WAY OUT OF THE USE OF THIS'
/home/steven/workspace/b2g/mozilla-central/media/libopus/celt/arm/celt_pitch_xcorr_arm.s:27: Error: bad instruction `software, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.'
/home/steven/workspace/b2g/mozilla-central/media/libopus/celt/arm/celt_pitch_xcorr_arm.s:28: Error: bad instruction `area |.text|,CODE,READONLY'
/home/steven/workspace/b2g/mozilla-central/media/libopus/celt/arm/celt_pitch_xcorr_arm.s:30: Error: bad instruction `get celt/arm/armopts.s'
/home/steven/workspace/b2g/mozilla-central/media/libopus/celt/arm/celt_pitch_xcorr_arm.s:32: Error: bad instruction `if OPUS_ARM_MAY_HAVE_EDSP'
/home/steven/workspace/b2g/mozilla-central/media/libopus/celt/arm/celt_pitch_xcorr_arm.s:33: Error: bad instruction `export celt_pitch_xcorr_edsp'
/home/steven/workspace/b2g/mozilla-central/media/libopus/celt/arm/celt_pitch_xcorr_arm.s:34: Error: bad instruction `endif'
/home/steven/workspace/b2g/mozilla-central/media/libopus/celt/arm/celt_pitch_xcorr_arm.s:36: Error: bad instruction `if OPUS_ARM_MAY_HAVE_NEON'
/home/steven/workspace/b2g/mozilla-central/media/libopus/celt/arm/celt_pitch_xcorr_arm.s:37: Error: bad instruction `export celt_pitch_xcorr_neon'
/home/steven/workspace/b2g/mozilla-central/media/libopus/celt/arm/celt_pitch_xcorr_arm.s:38: Error: bad instruction `endif'
/home/steven/workspace/b2g/mozilla-central/media/libopus/celt/arm/celt_pitch_xcorr_arm.s:40: Error: bad instruction `if OPUS_ARM_MAY_HAVE_NEON'
Flags: needinfo?(slee)
(In reply to StevenLee[:slee] from comment #9)
> build system to include arm code. But it fails when compiling
> media/libopus/celt/arm/celt_pitch_xcorr_arm.s. I don't know how to read *.s
> files. Can someone help?

The .s file is using RVCT (ARM's assembler) syntax. To build with gcc you need to translate to gcc syntax. You need the celt/arm/arm2gnu.pl script from the upstream repository to do the translation.
There are still some compiling error with the patch. I manually generate .S files and put them in the objdir. Here is the profiling data on peak, https://bugzilla.mozilla.org/attachment.cgi?id=8388962
Assuming this is still full complexity, there's a small improvement: 9.2% silk -> 8.9%.  The rest seems in the noise.  We're still around 10.8 for opus encode.  Dropping complexity seems worth doing, and also reducing Opus bitrate on mobile (at least) to 16Kbps per IRC discussions.
What complexity are you using right now? The default (9 for version 1.1) or 10?
I'm on vacation for the next two weeks if someone wants to take over this bug.
I'm trying to properly fix this (by making update.sh do the conversion automatically etc), but:

 0:35.12 /home/morbo/hg/mozilla-central/media/libopus/celt/arm/celt_pitch_xcorr_arm_gnu.s: Assembler messages:
 0:35.12 /home/morbo/hg/mozilla-central/media/libopus/celt/arm/celt_pitch_xcorr_arm_gnu.s:33: Error: can't open celt/arm/armopts-gnu.S for reading: No such file or directory
 0:35.12 /home/morbo/hg/mozilla-central/media/libopus/celt/arm/celt_pitch_xcorr_arm_gnu.s:35: Error: non-constant expression in ".if" statement
 0:35.12 /home/morbo/hg/mozilla-central/media/libopus/celt/arm/celt_pitch_xcorr_arm_gnu.s:39: Error: non-constant expression in ".if" statement
 0:35.12 /home/morbo/hg/mozilla-central/media/libopus/celt/arm/celt_pitch_xcorr_arm_gnu.s:43: Error: non-constant expression in ".if" statement
 0:35.12 /home/morbo/hg/mozilla-central/media/libopus/celt/arm/celt_pitch_xcorr_arm_gnu.s:257: Error: non-constant expression in ".if" statement

The latter few look like failures in the conversion script that should be fixed upstream.
(In reply to Gian-Carlo Pascutto (:gcp) from comment #15)
> The latter few look like failures in the conversion script that should be
> fixed upstream.

The "non-constant expression in ".if" statement" messages are a result of the failed include.
I made a proper patch for this that should work for B2G as well. 

However testing it is a pain: I found no good way to do an audio-only WebRTC call.
Assignee: giles → gpascutto
Attachment #8387895 - Attachment is obsolete: true
Attachment #8388323 - Attachment is obsolete: true
Status: UNCONFIRMED → NEW
Ever confirmed: true
Attachment #8393534 - Flags: review?(tterribe)
Flags: needinfo?(gpascutto)
You could use the MediaRecorder API: http://simpl.info/mediarecorder/
Comment on attachment 8393534 [details] [diff] [review]
Patch 1. Enable ARM Assembler in Opus

Review of attachment 8393534 [details] [diff] [review]:
-----------------------------------------------------------------

r=me (with a few minor comments), but please have a real build peer take a look as well.

::: media/libopus/Makefile.in
@@ +9,5 @@
> +# These flags are a lie; they're just used to enable the requisite
> +# opcodes; actual arch detection is done at runtime.
> +ASFLAGS = -march=armv7-a -mfpu=neon
> +
> +celt_pitch_xcorr_arm-gnu.$(ASM_SUFFIX): celt_arm_dir celt/arm/armopts-gnu.S

I'm not sure I understand this rule. Isn't it celt/arm/armopts-gnu.S that depends upon this directory being created in the objdir?

::: media/libopus/moz.build
@@ +24,5 @@
> +    DEFINES['OPUS_ARM_EXTERNAL_ASM'] = True
> +    DEFINES['OPUS_ARM_INLINE_ASM'] = True
> +    DEFINES['OPUS_ARM_INLINE_EDSP'] = True
> +    DEFINES['OPUS_ARM_INLINE_MEDIA'] = True
> +    DEFINES['OPUS_ARM_INLINE_NEON'] = True

libopus doesn't actually use this define for anything yet, but you probably don't want to enable it unconditionally. Seems like a good future foot-gun.
Attachment #8393534 - Flags: review?(tterribe) → review+
>Isn't it celt/arm/armopts-gnu.S that depends upon this directory being created in the objdir?

Yes, but make will try to go into that directory to find that file and fail with an error if it doesn't already exist. So it (the dir) must exist to verify that the dependency doesn't exist. I don't claim to be a make expert but this is the only way it seems to work.
I made a MediaRecorder test page that does a simultaneous encode/decode, http://sjeng.org/mozilla/mrspeedtestnoauto.html, but see bug Bug 986074.

I tested on a Nexus S, which is a 1.0Ghz single core Cortex A8, i.e. a reasonably slow device by todays standards (but probably faster than many Firefox OS phones).

I saw 56% cpu usage. This is essentially identical with or without patch.

This is really pretty bad (both the CPU usage and the patch not doing anything about it).
gcp: can you capture a profile of your test?  That would help a ton.
Flags: needinfo?(gpascutto)
I ran some numbers on upstream an old 600 MHz Cortex A8 (just because I had it set up, so it was easy), to give us some expectation of what we should see:

time ./opus_demo voip 48000 1 32000 comp48-stereo.sw /dev/null

./configure --enable-fixed-point
real	0m40.239s
user	0m39.758s
sys	0m0.219s

./configure --enable-fixed-point --disable-asm
real	0m50.410s
user	0m49.836s
sys	0m0.297s

./configure CFLAGS="-Os" --enable-fixed-point
real	1m2.196s
user	1m1.586s
sys	0m0.234s

./configure CFLAGS="-Os" --enable-fixed-point --disable-asm
real	0m53.543s
user	0m53.094s
sys	0m0.188s

./configure
real	1m59.454s
user	1m57.633s
sys	0m0.328s

./configure --disable-asm
real	1m59.256s
user	1m57.461s
sys	0m0.422s

The -Os timings were surprising (but not so surprising that it didn't occur to me to run them): a lot of our inline asm directives are wrapped in inline functions, and I am guesing with -Os gcc is not even trying to inline them, even though that would produce smaller code.
I'll try to get more info. Capturing a full profile would require quite some setup, I'd need to reflash a custom kernel and/or oprofile modules. The Firefox built-in profiler might be able to see what's up if we're losing our CPU power outside Opus, though.

derf, your result confirms that enabling the inline ASM would *slow down* the code if Opus is compiled with -Os, right? I'll check the exact build lines.
Flags: needinfo?(gpascutto)
Modified patch per comments, and modified the build to make sure we compile Opus with -O3 on ARM if optimizations are enabled. (The default is -Os, which derf showed completely kills the assembler code).

The Firefox profiler is currently broken on Android (bug 983669 and dupes) so I can't capture a usable profile for you.

A test with top seems to indicate this drops the CPU usage for my testcase from ~56% to ~52%. So it's not much but it does seem to have some positive effect now instead of none measurable. Combined with derf's test which shows a significant speedup for fixing the compiler flags, I think it's reasonable enough to land.
Attachment #8393534 - Attachment is obsolete: true
Attachment #8395800 - Flags: review?(ted)
Attachment #8395800 - Flags: feedback?(tterribe)
Comment on attachment 8395800 [details] [diff] [review]
Patch 1. Enable ARM Assembler in Opus, enable speed opt

Review of attachment 8395800 [details] [diff] [review]:
-----------------------------------------------------------------

If people use clang with any of our ARM targets, you may want to grab
<http://git.xiph.org/?p=opus.git;a=commitdiff;h=e70faf98b046> from upstream, too.

::: media/libopus/update.sh
@@ +44,3 @@
>   
>  # copy files into the target directory
>  for file in ${STATIC_FILES} ${SRC_FILES} ${HDR_FILES}; do

Should probably copy over arm2gnu.pl, too, so we get upstream updates to it here. I don't think it's listed as a source file in any of the .mk files (if you'd rather upstream add it to a .mk file, I suppose we do things that way, too).
Attachment #8395800 - Flags: feedback?(tterribe) → review+
Comment on attachment 8395800 [details] [diff] [review]
Patch 1. Enable ARM Assembler in Opus, enable speed opt

Review of attachment 8395800 [details] [diff] [review]:
-----------------------------------------------------------------

::: media/libopus/Makefile.in
@@ +21,5 @@
> +celt/arm/armopts-gnu.S: celt/arm/armopts.s
> +	$(PERL) $(srcdir)/celt/arm/arm2gnu.pl < $< > $@
> +# For all others, we can use an implicit rule with the configured $(ASM_SUFFIX).
> +%-gnu.$(ASM_SUFFIX): celt/arm/%.s
> +	$(PERL) $(srcdir)/celt/arm/arm2gnu.pl < $< > $@

If these are just coming from upstream, is there a reason we can't just run the perl script at import time instead of build time? That would be a lot nicer!
>If people use clang with any of our ARM targets

Not that I know. In any case we can update from upstream in a separate patch if needed.

>Should probably copy over arm2gnu.pl

The patch adds it to STATIC_FILES in update.sh, so it already gets copied.

>If these are just coming from upstream, is there a reason we can't just run the perl 
>script at import time instead of build time? That would be a lot nicer!

I have another patch that does this (I was trying it around comment 15 here). But IMHO it's not really any nicer as you have to hardcode a lot of hacking to get that file into shape, including the hack to fix the path to armopts-gnu.S. This approach stays much closer to what upstream does (it has essentially the same rules in its Makefile.in - i.e. upstream does the translation at build time) and is identical to what other libs in /media do.
This is the alternate approach. I gave up when I figured I needed to fix up the include path in the translated file and essentially hardcode that mess in the update script.
(In reply to Gian-Carlo Pascutto [:gcp] from comment #28)
> The patch adds it to STATIC_FILES in update.sh, so it already gets copied.

Yes, you're right. Sloppy reviewing on my part!
Ted, we discussed this a bit on IRC, for example noting that this does *not* add a perl dependency because libtheora works exactly the same way. Can you make a decision (r-/r+) whether this can land or should be reworked to do the translation at import time?
Flags: needinfo?(ted)
Since this isn't any worse than what we have for libtheora I guess we can take it as-is. I'd like to get a followup filed to make these all happen at import time, for the reasons I've stated above.
Flags: needinfo?(ted)
Comment on attachment 8395800 [details] [diff] [review]
Patch 1. Enable ARM Assembler in Opus, enable speed opt

Review of attachment 8395800 [details] [diff] [review]:
-----------------------------------------------------------------

r=me as long as you file a bug to make these happen at import and not at build-time.

::: media/libopus/celt/arm/armopts.s
@@ +20,5 @@
> +   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
> +   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
> +   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
> +   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
> +   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

What sort of license header is this? Should this just be an MPL license header?
Attachment #8395800 - Flags: review?(ted) → review+
>What sort of license header is this? Should this just be an MPL license header?

Simplified BSD, just like the rest of Opus. Our license policy says we should keep the license. It's not like we can randomly switch the license on upstream code anyway.
Ah, it wasn't clear that was an upstream file, especially given the "(C) Mozilla" line at the top.
https://hg.mozilla.org/mozilla-central/rev/72661e078741
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla31
The follow up bug for the build system changes is bug 999607.
Depends on: 1024260
this breaks on arm10tdmi. #1089411
is there a configure flag or another possibility to opt-out again ? (i.e. build C code rather than the buggy asm)
You need to log in before you can comment on or make changes to this bug.