Closed Bug 926838 Opened 11 years ago Closed 9 years ago

[B2G] Convolver node run very slow on unagi/buri

Categories

(Core :: Web Audio, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

()

RESOLVED FIXED
mozilla40
Tracking Status
firefox40 --- fixed

People

(Reporter: pyang, Assigned: padenot)

References

Details

(Keywords: perf)

Attachments

(9 files, 11 obsolete files)

919.88 KB, video/3gpp
Details
566.31 KB, application/x-compressed-tar
Details
5.90 MB, text/plain
Details
774.27 KB, patch
Details | Diff | Splinter Review
5.41 KB, patch
Details | Diff | Splinter Review
120.01 KB, patch
Details | Diff | Splinter Review
25.51 KB, patch
jwwang
: review+
Details | Diff | Splinter Review
29.56 KB, patch
Details | Diff | Splinter Review
2.79 KB, patch
gps
: review+
Details | Diff | Splinter Review
Use device:
Unagi/buri

Build info:
Gaia:     959ac2f692d85072ffdc3d16a041b5bf4735ae59
Gecko:    http://hg.mozilla.org/mozilla-central/rev/aa986b6ce882
BuildID   20131010040202
Version   27.0a1

Using a convolver node comparing with other nodes has a significant performance drop.

It does not have this issue on nexus4
STR: 
- Access http://zapion.github.io/webaudio/convolver.html
- Press play
Component: General → Web Audio
Product: Boot2Gecko → Core
Blocks: 923319
Keywords: perf
How bad is the perf hit here? Is it choppy audio? Slow playback?

A video might help clarify the priority of this bug.
Flags: needinfo?(pyang)
This is being tracked in bug 923319.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → DUPLICATE
Flags: needinfo?(pyang)
(In reply to Paul Adenot (:padenot) from comment #3)
> This is being tracked in bug 923319.
> 
> *** This bug has been marked as a duplicate of bug 923319 ***

Are we sure we want this duped against that general bug? That bug seems to be tracking perf at a demo level, where as this reduces down the scope. Ehsan also mentioned he wanted separate bugs filed, rather than globally defining against bug 923319.
Yeah, let's investigate this test case separately.  We can re-dupe after profiling with the patches that Karl has in his queue.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Paul - Can you answer my questions in comment 2? Trying to figure out if this needs to block 1.2 or not.
Flags: needinfo?(pyang)
Attached video convolver_buri.3gp
Attached video compares between nexus4 and buri
Flags: needinfo?(pyang)
Yikes. That sounds terrible on Buri. Is that Nexus 4 testing you are doing Android-based or FxOS based?

Anyways - no way we can ship with chopped up audio that's heard on Buri. Noming for blocking for 1.2.
blocking-b2g: --- → koi?
Nexus4 has been flashed with FxOS.
Status: REOPENED → NEW
blocking-b2g: koi? → ---
(In reply to Paul Yang [: pyang] from comment #9)
> Nexus4 has been flashed with FxOS.

Okay - then what happens when you run this same test case on a Firefox for Android & Chrome for Android device with similar hardware specs to the Buri device?

I'm assuming you didn't mean to set that unnom the blocking flag.
blocking-b2g: --- → koi?
Sorry that's an incident..

Need to find an android phone with similar spec.  Any suggestion?
Just try on android 4.1
test page: http://zapion.github.io/webaudio/convolver.html  work for both firefox and chrome desktop
Firefox for Andorid: chopped as well
google chrome: can not play
If we want this to block 1.2, we need to find someone to import a better FFT, and wire it to the FFTFrame implementation. More details in [1], and even more details now:

I originally said the ARM code is BSD-licensed. This is wrong, [2] gives us more details. 

A plan of action to solve this could be:
- Ask someone knowledgeable (gerv?) to check if we can use Chromium's code (which is "relicensed under BSD") in Gecko. Otherwise, we would have to ask ARM Limited to grant us the use of this code under a compatible license as well, I guess.
- Import the code in Gecko, making sure we are effectively importing the Float32 implementation written by the Google folks. We don't really care about the integer path, for our use case.
- Wire up the new OpenMax DL code to FFTFrame
- Benchmark, to make sure this was not all in vain (or just run the demo again, I think this might be obvious enough)

This will help Firefox for Android and B2G, and logically brings us in the same ballpark in terms of raw audio-crunching performance as Chrome for Android.

Thoughts? Maire, what would be an acceptable deadline to fix this? I'm a bit fuzzy on the timeline and road map and b2g releases code names, sorry.

[1]: https://bugzilla.mozilla.org/show_bug.cgi?id=923319#c20
[2]: https://code.google.com/p/chromium/codesearch#chromium/src/third_party/openmax_dl/README.chromium&sq=package:chromium&type=cs
Flags: needinfo?(mreavy)
(In reply to Paul Yang [: pyang] from comment #12)
> Just try on android 4.1
> test page: http://zapion.github.io/webaudio/convolver.html  work for both
> firefox and chrome desktop
> Firefox for Andorid: chopped as well
> google chrome: can not play

pyang, I seem to recall that old Chrome for Android versions don't have Web Audio (sorry, I don't know the version that introduced the feature). Could you try to update your Chrome for Android? Otherwise, I seem to recall having run Web Audio demos on Chrome for Android on both my Nexus 4 and Galaxy Nexus, but those phones are so powerful that it might work okay on Firefox despite not having an optimized FFT.
Gerv - can you give an opinion on comment 13 (importing the ARM FFT), and whether we should be able to import this directly or need to have discussions?  We need to know soon so we can decide if we want to try to do this in time for FxOS 1.2.  Thanks
Flags: needinfo?(gerv)
I need an "ARM account" to download the package at:
https://www.google.com/url?q=https://silver.arm.com/download/Software/Graphics/OX000-BU-00010-r1p0-00bet0/OX000-BU-00010-r1p0-00bet0.tgz
- does anyone have a copy they can upload here?

Google has "obtained permission to relicense under the BSD licence" - but it doesn't say which form of the BSD licence. Do we know of anywhere where there's an actual copy of the license text they are using, e.g. at the top of a file?

I don't foresee problems with using this code; we just need to make sure the ducks are in a row.

Gerv
Flags: needinfo?(gerv)
Gerv, see the attached package.

The LICENSE file for those particular files is [1], and the referenced top-level LICENSE file appears to be [2].

[1]: https://code.google.com/p/chromium/codesearch#chromium/src/third_party/openmax_dl/LICENSE&sq=package:chromium&type=cs
[2]: https://code.google.com/p/chromium/codesearch#chromium/src/LICENSE&sq=package:chromium&type=cs
Flags: needinfo?(gerv)
Are we planning to use

    * dl/api/omxtypes.h
    * dl/sp/api/omxSP.h

?

We should only take code from the Chromium repo, not directly from the ARM download. Is that going to be a problem?

Gerv
Flags: needinfo?(gerv)
(In reply to Paul Adenot (:padenot) from comment #13)
> 
> Thoughts? Maire, what would be an acceptable deadline to fix this? I'm a bit
> fuzzy on the timeline and road map and b2g releases code names, sorry.

Regarding the deadline: if this were to become a blocker bug for Web Audio on B2G (i.e. if this becomes koi+) and we want Web Audio to make v1.2, I believe we would need this coded, reviewed, checked in, and uplifted by Oct 25 (approximately 9 days).  Since this is a B2G deadline, I'd want to confirm and see if there were a way to get more time.

Regarding whether this should be a koi+ blocker: because this is using a convolver and convolution is noted in the Web Audio spec as being too expensive for slower devices, I don't believe we should make this bug a blocker -- using similar reasoning to the discussion in bug 923319:  https://bugzilla.mozilla.org/show_bug.cgi?id=923319#c25 -- unless there is a specific app that uses convolution that we must support for the games initiative on B2G.  

I'm needinfo-ing Ehsan and Martin to get this on their radar (if it isn't already) so that they have a chance to weigh in, but I don't believe convolution performance is a requirement for shipping v1 of Web Audio on B2G.
Flags: needinfo?(mreavy)
Flags: needinfo?(mbest)
Flags: needinfo?(ehsan)
(In reply to Gervase Markham [:gerv] from comment #19)
> Are we planning to use
> 
>     * dl/api/omxtypes.h
>     * dl/sp/api/omxSP.h
> 
> ?
> 
> We should only take code from the Chromium repo, not directly from the ARM
> download. Is that going to be a problem?
> 
> Gerv

Those two particular files are from the Khronos Group, they described a normative interface the code from ARM and Google implement, potentially with silicon vendor specific optimizations. I believe this is okay to import: our WebGL code already uses files from Khronos (e.g., [1]), and we can get them from the official Khronos website [2].

[1]: http://mxr.mozilla.org/mozilla-central/source/gfx/angle/include/KHR/khrplatform.h
[2]: http://www.khronos.org/files/openmax/headers/omx_dl_v1/omx_dl_v1_0_2.h.zip, from http://www.khronos.org/openmaxdl
(In reply to Paul Adenot (:padenot) from comment #21)
> > Are we planning to use
> > 
> >     * dl/api/omxtypes.h
> >     * dl/sp/api/omxSP.h
> > 
> > ?

I'm sorry; my message made it look like my two points were connected. They were not. I asked the above question because I wanted to know whether we had to deal with the Khronos-specific licensing text. As you say, we already have Khronos code in our tree and in about:license.

> > We should only take code from the Chromium repo, not directly from the ARM
> > download. Is that going to be a problem?

This question (which applies to all code this bug is considering using from this source) remains outstanding.

Gerv
Fair enough, pulling the blocking nomination then. Is this documented somewhere in dev docs that convolvers are too slow for mobile devices? If it isn't, we should get this documented on MDN somewhere.
blocking-b2g: koi? → ---
Gerv, using code only from the Google repo, and not from the ARM package is in fact exactly what we want to do, as Google did some modifications we need on the original ARM package.

We don't need any file from the ARM package if we can use all the files from the Google repo.
Great. In which case:

* Chromium code is already covered in about:license
* Khronos code is already covered in about:license

but you should add the relevant directory/file names to the block above each of those two licences, in the style of existing paragraphs saying "This licence applies to...".

Gerv
Before we make decisions about the koi-blocking-ness here, let's first confirm that an optimized FFT engine is all we need here.  Can someone please upload a profile of this on b2g on Buri, using the instructions here <https://developer.mozilla.org/en-US/docs/Performance/Profiling_with_the_Built-in_Profiler#Profiling_Boot_to_Gecko_%28with_a_real_device%29>?

Thanks!
Flags: needinfo?(ehsan)
(In reply to Paul Adenot (:padenot) from comment #14)
> (In reply to Paul Yang [: pyang] from comment #12)
> > Just try on android 4.1
> > test page: http://zapion.github.io/webaudio/convolver.html  work for both
> > firefox and chrome desktop
> > Firefox for Andorid: chopped as well
> > google chrome: can not play
> 
> pyang, I seem to recall that old Chrome for Android versions don't have Web
> Audio (sorry, I don't know the version that introduced the feature). Could
> you try to update your Chrome for Android? Otherwise, I seem to recall
> having run Web Audio demos on Chrome for Android on both my Nexus 4 and
> Galaxy Nexus, but those phones are so powerful that it might work okay on
> Firefox despite not having an optimized FFT.

Test with http://www.gsmarena.com/sony_xperia_acro_s-4781.php
Firefox beta: chopped every 2 seconds
Chrome: chopped every 5 seconds
Interesting. Gecko is clearly worse than Blink here, but Blink can't really make this demo work without glitching when not running on a beefy device. It might even not be feasible to do floating point convolution with a big impulse response on a modest device (2 second is on the longish side, for an impulse responses intended for normal reverb). Maybe we could do a little trick and do the convolution on integers with a conversion pass? Of course, we would only go this route after testing the other optimization we are talking about. And after getting a profile.
(In reply to Paul Adenot (:padenot) from comment #28)
> Interesting. Gecko is clearly worse than Blink here, but Blink can't really
> make this demo work without glitching when not running on a beefy device. It
> might even not be feasible to do floating point convolution with a big
> impulse response on a modest device (2 second is on the longish side, for an
> impulse responses intended for normal reverb). Maybe we could do a little
> trick and do the convolution on integers with a conversion pass? Of course,
> we would only go this route after testing the other optimization we are
> talking about. And after getting a profile.

Yes!  Needinfo-ing Paul for a profile here.
Flags: needinfo?(pyang)
Following the instruction here: https://intranet.mozilla.org/TPEPlatform/PerfTool to get the perf data.

It looks like FFTConvolver::process takes up most of the cpu time:
56.7% in MediaStreamGrph thread
21.6% in ConvolverWorker thread
20.9% in ConvolverWorker thread

I just don't understand why there are 2 ConvolverWorker threads.
Btw, <https://developer.mozilla.org/en-US/docs/Performance/Profiling_with_the_Built-in_Profiler#Profiling_Boot_to_Gecko_%28with_a_real_device%29> doesn't generate useful data. It looks like gecko profiler doesn't sample threads other than main thread by default...
(In reply to jwwang from comment #30)
> I just don't understand why there are 2 ConvolverWorker threads.

One for each channel in the impulse response, I expect.
Sounds like jwwang has the perf info we need, so I'm pulling needinfo on pyang.
Flags: needinfo?(pyang)
Probably SPS is using setitimer().  setitimer() used to interrupt any thread running, but in recent linux/glibc versions it follows POSIX semantics more closely, which mandates that signals be delivered to the thread that called setitimer().  gperftools has a bunch of code to determine if timers are shared or per-thread, for example.

Note that realtime ITIMER_REAL itimers may have different semantics with regard to threads than ITIMER_PROF (which is the default in jprof).  I'm not sure about ITIMER_VIRTUAL
(In reply to Randell Jesup [:jesup] from comment #34)
> Probably SPS is using setitimer().  setitimer() used to interrupt any thread
> running, but in recent linux/glibc versions it follows POSIX semantics more
> closely, which mandates that signals be delivered to the thread that called
> setitimer().  gperftools has a bunch of code to determine if timers are
> shared or per-thread, for example.
> 
> Note that realtime ITIMER_REAL itimers may have different semantics with
> regard to threads than ITIMER_PROF (which is the default in jprof).  I'm not
> sure about ITIMER_VIRTUAL

No, SPS doesn't use setitimer, it just sends a SIGPROF to the threads which opt in to SPS (yes, SPS is opt-in per thread.) <http://mxr.mozilla.org/mozilla-central/source/tools/profiler/platform-linux.cc#289>
(In reply to comment #30)
> Created attachment 818886 [details]
>   --> https://bugzilla.mozilla.org/attachment.cgi?id=818886&action=edit
> perf_20131018_152406.txt
> 
> Following the instruction here:
> https://intranet.mozilla.org/TPEPlatform/PerfTool to get the perf data.

Also, why is this on private intranet?  jld told me on IRC that he hasn't been maintaining his branches for a while...
I should explain how things got into this state, and this is as good a place as any.

To make the perf_event framework work usefully on b2g, I had to modify GCC to partially support an obsolete standard for stack unwinding.  And then fork a number of repositories to change the build configuration, and add processing tools, and deal with an upstream Android library that won't even build as-is with the compiler change.  And then fork the manifest to point at those, for each device, for each branch.  It was a nontrivial amount of work keeping things updated with upstream changes.

Despite that work, this profiler can't unwind anything that we don't build from C/C++ source code (whether it's proprietary, not under our control, handwritten asm, or JS jitcode), and is completely unaware of JS or any other Gecko-specific info.  (Note that on many devices we can't rebuild the non-Gecko userland ourselves, because the manufacturer has private changes.)  All told, I basically can't get this stuff upstreamed to Mozilla.


As for the Gecko profiler: it's not well documented yet, but you can do:

  ./profile.sh start [process name] [thread name]

And that will sample a specific thread (or, I think, multiple threads if they all match the name).  Currently it's limited to one thread name per process, but you can do something like this:

  ./profile.sh start b2g GeckoMain
  ./profile.sh start MyApp Compositor

and the capture/pull step will merge them into one profile, with the timelines lined up.

In this mode it doesn't turn on native stack unwinding yet; that's bug 926610 and I *think* we're at a point where I can make a convincing argument that it should be on by default.

See also bug 925111, which will expose more of the underlying Gecko profiler functionality and remove some of the restrictions I mentioned.
FYI, retesting in jprof: setitimer with ITIMER_PROF will interrupt any thread (it appears).  Doing it with ITIMER_REALTIME and all the signals seem to go to the main thread.  (Fedora 19 3.11.1)
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #35)
> (In reply to Randell Jesup [:jesup] from comment #34)
> > Probably SPS is using setitimer().  setitimer() used to interrupt any thread
> > running, but in recent linux/glibc versions it follows POSIX semantics more
> > closely, which mandates that signals be delivered to the thread that called
> > setitimer().  gperftools has a bunch of code to determine if timers are
> > shared or per-thread, for example.
> > 
> > Note that realtime ITIMER_REAL itimers may have different semantics with
> > regard to threads than ITIMER_PROF (which is the default in jprof).  I'm not
> > sure about ITIMER_VIRTUAL
> 
> No, SPS doesn't use setitimer, it just sends a SIGPROF to the threads which
> opt in to SPS (yes, SPS is opt-in per thread.)
> <http://mxr.mozilla.org/mozilla-central/source/tools/profiler/platform-linux.
> cc#289>

Yeah, that was part of why SPS is often useless to me since large areas of CPU use are 'dark' especially in code that is imported.  It should at least have the option to sample all threads handled by our threadmanager (still wouldn't catch webrtc threads, but would avoid all our internal ones having to opt-in).  If you want to hide certain threads in the UI by default, fine, but capture the data by default unless there's a good reason not to.

Ok, this is off-topic.  Sorry.
Hey guys sorry for the delay.  I wanted to chime in here and say that this is not something that should block release.  Most people just want low latency playback and the current Audio Data API doesn't cut it.  With that said, we should optimize this in the earliest release possible.  Reverb is a pretty appealing feature and, if I'm not mistaken, the ConvolverNode is used to achieve this spacial effect.
Flags: needinfo?(mbest)
(In reply to Martin Best (:mbest) from comment #40)
> Hey guys sorry for the delay.  I wanted to chime in here and say that this
> is not something that should block release.  Most people just want low
> latency playback and the current Audio Data API doesn't cut it.  With that
> said, we should optimize this in the earliest release possible.  Reverb is a
> pretty appealing feature and, if I'm not mistaken, the ConvolverNode is used
> to achieve this spacial effect.

Yep, we're releasing on Tuesday with this unfixed.  :-)
JW -- Do you think you could take ownership of this bug?  Paul (Adenot) has a suggested plan of action for this bug (see Comment 13).  What do you think?
Flags: needinfo?(jwwang)
Sure.
Assignee: nobody → jwwang
Flags: needinfo?(jwwang)
Attached patch openmax dl porting part 1 (obsolete) — Splinter Review
Any other license files to be added for legal issues?
Attachment #828442 - Flags: review?(gerv)
Attached patch openmax dl porting part 2 (obsolete) — Splinter Review
Attachment #828443 - Flags: review?(ehsan)
Attached patch openmax dl porting part 3 (obsolete) — Splinter Review
I wonder how Chrome could compile with the syntax errors.
Attachment #828444 - Flags: review?(ehsan)
Attached patch openmax dl porting part 4 (obsolete) — Splinter Review
Since openmax dl require data to be 32-byte aligned, I create AlignedTArray to ease the job. Also modify callers of FFTBlock to use AlignedTArray.
Attachment #828445 - Flags: review?(ehsan)
Some perf data: kissfft v.s. openmax dl fft

unagi: no improvement, cpu 95% -> 95%
peak: cpu 76% -> 63%
helix: cpu 70% -> 67%
Comment on attachment 828442 [details] [diff] [review]
openmax dl porting part 1

Please make the modifications to about:license (toolkit/content/license.html) requested in comment 25.

Thanks :-)

Gerv
Attachment #828442 - Flags: review?(gerv) → review-
Comment on attachment 828443 [details] [diff] [review]
openmax dl porting part 2

Review of attachment 828443 [details] [diff] [review]:
-----------------------------------------------------------------

gps is a better reviewer for this.
Attachment #828443 - Flags: review?(ehsan) → review?(gps)
Attachment #828444 - Flags: review?(ehsan) → review+
Comment on attachment 828445 [details] [diff] [review]
openmax dl porting part 4

Review of attachment 828445 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/webaudio/blink/HRTFKernel.cpp
@@ +52,5 @@
>      , m_sampleRate(sampleRate)
>  {
> +    AlignedTArray<float> buffer(length);
> +    mozilla::PodCopy(buffer.Elements(), impulseResponse, length);
> +    impulseResponse = buffer.Elements();

Please try to avoid this if impulseResponse is already aligned.

::: content/media/webaudio/blink/HRTFPanner.h
@@ +33,5 @@
>  }
>  
>  namespace WebCore {
>  
> +typedef nsTArray<float> AudioFloatArray;

What is this used for?
Attachment #828445 - Flags: review?(ehsan) → review+
Comment on attachment 828443 [details] [diff] [review]
openmax dl porting part 2

Review of attachment 828443 [details] [diff] [review]:
-----------------------------------------------------------------

r- because this will break the build.

::: media/openmax_dl/dl/moz.build
@@ +12,5 @@
> +        'api/armOMX.h',
> +        'api/omxtypes.h',
> +        'api/omxtypes_s.h',
> +    ]
> +    

Nit: whitespace

@@ +33,5 @@
> +        'sp/src/omxSP_FFTInit_C_SC32.c',
> +        'sp/src/omxSP_FFTInit_R_F32.c',
> +        'sp/src/omxSP_FFTInit_R_S16S32.c',
> +        'sp/src/omxSP_FFTInit_R_S32.c',
> +    ]

CSRCS doesn't exist any more. Use SOURCES for everything.

::: media/openmax_dl/moz.build
@@ +5,5 @@
> +# file, You can obtain one at http://mozilla.org/MPL/2.0/.
> +
> +DIRS += ['dl']
> +
> +MODULE = 'openmax_dl'

You shouldn't need MODULE here.
Attachment #828443 - Flags: review?(gps) → review-
Comment on attachment 828445 [details] [diff] [review]
openmax dl porting part 4

Review of attachment 828445 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/webaudio/blink/HRTFKernel.cpp
@@ +52,5 @@
>      , m_sampleRate(sampleRate)
>  {
> +    AlignedTArray<float> buffer(length);
> +    mozilla::PodCopy(buffer.Elements(), impulseResponse, length);
> +    impulseResponse = buffer.Elements();

changed to the following:

AlignedTArray<float> buffer;
if (((uintptr_t)impulseResponse & 31) != 0) {
  buffer.SetLength(length);
  mozilla::PodCopy(buffer.Elements(), impulseResponse, length);
  impulseResponse = buffer.Elements();
}

::: content/media/webaudio/blink/HRTFPanner.h
@@ +33,5 @@
>  }
>  
>  namespace WebCore {
>  
> +typedef nsTArray<float> AudioFloatArray;

Before patch, AudioFloatArray is defined in FFTConvolver.h which is included by HRTFPanner.h. After patch, AudioFloatArray is changed to AlignedAudioFloatArray in FFTConvolver.h, we need a typedef in HRTFPanner.h to avoid undefined types.
I thought the Khronos group license was in about:license, but if it's not, you'll need to add it, in alphabetical order, in the same way as the others.

Gerv
Attached patch openmax dl porting part 1 (obsolete) — Splinter Review
Attachment #828442 - Attachment is obsolete: true
Attachment #831257 - Flags: review?(gerv)
Attached patch openmax dl porting part 2 (obsolete) — Splinter Review
Attachment #828443 - Attachment is obsolete: true
Attachment #831258 - Flags: review?(gps)
Attached patch openmax dl porting part 3 (obsolete) — Splinter Review
fix build errors on emulators.
Attachment #828444 - Attachment is obsolete: true
Attachment #831260 - Flags: review+
Attached patch openmax dl porting part 4 (obsolete) — Splinter Review
Attachment #828445 - Attachment is obsolete: true
Attachment #831262 - Flags: review+
I also tried another FFT library: FFTW (http://www.fftw.org/)
Below are the perf data on various devices:

peak perf data, loop=1000, fft size=4096
  forward fft:
    kissfft=0.422585(100%), openmaxdl=0.219870(52.03%), fftw=0.178985(42.35%)
  inverse fft:
    kissfft=0.557688(100%), openmaxdl=0.241888(43.37%), fftw=0.344252(61.73%)
  fwd + inv:
    kissfft=0.980273(100%), openmaxdl=0.461758(47.11%), fftw=0.523237(53.38%)
    fwd fftw + inv openmaxdl = 0.420873(42.93%)

unagi perf data, loop=1000, fft size=4096
  forward fft:
    kissfft=0.589996(100%), openmaxdl=0.289338(49.04%), fftw=0.247986(42.03%)
  inverse fft:
    kissfft=0.758972(100%), openmaxdl=0.314789(41.48%), fftw=0.476989(62.85%)
  fwd + inv:
    kissfft=1.348968(100%), openmaxdl=0.604127(44.78%), fftw=0.724975(53.74%)
    fwd fftw + inv openmaxdl = 0.562775(41.72%)

helix perf data, loop=1000, fft size=4096
  forward fft:
    kissfft=0.491035(100%), openmaxdl=0.254743(51.88%), fftw=0.203626(41.47%)
  inverse fft:
    kissfft=0.626289(100%), openmaxdl=0.269434(43.02%), fftw=0.392426(62.66%)
  fwd + inv:
    kissfft=1.117324(100%), openmaxdl=0.524177(46.91%), fftw=0.596052(53.35%)
    fwd fftw + inv openmaxdl = 0.47306(42.34%)

Use of openmaxdl can save you more than 50% of the FFT time (compared to kissfft) on ARM devices which is quite good. However, the best combination is forward fftw + inverse openmaxdl.
Comment on attachment 831258 [details] [diff] [review]
openmax dl porting part 2

Review of attachment 831258 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/media/Makefile.in
@@ +15,5 @@
>      $(MOZ_HARFBUZZ_LIBS) \
>      $(NULL)
>  
> +ifeq ($(BUILD_ARM_NEON),1)
> +SHARED_LIBRARY_LIBS += $(DEPTH)/media/openmax_dl/dl/$(LIB_PREFIX)openmax_dl.$(LIB_SUFFIX)

This will conflict with bug 935881. If you land after, this is wrong. Check with glandium before landing this patch.

::: media/openmax_dl/dl/moz.build
@@ +35,5 @@
> +        'sp/src/omxSP_FFTInit_R_S16S32.c',
> +        'sp/src/omxSP_FFTInit_R_S32.c',
> +    ]
> +
> +    SOURCES += [

Why separate lists?
Attachment #831258 - Flags: review?(gps) → review+
(In reply to comment #59)
> I also tried another FFT library: FFTW (http://www.fftw.org/)
> Below are the perf data on various devices:
> 
> peak perf data, loop=1000, fft size=4096
>   forward fft:
>     kissfft=0.422585(100%), openmaxdl=0.219870(52.03%), fftw=0.178985(42.35%)
>   inverse fft:
>     kissfft=0.557688(100%), openmaxdl=0.241888(43.37%), fftw=0.344252(61.73%)
>   fwd + inv:
>     kissfft=0.980273(100%), openmaxdl=0.461758(47.11%), fftw=0.523237(53.38%)
>     fwd fftw + inv openmaxdl = 0.420873(42.93%)
> 
> unagi perf data, loop=1000, fft size=4096
>   forward fft:
>     kissfft=0.589996(100%), openmaxdl=0.289338(49.04%), fftw=0.247986(42.03%)
>   inverse fft:
>     kissfft=0.758972(100%), openmaxdl=0.314789(41.48%), fftw=0.476989(62.85%)
>   fwd + inv:
>     kissfft=1.348968(100%), openmaxdl=0.604127(44.78%), fftw=0.724975(53.74%)
>     fwd fftw + inv openmaxdl = 0.562775(41.72%)
> 
> helix perf data, loop=1000, fft size=4096
>   forward fft:
>     kissfft=0.491035(100%), openmaxdl=0.254743(51.88%), fftw=0.203626(41.47%)
>   inverse fft:
>     kissfft=0.626289(100%), openmaxdl=0.269434(43.02%), fftw=0.392426(62.66%)
>   fwd + inv:
>     kissfft=1.117324(100%), openmaxdl=0.524177(46.91%), fftw=0.596052(53.35%)
>     fwd fftw + inv openmaxdl = 0.47306(42.34%)
> 
> Use of openmaxdl can save you more than 50% of the FFT time (compared to
> kissfft) on ARM devices which is quite good. However, the best combination is
> forward fftw + inverse openmaxdl.

Great analysis, can you please file a follow-up bug for that?  Thanks!
Comment on attachment 831258 [details] [diff] [review]
openmax dl porting part 2

Review of attachment 831258 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/media/Makefile.in
@@ +15,5 @@
>      $(MOZ_HARFBUZZ_LIBS) \
>      $(NULL)
>  
> +ifeq ($(BUILD_ARM_NEON),1)
> +SHARED_LIBRARY_LIBS += $(DEPTH)/media/openmax_dl/dl/$(LIB_PREFIX)openmax_dl.$(LIB_SUFFIX)

As discussed with glandium, since Trunk is closed for Bug 937997, I can land the patch first and he will handle conflicts later.

::: media/openmax_dl/dl/moz.build
@@ +35,5 @@
> +        'sp/src/omxSP_FFTInit_R_S16S32.c',
> +        'sp/src/omxSP_FFTInit_R_S32.c',
> +    ]
> +
> +    SOURCES += [

Birds of a feather flock together. :) It seems visually pleasant to separate .c files and .S files in different blocks.
Whiteboard: [webaudio-perf]
Whiteboard: [webaudio-perf]
Priority: -- → P1
Comment on attachment 831257 [details] [diff] [review]
openmax dl porting part 1

Review of attachment 831257 [details] [diff] [review]:
-----------------------------------------------------------------

::: media/openmax_dl/LICENSE
@@ +15,5 @@
> +Copyright © 2005-2008 The Khronos Group Inc. All Rights Reserved. 
> +
> +These materials are protected by copyright laws and contain material 
> +proprietary to the Khronos Group, Inc.  You may use these materials 
> +for implementing Khronos specifications, without altering or removing 

I'm afraid I've messed up here. I don't think this license is an open source licence. This looks like a field-of-use restriction to me.

As :padenot notes, we do have other code in the tree which comes from Khronos which has a standard BSD licence, but this isn't that licence.

Can someone help me understand:

* Are we actually using the files omxtypes.h and omxSP.h?
* I can't find those files in the OpenMax sections of the Khronos website. Can anyone find the current sources of those files at Khronos/
* Where did we get the copies that are now here? Chromium? Where can I find their source code management history?

Thanks, and sorry not to have noticed this sooner,

Gerv
Attachment #831257 - Flags: review?(gerv) → review-
Flags: needinfo?(paul)
Gerv, sorry I could not reply earlier, I somehow completely forgot this bug.

* Yes, I see the license is different. Does that mean we can't use the header files, or does that mean someone has to carefully review whether we can use them?
* They are available on the Khronos website:
  - Go to  http://www.khronos.org/openmaxdl
  - Click on "All OpenMAX DL 1.0.2.h header file" (http://www.khronos.org/files/openmax/headers/omx_dl_v1/omx_dl_v1_0_2.h.zip)
  - Extract the archive, and go to /Work/OpenMAX/ConnectUploads/r1p0_00bet0/New Folder/, to find the two aforementioned files.

  Interestingly enough, the previous version of these headers have a very bizarre header (same location on the website, a couple bulletpoint in the list below): <http://www.khronos.org/files/openmax/headers/omx_dl_v1/omx_dl_v1_0_1.h.zip>
  
> /**
>  * File: omxSP.h
>  * Brief: OpenMAX Signal processing DL library
> *
> * Copyright (c) 2005-2006 The Khronos Group Inc. All Rights Reserved.
> *
> */

* I can't speak for jwwang, but we can find them in their repo (the hash in the URL is their current tip at the time of this post, for these two files). I checked that the copy I checked out using their instructions contains those two files, so I believe I've got the right url:
  - http://git.chromium.org/gitweb/?p=external/webrtc/deps/third_party/openmax.git;a=tree;f=dl/api;h=d44e14c95df405d1f7d755d4ae107e71b3477614;hb=HEAD
  - http://git.chromium.org/gitweb/?p=external/webrtc/deps/third_party/openmax.git;a=blob;f=dl/sp/api/omxSP.h;h=5a7980ad452bcf16168e5d74aab66a28bf5d5f1e;hb=HEAD
Flags: needinfo?(paul) → needinfo?(gerv)
Gerv: would his use in our tree correspond to "implementing Kronos specifications"?  (I have to ask, in case it is permissible.)

padenot/jwwang: what our our options to avoid using these files?  Gerv, any comments?
Flags: needinfo?(paul)
Flags: needinfo?(jwwang)
The code is downloaded from https://code.google.com/p/chromium/codesearch#chromium/src/third_party/openmax_dl/.

In README.chromium, it says
"The original ARM license is unclear, but Google has obtained
permission to relicense this code under a BSD license."

Does that cover omxtypes.h and omxSP.h?
Flags: needinfo?(jwwang)
(In reply to Randell Jesup [:jesup] from comment #65)
> Gerv: would his use in our tree correspond to "implementing Kronos
> specifications"?  (I have to ask, in case it is permissible.)

It may well do, but this restriction means the code is not open source, and by policy we do not include non-open-source code in our products, even if we can obtain a right to use it for Mozilla.

I am happy to try and negotiate an alternative license with Khronos (are we members? who has contacts there?) if there is no simple technical alternative to using these files.

jwwang's question is a good one.

Gerv
Flags: needinfo?(gerv)
I know Jeff Gilbert do stuff with Khronos, maybe he has an idea?

This page tells me we are members: https://www.khronos.org/members/contributors#mozilla-corporation
Flags: needinfo?(paul) → needinfo?(jgilbert)
(In reply to Paul Adenot (:padenot) from comment #68)
> I know Jeff Gilbert do stuff with Khronos, maybe he has an idea?
> 
> This page tells me we are members:
> https://www.khronos.org/members/contributors#mozilla-corporation

Yep, we're members. When in doubt, we ask Neil Trevett. I can link you guys via email.
Flags: needinfo?(jgilbert)
Hi gerv,
Is there any update about the license issue?
If we can't get around it, we may like to try FFTW which is free. (https://bugzilla.mozilla.org/show_bug.cgi?id=926838#c61)
Flags: needinfo?(gerv)
(In reply to JW Wang[:jwwang] from comment #70)
> Hi gerv,
> Is there any update about the license issue?
> If we can't get around it, we may like to try FFTW which is free.
> (https://bugzilla.mozilla.org/show_bug.cgi?id=926838#c61)

I've contacted the Khronos person yesterday morning (Paris time). He told us that he can change the header for us. I told him the specific files we need changed, and he is hopefully going to get back to us soon.

As far as I know, we can't use GPL code in Gecko, and FFTW is GPL.
There is an update; I'm travelling today and will be with you tomorrow.

Gerv
Flags: needinfo?(gerv)
:padenot gives the current situation. Khronos have agreed to change the headers on the files, so we should continue with the assumption that they will, and make sure it gets done in our local copy. padenot: can you track that action?

(And yes, no GPLed code in Gecko.)

Gerv
Yes, I'll make sure it's done right.
Paul -- Where do we stand with updating the files, etc?  I'd love to make the switch over to a better FFT this coming week -- right after Nightly becomes Fx30.  Do you think that's possible?
Flags: needinfo?(paul)
The only thing we can do is wait for the Khronos person to update their header files so we can use them. I'll ping him today.

In the meantime, we could probably check if our patches still apply cleanly and work fine so we can land this as soon as the header files are updated.
Flags: needinfo?(paul)
There will be some merging required with changes in bug 956604.
I think removing the "Scale inverse FFT result by 1/N" code from the omx inverse routines would be best, but, if you prefer to leave that untouched, then I guess PadAndMakeScaledDFT and GetInverse can have omx-specific code.

The other thing to consider is ffmpeg's rdft files, as used in Chromium on platforms other than Mac and Android IIUC.  There is a neon version in ffmpeg.
Bug 923319 comment 15 observes that the (non-neon) ffmpeg code is faster than kissfft on Linux x86_64.
The ffmpeg code is LGPL 2.1 or later.
Any update on this?
Flags: needinfo?(karlt)
I'm not currently working on Web Audio, sorry.
Flags: needinfo?(karlt)
(In reply to Mark Finkle (:mfinkle) from comment #80)
> Any update on this?

Let's try :padenot...
Flags: needinfo?(padenot)
Gerv, Paul -- Where are we with Khronos and being ready to land as soon as the header files are updated?  

Paul -- I'd like to reassign this to either you or baku (someone who actually is working on web audio and ready to land once we have clearance).  Do you have a preference?
Flags: needinfo?(gerv)
(In reply to Jeff Gilbert [:jgilbert] from comment #82)
> (In reply to Mark Finkle (:mfinkle) from comment #80)
> > Any update on this?
> 
> Let's try :padenot...

No update on my side.
Flags: needinfo?(padenot)
My understanding is that baku and qDot are both interested in helping with WebAudio.  I'm adding JST to the bug to help connect dots on that.
Just carrying over my reply from another bug since this bug has a slightly different audience:

Gerv is actively working the licensing issues we need resolved on a separate bug.

Currently padenot and baku are actively working on web audio. I talked with Anthony (k17e), and a couple of NZ folks who already know web audio (e.g. karlt) will likely be freeing up soon to help me, padenot and baku now that MSE/EME are getting ready to ship.   I should have a better idea of who from NZ in a couple of weeks.

If there is anyone else available in Q2 to help with web audio, please have them reach out to me directly.  We're happy to have the help.

FYI: Padenot has been marking web audio bugs with the keyword "perf" to call out the bugs that will improve our web audio performance. We're starting to attack those.  In addition, we'll be adding support for audio workers in Q2.
Maire: I'm still chasing Khronos about relicensing the file we've been discussing. (As noted, if there are any more, let me know ASAP!)

Gerv
Flags: needinfo?(gerv)
Reassigning to Paul since JW is not currently working on Web Audio.  Once we resolve the relicensing issue (see Comment 87 -- thanks for your help, Gerv!), we'll land this ASAP. This is part of the web audio improvements targeted for landing in Q2.
Assignee: jwwang → padenot
Good news: got draft updated license text from Khronos today, and it's acceptable :-) So seeing light at the end of the tunnel.

Gerv
Excellent, Gerv.  Thanks!  Sounds like I will soon owe you a beer (or beverage of your choice) at the next work week.
FYI (and more good news): The remaining issues with the license text have been resolved, and we're clear to land the code. (Thanks again, Gerv!) Paul is preparing the code to land now.
(rebased)
Attachment #831257 - Attachment is obsolete: true
Attachment #831258 - Attachment is obsolete: true
Attachment #831260 - Attachment is obsolete: true
Attachment #831262 - Attachment is obsolete: true
Trying to make all this work again, I found that one of the header file we need is a modified version of the header file present on the Khronos website. Modifications have been done by Google.

Google's patches on those files seem to be under a 3-clause BSD [0], here are the relevant bits:
- "License: BSD"
- "The code was modified to work with gcc and a new implementation for a
floating-point FFT was added.". This references both Khronos' header and the implementation from ARM that is under BSD

                                    +---------------------+---+
                              +---> |Google's code we need|BSD|
+---+---------------------+   |     +--^------------------+---+
|BSD|Khronos' new headers |   |        | patch
+---+-----------------+---+   |     +---------------------+---+
                       ^      |     | ARM implementation  |BSD|
                       |     i|     +---------------------+---+
                      p|     m|
                      a|     p|
                      t|     o|
                      c|     r|
                      h|     t|
                       |      |
                   +---+------+---------+-------------+
                   | Khronos' base files|Weird license|
                   +--------------------+-------------+


Would:
- taking Khronos' new header file with the new license header
- applying Google's code modification on this file (modifications seem to be under BSD)
- putting this header in the tree, along with the rest of the implementation from ARM licensed under BSD

be a solution that complies with Khronos' licensing? Khronos' file is now under a compatible license, Google's patch is BSD, so I think that would work, but I'd rather ask.

[0]: https://code.google.com/p/webrtc/source/browse/deps/third_party/openmax/README.chromium?spec=svn3624&r=3624
Flags: needinfo?(gerv)
:padenot: that would work. If you have a couple of days to wait, though, let me ask Google if they will consent to just sticking their changes under the Khronos license, and simplifying the whole thing.

Gerv
Flags: needinfo?(gerv)
We've waited so long that I can wait a bit more at this point :-). Thanks again!
The build system has changed enough since JW wrote those patches that those
changes are necessary to build.

This also includes the modification Google did to the headers, necessary to
compile.

I did a tryrun [0], it's passing on some platforms (and it works nicely on my
flame), but Android 4.0 11+ is crashing badly. JW, did you have a green try run
last time, in which case I might have introduced an error when rebasing?

[0]: https://treeherder.mozilla.org/#/jobs?repo=try&revision=6e9ad2d39356
(see comment 99)
Flags: needinfo?(jwwang)
Sorry, I couldn't remember the detail. I guess I only do perf tests on some B2G devices as indicated by comment 48.
Flags: needinfo?(jwwang)
After talking a bit with snorp and disassembling a bit of code, I think it's a violation of alignment constraints somewhere. I've pushed a patch that asserts pointers are aligned right before calling into the assembly functions, we'll see.
Attached patch debug patch to check alignment (obsolete) — Splinter Review
This checks that we don't call the assembly function with unaligned buffers, and
it still crashes.
Blocks: 1158741
FFTBlock got bitrotten a bit, care to have a look?
Attachment #8597911 - Flags: review?(jwwang)
Attachment #8593961 - Attachment is obsolete: true
Gerv, this imports back Google's code modifications on top of the new Khronos'
patches, does that look right?
Attachment #8597914 - Flags: review?(gerv)
The way of doing things changed a bit since last time this got reviewed, so I
guess we need a re-review here: last time was in november 2013.
Attachment #8597919 - Flags: review?(gps)
Attachment #8596571 - Attachment is obsolete: true
Attachment #8595433 - Attachment is obsolete: true
With all this, it's green. The issue was that an array was not aligned properly.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=fb9b8cedfe1d
Attachment #8597919 - Flags: review?(gps) → review+
Comment on attachment 8597911 [details] [diff] [review]
[Part 4] Implement AlignedTArray for 32-byte alignment is required by openmax dl. Also modify callers

Review of attachment 8597911 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/webaudio/AnalyserNode.cpp
@@ +255,4 @@
>    if (mWriteIndex == 0) {
>      inputBuffer = mBuffer.Elements();
>    } else {
> +    tmpBuffer.SetLength(FftSize());

Since this is a fallible array, we should check if SetLength() succeeds or not.

@@ +298,5 @@
>  AnalyserNode::AllocateBuffer()
>  {
>    bool result = true;
>    if (mBuffer.Length() != FftSize()) {
> +    mBuffer.SetLength(FftSize());

Ditto.

@@ +305,2 @@
>  
> +    mOutputBuffer.SetLength(FrequencyBinCount());

Ditto.
Attachment #8597911 - Flags: review?(jwwang) → review+
Comment on attachment 8597914 [details] [diff] [review]
Re-import Google's modifications. r=

Not sure what I'm supposed to be reviewing here; there are no bits of license text in this patch. If you do what I said in the comment above, I'm sure it'll be fine.

Gerv
Attachment #8597914 - Flags: review?(gerv)
Thanks, Gerv and everyone.  Looks like we're clear to land.  It's a bank holiday in France, so I expect we'll get this landed early next week.
sorry had to back this out for test failures like https://treeherder.mozilla.org/logviewer.html#?job_id=9568754&repo=mozilla-inbound
Flags: needinfo?(padenot)
hrm, I flipped the test that checked whether the allocation succeded.

New try job: https://treeherder.mozilla.org/#/jobs?repo=try&revision=c61f247abe11
Flags: needinfo?(padenot)
You need to log in before you can comment on or make changes to this bug.