Web Audio performance is exceptionally bad: very high CPU usage, slow playback, and choppy/crackling audio when running the demo at http://webaudiodemos.appspot.com/MIDIDrums/index.html

In comparison, Chrome uses very little CPU and renders the demo smoothly.

STR:
1. Visit http://webaudiodemos.appspot.com/MIDIDrums/index.html
2. At the bottom right, click the #4 demo
3. At the bottom left, click the play button beside "Beat"
4. Compare with Chrome (I'm running 30.0.x beta)
Created attachment 813541 [details] profile-webaudio

This is caused by HRTF panning. The attached profile can be opened with sysprof on Linux.
Any chance you could post a textual summary of the expensive stuff here? Thanks!
Created attachment 813557 [details] profile-webaudio.png Screenshot of the important part of the profile.
So... kiss_fft? :-)
Thanks, Paul. FFTConvolver::process() is using almost 3 times as much cpu as DelayProcessor::Process(). That's not at all surprising because convolution is much more complicated than the delay. The other thing of note is that the CPU usage continues after stopping the demo (with the stop button next to "Beat"). In Chrome, the usage stops after the reverb fades. I wonder how many disconnected nodes/graphs Gecko might be playing (and Chrome not). I suspect much of the CPU usage here is processing null blocks. Sometimes clicking on "Minimize memory usage" in about:memory temporarily reduces the cpu load a bit. Bug 923301 deals with null blocks for ConvolverNodeEngine::ProduceAudioBlock(). I'll look at doing the same for PannerNodeEngine in bug 898291.
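As a rough illustration of the null-block idea discussed above, here is a minimal Python sketch (not Gecko code). The class name, `tail_frames`, and the use of `None` to stand in for a null block are all assumptions made for illustration; the real engines are C++ and the actual changes live in bug 923301 and bug 898291.

```python
BLOCK_SIZE = 128  # frames per block (the Web Audio render quantum)


class TailAwareEngine:
    """Hypothetical engine that skips DSP once silent input outlasts its tail."""

    def __init__(self, tail_frames, process):
        self.tail_frames = tail_frames  # how long output rings after input stops
        self.process = process          # the expensive per-block DSP callback
        self.frames_left = 0            # tail frames still to be produced
        self.blocks_processed = 0       # counts how often the DSP actually ran

    def produce_block(self, block):
        # `None` stands in for a null (silent) input block.
        if block is None:
            if self.frames_left <= 0:
                # Nothing is buffered, so the output is silent too: skip the DSP.
                return None
            # Still draining the tail: feed explicit silence through the DSP.
            self.frames_left -= BLOCK_SIZE
            block = [0.0] * BLOCK_SIZE
        else:
            # Real input restarts the tail countdown.
            self.frames_left = self.tail_frames
        self.blocks_processed += 1
        return self.process(block)
```

With a 256-frame tail, one real block followed by any number of null blocks runs the DSP only three times (once for the input, twice to drain the tail); every later null block is passed through untouched.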
With patches from bug 923301 and bug 898291, things are better with MSG cpu no longer getting out of control running up to 100% until GC, and all CPU now falls back to initial state after the demo finishes. Still some catching up required to get close to Chromium. Still too much difference to explain via sample rate differences, I think.

CPU usage percentages for various threads and applications on Linux:
Chromium version is 30.0.1599.37.

                                         Nightly   Patched  Chromium
MSG                                      64 - 100  43       25
ConvolverWorker                          30        30       19
Total browser                            149       133      70
Total browser with web audio tab hidden  127       111      61
Pulseaudio                               3         3        8
Has anyone measured Web Audio performance on B2G yet? If not, we should do so now. If Web Audio performance is poor on B2G, this bug would block koi. Thanks.
Karl -- How do the demos on http://webaudiodemos.appspot.com/MIDIDrums/index.html sound with your patches from bug 923301 and bug 898291? Running Aurora (26.0a2) and the latest Nightly on Windows, demos 2, 4 and 5 on the page are filled with static and glitches and the audio starts to slow down after several seconds. I'm using a Windows 7 Core i7 2.4GHz (ThinkPad W520). It also sounds bad on a mid-end Win XP machine. On my MacBook Pro, demo 2 sounds really good for the first minute and then within 2-3 minutes, starts to get slow and crackle-y. Within 5 mins, it is horrible. (My MacBook Pro is older, from 2011.)
Paul & Karl -- I'm assigning this to Paul, but I think this bug will need attention from both of you.
Created attachment 815990 [details] webaudio_demo_2_opt.html

Profile running option 2 of the drum demo - inbound pull from ~10/9, opt build, profiled with jprof (just a minute or two of the actual demo, no loading or interaction), samples every 2ms of cpu time used (note: not 2ms realtime).

Top hits are:
54% WebCore::FFTConvolver::process()
    called by:
    (36.0%) WebCore::ReverbConvolverStage::process()
    (18.3%) WebCore::HRTFPanner::pan()
    Most of the Convolver time is kf_work()
9.6% speex_resampler_process_float
7.3% mozilla::DelayProcessor::Process()
2.7% mozilla::AudioNodeStream::ObtainInputBlock()
2.6% mozilla::BufferComplexMultiply()
2.5% mozilla::AudioBufferAddWithScale()
1.5% malloc()
1.4% WebCore::DirectConvolver::process()
1.2% mozilla::AudioChunk::SetNull()
1.1% nsTArray_base<>::ShiftData()
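For readers wondering why FFT work dominates the profile above, here is a small Python sketch of FFT-based convolution checked against direct convolution. It is only an illustration under stated assumptions: the function names are hypothetical, the FFT is a plain recursive radix-2 transform chosen for brevity, and it is neither kiss_fft nor the partitioned, block-wise scheme that WebCore::FFTConvolver uses.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two. Unnormalized."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft_convolve(signal, kernel):
    """Linear convolution via zero-padded FFTs (multiply in frequency domain)."""
    out_len = len(signal) + len(kernel) - 1
    size = 1
    while size < out_len:  # pad to a power of two to avoid circular wrap-around
        size *= 2
    a = fft([complex(v) for v in signal] + [0j] * (size - len(signal)))
    b = fft([complex(v) for v in kernel] + [0j] * (size - len(kernel)))
    product = [p * q for p, q in zip(a, b)]
    y = fft(product, inverse=True)
    return [v.real / size for v in y[:out_len]]  # normalize the inverse here


def direct_convolve(signal, kernel):
    """Reference O(N*M) convolution in the time domain."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out
```

Both functions give the same result up to rounding error. For a long reverb impulse response the FFT route needs far fewer multiplications than the direct one, but each block still costs several transforms, which is consistent with kf_work() showing up as the hot spot.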
Note also that on this machine (XEON 3.5GHz Linux) an opt build is glitchless, but a debug build sucks. Overall CPU %'s are similar (more weighted to upper-level functions), but likely it's blowing deadlines.

build with --enable-jprof
setenv JPROF_FLAGS "JP_DEFER JP_PERIOD=0.002" (or equivalent bash)
start firefox, start demo
kill -PROF pid
....
kill -USR1 pid
./jprof firefox jprof-log >/tmp/profile.html

Docs for jprof and how to read them are in the tree: tools/jprof/README.html
I tried the same demo on my Galaxy S4 and only 1 of the 5 demos was able to run. The other 4 demos showed lights moving very slowly but no sound. I tried several times. Chrome for Android ran all 5 demos, though Chrome struggled with 1 of them at the beginning and then was fine.
(In reply to Maire Reavy [:mreavy] from comment #9)
> Karl -- How do the demos on
> http://webaudiodemos.appspot.com/MIDIDrums/index.html sound with your
> patches from bug 923301 and bug 898291?

Those patches were enough to make demo 4 acceptable on a 4-core 1.6 GHz i7 with Linux. There are still occasional minor glitches, but I've heard these on simple demos also. Without the patches, demo 4 would often get filled with static and glitches, but would also be OK sometimes, depending on when GC ran. Demo 2 and 5 also seem fine to me. (I didn't check these without patches.)

It's a different story on a Windows 7 1.3 GHz Pentium Dual Core. The patches make enough difference that there is now more playing than stuttering, but it is not OK.

Some builds if others would like to test: http://email@example.com/

(In reply to Maire Reavy [:mreavy] from comment #7)
> Has anyone measured Web Audio performance on B2G yet?

I haven't.
Attachment 813557 [details] says that the convolver threads are spending at least 79% of their cycles doing FFT operations. This part is not affected by the patches from bug 923301 and bug 898291. There is only one long-living ConvolverNode in this demo, so GC timing is not relevant for these threads (while the demo is running).

When I change the sample rate in Gecko from 48 to 44.1 kHz, to match Chromium, the cpu used on each convolver thread in demo 4 reduces from 30 to 27 %, but Chromium threads are using 19 % of a CPU each. 19 is only 70% of 27. This seems to be evidence that ffmpeg's RDFT, used in Chromium, is faster than kiss_fft. The use of FFTs should be the same in each browser because the code is borrowed from blink.

(In reply to Maire Reavy [:mreavy] from comment #13)
> Chrome for Android ran all 5 demos, though Chrome
> struggled with 1 of them at the beginning and then was fine.

I wonder what sample rate Chrome is using on Android. This page will say: http://people.mozilla.org/~karlt/sampleRate.html
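The sample-rate comparison above can be checked with a few lines of Python. The three CPU percentages are the ones quoted in the comment; the variable names and the assumption that convolver cost scales roughly linearly with sample rate are mine, used only for this back-of-the-envelope check.

```python
# Per-thread convolver CPU percentages quoted in the comment above.
gecko_48k = 30.0      # Gecko at 48 kHz
gecko_44k = 27.0      # Gecko at 44.1 kHz (measured)
chromium_44k = 19.0   # Chromium at 44.1 kHz

# Assumption: cost scales roughly linearly with the number of frames per second.
rate_ratio = 44100 / 48000
predicted_gecko_44k = gecko_48k * rate_ratio

# Fraction of Gecko's per-thread cost that Chromium needs at the same rate.
chromium_vs_gecko = chromium_44k / gecko_44k

print(f"rate ratio: {rate_ratio:.3f}")
print(f"predicted Gecko CPU at 44.1 kHz: {predicted_gecko_44k:.1f} %")
print(f"Chromium / Gecko at 44.1 kHz: {chromium_vs_gecko:.2f}")
```

Under that linear assumption the sample-rate change accounts for the drop from 30 to about 27 to 28 %, so the remaining gap down to 19 % has to come from somewhere else, which is the argument made for the FFT implementation.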
Chrome is using 44100Hz here on a Galaxy Nexus. Note that in bug 918861, we are going towards using the device's preferred sample rate (in this case, this means switching from 48kHz to 44.1kHz on Galaxy Nexus, but staying at 48kHz on a Nexus 4, for example). This is what Chrome (at least desktop) does.
Galaxy S4 is 48000 Hz on Chrome & Chrome Beta
FYI, the jprof results above can also be separated by thread easily (--threads) if that's of help in the future. I agree that the FFT perf is likely the primary issue here, but there also is a significant issue of interaction with GC/CC.
I've tried the convolver node on b2g and the performance is really bad on these devices: unagi/buri. Other information is described in bug 926838. I also tried on a Nexus 4 and it works fine.
So, Blink is using OpenMax DL's FFT, and specifically the implementation written by ARM. Their implementation is licensed under the terms of the BSD license, so we should be able to use it in Gecko. I expect this to speed up this test case, and basically anything that uses an FFT a lot.

To get the files, go to [1], create an account, log in, and revisit the page. The package contains a NEON assembly optimized FFT implementation, under `sp`.

Now we just have to find someone who has time to write the code to switch from KissFFT to OpenMax FFT when NEON is available, and specifically to test on B2G whether performance is better. Note that ARM also provides an ANSI C implementation alongside the assembly implementation, as a fallback.

[1]: https://silver.arm.com/browse/OX002
I've done some testing with Firefox 25 beta 7 (build ID: 20131010180222) on: Win 7 64-bit, Ubuntu 12.10 32-bit, Mac OS X 10.8.4 and Win 8.1 32-bit (comparing with Chrome), and here are my results:
1) if I switch tabs while playing a demo, the sound gets jerky until I move the focus again on the tab loading the URL (same behavior on Chrome)
2) the difference in performance between Firefox and Chrome is very noticeable, for all demos
3) on Mac in 64-bit mode: demo #2 and demo #4 seem to be the jerkiest
4) on Mac in 32-bit mode: demos #1, #2, #4 and #5 seem to be jerkier than in 64-bit mode
Given this bug's severity I'm nominating for tracking. Paul do you think this is severe enough to block Web Audio in Firefox 25?
We have discussed off-bugzilla, and the general agreement seems to be not to make this block 25. We do want to fix this, however, probably by implementing comment 20. Also, there are two Pauls in this discussion, so I'm not sure you are talking to me.
(In reply to Paul Adenot (:padenot) from comment #24)
> We have discussed off-bugzilla, and the general agreement seems to be not to
> make this block 25. We do want to fix this, however, probably by implementing
> comment 20.

Yes, in our off-bugzilla discussion Karl made the point that "The web audio spec points out that two of the effects, convolution and hrtf panning can be too expensive for slower devices. This demo uses both of these effects, and so is particularly demanding. The whole web audio interface should not be judged on the performance of these effects."

So we feel we can relnote the problems with these demos for Fx25 while we work to fix them (comment 20). If there are serious performance issues with other demos (that are not known to be very expensive), that could affect whether we ship/don't ship in Fx25. Marc S and AaronMT are looking at other demos for Desktop and Android, and I believe jsmith and pyang are doing the same for FxOS since we're targeting Web Audio for v1.2.
I'd appreciate if people can file the issues that they've mentioned in the comments here as separate bugs to make sure they are not lost.
FWIW - the dupe pyang filed includes a simple test case for reproduction, which might help diagnose this bug more quickly. http://zapion.github.io/webaudio/convolver.html
Considering comment 26 and comment 21

> 1) if I switch tabs while playing a demo, the sound gets jerky until I move
> the focus again on the tab loading the URL (same behavior on Chrome)

For this I've logged bug 927374

> 3) on Mac in 64-bit mode: demo #2 and demo #4 seem to be the jerkiest

For this issue I've logged bug 927379

> 4) on Mac in 32-bit mode: demos #1, #2, #4 and #5 seem to be jerkier than in
> 64-bit mode

For this I've logged bug 927382
We'll consider an uplift if/when fixed.
I believe this bug, like bug 926838, will be resolved when we replace our FFT. Bug 926838 is where the FFT replacement is being discussed; so I'll ask for status there.
With all the dependent bugs landed, including bug 926838, the performance is still quite bad. Performance ranges from 3 to 4 times worse than Chrome. Let the demo run long enough and it occasionally stutters and crackles. My CPU is an Athlon II FX4 @ 2.8 GHz.
That should be Athlon II X4 -- in case anyone was paying attention. :-)
(In reply to IU from comment #33)
> That should be Athlon II X4 -- in case anyone was paying attention. :-)

Thanks for the info. There is a push right now on improving WebAudio performance so keep the feedback coming.
Hi Paul, Karl -- Can you take a look at the demo using our latest code & improvements and assess where the remaining perf problems are for this? (Ref: comment 32) We've landed a number of perf improvements; so we should investigate asap why this demo is still having issues. Thanks.
(In reply to IU from comment #32)
> With all the dependent bugs landed, including bug 926838, the performance is
> still quite bad. Performance ranges from 3 to 4 times worse than Chrome.
>
> Let the demo run long enough and it occasionally stutters and crackles.
>
> My CPU is an Athlon II FX4 @ 2.8 GHz.

This CPU is x86, we have just landed the optimizations for ARM. Bug 1157768 is about optimizing for x86, qDot recently picked it up.
Bug 974089 was a significant part of the issue here when I last looked. I'm keen to look into that again, but have some things to finish off this week.
(In reply to Paul Adenot (:padenot) from comment #36)
> This CPU is x86, we have just landed the optimizations for ARM. Bug 1157768
> is about optimizing for x86, qDot recently picked it up.

Performance still sucks with Bug 1157768 landed. Any other ideas?
It's not in nightly yet.
Are you sure? I have nightly with cset 38d03bf4616e.
Yeah maybe. In any case, we need the followup to bug 974089 (and I don't know if it's been filed) to really make this great.
Hey Paul -- just a reminder that you wanted to close this and open up new bugs for the smaller issues that still remain.
I'm just going to close this one. New bugs have been created to handle the issues this report found (bug 1189562 in particular), and Karl is working on it. This bug is not particularly useful anymore. Thanks for reporting the issues, and make sure to follow bug 1189562 to stay informed.