Closed
Bug 1000044
Opened 10 years ago
Closed 10 years ago
audio resampling consumes about 5% of cpu when gUM with audio
Categories
(Core :: Web Audio, defect)
Tracking
()
People
(Reporter: slee, Assigned: karlt)
References
Details
(Keywords: perf, Whiteboard: [priority][webrtc_uplift])
Attachments
(3 files)
Go to http://mozilla.github.io/webrtc-landing/gum_test.html and press "Audio". CPU usage peaks at about 5% for resampling, and total browser CPU usage is about 22%.
Comment 1•10 years ago
If this is a concern for you, you can try lowering the quality of the resampler [1].

[1]: http://dxr.mozilla.org/mozilla-central/source/content/media/MediaStreamGraph.cpp?from=MediaStreamGraph.cpp#2295
Assignee
Comment 2•10 years ago
The profile shows the interpolating resampler is in use. I wonder whether this is resampling from 16 to 44.1 kHz. The denominator of the ratio there would be just outside the range at which the direct resampler would be used. http://people.mozilla.org/~karlt/sampleRate.html tells you the output sample rate.

If inputs are likely to be at 16 kHz, then it would be more efficient to resample to 48 kHz. You can experiment by changing AudioStream::InitPreferredSampleRate() to return 48000.

Another option to experiment with may be to modify the test at http://hg.mozilla.org/mozilla-central/annotate/8e797a1a4d65/media/libspeex_resampler/src/resample.c#l614 so that the direct resampler is used even at the cost of more memory. A test such as

    (den_rate * filt_len <= oversample * filt_len + 8 || den_rate * filt_len <= 28224)

would mean "I want to use the direct resampler even if it uses more memory, up to 55.125 kB (on systems with 16-bit samples)". If I calculate correctly, that should be just enough to use the direct resampler when resampling from 16 to 44.1. Whether that is actually faster depends on cache size, I assume.
Assignee
Comment 3•10 years ago
When adaptive rates are added for handling clock drift, we may prefer the interpolating resampler because it is cheaper to set up for new rates. For rates that don't lead to small denominators, that will be the only option. It depends how often the rate is likely to vary from the expected rate.
Reporter
Comment 4•10 years ago
I tried:
1. Setting the init flag to SPEEX_RESAMPLER_QUALITY_MIN and SPEEX_RESAMPLER_QUALITY_VOIP.
2. Forcing AudioStream::InitPreferredSampleRate() to return 48000.

I used gunshup to test, because I want to compare the CPU usage with the old version, which does the resampling in GIPS. Here are the test settings:
* codec - g711
* aec -> pref("media.getusermedia.aec", 4);

Results:
* Old version
** browser -> 24%
* New version
** without any modification
*** browser -> 37%
** SPEEX_RESAMPLER_QUALITY_MIN
*** browser -> 34%
** SPEEX_RESAMPLER_QUALITY_VOIP
*** browser -> 39%
** Force AudioStream::InitPreferredSampleRate() to return 48000
*** browser -> 29%

[1] http://dxr.mozilla.org/mozilla-central/source/content/media/MediaStreamGraph.cpp?from=MediaStreamGraph.cpp#2295
[2] http://hg.mozilla.org/mozilla-central/annotate/8e797a1a4d65/media/libspeex_resampler/src/resample.c#l614
Reporter
Comment 5•10 years ago
Comment 6•10 years ago
Nominating this as a v2.0 blocker and then will remove it as a dependency for Bug 984239 (the user story for getting H.264 functional on FxOS 2.0).
blocking-b2g: --- → 2.0?
Comment 7•10 years ago
Why is this needed for 2.0 specifically? Is this required for Loop's MVP support?
Flags: needinfo?(mreavy)
Comment 8•10 years ago
It's not needed for Loop's MVP. We just really want to improve the perf for 2.0 (gecko 32). What's the proper way to mark this?
Flags: needinfo?(mreavy)
Comment 9•10 years ago
(In reply to Maire Reavy [:mreavy] (Please needinfo me) from comment #8)
> It's not needed for Loop's MVP. We just really want to improve the perf for
> 2.0 (gecko 32). What's the proper way to mark this?

I'd mark this as backlog in the blocking-b2g flag with '[priority]' in the whiteboard to indicate that this is important to get resolved. I wouldn't block the release on it, mainly because we only block on issues that must be fixed in Gecko 32 no matter what (a true stop-ship).
blocking-b2g: 2.0? → backlog
Whiteboard: [priority]
Assignee
Comment 10•10 years ago
Some neon optimizations landed upstream, which would reduce resampling time by one third (on supporting systems). http://git.xiph.org/?p=speexdsp.git;a=commitdiff;h=0e5d424fdba2fd1c132428da38add0c0845b4178
Comment 11•10 years ago
Is there an ETA for getting those speex changes into Mozilla?
Flags: needinfo?(jmvalin)
Flags: needinfo?(giles)
Comment 12•10 years ago
AFAIK, Tristan's working on this; not sure what the ETA is, but I'm sure getting more testing can speed things up. There are also some changes to use the "direct" method (instead of interpolation) that should help as well. I'm not sure if they're in the Mozilla tree. Also, are you guys compiling this code as fixed-point, and what complexity are you using? Each of these issues (direct resampler patch, fixed-point, complexity) has roughly the same performance impact as the Neon code.
Flags: needinfo?(jmvalin)
Assignee
Comment 13•10 years ago
Yes, this is compiled as fixed point on mobile. --enable-resample-full-sinc-table won't be suitable because there could be rate changes, but a change to the HUGEMEM heuristic may be an option (comment 2 and 3). SPEEX_RESAMPLER_QUALITY_DEFAULT is used but comment 4 indicated little gain from changing that. (I don't know whether this is "complexity".)
Comment 14•10 years ago
Karl is driving this, I think. If there's stuff upstream, I'm happy to update the in-tree version if that's helpful. Just file a bug and assign it to me. Are we ready to do that, Karl?
Flags: needinfo?(giles)
Assignee
Comment 15•10 years ago
I started getting some things ready for an update in bug 1033122 and bug 1033140. See also these changes for updating to bbe7e099, which doesn't include the neon optimizations, and which are missing the hugemem patches from opus-tools:
https://hg.mozilla.org/try/rev/7827546c45a4
https://hg.mozilla.org/try/rev/be59b95af8d2

I'm taking some vacation, and so it'll be about 2 weeks before I get back to that. There may also be a little work enabling dynamic see detection.
Assignee
Comment 16•10 years ago
The upstream neon optimizations have only been implemented for the direct resampler, so we'd either need to switch to the direct resampler to benefit from the optimizations, or write some optimizations for the interpolating resampler.

At 16 -> 44.1, using the direct resampler means about 441/8 ≅ 55 times as much memory/initialization as the interpolating resampler at default quality. That is only 55.125 kB with 16-bit samples, but still 55 times as much initialization work. The current hugemem patch uses the direct resampler with up to 32 times as much memory/initialization at default quality.

One option to consider might be to make this large-memory choice only when initializing a new resampler, not for rate changes, but there would be a benefit even with rate changes if they are not too frequent. It would be possible to refactor things a bit to do lazy sinc table initialization, but that would only be of benefit if processing chunks of less than 10 ms at each rate. I expect we need to process somewhat more than 10 ms at 16 -> 44.1 to benefit from the direct resampler.
Assignee
Comment 17•10 years ago
Given we are not doing much in the way of rate changes ATM, I suggest we start with this and we can look at other improvements in the future if/when rate changes become a problem. This also changes the HUGEMEM test to be independent of quality. |oversample| has a maximum value of 32, so we know that the interpolating resampler will use less memory when st->den_rate > 441.
Updated•10 years ago
Whiteboard: [priority] → [priority][webrtc_uplift]
Comment 18•10 years ago
Comment on attachment 8461197 [details] [diff] [review]
use direct resampler for 16->44.1k

Review of attachment 8461197 [details] [diff] [review]:
-----------------------------------------------------------------

This looks like the diff of a diff. I'm having a hard time making any sense of it.
Comment 19•10 years ago
(In reply to Jean-Marc Valin (:jmspeex) from comment #18)
> This looks like the diff of a diff. I'm having a hard time making any sense
> of it.

That is just an addition to the patchset that gets applied on top of uplifts of new Opus source; it's just updating that patchset to include the change to the source file so we won't lose the change on the next uplift of Opus source.
Assignee
Comment 20•10 years ago
Yes, the changes to resample.c are the real changes. The changes to hugemem.patch are just the same changes made to a patch that is applied on uplifts, now from speexdsp.
Comment hidden (obsolete)
Assignee
Comment 22•10 years ago
On further thought, I don't think there is much to gain from the more complex heuristic in comment 21. It avoids the direct resampler when it is particularly costly, but in those situations (large down-sampling ratios) the processing is very costly and there is more potential to gain from the direct resampler.

The heuristic in attachment 8461197 [details] [diff] [review] makes more sense, I think, because it weighs up the relative costs of initialization and processing, which are both proportional to filt_len (in an ideal implementation). If the initialization costs are too high because the down-sampling ratio makes filt_len too high, then the processing costs will be too high anyway. There is a different problem to solve for large-ratio down-sampling, and the heuristic of comment 21 would not solve that.
Comment 23•10 years ago
Comment on attachment 8461197 [details] [diff] [review]
use direct resampler for 16->44.1k

Review of attachment 8461197 [details] [diff] [review]:
-----------------------------------------------------------------

This looks like the diff of a diff. I'm having a hard time making any sense of it.
Attachment #8461197 - Flags: review?(jmvalin) → review-
Comment 24•10 years ago
Comment on attachment 8461197 [details] [diff] [review]
use direct resampler for 16->44.1k

Re-marking for review per discussion in #media.
Attachment #8461197 - Flags: review- → review?(jmvalin)
Comment 25•10 years ago
Comment on attachment 8461197 [details] [diff] [review]
use direct resampler for 16->44.1k

r+ with the following check: I believe that the line

    if (st->den_rate <= 441)

can probably have the bound changed to 160 and still work for the 44.1<->48k case.
Attachment #8461197 - Flags: review?(jmvalin) → review+
Assignee
Comment 26•10 years ago
(In reply to Jean-Marc Valin (:jmspeex) from comment #25)

Thanks for having a look. Yes, 44k1:48k simplifies to 147:160, and so 160 would be sufficient for 44k1<->48k. However, here we are resampling from 16k to 44k1. We no longer have the common factor of 3, and so this simplifies only to 160:441. den_rate derives from out_rate, and so we need to compare against 441 to use the direct resampler for 16k->44k1. Does 441 seem reasonable, given that?
Flags: needinfo?(jmvalin)
Assignee
Comment 27•10 years ago
Memory use of the direct resampler is filt_len * den_rate * word_size. At default quality, when up-sampling, filt_len = 64. On mobile, word_size = 2. So for 16k->44k1 on mobile, at default quality, memory use is 64 * 441 * 2 = 55.125 kB.

Memory use can be higher when down-sampling, if ratios are not nice. I don't think we use higher qualities. 192k->44k1 gives 640:147, and so direct resampler memory use is about 64 * 640 * 2 = 80 kB. Both existing and proposed heuristics would select the direct resampler for this situation.

There are possible situations that would use significantly more memory with the new approach, but those situations are expected only in demanding Web Audio use. The patch increases the switch point at large down-sampling from 16*(1+8) to 441, which makes the unlikely worst cases worse by a factor of 3.
Comment 28•10 years ago
If you're fine with the memory use, then go ahead. Also, can the user cause the resampler quality parameter to change? If so, you might want to check with the max memory for quality=10.
Flags: needinfo?(jmvalin)
Comment 29•10 years ago
The user has no control over the quality. Karl: what's the CPU win of this patch on b2g?
Assignee
Comment 30•10 years ago
(In reply to Randell Jesup [:jesup] from comment #29)
> Karl: what's the CPU win of this patch on b2g?

I don't have my own measurements, but the direct resampler performs 1/4 as many multiplications as the interpolating resampler. Multiplications are not the only work, as there is some copying involved. The time taken by the direct resampler for 96->44.1 has been measured on ARM to be about 1/3 that of the interpolating resampler. That's consistent with the 48k numbers in comment 4, so I think we can expect something similar.

On top of that, only the direct resampler has neon optimizations. It can expect a further reduction in resampling time of 1/2, giving 1/6 overall, on neon architectures.
Assignee
Comment 31•10 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/61147905cbf2
https://tbpl.mozilla.org/?tree=Try&rev=2d8268970c28
Comment 32•10 years ago
https://hg.mozilla.org/mozilla-central/rev/61147905cbf2
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla35
Updated•9 years ago
blocking-b2g: backlog → ---
tracking-b2g: --- → backlog