Closed Bug 1042508 Opened 10 years ago Closed 10 years ago

Enable speex resampler neon optimizations where available

Categories

(Core :: Web Audio, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla34
blocking-b2g 2.0+
Tracking Status
firefox32 --- wontfix
firefox33 --- wontfix
firefox34 --- fixed
b2g-v2.0 --- fixed
b2g-v2.1 --- fixed

People

(Reporter: karlt, Assigned: karlt)

References

Details

Attachments

(5 files, 1 obsolete file)

These optimizations have only been implemented upstream for the direct resamplers.
When built with FIXED_POINT, speex_resampler_process_float() assumes the samples are in the range ±0x7fff and so Gecko instead uses ConvertAudioSamples and speex_resampler_process_int(). This decision is based on MOZ_SAMPLE_TYPE_S16.
This ensures that code in resample.c will run on Intel x86 cpus even when SSE support has been compiled, and will provide similarly for neon support when enabled.
Attachment #8460754 - Flags: review?(paul)
https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Extended-Asm says asm [volatile] ( AssemblerTemplate : [OutputOperands] [ : [InputOperands] [ : [Clobbers] ] ] ) which implies that Clobbers is optional even after the third colon, but the gcc used for b2g_try_emulator_dep builds says resample_neon.c: In function 'saturate_32bit_to_16bit': resample_neon.c:50: error: expected string literal before ')' token
Attachment #8460755 - Flags: review?(paul)
Attachment #8460757 - Flags: review?(paul)
Note that we miss out on the WORD2INT saturation optimization, but that is only used during sinct table initialization. The SIMD optimization is for O(filt_len) operations per processed sample.
Attachment #8460752 - Flags: review?(paul)
Comment on attachment 8460754 [details] [diff] [review] move resampler simd optimizations to separate translation units It seems mercurial has lost the file renames somewhere.
Attachment #8460754 - Flags: review?(paul)
Attachment #8460751 - Flags: review?(paul) → review+
Attachment #8460752 - Flags: review?(paul) → review+
Attachment #8460755 - Flags: review?(paul) → review+
Attachment #8460757 - Flags: review?(paul) → review+
Attachment #8460798 - Flags: review?(paul) → review+
[Blocking Requested - why for this release]: This reduces the CPU usage of a WebRTC call by 5%-8%. Given how resource intensive WebRTC is, we could really use this enhancement on v2.0.
blocking-b2g: --- → 2.0?
(In reply to Diego Wilson [:diego] from comment #11) > [Blocking Requested - why for this release]: This reduces the CPU usage of a > WebRTC call by 5%-8%. Given how resource intensive WebRTC is, we could > really use this enhancement on v2.0. Hey diego, this might help here but given the timeline for 2.0 and the baketime this has had, IMHO this looks risky to uplift at this point. I am also neeing Karl to give his input on risk profile here.
Flags: needinfo?(karlt)
(In reply to bhavana bajaj [:bajaj] from comment #12) > Karl to give his input on risk profile here. I don't know neon assembler myself, so I don't know the potential for edge case issues such as incorrect saturation. Things will sound bad anyway if content causes the corner cases are being reached. The common case has automated tests, which would pick up major bugs but maybe not smaller changes in quality. IIRC VLC has been using similar code for a long time. Porting this requires the SATURATE32PSHR changes from bug 1042504 and some of the refactoring of conditionals there. That's probably more costly in terms of developer work than risk. The greatest risk is probably generating and running the wrong code on platforms that won't run it. Such effects would be fatal, but therefore easier to detect.
Flags: needinfo?(karlt)
(In reply to Karl Tomlinson (:karlt) from comment #13) > (In reply to bhavana bajaj [:bajaj] from comment #12) > > Karl to give his input on risk profile here. > > I don't know neon assembler myself, so I don't know the potential for edge > case issues such as incorrect saturation. Things will sound bad anyway if > content causes the corner cases are being reached. The common case has > automated tests, which would pick up major bugs but maybe not smaller changes > in quality. IIRC VLC has been using similar code for a long time. > > Porting this requires the SATURATE32PSHR changes from bug 1042504 and some of > the refactoring of conditionals there. That's probably more costly in terms > of developer work than risk. > > The greatest risk is probably generating and running the wrong code on > platforms that won't run it. Such effects would be fatal, but therefore > easier to detect. Thanks karl. I've looped in :jsmith from QA to help get additional testing here. He'll work through some of the follow0up questions he may have.
blocking-b2g: 2.0? → 2.0+
Needs rebasing for b2g32 uplift.
Flags: needinfo?(karlt)
Karl isn't available to work on this. Further, uplifting is dead time in that the fixes will be available in the next release. Eric - is there someone available in Taipei to uplift the neon optimisations?
Flags: needinfo?(karlt) → needinfo?(echou)
Hi Anthony, (In reply to Anthony Jones (:kentuckyfriedtakahe, :k17e) from comment #16) > Karl isn't available to work on this. Further, uplifting is dead time in > that the fixes will be available in the next release. Eric - is there > someone available in Taipei to uplift the neon optimisations? Star is working on another potential 2.0 issue and Bruce are still on 2.1 feature. Blake might be available, but do you think it's ok for us to do the uplift even if we don't really know the logic behind the scene? :| Could Paul help out with uplifting?
Flags: needinfo?(echou) → needinfo?(steve)
Oops, sorry for the wrong needinfo.
Flags: needinfo?(steve)
This is hard to uplift as is. We have two options: (1) We can try to uplift the minimum of code needed, but that means writing new code. (2) We can uplift all the exact patches that landed on central in the exact same order (bugs 1033140, 1042504, 1042508, and backout and reapply 995075) I think (1) is riskier and longer to do than (2), but will eventually result in a smaller uplift (which should not be a criteria). (2) is, considering the patches have baked on central for some time, now, what I'm leaning towards. Ideally, when it's done, we would have the same code in b2g31 and central, so we benefit from the QA and test runs done there. That said, have we profiled the code? Are the optimizations really worth the uplift?
There isn't anyone available to work on this. RyanVM could try uplifting the chain of patches that Paul suggested. Otherwise we should cancel the uplift.
Flags: needinfo?(ryanvm)
I'm happy to do so, but it sounds like maybe we should get some perf measurements on 2.1 first to make sure it's worth the effort?
Flags: needinfo?(ryanvm)
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #21) > I'm happy to do so, but it sounds like maybe we should get some perf > measurements on 2.1 first to make sure it's worth the effort? That's exactly what I think we need. Diego -- Can you provide some before and after perf numbers for these sets of patches?
Flags: needinfo?(dwilson)
(In reply to Maire Reavy [:mreavy] (Plz needinfo me) from comment #22) > That's exactly what I think we need. Diego -- Can you provide some before > and after perf numbers for these sets of patches? We need to know if we're near 100%.
Keywords: qawanted
(In reply to Anthony Jones (:kentuckyfriedtakahe, :k17e) from comment #23) > (In reply to Maire Reavy [:mreavy] (Plz needinfo me) from comment #22) > > That's exactly what I think we need. Diego -- Can you provide some before > > and after perf numbers for these sets of patches? > > We need to know if we're near 100%. IIRC the CPU usage is near 95% for WebRTC on FFOS v2.0 8x10 Jay, can you share our perf improvements numbers for WebRTC?
Flags: needinfo?(dwilson) → needinfo?(jaywang)
Removing qawanted - jsmith recommends adding mlee to this bug.
Keywords: qawanted
As part of Li Gong's reorganization the fxOS Performance team has been dissolved. Performance issues will now be directly handled by component teams.
Flags: needinfo?(jaywang)
(In reply to Diego Wilson [:diego] from comment #24) > (In reply to Anthony Jones (:kentuckyfriedtakahe, :k17e) from comment #23) > > (In reply to Maire Reavy [:mreavy] (Plz needinfo me) from comment #22) > > > That's exactly what I think we need. Diego -- Can you provide some before > > > and after perf numbers for these sets of patches? > > > > We need to know if we're near 100%. > > IIRC the CPU usage is near 95% for WebRTC on FFOS v2.0 8x10 > > Jay, can you share our perf improvements numbers for WebRTC? Without this patch, we see that average CPU % is around 97.3% and encoder FPS is around 25-28FPS. With the patch, the average CPU% is around 95.3% and the encodoer FPS goes up to 28-30FPS range. This is measured with end-to-end webRTC use-case. With the local WebRTC loopback, both cases gives about 30FPS, but the neon resempler helps to reduce CPU utilization by 5%.
(In reply to Mike Lee [:mlee] from comment #26) > As part of Li Gong's reorganization the fxOS Performance team has been > dissolved. Performance issues will now be directly handled by component > teams. Lets hope the patches apply cleanly then.
https://tbpl.mozilla.org/?tree=Try&rev=6e3c7a993a6e I'll go ahead and requested 2.0+ on the other bugs.
Try is green, just waiting on the approvals.
Ryan - thanks for uplifting us out of a pickle!
QA Whiteboard: [qa-]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: