Closed
Bug 1042508
Opened 10 years ago
Closed 10 years ago
Enable speex resampler neon optimizations where available
Categories
(Core :: Web Audio, defect)
Tracking
()
People
(Reporter: karlt, Assigned: karlt)
References
Details
Attachments
(5 files, 1 obsolete file)
7.35 KB,
patch
|
padenot
:
review+
|
Details | Diff | Splinter Review |
1.53 KB,
patch
|
padenot
:
review+
|
Details | Diff | Splinter Review |
3.21 KB,
patch
|
padenot
:
review+
|
Details | Diff | Splinter Review |
1.01 KB,
patch
|
padenot
:
review+
|
Details | Diff | Splinter Review |
26.69 KB,
patch
|
padenot
:
review+
|
Details | Diff | Splinter Review |
These optimizations have only been implemented upstream for the direct resamplers.
Assignee | ||
Comment 1•10 years ago
|
||
Attachment #8460751 -
Flags: review?(paul)
Assignee | ||
Comment 2•10 years ago
|
||
When built with FIXED_POINT, speex_resampler_process_float() assumes the
samples are in the range ±0x7fff and so Gecko instead uses ConvertAudioSamples
and speex_resampler_process_int(). This decision is based on
MOZ_SAMPLE_TYPE_S16.
Assignee | ||
Comment 3•10 years ago
|
||
This ensures that code in resample.c will run on Intel x86 cpus even when SSE
support has been compiled, and will provide similarly for neon support when
enabled.
Attachment #8460754 -
Flags: review?(paul)
Assignee | ||
Comment 4•10 years ago
|
||
https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Extended-Asm says
asm [volatile] ( AssemblerTemplate : [OutputOperands] [ : [InputOperands] [ : [Clobbers] ] ] )
which implies that Clobbers is optional even after the third colon, but
the gcc used for b2g_try_emulator_dep builds says
resample_neon.c: In function 'saturate_32bit_to_16bit':
resample_neon.c:50: error: expected string literal before ')' token
Attachment #8460755 -
Flags: review?(paul)
Assignee | ||
Comment 5•10 years ago
|
||
Attachment #8460757 -
Flags: review?(paul)
Assignee | ||
Comment 6•10 years ago
|
||
Note that we miss out on the WORD2INT saturation optimization, but that is only used during sinct table initialization. The SIMD optimization is for O(filt_len) operations per processed sample.
Assignee | ||
Updated•10 years ago
|
Attachment #8460752 -
Flags: review?(paul)
Assignee | ||
Comment 7•10 years ago
|
||
Comment on attachment 8460754 [details] [diff] [review]
move resampler simd optimizations to separate translation units
It seems mercurial has lost the file renames somewhere.
Attachment #8460754 -
Flags: review?(paul)
Assignee | ||
Comment 8•10 years ago
|
||
Attachment #8460754 -
Attachment is obsolete: true
Attachment #8460798 -
Flags: review?(paul)
Updated•10 years ago
|
Attachment #8460751 -
Flags: review?(paul) → review+
Updated•10 years ago
|
Attachment #8460752 -
Flags: review?(paul) → review+
Updated•10 years ago
|
Attachment #8460755 -
Flags: review?(paul) → review+
Updated•10 years ago
|
Attachment #8460757 -
Flags: review?(paul) → review+
Updated•10 years ago
|
Attachment #8460798 -
Flags: review?(paul) → review+
Assignee | ||
Comment 9•10 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/85e2540e752f
https://hg.mozilla.org/integration/mozilla-inbound/rev/2566329e44b9
https://hg.mozilla.org/integration/mozilla-inbound/rev/82f2fe2768d5
https://hg.mozilla.org/integration/mozilla-inbound/rev/a2404857df44
https://hg.mozilla.org/integration/mozilla-inbound/rev/16cb26a14843
https://hg.mozilla.org/integration/mozilla-inbound/rev/42f62e9467f4
Flags: in-testsuite-
Comment 10•10 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/85e2540e752f
https://hg.mozilla.org/mozilla-central/rev/2566329e44b9
https://hg.mozilla.org/mozilla-central/rev/82f2fe2768d5
https://hg.mozilla.org/mozilla-central/rev/a2404857df44
https://hg.mozilla.org/mozilla-central/rev/16cb26a14843
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla34
Comment 11•10 years ago
|
||
[Blocking Requested - why for this release]: This reduces the CPU usage of a WebRTC call by 5%-8%. Given how resource intensive WebRTC is, we could really use this enhancement on v2.0.
blocking-b2g: --- → 2.0?
Comment 12•10 years ago
|
||
(In reply to Diego Wilson [:diego] from comment #11)
> [Blocking Requested - why for this release]: This reduces the CPU usage of a
> WebRTC call by 5%-8%. Given how resource intensive WebRTC is, we could
> really use this enhancement on v2.0.
Hey diego, this might help here but given the timeline for 2.0 and the baketime this has had, IMHO this looks risky to uplift at this point. I am also neeing Karl to give his input on risk profile here.
Flags: needinfo?(karlt)
Assignee | ||
Comment 13•10 years ago
|
||
(In reply to bhavana bajaj [:bajaj] from comment #12)
> Karl to give his input on risk profile here.
I don't know neon assembler myself, so I don't know the potential for edge
case issues such as incorrect saturation. Things will sound bad anyway if
content causes the corner cases are being reached. The common case has
automated tests, which would pick up major bugs but maybe not smaller changes
in quality. IIRC VLC has been using similar code for a long time.
Porting this requires the SATURATE32PSHR changes from bug 1042504 and some of
the refactoring of conditionals there. That's probably more costly in terms
of developer work than risk.
The greatest risk is probably generating and running the wrong code on
platforms that won't run it. Such effects would be fatal, but therefore
easier to detect.
Flags: needinfo?(karlt)
Comment 14•10 years ago
|
||
(In reply to Karl Tomlinson (:karlt) from comment #13)
> (In reply to bhavana bajaj [:bajaj] from comment #12)
> > Karl to give his input on risk profile here.
>
> I don't know neon assembler myself, so I don't know the potential for edge
> case issues such as incorrect saturation. Things will sound bad anyway if
> content causes the corner cases are being reached. The common case has
> automated tests, which would pick up major bugs but maybe not smaller changes
> in quality. IIRC VLC has been using similar code for a long time.
>
> Porting this requires the SATURATE32PSHR changes from bug 1042504 and some of
> the refactoring of conditionals there. That's probably more costly in terms
> of developer work than risk.
>
> The greatest risk is probably generating and running the wrong code on
> platforms that won't run it. Such effects would be fatal, but therefore
> easier to detect.
Thanks karl. I've looped in :jsmith from QA to help get additional testing here. He'll work through some of the follow0up questions he may have.
blocking-b2g: 2.0? → 2.0+
Comment 15•10 years ago
|
||
Needs rebasing for b2g32 uplift.
status-b2g-v2.0:
--- → affected
status-b2g-v2.1:
--- → fixed
status-firefox32:
--- → wontfix
status-firefox33:
--- → wontfix
status-firefox34:
--- → fixed
Flags: needinfo?(karlt)
Keywords: branch-patch-needed
Comment 16•10 years ago
|
||
Karl isn't available to work on this. Further, uplifting is dead time in that the fixes will be available in the next release. Eric - is there someone available in Taipei to uplift the neon optimisations?
Flags: needinfo?(karlt) → needinfo?(echou)
Comment 17•10 years ago
|
||
Hi Anthony,
(In reply to Anthony Jones (:kentuckyfriedtakahe, :k17e) from comment #16)
> Karl isn't available to work on this. Further, uplifting is dead time in
> that the fixes will be available in the next release. Eric - is there
> someone available in Taipei to uplift the neon optimisations?
Star is working on another potential 2.0 issue and Bruce are still on 2.1 feature. Blake might be available, but do you think it's ok for us to do the uplift even if we don't really know the logic behind the scene? :|
Could Paul help out with uplifting?
Flags: needinfo?(echou) → needinfo?(steve)
Comment 19•10 years ago
|
||
This is hard to uplift as is. We have two options:
(1) We can try to uplift the minimum of code needed, but that means writing new code.
(2) We can uplift all the exact patches that landed on central in the exact same order (bugs 1033140, 1042504, 1042508, and backout and reapply 995075)
I think (1) is riskier and longer to do than (2), but will eventually result in a smaller uplift (which should not be a criteria).
(2) is, considering the patches have baked on central for some time, now, what I'm leaning towards. Ideally, when it's done, we would have the same code in b2g31 and central, so we benefit from the QA and test runs done there.
That said, have we profiled the code? Are the optimizations really worth the uplift?
Comment 20•10 years ago
|
||
There isn't anyone available to work on this. RyanVM could try uplifting the chain of patches that Paul suggested. Otherwise we should cancel the uplift.
Flags: needinfo?(ryanvm)
Comment 21•10 years ago
|
||
I'm happy to do so, but it sounds like maybe we should get some perf measurements on 2.1 first to make sure it's worth the effort?
Flags: needinfo?(ryanvm)
Comment 22•10 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #21)
> I'm happy to do so, but it sounds like maybe we should get some perf
> measurements on 2.1 first to make sure it's worth the effort?
That's exactly what I think we need. Diego -- Can you provide some before and after perf numbers for these sets of patches?
Flags: needinfo?(dwilson)
Comment 23•10 years ago
|
||
(In reply to Maire Reavy [:mreavy] (Plz needinfo me) from comment #22)
> That's exactly what I think we need. Diego -- Can you provide some before
> and after perf numbers for these sets of patches?
We need to know if we're near 100%.
Keywords: qawanted
Comment 24•10 years ago
|
||
(In reply to Anthony Jones (:kentuckyfriedtakahe, :k17e) from comment #23)
> (In reply to Maire Reavy [:mreavy] (Plz needinfo me) from comment #22)
> > That's exactly what I think we need. Diego -- Can you provide some before
> > and after perf numbers for these sets of patches?
>
> We need to know if we're near 100%.
IIRC the CPU usage is near 95% for WebRTC on FFOS v2.0 8x10
Jay, can you share our perf improvements numbers for WebRTC?
Flags: needinfo?(dwilson) → needinfo?(jaywang)
Comment 25•10 years ago
|
||
Removing qawanted - jsmith recommends adding mlee to this bug.
Keywords: qawanted
Comment 26•10 years ago
|
||
As part of Li Gong's reorganization the fxOS Performance team has been dissolved. Performance issues will now be directly handled by component teams.
Flags: needinfo?(jaywang)
Comment 27•10 years ago
|
||
(In reply to Diego Wilson [:diego] from comment #24)
> (In reply to Anthony Jones (:kentuckyfriedtakahe, :k17e) from comment #23)
> > (In reply to Maire Reavy [:mreavy] (Plz needinfo me) from comment #22)
> > > That's exactly what I think we need. Diego -- Can you provide some before
> > > and after perf numbers for these sets of patches?
> >
> > We need to know if we're near 100%.
>
> IIRC the CPU usage is near 95% for WebRTC on FFOS v2.0 8x10
>
> Jay, can you share our perf improvements numbers for WebRTC?
Without this patch, we see that average CPU % is around 97.3% and encoder FPS is around 25-28FPS.
With the patch, the average CPU% is around 95.3% and the encodoer FPS goes up to 28-30FPS range. This is measured with end-to-end webRTC use-case. With the local WebRTC loopback, both cases gives about 30FPS, but the neon resempler helps to reduce CPU utilization by 5%.
Comment 28•10 years ago
|
||
(In reply to Mike Lee [:mlee] from comment #26)
> As part of Li Gong's reorganization the fxOS Performance team has been
> dissolved. Performance issues will now be directly handled by component
> teams.
Lets hope the patches apply cleanly then.
Comment 29•10 years ago
|
||
https://tbpl.mozilla.org/?tree=Try&rev=6e3c7a993a6e
I'll go ahead and requested 2.0+ on the other bugs.
Keywords: branch-patch-needed
Comment 30•10 years ago
|
||
Try is green, just waiting on the approvals.
Comment 31•10 years ago
|
||
https://hg.mozilla.org/releases/mozilla-b2g32_v2_0/rev/267c11d34c91
https://hg.mozilla.org/releases/mozilla-b2g32_v2_0/rev/5550e4079b31
https://hg.mozilla.org/releases/mozilla-b2g32_v2_0/rev/56ef32b42cb5
https://hg.mozilla.org/releases/mozilla-b2g32_v2_0/rev/5ffe87c3f6f0
https://hg.mozilla.org/releases/mozilla-b2g32_v2_0/rev/1843d25906fd
Comment 32•10 years ago
|
||
Ryan - thanks for uplifting us out of a pickle!
Updated•10 years ago
|
QA Whiteboard: [qa-]
You need to log in
before you can comment on or make changes to this bug.
Description
•