Closed Bug 1157635 Opened 5 years ago Closed 3 years ago

flush subnormals in feedback loops - followup

Categories

(Core :: Web Audio, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla53
Tracking Status
firefox40 --- affected
firefox53 --- fixed

People

(Reporter: baku, Assigned: dminor)

References

(Blocks 1 open bug)

Details

(Keywords: perf)

Attachments

(2 files, 1 obsolete file)

Depends on: 1027864
Component: DOM → Web Audio
Keywords: perf
Good to have, might be easy to fix if we are ok with fixing this by setting DAZ/FTZ using SSE2 intrinsics.

This can bring quite a speedup in some cases with graph that contain biquads that are fed silence, which should be a rather common case (think a drumloop with sounds with a short release, followed by a lowpass).
Priority: -- → P2
Biquads do an explicit flush to zero, but there may be some places we've missed.
I haven't checked AudioParam target curves.

If FTZ equivalents are not available on all supported platforms, then we'd still need to also catch each "approaching zero" behavior individually, but getting the hardware to do this for us when it can is probably a little more efficient.

The debugging patch in https://bugzilla.mozilla.org/show_bug.cgi?id=944143#c4 may be useful for determining where this might be hurting us.
I expect the FP_ASSIST performance counter can track this, but I haven't got it working here.  This might work on some Intel CPUs:

watch perf stat -e r1eca -e cycles -t <msg-thread-id> sleep 2

On my Nehalem here, I have numbers for rc0/INST_RETIRED:ANY_P but 0 for r1f7/FP_ASSIST:ALL.

http://www.bnikolic.co.uk/blog/hpc-prof-events.html
I *think* now that we're dropping support for non-SSE x86 processors we're ok to go ahead with this on the assumption that we're only responsible for providing x86 and ARM implementations and we would rely on external contributors to provide implementations for other processors.
Assignee: nobody → dminor
Depends on: 1271755
The reason my previous attempts appeared not to work is that we were comparing double values to FLT_MIN in some cases, but the smallest normalized double is smaller than FLT_MIN. If I'm careful to use FLT_MIN for floats and DBL_MIN for doubles, then DenormalDisabler works as expected.
Status: NEW → ASSIGNED
Try run here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=b06329ca450370374f919dd694f27c80e1ee090e

Looks like problems with linux32 and android. My understanding is that ARM should flush denormals to zero by default, I'm going to try the tests on my phone to see if the emulator is just not emulating this behaviour.
Updated try job: https://treeherder.mozilla.org/#/jobs?repo=try&revision=89b8d44df97b4bc8721e581e556f549292a1bfcf

I've updated the DenormalDisabler from blink's version so it now supports ARM and I've disabled it for linux32 as we don't yet require SSE on that platform.
Comment on attachment 8778850 [details]
Bug 1157635 - Update DenormalDisabler.h from chromium rev e3bc1d8cebf65a51bb23abf204cf1ba059f1e865;

https://reviewboard.mozilla.org/r/69980/#review95862

::: dom/media/webaudio/blink/DenormalDisabler.h:1
(Diff revision 1)
>  /*

This is from https://chromium.googlesource.com/chromium/src.git/+/e3bc1d8cebf65a51bb23abf204cf1ba059f1e865/third_party/WebKit/Source/platform/audio/DenormalDisabler.h?format=TEXT

That's a chromium revision, so please either s/blink/chromium/ in the commit message or use a blink revision.

::: dom/media/webaudio/blink/DenormalDisabler.h
(Diff revision 1)
>   */
>  
>  #ifndef DenormalDisabler_h
>  #define DenormalDisabler_h
>  
> -#include <cmath>

cmath is removed in this revision and math.h added in the next.  Seems cmath should just remain.

::: dom/media/webaudio/blink/DenormalDisabler.h:37
(Diff revision 1)
>  // Deal with denormals. They can very seriously impact performance on x86.
>  
>  // Define HAVE_DENORMAL if we support flushing denormals to zero.
> -#if defined(XP_WIN) && defined(_MSC_VER)
> +
> +#if defined (XP_WIN) && defined(_MSC_VER)
>  #define HAVE_DENORMAL

The 
#define HAVE_DENORMAL 1
from chromium should work fine here, so no need to change it.  Following the changes in chromium code where it makes sense makes it easier to compare.

Similarly, "// Windows compiled using MSVC with SSE2"

can be kept here.

::: dom/media/webaudio/blink/DenormalDisabler.h:41
(Diff revision 1)
> +#if defined (XP_WIN) && defined(_MSC_VER)
>  #define HAVE_DENORMAL
>  #endif
>  
>  #if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
>  #define HAVE_DENORMAL

Chromium added:

// X86 chips can flush denormals
Comment on attachment 8778850 [details]
Bug 1157635 - Update DenormalDisabler.h from chromium rev e3bc1d8cebf65a51bb23abf204cf1ba059f1e865;

https://reviewboard.mozilla.org/r/69980/#review95866
Attachment #8778850 - Flags: review?(karlt) → review+
Comment on attachment 8778851 [details]
Bug 1157635 - Remove linux32 support for flushing denormals from DenormalDisabler.h;

https://reviewboard.mozilla.org/r/69982/#review95864

With bug 1274196 landing, I think this part should be skipped.

I guess there's a way to test whether math is using SSE or not, but I don't know what that way would be, and it shouldn't block the changes here.
Having SSE2 is tier 1, without is tier 3.
Attachment #8778851 - Flags: review?(karlt)
Comment on attachment 8778852 [details]
Bug 1157635 - Use DenormalDisabler.h to automatically flush subnormals;

https://reviewboard.mozilla.org/r/69984/#review95874

::: dom/media/webaudio/blink/Biquad.cpp
(Diff revision 1)
> -    // TODO: Remove this code when Bug 1157635 is fixed.
>      if (x1 == 0.0 && x2 == 0.0 && (y1 != 0.0 || y2 != 0.0) &&
> -        fabs(y1) < FLT_MIN && fabs(y2) < FLT_MIN) {
> +        fabs(y1) < DBL_MIN && fabs(y2) < DBL_MIN) {

I think this is unnecessarily increasing the length of a decaying tail, and so should be left with tests against FLT_MIN.

Once y1,y2 are < FLT_MIN, we are expecting the output to be zero and so hasTail() should return false.

::: dom/media/webaudio/blink/Biquad.cpp:88
(Diff revision 1)
>        for (int i = framesToProcess; i-- && fabsf(destP[i]) < FLT_MIN; ) {
> -        destP[i] = 0.0f;
> +        destP[i] = DenormalDisabler::flushDenormalFloatToZero(destP[i]);
> +        MOZ_ASSERT(destP[i] == 0.0 || fabs(destP[i]) > FLT_MIN, "output should not be subnormal");
>        }

I think it would be clearer to put this whole for-loop in "#ifndef HAVE_DENORMAL" (even though that name might be the reverse of what's expected) and leave the = 0.0f, and I'm not confident the compiler will optimize away the math involving fabsf/absf for us.

Feel free to just not change this at all too, as optimizing this is lower priority than OneIteration().

::: dom/media/webaudio/blink/IIRFilter.cpp:106
(Diff revision 1)
> -        // TODO: Remove this code when Bug 1157635 is fixed.
> -        if (fabs(yn) >= FLT_MIN) {
> +        destP[n] = WebCore::DenormalDisabler::flushDenormalFloatToZero(yn);
> +        MOZ_ASSERT(destP[n] == 0.0 || fabs(destP[n]) > FLT_MIN, "output should not be subnormal");
> -            destP[n] = yn;
> -        } else {
> -            destP[n] = 0.0;
> -        }

This change is OK if you like.  It is less efficient on systems without FTZ because the conversion from double yn to the float parameter of flushDenormalFloatToZero() will raise an exception, but those systems are not the priority, and hopefully this should happen only for a few values rather than several blocks of values.

It is more efficient on systems with FTZ, yes.

::: dom/media/webaudio/blink/ZeroPole.cpp:69
(Diff revision 1)
>      if (lastX == 0.0f && lastY != 0.0f && fabsf(lastY) < FLT_MIN) {
>          // Flush future values to zero (until there is new input).
>          lastY = 0.0;
>          // Flush calculated values.
>          for (int i = framesToProcess; i-- && fabsf(destination[i]) < FLT_MIN; ) {
> -            destination[i] = 0.0f;
> +            destination[i] = DenormalDisabler::flushDenormalFloatToZero(destination[i]);
> +            MOZ_ASSERT(destination[i] == 0.0 || fabs(destination[i]) > FLT_MIN, "output should not be subnormal");
>          }
>      }

I think it would be clearer to put this whole conditional in "#ifndef HAVE_DENORMAL" and leave the = 0.0f, and I'm not confident the compiler will optimize away the math involving fabsf/absf for us.

Feel free to just not change this at all too, as optimizing this is lower priority than OneIteration().
Attachment #8778852 - Flags: review?(karlt)
Attachment #8778851 - Attachment is obsolete: true
Comment on attachment 8778852 [details]
Bug 1157635 - Use DenormalDisabler.h to automatically flush subnormals;

https://reviewboard.mozilla.org/r/69984/#review96674

r+ with a couple of changes.

::: dom/media/webaudio/blink/Biquad.cpp:88
(Diff revision 2)
>        // Flush future values to zero (until there is new input).
>        y1 = y2 = 0.0;
>        // Flush calculated values.
> +      #ifndef HAVE_DENORMAL
>        for (int i = framesToProcess; i-- && fabsf(destP[i]) < FLT_MIN; ) {
> -        destP[i] = 0.0f;
> +        destP[i] = 0.0;

Keep 0.0f please, as this is float.

::: dom/media/webaudio/blink/ZeroPole.cpp:67
(Diff revision 2)
>      if (lastX == 0.0f && lastY != 0.0f && fabsf(lastY) < FLT_MIN) {
>          // Flush future values to zero (until there is new input).
>          lastY = 0.0;
> +
> +        #ifndef HAVE_DENORMAL

Here in ZeroPole, lastY is float and so lastY will already be zero if < FLT_MIN, with FTZ.

Therefore, put the #ifndef HAVE_DENORMAL around the whole if () block.
Attachment #8778852 - Flags: review?(karlt) → review+
Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/3a542aaed86b
Update DenormalDisabler.h from chromium rev e3bc1d8cebf65a51bb23abf204cf1ba059f1e865; r=karlt
https://hg.mozilla.org/integration/autoland/rev/e50bc53ae80c
Use DenormalDisabler.h to automatically flush subnormals; r=karlt
https://hg.mozilla.org/mozilla-central/rev/3a542aaed86b
https://hg.mozilla.org/mozilla-central/rev/e50bc53ae80c
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
You need to log in before you can comment on or make changes to this bug.