Open Bug 1027624 Opened 8 years ago Updated 11 months ago

Float denormal issue in JavaScript processor node in Web Audio API

Categories

(Core :: JavaScript Engine, defect)

x86
All
defect
Not set
normal

UNCONFIRMED

People

(Reporter: letz, Unassigned)

Attachments

(1 file)

1.06 MB, text/html
Attached file piano.html
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_5) AppleWebKit/537.76.4 (KHTML, like Gecko) Version/6.1.4 Safari/537.76.4

Steps to reproduce:

We successfully compile our C++ audio processing code to asm.js with Emscripten in order to deploy it on the web using the Web Audio API, running the resulting asm.js code in a ScriptProcessorNode.

Our C++ code uses the following protection against denormalized floats (the protection is needed because computation on denormalized floats is extremely slow and has to be avoided):

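#ifdef __SSE__
   #include <xmmintrin.h>
   #ifdef __SSE2__
       // 0x8040 sets FZ (bit 15, flush-to-zero) and DAZ (bit 6, denormals-are-zero) in MXCSR
       #define AVOIDDENORMALS _mm_setcsr(_mm_getcsr() | 0x8040)
   #else
       // 0x8000 sets FZ (bit 15, flush-to-zero) only
       #define AVOIDDENORMALS _mm_setcsr(_mm_getcsr() | 0x8000)
   #endif
#else
   // No SSE: the macro expands to nothing
   #define AVOIDDENORMALS
#endif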

Basically, we add a call to AVOIDDENORMALS before processing each audio block. It seems that AVOIDDENORMALS is simply removed by the Emscripten compiler, so we get asm.js code that appears to produce denormalized floats, and the speed issue occurs.

The attached "piano.html" page contains a piano physical model that is compiled to asm.js and run as a Web Audio API ScriptProcessorNode. If you hit the "gate" button, a sound is played. After some seconds, the CPU use (seen in Activity Monitor on OSX) rises to 100%.


Actual results:

The ScriptProcessorNode takes a lot of time to execute because of the float denormal issue.


Expected results:

ScriptProcessorNode should probably be processed in a context where denormals are flushed to zero automatically.
Severity: normal → major
Hardware: x86 → x86_64
Component: Untriaged → Web Audio
Product: Firefox → Core
Component: Web Audio → Projects
Product: Core → Audio/Visual Infrastructure
Version: 33 Branch → unspecified
Component: Projects → Web Audio
Product: Audio/Visual Infrastructure → Core
Version: unspecified → Trunk
Can't you just flush the denormals to zero by hand in JavaScript? We can't set the flag here, because this would break IEEE754 compatibility. This is what we do internally for platforms that don't have CPU settings for that.
Severity: major → normal
Flags: needinfo?(letz)
OS: Mac OS X → All
Hardware: x86_64 → x86
Is there a JavaScript API for that, one that could be called once before each audio block computation to put the processor in "denormals to zero" mode? (I could not find it…)
Flags: needinfo?(letz)
No, just iterate on the buffer, detect denormals and replace them by zeros. This is just normal IEEE754 stuff, it does not depend on the language.
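A minimal sketch of the manual approach described in this comment, assuming the samples live in a Float32Array (as the channel data of a ScriptProcessorNode does). The names FLT_MIN and flushDenormals are hypothetical helpers, not an existing API:

```javascript
// Smallest positive normal float32 value (2^-126). Any nonzero float32
// with a smaller magnitude is a denormal (subnormal).
const FLT_MIN = 1.1754943508222875e-38;

// Replace every denormal sample in the buffer with zero, in place.
function flushDenormals(buffer) {
  for (let i = 0; i < buffer.length; i++) {
    const v = buffer[i];
    if (v !== 0 && Math.abs(v) < FLT_MIN) {
      buffer[i] = 0;
    }
  }
  return buffer;
}

const samples = Float32Array.of(0.5, 1e-40, -1e-39, 0, -0.25);
flushDenormals(samples);
console.log(Array.from(samples)); // the two denormal entries are now 0
```

This only cleans the buffer after the fact. Denormals produced and consumed inside a feedback loop still have to be handled where the loop state is updated.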
This is way too slow and complicated: you would have to check at several places in the audio processing code where denormals can occur and treat them. Awful... Not feasible in practice. The better way would be to have a JavaScript function to set the processor in "denormals to zero" mode, to be used exactly the way we do in C/C++: once at the beginning of the audio block computation, the way we use this AVOIDDENORMALS macro.
To elaborate on that: this is quite a common practice in audio code to flush denormals to zero, because in practice it is what we want. So we need a cheap way to do that, especially if processors allow that.

See:

http://devblogs.nvidia.com/parallelforall/cuda-pro-tip-flush-denormals-confidence/

"For instance, in audio processing applications, denormal values usually represent a signal so quiet that it is out of the human hearing range. Because of this, a common measure to avoid denormals on processors where there would be a performance penalty is to cut the signal to zero once it reaches denormal levels or mix in an extremely quiet noise signal."

http://www.juce.com/forum/topic/resolving-denormal-floats-once-and-all

https://software.intel.com/en-us/articles/x87-and-sse-floating-point-assists-in-ia-32-flush-to-zero-ftz-and-denormals-are-zero-daz/

http://randomascii.wordpress.com/2012/05/20/thats-not-normalthe-performance-of-odd-floats/

"Signal processing, especially audio processing, is another area where recursive iterations usually lead to dernormals. Disabling them is generally the best practical solution too, because it would be extremely expensive to check the signal and disable the feedback loops at every computation stage. Moreover, it’s usually a null signal that triggers the denormal slowness (think of this simple iteration: y[n] = x[n-1] * 0.1 + y[n-1] * 0.9), so curing the processing at one stage could worsen it at a subsequent stage!

But if you can’t disable denormals, you can still inject noise at some points to ensure that the signal stays away from the denormal range…"
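The iteration quoted above can be simulated to see how a silent input drives the state into the denormal range. This is only a sketch: it emulates float32 arithmetic with Math.fround, starts from an assumed state y = 1, and the names simulateDecay and FLT_MIN are hypothetical:

```javascript
const FLT_MIN = 1.1754943508222875e-38; // smallest normal float32 (2^-126)
const f32 = Math.fround;

// Run y[n] = x[n-1] * 0.1 + y[n-1] * 0.9 in float32 with a null input,
// starting from y = 1, and record when y first becomes denormal.
function simulateDecay(steps) {
  const a = f32(0.1);
  const b = f32(0.9);
  let y = f32(1);
  let firstDenormalStep = -1;
  for (let n = 1; n <= steps; n++) {
    const x = 0; // silent input
    y = f32(f32(x * a) + f32(y * b));
    if (firstDenormalStep < 0 && y !== 0 && Math.abs(y) < FLT_MIN) {
      firstDenormalStep = n;
    }
  }
  return { firstDenormalStep, finalValue: y };
}

const result = simulateDecay(5000);
console.log(result.firstDenormalStep, result.finalValue);
```

In this setup the state becomes denormal in fewer than a thousand samples and is still a nonzero denormal after 5000, so every later iteration keeps operating on denormal values until the input returns.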
(In reply to letz from comment #4)
> This is way too slow and complicated : you would have to check at several
> places in the audio processing code when denormals can occur and treat them.
> Awful... Not feasible in practice.

Are there really that many places where it would be necessary to check for
subnormals?

These only cause problems when the output of processing depends on previous
output, right?

Bug 1027864 is something we should fix in the Web Audio implementation.
I don't know whether that is enough to address your particular use case.

If the audioprocess function is storing its previous output to use for
feedback, then it will still need to flush subnormals.

> The better way would be to have a
> JavaScript function to set the processor in "denormals to zero" mode, to be
> used exactly the was we do in C/C++: once at the beginning of the audio
> block computation, the way we use this AVOIDDENORMALS macro.

I assume emscripten doesn't translate the __SSE2__ parts of the c++ and so
will rely on the other (c++) techniques used on non-SSE2 platforms.

If there is a reason why detecting subnormals in JS is not practical and a
better means is required, then that's a JS question and I'll change the
component of this bug.

(In reply to letz from comment #5)
> "[...] it would be extremely expensive to
> check the signal and disable the feedback loops at every computation stage.
> Moreover, it’s usually a null signal that triggers the denormal slowness
> (think of this simple iteration: y[n] = x[n-1] * 0.1 + y[n-1] * 0.9), so
> curing the processing at one stage could worsen it at a subsequent stage!

It is not necessary to check at every computation, but this recursive
iteration would be a place to check.

A null signal does trigger subnormals for these recurrence relations / IIR
filters, but flushing subnormals in these iterations does not cause problems
in general.

> But if you can’t disable denormals, you can still inject noise at some
> points to ensure that the signal stays away from the denormal range…"

Null signals can be optimized, and silent feedback loops can be collected, but
noise injection would prohibit this.
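As an illustration of checking only in the recursive iteration, here is a sketch of a feedback filter that keeps its state between processing calls and flushes that state whenever it becomes subnormal. The filter y[n] = 0.1 * x[n] + 0.9 * y[n-1], the factory createOnePole, and the constant FLT_MIN are assumptions made for this example, and float32 arithmetic is emulated with Math.fround:

```javascript
const FLT_MIN = 1.1754943508222875e-38; // smallest normal float32 (2^-126)
const f32 = Math.fround;

// Hypothetical one-pole filter whose state survives between blocks,
// the way an audioprocess handler would keep its previous output.
function createOnePole() {
  let y = 0; // feedback state carried across calls
  return function process(input, output) {
    for (let i = 0; i < input.length; i++) {
      y = f32(f32(f32(0.1) * input[i]) + f32(f32(0.9) * y));
      // The only subnormal check: on the value that is fed back.
      if (y !== 0 && Math.abs(y) < FLT_MIN) {
        y = 0;
      }
      output[i] = y;
    }
    return output;
  };
}
```

With a silent input the state decays, is flushed to zero once, and then stays at exactly zero, so later blocks never operate on subnormal values.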
The "add noise" method at appropriate places is the alternative, yes, but it is not very practical. The point is that in C/C++, when the processor supports it of course, this AVOIDDENORMALS macro is very easy to use. It would be a pity not to be able to use the same kind of technique in JS when it is supported by the underlying processor. Could a FlushDenormalToZero() method be added to the Math package, for instance? It would set the processor in the appropriate mode if possible and return true. If not, the programmer would have to use the "add noise" method, basically the way he/she would have to in C/C++.
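For reference, the "add noise" method mentioned here could look roughly like the following sketch. The filter, the helper processWithNoise, and the offset value ANTI_DENORMAL are all assumptions chosen for illustration, and float32 arithmetic is emulated with Math.fround:

```javascript
const FLT_MIN = 1.1754943508222875e-38; // smallest normal float32 (2^-126)
const f32 = Math.fround;

// Assumed offset, far below audibility but far above the denormal range.
const ANTI_DENORMAL = f32(1e-20);

// Hypothetical feedback loop y[n] = 0.1 * x[n] + 0.9 * y[n-1] with a tiny
// alternating offset mixed in so the state cannot decay into denormals.
function processWithNoise(input, output, state) {
  let y = state.y;
  let offset = state.offset;
  for (let i = 0; i < input.length; i++) {
    y = f32(f32(f32(f32(0.1) * input[i]) + f32(f32(0.9) * y)) + offset);
    offset = -offset; // alternate the sign so no DC offset builds up
    output[i] = y;
  }
  state.y = y;
  state.offset = offset;
  return output;
}

// Usage: keep one state object per filter instance.
const noiseState = { y: 0, offset: ANTI_DENORMAL };
processWithNoise(new Float32Array(128), new Float32Array(128), noiseState);
```

The trade-off raised earlier in the thread is visible here: the output is never exactly silent, so a host can no longer detect and skip null signals.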
Another important point: when we run our C/C++ audio code in a CoreAudio real-time thread on OSX (so automatically started by the CoreAudio layer…) we don't need to add this AVOIDDENORMALS macro. This means that CoreAudio probably *automatically* does the equivalent of AVOIDDENORMALS before calling the client application's audio code. It also means that CoreAudio assumes that any audio code can safely run with this "flush denormals to zero" setup…
You could probably safely use the same trick: just apply the AVOIDDENORMALS kind of setup before calling any JavaScript Web Audio processor node in the audio chain. This would be the most transparent approach for the programmer.
Slow denormals are a low-level, platform-specific problem. This is why it should be handled directly by the Web Audio API implementation on the platforms that have the problem, not by the user code (after all, the user code is supposed to be high level and platform independent). The best solution would be to have the Web Audio API automatically set the FZ and DAZ flags on Intel processors before calling the user's JavaScript code and restore the flags afterwards.
This is not feasible. JavaScript needs to respect IEEE754, that's part of the standard. We can't run JavaScript code with those flags set. Waldo (cc-ed) told me this is not negotiable.

We need to talk to JavaScript people and come up with a standard and reasonable way to do this.

The point about running the C++ code in CoreAudio threads does not apply here because I believe the CPU flags are reset when JS code starts running.
OK, so then this discussion should go between the designer/implementers of the Web Audio API and JavaScript. All developers using JavaScript to add audio nodes in the Web Audio API will potentially suffer from this issue, which is (and we should not forget that…) a specific Intel processor problem. 

You can't really expect all developers to "protect" their high-level JavaScript code from this kind of very low-level issue, because 1) this is hard to do and 2) this would be slow. This is just nonsense.
It's a JS spec issue, not a Firefox-specific issue or a WebAudio spec issue. The JS spec says JS arithmetic has to follow IEEE754. You want something different.
Component: Web Audio → JavaScript Engine
There are several ways this could be addressed in JS, e.g.
-- "use flush_denormals_to_zero"
-- Math.flushDenormalToZero(v) (returns 0 if v is denormal, otherwise v)
-- flush-to-zero versions of JS SIMD intrinsics (or just make the intrinsics flush-to-zero by default)
A global variable that changes the semantics of all JS arithmetic is probably not a good idea, even though it matches what the hardware provides.
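The second option in the list above can be approximated in plain JavaScript today. The sketch below is a userland stand-in for the proposed Math.flushDenormalToZero(v), which does not exist in any engine. Because JS numbers are doubles, it tests against the smallest normal double:

```javascript
// Smallest positive normal double (2^-1022).
const DBL_MIN = 2.2250738585072014e-308;

// Stand-in for the proposed Math.flushDenormalToZero(v):
// returns 0 if v is denormal, otherwise v unchanged.
function flushDenormalToZero(v) {
  return (v !== 0 && Math.abs(v) < DBL_MIN) ? 0 : v;
}

console.log(flushDenormalToZero(Number.MIN_VALUE)); // 0
console.log(flushDenormalToZero(1.5));              // 1.5
```

A built-in version could presumably be compiled by the JIT to something cheaper than this comparison, but as a per-value operation it would not change the semantics of ordinary JS arithmetic.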
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #13)
> A global variable that changes the semantics of all JS arithmetic is
> probably not a good idea, even though it matches what the hardware provides.

It's also entirely unrealistic that it would ever be accepted into the language. TC39 has a very strict "no more modes" policy, which certainly won't be broken for a case like this.

The SIMD intrinsics seem most promising to me. CC'ing folks working on that, and Luke because this is probably most relevant to asm.js code.
The association of the Web Audio API and asm.js is a terrific platform that can have a deep impact on the way we write, distribute and use audio software in the future. Unfortunately, having slow denormals without a real solution could potentially defeat the whole process.

My proposition was to have an exception, limited to custom dsp effects written in JS in the framework of the web audio API. This would not affect general JS code outside this very specific context. JS arithmetic remains IEEE754 compatible.
To see the problem, here is a piano physical model in C++ compiled to asm.js using Emscripten. A single string is then duplicated 16 times (for "bug" demonstration purposes), so 16 strings are computed all the time.

http://faust.grame.fr/www/piano.html

Chrome CPU  here ==> 36 %

Hit the "gate" button to play the same note on all 16 strings, wait some seconds, and the CPU use rises to 100%.

Same issue with Firefox and Safari WebKit.

Stéphane Letz
With SIMD we do technically have the opportunity to introduce new semantics for arith operations, so we could mandate FTZ.  In fact, ARM Neon forces FTZ.  The problem is that the FTZ/DAZ mxcsr flags affect both SIMD and scalar double arithmetic so we'd potentially have to flip the flag on and off repeatedly.  For this reason, I think everyone wanted to leave the denormal behavior undefined.  However, looking at agner.org, stmxcsr doesn't seem terribly expensive, so perhaps we could do this.  As long as no one is interleaving scalar arith in SIMD loops, in theory we could hoist the stmxcsr's to before/after the loop.  Any thoughts on this Dan/Benjamin?
(In reply to YANN ORLAREY from comment #15)
> My proposition was to have an exception, limited to custom dsp effects
> written in JS in the framework of the web audio API. This would not affect
> general JS code outside this very specific context. JS arithmetic remains
> IEEE754 compatible.

Except for the JS arithmetic in your DSP effects written in JS.  Which means JS arithmetic would not actually remain IEEE-754 compatible.

C/C++ can get away with compiler-specific AVOIDDENORMALS sorts of things to modify floating-point behavior, because floating point computations don't have specified arithmetic semantics.  JS in contrast precisely defines floating point arithmetic.  There's no leeway to select different ones.  Different semantics require new operations defined to have those semantics.

(In reply to letz from comment #16)
> Chrome CPU  here ==> 36 %

Are you asserting, based on reading of Blink/v8 code, profiling, and perhaps Blink/v8 patching, that Chrome is flushing denormals here, and that's the specific reason it's faster?  Or is it at all possible that some other unrelated factor(s) is/are in play and might be the cause of the performance difference?
"Chrome CPU  here ==> 36 % "

The point was to show that 16 simulated piano strings compiled to asm.js, when running "normally" (no denormal issue while the note is played), consume about 36 % on this machine, and the CPU use rises to 100% when the notes become silent and the denormal problem starts to happen.
(In reply to letz from comment #19)
> The point was to show that 16 piano simulated strings compiled in asm.js
> when running "normally" (no denormal issue when the note is played) consume
> like 36 % on this machine, and the CPU raises to 100% when the notes becomes
> silent and denormal problem starts to happen.

Oh!  I misread your comment.  I see what you meant now.  Sorry I got confused here.
(In reply to Luke Wagner [:luke] from comment #17)
> With SIMD we do technically have the opportunity to introduce new semantics
> for arith operations, so we could mandate FTZ.  In fact, ARM Neon forces
> FTZ.  The problem is that the FTZ/DAZ mxcsr flags affect both SIMD and
> scalar double arithmetic so we'd potentially have to flip the flag on and
> off repeatedly.  For this reason, I think everyone wanted to leave the
> denormal behavior undefined.  However, looking at agner.org, stmxcsr doesn't
> seem terribly expensive, so perhaps we could do this.  As long as noone is
> interleaving scalar arith in SIMD loops, in theory we could hoist the
> stmxcsr's to before/after the loop.  Any thoughts on this Dan/Benjamin?

ARM NEON doesn't have double precision operations, so to emulate them on ARM you would have to change the flush-to-zero flag in the FPSCR, issue the VFP instructions, and then change the flush-to-zero flag back. If you have interleaved SIMD and scalar floating-point operations, this could be quite a mess.

ARM64 has denormals in both scalar and SIMD arithmetic (and the SIMD has double precision operations), and the flush-to-zero flag affects both.
The initial JS SIMD API only has Float32x4 (and (U)Int32x4).  Does audio processing need double precision floats?
It may in some cases.
Oops, looks like there is a Float64x2 in the works in https://github.com/johnmccutchan/ecmascript_simd.
> Except for the JS arithmetic in your DSP effects written in JS.  Which means
> JS arithmetic would not actually remain IEEE-754 compatible.

Yes, that's right.
Insisting on IEEE-754-compatibility basically means insisting on audio-processing-incompatibility.

Audio processing apparently needs its own domain (just like OpenGL for visuals). The Web Audio API is a very good start in that direction even if the ScriptProcessorNode remains its weakest element.

Maybe Javascript is just not the right language for extending the Web Audio API possibilities.
For a moment we believed that this could work anyway, especially thinking that future JIT compilers could pull performance up to almost optimal (and of course after having fixed the threading issues).

Now, the idea of strictly IEEE-754 compatible audio processing doesn't give a lot of hope and finally confirms the need for an audio adapted processing domain and language that would allow for properly extending the basic Web Audio API nodes.
I'd vote for Faust (evidently having something like AVOIDDENORMALS enabled during the execution of the JIT compiled Faust expressions :-). It will be hard to come up with something more appropriate.

I have a dream...
(In reply to Norbert.Schnell from comment #25)
> > Except for the JS arithmetic in your DSP effects written in JS.  Which means
> > JS arithmetic would not actually remain IEEE-754 compatible.
> 
> Yes, that's right.
> Insisting on IEEE-754-compatibility basically means insisting on
> audio-processing-incompatibility.

I can't agree more. Realtime audio applications and more generally signal processing applications heavily rely on recursive IIR filters. These recursive filters (for example Y[n] = X[n] + 0.5*Y[n-1]) will inevitably produce a huge number of denormals every second in particular when the input signal becomes 0.

The problem is that denormals are generally an order of magnitude slower to process, with nearly no benefit for audio applications. This is why we usually prefer to relax IEEE compatibility and run with the FTZ and DAZ flags set. In other words, strict IEEE compatibility is currently incompatible with realtime audio applications as long as denormal handling is so slow on our processors.

This is why we request a pragmatic solution with a clearly delimited exception that allows relaxing IEEE compatibility in some well-defined cases. Otherwise JS, despite all the interesting developments around the Web Audio API and asm.js, will remain audio-processing-incompatible.
This problem might also be addressed by the value types work that is in progress. "Denormalized floats" could just be a distinct type of number that follows different rules.
A debug version of the "piano" example compiled with emcc -profiling and -s LINKABLE=1 so that lines can be looked at: 

http://faust.grame.fr/www/piano-debug.html

The real audio computation is in "__ZN5piano7computeEiPPfS1_"
(In reply to Norbert.Schnell from comment #25)
> Maybe Javascript is just not the right language for extending the Web Audio
> API possibilities.
> For a moment we believed that this could work anyway, especially thinking
> that future JIT compilers could pull performance up to almost optimal (and
> of course after having fixed the threading issues).

I might well be overreading you.  (If so, consider the rest of this as a suggestion to beware how others might interpret your arguments, adjusting them accordingly.)

But to the extent you're saying this because you want to bad-mouth JS, without actually caring whether JS can be adapted to meet your needs.  (And your comment reads that way to me, although not unequivocally so.  I could well be wrong!)

If indeed this is what you're doing: in doing so, you won't win many friends among the engineers proposing and implementing JS spec changes to address this use case.  And you might materially delay those spec changes being implemented in Firefox.

Not for any *principled* reason.  But because engineers are humans.  We get as upset as anyone when the projects we work on are insulted, and we may retaliate (deliberately, or unintentionally from simply wearying of conflict) by taking longer to design and implement a fix.

Polite disagreement is fine.  No one at Mozilla agrees with every decision Mozilla makes.  But you're more likely to see this bug fixed if you don't obliquely insult the product we work on.
Three thoughts:
- If your code has been written by hand and makes heavy use of Float32, you might want to try optimizing arithmetic operations by wrapping them in Math.fround. That hints to the JIT that it can use asm instructions specialized for floats rather than for doubles. For instance, if you have:
> function f(i) { return 3 * i + 5; }
You can rewrite it:
> var f32=Math.fround;function f(i) { return f32(f32(f32(3) * f32(i)) + f32(5)); }
And corresponding JIT code will be using float32 variants of the assembly operations, whenever possible. See also [1].
- If your code is compiled with emscripten, make sure you have set the PRECISE_F32 flag at compile time to enable these Math.fround optimizations.
- Otherwise, discussion is more about the spec, in which case it should probably take place on the ecmascript mailing list [2]. I would suggest a decorator function withFTZ that takes a function as an in parameter:

withFTZ(function () {
 // do stuff here that will be done with FTZ set
});

This way it could be easily inlined in the JITs as:
1) Set FTZ
2) Execute the function script
3) Unset FTZ
This is just a wild idea and could probably be enhanced, but that seems to fit well this case.

[1] https://blog.mozilla.org/javascript/2013/11/07/efficient-float32-arithmetic-in-javascript/
[2] https://mail.mozilla.org/listinfo/es-discuss
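To make the Math.fround rewriting from the first point concrete, here is a small runnable comparison. The name fFloat32 is introduced only for this sketch:

```javascript
const f32 = Math.fround;

// Double-precision version from the comment above.
function f(i) { return 3 * i + 5; }

// Float32 version: every constant, input and intermediate result is
// rounded to float32, which lets a JIT pick single-precision instructions.
function fFloat32(i) {
  return f32(f32(f32(3) * f32(i)) + f32(5));
}

console.log(f(0.1), fFloat32(0.1)); // the results differ in the low digits
```

Note that this changes precision only. Float32 arithmetic in JS still produces and consumes denormals, so it does not by itself address the slowdown discussed in this bug.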
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #29)
> (In reply to Norbert.Schnell from comment #25)
> > Maybe Javascript is just not the right language for extending the Web Audio
> > API possibilities.
> > For a moment we believed that this could work anyway, especially thinking
> > that future JIT compilers could pull performance up to almost optimal (and
> > of course after having fixed the threading issues).
> 
> I might well be overreading you.  (If so, consider the rest of this as a
> suggestion to beware how others might interpret your arguments, adjusting
> them accordingly.)

Yes, no, I really have nothing against JavaScript in general, neither as a language nor as a dev and runtime environment. I have been using it with a lot of joy every day for a couple of months to create interactive audio applications.

The reflection we need (and this is unfortunately not the right place to have it) is how we can properly extend the possibilities of the Web Audio API - without breaking the elegant simplicity of the existing nodes or exploding the number of nodes - within a "domain" (language/algebra/formalism and conditions of execution) that is adapted to audio processing and can bring the know-how of expert DSP programmers into the context of web development. There are certain things we have to enable and many others that we don't need. So if it is JavaScript, it is maybe not exactly the same JavaScript - with the same set of functions and the same conditions of compilation and execution - as in the rest of web development (e.g. no need to change background colors or to send a message over WebSockets from an audio "thread", but a need to avoid denormals and to have well-specified communication channels with the rest of the environment).

I hope we can find the right place to open this discussion across the audio processing and web developer communities soon.
(In reply to Benjamin Bouvier [:bbouvier] from comment #30)

> - Otherwise, discussion is more about the spec, in which case it should
> probably take place on the ecmascript mailing list [2]. I would suggest a
> decorator function withFTZ that takes a function as an in parameter:
> 
> withFTZ(function () {
>  // do stuff here that will be done with FTZ set
> });
> 
> This way it could be easily inlined in the JITs as:
> 1) Set FTZ
> 2) Execute the function script
> 3) Unset FTZ
> This is just a wild idea and could probably be enhanced, but that seems to
> fit well this case.

Yes, that would be perfect. 

> 
> [1]
> https://blog.mozilla.org/javascript/2013/11/07/efficient-float32-arithmetic-
> in-javascript/
> [2] https://mail.mozilla.org/listinfo/es-discuss
We got the confirmation that in practice, (almost..) all audio applications developed with CoreAudio on OSX, will be "automagically" protected against denormals.

Read the thread here http://lists.apple.com/archives/coreaudio-api/2014/Jun/index.html
Two kind of crazy ideas:

1. Does the denormal slowdown happen in specific parts of the code? I wonder if we could add a PGO-like option in emscripten where the code checks for denormals in practice, then we rebuild with the profile and emscripten would emit flush-to-zero code in the relevant places where denormals were actually seen. This would require no manual user action (except for running a build to profile).

2. Much crazier ;) - could audio processing be done by a shader? That is, upload the data to a WebGL buffer, use a compute shader, and retrieve the data? GPUs always flush to zero, so the problem would go away. If this makes sense in theory, we could write a JS library that makes it convenient. Or is typical audio processing non-GPU-able?
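The profiling half of idea 1 can be tried by hand before any compiler support exists. The sketch below is not an Emscripten feature: createDenormalProfiler and its check and report methods are hypothetical names, and the labels stand for whatever points in the processing code are being instrumented:

```javascript
const FLT_MIN = 1.1754943508222875e-38; // smallest normal float32 (2^-126)

// Hypothetical profiling helper: counts denormal samples seen at named
// checkpoints, so flushing code can be added only where they occur.
function createDenormalProfiler() {
  const counts = new Map();
  return {
    check(label, buffer) {
      let n = 0;
      for (let i = 0; i < buffer.length; i++) {
        const v = buffer[i];
        if (v !== 0 && Math.abs(v) < FLT_MIN) n++;
      }
      counts.set(label, (counts.get(label) || 0) + n);
      return buffer;
    },
    report() {
      return Object.fromEntries(counts);
    },
  };
}

// Usage: instrument a profiling build, run it, then inspect the report.
const profiler = createDenormalProfiler();
profiler.check("string-feedback", Float32Array.of(1e-40, 0.25, -2e-41));
profiler.check("output-mix", Float32Array.of(0.5, 0, -0.5));
console.log(profiler.report());
```

Checkpoints that report zero denormals need no flushing code, which keeps the extra work confined to the feedback paths that actually decay.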
(In reply to Alon Zakai (:azakai) from comment #34)
> 2. Much crazier ;) - could audio processing be done by a shader? That is,
> upload the data to a WebGL buffer, use a compute shader, and retrieve the
> data? GPUs always round to zero, so the problem would go away. If this makes
> sense in theory, we could write a JS library that makes it convenient. Or is
> typical audio processing non-GPU-able?

In practice, this induces a lot of latency, because you would need to work on big buffers. I imagine the goal here is to have real-time interaction with the software, so latency should be minimal (especially considering this is going to be used by musicians who are used to physical instruments and tend to be even pickier about latency). Also, even with the latency issue solved, audio code tends to be quite hard to convert to shaders (with some exceptions: I've had some good results doing massive convolutions on a GPU, for example, for reverbs).
Thanks Alon for your proposal with emscripten, but don't forget that asm.js code can be generated directly. We can now do that with a new asm.js generation back-end we have recently added to the Faust compiler.

The withFTZ(code…) approach seems like the best proposal so far.