1027624 - Float denormal issue in JavaScript processor node in Web Audio API

Reporter

Description

•

10 years ago

Attached file piano.html — Details

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_5) AppleWebKit/537.76.4 (KHTML, like Gecko) Version/6.1.4 Safari/537.76.4

Steps to reproduce:

We successfully compile our C++ audio processing code to asm.js with Emscripten in order to deploy it on the web, running the resulting asm.js code in a ScriptProcessorNode of the Web Audio API.

Our C++ code uses the following protection against denormalized floats ("protection" is needed because computation on denormalized floats is awfully slow and has to be avoided):

    #ifdef __SSE__
    #include <xmmintrin.h>
    #ifdef __SSE2__
    #define AVOIDDENORMALS _mm_setcsr(_mm_getcsr() | 0x8040)
    #else
    #define AVOIDDENORMALS _mm_setcsr(_mm_getcsr() | 0x8000)
    #endif
    #else
    #define AVOIDDENORMALS
    #endif

Basically we add a call to AVOIDDENORMALS before processing each audio block. It seems this AVOIDDENORMALS is simply removed by the Emscripten compiler, so we get asm.js code that seems to produce denormalized floats, and the speed issue occurs.

The attached "piano.html" page contains a piano physical model that is compiled to asm.js and run as a Web Audio API ScriptProcessorNode. If you hit the "gate" button, a sound is played. After some seconds the CPU use (as seen in Activity Monitor on OS X) rises to 100%.

Actual results:

The ScriptProcessorNode takes a lot of time to execute because the float denormal issue occurs.

Expected results:

A ScriptProcessorNode should probably be processed in a context where denormals are flushed to zero automatically.

letz

Reporter

Updated

•

10 years ago

Severity: normal → major

Hardware: x86 → x86_64

:Ms2ger (he/him; ⌚ UTC+1/+2)

Updated

•

10 years ago

Component: Untriaged → Web Audio

Product: Firefox → Core

letz

Reporter

Updated

•

10 years ago

Component: Web Audio → Projects

Product: Core → Audio/Visual Infrastructure

Version: 33 Branch → unspecified

Paul Adenot (:padenot)

Updated

•

10 years ago

Component: Projects → Web Audio

Product: Audio/Visual Infrastructure → Core

Version: unspecified → Trunk

Paul Adenot (:padenot)

Comment 1

•

10 years ago

Can't you just flush the denormals to zero by hand in JavaScript? We can't set the flag here, because this would break IEEE754 compatibility. This is what we do internally for platforms that don't have CPU settings for that.

Severity: major → normal

Flags: needinfo?(letz)

OS: Mac OS X → All

Hardware: x86_64 → x86

letz

Reporter

Comment 2

•

10 years ago

Is there a JavaScript API for that, one that could be called once before each audio block computation to put the processor in "denormals to zero" mode? (I could not find it…)

Flags: needinfo?(letz)

Paul Adenot (:padenot)

Comment 3

•

10 years ago

No, just iterate on the buffer, detect denormals and replace them by zeros. This is just normal IEEE754 stuff, it does not depend on the language.
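A minimal sketch of what "iterate on the buffer" could look like in plain JavaScript, assuming the samples live in a Float32Array. The names `MIN_NORMAL_F32` and `flushDenormals` are illustrative only and are not part of any Web Audio or JavaScript API; the threshold is the smallest positive normal single-precision value, 2^-126.

```javascript
// Smallest positive normal float32 value (2^-126). Any non-zero sample with
// a smaller magnitude is a denormal once stored in a Float32Array.
const MIN_NORMAL_F32 = 1.1754943508222875e-38;

// Hypothetical helper: replace every denormal sample with zero, in place.
// Returns the number of samples that were flushed.
function flushDenormals(buffer) {
  let flushed = 0;
  for (let i = 0; i < buffer.length; i++) {
    const v = buffer[i];
    // NaN and infinities fail the "<" test and are left untouched.
    if (v !== 0 && Math.abs(v) < MIN_NORMAL_F32) {
      buffer[i] = 0;
      flushed++;
    }
  }
  return flushed;
}

// Example block: two denormals (1e-40 and -1e-42) among ordinary samples.
const block = new Float32Array([0.5, 1e-40, -1e-42, 0, MIN_NORMAL_F32]);
const flushedCount = flushDenormals(block);
console.log(flushedCount, Array.from(block));
```

This only cleans values that have already been written to the buffer; denormals that live in a filter's internal state between samples are not touched by it.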

letz

Reporter

Comment 4

•

10 years ago

This is way too slow and complicated: you would have to check at several places in the audio processing code where denormals can occur and treat them. Awful... Not feasible in practice. The better way would be to have a JavaScript function to set the processor in "denormals to zero" mode, to be used exactly the way we do in C/C++: once at the beginning of the audio block computation, the way we use this AVOIDDENORMALS macro.

letz

Reporter

Comment 5

•

10 years ago

To elaborate on that: flushing denormals to zero is quite a common practice in audio code, because in practice it is what we want. So we need a cheap way to do that, especially if processors allow it. See:

http://devblogs.nvidia.com/parallelforall/cuda-pro-tip-flush-denormals-confidence/

"For instance, in audio processing applications, denormal values usually represent a signal so quiet that it is out of the human hearing range. Because of this, a common measure to avoid denormals on processors where there would be a performance penalty is to cut the signal to zero once it reaches denormal levels or mix in an extremely quiet noise signal."

http://www.juce.com/forum/topic/resolving-denormal-floats-once-and-all

https://software.intel.com/en-us/articles/x87-and-sse-floating-point-assists-in-ia-32-flush-to-zero-ftz-and-denormals-are-zero-daz/

http://randomascii.wordpress.com/2012/05/20/thats-not-normalthe-performance-of-odd-floats/

"Signal processing, especially audio processing, is another area where recursive iterations usually lead to denormals. Disabling them is generally the best practical solution too, because it would be extremely expensive to check the signal and disable the feedback loops at every computation stage. Moreover, it’s usually a null signal that triggers the denormal slowness (think of this simple iteration: y[n] = x[n-1] * 0.1 + y[n-1] * 0.9), so curing the processing at one stage could worsen it at a subsequent stage! But if you can’t disable denormals, you can still inject noise at some points to ensure that the signal stays away from the denormal range…"

Karl Tomlinson (:karlt)

Comment 6

•

10 years ago

(In reply to letz from comment #4)
> This is way too slow and complicated : you would have to check at several
> places in the audio processing code when denormals can occur and treat them.
> Awful... Not feasible in practice.

Are there really that many places where it would be necessary to check for subnormals? These only cause problems when the output of processing depends on previous output, right?

Bug 1027864 is something we should fix in the Web Audio implementation. I don't know whether that is enough to address your particular use case. If the audioprocess function is storing its previous output to use for feedback, then it will still need to flush subnormals.

> The better way would be to have a
> JavaScript function to set the processor in "denormals to zero" mode, to be
> used exactly the was we do in C/C++: once at the beginning of the audio
> block computation, the way we use this AVOIDDENORMALS macro.

I assume emscripten doesn't translate the __SSE2__ parts of the C++ and so will rely on the other (C++) techniques used on non-SSE2 platforms. If there is a reason why detecting subnormals in JS is not practical and a better means is required, then that's a JS question and I'll change the component of this bug.

(In reply to letz from comment #5)
> "[...] it would be extremely expensive to
> check the signal and disable the feedback loops at every computation stage.
> Moreover, it’s usually a null signal that triggers the denormal slowness
> (think of this simple iteration: y[n] = x[n-1] * 0.1 + y[n-1] * 0.9), so
> curing the processing at one stage could worsen it at a subsequent stage!

It is not necessary to check at every computation, but this recursive iteration would be a place to check. A null signal does trigger subnormals for these recurrence relations / IIR filters, but flushing subnormals in these iterations does not cause problems in general.

> But if you can’t disable denormals, you can still inject noise at some
> points to ensure that the signal stays away from the denormal range…"

Null signals can be optimized, and silent feedback loops can be collected, but noise injection would prohibit this.
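As a rough illustration of checking only at the recursive step, here is a sketch of the quoted iteration y[n] = x[n-1] * 0.1 + y[n-1] * 0.9 with a single flush applied to the fed-back value. The function name `makeSmoother`, the block size of 128 and the use of Math.fround to keep the state in single precision are assumptions made for this example, not something taken from the piano code.

```javascript
// Smallest positive normal float32 value (2^-126).
const MIN_NORMAL_F32 = 1.1754943508222875e-38;

// Hypothetical factory for the recurrence y[n] = x[n-1]*0.1 + y[n-1]*0.9.
// The only denormal check is on the value that is fed back.
function makeSmoother() {
  let prevX = 0; // x[n-1]
  let prevY = 0; // y[n-1]
  return function process(input, output) {
    for (let i = 0; i < input.length; i++) {
      let y = Math.fround(prevX * 0.1 + prevY * 0.9);
      if (Math.abs(y) < MIN_NORMAL_F32) y = 0; // flush the feedback state
      output[i] = y;
      prevX = input[i];
      prevY = y;
    }
  };
}

// Feed one impulse followed by silence and scan the output for subnormals.
const smoother = makeSmoother();
const input = new Float32Array(128);
const output = new Float32Array(128);
input[0] = 1;
let subnormalOutputs = 0;
let peak = 0;
for (let b = 0; b < 16; b++) {
  smoother(input, output);
  input[0] = 0; // silence after the first block
  for (const v of output) {
    if (v !== 0 && Math.abs(v) < MIN_NORMAL_F32) subnormalOutputs++;
    if (Math.abs(v) > peak) peak = Math.abs(v);
  }
}
console.log(subnormalOutputs, peak, output[127]);
```

With the flush in place the tail of the impulse response drops straight from the smallest normal range to zero, so a silent input eventually yields an exactly silent output.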

letz

Reporter

Comment 7

•

10 years ago

Adding noise at appropriate places is the alternative method, yes, but it is not very practical. The point is that in C/C++, when the processor supports it of course, this AVOIDDENORMALS macro is very easy to use. It would be a pity not to be able to use the same kind of technique in JS when it is supported by the underlying processor. Could a FlushDenormalToZero() method be added to the Math package, for instance? It would set the processor in the appropriate mode if possible and return true. If not, the programmer would have to use the "add noise" method, basically the way he/she would have to do in C/C++.
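For comparison, the "add noise" alternative can be sketched as follows. This version adds a tiny constant offset rather than real noise, which is one common variant of the trick; the offset value `ANTI_DENORMAL = 1e-20`, the filter coefficients and the function name are all assumptions chosen for illustration.

```javascript
// Smallest positive normal float32 value (2^-126).
const MIN_NORMAL_F32 = 1.1754943508222875e-38;

// Assumed offset: far below audibility, far above the denormal range.
const ANTI_DENORMAL = 1e-20;

// Hypothetical one-pole smoother y[n] = 0.1*x[n] + 0.9*y[n-1] + offset.
// The offset keeps the state from ever decaying into the denormal range.
function processWithOffset(input, output, state) {
  for (let i = 0; i < input.length; i++) {
    state.y = Math.fround(input[i] * 0.1 + state.y * 0.9 + ANTI_DENORMAL);
    output[i] = state.y;
  }
}

// One impulse, then a long stretch of silence.
const state = { y: 0 };
const input = new Float32Array(128);
const output = new Float32Array(128);
input[0] = 1;
let smallest = Infinity;
for (let b = 0; b < 64; b++) {
  processWithOffset(input, output, state);
  input[0] = 0;
  for (const v of output) smallest = Math.min(smallest, Math.abs(v));
}
console.log(smallest);
```

The state settles near ANTI_DENORMAL / (1 - 0.9) instead of zero, which is exactly why this approach prevents the output from ever becoming a true null signal.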

letz

Reporter

Comment 8

•

10 years ago

Another important point: when we run our C/C++ audio code in a CoreAudio real-time thread on OSX (so automatically started by the CoreAudio layer…) we don't need to add this AVOIDDENORMALS macro. This means that CoreAudio probably *automatically* does the equivalent of AVOIDDENORMALS before calling the client application's audio code. It also means that CoreAudio assumes that any audio code can safely run with this "flush denormals to zero" setup… You could probably safely use the same trick: just do the AVOIDDENORMALS kind of thing before calling any JavaScript Web Audio processor node in the audio chain. This would be the most transparent approach for the programmer.

YANN ORLAREY

Comment 9

•

10 years ago

Slow denormals are a low-level, platform-specific problem. This is why they should be handled directly by the Web Audio API implementation on the platforms that have the problem, not by user code (after all, user code is supposed to be high-level and platform-independent). The best solution would be for the Web Audio API to automatically set the FTZ and DAZ flags on Intel processors before calling the user's JavaScript code and to restore the flags afterwards.

Paul Adenot (:padenot)

Comment 10

•

10 years ago

This is not feasible. JavaScript needs to respect IEEE754; that's part of the standard. We can't run JavaScript code with those flags set. Waldo (cc-ed) told me this is not negotiable. We need to talk to JavaScript people and come up with a standard and reasonable way to do this. The point about running the C++ code in CoreAudio threads does not apply here because I believe the CPU flags are reset when JS code starts running.

letz

Reporter

Comment 11

•

10 years ago

OK, so then this discussion should take place between the designers/implementers of the Web Audio API and of JavaScript. All developers using JavaScript to add audio nodes in the Web Audio API will potentially suffer from this issue, which is (and we should not forget that…) a specific Intel processor problem. You can't really expect all developers to "protect" their high-level JavaScript code from this kind of very low-level issue, because 1) this is hard to do and 2) this would be slow. This is just nonsense.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 12

•

10 years ago

It's a JS spec issue, not a Firefox-specific issue or a WebAudio spec issue. The JS spec says JS arithmetic has to follow IEEE754. You want something different.

Component: Web Audio → JavaScript Engine

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 13

•

10 years ago

There are several ways this could be addressed in JS, e.g.

-- "use flush_denormals_to_zero"
-- Math.flushDenormalToZero(v) (returns 0 if v is denormal, otherwise v)
-- flush-to-zero versions of JS SIMD intrinsics (or just make the intrinsics flush-to-zero by default)

A global variable that changes the semantics of all JS arithmetic is probably not a good idea, even though it matches what the hardware provides.
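The second option can be approximated in user code today. The sketch below is a plain function with the semantics described above (0 if v is denormal, otherwise v) for double-precision values; `Math.flushDenormalToZero` itself is only a proposal in this comment and does not exist in any engine, so the standalone name `flushDenormalToZero` is used here as a stand-in.

```javascript
// Smallest positive normal double (2^-1022). Number.MIN_VALUE is NOT this
// value: it is the smallest positive *denormal* double (2^-1074).
const MIN_NORMAL_F64 = 2.2250738585072014e-308;

// Hypothetical user-land stand-in for the proposed Math.flushDenormalToZero.
// Returns 0 if v is a non-zero denormal double, otherwise returns v as is.
function flushDenormalToZero(v) {
  return v !== 0 && Math.abs(v) < MIN_NORMAL_F64 ? 0 : v;
}

console.log(flushDenormalToZero(Number.MIN_VALUE)); // denormal input
console.log(flushDenormalToZero(1e-310));           // denormal input
console.log(flushDenormalToZero(MIN_NORMAL_F64));   // smallest normal input
console.log(flushDenormalToZero(-1.5));             // ordinary input
```

Unlike a hardware FTZ/DAZ mode, this costs a comparison per call, so it only makes sense at a few chosen points such as feedback paths.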

Till Schneidereit [:till]

Comment 14

•

10 years ago

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #13)
> A global variable that changes the semantics of all JS arithmetic is
> probably not a good idea, even though it matches what the hardware provides.

It's also entirely unrealistic that it would ever be accepted into the language. TC39 has a very strict "no more modes" policy, which certainly won't be broken for a case like this.

The SIMD intrinsics seem most promising to me. CC'ing folks working on that, and Luke because this is probably most relevant to asm.js code.

YANN ORLAREY

Comment 15

•

10 years ago

The association of the Web Audio API and asm.js is a terrific platform that can have a deep impact on the way we write, distribute and use audio software in the future. Unfortunately, having slow denormals without a real solution could potentially defeat the whole process. My proposition was to have an exception, limited to custom DSP effects written in JS in the framework of the Web Audio API. This would not affect general JS code outside this very specific context. JS arithmetic remains IEEE754 compatible.

letz

Reporter

Comment 16

•

10 years ago

To see the problem, here is a piano physical model in C++ compiled to asm.js using Emscripten. A single string is duplicated 16 times (for "bug" demonstration purposes), so 16 strings are computed all the time.

http://faust.grame.fr/www/piano.html

Chrome CPU here ==> 36 %

Hit the "gate" button to play the same note on all 16 strings and wait some seconds: the CPU rises to 100%.

Same issue with Firefox and Safari WebKit.

Stéphane Letz

Luke Wagner [:luke]

Comment 17

•

10 years ago

With SIMD we do technically have the opportunity to introduce new semantics for arith operations, so we could mandate FTZ. In fact, ARM Neon forces FTZ. The problem is that the FTZ/DAZ mxcsr flags affect both SIMD and scalar double arithmetic so we'd potentially have to flip the flag on and off repeatedly. For this reason, I think everyone wanted to leave the denormal behavior undefined. However, looking at agner.org, stmxcsr doesn't seem terribly expensive, so perhaps we could do this. As long as no one is interleaving scalar arith in SIMD loops, in theory we could hoist the stmxcsr's to before/after the loop. Any thoughts on this Dan/Benjamin?

Jeff Walden [:Waldo]

Comment 18

•

10 years ago

(In reply to YANN ORLAREY from comment #15)
> My proposition was to have an exception, limited to custom dsp effects
> written in JS in the framework of the web audio API. This would not affect
> general JS code outside this very specific context. JS arithmetic remains
> IEEE754 compatible.

Except for the JS arithmetic in your DSP effects written in JS. Which means JS arithmetic would not actually remain IEEE-754 compatible.

C/C++ can get away with compiler-specific AVOIDDENORMALS sorts of things to modify floating-point behavior, because floating point computations don't have specified arithmetic semantics. JS in contrast precisely defines floating point arithmetic. There's no leeway to select different ones. Different semantics require new operations defined to have those semantics.

(In reply to letz from comment #16)
> Chrome CPU here ==> 36 %

Are you asserting, based on reading of Blink/v8 code, profiling, and perhaps Blink/v8 patching, that Chrome is flushing denormals here, and that's the specific reason it's faster? Or is it at all possible that some other unrelated factor(s) is/are in play and might be the cause of the performance difference?

letz

Reporter

Comment 19

•

10 years ago

"Chrome CPU here ==> 36 % " The point was to show that 16 piano simulated strings compiled in asm.js when running "normally" (no denormal issue when the note is played) consume like 36 % on this machine, and the CPU raises to 100% when the notes becomes silent and denormal problem starts to happen.

Jeff Walden [:Waldo]

Comment 20

•

10 years ago

(In reply to letz from comment #19)
> The point was to show that 16 piano simulated strings compiled in asm.js
> when running "normally" (no denormal issue when the note is played) consume
> like 36 % on this machine, and the CPU raises to 100% when the notes becomes
> silent and denormal problem starts to happen.

Oh! I misread your comment. I see what you meant now. Sorry I got confused here.

czwarich

Comment 21

•

10 years ago

(In reply to Luke Wagner [:luke] from comment #17)
> With SIMD we do technically have the opportunity to introduce new semantics
> for arith operations, so we could mandate FTZ. In fact, ARM Neon forces
> FTZ. The problem is that the FTZ/DAZ mxcsr flags affect both SIMD and
> scalar double arithmetic so we'd potentially have to flip the flag on and
> off repeatedly. For this reason, I think everyone wanted to leave the
> denormal behavior undefined. However, looking at agner.org, stmxcsr doesn't
> seem terribly expensive, so perhaps we could do this. As long as noone is
> interleaving scalar arith in SIMD loops, in theory we could hoist the
> stmxcsr's to before/after the loop. Any thoughts on this Dan/Benjamin?

ARM NEON doesn't have double precision operations, so to emulate them on ARM you would have to change the flush-to-zero flag in the FPCR, issue the VFP instructions, and then change the flush-to-zero flag back. If you have interleaved SIMD and scalar floating-point operations, this could be quite a mess.

ARM64 has denormals in both scalar and SIMD arithmetic (and the SIMD has double precision operations), and the flush-to-zero flag affects both.

Luke Wagner [:luke]

Comment 22

•

10 years ago

The initial JS SIMD API only has Float32x4 (and (U)Int32x4). Does audio processing need double precision floats?

letz

Reporter

Comment 23

•

10 years ago

It may in some cases.

Luke Wagner [:luke]

Comment 24

•

10 years ago

Oops, looks like there is a Float64x2 in the works in https://github.com/johnmccutchan/ecmascript_simd.

Norbert.Schnell

Comment 25

•

10 years ago

> Except for the JS arithmetic in your DSP effects written in JS. Which means
> JS arithmetic would not actually remain IEEE-754 compatible.

Yes, that's right.
Insisting on IEEE-754-compatibility basically means insisting on audio-processing-incompatibility.

Audio processing apparently needs its own domain (just like OpenGL for visuals). The Web Audio API is a very good start in that direction, even if the ScriptProcessorNode remains its weakest element. Maybe JavaScript is just not the right language for extending the Web Audio API possibilities. For a moment we believed that this could work anyway, especially thinking that future JIT compilers could pull performance up to almost optimal (and of course after having fixed the threading issues).

Now, the idea of strictly IEEE-754 compatible audio processing doesn't give a lot of hope and finally confirms the need for an audio-adapted processing domain and language that would allow for properly extending the basic Web Audio API nodes. I'd vote for Faust (evidently having something like AVOIDDENORMALS enabled during the execution of the JIT compiled Faust expressions :-). It will be hard to come up with something more appropriate.

I have a dream...

YANN ORLAREY

Comment 26

•

10 years ago

(In reply to Norbert.Schnell from comment #25)
> > Except for the JS arithmetic in your DSP effects written in JS. Which means
> > JS arithmetic would not actually remain IEEE-754 compatible.
>
> Yes, that's right.
> Insisting on IEEE-754-compatibility basically means insisting on
> audio-processing-incompatibility.

I can't agree more. Realtime audio applications, and more generally signal processing applications, rely heavily on recursive IIR filters. These recursive filters (for example Y[n] = X[n] + 0.5*Y[n-1]) will inevitably produce a huge number of denormals every second, in particular when the input signal becomes 0. The problem is that denormals are generally an order of magnitude slower to process, with nearly no benefit for audio applications. This is why we usually prefer to relax IEEE compatibility and run with the FTZ and DAZ flags set.

In other words, strict IEEE compatibility is currently incompatible with realtime audio applications as long as denormal handling is so slow on our processors. This is why we request a pragmatic solution with a clearly delimited exception that allows IEEE compatibility to be relaxed in some well-defined cases. Otherwise JS, despite all the interesting developments around the Web Audio API and asm.js, will remain audio-processing-incompatible.
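To make the decay concrete, here is a small sketch that runs the example recurrence Y[n] = X[n] + 0.5*Y[n-1] on a unit impulse followed by silence and records when the output enters the denormal range and when it finally reaches zero. The helper name `decayProfile` and its parameters are invented for this illustration; it measures only sample counts, not speed.

```javascript
// Smallest positive normal values for double and single precision.
const MIN_NORMAL_F64 = 2.2250738585072014e-308; // 2^-1022
const MIN_NORMAL_F32 = 1.1754943508222875e-38;  // 2^-126

// Hypothetical helper: iterate Y[n] = X[n] + 0.5 * Y[n-1] with X[0] = 1 and
// X[n] = 0 afterwards. `round` models the storage precision of the state.
function decayProfile(round, minNormal) {
  let y = 1;               // Y[0], the response to the unit impulse
  let n = 0;
  let firstSubnormal = -1; // index of the first denormal output
  let subnormalCount = 0;  // number of denormal outputs before reaching zero
  while (y !== 0) {
    n++;
    y = round(0 + 0.5 * y); // X[n] = 0 for n >= 1
    if (y !== 0 && y < minNormal) {
      if (firstSubnormal < 0) firstSubnormal = n;
      subnormalCount++;
    }
  }
  return { firstSubnormal, subnormalCount, zeroAt: n };
}

const f64 = decayProfile((v) => v, MIN_NORMAL_F64);
const f32 = decayProfile(Math.fround, MIN_NORMAL_F32);
console.log(f64); // { firstSubnormal: 1023, subnormalCount: 52, zeroAt: 1075 }
console.log(f32); // { firstSubnormal: 127, subnormalCount: 23, zeroAt: 150 }
```

With a feedback coefficient closer to 1, as in a slowly decaying string model, each multiplication shrinks the state far less, so the stretch of denormal samples lasts correspondingly longer.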

Niko Matsakis [:nmatsakis]

Comment 27

•

10 years ago

This problem might also be addressed by the value types work that is in progress. "Denormalized floats" could just be a distinct type of number that follows different rules.

letz

Reporter

Comment 28

•

10 years ago

A debug version of the "piano" example compiled with emcc -profiling and -s LINKABLE=1 so that lines can be looked at: http://faust.grame.fr/www/piano-debug.html The real audio computation is in "__ZN5piano7computeEiPPfS1_"

Jeff Walden [:Waldo]

Comment 29

•

10 years ago

(In reply to Norbert.Schnell from comment #25)
> Maybe Javascript is just not the right language for extending the Web Audio
> API possibilities.
> For a moment we believed that this could work anyway, especially thinking
> that future JIT compilers could pull performance up to almost optimal (and
> of course after having fixed the threading issues).

I might well be overreading you. (If so, consider the rest of this as a suggestion to beware how others might interpret your arguments, adjusting them accordingly.)

But it may be that you're saying this because you want to bad-mouth JS, without actually caring whether JS can be adapted to meet your needs. (Your comment reads that way to me, although not unequivocally so. I could well be wrong!) If indeed this is what you're doing: you won't win many friends among the engineers proposing and implementing JS spec changes to address this use case. And you might materially delay those spec changes being implemented in Firefox.

Not for any *principled* reason. But because engineers are humans. We get as upset as anyone when the projects we work on are insulted, and we may retaliate (deliberately, or unintentionally from simply wearying of conflict) by taking longer to design and implement a fix.

Polite disagreement is fine. No one at Mozilla agrees with every decision Mozilla makes. But you're more likely to see this bug fixed if you don't obliquely insult the product we work on.

Benjamin Bouvier [:bbouvier] (inactive)

Comment 30

•

10 years ago

Three thoughts:

- If your code has been manually written and uses Float32 intensively, you might want to try optimizing arithmetic operations by wrapping them with Math.fround. That hints to the JIT to use assembly instructions specialized for floats rather than for doubles. For instance, if you have:

> function f(i) { return 3 * i + 5; }

You can rewrite it:

> var f32 = Math.fround;
> function f(i) { return f32(f32(f32(3) * f32(i)) + f32(5)); }

And the corresponding JIT code will use float32 variants of the assembly operations whenever possible. See also [1].

- If your code is compiled with emscripten, make sure you have set the PRECISE_F32 flag at compile time to enable these Math.fround optimizations.

- Otherwise, the discussion is more about the spec, in which case it should probably take place on the ecmascript mailing list [2]. I would suggest a decorator function withFTZ that takes a function as an in parameter:

withFTZ(function () {
  // do stuff here that will be done with FTZ set
});

This way it could be easily inlined in the JITs as:
1) Set FTZ
2) Execute the function script
3) Unset FTZ

This is just a wild idea and could probably be enhanced, but that seems to fit this case well.

[1] https://blog.mozilla.org/javascript/2013/11/07/efficient-float32-arithmetic-in-javascript/
[2] https://mail.mozilla.org/listinfo/es-discuss

Norbert.Schnell

Comment 31

•

10 years ago

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #29)
> (In reply to Norbert.Schnell from comment #25)
> > Maybe Javascript is just not the right language for extending the Web Audio
> > API possibilities.
> > For a moment we believed that this could work anyway, especially thinking
> > that future JIT compilers could pull performance up to almost optimal (and
> > of course after having fixed the threading issues).
>
> I might well be overreading you. (If so, consider the rest of this as a
> suggestion to beware how others might interpret your arguments, adjusting
> them accordingly.)

Yes, no, I really have nothing against JavaScript in general, neither as a language nor as a dev and runtime environment. I have been using it with a lot of joy every day for a couple of months to create interactive audio applications.

The reflection we need (and this is unfortunately not the right place to have it) is how we can properly extend the possibilities of the Web Audio API - without breaking the elegant simplicity of the existing nodes nor exploding the number of nodes - within a "domain" (language/algebra/formalism and conditions of execution) that is adapted to audio processing and can bring the know-how of expert DSP programmers into the context of web development.

There are certain things we have to enable and many others that we don't need. So if it is JavaScript, it's maybe not exactly the same JavaScript - with the same set of functions and the same conditions of compilation and execution - as in the rest of web development (e.g. no need to change the background colors or to send a message over WebSockets from an audio "thread", but a need to avoid denormals and to have well-specified communication channels with the rest of the environment).

I hope we can soon find the right place to open this discussion across the audio processing and web developer communities.

YANN ORLAREY

Comment 32

•

10 years ago

(In reply to Benjamin Bouvier [:bbouvier] from comment #30)
> - Otherwise, discussion is more about the spec, in which case it should
> probably take place on the ecmascript mailing list [2]. I would suggest a
> decorator function withFTZ that takes a function as an in parameter:
>
> withFTZ(function () {
> // do stuff here that will be done with FTZ set
> });
>
> This way it could be easily inlined in the JITs as:
> 1) Set FTZ
> 2) Execute the function script
> 3) Unset FTZ
> This is just a wild idea and could probably be enhanced, but that seems to
> fit well this case.

Yes, that would be perfect.

>
> [1]
> https://blog.mozilla.org/javascript/2013/11/07/efficient-float32-arithmetic-
> in-javascript/
> [2] https://mail.mozilla.org/listinfo/es-discuss

letz

Reporter

Comment 33

•

10 years ago

We got confirmation that, in practice, (almost..) all audio applications developed with CoreAudio on OSX will be "automagically" protected against denormals. Read the thread here: http://lists.apple.com/archives/coreaudio-api/2014/Jun/index.html

Alon Zakai (:azakai)

Comment 34

•

10 years ago

Two kind-of-crazy ideas:

1. Does the denormal slowdown happen in specific parts of the code? I wonder if we could add a PGO-like option in emscripten where the code checks for denormals in practice, then we rebuild with the profile and emscripten would emit round-to-zero in the relevant places where it was actually seen. This would require no manual user action (except for running a build to profile).

2. Much crazier ;) - could audio processing be done by a shader? That is, upload the data to a WebGL buffer, use a compute shader, and retrieve the data? GPUs always round to zero, so the problem would go away. If this makes sense in theory, we could write a JS library that makes it convenient. Or is typical audio processing non-GPU-able?

Paul Adenot (:padenot)

Comment 35

•

10 years ago

(In reply to Alon Zakai (:azakai) from comment #34)
> 2. Much crazier ;) - could audio processing be done by a shader? That is,
> upload the data to a WebGL buffer, use a compute shader, and retrieve the
> data? GPUs always round to zero, so the problem would go away. If this makes
> sense in theory, we could write a JS library that makes it convenient. Or is
> typical audio processing non-GPU-able?

In practice, this induces a lot of latency, because you would need to work on big buffers. I imagine the goal here is to have real-time interaction with the software, so latency should be minimal (especially considering this is going to be used by musicians, who are used to physical instruments and tend to be even pickier about latency).

Also, even with the latency issue solved, audio code tends to be quite hard to convert to shaders (with some exceptions: I've had some good results doing massive convolutions on a GPU, for example, for reverbs).

letz

Reporter

Comment 36

•

10 years ago

Thanks Alon for your proposal with emscripten, but don't forget that asm.js code can also be generated directly. We can now do that with a new asm.js generation back-end we have recently added to the Faust compiler. The withFTZ(code…) idea seems like the best proposal so far.

BMO Automation

Updated

•

2 years ago

Severity: normal → S3