Open Bug 1518857 Opened 5 years ago Updated 2 years ago

provide optimized implementations of float modulus

Categories

(Core :: JavaScript Engine: JIT, enhancement, P2)

enhancement

Tracking

Performance Impact low
Tracking Status
firefox66 --- fix-optional

People

(Reporter: froydnj, Unassigned)

References

(Blocks 1 open bug, )

Details

(Keywords: perf:responsiveness)

Bas has this testcase (see URL) that jesup profiled. On Windows, Chrome and Firefox are roughly the same speed. On Linux, Firefox is ~4x slower than Chrome, and a performance profile showed that we were spending a significant amount of time in glibc in __ieee754_fmod.

Discussion on #jsapi eventually revealed that a) Windows's version of fmod() is significantly faster (or at least looks faster, doing everything on actual floats, rather than the bit representations thereof) than glibc's; and b) v8 provides an inline assembly version of floating-point modulus using x87 instructions:

https://github.com/v8/v8/blob/01f824c1767842aa3ccd9166a3ab2feb05266bc4/src/compiler/backend/x64/code-generator-x64.cc#L1357-L1395

This applies at least on x86-ish platforms; on arm/aarch64, v8 calls out to something that falls through to std::fmod, as we apparently already do.

No idea what the performance is like on Mac, but on Linux, this is someplace where benchmarks might show a significant slowdown for us, especially as it's entirely possible to fall into floating-point modulus operations inadvertently, as https://github.com/torch2424/wasmBoy/issues/216#issuecomment-451010495 shows. =/

This looks like a pitfall that is easy to fall into.
The question is how frequently we actually hit it.

Priority: -- → P2
Whiteboard: [qf]
Whiteboard: [qf] → [qf:p3:responsiveness]

Slow fmod is also responsible for this comment on HN: https://news.ycombinator.com/item?id=21269231

The first test case uses val % 2, where val is a Double, whereas the second test case uses val & 1. This leads to a ~10x performance difference between the two tests (at least on Linux). When the first test case is changed to use val & 1, performance is roughly the same for both.
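
For illustration, here is a minimal sketch of the two forms being compared. The function names are hypothetical and are not taken from the linked test cases. The two forms are also not interchangeable in general: val & 1 first converts val to a 32-bit integer, so the results only match for integer-valued inputs.

```javascript
// Hypothetical helper names; not taken from the linked test cases.
function isOddMod(val) {
  // With a Double operand this can end up as a floating-point modulus,
  // which is the slow path discussed in this bug.
  return val % 2 !== 0;
}

function isOddAnd(val) {
  // Bitwise AND converts val to a 32-bit integer (ToInt32) first,
  // so no floating-point modulus is involved.
  return (val & 1) !== 0;
}

// Both forms agree for integer-valued inputs, including negative ones.
console.log(isOddMod(7), isOddAnd(7));     // true true
console.log(isOddMod(-3), isOddAnd(-3));   // true true

// They disagree for fractional inputs, because & truncates first.
console.log(isOddMod(2.5), isOddAnd(2.5)); // true false
```

So swapping % 2 for & 1 is only a safe workaround when val is known to hold an integer.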

Depends on: 1673840
Blocks: 1684224
Performance Impact: --- → P3
Whiteboard: [qf:p3:responsiveness]
Severity: normal → S3