Open Bug 1062544 Opened 11 years ago Updated 3 years ago

DOMMatrix runtime performance is much lower on windows than other platforms when NaNs are involved, due to x87 instructions in gfx::Matrix::operator*

Categories

(Core :: Graphics, defect)

x86_64
Windows 7
defect

People

(Reporter: cabanier, Unassigned)

Details

Run http://jsperf.com/dommatrix-perf on different platforms. On all platforms 'Native' and 'JS equivalent' have similar performance, except on Windows, where 'Native' is 3 to 4 times slower.
Component: JavaScript Engine → JavaScript Engine: JIT
Summary: DOMMatrix runtime performance is much lower on windows that other platforms → DOMMatrix runtime performance is much lower on windows than other platforms
I seriously doubt this is a jit issue. Need a profile on Windows. Kyle, do you have a profiling setup, or know who does?
Flags: needinfo?(khuey)
dmajor said he will do profiling after lunch. Couldn't this be a jit issue, especially if people on mac and linux use 64bit builds and on Windows 32bit? We've seen slower perf on Windows quite often. But better to wait for profiles.
I tested a 32-bit build on Mac and saw pretty much the same numbers as in a 64-bit build. Sorry, should have mentioned this in comment 1.
If dmajor can't profile this, bent can.
Flags: needinfo?(khuey)
Some more data:

1) The original testcase ended up with a lot of NaNs. http://jsperf.com/dommatrix-perf/7 doesn't have the same issue and has different numbers, but 'Native' is still a bit slower relative to JS on Windows than on other platforms.

2) dmajor's profile shows time mostly taken in mozilla::gfx::Matrix::operator*, and dmajor was kind enough to pastebin the codegen from MSVC for that function. It looks sort of like this:

  157 5b3e98e6 d902     fld   dword ptr [edx]
  157 5b3e98e8 8b4508   mov   eax,dword ptr [ebp+8]
  157 5b3e98eb d809     fmul  dword ptr [ecx]
  157 5b3e98ed d94208   fld   dword ptr [edx+8]
  157 5b3e98f0 d84904   fmul  dword ptr [ecx+4]
  157 5b3e98f3 dec1     faddp st(1),st
  etc.

For comparison, here's the same function on Mac (64-bit, but I bet 32-bit is the same):

  0x0000000103d22a2b <_ZNK7mozilla3gfx6MatrixmlERKS1_+23>: movss  0x4(%rbx),%xmm1
  0x0000000103d22a30 <_ZNK7mozilla3gfx6MatrixmlERKS1_+28>: movss  -0x18(%rbp),%xmm0
  0x0000000103d22a35 <_ZNK7mozilla3gfx6MatrixmlERKS1_+33>: movaps %xmm4,%xmm2
  0x0000000103d22a38 <_ZNK7mozilla3gfx6MatrixmlERKS1_+36>: mulss  %xmm0,%xmm2
  0x0000000103d22a3c <_ZNK7mozilla3gfx6MatrixmlERKS1_+40>: addss  %xmm5,%xmm2
  0x0000000103d22a40 <_ZNK7mozilla3gfx6MatrixmlERKS1_+44>: movss  0xc(%rbx),%xmm11
  etc.

The point being that on Mac, even in 32-bit mode, and on linux64, and in our JIT, we know we can use SSE2 instructions for floating point math, but MSVC with our compile options uses x87 instructions. Apparently x87 instructions on Intel hardware are really slow when dealing with non-finite floats.

Our options here, as far as I can tell, are basically:

1) Have a runtime-detected version of operator* that uses SSE2 stuff.
2) Ignore the issue because this NaN business should be rare and in any case we want to move people to 64-bit builds.

None of this has anything to do with the JIT.
Component: JavaScript Engine: JIT → Graphics
Summary: DOMMatrix runtime performance is much lower on windows than other platforms → DOMMatrix runtime performance is much lower on windows than other platforms when NaNs are involved, due to x87 instructions in gfx::Matrix::operator*
Can we just re-implement DOMMatrix as a JS-implemented WebIDL component, and let the JIT take care of things instead?
The binding part will get much slower, then, sadly. The call from C++ to JS is not that cheap. :(
(In reply to Boris Zbarsky [:bz] from comment #7)
> The binding part will get much slower, then, sadly. The call from C++ to JS
> is not that cheap. :(

Would that be for live objects where the C++ side would query/change the JS object?
No, that would be for the suggestion in comment 6.
Severity: normal → S3