1062544 - DOMMatrix runtime performance is much lower on windows than other platforms when NaNs are involved, due to x87 instructions in gfx::Matrix::operator*

Reporter

Description

•

11 years ago

Run http://jsperf.com/dommatrix-perf on different platforms. On all platforms 'Native' and 'JS equivalent' have similar performance, except on Windows, where 'Native' is 3 to 4 times slower.

Rik Cabanier

Reporter

Updated

•

11 years ago

URL: http://jsperf.com/dommatrix-perf

Olli Pettay [:smaug][bugs@pettay.fi]

Updated

•

11 years ago

Component: JavaScript Engine → JavaScript Engine: JIT

Rik Cabanier

Reporter

Updated

•

11 years ago

Summary: DOMMatrix runtime performance is much lower on windows that other platforms → DOMMatrix runtime performance is much lower on windows than other platforms

Boris Zbarsky [:bzbarsky]

Comment 1

•

11 years ago

I seriously doubt this is a jit issue. Need a profile on Windows. Kyle, do you have a profiling setup, or know who does?

Flags: needinfo?(khuey)

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 2

•

11 years ago

dmajor said he will do profiling after lunch. Couldn't this be a JIT issue, especially if people on Mac and Linux use 64-bit builds and people on Windows use 32-bit? We've seen slower perf on Windows quite often. But better to wait for profiles.

Boris Zbarsky [:bzbarsky]

Comment 3

•

11 years ago

I tested a 32-bit build on Mac and saw pretty much the same numbers as in a 64-bit build. Sorry, should have mentioned this in comment 1.

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 4

•

11 years ago

If dmajor can't profile this, bent can.

Flags: needinfo?(khuey)

Boris Zbarsky [:bzbarsky]

Comment 5

•

11 years ago

Some more data:

1) The original testcase ended up with a lot of NaNs. http://jsperf.com/dommatrix-perf/7 doesn't have the same issue and has different numbers, but is still a bit slower on Windows, relative to JS, than on other platforms.

2) dmajor's profile shows time mostly taken in mozilla::gfx::Matrix::operator*, and dmajor was kind enough to pastebin the codegen from MSVC for that function. It looks sort of like this:

    157 5b3e98e6 d902    fld   dword ptr [edx]
    157 5b3e98e8 8b4508  mov   eax,dword ptr [ebp+8]
    157 5b3e98eb d809    fmul  dword ptr [ecx]
    157 5b3e98ed d94208  fld   dword ptr [edx+8]
    157 5b3e98f0 d84904  fmul  dword ptr [ecx+4]
    157 5b3e98f3 dec1    faddp st(1),st
    etc.

For comparison, here's the same function on Mac (64-bit, but I bet 32-bit is the same):

    0x0000000103d22a2b <_ZNK7mozilla3gfx6MatrixmlERKS1_+23>: movss  0x4(%rbx),%xmm1
    0x0000000103d22a30 <_ZNK7mozilla3gfx6MatrixmlERKS1_+28>: movss  -0x18(%rbp),%xmm0
    0x0000000103d22a35 <_ZNK7mozilla3gfx6MatrixmlERKS1_+33>: movaps %xmm4,%xmm2
    0x0000000103d22a38 <_ZNK7mozilla3gfx6MatrixmlERKS1_+36>: mulss  %xmm0,%xmm2
    0x0000000103d22a3c <_ZNK7mozilla3gfx6MatrixmlERKS1_+40>: addss  %xmm5,%xmm2
    0x0000000103d22a40 <_ZNK7mozilla3gfx6MatrixmlERKS1_+44>: movss  0xc(%rbx),%xmm11
    etc.

The point being that on Mac (even in 32-bit mode), on linux64, and in our JIT, we know we can use SSE2 instructions for floating point math, but MSVC with our compile options uses x87 instructions. Apparently x87 instructions on Intel hardware are really slow when dealing with non-finite floats.

Our options here, as far as I can tell, are basically:

1) Have a runtime-detected version of operator* that uses SSE2 stuff.
2) Ignore the issue because this NaN business should be rare and in any case we want to move people to 64-bit builds.

None of this has anything to do with the JIT.
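For illustration, option 1 from the comment above might look roughly like the sketch below. Everything in it is an assumption made for the example: Matrix2D is a hypothetical stand-in rather than the real gfx::Matrix, the function names (MultiplyScalar, MultiplySSE2, HasSSE2, Multiply, AssertPathsAgree) are invented, and the six-element affine layout is only a plausible guess at the shape of the problem. Detection uses __builtin_cpu_supports on GCC/Clang and __cpuid on MSVC; the single-precision arithmetic itself only needs SSE, but the check follows the thread in testing for SSE2.

```cpp
#include <cassert>
#include <cmath>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#define SKETCH_X86 1
#include <emmintrin.h>  // SSE/SSE2 intrinsics
#if defined(_MSC_VER)
#include <intrin.h>  // __cpuid
#define SKETCH_TARGET_SSE2
#else
// Lets GCC/Clang emit SSE2 code in one function without -msse2 globally.
#define SKETCH_TARGET_SSE2 __attribute__((target("sse2")))
#endif
#else
#define SKETCH_X86 0
#endif

// Hypothetical stand-in for a 2D affine matrix; not the real gfx::Matrix.
struct Matrix2D {
  float _11, _12, _21, _22, _31, _32;
};

// Plain scalar multiply; a 32-bit build without SSE2 enabled may use x87 here.
Matrix2D MultiplyScalar(const Matrix2D& a, const Matrix2D& b) {
  Matrix2D r;
  r._11 = a._11 * b._11 + a._12 * b._21;
  r._12 = a._11 * b._12 + a._12 * b._22;
  r._21 = a._21 * b._11 + a._22 * b._21;
  r._22 = a._21 * b._12 + a._22 * b._22;
  r._31 = a._31 * b._11 + a._32 * b._21 + b._31;
  r._32 = a._31 * b._12 + a._32 * b._22 + b._32;
  return r;
}

#if SKETCH_X86
// Same arithmetic using packed single-precision SSE operations.
SKETCH_TARGET_SSE2 Matrix2D MultiplySSE2(const Matrix2D& a, const Matrix2D& b) {
  const __m128 bRow1 = _mm_setr_ps(b._11, b._12, b._11, b._12);
  const __m128 bRow2 = _mm_setr_ps(b._21, b._22, b._21, b._22);
  // Result rows 1 and 2 are computed together in one 4-wide vector.
  const __m128 aCol1 = _mm_setr_ps(a._11, a._11, a._21, a._21);
  const __m128 aCol2 = _mm_setr_ps(a._12, a._12, a._22, a._22);
  const __m128 top =
      _mm_add_ps(_mm_mul_ps(aCol1, bRow1), _mm_mul_ps(aCol2, bRow2));
  // Result row 3 also adds b's translation; only lanes 0 and 1 are used.
  const __m128 bottom = _mm_add_ps(
      _mm_add_ps(_mm_mul_ps(_mm_set1_ps(a._31), bRow1),
                 _mm_mul_ps(_mm_set1_ps(a._32), bRow2)),
      _mm_setr_ps(b._31, b._32, 0.0f, 0.0f));
  float t[4], u[4];
  _mm_storeu_ps(t, top);
  _mm_storeu_ps(u, bottom);
  return Matrix2D{t[0], t[1], t[2], t[3], u[0], u[1]};
}

// Runtime check for SSE2 support on the current CPU.
bool HasSSE2() {
#if defined(_MSC_VER)
  int regs[4];
  __cpuid(regs, 1);
  return (regs[3] & (1 << 26)) != 0;  // CPUID leaf 1, EDX bit 26 = SSE2
#else
  return __builtin_cpu_supports("sse2") != 0;
#endif
}
#endif

// Dispatcher: picks the SSE2 path when the CPU reports support for it.
Matrix2D Multiply(const Matrix2D& a, const Matrix2D& b) {
#if SKETCH_X86
  static const bool useSSE2 = HasSSE2();  // detect once, reuse afterwards
  if (useSSE2) return MultiplySSE2(a, b);
#endif
  return MultiplyScalar(a, b);
}

// Treats two NaNs as equal so NaN-laden results can be compared.
bool SameValue(float x, float y) {
  return (std::isnan(x) && std::isnan(y)) || x == y;
}

// Asserts that the scalar and dispatched paths produce matching results.
void AssertPathsAgree(const Matrix2D& a, const Matrix2D& b) {
  const Matrix2D s = MultiplyScalar(a, b);
  const Matrix2D d = Multiply(a, b);
  assert(SameValue(s._11, d._11) && SameValue(s._12, d._12));
  assert(SameValue(s._21, d._21) && SameValue(s._22, d._22));
  assert(SameValue(s._31, d._31) && SameValue(s._32, d._32));
}
```

Calling Multiply(a, b) in place of a direct operator* would route through the SSE2 path wherever the CPU supports it. AssertPathsAgree is only meant for inputs whose products are exactly representable: on an x87 build the scalar path may carry extra intermediate precision, so the two paths could plausibly differ in the last bit for arbitrary values.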

Component: JavaScript Engine: JIT → Graphics

Summary: DOMMatrix runtime performance is much lower on windows than other platforms → DOMMatrix runtime performance is much lower on windows than other platforms when NaNs are involved, due to x87 instructions in gfx::Matrix::operator*

Nathan Froyd [:froydnj]

Comment 6

•

11 years ago

Can we just re-implement DOMMatrix as a JS-implemented WebIDL component, and let the JIT take care of things instead?

Boris Zbarsky [:bzbarsky]

Comment 7

•

11 years ago

The binding part will get much slower, then, sadly. The call from C++ to JS is not that cheap. :(

Rik Cabanier

Reporter

Comment 8

•

11 years ago

(In reply to Boris Zbarsky [:bz] from comment #7)
> The binding part will get much slower, then, sadly. The call from C++ to JS
> is not that cheap. :(

Would that be for a live object, where the C++ side would query/change the JS object?

Boris Zbarsky [:bzbarsky]

Comment 9

•

11 years ago

No, that would be for the suggestion in comment 6.

BMO Automation

Updated

•

3 years ago

Severity: normal → S3

Categories

(Core :: Graphics, defect)

People

(Reporter: cabanier, Unassigned)

Security

(public)
