Closed
Bug 482200
Opened 15 years ago
Closed 6 years ago
Investigate how best to do 4x4 matrices in JS
Categories
(Core :: JavaScript Engine, enhancement)
Tracking
()
RESOLVED
INCOMPLETE
People
(Reporter: ilmari.heikkinen, Unassigned)
Details
Attachments
(2 files, 2 obsolete files)
4x4 matrix multiplications are used in 3D engines to build the transformation matrices for each object. The amount of matrix multiplications you can do per frame affects the amount of objects you can have in a scene. So the faster the mmult, the more complex scenes you can manage. On my 2GHz Core 2 box and 2009-03-01 3.2a1pre Minefield, it takes ~23 ms to do 1000 4x4 mmults in JavaScript. Which would limit a JS engine running at 30 fps to 300-1000 objects (depending on whether you're memoizing the object transform stack for static objects or not.) At 60 fps, the amount objects is at least halved, as you're also seeing static overhead increase in proportion (10 ms per-frame overhead at 30 fps leaves 23 ms for the engine, but only 6 ms at 60 fps.)
Reporter | ||
Comment 1•15 years ago
|
||
Actually, a C port of mmul4x4 compiled with -O2 runs only 4 times faster. And a bit of googling suggests that using SSE might give a 2-4x speedup. So maybe I'm just doing the wrong assumption here by thinking that matrix-by-matrix multiplication is the best way to go for object transformations.
That was going to be my next question, yeah. That's pretty encouraging jit-wise, though obviously it could be better, especially with SIMD stuff. I don't think you can get away from matrix-by-matrix mults though; I mean, you can optimize for translation-only matrices for example, but I don't think that'll help that much.
Reporter | ||
Comment 3•15 years ago
|
||
Timed by setting iters to 10k and `time ./mmul`
Reporter | ||
Comment 4•15 years ago
|
||
Ah, sorry about that, using alloca makes things slow in the C example. Replacing it with malloc (or posix_memalign(&ptr, 16, sz)) reduces the time to 160 ms for a million 4x4 mmults at -O2, 80 ms at -O3. So 150-300x faster than JavaScript. 80 ms is around what you'd expect for a million mmults, as the code does 16x {8x movsd, 4x mulsd, 3x addsd, 2.5x addq}. ADDSD throughput is 1/cycle, MULSD 2, MOVSD 2, ADDQ 1, for a total of 184 cycles: 1e6 / (2.13e9 / 184) = 0.086.
Reporter | ||
Comment 5•15 years ago
|
||
Unrolled the two inner loops, dropping the runtime to 9 ms.
Attachment #366269 -
Attachment is obsolete: true
Reporter | ||
Comment 6•15 years ago
|
||
Attachment #366280 -
Attachment is obsolete: true
Component: Canvas: 2D → Canvas: WebGL
Hey Ilmari. Just a suggestion, maybe you'd want to make your testcase a bit larger, since when I tried it on my machine it completed in 1ms w/o even making a noticeable spike in the CPU on one core (latest nightly FF3.7a3pre) http://learningwebgl.com/blog/?p=1828 And that might interest you too - ran into it on Planet WebGL - offloading matrix mult onto the GPU
BTW, there are some links to other Bugzilla bugs and optimisation suggestions in comments to that post.
The original benchmark here runs quite fast now, because of what's said in comment #7. However, optimizing matrix multiplication in browsers would be a worthwhile goal -- investigating the fastest ways to represent matrices in JS, and whether things like CSSMatrix etc. are useful perf-wise would be good.
Summary: [c3d] JavaScript 4x4 matrix multiplication benchmark → Investigate how best to do 4x4 matrices in JS
Severity: normal → enhancement
Updated•13 years ago
|
Assignee: nobody → general
Component: Canvas: WebGL → JavaScript Engine
QA Contact: canvas.2d → general
Assignee | ||
Updated•10 years ago
|
Assignee: general → nobody
Comment 10•6 years ago
|
||
Old performance bug, no longer really applicable, because the overall system is now faster. Therefore resolving as INCOMPLETE.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → INCOMPLETE
You need to log in
before you can comment on or make changes to this bug.
Description
•