Closed Bug 482200 Opened 15 years ago Closed 6 years ago

Investigate how best to do 4x4 matrices in JS

Status:

RESOLVED INCOMPLETE

People

(Reporter: ilmari.heikkinen, Unassigned)

Details

Attachments

(2 files, 2 obsolete files)

JavaScript 4x4 matrix multiplication benchmark 15 years ago Ilmari Heikkinen 1.19 KB, text/html		Details
C port of mmul4x4 15 years ago Ilmari Heikkinen 588 bytes, text/plain		Details
JavaScript 4x4 matrix multiplication benchmark, optimized 15 years ago Ilmari Heikkinen 1.66 KB, text/html		Details
C port of mmul4x4, using posix_memalign and with innermost loop manually unrolled 15 years ago Ilmari Heikkinen 715 bytes, text/plain		Details

Ilmari Heikkinen

Reporter

Description

•

15 years ago

Attached file JavaScript 4x4 matrix multiplication benchmark (obsolete) — Details

4x4 matrix multiplications are used in 3D engines to build the transformation matrices for each object. The number of matrix multiplications you can do per frame limits the number of objects you can have in a scene, so the faster the mmult, the more complex the scenes you can manage.

On my 2GHz Core 2 box with the 2009-03-01 3.2a1pre Minefield build, it takes ~23 ms to do 1000 4x4 mmults in JavaScript. That would limit a JS engine running at 30 fps to 300-1000 objects (depending on whether or not you memoize the object transform stack for static objects).
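The attached benchmark is not reproduced in this thread, so the following is only a minimal sketch of what such a test could look like. It assumes matrices stored as flat 16-element arrays in column-major order (the layout OpenGL expects); the names `mmul4x4` and `timeMmults` and the sample matrices are illustrative assumptions, not taken from the attachment.

```javascript
// Multiply two 4x4 matrices stored as flat, column-major 16-element arrays.
// Writes a * b into dst and returns dst. dst must not alias a or b.
function mmul4x4(a, b, dst) {
  for (let col = 0; col < 4; col++) {
    for (let row = 0; row < 4; row++) {
      let sum = 0;
      for (let k = 0; k < 4; k++) {
        // Element (row, col) lives at index col * 4 + row.
        sum += a[k * 4 + row] * b[col * 4 + k];
      }
      dst[col * 4 + row] = sum;
    }
  }
  return dst;
}

// Time `count` multiplications of two fixed sample matrices (hypothetical inputs).
function timeMmults(count) {
  const a = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 2, 3, 4, 1]; // translate by (2, 3, 4)
  const b = [2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 1]; // uniform scale by 2
  const dst = new Array(16).fill(0);
  const start = Date.now();
  for (let i = 0; i < count; i++) {
    mmul4x4(a, b, dst);
  }
  return { ms: Date.now() - start, result: dst };
}

console.log(timeMmults(1000).ms + " ms for 1000 mmults");
```

Timings from a loop like this depend heavily on the engine and on how long the loop runs, so the number printed on a current browser will not be comparable to the 2009 figure.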

At 60 fps, the number of objects is at least halved, because the static per-frame overhead takes up a proportionally larger share of the frame: a 10 ms per-frame overhead leaves 23 ms for the engine at 30 fps, but only 6 ms at 60 fps.
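The frame-budget arithmetic above can be sketched as a small helper. The function name `mmultsPerFrame` is hypothetical, and the result counts multiplications rather than objects, since the number of mmults per object depends on the engine.

```javascript
// Estimate how many 4x4 multiplications fit into one frame.
// fps: target frame rate
// overheadMs: fixed per-frame cost outside the matrix math
// msPerThousand: measured time for 1000 multiplications
function mmultsPerFrame(fps, overheadMs, msPerThousand) {
  const frameMs = 1000 / fps;
  const budgetMs = Math.max(0, frameMs - overheadMs);
  return Math.floor((budgetMs / msPerThousand) * 1000);
}

console.log(mmultsPerFrame(30, 10, 23)); // 1014
console.log(mmultsPerFrame(60, 10, 23)); // 289
```

With the figures from the description, doubling the frame rate cuts the budget to less than a third, not just to half.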

Ilmari Heikkinen

Reporter

Comment 1

•

15 years ago

Actually, a C port of mmul4x4 compiled with -O2 runs only 4 times faster, and a bit of googling suggests that using SSE might give a 2-4x speedup. So maybe I'm just making the wrong assumption here by thinking that matrix-by-matrix multiplication is the best way to go for object transformations.

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Comment 2

•

15 years ago

That was going to be my next question, yeah.  That's pretty encouraging jit-wise, though obviously it could be better, especially with SIMD stuff.  I don't think you can get away from matrix-by-matrix mults though; I mean, you can optimize for translation-only matrices for example, but I don't think that'll help that much.
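For illustration, the translation-only special case mentioned above could look like the following sketch. It assumes flat, column-major 16-element arrays; `translateInPlace` is a hypothetical name and does not come from any code in this bug.

```javascript
// Post-multiply m by a translation matrix T(x, y, z), in place.
// Only the last column of m changes, so this needs 12 multiplies
// instead of the 64 a general 4x4 product uses.
function translateInPlace(m, x, y, z) {
  m[12] += m[0] * x + m[4] * y + m[8] * z;
  m[13] += m[1] * x + m[5] * y + m[9] * z;
  m[14] += m[2] * x + m[6] * y + m[10] * z;
  m[15] += m[3] * x + m[7] * y + m[11] * z;
  return m;
}
```

This only helps when the engine knows ahead of time that one operand is a pure translation; general rotations and projections still need the full product.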

Ilmari Heikkinen

Reporter

Comment 3

•

15 years ago

Attached file C port of mmul4x4 (obsolete) — Details

Timed by setting iters to 10k and running `time ./mmul`.

Ilmari Heikkinen

Reporter

Comment 4

•

15 years ago

Ah, sorry about that, using alloca makes things slow in the C example. Replacing it with malloc (or posix_memalign(&ptr, 16, sz)) reduces the time to 160 ms for a million 4x4 mmults at -O2, and to 80 ms at -O3. So the C version is 150-300x faster than JavaScript, which at ~23 ms per 1000 mmults needs ~23 s for a million.

80 ms is around what you'd expect for a million mmults, as the code does 16x {8x movsd, 4x mulsd, 3x addsd, 2.5x addq}. With a throughput of 1 per cycle for ADDSD, 2 for MULSD, 2 for MOVSD and 1 for ADDQ, that is 16 x (8/2 + 4/2 + 3/1 + 2.5/1) = 184 cycles per mmult, so a million mmults at 2.13 GHz take 1e6 / (2.13e9 / 184) = 0.086 s.
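The cycle estimate can be reproduced with a few lines of arithmetic. This is only a sketch of the calculation in the comment above: the instruction counts and throughput figures are taken from that comment as given, not independently measured, and `estimateSeconds` is a hypothetical name.

```javascript
// Estimate the run time of `mmults` 4x4 multiplications at `clockHz`.
function estimateSeconds(mmults, clockHz) {
  // [instructions per output element, instructions issued per cycle]
  const perElement = [
    [8, 2],   // movsd
    [4, 2],   // mulsd
    [3, 1],   // addsd
    [2.5, 1], // addq
  ];
  let cyclesPerElement = 0;
  for (const [count, perCycle] of perElement) {
    cyclesPerElement += count / perCycle;
  }
  const cyclesPerMmult = 16 * cyclesPerElement; // 16 output elements
  return { cyclesPerMmult, seconds: (mmults * cyclesPerMmult) / clockHz };
}

const estimate = estimateSeconds(1e6, 2.13e9);
console.log(estimate.cyclesPerMmult);     // 184
console.log(estimate.seconds.toFixed(3)); // "0.086"
```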

Ilmari Heikkinen

Reporter

Comment 5

•

15 years ago

Attached file JavaScript 4x4 matrix multiplication benchmark, optimized — Details

Unrolled the two inner loops, dropping the runtime to 9 ms.
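The optimized attachment itself is not shown here. As a rough sketch of what unrolling the two inner loops can look like, assuming flat column-major arrays and the hypothetical name `mmul4x4Unrolled`:

```javascript
// Multiply two flat, column-major 4x4 matrices with the row and k loops
// unrolled by hand; only the loop over columns remains.
// Writes a * b into dst and returns dst. dst must not alias a or b.
function mmul4x4Unrolled(a, b, dst) {
  for (let col = 0; col < 16; col += 4) {
    // Cache the current column of b so each element is read once.
    const b0 = b[col], b1 = b[col + 1], b2 = b[col + 2], b3 = b[col + 3];
    dst[col]     = a[0] * b0 + a[4] * b1 + a[8]  * b2 + a[12] * b3;
    dst[col + 1] = a[1] * b0 + a[5] * b1 + a[9]  * b2 + a[13] * b3;
    dst[col + 2] = a[2] * b0 + a[6] * b1 + a[10] * b2 + a[14] * b3;
    dst[col + 3] = a[3] * b0 + a[7] * b1 + a[11] * b2 + a[15] * b3;
  }
  return dst;
}
```

Unrolling removes the loop counters and index arithmetic from the hot path, which is likely where most of the saving comes from in a tracing JIT of that era.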

Attachment #366269 - Attachment is obsolete: true

Ilmari Heikkinen

Reporter

Comment 6

•

15 years ago

Attached file C port of mmul4x4, using posix_memalign and with innermost loop manually unrolled — Details

Attachment #366280 - Attachment is obsolete: true

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Updated

•

15 years ago

Component: Canvas: 2D → Canvas: WebGL

nemo

Comment 7

•

14 years ago

Hey Ilmari.

Just a suggestion: maybe you'd want to make your testcase a bit larger, since when I tried it on my machine it completed in 1 ms without even making a noticeable spike in the CPU on one core (latest nightly, FF3.7a3pre).

http://learningwebgl.com/blog/?p=1828

The link above might interest you too. I ran into it on Planet WebGL; it is about offloading matrix mults onto the GPU.

nemo

Comment 8

•

14 years ago

BTW, there are some links to other Bugzilla bugs and optimisation suggestions in the comments on that post.

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Comment 9

•

14 years ago

The original benchmark here runs quite fast now, as noted in comment #7. However, optimizing matrix multiplication in browsers would still be a worthwhile goal: it would be good to investigate the fastest ways to represent matrices in JS, and whether things like CSSMatrix etc. are useful perf-wise.
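As one example of the representation question, a matrix can be held either as nested arrays or as a single flat typed array. The sketch below is an assumption about what such a comparison might start from, not something this bug settled on; `identityFlat` and `nestedToFlat` are hypothetical names.

```javascript
// Flat representation: one Float32Array of 16 elements, column-major.
function identityFlat() {
  const m = new Float32Array(16); // zero-initialized
  m[0] = m[5] = m[10] = m[15] = 1;
  return m;
}

// Convert a nested, row-major matrix (array of 4 rows of 4 numbers)
// into the flat column-major layout.
function nestedToFlat(rows) {
  const m = new Float32Array(16);
  for (let row = 0; row < 4; row++) {
    for (let col = 0; col < 4; col++) {
      m[col * 4 + row] = rows[row][col];
    }
  }
  return m;
}
```

A flat Float32Array has the practical advantage that it can be handed to WebGL's `uniformMatrix4fv` without conversion, whereas nested arrays need a copy like `nestedToFlat` first. Which one multiplies faster would have to be measured per engine.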

Summary: [c3d] JavaScript 4x4 matrix multiplication benchmark → Investigate how best to do 4x4 matrices in JS

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Updated

•

14 years ago

Severity: normal → enhancement

George Wright (:gw280) (needinfo me!)

Updated

•

13 years ago

Assignee: nobody → general

Component: Canvas: WebGL → JavaScript Engine

QA Contact: canvas.2d → general

Nobody; OK to take it and work on it

Assignee

Updated

•

10 years ago

Assignee: general → nobody

André Bargull [:anba]

Comment 10

•

6 years ago

Old performance bug, no longer really applicable, because the overall system is now faster. Therefore resolving as INCOMPLETE.

Status: NEW → RESOLVED

Closed: 6 years ago

Resolution: --- → INCOMPLETE

Categories

(Core :: JavaScript Engine, enhancement)
