Open Bug 750947 Opened 12 years ago Updated 2 years ago

IonMonkey: Optimize Uint32 values

Categories

(Core :: JavaScript Engine, defect)

defect

People

(Reporter: azakai, Unassigned)

References

(Blocks 5 open bugs)

Details

(Keywords: perf, Whiteboard: [js:p2][games:p?][ion:t][shumway:p1])

Attachments

(2 files, 3 obsolete files)

In code compiled from C++, unsigned 32-bit ints mainly come from Uint32Arrays and from >>> 0. Operations on such Uint32 values can then be done with 32-bit int arithmetic if they only flow into something that coerces to unsigned 32-bit anyhow, which is basically the same set as the sources of such values: Uint32Arrays and >>> 0. (And provided that along the way we can be sure not to add/multiply past 53 bits of precision, of course.)

This could be important in a significant amount of compiled code: basically anything that uses unsigned 32-bit ints. dvander says IonMonkey can optimize this kind of thing.
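A minimal sketch of the pattern described above (hypothetical, not from the benchmark): a uint32 read that would naively need a double representation, but whose result flows straight back into a >>> 0 coercion, so the engine could in principle keep it as a raw 32-bit integer throughout:

```javascript
// Values read from a Uint32Array can exceed the int32 range
// (e.g. 0xFFFFFFFF === 4294967295), which naively forces a
// double representation inside the VM.
var buf = new Uint32Array(1);
buf[0] = 0xFFFFFFFF;

// But if the value only flows back into >>> 0 (or another
// Uint32Array store), no observable double is ever needed.
var x = buf[0];          // conceptually a uint32
var y = (x + 1) >>> 0;   // wraps back into uint32 range: 0
```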

Here is a simple benchmark that tests this, and is also an example of the kind of code that is relevant for optimization here:

https://github.com/kripken/misc-js-benchmarks/blob/master/uint32.js

v8/node is twice as fast as jm+ti on that benchmark. (IonMonkey is currently many times slower.)

Note that the benchmark purposefully mixes a signed 32-bit int (i) with unsigned 32-bit ints, which is ok to optimize since it flows into >>> later.
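A hypothetical sketch of that mix, modeled loosely on the linked benchmark: the signed loop counter `i` combines with unsigned values, but since the result immediately feeds into >>> 0, the whole chain can stay in 32-bit arithmetic:

```javascript
var data = new Uint32Array(100);  // elements are uint32
var sum = 0;
for (var i = 0; i < data.length; i++) {  // i is a signed int32
  // Mixing the signed i with the unsigned data[i] is safe to do
  // in 32-bit arithmetic here, because the result flows into >>> 0.
  sum = (sum + data[i] + i) >>> 0;
}
```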
I'd thought about this when writing my implicit truncate optimization. It shouldn't be too difficult to augment that to handle these cases as well.
Blocks: WebJSPerf
Whiteboard: [js:p2]
Whiteboard: [js:p2] → [js:p2][games:p2]
Summary: Optimize Uint32 values in IonMonkey → IonMonkey: Optimize Uint32 values
Attached patch runs simple loop (obsolete) — Splinter Review
This is at a point where some stuff works, namely the testcase from bug 769765 comment 4. This makes this about 5-10 times faster, and I didn't even optimize the compare or add instruction yet.

This is probably going to be quite some work to integrate nicely with TI.

E.g. what happens if we do some shifting that would normally result in a double, but now we stay in the jit? TI then never observed a double, yet this value might be written into some array that was observed as containing packed ints. Now we need to write a double into it, but we don't tell TI about the new type. I didn't test this yet, but I think this and other similar things could happen.

After this, packed uint calculations in some hot loop should be fast, though boxing the results will of course still need a little more work.
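The TI hazard described in this comment can be illustrated with a hypothetical snippet: the >>> result lies outside the int32 range, so storing it into an array that the type inference has only ever seen hold int32s would silently require a double slot:

```javascript
var arr = [1, 2, 3];   // TI observes: packed array of int32
var v = -1 >>> 0;      // 4294967295, does not fit in int32

// If the JIT kept v as a raw uint32 and never reported a double
// to TI, this store would put a double-range value into an array
// whose type set only contains int32: the inconsistency above.
arr[0] = v;
```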
Assignee: general → evilpies
Status: NEW → ASSIGNED
Blocks: 769765
Whiteboard: [js:p2][games:p2] → [js:p2][games:p2][ion:t]
Attached patch passes tests (obsolete) — Splinter Review
This passes all but one jit test, which is driving me insane! (Just in case: jit-test/tests/ion/bug741202.js) It includes bailout handling and probably safe TI usage. It doesn't really help the benchmark in comment #0, because we never promote anything to uint32 there.

The sad thing is, we can only optimize instructions to return uint32 if
a) TI already observed doubles for that operation and thus doesn't get confused when we suddenly come up with a double at some random place.
b) the operation cannot fail / leave the jit: e.g. it only flows into add, sub, etc. I still need to implement some optimization pass that finds these kinds of important cases and promotes them to uint32. I think the v8 patch has basically just that part.
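Case (b) in hypothetical terms: here the intermediate uint32 never escapes as an observable double, because it only feeds into further bitwise ops, so an optimization pass could mark the whole chain as uint32-safe (the constant and function name below are made up for illustration):

```javascript
function mix(a, b) {
  // u only flows into ^ and >>> 0, never into anything that
  // could observe it as a double, so a compiler pass could keep
  // the entire chain in raw 32-bit registers without bailouts.
  var u = (a * 0x9E3779B9) >>> 0;  // hypothetical hash constant
  return (u ^ b) >>> 0;
}
```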
Attachment #644687 - Attachment is obsolete: true
Hehe just found the problem with that test, while looking at the [review]. :)
Attached patch base patch for x64 (obsolete) — Splinter Review
So I consider this a base patch. What it does is add basic support for uint32 everywhere it is needed. What it does not do is optimize specific instructions to handle uint32 values.

This passes all tests.
Attachment #647304 - Attachment is obsolete: true
Attachment #649401 - Flags: feedback?(dvander)
v8 recently added an optimization for this:

http://code.google.com/p/v8/issues/detail?id=2097
Don't forget to remove the trailing whitespace you're adding
Blocks: 787851
Sounds like this could be a nice win.
Attached patch refreshed patchSplinter Review
This is just the refreshed version of the last patch. So still only support for x64. We changed a few things, most notably I introduced NoUint32Policy for MPassArg so it only ever sees already boxed uint32 values, because we would have to box them in LStackArg anyway.

Passes jit-tests.
Attachment #649401 - Attachment is obsolete: true
Attachment #649401 - Flags: feedback?(dvander)
How is this coming along?
I need to start looking into this again. I basically forgot most of the stuff I did here.
Sorry, I am not really working on this. The base patch is basically good, I think. We just need to start looking into where to make use of the uint32 types.
I am not working on this.
Assignee: evilpies → general
JS guys, how useful is this performance-wise going forward? We have asm.js/Odin for ports, so this would only be useful for hand-written JS. Is this going to make a big enough difference?
Even if this is less urgent after asm.js, I think this is still important:

1. Not all code generators use asm.js, for example Mandreel. It may be that they have not gotten around to it, but there are also use cases like Duetto which intentionally compiles into something that does use some amount of JS objects, so it can't be asm.js. asm.js is an important use case, but just one among many. Aside from code generators, there is also hand-written code that could benefit from this kind of optimization (crypto stuff, turbulenz games, as two random examples off the top of my head).

2. asm.js will likely change over time, so it is possible we will have cases where a browser and some asm.js code are not in sync, and will not validate. It would be good to not fall off a cliff in those cases. Similarly, sometimes people add debugging statements, which breaks asm validation, and again the less performance changes the better.
This is very relevant for Shumway. We don't use asm.js, but deal with lots and lots of int32s: all coordinates and all color values are internally treated that way.
Blocks: 885526
Whiteboard: [js:p2][games:p2][ion:t] → [js:p2][games:p2][ion:t][shumway:p1]
Ok, npb made me aware of the fact that this is strictly about uint32. Signed int32 is well-supported already, so Shumway should be good here. Sorry for the noise.
No longer blocks: 885526
Status: ASSIGNED → NEW
Whiteboard: [js:p2][games:p2][ion:t][shumway:p1] → [js:p2][games:p2][ion:t]
Ok, this *is* relevant to Shumway after all, and to all code that deals with lots of color values.

Replacing `Uint32Array` with `Int32Array` in the attached shell test case takes us from 440ms to 358ms on my machine, so about 19% less time.

The good news is that we're faster than v8 in all cases: their results are 522ms and 467ms, respectively.
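The difference comes from the element type: reads from a Uint32Array can produce values above 2^31 - 1 that need a double representation, whereas Int32Array reads always fit in int32. A hypothetical color-value sketch of the reinterpretation trick:

```javascript
// A packed RGBA color: as a uint32 read, 0xFF0000FF (4278190335)
// exceeds the int32 range and forces a double; reinterpreted
// through an Int32Array view over the same buffer, the identical
// 32 bits read back as an int32.
var u32 = new Uint32Array(1);
var i32 = new Int32Array(u32.buffer);
u32[0] = 0xFF0000FF;
var asUint = u32[0];  // 4278190335 (needs a double in the VM)
var asInt  = i32[0];  // -16776961 (fits in int32)
```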
Whiteboard: [js:p2][games:p2][ion:t] → [js:p2][games:p2][ion:t][shumway:p1]
Assignee: general → nobody
This Uint32 optimization does not block Shumway's M4 milestone.
Blocks: shumway-1.0
No longer blocks: shumway-m4
Keywords: perf
Depends on: 1111000
Blocks: 1111000
No longer depends on: 1111000
Whiteboard: [js:p2][games:p2][ion:t][shumway:p1] → [js:p2][games:p?][ion:t][shumway:p1]
Severity: normal → S3