Open Bug 750947 Opened 12 years ago Updated 2 years ago

IonMonkey: Optimize Uint32 values

Categories

(Core :: JavaScript Engine, defect)

defect

People

(Reporter: azakai, Unassigned)

References

(Blocks 5 open bugs)

Details

(Keywords: perf, Whiteboard: [js:p2][games:p?][ion:t][shumway:p1])

Attachments

(2 files, 3 obsolete files)

In code compiled from C++, unsigned 32-bit ints mainly come from Uint32Arrays and from >>> 0. Operations on such Uint32 values can then be done with 32-bit int arithmetic if they only flow into something that coerces to unsigned 32-bit anyhow, which is basically the same set as the sources of such values: Uint32Arrays and >>> 0. (And provided that along the way we can be sure not to add/multiply past 53 bits of precision, of course.)

This could be important in a significant amount of compiled code: basically anything that uses unsigned 32-bit ints. dvander says IonMonkey can optimize this kind of thing.
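A minimal sketch of the pattern described above (hypothetical, not from the benchmark): a uint32 read that would naively need a double representation, but whose result flows straight back into a >>> 0 coercion, so the engine could in principle keep it as a raw 32-bit integer throughout:

```javascript
// Values read from a Uint32Array can exceed the int32 range
// (e.g. 0xFFFFFFFF === 4294967295), which naively forces a
// double representation inside the VM.
var buf = new Uint32Array(1);
buf[0] = 0xFFFFFFFF;

// But if the value only flows back into >>> 0 (or another
// Uint32Array store), no observable double is ever needed.
var x = buf[0];          // conceptually a uint32
var y = (x + 1) >>> 0;   // wraps back into uint32 range: 0
```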

Here is a simple benchmark that tests this, and is also an example of the kind of code that is relevant for optimization here:

https://github.com/kripken/misc-js-benchmarks/blob/master/uint32.js

v8/node is twice as fast as jm+ti on that benchmark. (IonMonkey is currently many times slower.)

Note that the benchmark purposefully mixes a signed 32-bit int (i) with unsigned 32-bit ints, which is ok to optimize since it flows into >>> later.
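A hypothetical sketch of that mix, modeled loosely on the linked benchmark: the signed loop counter `i` combines with unsigned values, but since the result immediately feeds into >>> 0, the whole chain can stay in 32-bit arithmetic:

```javascript
var data = new Uint32Array(100);  // elements are uint32
var sum = 0;
for (var i = 0; i < data.length; i++) {  // i is a signed int32
  // Mixing the signed i with the unsigned data[i] is safe to do
  // in 32-bit arithmetic here, because the result flows into >>> 0.
  sum = (sum + data[i] + i) >>> 0;
}
```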
I'd thought about this when writing my implicit truncate optimization. It shouldn't be too difficult to augment that to handle these cases as well.
Blocks: WebJSPerf
Whiteboard: [js:p2]
Whiteboard: [js:p2] → [js:p2][games:p2]
Summary: Optimize Uint32 values in IonMonkey → IonMonkey: Optimize Uint32 values
Attached patch runs simple loop (obsolete) — Splinter Review
This is at a point where some stuff works, namely the testcase from bug 769765 comment 4. This makes this about 5-10 times faster, and I didn't even optimize the compare or add instruction yet.

This is probably going to be quite some work to integrate nicely with TI.

E.g. what happens if we do some shifting that would normally result in a double, but now we stay in the jit? TI then never observed a double, yet this value might be written into some array that was observed as containing packed ints. Now we need to write a double into it, but we don't tell TI about the new type. I didn't test this yet, but I think this and other similar things could happen.

After this, packed uint calculations in some hot loop should be fast, though boxing the results will of course still need a little more work.
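The TI hazard described in this comment can be illustrated with a hypothetical snippet: the >>> result lies outside the int32 range, so storing it into an array that the type inference has only ever seen hold int32s would silently require a double slot:

```javascript
var arr = [1, 2, 3];   // TI observes: packed array of int32
var v = -1 >>> 0;      // 4294967295, does not fit in int32

// If the JIT kept v as a raw uint32 and never reported a double
// to TI, this store would put a double-range value into an array
// whose type set only contains int32: the inconsistency above.
arr[0] = v;
```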
Assignee: general → evilpies
Status: NEW → ASSIGNED
Blocks: 769765
Whiteboard: [js:p2][games:p2] → [js:p2][games:p2][ion:t]
Attached patch passes tests (obsolete) — Splinter Review
This passes all but one jit test, which is driving me insane! (Just in case: jit-test/tests/ion/bug741202.js) It includes bailout handling and probably safe TI usage. It doesn't really help the benchmark in comment #0, because we never promote anything to uint32 there.

The sad thing is, we can only optimize instructions to return uint32 if
a) TI already observed doubles for that operation and thus doesn't get confused when we suddenly come up with a double at some random place.
b) the operation cannot fail / leave the jit: e.g. it only flows into add, sub, etc. I still need to implement some optimization pass that finds these kinds of important cases and promotes them to uint32. I think the v8 patch has basically just that part.
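Case (b) in hypothetical terms: here the intermediate uint32 never escapes as an observable double, because it only feeds into further bitwise ops, so an optimization pass could mark the whole chain as uint32-safe (the constant and function name below are made up for illustration):

```javascript
function mix(a, b) {
  // u only flows into ^ and >>> 0, never into anything that
  // could observe it as a double, so a compiler pass could keep
  // the entire chain in raw 32-bit registers without bailouts.
  var u = (a * 0x9E3779B9) >>> 0;  // hypothetical hash constant
  return (u ^ b) >>> 0;
}
```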
Attachment #644687 - Attachment is obsolete: true
Hehe just found the problem with that test, while looking at the [review]. :)
Attached patch base patch for x64 (obsolete) — Splinter Review
So I consider this a base patch. What it does is add basic support for uint32 everywhere it is needed. What it does not do is optimize specific instructions to handle uint32 values.

This passes all tests.
Attachment #647304 - Attachment is obsolete: true
Attachment #649401 - Flags: feedback?(dvander)
v8 recently added an optimization for this:

http://code.google.com/p/v8/issues/detail?id=2097
Don't forget to remove the trailing whitespace you're adding
Blocks: 787851
Sounds like this could be a nice win.
Attached patch refreshed patchSplinter Review
This is just the refreshed version of the last patch. So still only support for x64. We changed a few things, most notably I introduced NoUint32Policy for MPassArg so it only ever sees already boxed uint32 values, because we would have to box them in LStackArg anyway.

Passes jit-tests.
Attachment #649401 - Attachment is obsolete: true
Attachment #649401 - Flags: feedback?(dvander)
How is this coming along?
I need to start looking into this again. I basically forgot most of the stuff I did here.
Sorry, I am not really working on this. The base patch is basically good, I think. We just need to start looking into where to make use of the uint32 types.
I am not working on this.
Assignee: evilpies → general
JS guys, how useful is this performance-wise going forward? We have asm.js/Odin for ports, so this would only be useful for hand-written JS. Is this going to make a big enough difference?
Even if this is less urgent after asm.js, I think this is still important:

1. Not all code generators use asm.js, for example Mandreel. It may be that they have not gotten around to it, but there are also use cases like Duetto which intentionally compiles into something that does use some amount of JS objects, so it can't be asm.js. asm.js is an important use case, but just one among many. Aside from code generators, there is also hand-written code that could benefit from this kind of optimization (crypto stuff, turbulenz games, as two random examples off the top of my head).

2. asm.js will likely change over time, so it is possible we will have cases where a browser and some asm.js code are not in sync, and will not validate. It would be good to not fall off a cliff in those cases. Similarly, sometimes people add debugging statements, which breaks asm validation, and again the less performance changes the better.
This is very relevant for Shumway. We don't use asm.js, but deal with lots and lots of int32s: all coordinates and all color values are internally treated that way.
Blocks: 885526
Whiteboard: [js:p2][games:p2][ion:t] → [js:p2][games:p2][ion:t][shumway:p1]
Ok, npb made me aware of the fact that this is strictly about uint32. Signed int32 is well-supported already, so Shumway should be good here. Sorry for the noise.
No longer blocks: 885526
Status: ASSIGNED → NEW
Whiteboard: [js:p2][games:p2][ion:t][shumway:p1] → [js:p2][games:p2][ion:t]
Ok, this *is* relevant to Shumway after all, and to all code that deals with lots of color values.

Replacing `Uint32Array` with `Int32Array` in the attached shell test case takes us from 440ms to 358ms on my machine, so about 19% less time.

The good news is that we're faster than v8 in all cases: their results are 522ms and 467ms, respectively.
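The difference comes from the element type: reads from a Uint32Array can produce values above 2^31 - 1 that need a double representation, whereas Int32Array reads always fit in int32. A hypothetical color-value sketch of the reinterpretation trick:

```javascript
// A packed RGBA color: as a uint32 read, 0xFF0000FF (4278190335)
// exceeds the int32 range and forces a double; reinterpreted
// through an Int32Array view over the same buffer, the identical
// 32 bits read back as an int32.
var u32 = new Uint32Array(1);
var i32 = new Int32Array(u32.buffer);
u32[0] = 0xFF0000FF;
var asUint = u32[0];  // 4278190335 (needs a double in the VM)
var asInt  = i32[0];  // -16776961 (fits in int32)
```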
Whiteboard: [js:p2][games:p2][ion:t] → [js:p2][games:p2][ion:t][shumway:p1]
Assignee: general → nobody
This Uint32 optimization does not block Shumway's M4 milestone.
Blocks: shumway-1.0
No longer blocks: shumway-m4
Keywords: perf
Depends on: 1111000
Blocks: 1111000
No longer depends on: 1111000
Whiteboard: [js:p2][games:p2][ion:t][shumway:p1] → [js:p2][games:p?][ion:t][shumway:p1]
Severity: normal → S3