C-like corrections should help, not hurt

NEW
Unassigned

Status

()

defect
7 years ago
5 years ago

People

(Reporter: azakai, Unassigned)

Tracking

(Depends on 1 bug, Blocks 3 bugs)

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [js:p2])

Attachments

(3 attachments)

Attached are two versions of a fasta benchmark. 'slow' does proper sign correction and so forth: All comparisons have |0 for unsigned and >>>0 for signed, all additions get |0 or >>>0 to show they are just integer operations, etc. 'fast' on the other hand does not have these corrections - the code happens to work fine without them, no overflows happen and no signeds need to be unsigneds or vice versa in comparisons.

'slow' is the normal emscripten code generation mode - because it guarantees code will work, and also because in theory this information should help the JS engine if it takes it into account, to remove overflow checks.

'slow' is about 6% slower than 'fast' on my machine on IonMonkey, so it looks like the additional information is not used to optimize in all cases, and it looks like additional (unneeded) work is done because of them.

To be specific, the important cases are probably that in

(x+y)|0

if we know x and y are ints, could be done as simple int addition. And

x|0 > y|0

if we know x and y are ints, could be just int comparison. And in

x>>>0 > y>>>0

if we know x and y are ints, we can do an unsigned comparison between them.
Posted file fast.js
Posted file slow.js
Diff between fast.js and slow.js
Blocks: IonSpeed
Improving this would help with all compiled games (emscripten, mandreel, etc.).
Blocks: gecko-games
Note that Ion does not compile scripts with bytecode length > 1500. Adding |0 and >>> 0 generates more bytecode so may make us reach that limit faster. Not a difference in this case though, on both fast.js and slow.js we don't Ion-compile the functions at lines 1 (global), 929, 1190 and 2964 because of their size. I don't know how hot these functions are though.

The |0 in (x + y)|0 helps remove overflow checks for the add, and int32|0 doesn't generate any extra code. However, x >>> 0 still generates more code: JM+TI always calls a stub and Ion does one of two things for x >>> 0:

(1) If the result always fit in an int32: shift with 0, then check if the result is >= 0, else bailout.
(2) Else: shift, then convert the result to double.

Things to do here:

(1) Emitting "shl x, 0" (shift by 0) on x86 is silly. I thought we optimized this already due to a problem with our asm spew, will fix. We still have to check x >= 0 though.

(2) JM+TI should not call a stub for x >>> 0.

(3) Bug 750947 should help Ion a lot. With that fixed we can make x >>> 0 a no-op in the most common cases.

(4) I can't find it but didn't bhackett file a bug to optimize the x>>>0 > y>>>0 comparison pattern?
Depends on: 750947
Depends on: 787906
(In reply to Jan de Mooij [:jandem] from comment #5)
> (2) JM+TI should not call a stub for x >>> 0.
>
> (4) I can't find it but didn't bhackett file a bug to optimize the x>>>0 >
> y>>>0 comparison pattern?

This is bug 769765, which I didn't file but I wrote a patch for to fix a massive regression vs. Chrome.  I wasn't able to get this patch reviewed, though.  I won't be able to do anything with that bug before leaving later this week (gone til November), feel free to do whatever you want with that bug.
Whiteboard: [js:p2]
Whiteboard: [js:p2] → [js:p2][games:p1]
This is really Asm.js mostly; though the corrections should still be useful, once we have Asm.js this won't be that critical, correct?
Slightly less important perhaps, but as mentioned in

https://bugzilla.mozilla.org/show_bug.cgi?id=767238#c2

I think it is still important.
Whiteboard: [js:p2][games:p1] → [js:p2]
Blocks: JSIL
Assignee: general → nobody
You need to log in before you can comment on or make changes to this bug.