Open Bug 771106 Opened 12 years ago Updated 2 years ago

Meta: improve memory access performance in Emscripten-translated code

Categories

(Core :: JavaScript Engine, defect)

Other Branch
x86
macOS
defect

People

(Reporter: bhackett1024, Unassigned)

References

(Depends on 2 open bugs, Blocks 1 open bug)

Details

(Whiteboard: [js:t])

Attachments

(4 files)

Currently, memory access performance is a major gap between native code and Emscripten-translated JS (and other autotranslators).  For a native access like:

x[i]

From C this is compiled to a single base+index instruction.  The translated JS for this looks something like:

Mem[(x + (i << 2)) >> 2]

In some cases optimizations (I don't know whether these live in Emscripten or in the Closure compiler) introduce new variables that can eliminate one or both of these shifts, but those rewrites don't seem to be widely applicable and aren't used much on the (simple) fannkuch benchmark.

This metabug is about changes to the JITs and/or Emscripten that would let the JIT compile the translation of x[i] down to a base+index instruction plus a bounds check (hopefully hoisted out of loops).
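To make the gap concrete, here is a minimal sketch of how such a translated access behaves over a typed array. The heap layout, and the names Mem and x, are hypothetical placeholders, not Emscripten output:

```javascript
// Minimal sketch of an Emscripten-style flat heap (hypothetical layout).
const Mem = new Int32Array(new ArrayBuffer(64));

// Pretend a C int32 array `x` starts at byte offset 16 in the heap.
const x = 16;                    // byte address of x[0]

Mem[(x + (0 << 2)) >> 2] = 7;    // translated form of `x[0] = 7`
Mem[(x + (1 << 2)) >> 2] = 9;    // translated form of `x[1] = 9`

// Every access pays for the shift-and-add address arithmetic that a
// native compiler would fold into a single base+index addressing mode:
const v = Mem[(x + (1 << 2)) >> 2];
```

Natively, both the << 2 and the >> 2 disappear into the addressing mode; in JS they are explicit integer operations unless the JIT pattern-matches them away.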

While Emscripten is the focus here, the goal is not to overly constrain the input, so that other autotranslators can adapt to (or be adapted for) the same scheme.
Translated version of the fannkuch-11 benchmark produced by emscripten -O3 (this code is 1s faster than the -O2 output).  I currently get 10.2s in JM and 10.0s in IM.
The above benchmark hand-optimized to eliminate most shifts.  I get 9.3s in JM and 23.9s in IM (a weird performance fault that needs investigating).

This changes the representation of 'x' in the above to be an index into Mem[] rather than an absolute byte offset.  I don't know how hard this change would be to make in Emscripten, but generating code like this would require less pattern matching in the JIT and would apply more easily to other JS engines.
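A sketch of that index-based representation (names are hypothetical), assuming pointers are stored pre-divided by the element size:

```javascript
const Mem = new Int32Array(16);

// Index-based representation: `x` holds an index into Mem, not a byte address.
const x = 4;           // x[0] lives at Mem[4]

Mem[x + 1] = 9;        // `x[1] = 9` with no shifts at all
const v = Mem[x + 1];  // a plain base+index access the JIT can compile directly
```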
Attached file C++ fannkuch
Original C++.  gcc -O3 is 3.3s for me.
Attachment #639312 - Attachment is patch: false
Depends on: 771285
Depends on: 771383
Regarding things like

Mem[(x + (i << 2)) >> 2]

it is possible to split off x >> 2 and, if it recurs, to define x2 = x >> 2. However, for (i << 2) >> 2 it is not trivial to replace it with, say, i | 0, since the shift pair can clear the top few bits. This might become easier once Emscripten has a C++ LLVM backend (see bug 771285 comment 5), because inside LLVM it is likely straightforward to detect when such operations act on pointers (where we can reasonably assume the top few bits are not needed).
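A quick illustration of why the | 0 rewrite is unsafe in general: the two forms agree for small indices but diverge once the top bits of the operand are set.

```javascript
// Small index: the shift pair and | 0 agree.
const small = 5;
const shifted = (small << 2) >> 2;   // 5
const ored = small | 0;              // 5

// Bit 30 set: << 2 discards the top bits, so >> 2 cannot restore them.
const big = 0x40000005;
const shiftedBig = (big << 2) >> 2;  // 5: the top bits are gone
const oredBig = big | 0;             // 0x40000005, unchanged
```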
Attached file fannkuches
Ok, we already have the infrastructure in the current compiler to optimize similar expressions; I ran some tests now. Attached are 4 versions of fannkuch, with an example of the generated code from each. Closure was not run, to keep things readable.

src.0.js  HEAP32[($i_23_i << 2 >> 2) + $20$s2]
src.1.js  HEAP32[($i_23_i & 1073741823) + $20$s2]
src.2.js  HEAP32[($i_23_i | 0) + $20$s2]
src.3.js  HEAP32[$i_23_i + $20$s2]

So, src.0.js is the output of the original, unmodified compiler. src.1.js replaces the << >> pair with a single & using the proper mask, which is a safe transformation that seems like it could be useful (1 operation instead of 2). src.2.js does an unsafe transformation of << >> into | 0, which is valid for pointers and happens to be ok here. Finally, src.3.js is the same as the previous one but without the |0, for the smallest possible code.
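For reference, 1073741823 is 0x3FFFFFFF, i.e. the low 30 bits; the masked form in src.1.js matches the shift pair whenever the index fits in 29 bits (so no sign extension kicks in), which is the case for any in-range heap index. A small sanity check:

```javascript
// 1073741823 === 0x3FFFFFFF: keeps the low 30 bits, clears the top two.
const i = 123456;                 // an in-range heap index (bit 29 clear)
const shiftPair = (i << 2) >> 2;
const masked = i & 1073741823;
// For such indices the single & replaces the two-shift sequence.
```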

time mozjs --ion -n src.*.js 11

gives

src.0.js  9.869 seconds
src.1.js  9.797
src.2.js  9.793
src.3.js  10.273

1 and 2 give slightly less than a 1% speedup. That is very different from the hand-optimized version mentioned earlier, so I guess the hand-optimized one did something important that these simple optimizations did not.

Note that 1% is not a bad thing in itself. However 1 generates larger code and 2 is unsafe, so for now I won't use these optimizations in emscripten.

3 is slower, which is not surprising, I guess: the JIT needs to add checks on the type of the variable used to index into HEAP.
Btw, $20$s2 is a helper variable the optimizer generated: after seeing that $20 was used via >> 2 several times, it defined $20$s2 = $20 >> 2.
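That hoisted-shift helper pattern looks roughly like this (heap contents and offsets here are illustrative, not taken from the attached code):

```javascript
const HEAP32 = new Int32Array(16);

const $20 = 32;             // a byte pointer used repeatedly via >> 2
const $20$s2 = $20 >> 2;    // the hoisted helper: shift once, reuse everywhere

HEAP32[$20$s2 + 0] = 1;     // instead of HEAP32[($20 + 0) >> 2]
HEAP32[$20$s2 + 1] = 2;     // instead of HEAP32[($20 + 4) >> 2]
```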
Ah, what are the times with -m -n? Ion still behaves weirdly here and is slower on the optimized benchmark.
Ok, with -m -n I get

src.0.js  9.761
src.1.js  9.713
src.2.js  9.749
src.3.js  10.141
Depends on: 771835
Depends on: 771864
Blocks: 767238
Whiteboard: [js:t]
Assignee: general → nobody
Severity: normal → S3