br_table is slow or it doesn't scale
Categories
(Core :: JavaScript: WebAssembly, defect, P2)
People
(Reporter: brezaevlad, Unassigned)
Attachments
(1 file)
55.37 KB, application/octet-stream
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36
Steps to reproduce:
I work on the Mono runtime team. We have an interpreter (for MSIL/C#) written in C that is compiled with emscripten to wasm. The interpreter is basically a while (1) loop around a huge switch. Performance seems subpar on SpiderMonkey (and JavaScriptCore) compared to V8. After further investigation, it would seem br_table doesn't scale up.
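For context, the dispatch loop described above has roughly the following shape (an illustrative sketch, not the actual Mono interpreter source); when compiled to wasm, the switch becomes a single br_table:

#include <stdint.h>

/* Minimal sketch of a while(1)+switch interpreter loop (illustrative only). */
int interp_run(const uint8_t *code, int *stack) {
    int sp = 0, ip = 0;
    while (1) {
        switch (code[ip++]) {
        case 0: /* OP_RET: return top of stack */
            return sp > 0 ? stack[sp - 1] : 0;
        case 1: /* OP_PUSH_IMM: push the next byte */
            stack[sp++] = code[ip++];
            break;
        case 2: /* OP_ADD: pop two values, push their sum */
            stack[sp - 2] += stack[sp - 1];
            sp--;
            break;
        /* ...hundreds more cases in the real interpreter... */
        default:
            return -1;
        }
    }
}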
Consider this C file https://gist.github.com/BrzVlad/1b5a6bdb20205db1c970edae709d337e. Compile it to wasm using an emsdk (emsdk/upstream/emscripten/emcc -O2 huge-interp.c). Run it once as is and once with SWITCH_CASE defined to nothing.
Actual results:
On V8 the performance of the two variants is more or less the same, but on SpiderMonkey the test case becomes 10x slower, which shouldn't happen. I suspect this greatly impacts our performance.
Expected results:
I hope the performance on this test case can be improved soon. I would like to be able to compare the performance of the wasm runtimes with our interpreter without this small issue distorting the results.
In addition to this, it would be great if you could give me some tips so I can inspect the code generated by the wasm runtimes for hot code paths. I'm particularly interested in a flag/env var to force the best tier (is it --ion-eager?). Also, how can I check the generated native code for a method, or for all of them (have it dumped to the console)?
Comment 1•5 years ago
Thanks for opening an issue. I wonder if this is the lack of jump threading that's biting us here...
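For readers unfamiliar with the term: jump threading, roughly speaking, forwards a branch whose target is itself a branch straight to the final destination; for a loop around a switch it effectively duplicates the dispatch jump into each case, so each opcode handler jumps directly to the next handler. The hand-written C equivalent is "threaded" dispatch via computed goto, sketched below purely as an illustration (labels-as-values is a GCC/Clang extension, is not what emscripten emits to wasm, and says nothing about SpiderMonkey's internals):

#include <stdint.h>

/* Hypothetical opcode numbering, mirroring the sketch in comment 0. */
enum { OP_RET = 0, OP_PUSH_IMM = 1, OP_ADD = 2 };

int threaded_run(const uint8_t *code, int *stack) {
    /* One label per opcode; the dispatch jump is replicated at the end of
       each handler instead of being shared by a single switch. */
    static void *const handlers[] = { &&op_ret, &&op_push_imm, &&op_add };
    int sp = 0, ip = 0;
#define DISPATCH() goto *handlers[code[ip++]]

    DISPATCH();
op_ret:
    return sp > 0 ? stack[sp - 1] : 0;
op_push_imm:
    stack[sp++] = code[ip++];
    DISPATCH();
op_add:
    stack[sp - 2] += stack[sp - 1];
    sp--;
    DISPATCH();
#undef DISPATCH
}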
With respect to flags you can use: in a browser build, you can use the about:config pref javascript.options.wasm_baselinejit and set it to false; this will ensure that everything goes through the IonMonkey backend, which is the most optimizing one. With the JS shell, you can try --wasm-compiler=ion.
To see the generated code, you'll need a JS shell; set the env variable IONFLAGS=codegen and grep for "wasm" in the (very large) output. Wasm functions are printed and can be looked up by their function index.
Reporter
Comment 2•5 years ago
Since I'm investigating low-level things, I prefer not to run from the browser. I just get the SpiderMonkey shell through jsvu: https://github.com/GoogleChromeLabs/jsvu.
I tried running IONFLAGS=codegen ./sm --wasm-compiler=ion a.out.js from Ubuntu bash, but I'm getting no output. Do I need a runtime with some special debugging capabilities? I know this was the case, for example, with V8, where I ended up building the runtime from scratch.
I submitted a similar issue to Apple yesterday, and, checking the native code, it turns out that they emit iterative comparisons on the integer passed to the switch, rather than a single indirect jump through a table. Given that the perf regressions on SpiderMonkey and JavaScriptCore are eerily similar, I suspect the same issue is happening here.
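To illustrate the difference described above, here is a hypothetical C sketch contrasting table-driven dispatch (the shape a br_table is expected to lower to) with a compare chain (the shape reported for the JSC-generated code); the names and handler type are illustrative only:

#include <stddef.h>

#define NCASES 1000

typedef int (*handler_fn)(int);

/* Table-driven dispatch: one bounds check plus one indirect jump,
   regardless of how many cases there are. */
static int dispatch_table(handler_fn table[NCASES], int op, int arg) {
    if ((unsigned)op >= NCASES)
        return -1;
    return table[op](arg);
}

/* Compare-chain dispatch: up to NCASES comparisons before the right
   handler runs, so the cost grows with the number of cases. */
static int dispatch_chain(handler_fn table[NCASES], int op, int arg) {
    for (int i = 0; i < NCASES; i++)   /* stands in for if (op == 0) ... else if (op == 1) ... */
        if (op == i)
            return table[i](arg);
    return -1;
}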
Comment 3•5 years ago
Would it be possible to attach a stand-alone version we can run in the JS shell?
Reporter
Comment 4•5 years ago
Attached are the binaries generated by emscripten for the test case from the gist. This is the variant where the switch has 1000 cases.
Comment 5•5 years ago
The Wasm Baseline compiler is much faster. For Ion, it looks like LICM is hoisting a ton of instructions before the loop (for loading the strings in unreachable_method, is my guess). Try with --ion-licm=off; then Ion is faster than the Wasm Baseline compiler for me.
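For readers unfamiliar with the term, LICM (loop-invariant code motion) hoists computations that do not change across loop iterations out of the loop. A minimal generic sketch of the idea, unrelated to Ion's actual implementation:

/* Before LICM: x * y + 1 is recomputed every iteration even though it
   never changes inside the loop. */
int sum_before(int *a, int n, int x, int y) {
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += a[i] * (x * y + 1);
    return sum;
}

/* After LICM: the invariant value is computed once, ahead of the loop.
   The pathological case described above is this transformation applied to
   a huge number of values (the string loads), bloating the code that runs
   before the loop even starts. */
int sum_after(int *a, int n, int x, int y) {
    int sum = 0;
    int k = x * y + 1;   /* hoisted out of the loop */
    for (int i = 0; i < n; i++)
        sum += a[i] * k;
    return sum;
}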
Reporter
Comment 6•5 years ago
Thank you. I can confirm that once I pass that flag the performance goes up to a normal level. I also tried that flag when running our whole runtime and it has no impact on performance, which means we are not affected by this. Maybe this perf problem can still be fixed, if you consider the microbenchmark relevant, but as far as I'm concerned this bug can even be closed.
I would still ask for some help with getting native codegen output from SpiderMonkey, since IONFLAGS=codegen didn't work with my JS shell. It would help me investigate reasons for slowness in general, properly diagnose problems, and submit more specific reports than this one.
Comment 7•5 years ago
You need an --enable-debug build. The build instructions for the shell are here: https://firefox-source-docs.mozilla.org/js/build.html
Comment 8•5 years ago
In an --enable-debug shell build you can also use the built-in function wasmDis to disassemble a function; this is sometimes easier than slogging through the output from IONFLAGS=codegen. The argument to wasmDis is a JS function that was exported from a wasm module:
js> var ins = new WebAssembly.Instance(new WebAssembly.Module(wasmTextToBinary(`
(module (func (export "f") (param v128) (result v128) (v128.and (local.get 0) (local.get 0))))
`)))
js> wasmDis(ins.exports.f)
00000000 41 81 fa 33 08 00 00 cmp $0x833, %r10d
00000007 0f 84 03 00 00 00 jz 0x0000000000000010
0000000D 0f 0b ud2
0000000F 90 nop
00000010 41 56 push %r14
00000012 55 push %rbp
00000013 48 8b ec mov %rsp, %rbp
00000016 66 0f db c0 pand %xmm0, %xmm0
0000001A 5d pop %rbp
0000001B 41 5e pop %r14
0000001D c3 ret
Comment 9•5 years ago
S1 or S2 bugs need an assignee - could you find someone for this bug?
Comment 10•5 years ago
I'm actually going to WONTFIX this as the poster says that this is a microbenchmark artifact that does not affect an actual application.