Closed Bug 1641599 Opened 2 years ago Closed 2 years ago

br_table is slow or it doesn't scale

Categories

(Core :: JavaScript: WebAssembly, defect, P2)

76 Branch
defect

Tracking


RESOLVED WONTFIX

People

(Reporter: brezaevlad, Unassigned)

Details

Attachments

(1 file)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36

Steps to reproduce:

I work on the Mono runtime team. We have an interpreter (for MSIL/C#) written in C that is compiled to wasm using Emscripten. The interpreter is basically a while (1) loop followed by a huge switch. Performance seems subpar on SpiderMonkey (and JavaScriptCore) compared to V8. After further investigation, it would seem that br_table doesn't scale up.

Consider this C file: https://gist.github.com/BrzVlad/1b5a6bdb20205db1c970edae709d337e. Compile it to wasm using emsdk (emsdk/upstream/emscripten/emcc -O2 huge-interp.c). Run it once as-is and once with SWITCH_CASE defined to nothing.

Actual results:

While on V8 the performance is more or less the same in both variants, on SpiderMonkey the test case becomes 10x slower, which shouldn't happen. I suspect this greatly impacts our performance.

Expected results:

I hope the performance on this test case can be improved soon, so that I can compare the performance of the wasm runtimes against our interpreter without this small issue distorting the results.

In addition, it would be great if you could give me some tips on comparing the code generated by the wasm runtimes for hot code paths. I'm particularly interested in a flag/env var to force the best tier (is it --ion-eager?), and in how to inspect the generated native code for a method, or for all of them (have it dumped to the console).

Thanks for opening an issue. I wonder if this is the lack of jump threading that's biting us here...

With respect to flags: in a browser build, you can use the about:config prefs to set javascript.options.wasm_baselinejit to false, and this will ensure that everything goes through the IonMonkey backend, which is the most optimizing one. With the JS shell, you can try --wasm-compiler=ion.

To see the generated code, you'll need a JS shell. Set the env variable IONFLAGS=codegen and grep for "wasm" in the (very large) output; wasm functions are printed and can be looked up by their function index.

Since I'm investigating low-level things, I prefer not to run from the browser. I just get the SpiderMonkey shell through jsvu: https://github.com/GoogleChromeLabs/jsvu.

I tried running IONFLAGS=codegen ./sm --wasm-compiler=ion a.out.js from Ubuntu bash, but I'm getting no output. Do I need a build with some special debugging capabilities? I know this was the case with V8, for example, where I ended up building the runtime from scratch.

I submitted a similar issue to Apple yesterday, and, checking the native code, it turns out that JSC emits iterative comparisons on the integer passed to the switch, rather than a single indirect jump through a table. Given that the perf regressions on SpiderMonkey and JSC are eerily similar, I would suspect the same issue happens here.

Would it be possible to attach a stand-alone version we can run in the JS shell?

Attached are the binaries generated by Emscripten for the test case from the gist. This is the variant where the switch has 1000 cases.

The Wasm Baseline compiler is much faster here. For Ion, it looks like LICM is hoisting a ton of instructions before the loop (my guess is the loads of the strings in unreachable_method). Try with --ion-licm=off; then Ion is faster than the Wasm Baseline compiler for me.

Thank you. I can confirm that once I pass that flag the performance goes up to a normal level. I also tried the flag when running our whole runtime and it has no impact on performance, which means we are not affected by this. Maybe this perf problem can still be fixed, if you consider the microbenchmark relevant, but as far as I'm concerned this bug can even be closed.

I would still appreciate some help with getting native codegen output from SpiderMonkey, since IONFLAGS=codegen didn't work with my JS shell. It would help me investigate reasons for slowness in general, properly diagnose problems, and submit more specific reports than this one.

You need an --enable-debug build. The build instructions for the shell are here: https://firefox-source-docs.mozilla.org/js/build.html

Severity: -- → S2
Status: UNCONFIRMED → NEW
Ever confirmed: true
Priority: -- → P2
Type: enhancement → defect

In an --enable-debug shell build you can also use the built-in function wasmDis to disassemble a function; this is sometimes easier than slogging through the output from IONFLAGS=codegen. The argument to wasmDis is a JS function that was exported from a wasm module:

js> var ins = new WebAssembly.Instance(new WebAssembly.Module(wasmTextToBinary(`
  (module (func (export "f") (param v128) (result v128) (v128.and (local.get 0) (local.get 0))))
`)))
js> wasmDis(ins.exports.f)
00000000  41 81 fa 33 08 00 00      cmp $0x833, %r10d
00000007  0f 84 03 00 00 00         jz 0x0000000000000010
0000000D  0f 0b                     ud2
0000000F  90                        nop
00000010  41 56                     push %r14
00000012  55                        push %rbp
00000013  48 8b ec                  mov %rsp, %rbp
00000016  66 0f db c0               pand %xmm0, %xmm0
0000001A  5d                        pop %rbp
0000001B  41 5e                     pop %r14
0000001D  c3                        ret

S1 or S2 bugs need an assignee - could you find someone for this bug?

Flags: needinfo?(lhansen)

I'm actually going to WONTFIX this as the poster says that this is a microbenchmark artifact that does not affect an actual application.

Status: NEW → RESOLVED
Closed: 2 years ago
Flags: needinfo?(lhansen)
Resolution: --- → WONTFIX