br_table is slow or it doesn't scale
Categories
(Core :: JavaScript: WebAssembly, defect, P2)
People
(Reporter: brezaevlad, Unassigned)
Attachments
(1 file)
55.37 KB, application/octet-stream
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36
Steps to reproduce:
I work on the Mono runtime team. We have an interpreter (for MSIL/C#) written in C that is compiled with emscripten to wasm. The interpreter is basically a while (1) loop around a huge switch. Performance seems subpar on SpiderMonkey (and JavaScriptCore) compared to V8. After further investigation, it would seem br_table doesn't scale up.
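For context, the dispatch loop described above has roughly the following shape (an illustrative sketch, not the actual Mono interpreter source); when compiled to wasm, the switch becomes a single br_table:

#include <stdint.h>

/* Minimal sketch of a while(1)+switch interpreter loop (illustrative only). */
int interp_run(const uint8_t *code, int *stack) {
    int sp = 0, ip = 0;
    while (1) {
        switch (code[ip++]) {
        case 0: /* OP_RET: return top of stack */
            return sp > 0 ? stack[sp - 1] : 0;
        case 1: /* OP_PUSH_IMM: push the next byte */
            stack[sp++] = code[ip++];
            break;
        case 2: /* OP_ADD: pop two values, push their sum */
            stack[sp - 2] += stack[sp - 1];
            sp--;
            break;
        /* ...hundreds more cases in the real interpreter... */
        default:
            return -1;
        }
    }
}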
Consider this C file https://gist.github.com/BrzVlad/1b5a6bdb20205db1c970edae709d337e. Compile it to wasm using an emsdk (emsdk/upstream/emscripten/emcc -O2 huge-interp.c). Run it once as is and once with SWITCH_CASE defined to nothing.
Actual results:
On V8 the performance of the two variants is more or less the same, but on SpiderMonkey the test case becomes 10x slower, which shouldn't happen. I suspect this greatly impacts our performance.
Expected results:
I hope the performance on this test case can be improved soon. I would like to be able to compare the performance of the wasm runtimes with our interpreter without this small issue distorting the results.
In addition to this, it would be great if you could give me some tips so I can inspect the code generated by the wasm runtimes for hot code paths. I'm particularly interested in a flag/env var to force the best tier (is it --ion-eager?). Also, how can I check the generated native code for a method, or for all of them (have it dumped to the console)?
Comment 1•5 years ago
Thanks for opening an issue. I wonder if this is the lack of jump threading that's biting us here...
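For readers unfamiliar with the term: jump threading, roughly speaking, forwards a branch whose target is itself a branch straight to the final destination; for a loop around a switch it effectively duplicates the dispatch jump into each case, so each opcode handler jumps directly to the next handler. The hand-written C equivalent is "threaded" dispatch via computed goto, sketched below purely as an illustration (labels-as-values is a GCC/Clang extension, is not what emscripten emits to wasm, and says nothing about SpiderMonkey's internals):

#include <stdint.h>

/* Hypothetical opcode numbering, mirroring the sketch in comment 0. */
enum { OP_RET = 0, OP_PUSH_IMM = 1, OP_ADD = 2 };

int threaded_run(const uint8_t *code, int *stack) {
    /* One label per opcode; the dispatch jump is replicated at the end of
       each handler instead of being shared by a single switch. */
    static void *const handlers[] = { &&op_ret, &&op_push_imm, &&op_add };
    int sp = 0, ip = 0;
#define DISPATCH() goto *handlers[code[ip++]]

    DISPATCH();
op_ret:
    return sp > 0 ? stack[sp - 1] : 0;
op_push_imm:
    stack[sp++] = code[ip++];
    DISPATCH();
op_add:
    stack[sp - 2] += stack[sp - 1];
    sp--;
    DISPATCH();
#undef DISPATCH
}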
With respect to flags you can use: in a browser build, you can use the about:config pref javascript.options.wasm_baselinejit and set it to false; this will ensure that everything goes through the IonMonkey backend, which is the most optimizing one. With the JS shell, you can try --wasm-compiler=ion.
To see the generated code, you'll need a JS shell; set the env variable IONFLAGS=codegen and grep for "wasm" in the (very large) output. Wasm functions are printed and can be looked up by their function index.
Reporter
Comment 2•5 years ago
Since I'm investigating low-level things, I prefer not to run from the browser. I just get the SpiderMonkey shell through jsvu: https://github.com/GoogleChromeLabs/jsvu.
I tried running IONFLAGS=codegen ./sm --wasm-compiler=ion a.out.js from Ubuntu bash, but I'm getting no output. Do I need a runtime with some special debugging capabilities? I know this was the case, for example, with V8, where I ended up building the runtime from scratch.
I submitted a similar issue to Apple yesterday, and, checking the native code, it turns out that they emit iterative comparisons on the integer passed to the switch, rather than a single indirect jump through a table. Given that the perf regressions on SpiderMonkey and JavaScriptCore are eerily similar, I suspect the same issue is happening here.
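To illustrate the difference described above, here is a hypothetical C sketch contrasting table-driven dispatch (the shape a br_table is expected to lower to) with a compare chain (the shape reported for the JSC-generated code); the names and handler type are illustrative only:

#include <stddef.h>

#define NCASES 1000

typedef int (*handler_fn)(int);

/* Table-driven dispatch: one bounds check plus one indirect jump,
   regardless of how many cases there are. */
static int dispatch_table(handler_fn table[NCASES], int op, int arg) {
    if ((unsigned)op >= NCASES)
        return -1;
    return table[op](arg);
}

/* Compare-chain dispatch: up to NCASES comparisons before the right
   handler runs, so the cost grows with the number of cases. */
static int dispatch_chain(handler_fn table[NCASES], int op, int arg) {
    for (int i = 0; i < NCASES; i++)   /* stands in for if (op == 0) ... else if (op == 1) ... */
        if (op == i)
            return table[i](arg);
    return -1;
}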
Comment 3•5 years ago
Would it be possible to attach a stand-alone version we can run in the JS shell?
Reporter
Comment 4•5 years ago
Attached are the binaries generated by emscripten for the test case from the gist. This is the variant where the switch has 1000 cases.
Comment 5•5 years ago
The Wasm Baseline compiler is much faster. For Ion, it looks like LICM is hoisting a ton of instructions before the loop (for loading the strings in unreachable_method, is my guess). Try with --ion-licm=off; then Ion is faster than the Wasm Baseline compiler for me.
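For readers unfamiliar with the term, LICM (loop-invariant code motion) hoists computations that do not change across loop iterations out of the loop. A minimal generic sketch of the idea, unrelated to Ion's actual implementation:

/* Before LICM: x * y + 1 is recomputed every iteration even though it
   never changes inside the loop. */
int sum_before(int *a, int n, int x, int y) {
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += a[i] * (x * y + 1);
    return sum;
}

/* After LICM: the invariant value is computed once, ahead of the loop.
   The pathological case described above is this transformation applied to
   a huge number of values (the string loads), bloating the code that runs
   before the loop even starts. */
int sum_after(int *a, int n, int x, int y) {
    int sum = 0;
    int k = x * y + 1;   /* hoisted out of the loop */
    for (int i = 0; i < n; i++)
        sum += a[i] * k;
    return sum;
}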
Reporter
Comment 6•5 years ago
Thank you. I can confirm that once I pass that flag the performance goes up to a normal level. I also tried that flag when running our whole runtime and it has no impact on performance, which means we are not affected by this. Maybe this perf problem can still be fixed, if you consider the microbenchmark relevant, but as far as I'm concerned this bug can even be closed.
I would still ask for some help with getting native codegen output from SpiderMonkey, since IONFLAGS=codegen didn't work with my JS shell. It would help me investigate reasons for slowness in general, properly diagnose problems, and submit more specific reports than this one.
Comment 7•5 years ago
You need an --enable-debug build. The build instructions for the shell are here: https://firefox-source-docs.mozilla.org/js/build.html
Comment 8•5 years ago
In an --enable-debug shell build you can also use the built-in function wasmDis to disassemble a function; this is sometimes easier than slogging through the output from IONFLAGS=codegen. The argument to wasmDis is a JS function that was exported from a wasm module:
js> var ins = new WebAssembly.Instance(new WebAssembly.Module(wasmTextToBinary(`
(module (func (export "f") (param v128) (result v128) (v128.and (local.get 0) (local.get 0))))
`)))
js> wasmDis(ins.exports.f)
00000000 41 81 fa 33 08 00 00 cmp $0x833, %r10d
00000007 0f 84 03 00 00 00 jz 0x0000000000000010
0000000D 0f 0b ud2
0000000F 90 nop
00000010 41 56 push %r14
00000012 55 push %rbp
00000013 48 8b ec mov %rsp, %rbp
00000016 66 0f db c0 pand %xmm0, %xmm0
0000001A 5d pop %rbp
0000001B 41 5e pop %r14
0000001D c3 ret
Comment 9•5 years ago
S1 or S2 bugs need an assignee - could you find someone for this bug?
Comment 10•5 years ago
I'm actually going to WONTFIX this as the poster says that this is a microbenchmark artifact that does not affect an actual application.