Optimize emscripten exploded constants as proper constants
Categories
(Core :: JavaScript: WebAssembly, enhancement, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox79 | --- | fixed |
People
(Reporter: lth, Assigned: lth)
References
(Blocks 1 open bug)
Details
Attachments
(1 file)
For reasons maybe having to do with v8 not handling v128.const properly (at least according to the SIMD implementation support table), emscripten appears to generate this mess for wasm_i32x4_const(1,2,3,4)
:
i32.const 1
i32x4.splat
i32.const 2
i32x4.replace_lane 1
i32.const 3
i32x4.replace_lane 2
i32.const 4
i32x4.replace_lane 3
This will be no fun at all if it shows up in loops. It also generates this code for wasm_i32x4_make(1,2,3,4)
. Some kind of pattern matching seems called for here, at least in the short term.
(That's the output from -S; the output from the compiler with -o test.wasm is illegal binary, so it's hard to know right now.)
Presumably, for something like wasm_i32x4_make(1,2,x,4)
it'll be the same again, but we can do better than the above code by loading a constant and then replacing the one lane. Not sure how much work that is to implement though.
Comment 1•4 years ago
|
||
The v128.const instruction is behind the -munimplemented-simd128 flag because it has not been implemented in V8 yet. I believe that's the only instruction left behind that flag, so if you support v128.const, you should be able to use that flag for testing. There's a fair amount of logic at https://github.com/llvm/llvm-project/blob/e78431354bcb6bec5be9adf4ea37d860445f8c16/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp#L1416-L1575 that goes into figuring out the best way to construct vectors using as few instructions as possible, so hopefully pattern matching won't be necessary.
Assignee | ||
Comment 2•4 years ago
•
|
||
Thanks! There's an interesting problem here. Consider a simple test case t.cpp:
v128_t xx;
extern v128_t test(v128_t a, v128_t b);
int main() {
xx = test(wasm_f32x4_const(1, 2, 3, 4), wasm_f32x4_const(10, 20, 30, 40));
}
Compiling with emcc v1.39.17 (-O3 -msimd128 -munimplemented-simd128 -c -o t.wasm), we get this type of code as desired:
00008f: fd 0c 00 00 80 3f 00 00 00 | v128.const 0x3f800000 0x40000000 0x40400000 0x40800000
000098: 40 00 00 40 40 00 00 80 40 |
0000a1: fd 0c 00 00 20 41 00 00 a0 | v128.const 0x41200000 0x41a00000 0x41f00000 0x42200000
0000aa: 41 00 00 f0 41 00 00 20 42 |
0000b3: 10 80 80 80 80 00 | call 0 <env._Z4testDv4_iS_>
But in a more complicated mandelbrot program I have, compiled with the same options, I'm seeing wasm constants both floating and int being compiled as splats:
000163: 41 7f | i32.const 4294967295
000165: fd 11 | i32x4.splat
and
0001e9: 43 00 00 00 00 | f32.const 0x0p+0
0001ee: fd 13 | f32x4.splat
Clearly it's not much work to generate good machine code for these patterns, but it's curious that they're being generated at all. Is there an assumption in emcc maybe about the more compact encoding being better, and being possible here (but not for the [1,2,3,4] case)?
Assignee | ||
Comment 3•4 years ago
|
||
Emscripten will generate eg f32.const + f32x4.splat instead of
f32x4.const when the same value is used for all lanes, presumably
because this is more compact bytecode. Fold the MIR node to recover
the constant, for all types.
Updated•4 years ago
|
Comment 4•4 years ago
|
||
Is there an assumption in emcc maybe about the more compact encoding being better, and being possible here (but not for the [1,2,3,4] case)?
Exactly. The LLVM code naively chooses between constructing the vector using a v128.const, a splat, or a swizzle based on which one will require the fewest follow up replace_lane instructions. It would be easy to assign weights to those choices in LLVM to make it prefer to use a const more and a splat less.
Assignee | ||
Updated•4 years ago
|
Pushed by lhansen@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/a5ba8c617b11 Fold scalar.const+NxM.splat as v128.const. r=bbouvier
Comment 6•4 years ago
|
||
Backed out for build bustages.
Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=305583819&repo=autoland&lineNumber=12842
Backout: https://hg.mozilla.org/integration/autoland/rev/5096e4f816e8616912859aad0fe61c9782c569c2
Assignee | ||
Updated•4 years ago
|
Updated•4 years ago
|
Pushed by lhansen@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/19e5e3e94bf4 Fold scalar.const+NxM.splat as v128.const. r=bbouvier
Assignee | ||
Comment 8•4 years ago
|
||
(In reply to Thomas Lively from comment #4)
Is there an assumption in emcc maybe about the more compact encoding being better, and being possible here (but not for the [1,2,3,4] case)?
Exactly. The LLVM code naively chooses between constructing the vector using a v128.const, a splat, or a swizzle based on which one will require the fewest follow up replace_lane instructions. It would be easy to assign weights to those choices in LLVM to make it prefer to use a const more and a splat less.
I see. And the shuffle results from builds of lane values such as
v128_t xx;
extern v128_t test(v128_t* a);
int main() {
v128_t y = test(nullptr);
xx = wasm_i32x4_make(wasm_i32x4_extract_lane(y, 1),
wasm_i32x4_extract_lane(y, 0),
wasm_i32x4_extract_lane(y, 2),
wasm_i32x4_extract_lane(y, 1));
test(&xx);
}
which turns into eg
000091: 10 80 80 80 80 00 | call 0 <env._Z4testPDv4_i>
000097: 20 00 | local.get 0
000099: fd 0d 04 05 06 07 00 01 02 | v8x16.shuffle 0x07060504 0x03020100 0x0b0a0908 0x07060504
0000a2: 03 08 09 0a 0b 04 05 06 07 |
(which we recognize as a PSHUFD, and wherein local 0 is just synthetic, y's home location being in memory). Probably done here then.
Comment 9•4 years ago
|
||
bugherder |
Description
•