Closed Bug 1643681 Opened 4 years ago Closed 4 years ago

Optimize emscripten exploded constants as proper constants

Categories

(Core :: JavaScript: WebAssembly, enhancement, P3)

enhancement

Tracking

()

RESOLVED FIXED
mozilla79
Tracking Status
firefox79 --- fixed

People

(Reporter: lth, Assigned: lth)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

For reasons maybe having to do with v8 not handling v128.const properly (at least according to the SIMD implementation support table), emscripten appears to generate this mess for wasm_i32x4_const(1,2,3,4):

        i32.const       1
        i32x4.splat
        i32.const       2
        i32x4.replace_lane      1
        i32.const       3
        i32x4.replace_lane      2
        i32.const       4
        i32x4.replace_lane      3

This will be no fun at all if it shows up in loops. It also generates this code for wasm_i32x4_make(1,2,3,4). Some kind of pattern matching seems called for here, at least in the short term.

(That's the output from -S; the output from the compiler with -o test.wasm is illegal binary, so it's hard to know right now.)

Presumably, for something like wasm_i32x4_make(1,2,x,4) it'll be the same again, but we can do better than the above code by loading a constant and then replacing the one lane. Not sure how much work that is to implement though.

The v128.const instruction is behind the -munimplemented-simd128 flag because it has not been implemented in V8 yet. I believe that's the only instruction left behind that flag, so if you support v128.const, you should be able to use that flag for testing. There's a fair amount of logic at https://github.com/llvm/llvm-project/blob/e78431354bcb6bec5be9adf4ea37d860445f8c16/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp#L1416-L1575 that goes into figuring out the best way to construct vectors using as few instructions as possible, so hopefully pattern matching won't be necessary.

Thanks! There's an interesting problem here. Consider a simple test case t.cpp:

v128_t xx;
extern v128_t test(v128_t a, v128_t b);
int main() {
    xx = test(wasm_f32x4_const(1, 2, 3, 4), wasm_f32x4_const(10, 20, 30, 40));
}

Compiling with emcc v1.39.17 (-O3 -msimd128 -munimplemented-simd128 -c -o t.wasm), we get this type of code as desired:

 00008f: fd 0c 00 00 80 3f 00 00 00 | v128.const 0x3f800000 0x40000000 0x40400000 0x40800000
 000098: 40 00 00 40 40 00 00 80 40 | 
 0000a1: fd 0c 00 00 20 41 00 00 a0 | v128.const 0x41200000 0x41a00000 0x41f00000 0x42200000
 0000aa: 41 00 00 f0 41 00 00 20 42 | 
 0000b3: 10 80 80 80 80 00          | call 0 <env._Z4testDv4_iS_>

But in a more complicated mandelbrot program I have, compiled with the same options, I'm seeing wasm constants both floating and int being compiled as splats:

 000163: 41 7f                      |     i32.const 4294967295
 000165: fd 11                      |     i32x4.splat

and

 0001e9: 43 00 00 00 00             |       f32.const 0x0p+0
 0001ee: fd 13                      |       f32x4.splat

Clearly it's not much work to generate good machine code for these patterns, but it's curious that they're being generated at all. Is there an assumption in emcc maybe about the more compact encoding being better, and being possible here (but not for the [1,2,3,4] case)?

Emscripten will generate eg f32.const + f32x4.splat instead of
f32x4.const when the same value is used for all lanes, presumably
because this is more compact bytecode. Fold the MIR node to recover
the constant, for all types.

Assignee: nobody → lhansen
Status: NEW → ASSIGNED

Is there an assumption in emcc maybe about the more compact encoding being better, and being possible here (but not for the [1,2,3,4] case)?

Exactly. The LLVM code naively chooses between constructing the vector using a v128.const, a splat, or a swizzle based on which one will require the fewest follow up replace_lane instructions. It would be easy to assign weights to those choices in LLVM to make it prefer to use a const more and a splat less.

Keywords: leave-open
Pushed by lhansen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a5ba8c617b11
Fold scalar.const+NxM.splat as v128.const. r=bbouvier
Flags: needinfo?(lhansen)
Attachment #9154878 - Attachment description: Bug 1643681 - Fold scalar.const+NxM.splat as v128.const. r?bbouvier → Bug 1643681 - Fold scalar.const+NxM.splat as v128.const. r=bbouvier
Pushed by lhansen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/19e5e3e94bf4
Fold scalar.const+NxM.splat as v128.const. r=bbouvier

(In reply to Thomas Lively from comment #4)

Is there an assumption in emcc maybe about the more compact encoding being better, and being possible here (but not for the [1,2,3,4] case)?

Exactly. The LLVM code naively chooses between constructing the vector using a v128.const, a splat, or a swizzle based on which one will require the fewest follow up replace_lane instructions. It would be easy to assign weights to those choices in LLVM to make it prefer to use a const more and a splat less.

I see. And the shuffle results from builds of lane values such as

v128_t xx;
extern v128_t test(v128_t* a);
int main() {
    v128_t y = test(nullptr);
    xx = wasm_i32x4_make(wasm_i32x4_extract_lane(y, 1),
                         wasm_i32x4_extract_lane(y, 0),
                         wasm_i32x4_extract_lane(y, 2),
                         wasm_i32x4_extract_lane(y, 1));
    test(&xx);
}

which turns into eg

 000091: 10 80 80 80 80 00          | call 0 <env._Z4testPDv4_i>
 000097: 20 00                      | local.get 0
 000099: fd 0d 04 05 06 07 00 01 02 | v8x16.shuffle 0x07060504 0x03020100 0x0b0a0908 0x07060504
 0000a2: 03 08 09 0a 0b 04 05 06 07 | 

(which we recognize as a PSHUFD, and wherein local 0 is just synthetic, y's home location being in memory). Probably done here then.

Keywords: leave-open
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla79
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: