Closed Bug 1643681 Opened 4 years ago Closed 4 years ago

Optimize emscripten exploded constants as proper constants

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla79

Tracking Flags:

Tracking

Status

firefox79

---

fixed

People

(Reporter: lth, Assigned: lth)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

Bug 1643681 - Fold scalar.const+NxM.splat as v128.const. r=bbouvier 4 years ago Lars T Hansen [:lth] 47 bytes, text/x-phabricator-request		Details \| Review

Lars T Hansen [:lth]

Assignee

Description

•

4 years ago

For reasons maybe having to do with v8 not handling v128.const properly (at least according to the SIMD implementation support table), emscripten appears to generate this mess for wasm_i32x4_const(1,2,3,4):

        i32.const       1
        i32x4.splat
        i32.const       2
        i32x4.replace_lane      1
        i32.const       3
        i32x4.replace_lane      2
        i32.const       4
        i32x4.replace_lane      3

This will be no fun at all if it shows up in loops. It also generates this code for wasm_i32x4_make(1,2,3,4). Some kind of pattern matching seems called for here, at least in the short term.

(That's the output from -S; the output from the compiler with -o test.wasm is illegal binary, so it's hard to know right now.)

Presumably, for something like wasm_i32x4_make(1,2,x,4) it'll be the same again, but we can do better than the above code by loading a constant and then replacing the one lane. Not sure how much work that is to implement though.

Thomas Lively

Comment 1

•

4 years ago

The v128.const instruction is behind the -munimplemented-simd128 flag because it has not been implemented in V8 yet. I believe that's the only instruction left behind that flag, so if you support v128.const, you should be able to use that flag for testing. There's a fair amount of logic at https://github.com/llvm/llvm-project/blob/e78431354bcb6bec5be9adf4ea37d860445f8c16/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp#L1416-L1575 that goes into figuring out the best way to construct vectors using as few instructions as possible, so hopefully pattern matching won't be necessary.

Lars T Hansen [:lth]

Assignee

Comment 2

•

4 years ago

•

Edited

Thanks! There's an interesting problem here. Consider a simple test case t.cpp:

v128_t xx;
extern v128_t test(v128_t a, v128_t b);
int main() {
    xx = test(wasm_f32x4_const(1, 2, 3, 4), wasm_f32x4_const(10, 20, 30, 40));
}

Compiling with emcc v1.39.17 (-O3 -msimd128 -munimplemented-simd128 -c -o t.wasm), we get this type of code as desired:

 00008f: fd 0c 00 00 80 3f 00 00 00 | v128.const 0x3f800000 0x40000000 0x40400000 0x40800000
 000098: 40 00 00 40 40 00 00 80 40 | 
 0000a1: fd 0c 00 00 20 41 00 00 a0 | v128.const 0x41200000 0x41a00000 0x41f00000 0x42200000
 0000aa: 41 00 00 f0 41 00 00 20 42 | 
 0000b3: 10 80 80 80 80 00          | call 0 <env._Z4testDv4_iS_>

But in a more complicated mandelbrot program I have, compiled with the same options, I'm seeing wasm constants both floating and int being compiled as splats:

 000163: 41 7f                      |     i32.const 4294967295
 000165: fd 11                      |     i32x4.splat

and

 0001e9: 43 00 00 00 00             |       f32.const 0x0p+0
 0001ee: fd 13                      |       f32x4.splat

Clearly it's not much work to generate good machine code for these patterns, but it's curious that they're being generated at all. Is there an assumption in emcc maybe about the more compact encoding being better, and being possible here (but not for the [1,2,3,4] case)?

Lars T Hansen [:lth]

Assignee

Comment 3

•

4 years ago

Attached file Bug 1643681 - Fold scalar.const+NxM.splat as v128.const. r=bbouvier — Details

Emscripten will generate eg f32.const + f32x4.splat instead of
f32x4.const when the same value is used for all lanes, presumably
because this is more compact bytecode. Fold the MIR node to recover
the constant, for all types.

Phabricator Automation

Updated

•

4 years ago

Assignee: nobody → lhansen

Status: NEW → ASSIGNED

Thomas Lively

Comment 4

•

4 years ago

Is there an assumption in emcc maybe about the more compact encoding being better, and being possible here (but not for the [1,2,3,4] case)?

Exactly. The LLVM code naively chooses between constructing the vector using a v128.const, a splat, or a swizzle based on which one will require the fewest follow up replace_lane instructions. It would be easy to assign weights to those choices in LLVM to make it prefer to use a const more and a splat less.

Lars T Hansen [:lth]

Assignee

Updated

•

4 years ago

Keywords: leave-open

Pulsebot

Comment 5

•

4 years ago

Pushed by lhansen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a5ba8c617b11
Fold scalar.const+NxM.splat as v128.const. r=bbouvier

Razvan Maries

Comment 6

•

4 years ago

Backed out for build bustages.

Push with failure: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception&revision=a5ba8c617b118404852cc3ed478c6e2fd07bbec3&selectedTaskRun=VK32a5A3RiOsZFNJYAzccQ-0

Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=305583819&repo=autoland&lineNumber=12842

Backout: https://hg.mozilla.org/integration/autoland/rev/5096e4f816e8616912859aad0fe61c9782c569c2

Flags: needinfo?(lhansen)

Lars T Hansen [:lth]

Assignee

Updated

•

4 years ago

Flags: needinfo?(lhansen)

Phabricator Automation

Updated

•

4 years ago

Attachment #9154878 - Attachment description: Bug 1643681 - Fold scalar.const+NxM.splat as v128.const. r?bbouvier → Bug 1643681 - Fold scalar.const+NxM.splat as v128.const. r=bbouvier

Pulsebot

Comment 7

•

4 years ago

Pushed by lhansen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/19e5e3e94bf4
Fold scalar.const+NxM.splat as v128.const. r=bbouvier

Lars T Hansen [:lth]

Assignee

Comment 8

•

4 years ago

(In reply to Thomas Lively from comment #4)

Is there an assumption in emcc maybe about the more compact encoding being better, and being possible here (but not for the [1,2,3,4] case)?

Exactly. The LLVM code naively chooses between constructing the vector using a v128.const, a splat, or a swizzle based on which one will require the fewest follow up replace_lane instructions. It would be easy to assign weights to those choices in LLVM to make it prefer to use a const more and a splat less.

I see. And the shuffle results from builds of lane values such as

v128_t xx;
extern v128_t test(v128_t* a);
int main() {
    v128_t y = test(nullptr);
    xx = wasm_i32x4_make(wasm_i32x4_extract_lane(y, 1),
                         wasm_i32x4_extract_lane(y, 0),
                         wasm_i32x4_extract_lane(y, 2),
                         wasm_i32x4_extract_lane(y, 1));
    test(&xx);
}

which turns into eg

 000091: 10 80 80 80 80 00          | call 0 <env._Z4testPDv4_i>
 000097: 20 00                      | local.get 0
 000099: fd 0d 04 05 06 07 00 01 02 | v8x16.shuffle 0x07060504 0x03020100 0x0b0a0908 0x07060504
 0000a2: 03 08 09 0a 0b 04 05 06 07 |

(which we recognize as a PSHUFD, and wherein local 0 is just synthetic, y's home location being in memory). Probably done here then.

Keywords: leave-open

Natalia Csoregi [:nataliaCs]

Comment 9

•

4 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/19e5e3e94bf4

Status: ASSIGNED → RESOLVED

Closed: 4 years ago

status-firefox79: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla79

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Quick Search

Optimize emscripten exploded constants as proper constants

Categories

(Core :: JavaScript: WebAssembly, enhancement, P3)

Tracking

()

People

(Reporter: lth, Assigned: lth)

References

(Blocks 1 open bug)

Details

Crash Data

Security

(public)

User Story

Attachments

(1 file)

Description

Comment 1

Comment 2

Comment 3

Updated

Comment 4

Updated

Comment 5

Comment 6

Updated

Updated

Comment 7

Comment 8

Comment 9

Attachment

General

Description

File Name

Content Type