Closed Bug 1699192 Opened 3 years ago Closed 2 years ago

[exploration] Experiment with AVX encoding and (maybe) assumed-aligned loads in simd wormhole


(Core :: JavaScript: WebAssembly, task, P3)






(Reporter: lth, Unassigned)


(Blocks 2 open bugs)


Since it's the workhorse of inner loops in machine learning code, the WHPMADDUBSW operation could usefully use an AVX encoding: the three-operand form avoids clobbering a register whose value is still needed, and so avoids the additional move to preserve that value. This will be a little tricky, because we do not want to enable AVX for any other instructions at all, yet the encoding is chosen fairly deep down in the pipeline. Probably this means changing the AVX test in the encoder from if (AVXPresent(...)) { ... } else { ... } to if (AVXPresent(...) || (op == WHPMADDUBSW && AVXReallyPresent(...))) { ... } else { ... }, since the AVXPresent predicate is subject to various switches that are off (and shall remain off).
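A minimal sketch of the predicate change described above, not the actual encoder code. The names SimdOp, CpuState and UseAVXEncoding are hypothetical, and modelling AVXPresent as "hardware support and the switch is on" is an assumption made for illustration; only AVXPresent, AVXReallyPresent and WHPMADDUBSW come from the comment itself.

```cpp
// Hypothetical sketch: none of these types are the real SpiderMonkey API.
enum class SimdOp { WHPMADDUBSW, Other };

struct CpuState {
    bool hardwareHasAVX;    // what the CPU actually supports
    bool avxSwitchEnabled;  // the configuration switch (off, and staying off)
};

// Assumed model of AVXPresent: hardware support gated by the switch.
constexpr bool AVXPresent(const CpuState& cpu) {
    return cpu.hardwareHasAVX && cpu.avxSwitchEnabled;
}

// Assumed model of AVXReallyPresent: hardware support only, ignoring the switch.
constexpr bool AVXReallyPresent(const CpuState& cpu) {
    return cpu.hardwareHasAVX;
}

// The proposed test: use the AVX encoding if AVX is generally enabled, or if
// the op is WHPMADDUBSW and the hardware supports AVX regardless of the switch.
constexpr bool UseAVXEncoding(const CpuState& cpu, SimdOp op) {
    return AVXPresent(cpu) ||
           (op == SimdOp::WHPMADDUBSW && AVXReallyPresent(cpu));
}
```

With the switch off, only WHPMADDUBSW would get the AVX encoding on AVX-capable hardware; every other op would keep the existing encoding.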

Another issue here is that we're not able to fuse a v128.load into a WHPMADDUBSW. I'm not sure how valuable this is: if the code preloads a bunch of registers and then operates on them, there's no sense in trying to fuse anything, but if it consists of load-and-operate pairs, the matter is different. The problem is that fusing only works if the load is aligned, and we have no guarantee of that. We could do an exception handler fixup of unaligned loads, but this is basically going to be a mess. For starters, we could look at the code to see whether it matches the pattern; if it does, we could experimentally try fusing and measure the result to see if there's an improvement.
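The "does the code match the pattern" check could be sketched roughly as follows. This is an illustration under stated assumptions, not existing compiler code: LoadInfo, its fields, and both helper functions are hypothetical, and the sketch assumes the compiler can sometimes know an alignment for the load's base address, which the comment above says is not guaranteed.

```cpp
#include <cstdint>

// Hypothetical description of a v128.load that feeds a WHPMADDUBSW.
struct LoadInfo {
    uint32_t knownBaseAlignment;  // bytes, power of two; 1 means nothing is known
    uint32_t constantOffset;      // constant offset added to the base address
    bool     onlyUsedByMultiply;  // true for a load-and-operate pair
};

// The effective address is provably 16-byte aligned only if both the base
// and the constant offset are multiples of 16.
constexpr bool ProvablyAligned16(const LoadInfo& load) {
    return load.knownBaseAlignment % 16 == 0 && load.constantOffset % 16 == 0;
}

// Fusing is only considered for load-and-operate pairs with a provably
// aligned address; preloaded registers are left alone.
constexpr bool CanFuseLoad(const LoadInfo& load) {
    return load.onlyUsedByMultiply && ProvablyAligned16(load);
}
```

A load with unknown alignment (knownBaseAlignment of 1) is never fused here, which sidesteps the exception handler fixup at the cost of missing fusing opportunities.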

Related discussion here:

See Also: → simd-avx2

We may solve this differently and it's not a priority right now to investigate this.

Assignee: lhansen → nobody
Blocks: 1713056
Type: enhancement → task
Summary: Experiment with AVX encoding and (maybe) assumed-aligned loads in simd wormhole → [exploration] Experiment with AVX encoding and (maybe) assumed-aligned loads in simd wormhole

This optimization is too narrow. Also, looking at the intgemm multiply code, pmaddubsw rarely takes direct memory operands.

We're intending to phase out the wormhole.

Closed: 2 years ago
Resolution: --- → WONTFIX