64-bit imul by small constant yields unnecessary constant setup
Categories
(Core :: JavaScript Engine: JIT, enhancement, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox91 | --- | fixed |
People
(Reporter: lth, Assigned: lukas.bernhard, Mentored)
References
(Blocks 1 open bug)
Details
(Keywords: good-first-bug)
Attachments
(1 file)
A wasm i64 multiply by 5 yields this code:
0000002A 41 bb 05 00 00 00 mov $0x05, %r11d
00000030 49 0f af c3 imul %r11, %rax
which is because of this code in MacroAssembler-x64-inl.h:
void MacroAssembler::mul64(Imm64 imm, const Register64& dest) {
movq(ImmWord(uintptr_t(imm.value)), ScratchReg);
imulq(ScratchReg, dest.reg);
}
which seems to be an artifact of there not being an imulq variant in the assembler that takes an immediate operand, though there is a variant that takes an imm32 in the instruction set.
Reporter | ||
Updated•3 years ago
|
Comment 1•3 years ago
|
||
If we do not need overflow checks, we should implement it as:
lea [%rax + %rax * 4], %r11
Reporter | ||
Comment 2•3 years ago
|
||
Indeed, better strength reduction would be good too! Strength reduction is pretty ad-hoc at this point: -1, 0, 1, 2, and other powers of 2, but no work on (2^n)+1 (becomes LEA for small n), reduction to a sequence of adds, etc.
Reporter | ||
Comment 4•3 years ago
|
||
(In reply to Mang Yau from comment #3)
I would like to work on this.
Sure thing. I think you should disregard the discussion in comment 1 and later for now, and only try to fix the problem outlined in comment 0.
This bug may be a little challenging, not because it's hard but because there are a number of concepts and a fair amount of code that you may need to understand to fix it.
Assignee | ||
Comment 5•3 years ago
|
||
Mang Yau are you still on this issue or can I give it a try?
(In reply to lukas.bernhard from comment #5)
Mang Yau are you still on this issue or can I give it a try?
Hey Lukas, I have been a bit busy and haven't had much time to dedicate to this issue. You can give it a try :)
Assignee | ||
Comment 7•3 years ago
|
||
Updated•3 years ago
|
Assignee | ||
Comment 8•3 years ago
|
||
While working on the bug and reading surrounding multiplication code I noticed pop2xI64ForMulI64
(on x64) makes (seemingly) unnecessary assumptions about rdx
being clobbered; this shouldn't be the case for a reg*reg multiplication so a pop2xI64(r0, r1);
should be sufficient.
Furthermore: while the constant setup has been removed as described in the ticket the generated code remains suboptimal. In particular, multiplication now emits code such as:
movq rcx, rax
imulq imm8/32, rax, rax
Instead, a imulq imm8/32, rcx, rax
would be sufficient, saving one reg->reg move (imul with immediate can select source register and dest register independently).
inline void mul64(Imm64 src1, const Register64& src2, const Register64& dest)
could be made available on x64, with a special case if src2==dest + src1 cannot be encoded as immediate.
Generating this code seems to require changes to CodeGenerator::visitMulI64
in CodeGenerator-x86-shared.cpp (due to some code being shared with x86-32); if this change is desirable I can give it a try (or work on something else if optimizing a mov reg->reg is deemed too effortful).
Reporter | ||
Comment 9•3 years ago
|
||
Patches are in general welcome (do file new bugs for new issues). pop2xI64ForMulI64 is in the baseline compiler so we may not care all that much about micro-optimizations like these, but anything that affects code generation in ion (the optimizing compiler) is interesting in principle.
Comment 10•3 years ago
|
||
Pushed by archaeopteryx@coole-files.de: https://hg.mozilla.org/integration/autoland/rev/574004fd100a Remove unnecessary constant setup for 64-bit imul. r=lth
Comment 11•3 years ago
|
||
bugherder |
Description
•