Closed Bug 1712321 Opened 3 years ago Closed 3 years ago

64-bit imul by small constant yields unnecessary constant setup

Categories

(Core :: JavaScript Engine: JIT, enhancement, P3)

x86_64
All
enhancement

Tracking

()

RESOLVED FIXED
91 Branch
Tracking Status
firefox91 --- fixed

People

(Reporter: lth, Assigned: lukas.bernhard, Mentored)

References

(Blocks 1 open bug)

Details

(Keywords: good-first-bug)

Attachments

(1 file)

A wasm i64 multiply by 5 yields this code:

0000002A  41 bb 05 00 00 00         mov $0x05, %r11d
00000030  49 0f af c3               imul %r11, %rax

which is because of this code in MacroAssembler-x64-inl.h:

void MacroAssembler::mul64(Imm64 imm, const Register64& dest) {
  movq(ImmWord(uintptr_t(imm.value)), ScratchReg);
  imulq(ScratchReg, dest.reg);
}

which seems to be an artifact of there not being an imulq variant in the assembler that takes an immediate operand, though there is a variant that takes an imm32 in the instruction set.

Mentor: lhansen
Keywords: good-first-bug

If we do not need overflow checks, we should implement it as:

  lea [%rax + %rax * 4], %r11
See Also: → 1275442

Indeed, better strength reduction would be good too! Strength reduction is pretty ad-hoc at this point: -1, 0, 1, 2, and other powers of 2, but no work on (2^n)+1 (becomes LEA for small n), reduction to a sequence of adds, etc.

See Also: → 1712303

I would like to work on this.

(In reply to Mang Yau from comment #3)

I would like to work on this.

Sure thing. I think you should disregard the discussion in comment 1 and later for now, and only try to fix the problem outlined in comment 0.

This bug may be a little challenging, not because it's hard but because there are a number of concepts and a fair amount of code that you may need to understand to fix it.

Mang Yau are you still on this issue or can I give it a try?

(In reply to lukas.bernhard from comment #5)

Mang Yau are you still on this issue or can I give it a try?

Hey Lukas, I have been a bit busy and haven't had much time to dedicate to this issue. You can give it a try :)

Assignee: nobody → lukas.bernhard
Status: NEW → ASSIGNED

While working on the bug and reading surrounding multiplication code I noticed pop2xI64ForMulI64 (on x64) makes (seemingly) unnecessary assumptions about rdx being clobbered; this shouldn't be the case for a reg*reg multiplication so a pop2xI64(r0, r1); should be sufficient.
Furthermore: while the constant setup has been removed as described in the ticket the generated code remains suboptimal. In particular, multiplication now emits code such as:

movq rcx, rax
imulq imm8/32, rax, rax

Instead, a imulq imm8/32, rcx, rax would be sufficient, saving one reg->reg move (imul with immediate can select source register and dest register independently).
inline void mul64(Imm64 src1, const Register64& src2, const Register64& dest) could be made available on x64, with a special case if src2==dest + src1 cannot be encoded as immediate.
Generating this code seems to require changes to CodeGenerator::visitMulI64 in CodeGenerator-x86-shared.cpp (due to some code being shared with x86-32); if this change is desirable I can give it a try (or work on something else if optimizing a mov reg->reg is deemed too effortful).

Patches are in general welcome (do file new bugs for new issues). pop2xI64ForMulI64 is in the baseline compiler so we may not care all that much about micro-optimizations like these, but anything that affects code generation in ion (the optimizing compiler) is interesting in principle.

Pushed by archaeopteryx@coole-files.de:
https://hg.mozilla.org/integration/autoland/rev/574004fd100a
Remove unnecessary constant setup for 64-bit imul. r=lth
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 91 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: