wasm: Baseline JIT regalloc spilling is broken

RESOLVED INVALID

Status

()

Core
JavaScript Engine: JIT
RESOLVED INVALID
2 years ago
2 years ago

People

(Reporter: lth, Assigned: lth)

Tracking

(Blocks: 1 bug)

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

(Assignee)

Description

2 years ago
When the baseline jit was rewritten from preorder to postorder, an important aspect of the register allocator was lost: preserving allocated registers in the face of spilling.

Generally, the compiler structures computation as a sequence of pops (which allocate registers) followed by the operation and then an explicit freeing of the registers that are no longer needed.  In the current register allocator, when a register cannot be allocated it spills all registers by calling sync(), and this frees all registers.  Clearly this is wrong; other registers that were previously popped for the operation must not become candidates for allocation, but must instead be preserved.

That we have not seen consequences of this is probably a combination of testing the hard cases (eg Angry Bots) only on x64, where spilling happens rarely, and having jit-tests that don't much create register pressure.
(Assignee)

Comment 1

2 years ago
Created attachment 8777793 [details] [diff] [review]
bug1292129-spill-preserve.patch

This passes jit-test testing locally on x64 (not that that means much, obviously).  Please discuss.

What I've done here is a minimal solution: When allocating a GPR, pass in the live GPRs (zero, one, or two) so that they remain allocated; when allocating an FPR, pass in the live FPRs ditto (zero or one).

In a sense this is not the correct solution since a sync() when we're trying to allocate a GPR also frees FPRs and vice versa, and we'd sort of expect the register allocator to preserve the allocations of the other type.  As it happens, the compiler doesn't need to depend on that, given how it's structured.

Another thing worth noting is that I have opted not to pass in register sets, but to pass in individual registers.  I'm not sure what the right solution is.  Clearly these sets would only be needed in the case of a sync(), which should happen somewhat rarely (more often on x86) so the cost of iterating across them is probably not a big deal.  Not sure how costly it is to create them.  Open for discussion.
Attachment #8777793 - Flags: feedback?(hv1989)
Not sure if I'm a fan of how this works ... This clobbers the calls to pop/need/... a lot. And almost all those function now need variants with 0,1,2 registers ...

I'm wondering if we cannot do:

Have a datastructure keeping track which registers have been requested in any emitXXX functions. Upon requesting sync we throw away all registers, except the registers that are in the usedXXX datastructure.

{
  ...
  AllocatableGeneralRegisterSet availGPR_;
  AllocatableFloatRegisterSet availFPU_;
  AllocatableFloatRegisterSet usedGPR_;
  AllocatableFloatRegisterSet usedFPU_;
  ...
}

And use the usedGPR/FPU to keep track which registers has been asked.

usedGPR_.clear();
usedFPU_.clear();
emitXXX();

For every function we start with an empty usedXXX list. And the needXXX/popXXX functions take registers from the availXXX_ to the usedXXX_ list.

That way we can just call needXX without needing to specify which registers are already live. And it is similar to passing the registers set, but implicit instead of explicit.


The only issue we have here is that in emitXXX function we would need to reserve the most specific registers first. E.g. first allocate EAX, to make sure a regular needXXX doesn't take it first. But that seems very local and doable to keep in mind.
(Assignee)

Comment 3

2 years ago
(In reply to Hannes Verschore [:h4writer] from comment #2)
> Not sure if I'm a fan of how this works ... This clobbers the calls to
> pop/need/... a lot. And almost all those function now need variants with
> 0,1,2 registers ...

I don't understand what /a lot/ means.  We need exactly what's in the code in this patch, which does not seem like much.

> I'm wondering if we cannot do:
> 
> Have a datastructure keeping track which registers have been requested in
> any emitXXX functions. Upon requesting sync we throw away all registers,
> except the registers that are in the usedXXX datastructure.
> 
> {
>   ...
>   AllocatableGeneralRegisterSet availGPR_;
>   AllocatableFloatRegisterSet availFPU_;
>   AllocatableFloatRegisterSet usedGPR_;
>   AllocatableFloatRegisterSet usedFPU_;
>   ...
> }
> 
> And use the usedGPR/FPU to keep track which registers has been asked.
> 
> usedGPR_.clear();
> usedFPU_.clear();
> emitXXX();
> 
> For every function we start with an empty usedXXX list. And the
> needXXX/popXXX functions take registers from the availXXX_ to the usedXXX_
> list.
> 
> That way we can just call needXX without needing to specify which registers
> are already live. And it is similar to passing the registers set, but
> implicit instead of explicit.

I suppose.  I'd want to see some code first :)

There's actually a different possibility, and I'm not sure yet whether it's orthogonal or simply an alternative.  When the need() functions perform a sync, this is generally not desirable because it spills things that might not need to be spilled.  What they could do is pick a register that is not going to be used by the emitter that is requesting a register, and spill that, and return that.  (This is sometimes a little tricky due to the invariant on the expression stack that everything below the first Mem item is spilled.)  I suspect that that requires not a global list, like you suggest, nor a type-limited local list, like I suggest.

Another mild concern I have is about the cost of maintaining the register sets, but I don't have a good intuition about what that will be.  I don't want more loops than necessary but clear() should not require any, we should only need to iterate on sync(), which is expensive anyway.  And clearly we're paying something to be passing the LiveGPR/LiveFPU parameters too.

> The only issue we have here is that in emitXXX function we would need to
> reserve the most specific registers first. E.g. first allocate EAX, to make
> sure a regular needXXX doesn't take it first. But that seems very local and
> doable to keep in mind.

It may require some mild restructuring here and there but I don't see that as a showstopper.
(Assignee)

Comment 4

2 years ago
Actually, I'm starting to doubt that this is a problem at all.  The error in my thinking is that sync() frees all registers, but that's not right: sync() only frees registers that are on the stack, and when eg popI32() calls sync the previously popped values are no longer on the stack (and registers reserved with eg needI32() are not either).  Hence previously popped registers will not be freed here, as desired.
(Assignee)

Comment 5

2 years ago
Closing, following discussion with Hannes on IRC.
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → INVALID
Attachment #8777793 - Flags: feedback?(hv1989)
You need to log in before you can comment on or make changes to this bug.