Closed Bug 1336027 Opened 3 years ago Closed 2 years ago

Wasm baseline: Abstractions, large and small

Categories

(Core :: JavaScript Engine: JIT, defect, P2)

defect

Tracking

()

RESOLVED FIXED

People

(Reporter: lth, Assigned: lth)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Keywords: leave-open)

Attachments

(1 file, 2 obsolete files)

The baseline compiler is starting to settle down, so it will soon be time to clean up the code in a general way and introduce some abstractions.  The goals are to make the code easier to understand and to introduce some lightweight protection barriers that can be checked mechanically, where there is now only a gentlemen's agreement and a comment that's easily missed.

In the class of small abstractions, the low-level register allocators can be broken out as classes (in the same file) and their APIs can be made more predictable.  They will mostly end up as shims around the existing register allocation machinery, but this will clarify things.

In the class of larger abstractions, Luke has been pushing for a structure more like WasmIonCompiler, with top level emitter functions that accept a function generator object as a parameter.  Personally I'm not fond of this style but it's certainly something to discuss, and maybe it'll lead to something useful.

Some of the smaller things can land soon, if they are ready; the larger things should wait until we've branched for FF54, as most major changes to the compiler will have landed then (debugging, asm.js-ectomy)
I think there is value in breaking out classes insofar as you can pick a group of related fields with some invariant that can be maintained by some more-minimal interface (than just poking at the fields directly).  So a register allocator class sounds like a great idea.

For WasmIonCompile.cpp/FunctionCompiler, maybe you're right.  The design of FunctionCompiler was mostly derived (as in literally cleaved) from asm.js validation in AsmJS.cpp and originally AsmJS.cpp contained both AST validation and MIR generation that were fused into one linear pass over the AST.  So there was quite a lot to do for each operator and thus a big readability win to separate the asm.js validation logic (which was fairly stateless and depending only on some fixed environment) into non-member functions that called into a FunctionCompiler that did all the MIR stuff.  But now that these two algorithms have been un-fused (to allow them to run on different threads) and the wasm bytecode traversal/validation has been mostly factored out into OpIter and also made non-recursive, you're probably right that there's no longer much value in having the FunctionCompiler/non-member-function split.
Assignee: nobody → lhansen
The wasm baseline compiler has too much platform code under #ifdefs (and it'll become worse, with the atomics).  This patch factors a few of those functions having to do with int32->int64 conversion (ie sign extension and zero extension) and int64->int32 conversion (chopping) and reinterpretation (double->int64 and back) into the masm layer.

Once these have landed there will be the possibility for using these from the various CodeGenerator methods, but that'll have to be the subject of another bug.
Attachment #8918250 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8918250 [details] [diff] [review]
bug1336027-move-platform-functions-to-masm.patch

Review of attachment 8918250 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/arm/MacroAssembler-arm-inl.h
@@ +82,5 @@
> +void
> +MacroAssembler::move8To64SignExtend(Register src, Register64 dest)
> +{
> +    as_sxtb(dest.low, src, 0);
> +    ma_asr(Imm32(31), src, dest.high);

nit: ma_asr(Imm32(31), dest.low, dest.high);

As far as we know src could exactly fit on 8 bits, and copying the sign bit would result in a 0 high part, instead of -1 high part, leading to a coercion error.

@@ +89,5 @@
> +void
> +MacroAssembler::move16To64SignExtend(Register src, Register64 dest)
> +{
> +    as_sxth(dest.low, src, 0);
> +    ma_asr(Imm32(31), src, dest.high);

nit: ma_asr(Imm32(31), dest.low, dest.high);

@@ +95,5 @@
> +
> +void
> +MacroAssembler::move32To64SignExtend(Register src, Register64 dest)
> +{
> +    move32(src, dest.low);

nit: if (src != dest.low)  ?

::: js/src/jit/mips64/MacroAssembler-mips64-inl.h
@@ +30,5 @@
>  
> +void
> +MacroAssembler::moveDoubleToGPR64(FloatRegister src, Register64 dest)
> +{
> +    MOZ_CRASH("NYI: moveDoubleToGPR64");

nit: MIPS is no longer a tier1 priority for the check_macroassembler_style script.  Removing these function might be better than keeping them with a MOZ_CRASH.

::: js/src/jit/x86/MacroAssembler-x86-inl.h
@@ +30,5 @@
>      movl(src.high, dest.high);
>  }
>  
> +void
> +MacroAssembler::moveDoubleToGPR64(FloatRegister src, Register64 dest)

nit: have a look at unboxDouble and boxDouble implementation:
http://searchfox.org/mozilla-central/source/js/src/jit/x86/MacroAssembler-x86.h#639,681
Attachment #8918250 - Flags: review?(nicolas.b.pierron) → review+
Keywords: leave-open
(In reply to Nicolas B. Pierron [:nbp] from comment #3)
> Comment on attachment 8918250 [details] [diff] [review]
> bug1336027-move-platform-functions-to-masm.patch
> 
> Review of attachment 8918250 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/jit/arm/MacroAssembler-arm-inl.h
> @@ +82,5 @@
> > +void
> > +MacroAssembler::move8To64SignExtend(Register src, Register64 dest)
> > +{
> > +    as_sxtb(dest.low, src, 0);
> > +    ma_asr(Imm32(31), src, dest.high);
> 
> nit: ma_asr(Imm32(31), dest.low, dest.high);

Oh, good catch.  Refactoring bug, thankfully.

> ::: js/src/jit/mips64/MacroAssembler-mips64-inl.h
> @@ +30,5 @@
> >  
> > +void
> > +MacroAssembler::moveDoubleToGPR64(FloatRegister src, Register64 dest)
> > +{
> > +    MOZ_CRASH("NYI: moveDoubleToGPR64");
> 
> nit: MIPS is no longer a tier1 priority for the check_macroassembler_style
> script.  Removing these function might be better than keeping them with a
> MOZ_CRASH.

We don't seem to have a coherent policy on this (sometimes the reviewer say "maybe implement MIPS as a courtesy") so I'll keep these, since MIPS does not implement wasm yet and not having these will make their try runs and landings fail for no good reason.

> ::: js/src/jit/x86/MacroAssembler-x86-inl.h
> @@ +30,5 @@
> >      movl(src.high, dest.high);
> >  }
> >  
> > +void
> > +MacroAssembler::moveDoubleToGPR64(FloatRegister src, Register64 dest)
> 
> nit: have a look at unboxDouble and boxDouble implementation:
> http://searchfox.org/mozilla-central/source/js/src/jit/x86/MacroAssembler-
> x86.h#639,681

Good tip, thanks.
Pushed by lhansen@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c4c074bb5d8f
wasm baseline, move platform functionality into MacroAssembler layer. r=nbp
https://hg.mozilla.org/integration/mozilla-inbound/rev/c4c074bb5d8fa683dcb082913ffd64442ddc3314
Bug 1336027 - wasm baseline, move platform functionality into MacroAssembler layer. r=nbp
Refactor register management.

Primarily this lifts the Reg{F,I}{32,64} types and the ScratchRegisterScope definitions out of the BaseCompiler and creates a new BaseRegAlloc class that encapsulates register sets and provides register allocation services.  Inline wrappers in BaseCompiler forward transparently to the new regalloc member so that there is not too much churn here.

For the refactoring to work out I had to create a BaseCompilerBase type that can be used by the hoisted definitions, and I had to virtualize the sync() method of BaseCompiler.  This should not be a performance issue because that function is already large and slow and should not be inlined in the first place; there is also enough type information here for the C++ compiler to avoid the indirect call, but even if the compiler does not devirtualize it should be fine.

Secondarily there is some cleanup, including naming cleanup.  I have decided for now to stick to my existing scheme where the register type is encoded in the method name even when that is redundant, because (a) sometimes it must be there anyway, as for needI32(), and (b) sometimes it is confusing, even if unambiguous, if it were to be omitted.  For example, the specific-register allocators need2xI32(a,b) and need2xI64(a,b) are distinguishable by the types of the arguments, but omitting the type names they would both probably end up having the same name, and that does not seem helpful.
Attachment #8919347 - Flags: review?(bbouvier)
Comment on attachment 8919347 [details] [diff] [review]
bug1336027-regalloc-refactor.patch

Review of attachment 8919347 [details] [diff] [review]:
-----------------------------------------------------------------

I like the idea of splitting this, thanks! I've got a suggestion though, below. (in any case, if you don't mind, can you use mozreview for this patch? it will make code motion easier to review)

::: js/src/wasm/WasmBaselineCompile.cpp
@@ +354,5 @@
> +            return AnyRegister(i64_.reg);
> +#else
> +            // The compiler is written so that this is never needed: any() is called
> +            // on arbitrary registers for asm.js but asm.js does not have 64-bit ints.
> +            // For wasm, any() is called on arbitrary registers only on 64-bit platforms.

nit: comments must fit on 79 chars

@@ +370,5 @@
> +        RegI64 i64_;
> +        RegF32 f32_;
> +        RegF64 f64_;
> +    };
> +    enum { NONE, I32, I64, F32, F64 } tag;

A random idea; would it make sense to remove NONE from this list and use a Maybe<AnyReg> in all the places that would expect a NONE reg instead? As suggested in another bug, probably a few `if (!reg.isInvalid()) do_something(reg)` could be replaced with `maybe_reg.map(do_something)` instead.

Also another random idea: this could just be a mozilla::Variant.

@@ +373,5 @@
> +    };
> +    enum { NONE, I32, I64, F32, F64 } tag;
> +};
> +
> +class BaseCompilerBase

Would it make sense to also take care of this refactoring bug to rename BaseCompiler to BaselineCompiler? It bite me a few times in the past when debugging or searching code. And now with BaseCompilerBase, the confusion is total.

@@ +391,5 @@
> +    bool scratchRegisterTaken() const {
> +        return scratchTaken;
> +    }
> +
> +    void setScratchRegisterTaken(bool state) {

Could these functions be in the RegAlloc class instead, the setter could be removed and the getter would just check if scratch is in availableGPR? (also, it handles only ScratchI32, is that intended?)

@@ +396,5 @@
> +        scratchTaken = state;
> +    }
> +#endif
> +
> +    virtual void sync() = 0;

I understand managing the virtual stack is a responsibility of the BaselineCompiler class, but sync() also does spilling, which is the responsibility of the register allocator.

How about we make the BaselineCompiler class check whether there are available registers and sync if needed? (so put back the needI32 etc. methods there, but have the RegAlloc have class to return a RegI32 etc, without the sync() check) Then we don't even need the virtual base class at all, and responsibilities are better enclosed?

@@ +418,5 @@
> +    //    registers.
> +    //  - notes in Architecture-arm.h indicate that when we use a float register
> +    //    that aliases a double register we only use the low float register,
> +    //    never the high float register.  I think those notes lie, or at least
> +    //    are confusing.

Then let's make these notes not lie, instead?

@@ +545,5 @@
> +#ifdef JS_CODEGEN_X86
> +      , singleByteRegs(GeneralRegisterSet(Registers::SingleByteRegs))
> +#endif
> +    {
> +        // jit/RegisterAllocator.h: RegisterAllocator::RegisterAllocator()

How about factorizing this out?
Attachment #8919347 - Flags: review?(bbouvier) → feedback+
(In reply to Benjamin Bouvier [:bbouvier] from comment #9)
> Comment on attachment 8919347 [details] [diff] [review]
> bug1336027-regalloc-refactor.patch
>
> @@ +370,5 @@
> > +        RegI64 i64_;
> > +        RegF32 f32_;
> > +        RegF64 f64_;
> > +    };
> > +    enum { NONE, I32, I64, F32, F64 } tag;
> 
> A random idea; would it make sense to remove NONE from this list and use a
> Maybe<AnyReg> in all the places that would expect a NONE reg instead? As
> suggested in another bug, probably a few `if (!reg.isInvalid())
> do_something(reg)` could be replaced with `maybe_reg.map(do_something)`
> instead.

That adds a tag field where we already have one, so I don't think so.

> Also another random idea: this could just be a mozilla::Variant.

That is more meaningful, I'll look into it.  Not if it makes the code significantly more complex though.
 
> @@ +373,5 @@
> > +    };
> > +    enum { NONE, I32, I64, F32, F64 } tag;
> > +};
> > +
> > +class BaseCompilerBase
> 
> Would it make sense to also take care of this refactoring bug to rename
> BaseCompiler to BaselineCompiler? It bite me a few times in the past when
> debugging or searching code. And now with BaseCompilerBase, the confusion is
> total.

It's "BaseCompiler" because Luke did not like "FunctionCompiler" and thought the "Baseline" prefix was too long:
https://bugzilla.mozilla.org/show_bug.cgi?id=1232205#c43.  I don't care very much, but I'd prefer not to change it now.

> @@ +391,5 @@
> > +    bool scratchRegisterTaken() const {
> > +        return scratchTaken;
> > +    }
> > +
> > +    void setScratchRegisterTaken(bool state) {
> 
> Could these functions be in the RegAlloc class instead, the setter could be
> removed and the getter would just check if scratch is in availableGPR?

I'll look into it, but I don't know if that's simplifying anything, and we need the BaseCompilerBase for the sync virtual anyway.

> (also, it handles only ScratchI32, is that intended?)

Yes, because that's a special case for x86.  (Existing code; this has just moved.)

> @@ +396,5 @@
> > +        scratchTaken = state;
> > +    }
> > +#endif
> > +
> > +    virtual void sync() = 0;
> 
> I understand managing the virtual stack is a responsibility of the
> BaselineCompiler class, but sync() also does spilling, which is the
> responsibility of the register allocator.

sync() in its current incarnation is not spilling per se, and since it traverses the compiler's value stack it is definitely not a natural part of the register allocator.

> How about we make the BaselineCompiler class check whether there are
> available registers and sync if needed? (so put back the needI32 etc.
> methods there, but have the RegAlloc have class to return a RegI32 etc,
> without the sync() check) Then we don't even need the virtual base class at
> all, and responsibilities are better enclosed?

I don't think so.  Getting the needI32() etc members out of the BaseCompiler was a welcome piece of cleanup, it removes most matters around the weakly typed register types from the BaseCompiler.  I would have liked not to have the virtual but in terms of logic there's nothing particularly difficult about it.

In a perfect world we might put the value stack and the regalloc together, but then the value stack is intertwined with the rest of the compiler in various ways and it becomes difficult to make a clean cut between the value stack and the rest of the compiler.

> @@ +418,5 @@
> > +    //    registers.
> > +    //  - notes in Architecture-arm.h indicate that when we use a float register
> > +    //    that aliases a double register we only use the low float register,
> > +    //    never the high float register.  I think those notes lie, or at least
> > +    //    are confusing.
> 
> Then let's make these notes not lie, instead?

Yeah, whenever I get around to it.  I just moved this comment.

> @@ +545,5 @@
> > +#ifdef JS_CODEGEN_X86
> > +      , singleByteRegs(GeneralRegisterSet(Registers::SingleByteRegs))
> > +#endif
> > +    {
> > +        // jit/RegisterAllocator.h: RegisterAllocator::RegisterAllocator()
> 
> How about factorizing this out?

I could probably factor parts of it, but the baseline compiler has additional needs, this code is not a literal copy.  I'd probably be more inclined to flesh out the comment.  I'll see what I can do.
Another idea from bug 1377576 is to get rid of a bunch of noise resulting from common code paths where a register that needs to be freed is either an actual register or an invalid register.  (There's noise around the allocation sometimes too, notably around temp registers.)

It's not completely obvious that Maybe<RegI32> is the right solution because that introduces another tag where InvalidReg is already such a tag, and the need for additional guards; it may be that an abstraction like maybeFreeI32() is a better approach.  But either way something better than "if (r != invalidI32()) freeI32(r)" here there and everywhere is wanted.
Priority: P3 → P2
Attachment #8919347 - Attachment is obsolete: true
(In reply to Benjamin Bouvier [:bbouvier] from comment #9)
> Comment on attachment 8919347 [details] [diff] [review]
> bug1336027-regalloc-refactor.patch

Just to follow up on the new patch and the old discussion:

> A random idea; would it make sense to remove NONE from [AnyReg] and use a
> Maybe<AnyReg> in all the places that would expect a NONE reg instead?

Did that, works fine.

> Also another random idea: this could just be a mozilla::Variant.

Tried that for a while, but basically this just ended up moving stuff around without simplifying anything, so I abandoned the idea.

> Would it make sense to also take care of this refactoring bug to rename
> BaseCompiler to BaselineCompiler?

Didn't do that for reasons discussed in comment 10.

> Could [the scratch register] functions be in the RegAlloc class instead,

Did that, and I like the result.

> the setter could be removed and the getter would just check if scratch is in availableGPR?

Did not do that, since it's a special case and does not need the generality of that mechanism, certainly not yet.

> [about where sync() lives]
> 
> I understand managing the virtual stack is a responsibility of the
> BaselineCompiler class, but sync() also does spilling, which is the
> responsibility of the register allocator.
> 
> How about we make the BaselineCompiler class check whether there are
> available registers and sync if needed? (so put back the needI32 etc.
> methods there, but have the RegAlloc have class to return a RegI32 etc,
> without the sync() check) Then we don't even need the virtual base class at
> all, and responsibilities are better enclosed?

Did not do that for reasons spelled out in comment 10, but with the scratch register machinery gone from BaseCompilerBase I renamed it to BaseCompilerInterface and it is now just a container for virtual functions.  This is OK for now.  Eventually sync() will probably evolve to have a richer interface and we can revisit needs then.

> @@ +418,5 @@
> > +    //    registers.
> > +    //  - notes in Architecture-arm.h indicate that when we use a float register
> > +    //    that aliases a double register we only use the low float register,
> > +    //    never the high float register.  I think those notes lie, or at least
> > +    //    are confusing.
> 
> Then let's make these notes not lie, instead?

The comment is gone from Architecture-arm.h so I removed this one too.

> @@ +545,5 @@
> > +#ifdef JS_CODEGEN_X86
> > +      , singleByteRegs(GeneralRegisterSet(Registers::SingleByteRegs))
> > +#endif
> > +    {
> > +        // jit/RegisterAllocator.h: RegisterAllocator::RegisterAllocator()
> 
> How about factorizing this out?

Did that.

> Another idea from bug 1377576 is to get rid of a bunch of noise resulting from common code paths where a register that needs to be freed is either an actual register or an invalid register.  (There's noise around the allocation sometimes too, notably around temp registers.)
> 
> It's not completely obvious that Maybe<RegI32> is the right solution because that introduces another tag where InvalidReg is already such a tag, and the need for additional guards; it may be that an abstraction like maybeFreeI32() is a better approach.  But either way something better than "if (r != invalidI32()) freeI32(r)" here there and everywhere is wanted.

Ended up cleaning up temp register management and also introduced maybeFreeI32(), the result is very pleasing I think and does not need the complexity of Maybe, where we already have "invalid" as a way of tracking the same information.
And I should mention: this sits on top of a fairly large patch queue for wasm atomics at the moment, and there are a few artifacts in here from that work, which may confuse you if you try to apply it to the current tip.
Comment on attachment 8927300 [details]
Bug 1336027 - wasm baseline, refactor registers and register allocation,

https://reviewboard.mozilla.org/r/198612/#review203736


C/C++ static analysis found 1 defect in this patch.

You can run this analysis locally with: `./mach static-analysis check path/to/file.cpp`


::: js/src/wasm/WasmBaselineCompile.cpp:1583
(Diff revision 1)
>      //    prioritizing registers differently (takeLast instead of
>      //    takeFirst) but we may also be able to allocate an unused
>      //    register on demand to free up one we need, thus avoiding the
>      //    sync.  That type of fix would go into needI32().
>  
> -    void sync() {
> +    void sync() override final {

Warning: 'override' is redundant since the function is already declared 'final' [clang-tidy: modernize-use-override]
Comment on attachment 8927300 [details]
Bug 1336027 - wasm baseline, refactor registers and register allocation,

https://reviewboard.mozilla.org/r/198612/#review204194

Looks good to me, thanks! the maybeFree methods make it much easier to read indeed. I've left a new set of suggestions, with an rs=me on them if you think they make sense and are valuable. Otherwise, even the patch as is is a great improvement to the interfaces in the baseline compiler, so r=me on that. Thank you for the patch!

::: js/src/wasm/WasmBaselineCompile.cpp:385
(Diff revision 2)
> +        RegF64 f64_;
> +    };
> +    enum { I32, I64, F32, F64 } tag;
> +};
> +
> +class BaseCompilerInterface

Random idea, rename it Sync and have its only function be a virtual operator()() instead?

or even pass a std::lambda to the class below?

::: js/src/wasm/WasmBaselineCompile.cpp:477
(Diff revision 2)
> +    }
> +
> +    void allocInt64(Register64 r) {
> +#ifdef JS_PUNBOX64
> +        MOZ_ASSERT(isAvailableGPR(r.reg));
> +        availGPR.take(r.reg);

Could just be a call to `allocGPR(r.reg)`? (and two calls in the case of 32 bits)

::: js/src/wasm/WasmBaselineCompile.cpp:511
(Diff revision 2)
> +                return true;
> +        }
> +        return false;
> +    }
> +
> +    void allocGPRPair(Register* low, Register* high) {

For symmetry with other `alloc` functions, could we assert `hasGPRPair` here?

::: js/src/wasm/WasmBaselineCompile.cpp:543
(Diff revision 2)
> +        availGPR.add(r);
> +    }
> +
> +    void freeInt64(Register64 r) {
> +#ifdef JS_PUNBOX64
> +        availGPR.add(r.reg);

(in the spirit of the above suggestion, could be `freeGPR(r.reg)`)

::: js/src/wasm/WasmBaselineCompile.cpp:720
(Diff revision 2)
> +
> +    void addKnownF64(RegF64 r) {
> +        knownFPU.add(r);
> +    }
> +
> +    void endLeakCheck() {

Since we have methods with names starting with `start` and then `end`, maybe we could use an external RAII class here, and knownGPR/knownFPU could go within this RAII class instead of the regalloc class?

::: js/src/wasm/WasmBaselineCompile.cpp:1310
(Diff revision 2)
>          needI32(except);
>  #endif
>      }
>  
> -    void freeF64(RegF64 r) {
> -        freeFPU(r);
> +    void maybeFreeI32(RegI32 r) {
> +	if (r != invalidI32())

rs=me to add new method `bool isInvalid() const` in various register classes
Attachment #8927300 - Flags: review?(bbouvier) → review+
Depends on: 1419025
Comment on attachment 8927300 [details]
Bug 1336027 - wasm baseline, refactor registers and register allocation,

Moved the patch to bug 1419025; I'm making this bug a metabug.
Attachment #8927300 - Attachment is obsolete: true
Depends on: 1419034
There are a few more things we can do here after factoring out the regalloc and the frame.

We should at least logically separate operations on the value stack from other operations; this is just moving code around.  Also a little cleanup around the loading of constants.

We should consistenly choose a op(src, dest) format for methods that move stuff.  After the previous item the only operations that don't follow that pattern are operations on the value stack, which consistently use an op(dest, src) format.  They are strongly typed (Reg, Stk&) and (Stk&, reg), so there's no space for confusion, but since fixing this is just a matter of moving a lot of text around this is a good time to do it.

We *could* move the Stk type and its operations out of BaseCompiler but since this type is completely encapsulated already and really a nice ADT I don't think that would have much value at this time.  If/when the register allocator becomes smarter about requesting how much to spill, that might change.
Depends on: 1420104
Additionally, there's a piece of technical debt I'm not sure I want to take on right now but it definitely belongs here: when the atomics code landed, it landed with a fair amount of regalloc and platform logic in what is technically the non-platform layer (because, 64-bit atomics on 32-bit platforms, especially x86).  It would be nice to do something about that.  Maybe there's a generic pattern that can be applied to all the x86-specific logic in the non-platform layer.
Depends on: 1420158
Depends on: 1421993
Depends on: 1423477
No longer blocks: Rabaldr-ARM64
The issue in comment 20 was mostly fixed by bug 1421993; what's left we can live with.

Of the issues in comment 19, we won't do the last one (moving Stk out of BaseCompiler), at least not yet; the others are done.

That leaves one item, the RAII register management, and that's a separate bug that can just live by itself for a while.
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.