Closed Bug 1420158 Opened 2 years ago Closed 2 years ago

Rabaldr, use RAII to manage regalloc platform dependencies

Categories

(Core :: JavaScript Engine: JIT, enhancement, P3)

enhancement

Tracking

()

RESOLVED FIXED
mozilla59
Tracking Status
firefox59 --- fixed

People

(Reporter: lth, Assigned: lth)

References

Details

Attachments

(4 files)

When the atomics code landed, it landed with a fair amount of regalloc and platform logic in what is technically the non-platform layer - there are many hardware constraints here, and though 64-bit atomics on x86 are worst, all the platforms are bad in some way.

For some simpler cases in the past I have used function abstraction here, see eg pop2xI32ForIntMulDiv(), which pops to specific registers on Intel platforms but to general registers otherwise.  It works well there because after popping the input values the registers are treated the same on all platforms.

For the atomics, where we not only allocate to specific registers in specific orders but also free registers in a platform-dependent way, a more general approach uses an RAII type (which can live in the platform layer) to encapsulate allocation and deallocation of the registers, leaving the platform-independent code clean.
Summary: Rabaldr, use RAII to manage regalloc platform deps → Rabaldr, use RAII to manage regalloc platform dependencies
This is just a sample but I figured I would ask for some feedback before I take it much further.

The basic idea is to move regalloc code from the platform-independent layer of the compiler (further down in the file, after the BaseCompiler definition) to the platform-dependent layer (further up, inside the definition) and to place allocation and deallocation next to each other in constructor and destructor within the RAII class.  The RAII class will dealloc every register it holds onto; non-dealloc'd registers are out parameters to the constructor.

This is a little messy since the RAII classes must call back into BaseCompiler for pretty much every service, but since the RAII classes are just inner classes with a reference to an instance of the class they are defined within, this feels manageable.  And these inner classes are small.

(At the moment I just plopped the new classes down a little randomly but they could be placed very close to code that uses them and thus create a tighter connection than there was before between regalloc code and regusing code.)
Attachment #8931369 - Flags: feedback?(bbouvier)
Comment on attachment 8931369 [details] [diff] [review]
bug1420158-regalloc-as-raii.patch

Review of attachment 8931369 [details] [diff] [review]:
-----------------------------------------------------------------

I like the idea of having all the per-platform register constraints hidden under these RAII classes. We could have a goal of having all the platform specific code in a given file section of the WasmBaselineCompile.cpp file, and these RAII classes could fit in there; it would be very nice when porting to new platforms, since we'd know we'd have only to tweak all these platform specific stubs at e.g. the top of the file. What do you think?

::: js/src/wasm/WasmBaselineCompile.cpp
@@ +4154,5 @@
> +    }
> +#endif
> +
> +    // Register allocation for AtomicCmpXchg with access size <= 32 bits.
> +    friend class AtomicCmpXchg32Regs;

nit: friend struct?

@@ +7778,5 @@
> +            AccessCheck check;
> +            RegI32 rp = popMemoryAccess(&access, &check);
> +            RegI32 tls = maybeLoadTlsForAccess(check);
> +
> +            atomicCompareExchange(&access, &check, tls, rp, r.rexpect, r.rnew, rd);

This method could even take the AtomicCmpXchg32Regs class as the input, and remove one parameter.
Attachment #8931369 - Flags: feedback?(bbouvier) → feedback+
Technical debt to consider here: bug 1420104 comment 18.
Priority: -- → P3
I should note that (a) those patches sit on top of the patches for bug 1421993 but I don't think that matters much and (b) there are many other RAII things we could do, so consider this a first step.  The patches add a substantial number of lines to the code but they did find bugs, and the new code is easier to check for correctness and centralizes platform-specific code.
Blocks: 1423864
(In reply to Benjamin Bouvier [:bbouvier] from comment #2)
> Comment on attachment 8931369 [details] [diff] [review]
> bug1420158-regalloc-as-raii.patch
> 
> Review of attachment 8931369 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I like the idea of having all the per-platform register constraints hidden
> under these RAII classes. We could have a goal of having all the platform
> specific code in a given file section of the WasmBaselineCompile.cpp file,
> and these RAII classes could fit in there; it would be very nice when
> porting to new platforms, since we'd know we'd have only to tweak all these
> platform specific stubs at e.g. the top of the file. What do you think?

Yes.  The patches here move us some way in that direction.  I also see that there are large sections of the file that contain no (or very little) platform-dependent code.  Those are not all necessarily moveable to below all the platform-dependent code, but some of them are.  And in any case we are moving in a direction where large sections of the file can be flagged as either platform-indepedent or platform-dependent / porting APIs.

I think I'll try to do that moving / labeling in a separate bug, and it can happen whenever, but I also think it's worth doing soon since we're now seeing MIPS code starting to appear and there's ARM64.  The price of success, I guess...
Comment on attachment 8934514 [details]
Bug 1420158 - rabaldr register targeting, Cleanup: move code and rename.

https://reviewboard.mozilla.org/r/205422/#review212268

Makes sense, thanks.

::: js/src/wasm/WasmBaselineCompile.cpp:5857
(Diff revision 1)
>  }
>  
>  void
>  BaseCompiler::emitExtendI32ToI64()
>  {
> -    RegI64 x0 = popI32ForSignExtendI64();
> +    RegI64 x0;

nit: for consistency, rename `r`?
Attachment #8934514 - Flags: review?(bbouvier) → review+
Comment on attachment 8934515 [details]
Bug 1420158 - rabaldr register targeting, Create RAII wrappers for atomic ops regalloc/targeting.

https://reviewboard.mozilla.org/r/205424/#review212272

I didn't look too closely at the register usage because I assume they're just moved from their previous locations and differently formated, so please let me know if you want a further look there.

Otherwise, the patch looks good to me; I've added a few suggestions (remove templates in and there, different names mostly, and a possible different usage of the RAII classes). Thanks!

::: js/src/wasm/WasmBaselineCompile.cpp:4381
(Diff revision 1)
> +        ~PopBase() {
> +            maybeFree(rd_);
> +        }
> +
> +        // Take and clear the Rd - use this when pushing Rd.
> +        T rd() {

The name doesn't quite suggest the intent here, maybe rename to `consumeRd`, or `takeRd`, or `stealRd`?

::: js/src/wasm/WasmBaselineCompile.cpp:4393
(Diff revision 1)
> +
> +    friend class PopAtomicCmpXchg32Regs;
> +    class PopAtomicCmpXchg32Regs : public PopBase<RegI32>
> +    {
> +        using Base = PopBase<RegI32>;
> +        RegI32 rexpect, rnew;

Alternatively, could the RAII classes not inherit from PopBase, but compose and contain many instances of PopBase, one for each register that's used by the instruction? (then I think all the dtors can be removed)

(If so, then the getters in PopBase would need a rename, like get()/set()/consume(), or operator T(), operator= and consume())

Also: instead of having a consume/takeover/steael method, could the RAII class take care of pushing too? it seems it always has all the information needed for that purpose (result type, etc.)

::: js/src/wasm/WasmBaselineCompile.cpp:4493
(Diff revision 1)
> +#endif
> +
> +#ifdef JS_CODEGEN_X86
> +        template<typename T>
> +        void
> +        atomicCmpXchg64(T srcAddr, RegI32 ebx) {

No need for the template parameter here, right? (I seem to recall that the T was always one type on x86, and always a different one on other platforms)

::: js/src/wasm/WasmBaselineCompile.cpp:4501
(Diff revision 1)
> +            bc->masm.compareExchange64(srcAddr, rexpect, bc->specific.ecx_ebx, getRd());
> +        }
> +#else
> +        template<typename T>
> +        void
> +        atomicCmpXchg64(T srcAddr) {

here neither, if i remember correcly

::: js/src/wasm/WasmBaselineCompile.cpp:4538
(Diff revision 1)
> +# endif
> +
> +# ifdef JS_CODEGEN_X86
> +        template<typename T>
> +        void
> +        atomicLoad64(T srcAddr, RegI32 ebx) {

(same comment here and below)

::: js/src/wasm/WasmBaselineCompile.cpp:8101
(Diff revision 1)
>      return true;
>  }
>  
> +#if defined(JS_CODEGEN_X64) || defined(JS_CODEGEN_ARM)
> +
> +# define ATOMIC_PTR(name, access, tls, ptr)                             \

Didn't you remove this macro...

/me is confused by the order in which he reviewed patches
Attachment #8934515 - Flags: review?(bbouvier) → review+
(In reply to Benjamin Bouvier [:bbouvier] from comment #9)

> >  void
> >  BaseCompiler::emitExtendI32ToI64()
> >  {
> > -    RegI64 x0 = popI32ForSignExtendI64();
> > +    RegI64 x0;
> 
> nit: for consistency, rename `r`?

Register naming in Rabalrd is actually a mess.  For a while I tried to have a convention where 'r' (or r0, r1, ...) was a 32-bit register and 'x' a 64-bit register (inspired by the ARM64 naming), and 'f' and 'd' for float and double.  But that works only for really tiny functions; with more registers it breaks down completely and proper names are needed, eg, 'srcDest', 'lhs', 'rhs'.  In single-value situations like this presumably something like 'val' would be appropriate.

Also after renaming x0 to r, I see that emitExtendI64_32() and emitExtendI32ToI64() are exactly the same function.

I wonder, should I fix the problem in flight or file followup bugs...?  Renaming is automatic rs+ right?   :-)
> Renaming is automatic rs+ right?   :-)

There's no such thing as automatic rs+, as far as I know ;) But rs=me on these trivial register renamings in WasmBaselineCompile.cpp (please update patches / import new ones for the renamings, for history).
Sure thing.  I think the rules that work well are:

If the function is small then
  If there is one register that is both source and destination then
    It is called r
  else if there is one source register and one destination register then
    The source is called rs
    The destination is called rd
  else if there are two registers, one source/dest and one source then
    They are called r and rs respectively
  else if there are three registers, two source registers and a dest then
    They are called rs0 and rs1 and rd

  If there is a temp register for any of the cases above then
    It is called "temp" (not "tmp")

  If there are multiple temps then
    They are temp0, temp1, etc

(For large functions the rules need to be more flexible.)
(In reply to Benjamin Bouvier [:bbouvier] from comment #10)
> Comment on attachment 8934515 [details]
> Bug 1420158 - rabaldr register targeting, Create RAII wrappers for atomic
> ops regalloc/targeting.
> 
> https://reviewboard.mozilla.org/r/205424/#review212272
> 
> ::: js/src/wasm/WasmBaselineCompile.cpp:4381
> (Diff revision 1)
> > +        ~PopBase() {
> > +            maybeFree(rd_);
> > +        }
> > +
> > +        // Take and clear the Rd - use this when pushing Rd.
> > +        T rd() {
> 
> The name doesn't quite suggest the intent here, maybe rename to `consumeRd`,
> or `takeRd`, or `stealRd`?

I agree; I was on the fence about this already.  takeRd() it is.

> ::: js/src/wasm/WasmBaselineCompile.cpp:4393
> (Diff revision 1)
> > +
> > +    friend class PopAtomicCmpXchg32Regs;
> > +    class PopAtomicCmpXchg32Regs : public PopBase<RegI32>
> > +    {
> > +        using Base = PopBase<RegI32>;
> > +        RegI32 rexpect, rnew;
> 
> Alternatively, could the RAII classes not inherit from PopBase, but compose
> and contain many instances of PopBase, one for each register that's used by
> the instruction? (then I think all the dtors can be removed)

That doesn't quite work, because sometimes a register needs to be freed conditionally, depending on the overall state of the parent object.  PopAtomicRMW32Regs is a case in point here, on x64 it frees rv only if rv is different from EAX.  (It used to be even harder but that was before I fixed the bug with the incorrect use of the scratch register.)  PopAtomicCmpXchg64Regs is similar; on Intel it frees only rexpect, but on ARM it frees rnew additionally.  We could finesse those cases with a flag in PopBase, I guess...

I'm not going to do this now, but I think that it's worth pursuing further, either because it leads to cleanup locally or because we end up with a mechanism that is more generally useful.

> Also: instead of having a consume/takeover/steael method, could the RAII
> class take care of pushing too? it seems it always has all the information
> needed for that purpose (result type, etc.)

The RAII class doesn't always know if the result is wanted, though it could be told that that.  (The code for Exchange and Store is often shared, except for Store we don't push the result value.  This is why we have takerd() clear Rd so that Rd is not freed by ~PopBase(); sometimes we don't call takeRd().)

I actually think this would turn the RAII classes into a special case (in all other cases the pushing is very explicit in the emitWhatever() function) so I probably won't pursue this.

> ::: js/src/wasm/WasmBaselineCompile.cpp:4493
> (Diff revision 1)
> > +#endif
> > +
> > +#ifdef JS_CODEGEN_X86
> > +        template<typename T>
> > +        void
> > +        atomicCmpXchg64(T srcAddr, RegI32 ebx) {
> 
> No need for the template parameter here, right? (I seem to recall that the T
> was always one type on x86, and always a different one on other platforms)

Excellent question.  *Currently* in Rabaldr it is BaseIndex for ARM and x64 and Address for x86.  The removal of ATOMIC_PTR pretty much cements that since that exposes the concrete type on the platform, while the macro hid it.  (Operand would have been the better type, as you said.)

In the MacroAssemblers, srcAddr is however templated in the atomic ops and x86 supports BaseIndex, and ideally we would pick an address representation on a case-by-case basis that is best for the case at hand.  For example, if the 'ptr' part of the address is zero and all we have is the constant offset, then we should be using Address on ARM and x64; if we could figure out the register allocation, then we should be using BaseIndex on x86.

I think that I would like to leave this flexibility in the code for the time being.  On the one hand, that's more generality than we currently need.  On the other hand, it frees us from tying ourselves to a representation that we're not actually needing to depend on.
Pushed by lhansen@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/767e0d01228b
rabaldr register targeting, Cleanup: move code and rename. r=bbouvier
https://hg.mozilla.org/integration/mozilla-inbound/rev/624593062d7b
rabaldr register targeting, Create RAII wrappers for atomic ops regalloc/targeting.  r=bbouvier
https://hg.mozilla.org/integration/mozilla-inbound/rev/9057f729e1eb
rabaldr, rename register variables.  rs=bbouvier
You need to log in before you can comment on or make changes to this bug.