Closed Bug 695389 Opened 13 years ago Closed 6 years ago

[nanojit] Refactorization of RegAlloc, enable overlapping registers for ARM

Categories

(Tamarin Graveyard :: Virtual Machine, defect, P3)

defect

Tracking

(Not tracked)

RESOLVED WONTFIX
Q2 12 - Cyril

People

(Reporter: virgilp, Assigned: wmaddox)

References

Details

Attachments

(3 files)

Float/Float4 support needs access to much more ARM registers. This patch prepares the register allocator to handle overlapping registers. It does not actually include the NativeArm changes, it's just the base work to enable adding Qn and Sn registers in NativeArm
Blocks: float/float4
Note: ARM doesn't use overlapping registers, that'll be another bug. This patch just adapts the native platforms to the new regalloc interface.
Tested on buildbot (works on all platforms). Not tested on PPC, SH4, Sparc - likely won't work there.
Comment on attachment 567800 [details] [diff] [review]
RegAlloc base changes

Please review or reassign for review,
Attachment #567800 - Flags: review?(wmaddox)
Attachment #567800 - Flags: review?(edwsmith)
Comment on attachment 567803 [details] [diff] [review]
RegAlloc changes to core (tamarin, not nanojit)

wmaddox@adobe.com, edwsmith@adobe.com,
Attachment #568029 - Flags: review?(wmaddox)
Attachment #568029 - Flags: review?(edwsmith)
Attachment #567803 - Flags: review?(wmaddox)
Attachment #567803 - Flags: review?(edwsmith)
Comment on attachment 567800 [details] [diff] [review]
RegAlloc base changes

Overall it looks ok, but the scheme for overlapping registers needs
to be explained clearly and the allocation/retire code need to be
more obviously correct, before R+.

substance:

The "RegAlloc basics" comment needs an introduction explaining overlapping registers.

>    inline void RegAlloc::retire(Register r)

The for loop doesn't seem necessary, it will always have 1 bit set (r).  I searched
in this patch and the others for a way for rmask(r) to result in >1 bit set in
the mask, but couldn't find it.  It would be easier to follow
the #ifdefd code if the #endif didn't land between an if() and
the then statement, even if it means duplicating a line or two of code.

> #define IS_REG_IN_MASK() etc

prefer inline functions unless there's a strong justification for macros.

in allocReg():
>       RegisterMask tmp_set= setA_F_;
>       for (Register r = lsReg(tmp_set); tmp_set; r = nextLsReg(tmp_set, r)){
>           if(!isFree(r)) CLR_REG_IN_MASK(r,setA_F_);
>       }

I think you can do this all with bitwise operations.  This is critical path,
so its worth the effort.

The logic for UnspecifiedReg that can fall through to the eviction code,
is too confusing.  This is the fusion reactor of the allocator, its got
to be ultra clear and fast.  Can you refactor the predicates to work on masks,
so we can make a single test & branch to either path, like the old code was? 
The assert looks weird -- it seems to imply release builds can work with
broken invariants, which shouldn't be the intention.

I can't see how all the versions of findAvailableReg() are sure to return
UnspecifiedReg on failure.  in some cases they map down to msReg() which
IIRC are undefined if passed a mask of 0s.

nits:

several places, bug for example Assembler.cpp:410 - local style is // C++ comments, capitalize, punctuate, try not to abbreviate.  I noticed lots of other local
style bustage.  Since I'll probably be who lands this, I can do a style pass
before landing.
Attachment #567800 - Flags: review?(edwsmith) → review-
Attachment #567803 - Flags: review?(edwsmith) → review+
Comment on attachment 568029 [details] [diff] [review]
Regalloc - changes to the various NativeXXX files to work with the allocator

NativeARM.cpp:
> RegisterMask    free = (~_allocator.activeMask()) & AllowableFlagRegs;

a RegAlloc.freeMask() method would clean this up, I think i've seen the same expression in a couple of places.  If you change it and its just a mechanical change, no need to re-review.

> inline bool IsFpSReg(Register _r){ return _r == S0; } 

Please add a comment for IsFpReg, IsFpSReg, since the distinction is subtle. (IsGpReg() too, for symmetry)

NativeX64.cpp:
> case LIR_std2f

This change looks different from the other mechanical changes... is it fixing a bug?
Attachment #568029 - Flags: review?(edwsmith) → review+
On activeMask()/ RegAlloc.freeMask(): no, IMO the code should never get its hands on the "free" set to perform allocations (even of temporary registers).
This is the only place where I made an exception, and I only made it due to laziness :P - the correct solution there is a "allocTempRegIfFree()" method. I'll do that (I'll make it platform-specific since it's simpler and only ARM appears to need it).

Looking again, the loop that you mentioned above is also an artefact of the previous implementation (which was based on "clobbered registers" and is completely useless in this implementation - i.e. has no effect at all.)

On the "firstAvailableReg" you are right that the assertion is unnecesary (it's simply redundant, as things are implemented today). BUT: the invariant is not violated in release code, the "if" path is for the "RA_REGISTERS_OVERLAP" case.

It is clear that I need to explain the scheme better. Currently some loops appear (ARE) unnecessary because nobody uses it. Here's how it works for ARM:
0-15: GP regs
16-48: Sn regs (single precision floats)
48-64: D16-D31 regs (double precision floats)
The registers 16-48 are also  D0-15, and also Q0-Q7. The registers D16-D31 are also Q8-Q15 (note that this is exactly the hardware representation).
What this means is that rmask(S0)  = 0x10000, rmask(S1)=0x20000, and rmask(D0) = 0x30000 (and rmask(Q0)= 0x70000). On overlapping registers the mapping scheme where "rmask(reg)= exactly one bit" is no longer true (and also no longer true is that the relationship  between a register and its mask is bidirectional: you can always get a mask from a register, but you can no longer tell if "0x30000" means "D0" or "S0 or S1". 
I'll add more comments in code, too.
Blocks: 696730
with this scheme we cannot address D16-D31 or Q8-Q15, without a >64bit RegisterMask, correct?  (okay with me, just confirming).
Oh yes we can, that's the beauty of it :)

Sn regs are limited to S32, they don't run up to S64. i.e. D16-D31 don't have any sub-registers. We have:
16 GP regs
32 Sn regs
16 Dn regs (upper bank, D16-D31)
All the others are just names given to collections of physical registers (e.g. D0 = S0,S1;  Q0= S0,S1,S2,S3 )

Check out the masks in https://bugzilla.mozilla.org/show_bug.cgi?id=696730
Comment on attachment 567800 [details] [diff] [review]
RegAlloc base changes

I find this diff almost impossible to follow -- there needs to be an explanation of the overall scheme, as well as better comments in the code.  Locally, there are a few snippets that give me pause, for example:

    RegisterMask set__F_ = _free;
...
    RegisterMask setA_F_ = setA___ & set__F_;
...
    RegisterMask tmp_set= setA_F_;
    for (Register r = lsReg(tmp_set); tmp_set; r = nextLsReg(tmp_set, r)){
        if(!isFree(r)) CLR_REG_IN_MASK(r,setA_F_);
    }

Here, every bit set in setA_A_F_ is also set in _free.  But isFree() simply queries _free, so isFree(r) must always return true.

There are large blocks of code that have both moved to another file entirely, and also had significant internal changes.  I'd like to see the patch broken into two, in which the first does only the refactoring needed to migrate code from Assembler to RegAlloc, and the second introduces the substantive changes to support overlapping registers.
Attachment #567800 - Flags: review?(wmaddox) → review-
Attachment #567803 - Flags: review?(wmaddox) → review+
In general, it needs to be obvious that the register allocator is producing the same results as before (unless you have made specific improvements that are called out) when new register classes (e.g., float/float4) are not being used in the generated code.
(This item may well be completed.)
Assignee: nobody → wmaddox
Priority: -- → P2
Target Milestone: --- → Q2 12 - Cyril
Priority: P2 → P3
changeset: 6798:4f58915784a2
user:      Virgil Palanciuc <virgilp@adobe.com>
summary:   imported patch bug-695389.regalloc.coreChanges.diff

http://hg.mozilla.org/tamarin-redux/rev/4f58915784a2
changeset: 6799:49eeecaec4f1
user:      Virgil Palanciuc <virgilp@adobe.com>
summary:   imported patch bug-695389.regalloc.natives.diff

http://hg.mozilla.org/tamarin-redux/rev/49eeecaec4f1
Comment on attachment 568029 [details] [diff] [review]
Regalloc - changes to the various NativeXXX files to work with the allocator

Removing stale review request.
Attachment #568029 - Flags: review?(wmaddox)
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: