Wasm baseline: Incorrect float register management (ARM, probably others)

RESOLVED FIXED in Firefox 53

Status

()

Core
JavaScript Engine: JIT
P1
normal
RESOLVED FIXED
a year ago
6 months ago

People

(Reporter: lth, Assigned: nbp)

Tracking

(Blocks: 1 bug)

unspecified
mozilla53
ARM
Linux
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox53 fixed)

Details

Attachments

(4 attachments, 4 obsolete attachments)

(Reporter)

Description

a year ago
(Spun off from bug 1320956.)

The way the baseline compiler handles the float registers is incorrect.  It happens to work on x86 because on that platform, float and double registers always overlap.  On ARM they do not - there are float registers that don't have a double personality, and vice versa - and the way we try to handle that is incorrect.  Notably, the way we mask the bitset with AllSingleMask or AllDoubleMask in order to try to force the allocation of a particular register is incorrect, since for a double register to be available its two bits (if it has a single personality) must also be set in the single part of the mask.

I am somewhat concerned that our register sets do not really suit my needs here, since I need to check - quickly - whether a register of a particular type is available and also to allocate both specific registers and registers of a specific type (float or double).  The support for some of that seems nonexistent.  It may be that it can be provided by specializing SpecializedRegisterSet to FloatRegister.  It may be, that like our register allocator, I should bypass the FloatRegister sets altogether.

Test case: https://github.com/lars-t-hansen/embenchen/asm_v_wasm/wasm_box2d.js, just run it in a shell on ARM (emulator's fine) with --wasm-always-baseline.
(Reporter)

Comment 1

a year ago
Oh, and the way the bug currently manifests is as an an assertion when running a DEBUG build.
(Reporter)

Comment 2

a year ago
Created attachment 8816443 [details] [diff] [review]
bug1321521-float-regalloc.patch

Sketch that needs cleanup, but in principle should be OK, and if the mapping from register -> code -> register can be made reliable cross-platform this will be obviously correct and obviously fast.

This fixes the assertion in the compiler (unsurprisingly) and some wasm benchmarks that previously failed in the compiler now pass (eg fasta), but box2d now crashes the ARM simulator (sometimes a MOZ_CRASH, sometimes a segv).  Need to test on hardware before drawing any conclusions.
(Reporter)

Comment 3

a year ago
Re the new crash: after 249,412,745 instructions executed the program executes a jump into la-la land as part of a brTable sequence.  (This is in the simulator, which is lucky, because the instruction it jumps to is not implemented in the simulator so we see the fault immediately.)

  0x0e8bf098  e49df004       ldr pc, [sp], #+4          ; return from function
  0x0e8c9aa8  e28dd004       add sp, sp, #4             ; frame popping in endCall
  0x0e8c9aac  e28dd004       add sp, sp, #4             ; frame popping in freeStack
  0x0e8c9ab0  e59d00c8       ldr r0, [sp, #+200]        ; pop
  0x0e8c9ab4  e3500302       cmp r0, #134217728         ; aka 0x08000000
  0x0e8c9ab8  2a0006b6       bcs +6880 -> 0xe8cb598     ; heap bounds check, branch on OOB
  0x0e8c9abc  e79b0000       ldr r0, [r11, +r0]         ; load via heapreg
  0x0e8c9ac0  e3a01000       mov r1, #0                 ; sad,
  0x0e8c9ac4  e0500001       subs r0, r0, r1            ;   when you stop to think about it
  0x0e8c9ac8  e3500004       cmp r0, #4                 ; brtable bounds check, table length=4
  0x0e8c9acc  3a00000b       bcc +52 -> 0xe8c9b00       ; jump to dispatch if ok
                                                        ; (branch taken)
  0x0e8c9b00  e1a0700f       mov r7, pc                 ; compute
  0x0e8c9b04  e2477024       sub r7, r7, #36            ;   table base
  0x0e8c9b08  e797f100       ldr pc, [r7, +r0, lsl #2]  ;     and jump to *(base + r0*4)
  0xffff0002  61706d6d       cmnvs r0, sp, ror #26

The brTable stubs and table are laid out between the bounds check at 0xe8c9acc and the dispatch code at 0xe8c9b00.  The stubs can be as large as we want and in particular they can pop value stack, though here they are just simple branch instructions.  These are followed by the table containing the addresses of those stubs.

  0x0e8c9acc  3a00000b       bcc +52 -> 0xe8c9b00
  0x0e8c9ad0  ea000011       b +76 -> 0xe8c9b1c         ; jump to out-of-range handling
  0x0e8c9ad4  ea00000d       b +60 -> 0xe8c9b10         ; jump to case 0
  0x0e8c9ad8  ea00000d       b +60 -> 0xe8c9b14         ; jump to case 1
  0x0e8c9adc  ea00000d       b +60 -> 0xe8c9b18         ; jump to case 2
  0x0e8c9ae0  ea000009       b +44 -> 0xe8c9b0c         ; jump to case 3
  0x0e8c9ae4  0e8c9ad4       TABLE ENTRY 0
  0x0e8c9ae8  ea000001       b +12 -> 0xe8c9af4         ; ouch!
  0x0e8c9aec  ffff0002       unknown                    ; ouch!!
  0x0e8c9af0  80000000       andhi r0, r0, r0           ; ouch!!!
  0x0e8c9af4  0e8c9ad8       TABLE ENTRY 1
  0x0e8c9af8  0e8c9adc       TABLE ENTRY 2
  0x0e8c9afc  0e8c9ae0       TABLE ENTRY 3
  0x0e8c9b00  e1a0700f       mov r7, pc
  0x0e8c9b04  e2477024       sub r7, r7, #36
  0x0e8c9b08  e797f100       ldr pc, [r7, +r0, lsl #2]

Looks like some constants are being dumped in the middle of our table, which is really not good: There's a jump to "skip over the constants" and then a floating point constant.

Fortunately this explains perfectly why we end up at PC=0xFFFF0002 in our trace.
(Reporter)

Updated

a year ago
Blocks: 1322288
(Reporter)

Comment 4

a year ago
nbp, the api that i'm trying to create on top of the register sets is basically this:

  isAvailableFloat32(FloatRegister)
  isAvailableFloat64(FloatRegister)
  hasFloat32()
  hasFloat64()
  allocFloat32()
  allocFloat64()
  allocFloat32(FloatRegister)
  allocFloat64(FloatRegister)
  freeFloat32(FloatRegister)
  freeFloat64(FloatRegister)

While eg isAvailable() and freeFloat() could be generic, I need to allocate float registers of specific types.  Since the AllocatableFloatRegisterSet does not offer that, I've been trying to emulate it by masking off the bits of the register set with AllDoubleMask and AllSingleMask, but that does not do the right thing on ARM for sure (this bug), and x64 probably (bug 1322288).  This is because the bit representation of the registers are not a single bit.

My preference would probably be to have hasFloat32(), hasFloat64(), takeFloat32() and takeFloat64() APIs on AllocatableFloatRegisterSet, and I'm assuming that AllocatableFloatRegisterSet::has(r) will do the right thing for all float registers 'r' since I'm also using that.
(Reporter)

Comment 5

a year ago
Created attachment 8817345 [details] [diff] [review]
bug1321521-float-regalloc.patch

A functioning cross-platform register allocator for float registers, which is more than good enough for the baseline compiler.  We'll use this if using the register sets doesn't work out.
Attachment #8816443 - Attachment is obsolete: true
(Reporter)

Updated

a year ago
No longer blocks: 1322288
(Reporter)

Comment 6

a year ago
Let's P1 this, there's risk that it is shaky on x86 and x64, not just ARM.
Priority: P2 → P1
(Assignee)

Comment 7

11 months ago
I am working on a patch to add the following interface as part of the Allocatable sets:

  AllocatableSet<…> alloc(…);

(In reply to Lars T Hansen [:lth] (On PTO until 27 Dec) from comment #4)
>   isAvailableFloat32(FloatRegister)
>   isAvailableFloat64(FloatRegister)

  alloc.has(reg) // (1)

>   hasFloat32()
>   hasFloat64()

  alloc.hasAny<RegTypeName::Float32>()
  alloc.hasAny<RegTypeName::Float64>()

>   allocFloat32()
>   allocFloat64()

  FloatRegister r = alloc.getAny<RegTypeName::Float32>()
  FloatRegister r = alloc.getAny<RegTypeName::Float64>()

>   allocFloat32(FloatRegister)
>   allocFloat64(FloatRegister)

  alloc.take(reg) // (1)

>   freeFloat32(FloatRegister)
>   freeFloat64(FloatRegister)

  alloc.add(reg) // (1)

(1) FloatRegister are already typed, there is no need to distinguish the type of the FloatRegister in the function name.

You were facing 2 issues:
  - getAny is not specialize for the type of the register.
  - hasAny does not exists.

These would be fixed as part of the patch that I will attach later today.

> This is because the bit representation of the registers are not a
> single bit.

This representation was made to handle proper handling of aliases while taking specific registers, or putting them back in the Allocatable/Live sets.

> My preference would probably be to have hasFloat32(), hasFloat64(),
> takeFloat32() and takeFloat64() APIs on AllocatableFloatRegisterSet

I admit that the templates for RegisterSets were nice for abstracting over the content of the register set, but they are not easy to add simple concepts to a single representation.

I honestly thought of this design considering that "one day" (very hypothetical) we might add extra constraints for allocating byte registers as part of the GPR, as we could do on x86/x64. Thus, not making much differences compared to the Float registers on ARM.
(Assignee)

Comment 8

11 months ago
Created attachment 8821631 [details] [diff] [review]
RegisterSets: Add a register type to getAny and add the equivalent hasAny function.

This patch implements the logic for x86, x64, ARM, and ARM64.  I left MIPS
out of the patch, but this should not be hard to add later.

This patch removes the getAny, getFirst, getLast functions from the
TypedRegisterSet<..> classes.  The reasons these are removed from this class
is that they do depend on the behaviour associated with the set (Live,
Allocatable).

Thus, I added an any<register-type-name> function on the AllocatableSet and
the LiveSet, such that getAny, getFirst, and getLast could be rewritten
there.

To make it easy, I also added a default specializations for the template
such that getAny on a AllocatableGeneralRegisterSet does the right thing
without having to say getAny<RegTypeName::GPR>.  This is made with the
DefaultType constexpr, "inherited" from the Register / FloatRegister.

The RegTypeName (register-type-name) describes all the kind of registers
which we might query, and depending on each architecture support, we have to
specialize the IndexRegister and IndexAllocatable functions.

The IndexRegister function filters out the requested register type out of
a live set.

The IndexAllocatable function filters out the requested register type out of
an allocatable set.  This involves reading the alignOrDominated alias rules,
to deduce the masking that we have to do.

On ARM, which is the only special case here (except MIPS), the allocatable
set can be in the following states (for one set of registers):

  - ...0...00... : None available
  - ...0...10... : s+1 available
  - ...1...01... : s+0 available
  - ...0...10... : s+1 available
  - ...1...11... : s+0, s+1, and d+0 available

Later, for adding SIMD support to ARM, one would have to add a new
IndexAllocatable specialization, which filter out the doubles returned by
IndexAllocatable<RegTypeName::Float64>, in the exact same way as this is
done for Double and Singles.
(Assignee)

Updated

11 months ago
Attachment #8821631 - Flags: review?(jolesen)
(Reporter)

Comment 9

11 months ago
This certainly suits my needs, and fixes the bug, so what's not to like :-)
Comment on attachment 8821631 [details] [diff] [review]
RegisterSets: Add a register type to getAny and add the equivalent hasAny function.

Review of attachment 8821631 [details] [diff] [review]:
-----------------------------------------------------------------

Just a general comment: These register set classes seem to be getting very complicated for solving such a simple problem. I wonder if that is really necessary?

Compare Cretonne's model of the arm32 floating point register bank: https://github.com/stoklund/cretonne/blob/master/lib/cretonne/meta/isa/arm32/registers.py

The floating point register sets are represented as 64 bits where each S-register takes up 1 bit, each D-register takes up 2 bits, and each Q-register takes up 4 bits. The S-registers are limited to a count of 32. This representation is both simpler and faster to work with, but you can't distinguish between a register set containing {s2, s3} and one containing {d1}. Do you really need to?


The other general problem with the register sets is the documentation. It isn't there. I assume that these sets started out as simple 1-1 models of the x86 registers and that was "self-documenting". Now that they are growing to represent some kind of general register bank model, I get quite confused. What is the functionality provided by the register sets? What is the underlying register model? How should I use them? What do I need to do to add support for a new kind of register?

::: js/src/jit/arm/Architecture-arm.h
@@ +623,5 @@
> +    s2d = (s2d >> 1 | s2d) & 0x33333333; // 0a0b -> 00ab
> +    s2d = (s2d >> 2 | s2d) & 0x0f0f0f0f; // 00ab00cd -> 0000abcd
> +    s2d = (s2d >> 4 | s2d) & 0x00ff00ff;
> +    s2d = (s2d >> 8 | s2d) & 0x0000ffff;
> +    s2d = s2d << FloatRegisters::TotalSingle;

wat

This needs much more documentation. I have no idea what is going on here. The tests for this algorithm would be a good place for explaining what is going on.

I would prefer parentheses instead of having to look up the relative C operator precedences of | and >>.
(Reporter)

Comment 11

11 months ago
I agree the complexity is substantial in these register sets, which is why I'm always just about ready to abandon them and create my own simpler ones.  What we're looking at here is nbp's attempt to help me avoid that / stop me from doing just that...

For my own uses I do not need to compare register sets - they are just bags of registers - so my answer to the question "you can't distinguish between a register set containing {s2, s3} and one containing {d1}. Do you really need to?" is "no".  I only need the simple API outlined earlier with an implementation backing it that takes aliasing into account.

Isn't the quoted instruction sequence just a population count?  (Don't know why we would not just call a mozilla:: function for that.)
No, when you implement popcount() like that you need to use '+' instead of '|'. This is doing something else ;-)


LLVM also has to handle aliasing registers, and I did some experimentation with different representations of register sets for a register allocator. I looked at three basic methods:

1) Keep a set of the allocated registers. When checking if a register is available, all of its aliases must be checked too. When allocating a register, just add it to the set.

2) Keep a set of unavailable registers. When checking if a register is available, just check if it is in the set. When allocating a register, add all of its aliases to the set too.

3) Keep a set of unavailable "register units". When checking if a register is available, look at all of its units. When allocating a register, mark all of its units as unavailable.

The "register units" in 3) are an abstraction representing the shared bits of registers. Most registers will only have a single unit, but in cases like the ARM32 floating point registers, more units are needed. LLVM switched to the 3) representation because the register alias sets can get quite large when you model the finer details of NEON. It also has performance benefits for general purpose registers because %rsi, %esi, and %si share a single unit. See http://www.llvm.org/docs/doxygen/html/MCRegisterInfo_8h_source.html#l00525


I thought that SpiderMonkey was using method 3) as well, at least for the integer registers. But now I'm wondering if it is just tracking the 32-bit integer registers. Or maybe the difference doesn't matter. For the floating point registers, the model seems to be closer to 1) above.

As the comment in RegisterSets.h points out, there are two different uses for register sets:

a) Keep track of which registers are available for the register allocator.
b) Keep a list of typed registers for safe points. This use case probably doesn't care about register aliasing at all.

I think the simplicity of x86/x64 has tricked us into attempting to use the same representation for these two uses. But when you generalize to more complicated aliasing models like ARM, it doesn't work that well.

I would recommend to use representation 3) for use case a) and representation 1) for use case b). Split it up.
(Assignee)

Comment 13

11 months ago
(In reply to Jakob Stoklund Olesen [:jolesen] from comment #10)
> Just a general comment: These register set classes seem to be getting very
> complicated for solving such a simple problem. I wonder if that is really
> necessary?

Yes, it is.

We had security issues cause by the fact that we were not making a clear distinction between Live and Allocatable sets.  Methods for which were meant to manipulate live sets were used to manipulate allocate sets and the opposite.

This code provide the Allocatable Set and the Live Set, with similar methods, where the only difference is in the small set of functions which compose them (see AllocatableSetAccessors & LiveSetAcccessors)

Maybe this could be simplified, but I think we have to keep the distinction between live and allocatable in SpiderMonkey.

> Compare Cretonne's model of the arm32 floating point register bank:
> https://github.com/stoklund/cretonne/blob/master/lib/cretonne/meta/isa/arm32/
> registers.py
> 
> The floating point register sets are represented as 64 bits where each
> S-register takes up 1 bit, each D-register takes up 2 bits, and each
> Q-register takes up 4 bits. The S-registers are limited to a count of 32.
> This representation is both simpler and faster to work with, but you can't
> distinguish between a register set containing {s2, s3} and one containing
> {d1}. Do you really need to?

One of the concern we have is to encode these register sets as part of the Safepoint, and we need a compact way to represent these live sets.

With the current representation, each typed-register in a live set takes a single bit (S-register & D-register).  Q-register are not yet implemented on ARM32, but the principle remains the same.

> The other general problem with the register sets is the documentation. It
> isn't there. I assume that these sets started out as simple 1-1 models of
> the x86 registers and that was "self-documenting". Now that they are growing
> to represent some kind of general register bank model, I get quite confused.

The problem did not came from x86, but from the addition of Float32 to JavaScript.

> What is the functionality provided by the register sets? What is the
> underlying register model? How should I use them? What do I need to do to
> add support for a new kind of register?

The main functionality is to not be able to mix methods between LiveSet and AllocatableSet.

The register model is that each typed-register is represented by a single bit in a LiveSet.  This way, from the LiveSet, we can distinguish the type associated with the register on x86/x64, and we can distinguish 2 adjacent single registers from 1 double register on ARM.

Allocatable sets are using the same typed-register set to represent the set of available registers.  To properly handle aliasing we have the notion of "alignedAliases".  This is the only thing we need to check in constant time if a register is available or not, and to also add the aliases in the set of available registers.

between %al, %ah, %ax, %eax, %rax.  SpiderMonkey also uses a single bit to represent GPR on x86/x64.

(In reply to Jakob Stoklund Olesen [:jolesen] from comment #10)
> ::: js/src/jit/arm/Architecture-arm.h
> @@ +623,5 @@
> > +    s2d = (s2d >> 1 | s2d) & 0x33333333; // 0a0b -> 00ab
> > +    s2d = (s2d >> 2 | s2d) & 0x0f0f0f0f; // 00ab00cd -> 0000abcd
> > +    s2d = (s2d >> 4 | s2d) & 0x00ff00ff;
> > +    s2d = (s2d >> 8 | s2d) & 0x0000ffff;
> > +    s2d = s2d << FloatRegisters::TotalSingle;
> 
> wat

This function convert the set of allocatable single register set into a set of allocatable double register set.

(In reply to Jakob Stoklund Olesen [:jolesen] from comment #12)
> […] It also has performance benefits
> for general purpose registers because %rsi, %esi, and %si share a single
> unit.

Currently, none of the platforms take advantage of this typed-register model to distinguish GPR, such as
(Assignee)

Comment 14

11 months ago
(In reply to Nicolas B. Pierron [:nbp] from comment #13)
> (In reply to Jakob Stoklund Olesen [:jolesen] from comment #12)
> > […] It also has performance benefits
> > for general purpose registers because %rsi, %esi, and %si share a single
> > unit.
> 
> Currently, none of the platforms take advantage of this typed-register model
> to distinguish GPR, such as …

… between %al, %ah, %ax, %eax, %rax.  SpiderMonkey also uses a single bit to represent GPR on x86/x64.
I agree that we need both the LiveSet and the AllocatableSet. I think that much of the complexity comes from sharing code between the two. The two types have quite different requirements, and each would be quite simple on its own.

One approach would be to let Lars write his own allocatable-set implementation that works for his use case. Then we can delete AllocatableFloatRegisterSet which isn't used anywhere else anyway.

If you want to land this patch, I am not going to oppose it, but I don't think I am qualified to review it. The heavy use of templates and inheritance makes it very hard to figure out without the tribal knowledge of what is supposed to be going on. I would strongly suggest that you try to put some of that knowledge into comments in the code. Some recent comments on comments:

    http://www.ganssle.com/tem/tem319.html#article3
    http://www.ganssle.com/tem/tem320.html#article3

In short, the comments shouldn't say *what* the code is doing, but *why* it is doing it.
(Reporter)

Comment 16

11 months ago
Created attachment 8823982 [details] [diff] [review]
bug1321521-regalloc-v2.patch

This patch removes all uses of the AllocatableRegisterSets in the baseline compiler, not just for float registers but also for GPRs.  Overall this yields clarity and transparent performance characteristics (and fixes the float allocation bug).  I'd like to land this; the float allocation bug must be fixed before the merge window closes.

More could be done here but should be done in followup bugs, if at all:

- The register set information could be computed once, not once per compilation,
  which matters if many compilations are small (and they tend to be), but
  static mutable data has multithreading issues so needs careful thought.
  Also, I think think this might best be done as part of reducing per-compile
  overhead generally.

- It would be nice to handle both the high doubles on ARM and the high singles
  on ARM and MIPS, but this is not crucial - so far, capacity spilling happens
  very rarely, it is only when we start allocating variables to registers
  that this might truly matter.  (And we'll still be a baseline compiler.)
Attachment #8817345 - Attachment is obsolete: true
Attachment #8823982 - Flags: review?(jolesen)
(Reporter)

Comment 17

11 months ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a9a7fe430d98
(Reporter)

Comment 18

11 months ago
Created attachment 8823988 [details] [diff] [review]
bug1321521-arm64-none-addenda.patch

And addenda to make it compile on Arm64 (missing isInvalid, asSingle, and asDouble APIs on FloatRegister) and None (incorrectly defines FloatRegister::Total as zero when this variable can be used as an array length, cf Register::Total in the same file).
Attachment #8823988 - Flags: review?(jolesen)
(Assignee)

Comment 19

11 months ago
Comment on attachment 8821631 [details] [diff] [review]
RegisterSets: Add a register type to getAny and add the equivalent hasAny function.

(In reply to Jakob Stoklund Olesen [:jolesen] from comment #15)
> If you want to land this patch, I am not going to oppose it, but I don't
> think I am qualified to review it. […]

Thanks for letting us know.
Attachment #8821631 - Flags: review?(jolesen) → review?(jdemooij)
(Assignee)

Comment 20

11 months ago
(In reply to Jakob Stoklund Olesen [:jolesen] from comment #15)
> I agree that we need both the LiveSet and the AllocatableSet. I think that
> much of the complexity comes from sharing code between the two. The two
> types have quite different requirements, and each would be quite simple on
> its own.

That was my original expectation as well, except that when I tried to do so I failed at avoiding the duplication of the code.

(In reply to Jakob Stoklund Olesen [:jolesen] from comment #15)
> One approach would be to let Lars write his own allocatable-set
> implementation that works for his use case. Then we can delete
> AllocatableFloatRegisterSet which isn't used anywhere else anyway.

I would prefer to replace code instead of adding more to do the same thing, but if we replace the current code, we should at least ensure that we have feature parity.
(Reporter)

Comment 21

11 months ago
(In reply to Nicolas B. Pierron [:nbp] from comment #20)
> 
> (In reply to Jakob Stoklund Olesen [:jolesen] from comment #15)
> > One approach would be to let Lars write his own allocatable-set
> > implementation that works for his use case. Then we can delete
> > AllocatableFloatRegisterSet which isn't used anywhere else anyway.
> 
> I would prefer to replace code instead of adding more to do the same thing,
> but if we replace the current code, we should at least ensure that we have
> feature parity.

Can you be more specific about what you mean here?  Do you mean, for example, that whatever code I implement should handle the multi-aliasing, or are you thinking about something else?

Comment 22

11 months ago
Comment on attachment 8821631 [details] [diff] [review]
RegisterSets: Add a register type to getAny and add the equivalent hasAny function.

Review of attachment 8821631 [details] [diff] [review]:
-----------------------------------------------------------------

I didn't really follow the discussion in this bug and I'm not that familiar with how these sets work internally. Lars is probably a better reviewer.
Attachment #8821631 - Flags: review?(jdemooij) → review?(lhansen)
(Assignee)

Comment 23

11 months ago
(In reply to Lars T Hansen [:lth] from comment #21)
> (In reply to Nicolas B. Pierron [:nbp] from comment #20)
> > 
> > (In reply to Jakob Stoklund Olesen [:jolesen] from comment #15)
> > > One approach would be to let Lars write his own allocatable-set
> > > implementation that works for his use case. Then we can delete
> > > AllocatableFloatRegisterSet which isn't used anywhere else anyway.
> > 
> > I would prefer to replace code instead of adding more to do the same thing,
> > but if we replace the current code, we should at least ensure that we have
> > feature parity.
> 
> Can you be more specific about what you mean here?  Do you mean, for
> example, that whatever code I implement should handle the multi-aliasing, or
> are you thinking about something else?

I mean that if we intent to use a new register set representation, we should do it as a replacement for everything and not adding a new one for a single use case.
Comment on attachment 8823982 [details] [diff] [review]
bug1321521-regalloc-v2.patch

Review of attachment 8823982 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good to me. I agree that the lost ARM registers are not worth chasing in the baseline compiler.

I think the code organization would be better overall if the two register sets were implemented in separate classes.

The tables mapping between codes and indexes seem superfluous. See the inline comments.

::: js/src/wasm/WasmBaselineCompile.cpp
@@ +683,5 @@
> +    static const uint32_t GprIndexLimit = 32;
> +
> +    uint32_t gprRegs;
> +    uint32_t gprCodeToIndex[Registers::Total];
> +    Register gprIndexToRegister[GprIndexLimit];

It's not completely clear why this second level of indirection is needed. Couldn't you just use register codes to name the bits in gprRegs directly?

In initGprRegisters, you would compute a bitmask of allocatable registers as the initial value for gprRegs. It wouldn't necessarily be a contiguous mask as the one you're computing now.

@@ +706,5 @@
> +            Register r = *iter;
> +            uint32_t index = nextIndex++;
> +            MOZ_ASSERT(index < GprIndexLimit);
> +            gprIndexToRegister[index] = r;
> +            gprCodeToIndex[r.code()] = index;

I think a bounds-checking assertion is in order here too.

@@ +738,5 @@
> +        return gprRegs != 0;
> +    }
> +
> +    bool isAvailable(Register r) {
> +        return (gprRegs & (1 << toRegisterIndex(r))) != 0;

'1u', just because signed arithmetic makes me nervous.

@@ +783,2 @@
>          freeGPR(r);
>          return available;

This part would be clearer if you broke it into a hasTwoGPR() method.

Would be more efficient as hasGPR() && !is_power_of_two(gprRegs).

@@ +822,5 @@
> +    // exposed to the rest of the compiler, but it means we will be suboptimal
> +    // on ARM (where some doubles are not singles and can't be used, and there
> +    // are low and high singles aliasing a double and we can only use one of
> +    // them) and on MIPS (also low and high singles).  The suboptimality is not
> +    // important for the baseline compiler at this time.

Here, I would like to see it spelled out which floating point registers you use on ARM, i.e. d0-d15 and s0, s2, ..., s30.

It takes a lot of detective work to figure out what the asSingle/asDouble stuff does on ARM.

@@ +832,5 @@
> +    static const uint32_t FpuIndexLimit = 32;
> +
> +    uint32_t fpuRegs;
> +    uint32_t fpuCodeToIndex[FloatRegisters::Total];
> +    FloatRegister fpuIndexToRegister[FpuIndexLimit];

fpuIndexToDoubleRegister?

@@ +853,5 @@
> +                uint32_t i = r.numAlignedAliased();
> +                while (i-- > 0) {
> +                    FloatRegister a;
> +                    r.alignedAliased(i, &a);
> +                    if (a.isSingle()) {

You are assuming here that r.asSingle() == a and a.asDouble() == r. It's not obvious how these methods will behave on ARM and MIPS, so an assertion would be in order.

@@ +857,5 @@
> +                    if (a.isSingle()) {
> +                        uint32_t index = nextIndex++;
> +                        MOZ_ASSERT(index < FpuIndexLimit);
> +                        fpuIndexToRegister[index] = r;
> +                        fpuCodeToIndex[r.code()] = index;

Add a bounds check here.

I see that fpuCodeToIndex does provide a translation here since the ARM isDouble() codes don't start at 0. It still bugs me that this couldn't be a simple constant translation instead of a table lookup.

I imagine something like: Capture the code of the first allocatable double register and assert that all other allocatable double registers have codes within 32 from the first one.
Attachment #8823982 - Flags: review?(jolesen) → review+
Comment on attachment 8823988 [details] [diff] [review]
bug1321521-arm64-none-addenda.patch

Review of attachment 8823988 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.

::: js/src/jit/none/Architecture-none.h
@@ +83,5 @@
>      static const char* GetName(Code) { MOZ_CRASH(); }
>      static Code FromName(const char*) { MOZ_CRASH(); }
>  
>      static const Code Invalid = invalid_reg;
> +    static const uint32_t Total = 1;

Deserves a comment.
Attachment #8823988 - Flags: review?(jolesen) → review+
(Assignee)

Comment 26

11 months ago
Comment on attachment 8823982 [details] [diff] [review]
bug1321521-regalloc-v2.patch

Review of attachment 8823982 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/wasm/WasmBaselineCompile.cpp
@@ +749,5 @@
>  
>      Register allocGPR() {
>          MOZ_ASSERT(hasGPR());
> +        uint32_t index = mozilla::CountTrailingZeroes32(gprRegs);
> +        gprRegs &= ~(1 << index);

This code is literally duplicating what the AllocatableGeneralRegisterSet is doing. Any reason to do so?

If you fear any performance issue, I can assure you that all our C++ compilers are inlining through all the templates that we have, even in debug-optimized builds.

@@ +904,5 @@
> +
> +    void allocFloat64(FloatRegister r) {
> +        MOZ_ASSERT(r.isDouble());
> +        MOZ_ASSERT(isAvailable(r));
> +        fpuRegs &= ~(1 << toRegisterIndex(r));

This used to be the old behaviour, and not what you asked me for last time we met.
(Reporter)

Comment 27

11 months ago
(In reply to Nicolas B. Pierron [:nbp] from comment #26)
> Comment on attachment 8823982 [details] [diff] [review]
> bug1321521-regalloc-v2.patch
> 
> Review of attachment 8823982 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/wasm/WasmBaselineCompile.cpp
> @@ +749,5 @@
> >  
> >      Register allocGPR() {
> >          MOZ_ASSERT(hasGPR());
> > +        uint32_t index = mozilla::CountTrailingZeroes32(gprRegs);
> > +        gprRegs &= ~(1 << index);
> 
> This code is literally duplicating what the AllocatableGeneralRegisterSet is
> doing. Any reason to do so?

The purpose of this patch, which has not landed, is to remove the use of AllocatableGeneralRegisterSet completely, so yes, there is a reason for doing this.  (See below.)

> If you fear any performance issue, I can assure you that all our C++
> compilers are inlining through all the templates that we have, even in
> debug-optimized builds.
> 
> @@ +904,5 @@
> > +
> > +    void allocFloat64(FloatRegister r) {
> > +        MOZ_ASSERT(r.isDouble());
> > +        MOZ_ASSERT(isAvailable(r));
> > +        fpuRegs &= ~(1 << toRegisterIndex(r));
> 
> This used to be the old behaviour, and not what you asked me for last time
> we met.

I don't know what you mean by "old behaviour" precisely, but I guess you probably mean that there's a one-to-one correspondence between singles and doubles here.  Yes, that's correct -- see the large block comment above the float register allocator.  While it was always my preference to have access to the full ARM and MIPS register sets, including the high doubles and the high signles, it is not actually a requirement for the baseline compiler.

I sense that you're angry about something.  Let me explain:  I developed this patch in response to Jakob's suggestion that the AllocatableFloatRegisterSets are really not needed if the baseline compiler stops using them.  A natural but not necessary consequence of that was to expand the patch to also avoid using the AllocatableGeneralRegisterSet.

I suspect you misunderstand slightly what I mean about "transparent" performance.  I think most people find the register sets we already have to be extremely hard to understand (I certainly do, and Jakob seems to feel that way too) and though they may in fact perform very well it is far from obvious that they would do so.  Nothing else was implied.

Now.  I am currently cleaning up my patches following Jakob's review, and when I'm done they will sit on this bug until I've had a chance to review your patch and we know what the fate of the latter patch is.  If we can land your patch then fine, I can use that.  If not I have a backup plan.
(Reporter)

Comment 28

11 months ago
Created attachment 8824383 [details] [diff] [review]
bug1321521-arm64-none-addenda-v2.patch

ARM/None adjustments updated.  Carrying r+.
Attachment #8823988 - Attachment is obsolete: true
Attachment #8824383 - Flags: review+
(Reporter)

Comment 29

11 months ago
Created attachment 8824384 [details] [diff] [review]
bug1321521-regalloc-v3.patch

Regalloc, updated to match review comments.  However, I have kept the table indirect even for the GPRs, as I did not feel that the range of the Register's code() was well enough pinned down to know that I could use a uint32_t.  We can discuss that further if it becomes necessary to land this patch.
Attachment #8823982 - Attachment is obsolete: true
Attachment #8824384 - Flags: review+
(Assignee)

Comment 30

11 months ago
(In reply to Lars T Hansen [:lth] from comment #27)
> I sense that you're angry about something.  Let me explain:  I developed
> this patch in response to Jakob's suggestion that the
> AllocatableFloatRegisterSets are really not needed if the baseline compiler
> stops using them.  A natural but not necessary consequence of that was to
> expand the patch to also avoid using the AllocatableGeneralRegisterSet.

Code behind AllocatableFloatRegisterSet is mostly the same as for AllocatableGeneralRegisterSet, which is used all over IonMonkey.

So doing only this work for baseline will only remove a single line from RegisterSets.h.  From my point of view adding this work will lead to only have a custom RegisterSet for WasmBaseline.

Working on this, by only looking at WasmBaseline is from my point of view a wasted effort.  If you were to modify AllocatableFloatRegisterSet to have a nicer interface on top of the existing one, then I would find these patches more appealing.

If on the other hand the "wat" was something like "I do not understand how the register mask are manipulated and how this IndexAllocatable function play a role in this patch" my answer would probably have been better, and this would probably not ended with a suggestion to duplicate what already exists only for a single part of the engine.

> […]  I think most people find the register sets we already have to
> be extremely hard to understand (I certainly do, and Jakob seems to feel
> that way too) and though they may in fact perform very well it is far from
> obvious that they would do so.  Nothing else was implied.

The problem that I have is that I have not seen much questions asking details about the ins and outs of the current system, and why it is designed that way.  On the other hand, what I see, is a sequence of event in a hurry to get patches for a subset of the engine to replace something that we do not understand.

I think this is sad because of the "wat", jumping to conclusion quickly (as inferred from the "wat"), and looking for a solution which does not resolve the problem faced in SpiderMonkey (as being only used in WasmBaseline).
(Reporter)

Comment 31

11 months ago
Comment on attachment 8821631 [details] [diff] [review]
RegisterSets: Add a register type to getAny and add the equivalent hasAny function.

Review of attachment 8821631 [details] [diff] [review]:
-----------------------------------------------------------------

(More to come, these are some preliminary observations, feedback welcome.)

IMO "IndexRegister" and "IndexAllocatable" are not great names, both because "Register" is very generic and in any case what you mean is "Live" (if I understand your overview comment correctly), and because "Index" is usually taken to extract one element from a set, not extract a subset.  "FilterLive" and "FilterAllocatable" might be more appropriate.  If "Live" is too specific, maybe "FilterIndividual"?  I've also tried to come up with a word that indicates that the return valus are sets, not single registers, but nothing comes to mind.

::: js/src/jit/RegisterSets.h
@@ +395,5 @@
> +    static constexpr enum RegTypeName DefaultType = RegType::DefaultType;
> +
> +    template <enum RegTypeName Name>
> +    SetType anyRegisterIndex() const {
> +        return T::template IndexRegister<Name>(bits());

Use bits_ here and below for consistency with the rest of the class?

::: js/src/jit/arm/Architecture-arm.h
@@ +616,5 @@
> +template <> inline VFPRegister::SetType
> +VFPRegister::IndexAllocatable<RegTypeName::Float64>(SetType set)
> +{
> +    // Convert "aabb ccdd eeff gghh" to "abcd efgh".
> +    SetType s2d = IndexAllocatable<RegTypeName::Float32>(set);

This would be clearer if it just used set & AllSingleMask, IMO.

Based on the comment there appears to be an invariant here that the layout must be aabb ..., ie, the bits must be duplicated.  It would be useful to assert that, if we can do so at affordable cost.  But the last line of the function appears to contradict that.  So I'm not sure what to believe.

I can only echo Jakob's comment that we want more documentation here.

It also would seem helpful to copy some of the text from your overview comment into this file, since it documents something that is now not actually documented.  On x86 that's mostly OK because the set representation there is simple, but here it is not.

::: js/src/jit/arm64/Architecture-arm64.h
@@ +433,5 @@
>      }
>  
> +    template <enum RegTypeName = DefaultType>
> +    static SetType IndexRegister(SetType s) {
> +        return SetType(0);

Does this want a MOZ_CRASH("NYI")?  It seems to me you don't want to quietly return 0 here.  Same comment in other files.

::: js/src/jit/x86-shared/Architecture-x86-shared.h
@@ +442,5 @@
> +    static constexpr enum RegTypeName DefaultType = RegTypeName::Float64;
> +
> +    template <enum RegTypeName = DefaultType>
> +    static SetType IndexRegister(SetType s) {
> +        return SetType(0);

MOZ_CRASH("NYI") here?
(Assignee)

Comment 32

11 months ago
(In reply to Lars T Hansen [:lth] from comment #31)
> (More to come, these are some preliminary observations, feedback welcome.)

Thanks for the review :)

> IMO "IndexRegister" and "IndexAllocatable" are not great names, both because
> "Register" is very generic and in any case what you mean is "Live" (if I
> understand your overview comment correctly), and because "Index" is usually
> taken to extract one element from a set, not extract a subset.  "FilterLive"
> and "FilterAllocatable" might be more appropriate.  If "Live" is too
> specific, maybe "FilterIndividual"?  I've also tried to come up with a word
> that indicates that the return valus are sets, not single registers, but
> nothing comes to mind.

"Filter" sounds like a good prefix. I used "index" to meant that these function convert typed-part into an indexable subset.

> ::: js/src/jit/RegisterSets.h
> > +    template <enum RegTypeName Name>
> > +    SetType anyRegisterIndex() const {
> > +        return T::template IndexRegister<Name>(bits());
> 
> Use bits_ here and below for consistency with the rest of the class?

I do not have any strong opinion on this part yet. I can go both ways.

> ::: js/src/jit/arm/Architecture-arm.h
> @@ +616,5 @@
> > +template <> inline VFPRegister::SetType
> > +VFPRegister::IndexAllocatable<RegTypeName::Float64>(SetType set)
> > +{
> > +    // Convert "aabb ccdd eeff gghh" to "abcd efgh".
> > +    SetType s2d = IndexAllocatable<RegTypeName::Float32>(set);
> 
> This would be clearer if it just used set & AllSingleMask, IMO.

I thought of that, but I deliberately chose to use this function here, as this would be what the Vector128 function will look like once it is added.

The Vector128 function would look at the set of allocatable Float64, and do the same bit masks.  Thus, using the Float32 here was intended in order to be symmetrical with the expected Vector128.

> Based on the comment there appears to be an invariant here that the layout
> must be aabb ..., ie, the bits must be duplicated.  It would be useful to
> assert that, if we can do so at affordable cost.  But the last line of the
> function appears to contradict that.  So I'm not sure what to believe.

This is a comment issue, as the first line is made to "and" to 2 single bits.
Thus we have:
  s7.s6.s5.s4. s3.s2.s1.s0. which is converted into (s7 & s6).(s5 & s4).(s3 & s2).(s1 & s0).

Honestly, knowing that if s0 is not present then d0 cannot be present either, we could just shift by one and remove every other bits.

  s7s6s5s4 s3s2s1s0 which is converted into s7s5s3s1


> I can only echo Jakob's comment that we want more documentation here.
>
> It also would seem helpful to copy some of the text from your overview
> comment into this file, since it documents something that is now not
> actually documented.  On x86 that's mostly OK because the set representation
> there is simple, but here it is not.

Ok, I will add the comment about the bit-set representation to the ARM architecture, and repeat it above this bits logic.

> ::: js/src/jit/arm64/Architecture-arm64.h
> @@ +433,5 @@
> >      }
> >  
> > +    template <enum RegTypeName = DefaultType>
> > +    static SetType IndexRegister(SetType s) {
> > +        return SetType(0);
> 
> Does this want a MOZ_CRASH("NYI")?  It seems to me you don't want to quietly
> return 0 here.  Same comment in other files.

I think Filter*<Type> returning 0 means that none are available, and it makes more sense than crashing from my point of view.

Thus, register types which are not yet available such as Vector128, or availble in a given register set such GPR appear as not being available from FloatRegister of ARM.
(Reporter)

Comment 33

11 months ago
(In reply to Nicolas B. Pierron [:nbp] from comment #32)
> (In reply to Lars T Hansen [:lth] from comment #31)
> > (More to come, these are some preliminary observations, feedback welcome.)
> 
> Thanks for the review :)
> 
> > IMO "IndexRegister" and "IndexAllocatable" are not great names, both because
> > "Register" is very generic and in any case what you mean is "Live" (if I
> > understand your overview comment correctly), and because "Index" is usually
> > taken to extract one element from a set, not extract a subset.  "FilterLive"
> > and "FilterAllocatable" might be more appropriate.  If "Live" is too
> > specific, maybe "FilterIndividual"?  I've also tried to come up with a word
> > that indicates that the return valus are sets, not single registers, but
> > nothing comes to mind.
> 
> "Filter" sounds like a good prefix. I used "index" to meant that these
> function convert typed-part into an indexable subset.

Ah ok.  At the risk of bikeshedding this to death that gives me another idea.  I know I like longer names than most other people here, but how about "RegistersAsIndexableSet" (or LiveAsIndexableSet) and "AllocatableAsIndexableSet" then?

> > ::: js/src/jit/RegisterSets.h
> > > +    template <enum RegTypeName Name>
> > > +    SetType anyRegisterIndex() const {
> > > +        return T::template IndexRegister<Name>(bits());
> > 
> > Use bits_ here and below for consistency with the rest of the class?
>
> I do not have any strong opinion on this part yet. I can go both ways.

Then go with bits_ since all the other methods in the class use that.

> > ::: js/src/jit/arm/Architecture-arm.h
> > @@ +616,5 @@
> > > +template <> inline VFPRegister::SetType
> > > +VFPRegister::IndexAllocatable<RegTypeName::Float64>(SetType set)
> > > +{
> > > +    // Convert "aabb ccdd eeff gghh" to "abcd efgh".
> > > +    SetType s2d = IndexAllocatable<RegTypeName::Float32>(set);
> > 
> > This would be clearer if it just used set & AllSingleMask, IMO.
> 
> I thought of that, but I deliberately chose to use this function here, as
> this would be what the Vector128 function will look like once it is added.
> 
> The Vector128 function would look at the set of allocatable Float64, and do
> the same bit masks.  Thus, using the Float32 here was intended in order to
> be symmetrical with the expected Vector128.

Fair enough.

> > Based on the comment there appears to be an invariant here that the layout
> > must be aabb ..., ie, the bits must be duplicated.  It would be useful to
> > assert that, if we can do so at affordable cost.  But the last line of the
> > function appears to contradict that.  So I'm not sure what to believe.
> 
> This is a comment issue, as the first line is made to "and" to 2 single bits.
> Thus we have:
>   s7.s6.s5.s4. s3.s2.s1.s0. which is converted into (s7 & s6).(s5 & s4).(s3
> & s2).(s1 & s0).

Right.

> Honestly, knowing that if s0 is not present then d0 cannot be present
> either, we could just shift by one and remove every other bits.
> 
>   s7s6s5s4 s3s2s1s0 which is converted into s7s5s3s1

OK, that's what I suspected.

> > I can only echo Jakob's comment that we want more documentation here.
> >
> > It also would seem helpful to copy some of the text from your overview
> > comment into this file, since it documents something that is now not
> > actually documented.  On x86 that's mostly OK because the set representation
> > there is simple, but here it is not.
> 
> Ok, I will add the comment about the bit-set representation to the ARM
> architecture, and repeat it above this bits logic.
> 
> > ::: js/src/jit/arm64/Architecture-arm64.h
> > @@ +433,5 @@
> > >      }
> > >  
> > > +    template <enum RegTypeName = DefaultType>
> > > +    static SetType IndexRegister(SetType s) {
> > > +        return SetType(0);
> > 
> > Does this want a MOZ_CRASH("NYI")?  It seems to me you don't want to quietly
> > return 0 here.  Same comment in other files.
> 
> I think Filter*<Type> returning 0 means that none are available, and it
> makes more sense than crashing from my point of view.
> 
> Thus, register types which are not yet available such as Vector128, or
> availble in a given register set such GPR appear as not being available from
> FloatRegister of ARM.

Yes, I understand that; my issue is really that no action that the register allocator can make will free up any registers of the desired type, and unless the register allocator knows, at a higher level, whether there are any registers of the desired type then there is a (granted, slight) chance of ilooping.

But I'm fine with what you have, my temperament is simply that if something is not available then always crash at the low level when possible, and make policy decisions possible at a higher level.

(Will finish review shortly.)
(Reporter)

Comment 34

11 months ago
Comment on attachment 8821631 [details] [diff] [review]
RegisterSets: Add a register type to getAny and add the equivalent hasAny function.

Review of attachment 8821631 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks.  I'm not going to swear that I understand all the template magic but I believe the patch.

r+ with documentation and naming issues addressed, not just what we talked about before (IndexRegister etc) but also a new one here: I'm not fond of the use of the word "any", as discussed further below.

BTW there's this comment in Architecture-arm.h:

// ARM doesn't have more than 32 registers. Don't take more bits than we'll
// need. Presently, we don't have plans to address the upper and lower
// halves of the double registers seprately, so 5 bits should suffice. If we
// do decide to address them seprately (vmov, I'm looking at you), we will
// likely specify it as a separate field.

With your change I think this comment is incorrect, is that not so?  I believe 5 bits is still sufficient, because `kind` tracks the register type and we don't use the high doubles (and even then 5 bits may be OK), but the information about not using the high singles is probably not right.  Arguably the comment was already incorrect, since numAliased() and aliased() already seem to account for the high singles, and those functions are used by the backtracking allocator.  Can you look into this and clean it up, if appropriate, before landing?

::: js/src/jit/RegisterSets.h
@@ +542,5 @@
>  
>  // Base accessors classes have the minimal set of raw methods to manipulate the register set
>  // given as parameter in a consistent manner.  These methods are:
>  //
> +//    - any<Type>: Returns a bit-set of all the register of a specific type

"any" does not seem like a great name for what is effectively an "all" function, esp since hasAny() means "has at least one".  Could we use "all" here and in similar places (quite a lot of them, actually)?  If you feel "all" is misleading since it's really about all the registers present in the set, how about "members" or "present"?

@@ +594,5 @@
>      }
>  
> +    template <enum RegTypeName Name>
> +    bool hasAny(RegType reg) const {
> +        return bool(any<Name>());

A matter of taste, obviously, but any<Name>() != 0 is clearer IMO.

@@ +771,5 @@
>  
>    public:
>      DEFINE_ACCESSOR_CONSTRUCTORS_(SpecializedRegSet)
>  
> +  public:

Redundant, unless it's a style thing.

@@ +792,5 @@
>      }
>  
> +    template <enum RegTypeName Name>
> +    bool hasAny() const {
> +        return !!Parent::template any<Name>();

Again, "Parent::template any<Name>() != 0" is IMO clearer, but in any case it's good to be consistent, so if you insist on using cast-to-bool please pick !!x as here or bool(x) as used elsewhere in this patch and use the same form everywhere.

@@ +882,5 @@
>  
>    public:
>      DEFINE_ACCESSOR_CONSTRUCTORS_(SpecializedRegSet)
>  
> +  public:

Redundant, unless it's a style thing.

::: js/src/jit/arm/Architecture-arm.h
@@ +619,5 @@
> +    // Convert "aabb ccdd eeff gghh" to "abcd efgh".
> +    SetType s2d = IndexAllocatable<RegTypeName::Float32>(set);
> +    static_assert(FloatRegisters::TotalSingle == 32, "Wrong mask");
> +    s2d = ((0xcccccccc & s2d) >> 1) & s2d; // 0a0b
> +    s2d = (s2d >> 1 | s2d) & 0x33333333; // 0a0b -> 00ab

I agree with Jakob that parens around the shift operators is desirable.
Attachment #8821631 - Flags: review?(lhansen) → review+
(Assignee)

Comment 35

11 months ago
Comment on attachment 8821631 [details] [diff] [review]
RegisterSets: Add a register type to getAny and add the equivalent hasAny function.

Review of attachment 8821631 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/RegisterSets.h
@@ +624,5 @@
> +        return set_.gprs().anyRegisterIndex<Name>();
> +    }
> +    template <enum RegTypeName Name>
> +    FloatRegisterSet::SetType anyFpu() const {
> +        return set_.fpus().anyRegisterIndex<Name>();

self-nit: These should be anyAllocatable<Name> (s/…/allAllocatable<Name>) as they are in the AllocatableSetAccessors class.
I will fix this before landing.
(Assignee)

Comment 36

10 months ago
(In reply to Lars T Hansen [:lth] from comment #34)
> BTW there's this comment in Architecture-arm.h:
> 
> // ARM doesn't have more than 32 registers. Don't take more bits than we'll
> // need. Presently, we don't have plans to address the upper and lower
> // halves of the double registers seprately, so 5 bits should suffice. If we
> // do decide to address them seprately (vmov, I'm looking at you), we will
> // likely specify it as a separate field.
> 
> With your change I think this comment is incorrect, is that not so?

Definitely, this comments pre-date the introduction of Single registers to our ARM backend.
(Assignee)

Comment 37

10 months ago
Created attachment 8826592 [details] [diff] [review]
nit & comments interdiff.

This should be all, unless I forgot one of the previous remark.
Assignee: lhansen → nicolas.b.pierron
Attachment #8826592 - Flags: feedback?(lhansen)
(Reporter)

Comment 38

10 months ago
Comment on attachment 8826592 [details] [diff] [review]
nit & comments interdiff.

Review of attachment 8826592 [details] [diff] [review]:
-----------------------------------------------------------------

Nice work, I think there's a bug (as discussed on IRC) but otherwise good to go.

::: js/src/jit/arm/Architecture-arm.h
@@ +651,5 @@
> +    //                     ^                 ^ ^                                     ^
> +    //                     '-- d15      d0 --' '-- s31                          s0 --'
> +    //
> +    //     ...0...00... : s{2n}, s{2n+1} and d{n} are not available
> +    //     ...1...01... : s{2n} is available (*)

Probably remove the (*) here because the footnote is not here.

@@ +657,5 @@
> +    //     ...1...11... : s{2n}, s{2n+1} and d{n} are available
> +    //
> +    // The goal of this function is to return the set of double registers which
> +    // are available as an indexable bit set. This implies that iff a double bit
> +    // is set, then the register is available.

"is set in the returned set", to make sure we don't confuse it with the input.

@@ +663,5 @@
> +    // To do so, this functions converts the 32 bits set of single registers
> +    // into a 16 bits set of equivalent double registers. Then, we mask out
> +    // double registers which do not have all the single register that compose
> +    // them. As d{n} bit is set when s{2n} is available, we only need to take
> +    // s{2n+1} into account.

Very nice.

@@ +671,2 @@
>      static_assert(FloatRegisters::TotalSingle == 32, "Wrong mask");
> +    s2d = (0xcccccccc & s2d) >> 1; // Filter s{2n+1} registers.

I'm guessing 0xAAAAAAAA here.
Attachment #8826592 - Flags: feedback?(lhansen) → feedback+

Comment 39

10 months ago
Pushed by npierron@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d7e148db2e85
RegisterSets: Add a register type to getAny and add the equivalent hasAny function. r=lth

Comment 40

10 months ago
Various builds are failing to build with failures like https://treeherder.mozilla.org/logviewer.html#?job_id=68870774&repo=mozilla-inbound

Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/a55032f495c4
Flags: needinfo?(nicolas.b.pierron)
(Assignee)

Comment 41

10 months ago
The previous patch had 2 issues:
 - arm64 had a bad assertion about not having aliases, which contradict the rest of the arm64 implementation, and caused the arm64 failures.
 - Windows compilers had a problem with qualified enums[1]. I replaced all "enum RegTypeName" by "RegTypeName" type name.

This seems to fix the issues encountered on treeherder:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d78b0c986f1691787b540743b814bda5327ba951

[1] https://connect.microsoft.com/VisualStudio/feedback/details/771978/compiler-error-c3431-using-elaborated-type-specifiers-for-scoped-enumerations
Flags: needinfo?(nicolas.b.pierron)

Comment 42

10 months ago
Pushed by npierron@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/54d34c024268
RegisterSets: Add a register type to getAny and add the equivalent hasAny function. r=lth

Comment 43

10 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/54d34c024268
Status: ASSIGNED → RESOLVED
Last Resolved: 10 months ago
status-firefox53: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
(Assignee)

Comment 44

10 months ago
Lars, do we need to backport this patch, to 52 or older, for the WasmBaseline compiler?
Flags: needinfo?(lhansen)
(Reporter)

Comment 45

10 months ago
No, the baseline compiler goes live only on FF54, until then it's preffed-off everywhere and we can live with this problem on FF52 and older.  (It's good to have it on FF53 though - and now we do :-)
Flags: needinfo?(lhansen)
(Reporter)

Comment 46

10 months ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/2c924fb664c18fb3c17bdbf6fc4d808b9107b87a
Bug 1321521 - register set adjustments for 'none' platform, r=me
(Reporter)

Comment 47

10 months ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/97c45b13a1654bce114ace1de67c5aacb12a75fd
Bug 1321521 - fix include order.  r=me

Comment 48

10 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/2c924fb664c1
https://hg.mozilla.org/mozilla-central/rev/97c45b13a165
You need to log in before you can comment on or make changes to this bug.