Closed Bug 1254500 Opened 8 years ago Closed 8 years ago

IonMonkey: MIPS: Implement CtzI and PopcntI

Categories

(Core :: JavaScript Engine: JIT, defect)

Other
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla48
Tracking Status
firefox48 --- fixed

People

(Reporter: hev, Assigned: hev)

Details

Attachments

(3 files, 3 obsolete files)

Implement CtzI on MIPS.
Summary: IonMonkey: MIPS: Implement CodeGeneratorMIPSShared::visitCtzI → IonMonkey: MIPS: Implement CtzI and PopcntI
Attachment #8728213 - Flags: review?(mbebenita)
Attachment #8728214 - Flags: review?(mbebenita)
Attachment #8728213 - Flags: review?(arai.unmht)
Attachment #8728214 - Flags: review?(arai.unmht)
Can you give me some context and some description about the code?
Flags: needinfo?(r)
(In reply to Tooru Fujisawa [:arai] from comment #3)
> Can you give me some context and some description about the code?

Ctz(count trailing zeros):
https://en.wikipedia.org/wiki/Find_first_set
https://bugzilla.mozilla.org/show_bug.cgi?id=1251140

Popcnt(count number of bits set to 1):
https://en.wikipedia.org/wiki/SSE4#POPCNT_and_LZCNT
https://bugzilla.mozilla.org/show_bug.cgi?id=1251392

Thank you!
Flags: needinfo?(r)
Comment on attachment 8728213 [details] [diff] [review]
0001-Bug-1254500-IonMonkey-MIPS-Implement-CodeGeneratorMI.patch

Review of attachment 8728213 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/mips-shared/CodeGenerator-mips-shared.cpp
@@ +929,5 @@
> +    masm.ma_and(output, input);
> +    masm.as_clz(output, output);
> +    masm.ma_negu(scratch, output);
> +    masm.ma_addu(scratch, Imm32(0x1f));
> +    masm.as_movn(output, scratch, input);

What algorithm are these code using?
Flags: needinfo?(r)
(In reply to Tooru Fujisawa [:arai] from comment #5)
> Comment on attachment 8728213 [details] [diff] [review]
> 0001-Bug-1254500-IonMonkey-MIPS-Implement-CodeGeneratorMI.patch
> 
> Review of attachment 8728213 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/jit/mips-shared/CodeGenerator-mips-shared.cpp
> @@ +929,5 @@
> > +    masm.ma_and(output, input);
> > +    masm.as_clz(output, output);
> > +    masm.ma_negu(scratch, output);
> > +    masm.ma_addu(scratch, Imm32(0x1f));
> > +    masm.as_movn(output, scratch, input);
> 
> What algorithm are these code using?

output = clz(-input & input)
if (0 != input)
    output = -output + 31
Flags: needinfo?(r)
Comment on attachment 8728213 [details] [diff] [review]
0001-Bug-1254500-IonMonkey-MIPS-Implement-CodeGeneratorMI.patch

Review of attachment 8728213 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/mips-shared/CodeGenerator-mips-shared.cpp
@@ +929,5 @@
> +    masm.ma_and(output, input);
> +    masm.as_clz(output, output);
> +    masm.ma_negu(scratch, output);
> +    masm.ma_addu(scratch, Imm32(0x1f));
> +    masm.as_movn(output, scratch, input);

input is useRegisterAtStart, so, iiuc, output register can alias input register, and after overwriting output, input won't have original value.
can this be re-written to avoid using output as temporary register?

https://dxr.mozilla.org/mozilla-central/rev/f0c0480732d36153e8839c7f17394d45f679f87d/js/src/jit/Lowering.cpp#1401
> void
> LIRGenerator::visitCtz(MCtz* ins)
> {
>     MDefinition* num = ins->num();
> 
>     LCtzI* lir = new(alloc()) LCtzI(useRegisterAtStart(num));
>     define(lir, ins);
> }
Attachment #8728213 - Flags: review?(arai.unmht)
(In reply to Tooru Fujisawa [:arai] from comment #7)
> Comment on attachment 8728213 [details] [diff] [review]
> 0001-Bug-1254500-IonMonkey-MIPS-Implement-CodeGeneratorMI.patch
> 
> Review of attachment 8728213 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/jit/mips-shared/CodeGenerator-mips-shared.cpp
> @@ +929,5 @@
> > +    masm.ma_and(output, input);
> > +    masm.as_clz(output, output);
> > +    masm.ma_negu(scratch, output);
> > +    masm.ma_addu(scratch, Imm32(0x1f));
> > +    masm.as_movn(output, scratch, input);
> 
> input is useRegisterAtStart, so, iiuc, output register can alias input
> register, and after overwriting output, input won't have original value.
> can this be re-written to avoid using output as temporary register?
> 
> https://dxr.mozilla.org/mozilla-central/rev/
> f0c0480732d36153e8839c7f17394d45f679f87d/js/src/jit/Lowering.cpp#1401
> > void
> > LIRGenerator::visitCtz(MCtz* ins)
> > {
> >     MDefinition* num = ins->num();
> > 
> >     LCtzI* lir = new(alloc()) LCtzI(useRegisterAtStart(num));
> >     define(lir, ins);
> > }

Nice, Thank you!
Attachment #8728213 - Attachment is obsolete: true
Attachment #8728213 - Flags: review?(mbebenita)
Attachment #8730201 - Flags: review?(arai.unmht)
Comment on attachment 8730201 [details] [diff] [review]
0001-Bug-1254500-IonMonkey-MIPS-Implement-CodeGeneratorMI.patch

Review of attachment 8730201 [details] [diff] [review]:
-----------------------------------------------------------------

Scratch registers are supposed to be used in MacroAssembler.  sorry, I should've noticed it in previous review.
CodeGenerator should use temp() registers (passed by Lowering) instead, for temporary register purpose (CodeGeneratorARM::visitCtzI uses scratch tho...)

So, either:

  (a) move whole ctz code or some meaningful parts into MacroAssembler method(s), and use scratch registers there, and call those methods from CodeGenerator
  (b) request one or more temp() registers in Lowering (maybe separating LIRGenerator::visitCtz into arch specific ones)

with (a), or even with current code, please use ScratchRegisterScope and SecondScratchRegisterScope to make sure there is no overlap usage.
Attachment #8730201 - Flags: review?(arai.unmht) → feedback+
Comment on attachment 8728214 [details] [diff] [review]
0002-Bug-1254500-IonMonkey-MIPS-Implement-CodeGeneratorMI.patch

Review of attachment 8728214 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/mips-shared/CodeGenerator-mips-shared.cpp
@@ +941,5 @@
> +
> +    // Equivalent to GCC output of mozilla::CountPopulation32()
> +    Register tmp = ToRegister(ins->temp());
> +
> +    masm.ma_sra(tmp, input, Imm32(1));

why this and some of following use ms_sra instead of ms_srl ?
the sign bit is masked later or always 0 in all cases tho, is ma_sra preferred than ma_srl for some reason?

@@ +943,5 @@
> +    Register tmp = ToRegister(ins->temp());
> +
> +    masm.ma_sra(tmp, input, Imm32(1));
> +    masm.ma_and(tmp, Imm32(0x55555555));
> +    masm.as_subu(output, input, tmp);

input is useRegisterAtStart, so temp could alias input, and input could already be clobbered before as_subu.
Attachment #8728214 - Flags: review?(arai.unmht) → feedback+
Comment on attachment 8728214 [details] [diff] [review]
0002-Bug-1254500-IonMonkey-MIPS-Implement-CodeGeneratorMI.patch

Review of attachment 8728214 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/mips-shared/CodeGenerator-mips-shared.cpp
@@ +943,5 @@
> +    Register tmp = ToRegister(ins->temp());
> +
> +    masm.ma_sra(tmp, input, Imm32(1));
> +    masm.ma_and(tmp, Imm32(0x55555555));
> +    masm.as_subu(output, input, tmp);

I've been hearing conflicting answers to this question, see: http://logs.glob.uno/?c=mozilla%23jsapi&s=26+Feb+2016&e=26+Feb+2016&h=useRegisterAtStart#c704087 I am not sure who is right, I've been assuming that temp() registers cannot alias input and output registers.
interesting :)

http://logs.glob.uno/?c=mozilla%23ionmonkey&s=14+Mar+2016&e=14+Mar+2016#c118267

(In reply to Jan de Mooij [:jandem] from bug 1248289 comment #4)
> Using *AtStart means it's okay if one of the temps or the output get
> assigned the same register as the operand. I think this should be
> useRegister, as the code uses the operands after writing to temps.

ni?ing jandem and nbp,
what's the correct understanding about temp, input, output, and *AtStart ?
Flags: needinfo?(nicolas.b.pierron)
Flags: needinfo?(jdemooij)
(In reply to Tooru Fujisawa [:arai] from comment #11)
> Comment on attachment 8728214 [details] [diff] [review]
> 0002-Bug-1254500-IonMonkey-MIPS-Implement-CodeGeneratorMI.patch
> 
> Review of attachment 8728214 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/jit/mips-shared/CodeGenerator-mips-shared.cpp
> @@ +941,5 @@
> > +
> > +    // Equivalent to GCC output of mozilla::CountPopulation32()
> > +    Register tmp = ToRegister(ins->temp());
> > +
> > +    masm.ma_sra(tmp, input, Imm32(1));
> 
> why this and some of following use ms_sra instead of ms_srl ?
> the sign bit is masked later or always 0 in all cases tho, is ma_sra
> preferred than ma_srl for some reason?
> 
This algorithm is copied from ARM. ;)
https://dxr.mozilla.org/mozilla-central/rev/f0c0480732d3/js/src/jit/arm/CodeGenerator-arm.cpp#971

> @@ +943,5 @@
> > +    Register tmp = ToRegister(ins->temp());
> > +
> > +    masm.ma_sra(tmp, input, Imm32(1));
> > +    masm.ma_and(tmp, Imm32(0x55555555));
> > +    masm.as_subu(output, input, tmp);
> 
> input is useRegisterAtStart, so temp could alias input, and input could
> already be clobbered before as_subu.

I don't clear the registers allocation, From two tests, looks output could alias input (test 1 & 2), but temp could *not* alias input or output (test 2):
Test 1:
Block 0
[2,3 AsmJSParameter] [def v1<i>:a0]
[4,5 CtzI] [def v2<i>] [use v1:r]
[6,7 CtzI] [def v3<i>] [use v2:r]
[8,9 AsmJSReturn] [use v3:v0]

[RegAlloc] Beginning grouping and queueing registers
Live ranges by virtual register:
  v1 [3,5) (def) v1:r@4
  v2 [5,7) (def) v2:r@6
  v3 [7,10) (def) v3:v0@9

Live ranges by bundle:
  v1 [3,5) (def) v1:r@4
  v2 [5,7) (def) v2:r@6
  v3 [7,10) (def) v3:v0@9
[RegAlloc] Beginning main allocation loop
[RegAlloc] Allocating  v3 [7,10) (def) v3:v0@9 [priority 3] [weight 1333]
[RegAlloc]   Requirement v0, due to use at 9
[RegAlloc]   allocated to v0
[RegAlloc] Allocating  v2 [5,7) (def) v2:r@6 [priority 2] [weight 2000]
[RegAlloc]   allocated to v0
[RegAlloc] Allocating  v1 [3,5) (def) v1:r@4 [priority 2] [weight 2000]
[RegAlloc]   Requirement a0, fixed by definition
[RegAlloc]   allocated to a0
[RegAlloc] Main allocation loop complete
Allocations:
Live ranges by virtual register:
  v1 [3,5) a0 (def) v1:r@4
  v2 [5,7) v0 (def) v2:r@6
  v3 [7,10) v0 (def) v3:v0@9

Live ranges by bundle:
  v1 [3,5) a0 (def) v1:r@4
  v2 [5,7) v0 (def) v2:r@6
  v3 [7,10) v0 (def) v3:v0@9
Allocations by physical register:
  v0: v2 [5,7) v0 (def) v2:r@6 / v3 [7,10) v0 (def) v3:v0@9
  a0: v1 [3,5) a0 (def) v1:r@4

[RegAlloc] Resolving control flow (vreg loop)
[RegAlloc] Resolving control flow (block loop)
[RegAlloc] Reifying Allocations
[RegAlloc] Populating Safepoints
Register Allocation Integrity State:

Block 0
[2,3 AsmJSParameter] [def v1<i>:a0]
[4,5 CtzI] [def v2<i>:v0] [use v1:r a0]
[6,7 CtzI] [def v3<i>:v0] [use v2:r v0]
[8,9 AsmJSReturn] [use v3:v0 v0]

Test 2:
Block 0
[2,3 AsmJSParameter] [def v1<i>:a0]
[4,5 PopcntI] [def v3<i>] [temp v2<g>] [use v1:r]
[6,7 PopcntI] [def v5<i>] [temp v4<g>] [use v3:r]
[8,9 AsmJSReturn] [use v5:v0]

[RegAlloc] Beginning grouping and queueing registers
Live ranges by virtual register:
  v1 [3,5) (def) v1:r@4
  v2 [4,6) (def)
  v3 [5,7) (def) v3:r@6
  v4 [6,8) (def)
  v5 [7,10) (def) v5:v0@9

Live ranges by bundle:
  v1 [3,5) (def) v1:r@4
  v2 [4,6) (def)
  v3 [5,7) (def) v3:r@6
  v4 [6,8) (def)
  v5 [7,10) (def) v5:v0@9
[RegAlloc] Beginning main allocation loop
[RegAlloc] Allocating  v5 [7,10) (def) v5:v0@9 [priority 3] [weight 1333]
[RegAlloc]   Requirement v0, due to use at 9
[RegAlloc]   allocated to v0
[RegAlloc] Allocating  v3 [5,7) (def) v3:r@6 [priority 2] [weight 2000]
[RegAlloc]   allocated to v0
[RegAlloc] Allocating  v1 [3,5) (def) v1:r@4 [priority 2] [weight 2000]
[RegAlloc]   Requirement a0, fixed by definition
[RegAlloc]   allocated to a0
[RegAlloc] Allocating  v2 [4,6) (def) [priority 2] [weight 1000000]
[RegAlloc]   v0 collides with  v3 [5,7) v0 (def) v3:r@6 [weight 2000]
[RegAlloc]   allocated to v1
[RegAlloc] Allocating  v4 [6,8) (def) [priority 2] [weight 1000000]
[RegAlloc]   v0 collides with  v3 [5,7) v0 (def) v3:r@6 [weight 2000]
[RegAlloc]   allocated to v1
[RegAlloc] Main allocation loop complete
Allocations:
Live ranges by virtual register:
  v1 [3,5) a0 (def) v1:r@4
  v2 [4,6) v1 (def)
  v3 [5,7) v0 (def) v3:r@6
  v4 [6,8) v1 (def)
  v5 [7,10) v0 (def) v5:v0@9

Live ranges by bundle:
  v1 [3,5) a0 (def) v1:r@4
  v2 [4,6) v1 (def)
  v3 [5,7) v0 (def) v3:r@6
  v4 [6,8) v1 (def)
  v5 [7,10) v0 (def) v5:v0@9
Allocations by physical register:
  v0: v3 [5,7) v0 (def) v3:r@6 / v5 [7,10) v0 (def) v5:v0@9
  v1: v2 [4,6) v1 (def) / v4 [6,8) v1 (def)
  a0: v1 [3,5) a0 (def) v1:r@4

[RegAlloc] Resolving control flow (vreg loop)
[RegAlloc] Resolving control flow (block loop)
[RegAlloc] Reifying Allocations
[RegAlloc] Populating Safepoints
Register Allocation Integrity State:

Block 0
[2,3 AsmJSParameter] [def v1<i>:a0]
[4,5 PopcntI] [def v3<i>:v0] [temp v2 v2<g>:v1] [use v1:r a0]
[6,7 PopcntI] [def v5<i>:v0] [temp v4 v4<g>:v1] [use v3:r v0]
[8,9 AsmJSReturn] [use v5:v0 v0]
Currently, We can't use ScratchRegisterScope on MIPS, because AutoRegisterScope is indeed too strict. https://bugzilla.mozilla.org/show_bug.cgi?id=1205524
Attachment #8730201 - Attachment is obsolete: true
Attachment #8730511 - Flags: review?(arai.unmht)
Comment on attachment 8730511 [details] [diff] [review]
0001-Bug-1254500-IonMonkey-MIPS-Implement-ma_ctz.-r-arai.patch

Review of attachment 8730511 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Heiher [:hev] from comment #15)
> Currently, We can't use ScratchRegisterScope on MIPS, because
> AutoRegisterScope is indeed too strict.
> https://bugzilla.mozilla.org/show_bug.cgi?id=1205524

okay, in that case I agree leaving it to the bug :)
Attachment #8730511 - Flags: review?(arai.unmht) → review+
Attachment #8730512 - Flags: review?(arai.unmht) → review+
(In reply to Tooru Fujisawa [:arai] from comment #13)
> interesting :)
> 
> http://logs.glob.uno/
> ?c=mozilla%23ionmonkey&s=14+Mar+2016&e=14+Mar+2016#c118267
> 
> (In reply to Jan de Mooij [:jandem] from bug 1248289 comment #4)
> > Using *AtStart means it's okay if one of the temps or the output get
> > assigned the same register as the operand. I think this should be
> > useRegister, as the code uses the operands after writing to temps.
> 
> ni?ing jandem and nbp,
> what's the correct understanding about temp, input, output, and *AtStart ?

Here is a graphical representation of the different allocations, in terms of Ranges.

                inputOf ---.      .---- outputOf
                           v      v

                           +      +
      Live Range    - -----+------+----- -
                           +      +

      UseAtStart    - ------------+

      Temp                 +------+

      Use           - ------------------ -

      Define                      +----- -

                           ^      ^
               snapshot ---'      '--- callVM


I guess this would be better as a comment somewhere in the code …
Flags: needinfo?(nicolas.b.pierron)
(In reply to Tooru Fujisawa [:arai] from comment #13)
> what's the correct understanding about temp, input, output, and *AtStart ?

(1) Any output or temp registers don't alias each other.

(2) If you use *AtStart for an operand, an output or temp register might be the same as the operand's register.

(3) If you DON'T use *AtStart, the input will always be different from any temp or output registers.

Does that help?
Flags: needinfo?(jdemooij)
Thanks :)

(In reply to Nicolas B. Pierron [:nbp] from comment #18)
> Here is a graphical representation of the different allocations, in terms of
> Ranges.
> 
>                 inputOf ---.      .---- outputOf
>                            v      v
> 
>                            +      +
>       Live Range    - -----+------+----- -
>                            +      +
> 
>       UseAtStart    - ------------+
> 
>       Temp                 +------+
> 
>       Use           - ------------------ -
> 
>       Define                      +----- -
> 
>                            ^      ^
>                snapshot ---'      '--- callVM

does it mean that if 2 ranges have no overlap, they can be assigned to same register? (it doesn't match to jandem's comment tho...)
what if 2 ranges have some overlap?


(In reply to Jan de Mooij [:jandem] from comment #19)
> (In reply to Tooru Fujisawa [:arai] from comment #13)
> > what's the correct understanding about temp, input, output, and *AtStart ?
> 
> (1) Any output or temp registers don't alias each other.
> 
> (2) If you use *AtStart for an operand, an output or temp register might be
> the same as the operand's register.
> 
> (3) If you DON'T use *AtStart, the input will always be different from any
> temp or output registers.
> 
> Does that help?

So, in PopcntI's case, rule(2) is applied to input (operand) and temp, and input may be clobbered by writing into temp, right?
https://dxr.mozilla.org/mozilla-central/rev/d6ee82b9a74155b6bfd544166f036fc572ae8c56/js/src/jit/Lowering.cpp#1410
> void
> LIRGenerator::visitPopcnt(MPopcnt* ins)
> {
>     MDefinition* num = ins->num();
> 
>     LPopcntI* lir = new(alloc()) LPopcntI(useRegisterAtStart(num), temp());
>     define(lir, ins);
> }
Hm when I added AtStart, for LSRA, I think it used to be that they could alias either outputs or temps.

Looking at the current backtracking code suggests AtStart operands can only alias defs but not temps (since temps start at inputOf and defs start at outputOf).

I wonder if that's by design and/or worth changing. If we want to keep the current behavior, we should update this comment:

https://dxr.mozilla.org/mozilla-central/rev/d6ee82b9a74155b6bfd544166f036fc572ae8c56/js/src/jit/shared/Lowering-shared.h#70-85
Flags: needinfo?(bhackett1024)
UsedAtStart:
https://dxr.mozilla.org/mozilla-central/source/js/src/jit/BacktrackingAllocator.cpp#690-692,705

Temp:
https://dxr.mozilla.org/mozilla-central/source/js/src/jit/BacktrackingAllocator.cpp#628,642-643

Use:
https://dxr.mozilla.org/mozilla-central/source/js/src/jit/BacktrackingAllocator.cpp#693,705

Define:
https://dxr.mozilla.org/mozilla-central/source/js/src/jit/BacktrackingAllocator.cpp#598,614

So I guess, the only error in the previous comment was about Use and UseAtStart:


                inputOf ---.      .---- outputOf
                           v      v

                           +      +
      Live Range    - -----+------+----- -
                           +      +

      UseAtStart    - -----+

      Temp                 +------+

      Use           - ------------+

      Define                      +----- -

                           ^      ^
               snapshot ---'      '--- callVM
(In reply to Tooru Fujisawa [:arai] from comment #20)
> does it mean that if 2 ranges have no overlap, they can be assigned to same
> register? (it doesn't match to jandem's comment tho...)
> what if 2 ranges have some overlap?

After discussing with jandem on irc, The "+" do overlap.
Thus they cannot have the same register allocation.
thanks, so, now, only UseAtStart and Define can alias, right?
Note that there is a comment about use vs useAtStart in Lowering-shared.h, but I agree it's hard to find and doesn't cover it all... Maybe time to update it in a separate bug?

https://dxr.mozilla.org/mozilla-central/source/js/src/jit/shared/Lowering-shared.h#70-85
Attachment #8728214 - Attachment is obsolete: true
Attachment #8728214 - Flags: review?(mbebenita)
Attachment #8732680 - Flags: review?(arai.unmht)
Comment on attachment 8732680 [details] [diff] [review]
0003-Bug-1254500-IonMonkey-MIPS-Implement-CodeGeneratorMI.patch

Review of attachment 8732680 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks!
Attachment #8732680 - Flags: review?(arai.unmht) → review+
https://hg.mozilla.org/mozilla-central/rev/95ac2b55d296
https://hg.mozilla.org/mozilla-central/rev/1a39a59c9103
https://hg.mozilla.org/mozilla-central/rev/03c09c9919fd
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
(In reply to Jan de Mooij [:jandem] from comment #21)
> Hm when I added AtStart, for LSRA, I think it used to be that they could
> alias either outputs or temps.
> 
> Looking at the current backtracking code suggests AtStart operands can only
> alias defs but not temps (since temps start at inputOf and defs start at
> outputOf).
> 
> I wonder if that's by design and/or worth changing. If we want to keep the
> current behavior, we should update this comment:
> 
> https://dxr.mozilla.org/mozilla-central/rev/
> d6ee82b9a74155b6bfd544166f036fc572ae8c56/js/src/jit/shared/Lowering-shared.
> h#70-85

I'm pretty sure I just aped LSRA's behavior when adding the constraints for the various operand types.  Allowing AtStart operands to alias temps would improve the quality of the allocation but would require auditing and probably changes to code generation I think.
Flags: needinfo?(bhackett1024)
You need to log in before you can comment on or make changes to this bug.