Open Bug 771864 Opened 12 years ago Updated 2 years ago

IonMonkey: split out effective address computations in x[v + i] accesses

Categories

(Core :: JavaScript Engine, defect)

Other Branch
x86
macOS
defect

Tracking

()

People

(Reporter: bhackett1024, Unassigned)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [ion:t])

Attachments

(2 obsolete files)

Attached patch WIP (dc54ec097421) (obsolete) — Splinter Review
Right now, to perform the actual memory access in an x[v + i] computation within a loop (x and v loop invariant), IM (and JM) need to generate code like:

// $rx holds 'x->elements', loop invariant
// $rv holds 'v', loop invariant
add $rv $ri
mov ($rx,$ri,2) $res

It would be faster to generate code like:

// $rv holds 'x->elements + (v*4)', loop invariant
mov ($rx,$ri,2) $res

Making this transformation has several benefits:

- One less register needed for loop invariant things
- One less register needed for (v + i)
- No add instruction

The attached patch is a WIP (works on x86 on one testcase) to make this transform, by allowing the generation of EffectiveAddress nodes that are internal pointers into a typed array.  (the patch also includes the fix for bug 771835).  On this testcase:

M = new Int32Array(1024 * 1024);

function foo(v, len) {
  var res = 0;
  for (var i = 0; i < len; i++)
    res += M[v + i];
}

for (var i = 0; i < 1000; i++)
  foo(1024, 1024 * 100);

I get these times:

--ion -n (base): 260
--ion -n (with just this): 260
--ion -n (with bug 771835): 224
--ion -n (with both): 187
-m -n: 222

So this doesn't have an effect when we can't hoist the bounds check (expected, since the bounds check still needs to do the v + i), but a good one when the check can be hoisted.

JM+TI doesn't have either this or bug 771835 applied, but is still faster than baseline IM (and as fast as with bug 771835 applied).  I didn't look into this in detail, but looking at the code now being generated by IM there are inefficiencies.  The inner loop (interrupt check elided) with both bugs applied looks like:

cmpl       %ecx, %esi
jge        ((319))
movl       0(%ebx,%esi,2), %edx
movl       %eax, %ebp
addl       %edx, %ebp
jo         ((332))
movl       %esi, %edx
addl       $0x1, %edx
jo         ((343))
movl       %edx, %esi
movl       %ebp, %eax
jmp        ((352))

The two moves before the adds look like work needed to keep around both inputs in case an add overflows.  JM doesn't do this --- if the add overflows it undoes the add by subtracting the intact operand.  I don't know what's up with the two moves at the end, maybe they're related or maybe the regalloc is just being weird.  There's little register pressure here and it should be possible to just generate this body:

cmpl       %ecx, %esi
jge        ((319))
movl       0(%ebx,%esi,2), %edx
addl       %edx, %ebp
jo         ((332))
addl       $0x1, %esi
jo         ((343))
jmp        ((352))

JM can hoist the second jo, can port that to IM (it's an extension of bounds check hoisting).  I think there's some instruction fusion potential here, the mov + add could potentially be replaced with a single

addl 0(%ebx,%esi,2), %ebp

After that I think this would be an optimal straight line compilation (the first jo can't be removed but emscripten can insert a |0 to kill it).  Then there's unrolling......
Attached patch patch (dc54ec097421) (obsolete) — Splinter Review
Assignee: general → bhackett1024
Attachment #640015 - Attachment is obsolete: true
Attachment #640356 - Flags: review?(dvander)
Blocks: IonMonkey
Comment on attachment 640356 [details] [diff] [review]
patch (dc54ec097421)

Review of attachment 640356 [details] [diff] [review]:
-----------------------------------------------------------------

It looks like we overlay the effective addresses with the elements field expected on the L*TypedArray ops. That seems okay, but there should be a comment in LIR-Common.h for those ops that the field may either be an effective address or an elements base pointer. (And perhaps, asserts elsewhere that were removed should just be extended to include the opcode.)

There are a lot of changes to MIR instructions in this patch that add an extra operand for a bounds check. Is that part of this patch? If not, it should be in a separate bug. Otherwise, could you explain what that's for?

::: js/src/ion/IonBuilder.cpp
@@ +4474,5 @@
> +IonBuilder::getTypedArrayEffectiveAddress(MInstruction *elements, MDefinition *id, int arrayType,
> +                                          MInstruction **pNewElements, MDefinition **pNewId, int32 *pOffset)
> +{
> +    /* XXX no lea on ARM? */
> +#ifndef JS_CPU_ARM

This should have ARM parity. I think you're right that ARM doesn't have a direct LEA equivalent, but it has a scratch register so we should be able to implement a computeAddress().

@@ +4489,5 @@
> +     */
> +
> +    size_t scale = TypedArray::slotWidth(arrayType);
> +
> +    do {

Can we find a way to refactor this or split it into separate functions so the |do-while (0)| pattern isn't needed?

::: js/src/ion/LICM.h
@@ +133,5 @@
>      // check are not loop invariant, we analyze the tests and increments done
>      // in the loop to try to find a stronger condition which can be hoisted.
>  
> +    LoopReturn tryHoistBoundsCheck(MBoundsCheck *ins, MTest *test, BranchDirection direction,
> +                                   InstructionQueue *pHoistedChecks);

Is this part of this patch?

::: js/src/ion/LIR-Common.h
@@ +2219,5 @@
>  
>  // Load a typed value from a typed array's elements vector.
>  class LLoadTypedArrayElement : public LInstructionHelper<1, 2, 1>
>  {
> +    int32 offset_;

Instead of adding this field, add a mir() helper that returns mir_->toStoreBlah and then use mir()->offset().

@@ +2274,5 @@
>  };
>  
>  class LStoreTypedArrayElement : public LInstructionHelper<0, 3, 0>
>  {
> +    int32 offset_;

Ditto.

::: js/src/ion/Lowering.cpp
@@ +725,5 @@
>      if (ins->specialization() == MIRType_Int32) {
>          JS_ASSERT(lhs->type() == MIRType_Int32);
> +
> +        if (!ins->hasUses())
> +            return true;

What is this for?

::: js/src/ion/MIR.cpp
@@ +989,5 @@
> +MBinaryArithInstruction::inferInteger()
> +{
> +    specialization_ = MIRType_Int32;
> +    setResultType(MIRType_Int32);
> +}

Not part of this patch, right?

::: js/src/ion/MIR.h
@@ +3168,5 @@
> +    }
> +
> +    bool congruentTo(MDefinition *const &ins) const {
> +        if (ins->isEffectiveAddress()) {
> +            MEffectiveAddress *nins = ins->toEffectiveAddress();

nit: Instead, make this early return when (ins->op() != op())

::: js/src/ion/shared/Assembler-shared.h
@@ +173,5 @@
>      BaseIndex() { PodZero(this); }
> +
> +    // XXX is this correct for all platforms? Also, is this correct for any platform?
> +    static int32 minOffset() { return INT16_MIN; }
> +    static int32 maxOffset() { return INT16_MAX; }

What is this restriction?

::: js/src/ion/x64/MacroAssembler-x64.h
@@ +481,5 @@
>  
> +    void loadEffectiveAddress(const Address &src, Register dest) {
> +        lea(Operand(src), dest);
> +    }
> +    void loadEffectiveAddress(const BaseIndex &src, Register dest) {

nit: rename to "computeAddress", here and on other platforms
Attachment #640356 - Flags: review?(dvander)
Attachment #640356 - Attachment is obsolete: true
Assignee: bhackett1024 → general
Assignee: general → nobody
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: