Closed Bug 706328 Opened 13 years ago Closed 12 years ago

IonMonkey: Handle LSetElement for Out-of-Bounds Index.

Categories

(Core :: JavaScript Engine, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: sstangl, Assigned: jandem)

References

(Blocks 1 open bug)

Details

Attachments

(1 file, 5 obsolete files)

Ion performance in SS 1.0 bitops-nsieve-bits is about twice as slow as in the interpreter. The reason for this is that we bail out almost immediately at the first site of LBoundsCheck: the InitializedLength opcode reports a length of 0x27, which is quite an understatement.

The MIR, LIR, and regalloc all look correct.
Apparently "InitializedLength" means "the current (dense) length of the array", not "the length to which the array was initialized". So the value of 0x27 is actually correct.

This means that nsieve-bits needs a stub call in order to expand the array -- and that *nearly every* LSetElement in this benchmark will take that stub call.

An obvious change to support this behavior would be to make LBoundsCheck an input to LSetElement.
Summary: IonMonkey: nsieve-bits bails out unnecessarily. → IonMonkey: Handle LSetElement for Out-of-Bounds Index.
Yeah we should have an OOL path to handle the index == initializedLength case like JM.
Blocks: IonSpeed
Assignee: general → jdemooij
Status: NEW → ASSIGNED
I looked a bit into this today, and there are a number of different ways to fix this.

For each jsop_setelem, we have analysis info to determine whether this setelem has been used to write holes in the past. I think we should use this info to handle two common use cases:

1) setelem is used to overwrite non-hole values (index is always < initializedLength).

2) setelem is used to initialized an array by overwriting holes, for example:
--
var a = Array(n);
for (var i=0; i<n; i++)
    a[i] = 0;
--
Every iteration we have to bump the initializedLength value, and sometimes we have to reallocate the elements vector.

We currently handle case 1, but we need to handle the second case too. For the first case we use the following MIR instructions:

- MElements
- MInitializedLength
- MBoundsCheck (always bails out if this fails)
- MStoreElement

All but the last instruction can be hoisted (LICM) or eliminated (GVN). I think we should keep these instructions to handle the first case, and do something else for the second case.

To handle the second case, we could add 2 new LIR instructions, LStoreElementHole{T,V} (or something similar). These instructions check the initialized length and have an OOL path to bump the initializedLength or call a StoreElement stub. (note that with this approach, we don't need MInitializedLength and MBoundsCheck for case 2)

I think this is the best approach, but there are two alternatives:

1) Add MBoundsCheckAndBranch (or something) like MBoundsCheck, but instead of bailing out it branches to the OOL path of MStoreElement.

2) Add MUpdateInitializedLength and use it instead of MInitializedLength + MBoundsCheck.

These approaches have the following disadvantages:

- We need to jump from one LIR instruction to another LIR instruction (for instance, LBoundsCheckAndBranch -> LStoreElement OOL path). This is complicated because an LMoveGroup may be inserted between the bounds check and the store element. We can't just jump over the move group.

- The OOL path of LStoreElement{T,V} needs access to the JSObject, not just the elements array, for case 2.
Attached patch WIP (obsolete) — Splinter Review
This seems to work quite well. For the following micro-benchmark:
--
var a = [];
for (var i=0; i<10000000; i++) {
    a[i] = i;
}
--
I get these numbers:

IM+TI (new):  82 ms
JM+TI      : 108 ms
IM+TI (old): 814 ms

TODO: remove duplicate code, cleanup, x64/ARM support.

pushValueArg and related functions were copied from the patch in bug 713526, I will remove them once Brian's patch lands.
Attached patch WIP 2 (obsolete) — Splinter Review
Adds x64 support, removes a lot of duplicate code, moves StoreElement{V,T} to CodeGenerator (and have it call platform-dependent methods).
Attachment #584761 - Attachment is obsolete: true
Attached patch Patch (obsolete) — Splinter Review
Basically done, but depends on the patch for bug 713526. When it lands I can cleanup this patch and ask for review.
Attachment #584817 - Attachment is obsolete: true
Attached patch Patch (obsolete) — Splinter Review
Makes access-nsieve 9x faster (1 ms slower than JM+TI, but looking at the generated code, there's still room for improvement).

This patch does not help bitops-nsieve-bits. I suspect that we hit bug 705302 and then block OSR. Once that's fixed, this patch should help there too.
Attachment #584963 - Attachment is obsolete: true
Attachment #589139 - Flags: review?(dvander)
Please ignore the VM call. If we box the int32 index, we can just call the generic stub I'm going to add in bug 718982. Later we could also add a SetElementDense stub, but the generic function is okay for now.
Attachment #589139 - Flags: review?(dvander)
Depends on: 718982
Attached patch Patch (obsolete) — Splinter Review
Rebased + ARM support.
Attachment #589139 - Attachment is obsolete: true
Attachment #591760 - Flags: review?(dvander)
Comment on attachment 591760 [details] [diff] [review]
Patch

Probably best if I wait for bug 706328, should be able to simplify StoreElementV a bit with the BaseIndex type added there.
Attachment #591760 - Flags: review?(dvander)
(In reply to Jan de Mooij (:jandem) from comment #10)
> Comment on attachment 591760 [details] [diff] [review]
> Patch
> 
> Probably best if I wait for bug 706328, should be able to simplify
> StoreElementV a bit with the BaseIndex type added there.

Bug 706328 is this bug.
(In reply to Paul Wright from comment #11)
> 
> Bug 706328 is this bug.

Oops right, bug 721280.
Attached patch PatchSplinter Review
Rebased on top of the patches in bug 721280, use BaseIndex for StoreElementV..
Attachment #591760 - Attachment is obsolete: true
Attachment #592257 - Flags: review?(dvander)
Comment on attachment 592257 [details] [diff] [review]
Patch

Review of attachment 592257 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/CodeGenerator.cpp
@@ +936,5 @@
> +    const ValueOperand value = ToValue(lir, LStoreElementHoleV::Value);
> +
> +    // OOL path if index >= initializedLength.
> +    Address initLength(elements, ObjectElements::offsetOfInitializedLength());
> +    Int32Key key = index->isConstant() ? Int32Key(ToInt32(index)) : Int32Key(ToRegister(index));

This pattern appears a few times; should we have a ToInt32Key()?

@@ +980,5 @@
> +
> +    // If index == initializedLength, try to bump the initialized length inline.
> +    // If index > initializedLength, call a stub.
> +    Label callStub;
> +    masm.j(Assembler::NotEqual, &callStub);

Comment here that it relies on the condition flags sticking from the incoming branch.

@@ +1004,5 @@
> +    masm.bumpKey(&key, -1);
> +
> +    // The inline path for StoreElementHoleT does not always store the type tag,
> +    // so we have to make sure the new element has a valid type tag.
> +    if (ins->isStoreElementHoleT() && valueType != MIRType_Double)

This seems okay for x86/arm, but on x64 we probably just want to always box the first set, and box any further pointer sets.

@@ +1021,5 @@
> +        pushArg(*index->toConstant());
> +    else
> +        pushArg(TypedOrValueRegister(MIRType_Int32, ToAnyRegister(index)));
> +    pushArg(object);
> +    if (!callVM(Info, ins))

There's no risk about |elements| changing because it is necessarily dead after this instruction, right? (because of the alias set)

::: js/src/ion/Lowering.cpp
@@ +958,5 @@
> +      case MIRType_Double:
> +        lir = new LStoreElementHoleT(useRegister(ins->object()),
> +                                     useRegister(ins->elements()),
> +                                     useRegisterOrConstant(ins->index()),
> +                                     useRegister(ins->value()));

I wonder if we should have useRegisterOrNonDoubleConstant() or something to avoid this duplication.

::: js/src/ion/arm/MacroAssembler-arm.h
@@ +534,5 @@
> +    void branch32(Condition cond, const Address &address, Register reg, Label *label) {
> +        move32(address, ScratchRegister);
> +        branch32(cond, ScratchRegister, reg, label);
> +    }
> +    void branch32(Condition cond, const Address &address, Imm32 imm, Label *label) {

super nit: name parameters for these two lhs/rhs

::: js/src/ion/shared/Assembler-x86-shared.h
@@ +450,5 @@
>            default:
>              JS_NOT_REACHED("unexpected operand kind");
>          }
>      }
> +    void cmpl(const Operand &op, const Register &reg) {

super nit: name these lhs and rhs

::: js/src/ion/x86/MacroAssembler-x86.h
@@ +179,5 @@
>      void pushValue(const Value &val) {
>          jsval_layout jv = JSVAL_TO_IMPL(val);
>          push(Imm32(jv.s.tag));
> +        if (val.isGCThing())
> +            push(ImmGCPtr(reinterpret_cast<gc::Cell *>(val.toGCThing())));

Good catch.
Attachment #592257 - Flags: review?(dvander) → review+
Depends on: 721280
http://hg.mozilla.org/projects/ionmonkey/rev/550a780f73ae

(In reply to David Anderson [:dvander] from comment #14)
> 
> This seems okay for x86/arm, but on x64 we probably just want to always box
> the first set, and box any further pointer sets.

I changed the OOL path to just call storeElementTyped and rejoin to the end of the inline path, so that we no longer need initElementType.
 
> 
> There's no risk about |elements| changing because it is necessarily dead
> after this instruction, right? (because of the alias set)

Right, the AliasSet::ObjectFields flag prevents this.

> 
> I wonder if we should have useRegisterOrNonDoubleConstant() or something to
> avoid this duplication.

Added this function and removed the duplicate useRegister(...) calls by moving them to the top of the function.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.