Open
Bug 771864
Opened 12 years ago
Updated 2 years ago
IonMonkey: split out effective address computations in x[v + i] accesses
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
NEW
People
(Reporter: bhackett1024, Unassigned)
References
(Blocks 2 open bugs)
Details
(Whiteboard: [ion:t])
Attachments
(2 obsolete files)
Right now, to perform the actual memory access in an x[v + i] computation within a loop (x and v loop invariant), IM (and JM) need to generate code like: // $rx holds 'x->elements', loop invariant // $rv holds 'v', loop invariant add $rv $ri mov ($rx,$ri,2) $res It would be faster to generate code like: // $rv holds 'x->elements + (v*4)', loop invariant mov ($rx,$ri,2) $res Making this transformation has several benefits: - One less register needed for loop invariant things - One less register needed for (v + i) - No add instruction The attached patch is a WIP (works on x86 on one testcase) to make this transform, by allowing the generation of EffectiveAddress nodes that are internal pointers into a typed array. (the patch also includes the fix for bug 771835). On this testcase: M = new Int32Array(1024 * 1024); function foo(v, len) { var res = 0; for (var i = 0; i < len; i++) res += M[v + i]; } for (var i = 0; i < 1000; i++) foo(1024, 1024 * 100); I get these times: --ion -n (base): 260 --ion -n (with just this): 260 --ion -n (with bug 771835): 224 --ion -n (with both): 187 -m -n: 222 So this doesn't have an effect when we can't hoist the bounds check (expected, since the bounds check still needs to do the v + i), but a good one when the check can be hoisted. JM+TI doesn't have either this or bug 771835 applied, but is still faster than baseline IM (and as fast as with bug 771835 applied). I didn't look into this in detail, but looking at the code now being generated by IM there are inefficiencies. The inner loop (interrupt check elided) with both bugs applied looks like: cmpl %ecx, %esi jge ((319)) movl 0(%ebx,%esi,2), %edx movl %eax, %ebp addl %edx, %ebp jo ((332)) movl %esi, %edx addl $0x1, %edx jo ((343)) movl %edx, %esi movl %ebp, %eax jmp ((352)) The two moves before the adds look like work needed to keep around both inputs in case an add overflows. JM doesn't do this --- if the add overflows it undoes the add by subtracting the intact operand. I don't know what's up with the two moves at the end, maybe they're related or maybe the regalloc is just being weird. There's little register pressure here and it should be possible to just generate this body: cmpl %ecx, %esi jge ((319)) movl 0(%ebx,%esi,2), %edx addl %edx, %ebp jo ((332)) addl $0x1, %esi jo ((343)) jmp ((352)) JM can hoist the second jo, can port that to IM (it's an extension of bounds check hoisting). I think there's some instruction fusion potential here, the mov + add could potentially be replaced with a single addl 0(%ebx,%esi,2), %ebp After that I think this would be an optimal straight line compilation (the first jo can't be removed but emscripten can insert a |0 to kill it). Then there's unrolling......
Reporter | ||
Comment 1•12 years ago
|
||
Assignee: general → bhackett1024
Attachment #640015 -
Attachment is obsolete: true
Attachment #640356 -
Flags: review?(dvander)
Comment on attachment 640356 [details] [diff] [review] patch (dc54ec097421) Review of attachment 640356 [details] [diff] [review]: ----------------------------------------------------------------- It looks like we overlay the effective addresses with the elements field expected on the L*TypedArray ops. That seems okay, but there should be a comment in LIR-Common.h for those ops that the field may either be an effective address or an elements base pointer. (And perhaps, asserts elsewhere that were removed should just be extended to include the opcode.) There are a lot of changes to MIR instructions in this patch that add an extra operand for a bounds check. Is that part of this patch? If not, it should be in a separate bug. Otherwise, could you explain what that's for? ::: js/src/ion/IonBuilder.cpp @@ +4474,5 @@ > +IonBuilder::getTypedArrayEffectiveAddress(MInstruction *elements, MDefinition *id, int arrayType, > + MInstruction **pNewElements, MDefinition **pNewId, int32 *pOffset) > +{ > + /* XXX no lea on ARM? */ > +#ifndef JS_CPU_ARM This should have ARM parity. I think you're right that ARM doesn't have a direct LEA equivalent, but it has a scratch register so we should be able to implement a computeAddress(). @@ +4489,5 @@ > + */ > + > + size_t scale = TypedArray::slotWidth(arrayType); > + > + do { Can we find a way to refactor this or split it into separate functions so the |do-while (0)| pattern isn't needed? ::: js/src/ion/LICM.h @@ +133,5 @@ > // check are not loop invariant, we analyze the tests and increments done > // in the loop to try to find a stronger condition which can be hoisted. > > + LoopReturn tryHoistBoundsCheck(MBoundsCheck *ins, MTest *test, BranchDirection direction, > + InstructionQueue *pHoistedChecks); Is this part of this patch? ::: js/src/ion/LIR-Common.h @@ +2219,5 @@ > > // Load a typed value from a typed array's elements vector. > class LLoadTypedArrayElement : public LInstructionHelper<1, 2, 1> > { > + int32 offset_; Instead of adding this field, add a mir() helper that returns mir_->toStoreBlah and then use mir()->offset(). @@ +2274,5 @@ > }; > > class LStoreTypedArrayElement : public LInstructionHelper<0, 3, 0> > { > + int32 offset_; Ditto. ::: js/src/ion/Lowering.cpp @@ +725,5 @@ > if (ins->specialization() == MIRType_Int32) { > JS_ASSERT(lhs->type() == MIRType_Int32); > + > + if (!ins->hasUses()) > + return true; What is this for? ::: js/src/ion/MIR.cpp @@ +989,5 @@ > +MBinaryArithInstruction::inferInteger() > +{ > + specialization_ = MIRType_Int32; > + setResultType(MIRType_Int32); > +} Not part of this patch, right? ::: js/src/ion/MIR.h @@ +3168,5 @@ > + } > + > + bool congruentTo(MDefinition *const &ins) const { > + if (ins->isEffectiveAddress()) { > + MEffectiveAddress *nins = ins->toEffectiveAddress(); nit: Instead, make this early return when (ins->op() != op()) ::: js/src/ion/shared/Assembler-shared.h @@ +173,5 @@ > BaseIndex() { PodZero(this); } > + > + // XXX is this correct for all platforms? Also, is this correct for any platform? > + static int32 minOffset() { return INT16_MIN; } > + static int32 maxOffset() { return INT16_MAX; } What is this restriction? ::: js/src/ion/x64/MacroAssembler-x64.h @@ +481,5 @@ > > + void loadEffectiveAddress(const Address &src, Register dest) { > + lea(Operand(src), dest); > + } > + void loadEffectiveAddress(const BaseIndex &src, Register dest) { nit: rename to "computeAddress", here and on other platforms
Attachment #640356 -
Flags: review?(dvander)
Reporter | ||
Updated•12 years ago
|
Attachment #640356 -
Attachment is obsolete: true
Reporter | ||
Updated•12 years ago
|
Assignee: bhackett1024 → general
Updated•12 years ago
|
Whiteboard: [ion:t]
Assignee | ||
Updated•10 years ago
|
Assignee: general → nobody
Updated•2 years ago
|
Severity: normal → S3
You need to log in
before you can comment on or make changes to this bug.
Description
•