Closed Bug 746691 Opened 12 years ago Closed 12 years ago

IonMonkey: Don't throw code away on every GC

Categories

(Core :: JavaScript Engine, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
blocking-kilimanjaro -

People

(Reporter: billm, Assigned: sstangl)

References

Details

Attachments

(2 files, 5 obsolete files)

Right now generated code gets discarded on every GC. This causes jitter in animations. Instead, we should detect which GCs are happening during animations and not throw code away. There are a few pieces to this:

1. We need a simple heuristic to decide when to throw away code. A really simple one to start with is the GC reason. If it's gcreason::ALLOC_TRIGGER or gcreason::MAYBEGC, then we'll keep the code. Otherwise we'll throw it away. Eventually we can do something fancier, but not in this bug.

2. We need to not throw the code away. Hopefully this is a simple matter of commenting out a few lines.

3. IonMonkey needs to reset all ICs whenever a GC happens. This is because shapes can get moved around or reallocated.

4. In order to avoid throwing away TI data, we need to simulate what happens when the activeAnalysis flag is set. This means calling JSCompartment::markTypes and not doing the TI sweeping in JSCompartment::sweep.
blocking-kilimanjaro: --- → ?
Target Milestone: --- → mozilla14
Version: unspecified → Trunk
Dave, this bug is IonMonkey-specific - so target milestone should probably be later.
(In reply to David Anderson [:dvander] from comment #1)
> Dave, this bug is IonMonkey-specific - so target milestone should probably
> be later.

Good point. Not sure what I was thinking there.
Target Milestone: mozilla14 → ---
IM isn't in scope for initial k9o, so neither is this.
blocking-kilimanjaro: ? → -
Assignee: general → sstangl
Assignee: sstangl → general
Nabbing for real this time. Marty OK'd.
Assignee: general → sstangl
Attached patch Preserve Ion code across GC, v1. (obsolete) — Splinter Review
Work for heuristics and TI preservation was already done as part of Bug 750834. Consequently, the patch for Ion is relatively simple: it only needs to clear Ion caches and hit write barriers when compartment->needsBarrier_ is set.

JM+TI maintains different versions of code, either including or excluding write barriers, based on GC state at time of compilation. v8 patches write barrier jumps around GC. Before implementing either of those more complicated solutions, I wrote a small amount of code that just does a dynamic test in IonMacroAssembler::emitPreBarrier() against needsBarrier_, and branches away from the write barrier code if it's unnecessary. I anticipated measuring moderate slowdown, which would then be used to justify the complexity of a patching solution.

Surprisingly, there was extremely little slowdown. Sunspider and Kraken were entirely unaffected; v8 got slightly slower overall, but only by about 20ms out of a 1.3s testsuite. Measurements were done with just the extra dynamic tests in place; the rest of the patch that actually preserves Ion code across GC was not included.

That appears to be great news for complexity and memory. Having no justification to adopt one of the more complicated solutions, I've included the (not so handsome; it uses the stack to get a temporary) dynamic-checking code as part of this patch.

Passes all tests, even with ShouldPreserveJITCode() modified to always return true, except for some Debugger+GC-related failures that also occur when the same change is made on m-c tip. So no new failures, at least.
Attachment #624593 - Flags: review?(dvander)
Dvander suggested to test the patch with the v8 harness, and indeed, it tells a different story than the SS harness.

[sstangl@winhill benchmarks]$ ~/dev/ionmonkey/js/src/opt32/js --no-jm run.js

>master:

Richards: 9595
DeltaBlue: 7663
Crypto: 13354
RayTrace: 880
EarleyBoyer: 710
RegExp: 532
Splay: 9844
NavierStokes: 476
----
Score (version 7): 2501

>with run-time barriers:

Richards: 9009
DeltaBlue: 6909
Crypto: 13347
RayTrace: 885
EarleyBoyer: 703
RegExp: 535
Splay: 9787
NavierStokes: 483
----
Score (version 7): 2452

>with full patch:

Richards: 9066
DeltaBlue: 6896
Crypto: 13297
RayTrace: 858
EarleyBoyer: 691
RegExp: 535
Splay: 10007
NavierStokes: 484
----
Score (version 7): 2446
It is possible and/or extremely likely that the major cause of the regression is instead from disabling array.pop() inlining (Bug 739572) -- that depended on !needsBarrier(), and I had forgotten to revisit it with a run-time check. Looking into that.
Attachment #624593 - Flags: review?(dvander)
Lacking ARM support. Emits patchable write barriers that can be toggled between (e.g.) |JMP eip+0x5| and |CMP eax, $0x5| by changing a single instruction byte. The number of such patchable locations per-script is extremely low (max 7 on V8, average ~= 0), so they are remembered in an uncompressed buffer of CodeOffsetLabels.

> v8 before patch:

Richards: 9224
DeltaBlue: 7742
Crypto: 13155
RayTrace: 988
EarleyBoyer: 2275
RegExp: 539
Splay: 9820
NavierStokes: 502
----
Score (version 7): 2942

> v8 after patch:

Richards: 9210
DeltaBlue: 7656
Crypto: 13502
RayTrace: 1026
EarleyBoyer: 2348
RegExp: 542
Splay: 9917
NavierStokes: 494
----
Score (version 7): 2972
Same patch, but with dynamic checks against needsBarrier_ instead of patching:

Richards: 8744
DeltaBlue: 7656
Crypto: 13197
RayTrace: 960
EarleyBoyer: 2294
RegExp: 535
Splay: 10211
NavierStokes: 486
----
Score (version 7): 2912
Fixes failures under the gc/ folder that exist on tip. Works on x64.

Just needs ARM support.
Attachment #624593 - Attachment is obsolete: true
Attachment #626622 - Attachment is obsolete: true
This needs to be applied on top of Sean's preexisting patches.
Attachment #627953 - Flags: review?(Jacob.Bramley)
Attached patch Preserve Ion code across GC. (obsolete) — Splinter Review
Doesn't include Marty's ARM patch.
Attachment #626985 - Attachment is obsolete: true
Attachment #628172 - Flags: review?(dvander)
Comment on attachment 628172 [details] [diff] [review]
Preserve Ion code across GC.

Review of attachment 628172 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/IonCaches.cpp
@@ +330,5 @@
>  
> +void
> +IonCache::reset()
> +{
> +    PatchJump(initialJump_, cacheLabel_);

\o/ so much simpler than JM ICs!

::: js/src/ion/IonMacroAssembler.h
@@ +333,4 @@
>              branch32(cond, length, Imm32(key.constant()), label);
>      }
>  
> +    void branchTestNeedsBarrierFlag(Condition cond, const Register &scratch, Label *label) {

nit: This function might read better without the "Flag"

@@ +337,5 @@
> +        JS_ASSERT(cond == Zero || cond == NonZero);
> +        JSCompartment *comp = GetIonContext()->cx->compartment;
> +        movePtr(ImmWord(comp), scratch);
> +        Address needsBarrierAddr(scratch, JSCompartment::OffsetOfNeedsBarrier());
> +        branchTest32(cond, needsBarrierAddr, Imm32(0x1), label);

needsBarrier_ is a bool, not a JSBool... could that be a problem?

@@ +343,3 @@
>  
> +    template <typename T>
> +    void callPreBarrier(const T &address, MIRType type) {

This should probably be private now? Just to make sure no one is using it.

@@ +378,5 @@
> +        // TODO: align() outputs a bunch of HLT instructions that must be jumped over.
> +        // We could have it output a bunch of NOPs instead.
> +        jump(&done);
> +
> +        // TODO: IonCode aligns instructions to 8, not 16.

nit: remove :TODO:s

::: js/src/ion/shared/Assembler-x86-shared.h
@@ +454,4 @@
>          }
>          return j;
>      }
> +    // Comparison of EAX against the address given by a Label.

nit: Space above comment, and above public:

::: js/src/ion/shared/CodeGenerator-shared.h
@@ +172,5 @@
>          return index;
>      }
>  
> +    void addPreBarrierOffset(CodeOffsetLabel offset) {
> +        barrierOffsets_.append(offset);

This should be:

   enoughMemory &= barrierOffsets_.append(offset);

::: js/src/jscompartment.cpp
@@ +106,5 @@
>          mjit::ClearAllFrames(this);
> +
> +# ifdef JS_ION
> +        /* Toggle write barrier activation within Ion code. */
> +        for (CellIterUnderGC i(this, FINALIZE_SCRIPT); !i.done(); i.next()) {

Would be better to break this out into an ion::ToggleBarriers function.
Attachment #628172 - Flags: review?(dvander) → review+
Comment on attachment 627953 [details] [diff] [review]
Completely untested ARM port of x86 patch

Review of attachment 627953 [details] [diff] [review]:
-----------------------------------------------------------------

I'm giving r-, but only because some changes I've asked for should get reviewed again. (Specifically, those that require bit patterns and suchlike to be represented. It's easy to make mistakes in such code, and sometimes hard to spot them.)

I don't know how it all links together with the non-ARM patch, so I've only really reviewed the instruction-fiddling part. That is, I've checked that the functions do what they say they do, but I haven't checked that their usage is correct since I assume that was covered in the review for the non-ARM patch.

> Completely untested ARM port of x86 patch

Of course, it needs to be tested :-)

::: js/src/ion/CodeGenerator.cpp
@@ +2135,5 @@
>          script->ion->copySafepoints(&safepoints_);
>  
>      linkAbsoluteLabels();
> +
> +#ifdef JS_CPU_ARM

I'm not really sure about how this bit works, or why it is only needed for ARM. Could you add a comment here to explain?

@@ +2137,5 @@
>      linkAbsoluteLabels();
> +
> +#ifdef JS_CPU_ARM
> +        bool needsBarrier = cx->compartment->needsBarrier();
> +        script->ion->toggleBarriers(needsBarrier);

The indentation is wrong. (We don't usually indent code inside #ifdef blocks, unless I'm mistaken.)

::: js/src/ion/Ion.cpp
@@ -570,5 @@
>  void
>  IonScript::copyPrebarrierEntries(const CodeOffsetLabel *barriers, MacroAssembler &masm)
>  {
>      memcpy(prebarrierList(), barriers, numPrebarriers() * sizeof(CodeOffsetLabel));
> -

I preferred it with the blank line.

@@ +574,1 @@
>      // On ARM, the saved offset may be wrong due to shuffling code buffers. Correct it.

Does this work on other platforms? Does it need #ifdef JS_CPU_ARM?

::: js/src/ion/arm/Assembler-arm.h
@@ +1676,5 @@
>      }
>      static uint8 *nextInstruction(uint8 *instruction, uint32 *count = NULL);
> +    // Toggle a jmp or cmp emitted by toggledJump().
> +    static void ToggleToJmp(CodeLocationLabel inst) {
> +        uint32 *ptr = (uint32 *)inst.raw();

You need to assert that the instrution is a CMP.

@@ +1683,5 @@
> +        *ptr = (*ptr & ~(0xff << 20)) | (0xa0 << 20);
> +        JSC::ExecutableAllocator::cacheFlush(ptr, sizeof(uintptr_t));
> +    }
> +    static void ToggleToCmp(CodeLocationLabel inst) {
> +        uint32 *ptr = (uint32 *)inst.raw();

You need to assert that the instruction is a B.

@@ +1685,5 @@
> +    }
> +    static void ToggleToCmp(CodeLocationLabel inst) {
> +        uint32 *ptr = (uint32 *)inst.raw();
> +        // Ensure that this masking operation doesn't affect the
> +        // offset of th ebranch instruction when it gets toggled back.

s/th e/the /

@@ +1689,5 @@
> +        // offset of th ebranch instruction when it gets toggled back.
> +        JS_ASSERT((*ptr & (0xf << 20)) == 0);
> +        // Zero out bits 20-27, then set them to be correct for a compare.
> +        // Theoretically, bits 12-15 should also be zeroed out, but
> +        // that restricts us a bit more than I am comfortable.

You need to zero them out. That still leaves you with 16 bits of offset (from the Rn and imm12 fields), and you previously said that 12 bits would be enough. (Using all 16 bits is a fiddle though, as you'd need to split and shift the offset into two fields.)

@@ +1691,5 @@
> +        // Zero out bits 20-27, then set them to be correct for a compare.
> +        // Theoretically, bits 12-15 should also be zeroed out, but
> +        // that restricts us a bit more than I am comfortable.
> +        *ptr = (*ptr & ~(0xff << 20)) | (0x35 << 20);
> +        JSC::ExecutableAllocator::cacheFlush(ptr, sizeof(uintptr_t));

Use sizeof(*ptr), or at least use uint32, to match the type of ptr.

::: js/src/ion/arm/MacroAssembler-arm.cpp
@@ +2827,5 @@
>      bind(&fin);
>  }
> +
> +CodeOffsetLabel
> +MacroAssemblerARMCompat::toggledJump(bool enabled, Label *label)

I think "toggleableJump" is a better description of what this does.

@@ +2829,5 @@
> +
> +CodeOffsetLabel
> +MacroAssemblerARMCompat::toggledJump(bool enabled, Label *label)
> +{
> +    /* the |enabled| bit is specifically ignored, the jmp is patched to a cmp later*/

Do you need something like an "UNUSED(enabled)" statement here? Otherwise, you will get a compiler warning.
Attachment #627953 - Flags: review?(Jacob.Bramley) → review-
Attachment #628172 - Attachment is obsolete: true
(In reply to Jacob Bramley [:jbramley] from comment #14)
> Comment on attachment 627953 [details] [diff] [review]
> Completely untested ARM port of x86 patch
> 
It has since been tested.
> @@ +574,1 @@
> >      // On ARM, the saved offset may be wrong due to shuffling code buffers. Correct it.
> 
> Does this work on other platforms? Does it need #ifdef JS_CPU_ARM?
Yes, it does.  fixup is a nop on non-arm platforms, to cut down on the number of #idfef's.

> @@ +1685,5 @@
> > +    }
> > +    static void ToggleToCmp(CodeLocationLabel inst) {
> > +        uint32 *ptr = (uint32 *)inst.raw();
> > +        // Ensure that this masking operation doesn't affect the
> > +        // offset of th ebranch instruction when it gets toggled back.
> 
> s/th e/the /
> 
> @@ +1689,5 @@
> > +        // offset of th ebranch instruction when it gets toggled back.
> > +        JS_ASSERT((*ptr & (0xf << 20)) == 0);
> > +        // Zero out bits 20-27, then set them to be correct for a compare.
> > +        // Theoretically, bits 12-15 should also be zeroed out, but
> > +        // that restricts us a bit more than I am comfortable.
> 
> You need to zero them out. That still leaves you with 16 bits of offset
> (from the Rn and imm12 fields), and you previously said that 12 bits would
> be enough. (Using all 16 bits is a fiddle though, as you'd need to split and
> shift the offset into two fields.)
> 
> @@ +1691,5 @@
> > +        // Zero out bits 20-27, then set them to be correct for a compare.
> > +        // Theoretically, bits 12-15 should also be zeroed out, but
> > +        // that restricts us a bit more than I am comfortable.
> > +        *ptr = (*ptr & ~(0xff << 20)) | (0x35 << 20);
> > +        JSC::ExecutableAllocator::cacheFlush(ptr, sizeof(uintptr_t));
> 
> Use sizeof(*ptr), or at least use uint32, to match the type of ptr.
> 
I've changed it to sizeof(Instruction).
> ::: js/src/ion/arm/MacroAssembler-arm.cpp
> @@ +2827,5 @@
> >      bind(&fin);
> >  }
> > +
> > +CodeOffsetLabel
> > +MacroAssemblerARMCompat::toggledJump(bool enabled, Label *label)
> 
> I think "toggleableJump" is a better description of what this does.
> 
unfortunately, that is what it is called elsewhere.
> @@ +2829,5 @@
> > +
> > +CodeOffsetLabel
> > +MacroAssemblerARMCompat::toggledJump(bool enabled, Label *label)
> > +{
> > +    /* the |enabled| bit is specifically ignored, the jmp is patched to a cmp later*/
> 
> Do you need something like an "UNUSED(enabled)" statement here? Otherwise,
> you will get a compiler warning.
Supposedly we don't have those turned on?
tested.  It seems to be toggling the bits of the branch/compare correctly.
Attachment #627953 - Attachment is obsolete: true
Attachment #629405 - Flags: review?(sstangl)
Comment on attachment 629405 [details] [diff] [review]
/home/mrosenberg/patches/ArmGCKeepJIT-r1.patch

Review of attachment 629405 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/arm/Assembler-arm.cpp
@@ +131,5 @@
>  
> +Register
> +js::ion::toRN(Instruction &i)
> +{
> +    return Register::FromCode((i.encode()>>16) & 0xf);

Could 16 and 0xf be given names?

@@ +2260,5 @@
> +void
> +Assembler::ToggleToJmp(CodeLocationLabel inst_)
> +{
> +    uint32 *ptr = (uint32 *)inst_.raw();
> +    Instruction *inst = (Instruction*)inst_.raw();

DebugOnly<Instruction *>

@@ +2264,5 @@
> +    Instruction *inst = (Instruction*)inst_.raw();
> +    JS_ASSERT(inst->is<InstCMP>());
> +    // Mask out bits 20-27, then set 24-27 to be correct for a b
> +    // 20-23 will be part of the B's immediate, and should be 0.
> +    *ptr = (*ptr & ~(0xff << 20)) | (0xa0 << 20);

Same comments as below.

@@ +2267,5 @@
> +    // 20-23 will be part of the B's immediate, and should be 0.
> +    *ptr = (*ptr & ~(0xff << 20)) | (0xa0 << 20);
> +    JSC::ExecutableAllocator::cacheFlush(ptr, sizeof(Instruction));
> +}
> +void

Vertical whitespace

@@ +2271,5 @@
> +void
> +Assembler::ToggleToCmp(CodeLocationLabel inst_)
> +{
> +    uint32 *ptr = (uint32 *)inst_.raw();
> +    Instruction *inst = (Instruction*)inst_.raw();

DebugOnly<Instruction *>

@@ +2274,5 @@
> +    uint32 *ptr = (uint32 *)inst_.raw();
> +    Instruction *inst = (Instruction*)inst_.raw();
> +    JS_ASSERT(inst->is<InstBImm>());
> +    // Ensure that this masking operation doesn't affect the
> +    // offset of th ebranch instruction when it gets toggled back.

nit: the

@@ +2282,5 @@
> +    // ALU instructions are all unset (looks like it is encoding r0).
> +    JS_ASSERT(toRD(*inst) == r0);
> +
> +    // Zero out bits 20-27, then set them to be correct for a compare.
> +    *ptr = (*ptr & ~(0xff << 20)) | (0x35 << 20);

This clears more bits than just 20-27. I guess it's safe to clear the upper bits also? It might be clearer to express intent with an explicit bitmask, and a comment update.

@@ +2283,5 @@
> +    JS_ASSERT(toRD(*inst) == r0);
> +
> +    // Zero out bits 20-27, then set them to be correct for a compare.
> +    *ptr = (*ptr & ~(0xff << 20)) | (0x35 << 20);
> +    JSC::ExecutableAllocator::cacheFlush(ptr, sizeof(Instruction));

Really deserves a comment about expensiveness.

::: js/src/ion/arm/Assembler-arm.h
@@ +1898,5 @@
> +{
> +    static const int32 ALUMask = 0xc << 24;
> +  public:
> +    InstALU (Register rd, Register rn, Operand2 op2, ALUOp op, SetCond_ sc, Assembler::Condition c)
> +        : Instruction(RD(rd) | RN(rn) | op2.encode() | op | sc | c)

nit: should be 2 spaces of indentation before :

::: js/src/ion/arm/MacroAssembler-arm.cpp
@@ +2829,5 @@
> +
> +CodeOffsetLabel
> +MacroAssemblerARMCompat::toggledJump(bool enabled, Label *label)
> +{
> +    /* the |enabled| bit is specifically ignored, the jmp is patched to a cmp later*/

nit: Comment style in Ion code tends to use // (with capitalized first letter and ending punctuation).

::: js/src/ion/arm/MacroAssembler-arm.h
@@ +678,5 @@
>      void branchPtr(Condition cond, Register lhs, ImmWord imm, Label *label) {
>          branch32(cond, lhs, Imm32(imm.value), label);
>      }
>      void moveValue(const Value &val, Register type, Register data);
> +    CodeOffsetLabel toggledJump(bool enabled, Label *label);

nit: vertical whitespace between moveValue() and toggledJump().
Attachment #629405 - Flags: review?(sstangl) → review+
http://hg.mozilla.org/projects/ionmonkey/rev/6afe1f9f551d
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: