Last Comment Bug 695075 - IonMonkey: Teach GC to walk Ion frames
: IonMonkey: Teach GC to walk Ion frames
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: unspecified
: All All
: -- normal (vote)
: ---
Assigned To: David Anderson [:dvander]
:
Mentors:
Depends on: 695887 695897 710983 711763
Blocks: 691598
  Show dependency treegraph
 
Reported: 2011-10-17 11:39 PDT by David Anderson [:dvander]
Modified: 2011-12-27 17:42 PST (History)
6 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
wip v0 (34.79 KB, patch)
2011-10-18 17:16 PDT, David Anderson [:dvander]
no flags Details | Diff | Splinter Review
part 1: skip conservatively scanning Ion regions (5.22 KB, patch)
2011-12-01 19:57 PST, David Anderson [:dvander]
wmccloskey: review+
Details | Diff | Splinter Review
test case (316 bytes, application/javascript)
2011-12-01 19:58 PST, David Anderson [:dvander]
no flags Details
part 2: walk ion frames and mark callee tokens (6.34 KB, patch)
2011-12-05 17:35 PST, David Anderson [:dvander]
wmccloskey: review+
Details | Diff | Splinter Review
part 3: fix RA policies (26.26 KB, patch)
2011-12-05 17:39 PST, David Anderson [:dvander]
sstangl: review+
Details | Diff | Splinter Review
part 4: spill Values contiguously during LSRA (13.24 KB, patch)
2011-12-12 14:29 PST, David Anderson [:dvander]
jdemooij: review+
Details | Diff | Splinter Review
part 5: spill values contiguously during GRA (14.89 KB, patch)
2011-12-12 16:36 PST, David Anderson [:dvander]
sstangl: review+
Details | Diff | Splinter Review
part 6: share much more lowering code (30.82 KB, patch)
2011-12-13 18:12 PST, David Anderson [:dvander]
sstangl: review+
Details | Diff | Splinter Review
part 7: add safepoint building, encoding, and decoding (59.89 KB, patch)
2011-12-16 01:33 PST, David Anderson [:dvander]
sstangl: review+
Details | Diff | Splinter Review
part 9: gc integration (4.99 KB, patch)
2011-12-16 01:43 PST, David Anderson [:dvander]
wmccloskey: review+
Details | Diff | Splinter Review
part 8: safepoint integration for LSRA (16.38 KB, patch)
2011-12-19 13:02 PST, David Anderson [:dvander]
jdemooij: review+
Details | Diff | Splinter Review
part 10: greedy allocator safepoints (9.20 KB, patch)
2011-12-21 18:57 PST, David Anderson [:dvander]
sstangl: review+
Details | Diff | Splinter Review

Description David Anderson [:dvander] 2011-10-17 11:39:00 PDT
The garbage collector needs a low-level interface to trace (and later, potentially, relocate) pointers in Ion frames. Additionally we want the conservative scanner to skip over Ion frames. This is the easiest of the VM/Ion frame interaction bugs so it seems like a good place to start.
Comment 1 David Anderson [:dvander] 2011-10-17 19:03:24 PDT
This bug introduces three problems:
 (1) Need to iterate frames from within ion::ThunkToInterpreter. I.e. it needs a C++ frame.
 (2) We need a way to find snapshot offsets of nested calls.
 (3) We need to teach the GC how to actually iterate and inspect these frames.

IonFrame is beginning to get overloaded, so I would like to split it up into the actual different kinds of frames we have. This should afford us some flexibility and clarity.

Currently we have (or want) 4 frame types:
 * Entry frames (the actual ion::Cannon activation)
 * JS frames (JS -> JS function calls)
 * Rectifier frames (argc != nargs mismatch handling)
 * C++ frames (leaving JIT code into C++)

Because we don't have a frame pointer, the caller is responsible for communicating its frame size to the callee, so all frames have some sort of descriptor word encoding the caller size and the callee type.

Entry frames become just two words:
-1  descriptor
 0  returnAddress

JS and Rectifier frames become 3 words (4, once we support scope chains):
-2  calleeToken
-1  descriptor
 0  returnAddress

bug 680315 begins the framework for adding calls to C++, but the frame layout will need some changes to support giving ion::ThunkToInterpreter a proper exit frame. It currently looks like:
 ... incoming args ...
 -3 callerSnapshotOffset
 -2 frameSize
 -1 returnAddress
  0 topFrame
 ... outgoing args ...

We can combine frameSize and topFrame back into the descriptor word, giving:
 ... incoming args ...
 -2 callerSnapshotOffset
 -1 descriptor
  0 returnAddress
 ... outgoing args ...

But callerSnapshotOffset ends up being problematic if we're not inside a CodeGenerator, which is the case for trampoline generation (and will be for ICs as well). A few options:
 (1) Create two types of C++ exit frames
 (2) Remove callerSnapshotOffset entirely, since either way we will need a pc->snapshotOffset lookup table in IonScripts.
 (3) Pack callerSnapshotOffset into the descriptor (we have bits to spare),
     make it optional. If it's there, snapshot lookup will be very fast.

Ideally we'll end up with an ExitFrame looking like:
 ... incoming args ...
 -2 descriptor
 -1 returnAddress
  0 code             ; needed to root the IonCode being called
 ... outgoing args ...

With ThreadData::ionTop_ or something pointing to the base of the exit frame.
Comment 2 Nicolas B. Pierron [:nbp] 2011-10-18 03:23:46 PDT
(In reply to David Anderson [:dvander] from comment #1)
> This bug introduces three problems:
>  (1) Need to iterate frames from within ion::ThunkToInterpreter. I.e. it
> needs a C++ frame.
>  (2) We need a way to find snapshot offsets of nested calls.
>  (3) We need to teach the GC how to actually iterate and inspect these
> frames.
> 
> IonFrame is beginning to get overloaded, so I would like to split it up into
> the actual different kinds of frames we have. This should afford us some
> flexibility and clarity.
> 
> Currently we have (or want) 4 frame types:
>  * Entry frames (the actual ion::Cannon activation)
>  * JS frames (JS -> JS function calls)
>  * Rectifier frames (argc != nargs mismatch handling)
>  * C++ frames (leaving JIT code into C++)
> 
> Because we don't have a frame pointer, the caller is responsible for
> communicating its frame size to the callee, so all frames have some sort of
> descriptor word encoding the caller size and the callee type.

Currently we don't use the returnAddress and only use the descriptor to store few bytes of data.  We can use the descriptor space to store the relative offset (relative to the return address) of the information to statically encoded information inside the IonCode code_.  Thus we can recover descriptor information on a slow path (pointer accesses) while keeping the stack creation as a fast path (one push).

Some data such as the calleeToken would be identical for each call and thus could be shared instead of being pushed each time on the fast path.  Specialized data can be recovered by using the returnAddress to map to the info. (such as frameSize/frameType if not inside the descriptor, snapshotOffset)

> Entry frames become just two words:
> -1  descriptor
>  0  returnAddress
> 
> JS and Rectifier frames become 3 words (4, once we support scope chains):
> -2  calleeToken
> -1  descriptor
>  0  returnAddress
> 
> bug 680315 begins the framework for adding calls to C++, but the frame
> layout will need some changes to support giving ion::ThunkToInterpreter a
> proper exit frame. It currently looks like:
>  ... incoming args ...
>  -3 callerSnapshotOffset
>  -2 frameSize
>  -1 returnAddress
>   0 topFrame
>  ... outgoing args ...
> 
> We can combine frameSize and topFrame back into the descriptor word, giving:
>  ... incoming args ...
>  -2 callerSnapshotOffset
>  -1 descriptor
>   0 returnAddress
>  ... outgoing args ...

To be precise, the descriptor should at least contains:
  framesize + nargs * STACK_SLOT_SIZE

then the descriptor value should be added to the frame address (stack address) to obtain the next frame.

> But callerSnapshotOffset ends up being problematic if we're not inside a
> CodeGenerator, which is the case for trampoline generation (and will be for
> ICs as well). A few options:
>  (1) Create two types of C++ exit frames
>  (2) Remove callerSnapshotOffset entirely, since either way we will need a
> pc->snapshotOffset lookup table in IonScripts.

You mean returnAddr => snapshotOffset ?  I doubt we can use the bytecode pc to map to a snapshotOffset because It seems that we could have multiple snapshots for one bytecode instruction. (ByteCode ->* MIR ->* LIR -> CodeGenerator)

>  (3) Pack callerSnapshotOffset into the descriptor (we have bits to spare),
>      make it optional. If it's there, snapshot lookup will be very fast.

Whatever we choose we need to be able to recover the code_ & size & frameSize from all frames.  This is necessary to implement walking and avoid the hell of methodJIT bug reports which are unhelpful.

> Ideally we'll end up with an ExitFrame looking like:
>  ... incoming args ...
>  -2 descriptor
>  -1 returnAddress
>   0 code             ; needed to root the IonCode being called

What do you mean?

>  ... outgoing args ...
> 
> With ThreadData::ionTop_ or something pointing to the base of the exit frame.

Currently C calls set the IonCompartment::topCFrame_ which can be recover by the JSContext pointer or from the Thread Local Storage (for the ion compartment).  May be we should only store a GenericIonFrame which contains a descriptor and a returnAddress and use this base for all our frames.
Comment 3 David Anderson [:dvander] 2011-10-18 17:16:21 PDT
Created attachment 567937 [details] [diff] [review]
wip v0

Initial sketch of frame, bailout refactoring
Comment 4 Nicolas B. Pierron [:nbp] 2011-10-19 04:11:00 PDT
Comment on attachment 567937 [details] [diff] [review]
wip v0

Review of attachment 567937 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/IonFrames.h
@@ +87,5 @@
> +    IonFrame_JS,
> +    IonFrame_Entry,
> +    IonFrame_Rectifier
> +};
> +static const uint32 FRAMETYPE_BITS = 3;

Why using 3 bits, even with C++ frames we only need 2 for the moment ?

::: js/src/ion/shared/IonFrames-x86-shared.h
@@ +54,5 @@
> +    }
> +};
> +
> +class IonEntryFrameLayout
> +{

class IonEntryFrameLayout : public IonCommonFrameLayout

::: js/src/ion/x86/Bailouts-x86.cpp
@@ +54,5 @@
> +class BailoutStack
> +{
> +    uintptr_t frameClassId_;
> +    double    fpregs_[FloatRegisters::Total];
> +    uintptr_t regs_[Registers::Total];

I would prefer to have a name for this couple, such as ProcessContext / RegistersContext / MachineState and provide a RegistersContextRef / MachineStateRef class in place of the current MachineState.

Thus we could have the corresponding macro assembler instructions for making context.

@@ +75,5 @@
> +        return frameClass().frameSize();
> +    }
> +    MachineState machine() {
> +        return MachineState(regs_, fpregs_);
> +    }

3:

    FrameRecovery
    frameRecovery(IonCompartment *ion) {
        uint8 *sp = (uint8 *)&esp[sizeof(BailoutStack) / STACK_SLOT_SIZE];
        uint8 *fp = sp + bailout->frameSize();

        // Compute the bailout ID.
        IonCode *code = ion->getBailoutTable(bailout->frameClass());
        uintptr_t tableOffset = bailout->tableOffset();
        uintptr_t tableStart = reinterpret_cast<uintptr_t>(code->raw());

        tableOffset < tableStart + code->instructionsSize());
        JS_ASSERT((tableOffset - tableStart) % BAILOUT_TABLE_ENTRY_SIZE == 0);
        uint32 bailoutId = ((tableOffset - tableStart) / BAILOUT_TABLE_ENTRY_SIZE) - 1;
        JS_ASSERT(bailoutId < BAILOUT_TABLE_SIZE);

        return FrameRecovery::FromBailoutId(fp, sp, bailout->machine(), bailoutId);
    }

@@ +86,5 @@
> +  public:
> +    SnapshotOffset snapshotOffset() const {
> +        JS_ASSERT(frameClass() == FrameSizeClass::None());
> +        return snapshotOffset_;
> +    }

3:

    FrameRecovery
    frameRecovery(IonCompartment *ion) {
        uint8 *sp = (uint8 *)&esp[sizeof(ExtendedBailoutStack) / STACK_SLOT_SIZE];
        uint8 *fp = sp + bailout->frameSize();
        return FrameRecovery::FromSnapshot(fp, sp, bailout->machine(), bailout->snapshotOffset());
    }

@@ +96,5 @@
>  
>  JS_STATIC_ASSERT(sizeof(BailoutStack) ==
>                   sizeof(uintptr_t) +
>                   sizeof(double) * 8 +
>                   sizeof(uintptr_t) * 8 +

8 is a nice inlining of Registers::Total & FloatRegisters::Total values. ;)
sizeof(RegistersContext) is a good substitute.

@@ +110,2 @@
>  {
> +    ExtendedBailoutStack *bailout = reinterpret_cast<ExtendedBailoutStack *>(esp);

(cf 1)

@@ +112,3 @@
>  
> +    if (bailout->frameClass() == FrameSizeClass::None()) {
> +        uint8 *sp = (uint8 *)&esp[sizeof(ExtendedBailoutStack) / STACK_SLOT_SIZE];

From this code I understand that  (bailout->frameClass() == FrameSizeClass::None()) implies that you have an ExtendedBailoutStack.

(cf 2)

@@ +118,3 @@
>  
> +    uint8 *sp = (uint8 *)&esp[sizeof(BailoutStack) / STACK_SLOT_SIZE];
> +    uint8 *fp = sp + bailout->frameSize();

0:

and otherwise you have a BailoutStack.

1: Thus I wonder why you reinterpret data in a larger type than what it can be.  Instead of a void **esp, you should expect a "BailoutStack *".  This would make the prototype of the function cleaner for readers.

2: after checking the frameClass you can do the reinterpret_cast which is done in (cf 1).

Additionally, by looking at this test, it seems to me that the case you can re-implement the frameSize function in both classes and move the specialized content of this function to both classes.  Which would make this function a "fake" virtual.

(cf 3)

::: js/src/ion/x86/Trampoline-x86.cpp
@@ +276,5 @@
>      masm.call(eax);
>  
>      // Remove the rectifier frame.
>      masm.pop(ebp);            // ebp <- sizeDescriptor with FrameType.
> +    masm.shrl(Imm32(FRAMETYPE_BITS), ebp); // ebp <- sizeDescriptor.

update field name too (descriptor).
Comment 5 David Anderson [:dvander] 2011-10-19 15:57:34 PDT
This bug is more complicated than I expected so I'm filing 695897 for the refactoring steps, will reply to comment #7 there.
Comment 6 David Anderson [:dvander] 2011-10-19 16:27:52 PDT
Snapshots are not enough to tell us how to GC: we need to know about live allocations that may not be associated with stack slots. For example, if obj->slots becomes a gcthing, and the slots load is hoisted, we need to trace it, but it won't be reachable from a snapshot.

In LSRA this should be fairly straightforward, we can look at which intervals intersect such instructions and build a new kind of snapshot that captures any that define GC things or untyped Values. In the Greedy allocator, we'll need to keep track of which virtual registers are live.

We could also just ignore this issue for now (and indeed, maybe this entire bug, given that we have conservative scanning) - in theory anything lying around in LIR is reachable from a stack slot. This will only be necessary once we have moving GC.
Comment 7 David Anderson [:dvander] 2011-12-01 19:57:17 PST
Created attachment 578483 [details] [diff] [review]
part 1: skip conservatively scanning Ion regions

This patch skips conservative scans on regions covered by Ion activations. Unfortunately it's pretty hard to force a GC hazard through this. I managed to write a test case that demonstrates an A->B->A problem though.
Comment 8 David Anderson [:dvander] 2011-12-01 19:58:14 PST
Created attachment 578484 [details]
test case

This test case fails when we start skipping Ion frames in the conservative scanner.
Comment 9 David Anderson [:dvander] 2011-12-05 17:35:23 PST
Created attachment 579163 [details] [diff] [review]
part 2: walk ion frames and mark callee tokens

Skeletal code for walking Ion frames.
Comment 10 David Anderson [:dvander] 2011-12-05 17:39:06 PST
Created attachment 579165 [details] [diff] [review]
part 3: fix RA policies

The REDEFINED register allocation policy is not really compatible with precise GC because it can alias a Value to, say, an unboxed int32. This patch renames REDEFINED to PASSTHROUGH and makes sure all callers are not aliasing types. This means that MUnbox (the main culprit) now uses MUST_REUSE_INPUT.

In addition I removed LDefinition::INTEGER and LDefinition::POINTER in favor of LDefinition::GENERAL, and made sure all uses of either used LDefinition::OBJECT when a gcthing is involved.
Comment 11 David Anderson [:dvander] 2011-12-08 18:49:43 PST
Storing gcmaps should be a lot easier if we can ensure that Values are stored contiguously on the stack, and Sean pointed out that a trivial way to do this is to make all stack slots sizeof(Value). This requires, however, that *all* aliasing of boxed to unboxed values be removed, so I'm going to iterate on the 3rd patch to make LBox create a new virtual register.
Comment 12 Bill McCloskey (:billm) 2011-12-09 10:14:21 PST
Comment on attachment 578483 [details] [diff] [review]
part 1: skip conservatively scanning Ion regions

Review of attachment 578483 [details] [diff] [review]:
-----------------------------------------------------------------

This seems fine. I do have a question about how much is getting skipped over in the conservative scanner. Say C++ function A calls Ion function B. Usually some of the registers used by A will be stored in B's stack frame, which will not be scanned. Will IonMonkey scan those registers itself, or are they saved somehow before you even enter the Ion frame?

Also, this is a pretty small nit, but lately I've been writing iterators with methods next() and done(). There are only two of these so far, so we can always change them. How many Ion iterators are there? It would be nice to standardize.
Comment 13 David Anderson [:dvander] 2011-12-09 15:47:37 PST
> This seems fine. I do have a question about how much is getting skipped over
> in the conservative scanner. Say C++ function A calls Ion function B.
> Usually some of the registers used by A will be stored in B's stack frame,
> which will not be scanned. Will IonMonkey scan those registers itself, or
> are they saved somehow before you even enter the Ion frame?

Good question - I just checked, these are saved right before we enter the Ion frame, so it should be safe.

> Also, this is a pretty small nit, but lately I've been writing iterators
> with methods next() and done(). There are only two of these so far, so we
> can always change them. How many Ion iterators are there? It would be nice
> to standardize.

Heh. Luke had recommended operator ++ :) I don't care either way, consistency is good.
Comment 14 Bill McCloskey (:billm) 2011-12-09 15:50:36 PST
Comment on attachment 579163 [details] [diff] [review]
part 2: walk ion frames and mark callee tokens

Review of attachment 579163 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/IonFrames.cpp
@@ +222,5 @@
> +    } else if (IsCalleeTokenScript(token)) {
> +        MarkRoot(trc, CalleeTokenToScript(token), "ion-entry");
> +    } else {
> +        JS_NOT_REACHED("unknown callee token type");
> +    }

No braces needed here.

::: js/src/ion/IonFrames.h
@@ +280,5 @@
>  static inline JSObject *
>  CalleeTokenToFunction(CalleeToken token)
>  {
>      JS_ASSERT(IsCalleeTokenFunction(token));
>      return (JSObject *)token;

It might be nice to include the mask here as well as below. That way you don't require FUNCTION_CALLEE_TOKEN to be zero. Same for the | above.

@@ +282,5 @@
>  {
>      JS_ASSERT(IsCalleeTokenFunction(token));
>      return (JSObject *)token;
>  }
>  static inline JSScript *

Need a newline.
Comment 15 Bill McCloskey (:billm) 2011-12-09 15:51:53 PST
(In reply to David Anderson [:dvander] from comment #13)
> Good question - I just checked, these are saved right before we enter the
> Ion frame, so it should be safe.

OK. That probably deserves a comment, especially in the place where you save the registers.
Comment 16 Sean Stangl [:sstangl] 2011-12-09 16:10:30 PST
Comment on attachment 579165 [details] [diff] [review]
part 3: fix RA policies

Review of attachment 579165 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/IonLIR.h
@@ +450,4 @@
>      };
>  
>      enum Type {
> +        GENERAL,    // Generic, integer data (GPR).

This is also used to hold pointer-width data. Maybe "Generic, pointer-width data (GPR)."?

::: js/src/ion/arm/Lowering-arm.cpp
@@ +399,5 @@
>  
>  bool
>  LIRGeneratorARM::visitGuardShape(MGuardShape *ins)
>  {
> +    LDefinition tempInt = temp(LDefinition::OBJECT);

tempObj?

::: js/src/ion/x86/CodeGenerator-x86.cpp
@@ +177,3 @@
>      MUnbox *mir = unbox->mir();
>      if (mir->fallible()) {
> +        LAllocation *type = unbox->getOperand(1);

Could we name the constant? If accessor methods existed on the LInstruction, then the above comment wouldn't even be necessary.

::: js/src/ion/x86/Lowering-x86.cpp
@@ +149,3 @@
>      LUnbox *lir = new LUnbox;
> +    lir->setOperand(0, usePayloadInRegister(inner));
> +    lir->setOperand(1, useType(inner, LUse::ANY));

This looks like a huge hack. What's the problem with re-using the payload register if it's not the 0th operand? Could we fix that?

AIUI, operand order was not meant to be meaningful; operands are assigned registers simultaneously. If the order is meaningful to regalloc, we should consider that a defect in regalloc.

Follow-up bug?
Comment 17 David Anderson [:dvander] 2011-12-09 18:38:22 PST
> Could we name the constant? If accessor methods existed on the LInstruction,
> then the above comment wouldn't even be necessary.

Sure.

> This looks like a huge hack. What's the problem with re-using the payload
> register if it's not the 0th operand? Could we fix that?
> 
> AIUI, operand order was not meant to be meaningful; operands are assigned
> registers simultaneously. If the order is meaningful to regalloc, we should
> consider that a defect in regalloc.
>
> Follow-up bug?

I don't know - it seems low-impact. The ordering of defs isn't related to the ordering of virtual registers. It's just a policy limitation, REDEFINED only works on operand 0, and it just happens that the normal (type, payload) ordering on defs is opposite here.
Comment 18 David Anderson [:dvander] 2011-12-12 14:29:37 PST
Created attachment 581056 [details] [diff] [review]
part 4: spill Values contiguously during LSRA

This (really gross) patch ensures that if both halves of a nunbox are spilled to the stack, they are spilled contiguously. We'll still need to handle (reg, stack) combinations specially in the safepoint encoding, but at least there the number of registers is limited.
Comment 19 David Anderson [:dvander] 2011-12-12 16:36:14 PST
Created attachment 581093 [details] [diff] [review]
part 5: spill values contiguously during GRA

Same as part 4, except for the greedy allocator.
Comment 20 David Anderson [:dvander] 2011-12-13 18:12:09 PST
Created attachment 581506 [details] [diff] [review]
part 6: share much more lowering code

 8 files changed, 60 insertions(+), 225 deletions(-)

Rather than keep these three, identical sets of functions in sync, this patch moves a bunch of stuff into Lowering-shared. The #ifdef JS_NUNBOX32s are a little gross, if it gets to be too much we can factor out a LIRGeneratorNunboxing.
Comment 21 Sean Stangl [:sstangl] 2011-12-14 14:47:23 PST
Comment on attachment 581093 [details] [diff] [review]
part 5: spill values contiguously during GRA

Review of attachment 581093 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/GreedyAllocator.cpp
@@ +206,5 @@
> +    if (IsNunbox(vr->type())) {
> +        VirtualRegister *other = otherHalfOfNunbox(vr);
> +        unsigned stackSlot;
> +        if (!other->hasStackSlot()) {
> +            if (!stackSlots.allocateDoubleSlot(&stackSlot))

Perhaps allocateValueSlots()? It seems misleading to call it a Double.

@@ +245,5 @@
> +{
> +    if (vr->isDouble())
> +        stackSlots.freeDoubleSlot(vr->stackSlot());
> +    else
> +        stackSlots.freeSlot(vr->stackSlot());

This leaks slots: if we call allocateDoubleSlot() above on something that is not a double, as in allocateStack(), it will be freed as a regular slot, and the other half will be lost.

@@ +310,5 @@
> +            JS_ASSERT_IF(vr->hasStackSlot(), other->hasStackSlot() || !other->hasBackingStack());
> +            JS_ASSERT_IF(other->hasStackSlot(), vr->hasStackSlot() || !vr->hasBackingStack());
> +            VirtualRegister *candidate = vr->hasStackSlot() ? vr : other;
> +            unsigned slot = BaseOfNunboxSlot(candidate->type(), candidate->stackSlot());
> +            stackSlots.freeDoubleSlot(slot);

freeValueSlots()?

::: js/src/ion/IonLIR-inl.h
@@ +93,5 @@
> +              (type2 == LDefinition::TYPE && type1 == LDefinition::PAYLOAD));
> +}
> +
> +static inline unsigned
> +OffsetOfNunboxSlot(LDefinition::Type type)

"Offset" generally implies a byte count, but I haven't thought of a better name.

@@ +107,5 @@
> +BaseOfNunboxSlot(LDefinition::Type type, unsigned slot)
> +{
> +    if (type == LDefinition::PAYLOAD)
> +        return slot + (NUNBOX32_PAYLOAD_OFFSET / STACK_SLOT_SIZE);
> +    return slot + (NUNBOX32_TYPE_OFFSET / STACK_SLOT_SIZE);

return slot + OffsetOfNunboxSlot(type);
Comment 22 David Anderson [:dvander] 2011-12-16 01:33:27 PST
Created attachment 582204 [details] [diff] [review]
part 7: add safepoint building, encoding, and decoding

This patch has all the safepoint functionality save for LSRA/GRA changes and GC integration.
Comment 23 David Anderson [:dvander] 2011-12-16 01:36:05 PST
If the use of LSafepoint::liveRegs() is unclear from this patch: the plan is that out-of-line calls will push this set of registers, of which LSafepoint::gcRegs() are a subset. The GC will be able to trace these temporary spill slots (and modify them if necessary). The full set of live registers is recorded so out-of-line calls don't have to spill all registers, which would be painful for doubles.
Comment 24 David Anderson [:dvander] 2011-12-16 01:43:00 PST
Created attachment 582207 [details] [diff] [review]
part 9: gc integration

We don't support out-of-line calls yet so this part is fairly straightforward.
Comment 25 Sean Stangl [:sstangl] 2011-12-19 11:00:11 PST
Comment on attachment 581506 [details] [diff] [review]
part 6: share much more lowering code

Review of attachment 581506 [details] [diff] [review]:
-----------------------------------------------------------------

In the future we will want a *-nunbox-shared.* in the same vein as *-x86-shared.*, but this is fine for now (as discussed).
Comment 26 David Anderson [:dvander] 2011-12-19 13:02:55 PST
Created attachment 582931 [details] [diff] [review]
part 8: safepoint integration for LSRA

This patch adds safepoint integration to LSRA, but not reliably for out-of-line calls due to LSRA bugs. bug 709731 will probably fix these, if not, I'll do another patch on top of this one.
Comment 27 Jan de Mooij [:jandem] 2011-12-20 04:50:11 PST
Comment on attachment 582931 [details] [diff] [review]
part 8: safepoint integration for LSRA

Review of attachment 582931 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, nice to see that it doesn't affect the other regalloc passes (much).

::: js/src/ion/LinearScan.cpp
@@ +917,5 @@
> +
> +                // Safepoints are sorted, so we can shortcut out of this loop
> +                // if we go out of range.
> +                if (interval->start() > pos || interval->end() < pos)
> +                    break;

The RHS of the OR makes sense (all safepoints after the end of the interval are definitely not covered by the interval). However, the LHS should be removed.

Ranges are stored backwards and traversed in that order by the covers() call below, so it may be a bit more efficient to either use findFirstSafepoint before the loop or to add something like "pos < interval->start()" to the if-condition below.

@@ +1000,5 @@
> +        for (size_t j = 1; j < reg->numIntervals(); j++) {
> +            LiveInterval *interval = reg->getInterval(j);
> +            if (interval->end() > end)
> +                end = interval->end();
> +        }

Intervals are stored in ascending order, and intervals for a single vreg should never overlap, so you can use the end position of the last interval.

@@ +1012,5 @@
> +                break;
> +
> +            LSafepoint *safepoint = ins->safepoint();
> +
> +            if (!IsNunbox(reg)) {

Nit: add a JS_ASSERT(IsTraceable(reg)) here

@@ +1051,5 @@
> +                LAllocation *payloadAlloc = payloadInterval->getAllocation();
> +
> +                if (!ins->isCall() &&
> +                    ((!IsSpilledAt(type, inputOf(ins)) && IsSpilledAt(payload, inputOf(ins))) ||
> +                     payloadAlloc->isGeneralReg()))

Isn't this just:

if (!ins->isCall() && (typeAlloc->isGeneralReg() || payloadAlloc->isGeneralReg())

Or am I missing something?
Comment 28 Sean Stangl [:sstangl] 2011-12-20 16:03:31 PST
Comment on attachment 582204 [details] [diff] [review]
part 7: add safepoint building, encoding, and decoding

Review of attachment 582204 [details] [diff] [review]:
-----------------------------------------------------------------

Looks fine.

::: js/src/ion/Ion.cpp
@@ +425,1 @@
>      script->frameInfoEntries_ = frameInfoEntries;

For thought: since we know that each of these fields is consecutive, we could make accessor functions in the IonScript that recalculate the base. That would allow us to eliminate a few 32-bit fields from the IonScript.

::: js/src/ion/LIR.cpp
@@ +75,5 @@
> +bool
> +LIRGraph::noteNeedsSafepoint(LInstruction *ins)
> +{
> +    // Instructions with safepoints must be in linear order.
> +    JS_ASSERT_IF(safepoints_.length(), safepoints_[0]->id() < ins->id());

(safepoints_[safepoints_.length() - 1]->id() < ins->id()) ? Then linear order is by induction.

::: js/src/ion/LIR.h
@@ +935,5 @@
> +    typedef SafepointNunboxEntry NunboxEntry;
> +
> +  public:
> +    typedef Vector<uint32, 0, IonAllocPolicy> SlotList;
> +    typedef Vector<NunboxEntry, 0, IonAllocPolicy> NunboxList;

Does it make sense for nunbox-specific datatypes to be defined on punbox systems?

::: js/src/ion/shared/CodeGenerator-shared.cpp
@@ +239,5 @@
> +    safepoints_.writeGcRegs(safepoint->gcRegs(), safepoint->liveRegs().gprs());
> +    if (safepoint->gcSlots().length())
> +        safepoints_.writeGcSlots(safepoint->gcSlots().length(), &safepoint->gcSlots()[0]);
> +    else
> +        safepoints_.writeGcSlots(0, NULL);

It looks almost as if writeGcSlots() wants to take in a reference to gcSlots(), and likewise with the order writeFoo() functions.

::: js/src/ion/x64/Architecture-x64.h
@@ +144,3 @@
>  
> +// Smallest integer type that can hold a register bitmask.
> +typedef uint16 PackedRegisterMask;

PackedRegisterMask looks like a mask, not a type, in symmetry with AllocatableMask, for example. The typical solution is to explicitly mark it as a type by appending "_t".
Comment 29 David Anderson [:dvander] 2011-12-21 18:57:40 PST
Created attachment 583704 [details] [diff] [review]
part 10: greedy allocator safepoints

This patch applies on top of bug 711763.
Comment 30 Jan de Mooij [:jandem] 2011-12-24 06:59:09 PST
Comment on attachment 581056 [details] [diff] [review]
part 4: spill Values contiguously during LSRA

Review of attachment 581056 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/LinearScan.cpp
@@ +1061,5 @@
> +OffsetOfNunboxSlot(LDefinition *def)
> +{
> +    if (def->type() == LDefinition::PAYLOAD)
> +        return NUNBOX32_PAYLOAD_OFFSET / STACK_SLOT_SIZE;
> +    return NUNBOX32_TYPE_OFFSET / STACK_SLOT_SIZE;

See Sean's comment(s) in comment 21.

@@ +1140,5 @@
> +
> +        JS_ASSERT_IF(mine->canonicalSpill() && other->canonicalSpill(),
> +                     mine->canonicalSpill()->isStackSlot() == other->canonicalSpill()->isStackSlot());
> +
> +        VirtualRegister *candidate = alloc ? mine : other;

alloc is always non-NULL; should this be alloc->isStackSlot() or mine->canonicalSpill()?

::: js/src/ion/LinearScan.h
@@ +244,5 @@
>   * to them as it runs.
>   */
> +class LiveInterval
> +  : public InlineListNode<LiveInterval>,
> +    public TempObject

Good catch (inheriting from TempObject).
Comment 31 Bill McCloskey (:billm) 2011-12-27 09:46:49 PST
Comment on attachment 582207 [details] [diff] [review]
part 9: gc integration

Review of attachment 582207 [details] [diff] [review]:
-----------------------------------------------------------------

This looks okay, but could you explain a little more about why Values and pointers can't be kept separate, particularly for stack scanning?

::: js/src/ion/IonFrames.cpp
@@ +264,5 @@
> +    // actual values).
> +    uint32 slot;
> +    while (safepoint.getGcSlot(&slot)) {
> +        uintptr_t *ref = layout->slotRef(slot);
> +        gc::MarkGCThingOrValue(trc, *ref, "ion-gc-slot");

I don't understand, if you have a separate loop for Values below, why you also need to include Values in this loop as well. Are you unable to tell when these are generated whether they're pointers or Values?

Also, whatever happens here, it would be nice to have a Root version to use since this is always called during root marking.

::: js/src/ion/shared/Assembler-x86-shared.cpp
@@ +64,5 @@
>  {
>      while (reader.more()) {
>          size_t offset = reader.readUnsigned();
>          void *ptr = JSC::X86Assembler::getPointer(buffer + offset);
> +        gc::MarkGCThingOrValue(trc, reinterpret_cast<uintptr_t>(ptr), "imm-gc-word");

This one, I'm assuming, can actually be called on both Values and GC thing pointers? Also, it looks like it's called during normal object marking (i.e., it's not a root)?

::: js/src/jsgcmark.cpp
@@ +596,5 @@
> +            jsval_layout layout;
> +            layout.asBits = bits;
> +            Value v = IMPL_TO_JSVAL(layout);
> +            gc::MarkValueUnbarriered(trc, v, name);
> +            continue;

I assume you want return here.

@@ +601,5 @@
> +        }
> +    }
> +#endif
> +
> +    gc::MarkGCThing(trc, reinterpret_cast<void *>(word), name);

Some sort of comment would be nice here about how, on 32-bit systems, you pass in a pointer to the low word of a Value when the Value is markable.
Comment 32 David Anderson [:dvander] 2011-12-27 12:00:51 PST
> ::: js/src/ion/IonFrames.cpp
> @@ +264,5 @@
> > +    // actual values).
> > +    uint32 slot;
> > +    while (safepoint.getGcSlot(&slot)) {
> > +        uintptr_t *ref = layout->slotRef(slot);
> > +        gc::MarkGCThingOrValue(trc, *ref, "ion-gc-slot");
> 
> I don't understand, if you have a separate loop for Values below, why you
> also need to include Values in this loop as well. Are you unable to tell
> when these are generated whether they're pointers or Values?

On x64 we can tell whether a slot is a pointer or Value via the tag, since they're the same size. On x86 we can't. We could separate them on both platforms, but the code to differentiate is already there (so the x64 assembler doesn't need two relocation tables).

> This one, I'm assuming, can actually be called on both Values and GC thing
> pointers? Also, it looks like it's called during normal object marking
> (i.e., it's not a root)?

It can be used on pointers, or punboxed values. Yeah, it can be called tracing IonCode objects.

> 
> ::: js/src/jsgcmark.cpp
> @@ +596,5 @@
> > +            jsval_layout layout;
> > +            layout.asBits = bits;
> > +            Value v = IMPL_TO_JSVAL(layout);
> > +            gc::MarkValueUnbarriered(trc, v, name);
> > +            continue;
> 
> I assume you want return here.

Yep, thanks.

> @@ +601,5 @@
> > +        }
> > +    }
> > +#endif
> > +
> > +    gc::MarkGCThing(trc, reinterpret_cast<void *>(word), name);
> 
> Some sort of comment would be nice here about how, on 32-bit systems, you
> pass in a pointer to the low word of a Value when the Value is markable.

Okay.
Comment 33 David Anderson [:dvander] 2011-12-27 16:03:26 PST
> Ranges are stored backwards and traversed in that order by the covers() call
> below, so it may be a bit more efficient to either use findFirstSafepoint
> before the loop or to add something like "pos < interval->start()" to the
> if-condition below.
> 
> ...
> 
> Intervals are stored in ascending order, and intervals for a single vreg
> should never overlap, so you can use the end position of the last interval.

Good to know, thanks!

> > +                if (!ins->isCall() &&
> > +                    ((!IsSpilledAt(type, inputOf(ins)) && IsSpilledAt(payload, inputOf(ins))) ||
> > +                     payloadAlloc->isGeneralReg()))
> 
> Isn't this just:
> 
> if (!ins->isCall() && (typeAlloc->isGeneralReg() ||
> payloadAlloc->isGeneralReg())
> 
> Or am I missing something?

Well, we don't care if the type is in a register but it and the payload are also spilled. But yeah, this expression has at least one useless term.
Comment 34 David Anderson [:dvander] 2011-12-27 16:13:53 PST
> Perhaps allocateValueSlots()? It seems misleading to call it a Double.

Okay.

> This leaks slots: if we call allocateDoubleSlot() above on something that is
> not a double, as in allocateStack(), it will be freed as a regular slot, and
> the other half will be lost.

This function is never called on a nunbox slot.

> @@ +107,5 @@
> > +BaseOfNunboxSlot(LDefinition::Type type, unsigned slot)
> > +{
> > +    if (type == LDefinition::PAYLOAD)
> > +        return slot + (NUNBOX32_PAYLOAD_OFFSET / STACK_SLOT_SIZE);
> > +    return slot + (NUNBOX32_TYPE_OFFSET / STACK_SLOT_SIZE);
> 
> return slot + OffsetOfNunboxSlot(type);

These are two different offsets, actually. OffsetOfNunboxSlot returns the delta between a virtual register that represents half of a nunbox, and its counterpart (so, +/-1, depending on which you ask for).
Comment 35 David Anderson [:dvander] 2011-12-27 16:35:46 PST
> For thought: since we know that each of these fields is consecutive, we
> could make accessor functions in the IonScript that recalculate the base.
> That would allow us to eliminate a few 32-bit fields from the IonScript.

Yeah, we did this in JM but it makes the structure nearly impossible to debug, so unless the memory savings end up being huge I'm in favor of the easier thing.

> Does it make sense for nunbox-specific datatypes to be defined on punbox
> systems?

I'm desperately trying to avoid lots of #ifdefs

> It looks almost as if writeGcSlots() wants to take in a reference to
> gcSlots(), and likewise with the order writeFoo() functions.

Looks like Vector::begin() returns a raw pointer, so I'll use that.

> ::: js/src/ion/x64/Architecture-x64.h
> @@ +144,3 @@
> >  
> > +// Smallest integer type that can hold a register bitmask.
> > +typedef uint16 PackedRegisterMask;
> 
> PackedRegisterMask looks like a mask, not a type, in symmetry with
> AllocatableMask, for example. The typical solution is to explicitly mark it
> as a type by appending "_t".

noo

I'll rename to "PackedRegisters"
Comment 36 Sean Stangl [:sstangl] 2011-12-27 16:45:42 PST
Comment on attachment 583704 [details] [diff] [review]
part 10: greedy allocator safepoints

Review of attachment 583704 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/GreedyAllocator.cpp
@@ +655,5 @@
> +            continue;
> +        }
> +
> +        if (!IsNunbox(vr->type()))
> +            continue;

This can go within the #ifdef JS_NUNBOX32.
Comment 37 David Anderson [:dvander] 2011-12-27 17:42:48 PST
Bill - we talked on IRC about not aliasing value/pointers on x64, but it turns out to be way easier with the original approach. Otherwise both allocators have to distinguish value from non-value slots for registers and stack slots, which means more helper functions and more if-nests in both allocators. It's nice just having everything funnel into one case that we can differentiate at gc time.

There is still more work to do here with out-of-line calls, but I'll bother with that once LSRA has stabilized. For now I'd like to get this all out of my queue since rebasing has gotten painful.

http://hg.mozilla.org/projects/ionmonkey/rev/3a226af16bad
http://hg.mozilla.org/projects/ionmonkey/rev/31fbba782c39
http://hg.mozilla.org/projects/ionmonkey/rev/db6347787da3
http://hg.mozilla.org/projects/ionmonkey/rev/6c56e451cc78
http://hg.mozilla.org/projects/ionmonkey/rev/034cb42b7dba
http://hg.mozilla.org/projects/ionmonkey/rev/faf64e8928e8
http://hg.mozilla.org/projects/ionmonkey/rev/c46781b63795
http://hg.mozilla.org/projects/ionmonkey/rev/4a163d12ee5c
http://hg.mozilla.org/projects/ionmonkey/rev/a3c47142a828
http://hg.mozilla.org/projects/ionmonkey/rev/f5f0c0825e1a
http://hg.mozilla.org/projects/ionmonkey/rev/ecb0b1376971

Note You need to log in before you can comment on or make changes to this bug.