Closed Bug 846111 Opened 10 years ago Closed 10 years ago

PJS: Implement a limited version of GetPropertyIC

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla23

People

(Reporter: shu, Assigned: shu)

References

Details

Attachments

(4 files, 10 obsolete files)

10.31 KB, patch
shu
: review+
Details | Diff | Splinter Review
55.42 KB, patch
nbp
: review+
Details | Diff | Splinter Review
87.41 KB, patch
shu
: review+
Details | Diff | Splinter Review
151.73 KB, patch
Details | Diff | Splinter Review
Currently, all ICs are unsafe in parallel execution due to their self-modifying nature. This severely limits the kind of code that be run in parallel.
Attached patch Part 1: VM changes (obsolete) — Splinter Review
This patch carves out pure paths for property lookups that result in a slot on native objects.
Attachment #719277 - Flags: review?(luke)
Attached patch Part 2: Ion changes (obsolete) — Splinter Review
This patch can be divided into a bunch of logical parts and subparts:

 - Add the ability to re-enter the VM from parallel execution
   - Tag parallel frames and set ionTop on PerThreadData
   - Generate the VM wrapper different for parallel VMFunctions, which take a ForkJoinSlice * as their context
 - Add ParGetPropertyIC
   - Calls a special, pure version of property lookup (see part 1)
   - Can only attach slot read stubs
   - Locks the JSContext around the self-modifying/code allocation part
     - Because of said lock, hold a set of already-stubbed objects to avoid generating duplicate stubs when multiple copies of the IC are waiting on the lock
Attachment #719278 - Flags: review?(nicolas.b.pierron)
I think bhackett would be the better reviewer for part 1.
Attachment #719277 - Flags: review?(luke) → review?(bhackett1024)
Comment on attachment 719277 [details] [diff] [review]
Part 1: VM changes

Review of attachment 719277 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, but I'd like to see a new version of the patch with the ObjectImpl interface changes.

::: js/src/jsobj.cpp
@@ +3932,5 @@
> +         * no cx and it'd probably be unsafe.
> +         *
> +         * JS_ASSERT_IF(!pobj->hasSingletonType() && shape->hasDefaultGetter(),
> +         *              js::types::TypeHasProperty(cx, pobj->type(), shape->propid(), vp));
> +         */

I think this comment should just be removed.  TypeHasProperty looks like it will only have side effects via an AutoEnterAnalysis, but fixing enough stuff to allow the assert to be added here doesn't seem worth the trouble.

@@ +3983,5 @@
> +        while (current) {
> +            if (!current->isNative())
> +                return false;
> +            current = current->getProto();
> +        }

This loop seems unnecessary, LookupPropertyPureInline will return false if it sees a non-native object while looking for the property.

@@ +3991,5 @@
> +    }
> +
> +    /* Fail if the object isn't native. */
> +    if (!obj2->isNative())
> +        return false;

This test is also unnecessary.

@@ +4001,5 @@
> +
> +    if (!NativeGetPureInline(obj2, shape, vp))
> +        return false;
> +
> +    return true;

Just 'return NativeGetPureInline(...)' is fine, avoid the last 'if'.

::: js/src/vm/Shape.h
@@ +1126,5 @@
> +         * Note that we let the linear searches be bumped racily. This is not
> +         * incorrect, as the number of linear searches is a heuristic anyways.
> +         */
> +        start->incrementNumLinearSearches();
> +    }

This entire if/else block should be removed.  Doing a linear search of a shape's parents is always valid to do, and it seems pretty bad that whether PJS can attach an IC depends on the number of searches that have been made on the shape.  Don't worry about the cost of doing the linear search, the hashtable optimization is in place for when VM operations are doing tons of searches through the shape; IC generation should only do a handful of searches.

Also, the number of searches is stored as some bits in a more general use uint32 field, and while I wouldn't expect races on incrementNumLinearSearches to damage that other info, it's hard to say for sure.  Would be better to just avoid the race, benign or not.

Making this change will make this method infallible, and you can use the same signature as Shape::search.  This will also make all the new methods in ObjectImpl.h pure, and their signatures should also be changed.
Attachment #719277 - Flags: review?(bhackett1024)
Comment on attachment 719278 [details] [diff] [review]
Part 2: Ion changes

Review of attachment 719278 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/IonCaches.cpp
@@ +1034,5 @@
>  }
>  
> +static bool
> +ParTryAttachNativeGetPropStub(LockedJSContext &cx, IonScript *ion,
> +                              ParGetPropertyIC &cache, HandleObject obj,

If this is only specific to this cache, this should be a member of this cache.

@@ +1095,5 @@
> +        return false;
> +    }
> +
> +    if (!cache.canAttachStub())
> +        return true;

You should check that before doing all this logic.

@@ +1097,5 @@
> +
> +    if (!cache.canAttachStub())
> +        return true;
> +
> +    return cache.attachReadSlot(cx, ion, obj, holder, shape);

the “false” returned by attachReadSlot does not have the same meaning as the “false” returned by ParTryAttachNativeGetPropStub.  This should be moved to the update function.

@@ +1122,5 @@
> +
> +    // Grab the property early, as the pure path is fast anyways and doesn't
> +    // need a lock.
> +    if (!GetPropertyPure(obj, NameToId(name), vp.address()))
> +        return false;

returning false out of a VM function call implies that an exception would be raised, unless you change the meaning, which I don't recommend since this would be extremely confusing, I would recommend to invalidate instead.

::: js/src/ion/IonCaches.h
@@ +7,5 @@
>  
>  #ifndef jsion_caches_h__
>  #define jsion_caches_h__
>  
> +#include "vm/ForkJoin.h"

nit: Use pre-declaration instead of includes, except in the cpp files which need to see the definitions.

@@ +202,5 @@
>      {
>      }
>  
> +    ~IonCache() {
> +        if (stubbedObjects_)

- For the moment IonCache destructors are only called when leaving the CodeGen, so this won't work as you would have expected.  You need to update IonCode destructor to call IonCache destructors.

- Then it would be better if this would be part of a virtual reset function, and moved to the most specialized version of the cache which needs this modification, instead of being part of IonCache.  And only call the reset function on the allocated cache.

- Do that also on GCs, at the same time when stubs are flushed.

@@ +290,5 @@
> +    bool initStubbedObjects(JSContext *cx) {
> +        // Note: to avoid double freeing, only initialize stubbedObjects after
> +        // the cache has been allocated (copied) into the cacheList.
> +        if (!stubbedObjects_) {
> +            stubbedObjects_ = cx->new_<ObjectSet>(cx);

- Add an isAllocated function which return if the fallbackLabel_ has been set or not.
- Assert that this function is never called if the cache is not allocated.

::: js/src/ion/IonFrames.cpp
@@ +429,5 @@
>  {
>      settle();
>  }
>  
> +IonActivationIterator::IonActivationIterator(uint8_t *top, IonActivation *activation)

Do not re-add this code.

::: js/src/ion/IonMacroAssembler.h
@@ +531,5 @@
>          linkExitFrame();
>          Push(ImmWord(uintptr_t(codeVal)));
>          Push(ImmWord(uintptr_t(NULL)));
>      }
> +    void enterParExitFrame(const VMFunction *f, Register slice, Register scratch) {

nit: Move all function definitions to the cpp file.

@@ +628,5 @@
>          reenterSPSFrame();
>          return ret;
>      }
>  
> +    void tagCallee(Register callee, ExecutionMode mode) {

nit: same here.

@@ +641,5 @@
> +            JS_NOT_REACHED("unknown execution mode");
> +        }
> +    }
> +
> +    void clearCalleeTag(Register callee, ExecutionMode mode) {

nit: and here.

::: js/src/ion/VMFunctions.h
@@ +8,5 @@
>  #ifndef jsion_vm_functions_h__
>  #define jsion_vm_functions_h__
>  
>  #include "jspubtd.h"
> +#include "vm/ForkJoin.h"

replace it by a pre-declaration of the class if possible.

  class ForkJoinSlice;

@@ +89,5 @@
>      // arguments of the VM wrapper.
>      uint64_t argumentRootTypes;
>  
> +    // Does this function take a ForkJoinSlice * instead of a JSContext *?
> +    bool parallel;

Don't store a boolean, but the ExecutionMode enum, or provide a wrapper to get the execution mode.

::: js/src/ion/arm/Trampoline-arm.cpp
@@ +492,5 @@
>      // We're aligned to an exit frame, so link it up.
> +    if (f.parallel)
> +        masm.enterParExitFrame(&f, cxreg, temp);
> +    else
> +        masm.enterExitFrame(&f);

These “if (f.parallel)” are painful to read.  I think it would be better to have one function which takes the ExecutionMode (not a boolean) in argument and do the switch internally.

@@ +537,5 @@
> +    // Initialize the context parameter if sequential. For parallel execution,
> +    // we've already loaded the context earlier in entering the parallel exit
> +    // frame.
> +    if (!f.parallel)
> +        masm.loadJSContext(cxreg);

You might want to merge this one in the previous function which would be a variant of enterExitFrame.

@@ +603,5 @@
>  
>      masm.bind(&exception);
> +    if (f.parallel)
> +        masm.handleParException();
> +    else

This is really hard to follow, masm.handleException(ExecutionMode) should be better.
Attachment #719278 - Flags: review?(nicolas.b.pierron)
Attached patch Part 1: VM changes (obsolete) — Splinter Review
Applied comments
Attachment #719277 - Attachment is obsolete: true
Attachment #720211 - Flags: review?(bhackett1024)
Attached patch Part 2: Ion changes v2 (obsolete) — Splinter Review
Attachment #719278 - Attachment is obsolete: true
Attachment #720824 - Flags: review?(nicolas.b.pierron)
Attached patch Part 2: Ion changes v2 (obsolete) — Splinter Review
Forgot to qref some files
Attachment #720824 - Attachment is obsolete: true
Attachment #720824 - Flags: review?(nicolas.b.pierron)
Attachment #720907 - Flags: review?(nicolas.b.pierron)
Comment on attachment 720211 [details] [diff] [review]
Part 1: VM changes

Review of attachment 720211 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/vm/Shape.h
@@ +1100,5 @@
> +inline UnrootedShape
> +Shape::searchNoHashify(Shape *start, jsid id)
> +{
> +    AutoAssertNoGC nogc;
> +    Shape **spp;

Maybe better to move spp down to (now) its only use/definition.

@@ +1105,5 @@
> +
> +    if (start->inDictionary()) {
> +        spp = start->table().search(id, false);
> +        return SHAPE_FETCH(spp);
> +    }

This 'if' block can be removed, it will be handled by the start->hasTable() 'if' statement below.
Attachment #720211 - Flags: review?(bhackett1024) → review+
Blocks: PJS
Comment on attachment 720907 [details] [diff] [review]
Part 2: Ion changes v2

Review of attachment 720907 [details] [diff] [review]:
-----------------------------------------------------------------

Can you split this patch in 2 and ask me again on the VM functions calls with the ForkJoinSlice, which looks good, and dvander and me on the IC part of the patch?

Why naming ForkJoinSlice and not JSParallelContext?
How do you garantee that no code can get an JSContext which is not locked, such as somebody patching this code later can get at best a compilation error?

Do you have any test case of this IC path?

::: js/src/ion/CodeGenerator.cpp
@@ +4705,5 @@
> +        return addCache(ins, allocateCache(cache));
> +      }
> +      default:
> +        JS_NOT_REACHED("Bad execution mode");
> +    }

nit: add a "return false;"

::: js/src/ion/IonCaches.cpp
@@ +1127,5 @@
> +ParallelGetPropertyIC::update(ForkJoinSlice *slice, size_t cacheIndex,
> +                              HandleObject obj, MutableHandleValue vp)
> +{
> +    AutoFlushCache afc("ParallelGetPropertyCache");
> +    PerThreadData *pt = slice->perThreadData;

How hard would it be to re-use the same mechanism that we already use in GetPropertyIC?  We put a lot of effort and fixes in our ICs and this sounds like a big source of errors.

Are fuzzers running on parallel arrays yet?

I would feel more confident with the code if you were just calling the GetPropertyIC::update function.  Can you do something like:

ParalellelGetPropertyIC::update(…)
{
    …
    LockedJSContext cx(slice);
    AutoEnsurePurity purity();
    if (!GetPropertyIC::update(cx, …))
        return false;
    return purity.isTroubled();
}

And return false when the purity of the evaluation cannot be guaranteed.  Almost like what we are doing with AutoAssertNoGC.

::: js/src/ion/IonCaches.h
@@ +228,5 @@
>      // Reset the cache around garbage collection.
> +    virtual void reset();
> +
> +    // Destroy any extra resources the cache uses upon IonCode finalization.
> +    virtual void destroy() { }

Define it in IonCaches.cpp, not in the header.  Define a virtual destructor too.

::: js/src/ion/IonMacroAssembler.cpp
@@ +865,5 @@
> +MacroAssembler::clearCalleeTag(Register callee, ExecutionMode mode)
> +{
> +    switch (mode) {
> +      case SequentialExecution:
> +        // CalleeToken_Function is untagged, so we don't need to do anything.

Also add that this function should never be produced when we are generating the body of a script.

::: js/src/ion/VMFunctions.h
@@ +349,5 @@
>  
> +// VMFunction wrapper for calling from parallel execution with no explicit
> +// arguments.
> +template <class R>
> +struct FunctionInfo<R (*)(ForkJoinSlice *)> : public VMFunction {

I don't think you need to duplicate everything:

template <typename ContextPtr> struct matchContext<ContextPtr>;

template <> struct MatchContext<JSContext *> {
    static const ExecutionMode execMode = SequentialExecution;
};

template <> struct MatchContext<ForkJoinSlice *> {
    static const ExecutionMode execMode = ParallelExecution;
};

and replace the JSContext * by Context in all the templates, as well as using the following in FunctionInfo constructor:

  MatchContext<Context>::execMode
Attachment #720907 - Flags: review?(nicolas.b.pierron)
(In reply to Nicolas B. Pierron [:nbp] from comment #10)
> Comment on attachment 720907 [details] [diff] [review]
> Part 2: Ion changes v2
> 
> Review of attachment 720907 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> Why naming ForkJoinSlice and not JSParallelContext?

Because that's a vague name, and our context happens to be the current slice we're running in. Our parallelism model is fork join, thus ForkJoinSlice.

>
> How do you garantee that no code can get an JSContext which is not locked, such as somebody patching this code later can get at best a compilation error?
>

Parallel VM reentry doesn't give you a JSContext, so the only interface to get a context is via ForkJoinSlice::acquireContext(). This could be made tighter to not return a JSContext * but a LockedJSContext and implement move semantics for LockedJSContext. You can get a JSContext * out of a LockedJSContext, though, but there's not a way to get at a JSContext without acquiring it first.

> 
> ::: js/src/ion/IonCaches.cpp
> @@ +1127,5 @@
> > +ParallelGetPropertyIC::update(ForkJoinSlice *slice, size_t cacheIndex,
> > +                              HandleObject obj, MutableHandleValue vp)
> > +{
> > +    AutoFlushCache afc("ParallelGetPropertyCache");
> > +    PerThreadData *pt = slice->perThreadData;
> 
> How hard would it be to re-use the same mechanism that we already use in
> GetPropertyIC?  We put a lot of effort and fixes in our ICs and this sounds
> like a big source of errors.
> 
> Are fuzzers running on parallel arrays yet?
> 
> I would feel more confident with the code if you were just calling the
> GetPropertyIC::update function.  Can you do something like:
> 
> ParalellelGetPropertyIC::update(…)
> {
>     …
>     LockedJSContext cx(slice);
>     AutoEnsurePurity purity();
>     if (!GetPropertyIC::update(cx, …))
>         return false;
>     return purity.isTroubled();
> }
> 
> And return false when the purity of the evaluation cannot be guaranteed. 
> Almost like what we are doing with AutoAssertNoGC.
> 

Very hard / results in too much complexity to reuse existing logic. The logic is sufficiently different, and the VM getProperty paths had to be vetted and special pure versions had to be carved out.

I don't see how an Auto object would modify the control flow of function calls that occur in its scope, unless we made intrusive changes all the way down to check for its existence. If the engine were designed from the ground up to understand the difference between pure and non-pure in its VM operations, something like this would be great, but as it stands now, I feel this approach'd be a bigger maintenance headache than the existing approach of carving out pre-vetted, pure paths.

P.S. AutoAssertNoGC was apparently removed for causing more problems than it solved, FWIW.
Comment on attachment 720907 [details] [diff] [review]
Part 2: Ion changes v2

Review of attachment 720907 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/IonCaches.cpp
@@ +10,1 @@
>  #include "CodeGenerator.h"

As only the thread which are entering the update function are locked, you have a race issue in attachStub function, which update the exit path after updating the entry one.

Jump addresses are not guarantee to be aligned, which means that a jump can be patched while another thread will request a prefect of an instruction stored on 2 cache lines.  Can you ensure that all threads pipeline are in a correct state when patching the "lastJump_".
based on a brief talk with nbp, it sounds like we'll want to have a patching method on ARM that is not prone to cache coherency issues.  We'll want to specify that the branch will always be of the form ldr pc, [pc, offset], and we can simply patch the data that is being loaded.
Yeah, nbp and I were just discussing a similar design of patching a value that is loaded and branched to.  That certainly seems easier to reason about.
Depends on: 849469
Parallel VM calls must return ParallelResult, which distinguishes between bailout due to failure (TP_FATAL), and bailout due to inability to stay pure (TP_RETRY_SEQUENTIALLY).
Attachment #720907 - Attachment is obsolete: true
Attachment #730940 - Flags: review?(nicolas.b.pierron)
Implement dispatch-style IC stubs via the |DispatchStubPrepender| attacher. The idea is that we use a dispatch table, allocated in IonScript, which holds a pointer to the first stub that the IC should jump to. Stubs are prepended instead of appended, so that no exit jumps need to be patched once the stub is attached. This, as far as I understand, addresses the need to align exit jump addresses to get atomic patching / cache coherence problems on ARM.

ParallelGetPropertyIC then generates dispatch style stubs instead of the repatch style stubs.

Also requesting r? from dvander for a general lookover and r? from mjrosenb for the ARM changes.
Attachment #730941 - Flags: review?(nicolas.b.pierron)
Attachment #730941 - Flags: review?(mrosenberg)
Attachment #730941 - Flags: review?(dvander)
Rebase part 1 patch. Carrying r+.
Attachment #720211 - Attachment is obsolete: true
Attachment #730945 - Flags: review+
Attachment #730941 - Flags: review?(mrosenberg) → review+
Comment on attachment 730940 [details] [diff] [review]
Part 2: Add parallel VMCall interface

Review of attachment 730940 [details] [diff] [review]:
-----------------------------------------------------------------

Nice work, still some details that I want to check on ParallelResult.
No need to re-upload a patch for the nits, just answer the question and ask again for review.

::: js/src/ion/VMFunctions.h
@@ +22,5 @@
>      Type_Int32,
>      Type_Object,
>      Type_Value,
> +    Type_Handle,
> +    Type_ParallelResult

nit: Move generateVMWrapper modification to this patch.

@@ +210,5 @@
>  template <> struct TypeToDataType<HandleFunction> { static const DataType result = Type_Handle; };
>  template <> struct TypeToDataType<HandleScript> { static const DataType result = Type_Handle; };
>  template <> struct TypeToDataType<HandleValue> { static const DataType result = Type_Handle; };
>  template <> struct TypeToDataType<MutableHandleValue> { static const DataType result = Type_Handle; };
> +template <> struct TypeToDataType<ParallelResult> { static const DataType result = Type_ParallelResult; };

Q: In which bug/patch is ParallelResult added?  Can you add the Bug number as a dependency here?  I don't have it in my working copy yet.

@@ +407,5 @@
>  };
>  
>  #undef FUNCTION_INFO_STRUCT_BODY
>  
> +#undef FOR_EACH_ARGS_6

nice catch.
Attachment #730940 - Flags: review?(nicolas.b.pierron)
(In reply to Nicolas B. Pierron [:nbp] from comment #18)
> Comment on attachment 730940 [details] [diff] [review]
> Part 2: Add parallel VMCall interface
> 
> Review of attachment 730940 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Nice work, still some details that I want to check on ParallelResult.
> No need to re-upload a patch for the nits, just answer the question and ask
> again for review.
> 
> ::: js/src/ion/VMFunctions.h
> @@ +22,5 @@
> >      Type_Int32,
> >      Type_Object,
> >      Type_Value,
> > +    Type_Handle,
> > +    Type_ParallelResult
> 
> nit: Move generateVMWrapper modification to this patch.

Sure.

> 
> @@ +210,5 @@
> >  template <> struct TypeToDataType<HandleFunction> { static const DataType result = Type_Handle; };
> >  template <> struct TypeToDataType<HandleScript> { static const DataType result = Type_Handle; };
> >  template <> struct TypeToDataType<HandleValue> { static const DataType result = Type_Handle; };
> >  template <> struct TypeToDataType<MutableHandleValue> { static const DataType result = Type_Handle; };
> > +template <> struct TypeToDataType<ParallelResult> { static const DataType result = Type_ParallelResult; };
> 
> Q: In which bug/patch is ParallelResult added?  Can you add the Bug number
> as a dependency here?  I don't have it in my working copy yet.

It's a fairly old bug, https://bugzilla.mozilla.org/show_bug.cgi?id=801087. 

> 
> @@ +407,5 @@
> >  };
> >  
> >  #undef FUNCTION_INFO_STRUCT_BODY
> >  
> > +#undef FOR_EACH_ARGS_6
> 
> nice catch.
Depends on: 801087
Attachment #730940 - Flags: review?(nicolas.b.pierron)
Comment on attachment 730941 [details] [diff] [review]
Part 3: Dispatch IC stubs and ParallelGetPropertyIC

Review of attachment 730941 [details] [diff] [review]:
-----------------------------------------------------------------

I think there is still more work to needed to improve the design of the StubAttacher.  The idea is here, but it should not leak that much in the CodeGenerator.

generateVMWrapper modification sounds fine and can be integrated in the previous patch, as soon as I'll review it again.  Which means Trampoline-/arch/.cpp & MacroAssembler-/arch/.{h,cpp} except for moveWithPatch from MacroAssembler-arm.h, As this is more modifications than I expected in my previous review (of part 2), please re-upload a patch with these modification integrated to make it clear what is included or not.

=> r-, as they are multiple reviewers involved on this patch.

::: js/src/ion/CodeGenerator.cpp
@@ +92,5 @@
>  // OutOfLineUpdateCache, but we want to keep it visible inside the
>  // CodeGeneratorShared such as we can specialize inline caches in function of
>  // the architecture.
>  bool
> +CodeGeneratorShared::addRepatchCache(LInstruction *lir, size_t cacheIndex)

The StubAttacher logic is leaking here.

We should not have to duplicate this path for it.  I feel like there should only be one StubAttacher logic for each IonCache and that attachWhatever functions should inherit the StubAttacher logic from the IonCache.

So I guess we can make the StubAttacher interface public and provide a function to make a scoped instance of the StubAttacher out of one function of the IonCache, such as the StubAttacher logic can also be used to emit the first jump.

@@ +111,5 @@
> +
> +bool
> +CodeGeneratorShared::addDispatchCache(LInstruction *lir, size_t cacheIndex, Register scratch)
> +{
> +    JS_ASSERT(cacheDispatchLabels_.length() == cacheIndex);

This design  implies that you have to know how an IonCache is used internally to add a cache, and there is no compile-time failure.

@@ +121,5 @@
>      if (!addOutOfLineCode(ool))
>          return false;
>  
> +    if (!cacheDispatchLabels_.append(masm.moveWithPatch(ImmWord(uintptr_t(-1)), scratch)))
> +        return false;

Why do we need yet another table, knowing that, there is a 1:1 mapping with the Dispatch IC, that this is something owned by the IonCache (similar to the repatch label).

::: js/src/ion/Ion.cpp
@@ +545,5 @@
> +
> +    // The dispatch table should be aligned to the pointer size for atomic
> +    // writes.
> +    size_t cacheDispatchEntries = cachesUseDispatch ? cacheEntries : 0;
> +    size_t paddedCacheDispatchTableSize = AlignBytes(cacheDispatchEntries * sizeof(uint8_t *), PointerAlignment);

If this could not be stored in the cache entries, this should have been part of the RuntimeData storage, as IonCache content is.  There is no reason to add an extra storage space here if there is no iteration needed at runtime.

::: js/src/ion/arm/MacroAssembler-arm.h
@@ +558,5 @@
> +        CodeOffsetLabel label = moveWithPatch(imm, ScratchRegister);
> +        ma_push(ScratchRegister);
> +        return label;
> +    }
> +    CodeOffsetLabel moveWithPatch(ImmWord imm, Register dest) {

Do not imclude these modifications in the previous patch.

::: js/src/ion/arm/Trampoline-arm.cpp
@@ +569,5 @@
> +    switch (f.failType()) {
> +      case Type_Object:
> +      case Type_Bool:
> +        // Called functions return bools, which are 0/false and non-zero/true
> +        masm.ma_cmp(r0, Imm32(0));

nit: masm.ma_tst() should be better here.
nit: masm.branchTestPtr whould be even better.

@@ +574,5 @@
> +        masm.ma_b(&failure, Assembler::Zero);
> +        break;
> +      case Type_ParallelResult:
> +        masm.ma_cmp(r0, Imm32(TP_SUCCESS));
> +        masm.ma_b(&failure, Assembler::NotEqual);

nit: masm.branchPtr would be better here.  Same thing for other arch.

::: js/src/ion/shared/Assembler-shared.h
@@ +522,5 @@
>  
>      void repoint(IonCode *code, MacroAssembler *masm = NULL);
>  
> +    bool isSet() {
> +		return raw_ != (uint8_t *) 0xdeadc0de;

nit: Tabs ?!
Attachment #730941 - Flags: review?(nicolas.b.pierron) → review-
Comment on attachment 730941 [details] [diff] [review]
Part 3: Dispatch IC stubs and ParallelGetPropertyIC

Review of attachment 730941 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/IonCaches.cpp
@@ +1239,5 @@
> +    return attachReadSlotWithAttacher(cx, attacher, ion, obj, holder, shape);
> +}
> +
> +bool
> +ParallelGetPropertyIC::tryAttachReadSlot(LockedJSContext &cx, IonScript *ion,

As I mentionned since comment 10, we should focus on reusing the code as much as possible instead of duplicating things for parallel array only use.  Such as bug fixed in Ion will also be fixed in Parallel Array code seamlessly.

This function is looking like a copy & paste of TryAttachNativeGetPropStub, I am sure that we can find a nice way abstract / wrap / isolate the specific pieces of each code.

@@ +1242,5 @@
> +bool
> +ParallelGetPropertyIC::tryAttachReadSlot(LockedJSContext &cx, IonScript *ion,
> +                                         HandleObject obj, HandlePropertyName name,
> +                                         uint8_t **stubEntry, bool *isCacheable)
> +{

<<<<<<

@@ +1257,5 @@
> +    // If the cache is idempotent, watch out for resolve hooks or non-native
> +    // objects on the proto chain. We check this before calling lookupProperty,
> +    // to make sure no effectful lookup hooks or resolve hooks are called.
> +    if (idempotent() && !checkObj->hasIdempotentProtoChain())
> +        return true;

>>>>>> identical to TryAttachNativeGetPropStub
<<<<<<

@@ +1266,5 @@
> +
> +    RootedShape shape(cx);
> +    RootedObject holder(cx);
> +    if (!js::LookupPropertyPure(checkObj, NameToId(name), holder.address(), shape.address()))
> +        return true;

>>>>>> Parallel Array specific code.
<<<<<<

@@ +1284,5 @@
> +        IsCacheableNoProperty(obj, holder, shape, pc, output())) {
> +        // With Proxies, we cannot garantee any property access as the proxy can
> +        // mask any property from the prototype chain.
> +        if (obj->isProxy())
> +            return true;

>>>>>> identical to TryAttachNativeGetPropStub
<<<<<<
  …
>>>>>> Not in Parallel Array code.
<<<<<<

@@ +1301,5 @@
> +        return true;
> +    }
> +
> +    *isCacheable = true;
> +

>>>>>> identical to TryAttachNativeGetPropStub
Comment on attachment 730940 [details] [diff] [review]
Part 2: Add parallel VMCall interface

Review of attachment 730940 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with Trampoline & Macro assembler modifications of part 3 (without moveWithPatch) as mentioned in comment 20.
Attachment #730940 - Flags: review?(nicolas.b.pierron) → review+
Carrying r+ after reorganization.
Attachment #730940 - Attachment is obsolete: true
Attachment #732202 - Flags: review+
Hopefully addressed all the points.

There are now 2 base classes: RepatchIonCache and DispatchIonCache, to minimize space requirements. The initial jump codegen is refactored into those base classes. RepatchStubAppender and DispatchStubPrepender are now scoped classes inside {Repatch,Dispatch}IonCache.

An OOL cache state struct AddCacheState is added to help thread through temporary state to the OOL's virtual 'bind' function.
Attachment #730941 - Attachment is obsolete: true
Attachment #730941 - Flags: review?(dvander)
Attachment #732207 - Flags: review?(nicolas.b.pierron)
Comment on attachment 732202 [details] [diff] [review]
Part 2: Add parallel VMCall interface

Review of attachment 732202 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/IonFrames.cpp
@@ +104,5 @@
>  
>  JSFunction *
>  IonFrameIterator::maybeCallee() const
>  {
> +    if ((isScripted() && (isFunctionFrame() || isParallelFunctionFrame())) || isNative())

Can Ion enter parallel Ion code without a VM call?
If not, why not storing this flag on the IonActivation?
Attachment #732202 - Flags: review+ → review?(nicolas.b.pierron)
(In reply to Nicolas B. Pierron [:nbp] from comment #25)
> Comment on attachment 732202 [details] [diff] [review]
> Part 2: Add parallel VMCall interface
> 
> Review of attachment 732202 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/ion/IonFrames.cpp
> @@ +104,5 @@
> >  
> >  JSFunction *
> >  IonFrameIterator::maybeCallee() const
> >  {
> > +    if ((isScripted() && (isFunctionFrame() || isParallelFunctionFrame())) || isNative())
> 
> Can Ion enter parallel Ion code without a VM call?
> If not, why not storing this flag on the IonActivation?

Because parallel execution doesn't have an IonActivation, and we need to know the frames for which we should pull out its IonScript * via parallelIon or via ion.
Comment on attachment 732202 [details] [diff] [review]
Part 2: Add parallel VMCall interface

Review of attachment 732202 [details] [diff] [review]:
-----------------------------------------------------------------

I think this patch is a good stand-alone patch.
Thanks for making it consistent.

::: js/src/ion/IonFrames.cpp
@@ +367,5 @@
> +    ForkJoinSlice *slice = ForkJoinSlice::Current();
> +    IonFrameIterator iter(slice->perThreadData->ionTop);
> +
> +    while (!iter.isEntry()) {
> +		parallel::Spew(parallel::SpewBailouts, "Bailing from VM reentry");

nit: Fix indentation, use space.

::: js/src/ion/IonFrames.h
@@ +86,5 @@
>          return CalleeTokenToScript(token);
>        case CalleeToken_Function:
>          return CalleeTokenToFunction(token)->nonLazyScript();
> +      case CalleeToken_ParallelFunction:
> +		return CalleeTokenToParallelFunction(token)->nonLazyScript();

nit: same here, tabs -> space.

::: js/src/ion/x86/MacroAssembler-x86.cpp
@@ +218,5 @@
> +    handleFailureWithHandler(JS_FUNC_TO_DATA_PTR(void *, ion::HandleException));
> +}
> +
> +void
> +MacroAssemblerX86::handleParallelFailure()

nit: Move these function to the IonMacroAssembler.
Attachment #732202 - Flags: review?(nicolas.b.pierron) → review+
I just realized the comments in part 3 are outdated with the new version. Those will be updated, but I'm not re-attaching a new version for now.
Update comments with pretty ascii art diagrams as inspired by the baseline IC diagrams.
Assignee: general → shu
Attachment #732207 - Attachment is obsolete: true
Attachment #732207 - Flags: review?(nicolas.b.pierron)
Attachment #733712 - Flags: review?(nicolas.b.pierron)
Comment on attachment 733712 [details] [diff] [review]
Part 3: Dispatch IC stubs and ParallelGetPropertyIC

Review of attachment 733712 [details] [diff] [review]:
-----------------------------------------------------------------

Good work, thanks for taking time to address all the remarks I made in previous reviews.

The comments you added on top of the Repatch & Dispatch classes are awesome, and I am not the only one to say that.
h4writer> nbp, ascii graphs <3

Can you provide a micro benchmark and report the performance improvement of this modification?
Also, I think you should upload a new patch (with the fix mentioned below) and ask gkw / decoder for feedback.

::: js/src/ion/CodeGenerator.cpp
@@ +40,5 @@
>    public OutOfLineCodeBase<CodeGenerator>,
>    public IonCacheVisitor
>  {
>    private:
> +    CodeGeneratorShared *codegen_;

nit: This should not be necessary. (see following nits)

@@ +53,3 @@
>          cacheIndex_(cacheIndex)
> +    {
> +        codegen->getCache(cacheIndex)->initializeAddCacheState(lir, &state_);

nit: Move this to CodeGeneratorShared::addCache.

@@ +57,3 @@
>  
>      void bind(MacroAssembler *masm) {
> +        codegen_->getCache(cacheIndex_)->bindInitialJump(*masm, state_);

nit: Move this to CodeGenerator::visitOutOfLineCache, and add a comment to mention that the bind operation is done in visit function.

@@ +4933,5 @@
> +    JS_ASSERT(ins->isGetPropertyCacheV() || ins->isGetPropertyCacheT());
> +    if (ins->isGetPropertyCacheV())
> +        addState->dispatchScratch = output_.scratchReg().gpr();
> +    else
> +        addState->dispatchScratch = ToRegister(ins->toGetPropertyCacheT()->temp());

nit: You don't need the dipatchSratch, see DispatchIonCache::emitInitialJump 's comment.

::: js/src/ion/CodeGenerator.h
@@ +242,5 @@
>      bool visitOutOfLineCache(OutOfLineUpdateCache *ool);
>  
> +    bool addGetPropertyCache(LInstruction *ins, RegisterSet liveRegs, Register objReg,
> +                             PropertyName *name, TypedOrValueRegister output,
> +                             bool allowGetters);

nit: Move this function declaration with the emit* function below.

::: js/src/ion/IonCaches.cpp
@@ +245,5 @@
>  };
>  
>  const ImmWord IonCache::StubAttacher::STUB_ADDR = ImmWord(uintptr_t(0xdeadc0de));
>  
> +class RepatchIonCache::RepatchStubAppender : public IonCache::StubAttacher

Nice, that's a long time since I saw an inner class definition out-side the outer-class definition.  I like the fact that this is not increasing the header size (from a compilation time perspective).

@@ +281,5 @@
> +RepatchIonCache::reset()
> +{
> +    IonCache::reset();
> +    PatchJump(initialJump_, fallbackLabel_);
> +    this->lastJump_ = initialJump_;

nit: remove " this-> ".

@@ +317,5 @@
> +    {
> +    }
> +
> +    void patchNextStubJump(MacroAssembler &masm, IonCode *code) {
> +        if (hasNextStubOffset_) {

nit: As this is a prepend, I guess it would be better to assert that the stub has a next jump?

@@ +325,5 @@
> +            CodeLocationJump nextStubJump(code, nextStubOffset_);
> +            PatchJump(nextStubJump, CodeLocationLabel(cache_.firstStub_));
> +
> +            // Update the dispatch table.
> +            cache_.firstStub_ = code->raw();

nit: Mention in the comment that this has to be the last action for attaching the stub, otherwise another thread might enter the code while it is being patched on this thread.

@@ +342,5 @@
> +{
> +    Register scratch = addState.dispatchScratch;
> +    dispatchLabel_ = masm.moveWithPatch(ImmWord(uintptr_t(-1)), scratch);
> +    masm.loadPtr(Address(scratch, 0), scratch);
> +    masm.jump(scratch);

nit: Add an arch specific MacroAssemblerArch::jumpWithDataPtr() instruction which does the same thing except that it use the SrachReg instead of an extra temp.  FYI, the ARM backend for jumpWithPatch is already doing so the ARM implementation should just delegate to jumpWithPatch.

@@ +360,5 @@
> +    dispatchLabel_.fixup(&masm);
> +    Assembler::patchDataWithValueCheck(CodeLocationLabel(code, dispatchLabel_),
> +                                       ImmWord(uintptr_t(&firstStub_)),
> +                                       ImmWord(uintptr_t(-1)));
> +    firstStub_ = fallbackLabel_.raw();

At this point the code on which the updateBaseAddress is called is the final address of the IonCache.

I'll recommend to add an assertion to make sure that the address of firstStub_ is aligned, as this is a critical point of failure which is hard to investigate and might likely be caused by unrelated changes.

@@ +971,5 @@
>      // Need to set correct framePushed on the masm so that exit frame descriptors are
>      // properly constructed.
>      masm.setFramePushed(ion->frameSize());
>  
> +    RepatchStubAppender attacher(*this);

Nice :)
This means that one cannot use 2 incompatible StubAttacher policy without having a compilation error.

@@ -1006,5 @@
>          checkObj = obj->getTaggedProto().toObjectOrNull();
>      }
>  
> -    if (!checkObj || !checkObj->isNative())
> -        return true;

This check has disappeared, is that expected?

@@ +1357,5 @@
> +    }
> +
> +    if (IsIdempotentAndHasSingletonHolder(*this, holder, shape))
> +        return true;
> +

nit: Split this function in 2, one would be CanAttachReadSlot, and attachReadSlot.  So far this has been the convention.

@@ +1403,5 @@
> +        // finer-grained locking, with one lock per cache. However, generating
> +        // new jitcode uses a global ExecutableAllocator tied to the runtime.
> +        LockedJSContext cx(slice);
> +
> +        if (cache.canAttachStub()) {

If the cache is idempotent and has too many stubs, the you will still lock after the execution of GetPropertyPure.  I guess we can check for this case before locking:

if (cache.idempotent() && !cache.canAttachStub())
    return TP_SUCCESS;

::: js/src/ion/IonCaches.h
@@ +64,5 @@
> +// which invokes a cache function to perform the operation. The cache function
> +// may generate a stub to perform the operation in certain cases (e.g. a
> +// particular shape for an input object) and attach the stub to existing
> +// stubs, forming a daisy chain of tests for how to perform the operation in
> +// different circumstances. The details of how stubs are linked up at

typo?: The details of how stubs are linked up >at< described …

@@ +207,5 @@
>      // Reset the cache around garbage collection.
>      virtual void reset();
>  
> +    // Destroy any extra resources the cache uses upon IonCode finalization.
> +    virtual void destroy();

nit: IonCode -> IonScript, IonCode can be confused with the IonCode generated for the stubs.

::: js/src/ion/Lowering.cpp
@@ +2146,5 @@
>              return false;
>          return assignSafepoint(lir, ins);
>      }
>  
> +    LGetPropertyCacheT *lir = new LGetPropertyCacheT(useRegister(ins->object()), temp());

nit: You don't need the dipatchSratch, see DispatchIonCache::emitInitialJump 's comment.

::: js/src/ion/arm/MacroAssembler-arm.h
@@ +577,2 @@
>          CodeOffsetLabel label = currentOffset();
> +        ma_movPatchable(Imm32(imm.value), dest, Always, hasMOVWT() ? L_MOVWT : L_LDR);

In the context of the parallel execution you don't want to make patchable code with MOVWT, but only with LDR.
Attachment #733712 - Flags: review?(nicolas.b.pierron) → review+
Addressed nits (specific reply to points will be in the following comment). f? gkw per nbp's request. Carrying r+.
Attachment #733712 - Attachment is obsolete: true
Attachment #734846 - Flags: review+
Attachment #734846 - Flags: feedback?(gary)
(In reply to Nicolas B. Pierron [:nbp] from comment #30)
> Comment on attachment 733712 [details] [diff] [review]
> Part 3: Dispatch IC stubs and ParallelGetPropertyIC
> 
> Review of attachment 733712 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Good work, thanks for taking time to address all the remarks I made in
> previous reviews.
> 
> The comments you added on top of the Repatch & Dispatch classes are awesome,
> and I am not the only one to say that.
> h4writer> nbp, ascii graphs <3
> 
> Can you provide a micro benchmark and report the performance improvement of
> this modification?

I'd love to do this, but no spare cycles to do that at the moment. I'd like to land this so that we can get working ICs in PJS, then explore the pros and cons of the different schemes in normal code later. 

> Also, I think you should upload a new patch (with the fix mentioned below)
> and ask gkw / decoder for feedback.
> 
> ::: js/src/ion/CodeGenerator.cpp
> @@ +40,5 @@
> >    public OutOfLineCodeBase<CodeGenerator>,
> >    public IonCacheVisitor
> >  {
> >    private:
> > +    CodeGeneratorShared *codegen_;
> 
> nit: This should not be necessary. (see following nits)

Good call, removed.

> @@ +4933,5 @@
> > +    JS_ASSERT(ins->isGetPropertyCacheV() || ins->isGetPropertyCacheT());
> > +    if (ins->isGetPropertyCacheV())
> > +        addState->dispatchScratch = output_.scratchReg().gpr();
> > +    else
> > +        addState->dispatchScratch = ToRegister(ins->toGetPropertyCacheT()->temp());
> 
> nit: You don't need the dipatchSratch, see DispatchIonCache::emitInitialJump
> 's comment.
>

Since x86 doesn't have a scratch register, I've added arch-specific lowering for GetPropertyCache to only get a new temp on x86, when the output type is MIRType_Double. On all other archs we just use the ScratchReg/ScratchRegister.
 

> @@ +342,5 @@
> > +{
> > +    Register scratch = addState.dispatchScratch;
> > +    dispatchLabel_ = masm.moveWithPatch(ImmWord(uintptr_t(-1)), scratch);
> > +    masm.loadPtr(Address(scratch, 0), scratch);
> > +    masm.jump(scratch);
> 
> nit: Add an arch specific MacroAssemblerArch::jumpWithDataPtr() instruction
> which does the same thing except that it use the SrachReg instead of an
> extra temp.  FYI, the ARM backend for jumpWithPatch is already doing so the
> ARM implementation should just delegate to jumpWithPatch.
> 

Unfortunately this wasn't possible, see above note.

> @@ -1006,5 @@
> >          checkObj = obj->getTaggedProto().toObjectOrNull();
> >      }
> >  
> > -    if (!checkObj || !checkObj->isNative())
> > -        return true;
> 
> This check has disappeared, is that expected?
> 

Oops! Good catch, not sure how that disappeared.

> @@ +1357,5 @@
> > +    }
> > +
> > +    if (IsIdempotentAndHasSingletonHolder(*this, holder, shape))
> > +        return true;
> > +
> 
> nit: Split this function in 2, one would be CanAttachReadSlot, and
> attachReadSlot.  So far this has been the convention.
> 

The convention seems to me have just been 'attachXXX' for most ICs except the sequential GetPropertyIC. For ParallelGetPropertyIC the logic is much simpler; I don't see the benefit in splitting here.

> ::: js/src/ion/arm/MacroAssembler-arm.h
> @@ +577,2 @@
> >          CodeOffsetLabel label = currentOffset();
> > +        ma_movPatchable(Imm32(imm.value), dest, Always, hasMOVWT() ? L_MOVWT : L_LDR);
> 
> In the context of the parallel execution you don't want to make patchable
> code with MOVWT, but only with LDR.

I think movw/t are just for moving into a register anyways. Registers aren't observable anyways, so this shouldn't be possible to race.
Comment on attachment 734846 [details] [diff] [review]
Part 3: Dispatch IC stubs and ParallelGetPropertyIC

Also flagging for more fuzzing feedback? from decoder.
Attachment #734846 - Flags: feedback?(choller)
Attachment #734906 - Flags: feedback?(gary)
Attachment #734906 - Flags: feedback?(choller)
Attachment #734906 - Attachment is obsolete: true
Attachment #734906 - Flags: feedback?(gary)
Attachment #734906 - Flags: feedback?(choller)
Attachment #734911 - Flags: feedback?(gary)
Attachment #734911 - Flags: feedback?(choller)
(In reply to Shu-yu Guo [:shu] from comment #32)
> (In reply to Nicolas B. Pierron [:nbp] from comment #30)
> > @@ +1357,5 @@
> > > +    }
> > > +
> > > +    if (IsIdempotentAndHasSingletonHolder(*this, holder, shape))
> > > +        return true;
> > > +
> > 
> > nit: Split this function in 2, one would be CanAttachReadSlot, and
> > attachReadSlot.  So far this has been the convention.
> > 
> 
> The convention seems to me have just been 'attachXXX' for most ICs except
> the sequential GetPropertyIC. For ParallelGetPropertyIC the logic is much
> simpler; I don't see the benefit in splitting here.

The convention is that attach functions are called when we are sure that we will generate a Stub, as you can see all attachX functions are starting with the creation of the macro assembler.

The reason is, that the Try-part gives a logical error (“no, it is not safe to attach a stub here”), where the attach method gives a critical error (“… something went wrong during the construction of the stub !@#$”)
(In reply to Nicolas B. Pierron [:nbp] from comment #36)
> (In reply to Shu-yu Guo [:shu] from comment #32)
> > (In reply to Nicolas B. Pierron [:nbp] from comment #30)
> > > @@ +1357,5 @@
> > > > +    }
> > > > +
> > > > +    if (IsIdempotentAndHasSingletonHolder(*this, holder, shape))
> > > > +        return true;
> > > > +
> > > 
> > > nit: Split this function in 2, one would be CanAttachReadSlot, and
> > > attachReadSlot.  So far this has been the convention.
> > > 
> > 
> > The convention seems to me have just been 'attachXXX' for most ICs except
> > the sequential GetPropertyIC. For ParallelGetPropertyIC the logic is much
> > simpler; I don't see the benefit in splitting here.
> 
> The convention is that attach functions are called when we are sure that we
> will generate a Stub, as you can see all attachX functions are starting with
> the creation of the macro assembler.
> 
> The reason is, that the Try-part gives a logical error (“no, it is not safe
> to attach a stub here”), where the attach method gives a critical error (“…
> something went wrong during the construction of the stub !@#$”)

Well, take GetElementIC::attachGetProp for instance. But fine, I'll split ParallelGetPropertyIC.
Attachment #734846 - Flags: feedback?(gary)
Attachment #734846 - Flags: feedback?(choller)
Comment on attachment 734911 [details] [diff] [review]
Rolled up patch based off 6be07c836e6d

I didn't find anything particularly bad with this patch after a round of overnight fuzzing -> feedback+.
Attachment #734911 - Flags: feedback?(gary) → feedback+
https://hg.mozilla.org/mozilla-central/rev/c84256093802
https://hg.mozilla.org/mozilla-central/rev/f6e861adb467
https://hg.mozilla.org/mozilla-central/rev/286594159989
Status: NEW → RESOLVED
Closed: 10 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla23
Attachment #734911 - Flags: feedback?(choller)
You need to log in before you can comment on or make changes to this bug.