Closed Bug 862923 Opened 11 years ago Closed 9 years ago

Stop-the-world garbage collection in PJS

Categories

(Core :: JavaScript Engine, defect)

22 Branch
x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: nmatsakis, Assigned: pnkfelix)

References

Details

(Whiteboard: PJS)

Attachments

(2 files, 4 obsolete files)

Currently parallel execution aborts when a GC is requested.  It would be better to stop-the-world and collect the shared state.  

Note that there are several orthogonal extension that is possible:
- one can also imagine collecting the worker's local state independently.
- one can imagine allowing the other threads to continue while the GC is in progress, only halting if they too would request a collection.

The former point is not as useful as collecting the shared state, since typically a single PJS operation does not produce enough garbage on its own to merit a collection.  Rather, you are collecting the garbage that has accumulated until this point.

The second point is an optimization that would make sense as a later step.
Current status: implemented on branch, preparing patch.
Assignee: general → nmatsakis
Blocks: PJS
Whiteboard: PJS
Depends on: 865028
This work was primarily done by pnkfelix, though I've modified it some as well.  General strategy is described in a comment, but here is another summary:

(1) intercept GC requests during PJS sections and redirect to PJS.
(2) PJS will then stop-the-world and retrigger; because world is stopped, PJS is not considered "active" and thus the GC trigger proceeds normally.
(3) we register a list of additional "stack extents" for the conservative scanner, one for each of the worker threads.
Attachment #748184 - Flags: review?(wmccloskey)
Could you use more context in the diff? It makes it easier to review. You need to add -U 8 to your .hgrc file:
  https://wiki.mozilla.org/JavaScript:New_to_SpiderMonkey#.hgrc_file
Same as before but with -U8
Attachment #748184 - Attachment is obsolete: true
Attachment #748184 - Flags: review?(wmccloskey)
Attachment #749085 - Flags: review?(wmccloskey)
Comment on attachment 749085 [details] [diff] [review]
Extend PJS to support stop-the-world GC.

Review of attachment 749085 [details] [diff] [review]:
-----------------------------------------------------------------

This looks pretty good. I have a few high-level questions.

First, I'm concerned that you're not saving the register state on the workers like we do on the main thread. It seems possible that some GC pointers get stored in callee-saved registers that remain unused by the rendezvous code (especially on x64).

Regarding the rendezvous protocol, what happens if the main thread finishes and then the other threads try to do a rendezvous after that? It seems like they'll wait forever.

Finally, I'd like the tracking of stack extents to be refactored.
- Can you move conservativeGC from the runtime to PerThreadData?
- Can you try to use more of our existing code to save the stack extents? If not, can you refactor the existing code so that's possible?
- I'd like it if we could get rid of the StackExtents vector and instead just have a list of PerThreadData pointers or something. Then the GC could iterate over those when marking the stack.

::: js/src/builtin/TestingFunctions.cpp
@@ +907,5 @@
>      return true;
>  }
>  
>  JSBool
> +js::testingFunc_parallelJSActive(JSContext *cx, unsigned argc, jsval *vp)

Given how this is used, it might make sense to move it to ForkJoin.cpp and call it CheckParallelJSActive or something.

::: js/src/gc/RootMarking.cpp
@@ +270,5 @@
>  }
>  
>  #ifndef JSGC_USE_EXACT_ROOTING
> +static const uintptr_t *
> +MarkIonFramesIfPresent(JSTracer *trc, const uintptr_t *begin, uint8_t *ionTop, ion::IonActivation *ionActivation)

This seems like a weird way to break up this function. Can you keep just keep it the way it was (as MarkRangeConservativelyAndSkipIon) but additionally pass in an |ion::IonActivationIterator &| that would be used to iterate over the ion frames?

@@ +366,5 @@
> +
> +    // Mark additional stack extents from (paused) non-main threads.
> +    if (rt->extraExtents) {
> +        gc::StackExtent *extraExtents = rt->extraExtents->head;
> +        while (extraExtents) {

Please use a for loop for this.

::: js/src/ion/VMFunctions.cpp
@@ +21,5 @@
>  #include "jsboolinlines.h"
>  #include "jsinterpinlines.h"
>  
> +#include "ion/IonFrames-inl.h" // for GetTopIonJSScript
> +#include "vm/StringObject-inl.h"

Why is this new #include needed?

::: js/src/jscntxt.cpp
@@ +57,5 @@
>  #include "jscntxtinlines.h"
>  #include "jscompartment.h"
>  #include "jsobjinlines.h"
> +#include "jsinferinlines.h"
> +#include "jstypedarrayinlines.h"

Why are these new #includes needed?

::: js/src/jscntxt.h
@@ +635,5 @@
>  
>      /*
> +     * If non-null, then points to a collection of stack extents that
> +     * belong to the worker threads in parallel sections.  These are
> +     * to be used, for example, as additional ranges to scan when the GC

Maybe take out "for example".

@@ +1191,5 @@
>  
>      /* Always preserve JIT code during GCs, for testing. */
>      bool                alwaysPreserveCode;
>  
> +    /* Always preserve JIT code during GCs while in StopTheWorld ParallelDo. */

Please remove "Always". Also, it seems like "stop-the-world" should apply to the GC, not the ParallelDo.

::: js/src/jsgc.cpp
@@ +2589,5 @@
>          return false;
>  
>      if (comp->rt->alwaysPreserveCode)
>          return true;
> +    if (comp->rt->preserveCodeDueToParallelDo)

I'm a bit worried that this branch comes too late. Will we crash if we return false from this function when preserveCodeDueToParallelDo is set? I don't know enough about Ion invalidation.

The gcShouldCleanUpEverything business is a heuristic, so it's safe to move this check before that. However, I don't know what the !comp->zone()->types.inferenceEnabled is for.

::: js/src/vm/ForkJoin.cpp
@@ +371,5 @@
>  
> +class AutoMarkWorldStoppedForGC
> +{
> +  private:
> +    ForkJoinSlice &threadCx;

We try to spell out full names for fields, so this should be threadContext.

@@ +384,5 @@
> +        threadCx.shared->cx_->runtime->preserveCodeDueToParallelDo = true;
> +    }
> +
> +    ~AutoMarkWorldStoppedForGC()
> +    {

The brace should be on the previous line since this method is indented. It only goes on its own line if it won't need to be indented.

@@ +389,5 @@
> +        threadCx.shared->worldStoppedForGC_ = false;
> +        threadCx.shared->cx_->mainThread().suppressGC++;
> +        threadCx.shared->cx_->runtime->preserveCodeDueToParallelDo = false;
> +    }
> +

No blank line here please.

@@ +913,5 @@
> +    // Move the per-thread arenas into the main zone list.
> +    transferArenasToZone();
> +
> +    // Normally GCs are handled in a stop-the-world fashion during the
> +    // op callback (`ForkJoinShared::check()`). But sometimes the

If you want to distinguish some expression, please use |expr| rather than `expr`. That's what we do elsewhere in the code.

@@ +915,5 @@
> +
> +    // Normally GCs are handled in a stop-the-world fashion during the
> +    // op callback (`ForkJoinShared::check()`). But sometimes the
> +    // `check()` method might not get a chance to run, or it might opt
> +    // to abort par exec instead, so check the `gcRequested_` flag

Please spell out parallel execution.

@@ +937,2 @@
>  {
> +    JS_ASSERT(ForkJoinSlice::Current() == NULL);

This should be written !ForkJoinSlice::Current().

@@ +937,4 @@
>  {
> +    JS_ASSERT(ForkJoinSlice::Current() == NULL);
> +
> +    // stop-the-world GC may still be sweeping; let that finish so

Please capitalize stop.

@@ +946,5 @@
> +        zone->adoptWorkerAllocator(allocators_[i]);
> +}
> +
> +void
> +ForkJoinShared::triggerGCIfRequested() {

This brace should go on its own line.

@@ +948,5 @@
> +
> +void
> +ForkJoinShared::triggerGCIfRequested() {
> +    // this function either executes after the fork-join section ends
> +    // or when the world is stopped:

Capitalize "this" and end with a period.

@@ +954,3 @@
>  
>      if (gcRequested_) {
> +        if (gcZone_ == NULL)

Please change to |if (gcZone_)| and flip the order of the cases.

@@ +1015,5 @@
> +    // This works in concert with ForkJoinSlice::recordStackExtent
> +    // to establish the stack extent for this slice.
> +    slice.recordStackBase(myStackTop);
> +
> +    js::PerThreadData *pt = slice.perThreadData;

Can't you just use perThread, which is already a local variable?

@@ +1021,2 @@
>      JS_ASSERT(fun->isFunction());
> +    RootedFunction callee(cx_, fun->toFunction());

I thought you weren't supposed to use cx_ here. Shouldn't it be perThread?

@@ +1044,5 @@
>  
>      Spew(SpewOps, "Down");
>  }
>  
> +struct AutoInstallForkJoinStackExtents : public gc::StackExtents

It seems very odd to use inheritance here. Why not make StackExtents be a field?

@@ +1096,5 @@
> +
> +        if (gcRequested_ && rt->isHeapBusy()) {
> +            // Cannot call GCSlice when heap busy, so abort.  Easier
> +            // right now to abort rather than prove it cannot arise,
> +            // and safer for short-term than asserting !isHeapBusy.

I don't understand why there's any danger that isHeapBusy() would be true here. That would only happen if we're actually collecting right now (i.e., if the Collect function is on the stack) or if something else is tracing the heap (like JS_DumpHeap). That doesn't seem possible. If you agree, please just remove this code.

@@ +1099,5 @@
> +            // right now to abort rather than prove it cannot arise,
> +            // and safer for short-term than asserting !isHeapBusy.
> +            // Note that the `gcRequested_` flag remains true; this
> +            // will be observed by the sequential code in
> +            // `ForkJoinShared::execute()`

No backticks, and please end with a period.

@@ +1105,5 @@
> +            records_->setCause(ParallelBailoutHeapBusy, NULL, NULL, NULL);
> +            return false;
> +        }
> +
> +        // (1). Initiaize the rendezvous and record stack extents.

Initialize

@@ +1114,5 @@
> +
> +        // (2).  Note that because we are in a STW section, calls to
> +        // js::TriggerGC() etc will not re-invoke
> +        // ForkJoinSlice::requestGC().
> +        triggerGCIfRequested();

I think it would be clearer if this function actually performed the GC. Then you could avoid the gcIsNeeded check below, which is a bit confusing.

@@ +1119,5 @@
> +
> +        // (2b) Run the GC if it is required.  This would occur as
> +        // part of js_InvokeOperationCallback(), but we want to avoid
> +        // an incremental GC.
> +        if (rt->gcIsNeeded) {

No braces here.

@@ +1370,5 @@
> +    extent.ionTop        = ptd->ionTop;
> +    extent.ionActivation = ptd->ionActivation;
> +}
> +
> +

Extra space here.

@@ +1376,5 @@
> +{
> +    // This establishes the base, and ForkJoinSlice::recordStackExtent the tip,
> +    // of the stack address-range of this thread for the GC to scan.
> +#if JS_STACK_GROWTH_DIRECTION > 0
> +        this->extent->stackMin = baseAddr;

This should be indented 4 spaces.

::: js/src/vm/ForkJoin.h
@@ +164,5 @@
> +// If the GC *is* triggered during parallel execution, it will
> +// redirect to the current ForkJoinSlice() and invoke requestGC() (or
> +// requestZoneGC).  This will set a flag on the shared state
> +// (gcRequested_) and then cause an interrupt.  Once the interrupt
> +// occurs, we observe the `gcRequested_` flag and force a

Please fix the backticks here.

@@ +314,5 @@
>  
>      // When the code would normally trigger a GC, we don't trigger it
>      // immediately but instead record that request here.  This will
>      // cause |ExecuteForkJoinOp()| to invoke |TriggerGC()| or
> +    // |TriggerZoneGC()| as appropriate once the par. sec. is

Please write parallel section instead of par. sec.

@@ +317,5 @@
>      // cause |ExecuteForkJoinOp()| to invoke |TriggerGC()| or
> +    // |TriggerZoneGC()| as appropriate once the par. sec. is
> +    // complete. This is done because those routines do various
> +    // preparations that are not thread-safe, and because the full set
> +    // of arenas is not available until the end of the par. sec.

Same here.

@@ +322,2 @@
>      void requestGC(JS::gcreason::Reason reason);
> +    void requestZoneGC(JS::Zone *compartment, JS::gcreason::Reason reason);

The argument should be called |zone|, not |compartment|.

@@ +345,5 @@
>      void releaseContext();
>  
>      // Check the current state of parallel execution.
>      static inline ForkJoinSlice *Current();
> +    bool InWorldStoppedForGCSection();

This is a pretty crazy name. How about isStoppedForGC (note that method names should be lowercase unless static).

@@ +365,5 @@
>  #endif
>  
>      ForkJoinShared *const shared;
> +
> +private:

private: and public: should be indented two spaces. Also, the stuff here will already be private, so this line can be removed.

@@ +369,5 @@
> +private:
> +    // Stack base and tip of this slice's thread, for Stop-The-World GC.
> +    gc::StackExtent *extent;
> +
> +public:

Identation. Also, can this stuff just go in the previous public section?

@@ +414,5 @@
>      JSContext *operator->() { return cx_; }
>  };
>  
>  static inline bool
> +ParallelJSActive()

I actually find the name InParallelSection more descriptive than ParallelJSActive, but I'll leave the choice up to you.

@@ +419,4 @@
>  {
>  #ifdef JS_THREADSAFE
>      ForkJoinSlice *current = ForkJoinSlice::Current();
> +    return current != NULL && !current->InWorldStoppedForGCSection();

It's going to be a problem for gcc that this function is inline but InWorldStoppedForGCSection is not. Just warning you.

::: js/src/vm/StackExtents.h
@@ +45,5 @@
> +    void clearStackExtent() {
> +        stackMin = 0;
> +        stackEnd = 0;
> +        ionTop = 0;
> +        ionActivation = 0;

I guess these should be in #ifdef JS_ION.

@@ +52,5 @@
> +
> +/*
> + * A collection of stack extents.
> + */
> +struct StackExtents

Does this really need its own class?
Attachment #749085 - Flags: review?(wmccloskey)
Assignee: nmatsakis → pnkfelix
(swiping this back from Niko, with his permission, to free up his plate.  I hope to be able to incorporate billm's feedback.)
(rebased patch.  have not yet addressed bill's comments.)
Attachment #749085 - Attachment is obsolete: true
I meant to post some additional explanation of the patch. Here it is, transcribed from IRC:

"It wasn't a big explanation. Basically just that while it is 
not possible for a rendezvous to begin if the main thread has exited, 
what would happen I think is that all the other threads would 
repeatedly invoke ::check() (because the interrupt flag would never be cleared)
until they eventually terminate normally or abort for some other reason and that is obviously not good.

"So this way the main thread should (relatively quickly) set the flag,
causing the other threads to abort should they be interrupted later. 
In other words, once the interrupt flag is set, the main thread should 
"relatively quickly" either exit (and set mainThreadExited_) or 
initiate a rendezvous. I opted not to have the workers spin waiting 
for one of those two flags to become true, but rather return and 
do (hopefully) useful work in the meantime, since they should just 
keep re-entering the check method. I don't know whether that's the 
right call really."

Probably we should extend the comment I added with some of this text.
Attachment #754975 - Flags: review?(pnkfelix) → review+
Variation on mandelbrot.js that allocates 32-elem array per kernel iteration.  (That change to the benchmark is what motivated us to put in Stop-the-World GC support into PJS in the first place.)

This variation also uses computes a 512x512 array rather than a 1024x1024 array (the mandelbrot.js in parjs-benchmarks/ is 1024x1024).  This smaller iteration space seems to cause the GC bailouts more reliably than 1024x1024, from what I can tell.
Comment on attachment 760421 [details]
mandelbrot-alloc32.js: allocating variation of mandelbrot benchmark

Putting in the patch pAv1 from Bug 881223 onto mozilla-inbound (to get visual feedback on the bailouts) but before applying the STW patch, here's the performance measured by the PJS run.sh script (after the transcript you'll see post STW patch)

% ./run.sh ../../../objdir-opt-js/js mandelbrot-alloc32.js
../../../objdir-opt-js/js -e 'var libdir="./"; var MODE="compare";' mandelbrot-alloc32.js
Warming up sequential runs
.
Measuring sequential runs
Measuring parallel runs
mandelbrot-alloc32.js:70:2 warning: Bailed out of parallel operation: unsupported at self-hosted:2933
mandelbrot-alloc32.js:70:2 warning: Bailed out of parallel operation: requested zone GC at mandelbrot-alloc32.js:0
mandelbrot-alloc32.js:70:2 warning: Bailed out of parallel operation: requested zone GC at mandelbrot-alloc32.js:0
mandelbrot-alloc32.js:70:2 warning: Bailed out of parallel operation: requested zone GC at mandelbrot-alloc32.js:0
mandelbrot-alloc32.js:70:2 warning: Bailed out of parallel operation: requested zone GC at mandelbrot-alloc32.js:0
mandelbrot-alloc32.js:70:2 warning: Bailed out of parallel operation: requested zone GC at mandelbrot-alloc32.js:0
mandelbrot-alloc32.js:70:2 warning: Bailed out of parallel operation: requested zone GC at mandelbrot-alloc32.js:0
mandelbrot-alloc32.js:70:2 warning: Bailed out of parallel operation: requested zone GC at mandelbrot-alloc32.js:0
mandelbrot-alloc32.js:70:2 warning: Bailed out of parallel operation: requested zone GC at mandelbrot-alloc32.js:0
mandelbrot-alloc32.js:70:2 warning: Bailed out of parallel operation: requested zone GC at mandelbrot-alloc32.js:0
Checking correctness
MANDELBROT SEQUENTIAL MEASUREMENT 0: 558
MANDELBROT SEQUENTIAL MEASUREMENT 1: 584
MANDELBROT SEQUENTIAL MEASUREMENT 2: 620
MANDELBROT SEQUENTIAL AVERAGE: 587.3333333333334
MANDELBROT PARALLEL MEASUREMENT 0: 396
MANDELBROT PARALLEL MEASUREMENT 1: 269
MANDELBROT PARALLEL MEASUREMENT 2: 307
MANDELBROT PARALLEL AVERAGE  : 324
MANDELBROT SEQ/PAR RATIO     : 1.8127572016460907
MANDELBROT PAR/SEQ RATIO     : 0.5516458569807037
MANDELBROT IMPROVEMENT       : 44%


Here's the results after the STW patch:

% ./run.sh ../../../objdir-opt-js/js mandelbrot-alloc32.js
../../../objdir-opt-js/js -e 'var libdir="./"; var MODE="compare";' mandelbrot-alloc32.js
Warming up sequential runs
.
Measuring sequential runs
Measuring parallel runs
mandelbrot-alloc32.js:70:2 warning: Bailed out of parallel operation: unsupported at self-hosted:2933
mandelbrot-alloc32.js:70:2 warning: Bailed out of parallel operation: unsupported at self-hosted:2933
Checking correctness
MANDELBROT SEQUENTIAL MEASUREMENT 0: 581
MANDELBROT SEQUENTIAL MEASUREMENT 1: 593
MANDELBROT SEQUENTIAL MEASUREMENT 2: 617
MANDELBROT SEQUENTIAL AVERAGE: 597
MANDELBROT PARALLEL MEASUREMENT 0: 226
MANDELBROT PARALLEL MEASUREMENT 1: 247
MANDELBROT PARALLEL MEASUREMENT 2: 212
MANDELBROT PARALLEL AVERAGE  : 228.33333333333334
MANDELBROT SEQ/PAR RATIO     : 2.6145985401459853
MANDELBROT PAR/SEQ RATIO     : 0.3824678950307091
MANDELBROT IMPROVEMENT       : 61%



For perspective, here is the run on a 512x512 version of mandelbrot.js on this machine (a MacBook Pro 2.6 Ghz core i7 with 16 GB ram), so you can see how much speedup one might hope for.

% ./run.sh ../../../objdir-opt-js/js mandelbrot.js
../../../objdir-opt-js/js -e 'var libdir="./"; var MODE="compare";' mandelbrot.js
Warming up sequential runs
.
Measuring sequential runs
Measuring parallel runs
mandelbrot.js:42:2 warning: Bailed out of parallel operation: unsupported at self-hosted:2933
Checking correctness
MANDELBROT SEQUENTIAL MEASUREMENT 0: 227
MANDELBROT SEQUENTIAL MEASUREMENT 1: 224
MANDELBROT SEQUENTIAL MEASUREMENT 2: 225
MANDELBROT SEQUENTIAL AVERAGE: 225.33333333333334
MANDELBROT PARALLEL MEASUREMENT 0: 40
MANDELBROT PARALLEL MEASUREMENT 1: 42
MANDELBROT PARALLEL MEASUREMENT 2: 48
MANDELBROT PARALLEL AVERAGE  : 43.333333333333336
MANDELBROT SEQ/PAR RATIO     : 5.2
MANDELBROT PAR/SEQ RATIO     : 0.19230769230769232
MANDELBROT IMPROVEMENT       : 80%
Comment on attachment 749085 [details] [diff] [review]
Extend PJS to support stop-the-world GC.

Review of attachment 749085 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsgc.cpp
@@ +2589,5 @@
>          return false;
>  
>      if (comp->rt->alwaysPreserveCode)
>          return true;
> +    if (comp->rt->preserveCodeDueToParallelDo)

This was indeed a problem.  I've spent some days exploring our options here, namely either:

1. Turn the preserveCodeDueToParallelDo into a heuristic, and make the ForkJoin threads able to deal with their jit'ed code getting discarded (or integrate them with the Ion invalidation protocol, which delays discarding the code by building up a reference-count tracking the activations from the stack that remain to be unwound).

2. Make preserveCodeDueToParallelDo a strict requirement that the GC will respect, and change the logic here so that it takes precedence.

The first option was pretty attractive to me, as a more flexible/robust option.  But it turned out to be a lot harder than I expected, and there are hidden ratholes (e.g. incrementing the refcount on IonScript will make its gc-tracer assume that it really *is* invalidated, and not merely being kept alive for other reasons).

Anyway, at this point I think something like option 2 is the better choice.  The incarnation of the flag that's on the patch won't work, because associating the flag with the whole runtime ends up conflicting with the actions of nukeTypes.  I think the right answer here is to move the |preserveCodeDueToParallelDo| flag to the compartment (or IonCompartment), where I think one can prove that inferenceEnabled will never be set to false (and for now I will put in an assertion for it).
Okay, I think I addressed all of billm's notes.

I think billm might be out for a few days, so I posted a feedback-request for niko in the meantime, in case he wants to look over this.
Attachment #754850 - Attachment is obsolete: true
Attachment #754975 - Attachment is obsolete: true
Attachment #767702 - Flags: review?(wmccloskey)
Attachment #767702 - Flags: feedback?(nmatsakis)
billm: gentle review ping
Comment on attachment 767702 [details] [diff] [review]
patch A v3: Extend PJS to support stop-the-world GC (incorporated review feedback)

Review of attachment 767702 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/Ion.cpp
@@ +2093,5 @@
>          Assembler::patchWrite_NearCall(osiPatchPoint, invalidateEpilogue);
>      }
>  
> +    // We (deliberately) only invalidate activation stack of main
> +    // thread, not stacks of Parallel JS threads.

I think this comment is slightly misleading. We should not do any invalidations if parallel code is active, since the main thread is reused as one of the workers.

::: js/src/vm/ForkJoin.cpp
@@ +1633,5 @@
> +
> +        // (3). Invoke the callback and abort if it returns false.
> +        if (!js_InvokeOperationCallback(cx_)) {
> +            slice.bailoutRecord->setCause(ParallelBailoutInterrupt, NULL, NULL, NULL);
> +            setAbortFlag(true);

As we discussed on IRC, it is probably unsafe to invoke the op callback, even if the world is marked as stopped for GC. It might for example cause TI invalidations or other negative things. Unfortunately, we can't currently distinguish a user-requested interrupt from other interrupts, and we probably don't want to fail to invoke the slow script dialog, so we'd need to add an additional flag to the JSRuntime.
Attachment #767702 - Flags: feedback?(nmatsakis) → feedback+
Comment on attachment 767702 [details] [diff] [review]
patch A v3: Extend PJS to support stop-the-world GC (incorporated review feedback)

Review of attachment 767702 [details] [diff] [review]:
-----------------------------------------------------------------

This version really looks much nicer. I've attached the remaining review comments; most of them are nits.

However, I'm starting to question why we really need this. Given that it doesn't collect the memory allocated by the currently running threads, it seems like its main purpose is less about doing GC than about avoid bailouts. If that's the case, I think there are much simpler ways to solve the problem. One idea is to stick calls to js::MaybeGC at the beginning and end of every ParallelDo. That would make it more likely that we do any needed GCs before entering the parallel code, so we never take bailouts. Can you see if this works?

::: js/src/gc/RootMarking.cpp
@@ +289,5 @@
>      MarkRangeConservatively(trc, i, end);
>  }
>  
>  static JS_NEVER_INLINE void
> +MarkConservativeStackRoots(JSTracer *trc, PerThreadData *thread, bool useSavedRoots)

How about calling this MarkConservativeThreadStackRoots?

@@ +294,5 @@
>  {
>      JSRuntime *rt = trc->runtime;
>  
>  #ifdef DEBUG
>      if (useSavedRoots) {

We'll never have saved roots for anything except the main thread, so I'd rather this chunk of code be moved out to the calling function (the one that's not per-thread).

@@ +320,5 @@
>      }
>  
>      uintptr_t *stackMin, *stackEnd;
>  #if JS_STACK_GROWTH_DIRECTION > 0
> +    stackMin = thread.nativeStackBase;

I think you mean thread->nativeStackBase.

@@ +330,5 @@
>  
>      JS_ASSERT(stackMin <= stackEnd);
> +
> +    ion::JitActivationIterator iter(thread);
> +    MarkRangeConservativelyAndSkipIon(trc, rt, stackMin, stackEnd, iter);

It seems odd to me to pass in a JitActivationIterator here. Can you just pass in the thread and have the callee create the iterator?

@@ +339,5 @@
> +static JS_NEVER_INLINE void
> +MarkConservativeStackRoots(JSTracer *trc, bool useSavedRoots)
> +{
> +    JSRuntime *rt = trc->runtime;
> +    // Mark stack extents from this main thread and (paused) non-main threads.

Need an empty line before the comment.

@@ +342,5 @@
> +    JSRuntime *rt = trc->runtime;
> +    // Mark stack extents from this main thread and (paused) non-main threads.
> +    MarkConservativeStackRoots(trc, &rt->mainThread, useSavedRoots);
> +    if (rt->threads) {
> +        for (int i=0, len=rt->threads->length(); i < len; i++) {

for (size_t i = 0; i < rt->threads->length(); i++) {

::: js/src/gc/Verifier.cpp
@@ +246,3 @@
>      AutoCopyFreeListToArenas copy(rt);
>  
>      ConservativeGCData *cgcd = &rt->conservativeGC;

Can you please verify that this code compiles? It looks like this field was removed.

::: js/src/ion/IonCompartment.h
@@ +230,5 @@
>      ReadBarriered<IonCode> stringConcatStub_;
>  
> +    // If true, then this compartment holds jit code for a paused ParallelDo
> +    // which must be preserved across a GC.
> +    bool mustPreserveCodeDueToParallelDo_;

This is a bit of a tongue twister. How about mustPreserveParallelDoCode_?

@@ +236,4 @@
>      IonCode *generateStringConcatStub(JSContext *cx);
>  
>    public:
> +    void setMustPreserveCodeDueToParallelDo(bool val) {

Same here.

::: js/src/jsapi.cpp
@@ -743,5 @@
>      execAlloc_(NULL),
>      bumpAlloc_(NULL),
>      ionRuntime_(NULL),
>      selfHostingGlobal_(NULL),
> -    nativeStackBase(0),

I think we still want to initialize this to zero.

::: js/src/jscntxt.h
@@ +667,5 @@
>       */
>      js::PerThreadData   mainThread;
>  
>      /*
> +     * NULL or Per-thread data for non-main threads created by

"Per" shouldn't be capitalized.

@@ -662,5 @@
>      js::PerThreadData   mainThread;
>  
>      /*
> -     * If non-zero, we were been asked to call the operation callback as soon
> -     * as possible.

Please don't remove this comment.

@@ +674,2 @@
>       */
> +    js::Vector<js::PerThreadData*, 16> *threads;

Does this have to be a pointer?

::: js/src/jscompartment.cpp
@@ +597,5 @@
>  
>  bool
>  JSCompartment::hasScriptsOnStack()
>  {
> +    for (ActivationIterator iter(&rt->mainThread); !iter.done(); ++iter) {

This change seems unnecessary.

::: js/src/jsgc.cpp
@@ +2579,5 @@
>  ShouldPreserveJITCode(JSCompartment *comp, int64_t currentTime)
>  {
> +#ifdef JS_ION
> +    if (comp->ionCompartment() && comp->ionCompartment()->mustPreserveCodeDuringGC())
> +    {

This brace should go on the previous line.

@@ +2588,1 @@
>      if (comp->rt->gcShouldCleanUpEverything || !comp->zone()->types.inferenceEnabled)

Please put a blank line before this line.

::: js/src/jspubtd.h
@@ +368,5 @@
>      SkipRoot *skipGCRooters;
>  #endif
>  
> +    /* Base address of the native stack for the current thread. */
> +    uintptr_t nativeStackBase;

Why does this have to be a friend field? It would be better if it could go in PerThreadData.

::: js/src/vm/ForkJoin.cpp
@@ +341,5 @@
>  
>      // The main thread has requested a rendezvous.
>      volatile bool rendezvous_;
>  
> +    // True if a worker requested a GC

This should end in a period.

@@ +347,5 @@
> +
> +    // True when the main thread exits, never becomes false again.
> +    volatile bool mainThreadExited_;
> +
> +    // True if all non-main threads have stopped for the main thread to GC

Same here.

@@ +1499,5 @@
>          gcZone_ = NULL;
> +
> +        JS_ASSERT_IF(alsoRunGC, cx_->runtime()->gcIsNeeded);
> +        if (alsoRunGC) {
> +            GC(cx_->runtime(), GC_NORMAL, gcReason_);

You should call PrepareForFullGC(cx_->runtime()) before this.

@@ +1562,3 @@
>  
>      RootedObject fun(perThread, fun_);
> +    JS_ASSERT(fun->is<JSFunction>()); 

Shouldn't end in a space.

@@ +1597,5 @@
>      if (abort_)
>          return false;
>  
>      if (slice.isMainThread()) {
> +        // We are the main thread: therefore we must

We use 80-character lines for comments. This comment seems to be much shorter. The other comments in this function seem pretty short too. Please fix.

@@ +1638,3 @@
>              return false;
>          }
> +

No blank line here.

@@ +1641,1 @@
>      } else if (rendezvous_) {

The way the rendezvous works is kinda strange. As I understand it, some thread will set rt->interrupt. At that point, all the threads will start running this function pretty often. Until the main thread gets to its check, the other threads' check() functions will just return. It would make more sense to me if they assume that the main thread (if it hasn't exited) will want to do rendezvous. We'd have to be a little careful not to race with the main thread exiting, but it seems more sensible to me this way.
Attachment #767702 - Flags: review?(wmccloskey)
(In reply to Niko Matsakis [:nmatsakis] from comment #0)
> Currently parallel execution aborts when a GC is requested.  It would be
> better to stop-the-world and collect the shared state.  

This work probably becomes moot once bug 933313 lands, but I just wanted to make
a note about the underlying assumptions, for posterity.

> Note that there are several orthogonal extension that is possible:
> - one can also imagine collecting the worker's local state independently.

This is what we do in bug 933313.

> - one can imagine allowing the other threads to continue while the GC is in
> progress, only halting if they too would request a collection.

Effectively this is what bug 933313 does as well.

> The former point is not as useful as collecting the shared state, since
> typically a single PJS operation does not produce enough garbage on its own
> to merit a collection.  Rather, you are collecting the garbage that has
> accumulated until this point.
> 
> The second point is an optimization that would make sense as a later step.

I think that a central fact from investigations in bug 933313 is that individual worker threads easily allocate a tremendous amount of garbage and also suriviving storage.

The main reason is that a lot of codes do not operate on scalars, which SM optimizes well, but on triples/quads (xyz, xyzw) represented as arrays or objects.  Those data structures are all heap-allocated at the present time and tend to be very short-lived.  Escape analysis might lighten that load, but in practice we might still see a lot of it.  Naive codes in addition will have "constant" triples/quads represented as array literals, and these will be allocated at a high rate if they're defined inside loops etc.

A secondary reason is that with today's core count, many element values tend to be computed per worker, so the per-element allocation volume adds up to a large per-worker volume.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: