Closed Bug 933313 Opened 6 years ago Closed 5 years ago

PJS: Integrate with generational garbage collection

Categories

(Core :: JavaScript Engine, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla33

People

(Reporter: nmatsakis, Assigned: lth)

References

(Blocks 1 open bug)

Details

Attachments

(1 file, 54 obsolete files)

182.59 KB, patch
Details | Diff | Splinter Review
Once generational GC lands, we should consider how it could be used in PJS. The current PJS scheme for memory allocation is to create one arena per thread. A more efficient alternative might be to have multiple nurseries, one per thread. An appealing aspect of the current APIs is that, at the end of every iteration, the nursery can be reset *if the output type is scalar*. Even if the output type is *not* scalar, the only way for an object to escape is to be stored in the output array, so we could take the opportunity to scan and collect the nursery, moving any surviving objects into tenured memory or elsewhere.

There are a number of obstacles:

- GGC was not designed for multiple, parallel nurseries, as far as I know.
- GGC nurseries cannot store all kinds of objects. Notably, they cannot store strings yet.
- Undoubtedly more.
Depends on: GenerationalGC
Blocks: PJS
I think we should do this, but the implementation should be unrelated to generational garbage collection. It's just region allocation in which you copy out the resule at the end. There's no need for exact stack scanning or write barriers. There will be a few places where the runtime will have to be updated to deal with the possibility of objects being stored in the thread's local region. There should be almost no need to change GC code.
I agree with Bill, you should probably wait for bug 902174: the swarm approach there will be much easier to adapt for this purpose.
I expect you two would know best. We did not have any immediate plans to implement this, I mostly wanted to open up bugs for the expected work items that will come up. Still, memory allocation is one of the major features we have to offer, and it'd be nice to support it well, so maybe we can have a session to discuss what would be the best strategy just so I better understand how things work. 

At the moment our most immediate problems with memory alloc have to do with the fact that GC throws away the compiled artifacts, forcing us to recompile every so often; this hurts par exec more than it does seq exec. I expect GGC may help with this since, as I understand it, minor collections do not throw away jitted code.
Discussions at the work week lead to the following implementation plan:

- Each worker will have a private bump-allocated nursery, specially made for PJS
- Code compiled to run in parallel mode will always allocate in this nursery
- Code that runs in sequential mode will never allocate in this nursery
- Each nursery will be copy-collected independently from the other nurseries by the worker
  thread if it fills up during parallel execution, and may also grow independently to
  accomodate live data
- At the end of the parallel section the live storage in all those nurseries will be evacuated
  into the common heap (this is trivial if the kernel produces only non-reference values)

Since individual slices are processed by individual workers, it suffices to keep track of the worker that processed a slice to find out which result objects to trace during collection for a particular worker.

I'm not sure whether it's necessary to discard all parallel work if we go to sequential mode, but Shu has made that a fact today and we should count on it if we have to.
Assignee: nobody → lhansen
There are some details I don't quite follow, particularly with evacuation and collection. It's clear that we will have to collect during execution (as recent benchmarks have persuasively demonstrated). What do we plan to do with data that is still live at that time?

I see two big possibilities:

1. We keep the data in the parallel arenas somehow during parallel execution (there are many variations here). At the end, from sequential code, we evacuate whatever data is still live into normal GC objects.

2. During parallel iteration, we create arenas for our output. When we find things that are live (as in, stored into the result), we copy data into there. At the end we adopt these arenas as we do today.

I think scheme 1 is roughly what you described. What makes me nervous about it is that there is clear sequential overhead -- evacuating the result data is something that is clearly parallelizable. On the other hand, it has the advantage that all GC data structures (arenas etc) are manipulated only from the main thread. I definitely see not interfering with the main GC allocation paths as a desirable goal.

On the other other hand, moving data out from this parallel nursery into arenas or what-have-you might
not be totally trivial, and might wind up being a separate code path. In that case, perhaps it is separate enough that we can do it from the parallel code without affecting the normal allocation paths.

Seems like something that will be clearer as we plan in more detail.
(In reply to Niko Matsakis [:nmatsakis] from comment #5)
> 1. We keep the data in the parallel arenas somehow during parallel execution
> (there are many variations here). At the end, from sequential code, we
> evacuate whatever data is still live into normal GC objects.

What I intended to say, since it has the advantages you suggest later.

> 2. During parallel iteration, we create arenas for our output. When we find
> things that are live (as in, stored into the result), we copy data into
> there. At the end we adopt these arenas as we do today.

A possibility, esp if we can avoid adopting large arenas.

> What makes me nervous about
> [scheme 1] is that there is clear sequential overhead -- evacuating the result data
> is something that is clearly parallelizable.

Yes.

I'm envisioning that at some point the GGC will support per-thread allocators into a common nursery, if it does not do so already.  In that case we may be able to evacuate from the parallel arenas multi-threaded, since we know there's no shared data.

Support for nested parallelism (inevitable if PJS is initially successful?) may also change a lot of things for us, and makes me pretty nervous, so as I evolve the plan I'll try to look out for that.

Not sure what happens if we fill the nursery in the middle of an evacuation from the parallel area, but I doubt it's going to be pleasant unless we can always grow the nursery.  That issue speaks in favor of just adopting the arenas.  They'd have to be adopted into the nursery though, since we don't have remembered sets for the parallel arenas.
(In reply to Lars T Hansen [:lth] from comment #6)
> Support for nested parallelism (inevitable if PJS is initially successful?)
> may also change a lot of things for us, and makes me pretty nervous, so as I
> evolve the plan I'll try to look out for that.

Yes, I have been thinking a bit about the nested case too, though I didn't mention it.

> Not sure what happens if we fill the nursery in the middle of an evacuation
> from the parallel area, but I doubt it's going to be pleasant unless we can
> always grow the nursery.  That issue speaks in favor of just adopting the
> arenas.  They'd have to be adopted into the nursery though, since we don't
> have remembered sets for the parallel arenas.

For some reason, I was assuming that we would evacuate parallel arenas into tenured space, but that actually doesn't make much sense. At the moment, arenas are only used for tenured allocations (as I understand it).
(In reply to Niko Matsakis [:nmatsakis] from comment #7)

> For some reason, I was assuming that we would evacuate parallel arenas into
> tenured space, but that actually doesn't make much sense. At the moment,
> arenas are only used for tenured allocations (as I understand it).

Actually, the comment stream on bug 619558 makes it look compelling to be evacuating into the tenured space, because a bunch of types are "not allocated in the nursery".  I don't know yet what that means for us except that evacuation can't be a straight copy into the nursery nor a merging of the parallel arenas into the nursery.
I've talked to Terrence some more.  There are additional complications.  At the moment, many types are not only not allocated in the nursery, but they are not movable at all.  This includes function objects (closures), strings, and pretty much anything with a nontrivial finalizer, I think.  Objects and Arrays are OK.  The consequences of these complications are several:

- we allocate an object in the tenured heap or in our worker-private heap depending on type
- we have to have write barriers that at a minimum trigger on every store into an object that is
  allocated in the tenured heap
- we have to scan at least a subset of the remembered set for the tenured heap when we copy-collect
  the worker-private heap and update pointers as appropriate
Non-GGC builds will be supported while products still require them; currently B2G and Fennec need non-GGC builds and the necessary preconditions for GGC on those products (tools, mostly?) won't be ready for "months, probably".  So we can rely on non-GGC for PJS for some time still.
Re: finalizers, I think we are somewhat aided by the fact that not all objects can be allocated in par mode, though strings currently can.
Depends on: 993347
Attached patch Work in progress (obsolete) — Splinter Review
General overview:

- The important allocation paths are given cases that route the allocation
  to the PJS per-worker nursery; this is all in place, as is bump allocation
  in those nurseries.

- The Nursery is refactored into a NurseryBase<> that is more general and can
  be used for other kinds of nurseries.  This may or may not be the right
  thing given the long-term plans for the current Nursery but it's pretty
  clean (and minimal, atm).

- The PJS nursery subclasses NurseryBase and steals a bunch of code from
  the nursery in general.  Allocation seems to be working, so long as GC isn't
  triggered :)

- The forwarding framework for the PJS nursery can copy into a tospace or into
  the private tenured area.

- At the end of the parallel work, the tenured area - expected to be quite
  small - is merged into the general tenured heap.

- The forwarding code has not been fleshed out yet.  Tracing and copying is
  inherited from NurseryBase and marking uses the general marking machinery,
  I expect, but I'm working on finding and marking roots properly.
Attached patch Work in progress v2 (obsolete) — Splinter Review
Still some missing bits but this can run through thread-independent collection and evacuation in a simple test without crashing, and the program generates the correct output.  Still, the copying stats are invariably zero and this makes scant sense.  More work needed...
Attachment #8405464 - Attachment is obsolete: true
Re comment #13: Ion code is bound to a specific thread, because it inlines the address of the activation_ field within the JSRuntime's main PerThreadData structure (ie the code is bound to the main thread).  We need this field to be updated correctly on every thread's PerThreadData in order to scan the thread's stack for pointers.

(It's not yet clear to me how we handle PJS kernels that throw exceptions, since correct unwinding should need activation information too.)

The alternative to scanning for stack roots is to adopt some "local bailout" style where the slice is aborted and returned to the slice pool; once we get back into C++ code we can run GC, which will be straightforward because the stack will be empty.  I had hoped to avoid that since (a) it wastes work and (b) it means we'll need to discover the case where a slice allocates more data than the size of the nursery.  Neither is likely a disaster, but it adds complexity.

To uncouple the generated code from the thread it's supposed to be running on we'd need to access the TLS-held PerThreadStorage to get the proper activation_ field, probably.
Re comment #14, in IonMacroAssembler.cpp we distinguish between entering a normal exit frame and a parallel exit frame (and ditto fake exit frames); in the case of the parallel frames, we set the ionTop on the perThreadData correctly.  This seems to work properly for all threads now that I've fixed a bug for the worker running on a main thread.

Still the activation_ is NULL so no frames are found.  Presumably there's a trick...
Comment on attachment 8406790 [details] [diff] [review]
Work in progress v2

Review of attachment 8406790 [details] [diff] [review]:
-----------------------------------------------------------------

Looks like this is coming along well!

::: js/src/gc/Marking.cpp
@@ +207,5 @@
>  {
>      JS_ASSERT(thingp);
>      T *thing = *thingp;
>  
> +#if 0                           // ForkJoinNursery has a problem with this, how to finesse?

We already have an early exit for nursery GC's in MarkInternal:

    /* This function uses data that's not available in the nursery. */
    if (IsInsideNursery(trc->runtime, thing))
        return;

::: js/src/gc/Nursery-inl.h
@@ +111,5 @@
> +}
> +
> +template <class TRC>
> +MOZ_ALWAYS_INLINE void
> +js::gc::NurseryBase<TRC>::traceObject(TRC *trc, JSObject *obj)

This entire traceObject path simply duplicates the code in MarkChildren(JSObject*) in gc/Marking.cpp: it is here exclusively here for inlining. It is worth a 5-15% speedup in collectToFixed point, netting us a 2% win on octane.

::: js/src/jsgc.cpp
@@ +5547,5 @@
>  
> +#if 0
> +    // TODO
> +    // Walk our tenured area scanning each object in turn with MarkObjectUnbarriered().
> +    // How do I find the objects?

You want CellIterUnderGC in jsgcinline.h.

@@ +5563,5 @@
> +    //
> +    // Or perhaps a ditto iterator API.
> +
> +    for ( size_t kind=0 ; kind < FINALIZE_LIMIT ; kind++ ) {
> +        for ( ArenaHeader* ah = allocator_->arenas.getFirstArena(kind) ; ah ; ah=... ) {

You should be able to replace this block with:

for (CellIterUnderGC i(ah); !i.done(); i.next())
    MarkGCThingUnbarriered(trc, &i.getCellRef(), "ForkJoin tenured object");

Note, you'll have to implement CellIterImpl::getCellRef here; looks like we haven't needed it yet, but there's no reason we shouldn't have it.

::: js/src/vm/ForkJoin.cpp
@@ +1999,5 @@
> +#ifdef JSGC_USE_EXACT_ROOTING
> +    MarkForkJoinStack(trc);
> +#else
> +    # error "Conservative rooting not supported"
> +#endif

At the top of StoreBuffer.h, we have:

#ifdef JSGC_GENERATIONAL
#ifndef JSGC_USE_EXACT_ROOTING
# error "Generational GC requires exact rooting."
#endif

It may make more sense to guard on presensce of JSGC_GENERATIONAL, since this GC needs the other's machinery anyway. I'd put a similar #error at the top of ForkJoin.h with that requirement.

@@ +2034,5 @@
> +    uintptr_t p = reinterpret_cast<uintptr_t>(obj);
> +    for ( unsigned i=0 ; i <= currentChunk_ ; i++ )
> +        if (p >= chunks[i]->start() && p < chunks[i]->end())
> +            return true;
> +    return false;

Jon should be landing a patch very soon to put a bit in the chunk trailer that will allow us to do cell->chunk().info.trailer.type == NurseryType for this check. You should be able to add an enum value to that pretty trivially. Nulling the JSRuntime* in the trailer would also be a great way to ensure any C++ code PJS calls doesn't accidentally try to use the runtime pointer in the chunk.

@@ +2069,5 @@
> +            size_t totalSize = size + sizeof(HeapSlot) * numDynamic;
> +            JSObject *obj = static_cast<JSObject *>(allocate(totalSize));
> +            if (obj) {
> +                obj->setInitialSlots(reinterpret_cast<HeapSlot *>(size_t(obj) + size));
> +                return obj;

Nice!
Attached patch DumpStack testing function (obsolete) — Splinter Review
With patches from bug 996983 applied, you can use this function in a PJS kernel to dump the stack.
Depends on: 996983
With Shu's patches from 996983 it now enumerates frames, well, actually, it enumerates the first one - this is progress! - and then promptly crashes because that frame is not meaningful.  (It looks like an exit frame but the data are corrupt.)

I'm guessing that this is due to CodeGenerator::emitAllocateGCThingPar calling the storage allocator with masm.callABI, which does not set up an exit frame properly, rather than with masm.callVM, which does.

I think we would not have seen this problem previously because GC is disabled when the old allocator calls out to C++ using callABI due to free list exhaustion.  The C++ code would normally set the abort flag but always return an object.
In comment #18, s/callABI/callWithABI/g
In comment #18, the out-of-line call checks for a nullptr return, so a failed alloc in C++ can just return nullptr - it need not allocate.
Attached patch Work in progress v3 (obsolete) — Splinter Review
Rebased to sit on top of Shu's patches for activation traversal.
Attachment #8406790 - Attachment is obsolete: true
Attached patch Sketches for the allocator path (obsolete) — Splinter Review
Ineffective - ie crashy, wrong - changes to the parallel allocation path with an eye toward generating proper exit frames.  (There's too much of how everything fits together I don't understand yet :-)
Attached patch Work in progress v4 (obsolete) — Splinter Review
Missing:
 - more work on stack root scanning, see FIXME
 - implement tenured object scanning a la Terrence's suggestion
 - more serious test cases for PJS
 - more serious test cases for non-PJS, given the amount of GC refactoring
Attachment #8407506 - Attachment is obsolete: true
Attachment #8407507 - Attachment is obsolete: true
Attached file Test case 1 (obsolete) —
This test case is the usual mapPar cat convolver, but with additional work to allocate a substantial temporary structure during execution and then dereference that fully to get the result.  This test exercises minor GC, esp in combination with poisioning of the old chunks, but it does not test evacuation.

Run from the directory containing the input file (next attachment).
Attached image Input for test case (obsolete) —
Attached patch Work in progress v5 (obsolete) — Splinter Review
v5 is not complete but passes test case 1, meaning that stack traversal and copying are somewhat right.
Attachment #8410267 - Attachment is obsolete: true
Attached file Test case 2 (obsolete) —
Test case 2 requires evacuation from the PJS nursery, scanning of the output array, and merging of the PJS tenured areas into the general tenured area to be working properly: results from the parallel section are boxed in individual objects, references to which get stored in the result array.

This test has to use plain Array and Object because there are currently issues in the parallel engine with TypedObject arrays that contain references that cause the engine to bail out.
Attached file Test case 3 (obsolete) —
Test case 3 adds a further complication by returning objects from the kernel that are allocated directly in the PJS tenured areas (Function objects) but that refer, via their scope chain, to objects in the PJS nurseries that contain the actual computed values.  This tests that scanning of the tenured area works properly so that the objects containing the results are properly evacuated.

(A further refinement on this test would allocate a lot of garbage, a la Test case 1, to trigger minor GC after allocating the function but before returning it from the kernel.)
Attached patch Work in progress v6 (obsolete) — Splinter Review
Version 6 passes all test cases and is believed to be complete apart from bugs, cleanup, and tuning.

The big buggaboo at the moment is the liberal use of IsInsideNursery() throughout the code base; that function is way too performance sensitive to also make it consider the parallel nurseries, yet it takes some close reasoning to determine whether something else is needed in each specific instance and I expect there are some bugs around that handling still.  Terrence mentioned earlier a forthcoming fix by Jonco to flag nursery arenas for easy recognition (a bit in the runtime pointer); that would help a lot for this case.
Attachment #8411312 - Attachment is obsolete: true
Attached file Test case 4 (obsolete) —
Test case 4 stresses string allocation by always returning results from the kernel in the form of string encodings of the result values.  Strings are always allocated in the private tenured area, there should be no evacuation from the PJS nurseries in this case (set PAFLAGS=gc to see GC stats).
Attached file Test case 5 (obsolete) —
Test case 5 will occasionally (about 1% of the time) allocate a temporary data structure that is so large that it forces the nursery for the worker to expand beyond a single chunk.  The test thus stresses the allocator's ability to move from one chunk to another, and it tests the machinery around expansion policy, and around handling multiple chunks in the PJS GC in general.
Attached file Test case 6 (obsolete) —
Test case 6 allocates a temporary data structure that is large enough, and lives long enough, to cause the nursery overflow policy to kick in: live data will be promoted out of the nursery into the tenured generation in the middle of a parallel section.  Once the particular work item is finished all the promoted data becomes garbage.

In order to make the program run in a reasonable amount of time the large data structure is allocated only infrequently, approximately 36 times on a 600x600 grid.

Curiously I'm seeing some parallel bailouts on this benchmark (types 0 and 10 are reported; 0 is "no bailout" and 10 is "overrecursed" and neither makes a lot of sense to me.)
Attached file Test case 7 (obsolete) —
Test case 7 allocates temporary data structures in the PJS nurseries that are so large that they will cause the nursery to overflow and the data structure (the part of it that's been built) to be promoted to the PJS tenured space.

Unlike test case 6, the promotion volume is quite a bit higher and the allocations are more likely to end up in the same worker within the same parallel section, and the result will be to fill up the tenured area to the point where a GC is required at the end of the parallel section.

This is still a pretty soft test - the promoted volume is not so large that anything bad happens by deferring the GC until after the parallel section.
A note about termination:

It is very likely possible to get into a state where the tenured area fills up, we bail out to do GC, but then there were no new live data so the heap remains effectively the same size, and we reenter the parallel section only to have the same events occur again because all the work from the previous iteration was discarded.

If my understanding is correct, this (unlikely) loop is eventually broken by the bailout counter exceeding its cutoff and the kernel being run sequentially.  In that case, performance will be abysmal, but at least correctness is not at stake.
Attached file Test case 8 (obsolete) —
Test case 8 forces enough spills from the nurseries into the tenured area to require the tenured area to be collected, and it does cause bailouts to sequential execution.  This test case is very sensitive to the PJS GC parameters, but has no way of obtaining them so as to adjust to changing circumstances.
Attached patch Work in progress v7 (obsolete) — Splinter Review
WIP v7 is believed to be feature complete apart from optimizations (including the inline allocation path) and bugs I haven't found yet.
Attachment #8412485 - Attachment is obsolete: true
Attached file Test case 9 (obsolete) —
Test case 9 exercises several Array paths in the VM and GC and uncovered several problems in WIP v7 (missing resume points, broken GC predicates).  This test case is essentially Intel's SimPJS program.
Attached patch Work in progress v8 (obsolete) — Splinter Review
Rebased to m-c 180667, plus some bug fixes.
Attachment #8414322 - Attachment is obsolete: true
Attached patch Base patch 1: changes to Ion (obsolete) — Splinter Review
Basically three changes here:

- Pick up the parallel ion script during stack walking if that is appropriate

- Add resume points after storage allocation for parallel operations, since
  they will now be implemented using oolCallVM rather than callWithAPI, and
  will need sane exit frames, and my understanding is that this is how I
  get them

- Transfer the resume point when replacing an instruction, see change in
  ParallelSafetyAnalysis.cpp.

I'm a little nervous about the third change and jandem and I have discussed it briefly on IRC, he does not seem to believe it is necessary.  But without this change the resume point will be NULL and I will quickly hit an assert near the start of CodeGeneratorShared::callVM(), it looks like this:

    JS_ASSERT_IF(mir->isEffectful(), mir->resumePoint());

Perhaps the problem is elsewhere; suggestions appreciated, obviously.

(If it matters, this fix sits on top of another that has not landed, see https://bugzilla.mozilla.org/show_bug.cgi?id=996983#c13.)
Attachment #8415246 - Flags: review?(shu)
Attachment #8415246 - Flags: review?(jdemooij)
The result object is a root for garbage collection and must be known to the ForkJoin engine.  At the moment there is no need to pass more than the one object, and it is IMO better to pass it as an argument than to set it with additional function calls (plays better with nested parallelism, has less state).

Also incorporates a bug fix to bug 996983 (set the result value to zero before the parallel invocation) and corrects the number of arguments in the intrinsic table.
Attachment #8415261 - Flags: review?(shu)
This is perhaps a little rough still (a commented-out assert, two TODO annotations) but it is probably useful to discuss it in earnest.  At this point I probably need to know whether a change like this will be acceptable, or if I must thing along entirely different avenues.

This patch refactors the Nursery into a templated NurseryBase class and a Nursery that inherits from an instantiation of NurseryBase, the template argument is a subtype of NurseryCollectionTracer which must additionally provide some functions that are described only in comments (for inlining purposes - a little bizarre but it works).  Functions that know how to allocate and move objects are moved to NurseryBase and implemented with support from that template class's argument class.

Inlinability should not be compromised, provided code elsewhere that knows how to call Nursey functions is itself templated, see subsequent patches for examples.
Attachment #8415304 - Flags: feedback?(terrence)
I don't care for this one in its current form: partly because I'm not 100% sure about the ramifications, partly because the implementation is a hack, and partly because I have to make judgment calls about whether IsInsideAnyNursery is necessary or one can get away with IsInsideNursery (the latter is good enough for ExclusiveContext and JSContext, while anything with ThreadSafeContext or ForkJoinContext *may* need the former).  Yet something like this seems necessary.
The collector needs cheap access to chunks for its copying phase.  I've attached a simple chunk pool to the ThreadPool.

Missing from this patch is any consideration for how or when to prune the chunk pool, or whether the chunk pool should somehow go via the memory manager's chunk pool (gcChunkPool, IIRC).
Comment on attachment 8415246 [details] [diff] [review]
Base patch 1: changes to Ion

Review of attachment 8415246 [details] [diff] [review]:
-----------------------------------------------------------------

Nice work! Looks good, below some ideas to simplify things a bit.

::: js/src/jit/IonBuilder.cpp
@@ +5517,5 @@
>          templateObject->setShouldConvertDoubleElements();
>      else
>          templateObject->clearShouldConvertDoubleElements();
> +
> +    return resumeAfter(ins);

This change shouldn't be necessary as we treat MNewArray as non-effectful (see MNewArray::getAliasSet). I think MNewPar should also have an empty alias set to fix the assertion failure.

::: js/src/jit/IonFrames.cpp
@@ +878,5 @@
> +        } else {
> +            ionScript = jsScript->ionScript();
> +        }
> +    }
> +    JS_ASSERT(ionScript);

JitFrameIterator::ionScript() looks very similar (and handles parallelIonScript), but it doesn't trace the IonScript. It'd be nice if we could refactor this a bit:

IonScript *ionScript = nullptr;
if (frame.checkInvalidation(&ionScript)) {
    // ..comment..
    IonScript::Trace(trc, ionScript);
} else {
    ionScript = frame.ionScriptFromCalleeToken();
}

ionScriptFromCalleeToken could assert !checkInvalidation(), and JitFrameIterator::ionScript() could also call ionScriptFromCalleeToken (after the checkInvalidation call).

@@ +939,5 @@
>      // any slots/elements pointers stored in this frame.
>  
>      IonJSFrameLayout *layout = (IonJSFrameLayout *)frame.fp();
>  
>      IonScript *ionScript = nullptr;

Same here.

::: js/src/jit/JitFrameIterator.h
@@ +119,5 @@
>          return current_;
>      }
> +    ExecutionMode mode() const {
> +        return mode_;
> +    }

You can remove this too I think (see other comment).

::: js/src/jit/ParallelSafetyAnalysis.cpp
@@ +632,5 @@
>  {
>      MBasicBlock *block = oldInstruction->block();
>      block->insertBefore(oldInstruction, replacementInstruction);
>      oldInstruction->replaceAllUsesWith(replacementInstruction);
> +    MResumePoint* rp = oldInstruction->resumePoint();

Nit: * goes before rp, so MResumePoint *rp

@@ +636,5 @@
> +    MResumePoint* rp = oldInstruction->resumePoint();
> +    if (rp && rp->instruction() == oldInstruction) {
> +        rp->setInstruction(replacementInstruction);
> +        replacementInstruction->setResumePoint(rp);
> +    }

Ah I see the problem now. PJS doesn't bail to Baseline so probably doesn't need these resume points, but I can see how that will upset certain asserts.

This fix seems fine. We could also make the assert in callVM sequential-mode-only. Shu, what do you think?
Attachment #8415246 - Flags: review?(jdemooij)
One of the new functions is under an ifdef but I don't know that it needs to be.

(The change to the Cell iterator will undoubtedly clash with what Nick is doing so I expect this to change a bit.)
Comment on attachment 8415246 [details] [diff] [review]
Base patch 1: changes to Ion

Review of attachment 8415246 [details] [diff] [review]:
-----------------------------------------------------------------

Generally looks good to me. I agree with Jan's comments about marking the allocation MIRs as non-effectful.

r=me with those changes.

::: js/src/jit/MCallOptimize.cpp
@@ +1602,5 @@
>      current->add(newObject);
>      current->push(newObject);
>  
> +    if (!resumeAfter(newObject))
> +        return InliningStatus_Error;

This shouldn't be necessary if we mark MNewDenseArrayPar (and MNewPar like Jan suggested earlier) as non-effectful.

::: js/src/jit/ParallelSafetyAnalysis.cpp
@@ +636,5 @@
> +    MResumePoint* rp = oldInstruction->resumePoint();
> +    if (rp && rp->instruction() == oldInstruction) {
> +        rp->setInstruction(replacementInstruction);
> +        replacementInstruction->setResumePoint(rp);
> +    }

This is the right fix over making the assert sequential-mode only. Walking the Ion stack (needed for this set of patches and for tooling) in PJS code requires Snapshots to be encoded.
Attachment #8415246 - Flags: review?(shu) → review+
Comment on attachment 8415261 [details] [diff] [review]
Base patch 2: pass the result array to the ForkJoin engine

Review of attachment 8415261 [details] [diff] [review]:
-----------------------------------------------------------------

This is just a patch to thread through |updatable|, right? I don't see any uses of it.

Looks good to me with style fixes if so.

::: js/src/vm/ForkJoin.cpp
@@ +278,5 @@
>      RootedScript bailoutScript;
>      jsbytecode *bailoutBytecode;
>  
>      ForkJoinOperation(JSContext *cx, HandleFunction fun, uint16_t sliceStart,
> +                      uint16_t sliceEnd, ForkJoinMode mode, RootedObject& updatable);

Style nit here and below: RootedObject &updatable

(Yeah, SpiderMonkey style is inconsistent with general Gecko style...)

@@ +1341,5 @@
>      }
>  
>      bool invoke(ForkJoinContext *cx) {
>          JitActivation activation(cx);
> +        Value result = JSVAL_ZERO;

Nit: Use Int32Value(0).

::: js/src/vm/SelfHosting.cpp
@@ +282,5 @@
>                                        intrinsic_ParallelSpew);
>  #endif
>  
>  /*
> + * ForkJoin(func, sliceStart, sliceEnd, mode, updatable): Invokes |func| many times in parallel.

Ah thanks for updating this comment, it was woefully out of date.
Attachment #8415261 - Flags: review?(shu) → review+
Comment on attachment 8415304 [details] [diff] [review]
Base patch 3: refactor the nursery

Review of attachment 8415304 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the delay, I wanted to give myself adequate time to consider the full ramifications of going this route. In general I'm okay with this type of abstraction, however, it may not be right way to go at this time.

Specifically, we know for a fact that we're going to be changing our nursery structure in very short order to try to take advantage of multiple generations. Maybe we can do that and PJS can piggyback on that work. If so, that would be awesome. I am worried, however, that PJS's benchmarks will have very different allocation characteristics from octane and thus that conjoining the structure like this will potentially make it much harder to get unilateral improvements. At worst, it will have JS and PJS butting heads over who gets to take a win. Even if that were not an issue, the nursery does still need significantly more work: sharing the code via headers makes this harder because recompiles suddenly take 10x longer.

If the nursery were older and we were fairly certain that a full rewrite were not still in the cards, this sort of maximal code sharing would definitely be the way to go. As is, I think we'd be better off kicking the potential maintenance burden down the road for now. That's not to say we shouldn't still share as much code as possible, just to a less full degree. Many pieces can easily be ripped out to a shared header and fine-tuned with a template parameter: RelocationOverlay, moveToTenured, maybe even ChunkOverlay.

Does this seem reasonable? Are my concerns perhaps overblown?

::: js/src/gc/Nursery-inl.h
@@ +147,3 @@
>  {
>      JS_ASSERT(ref);
> +    //JS_ASSERT(isInside(*ref));

You should be able to use the IsInsideNursery from GCAPI.

::: js/src/gc/Nursery.h
@@ +54,5 @@
> +namespace gc {
> +
> +class NurseryCollectionTracer : public JSTracer
> +{
> +public:

2 spaces before public. This will make diffs inside the class show the class line in the header rather than only public:.

@@ +55,5 @@
> +
> +class NurseryCollectionTracer : public JSTracer
> +{
> +public:
> +    NurseryCollectionTracer(JSRuntime* runtime, JSTraceCallback traceCallback, WeakMapTraceKind weakTraceKind)

* with the name, per SM style.

@@ +63,5 @@
> +        , head(nullptr)
> +        , tail(&head)
> +    {}
> +
> +    JSRuntime* const runtime_;

JSRuntime * const, per SM style.

@@ +111,5 @@
> +       MOZ_ALWAYS_INLINE bool isInside(const T *p);
> +    */
> +};
> +
> +template<class TRC /* TRC implements NurseryCollectionTracer */>

Our typical style is to name these virtual types like normal types -- e.g. CamelCase. Let's call it Tracer here and elsewhere instead of T or TRC. Also, please use "typename" instead of class when declaring template arguments.

::: js/src/jit/IonFrames.cpp
@@ +931,5 @@
>  #endif
>  }
>  
>  #ifdef JSGC_GENERATIONAL
> +template<class NURSERY>

NurseryType
Attachment #8415304 - Flags: feedback?(terrence)
(In reply to Shu-yu Guo [:shu] from comment #47)
> >      ForkJoinOperation(JSContext *cx, HandleFunction fun, uint16_t sliceStart,
> > +                      uint16_t sliceEnd, ForkJoinMode mode, RootedObject& updatable);
> 
> Style nit here and below: RootedObject &updatable

"MutableHandleObject updatable"
Comment on attachment 8415261 [details] [diff] [review]
Base patch 2: pass the result array to the ForkJoin engine

Review of attachment 8415261 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/vm/ForkJoin.cpp
@@ +278,5 @@
>      RootedScript bailoutScript;
>      jsbytecode *bailoutBytecode;
>  
>      ForkJoinOperation(JSContext *cx, HandleFunction fun, uint16_t sliceStart,
> +                      uint16_t sliceEnd, ForkJoinMode mode, RootedObject& updatable);

Jan's right as usual, this should be either MutableHandleObject or HandleObject. I don't see how updatable is used in this patch, but if the reference itself doesn't need to be changed, use HandleObject here and below.
(In reply to Terrence Cole [:terrence] from comment #48)
> 
> Specifically, we know for a fact that we're going to be changing our nursery
> structure in very short order to try to take advantage of multiple
> generations.

I figured as much.

> If the nursery were older and we were fairly certain that a full rewrite
> were not still in the cards, this sort of maximal code sharing would
> definitely be the way to go. As is, I think we'd be better off kicking the
> potential maintenance burden down the road for now.

...

> Does this seem reasonable? Are my concerns perhaps overblown?

I think it's reasonable; I would probably have felt that way myself, had I been in your shoes.  Truth be told, my refactoring is fraying a bit at the edges and it will probably be cleaner just to fork the code and share what we can, with much cleaner refactorings where possible.
No longer depends on: 993347
Blocks: 1010169
Blocks: 1010178
Blocks: 1010186
The PJS GC is structured as a set of seven patches.  Some of the changes in early patches only make sense in light of other changes introduced in later patches.  Reviewers might have to grub around a little bit to find out how things fit together.

* The first provides some baseline changes to Ion and is a cleanup of a
  previous patch reviewed by jandem and shu
* The second propagates the PJS ForkJoin result array into the engine, so
  that it can be used as a GC root, this was previously reviewed by shu.
* The third introduces a chunk pool for use by PJS GC
* The fourth generalizes the marking functionality in the current GC a little
* The fifth - the largest by far - introduces the ForkJoinNursery and also
  makes some changes throughout the VM to fit it in.
* The sixth changes the VM's allocation paths to make use of the
  ForkJoinNursery.
* The seventh contains changes to various moz.build files to properly enable
  the PJS collector, this is suitable to use with the JS shell.  When the
  patch set lands, JSGC_FJGENERATIONAL will have to be turned on in some
  more structured way (it's not sufficient to enable it in js/src/moz.build
  due to some of the functionality leaking into a header file in one case).
Attachment #8415215 - Attachment is obsolete: true
Attachment #8415246 - Attachment is obsolete: true
Attachment #8415261 - Attachment is obsolete: true
Attachment #8415304 - Attachment is obsolete: true
Attachment #8415317 - Attachment is obsolete: true
Attachment #8415323 - Attachment is obsolete: true
Attachment #8415333 - Attachment is obsolete: true
Attachment #8422508 - Flags: review?(jdemooij)
Attachment #8422511 - Flags: review?(terrence)
Attachment #8422511 - Flags: review?(shu)
Attachment #8422513 - Flags: review?(terrence)
Attachment #8422513 - Flags: review?(shu)
Attachment #8422516 - Flags: review?(terrence)
Attachment #8422516 - Flags: review?(shu)
Attached patch Patch 6 of 7: allocation paths (obsolete) — Splinter Review
Attachment #8422517 - Flags: review?(shu)
Attached patch Patch 7 of 7: turn it on (obsolete) — Splinter Review
Comment on attachment 8422508 [details] [diff] [review]
Patch 1 of 7: baseline ion changes

Review of attachment 8422508 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks, looks perfect.
Attachment #8422508 - Flags: review?(jdemooij) → review+
Comment on attachment 8422513 [details] [diff] [review]
Patch 4 of 7: generalizations to the GC's marker

Review of attachment 8422513 [details] [diff] [review]:
-----------------------------------------------------------------

r=me  Looks fine, although I think this stuff is all settled enough that we can use the obvious sharing.

::: js/src/gc/RootMarking.cpp
@@ +576,5 @@
> +AutoGCRooter::traceAll(ThreadSafeContext *cx, JSTracer *trc)
> +{
> +    for (js::AutoGCRooter *gcr = cx->autoGCRooters; gcr; gcr = gcr->down)
> +        gcr->trace(trc);
> +}

Please rename this to traceAllInContext, templatize it on the type of cx, then make traceAll call it to do it's marking too.

::: js/src/jit/IonFrames.cpp
@@ +1214,1 @@
>  MarkJitActivations(JSRuntime *rt, JSTracer *trc)

Could we kill the duplication and make the user of this method pass rt->mainThread instead?

::: js/src/vm/Stack.h
@@ +1093,5 @@
>      }
>  };
>  
>  void MarkInterpreterActivations(JSRuntime *rt, JSTracer *trc);
> +void MarkInterpreterActivations(PerThreadData *ptd, JSTracer *trc);

Likewise for these.
Attachment #8422513 - Flags: review?(terrence) → review+
Comment on attachment 8422511 [details] [diff] [review]
Patch 3 of 7: chunk pool for PJS GC

Review of attachment 8422511 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/vm/ThreadPool.cpp
@@ +506,5 @@
> +            break;
> +    } while (!freeChunks_.compareExchange(p, p->next));
> +    if (p) {
> +        numFreeChunks_--;
> +        return p;               // Already poisoned

Style nit: move comment to its own line.

@@ +514,5 @@
> +        return mem;
> +#ifdef DEBUG
> +    memset(mem, 0xbd, ChunkSize);
> +#endif
> +    reinterpret_cast<Chunk*>(mem)->info.trailer.runtime = nullptr;

Two concerns here.

1. Is there a more principled way to manage the various overlays on this piece of memory? Passing a void * around and reinterpret_casting is a footgun. How about one struct that manages the overlays, possibly with nested structs for the different overlays?

2. What is nulling the runtime ptr here supposed to do? Identify these chunks as PJS Chunks and not normal GC Chunks? I'm unsure of the consequences from this, so deferring to Terrence here.

Also, refactor the debug poisoning and the runtime nulling to its own helper.

::: js/src/vm/ThreadPool.h
@@ +264,5 @@
> +    //
> +    // Returns nullptr on OOM.
> +    void *getChunk();
> +
> +    // Free chunk memory to the cache.  In debug mode poison it as for

Missing a verb here. "Add a free chunk of memory to the cache."?

Maybe "putFreeChunk"?

@@ +277,5 @@
> +    // This must be called with the GC lock taken.
> +    void pruneChunkCache();
> +
> +private:
> +    static const int64_t usBeforePrune = 10000000LL; // 10 seconds

How was 10s arrived at? Would a metric like # of GCs since last allocation be preferred? Such a metric is already in use for Chunk in the normal GC retirement and PJS jitcode retainment.

@@ +284,5 @@
> +        ChunkFreeList *next;
> +    };
> +
> +    // Timestamp of last allocation from the chunk pool.
> +    mozilla::Atomic<int64_t, mozilla::ReleaseAcquire> timeOfLastAllocation_;

The only types we guarantee supporting for T in Atomic<T> across all our platforms are pointer-sized types. But see comment above about possibly not using time as a heuristic.

@@ +287,5 @@
> +    // Timestamp of last allocation from the chunk pool.
> +    mozilla::Atomic<int64_t, mozilla::ReleaseAcquire> timeOfLastAllocation_;
> +
> +    // List of free chunks.
> +    mozilla::Atomic<ChunkFreeList*, mozilla::ReleaseAcquire> freeChunks_;

General style nit: space between type name and *.

@@ +291,5 @@
> +    mozilla::Atomic<ChunkFreeList*, mozilla::ReleaseAcquire> freeChunks_;
> +
> +    // Length of that list, thread-safe but not necessarily synchronized with
> +    // the list, useful for policy decisions.
> +    mozilla::Atomic<uint32_t, mozilla::ReleaseAcquire> numFreeChunks_;

What is this used for currently? I don't see any uses in this patch.
Attachment #8422511 - Flags: review?(shu)
Comment on attachment 8422513 [details] [diff] [review]
Patch 4 of 7: generalizations to the GC's marker

Review of attachment 8422513 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM. Thanks for the typo fixes!

::: js/public/HeapAPI.h
@@ +70,5 @@
> +const uintptr_t ChunkLocationBitPJSFromspace = 8;  // The PJS generational GC's fromspace (during GC)
> +
> +const uintptr_t ChunkLocationAnyNursery = ChunkLocationBitNursery |
> +                                     ChunkLocationBitPJSNewspace |
> +                                     ChunkLocationBitPJSFromspace;

Style nit: line up to ChunkLocationBitNursery
Attachment #8422513 - Flags: review?(shu) → review+
Comment on attachment 8422516 [details] [diff] [review]
Patch 5 of 7: the ForkJoinNursery class and support code

Review of attachment 8422516 [details] [diff] [review]:
-----------------------------------------------------------------

For the new files, you should run them through the style checker (cd ObjDir; make check-style) for #include order problems.

Review still in progress, as it's quite a big patch. Posting what I have now.

::: js/src/gc/ForkJoinNursery-inl.h
@@ +35,5 @@
> +// be few survivors of a collection in the ForkJoinNursery and few
> +// objects will be tested.  The bulk of the calls may come from the
> +// code that scans the roots.
> +
> +MOZ_ALWAYS_INLINE bool

Just use the C++ 'inline' keyword here and elsewhere.

@@ +39,5 @@
> +MOZ_ALWAYS_INLINE bool
> +ForkJoinNursery::isInsideNewspace(const void *obj)
> +{
> +    uintptr_t p = reinterpret_cast<uintptr_t>(obj);
> +    for ( unsigned i=0 ; i <= currentChunk_ ; i++ )

Style nit:

for (unsigned i = 0; i <= currentChunk_; i++) {
  ...
}

Even though the body is a 1 line |if|, taken together the body is 2 lines, so need the braces.

@@ +49,5 @@
> +MOZ_ALWAYS_INLINE bool
> +ForkJoinNursery::isInsideFromspace(const void *obj)
> +{
> +    uintptr_t p = reinterpret_cast<uintptr_t>(obj);
> +    for ( unsigned i=0 ; i < numFromspaceChunks_ ; i++ )

Same style nit as above.

@@ +63,5 @@
> +    bool d = isInsideFromspace((const void**)obj);
> +#endif
> +    bool r = obj &&
> +        (reinterpret_cast<NurseryChunkLayout*>(uintptr_t(obj) & ~ChunkMask)->trailer.location &
> +         ChunkLocationBitPJSFromspace);

Style nit: line up with obj &&

::: js/src/gc/ForkJoinNursery.cpp
@@ +63,5 @@
> +// 1/3) then the nursery is grown, independently of other nurseries.
> +//
> +// There is an upper limit on the number of chunks in a nursery.  If
> +// the live data in a nursery after a collection exceeds the set
> +// fraction and the nursery can't grow then the next collection will

Nit: comma after grow

@@ +78,5 @@
> +// practice.
> +//
> +// The roots for a collection in the ForkJoinNursery are the execution
> +// stack, any registered roots on the execution stack, any objects in
> +// the private tenured area, and the ForkJoin result object in the

Why are all objects in the private tenured area rooted?

@@ +83,5 @@
> +// common tenured area.
> +//
> +// No write barriers are run during the ForkJoin section; a minor or
> +// evacuating collection will examine all objects in the tenured area
> +// for pointers into the nursery.

Ah, this is why you null out the runtime ptr in the PJS nursery chunk trailers.

@@ +123,5 @@
> +// - While the gc is running, the previous "newspace" has been renamed
> +//   as the gc's "fromspace", and the space that objects are copied
> +//   into is known as the "tospace".  The tospace may be a nursery
> +//   space, or it may be a tenured space (though it's always one or
> +//   the other, never a combination).  After gc the fromspace is

So a nursery space during a minor gc, and a tenured space during evacuating gc? I would write that explicitly.

@@ +129,5 @@
> +//
> +// - If the gc copies objects into a nursery tospace then this tospace
> +//   becomes known as the "newspace" following gc.  Otherwise, a new
> +//   newspace won't be needed (if the parallel section is finished) or
> +//   can be created empty (if the gc just needed to evacuate).

Great overview comment!

@@ +151,5 @@
> +    , movedSize_(0)
> +    , head_(nullptr)
> +    , tail_(&head_)
> +{
> +    for ( size_t i=0 ; i < MaxNurseryChunks ; i++ )

Same style nit as before, here and below.

@@ +170,5 @@
> +    if (mustEvacuate_) {
> +        mustEvacuate_ = false;
> +        pjsCollection(true, true);
> +    }
> +    else

Style nit: } else {

@@ +200,5 @@
> +
> +    TIME_START(pjsCollection);
> +
> +    rt->incPJSMinorCollecting();
> +    rt->setNeedsBarrier(false);

This function should only be called from inside PJS execution, right? We should be able to assert that no barriers are needed.

@@ +250,5 @@
> +{
> +    // Grow the nursery if it is too full.  Do not bother to shrink it - lazy
> +    // chunk allocation means that a too-large nursery will not really be a problem,
> +    // the entire nursery will be deallocated soon anyway.
> +    if (live*NurseryLoadFactor > numActiveChunks_*NurseryChunkUsableSize) {

Style nit: spaces around *

@@ +253,5 @@
> +    // the entire nursery will be deallocated soon anyway.
> +    if (live*NurseryLoadFactor > numActiveChunks_*NurseryChunkUsableSize) {
> +        if (numActiveChunks_ < MaxNurseryChunks) {
> +            while (numActiveChunks_ < MaxNurseryChunks &&
> +                   live*NurseryLoadFactor > numActiveChunks_*NurseryChunkUsableSize)

Style nit: since the condition is on multiple lines, do:

while (foo &&
       bar)
{
    ...
}

@@ +297,5 @@
> +    numFromspaceChunks_ = 0;
> +}
> +
> +ForkJoinNursery::NurseryChunkLayout *
> +ForkJoinNursery::chunk(void *mem)

Could this be combined with 'setCurrentChunk' below? As I said in the previous patch's review, passing around void * pointers makes me uneasy. Also, the 'chunk' name doesn't really suggest that this particular nursery is taking ownership of the particular chunk and marking it as in newspace. Maybe 'getChunkFromPool'?

@@ +329,5 @@
> +     * using the forwardBufferPointer() mechanism.
> +     *
> +     * The only reason for that restriction is so that we can call
> +     * isCellInsideFromspace() rather than isInsideFromspace() here.
> +     */

Style nit: I'd just // the whole block.

@@ +339,5 @@
> +ForkJoinNursery::MinorGCCallback(JSTracer *trcArg, void **thingp, JSGCTraceKind kind)
> +{
> +    ForkJoinNursery *nursery = static_cast<ForkJoinNurseryCollectionTracer*>(trcArg)->nursery_;
> +    if (nursery->shouldMoveObject(thingp))
> +        *thingp = nursery->moveToTospace(static_cast<JSObject *>(*thingp));

The comment in shouldMoveObject mentions that |thingp| should be of type GCThing ** for some GCThing, but here it is always cast to a JSObject *. This is true presently because nothing else can be nursery-allocated. I would add an assertion on AllocKind that it's a JSObject AllocKind and a comment explaining this.

@@ +390,5 @@
> +            // Use ArenaCellIterUnderFinalize, not ...UnderGC, because that side-steps some
> +            // assertions in the latter that are wrong for PJS collection.
> +            tenured_->arenas.purge(kind);
> +            for (ArenaCellIterUnderFinalize i(ai.get()); !i.done(); i.next())
> +                objs[numObjs++] = i.get<JSObject>();

Ditto on AllocKind assertion.

::: js/src/gc/ForkJoinNursery.h
@@ +1,2 @@
> +/* -*- Mode: C++; tab-width: 8; indent-tabs-mode: nil; c-basic-offset: 4 -*-
> + * vim: set ts=8 sw=4 et tw=78:

Nit: change tw=99

@@ +214,5 @@
> +    MOZ_ALWAYS_INLINE void markSlot(HeapSlot *slotp);
> +
> +    ForkJoinContext *const cx_;     // The context that owns this nursery
> +    Allocator *const tenured_;      // Private tenured area
> +    ForkJoinGCShared *const shared_;// Common to all nurseries belonging to a ForkJoin instance

Space between ; and //, maybe re-align the other comments.

@@ +215,5 @@
> +
> +    ForkJoinContext *const cx_;     // The context that owns this nursery
> +    Allocator *const tenured_;      // Private tenured area
> +    ForkJoinGCShared *const shared_;// Common to all nurseries belonging to a ForkJoin instance
> +    JS::Zone *evacuationZone_;      // During evacuating GC this is non-NULL: the Zone we

Finally, the Zone naming pays off.
(In reply to Shu-yu Guo [:shu] from comment #61)
> Comment on attachment 8422511 [details] [diff] [review]
> Patch 3 of 7: chunk pool for PJS GC
> 
> @@ +514,5 @@
> > +        return mem;
> > +#ifdef DEBUG
> > +    memset(mem, 0xbd, ChunkSize);
> > +#endif
> > +    reinterpret_cast<Chunk*>(mem)->info.trailer.runtime = nullptr;
> 
> Two concerns here.
> 
> 1. Is there a more principled way to manage the various overlays on this
> piece of memory?

I can probably use 'Chunk' as the data type (though I confess I don't
share the sentiment :)  Anything more complicated than that and I'll
probably put up a fight on the basis of "too much abstraction".

> 2. What is nulling the runtime ptr here supposed to do? 

It catches errors earlier, cf earlier comments from Terrence.

> > +private:
> > +    static const int64_t usBeforePrune = 10000000LL; // 10 seconds
> 
> How was 10s arrived at?

Tossing a coin, effectively.  See below.

> Would a metric like # of GCs since last allocation
> be preferred?

I don't think so, because the desired information is whether the PJS
engine is still active, not what is going on elsewhere in the system.
To me, the best determination of that is simply running time.  If the
PJS system has been quiet for a while then the memory can be recycled,
if not it should definitely be kept around.

> > +    // Timestamp of last allocation from the chunk pool.
> > +    mozilla::Atomic<int64_t, mozilla::ReleaseAcquire> timeOfLastAllocation_;
> 
> The only types we guarantee supporting for T in Atomic<T> across all our
> platforms are pointer-sized types.

The try server has been slapping me around a bit for this and
I changed the code to track the time stamp in seconds.

> @@ +291,5 @@
> > +    mozilla::Atomic<ChunkFreeList*, mozilla::ReleaseAcquire> freeChunks_;
> > +
> > +    // Length of that list, thread-safe but not necessarily synchronized with
> > +    // the list, useful for policy decisions.
> > +    mozilla::Atomic<uint32_t, mozilla::ReleaseAcquire> numFreeChunks_;
> 
> What is this used for currently? I don't see any uses in this patch.

It's not; I realised this myself and I've removed it.
(In reply to Shu-yu Guo [:shu] from comment #63)
> Comment on attachment 8422516 [details] [diff] [review]
> Patch 5 of 7: the ForkJoinNursery class and support code
> 
> Review of attachment 8422516 [details] [diff] [review]:
> -----------------------------------------------------------------

> For the new files, you should run them through the style checker
> (cd ObjDir; make check-style) for #include order problems.

See bug 1011454 for a display of my refusing to drink the Cool-Aid
without additional sweetener being offered.

Anyway, just to clear up some important high-level points:

> @@ +78,5 @@
> > +// practice.
> > +//
> > +// The roots for a collection in the ForkJoinNursery are the execution
> > +// stack, any registered roots on the execution stack, any objects in
> > +// the private tenured area, and the ForkJoin result object in the
> 
> Why are all objects in the private tenured area rooted?

They are not rooted explicitly, but they are treated as roots during GC.
The reason is that the allocator can and will allocate objects in the tenured
area during a PJS run (not all objects can be allocated in the nursery) and
those objects can and will contain pointers, stored during initialization
or later, into the nursery.

If that's not clear from the comment then I will amend.  Will examine it.

> 
> @@ +83,5 @@
> > +// common tenured area.
> > +//
> > +// No write barriers are run during the ForkJoin section; a minor or
> > +// evacuating collection will examine all objects in the tenured area
> > +// for pointers into the nursery.
> 
> Ah, this is why you null out the runtime ptr in the PJS nursery chunk
> trailers.

No, that pointer is set in the chunk() function once the chunk is allocated,
the nulling is just to ensure that that setting is always done properly.

> @@ +123,5 @@
> > +// - While the gc is running, the previous "newspace" has been renamed
> > +//   as the gc's "fromspace", and the space that objects are copied
> > +//   into is known as the "tospace".  The tospace may be a nursery
> > +//   space, or it may be a tenured space (though it's always one or
> > +//   the other, never a combination).  After gc the fromspace is
> 
> So a nursery space during a minor gc, and a tenured space during evacuating
> gc? I would write that explicitly.

Indeed.  I will be more explicit.
(In reply to Lars T Hansen [:lth] from comment #65)
> (In reply to Shu-yu Guo [:shu] from comment #63)
> > Comment on attachment 8422516 [details] [diff] [review]
> > Patch 5 of 7: the ForkJoinNursery class and support code
> > 
> > Review of attachment 8422516 [details] [diff] [review]:
> > -----------------------------------------------------------------
> 
> > For the new files, you should run them through the style checker
> > (cd ObjDir; make check-style) for #include order problems.
> 
> See bug 1011454 for a display of my refusing to drink the Cool-Aid
> without additional sweetener being offered.

Oh boy, you've certainly put more thought into this than I have. After reading your comments, I agree with your position on the -inl. For the purposes of pushing this set of patches, though, I would still recommend suffering the inferior algorithm in check-style for now :)
(In reply to Lars T Hansen [:lth] from comment #64)
> (In reply to Shu-yu Guo [:shu] from comment #61)
> > Comment on attachment 8422511 [details] [diff] [review]
> > Patch 3 of 7: chunk pool for PJS GC
> > 
> > @@ +514,5 @@
> > > +        return mem;
> > > +#ifdef DEBUG
> > > +    memset(mem, 0xbd, ChunkSize);
> > > +#endif
> > > +    reinterpret_cast<Chunk*>(mem)->info.trailer.runtime = nullptr;
> > 
> > Two concerns here.
> > 
> > 1. Is there a more principled way to manage the various overlays on this
> > piece of memory?
> 
> I can probably use 'Chunk' as the data type (though I confess I don't
> share the sentiment :)  Anything more complicated than that and I'll
> probably put up a fight on the basis of "too much abstraction".

My main dislike is passing a void * around. Is the Chunk linked list overlay necessary? Chunk already has a next/prev pointer: http://dxr.mozilla.org/mozilla-central/source/js/src/gc/Heap.h#633
(In reply to Shu-yu Guo [:shu] from comment #63)
> Comment on attachment 8422516 [details] [diff] [review]
> Patch 5 of 7: the ForkJoinNursery class and support code
> 
> @@ +339,5 @@
> > +ForkJoinNursery::MinorGCCallback(JSTracer *trcArg, void **thingp, JSGCTraceKind kind)
> > +{
> > +    ForkJoinNursery *nursery = static_cast<ForkJoinNurseryCollectionTracer*>(trcArg)->nursery_;
> > +    if (nursery->shouldMoveObject(thingp))
> > +        *thingp = nursery->moveToTospace(static_cast<JSObject *>(*thingp));
> 
> The comment in shouldMoveObject mentions that |thingp| should be of type
> GCThing ** for some GCThing, but here it is always cast to a JSObject *.
> This is true presently because nothing else can be nursery-allocated. I
> would add an assertion on AllocKind that it's a JSObject AllocKind and a
> comment explaining this.

Note that in this case we don't have an allocKind, but a traceKind.  But see below.

> @@ +390,5 @@
> > +            // Use ArenaCellIterUnderFinalize, not ...UnderGC, because that side-steps some
> > +            // assertions in the latter that are wrong for PJS collection.
> > +            tenured_->arenas.purge(kind);
> > +            for (ArenaCellIterUnderFinalize i(ai.get()); !i.done(); i.next())
> > +                objs[numObjs++] = i.get<JSObject>();
> 
> Ditto on AllocKind assertion.

Nice find, this is a latent bug.  The kind is the value of the variable that controls the loop, and we filter on the types of objects that are allocable in the nursery.  Right now those are all simple JSObject types, and the code here makes that assumption, as does the code above.
Comment on attachment 8422516 [details] [diff] [review]
Patch 5 of 7: the ForkJoinNursery class and support code

Review of attachment 8422516 [details] [diff] [review]:
-----------------------------------------------------------------

Whew, finally finished the first read through. Overall, it looks really solid. Since this is a big patch, I'd still like to see a new version of it.

::: js/src/gc/ForkJoinNursery.cpp
@@ +413,5 @@
> +     * since we always store the runtime as the last word in a nursery chunk,
> +     * isInsideFromspace will still be true, even if this zero-size allocation
> +     * abuts the end of the allocable area. Thus, it is always safe to read the
> +     * first word of |old| here.
> +     */

I see now that these functions are duplicated from Nursery wholesale. I would still like a consistent comment style throughout (//), but no big deal.

@@ +501,5 @@
> +    HeapSlot *slots = static_cast<HeapSlot *>(allocate(size));
> +    if (slots)
> +        return slots;
> +
> +    return nullptr;

Just return slots here, unless it's important to maintain parity with Nursery code.

@@ +524,5 @@
> +    return newSlots;
> +}
> +
> +bool
> +ForkJoinNursery::freeSlots(HeapSlot *slots)

This name is confusing given what it actually does, and would require reading the comment in the header to fully understand. I would remove this and call |isInsideNewspace| with a comment at the callsite.

@@ +616,5 @@
> +
> +    /*
> +     * Typed arrays in the nursery may have a lazily allocated buffer, make
> +     * sure there is room for the array's fixed data when moving the array.
> +     */

Hm, I'm not sure if this applies to PJS. I'll investigate after reviews.

@@ +644,5 @@
> +        // budget is used up.  There is a guard in
> +        // Chunk::allocateArena() against the latter case.
> +        return tenured_->arenas.allocateFromArena(evacuationZone_, thingKind);
> +    }
> +    else {

} else {

@@ +658,5 @@
> +ForkJoinNursery::allocateInTospace(size_t nelem, size_t elemSize)
> +{
> +    if (isEvacuating_)
> +        return evacuationZone_->malloc_(nelem*elemSize);
> +    return allocate(nelem*elemSize);

Style nit: spaces around *

@@ +681,5 @@
> +    movedSize_ += moveObjectToTospace(dst, src, dstKind);
> +
> +    RelocationOverlay *overlay = reinterpret_cast<RelocationOverlay *>(src);
> +    overlay->forwardTo(dst);
> +    insertIntoFixupList(overlay);

Lack of a remembered set turns out to be a-okay for our workloads, I suppose?

@@ +709,5 @@
> +        dst->setPrivate(dst->fixedData(TypedArrayObject::FIXED_DATA_START));
> +
> +    /* The shape's list head may point into the old object. */
> +    if (&src->shape_ == dst->shape_->listp)
> +        dst->shape_->listp = &dst->shape_;

I think this is okay, but it looks really scary. In general, there's no guarantee that dst->shape_ is thread-local. listp is something to do with dictionary mode, which JSObjects go into when too many properties are added to them. We can only convert thread-local objects in dictionary mode in PJS. So in this case, that means its shape_ must be thread-local in one of the tenured allocators. Please assert this using cx_->isThreadLocal.

@@ +728,5 @@
> +    size_t count = src->numDynamicSlots();
> +    dst->slots = reinterpret_cast<HeapSlot*>(allocateInTospace(count, sizeof(HeapSlot)));
> +    if (!dst->slots)
> +        CrashAtUnhandlableOOM("Failed to allocate slots while moving object.");
> +    mozilla::PodCopy(dst->slots, src->slots, count);

I'm not sure why PodCopy is used here but js_memcpy elsewhere both here and in the normal Nursery code. PodCopy has some built in optimization for small copies (< 128 items), but they seem otherwise identical. We should consolidate to use PodCopy.

@@ +799,5 @@
> +
> +ForkJoinNurseryCollectionTracer::ForkJoinNurseryCollectionTracer(JSRuntime *rt,
> +                                                                 ForkJoinNursery *nursery)
> +    : JSTracer(rt, ForkJoinNursery::MinorGCCallback, TraceWeakMapKeysValues)
> +    , nursery_(nursery)

Style nit: initializer lists are indented 2 spaces only

::: js/src/gc/ForkJoinNursery.h
@@ +44,5 @@
> +// It could look like this could be merged into ForkJoinNursery by
> +// making the latter derive from JSTracer; I've decided to keep them
> +// separate for now, since it allows for multiple instantiations of
> +// this class with different parameters, for different purposes.  That
> +// may change.

I agree with keeping them separate.

@@ +220,5 @@
> +                                    // allocate in
> +
> +    uintptr_t currentStart_;        // Start of current area in newspace
> +    uintptr_t currentEnd_;          // End of current area in newspace (last byte + 1)
> +    uintptr_t position_;            // Next free byte in current newspace chunk

Am I correct in understanding that these are cached to save on a deref?

::: js/src/gc/GCRuntime.h
@@ +360,5 @@
>  #endif
> +#ifdef JSGC_FJGENERATIONAL
> +    /* ForkJoin workers enter and leave GC independently; this counter
> +     * tracks the number that are currently in GC.
> +     */

Style nit: Just use //. Others may disagree with me though; if you want to keep /* */ style here and below, do

/*
 * ForkJoin workers [...]
 */

::: js/src/gc/Marking.cpp
@@ +155,5 @@
>  
> +    /* It looks to lth like the code below (runtimeFromMainThread,
> +     * etc) makes assumptions not valid for the ForkJoin worker
> +     * threads, so just bail.
> +     */

It'd be nice to get debugging asserts working for PJS GGC too as a followup patch after the main one lands. Most of the non-PJS safe calls should have direct PJS analogues. runtimeFromMainThread is just used for identity check that the tracer is on the same runtime, and the CurrentThreadCanAccessRuntime can be appropriately patched to check something like we're either inside a PJS collection or the current thread is the main thread.

Also nit: remove the "it looks like to lth" part, and add a TODO with a bug # for the followup patch if possible.

@@ +376,5 @@
>          return nursery.getForwardedPointer(thingp);
>  #endif
> +#ifdef JSGC_FJGENERATIONAL
> +    if (rt->isPJSMinorCollecting()) {
> +        ForkJoinContext *ctx = ForkJoinContext::current();

I'm not sure how hot these functions are, and if TLS performance will be a non-trivial factor. I imagine on Linux and OS X it'll be fine. I am unsure about Windows though.

@@ +381,5 @@
> +        ForkJoinNursery &fjNursery = ctx->fjNursery();
> +        if (fjNursery.isInsideFromspace(*thingp))
> +            return fjNursery.getForwardedPointer(thingp);
> +    }
> +#endif

Style nit: Move JSGC_FJGENERATIONAL inside the JSGC_GENERATIONAL ifdef for clarity that the former depends on the latter.

@@ +418,5 @@
> +        ForkJoinNursery &fjNursery = ctx->fjNursery();
> +        if (fjNursery.isInsideFromspace(thing))
> +            return !fjNursery.getForwardedPointer(thingp);
> +    }
> +#endif

Ditto as style nit above.

::: js/src/gc/Nursery-inl.h
@@ +18,5 @@
>  namespace gc {
>  
> +/* static */
> +inline RelocationOverlay *
> +RelocationOverlay::fromCell(Cell *cell) {

Style nit: function-opening { on its own line, here and below.

::: js/src/gc/Nursery.h
@@ +73,5 @@
> +    /* A list entry to track all relocated things. */
> +    RelocationOverlay *next_;
> +
> +  public:
> +    inline static RelocationOverlay *fromCell(Cell *cell);

Style nit: static inline

::: js/src/gc/RootMarking.cpp
@@ +680,5 @@
> +void
> +js::gc::MarkForkJoinStack(ForkJoinNurseryCollectionTracer *trc)
> +{
> +    PerThreadData *ptd = TlsPerThreadData.get();
> +    ForkJoinContext *fjc = ForkJoinContext::current();

Style nit: I'd prefer cx.

@@ +684,5 @@
> +    ForkJoinContext *fjc = ForkJoinContext::current();
> +
> +    AutoGCRooter::traceAll(fjc, trc);
> +    MarkExactStackRoots(fjc, trc);
> +    MarkInterpreterActivations(ptd, trc); // Shouldn't be any

Remove this and assert that there aren't any activations other than JIT activations inside ForkJoin PerThreadData.

::: js/src/jit/IonFrames.cpp
@@ +930,5 @@
>      for (GeneralRegisterBackwardIterator iter(safepoint.allGprSpills()); iter.more(); iter++) {
>          --spill;
> +        if (slotsRegs.has(*iter)) {
> +#ifdef JSGC_FJGENERATIONAL
> +            if (trc->callback == gc::ForkJoinNursery::MinorGCCallback) {

I'm not a fan of the hardcoded trc->callback check here, but it's a direct fallout of keeping the Nursery structs distinct.

Perhaps refactor this ifdef and the ifdef below so we only have one place to maintain?

@@ +1259,5 @@
> +
> +void
> +UpdateJitActivationsForMinorGC(PerThreadData *ptd, JSTracer *trc)
> +{
> +    JS_ASSERT(trc->runtime()->isHeapMinorCollecting() || ForkJoinContext::current());

This assert should be a xor.

::: js/src/jsgc.cpp
@@ +883,5 @@
> +        // this thread is currently promoting into the tenured area.  I doubt
> +        // the better test would make much difference.
> +        if (!rt->isPJSMinorCollecting())
> +            return nullptr;
> +#else

Rather than these ifdefs being sprinkled at callsites, I would prefer adding a method like isAnyHeapMinorCollecting to JSRuntime.

@@ +2036,5 @@
> +    // evacuating collection, ergo "Any" thread.  However, it is
> +    // possible there should instead be a guard at the call to
> +    // this function from within allocateArena, or that allocateArena
> +    // should not be called for the allocation.
> +    return zone->runtimeFromAnyThread()->gc.triggerZoneGC(zone,reason);

On my copy of m-c, js::TriggerZoneGC already uses runtimeFromAnyThread, so the comment is unnecessary.

Style nit: zone, reason.

::: js/src/vm/ArrayBufferObject.cpp
@@ +799,5 @@
> +    if (!IS_GC_MARKING_TRACER(trc) && !trc->runtime()->isHeapMinorCollecting()
> +#ifdef JSGC_FJGENERATIONAL
> +        && !trc->runtime()->isPJSMinorCollecting()
> +#endif
> +        ) {

Wow this ifdef is gross. See my previous note about isAnyHeapMinorCollecting.

::: js/src/vm/Runtime.h
@@ +936,5 @@
> +    // counter is greater than zero.  (It will require some care to
> +    // make sure the two variables stay in sync.)
> +    bool isPJSMinorCollecting() { return gc.pjsCollectionCounter > 0; }
> +    void incPJSMinorCollecting() { gc.pjsCollectionCounter++; }
> +    void decPJSMinorCollecting() { gc.pjsCollectionCounter--; }

Naming nit: the rest of the code refers to ForkJoinNursery instead of PJSNursery. Rename this ForkJoin as well?
Attachment #8422516 - Flags: review?(shu)
Comment on attachment 8422517 [details] [diff] [review]
Patch 6 of 7: allocation paths

Review of attachment 8422517 [details] [diff] [review]:
-----------------------------------------------------------------

Reading this makes me dislike the JSGC_FJGENERATIONAL ifdef a lot.

Lars, what do you think of making JS_ENABLE_PARALLEL_JS depend on JSGC_GENERATIONAL, and removing JSGC_FJGENERATIONAL? Given how poorly non-generational does in PJS, I don't see why we would not want to have this be always-on.

::: js/src/jit/CodeGenerator.cpp
@@ +3775,5 @@
> +    // observe that eg the Intel demo code allocates arrays at a very
> +    // high rate and it's not obvious that that's going to be
> +    // atypical.  We need array allocation to be fast.  The only
> +    // reason it might not matter is if very short arrays have in-line
> +    // elements.

Remove this comment, as you marked bug 101086 as WONTFIX.

The inline written at a time when we didn't have the ability to call special parallel versions of natives. After we implemented that ability, it was easier to leave this inline as is rather than implement fully general allocation paths that had the functionality of initGCThing (as the comment says below).

It's fine to leave it as is for now. Having a usable-from-PJS general allocation function within the C++ runtime would let us clean this up nicely though.

::: js/src/jit/IonMacroAssembler.cpp
@@ +649,5 @@
> +    // the last chunk of the nursery not being located at the very top
> +    // of the address space.
> +    //
> +    // Note, result and cx may be mapped to the same register, see
> +    // notes below.

Where is this note? The OOL note below is something different, I think. The result and cx registers shouldn't overlap here, since we didn't lower the cx input with useRegisterAtStart.

@@ +670,5 @@
> +    // t2 <- &nursery
> +    loadPtr(Address(cx, ForkJoinContext::offsetOfNursery()), tempReg2);
> +    // nursery->position <- nursery->position + size
> +    storePtr(tempReg1, Address(tempReg2, gc::ForkJoinNursery::offsetOfPosition()));
> +    // r  <- nursery->position

r <- nursery->position - size

Or how about storing nursery->position into result before bumping the pointer?

::: js/src/jsgcinlines.h
@@ +57,5 @@
>      if (!isForkJoinContext())
>          return true;
>  
> +#ifdef JSGC_FJGENERATIONAL
> +    ForkJoinContext *fjcx = static_cast<ForkJoinContext*>(const_cast<ThreadSafeContext*>(this));

Perhaps add a const overload of asForkJoinContext on ThreadSafeContext?

@@ +64,5 @@
> +#else
> +    // TODO: This should never happen; the test is old code that should possibly
> +    // be an assert.
> +    if (IsInsideNursery(runtime_, thing))
> +        return false;

Yeah, seems fine to just assert this.

::: js/src/jsobj.cpp
@@ +2818,5 @@
>          return cx->asJSContext()->runtime()->gc.nursery.freeSlots(cx->asJSContext(), slots);
>  #endif
> +#ifdef JSGC_FJGENERATIONAL
> +    if (cx->isForkJoinContext()) {
> +        if (cx->asForkJoinContext()->fjNursery().freeSlots(slots))

See note for previous part about renaming this.

@@ +3048,5 @@
> +#ifdef JSGC_FJGENERATIONAL
> +    if (cx->isForkJoinContext()) {
> +        ObjectElements *elts =
> +            cx->asForkJoinContext()->fjNursery().allocateElements(obj,
> +                                                                  nelems);

Style nit: one line.

::: js/src/vm/ForkJoin.cpp
@@ +1235,5 @@
> +{
> +  public:
> +    AutoHeapState(JSContext *cx, HeapState s)
> +        : rt(cx->runtime())
> +        , oldState(rt->gc.heapState)

Style nit: 2 space indent and put the comma on the previous line. (I personally like this style better, but that's not what's consistent with the rest of the codebase)

@@ +1276,5 @@
>          return RedLight;
>      }
>  
> +    {
> +        AutoHeapState state(cx_, PJSRunning);

Instead of putting this AutoHeapState here, how about put it inside execute, so that you don't have to undo it in transferArenasToCompartmentAndProcessGCRequests?
Attachment #8422517 - Flags: review?(shu)
Comment on attachment 8422516 [details] [diff] [review]
Patch 5 of 7: the ForkJoinNursery class and support code

Review of attachment 8422516 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/vm/ForkJoin.h
@@ +448,5 @@
>      }
>  
> +#ifdef JSGC_FJGENERATIONAL
> +    // There is already a nursery() method in ThreadSafeContext
> +    gc::ForkJoinNursery& fjNursery() { return fjNursery_; }

Please rename the methods on ThreadSafeContext, as their names are now misleading.

I was thinking of the following refactor:

Remove ThreadSafeContext::hasNursery, since both PJS and non-PJS contexts have nurseries now, and replacing its use with tests for the kind of nursery.

Move ThreadSafeContext::nursery to ExclusiveContext::nursery.

Add ForkJoinContext::nursery.

All uses of ThreadSafeContext::nursery would then require first casting to one of the more specific context types.
Comment on attachment 8422517 [details] [diff] [review]
Patch 6 of 7: allocation paths

Review of attachment 8422517 [details] [diff] [review]:
-----------------------------------------------------------------

Reading this makes me dislike the JSGC_FJGENERATIONAL ifdef a lot.

Lars, what do you think of making JS_ENABLE_PARALLEL_JS depend on JSGC_GENERATIONAL, and removing JSGC_FJGENERATIONAL? Given how poorly non-generational does in PJS, I don't see why we would not want to have this be always-on.

::: js/src/jit/CodeGenerator.cpp
@@ +3775,5 @@
> +    // observe that eg the Intel demo code allocates arrays at a very
> +    // high rate and it's not obvious that that's going to be
> +    // atypical.  We need array allocation to be fast.  The only
> +    // reason it might not matter is if very short arrays have in-line
> +    // elements.

Remove this comment, as you marked bug 101086 as WONTFIX.

The inline written at a time when we didn't have the ability to call special parallel versions of natives. After we implemented that ability, it was easier to leave this inline as is rather than implement fully general allocation paths that had the functionality of initGCThing (as the comment says below).

It's fine to leave it as is for now. Having a usable-from-PJS general allocation function within the C++ runtime would let us clean this up nicely though.

::: js/src/jit/IonMacroAssembler.cpp
@@ +649,5 @@
> +    // the last chunk of the nursery not being located at the very top
> +    // of the address space.
> +    //
> +    // Note, result and cx may be mapped to the same register, see
> +    // notes below.

Where is this note? The OOL note below is something different, I think. The result and cx registers shouldn't overlap here, since we didn't lower the cx input with useRegisterAtStart.

@@ +670,5 @@
> +    // t2 <- &nursery
> +    loadPtr(Address(cx, ForkJoinContext::offsetOfNursery()), tempReg2);
> +    // nursery->position <- nursery->position + size
> +    storePtr(tempReg1, Address(tempReg2, gc::ForkJoinNursery::offsetOfPosition()));
> +    // r  <- nursery->position

r <- nursery->position - size

Or how about storing nursery->position into result before bumping the pointer?

::: js/src/jsgcinlines.h
@@ +57,5 @@
>      if (!isForkJoinContext())
>          return true;
>  
> +#ifdef JSGC_FJGENERATIONAL
> +    ForkJoinContext *fjcx = static_cast<ForkJoinContext*>(const_cast<ThreadSafeContext*>(this));

Perhaps add a const overload of asForkJoinContext on ThreadSafeContext?

@@ +64,5 @@
> +#else
> +    // TODO: This should never happen; the test is old code that should possibly
> +    // be an assert.
> +    if (IsInsideNursery(runtime_, thing))
> +        return false;

Yeah, seems fine to just assert this.

@@ +486,5 @@
>  
> +#ifdef JSGC_FJGENERATIONAL
> +template <AllowGC allowGC>
> +inline JSObject *
> +TryNewFJNurseryObject(ForkJoinContext *cx, size_t thingSize, size_t nDynamicSlots)

I would rename this TryNewNurseryObject and overload it, and change TryNewNurseryObject to only take ExclusiveContext *. Same for ShouldFJNurseryAllocate.

@@ +493,5 @@
> +    JSObject *obj = nursery.allocateObject(thingSize, nDynamicSlots);
> +    if (obj)
> +        return obj;
> +
> +    // Note, allowGC is ignored - ForkJoin nursery collection is always allowed.

Ignoring AllowGC is bad in general. Is there undue implementation burden to support it, so we can use AutoSuppressGC in PJS?

::: js/src/jsobj.cpp
@@ +2818,5 @@
>          return cx->asJSContext()->runtime()->gc.nursery.freeSlots(cx->asJSContext(), slots);
>  #endif
> +#ifdef JSGC_FJGENERATIONAL
> +    if (cx->isForkJoinContext()) {
> +        if (cx->asForkJoinContext()->fjNursery().freeSlots(slots))

See note for previous part about renaming this.

@@ +3048,5 @@
> +#ifdef JSGC_FJGENERATIONAL
> +    if (cx->isForkJoinContext()) {
> +        ObjectElements *elts =
> +            cx->asForkJoinContext()->fjNursery().allocateElements(obj,
> +                                                                  nelems);

Style nit: one line.

::: js/src/vm/ForkJoin.cpp
@@ +1235,5 @@
> +{
> +  public:
> +    AutoHeapState(JSContext *cx, HeapState s)
> +        : rt(cx->runtime())
> +        , oldState(rt->gc.heapState)

Style nit: 2 space indent and put the comma on the previous line. (I personally like this style better, but that's not what's consistent with the rest of the codebase)

@@ +1276,5 @@
>          return RedLight;
>      }
>  
> +    {
> +        AutoHeapState state(cx_, PJSRunning);

Instead of putting this AutoHeapState here, how about put it inside execute, so that you don't have to undo it in transferArenasToCompartmentAndProcessGCRequests?
For some reason the review tool decided to re-post all my comments for part 6 instead of just the new one. Lars, please ignore comment 70.
(In reply to Lars T Hansen [:lth] from comment #68)
> (In reply to Shu-yu Guo [:shu] from comment #63)
> > Comment on attachment 8422516 [details] [diff] [review]
> > Patch 5 of 7: the ForkJoinNursery class and support code
> [...]
> Nice find, this is a latent bug.  The kind is the value of the variable that
> controls the loop, and we filter on the types of objects that are allocable
> in the nursery.  Right now those are all simple JSObject types, and the code
> here makes that assumption, as does the code above.

Indeed, since I've split the ForkJoinNursery's code from the Nursery's code, the
underlying bug may be that I'm still making use of the Nursery's predicate to
determine whether an object is nursery-allocable.  That could (should?) be
something the ForkJoinNursery decides for itself; it would be more robust that way.
(In reply to Shu-yu Guo [:shu] from comment #69)
> Comment on attachment 8422516 [details] [diff] [review]
> Patch 5 of 7: the ForkJoinNursery class and support code
> 
> @@ +681,5 @@
> > +    movedSize_ += moveObjectToTospace(dst, src, dstKind);
> > +
> > +    RelocationOverlay *overlay = reinterpret_cast<RelocationOverlay *>(src);
> > +    overlay->forwardTo(dst);
> > +    insertIntoFixupList(overlay);
> 
> Lack of a remembered set turns out to be a-okay for our workloads, I suppose?

What workloads? :-P  We're suffering from a lack of good workloads at the moment, partly
as a result of having implemented very few of the parallel functions with true ForkJoin
support.

The few programs I've looked at so far do not allocate much into the tenured area, since
their objects are straightforward (functions and strings would be the main types to
go into the tenured area right now).  Some of these programs, like SimPJS, have very
large working sets in the nursery and incur quite a bit of copying overhead, see my
earlier mail to Jas, Niko, and yourself.  But so far as I can tell root scanning is
not much of an issue.  It can become an issue when the nursery is full and we have
to evacuate; IMO that will be a rare occurrence and we should not worry about it yet.

> @@ +220,5 @@
> > +                                    // allocate in
> > +
> > +    uintptr_t currentStart_;        // Start of current area in newspace
> > +    uintptr_t currentEnd_;          // End of current area in newspace (last byte + 1)
> > +    uintptr_t position_;            // Next free byte in current newspace chunk
> 
> Am I correct in understanding that these are cached to save on a deref?

And/or temporary registers.  I really want to move currentEnd_ and position_ out of the
ForkJoinNursery and into the ForkJoinContext to further shorten the in-line allocation
path.

> @@ +376,5 @@
> >          return nursery.getForwardedPointer(thingp);
> >  #endif
> > +#ifdef JSGC_FJGENERATIONAL
> > +    if (rt->isPJSMinorCollecting()) {
> > +        ForkJoinContext *ctx = ForkJoinContext::current();
> 
> I'm not sure how hot these functions are, and if TLS performance will be a
> non-trivial factor. I imagine on Linux and OS X it'll be fine. I am unsure
> about Windows though.

Yes, the cost of this is open.  The cheap predicate ensures(?) that there's little
cost to the general marker.  And it's my understanding that we only hit the
call to ::current() when we do root scanning, and we do little root scanning.
But I can dig into that - and the cost of TLS access - further.

Frankly I doubt there's much to be done here, apart from lifting the TLS access
into a variable at a higher level and then passing that value around; that
perturbs the marker a bit, and would only be worth it if the access is hot.

> @@ +381,5 @@
> > +        ForkJoinNursery &fjNursery = ctx->fjNursery();
> > +        if (fjNursery.isInsideFromspace(*thingp))
> > +            return fjNursery.getForwardedPointer(thingp);
> > +    }
> > +#endif
> 
> Style nit: Move JSGC_FJGENERATIONAL inside the JSGC_GENERATIONAL ifdef for
> clarity that the former depends on the latter.

It struck me that while there was a clear dependence in the earlier design,
where I had refactored Nursery, there may not be much of a dependence any
longer.  I will investigate that.

> ::: js/src/jsgc.cpp
> @@ +883,5 @@
> > +        // this thread is currently promoting into the tenured area.  I doubt
> > +        // the better test would make much difference.
> > +        if (!rt->isPJSMinorCollecting())
> > +            return nullptr;
> > +#else
> 
> Rather than these ifdefs being sprinkled at callsites, I would prefer adding
> a method like isAnyHeapMinorCollecting to JSRuntime.

I might prefer that too, but I'm far from sure about the consequences of that
and generally I've avoided it.  I've had to do it a couple of places with a
test for whether something is in any nursery.
(In reply to Shu-yu Guo [:shu] from comment #70)
> Comment on attachment 8422517 [details] [diff] [review]
> Patch 6 of 7: allocation paths
> 
> Review of attachment 8422517 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> @@ +1276,5 @@
> >          return RedLight;
> >      }
> >  
> > +    {
> > +        AutoHeapState state(cx_, PJSRunning);
> 
> Instead of putting this AutoHeapState here, how about put it inside execute,
> so that you don't have to undo it in
> transferArenasToCompartmentAndProcessGCRequests?

There's a reason for the current structure, it prevents the runtime from doing certain things
that it does when the heap state is idle.  I seem to remember that it had to do with asserts
firing when they shouldn't (because asserts often make abstraction-breaking assumptions
they should not be making, usually by accident).

I can try to remove it and see what happens, it's possible that the current structure was
required as a consequence of factoring the Nursery.  If it remains required I'll add
an appropriate comment.
A note to reviewers: there's a subtle bug in the code that reallocates slots and elements array.  The ForkJoinNursery method for that will currently return nullptr if the new array size is too large.  If that happens, a subsequent call to realloc will be passed a pointer that it did not allocate and things will fall apart.
(In reply to Shu-yu Guo [:shu] from comment #71)

> 
> Please rename the methods on ThreadSafeContext, as their names are now
> misleading.
> 
> I was thinking of the following refactor:
> 
> Remove ThreadSafeContext::hasNursery, since both PJS and non-PJS contexts
> have nurseries now, and replacing its use with tests for the kind of nursery.
> 

Oops, the above should say "replacing its use with tests for the kind of context".
Comment on attachment 8422517 [details] [diff] [review]
Patch 6 of 7: allocation paths

Review of attachment 8422517 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/builtin/TypedObject.js
@@ +1282,2 @@
>          if (outGrainTypeIsTransparent)
>            ClearThreadLocalArenas();

4-space indent.

::: js/src/jit/IonMacroAssembler.cpp
@@ +670,5 @@
> +    // t2 <- &nursery
> +    loadPtr(Address(cx, ForkJoinContext::offsetOfNursery()), tempReg2);
> +    // nursery->position <- nursery->position + size
> +    storePtr(tempReg1, Address(tempReg2, gc::ForkJoinNursery::offsetOfPosition()));
> +    // r  <- nursery->position

Looks like it should be |// r <- nursery->position - size|.

::: js/src/jit/ParallelFunctions.cpp
@@ +36,5 @@
>  jit::NewGCThingPar(ForkJoinContext *cx, gc::AllocKind allocKind)
>  {
>      JS_ASSERT(ForkJoinContext::current() == cx);
> +#ifdef JSGC_FJGENERATIONAL
> +    // NoGC is correct, the nursery may be collected anyhow

Add a period at the end.

::: js/src/vm/ForkJoin.h
@@ +454,5 @@
>      // Evacuate live data from the per-thread nursery into the per-thread
>      // tenured area.
>      void evacuateLiveData() { fjNursery_.evacuatingGC(); }
> +
> +    // Used in inlining nursery allocation

Add a period at the end.
Comment on attachment 8422511 [details] [diff] [review]
Patch 3 of 7: chunk pool for PJS GC

Review of attachment 8422511 [details] [diff] [review]:
-----------------------------------------------------------------

r=me
Attachment #8422511 - Flags: review?(terrence) → review+
Comment on attachment 8422516 [details] [diff] [review]
Patch 5 of 7: the ForkJoinNursery class and support code

Review of attachment 8422516 [details] [diff] [review]:
-----------------------------------------------------------------

This is quite nice. r=me

Since PJS is denovo right now, I'd like to see a close followup removing the #ifdef JSGC_FJGENERATIONAL typedef as soon as you are confident you won't need to back out.

::: js/src/gc/Marking.cpp
@@ +154,5 @@
>      JS_ASSERT(thingp);
>  
> +    /* It looks to lth like the code below (runtimeFromMainThread,
> +     * etc) makes assumptions not valid for the ForkJoin worker
> +     * threads, so just bail.

Our comment style has the /* on its own line. Alternatively use //: we're moving in that direction.

@@ +157,5 @@
> +     * etc) makes assumptions not valid for the ForkJoin worker
> +     * threads, so just bail.
> +     */
> +    if (ForkJoinContext::current())
> +        return;

Adding the crash diagnostics branch below was a regression on several benchmarks, so I'd like this to move down into the #ifdef DEBUG section below the #ifdef JS_CRASH_DIAGNOSTICS. I think the crash-diagnostics section also applies for FJS code.

@@ +235,5 @@
>      CheckMarkedThing(trc, thingp);
>      T *thing = *thingp;
>  
>      if (!trc->callback) {
> +        /* This case should never be reached from PJS collections as

/*\n

@@ +376,5 @@
>          return nursery.getForwardedPointer(thingp);
>  #endif
> +#ifdef JSGC_FJGENERATIONAL
> +    if (rt->isPJSMinorCollecting()) {
> +        ForkJoinContext *ctx = ForkJoinContext::current();

IsMarked is generally only on weak things, so should hopefully be minimally hot.

::: js/src/gc/Nursery.cpp
@@ +906,5 @@
>  }
>  
> +#undef TIME_START
> +#undef TIME_END
> +#undef TIME_TOTAL

Good catch!
Attachment #8422516 - Flags: review?(terrence) → review+
Blocks: 1014409
(In reply to Shu-yu Guo [:shu] from comment #71)
> Comment on attachment 8422516 [details] [diff] [review]
> Patch 5 of 7: the ForkJoinNursery class and support code
> 
> Review of attachment 8422516 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/vm/ForkJoin.h
> @@ +448,5 @@
> >      }
> >  
> > +#ifdef JSGC_FJGENERATIONAL
> > +    // There is already a nursery() method in ThreadSafeContext
> > +    gc::ForkJoinNursery& fjNursery() { return fjNursery_; }
> 
> Please rename the methods on ThreadSafeContext, as their names are now
> misleading.
> 
> I was thinking of the following refactor:
> 
> Remove ThreadSafeContext::hasNursery, since both PJS and non-PJS contexts
> have nurseries now, and replacing its use with tests for the kind of context. [edited for typo, see #c78]
> 
> Move ThreadSafeContext::nursery to ExclusiveContext::nursery.
> 
> Add ForkJoinContext::nursery.
> 
> All uses of ThreadSafeContext::nursery would then require first casting to
> one of the more specific context types.

I agree this would be quite a bit cleaner than what we have at the moment, but it depends on what is possibly an accidental or implementation detail, namely that the nursery currently is only available for a JSContext.  Let's ask Terrence about that.
Flags: needinfo?(terrence)
(In reply to Shu-yu Guo [:shu] from comment #63)
> Comment on attachment 8422516 [details] [diff] [review]
> Patch 5 of 7: the ForkJoinNursery class and support code
> 
> Review of attachment 8422516 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> ::: js/src/gc/ForkJoinNursery-inl.h
> @@ +35,5 @@
> > +// be few survivors of a collection in the ForkJoinNursery and few
> > +// objects will be tested.  The bulk of the calls may come from the
> > +// code that scans the roots.
> > +
> > +MOZ_ALWAYS_INLINE bool
> 
> Just use the C++ 'inline' keyword here and elsewhere.

Those have different semantics, I think - MOZ_ALWAYS_INLINE should be equivalent to "force inline" in sundry compilers, which overrides compiler settings.

Anyway the FJ code is following the rest of the GC code here.
(In reply to Shu-yu Guo [:shu] from comment #69)
> Comment on attachment 8422516 [details] [diff] [review]
> Patch 5 of 7: the ForkJoinNursery class and support code
> 
> Review of attachment 8422516 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> @@ +728,5 @@
> > +    size_t count = src->numDynamicSlots();
> > +    dst->slots = reinterpret_cast<HeapSlot*>(allocateInTospace(count, sizeof(HeapSlot)));
> > +    if (!dst->slots)
> > +        CrashAtUnhandlableOOM("Failed to allocate slots while moving object.");
> > +    mozilla::PodCopy(dst->slots, src->slots, count);
> 
> I'm not sure why PodCopy is used here but js_memcpy elsewhere both here and
> in the normal Nursery code. PodCopy has some built in optimization for small
> copies (< 128 items), but they seem otherwise identical. We should
> consolidate to use PodCopy.

In the js_memcpy cases the type information is not available for PodCopy to work correctly,
though I think it could be made available.  I've left it alone for the moment but I agree
it seems a little messy.

> @@ +1259,5 @@
> > +
> > +void
> > +UpdateJitActivationsForMinorGC(PerThreadData *ptd, JSTracer *trc)
> > +{
> > +    JS_ASSERT(trc->runtime()->isHeapMinorCollecting() || ForkJoinContext::current());
> 
> This assert should be a xor.

It could be an xor, but that serves to confuse the matter more than illuminate it IMO, so
I've left it alone.
(In reply to Lars T Hansen [:lth] from comment #76)
> (In reply to Shu-yu Guo [:shu] from comment #70)
> > Comment on attachment 8422517 [details] [diff] [review]
> > Patch 6 of 7: allocation paths
> > 
> > Review of attachment 8422517 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > @@ +1276,5 @@
> > >          return RedLight;
> > >      }
> > >  
> > > +    {
> > > +        AutoHeapState state(cx_, PJSRunning);
> > 
> > Instead of putting this AutoHeapState here, how about put it inside execute,
> > so that you don't have to undo it in
> > transferArenasToCompartmentAndProcessGCRequests?
> 
> There's a reason for the current structure, it prevents the runtime from
> doing certain things
> that it does when the heap state is idle.  I seem to remember that it had to
> do with asserts
> firing when they shouldn't (because asserts often make abstraction-breaking
> assumptions
> they should not be making, usually by accident).
> 
> I can try to remove it and see what happens, it's possible that the current
> structure was
> required as a consequence of factoring the Nursery.  If it remains required
> I'll add
> an appropriate comment.

This has turned out no longer to be necessary, so I've removed it.
(In reply to Shu-yu Guo [:shu] from comment #72)
> Comment on attachment 8422517 [details] [diff] [review]
> Patch 6 of 7: allocation paths
> 
> Review of attachment 8422517 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> @@ +486,5 @@
> >  
> > +#ifdef JSGC_FJGENERATIONAL
> > +template <AllowGC allowGC>
> > +inline JSObject *
> > +TryNewFJNurseryObject(ForkJoinContext *cx, size_t thingSize, size_t nDynamicSlots)
> 
> I would rename this TryNewNurseryObject and overload it, and change
> TryNewNurseryObject to only take ExclusiveContext *. Same for
> ShouldFJNurseryAllocate.

I've not done anything here as that change would propagate out a bit (all the call paths
operate on ThreadSafeContext).  I also think that this fits in neatly with the other
refactoring you suggested and that we can do it as a follow-on item if Terrence agrees
that it's the right thing.  I'm happy to volunteer for the work.
(In reply to Lars T Hansen [:lth] from comment #86)
> (In reply to Shu-yu Guo [:shu] from comment #72)
> > Comment on attachment 8422517 [details] [diff] [review]
> > Patch 6 of 7: allocation paths
> > 
> > Review of attachment 8422517 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > @@ +486,5 @@
> > >  
> > > +#ifdef JSGC_FJGENERATIONAL
> > > +template <AllowGC allowGC>
> > > +inline JSObject *
> > > +TryNewFJNurseryObject(ForkJoinContext *cx, size_t thingSize, size_t nDynamicSlots)
> > 
> > I would rename this TryNewNurseryObject and overload it, and change
> > TryNewNurseryObject to only take ExclusiveContext *. Same for
> > ShouldFJNurseryAllocate.
> 
> I've not done anything here as that change would propagate out a bit (all
> the call paths
> operate on ThreadSafeContext).  I also think that this fits in neatly with
> the other
> refactoring you suggested and that we can do it as a follow-on item if
> Terrence agrees
> that it's the right thing.  I'm happy to volunteer for the work.

Followup is fine with me.
Second complete round of reviews.

There are eight patches.

The first patch enables JSGC_FJGENERATIONAL in all the relevant build files.  Once I have HTML-embedded test cases it will probably become apparent that I need to turn it on in more places since the size of JSRuntime depends on the define.  This patch is substantially new.

The second patch makes some basic changes to the JIT (cleanup).

The third patch changes the structure of the NewDenseArrayPar primitive to unconditionally use callVM rather than callWithABI.  The change is needed for the ForkJoinNursery; #ifdeffing was just too painful.  This patch is substantially new.

The fourth patch propagages the result array from the selfhosted code into the VM.

The fifth patch introduces a private chunk pool for the ForkJoin nurseries.

The sixth patch generalizes the GC's marker a little.

The seventh patch introduces ForkJoinNursery and the machinery around it and hooks it into the code, though the code does not make use of it.

The eight patch introduces allocation paths that make use of the ForkJoinNursery.

Except for points to which I have responded directly in previous comments I believe I've addressed all substantial review comments and almost all nits.
Attachment #8422508 - Attachment is obsolete: true
Attachment #8422509 - Attachment is obsolete: true
Attachment #8422511 - Attachment is obsolete: true
Attachment #8422513 - Attachment is obsolete: true
Attachment #8422516 - Attachment is obsolete: true
Attachment #8422517 - Attachment is obsolete: true
Attachment #8422518 - Attachment is obsolete: true
Attachment #8429005 - Flags: review?(shu)
Attachment #8429006 - Flags: review?(shu)
Attachment #8429008 - Flags: review?(shu)
Attached patch Patch 5 of 8: Chunk pool (obsolete) — Splinter Review
Attachment #8429012 - Flags: review?(shu)
Attachment #8429013 - Flags: review?(shu)
Attachment #8429014 - Flags: review?(shu)
Attached patch Patch 8 of 8: Allocation paths (obsolete) — Splinter Review
Attachment #8429015 - Flags: review?(shu)
Try run for that patch set on top of m-c 184755:e86a0d92d174:

https://tbpl.mozilla.org/?tree=Try&rev=af574e497181
Attached image Input for test case (obsolete) —
An unencumbered input image for the convolution test cases.  This is a free photo of an Amur leopard taken from morguefile.com on 27 May 2014, URL http://mrg.bz/MT0rPL.  It has been scaled down significantly, cropped slightly, and converted from jpeg to pgm.
Attachment #8411311 - Attachment is obsolete: true
(In reply to Lars T Hansen [:lth] from comment #82)
> (In reply to Shu-yu Guo [:shu] from comment #71)
> > Comment on attachment 8422516 [details] [diff] [review]
> > Patch 5 of 7: the ForkJoinNursery class and support code
> > 
> > Review of attachment 8422516 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > ::: js/src/vm/ForkJoin.h
> > @@ +448,5 @@
> > >      }
> > >  
> > > +#ifdef JSGC_FJGENERATIONAL
> > > +    // There is already a nursery() method in ThreadSafeContext
> > > +    gc::ForkJoinNursery& fjNursery() { return fjNursery_; }
> > 
> > Please rename the methods on ThreadSafeContext, as their names are now
> > misleading.
> > 
> > I was thinking of the following refactor:
> > 
> > Remove ThreadSafeContext::hasNursery, since both PJS and non-PJS contexts
> > have nurseries now, and replacing its use with tests for the kind of context. [edited for typo, see #c78]
> > 
> > Move ThreadSafeContext::nursery to ExclusiveContext::nursery.
> > 
> > Add ForkJoinContext::nursery.
> > 
> > All uses of ThreadSafeContext::nursery would then require first casting to
> > one of the more specific context types.
> 
> I agree this would be quite a bit cleaner than what we have at the moment,
> but it depends on what is possibly an accidental or implementation detail,
> namely that the nursery currently is only available for a JSContext.  Let's
> ask Terrence about that.

I think Shu's proposal makes sense for the base JS nursery as well for the foreseeable future.

Anything that we are allocating off-main-thread is likely to be long lived -- attached to scripts and such -- so we're getting the behavior we want "for free." If we eventually move to an oldest-first scheme or do heavy off-main-thread allocation, we'll almost certainly need to revisit these mechanisms anyway.
Flags: needinfo?(terrence)
(In reply to Lars T Hansen [:lth] from comment #95)
> Created attachment 8429015 [details] [diff] [review]
> Patch 8 of 8: Allocation paths

Note the inline allocation path (in IonMacroAssembler.cpp) is wrong, because the ForkJoinNursery is a member variable of the ForkJoinContext, not an object pointed to from the latter.  I have a fix but need to clean it up; I will post it here in the morning.
Comment on attachment 8429005 [details] [diff] [review]
Patch 1 of 8: Turn on JSGC_FJGENERATIONAL

Review of attachment 8429005 [details] [diff] [review]:
-----------------------------------------------------------------

Man, gotta love the build system fragmentation. :(
Attachment #8429005 - Flags: review?(shu) → review+
Comment on attachment 8429006 [details] [diff] [review]
Patch 2 of 8: Basic JIT adjustments

Review of attachment 8429006 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for separating out this refactor patch.

::: js/src/jit/IonFrames.cpp
@@ +852,2 @@
>      } else {
> +        ionScript = frame.ionScriptFromCalleeToken();

Ah, nice.

::: js/src/jit/MIR.h
@@ +1612,5 @@
>      JSObject *templateObject() const {
>          return templateObject_;
>      }
> +
> +    virtual AliasSet getAliasSet() const {

Nit: we don't really use |virtual| on the overriding function.
Attachment #8429006 - Flags: review?(shu) → review+
Comment on attachment 8429007 [details] [diff] [review]
Patch 3 of 8: Clean up NewDenseArray JIT implementation

Review of attachment 8429007 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with keeping LNewDenseArrayPar an LCallInstructionHelper, or an explanation on IRC of why it was changed.

::: js/src/jit/CodeGenerator.cpp
@@ +3863,2 @@
>      emitAllocateGCThingPar(lir, tempReg2, cxReg, tempReg0, tempReg1, templateObj);
> +    masm.Pop(lengthReg);

I don't think you need Push and Pop here, which are used when masm needs to track the frame size for computing offsets into stack slots. It looks like here you're just saving a register, so the lower-case variants are push and pop are fine.

::: js/src/jit/LIR-Common.h
@@ +373,5 @@
>          return getTemp(1);
>      }
>  };
>  
> +class LNewDenseArrayPar : public LInstructionHelper<1, 2, 3>

This seems to unconditionally callVM, so why change it from a LCallInstructionHelper? Then you don't need the manual saveLive and storeResultTo dance.
Attachment #8429007 - Flags: review?(shu) → review+
Comment on attachment 8429008 [details] [diff] [review]
Patch 4 of 8: propagate the result array

Review of attachment 8429008 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/vm/ForkJoin.cpp
@@ +1341,5 @@
>      }
>  
>      bool invoke(ForkJoinContext *cx) {
>          JitActivation activation(cx);
> +        Value result = Int32Value(0);

This could use a comment explaining what the initialization's for.

::: js/src/vm/SelfHosting.cpp
@@ +286,5 @@
> + * ForkJoin(func, sliceStart, sliceEnd, mode, updatable): Invokes |func| many times in parallel.
> + *
> + * If "func" will update a pre-existing object then that object /must/ be passed
> + * as the object "updatable".  It is /not/ correct to pass an object that
> + * references the updatable objects indirectly.

Wow this comment was outdated. Thanks for the update.
Attachment #8429008 - Flags: review?(shu) → review+
Comment on attachment 8429012 [details] [diff] [review]
Patch 5 of 8: Chunk pool

Review of attachment 8429012 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.

::: js/src/vm/ThreadPool.cpp
@@ +259,5 @@
>  ThreadPool::ThreadPool(JSRuntime *rt)
>    : activeWorkers_(0),
>      joinBarrier_(nullptr),
>      job_(nullptr),
> +#if defined DEBUG || defined JSGC_FJGENERATIONAL

Style nit: defined(DEBUG) || defined(JSGC_FJGENERATIONAL)

@@ +298,5 @@
>      joinBarrier_ = PR_NewCondVar(lock_);
> +    if (!joinBarrier_)
> +        return false;
> +#endif
> +#ifdef JSGC_FJGENERATIONAL

Please nest this inside JS_THREADSAFE. Non-threadsafe builds aren't going to have NSPR.

@@ +527,5 @@
> +        freeChunks_ = p->next;
> +    PR_Unlock(chunkLock_);
> +
> +    if (p) {
> +        // Already poisoned

Style nit: period at the end.

@@ +533,5 @@
> +    }
> +    gc::ForkJoinNurseryChunk *c =
> +        reinterpret_cast<gc::ForkJoinNurseryChunk *>(runtime_->
> +                                                     gc.pageAllocator.mapAlignedPages(gc::ChunkSize,
> +                                                                                      gc::ChunkSize));

Style nit: I would do something like 

    gc::ForkJoinNurseryChunk *c = reinterpret_cast<gc::ForkJoinNurseryChunk *>(
        runtime_->gc.pageAllocator.mapAlignedPages(gc::ChunkSize,
                                                   gc::ChunkSize));

@@ +556,5 @@
> +void
> +ThreadPool::poisonChunk(gc::ForkJoinNurseryChunk *c)
> +{
> +#ifdef DEBUG
> +    memset(c, 0xbd, gc::ChunkSize);

Please define this pattern in js/public/Utility.h with the other patterns, so that somebody else doesn't accidentally define another poison pattern to 0xbd.

::: js/src/vm/ThreadPool.h
@@ +191,5 @@
>  
>      // The current job.
>      ParallelJob *job_;
>  
> +#if defined DEBUG || defined JSGC_FJGENERATIONAL

Style nit: defined(DEBUG) || defined(JSGC_FJGENERATIONAL)

@@ +300,5 @@
> +    static const int32_t secondsBeforePrune = 10;
> +
> +    // This lock controls access to the following variables and to the
> +    // 'next' field of any ChunkFreeList object reachable from freeChunks_.
> +    PRLock *chunkLock_;

A comment here to warn intrepid engineers thinking they can improve the code by making it lock-free would be nice. :)

@@ +312,5 @@
> +        ChunkFreeList *next;
> +    };
> +
> +    // List of free chunks.
> +    ChunkFreeList * freeChunks_;

Style nit: ChunkFreeList *freeChunks_
Attachment #8429012 - Flags: review?(shu) → review+
Comment on attachment 8429005 [details] [diff] [review]
Patch 1 of 8: Turn on JSGC_FJGENERATIONAL

Review of attachment 8429005 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/configure.in
@@ +3184,5 @@
>      JSGC_GENERATIONAL=1 )
>  if test -n "$JSGC_GENERATIONAL"; then
>      AC_DEFINE(JSGC_GENERATIONAL)
>  fi
> +JSGC_GENERATIONAL_CONFIGURED=$JSGC_GENERATIONAL

This should check for --enable-threadsafe.
Comment on attachment 8429013 [details] [diff] [review]
Patch 6 of 8: Adjustments to the GC's marker

Review of attachment 8429013 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/gc/RootMarking.cpp
@@ +570,5 @@
>  /* static */ void
>  AutoGCRooter::traceAllWrappers(JSTracer *trc)
>  {
>      for (ContextIter cx(trc->runtime()); !cx.done(); cx.next()) {
> +        for (JS::AutoGCRooter *gcr = cx->autoGCRooters; gcr; gcr = gcr->down) {

Accidental change from js:: to JS::?

::: js/src/jsgcinlines.h
@@ +188,5 @@
>  
>    public:
> +    ArenaCellIterImpl() {
> +        // Squelch warnings.
> +        thingSize = firstThingOffset = 0;

Initializer list?
Attachment #8429013 - Flags: review?(shu) → review+
Comment on attachment 8429014 [details] [diff] [review]
Patch 7 of 8: ForkJoinNursery et al

Review of attachment 8429014 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great overall.

My comments below are style nits and rewordings and so forth, so I don't need to see a new patch.

::: js/src/gc/ForkJoinNursery-inl.h
@@ +1,2 @@
> +/* -*- Mode: C++; tab-width: 8; indent-tabs-mode: nil; c-basic-offset: 4 -*-
> + * vim: set ts=4 sw=4 et tw=79 ft=cpp:

Style nit: tw=99

@@ +35,5 @@
> +// be few survivors of a collection in the ForkJoinNursery and few
> +// objects will be tested.  The bulk of the calls may come from the
> +// code that scans the roots.
> +
> +MOZ_ALWAYS_INLINE bool

Just use 'inline' here and elsewhere in this patch.

@@ +39,5 @@
> +MOZ_ALWAYS_INLINE bool
> +ForkJoinNursery::isInsideNewspace(const void *obj)
> +{
> +    uintptr_t p = reinterpret_cast<uintptr_t>(obj);
> +    for ( unsigned i=0 ; i <= currentChunk_ ; i++ ) {

Style nit here and all the other for loops below: for (unsigned i = 0; i <= currentChunk_; i++) {

@@ +67,5 @@
> +    const RelocationOverlay *overlay = reinterpret_cast<const RelocationOverlay *>(*ref);
> +    if (!overlay->isForwarded())
> +        return false;
> +    // This static_cast from Cell* restricts T to valid (GC thing) types.
> +    *ref = static_cast<T*>(overlay->forwardingAddress());

Style nit: T *

::: js/src/gc/ForkJoinNursery.cpp
@@ +8,5 @@
> +#ifdef JSGC_FJGENERATIONAL
> +
> +#include "gc/ForkJoinNursery-inl.h"
> +
> +#include "mozilla/IntegerPrintfMacros.h"

Today I Learned.

@@ +157,5 @@
> +  , movedSize_(0)
> +  , head_(nullptr)
> +  , tail_(&head_)
> +{
> +    (void)cx_;

What is this?

@@ +225,5 @@
> +    }
> +    ForkJoinNurseryCollectionTracer trc(rt, this);
> +    forwardFromRoots(&trc);
> +    collectToFixedPoint(&trc);
> +#ifdef JS_ION

We should probably make JSGC_FJGENERATIONAL depend on Ion being enabled, and remove this #ifdef.

@@ +392,5 @@
> +        for ( ; !ai.done() ; ai.next() ) {
> +            // Do the walk in two steps to avoid problems resulting from allocating
> +            // into the arena that's being walked: ArenaCellIter is not safe for that.
> +            // It can happen during evacuation.
> +            size_t numObjs = 0;

Style nit: consolidate the comment sections and move this line right above tenured_->arenas.purge(kind).

@@ +454,5 @@
> +    position_ = currentStart_;
> +    newspace[index] = c;
> +}
> +
> +void*

Style nit here and below: void *

@@ +473,5 @@
> +    return thing;
> +}
> +
> +JSObject*
> +ForkJoinNursery::allocateObject(size_t size, size_t numDynamic)

For clarity, maybe rename size to baseSize?

@@ +494,5 @@
> +            return obj;
> +        }
> +    }
> +
> +    // Go directly to the tenured space

Style nit: period.

@@ +520,5 @@
> +{
> +    if (!isInsideNewspace(obj) || !isInsideNewspace(oldSlots))
> +        return nullptr;
> +
> +    // The nursery cannot make use of the returned slots data.

I don't understand this comment in relation to the immediately following code.

@@ +619,5 @@
> +
> +    // Typed arrays in the nursery may have a lazily allocated buffer, make
> +    // sure there is room for the array's fixed data when moving the array.
> +    //
> +    // TODO: Not sure this actually applies to PJS.  Shu to investigate.

So we can't currently compile TypedArray allocations like |new Uint8Array(64)| because Uint8Array is a native function with no parallel version.

Still, I would leave this code in, since we really should support allocating TypedArrays in PJS.

@@ +700,5 @@
> +    // them if there is adequate room inline in dst.
> +    if (src->is<ArrayObject>())
> +        srcSize = movedSize = sizeof(ObjectImpl);
> +
> +    js_memcpy(dst, src, srcSize);

Please use PodCopy.

@@ +757,5 @@
> +    // Unlike other objects, Arrays can have fixed elements.
> +    if (src->is<ArrayObject>() && nslots <= GetGCKindSlots(dstKind)) {
> +        dst->setFixedElements();
> +        dstHeader = dst->getElementsHeader();
> +        js_memcpy(dstHeader, srcHeader, nslots * sizeof(HeapSlot));

PodCopy

::: js/src/gc/ForkJoinNursery.h
@@ +82,5 @@
> +
> +    JSRuntime *runtime();
> +    JS::Zone *zone();
> +
> +    // The updatable object (the ForkJoin result array), or nullptr

Style nit: period.

@@ +88,5 @@
> +
> +    // allocateNurseryChunk() returns nullptr on oom.
> +    ForkJoinNurseryChunk *allocateNurseryChunk();
> +
> +    // p must have been obtained through allocateNurseryChunk

Style nit: period.

@@ +91,5 @@
> +
> +    // p must have been obtained through allocateNurseryChunk
> +    void freeNurseryChunk(ForkJoinNurseryChunk *p);
> +
> +    // GC statistics output

Style nit: period.

@@ +194,5 @@
> +    AllocKind getObjectAllocKind(JSObject *src);
> +    void *allocateInTospace(AllocKind thingKind);
> +    void *allocateInTospace(size_t nelem, size_t elemSize);
> +    MOZ_ALWAYS_INLINE bool shouldMoveObject(void **thingp);
> +    void *moveToTospace(JSObject *src);

What do you think about the following rename:

moveToToSpace -> moveObjectToTospace
moveObjectToTospace -> moveObjectGutsToTospace

::: js/src/gc/RootMarking.cpp
@@ +668,5 @@
> +void
> +js::gc::MarkForkJoinStack(ForkJoinNurseryCollectionTracer *trc)
> +{
> +    PerThreadData *ptd = TlsPerThreadData.get();
> +    ForkJoinContext *cx = ForkJoinContext::current();

The ptd can be gotten from the cx.

::: js/src/jit/IonFrames.cpp
@@ +1252,5 @@
> +
> +void
> +UpdateJitActivationsForMinorGC(PerThreadData *ptd, JSTracer *trc)
> +{
> +    JS_ASSERT(trc->runtime()->isHeapMinorCollecting() || ForkJoinContext::current());

Asserting trc->runtime()->isHeapMinorCollection() || trc->runtime()->isFJMinorCollecting() would be clearer.

::: js/src/jit/IonFrames.h
@@ +270,5 @@
>  JSCompartment *
>  TopmostIonActivationCompartment(JSRuntime *rt);
>  
>  #ifdef JSGC_GENERATIONAL
>  void UpdateJitActivationsForMinorGC(JSRuntime *rt, JSTracer *trc);

Can this be refactored to use the PerThreadData version?

::: js/src/vm/ArrayBufferObject.cpp
@@ +799,5 @@
> +    if (!IS_GC_MARKING_TRACER(trc) && !trc->runtime()->isHeapMinorCollecting()
> +#ifdef JSGC_FJGENERATIONAL
> +        && !trc->runtime()->isFJMinorCollecting()
> +#endif
> +        ) {

Style nit: { on next line.

::: js/src/vm/ForkJoin.h
@@ +452,5 @@
>      }
>  
> +#ifdef JSGC_FJGENERATIONAL
> +    // There is already a nursery() method in ThreadSafeContext.
> +    gc::ForkJoinNursery& fjNursery() { return fjNursery_; }

Style nit: ForkJoinNursery &fjNursery()

::: js/src/vm/Runtime.h
@@ +959,5 @@
> +    // counter is greater than zero.  (It will require some care to
> +    // make sure the two variables stay in sync.)
> +    bool isFJMinorCollecting() { return gc.fjCollectionCounter > 0; }
> +    void incFJMinorCollecting() { gc.fjCollectionCounter++; }
> +    void decFJMinorCollecting() { gc.fjCollectionCounter--; }

Can these functions assert that we are in fact inside a ForkJoin section?
Attachment #8429014 - Flags: review?(shu) → review+
Comment on attachment 8429015 [details] [diff] [review]
Patch 8 of 8: Allocation paths

Review of attachment 8429015 [details] [diff] [review]:
-----------------------------------------------------------------

Cancelling per comment 99.
Attachment #8429015 - Flags: review?(shu)
Fixes a bug that made the inline allocation path not work, no other changes.
Attachment #8429015 - Attachment is obsolete: true
Attachment #8430706 - Flags: review?(shu)
Comment on attachment 8430706 [details] [diff] [review]
Patch 8 of 8: Allocation paths, v2

Review of attachment 8430706 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me. r=me with nits addressed.

::: js/src/jit/CodeGenerator.cpp
@@ +3942,5 @@
>      }
>  };
> +#endif // JSGC_FJGENERATIONAL
> +
> +typedef JSObject *(*NewGCThingParFn)(ForkJoinContext*, js::gc::AllocKind allocKind);

Style nit: ForkJoinContext *

@@ +3951,5 @@
>  CodeGenerator::emitAllocateGCThingPar(LInstruction *lir, Register objReg, Register cxReg,
>                                        Register tempReg1, Register tempReg2, JSObject *templateObj)
>  {
>      gc::AllocKind allocKind = templateObj->tenuredGetAllocKind();
> +#if JSGC_FJGENERATIONAL

Nit: #ifdef

::: js/src/jsgcinlines.h
@@ +508,5 @@
> +template <AllowGC allowGC>
> +inline JSObject *
> +TryNewFJNurseryObject(ForkJoinContext *cx, size_t thingSize, size_t nDynamicSlots)
> +{
> +    ForkJoinNursery& nursery = cx->fjNursery();

Style nit: ForkJoinNursery &nursery

@@ +513,5 @@
> +    JSObject *obj = nursery.allocateObject(thingSize, nDynamicSlots);
> +    if (obj)
> +        return obj;
> +
> +    // Note, allowGC is ignored - ForkJoin nursery collection is always allowed.

Well, since AllowGC is ignored, untemplatize the function so eyeballing the callsites doesn't give the wrong idea.

@@ +520,5 @@
> +    obj = nursery.allocateObject(thingSize, nDynamicSlots);
> +    if (obj)
> +        return obj;
> +
> +    return nullptr;

Can condense above to |return nursery.allocateObject(...)|

@@ +611,5 @@
>      }
>  #endif
> +#ifdef JSGC_FJGENERATIONAL
> +    if (cx->isForkJoinContext() &&
> +        ShouldFJNurseryAllocate(cx->asForkJoinContext()->fjNursery(), kind, heap))

I wonder if this extra branch affects existing benchmarks. If so, we can always templatize it since there's just one callsite in the OOL fallback of masm.newGCThingPar.

::: js/src/jsobj.cpp
@@ +2734,5 @@
>      }
>  #endif
> +#ifdef JSGC_FJGENERATIONAL
> +    if (cx->isForkJoinContext()) {
> +        gc::ForkJoinNursery& fjn = cx->asForkJoinContext()->fjNursery();

Style nit: ForkJoinNursery &fjn

@@ +2743,5 @@
> +            // memory that was already used for the slots.
> +            slots = (HeapSlot *)cx->malloc_(newCount * sizeof(HeapSlot));
> +            if (!slots)
> +                return slots;
> +            js_memcpy(slots, oldSlots, oldCount * sizeof(HeapSlot));

PodCopy

@@ +2824,1 @@
>      // Note: threads without a JSContext do not have access to nursery allocated things.

Could you update the comment here to refer to the sequential nursery instead of just nursery?

@@ +3086,5 @@
> +            // memory that was already used for the elements.
> +            elts = (ObjectElements *)cx->malloc_(newCount * sizeof(HeapSlot));
> +            if (!elts)
> +                return elts;
> +            js_memcpy(elts, oldHeader, oldCount * sizeof(HeapSlot));

PodCopy

::: js/src/vm/Shape.cpp
@@ +21,5 @@
>  
>  #include "jscntxtinlines.h"
>  #include "jsobjinlines.h"
>  
> +#include "gc/ForkJoinNursery-inl.h"

Is this necessary?
Attachment #8430706 - Flags: review?(shu) → review+
Response to last round of reviews (some of this was covered on IRC) where I did not simply do what was suggested, or something morally equivalent:

- We'll go with js_memcpy everywhere, for consistency, since PodCopy requires consistent type
  information and most surrounding code operates on byte counts anyhow

- We'll stick with MOZ_ALWAYS_INLINE since the functions in question have one caller and are
  used only within the object-moving machinery

- LCallInstructionHelper is not appropriate for LNewDenseArrayPar because the latter contains
  multiple calls (one for the allocation bailout, one to allocate the element array), hence
  LCallInstructionHelper makes assumptions that don't hold

- I have not placed JSGC_FJGENERATIONAL code under JS_THREADSAFE, nor removed dependencies on JS_ION.
  When JSGC_FJGENERATIONAL is removed some of the code will become ifdeffed by JS_THREADSAFE,
  and the JS_ION condition will need to remain.

- I did not refactor UpdateJitActivationsForMinorGC since the two versions have different
  preconditions, though I'm probably splitting hairs here.  I may change my mind.

BTW, I think there are several policy / API usage bugs in the out-of-line allocation paths for the ForkJoinNursery.  The lower-level allocator can return NULL for two reasons: element array too large or nursery overflow.  Only in the latter case should a collection be triggered, but in that case it should be triggered reliably.  TryNewFJNurseryObject() will collect in either case.  (This may not be a big deal since I think the element array size is always zero in PJS code.)  Worse, both allocateSlots and reallocateSlots (and ditto elements) will take a failure in the nursery as license to go directly to the tenured area, but should possibly run the collector.  This is unlikely to be serious but it is wrong.  I will address these problems shortly.
(In reply to Lars T Hansen [:lth] from comment #111)
> - I have not placed JSGC_FJGENERATIONAL code under JS_THREADSAFE, nor
> removed dependencies on JS_ION.
>   When JSGC_FJGENERATIONAL is removed some of the code will become ifdeffed
> by JS_THREADSAFE,
>   and the JS_ION condition will need to remain.
> 

That wasn't a style suggestion. I think you have a bit of lock code not protected by JS_THREADSAFE and only by JSGC_FJGENERATIONAL, which won't compile with --disable-threadsafe --enable-generational. Can you check that --disable-threadsafe builds aren't busted? Similarly, also check that --disable-ion builds aren't broken.
(In reply to Shu-yu Guo [:shu] from comment #112)
> 
> That wasn't a style suggestion. I think you have a bit of lock code not
> protected by JS_THREADSAFE and only by JSGC_FJGENERATIONAL, which won't
> compile with --disable-threadsafe --enable-generational. 

And purposely so.  If you look in ForkJoinNursery.h you will see that JSGC_FJGENERATIONAL
requires both JS_THREADSAFE and JS_ION.  (Also currently requires JSGC_GENERATIONAL but
as discussed earlier I'm not sure how hard that requirement actually is.)

I had assumed, without digging very deeply, that PJS depended on JS_THREADSAFE.  Since
JSGC_FJGENERATIONAL is currently only enabled with PJS that should imply that there
isn't a problem with JSGC_FJGENERATIONAL requiring JS_THREADSAFE.  Am I wrong in this?
(In reply to Lars T Hansen [:lth] from comment #113)
> (In reply to Shu-yu Guo [:shu] from comment #112)
> > 
> > That wasn't a style suggestion. I think you have a bit of lock code not
> > protected by JS_THREADSAFE and only by JSGC_FJGENERATIONAL, which won't
> > compile with --disable-threadsafe --enable-generational. 
> 
> And purposely so.  If you look in ForkJoinNursery.h you will see that
> JSGC_FJGENERATIONAL
> requires both JS_THREADSAFE and JS_ION.  (Also currently requires
> JSGC_GENERATIONAL but
> as discussed earlier I'm not sure how hard that requirement actually is.)
> 
> I had assumed, without digging very deeply, that PJS depended on
> JS_THREADSAFE.  Since
> JSGC_FJGENERATIONAL is currently only enabled with PJS that should imply
> that there
> isn't a problem with JSGC_FJGENERATIONAL requiring JS_THREADSAFE.  Am I
> wrong in this?

No, you're right. I had missed the #error pragmas in ForkJoinNursery.h, so everything seems good here.
It turns out that I had some important misconceptions about how to handle huge slot arrays (malloc'd objects hanging off of nursery objects), probably stemming from older ideas about how allocation should work but in any case my fault.

This patch includes changes to ForkJoinNursery to track and handle huge slot arrays, heavily inspired by the code already in the Nursery.  There are huge-slot sets corresponding to newspace and fromspace, with huge-slot objects moving from one to the other during minor GC and being removed from either during evacuation.

The next patch will include changes (simplifications, really) to the allocation paths.

The third patch is not part of the patch set per se, it is the diff between the old patch set and the new patch set, to ease review.
Attachment #8429014 - Attachment is obsolete: true
Attachment #8430706 - Attachment is obsolete: true
Attachment #8432431 - Flags: review?(terrence)
Attachment #8432432 - Flags: review?(terrence)
Attached file Test cases with expected output (obsolete) —
This all unpacks into js/src/parjs-test and contains some benchmarks and stress tests, along with some largeish files for input and expected output.  The convolve programs are all mostly the same, with slight changes to the kernel (use diff).

These are too useful to sit around in my sandbox github repo, so I'd like to land them, but I've not spent any time to try to automate them yet.
Attachment #8432478 - Flags: review?(shu)
Attached patch Rollup patch (obsolete) — Splinter Review
gkw, decoder: fuzz testing of this patch requested.
Attachment #8432508 - Flags: feedback?(gary)
Attachment #8432508 - Flags: feedback?(choller)
Depends on: 1019080
I assume this patch can only be tested in threadsafe builds? Nevertheless, it should build with --disable-threadsafe too, which it doesn't. Before landing this patch, this should be fixed.
(In reply to Christian Holler (:decoder) from comment #120)
> I assume this patch can only be tested in threadsafe builds? Nevertheless,
> it should build with --disable-threadsafe too, which it doesn't. Before
> landing this patch, this should be fixed.

Yeah, it can only be tested for threadsafe builds. If you can tune the fuzzer to test PJS entry points (buildPar, mapPar, etc) with a focus on allocation, that would be pretty helpful.
Comment on attachment 8432508 [details] [diff] [review]
Rollup patch

Array.buildPar(7, function(z = d, ...x) {
    return ({
        N: RegExp()
    })
})

$ ./js-dbg-64-dm-ts-darwin-6a984e21c2ca-933313-c119-d2ea9db9ddb2 --ion-parallel-compile=off testcase.js

Assertion failure: !ins->isCall(), at jit/shared/CodeGenerator-shared-inl.h

Configuration parameters: (Mac 10.9)

CC="clang -Qunused-arguments" CXX="clang++ -Qunused-arguments" AR=ar sh /Users/fuzz3/trees/mozilla-central/js/src/configure --target=x86_64-apple-darwin12.5.0 --enable-optimize --enable-debug --enable-profiling --enable-gczeal --enable-debug-symbols --disable-tests --enable-more-deterministic --with-ccache --enable-threadsafe <other NSPR options>
Attachment #8432508 - Flags: feedback?(gary) → feedback-
(In reply to Gary Kwong [:gkw] [:nth10sd] from comment #122)
> Comment on attachment 8432508 [details] [diff] [review]
> Rollup patch

[Fuzzing failure]

Cool, a bug in the handling of rest arguments in parallel code, LRestPar should no longer be derived from LCallInstructionHelper because the code for LRestPar contains a callVM operation internally (for the allocation).
Depends on: 1019545
Comment on attachment 8432508 [details] [diff] [review]
Rollup patch

Apart from the issue that gkw already posted, I haven't found any more problems => feedback+ with the fix for that one bug.
Attachment #8432508 - Flags: feedback?(choller) → feedback+
Attached patch pjsRootingHazards.diff (obsolete) — Splinter Review
The changes necessary to fix rooting hazards highlighted by https://tbpl.mozilla.org/?tree=Try&rev=7eb5687fa631.

Problem appears to be fixed as per https://tbpl.mozilla.org/?tree=Try&rev=ba83e9774b97, and the logic seems OK to me: GC is disabled unless the context is a JSContext, and triggered via the context which in this case is a ForkJoinContext, so we get the right triggering methods.
Attachment #8433477 - Flags: review?(shu)
Attachment #8433477 - Flags: review?(shu) → review+
Comment on attachment 8432478 [details]
Test cases with expected output

This patch has been moved to bug 1019821.
Attachment #8432478 - Flags: review?(shu)
Comment on attachment 8432431 [details] [diff] [review]
Patch 7 of 8: ForkJoinNursery et al, v2

Review of attachment 8432431 [details] [diff] [review]:
-----------------------------------------------------------------

All around, quite nice.

::: js/src/gc/ForkJoinNursery.cpp
@@ +171,5 @@
> +  , tail_(&head_)
> +  , hugeSlotsNew(0)
> +  , hugeSlotsFrom(1)
> +{
> +    (void)cx_;                  // Squelch warning, cx_ is DEBUG-only at the moment

Perhaps you could make cx_ a mozilla::DebugOnly<JSContext *>?

@@ +211,5 @@
> +#define TIME_END(name) int64_t timstampEnd_##name = PRMJ_Now()
> +#define TIME_TOTAL(name) (timstampEnd_##name - timstampStart_##name)
> +
> +void
> +ForkJoinNursery::pjsCollection(bool evacuate, bool recreate)

SpiderMonkey style prefers a two-element enum to a bool for a parameters so that the calling code is easier to read. Sorry, I don't have good suggestions for the names.

@@ +310,5 @@
> +void
> +ForkJoinNursery::flip()
> +{
> +    size_t i;
> +    for ( i=0 ; i < numActiveChunks_ ; i++ ) {

SpiderMonkey style is:
for (i = 0; i < numActiveChunks_; ++i) {

@@ +330,5 @@
> +
> +void
> +ForkJoinNursery::freeFromspace()
> +{
> +    for ( size_t i=0 ; i < numFromspaceChunks_ ; i++ ) {

Style.

@@ +404,5 @@
> +void
> +ForkJoinNursery::forwardFromTenured(ForkJoinNurseryCollectionTracer *trc)
> +{
> +    JSObject *objs[ArenaCellCount];
> +    for ( size_t k=0 ; k < FINALIZE_LIMIT ; k++ ) {

Style.

@@ +416,5 @@
> +        JS_ASSERT(kind <= FINALIZE_OBJECT_LAST);
> +
> +        ArenaIter ai;
> +        ai.init(const_cast<Allocator *>(tenured_), kind);
> +        for ( ; !ai.done() ; ai.next() ) {

Style.

@@ +432,5 @@
> +            size_t numObjs = 0;
> +            tenured_->arenas.purge(kind);
> +            for (ArenaCellIterUnderFinalize i(ai.get()); !i.done(); i.next())
> +                objs[numObjs++] = i.get<JSObject>();
> +            for ( size_t i=0 ; i < numObjs ; i++ )

Style.

@@ +718,5 @@
> +    // TODO: At the moment this code is probably idle.  Shu says:
> +    // "So we can't currently compile TypedArray allocations like |new Uint8Array(64)|
> +    //  because Uint8Array is a native function with no parallel version.
> +    //  Still, I would leave this code in, since we really should support allocating
> +    //  TypedArrays in PJS."

We are not currently allocating TypedArrays in the other nursery either, since bhackett changed the layout. I'd remove this code since it's probably going to have to change.

::: js/src/gc/Marking.cpp
@@ +244,5 @@
> +         * This case should never be reached from PJS collections as
> +         * those should all be using a ForkJoinNurseryCollectionTracer
> +         * that carries a callback.
> +         */
> +        JS_ASSERT(!ForkJoinContext::current());

Would it be possible to also add:
JS_ASSERT(!trc->runtime()->isFJMinorCollecting());

@@ +387,5 @@
> +        if (fjNursery.isInsideFromspace(*thingp))
> +            return fjNursery.getForwardedPointer(thingp);
> +        check_nursery = false;
> +    }
> +#endif

After having dealt with some of the older parts of spidermonkey I've developed a significant allergy to using state variables like this. Let's instead use C syntax to make this work instead. Even if it turns a few heads, it's much harder to accidentally corrupt syntax with automatic refactoring.

#ifdef JSGC_FJGENERATIONAL
if (rt->isFJMinorCollecting()
{
    ....
}
else
#endif
{
    if (IsInsideNursery(*thingp))
    {
        ....
    }
}

This, of course, would also work fine without the block scope around the if-statement, but is probably clearer with than without.

@@ +433,5 @@
> +                return !nursery.getForwardedPointer(thingp);
> +            return false;
> +        }
> +    }
> +#endif  // JSGC_GENERATIONAL

Please use a block to remove the state variable, as above.

@@ +471,2 @@
>          rt->gc.nursery.getForwardedPointer(thingp);
> +#endif  // JSGC_GENERATIONAL

Ditto.

::: js/src/jit/IonFrames.cpp
@@ +930,5 @@
>      for (GeneralRegisterBackwardIterator iter(safepoint.allGprSpills()); iter.more(); iter++) {
>          --spill;
> +        if (slotsRegs.has(*iter)) {
> +#ifdef JSGC_FJGENERATIONAL
> +            if (trc->callback == gc::ForkJoinNursery::MinorGCCallback) {

The ifdef plus the continue makes this pretty hard to read. For this particular case, I think it would be clearer if we templatized UpdateIonJSFrameForMinorGC on the nursery type and did T::forwardBufferPointer(...);.

@@ +956,5 @@
> +        if (trc->callback == gc::ForkJoinNursery::MinorGCCallback) {
> +            gc::ForkJoinNursery::forwardBufferPointer(trc, slots);
> +            continue;
> +        }
> +#endif

Ditto.

@@ +1256,5 @@
> +#ifdef JSGC_FJGENERATIONAL
> +    JS_ASSERT(trc->runtime()->isHeapMinorCollecting() || trc->runtime()->isFJMinorCollecting());
> +#else
> +    JS_ASSERT(trc->runtime()->isHeapMinorCollecting());
> +#endif

It would be a bit less repetition to do:

JS_ASSERT(trc->runtime()->isHeapMinorCollecting());
#ifdef JSGC_FJGENERATIONAL
JS_ASSERT(trc->runtime()->isFJMinorCollecting());
#endif
Attachment #8432431 - Flags: review?(terrence) → review+
Comment on attachment 8432432 [details] [diff] [review]
Patch 8 of 8: Allocation paths, v3

Terrence, note that the small, later patch pertaining to rooting hazards (already reviewed by Shu) really fits into this one.
(In reply to Terrence Cole [:terrence] from comment #128)
> Comment on attachment 8432431 [details] [diff] [review]
> Patch 7 of 8: ForkJoinNursery et al, v2
> 
> Review of attachment 8432431 [details] [diff] [review]:
> -----------------------------------------------------------------

Thanks, most of these are clear improvements.

> ::: js/src/jit/IonFrames.cpp
> @@ +930,5 @@
> >      for (GeneralRegisterBackwardIterator iter(safepoint.allGprSpills()); iter.more(); iter++) {
> >          --spill;
> > +        if (slotsRegs.has(*iter)) {
> > +#ifdef JSGC_FJGENERATIONAL
> > +            if (trc->callback == gc::ForkJoinNursery::MinorGCCallback) {
> 
> The ifdef plus the continue makes this pretty hard to read.

I agree.

> For this
> particular case, I think it would be clearer if we templatized
> UpdateIonJSFrameForMinorGC on the nursery type and did
> T::forwardBufferPointer(...);.

I agree (also for the second case you mention).

I started looking at this but the changes bleed into a number of files.  As I've already promised Shu to do some post-landing templatization of common code, including the stack walking, I am going to defer it until then; I'd like to not do another round of reviews on the present patch set if it is avoidable...  Collecting those work items in bug 1020290.

> @@ +1256,5 @@
> > +#ifdef JSGC_FJGENERATIONAL
> > +    JS_ASSERT(trc->runtime()->isHeapMinorCollecting() || trc->runtime()->isFJMinorCollecting());
> > +#else
> > +    JS_ASSERT(trc->runtime()->isHeapMinorCollecting());
> > +#endif
> 
> It would be a bit less repetition to do:
> 
> JS_ASSERT(trc->runtime()->isHeapMinorCollecting());
> #ifdef JSGC_FJGENERATIONAL
> JS_ASSERT(trc->runtime()->isFJMinorCollecting());
> #endif

Aesthetically I'm with you all the way, but the operator is '||' and not '&&' :-)
Comment on attachment 8432432 [details] [diff] [review]
Patch 8 of 8: Allocation paths, v3

Review of attachment 8432432 [details] [diff] [review]:
-----------------------------------------------------------------

r=me

::: js/src/jit-test/tests/parallel/bug1009788.js
@@ +3,5 @@
>          Array.buildPar(500, (function() {
>              return {}
>          }))
>      }
> +    print("Done");

Was this intentional? We are not supposed to print un-prefixed output from tests since it interferes with the output-scanners that implement tbpl.
Attachment #8432432 - Flags: review?(terrence) → review+
Attached patch Rollup patch, v2 (obsolete) — Splinter Review
Reviewed at various times, and in its entirety, by :jandem, :shu, :terrence.
Fuzzed by :gkw and :decoder, fuzz bugs fixed.

Several TBPL runs over the last week, here is one for this exact patch:
https://tbpl.mozilla.org/?tree=Try&rev=5507bf0999fc
Attachment #8407274 - Attachment is obsolete: true
Attachment #8411310 - Attachment is obsolete: true
Attachment #8411661 - Attachment is obsolete: true
Attachment #8412480 - Attachment is obsolete: true
Attachment #8412517 - Attachment is obsolete: true
Attachment #8413630 - Attachment is obsolete: true
Attachment #8413676 - Attachment is obsolete: true
Attachment #8413691 - Attachment is obsolete: true
Attachment #8414321 - Attachment is obsolete: true
Attachment #8414424 - Attachment is obsolete: true
Attachment #8429005 - Attachment is obsolete: true
Attachment #8429006 - Attachment is obsolete: true
Attachment #8429007 - Attachment is obsolete: true
Attachment #8429008 - Attachment is obsolete: true
Attachment #8429012 - Attachment is obsolete: true
Attachment #8429013 - Attachment is obsolete: true
Attachment #8429061 - Attachment is obsolete: true
Attachment #8432431 - Attachment is obsolete: true
Attachment #8432432 - Attachment is obsolete: true
Attachment #8432433 - Attachment is obsolete: true
Attachment #8432478 - Attachment is obsolete: true
Attachment #8432508 - Attachment is obsolete: true
Attachment #8433083 - Attachment is obsolete: true
Attachment #8433477 - Attachment is obsolete: true
Keywords: checkin-needed
OS: Linux → All
Hardware: x86 → All
Landed:

https://hg.mozilla.org/integration/mozilla-inbound/rev/d89d4281fd8d

Notes:
(1) Not sure what your email is - in the patch it is @mozilla.org (`hg log` shows a mix of both), but here in b.m.o it is @mozilla.com - I chose the latter.
(2) Added reviewers' r+ to commit message.
(3) I applied this on top of its parent 7297dedf2416b8c4e6832ef501d3687d251f3035 as specified in the patch, then rebased to current inbound tip 257269554ad9 (using `hg rebase -d 257269554ad9`) as it wouldn't apply otherwise.
Keywords: checkin-needed
Target Milestone: --- → mozilla32
sorry had to backout this patch since it caused bustages like https://tbpl.mozilla.org/php/getParsedLog.php?id=41185316&tree=Mozilla-Inbound
Status: NEW → ASSIGNED
Flags: needinfo?(lhansen)
Target Milestone: mozilla32 → ---
Oh foo, I think there's a conflict with something :jonco pushed last night.  I will rebase and try again.
Flags: needinfo?(lhansen)
Attached patch Rollup patch, v3 (obsolete) — Splinter Review
Rebased to recent mozilla-inbound; change made to accomodate recent changes to gc/RootMarking.cpp.  Change reviewed (IRC/pastebin) by jonco.
Attachment #8435570 - Attachment is obsolete: true
Keywords: checkin-needed
(In reply to Gary Kwong [:gkw] [:nth10sd] from comment #133)
> Landed:
> 
> https://hg.mozilla.org/integration/mozilla-inbound/rev/d89d4281fd8d
> 
> Notes:
> (1) Not sure what your email is - in the patch it is @mozilla.org (`hg log`
> shows a mix of both), but here in b.m.o it is @mozilla.com - I chose the
> latter.

Ah, sorry - didn't see this until now.  The new patch has the same problem, please fix as for the previous one.  I've updated my local .hgrc files.
Inbound just got closed (and I have to bbiab), so I can't push for you yet. I'd recommend attaching a new patch with the following fixed:

(1) Your email.
(2) The following commit message: "Bug 933313 - Per-worker generational GC for PJS. Take 2. r=jandem, r=shu, r=terrence, r=jonco"

I used the following command for mq:

hg qref -m "Bug 933313 - Per-worker generational GC for PJS. Take 2. r=jandem, r=shu, r=terrence, r=jonco" -u "Lars T Hansen <lhansen@mozilla.com>"
Flags: needinfo?(lhansen)
(In reply to Gary Kwong [:gkw] [:nth10sd] from comment #138)
> Inbound just got closed (and I have to bbiab), so I can't push for you yet.
> I'd recommend attaching a new patch with the following fixed:

Will do.
Flags: needinfo?(lhansen)
Attached patch Rollup patch, v4 (obsolete) — Splinter Review
With fixes as suggested.
Attachment #8435665 - Attachment is obsolete: true
(In reply to Gary Kwong [:gkw] [:nth10sd] from comment #141)
> https://hg.mozilla.org/integration/mozilla-inbound/rev/d15632d88126

sorry had to back this out again. This time for cppunit test failures https://tbpl.mozilla.org/php/getParsedLog.php?id=41207005&tree=Mozilla-Inbound - btw this failures was also in the try run from comment #132 - maybe we need a new green try run before we checkin again ?
(In reply to Carsten Book [:Tomcat] from comment #142)
> (In reply to Gary Kwong [:gkw] [:nth10sd] from comment #141)
> > https://hg.mozilla.org/integration/mozilla-inbound/rev/d15632d88126
> 
> sorry had to back this out again. This time for cppunit test failures
> https://tbpl.mozilla.org/php/getParsedLog.php?id=41207005&tree=Mozilla-
> Inbound - btw this failures was also in the try run from comment #132 -
> maybe we need a new green try run before we checkin again ?

Ok....   I know that one has been there, but the way I interpreteted both the Cpp failures in the run cited above was that there are bugs filed for that problem.  What did I misunderstand?
Target Milestone: mozilla32 → ---
Attached patch Rollup patch, v5 (obsolete) — Splinter Review
One bug found and fixed.

Try server is green:
https://tbpl.mozilla.org/?tree=Try&rev=fdf4c7fcc221
Attachment #8435675 - Attachment is obsolete: true
Keywords: checkin-needed
Needs rebasing.
Keywords: checkin-needed
Keywords: checkin-needed
Attachment #8436755 - Attachment is obsolete: true
Keywords: checkin-needed
Depends on: 1023399
Depends on: 1023753
Depends on: 1024756
Blocks: 1026931
Blocks: 1026935
Depends on: 1033115
You need to log in before you can comment on or make changes to this bug.