Last Comment Bug 850070 - OdinMonkey: Parallelize use of Ion backend.
: OdinMonkey: Parallelize use of Ion backend.
Status: RESOLVED FIXED
[gdc2013]
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: unspecified
: All All
: -- normal (vote)
: mozilla22
Assigned To: Sean Stangl [:sstangl]
:
: Jason Orendorff [:jorendorff]
Mentors:
Depends on: 854197
Blocks:
  Show dependency treegraph
 
Reported: 2013-03-11 17:29 PDT by Sean Stangl [:sstangl]
Modified: 2013-03-24 09:59 PDT (History)
10 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
[WIP] Parallel compilation without proper error handling. (36.94 KB, patch)
2013-03-14 21:24 PDT, Sean Stangl [:sstangl]
no flags Details | Diff | Splinter Review
[WIP] Parallel compilation with error handling. (39.50 KB, patch)
2013-03-15 00:18 PDT, Sean Stangl [:sstangl]
no flags Details | Diff | Splinter Review
Parallelize OdinMonkey compilations (37.92 KB, patch)
2013-03-17 14:49 PDT, Sean Stangl [:sstangl]
no flags Details | Diff | Splinter Review
Parallelize OdinMonkey compilations [fuzztarget] (38.65 KB, patch)
2013-03-18 13:07 PDT, Sean Stangl [:sstangl]
gary: feedback+
choller: feedback+
Details | Diff | Splinter Review
[Part 1/2] Separate sequential compilation into explicit MIR/LIR/Codegen phases. (18.81 KB, patch)
2013-03-19 15:31 PDT, Sean Stangl [:sstangl]
luke: review+
Details | Diff | Splinter Review
[Part 2/2] Parallelize OdinMonkey compilations. (20.89 KB, patch)
2013-03-19 15:32 PDT, Sean Stangl [:sstangl]
no flags Details | Diff | Splinter Review
[Part 1/2] Separate sequential compilation into explicit MIR/LIR/Codegen phases. [v2] (21.82 KB, patch)
2013-03-22 14:51 PDT, Sean Stangl [:sstangl]
luke: review+
Details | Diff | Splinter Review
[Part 2/2] Parallelize OdinMonkey compilations. [v2] (28.70 KB, patch)
2013-03-22 14:54 PDT, Sean Stangl [:sstangl]
luke: review+
Details | Diff | Splinter Review
Follow-up patch to fix a broken assertion in Label's destructor in sequential mode. (1.60 KB, patch)
2013-03-22 16:46 PDT, Sean Stangl [:sstangl]
no flags Details | Diff | Splinter Review

Description User image Sean Stangl [:sstangl] 2013-03-11 17:29:22 PDT
Odin spends about half of total compilation time in IonMonkey's CompileBackEnd(), which translates Odin's MIR graph into optimized machine code. All of this work is performed sequentially, but MIR optimization and LIR generation can be parallelized. Building the initial MIR and performing code generation must still happen sequentially.

Note that parallel compilation for Odin is very different from parallel compilation for Ion, since Odin does not intend to execute the AsmJS script in parallel with compilation, probably because AsmJS doesn't support OSR by design. So while the goal with Ion parallelism is to keep the browser responsive, the goal with Odin parallelism is to just cut down the compilation time for an entire Module.

The general plan is to have as many active jobs as worker threads, to keep memory reasonable. Each job is given its own LifoAlloc, into which the MIR is built, optimized, and translated to LIR. On successful LIR generation, the LifoAlloc and other associated Function-level data structures are added to a finished queue, on which the main thread blocks and pulls in CompileFunctionBodies(). Once pulled, the finished function has codegen run on it on the main thread, and the LifoAlloc is returned to the pool for use in another job.

Note that this means that functions are generated into the MacroAssembler buffer in a non-deterministic order. Given that the most expensive functions only take around 160ms to compile, we may be able to perform code generation in order without a terrible performance hit, but maintaining order can lead to some extremely tricky situations if we ever intend to gracefully handle OOM on worker threads.
Comment 1 User image Alon Zakai (:azakai) 2013-03-11 18:03:26 PDT
Hmm, why is this a confidential bug?
Comment 2 User image Sean Stangl [:sstangl] 2013-03-12 11:45:51 PDT
Whoops. Now it's not :)
Comment 3 User image Sean Stangl [:sstangl] 2013-03-14 21:24:56 PDT
Created attachment 725253 [details] [diff] [review]
[WIP] Parallel compilation without proper error handling.

WIP patch. Termination conditions are unhandled -- I just wanted to see what the speedup from a parallel AsmJS backend would be. Handling them won't be hard, it just needs slightly better abstractions to avoid duplication of blocking code.

On a very large benchmark that takes about 14 seconds to compile the entire module, including parsing, this shaves off 6 seconds.

So that's pretty cool.
Comment 4 User image Luke Wagner [:luke] 2013-03-14 21:26:01 PDT
\o/
Comment 5 User image Vladimir Vukicevic [:vlad] [:vladv] 2013-03-14 21:28:28 PDT
(In reply to Sean Stangl [:sstangl] from comment #3)
> So that's pretty cool.

Understatement.  I see what you did there.  \o/!
Comment 6 User image Sean Stangl [:sstangl] 2013-03-15 00:18:47 PDT
Created attachment 725298 [details] [diff] [review]
[WIP] Parallel compilation with error handling.

Error handling turned out to be extremely simple: failed children add themselves to a harvester list protected by the runtime-scoped WorkerThreadState mutex; the main thread counts the number of jobs dispatched and the number of jobs completed, determines how many jobs are outstanding, and waits for them to fail or complete.

Passes all AsmJS jit-tests. This patch is pretty much good to go, I just have to look over it to leave out some cruft I probably forgot about, and possibly add some more encapsulation so it's easier to read.

Perf results are unchanged since the previous patch.
Comment 7 User image Vladimir Vukicevic [:vlad] [:vladv] 2013-03-17 14:39:00 PDT
Note: I applied the latest patch on top of latest m-i; needed trivial fixup (removing a NULL cx param).  I consistently crash on any Ion startup if I don't disable parallel compilation at jsworkers.cpp, line 445 or so (the popCopy() right after the newly added if (terminate) block).  I can't give any useful info about the crash because I couldn't manage to inspect locals usefully.. something seemed to be NULL though, looking at the registers.

The patch worked fine for me with m-i from friday afternoon or so though..
Comment 8 User image Sean Stangl [:sstangl] 2013-03-17 14:49:36 PDT
Created attachment 725934 [details] [diff] [review]
Parallelize OdinMonkey compilations

Rebased patch, to make sure that Vlad and I are running the same code.
Comment 9 User image Sean Stangl [:sstangl] 2013-03-18 13:07:46 PDT
Created attachment 726302 [details] [diff] [review]
Parallelize OdinMonkey compilations [fuzztarget]

Fixed the case of JS_NO_HELPER_THREADS and removed a bogus assertion that tripped in the case of induced failure.

This works with everything I've thrown at it so far, so I'd like to bring in the fuzzers. We're on somewhat of a tight schedule for landing this -- ideally it should be landed on Thursday or Friday. Including a review, that means concluding fuzzing on Wednesday. If any bugs pop up, I'll try to fix them immediately.

Fuzzers must pass "--ion-parallel-compile=on". AsmJS code is most useful, but it's also useful to include some non-AsmJS code, since the same worker threads handle both AsmJS and Ion compilations.
Comment 10 User image Alon Zakai (:azakai) 2013-03-18 13:17:39 PDT
In addition to the previous asm.js code samples for fuzzing that are in bug 840284, one that might be interesting for this bug specifically is BananaBread, since it's large,

http://kripken.github.com/misc-js-benchmarks/banana/game/bb.js
Comment 11 User image Gary Kwong [:gkw] [:nth10sd] 2013-03-18 14:25:55 PDT
(In reply to Sean Stangl [:sstangl] from comment #9)
> Created attachment 726302 [details] [diff] [review]
> Parallelize OdinMonkey compilations [fuzztarget]

I am fuzzing this now, but have not seen any obvious immediate problem. Will leave this fuzzing overnight before marking feedback+.
Comment 12 User image Christian Holler (:decoder) 2013-03-18 14:35:20 PDT
I'm on this too now, due to the short time frame we have left :)
Comment 13 User image Sean Stangl [:sstangl] 2013-03-19 15:31:42 PDT
Created attachment 726932 [details] [diff] [review]
[Part 1/2] Separate sequential compilation into explicit MIR/LIR/Codegen phases.

This patch series is the same as the fuzztarget one logically, but I moved some code into helper functions so that the parallel compilation code looks simpler.

This first patch just deals with interface setup for the second patch: IonMonkey and AsmJS compilation phases are each made more separable.
Comment 14 User image Gary Kwong [:gkw] [:nth10sd] 2013-03-19 15:31:52 PDT
Comment on attachment 726302 [details] [diff] [review]
Parallelize OdinMonkey compilations [fuzztarget]

-> feedback+ because I didn't find anything terribly wrong after a round of overnight fuzzing.
Comment 15 User image Sean Stangl [:sstangl] 2013-03-19 15:32:26 PDT
Created attachment 726935 [details] [diff] [review]
[Part 2/2] Parallelize OdinMonkey compilations.
Comment 16 User image Luke Wagner [:luke] 2013-03-19 17:49:05 PDT
Comment on attachment 726932 [details] [diff] [review]
[Part 1/2] Separate sequential compilation into explicit MIR/LIR/Codegen phases.

Review of attachment 726932 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/AsmJS.cpp
@@ +4415,5 @@
> +        // Use the Module-wide LifoAlloc for all temporaries,
> +        // including the MIRGenerator, MIRGraph, and LIRGraph.
> +        LifoAllocScope scope(&m.lifo());
> +
> +        MIRGenerator *mirGen = GenerateFunctionMIR(cx, m, func, scope.alloc());

There is an unbroken chain of Check* functions starting at CheckModule going down to the smaller Check* functions, and they all generate MIR.  Given this, could you rename GenerateFunctionMIR back to CheckFunctionBody?

::: js/src/ion/AsmJSModule.h
@@ +483,4 @@
>          return functionBytes_;
>      }
>  
> +    bool addHeapAccesses(const Vector<ion::AsmJSHeapAccess, 0, ion::IonAllocPolicy> &accesses) {

Could you add a
  typedef Vector<ion::AsmJSHeapAccess, 0, ion::IonAllocPolicy> AsmJSHeapAccessVector
somewhere convenient and use that everywhere else?
Comment 17 User image Christian Holler (:decoder) 2013-03-20 09:19:09 PDT
Comment on attachment 726302 [details] [diff] [review]
Parallelize OdinMonkey compilations [fuzztarget]

So far I haven't found anything :)
Comment 18 User image Luke Wagner [:luke] 2013-03-20 10:10:44 PDT
Comment on attachment 726932 [details] [diff] [review]
[Part 1/2] Separate sequential compilation into explicit MIR/LIR/Codegen phases.

Review of attachment 726932 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/AsmJS.cpp
@@ +4415,5 @@
> +        // Use the Module-wide LifoAlloc for all temporaries,
> +        // including the MIRGenerator, MIRGraph, and LIRGraph.
> +        LifoAllocScope scope(&m.lifo());
> +
> +        MIRGenerator *mirGen = GenerateFunctionMIR(cx, m, func, scope.alloc());

There is an unbroken chain of Check* functions starting at CheckModule going down to the smaller Check* functions, and they all generate MIR.  Given this, could you rename GenerateFunctionMIR back to CheckFunctionBody?

@@ +4425,4 @@
>  
> +        LIRGraph *lir = GenerateLIR(mirGen);
> +        if (!lir)
> +            return false;

I just noticed: both OptimizeMIR and GenerateLIR need to report a warning if they fail:
  return m.fail("Internal compiler failure (probably out of memory)", func.fn());

::: js/src/ion/AsmJSModule.h
@@ +483,4 @@
>          return functionBytes_;
>      }
>  
> +    bool addHeapAccesses(const Vector<ion::AsmJSHeapAccess, 0, ion::IonAllocPolicy> &accesses) {

Could you add a
  typedef Vector<ion::AsmJSHeapAccess, 0, ion::IonAllocPolicy> AsmJSHeapAccessVector
somewhere convenient and use that everywhere else?
Comment 19 User image Luke Wagner [:luke] 2013-03-20 10:47:03 PDT
Comment on attachment 726935 [details] [diff] [review]
[Part 2/2] Parallelize OdinMonkey compilations.

Review of attachment 726935 [details] [diff] [review]:
-----------------------------------------------------------------

Clean code, good organization, all around excellent work.  I had a few non-nit suggestions that'd I'd like to see in a next patch, so I'll just clear r? for now.

::: js/src/ion/AsmJS.cpp
@@ +4437,5 @@
>  }
>  
> +#ifdef JS_PARALLEL_COMPILATION
> +// State of compilation as tracked and updated by the main thread.
> +struct AsmJSThreadGroup {

Perhaps you could name it something like "ParallelCompilationState"?  The AsmJS* prefix isn't necessary since this is a .cpp-local class (c.f. ModuleCompiler, FunctionCompiler, VarType, etc).  Also a "thread group" sounds like a resource that is managed when really this is just a collection of variables on the stack that you want to avoid passing around individually.  Lastly, { on new line.

@@ +4451,5 @@
> +};
> +
> +// Block until a worker-assigned LifoAlloc becomes finished.
> +static AsmJSCompilationData *
> +GetFinishedCompilation(JSContext *cx, AsmJSThreadGroup &group, ModuleCompiler &m)

The 'ModuleCompiler m' should be the first parameter and, because you have 'm.cx()', there is no need to pass a 'cx'.
Ditto for GetAndFinalizeUsedLifo, DispatchParallelJobs, CheckFunctionBodies{Sequential,Parallel}.

@@ +4460,5 @@
> +        if (!group.state.asmJSFinishedList.empty()) {
> +            group.outstandingJobs--;
> +            return group.state.asmJSFinishedList.popCopy();
> +        }
> +        group.state.wait(WorkerThreadState::MAIN);

Not your problem, but as a general note I wish there was syntactic connection between 'wait' calls and their associated lock.

@@ +4482,5 @@
> +    return lifo;
> +}
> +
> +static bool
> +GetAndFinalizeUsedLifo(JSContext *cx, AsmJSThreadGroup &group, ModuleCompiler &m, LifoAlloc **outLifo)

Could this be named "GenerateCodeForFinishedJob" to indicate that it's more than just tidying up a LifoAlloc?

@@ +4511,5 @@
> +    return true;
> +}
> +
> +static bool
> +DispatchParallelJobs(JSContext *cx, AsmJSThreadGroup &group, ModuleCompiler &m)

For the same reason as my comment in the previous patch, I'd like this function name to start with "Check".  How about CheckFunctionBodiesParallelImpl?  Because this is logically just part of CheckFunctionBodiesParallel, hoisted out to make it easier to read.

@@ +4524,5 @@
> +
> +        // Get exclusive access to an empty LifoAlloc from the thread group's pool.
> +        LifoAlloc *lifo = NULL;
> +        if (!GetUnusedLifo(group, i, &lifo) && !GetAndFinalizeUsedLifo(cx, group, m, &lifo))
> +            return false;

An important thing for Odin is to report a warning (via m.fail()) everywhere that doesn't generate an out-of-memory (or other JS-visible) exception.  That means we need something here and everywhere else that can return 'false' without throwing.  Could you carefully audit the sequential and parallel paths to make sure we m.fail() or throw?  (It isn't the end of the world if we m.fail() and an exception is pending.)

@@ +4595,5 @@
> +    JS_ASSERT(group.outstandingJobs == 0);
> +    JS_ASSERT(group.state.asmJSWorklist.empty());
> +    JS_ASSERT(group.state.asmJSFinishedList.empty());
> +
> +    return;

Doesn't seem necessary to have final 'return;'

@@ +4603,5 @@
> +
> +static bool
> +CheckFunctionBodiesParallel(JSContext *cx, ModuleCompiler &m)
> +{
> +    if (!EnsureParallelCompilationInitialized(cx->runtime))

Can this be moved to js::CompileAsmJS with the other ensure* calls?

@@ +4614,5 @@
> +    // To perform work, a thread gains exclusive ownership of a LifoAlloc.
> +    LifoAllocScope scope(&m.lifo());
> +    LifoAlloc *lifos = m.lifo().newArrayUninitialized<LifoAlloc>(numLifos);
> +    for (size_t i = 0; i < numLifos; i++)
> +        new (&lifos[i]) LifoAlloc(LIFO_ALLOC_PARALLEL_CHUNK_SIZE);

I don't see where these LifoAlloc's are destroyed which means we'll leak all the LifoAlloc's chunks, iiuc.  IIUC, this is the only use of m.lifo(), and it is basically just doing a scoped malloc here.  Can we remove m.lifo_/m.alloc_/m.ionContext_ altogether and just use some scoped allocation here?  I'm thinking we could make the LifoAllocs be member variables of AsmJSCompilationData and then have a Vector<AsmJSCompilationData> (of size numLifos, perhaps renamed to numJobs) on the stack in this function.  The problem is that LifoAlloc isn't copyable/movable as required by Vector.  I think the simplest fix is to add an "init()" function to Vector that asserts empty() && !usingInlineStorage() and thus no need for copy construction.

I also just noticed that we have f.m().alloc().ensureBallast() scattered above in the Check* functions that should be using f.mirGen().ensureBallast().  Removing m.lifo and m.alloc will suss out all these.

::: js/src/ion/AsmJS.h
@@ +25,5 @@
> +namespace ion { class MIRGenerator; class LIRGraph; }
> +
> +// Struct type for passing parallel compilation data between the main thread
> +// and compilation workers.
> +struct AsmJSCompilationData {

Could this struct definition go at the end of the .h (so the header reads better from top-to-bottom)?

::: js/src/jsworkers.cpp
@@ +68,5 @@
> +    JS_ASSERT(rt->workerThreadState);
> +    JS_ASSERT(rt->workerThreadState->isLocked());
> +    JS_ASSERT(asmData->lir);
> +
> +    rt->workerThreadState->asmJSFinishedList.append(asmData);

I see that this function is symmetric with the Ion one, but it seems simpler just to move this one liner into its one callsite (and drop all the JS_ASSERTs, since they are locally implied).

@@ +433,5 @@
>          }
>  
> +        // Terminate even if a job is available, if requested.
> +        if (terminate) {
> +            state.unlock();

I don't see where the contents of ionWorklist get deleted in this case.  (iiuc, asmJSWorklist isn't a problem since all the resources are LifoAlloc'd, right?)

::: js/src/jsworkers.h
@@ +142,5 @@
>      /* Any builder currently being compiled by Ion on this thread. */
>      ion::IonBuilder *ionBuilder;
>  
> +    /* Any AsmJS data currently being optimized by Ion on this thread. */
> +    AsmJSCompilationData *asmData;

Why do these need to be member variables?  It seems like they could be local variables for all the non-debug assertions and we could have some DebugOnly state for assertive purposes.
Comment 20 User image Sean Stangl [:sstangl] 2013-03-22 14:51:46 PDT
Created attachment 728447 [details] [diff] [review]
[Part 1/2] Separate sequential compilation into explicit MIR/LIR/Codegen phases. [v2]
Comment 21 User image Sean Stangl [:sstangl] 2013-03-22 14:54:49 PDT
Created attachment 728448 [details] [diff] [review]
[Part 2/2] Parallelize OdinMonkey compilations. [v2]

Note that I discovered yesterday that the "dlmalloc" test has different internal behavior in parallel mode: when run with --ion-parallel-compile=on, something causes a SIGSEGV that gets picked up by the Odin signal handler. This behavior was also present in the patch that underwent fuzzing.

I haven't discovered the cause of the signal, but outputs from every benchmark appear to match in both modes. Tested on several large codebases that should really show an error if one existed, and they work just fine.
Comment 22 User image Luke Wagner [:luke] 2013-03-22 16:03:35 PDT
Comment on attachment 728448 [details] [diff] [review]
[Part 2/2] Parallelize OdinMonkey compilations. [v2]

Review of attachment 728448 [details] [diff] [review]:
-----------------------------------------------------------------

Excellent!

::: js/src/ion/AsmJS.cpp
@@ +4596,5 @@
> +    Vector<AsmJSParallelTask, 0> tasks(m.cx());
> +    if (!tasks.initCapacity(numParallelJobs))
> +        return false;
> +
> +    for (size_t i = 0; i < numParallelJobs; i++)

Personally, I'd remove the preceding \n.
Comment 23 User image Sean Stangl [:sstangl] 2013-03-22 16:46:15 PDT
Created attachment 728513 [details] [diff] [review]
Follow-up patch to fix a broken assertion in Label's destructor in sequential mode.
Comment 25 User image Sean Stangl [:sstangl] 2013-03-22 17:28:07 PDT
Follow-up fix to --disable-ion and --disable-threadsafe builds (broke Android):
https://hg.mozilla.org/integration/mozilla-inbound/rev/48ccb418a9c0
Comment 26 User image Sean Stangl [:sstangl] 2013-03-22 17:56:34 PDT
Follow-up fix to Fedora warnings-as-errors build complaining that a Vector function has an argument specifically named "capacity":
https://hg.mozilla.org/integration/mozilla-inbound/rev/cedf014b5764

Note You need to log in before you can comment on or make changes to this bug.