Odin: Parallelize validation / MIR generation

RESOLVED FIXED in Firefox 43

People

(Reporter: bbouvier, Assigned: bbouvier)

Tracking

Version: unspecified
Target Milestone: mozilla43
Firefox Tracking Flags: firefox43 fixed, firefox45 fixed

Attachments

(10 attachments, 28 obsolete attachments)
(Assignee)

Description

4 years ago
With bug 1157624 landed, parsing/validating a function now yields bytecode, which is consumed right afterwards and transformed into the function's MIR graph. All of this happens sequentially. We can make it happen in parallel, or at least try.

Currently, if we have helper threads, we do this:

1. Parse and generate MIR on the main thread
2. Optimize the MIR and generate LIR on a helper thread
3. Generate the code on the main thread

We want to change this to:

1. Parse and generate bytecode on the main thread
2. Generate the initial MIR from the bytecode on a helper thread

Steps 3 and 4 (optimizing the MIR / generating LIR on a helper thread, then generating the code on the main thread) are the former steps 2 and 3, unchanged.

Expected gains are explained in bug 1157624. This should reduce the compilation slowdowns introduced by that bug, and actually improve compilation time on devices with several cores.
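
As a rough, standalone sketch of the intended split (this is not SpiderMonkey code: Bytecode, MIRGraph, ParseAndValidate and EmitMIR are placeholders), the main thread produces bytecode while a helper thread consumes it and builds MIR:

#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

struct Bytecode { int funcIndex; };
struct MIRGraph { int funcIndex; };

static Bytecode ParseAndValidate(int funcIndex) { return {funcIndex}; }     // step 1, main thread
static MIRGraph EmitMIR(const Bytecode& bc)     { return {bc.funcIndex}; }  // step 2, helper thread

int main() {
    std::mutex lock;
    std::condition_variable cv;
    std::queue<Bytecode> worklist;
    std::vector<MIRGraph> mirs;
    bool done = false;

    // Helper thread: consume bytecode tasks and generate MIR.
    std::thread helper([&] {
        std::unique_lock<std::mutex> guard(lock);
        for (;;) {
            cv.wait(guard, [&] { return !worklist.empty() || done; });
            if (worklist.empty())
                return;
            Bytecode bc = worklist.front();
            worklist.pop();
            guard.unlock();              // generate MIR outside the lock
            MIRGraph mir = EmitMIR(bc);
            guard.lock();
            mirs.push_back(mir);
        }
    });

    // Main thread: parse/validate each function and hand the bytecode off.
    for (int i = 0; i < 100; i++) {
        Bytecode bc = ParseAndValidate(i);
        std::lock_guard<std::mutex> guard(lock);
        worklist.push(bc);
        cv.notify_one();
    }
    {
        std::lock_guard<std::mutex> guard(lock);
        done = true;
    }
    cv.notify_one();
    helper.join();
    return int(mirs.size() != 100);
}
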
I was hoping that 3 could also be done on a helper thread (for simplicity, it could be the same helper thread as 2, keeping the same fork/join design as currently in CheckFunctionsParallel).  See bug 959263 for performance motivation.

Updated

4 years ago
Duplicate of this bug: 959263
(Assignee)

Comment 3

4 years ago
Posted patch 1. Renaming (obsolete) — Splinter Review
A simple renaming patch, changing all occurrences of "AsmJS" into "Asm", that is, the common denominator for asm.js & wasm.
Attachment #8634181 - Flags: review?(luke)
(Assignee)

Comment 4

4 years ago
So Jan said that to start, we could have one thread passing the tasks from the main thread to the other threads. That way, we'll just have to plug in the EmitMIR function call at the right place and (hopefully almost) not worry too much about the parallelization bits.

This is an attempt at doing this. It adds a new AsmJS task list called AsmMIRGenList (the role of this list is to store MIR generation tasks). The main thread passes the tasks to the one thread handling MIR generation, which just bounces them to the real asmjs-ion compilation threads. It's mostly a copy of the existing code (no real attempt at factoring was made), and the asm.js tests all pass.
Attachment #8634183 - Flags: feedback?(sstangl)
Attachment #8634183 - Flags: feedback?(jdemooij)
(Assignee)

Comment 5

4 years ago
The Asm process looks like this:
- CheckFunction / FunctionBuilder parse and validate the asm.js code, building the bytecode out of it.
- EmitMIR / FunctionCompiler create the MIR from the bytecode.
- The rest of the asm.js pipeline hasn't really changed (if parallel compilation is enabled, threads will take care of the MIR optimizations + LIR gen, otherwise this is done on the main thread; in all cases, codegen is done on the main thread).

Before this patch, EmitMIR was called from inside CheckFunction. This patch just moves it out, to pave the way for running it on another thread.
Attachment #8634187 - Flags: review?(jdemooij)

Updated

4 years ago
Attachment #8634181 - Flags: review?(luke) → review+
Comment on attachment 8634183 [details] [diff] [review]
2. Pass tasks from one thread to the others

Review of attachment 8634183 [details] [diff] [review]:
-----------------------------------------------------------------

Nothing strange here.
Attachment #8634183 - Flags: feedback?(sstangl) → feedback+
(Assignee)

Comment 7

4 years ago
Posted patch 4. Move stuff around (obsolete) — Splinter Review
So this patch (only) moves things around between AsmJSValidate.cpp (which contains all the Validate and Emit functions before this patch) and the newly introduced AsmMIRGen.h/.cpp (for the Emit code) and AsmConstants.h (for things used in common -- note that it's a temporary name, see 3)).

1) A lot of stuff has leaked into AsmConstants.h. In particular, the ModuleCompiler had to be there because it's used during validation (e.g. when checking for a global) and when emitting code (e.g. when retrieving a Global structure from an index in the Globals array). That means that all basic Type structures (Type, VarType, RetType) had to leak (that's expected), but also Signature and a few other little things.
2) Rather than having a single "using" declaration in the header file (a bad idea, as it leaks into all the cpp files that include it), I've prefixed the few instances of frontend names (TokenStream, ParseNode) with frontend::, and ditto for the mozilla:: names.
3) Considering that ModuleCompiler is in AsmConstants.h, the file name of the latter doesn't sound appropriate. Candidates: AsmCompilationContext.h (but that's partially false, as the compilation bits are in AsmMIRGen.cpp), or AsmModuleCompiler.h (although it contains other things -- maybe the header should be split into two, AsmTypes.h and AsmModuleCompiler.h?). Any other ideas would be greatly appreciated!

Flagging for review, as this kind of change is very prone to bit-rot from rebasing. (Really, this is just code motion!)
Attachment #8634870 - Flags: review?(luke)
Hrm, I'm a bit concerned by sharing ModuleCompiler between Validate.cpp and Emit.cpp b/c ModuleCompiler is updated while validating function bodies and so there would be races between reads and writes (or at least it'd be easy to have this bug).

Rather, I was thinking that most of ModuleCompiler is either constant state (that is built up while parsing the global section of an asm.js module) or validator-local.  We could create a new ModuleGlobals class (and put that in AsmConstants.h, perhaps renamed to AsmGlobals.h) that is built by ModuleCompiler (perhaps renamed to ModuleValidator? and similarly for FunctionCompiler/Validator?) during the global section and then kept constant (with assertions) and shared with the emitter thereafter.

There are a handful of ModuleCompiler members that are updated throughout validating the module though: functions_, exits_, exitSignatures_.  For these:
 - I think functions_ can be split into the state needed by validation, and stored in ModuleValidator, and another separate table of state (viz., the entry_ Label) stored by a ModuleEmitter (the parallel of ModuleValidator stored in AsmEmit.cpp).
 - I *think* (not quite sure) exits_ can be local to ModuleValidator and the necessary info (exitIndex) baked into the IR handed to the emitter.

Sorry if this is a fair amount more work, but I think it would lead to a cleaner separation of validation logic (specific to asm.js) and compiler logic (shared by asm.js/wasm).
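
To make the suggested split concrete, here is a hypothetical sketch (placeholder types, not the real SpiderMonkey classes): constant, shareable state lives in a ModuleGlobals that is frozen after the global section, while mutable per-thread state stays validator-local.

#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

struct Signature { std::vector<int> argTypes; int retType; };

// Built while validating the module's global section, then frozen and shared
// read-only with the emitter thread.
class ModuleGlobals {
    std::unordered_map<std::string, unsigned> globalIndices_;
    bool frozen_ = false;
  public:
    void addGlobal(const std::string& name, unsigned index) {
        assert(!frozen_);
        globalIndices_[name] = index;
    }
    void freeze() { frozen_ = true; }
    unsigned globalIndex(const std::string& name) const {
        assert(frozen_);  // only read after the global section is done
        return globalIndices_.at(name);
    }
};

// Validator-local mutable state; never shared across threads.
class ModuleValidator {
    ModuleGlobals& globals_;
    std::vector<Signature> exits_;  // grows while validating function bodies
  public:
    explicit ModuleValidator(ModuleGlobals& globals) : globals_(globals) {}
    const ModuleGlobals& globals() const { return globals_; }
    unsigned addExit(Signature sig) {
        exits_.push_back(std::move(sig));
        return unsigned(exits_.size() - 1);  // exitIndex baked into the IR handed to the emitter
    }
};

int main() {
    ModuleGlobals globals;
    globals.addGlobal("ffi", 0);
    globals.freeze();
    ModuleValidator validator(globals);
    validator.addExit(Signature{{1, 2}, 1});
    return int(globals.globalIndex("ffi"));
}
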
(Assignee)

Comment 9

4 years ago
Comment on attachment 8634870 [details] [diff] [review]
4. Move stuff around

Review of attachment 8634870 [details] [diff] [review]:
-----------------------------------------------------------------

No problem, I prefer to get it done the right way, even if that needs more work! I'll do a patch atop of this one.
Attachment #8634870 - Flags: review?(luke)
(Assignee)

Comment 10

4 years ago
As with every refactoring, I'm in FUD territory: is this how you would expect it to look?

This patch moves ModuleCompiler::Global to the js::jit namespace, renames it AsmGlobal, and implements a new data structure ModuleGlobals that contains the information modified during validation and used during compilation (but only for the ModuleCompiler::Global bookkeeping!). Is moving all the other globals into ModuleGlobals this way what you had in mind?
Attachment #8636798 - Flags: feedback?(luke)
(Assignee)

Comment 11

4 years ago
Posted patch asmglobal.refactoring.patch (obsolete) — Splinter Review
On top of the previous wip patch, a general refactoring for AsmGlobal:
- The friendship relation with ModuleCompiler is removed, thanks to a few static constructors (overloaded ctors couldn't work because of ambiguous cases among the overloaded types, e.g. FFI / FuncPtrTable); see the sketch after this comment.
- The enum is changed into an enum class.
- A few helpers are added so users of this class don't have to know the Which data structure (unless we explicitly switch on it).
Attachment #8636799 - Flags: feedback?(luke)
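
A hypothetical sketch of the named-static-constructor pattern described in comment 11 above (placeholder fields; not the real AsmGlobal): two conceptually different globals may be described by the same underlying argument types, so overloaded constructors would be ambiguous, while named factories are not.

#include <cstdint>

class AsmGlobal {
  public:
    enum class Which { FFI, FuncPtrTable, Variable };
  private:
    Which which_;
    uint32_t index_;
    AsmGlobal(Which which, uint32_t index) : which_(which), index_(index) {}
  public:
    // Two AsmGlobal(uint32_t) constructors would be ambiguous; named factories are not.
    static AsmGlobal ffi(uint32_t ffiIndex) { return AsmGlobal(Which::FFI, ffiIndex); }
    static AsmGlobal funcPtrTable(uint32_t tableIndex) { return AsmGlobal(Which::FuncPtrTable, tableIndex); }

    Which which() const { return which_; }
    bool isFFI() const { return which_ == Which::FFI; }  // helper so callers rarely switch on Which
    uint32_t index() const { return index_; }
};

int main() {
    AsmGlobal g = AsmGlobal::ffi(3);
    return (g.isFFI() && g.index() == 3) ? 0 : 1;
}
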
Comment on attachment 8636798 [details] [diff] [review]
wip - Move Global out of ModuleCompiler, rename it AsmGlobal, create the ModuleGlobals

Review of attachment 8636798 [details] [diff] [review]:
-----------------------------------------------------------------

Exactly, great!

::: js/src/asmjs/AsmGlobals.h
@@ +1703,5 @@
>          // The type of a const is the exact type of the literal (since its value
>          // cannot change) which is more precise than the corresponding vartype.
>          Type type = isConst ? Type::Of(lit) : VarType::Of(lit).toType();
> +        AsmGlobal::Which which = isConst ? AsmGlobal::ConstantLiteral : AsmGlobal::Variable;
> +        AsmGlobal* global = moduleLifo_.new_<AsmGlobal>(which);

I was thinking that ModuleGlobals would need its own LifoAlloc since it might outlive ModuleValidator (in the case of background Ion compilation).
Attachment #8636798 - Flags: feedback?(luke) → feedback+
Comment on attachment 8636798 [details] [diff] [review]
wip - Move Global out of ModuleCompiler, rename it AsmGlobal, create the ModuleGlobals

Review of attachment 8636798 [details] [diff] [review]:
-----------------------------------------------------------------

Actually, on second thought: is it necessary to hoist AsmGlobal?  It seems like most of AsmGlobal's state is only relevant for validation and that the important bits (index and type) would be encoded in the IR and so AsmEmitMIR.cpp wouldn't need to see AsmGlobal.

Attachment #8636798 - Flags: feedback+ → feedback?(luke)
Comment on attachment 8634183 [details] [diff] [review]
2. Pass tasks from one thread to the others

Review of attachment 8634183 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the delay.
Attachment #8634183 - Flags: feedback?(jdemooij) → feedback+
Comment on attachment 8634187 [details] [diff] [review]
3. Move EmitMIR out of CheckFunction

Review of attachment 8634187 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/asmjs/AsmJSValidate.cpp
@@ +10407,2 @@
>          if (!mir)
> +            return false;

These ~12 lines also appear in CheckFunctionsParallel below. For now it seems okay as it'll change anyway...
Attachment #8634187 - Flags: review?(jdemooij) → review+
(Assignee)

Updated

4 years ago
Depends on: 1186424
(Assignee)

Updated

4 years ago
Attachment #8636798 - Flags: feedback?(luke)
(Assignee)

Updated

4 years ago
Attachment #8636799 - Flags: feedback?(luke)
(Assignee)

Comment 16

4 years ago
Now that bug 1186424 has landed and stuck, here's part 2 rebased.
Attachment #8634181 - Attachment is obsolete: true
Attachment #8634183 - Attachment is obsolete: true
Attachment #8634187 - Attachment is obsolete: true
Attachment #8634870 - Attachment is obsolete: true
Attachment #8636798 - Attachment is obsolete: true
Attachment #8636799 - Attachment is obsolete: true
(Assignee)

Comment 17

4 years ago
Posted patch 1. Move things around (obsolete) — Splinter Review
This patch moves things around and introduces a few new files. It is mostly code motion, but a quick explanation will probably help:

- AsmJSGlobals.h (maybe should be renamed AsmJSCommon.h? "globals" here means "global stuff to C++ asmjs/ files", it is not referring to globals in an asm.js module), which is a transitive closure of all the things necessary in common to the ModuleValidator and the ModuleCompiler. This is imported in asmjs/AsmJS{Validate,Compile}.cpp

- AsmJSCompile.h contains all the declarations necessary for creating the ModuleCompiler and for all the ModuleCompiler steps. Those functions are all defined in AsmJSCompile.cpp. It contains ModuleCompileInputs, because that gets included in the GlobalHelperThreadState by copy. This header will eventually be included in vm/HelperThreads.{h,cpp} (for ModuleCompileInputs) and asmjs/AsmJS{Validate,Compile}.cpp.

- AsmJSCompile.cpp contains the ModuleCompiler and all the Emit* functions (stolen from AsmJSValidate.cpp). The interesting things are at the bottom of the file: these are the new exported functions CreateModuleCompiler, GenerateModuleMIR, GenerateModuleCode.
Attachment #8657790 - Attachment is obsolete: true
(Assignee)

Comment 18

4 years ago
Comment on attachment 8659902 [details] [diff] [review]
1. Move things around

Review of attachment 8659902 [details] [diff] [review]:
-----------------------------------------------------------------

Luke, see previous comment for the explanation, plus the following notes:

- the |createJitContext| formal in GenerateModuleCode is used in the parallelization patch.
- the interesting parts of AsmJSValidate.cpp start after "CheckFunctionsSequential"
- the interesting parts of AsmJSCompile.cpp start after "EXPORTED FUNCTIONS"
Attachment #8659902 - Flags: review?(luke)
(Assignee)

Comment 19

4 years ago
Posted patch 2. Parallelization WIP (obsolete) — Splinter Review
It works, for the cases where there are no errors. I still need to work on error management when there's a failure, whether in validation, on the compiler thread, or in a helper thread.
(Assignee)

Comment 20

4 years ago
Posted patch 2. Parallelization WIP (obsolete) — Splinter Review
This one works better. Here's the grand scheme:

- we have at least 3 threads: the validator thread (== the main thread if the script isn't compiled with <script async>), the compiler thread (compiling the internal bytecode to the MIR graph and doing the final codegen step), the ion helper thread (doing the MIR optimization and LIR generation).
- instead of one queue "asmJSWorkloadList", we now have four:
  1) from the main thread to the compiler thread, that's asmJSMIRGenList
  2) from the compiler thread to the ion helper threads, that's asmJSOptimizeList
  3) from the ion helper threads to the compiler thread, that's asmJSCodegenList
  4) from the compiler thread back to the main thread, that's asmJSFreeList
- The compiler thread takes care of filling up and cleaning up the optimizeList and codegenList, in case of errors. The two other lists are cleaned up by the main thread.
- As the compiler thread has to wait on both the main thread and the Ion threads, a new CondVar is introduced, specific to that purpose: ASM_MODULE.
- The end of the compilation is marked by a synchronization check-point between all threads, which happens at AsmJSValidate.cpp, CollectCompileResults: at this point, all ion threads have joined and the compiler thread as well. This makes error management simpler.

There's still one race I need to track and fix (in asm.js/testResize), but early feedback would be appreciated.
Attachment #8660025 - Attachment is obsolete: true
Attachment #8660789 - Flags: feedback?(sstangl)
Attachment #8660789 - Flags: feedback?(luke)
Attachment #8660789 - Flags: feedback?(jdemooij)
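
A standalone schematic of the four queues described in comment 20 (placeholder types; the real lists live on the global helper-thread state and carry LifoAlloc-backed tasks):

#include <condition_variable>
#include <mutex>
#include <vector>

struct AsmJSParallelTask { /* lifo, bytecode, MIR, LIR, ... */ };

struct AsmJSCompilationQueues {
    std::mutex lock;                    // one global lock in this sketch
    std::condition_variable consumer;   // helper threads wait on this
    std::condition_variable asmModule;  // the compiler thread waits on this one

    // 1) validator thread -> compiler thread: bytecode ready for MIR generation
    std::vector<AsmJSParallelTask*> mirGenList;
    // 2) compiler thread -> Ion helper threads: MIR ready to optimize + generate LIR
    std::vector<AsmJSParallelTask*> optimizeList;
    // 3) Ion helper threads -> compiler thread: LIR ready for codegen
    std::vector<AsmJSParallelTask*> codegenList;
    // 4) compiler thread -> validator thread: task finished, its lifo can be reused
    std::vector<AsmJSParallelTask*> freeList;
};

int main() {
    AsmJSCompilationQueues queues;
    return int(!queues.mirGenList.empty());
}
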
(Assignee)

Comment 21

4 years ago
Posted patch 2. Parallelization WIP (obsolete) — Splinter Review
And I've fixed the one last race condition I was observing in jit-tests (the CollectCompileResults function would wait only for the first CONSUMER signal, whereas it should keep waiting until the signal is actually meant for CollectCompileResults).

See previous comment for some high-level explanation.
Attachment #8660789 - Attachment is obsolete: true
Attachment #8660789 - Flags: feedback?(sstangl)
Attachment #8660789 - Flags: feedback?(luke)
Attachment #8660789 - Flags: feedback?(jdemooij)
Attachment #8660802 - Flags: feedback?(sstangl)
Attachment #8660802 - Flags: feedback?(luke)
Attachment #8660802 - Flags: feedback?(jdemooij)
(Assignee)

Comment 22

4 years ago
Quick benchmark results (only one run on my machine, all cores enabled (8 cores), parallel parsing is disabled, further details will come tomorrow):

- for Massive (higher score is better), it seems to affect poppler-cold-preparation (score raised by 13.81%) and sqlite-cold-preparation (score raised by 5.77%).
- Angry Bots: compilation is sped up by >17% (before: 2813ms avg 4 runs, after: 2397ms avg 4 runs)
- Dead Trigger: compilation is sped up by >18% (before: 6036ms avg 5 runs, after: 4933ms avg 5 runs)
(Just to check: have you run this through TSan or Helgrind?)
Comment on attachment 8659902 [details] [diff] [review]
1. Move things around

Review of attachment 8659902 [details] [diff] [review]:
-----------------------------------------------------------------

Great!  Only small requests:

::: js/src/asmjs/AsmJSCompile.cpp
@@ +27,5 @@
> +using mozilla::DebugOnly;
> +using mozilla::Maybe;
> +
> +namespace js {
> +namespace jit {

Instead of placing everything here in namespace js::jit, could you instead place just the classes inside an unnamed namespace?  Also, can you put the exports in namespace js, not jit (similar to AsmJSValidate).  You'll need to fully qualify the exported functions (with js::) like AsmJSValidate.cpp but I think that's a nice syntactic marker that it's exported.  Lastly, it'd be good to put 'Asm' in all the names in AsmJSCompile.h (to distinguish from normal JS modules).

@@ +3066,5 @@
> +}
> +
> +// ****************************************************************************
> +// EXPORTED FUNCTIONS
> +// ****************************************************************************

With the above namespacing changes, you should be able to drop this comment (the js:: prefix will say "this is an export").

@@ +3103,5 @@
> +                   bool createJitContext)
> +{
> +    if (createJitContext) {
> +        JitContext jitContext(m.runtime(), /* CompileCompartment = */ nullptr, &mir.alloc());
> +        return GenerateModuleCode(m, func, mir, lir, false);

How about having GenerateModuleCode *always* create a JitContext and then, in CheckFunctionsSequential(), scoping the JitContext to only include OptimizeMIR/GenerateLIR?  I think the perf effect will be negligible.

::: js/src/asmjs/AsmJSCompile.h
@@ +47,5 @@
> +        usesSignalHandlersForOOB(usesSignalHandlersForOOB)
> +    {}
> +};
> +
> +class ModuleCompilerScope {

{ on newline

@@ +48,5 @@
> +    {}
> +};
> +
> +class ModuleCompilerScope {
> +    MOZ_DECL_USE_GUARD_OBJECT_NOTIFIER;

From the recent m.d.platform email, there is now MOZ_RAII which is a purely static check and ensures this only lives on the stack.  That actually exposes an issue with this class...

@@ +60,5 @@
> +        MOZ_GUARD_OBJECT_NOTIFIER_INIT;
> +    }
> +
> +    ModuleCompiler* module() const {
> +        return m_;

I'd return a ModuleCompiler& and assert in the ctor that it's non-null.

@@ +66,5 @@
> +
> +    ~ModuleCompilerScope();
> +};
> +
> +ModuleCompilerScope CreateModuleCompiler(ModuleCompileInputs mci);

... it should be returned by value, since the temporary's destructor would destroy the ModuleCompiler.  (I think the only reason you haven't seen this is b/c there is an optimization allowed by the C++ standard wherein a temporary that is immediately returned can be elided).  Rather, I think the ModuleCompilerScope should be an outparm pointer argument to CreateModuleCompiler.  You probably also want to delete both cctor and op= from ModuleCompilerScope.
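
To spell out the pitfall discussed here, a minimal standalone illustration (hypothetical types, not the actual patch): if a scope guard owning the ModuleCompiler were returned by value, any non-elided temporary's destructor would free the compiler early, so handing the guard in as an outparam avoids relying on copy elision.

#include <cassert>

struct ModuleCompiler { int dummy = 0; };

class ModuleCompilerScope {
    ModuleCompiler* m_ = nullptr;
  public:
    ModuleCompilerScope() = default;
    ModuleCompilerScope(const ModuleCompilerScope&) = delete;             // no accidental copies
    ModuleCompilerScope& operator=(const ModuleCompilerScope&) = delete;
    void init(ModuleCompiler* m) { assert(!m_); m_ = m; }
    ModuleCompiler& module() const { assert(m_); return *m_; }
    ~ModuleCompilerScope() { delete m_; }                                 // the scope owns the compiler
};

// Outparam style: the caller's guard owns the ModuleCompiler for its whole lifetime.
static void CreateModuleCompiler(ModuleCompilerScope* scope) {
    scope->init(new ModuleCompiler());
}

int main() {
    ModuleCompilerScope scope;
    CreateModuleCompiler(&scope);
    return scope.module().dummy;
}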

@@ +71,5 @@
> +
> +bool GenerateModuleMIR(ModuleCompiler& m, LifoAlloc& lifo, AsmFunction& func, MIRGenerator** mir);
> +bool GenerateModuleCode(ModuleCompiler& m, AsmFunction& func, MIRGenerator& mir, LIRGraph& lir,
> +                        bool createJitContext = false);
> +bool FinishModuleCompilation(ModuleCompiler& m, ModuleCompileResults** results);

How about passing results as a ScopedJSDeletePtr<ModuleCompileResults>* (in the style of the various other 'finish' functions)?

::: js/src/asmjs/AsmJSValidate.cpp
@@ +6492,5 @@
>          if (!func)
>              continue;
>  
>          MIRGenerator* mir;
> +        if (!GenerateModuleMIR(*mc, lifo, *func, &mir))

Taking into account other comment, how about naming this GenerateAsmFunctionMIR()?

@@ +6509,5 @@
>              return m.failOffset(func->srcBegin(), "internal compiler failure (probably out of memory)");
>  
>          func->accumulateCompileTime((PRMJ_Now() - before) / PRMJ_USEC_PER_MSEC);
>  
> +        if (!GenerateModuleCode(*mc, *func, *mir, *lir))

and GenerateAsmFunctionCode()?

@@ +6520,5 @@
> +    ModuleCompileResults* results;
> +    if (!FinishModuleCompilation(*mc, &results))
> +        return false;
> +
> +    *compileResults = results;

With the changes mentioned elsewhere, you could just have 'return FinishModuleCompilation(mc, compileResults);'.

@@ +6705,5 @@
> +    ModuleCompileResults* results;
> +    if (!FinishModuleCompilation(*mc, &results))
> +        return false;
> +
> +    *compileResults = results;

ditto
Attachment #8659902 - Flags: review?(luke) → review+
(Assignee)

Comment 25

4 years ago
Thanks for the review! Considering all the changes I've made, plus the ones we've discussed on irc, here's another patch. Here's what's changed:

- addressed all your comments
- in AsmJSGlobals, put everything in the js:: namespace, except the Type/VarType/RetType/Signature/LifoSignature, which go to js::wasm::
- prefix the class names with Asm*, in AsmJSCompile.h (that could all be part of wasm::, but keeping the wasm:: namespace internal to the asmjs/ directory sounded nice).
- changed the signatures of Create/Destroy ModuleCompiler to return void, as they're not fallible.
Attachment #8659902 - Attachment is obsolete: true
Attachment #8661385 - Flags: review?(luke)
Comment on attachment 8661385 [details] [diff] [review]
1. Move things around

Review of attachment 8661385 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks!
Attachment #8661385 - Flags: review?(luke) → review+
(Assignee)

Updated

4 years ago
Depends on: 1205328
(In reply to Benjamin Bouvier [:bbouvier] from comment #20)
Sounds mostly good, a few questions:

> - we have at least 3 threads: the validator thread (== the main thread if
> the script isn't compiled with <script async>), the compiler thread
> (compiling the internal bytecode to the MIR graph and doing the final
> codegen step), the ion helper thread (doing the MIR optimization and LIR
> generation).

One question is how this plays out when there are 2 cores (no hyperthreads).  Since Ion compilation is ~3-4x slower than parsing, it stands to reason that the validator thread finishes its work 1/3 - 1/4 of the way through and then that core will switch to running the ion compile thread.  However, this will likely leave one core partially idle (since the compiler thread will block when there are >1 Ion compile jobs outstanding).  So even in the 2 core case, I expect we want at least 2 Ion helper threads to saturate available cores after parsing completes.

Before parsing completes, though, there will be contention between parsing, compiling and the 1 or 2 Ion threads.  This might be fine; the compiler thread will relinquish its core frequently when blocked on outstanding Ion compile jobs.  If we wanted to be fancy, we'd dynamically throttle so that we allowed 1 less outstanding Ion compile job while parsing was active; this could be signaled from the validator to the compiler via a sentinel AsmFunction that the validator pushes when it's done.  It'd be good to measure this case (using taskset to artificially limit the process to 2 cores and then using --thread-count=2 to SetFakeCPUCount()).

>   2) from the compiler thread to the ion helper threads, that's
> asmJSOptimizeList
>   3) from the ion helper threads to the compiler thread, that's
> asmJSCodegenList

These two are exactly like asmJSWorklist_/asmJSFinishedList_, right?

>   4) from the compiler thread back to the main thread, that's asmJSFreeList

I was wondering why this is necessary.  It seems like all you'd need for the compile->validate thread direction is a condvar which the validation thread blocks on when it gets to the end of the function bodies, and then you can communicate the ModuleCompileResults via some simple variable protected by the condvar's lock.
(Assignee)

Comment 28

4 years ago
(In reply to Luke Wagner [:luke] from comment #27)
> (In reply to Benjamin Bouvier [:bbouvier] from comment #20)
> Sounds mostly good, a few questions:
> 
> > - we have at least 3 threads: the validator thread (== the main thread if
> > the script isn't compiled with <script async>), the compiler thread
> > (compiling the internal bytecode to the MIR graph and doing the final
> > codegen step), the ion helper thread (doing the MIR optimization and LIR
> > generation).
> 
> One question is how this plays out when there are 2 cores (no hyperthreads).
> Since Ion compilation is ~3-4x slower than parsing, it stands to reason that
> the validator thread finishes its work 1/3 - 1/4 of the way through and then
> that core will switch to running the ion compile thread.  However, this will
> likely leave one core partially idle (since the compiler thread will block
> when there are >1 Ion compile jobs outstanding).  So even in the 2 core
> case, I expect we want at least 2 Ion helper threads to saturate available
> cores after parsing completes.

Actually, I haven't changed the formerly-called maxAsmJSCompileThreads() limit, so it's still at least 2+. I'll test with taskset and --thread-count=2 tomorrow.

> 
> >   2) from the compiler thread to the ion helper threads, that's
> > asmJSOptimizeList
> >   3) from the ion helper threads to the compiler thread, that's
> > asmJSCodegenList
> 
> These two are exactly like asmJSWorklist_/asmJSFinishedList_, right?
Yes, respectively renamed, to make their new roles clearer. (The asmJSMIRGenList and asmJSFreeList have similar roles ("worklist" and "finished" lists) from the validator thread's standpoint.)

> 
> >   4) from the compiler thread back to the main thread, that's asmJSFreeList
> 
> I was wondering why this is necessary.  It seems like all you'd need for the
> compile->validate thread direction is a condvar which the validation thread
> blocks on when gets to the end of the function bodies and then you can
> communicate the ModuleCompileResults via some simple variable protected by
> the condvar's lock.
And this is what's done in this patch. However, not having a free list implies that we'd create new Lifos for the tasks, whereas with a free list we can reuse them as we did before. The free list just serves the purpose of recycling the free lifos.
(In reply to Benjamin Bouvier [:bbouvier] from comment #28)
> And this is what's done in this patch. However, not having a free list
> implies that we'd create new Lifos for the tasks, although we can reuse them
> as we did before, here. This free list just servers that purpose of reusing
> the free lifos.

Which LIFOs?
(Assignee)

Comment 30

4 years ago
(In reply to Luke Wagner [:luke] from comment #29)
> (In reply to Benjamin Bouvier [:bbouvier] from comment #28)
> > And this is what's done in this patch. However, not having a free list
> > implies that we'd create new Lifos for the tasks, although we can reuse them
> > as we did before, here. This free list just servers that purpose of reusing
> > the free lifos.
> 
> Which LIFOs?

The lifo contained in the AsmJSParallelTask (as shown in the patch). We still need to have one per task, because the AsmFunction is lifo-allocated on the main thread during validation, and then the function is alive until the end of the compilation (so it's live in the compiler thread, ion optimizer thread, then compiler thread and then main thread).

I'll experiment with just creating new lifos instead of recycling them (this will reduce locking, so it potentially could make compilation faster), but this could also cause a memory spike (that's why we were recycling the lifos, iirc).
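
The recycling idea, as a standalone sketch (placeholder Arena type; the real code recycles the LifoAllocs owned by the parallel tasks): finished tasks return their arena to a free list so new tasks reuse memory instead of allocating fresh, bounding the peak.

#include <memory>
#include <mutex>
#include <vector>

struct Arena { std::vector<char> storage; void reset() { storage.clear(); } };

class ArenaFreeList {
    std::mutex lock_;
    std::vector<std::unique_ptr<Arena>> free_;
  public:
    std::unique_ptr<Arena> take() {
        std::lock_guard<std::mutex> guard(lock_);
        if (free_.empty())
            return std::make_unique<Arena>();  // allocate only when nothing is free
        std::unique_ptr<Arena> arena = std::move(free_.back());
        free_.pop_back();
        return arena;
    }
    void give(std::unique_ptr<Arena> arena) {
        arena->reset();                        // keep the buffer, drop its contents
        std::lock_guard<std::mutex> guard(lock_);
        free_.push_back(std::move(arena));
    }
};

int main() {
    ArenaFreeList freeList;
    std::unique_ptr<Arena> a = freeList.take();
    freeList.give(std::move(a));
    return int(!freeList.take());              // 0: we got the recycled arena back
}
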
Status: NEW → ASSIGNED
(Assignee)

Comment 32

4 years ago
Landing the first part, after a few try sessions (had to change the rooting annotation in the automation directory and remove the MOZ_STACK_CLASS on ModuleCompiler, as it's heap allocated now -- Terrence told me that was ok.).
https://hg.mozilla.org/mozilla-central/rev/f3b279e03095
Status: ASSIGNED → RESOLVED
Last Resolved: 4 years ago
status-firefox43: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla43
(Assignee)

Comment 34

4 years ago
Oops, I was pretty sure I added 'leave-open' to the whiteboard, probably forgot to save though...
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Assignee)

Comment 35

4 years ago
Comment on attachment 8660802 [details] [diff] [review]
2. Parallelization WIP

There is one flaw with this patch, which Luke's questions regarding the lifos raised: the main thread has to wait until the Ion optimizer threads are done compiling before it can start parsing the next functions, which *actually* causes big slowdowns with a small number of cores. A better patch is being worked on.
Attachment #8660802 - Flags: feedback?(sstangl)
Attachment #8660802 - Flags: feedback?(luke)
Attachment #8660802 - Flags: feedback?(jdemooij)
(Assignee)

Comment 36

4 years ago
Posted patch wip.patch (obsolete) — Splinter Review
So, revamping the parallelization patch: not finished yet. In particular, there are a few TODOs in the code; and I'm wondering how to make the state in the GlobalHelperThread less intrusive (there are quite a few new variables in there).

So the new paradigm is the following:
- the validator thread allocates all AsmFunction on the main thread, and parses them as fast as possible, then pushes them to the asmJSMIRGenList_
- the compiler thread receives the functions and tries to saturate all the helper threads as fast as possible (there's a lifo reusing mechanism here). For each incoming function from asmJSMIRGenList_, the MIR is generated and then the function is passed to the asmJSOptimizeList_.
- the ion optimizer thread receives the non-optimized MIR, optimizes the MIR and generates the LIR, and pushes the result to the asmJSCodegenList_.
- the compiler thread receives that message and generates the code.
- once all functions have been codegen'd, the results are sent back to the validator thread.
Attachment #8660802 - Attachment is obsolete: true
Attachment #8663005 - Flags: feedback?(sstangl)
Attachment #8663005 - Flags: feedback?(luke)
Attachment #8663005 - Flags: feedback?(jdemooij)
(Assignee)

Comment 37

4 years ago
Some benchmarking and results (spoiler alert: quite a long post). In all cases: 3 runs, take the median value (variance is very low), compare medians. n % speedup means the compilation is n% faster with the patch.

1) First, the bare idea explained in comment 36, explored on the compilation time of DeadTrigger2
- 1 thread:  3.82% speedup
- 2 threads: 7.98% speedup
- 3 threads: ~0% speedup (?)
- 4 threads: 9% speedup
- 8 threads: 18.40% speedup (from 5097ms to 4159ms)

So I've tried the "fancy" thing explained in comment 27: limit the number of available optimizer tasks to be N - 1 when the validator thread is parsing; when it's done parsing, increase the number of available optimizer tasks by 1 (so it'd be N) and notify an optimizer thread. This didn't help, surprisingly, except in cases where the number of threads is high.

So I've tried a third approach, which is the same throttling approach but without notifying the optimizer thread (this spares us the notify/wake). In this case, it slightly penalizes machines with few threads (there's still a speedup, but it's slightly smaller, around 1%). But machines with a lot of threads seem to like that better (it wins us a few hundred ms on certain benchmarks). (Does that make sense, or is it just a statistical aberration?)

Here are the numbers with the latter approach (throttling + no notifying), with all cores:
- DeadTrigger2:  18.46% speedup (from 5097ms to 4156ms)
- Caesar3:       30.27% speedup (from 4704ms to 3280ms)
- AngryBots:     20.95% speedup (from 2419ms to 1912ms)
- Platformer:    26.17% speedup (from 7315ms to 5401ms)
- Tappy Chicken: 26.07% speedup (from 5487ms to 4056ms)

So it does look good.
Great results!  Could you explain the throttling approach in more detail?
Also, have you run this through Helgrind/TSan yet :) ?
(Assignee)

Comment 39

4 years ago
(In reply to Luke Wagner [:luke] from comment #38)
> Great results!  Could you explain the throttling approach in more detail?

Sure! So let's say we have a machine with N threads. In Odin, we have the validator thread + the compiler thread + N Ion helper threads by default.

The validator thread sends the internal bytecode to the compiler thread, which creates the MIR from it and sends them to the helper threads.

1) When the validator thread is doing work, that is, when it's parsing code and making the bytecode, only N - 1 helper threads are allowed to work. The underlying assumption is that there are N threads and that the compiler thread is mostly idle; so the only "really" active threads are the validator thread and the helper threads, that is, 1 + (N - 1) == N threads.

2) When the validator thread is done working, it awaits the compiler thread's results by sending a special termination message (an AsmJSParallelTask with no bytecode, used as a sentinel). When the compiler thread sees this message, it sets a flag on the GlobalHelperThreadState, which allows one more helper thread (through throttledMaxAsmJSOptimizationThreads(), i.e. the maximum number of helper threads allowed at a given time). The next time an unassigned inactive helper thread wakes up, it will become active, if there are any pending optimization tasks.

3) At this point, we should *probably* notify one helper thread, if all the N - 1 other threads are active, to make sure it ever starts. When benchmarking on Friday, I noticed this notifying was actually making things slightly slower (probably because of the wake up?), and the best wins were happening when there was the throttling mechanism but **no** notifying. This is weird, and I am going to double-check my assumption with simple timers (PRMJ_Now()).

Does that make it clearer?
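
To make it concrete, a minimal sketch of the throttling rule (hypothetical class and field names; only throttledMaxAsmJSOptimizationThreads is taken from the description above):

#include <atomic>
#include <cstdint>

class HelperThreadThrottle {
    uint32_t cpuCount_;
    std::atomic<bool> validationDone_{false};
  public:
    explicit HelperThreadThrottle(uint32_t cpuCount) : cpuCount_(cpuCount) {}

    // Set by the compiler thread when it sees the sentinel "validation done" task.
    void noteValidationDone() { validationDone_ = true; }

    uint32_t throttledMaxAsmJSOptimizationThreads() const {
        // Keep one core free for the validator thread while it still has work.
        if (!validationDone_ && cpuCount_ > 1)
            return cpuCount_ - 1;
        return cpuCount_;
    }

    bool canStartOptimizeTask(uint32_t activeOptimizeThreads) const {
        return activeOptimizeThreads < throttledMaxAsmJSOptimizationThreads();
    }
};

int main() {
    HelperThreadThrottle throttle(4);
    bool before = throttle.canStartOptimizeTask(3);  // false: the cap is 3 while parsing
    throttle.noteValidationDone();
    bool after = throttle.canStartOptimizeTask(3);   // true: the cap is now 4
    return (!before && after) ? 0 : 1;
}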

> Also, have you run this through Helgrind/TSan yet :) ?

I've tried TSAN, but it was crashing at startup, because there were already some races (that is, before I even applied the patch). I'll retry and will definitely try with Helgrind as well.
(Assignee)

Comment 40

4 years ago
So Helgrind crashes even on latest inbound, when running basic asm.js tests. Back to tsan, which is included by default on ubuntu's clang (I've tried to manually compile clang for the entire day, but hit a linker internal failure, then tried several other strategies, before just using the plain ubuntu package, which actually works fine).

TSAN doesn't show anything related to this patch, when running the asm.js test suite.
(Assignee)

Comment 41

4 years ago
Comment on attachment 8663005 [details] [diff] [review]
wip.patch

Cancelling review for now, another version is in the pipes.
Attachment #8663005 - Flags: feedback?(sstangl)
Attachment #8663005 - Flags: feedback?(luke)
Attachment #8663005 - Flags: feedback?(jdemooij)
(Assignee)

Comment 42

4 years ago
Posted patch 2. Parallelization (obsolete) — Splinter Review
A first final version. I've really stared at this code too much, tried other design approaches that didn't work, tried something else, got back to initial state, then retried another approach. It'd be great if other pairs of eyes would look at this :)
Attachment #8663005 - Attachment is obsolete: true
Attachment #8666109 - Flags: review?(luke)
Attachment #8666109 - Flags: feedback?(sstangl)
Attachment #8666109 - Flags: feedback?(jdemooij)
(Assignee)

Comment 43

4 years ago
Comment on attachment 8666109 [details] [diff] [review]
2. Parallelization

Review of attachment 8666109 [details] [diff] [review]:
-----------------------------------------------------------------

three nits I've fixed locally.

::: js/src/asmjs/AsmJSCompile.cpp
@@ +3388,5 @@
> +        tasks.infallibleAppend(LIFO_ALLOC_PARALLEL_CHUNK_SIZE);
> +
> +    ModuleCompiler& module = scope.module();
> +    ParallelCompileGroupState group(tasks);
> +    uint32_t failedJobs;

fixed locally: this is initialized to 0.

::: js/src/asmjs/AsmJSValidate.cpp
@@ +6624,5 @@
> +    unsigned numFunctions = 0;
> +    for (;;) {
> +        // Check if we failed on another thread
> +        if (ParallelState.failed())
> +            return false;

Although it's a "safe" race condition (unprotected read), TSan doesn't like it: I've removed it from the code, as the same check is done in StartOffThreadAsmJSCompile below.

@@ +6659,5 @@
> +        if (!AwaitEndOfCompilation(compileResults))
> +            return false;
> +    }
> +
> +    MOZ_ASSERT(!ParallelState.failed());

And here as well (the check is done in AwaitEndOfCompilation anyway, which returns false if we failed()).
Comment on attachment 8666109 [details] [diff] [review]
2. Parallelization

Review of attachment 8666109 [details] [diff] [review]:
-----------------------------------------------------------------

I was applying the patch to play around with it and two things:
 - AsmJSParallelState::isLocked() needs #ifdef DEBUG
 - I'm seeing a null crash caused by ParallelState.harvestResults() returning null but ParallelState.failed() being false in AwaitEndOfCompilation().
Comment on attachment 8666109 [details] [diff] [review]
2. Parallelization

Review of attachment 8666109 [details] [diff] [review]:
-----------------------------------------------------------------

Here's a first pass of comments, mostly just trying to follow everything and see how it fits together.  In general, this is looking really nice and clean.  One non-nit below that may help manycore performance.

::: js/src/asmjs/AsmJSCompile.cpp
@@ +3368,5 @@
> +    MOZ_ASSERT(HelperThreadState().canStartAsmJSCompile());
> +    MOZ_ASSERT(thread->idle());
> +
> +    AsmModuleCompilerScope scope;
> +    ModuleCompileInputs inputs(ParallelState.harvestInputs());

I think it'd look a tiny bit nicer to have 'ModuleCompileInputs inputs = ...' and to move this statement above the decl of 'scope' and separated by a \n.

Also, with the changes mentioned in HelperThreads.h (using separate locks for the mirgenList and codegenList), I think you could move the AutoUnlockHelperThreadState to right after harvestInputs() (showing the close connection).

::: js/src/asmjs/AsmJSCompile.h
@@ +75,1 @@
>      ~AsmModuleCompilerScope();

Pre-existing, but could AsmModuleCompilerScope::setModule() MOZ_ASSERT(!m_)?

::: js/src/asmjs/AsmJSValidate.cpp
@@ +6619,2 @@
>      }
>  #endif

I think the readyForNextModuleCompilation() asserts make this whole #ifdef block redundant.

::: js/src/vm/HelperThreads.cpp
@@ +1094,5 @@
>      static_cast<HelperThread*>(arg)->threadLoop();
>  }
>  
>  void
>  HelperThread::handleAsmJSWorkload()

Perhaps rename handleAsmJSOptimizeWorkload.

::: js/src/vm/HelperThreads.h
@@ +70,4 @@
>      // Simultaneous AsmJS compilations all service the same AsmJS module.
>      // The main thread must pick up finished optimizations and perform codegen.
>      // |asmJSCompilationInProgress| is used to avoid triggering compilations
>      // for more than one module at a time.

Maybe update this comment with the overview of how compilation works.

@@ +70,5 @@
>      // Simultaneous AsmJS compilations all service the same AsmJS module.
>      // The main thread must pick up finished optimizations and perform codegen.
>      // |asmJSCompilationInProgress| is used to avoid triggering compilations
>      // for more than one module at a time.
> +    class AsmJSParallelState {

How about moving this class as well as AsmJSCompileTask/AsmJSOptimizeTask to AsmJSCompile.h (which is already #included by this file)?

@@ +123,5 @@
> +        }
> +
> +        void reset() {
> +            failedJobs_ = 0;
> +            successfulJobs_ = 0;

Does this function need to be called by the ctor? (It seems fine to leave these uninitialized.)  If so, then perhaps you could merge reset() with setInputs() and rename the result startCompilation(ModuleCompileInputs), which has a nice feel.

@@ +128,5 @@
> +            totalJobs_ = UINT32_MAX;
> +            failedFunc_ = nullptr;
> +            status_ = INITIAL;
> +        }
> +        void setInputs(js::ModuleCompileInputs&& inputs) {

Since ModuleCompileInputs is tiny and copied later in harvestInputs(), I'd take a copy here.  If ModuleCompileInputs gets some fat fields, they'll have disabled copy and then everything can be updated to move semantics.

@@ +136,5 @@
> +
> +        bool readyForNextModuleCompilation() const {
> +            return mirGenList_.empty() &&
> +                   optimizeList_.empty() &&
> +                   codegenList_.empty() &&

How about instead making ParallelCompilationGuard be a member class of AsmJSParallelState (just named Guard), then making compilationInProgress a private field and having Guard() and ~Guard() both assert the condition asserted here; that way the state of compilationInProgress is tied directly to all these lists being empty.
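
A minimal sketch of the suggested guard (hypothetical State type; the real thing would also take the helper-thread lock): tying compilationInProgress directly to the lists-are-empty invariant by asserting it in both the constructor and destructor.

#include <cassert>
#include <vector>

struct State {
    bool compilationInProgress = false;
    std::vector<int> mirGenList, optimizeList, codegenList;

    class Guard {
        State& state_;
        void assertIdle() const {
            assert(state_.mirGenList.empty());
            assert(state_.optimizeList.empty());
            assert(state_.codegenList.empty());
        }
      public:
        explicit Guard(State& state) : state_(state) {
            assertIdle();                       // the lists must be empty before we start
            assert(!state_.compilationInProgress);
            state_.compilationInProgress = true;
        }
        ~Guard() {
            assertIdle();                       // ...and drained again when we finish
            state_.compilationInProgress = false;
        }
    };
};

int main() {
    State state;
    {
        State::Guard guard(state);
        // compile one module...
    }
    return state.compilationInProgress ? 1 : 0;
}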

@@ +157,5 @@
> +        uint32_t numTotalJobs() const      { MOZ_ASSERT(hasNumTotalJobs()); return totalJobs_; }
> +
> +        AsmJSCompileTaskVector&  mirGenList()   { MOZ_ASSERT(isLocked()); return mirGenList_; }
> +        AsmJSOptimizeTaskVector& optimizeList() { MOZ_ASSERT(isLocked()); return optimizeList_; }
> +        AsmJSOptimizeTaskVector& codegenList()  { MOZ_ASSERT(isLocked()); return codegenList_; }

The global helper thread lock is only necessary to synchronize the insertion of jobs into any of the lists that get read by HelperThread::threadLoop (which takes 1 lock, waits on the PRODUCER condvar, and looks at all the lists).  So that means inputs_ and optimizeList_ need the lock, but mirGenList_ and codegenList_ don't.  Having separate locks/condvars for *each* of mirGenList_ and codegenList_ will reduce contention and may give some speedups on manycore devices so let's do that.  This would generalize what you've needed to do with ASM_MODULE; I'd move that condvar into AsmJSParallelState along with the new lock.  It would be good to organize the member variable declarations so that they are grouped by which lock they are protected by (I'd even interleave private declarations with the public member functions that access them and assert the lock).  (Also, in general, it would be good for *every* member function to assert the lock which protects the fields it touches.)

Once you do this, you don't need to have js::StartOffThreadAsmJSCompile() in HelperThreads.cpp, you can inline it into the AsmJSValidate.cpp caller.  I'd also move HelperThread::handleAsmJSWorkload into AsmJSCompile.cpp (symmetric to HandleAsmJSCompilerWorkload()).

@@ +159,5 @@
> +        AsmJSCompileTaskVector&  mirGenList()   { MOZ_ASSERT(isLocked()); return mirGenList_; }
> +        AsmJSOptimizeTaskVector& optimizeList() { MOZ_ASSERT(isLocked()); return optimizeList_; }
> +        AsmJSOptimizeTaskVector& codegenList()  { MOZ_ASSERT(isLocked()); return codegenList_; }
> +
> +        js::ModuleCompileInputs harvestInputs() {

"harvest" sounds like reaping the results of work, whereas this is more handing off the inputs to work from validator thread to compiler thread.  How about takeCompilerInput()?

@@ +568,5 @@
>          HelperThreadState().lock();
>      }
>  };
>  
> +struct AsmJSCompileTask

Perhaps name AsmJSMIRGenTask?
Attachment #8666109 - Flags: review?(luke)
(Assignee)

Comment 46

4 years ago
Comment on attachment 8666109 [details] [diff] [review]
2. Parallelization

Obsoleting the patch: we're going to try to change the approach here, by having only 2 threads. There'll be some nasty patching of the labels for internal function calls, but this might work eventually.
Attachment #8666109 - Attachment is obsolete: true
Attachment #8666109 - Flags: feedback?(sstangl)
Attachment #8666109 - Flags: feedback?(jdemooij)
This sounds promising.  I'd suggest, for the function call patching, to extend AsmJSModule::RelativeLink::Kind with some Call that is patched with the relative displacement.  Hopefully that would fit in relatively naturally.
(Assignee)

Comment 48

4 years ago
While looking at CodeLabels, I've realized some of the code is strictly duplicated on all platforms. Also, I've added a comment explaining in simple and precise wording what a CodeLabel is.

8 files changed, 22 insertions(+), 59 deletions(-)
Attachment #8674958 - Flags: review?(jdemooij)
(Assignee)

Comment 49

4 years ago
This one isn't mandatory, but I've found it useful. When reading the AsmJS frame iterator code, I've been confused by the naming of "entry" vs "begin", and I had to read the code to figure out it was the non profiling entry vs profiling entry. Let's make it clear in the name directly.

Also, it was unclear why we needed to have a label to store the "end" of the function; but if it's written in the name that it's the end *after OOL code*, it makes it clearer, in my opinion, that it's used for the binary search over code.
Attachment #8674962 - Flags: review?(luke)
(Assignee)

Comment 50

4 years ago
So, asking for review but feel free to switch to feedback if you'd prefer to see the final, entire patch (it's going to be much bigger though!). After reading a lot about how the Labels work, the different RelativeLink patching that we considered (but which is not really needed, as we have the final information much before static linking), I realized that there was one function doing exactly what I needed for labels: masm.retarget.

In the future, only the FunctionCompiler will remain. As a matter of fact, the ModuleCompiler will disappear, and all its shared members will need to be in the FC itself: this includes error labels (OOB, conversion error, SO, etc.), compile inputs (copied once per function? that's 2 pointers + 1 bool -- they could easily be shared), and so on. The error labels will be merged the same way at the end of the module compilation. The next patch is going to be big, but there'll be a lot of code removal! \o/
Attachment #8674994 - Flags: review?(luke)
Comment on attachment 8674962 [details] [diff] [review]
Bonus. Rename AsmJS function Labels

Review of attachment 8674962 [details] [diff] [review]:
-----------------------------------------------------------------

Good point
Attachment #8674962 - Flags: review?(luke) → review+
Comment on attachment 8674994 [details] [diff] [review]
3. Give each function their own labels

Review of attachment 8674994 [details] [diff] [review]:
-----------------------------------------------------------------

I'm sorta worried that this patch could, in the worst case, allocate O(n^2) memory for n functions in a module (esp since some modules can have 100,000+ functions).  It seems what's ultimately necessary in ModuleCompileResults is a list of pairs (caller offset, callee func-index).  One thing I just noticed is that we already maintain an array of jit::CallSite in the MacroAssembler for the purpose of stack walking *and* repatching asm.js->asm.js calls in AsmJSModule::setProfilingEnabled.  What if instead MAsmJSCall::Callee::u.internal_ was changed from a Label* to a targetFuncIndex (integer).  Then targetFuncIndex gets written into the instructions stream (instead of the actual callee offset) and, in finish(), we repatch.  (Writing targetFuncIndex into insn stream avoids storing in the jit::CallSite which lives persistently in AsmJSModule and thus would waste memory since we don't need targetFuncIndex after finish().)
Attachment #8674994 - Flags: review?(luke)
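
A toy illustration of the repatching idea suggested above (plain integers standing in for the macro-assembler's code buffer; the real patch rewrites machine code via the recorded CallSites): each internal call site temporarily holds the callee's function index, and finish() rewrites it to the callee's final entry offset.

#include <cstdint>
#include <vector>

struct CallSite { uint32_t patchOffset; };    // where the placeholder index was written

int main() {
    std::vector<uint32_t> code;               // stand-in for the assembler buffer
    std::vector<CallSite> internalCallSites;
    std::vector<uint32_t> funcEntryOffsets;   // filled in as each function is generated

    // "Generate" two functions; function 0 calls function 1.
    funcEntryOffsets.push_back(uint32_t(code.size()));
    code.push_back(1);                        // call placeholder: callee func index
    internalCallSites.push_back({0});
    funcEntryOffsets.push_back(uint32_t(code.size()));
    code.push_back(0);                        // some other instruction

    // finish(): replace each placeholder func index by the callee's entry offset.
    for (const CallSite& cs : internalCallSites) {
        uint32_t calleeIndex = code[cs.patchOffset];
        code[cs.patchOffset] = funcEntryOffsets[calleeIndex];
    }
    return code[0] == funcEntryOffsets[1] ? 0 : 1;
}
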
Comment on attachment 8674958 [details] [diff] [review]
2. Hoist a few things in AssemblerShared

Stealing.
Attachment #8674958 - Flags: review?(jdemooij) → review+
(Assignee)

Updated

4 years ago
Keywords: leave-open
(Assignee)

Comment 57

4 years ago
Posted patch wip.patch (obsolete) — Splinter Review
Work in progress -- enabling profiling doesn't work at the moment, for a reason I haven't figured out yet.
Attachment #8674994 - Attachment is obsolete: true
(Assignee)

Comment 58

4 years ago
Posted patch wip.patch (obsolete) — Splinter Review
So I figured out my mistake: I had added an overload MacroAssembler::call(uint32_t), but there was already MacroAssembler::call(AsmJSImmKind), and AsmJSImmKind converts to uint32_t, so the compiler was calling the wrong one in the wrong place, of course... Took me too long to find out :(

So is this something like what you had in mind? It's implemented only for x86/x64 (tested only on x64) at the moment.
Attachment #8676396 - Attachment is obsolete: true
Attachment #8676942 - Flags: feedback?(luke)
(Assignee)

Comment 59

4 years ago
Minor drive-by cleanup
Attachment #8677400 - Flags: review?(luke)
(Assignee)

Comment 60

4 years ago
This implements the strategy as described in comment 52.

I've switched strategy since the previous WIP patch and decided it's better to take one bit off the "column" member in CallSiteDesc rather than making sentinel values that rely on undescribed assumptions (like: function code sections have to be present before the trampolines and error handlers).
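
As a sketch of the bit-stealing (hypothetical layout; the real CallSiteDesc packs its fields differently), the top bit of the column word marks an internal call while the remaining bits keep the column:

#include <cstdint>

class CallSiteDescSketch {
    uint32_t lineno_;
    uint32_t columnAndBit_;  // low 31 bits: column; top bit: "is internal call"
  public:
    CallSiteDescSketch(uint32_t lineno, uint32_t column, bool isInternal)
      : lineno_(lineno),
        columnAndBit_((column & 0x7fffffffu) | (isInternal ? 0x80000000u : 0u))
    {}
    uint32_t lineno() const { return lineno_; }
    uint32_t column() const { return columnAndBit_ & 0x7fffffffu; }
    bool isInternal() const { return (columnAndBit_ & 0x80000000u) != 0; }
};

int main() {
    CallSiteDescSketch desc(10, 42, true);
    return (desc.lineno() == 10 && desc.column() == 42 && desc.isInternal()) ? 0 : 1;
}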

The platform specific paths are implemented on x86, x64 and ARM (not ARM64 though).

Next steps:
- have a similar mechanism for all labels that are in ModuleCompiler
- have a masm per FunctionCompiler and merge them in finishFunctionBodies
Attachment #8676942 - Attachment is obsolete: true
Attachment #8676942 - Flags: feedback?(luke)
Attachment #8677405 - Flags: review?(luke)
(Assignee)

Comment 61

4 years ago
Comment on attachment 8677405 [details] [diff] [review]
4. Make internal calls thread-local

For what it's worth, since I've moved forward, I've changed things a bit, and it doesn't make sense to have this patch reviewed at this point.
Attachment #8677405 - Attachment is obsolete: true
Attachment #8677405 - Flags: review?(luke)
(Assignee)

Comment 63

4 years ago
This one is going to change when we get to merge masm, as we can avoid the O(n) memory behavior (n being the num of functions in the module). Not sure if you'd prefer to review separately (as a transitional state), or later, but with the masm changes (bigger patch).

Updated

4 years ago
Attachment #8677400 - Flags: review?(luke) → review+
(Assignee)

Comment 64

3 years ago
Comment on attachment 8678230 [details] [diff] [review]
4. Make internal calls thread-local

This patch hasn't changed since I've been working on the rest. See comment 60 for explanations.

One implementation detail worth mentioning is AsmJSInternalCallee, which is just a typed wrapper for a uint32_t. There is a call overload that takes AsmJSInternalCallee as a parameter. Directly using a uint32_t wasn't doable, because one of the "call" functions takes an enum argument, so with a call(uint32_t) overload in place, the compiler silently picked the call(uint32_t) function when the argument was the enum type. This ended up wasting almost one day of my time, which justifies having a separate structure for this.
Attachment #8678230 - Flags: review?(luke)
(In reply to Benjamin Bouvier [:bbouvier] from comment #64)
> This ended
> up wasting almost one day of my time, justifying having a separate structure
> for this.

That is very well justified indeed.  Another reason to use enum class everywhere.
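
For illustration, a simplified reconstruction of the pitfall (placeholder overload set, not the actual MacroAssembler signatures): this is one plausible way the ambiguity arises -- when the pre-existing overload needs a user-defined conversion, a new call(uint32_t) overload wins via the standard enum-to-integer conversion and silently steals those call sites, while a dedicated wrapper type doesn't compete at all.

#include <cstdint>

enum AsmJSImmKind { AsmJSImm_Something = 0 };         // unscoped enum: converts to uint32_t

struct AsmJSImmPtr {
    AsmJSImmKind kind;
    AsmJSImmPtr(AsmJSImmKind k) : kind(k) {}          // implicit, user-defined conversion
};

struct AsmJSInternalCallee {
    uint32_t index;
    explicit AsmJSInternalCallee(uint32_t i) : index(i) {}
};

static int call(AsmJSImmPtr)         { return 1; }    // pre-existing overload
static int call(uint32_t)            { return 2; }    // naive addition: steals the enum calls
static int call(AsmJSInternalCallee) { return 3; }    // typed wrapper: unambiguous

int main() {
    // The standard conversion (enum -> uint32_t) beats the user-defined one,
    // so this unexpectedly resolves to call(uint32_t) and returns 2, not 1.
    int stolen = call(AsmJSImm_Something);
    int wrapped = call(AsmJSInternalCallee(5));        // clearly the internal-call overload
    return (stolen == 2 && wrapped == 3) ? 0 : 1;
}
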
(Assignee)

Comment 66

3 years ago
Posted patch WIP for merging masm on x64 (obsolete) — Splinter Review
With this patch, we're able to compile, under x64, the following simple module:

function f() {
    "use asm";
    function g() { return +(+h() + +i()); }
    function h() { return 42.1; }
    function i() { return 13.37; }
    return g;
}
assertEq(f()(), 42.1 + 13.37);
Comment on attachment 8678230 [details] [diff] [review]
4. Make internal calls thread-local

Review of attachment 8678230 [details] [diff] [review]:
-----------------------------------------------------------------

Great work, this is exactly what I was hoping for.  Unless I'm missing something (quite possible) I'd be surprised if this significantly affects compile-time perf, so I bet it's fine to land before the rest.  Mostly nits below:

::: js/src/asmjs/AsmJSCompile.cpp
@@ +763,2 @@
>            case MAsmJSCall::Callee::Dynamic:  kind = CallSiteDesc::Register; break;
>            case MAsmJSCall::Callee::Builtin:  kind = CallSiteDesc::Register; break;

I'd still have isInternal = false in the Dynamic/Builtin cases (so you don't have to go up and read the decl).  Also, can you remove the "// initialize to silence GCC warning" comment?  They don't add much and there isn't any risk if someone accidentally removes it.  Also need space in "isInternal= true".

@@ +764,5 @@
>            case MAsmJSCall::Callee::Builtin:  kind = CallSiteDesc::Register; break;
>          }
>  
> +        MAsmJSCall* ins = MAsmJSCall::New(alloc(),
> +                                          CallSiteDesc(call.lineno_, call.column_, kind, isInternal),

Perhaps construct CallSiteDesc with a separate statement?

@@ +3064,5 @@
>  }
>  
>  bool
> +js::GenerateAsmFunctionCode(ModuleCompiler& m, AsmFunction& func, MIRGenerator& mir, LIRGraph& lir,
> +                            LifoAlloc& lifo, FunctionCompileResults& results)

Since 'results' is a mutated outparam, can pass by *?

@@ +3081,5 @@
>      ScopedJSDeletePtr<CodeGenerator> codegen(js_new<CodeGenerator>(&mir, &lir, &m.masm()));
>      if (!codegen)
>          return false;
>  
> +    AsmJSFunctionLabels labels(results.entry(), m.stackOverflowLabel());

I'm a little uncomfortable with all the copying of labels (into the FunctionCompileResults, then into the Vector<Label> in ModuleValidator), since Labels are these stateful things that get embedded into linked lists.  That's why AsmJSFunctionLabels is MOZ_STACK_CLASS (I should have also deleted the cctor/op=; can you?).  However, all you really want at the end is the offset(), and this offset is already present in the CodeRange (named .entry() which, matching your previous renaming, should probably be changed to nonProfilingEntry).  So I think you can remove FunctionCompileResults::label_ altogether and use CodeRange::nonProfilingEntry instead later on.

Same goes for FCR::endOffset_; I think it can be totally removed.

@@ +3091,5 @@
> +    PropertyName* funcName = func.name();
> +    unsigned line = func.lineno();
> +
> +    // Fill in the results of the function's compilation
> +    IonScriptCounts* counts = codegen->extractScriptCounts();

Could you inline the call to extractScriptCounts() into the finishCodegen() call?

@@ +3092,5 @@
> +    unsigned line = func.lineno();
> +
> +    // Fill in the results of the function's compilation
> +    IonScriptCounts* counts = codegen->extractScriptCounts();
> +    auto* codeRange = lifo.new_<AsmJSModule::FunctionCodeRange>(funcName, line, labels);

FunctionCodeRange is a POD and cheap (enough) to copy so could you:
 - allocate it on the stack and copy into finishCodegen
 - have FunctionCompileResults::codeRange return 'const FunctionCodeRange&'
 - remove 'lifo' parameter to this function

@@ +3099,5 @@
>  
> +    results.finishCodegen(&func, codeRange, counts);
> +
> +#if defined(MOZ_VTUNE) || defined(JS_ION_PERF)
> +    results->endOffset = labels.endAfterOOL.offset();

I don't think this builds, but I think it can be removed according to above.  Perhaps build with --enable-perf ;)

::: js/src/asmjs/AsmJSGlobals.h
@@ +1049,5 @@
> +class FunctionCompileResults
> +{
> +    jit::Label entry_;
> +
> +    AsmFunction* func_;

Can you remove the \n?  Also, can func_ be a const AsmFunction*?

@@ +1067,5 @@
> +    {}
> +
> +    jit::Label& entry() { return entry_; }
> +    AsmFunction* func() const { return func_; }
> +    AsmJSModule::FunctionCodeRange* codeRange() { return mozilla::Move(codeRange_); }

No need for mozilla::Move(codeRange_).

@@ +1070,5 @@
> +    AsmFunction* func() const { return func_; }
> +    AsmJSModule::FunctionCodeRange* codeRange() { return mozilla::Move(codeRange_); }
> +    jit::IonScriptCounts* counts() const { return counts_; }
> +
> +    void finishCodegen(AsmFunction* func, AsmJSModule::FunctionCodeRange* codeRange,

Since it isn't mutated and must be non-null, could you make these two & params?

::: js/src/asmjs/AsmJSValidate.cpp
@@ +1414,5 @@
>          return *funcPtrTables_[i];
>      }
> +    Label* functionEntry(unsigned i) {
> +        return &functionEntries_[i];
> +    }

I think this function is dead now.

@@ +1466,5 @@
>  
> +    bool finishGeneratingFunction(FunctionCompileResults& results) {
> +        AsmFunction& func = *results.func();
> +        unsigned i = func.funcIndex();
> +        if (functionEntries_.length() <= i && !functionEntries_.resize(i + 1))

I would think you could
  MOZ_ASSERT(functionEntries_.size() == func.funcIndex());
since these were added in order.  In that case you could
  if (!funcEntries_.append(results.entry().offset()))

@@ +1478,5 @@
> +
> +        // These must be done before the module is done with function bodies.
> +        if (results.counts() && !module().addFunctionCounts(results.counts()))
> +            return false;
> +        if (!module().addFunctionCodeRange(results.codeRange()->name(), Move(*results.codeRange())))

Pre-existing, but I don't think you need to Move() codeRange; it's copyable POD.

@@ +1573,5 @@
>          compileResults_ = compileResults->forget();
>  
> +        // Patch internal calls to their final positions
> +        CallSiteVector& callSites = masm().callSites();
> +        for (auto& cs : callSites) {

Could you write "for (auto& cs : masm().callSites())" ?  (Note: won't reevaluate the rhs of : every time.)

@@ +1579,5 @@
> +                continue;
> +            MOZ_ASSERT(cs.kind() == CallSiteDesc::Relative);
> +            uint32_t callerOffset = cs.returnAddressOffset();
> +            uint32_t calleeIndex = masm().asmCalleeIndex(callerOffset);
> +            MOZ_ASSERT(calleeIndex < functionEntries_.length());

You'll get this assert for free from the operator[] below.

::: js/src/jit/MacroAssembler-inl.h
@@ +93,5 @@
> +void
> +MacroAssembler::call(const CallSiteDesc& desc, AsmJSInternalCallee callee)
> +{
> +    call(callee);
> +    append(desc, currentOffset(), framePushed());

Pre-existing, but can you change call(AsmJSInternalCallee) and call(Label*) to return a CodeOffsetLabel, and append() to take a CodeOffsetLabel?  I was thinking it'd be nice to avoid this implicit dependency on currentOffset() being used in just the right way (also, one always has to ask whether a constant pool could be magically injected after the call()... I guess not? but best not to have to ask).
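Roughly what I'm picturing (just a sketch; parameter types are guesses):

  // call() returns the offset just after the call instruction; append() takes it
  // explicitly, so there's no implicit reliance on currentOffset().
  void
  MacroAssembler::call(const CallSiteDesc& desc, AsmJSInternalCallee callee)
  {
      CodeOffsetLabel retAddr = call(callee);
      append(desc, retAddr, framePushed());
  }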

::: js/src/jit/arm/MacroAssembler-arm.cpp
@@ +4984,5 @@
>  void
> +MacroAssembler::call(AsmJSInternalCallee callee)
> +{
> +    // For now, assume that it'll be nearby + there won't be 2**24 functions
> +    // in a module (because of BOffImm offset restrictions).

Can you add a check for that with a MOZ_RELEASE_ASSERT() here and put a hard limit somewhere in AsmJSCompile?  2**24 * 18 bytes is roughly 300 MB and so technically possible (unless there is some other implicit limit we shouldn't depend on).  Maybe also add a comment: "See also asmCalleeIndex and patchAsmInternalCall."
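For example (the limit and the variable name here are placeholders, not actual code):

  // Backstop for the BOffImm range assumed above; a real dynamic limit should
  // also exist in AsmJSCompile so user code can't reach this crash.
  MOZ_RELEASE_ASSERT(calleeIndex < (1 << 24));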

::: js/src/jit/shared/Assembler-shared.h
@@ +975,5 @@
>          CallSite callsite(desc, currentOffset, framePushed + sizeof(AsmJSFrame));
>          enoughMemory_ &= callsites_.append(callsite);
>      }
>      CallSiteVector&& extractCallSites() { return Move(callsites_); }
> +    CallSiteVector& callSites() { return callsites_; }

Can you make this a const method returning a const&?
Attachment #8678230 - Flags: review?(luke) → review+
(Assignee)

Comment 68

3 years ago
(In reply to Luke Wagner [:luke] from comment #67)
> Comment on attachment 8678230 [details] [diff] [review]
> 4. Make internal calls thread-local
> 
> Review of attachment 8678230 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Great work, this is exactly what I was hoping for.  Unless I'm missing
> something (quite possible) I'd be surprised if this significantly affects
> compile-time perf, so I bet it's fine to land before the rest.  Mostly nits
> below:

Thanks! Needinfo for the last question.

> @@ +1070,5 @@
> > +    AsmFunction* func() const { return func_; }
> > +    AsmJSModule::FunctionCodeRange* codeRange() { return mozilla::Move(codeRange_); }
> > +    jit::IonScriptCounts* counts() const { return counts_; }
> > +
> > +    void finishCodegen(AsmFunction* func, AsmJSModule::FunctionCodeRange* codeRange,
> 
> Since it isn't mutated and must be non-null, could you make these two &
> params?

Not really: it'd be interesting to do so if the FunctionCompileResults struct could store these two as & as well, but it can't (FCR is going to be allocated on the main thread and this function will be called on a helper thread, ultimately). Unless you just want to polish the facade API, I'll leave it as is.

> 
> ::: js/src/asmjs/AsmJSValidate.cpp
> @@ +1414,5 @@
> >          return *funcPtrTables_[i];
> >      }
> > +    Label* functionEntry(unsigned i) {
> > +        return &functionEntries_[i];
> > +    }
> 
> I think this function is dead now.

No, it's still used in GenerateEntry (calling into the actual function entry) and ModuleValidator::finish (func ptr tables).

> 
> @@ +1466,5 @@
> >  
> > +    bool finishGeneratingFunction(FunctionCompileResults& results) {
> > +        AsmFunction& func = *results.func();
> > +        unsigned i = func.funcIndex();
> > +        if (functionEntries_.length() <= i && !functionEntries_.resize(i + 1))
> 
> I would think you could
>   MOZ_ASSERT(functionEntries_.size() == func.funcIndex());
> since these were added in order.  In that case you could
>   if (!funcEntries_.append(results.entry().offset()))

Not quite: finishGeneratingFunction() is called after a function is compiled on a helper thread, or at the end of the module compilation. Functions are compiled in parallel and there is no guaranteed order of compilation. This assertion blows up in the entire asm.js test suite.

> ::: js/src/jit/arm/MacroAssembler-arm.cpp
> @@ +4984,5 @@
> >  void
> > +MacroAssembler::call(AsmJSInternalCallee callee)
> > +{
> > +    // For now, assume that it'll be nearby + there won't be 2**24 functions
> > +    // in a module (because of BOffImm offset restrictions).
> 
> Can you add a check that with a MOZ_RELEASE_ASSERT() here and put a hard
> limit somewhere in AsmJSCompile?  2**24 * 18 bytes is 301MiB and so
> technically possible (unless there is some other implicit limit we shouldn't
> depend on).  Maybe also add comment to "See also asmCalleeIndex and
> patchAsmInternalCall."

BOffImm already has a MOZ_CRASH if the given offset isn't in range. Is there a need for a MOZ_RELEASE_ASSERT? And do you want the extra hard limit in AsmJSCompile as well, knowing that it'll crash here anyway? Also, where does the "* 18 bytes" in your computation come from?
Flags: needinfo?(luke)
(In reply to Benjamin Bouvier [:bbouvier] from comment #68)
> Not really: it'd be interesting to do so if the FunctionCompileResults
> struct could store these two as & as well, but it can't (FCR is going to be
> allocated on the main thread and this function will be called on an helper
> thread, ultimately). Unless you just want to polish the facade API, I'll
> leave it as is.

Yeah, I'd be happy just tightening the interface, leaving the member variable types as an impl detail.

> And do you want the extra hard limit in
> AsmJSCompile as well, knowing that it'll crash here anyways?

Well, MOZ_CRASHes are only meant as backstops for bugs; if users can actually hit something, we need to dynamically guard against it.

> Also, I am missing where the "* 18 bytes" comes from, in your computation?

sizeof("function abcde(){}")
Flags: needinfo?(luke)
(Assignee)

Comment 70

3 years ago
Enough changes that it justifies a new review: we decided on IRC that storing the function index as an immediate wasn't that nice, as it would have limited the number of asm.js functions to 2**25 + 1, because of ARM backend implementation details. So instead, we store the target index in an augmented CallSite, and use this to patch the call site.
Attachment #8678230 - Attachment is obsolete: true
Attachment #8682035 - Flags: review?(luke)
(Assignee)

Comment 71

3 years ago
And that is the actual correct patch.
Attachment #8682035 - Attachment is obsolete: true
Attachment #8682035 - Flags: review?(luke)
Attachment #8682038 - Flags: review?(luke)
Comment on attachment 8682038 [details] [diff] [review]
4. Make internal calls thread-local

Review of attachment 8682038 [details] [diff] [review]:
-----------------------------------------------------------------

Great, thanks!  Just minor nits:

::: js/src/asmjs/AsmJSCompile.cpp
@@ +3142,5 @@
>  
>      func.accumulateCompileTime((PRMJ_Now() - before) / PRMJ_USEC_PER_MSEC);
>  
> +    PropertyName* funcName = func.name();
> +    unsigned line = func.lineno();

Can these be inlined into ctor call below?

::: js/src/asmjs/AsmJSGlobals.h
@@ +1058,5 @@
> +        counts_(nullptr),
> +        codeRange_()
> +    {}
> +
> +    const AsmFunction* func() const { return func_; }

Can you also return a 'const AsmFunction&'?  (MOZ_ASSERT(func_);)

::: js/src/asmjs/AsmJSModule.cpp
@@ +325,5 @@
> +    const CallSiteAndTargetVector& callSites = masm.callSites();
> +    if (!callSites_.reserve(callSites.length()))
> +        return false;
> +    for (const CallSite& cs: callSites)
> +        callSites_.infallibleAppend(cs);

I think you can:
  if (!callSites_.append(masm.callSites().begin(), masm.callSites().end()))
      return false;
even though the elem types are different.

::: js/src/asmjs/AsmJSModule.h
@@ +1166,5 @@
>      }
>      bool addCodeRange(CodeRange::Kind kind, uint32_t begin, uint32_t pret, uint32_t end) {
>          return codeRanges_.append(CodeRange(kind, begin, pret, end));
>      }
> +    bool addFunctionCodeRange(PropertyName* name, FunctionCodeRange& codeRange)

const FunctionCodeRange&

::: js/src/asmjs/AsmJSValidate.cpp
@@ +939,5 @@
>      bool                                    hasChangeHeap_;
>      bool                                    supportsSimd_;
>      bool                                    atomicsPresent_;
>  
> +    LabelVector                             functionEntries_;

To avoid copying Labels, can this be a Vector<uint32_t> called functionEntryOffsets_?

::: js/src/jit/shared/Assembler-shared.h
@@ +989,5 @@
>          // framePushed does not include sizeof(AsmJSFrame), so add it in here (see
>          // CallSite::stackDepth).
> +        CallSite callsite(desc, label.offset(), framePushed + sizeof(AsmJSFrame));
> +        CallSiteAndTarget callsiteWithTarget(callsite, targetIndex);
> +        enoughMemory_ &= callsites_.append(callsiteWithTarget);

Perhaps construct CallSiteAndTarget inline in append() call?
Attachment #8682038 - Flags: review?(luke) → review+
(Assignee)

Comment 73

3 years ago
Posted patch WIP for merging masm on x64 (obsolete) — Splinter Review
Passes all tests under x64, except testBullet, which exhibits an interesting pattern related to constant pools that I'd like to extract into a unit test case.
Attachment #8678232 - Attachment is obsolete: true
Attachment #8681434 - Attachment is obsolete: true
(Assignee)

Comment 74

3 years ago
(In reply to Luke Wagner [:luke] from comment #72)
> Comment on attachment 8682038 [details] [diff] [review]
> 4. Make internal calls thread-local
> 
> Review of attachment 8682038 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/asmjs/AsmJSCompile.cpp
> @@ +3142,5 @@
> >  
> >      func.accumulateCompileTime((PRMJ_Now() - before) / PRMJ_USEC_PER_MSEC);
> >  
> > +    PropertyName* funcName = func.name();
> > +    unsigned line = func.lineno();
> 
> Can these be inlined into ctor call below?

They're used twice, if we take #ifdef code into account, so I'm keeping them as is.

> 
> ::: js/src/asmjs/AsmJSGlobals.h
> @@ +1058,5 @@
> > +        counts_(nullptr),
> > +        codeRange_()
> > +    {}
> > +
> > +    const AsmFunction* func() const { return func_; }
> 
> Can you also return a 'const AsmFunction&'?  (MOZ_ASSERT(func_);)

Done.

> 
> ::: js/src/asmjs/AsmJSModule.cpp
> @@ +325,5 @@
> > +    const CallSiteAndTargetVector& callSites = masm.callSites();
> > +    if (!callSites_.reserve(callSites.length()))
> > +        return false;
> > +    for (const CallSite& cs: callSites)
> > +        callSites_.infallibleAppend(cs);
> 
> I think you can:
>   if (!callSites_.append(masm.callSites().begin(), masm.callSites().end()))
>       return false;
> even though the elem types are different.

Right! Even better, we can use appendAll, which has the same effect. Done.

> 
> ::: js/src/asmjs/AsmJSModule.h
> @@ +1166,5 @@
> >      }
> >      bool addCodeRange(CodeRange::Kind kind, uint32_t begin, uint32_t pret, uint32_t end) {
> >          return codeRanges_.append(CodeRange(kind, begin, pret, end));
> >      }
> > +    bool addFunctionCodeRange(PropertyName* name, FunctionCodeRange& codeRange)
> 
> const FunctionCodeRange&

Nope: addFunctionCodeRange calls the mutating method initNameIndex on the codeRange.

> 
> ::: js/src/asmjs/AsmJSValidate.cpp
> @@ +939,5 @@
> >      bool                                    hasChangeHeap_;
> >      bool                                    supportsSimd_;
> >      bool                                    atomicsPresent_;
> >  
> > +    LabelVector                             functionEntries_;
> 
> To avoid copying Labels, can this be a Vector<uint32_t> called
> functionEntryOffsets_?

I am not sure I understand where the copies are at this point. Also, we specifically want Labels here. One of functionEntries_'s users is functionEntry(), which is called by GenerateEntry for a masm.call(CallSiteDesc, Label*). Using offsets would mean reconstructing a temporary label and binding it to the offset. As far as I can tell, there aren't copies, so we're fine just having labels in the functionEntries_ vector?

> 
> ::: js/src/jit/shared/Assembler-shared.h
> @@ +989,5 @@
> >          // framePushed does not include sizeof(AsmJSFrame), so add it in here (see
> >          // CallSite::stackDepth).
> > +        CallSite callsite(desc, label.offset(), framePushed + sizeof(AsmJSFrame));
> > +        CallSiteAndTarget callsiteWithTarget(callsite, targetIndex);
> > +        enoughMemory_ &= callsites_.append(callsiteWithTarget);
> 
> Perhaps construct CallSiteAndTarget inline in append() call?

Done.
(In reply to Benjamin Bouvier [:bbouvier] from comment #74)
> > const FunctionCodeRange&
> 
> Nope: addFunctionCodeRange calls the mutating method initNameIndex on the
> codeRange.

Ah, true.  In that case, it'd be customary to take a * argument, indicating the mutation.  However, since FCR is tiny (and this whole method will be inlined anyway), seems nicer to just take by value and mutate the local copy.

> > > +    LabelVector                             functionEntries_;
> > 
> > To avoid copying Labels, can this be a Vector<uint32_t> called
> > functionEntryOffsets_?
> 
> I am not sure to understand where the copies are, at this point? Also, we
> specifically want Labels here. One of functionEntries_'s users is
> functionEntry(), which is called by GenerateEntry for a
> masm.call(CallSiteDesc, Label*). Using offsets would mean reconstructing a
> temporary label, binding it to the offset. As far as I can tell, there
> aren't copies, so we're fine just having labels in the functionEntries_
> vector?

Vector<Label> will copy into the internal array and copy on resize.  Reconstructing a Label from an offset seems fine.  Using a Label is necessary when the identity matters (you're linking disparate masm.jmp()s with masm.bind()s), but because we don't have identity with a Vector<Label>, it seems more direct and less hazardous to use uint32_ts.
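For example, the call-site user could do roughly this (a sketch, assuming Label::bind(offset) behaves as it does elsewhere in the assembler):

  Label entry;
  entry.bind(functionEntryOffsets_[calleeIndex]);   // rebuild a bound label from the stored offset
  masm.call(desc, &entry);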
(Assignee)

Comment 78

3 years ago
Posted patch 5. Merge macroassemblers (obsolete) — Splinter Review
Not finished yet, but it works for x86 and for x64.

This patch gives the ability to merge two macro assemblers together (so as to "reduce" them, as in reduce(accumulator, other)). We append the buffer of the merged masm after the buffer of the accumulator, and we fix up offsets by the size of the accumulator's buffer. Of course, this is dangerous, because of raw offsets embedded in the jit code.
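Conceptually, the reduce step looks something like this (the helper names are illustrative, not the actual patch):

  size_t sizeBefore = accumulator.size();
  accumulator.appendRawCode(other);                    // concatenate the other buffer after ours
  for (CodeOffsetLabel off : other.codeOffsets()) {    // every recorded offset (call sites,
      off.shiftBy(sizeBefore);                         // code labels, links...) gets re-based
      accumulator.addCodeOffset(off);                  // by the accumulator's previous size
  }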

A few things I'd like to improve in this patch before review:
- I would like to make it clearer that these methods can be used *only when the masms are used for asm.js* (maybe by looking at the masm token).
- Currently, the smaller, merged macroassembler (the "other" one) is altered during merge, as it's thrown away afterwards. The API would be saner if it were not, and so we'd have copies and const masm& other everywhere.
- the Label in CodeLabel could be a linked list of embedded offsets. It appears it is not at the moment (it's only used for switch cases, as far as I can tell), but it could be in the future. This needs to be either enforced or better handled.

How sane is this approach?
Attachment #8684304 - Flags: feedback?(luke)
Attachment #8684304 - Flags: feedback?(jdemooij)
(Assignee)

Comment 79

3 years ago
This uses the feature implemented in the previous patch to merge MacroAssemblers. Each FunctionCompiler gets its own MacroAssembler, and when the function compilation is done and returns to the main thread, the main thread does the merge. This patch doesn't change the parallel state.
Attachment #8682588 - Attachment is obsolete: true
(Assignee)

Comment 80

3 years ago
Comment on attachment 8684305 [details] [diff] [review]
6. Merge masm at the end of AsmJS functions compilations

I actually think it can be reviewed before the other masm patch is reviewed. See previous comment.
Attachment #8684305 - Flags: review?(luke)
(Assignee)

Comment 81

3 years ago
As expected, parallelization is made "trivial" by the previous set of patches. We have a clean map/reduce paradigm (each function compilation returns a masm; these are then reduced into the single final masm).
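The reduce step on the main thread is then roughly (a sketch; the container name is a placeholder, the merge method follows the previous patch):

  // Each helper thread fills in one FunctionCompileResults ("map"); the main
  // thread folds every per-function masm into the module's single masm ("reduce").
  for (FunctionCompileResults& results : finishedFunctions)
      if (!m.masm().mergeWith(results.masm()))
          return false;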

I've made basic benchmarking for this patch.
- For DeadTrigger2, before (inbound): 5069ms, after (the last 3 patches): 4285ms (15% speedup)
- For TappyChicken, before: 5234ms, after: 4335ms (17% speedup)

(I forgot to mention that the patch for merging masm doesn't implement ARM yet, which might or might not take a lot of time)
Attachment #8684308 - Flags: review?(luke)
Comment on attachment 8684304 [details] [diff] [review]
5. Merge macroassemblers

Review of attachment 8684304 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MacroAssembler.h
@@ +1323,5 @@
>      Label* failureLabel() {
>          return &failureLabel_;
>      }
>  
> +    bool mergeWith(MacroAssembler& masm);

Do you suppose you could change 'masm' to a 'const MacroAssembler&' and then, instead of shifting everything in the src before copying, you copy and then shift the copy (in 'this', which is mutable)?  Also, 'merge' is a bit symmetric, so you can't tell which gets merged into which; what about 'append'?  Lastly, agreed that, since it's asm.js-specific, 'Asm' or 'AsmJS' in the name would be appropriate (and MOZ_ASSERT(IsCompilingAsmJS())).

::: js/src/jit/shared/Assembler-shared.h
@@ +501,5 @@
>      }
> +
> +    void shiftBy(size_t off) {
> +        MOZ_ASSERT(offset() + off >= offset(), "no overflow");
> +        offset_ += off;

FWIW, I initially read 'shift' as << or >>.  Perhaps 'offsetBy'?

::: js/src/jit/x64/MacroAssembler-x64.cpp
@@ +197,5 @@
> +        return false;
> +    if (!simdMap_.initialized() && !simdMap_.init())
> +        return false;
> +
> +    for (Double &d : other.doubles_) {

'Double& d' here and below.

::: js/src/jit/x86/MacroAssembler-x86.cpp
@@ +274,5 @@
>      }
>  }
>  
> +bool
> +MacroAssemblerX86::mergeWith(MacroAssemblerX86& other)

Is there a better way to share code between MacroAssemblerX86/64?
Attachment #8684304 - Flags: feedback?(luke) → feedback+
Comment on attachment 8684308 [details] [diff] [review]
7. Do MIRgen and codegen offthread

Review of attachment 8684308 [details] [diff] [review]:
-----------------------------------------------------------------

You're right, with all the other patches, this is quite nice and simple.  Here's a first pass of comments; need to read a bit more.

::: js/src/asmjs/AsmJSCompile.cpp
@@ +2996,5 @@
>      MOZ_CRASH("unexpected float32x4 expression");
>  }
>  
>  bool
> +js::CompileAsmFunction(LifoAlloc& lifo, FunctionCompileResults* results)

It's a bit awkward to have 'results' carry the AsmFunction and ModuleCompileInputs as inputs.  Could you remove these fields from FunctionCompileResults and instead pass them both as const& parameters to CompileAsmFunction (and thus also make them fields of AsmJSParallelTask)?
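i.e., a signature along these lines (just a sketch of the suggested shape, parameter order is a guess):

  bool CompileAsmFunction(const ModuleCompileInputs& inputs, const AsmFunction& func,
                          LifoAlloc& lifo, FunctionCompileResults* results);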

@@ +3037,5 @@
> +
> +        if (!OptimizeMIR(mir))
> +            return false;
> +
> +        lir = GenerateLIR(mir);

I think you can move the declaration of 'lir' to right here since it's local to this block.

@@ +3047,5 @@
> +        // MacroAssembler. To avoid spiking memory, a LifoAllocScope in the caller
> +        // frees all MIR/LIR after each function is compiled. This method is
> +        // responsible for cleaning out any dangling pointers that the
> +        // MacroAssembler may have kept.
> +        results->masm().resetForNewCodeGenerator(mir->alloc());

Bet you can just delete this :)

@@ +3049,5 @@
> +        // responsible for cleaning out any dangling pointers that the
> +        // MacroAssembler may have kept.
> +        results->masm().resetForNewCodeGenerator(mir->alloc());
> +
> +        ScopedJSDeletePtr<CodeGenerator> codegen(js_new<CodeGenerator>(mir, lir, &results->masm()));

Can CodeGenerator just live on the stack now (no js_new)?

@@ +3068,5 @@
> +
> +        // Unlike regular IonMonkey, which links and generates a new JitCode for
> +        // every function, we accumulate all the functions in the module in a
> +        // single MacroAssembler and link at end. Linking asm.js doesn't require a
> +        // CodeGenerator so we can destroy it now (via ScopedJSDeletePtr).

I think this comment can go away; we're quite different from normal Ion compilation these days ;)

@@ +3073,5 @@
> +    }
> +
> +    // Release the lifo's mark associated to the TempAllocator (after the
> +    // codegen has been destructed, so that dtor invariants are preserved).
> +    mir->alloc().TempAllocator::~TempAllocator();

I think we can simplify this logic too by putting TempAllocator on the stack, declared right before 'mir' (showing that it outlives the MIR).  Then FunctionCompiler will take a TempAllocator& in the ctor (instead of the LifoAlloc parameter, since TempAllocator brings the LifoAlloc) rather than creating one.  Next, I think you can put MIRGraph on the stack too (rather than allocating it in prepareEmitMIR), taking it as a ctor arg as well.
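i.e., something like this at the top of CompileAsmFunction (which is roughly what the updated patch below ends up doing):

  TempAllocator tempAlloc(&lifo);
  MIRGraph graph(&tempAlloc);
  // FunctionCompiler then takes tempAlloc (and the graph) by reference in its ctor,
  // instead of receiving the LifoAlloc and allocating them itself.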

::: js/src/asmjs/AsmJSGlobals.h
@@ +1110,5 @@
>          counts_ = &counts;
>      }
>  };
>  
> +bool CompileAsmFunction(LifoAlloc& lifo, FunctionCompileResults* results);

Perhaps rename AsmJSGlobals.h to AsmJSCompile.h?
Comment on attachment 8684305 [details] [diff] [review]
6. Merge masm at the end of AsmJS functions compilations

Review of attachment 8684305 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/asmjs/AsmJSModule.h
@@ +622,5 @@
>          uint32_t functionLineNumber() const {
>              MOZ_ASSERT(isFunction());
>              return lineNumber_;
>          }
> +        void functionShiftBy(uint32_t offset) {

Based on comment in previous patch, perhaps 'functionOffsetBy'?
Comment on attachment 8684305 [details] [diff] [review]
6. Merge masm at the end of AsmJS functions compilations

Review of attachment 8684305 [details] [diff] [review]:
-----------------------------------------------------------------

Do you suppose you could upload a new patch with 6 and 7 folded with comments addressed?

::: js/src/asmjs/AsmJSGlobals.h
@@ +1074,5 @@
> +
> +    jit::NonAssertingLabel stackOverflowLabel_;
> +    jit::NonAssertingLabel syncInterruptLabel_;
> +    jit::NonAssertingLabel onConversionErrorLabel_;
> +    jit::NonAssertingLabel onOutOfBoundsLabel_;

What do you think of moving all these Labels to be in jit::MacroAssembler so that they get merged with the rest of MacroAssembler?  They are already rather awkwardly threaded in via MIRGenerator; perhaps better to stick next to the rest of the asm.js-specific state in MacroAssembler.
Btw, on my 16 core desktop, with this patch applied:

          trunk   patch
DT2       10.7s   6.8s (-36%)
AngryBots 4.6s    3.0s (-34%)
UE4       13.9s   9.1s (-34%)

Using Linux perf stat's "CPUs utilized" measurement, this patch takes us from ~2.9 to ~4.5 CPUs utilized (on both Unity and UE4).  Playing with --thread-count, before the patch I see speedups only up to --thread-count=4, but with the patch there are speedups up to 8.  So very nice!

Independent of this patch (so feel free to ignore) but most curious: limiting the number of CPUs via taskset or threads via --thread-count can be a substantial speedup:

           normal  taskset 0xff --thread-count=8
AngryBots  3.0s    2.3s (-23%)
UE4        9.1s    6.9s (-24%)

'perf stat' shows about 2x fewer cache misses, so I'm guessing there's some thread migration going on between cores that don't share a cache, which causes a lot of cache misses.
(Assignee)

Comment 87

3 years ago
(In reply to Luke Wagner [:luke] from comment #82)
> Comment on attachment 8684304 [details] [diff] [review]
> 5. Merge macroassemblers
> 
> Review of attachment 8684304 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/jit/MacroAssembler.h
> @@ +1323,5 @@
> >      Label* failureLabel() {
> >          return &failureLabel_;
> >      }
> >  
> > +    bool mergeWith(MacroAssembler& masm);
> 
> Do you suppose you could change 'masm' to a 'const MacroAssembler&' and
> then, instead of shifting everything in the src before copying, you copy and
> then shift the copy (in 'this', which is mutable)?  Also, 'merge' is a bit
> symmetric, so you don't know from who into who; what about 'append'? 
> Lastly, agreed that, since it's asm.js-specific, 'Asm' or 'AsmJS' in the
> name would be appropriate (and MOZ_ASSERT(IsCompilingAsmJS())).

Agreed on the const MacroAssembler&, I've done it locally (we can't easily use range-based for loops, but it's not too ugly). About the name, it doesn't seem mergeWith is confusing anymore if the parameter is a const&; I've left it as is but I am fine with changing it. I've added the asm- prefix. We can't MOZ_ASSERT(IsCompilingAsmJS()) because this needs a JitContext on the stack and there isn't one when we're calling finishGeneratingFunction... Should we have one?

> 
> ::: js/src/jit/shared/Assembler-shared.h
> @@ +501,5 @@
> >      }
> > +
> > +    void shiftBy(size_t off) {
> > +        MOZ_ASSERT(offset() + off >= offset(), "no overflow");
> > +        offset_ += off;
> 
> FWIW, I initially read 'shift' as << or >>.  Perhaps 'offsetBy'?

Done.

> 
> ::: js/src/jit/x86/MacroAssembler-x86.cpp
> @@ +274,5 @@
> >      }
> >  }
> >  
> > +bool
> > +MacroAssemblerX86::mergeWith(MacroAssemblerX86& other)
> 
> Is there a better way to share code between MacroAssemblerX86/64?

MacroAssemblerX86 and MacroAssemblerX64 both inherit from MacroAssemblerX86Shared, which would be the appropriate place for sharing code. The MacroAssembler (which inherits from MacroAssemblerX64 or MacroAssemblerX86) would be the appropriate place if it weren't for ARM et al. I think we could use some templates here, but that's well beyond the scope of this bug (it would probably need templating the constants vector and map as well). OK with doing it in a follow-up bug, though.
(In reply to Benjamin Bouvier [:bbouvier] from comment #87)
> About the name, it doesn't seem mergeWith is confusing anymore if the 
> parameter is a const&, I've let it as is

Agreed

> We can't MOZ_ASSERT(IsCompilingAsmJS()) because this needs a JitContext on the stack
> and there isn't one when we're calling finishGeneratingFunction... Should we
> have one?

I guess not; the "asm" in the name should be sufficient.

> > Is there a better way to share code between MacroAssemblerX86/64?
> 
> MacroAssemblerX86 and MacroAssemblerX64 inherit both from
> MacroAssemblerX86Shared, which would be the appropriate place for sharing
> code. The MacroAssembler (which inherits from MacroAssemblerX64 or
> MacroAssemblerX86) would be the appropriate place, if there was not ARM et
> al. I think we could use some templates here, but that's much beyond the
> scope of this bug (it would probably need templating the constants vector
> and map as well). OK for doing it in a follow-up bug, though.

Makes sense w.r.t ARM, but could you then share the code at least between X86/X64?
(Assignee)

Comment 89

3 years ago
Adding a dependency on bug 1223355, which will make sharing the mergeWith code trivial.
Depends on: 1223355
(Assignee)

Comment 90

3 years ago
Attachment #8684305 - Attachment is obsolete: true
Attachment #8684308 - Attachment is obsolete: true
Attachment #8684305 - Flags: review?(luke)
Attachment #8684308 - Flags: review?(luke)
Attachment #8685513 - Flags: review?(luke)
Comment on attachment 8685513 [details] [diff] [review]
6. Merge masm at the end of AsmJS functions compilations and move MIRgen and codegen off main thread

Review of attachment 8685513 [details] [diff] [review]:
-----------------------------------------------------------------

Great, thanks!  Just nits:

::: js/src/asmjs/AsmJSCompile.cpp
@@ +89,5 @@
>      RetType          returnedType() const { return func_.returnedType(); }
>  
> +    Label& syncInterruptLabel() const { return compileResults_.syncInterruptLabel(); }
> +    Label& onConversionErrorLabel() const { return compileResults_.onConversionErrorLabel(); }
> +    Label& onOutOfBoundsLabel() const { return compileResults_.onOutOfBoundsLabel(); }

Similar to the comment below, I think these helpers could be removed and just the masm() exposed.

@@ +2998,5 @@
>      int64_t before = PRMJ_Now();
>  
> +    TempAllocator tempAlloc(&lifo);
> +    MIRGraph graph(&tempAlloc);
> +    MIRGenerator* mir = nullptr;

Continuing the refactoring, could you move the CompileInfo and MIRGenerator to be here on the stack, so that prepareEmitMIR() just starts with newBlock()?  (Then extractMIR() can be deleted.)

@@ +3009,3 @@
>              return false;
> +
> +        if (!f.prepareEmitMIR(func->argTypes()))

Looks like you could fold prepareEmitMIR() into init().

@@ +3019,5 @@
> +        mir = f.extractMIR();
> +        if (!mir)
> +            return false;
> +
> +        jit::SpewBeginFunction(mir, nullptr);

I think you can move this down to the line above AutoSpewEndFunction which will show a nice connection between the two.

@@ +3060,5 @@
> +        AsmJSModule::FunctionCodeRange codeRange(funcName, line, labels);
> +        results->finishCodegen(codeRange, *codegen.extractScriptCounts());
> +    }
> +
> +    func->accumulateCompileTime((PRMJ_Now() - before) / PRMJ_USEC_PER_MSEC);

IIUC, this method call is the only reason why 'func' can't be const.  How about instead moving AsmFunction::compileTime_ to be a field of FunctionCompileResults and then making 'func' a const&?  That way we have the high-level property that compilation doesn't mutate the AsmFunction, which will be good to have in the future when we could have simultaneous compilation in baseline and Ion.
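For instance (the setter name here is hypothetical):

  results->setCompileTime((PRMJ_Now() - before) / PRMJ_USEC_PER_MSEC);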

::: js/src/asmjs/AsmJSCompile.h
@@ +1070,5 @@
> +    jit::MacroAssembler& masm() { return masm_; }
> +    jit::Label& stackOverflowLabel()     { return *masm_.asmStackOverflowLabel(); }
> +    jit::Label& syncInterruptLabel()     { return *masm_.asmSyncInterruptLabel(); }
> +    jit::Label& onConversionErrorLabel() { return *masm_.asmOnConversionErrorLabel(); }
> +    jit::Label& onOutOfBoundsLabel()     { return *masm_.asmOnOutOfBoundsLabel(); }

Since masm() is exposed and these 4 hang off masm, I'd remove these helpers.

::: js/src/asmjs/AsmJSValidate.cpp
@@ +1476,5 @@
> +    Label& onDetachedLabel()        { return onDetachedLabel_; }
> +    ExitMap::Range allExits() const { return exits_.all(); }
> +
> +  public:
> +    bool finishGeneratingFunction(const AsmFunction& func, FunctionCompileResults& results) {

Can 'results' be a const& now?

@@ +1488,3 @@
>          functionEntryOffsets_[i] = codeRange.entry();
>  
> +        DebugOnly<size_t> sizeNewMasm = results.masm().size();

Perhaps 'delta'?
Attachment #8685513 - Flags: review?(luke) → review+
Oh, I forgot to add: the patch is missing AsmJSCompileInputs.h which I assume is just the ModuleCompileInputs struct removed from AsmJSGlobals.h.
(Assignee)

Comment 93

3 years ago
CompileInfo can be made const easily, so let's do it, so that we can store a const CompileInfo& in patch 6.
Attachment #8686621 - Flags: review?(luke)
(Assignee)

Comment 94

3 years ago
Patch 6 updated, all comments addressed! Carrying forward r+ from previous review.
Attachment #8685513 - Attachment is obsolete: true
Attachment #8686622 - Flags: review+
(Assignee)

Comment 95

3 years ago
Posted patch Folded patchSplinter Review
I'm now actively working on ARM support, and then we should be good to push to try.

Updated

3 years ago
Attachment #8686621 - Flags: review?(luke) → review+

Updated

3 years ago
Blocks: 1224389
(Assignee)

Comment 96

3 years ago
Posted patch 5. Merge macroassemblers (obsolete) — Splinter Review
Known bug: on the ARM simulator, deadtrigger2 seems to crash with this patch applied, because some offsets between jumps are slightly too big.

Without any of these patches, ValidateAsmJS seems to soft fail (not crash) at some point (after the validation of a function), so I assume there is a mechanism to prevent long jumps like this. In the meantime, the patch shouldn't change too much, so I'm asking for review to get a first round of comments.
Attachment #8684304 - Attachment is obsolete: true
Attachment #8684304 - Flags: feedback?(jdemooij)
Attachment #8687263 - Flags: review?(luke)
Comment on attachment 8687263 [details] [diff] [review]
5. Merge macroassemblers

Review of attachment 8687263 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/shared/Assembler-shared.h
@@ +1051,5 @@
> +            asmJSAbsoluteLinks_[i].patchAt.offsetBy(offset);
> +
> +        i = codeLabels_.length();
> +        enoughMemory_ &= codeLabels_.appendAll(other.codeLabels_);
> +        // TODO XXX FIXME codelabel can have embedded lists

If I'm reading x86 code correctly, there can be multiple uses of a single CodeLabel Label due to multiple uses of a single double/float/simd constant on x86.  Is that right?  I bet this is pretty rare b/c of GVN of MConstant, but it'd be nice to verify this and get a test case if possible.  (I vaguely recall having had a bug that only surfaced with x86 constants in this specific corner case.)

If that is the case, then I'd recommend we change CodeLabel to contain two CodeOffsetLabels instead of two Labels.  If you look at the uses of CodeLabel, none really take advantage of the generality of being able to bind N uses to both labels:
 - Table switches just want to patch the address of cases into the switch table (1 use).
 - The use in the trampolines just wants to push two separate CodeLabels (2 uses).
 - The fake return address thing has only 1 use
 - Change X86Shared::(Double|Float|SIMD)::uses to be a Vector<CodeOffsetLabel>; the X86 impl then stores one CodeLabel for each use.  PlatformSpecificLabel and the templatization can be removed (both use CodeOffsetLabel).

So with this new, simplified CodeLabel, we can massively simplify the X86/MIPS Bind and remove the caveat from ARM's Bind (and remove writeCodePointer).  I'd also recommend renaming CodeLabel::(src|dest) to patchAt/target b/c src/dest are super-confusing and I think they are backwards from what I always assume.
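The simplified CodeLabel would then be roughly (a sketch only):

  class CodeLabel
  {
      CodeOffsetLabel patchAt_;   // where the raw pointer gets written
      CodeOffsetLabel target_;    // what that pointer should point to
    public:
      CodeOffsetLabel* patchAt() { return &patchAt_; }
      CodeOffsetLabel* target() { return &target_; }
  };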

::: js/src/jit/x64/Assembler-x64.cpp
@@ +219,5 @@
> +bool
> +Assembler::asmMergeWith(const Assembler& other)
> +{
> +    MOZ_ASSERT(other.jumps_.length() == 0);
> +    return AssemblerX86Shared::asmMergeWith(other);

Since there is no merging of jumps_ in AssemblerX86Shared (which owns jumps_), I assume the assertion should hold for the x86 case too, but there is no x86 asmMergeWith.  Can you move this assert into AssemblerX86Shared::asmMergeWith() and delete this method?
Attachment #8687263 - Flags: review?(luke)
(In reply to Benjamin Bouvier [:bbouvier] from comment #96)
> Known bug: on the ARM simulator, deadtrigger2 seems to crash with this patch
> applied, because some offsets between jumps are slightly too big.

ARM branches have a range of 32 MB in either direction, and we currently pretend that is the same as 'infinite'. We simply crash when a branch is out of range. Call instructions have the same range.

ARM64 branches are more constrained. There are conditional variants with 32 KB and with 1 MB ranges. Over in bug 1210554, I'm working on an AssemblerBuffer patch to deal with the short-range branches. Calls and unconditional branches have a 128 MB range which we treat as infinite.

If we are making very large AssemblerBuffers, we may have to start worrying about calls and branches referencing relocation tables at the end of the buffer?
(Assignee)

Comment 99

3 years ago
Posted patch 5. Merge macroassemblers (obsolete) — Splinter Review
Attachment #8690777 - Flags: review?(luke)
(Assignee)

Updated

3 years ago
Attachment #8687263 - Attachment is obsolete: true
(Assignee)

Comment 100

3 years ago
Forgot to say about this patch: maybe we could rename CodeOffsetLabel into SingleUseLabel, as it's more descriptive and covers a very specific case of Labels?

(In reply to Jakob Stoklund Olesen [:jolesen] from comment #98)
> If we are making very large AssemblerBuffers, we may have to start worrying
> about calls and branches referencing relocation tables at the end of the
> buffer?

I think we do have the same issue on x64 and we deal with it with relocation tables as well.
Comment on attachment 8690777 [details] [diff] [review]
5. Merge macroassemblers

Review of attachment 8690777 [details] [diff] [review]:
-----------------------------------------------------------------

Great, much nicer!  What you're doing here is nicer than what I proposed.  r+ with the nits below.

One general style nit: I see lots of 'for (x: foo)' where before I've seen 'for (x : foo)'.  Is there any new style guideline on this?  If not, I'd prefer to stick with the latter since we've already been using it.

::: js/src/asmjs/AsmJSModule.cpp
@@ +346,5 @@
> +        int32_t targetOffset = src.target()->offset();
> +        size_t patchAtOffset = masm.labelOffsetToPatchOffset(labelOffset);
> +        RelativeLink link(RelativeLink::CodeLabel);
> +        link.patchAtOffset = patchAtOffset;
> +        link.targetOffset = targetOffset;

Since labelOffsetToPatchOffset is only used for asm.js, how about changing its signature to take a CodeOffsetLabel directly, renaming it labelToPatchOffset(), and then writing:
  CodeLabel cl = masm.codeLabel(i);
  RelativeLink link(RelativeLink::CodeLabel);
  link.patchAtOffset = masm.labelToPatchOffset(cl.patchAt());
  link.targetOffset = cl.target().offset();
?  (Renaming 'src' to 'cl')

::: js/src/jit/arm/Assembler-arm.cpp
@@ +2876,5 @@
>  
>  }
>  
> +void
> +Assembler::retargetWithOffset(size_t baseOffset, const LabelBase* label, LabelBase* target)

I'm not too familiar with how ARM represents jumps, but I'm assuming from Jakob's comment that he's checked out this function.

::: js/src/jit/x64/MacroAssembler-x64.cpp
@@ +80,5 @@
>  MacroAssemblerX64::finish()
>  {
>      if (!doubles_.empty())
>          masm.haltingAlign(sizeof(double));
> +    for (const Double& d: doubles_) {

This is probably still emerging, but I would think the style would be to have spaces on either side of the colon.

::: js/src/jit/x86-shared/Assembler-x86-shared.h
@@ +929,5 @@
> +        masm.linkJump(src, dst);
> +    }
> +    void use(CodeOffsetLabel* label) {
> +        label->use(currentOffset());
> +    }

So for a while use vs. bind was really confusing me, but the way I have come to understand it now is that CodeOffsetLabel isn't a Label at all, it's something more raw: it's an abstraction for a single offset in the code segment, but it doesn't say anything about what is at that offset: it might be a raw address in a table, an absolute address in an x86 immediate, maybe an x64 rip-relative offset.  It's only *CodeLabel* that tells you more: it says that 'patchAt' receives the raw pointer for 'target'.  I think these changes would help make this more clear:
 - rename CodeOffsetLabel to just CodeOffset
 - remove the above bind() and inline it into the one use in MacroAssembler-x64, where you can write a little comment on the helper function that does the masm.linkJump() explaining why linkJump will do the right thing (given that we're not linking a jump).
 - rename use() to bind() in CodeOffset and here so that "binding" a CodeOffset = setting its offset (which is symmetric to the meaning of "bind" for a Label).

::: js/src/jit/x86-shared/MacroAssembler-x86-shared.cpp
@@ +340,5 @@
> +        if (!AppendShiftedUses(d.uses, sizeBefore, &doubles_[index].uses))
> +            return false;
> +    }
> +
> +    for (const Float& f: other.floats_) {

These three loops look like they have the same structure; could you generalize AppendShiftedUses so it contains the whole outer for loop, passing it the two maps and two vectors?
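e.g. something in this direction (the signature and the lookup helper are guesses, not the actual patch):

  template <class Constant, class Map, class Vec>
  bool asmMergeConstants(size_t delta, const Vec& otherConstants, Map& map, Vec& constants)
  {
      for (const Constant& c : otherConstants) {
          size_t index;
          if (!getOrAddConstant(map, constants, c.value, &index))  // assumed lookup-or-insert helper
              return false;
          if (!AppendShiftedUses(c.uses, delta, &constants[index].uses))
              return false;
      }
      return true;
  }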

::: js/src/jit/x86/Assembler-x86.h
@@ +298,5 @@
>      }
>      void mov(Imm32 imm, const Operand& dest) {
>          movl(imm, dest);
>      }
> +    void mov(CodeOffsetLabel * label, Register dest) {

CodeOffsetLabel*
Attachment #8690777 - Flags: review?(luke) → review+
(Assignee)

Comment 102

3 years ago
Same patch, plus two fixes for ARM (when the assembler buffer is OOMing, it creates a BufferOffset with an offset of 0x80000000, which was used as the sentinel value for undefined offsets in CodeOffsetLabel -- I changed the sentinel value in COL to size_t(-1)) and ARM64 (which used CodeLabel).

I am now fighting with a compilation error on Windows and leaks detected by ASAN, only in the browser (no leaks are detected in the shell).

- For the windows compilation error, I have no clue:
c:\builds\moz2_slave\try-w32-0000000000000000000000\build\src\obj-firefox\dist\include\js/Vector.h(62) : error C2248: 'mozilla::VectorBase<T,0,AllocPolicy,js::Vector<T,0,AllocPolicy>>::VectorBase' : cannot access private member declared in class 'mozilla::VectorBase<T,0,AllocPolicy,js::Vector<T,0,AllocPolicy>>'
This diagnostic occurred in the compiler generated function 'js::Vector<js::jit::CodeOffsetLabel,0,js::SystemAllocPolicy>::Vector(const js::Vector<js::jit::CodeOffsetLabel,0,js::SystemAllocPolicy> &)'

MSVC doesn't give more hints (e.g. the file or the line which called it), so I have to figure this out myself. Probably setting up a Windows machine tomorrow for this.

- For the ASAN leak, it's probably due to the fact that masms are allocated in FunctionCompileResults, which are lifo-allocated; the masms have a few SystemAllocPolicy vectors attached, though, which might be leaking when we clear the lifo. Investigating.
Attachment #8690777 - Attachment is obsolete: true
Attachment #8690921 - Flags: review?(luke)
(Assignee)

Comment 103

3 years ago
Comment on attachment 8690921 [details] [diff] [review]
5. Merge macroassemblers

Oops, mid-air collision. Carrying forward the r+, as it's really only trivial changes. I will address your nits and investigate my issues with Windows and the leaks. I'll repost a patch here with all the changes (for getting it to compile on Windows and fixing the leak) and will ask for another review if the patch isn't trivial.
Attachment #8690921 - Flags: review?(luke) → review+
(In reply to Luke Wagner [:luke] from comment #101)
> > +void
> > +Assembler::retargetWithOffset(size_t baseOffset, const LabelBase* label, LabelBase* target)
> 
> I'm not too familiar with how ARM represents jumps, but I'm assuming from
> Jakob's comment that he's checked out this function.

The function is fine in the sense that BOffImm will MOZ_CRASH if the offset is out of range, even in release builds. So there is no security issue.

However, ARM branches only have a 32 MB range, and we are not doing anything to deal with that other than crashing if the code gets too big. It is much easier to hit the 32 MB limit on ARM than the corresponding 2 GB limit on x86 / x86-64.
(In reply to Jakob Stoklund Olesen [:jolesen] from comment #104)
You're right, it's a very real concern; this has been on the "todo" list for ARM for a while.  So is the expected fix to insert some space between functions (1 unit per call) so that, if call targets are out of range, you can reliably create a thunk in the space and call that?  I wasn't clear: will your ARM64 patch fix this or would it be additional work?
(In reply to Luke Wagner [:luke] from comment #105)
> (In reply to Jakob Stoklund Olesen [:jolesen] from comment #104)
> You're right it's a very real concern; this has been on the "todo" list for
> ARM for a while.  So is the expected fix to insert some space between
> functions (1 unit per call) so that, if call targets are out of range, you
> can reliably create a thunk in the space and call that?

That would be a plausible solution. I think some of this is already implemented for ARM64 with emitExtendedJumpTable(), but there doesn't seem to be similar code in the ARM assembler. It would be good to have a shared strategy for handling these things instead of ad hoc per-target solutions.

I don't yet have a good understanding of how linking works for the JIT compilers and/or asm.js, so I could be missing something.

> I wasn't clear: will your ARM64 patch fix this or would it be additional work?

No, that would be additional work. My changes to the ARM64 assembler deal with forward branches to labels that have not been bound yet. This is for generating normal control flow while assembling a function. It doesn't really apply to linking / concatenating assembler buffers after calling finish().

My ARM64 patch is mostly target-independent. It would be easy to hook up for ARM or MIPS.
(In reply to Jakob Stoklund Olesen [:jolesen] from comment #106)
> It would be good to have a shared
> strategy for handling these things instead of ad hoc per-target solutions.
> I don't yet have a good understanding of how linking works for the JIT
> compilers and/or asm.js, so I could be missing something.

Right now it is done in an ad hoc, asm.js-specific way:
  http://hg.mozilla.org/mozilla-central/file/tip/js/src/asmjs/AsmJSValidate.cpp#l1594
However, bbouvier's patches here add the notion of merging one masm into another so if we added a general way to do this within a single MacroAssembler, we could do the Right Thing when merging.
(Assignee)

Comment 108

3 years ago
Posted patch folded.patchSplinter Review
Flags: needinfo?(luke)
(Assignee)

Comment 110

3 years ago
Finally pushed. We agreed with Luke that the CodeOffsetLabel cleanup could be done in a subsequent bug. I think we also need to address the ARM short-branch issue.
Flags: needinfo?(luke)
Keywords: leave-open

Comment 111

3 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/55ab2a060b85
https://hg.mozilla.org/mozilla-central/rev/4c1c5106ea3f
https://hg.mozilla.org/mozilla-central/rev/a813fc2595b9
Status: REOPENED → RESOLVED
Last Resolved: 3 years ago
status-firefox45: --- → fixed
Resolution: --- → FIXED
Depends on: 1229258
Blocks: 1229396