Closed Bug 1317319 Opened 3 years ago Closed 3 years ago

wasm: Factor out section decoding, Part 2

Categories

(Core :: JavaScript Engine: JIT, defect, P1)

defect

Tracking

()

RESOLVED FIXED
Tracking Status
firefox52 --- affected

People

(Reporter: bbouvier, Assigned: bbouvier)

References

Details

Attachments

(11 files, 6 obsolete files)

17.41 KB, patch
luke
: review+
Details | Diff | Splinter Review
13.63 KB, patch
luke
: review+
Details | Diff | Splinter Review
22.91 KB, patch
luke
: review+
Details | Diff | Splinter Review
31.50 KB, patch
luke
: review+
Details | Diff | Splinter Review
22.16 KB, patch
luke
: review+
Details | Diff | Splinter Review
21.49 KB, patch
luke
: review+
Details | Diff | Splinter Review
2.85 KB, patch
luke
: review+
Details | Diff | Splinter Review
4.38 KB, patch
bbouvier
: review+
RyanVM
: checkin+
Details | Diff | Splinter Review
39.19 KB, patch
bbouvier
: review+
RyanVM
: checkin+
Details | Diff | Splinter Review
79.02 KB, patch
bbouvier
: review+
RyanVM
: checkin+
Details | Diff | Splinter Review
1.81 KB, patch
bbouvier
: review+
RyanVM
: checkin+
Details | Diff | Splinter Review
No description provided.
Attached patch 1.table.patchSplinter Review
Table section!
Attachment #8811358 - Flags: review?(luke)
Attached patch 2.global.patchSplinter Review
Global section!
Attachment #8811359 - Flags: review?(luke)
Attached patch 3.export.patchSplinter Review
Exports section!
Attachment #8811360 - Flags: review?(luke)
Attached patch 4.start.patchSplinter Review
Start section! Funnily enough, I've switched strategies at this point, and figured out that the AstDecodeContext would eventually want to have its own ModuleEnvironment too, so I could start hoisting a few of its attributes there, until we get a proper ModuleEnvironment class. This ends up simplifying even more decoding sections in WasmCompile.cpp \o/
Attachment #8811361 - Flags: review?(luke)
Attached patch 5.elem.patchSplinter Review
Elem section!

Eventually, the ModuleGenerator could be created just before the CodeSection is decoded, hiding decoding of everything above the CodeSection under a big function. Does this sound reasonable?
Attachment #8811362 - Flags: review?(luke)
Attached patch 6.order.patchSplinter Review
This is really just a code motion patch, moving all the sections in the order they're supposed to appear, in a wasm Module. There's a nice symmetry between the three files Wasm{Compile,BinaryFormat,BinaryToAST} after this.
Attachment #8811364 - Flags: review?(luke)
I don't think the CodeSection or the head/bottom of function decoding should be shared, because the amount of shared code would be ridiculously low (and would involve passing many things by outparams for low reuse).

That being said, this would prevent the small issues I've found here. Thoughts?
Attachment #8811365 - Flags: review?(luke)
Attached patch 8.faster-validation.patch (obsolete) — Splinter Review
This one I find pretty funny :) Using BinaryToAST instead of Compile in the Validation algorithm has two benefits:
- it helped find issues in the function validation routines of WasmBinaryToAST.cpp, and one testing issue too.
- I could reduce the validation time of AngryBots from 964ms on my machine to 308ms \o/ Probably a Validation algorithm without even constructing the AST could be faster...

Next patches to come:
- There might be some state duplication in AstModule/AstDecodeContext (redundant signatures and such). We could probably remove these.
- ModuleEnvironment refactoring.
- (in the blocked bug): even faster validation algorithm
Attachment #8811366 - Flags: review?(luke)
Comment on attachment 8811358 [details] [diff] [review]
1.table.patch

Review of attachment 8811358 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/wasm/WasmBinaryToAST.cpp
@@ +1574,5 @@
>      AstName name;
>      if (!GenerateName(c, AstName(u"table"), c.module().tables().length(), &name))
>          return false;
>  
> +    return c.module().addTable(name, (*tables)[0].limits);

Since there's no harm, what about removing the dependency on there being only 1 table here and putting this in a loop?
Attachment #8811358 - Flags: review?(luke) → review+
Comment on attachment 8811359 [details] [diff] [review]
2.global.patch

Review of attachment 8811359 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/wasm/WasmBinaryToAST.cpp
@@ +1647,5 @@
> +        AstExpr* init = nullptr;
> +        if (global.isConstant()) {
> +            init = new(c.lifo) AstConst(global.constantValue());
> +        } else {
> +            init = ToAstExpr(c, global.initExpr());

Maybe use ternary operator here (or keep if/else but remove { })

@@ +1655,3 @@
>  
> +        auto* g = new(c.lifo) AstGlobal(name, global.type(), global.isMutable(), Some(init));
> +        if (!c.module().append(g))

pre-existing: missing null check
Attachment #8811359 - Flags: review?(luke) → review+
Comment on attachment 8811360 [details] [diff] [review]
3.export.patch

Review of attachment 8811360 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/wasm/WasmBinaryToAST.cpp
@@ +1471,5 @@
>      for (size_t importIndex = 0; importIndex < imports.length(); importIndex++) {
>          const Import& import = imports[importIndex];
>  
> +        AstName moduleName = ToAstName(c, import.module.get());
> +        AstName fieldName = ToAstName(c, import.field.get());

pre-existing, I think these need a null check of the name (or to have ToAstName converted to a more traditional bool return + outparam).

::: js/src/wasm/WasmGenerator.h
@@ +160,5 @@
>      uint32_t numFuncs() const;
>  
>      // Exports:
>      MOZ_MUST_USE bool addFuncExport(UniqueChars fieldName, uint32_t funcIndex);
> +    MOZ_MUST_USE bool setExports(ExportVector&& exports);

Could you remove addFuncExport() too by converting AsmJS.cpp to build an ExportVector?
Attachment #8811360 - Flags: review?(luke) → review+
Comment on attachment 8811361 [details] [diff] [review]
4.start.patch

Review of attachment 8811361 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/wasm/WasmBinaryFormat.cpp
@@ +397,1 @@
>      if (numFuncs > MaxFuncs)

One thing I noticed in another patch is we're a bit inconsistent with whether MaxT includes imports or not.  Here we include imports, for globals, iirc, we don't.  Probably it would be good to normalize them to "defs + imports" as we do here.

@@ +622,5 @@
>      return true;
>  }
>  
>  bool
> +wasm::DecodeStartSection(Decoder& d, size_t numFuncs, const SigWithIdPtrVector& funcSigs,

Given funcSigs.length(), do you still need numFuncs?

::: js/src/wasm/WasmBinaryToAST.cpp
@@ -94,2 @@
>      AstModule& module_;
> -    AstIndexVector funcDefSigs_;

Ah, great to see this converted to the function index space to match everything else!

@@ +1457,5 @@
> +static size_t
> +GetSigIndex(const SigWithIdVector& sigs, const SigWithId* sig)
> +{
> +    for (size_t i = 0; i < sigs.length(); i++) {
> +        if (&sigs[i] == sig)

This function is a bit hazardous since it only works for SigWithId args that happen to fall within 'sigs'.  How about making an AstDecodeContext member function (later to be hoisted into ModuleEnvironment):

  uint32_t funcIndexToSigIndex(uint32_t funcIndex) {
    return sigs_.begin() - funcSigs_[funcIndex];
  }

which also avoids O(n^2) search behavior for n-function modules :)
Attachment #8811361 - Flags: review?(luke) → review+
Comment on attachment 8811362 [details] [diff] [review]
5.elem.patch

Review of attachment 8811362 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/wasm/WasmGenerator.cpp
@@ +852,5 @@
> +    if (shared_->tables[0].external)
> +        return;
> +
> +    bool foundImport = false;
> +    for (size_t i = 0; !foundImport && i < elemSegments_.length(); i++) {

How about generalizing this to a loop that uses seg.tableIndex, instead of hard-coding tables[0].  Then you wouldn't have foundImport, but instead, when you find an import, you set tables[tableIndex].external = true and break the inner loop and the outer loop skips elem segments whose table is already external.

The side benefit is that the caller could call setElemSegments() unconditionally (instead of only if segments.length() > 0) because the empty case just works (once you remove the tables.length() == 1 assert).
Attachment #8811362 - Flags: review?(luke) → review+
Comment on attachment 8811364 [details] [diff] [review]
6.order.patch

Review of attachment 8811364 [details] [diff] [review]:
-----------------------------------------------------------------

Yeah, I like the consistent ordering.
Attachment #8811364 - Flags: review?(luke) → review+
Comment on attachment 8811365 [details] [diff] [review]
7.tweak-astdecodefunctionbody.patch

Review of attachment 8811365 [details] [diff] [review]:
-----------------------------------------------------------------

Good catch.  I think you're right that it's not worth it and we should just take on the small amount of code duplication.

::: js/src/wasm/WasmBinaryToAST.cpp
@@ +1439,1 @@
>      const SigWithId* sig = c.funcSigs()[funcIndex];

uber-nit: I feel like declaration-of-vector-about-to-be-initialized should bind "tighter" (and thus go below, with separating \n) the init of sig.
Attachment #8811365 - Flags: review?(luke) → review+
Comment on attachment 8811366 [details] [diff] [review]
8.faster-validation.patch

Review of attachment 8811366 [details] [diff] [review]:
-----------------------------------------------------------------

I definitely like the idea of testing BinaryToAST more thoroughly; the only thing that gives me pause is thus far BinaryToAST hasn't been directly exposed to content and I worry this could open up new security surface that we haven't hardened.  Since you're saying that the goal is to do the even faster version that just decodes directly without building an AST, I'd rather hold off for that.  Could you update this patch with just the fixes to WasmBinaryToAST.cpp?
Attachment #8811366 - Flags: review?(luke)
Great job on all these patches; this is a huuuge improvement.
Oh, one other comment: it'd be good to move the various methods of types that were moved from WasmModule.h to WasmTypes.h from WasmModule.cpp to WasmTypes.cpp (e.g., Import::serializedSize).
Thanks for the reviews!

(In reply to Luke Wagner [:luke] from comment #12)
> Comment on attachment 8811361 [details] [diff] [review]
> 4.start.patch
> 
> Review of attachment 8811361 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/wasm/WasmBinaryFormat.cpp
> @@ +397,1 @@
> >      if (numFuncs > MaxFuncs)
> 
> One thing I noticed in another patch is we're a bit inconsistent with
> whether MaxT includes imports or not.  Here we include imports, for globals,
> iirc, we don't.  Probably it would be good to normalize them to "defs +
> imports" as we do here.

I did a full audit of all the MaxT, and it doesn't seem there's this inconsistency for functions, unless I've missed it?

(In reply to Luke Wagner [:luke] from comment #18)
> Oh, one other comment: it'd be good to move the various methods of types
> that were moved from WasmModule.h to WasmTypes.h from WasmModule.cpp to
> WasmTypes.cpp (e.g., Import::serializedSize).
Done in each patch of this patch series that was moving stuff from WasmModule.h to WasmTypes.h; for Import (which has been moved in a former patch), I've put it in the code motion patch.
Keywords: leave-open
Pushed by bbouvier@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/75b417fb952f
Factor out Table section decoding; r=luke
https://hg.mozilla.org/integration/mozilla-inbound/rev/5b9544a7f1d8
Factor out Global section decoding; r=luke
https://hg.mozilla.org/integration/mozilla-inbound/rev/5d3cab095f63
Factor out Export section decoding; r=luke
https://hg.mozilla.org/integration/mozilla-inbound/rev/bcaa68d34a94
Factor out Start section decoding; r=luke
https://hg.mozilla.org/integration/mozilla-inbound/rev/04f447c49c46
Factor out Elem section decoding; r=luke
https://hg.mozilla.org/integration/mozilla-inbound/rev/954af429f89c
Move sections decoding around to follow their order in the binary; r=luke
https://hg.mozilla.org/integration/mozilla-inbound/rev/eb0e39566b63
Tweak AstDecodeFunctionBody; r=luke
Without using BinaryToAST, as suggested.
Attachment #8811366 - Attachment is obsolete: true
Attachment #8811685 - Flags: review?(luke)
(In reply to Benjamin Bouvier [:bbouvier] from comment #19)
> I did a full audit of all the MaxT, and it doesn't seem there's this
> inconsistency for functions, unless I've missed it?

Oh sorry, I was mostly referring to MaxGlobals (I had thought maybe there was another case, but I guess table/memory are limited to 1).  It's not a big deal; maybe in a followup patch?
Attachment #8811685 - Flags: review?(luke) → review+
Attached patch 9.rename-gendata-to-env.patch (obsolete) — Splinter Review
A simple renaming (ModuleGeneratorData -> ModuleEnvironment) and code motion from WasmGenerator.h -> WasmBinaryFormat.h. Also rearranges headers to still compile in non-unified mode.
Attachment #8811812 - Flags: review?(luke)
Attached patch 10.use-module-env.patch (obsolete) — Splinter Review
This is the interesting part: this uses the ModuleEnvironment and refactors sections decoding for, well, everything before the Code section. It makes many things simpler and cleaner: suppresses all the O(n) loops to find whether a table is external or not, splits the ModuleGenerator::init functions into variants for asm.js vs wasm, removes a few ModuleGenerator helpers, exposes fewer decoding routines, and overall it removes more code than it adds!

6 files changed, 312 insertions(+), 405 deletions(-)
Attachment #8811816 - Flags: review?(luke)
Attached patch 11.account-all-globals.patch (obsolete) — Splinter Review
Now I got what you meant by checking the MaxT, thanks for bearing with me! This patch takes all the globals into account when computing the max and makes global decoding even more parallel to function decoding.
Attachment #8811818 - Flags: review?(luke)
Comment on attachment 8811812 [details] [diff] [review]
9.rename-gendata-to-env.patch

Review of attachment 8811812 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/wasm/WasmBinaryFormat.h
@@ +629,5 @@
>  
>  MOZ_MUST_USE bool
>  DecodeLocalEntries(Decoder& d, ModuleKind kind, ValTypeVector* locals);
>  
> +// The ModuleEnvironment holds all the state shared between the

I'd start with a sentence: "ModuleEnvironment contains all the state necessary to validate, process or render functions. It is created by decoding all the sections before the wasm code section and then used immutably during.  When compiling a module using a ModuleGenerator, ".

@@ +633,5 @@
> +// The ModuleEnvironment holds all the state shared between the
> +// ModuleGenerator thread and background compile threads. The background
> +// threads are given a read-only view of the ModuleEnvironment and the
> +// ModuleGenerator is careful to initialize, and never subsequently mutate,
> +// any given datum before being read by a background thread. In particular,

These last two sentences can be simplified since I think now ModuleEnvironment isn't mutated by ModuleGenerator (for wasm).  There's also the asm.js special case, but I'd rather leave that for AsmJS.cpp.

::: js/src/wasm/WasmCompile.cpp
@@ +464,3 @@
>  {
>      Maybe<Limits> memory;
> +    if (!DecodeImportSection(d, env->sigs, &env->funcSigs, &env->globals, &env->tables, &memory,

Could all these &env->foo parameters be replaced by an 'env' argument (here and for most of these functions)?  (It's possible you'll be making these simplifications in a future patch, in which case, n/m)
Attachment #8811812 - Flags: review?(luke) → review+
Comment on attachment 8811816 [details] [diff] [review]
10.use-module-env.patch

Review of attachment 8811816 [details] [diff] [review]:
-----------------------------------------------------------------

Mmm, this patch makes me happy.  Almost r+, but I have one nit that I was hoping to see the next iteration of below.

::: js/src/wasm/WasmBinaryFormat.h
@@ +660,5 @@
> +    size_t numFuncs() const {
> +        return funcSigs.length();
> +    }
> +    size_t numFuncDefs() const {
> +        return funcSigs.length() - funcImportGlobalDataOffsets.length();

So these two functions are only meaningful for wasm (asm.js max-size-allocates these Vectors); to be like ModuleGenerator, could you MOZ_ASSERT(!isAsmJS())?  I wonder also if ModuleGenerator should remove its copies of these and simply have an env() method.

::: js/src/wasm/WasmBinaryToAST.cpp
@@ +1518,5 @@
>  /*****************************************************************************/
>  // wasm decoding and generation
>  
>  static bool
> +RebuildSignatures(AstDecodeContext& c)

I like the idea of having a common prefix.  "Rebuild" sounds a bit like something has been lost or corrupted.  To match the general Ast* naming scheme, what about AstCreate*?

::: js/src/wasm/WasmCompile.cpp
@@ +589,4 @@
>      ImportVector imports;
> +    ExportVector exports;
> +    Maybe<uint32_t> startFunc;
> +    ElemSegmentVector elemSegments;

How about moving all four of these into ModuleEnvironment?  They're just as much a part of the environment; the only reason they aren't already is that helper threads haven't needed them.

@@ +589,5 @@
>      ImportVector imports;
> +    ExportVector exports;
> +    Maybe<uint32_t> startFunc;
> +    ElemSegmentVector elemSegments;
> +    if (!DecodeEnvironment(d, env.get(), &imports, &exports, &startFunc, &elemSegments))

Decode*Module*Environment?

::: js/src/wasm/WasmGenerator.h
@@ +92,5 @@
>      MOZ_MUST_USE bool addFuncImport(const Sig& sig, uint32_t globalDataOffset);
>      MOZ_MUST_USE bool allocateGlobalBytes(uint32_t bytes, uint32_t align, uint32_t* globalDataOff);
>      MOZ_MUST_USE bool allocateGlobal(GlobalDesc* global);
>  
> +    MOZ_MUST_USE bool init(UniqueModuleEnvironment shared, const CompileArgs& args);

Could you rename shared/shared_ to env/env_ inside ModuleGenerator?  I initially just used "shared" to remind myself not to accidentally mutate it, but now that seems quite well established in our code structure.

@@ +105,5 @@
> +
> +    // Init function for wasm.
> +    MOZ_MUST_USE bool init(UniqueModuleEnvironment shared, ExportVector&& exports,
> +                           ElemSegmentVector&& elemSegments, Maybe<uint32_t> startFunc,
> +                           const CompileArgs& args);

I like the init() refactoring you've done in this patch; it was definitely hairy before.  I was thinking maybe a further cleanup would be to pass all these arguments to ModuleGenerator's constructor (going back to having one function and an optional maybeAsmJSMetadata parameter, since the ctor is just doing trivial construction), and have init() just be a trivial function that does "return isAsmJS() ? initAsmJS() : initWasm()" so that we can maintain the internal separation you've done.  This would allow symmetry between how and imports/exports are treated.  In AsmJS.cpp, I think you'll need to wrap the ModuleGenerator in a Maybe<> and then do an emplace() in ModuleValidator::init() once the AsmJSMetadata/ModuleEnvironment have been created.
Comment on attachment 8811818 [details] [diff] [review]
11.account-all-globals.patch

Review of attachment 8811818 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/wasm/WasmBinaryFormat.cpp
@@ +640,1 @@
>      if (numGlobals > MaxGlobals)

Oh dear, I just realized we should have a CheckedInt<uint32_t> to check for overflow here and with numFuncs in DecodeFunctionSection... and that we probably need to backport that fix to m-a which seems to have the bug :(
Attachment #8811818 - Flags: review?(luke) → review+
Attached patch 10.use-module-env.v2.patch (obsolete) — Splinter Review
Thanks for the reviews! I didn't understand the suggestion about the ModuleGenerator ctor vs init() functions; in particular, most of the work done under init() is fallible, so it could not be put under the ctor. I've taken inspiration from it, though, by having a single init(Env, CompileArgs, maybeAsmJSMetadata) function that calls into the asm.js/wasm variants, which is cleaner, thanks. Happy to modify further, if needed.

I've also made use of the ModuleEnvironment in the parameters of DecodeDataSection, which makes its signature shorter and more symmetric with the other decoding functions (the former version even had a dead parameter!).
Attachment #8811816 - Attachment is obsolete: true
Attachment #8811816 - Flags: review?(luke)
Attachment #8812181 - Flags: review?(luke)
Comment on attachment 8812181 [details] [diff] [review]
10.use-module-env.v2.patch

Review of attachment 8812181 [details] [diff] [review]:
-----------------------------------------------------------------

Beautiful, thanks!

::: js/src/wasm/WasmCompile.cpp
@@ +501,3 @@
>          return false;
>  
>      mg.setDataSegments(Move(dataSegments));

Since you've now generally removed most of the non-asm.js mutable operations on ModuleGenerator (which I really like), how about we finish the job by having MG::finish() take a DataSegmentVector&& and a NameInBytecodeVector&&.  Then we can remove MG::setDataSegments()/dataSegments_/setFuncNames() as well as this wrapper function, leaving module generation a little more functional.

::: js/src/wasm/WasmGenerator.cpp
@@ +201,5 @@
>  
>      if (!exportedFuncs_.init())
>          return false;
>  
> +    if (env_->isAsmJS() ? !initAsmJS(maybeAsmJSMetadata) : !initWasm())

Is there an ordering dependency between these calls and the succeeding code?  If not, I think it'd be nice to have 'return env_->isAsmJS() ? initAsmJS() : initWasm()' as the last line so you can logically think of it as the asm.js/wasm-specific tail.
Attachment #8812181 - Flags: review?(luke) → review+
Attached patch 8.patchSplinter Review
updated and rebased for checkin
Attachment #8811685 - Attachment is obsolete: true
Attachment #8812242 - Flags: review+
Attachment #8812242 - Flags: checkin?(luke)
Attached patch 9.patchSplinter Review
updated and rebased
Attachment #8811812 - Attachment is obsolete: true
Attachment #8812243 - Flags: review+
Attachment #8812243 - Flags: checkin?(luke)
Attached patch 10.patchSplinter Review
updated and rebased
Attachment #8812181 - Attachment is obsolete: true
Attachment #8812244 - Flags: review+
Attachment #8812244 - Flags: checkin?(luke)
Attached patch 11.patchSplinter Review
updated and rebased
Attachment #8811818 - Attachment is obsolete: true
Attachment #8812245 - Flags: review+
Attachment #8812245 - Flags: checkin?(luke)
Pushed by lwagner@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/6fb97e41a590
Fix BinaryToAST function validation; r=luke
https://hg.mozilla.org/integration/mozilla-inbound/rev/e37f3ccc1765
Rename ModuleGeneratorData to ModuleEnvironment; r=luke
https://hg.mozilla.org/integration/mozilla-inbound/rev/8b6fcd348d49
Embed most section decoding under DecodeModuleEnvironment; r=luke
https://hg.mozilla.org/integration/mozilla-inbound/rev/b937c221d538
Account imported globals when checking against the maximum number of globals; r=luke
Attachment #8812242 - Flags: checkin?(luke) → checkin+
Attachment #8812243 - Flags: checkin?(luke) → checkin+
Attachment #8812244 - Flags: checkin?(luke) → checkin+
Attachment #8812245 - Flags: checkin?(luke) → checkin+
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Keywords: leave-open
Depends on: 1326452
Depends on: 1455703
You need to log in before you can comment on or make changes to this bug.