Closed Bug 1292723 Opened 6 years ago Closed 5 years ago

Baldr: implement WebAssembly.validate


(Core :: JavaScript Engine, defect)

Not set



Tracking Status
firefox51 --- fixed


(Reporter: luke, Assigned: bbouvier)




(2 files, 5 obsolete files)


As discussed, there are 2 ways to do this:
- a dump way: try to compile and throw away the compilation.
- abstract module validation somehow, to avoid code duplication, and have a real ValidateModule function that just goes through the module by validating it.

I'll go for the latter, since there is already code duplication between WasmCompile.cpp and WasmBinaryToAST.cpp.

My assumptions are the following. If they don't seem reasonable or could be relaxed, please let me know:
- the main use case is WasmCompile.cpp, so this one has to be as fast as now.
- it is acceptable that WasmBinaryToAST be a bit slow.
- we don't want to have vtables, so no virtual inheritance.

Some early experiments show that an Iterator with a Template pattern could be enough and not too heavy: the user of the module iterator would provide a structure that owns data and provides a few functions. The module iterator would call these functions whenever needed (whereas the function iterator provides functions that get called by the user). Let's see how it goes.
Assignee: nobody → bbouvier
(thinking about it, it might just be a Template pattern, nothing dealing with an Iterator)
Attached patch wip.patch (obsolete) — Splinter Review
Sneak peak at the patch (doesn't compile yet, so no tested either).
Attached patch wip.patch (obsolete) — Splinter Review
Another wip (compiles but doesn't pass tests at all, yet).

So this gives a raw idea of the usage of the so-called ModuleIter (name will change later), with the Template pattern.

It also exposes the weaknesses of the Template pattern: whenever a user needs to be called at a particular location in the ModuleIter code, other users need to implement the called function too, although they might not make use of it. Fortunately, this happens only once, with the initialization of the ModuleGenerator that WasmCompile needs to do, but not WasmBinaryToAST. (it really is a distorted form of polymorphism)

In this current state, there are a lot of back and forth between the user and the ModuleIter itself; I hope to be able to reduce this by storing more validation data into the ModuleIter (like numSigs, the sigs themselves), and having "finish()" functions where the users would steal the data collected by the ModuleIter.

There are two users:
- WasmCompile, which, as the main user of this API, can just steal and Move the data to its own fields.
- WasmBinaryToAST, which has to convert the structs passed by the ModuleIter to its own format (and lifetime). Or does it have to convert, really?

I also think a few decoding utilities that only use the Decoder could be put in the WasmBinary.cpp file, instead (following the idea of Dan in the 0xC patch).

Anyways, this is quite far from ready for review, but feedback on the general design would be appreciated.
Attachment #8787293 - Attachment is obsolete: true
Flags: needinfo?(luke)
Oh and also, another strength of this design: a user *can not* forget a section, or mess with the order of sections, as they all need to be implemented. The current patch (and Dan's patch too) shows that WasmBinaryToAST was missing a few sections.
> In this current state, there are a lot of back and forth between the user 
> and the ModuleIter itself; I > hope to be able to reduce this by storing 
> more validation data into the ModuleIter (like numSigs, the sigs themselves),
> and having "finish()" functions where the users would steal the data collected by
> the ModuleIter.

I think you've hit upon a very good point.  I kinda hate the visitor pattern for this reason :)

What if we took this all the way to the limit and took out all of the user->iter->user callbacks and instead designed factored out a simple function (perhaps into a new WasmDecodeModule.h):
  UniquePtr<const ModuleContext> DecodeModuleContext(Decoder& d)
Then we just kill ModuleGeneratorData (which has always been an awkward "what exactly are you?" sort of struct) and have 'const ModuleContext&' be what gets shared between the validator and compiler threads.  That seems like a really clean design and probably I should've done it all this way to begin with.

(To finish the job, I think we'd also want a second function in WasmDecodeModule.h, DecodeModuleEnd(), which decoded the data and names sections and returned those vectors.  I think it's fine for each client to have their own version of DecodeCodeSection(); we'll probably have a streaming version before long anyhow.)

So this would involve a moderate degree of refactoring of ModuleGenerator; I'd be fine to just land the minimal implementation of WebAssembly.validate now (that uses wasm::Compile) and work on the "right" fix a little later.

Thoughts Dan?  I know you did some factoring for 0xc and probably we should avoid landing anything that would disrupt your 0xc patch.
Flags: needinfo?(luke) → needinfo?(sunfish)
I've also found it a little unclear what exactly ModuleGeneratorData wants to be, so that part sounds good :-). How about "ModuleManifest" for the everything-before-code part of a module, and "ModuleTail" for the everything-after-code part?

It may also make sense to rename ExprIter. CodeIter? 

I'm not opposed to visitor or template method patterns in general, but I agree with Luke's suggestion here; in the future it's possible we may want consumers that don't decode the entire module, so sticking to simple utility functions provides a simple path for it to evolve in that direction. We could split up the utility functions into smaller parts, and then consumers could just call the parts they need (and explicitly skip over the parts they don't).

The 0xc patches are still waiting on a few language design questions, and then there's the matter of coordinating to get the 0xc demo bits up, but hopefully it won't be long before we can land them. It would be easier to land them before this refactoring, but I can be flexible if you want to move more quickly.
Flags: needinfo?(sunfish)
(Like "Manifest" and "Tail".)
Attached patch wip-dumb.patch (obsolete) — Splinter Review
Here's the dummy way to do it: compile the module and throw away the compilation! Tests pending.
Attached patch validate.patchSplinter Review
Here's the "dump way" to implement WebAssembly.validate, and basic.js changed to use it where we can, as an example.
Attachment #8788552 - Attachment is obsolete: true
Attachment #8788847 - Flags: review?(luke)
Comment on attachment 8788847 [details] [diff] [review]

Review of attachment 8788847 [details] [diff] [review]:

::: js/src/asmjs/WasmJS.cpp
@@ +1304,5 @@
> +    SharedModule module = Compile(*bytecode, compileArgs, &error);
> +    if (!module) {
> +        if (error) {
> +            JS_ReportErrorFlagsAndNumber(cx, JSREPORT_WARNING, GetErrorMessage,
> +                                         nullptr, JSMSG_WASM_DECODE_FAIL, error.get());

This ends up breaking a SM-wide invariant: there shall be no pending exception if you have not returned false to signal failure.  (Various pinch-points on non-failure paths like CheckForInterrupt() assert this.)

It'd also be nice to have a comment explaining why we *do* want to report OOM (good catch, btw!).  Maybe you could express the logic as:

  bool validated = !!Compile(...);

  // If the reason for validation failure was OOM (signaled by null error message), 
  // report out-of-memory so that validate's return value is always correct.
  if (!validated && !error)
     ... report and return false

@@ +1313,5 @@
> +        }
> +    }
> +
> +    // Destructor of SharedModule (aka RefPtr<Module>) will automatically
> +    // Release() the module.

nit: we generally rely on this sort of behavior, so I'm not sure this comment is necessary

::: js/src/jit-test/lib/wasm.js
@@ +9,5 @@
> +
> +function wasmFailValidateText(str, errorType, pattern) {
> +    let binary = wasmTextToBinary(str, 'new-format');
> +    assertEq(WebAssembly.validate(binary), false);
> +    assertErrorMessage(() => new WebAssembly.Module(binary), errorType, pattern);

great idea to check that these two are in sync!
Attachment #8788847 - Flags: review?(luke) → review+
Oops, I totally missed that you were reporting a warning, not an error, above.  n/m the bit about breaking the SM invariant :)
Pushed by
Implement a basic WebAssembly.validate; r=luke
Attached patch test.jsSplinter Review
Just realized I've forgotten to add these tests to the initial patch.
Attachment #8794739 - Flags: review?(luke)
Attachment #8794739 - Flags: review?(luke) → review+
(In reply to Ryan VanderMeulen [:RyanVM] from comment #18)

Unfortunately, that didn't work as well as I'd hoped it would (was attempting to land it so that bug 1305197 would apply more easily). Backed out for failures.
Attachment #8787699 - Attachment is obsolete: true
Attached patch 2.refactor-imports-section.patch (obsolete) — Splinter Review
So this refactoring gets a bit goofy here: we're overly specialized for WasmCompile in the factored out version, since we're using its data structures (SigWithIdVector, GlobalDescVector, TableDescVector). As a matter of fact, WasmBinaryToAST ends up creating all these structures on the stack and use them just to translate to its own data structures. WasmValidate would have to do the same.

Ideally, DecodeImportSection would return a vector of more abstract imports (ImportDescriptor) that would be a union of Function + Table + Memory + Global, and then each of the users (WasmCompile/WasmBinaryToAST) would reconstruct the data structures they actually need. However, this might have a cost for WasmCompile, which we want to be real super fast. In particular, I'm thinking about the cost of allocating an array of these abstract import descriptors, which would be released just after use.

What do you think, Luke?
Attachment #8809054 - Flags: feedback?(luke)
Comment on attachment 8809054 [details] [diff] [review]

Review of attachment 8809054 [details] [diff] [review]:

Actually, I think this is great.  It's true that there are a few dead fields like TableDesc::globalDataOffset or the SigIdDesc, but I think they're fairly harmless.

In fact, I was thinking it'd be nice to re-cast ModuleGeneratorData as ModuleEnvironment with the idea being that the process of compiling a module is:
 1. build up ModuleEnvironment incrementally by decoding everything up to the code section
 2. fork out N function compilation tasks all using the ModuleEnvironment immutably (modulo asm.js hacks)

So to me that suggests moving ModuleEnvironment to WasmTypes.h and just having one top-level DecodeModuleEnvironment in WasmBinaryFormat.h that is called by validation, compilation, and binary->text.
Attachment #8809054 - Flags: feedback?(luke) → feedback+
Thanks! moving the refactoring patches to bug 1316635 and the impl of better validation to bug 1316634, to avoid tracking bugs scattered over several releases. Marking this one as fixed then.
Closed: 5 years ago
Resolution: --- → FIXED
Attachment #8809052 - Attachment is obsolete: true
Attachment #8809054 - Attachment is obsolete: true
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open
You need to log in before you can comment on or make changes to this bug.