Closed Bug 1234985 Opened 10 years ago Closed 10 years ago

BaldrMonkey: land initial wasm compilation and testing functions

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla47
Tracking Status
firefox47 --- fixed

People

(Reporter: luke, Assigned: luke)

References

Details

Attachments

(9 files, 6 obsolete files)

2.97 KB, patch
sfink
: review+
Details | Diff | Splinter Review
27.83 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
9.47 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
19.87 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
39.12 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
1.47 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
1.84 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
70.56 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
2.91 KB, patch
luke
: review+
Details | Diff | Splinter Review
This bug is for introducing a path to compile wasm, a shell testing function 'wasmEval(bytecode)' (which compiles an ArrayBuffer of wasm bytecode), and a shell 'wasmTextToBinary(text)' (which encodes the given wasm text format into the wasm binary format). Together this will allow writing unit tests in the wasm text format a la `wasmEval(wasmTextToBinary('...'))` while also allowing direct fuzzing of the binary format via wasmEval. Initially, the binary format will simply be that of the current asm.js bytecode (introduced by optimizations in bug 1157624, bug 1181612), but it will evolve to match the binary format being defined in the wasm spec (see https://github.com/WebAssembly/design/issues/497). Also, initially the text format will be the temporary s-expression syntax (https://github.com/WebAssembly/spec/tree/master/ml-proto#s-expression-syntax), but it will evolve to match the spec-defined text format (https://github.com/WebAssembly/design/blob/master/TextFormat.md#official-text-format). This bug will just establish the pipeline (text->binary->code) and iterate in future bugs.
Attached patch WIP (obsolete) — Splinter Review
Attached patch WIP 2 (obsolete) — Splinter Review
Attachment #8701694 - Attachment is obsolete: true
Depends on: 1237508
Depends on: 1238195
Depends on: 1239177
Attachment #8708647 - Flags: review?(sphink)
Attached patch tweak-generatorSplinter Review
This patch changes the interface between MG and clients to make things smoother for the baldr patch.
Attachment #8708649 - Flags: review?(bbouvier)
Attached patch baldr (WIP 3) (obsolete) — Splinter Review
Building on bug 1239177, this patch adds the signature table so we can actually start validating function bodies. But to make more progress on wasm function bodies, we'll need stmt-expr unification.
Attachment #8703225 - Attachment is obsolete: true
Attached patch baldr (WIP 3) (obsolete) — Splinter Review
Attachment #8708650 - Attachment is obsolete: true
Comment on attachment 8708651 [details] [diff] [review] baldr (WIP 3) Review of attachment 8708651 [details] [diff] [review]: ----------------------------------------------------------------- Random driveby review comments: ::: js/src/asmjs/WasmBinary.h @@ +396,5 @@ > > + template <class T> > + MOZ_WARN_UNUSED_RESULT > + bool writeEnum(T v, size_t* offset) { > + return write(uint32_t(v), offset); For added safety, this could static_assert mozilla::IsEnum<T>::value and sizeof(T) <= sizeof(uint32_t). ::: js/src/asmjs/WasmText.cpp @@ +191,5 @@ > + SigVector sigs_; > + SigMap sigMap_; > + > + public: > + WasmAstModule(LifoAlloc& lifo) explicit @@ +657,5 @@ > + return nullptr; > + } > + > + return exp; > + Stray whitespace.
For now, the ops are serialized as a simple uint8. This patch hoists the serialization functions to wasm::Encoder/Decoder so they can be changed later.
Attachment #8709265 - Flags: review?(bbouvier)
I realized neither of these methods belonged in the wasm::Module and they can be pushed into the AsmJSModule now that it derives wasm::Module.
Attachment #8709266 - Flags: review?(bbouvier)
Attachment #8708647 - Flags: review?(sphink) → review+
(In reply to Luke Wagner [:luke] from comment #0) > This bug is for introducing a path to compile wasm, a shell testing function > 'wasmEval(bytecode)' (which compiles an ArrayBuffer of wasm bytecode), and a > shell 'wasmTextToBinary(text)' (which encodes the given wasm text format > into the wasm binary format). Together this will allow writing unit tests > in the wasm text format a la `wasmEval(wasmTextToBinary('...'))` while also > allowing direct fuzzing of the binary format via wasmEval. Any chance I could convince you to name those wasm.eval and wasm.textToBinary? The shell has an insane number of functions these days, making help() mostly useless, and I've started trying to separate things out into module-ish things (see for example shell/OSObject.cpp, which defines os.getenv, os.system, etc. Together with aliases for the old names, but wasm wouldn't need that part.) I still need to fix up the help stuff a little for this (these not-modules should have help text, and the aliases should inherit help text from the "real" names). But that shouldn't block anything.
(In reply to Steve Fink [:sfink, :s:] from comment #10) Actually, you'll get your wish before long: what we'll probably standardize is a new 'wasm' or 'WASM' namespace object. 'wasmEval' is a temporary thing (and note that the tests are mostly not calling it directly so it'll be easy to switch).
Comment on attachment 8708649 [details] [diff] [review] tweak-generator Review of attachment 8708649 [details] [diff] [review]: ----------------------------------------------------------------- I see where this is going. Thanks for the patch. ::: js/src/asmjs/WasmBinary.h @@ +377,5 @@ > typedef Vector<uint8_t, 0, SystemAllocPolicy> Bytecode; > typedef UniquePtr<Bytecode> UniqueBytecode; > > // The Encoder class recycles (through its constructor) or creates a new Bytecode (through its > // init() method). Its Bytecode is released when it's done building the wasm IR in finish(). This comment needs an update. @@ +392,4 @@ > } > > public: > + Encoder(Bytecode& bytecode) : bytecode_(bytecode) {} Please MOZ_ASSERT(empty()) in the ctor. ::: js/src/asmjs/WasmGenerator.cpp @@ +108,2 @@ > { > + nit: blank line @@ +126,2 @@ > shared_ = Move(shared); > + MOZ_ASSERT(shared_->importSigs.length() == shared_->importGlobalDataOffsets.length()); This hasn't been updated to the latest shared_.imports vector, right? @@ +313,5 @@ > > void > ModuleGenerator::initSig(uint32_t sigIndex, Sig&& sig) > { > + MOZ_ASSERT(sigIndex == numSigs_); Perhaps also MOZ_ASSERT that we're in AsmJS mode? @@ +318,3 @@ > MOZ_ASSERT(shared_->sigs[sigIndex] == Sig()); > + > + numSigs_++; Another way to keep numSigs_ (resp. numFuncSigs_) in sync would be to just take out the arrays' lengths in finishFuncDefs. I actually prefer your way, as we can assert order. @@ +330,5 @@ > > bool > +ModuleGenerator::initFuncSig(uint32_t funcIndex, uint32_t sigIndex) > +{ > + MOZ_ASSERT(funcIndex == numFuncSigs_); Also can assert that we're in AsmJS mode.
Attachment #8708649 - Flags: review?(bbouvier) → review+
Comment on attachment 8709266 [details] [diff] [review] rm mutedErrors and displayURL Review of attachment 8709266 [details] [diff] [review]: ----------------------------------------------------------------- Very nice. ::: js/src/asmjs/WasmGenerator.cpp @@ -602,2 @@ > CacheableChars filename, > - CacheableTwoByteChars displayURL, Unused parameter is very unused :-)
Attachment #8709266 - Flags: review?(bbouvier) → review+
Comment on attachment 8709265 [details] [diff] [review] hoist Expr encoder/decoder ops Review of attachment 8709265 [details] [diff] [review]: ----------------------------------------------------------------- Cheers! ::: js/src/asmjs/AsmJS.cpp @@ +2787,5 @@ > return encoder().writeU8(uint8_t(Expr::Unreachable), offset); > } > MOZ_WARN_UNUSED_RESULT > bool tempOp(size_t* offset) { > + return encoder().writeExpr(Expr::Unreachable, offset); This can be unified with tempU8(); and temp32() can use tempOp() as well. ::: js/src/asmjs/WasmBinary.h @@ +326,5 @@ > + // Variable-length is somewhat annoying at the moment due to the > + // pre-order encoding and back-patching; let's see if we switch to > + // post-order first. > + static_assert(mozilla::IsEnum<T>::value, "is an enum"); > + MOZ_ASSERT(uint64_t(v) < UINT8_MAX); static_assert(sizeof(T) == sizeof(uint8_t)); ? @@ +334,5 @@ > + template <class T> > + void patchEnum(size_t pc, T v) { > + // See writeEnum comment. > + static_assert(mozilla::IsEnum<T>::value, "is an enum"); > + MOZ_ASSERT(uint64_t(v) < UINT8_MAX); ditto @@ +457,5 @@ > > + template <class T> > + MOZ_WARN_UNUSED_RESULT bool > + readEnum(T* out) { > + // See Encoder::writeEnum. The static_assert would be nice here too. @@ +476,5 @@ > } > > + template <class T> > + T uncheckedReadEnum() { > + // See Encoder::writeEnum. And here as well.
Attachment #8709265 - Flags: review?(bbouvier) → review+
(In reply to Benjamin Bouvier [:bbouvier] from comment #14) > > bool tempOp(size_t* offset) { > > + return encoder().writeExpr(Expr::Unreachable, offset); > > This can be unified with tempU8(); and temp32() can use tempOp() as well. Well my intention here is that, while *U8/32 will always write exactly those sizes (and, ultimately, be eliminated since I think practically all ints will be VarU32 eventually), Op will become variable length one day. > static_assert(sizeof(T) == sizeof(uint8_t)); ? I would've, but that assert won't hold in the Baldr patch where ValType is passed (b/c of that GCC bug preventing a uint8 enum class which I was hoping we could resolve by making configure fail...).
Whiteboard: [leave open]
Attached patch optional-namesSplinter Review
This patch makes names optional, auto-generating wasm names like "wasm-function[0]" in their absence.
Attachment #8710243 - Flags: review?(bbouvier)
Attached patch var-len-litSplinter Review
This trivial patch switches integer immediates of I32Const to be VarU32 from U32. Later I'd actually like to introduce VarS32 so that small negative integers use a small number of bytes instead of 5 with VarU32, but this is fine for now.
Attachment #8710245 - Flags: review?(bbouvier)
Attached patch baldr (obsolete) — Splinter Review
Ok, I think I have enough of a "Hello, World" working to land and then start iterating in small patches. In summary: - Wasm.cpp implements the binary -> wasm::Module compilation using the same WasmGenerator.(h,cpp) abstraction as asm.js. Most of the work up to this point has just been pulling all the asm.js/JS-specific stuff out of this pipeline. - Wasm.cpp decodes all the module-level stuff (imports, exports, etc) but hands over the function body bytes to the shared pipeline for decoding in WasmIonCompile.cpp. - Since WasmIonCompile.cpp assumes well-formed bytes, to avoid a bunch of fuzzer asserts, Wasm.cpp performs a validation pass before OK'ing the bytes to be passed to WasmIonCompile.cpp. This is suboptimal (the bytes are decoded twice) and later this decoding could be moved into baseline-wasm. - Decoding is also recursive now, but should be made iterative later (when folded into baseline). - The wasm binary is reminiscent of, but not exactly the same as, the v8 binary format in https://github.com/WebAssembly/design/issues/497. The expectation is that we'll be iterating on the binary format design over the next few months towards convergence, so SM is basically our take on where we should be going.
Attachment #8708651 - Attachment is obsolete: true
Attachment #8710250 - Flags: review?(bbouvier)
Attached patch final-returnSplinter Review
Oops, this patch goes before baldr. It avoids the need for a final explicit return opcode, making function bodies expressions for wasm.
Attachment #8710251 - Flags: review?(bbouvier)
Comment on attachment 8710245 [details] [diff] [review] var-len-lit Review of attachment 8710245 [details] [diff] [review]: ----------------------------------------------------------------- Can't wait for writeVarS32
Attachment #8710245 - Flags: review?(bbouvier) → review+
Comment on attachment 8710251 [details] [diff] [review] final-return Review of attachment 8710251 [details] [diff] [review]: ----------------------------------------------------------------- Nice. I can already imagine all the tests checking that control flow expressions yield values! ::: js/src/asmjs/AsmJS.cpp @@ +3424,3 @@ > } > > return true; Control flow can be simplified a bit: if (!lastNonEmpty... && !IsVoid(...)) return f.fail(...); return true;
Attachment #8710251 - Flags: review?(bbouvier) → review+
Comment on attachment 8710243 [details] [diff] [review] optional-names Review of attachment 8710243 [details] [diff] [review]: ----------------------------------------------------------------- Quite elegant. ::: js/src/asmjs/WasmGenerator.cpp @@ +123,5 @@ > // already initialized. > shared_ = Move(shared); > if (kind == ModuleKind::Wasm) { > numSigs_ = shared_->sigs.length(); > + module_->numFuncs = shared_->funcSigs.length(); note to self: No need to initialize in asm.js mode because ModuleData is PodZero'd. ::: js/src/asmjs/WasmGenerator.h @@ +35,5 @@ > > struct SlowFunction > { > + SlowFunction(uint32_t index, unsigned ms, unsigned lineOrBytecode) > + : index(index), ms(ms) nit: don't forget to set lineOrBytecode here! ::: js/src/asmjs/WasmModule.cpp @@ +681,5 @@ > + if (!PerfFuncEnabled()) > + return true; > +#elif defined(MOZ_VTUNE) > + if (!IsVTuneProfilingActive()) > + return true; If Spidermonkey is built with Perf support *and* VTune support, this means we'll have to enable both or none. Maybe if (false #ifdef JS_ION_PERF || !PerfFuncEnabled() #endif #ifdef MOZ_VTUNE || !IsVTuneProfilingActive() #endif ) This is a bit ugly though... Or another way is to unify what enables Perf *and* VTune. Or just get back to what it was? @@ +710,3 @@ > unsigned method_id = iJIT_GetNewMethodID(); > if (method_id == 0) > return; This should return a boolean, not void; @@ +1424,5 @@ > + > +const char* > +Module::getFuncName(JSContext* cx, uint32_t funcIndex, UniqueChars* owner) const > +{ > + if (!module_->prettyFuncNames.empty()) In webassembly, can anonymous and named functions live in a same module? (I would say so, assuming that you've moved numFuncs to the pod so as to have the upper bound of the number of names). If so, this condition will trigger a OOB if we're trying to get the name of an anonymous function. @@ +1427,5 @@ > +{ > + if (!module_->prettyFuncNames.empty()) > + return prettyFuncName(funcIndex); > + > + char* chars = JS_smprintf("wasm-function[%u]", funcIndex); I guess annoying know-it-all who'll explicitly name their function "wasm-function[0]" won't be able to distinguish it from unnamed functions. That's a reasonable price to pay for being annoying :-) ::: js/src/asmjs/WasmModule.h @@ +219,5 @@ > > CodeRange() = default; > CodeRange(Kind kind, Offsets offsets); > CodeRange(Kind kind, ProfilingOffsets offsets); > + CodeRange(uint32_t nameIndex, uint32_t lineOrBytecode, FuncOffsets offsets); s/nameIndex/funcIndex
Attachment #8710243 - Flags: review?(bbouvier) → review+
(In reply to Benjamin Bouvier [:bbouvier] from comment #25) Thanks! Good catches. > In webassembly, can anonymous and named functions live in a same module? (I > would say so, assuming that you've moved numFuncs to the pod so as to have > the upper bound of the number of names). If so, this condition will trigger > a OOB if we're trying to get the name of an anonymous function. I'm expecting it'll be all-or-nothing, but if it isn't, then yeah, we'd have prettyFuncNames either be empty or numFuncs-sized and have null elements that we'd have to check for in getFuncName().
Attached patch baldr (obsolete) — Splinter Review
Rebased and fixed so that on platforms where wasm isn't supported (JS_CODEGEN_NONE, non-hardware-fp, non-standard-page-size (we could drop this requirement pretty easily)) the wasmEval/wasmTextToBinary functions simply aren't available so all the wasm tests can start with "if (!wasmEval) quit()".
Attachment #8710250 - Attachment is obsolete: true
Attachment #8710250 - Flags: review?(bbouvier)
Attachment #8710818 - Flags: review?(bbouvier)
Comment on attachment 8710250 [details] [diff] [review] baldr Review of attachment 8710250 [details] [diff] [review]: ----------------------------------------------------------------- First set of comments that I made yesterday before the rebased patch. ::: js/src/asmjs/Wasm.cpp @@ +65,5 @@ > + JSContext* cx() const { return cx_; } > + Decoder& d() const { return d_; } > + ModuleGenerator& mg() const { return mg_; } > + FunctionGenerator& fg() const { return fg_; } > + uint32_t funcIndex() const { return funcIndex_; } These getters are unused so far (cx(), mg(), fg(), funcIndex()). How about we add them only when we need them? (FunctionGenerator is entirely unused (yet)) @@ +93,5 @@ > + switch (expr) { > + case Expr::Nop: > + return CheckType(f, ExprType::Void, expected); > + case Expr::I32Const: > + return f.d().readVarU32() && CheckType(f, ExprType::I32, expected); If readVarU32() fails, the whole validation will fail with oom. Could you f.fail("unable to read variable-length u32") here instead? It would actually be even nicer to have helpers for this, bool DecodeVarU32(FunctionContext& f) { if (!f.d().readVarU32()) return f.fail("blablabla"); return true; } ::: js/src/asmjs/Wasm.h @@ +27,5 @@ > +// Compile and instantiate the given binary-encoded wasm module and return the > +// resulting export object, which contains the module's exported functions as > +// properties. > +bool > +Eval(JSContext* cx, UniqueChars filename, Handle<ArrayBufferObject*> code, If you needed js/Utility.h in WasmText.h for UniqueChars, it's likely you'll need it here as well. ::: js/src/asmjs/WasmBinary.h @@ +379,5 @@ > MOZ_WARN_UNUSED_RESULT bool > writeExpr(Expr expr, size_t* offset = nullptr) { > return writeEnum(expr, offset); > } > + MOZ_WARN_UNUSED_RESULT bool writeValType(ValType type, size_t* offset = nullptr) { Can you please unify the signature style with writeExpr above (in one way or the other)? @@ +423,5 @@ > + return bytecode_.append(reinterpret_cast<const uint8_t*>(cstr), strlen(cstr) + 1); > + } > + > + MOZ_WARN_UNUSED_RESULT bool startSection(size_t* offset) { > + if (!writeU32(0)) If you wanted, you could put a sentinel value that gets checked in finishSection afterwards before patching. @@ +429,5 @@ > + *offset = bytecode_.length(); > + return true; > + } > + void finishSection(size_t offset) { > + uint8_t* patchAt = bytecode_.begin() + offset - sizeof(uint32_t); MOZ_ASSERT(patchAt < bytecode_.end()); @@ +430,5 @@ > + return true; > + } > + void finishSection(size_t offset) { > + uint8_t* patchAt = bytecode_.begin() + offset - sizeof(uint32_t); > + uint32_t numBytes = bytecode_.length() - offset; MOZ_ASSERT(numBytes <= bytecode_.length()); @@ +509,5 @@ > public: > + Decoder(const uint8_t* begin, const uint8_t* end) > + : beg_(begin), > + end_(end), > + cur_(begin) MOZ_ASSERT(begin <= end); @@ +532,2 @@ > void assertCurrentIs(const DebugOnly<size_t> offset) const { > MOZ_ASSERT(size_t(cur_ - beg_) == offset); Can you use currentOffset() in that function please? @@ +619,5 @@ > + MOZ_WARN_UNUSED_RESULT bool startSection(uint32_t* offset) { > + uint32_t unused; > + if (!readU32(&unused)) > + return false; > + *offset = cur_ - beg_; You can use currentOffset() here ::: js/src/asmjs/WasmText.h @@ +26,5 @@ > +namespace wasm { > + > +// Translate the textual representation of a wasm module (given by a > +// null-terminated char16_t array) into a Bytecode object. If there is an error > +// other than out-of-memory an error message string will be stored in 'error' nit: end with period ::: js/src/jit-test/lib/asserts.js @@ +74,5 @@ > if (!(e instanceof ctor)) > throw new Error("Assertion failed: expected exception " + ctor.name + ", got " + e); > if (typeof test == "string") { > if (test != e.message) > + throw new Error("Assertion failed: expected " + test + ", got " + e.message); I hear we even have template strings now... ::: js/src/jsutil.h @@ +136,5 @@ > } > > +template <class Container1, class Container2> > +static inline bool > +EqualContainers(const Container1& lhs, const Container2& rhs) Could you describe in a comment what's expected from the classes Container1/Container2 (namely, they need to implement length() and the const operator[]) @@ +149,5 @@ > +} > + > +template <class Container> > +static inline HashNumber > +HashContainer(const Container& c, HashNumber hn = 0) Ditto Considering the signature, would a better name be AddContainerToHash? Ideally, that'd be called HashContainer if hn = 0 and AddContainerToHash otherwise, but we can't go to that level of detail. ::: js/src/vm/ArrayBufferObject.h @@ -276,5 @@ > - /* > - * Ensure data is not stored inline in the object. Used when handing back a > - * GC-safe pointer. > - */ > - static bool ensureNonInline(JSContext* cx, Handle<ArrayBufferObject*> buffer); nice catch (compilers are really permissive, aren't they)
Attachment #8710250 - Attachment is obsolete: false
(In reply to Benjamin Bouvier [:bbouvier] from comment #29) Great comments, thanks! > These getters are unused so far (cx(), mg(), fg(), funcIndex()). How about > we add them only when we need them? Ah, they were used by DecodeReturn in the previous patch which I was able to take out with 9c9ae4b5cacc. I could remove them but they're going to come back immediately as soon as we implement anything more complicated than nop/i32.const so I'd rather keep them to show the intended structure of the code. > It would actually be even nicer to have helpers for this, Yes, that's a good idea, probably as members of the FunctionContext which maybe should be renamed to FunctionDecoder for symmetry with the other Function*s. Perhaps that could be in a future patch that introduces formatting strings though so we could write `f.readVarU32("i32.const")` and get an error that mentioned the operator. > MOZ_ASSERT(numBytes <= bytecode_.length()); This won't hold for wasm input since the bits are arbitrary. > Could you describe in a comment what's expected from the classes > Container1/Container2 (namely, they need to implement length() and the const > operator[]) I could, but I felt like the name, tiny function body, and compile errors would make that sufficiently clear.
Comment on attachment 8710818 [details] [diff] [review] baldr Review of attachment 8710818 [details] [diff] [review]: ----------------------------------------------------------------- Still need to give a look at the tests and re-read all of WasmText, but this looks very nice and elegant. A few more comments below. ::: js/src/asmjs/Wasm.cpp @@ +297,5 @@ > + if (!d.finishSection(sectionStart)) > + return Fail(cx, d, "func section byte size mismatch"); > + > + int64_t after = PRMJ_Now(); > + unsigned generateTime = (after - before) / PRMJ_USEC_PER_SEC; We probably want PRMJ_USEC_PER_MSEC here, don't we? @@ +319,5 @@ > + if (!d.readVarU32(&numFuncs)) > + return Fail(cx, d, "expected number of functions"); > + > + for (uint32_t i = 0; i < numFuncs; i++) { > + if (!DecodeFunc(cx, d, mg, funcIndex++)) Probably want to limit the number of functions or to check funcIndex for overflow here? ::: js/src/asmjs/WasmBinary.h @@ +411,5 @@ > return true; > } > > + MOZ_WARN_UNUSED_RESULT bool writeCString(const char* cstr) { > + return bytecode_.append(reinterpret_cast<const uint8_t*>(cstr), strlen(cstr) + 1); MOZ_ASSERT(cstr); @@ +601,5 @@ > + MOZ_WARN_UNUSED_RESULT bool readCString(const char** cstr = nullptr) { > + if (cstr) > + *cstr = reinterpret_cast<const char*>(cur_); > + for (; cur_ != end_; cur_++) { > + if (*cur_ == '\0') { For consistency with the next function, can we use: if (!*cur_) ? @@ +631,5 @@ > + MOZ_WARN_UNUSED_RESULT bool finishSection(uint32_t offset) { > + const uint8_t* start = beg_ + offset; > + uint32_t numBytes; > + memcpy(&numBytes, start - sizeof(uint32_t), sizeof(uint32_t)); > + return numBytes == uintptr_t(cur_ - start); So if I understand correctly and look at all the uses of startSection/finishSection, the apis could be simpler: - startSection could write the number of bytes in an outparam uint32_t - then finishSection could directly use this number of bytes, instead of recording the offset to where the number of bytes can be read. Also, skipSection() could just read the value recorded by startSection(), instead of being more aware of the section format and explicitly readU32() the size of bytes, in DecodeUnknownSection. What do you think? @@ +636,5 @@ > + } > + MOZ_WARN_UNUSED_RESULT bool skipSection(uint32_t numBytes) { > + if (uintptr_t(end_ - cur_) < numBytes) > + return false; > + cur_ += numBytes; Arbitrary bits could overflow addition this while having the first assertion being correct, e.g. if numBytes == UINT32_MAX we can effectively provoke an infinite loop (see an unknown section of size -1, go back byte per byte to the start of a previous unknown section, skip it, see the unknown section of size -1 again). If this is true, could you add a test, please? ::: js/src/asmjs/WasmText.cpp @@ +47,5 @@ > +using WasmAstHashMap = HashMap<K, V, HP, LifoAllocPolicy<Fallible>>; > + > +typedef WasmAstVector<ValType> WasmAstValTypeVector; > + > +struct WasmAstBase Would using TempObject from jit/JitAllocPolicy work here? Seems we could create a TempAllocator from the LifoAlloc to the same effect. @@ +119,5 @@ > + return static_cast<T&>(*this); > + } > +}; > + > +struct WasmAstNop : WasmAstExpr no public inheritance for Nop? @@ +167,5 @@ > + : WasmAstNode(WasmAstKind::Export), > + externalName_(externalNameBegin), > + externalNameLength_(externalNameEnd - externalNameBegin), > + internalIndex_(UINT32_MAX) > + {} MOZ_ASSERT(externalNameBegin <= externalNameEnd); @@ +314,5 @@ > + } > + const char16_t* textEnd() const { > + MOZ_ASSERT(kind_ == Text); > + MOZ_ASSERT(end_[-1] == '"'); > + return end_ - 1; Per the previous assertion this will be '"' all the time, is this off by one? @@ +329,5 @@ > + > +static bool > +IsWasmNewLine(char16_t c) > +{ > + return c == '\n'; nitpicking here, but what about '\r'? @@ +335,5 @@ > + > +static bool > +IsWasmSpace(char16_t c) > +{ > + return c == ' ' || c == '\t'; Should a new line be a space? I think so, considering the beginning of WasmTokenStream::next() @@ +407,5 @@ > + case ')': > + return WasmToken(WasmToken::CloseParen, begin, cur_); > + > + case '0': case '1': case '2': case '3': case '4': > + case '5': case '6': case '7': case '8': case '9': { What about signed numbers? (later patch?) @@ +413,5 @@ > + while (cur_ != end_ && IsWasmDigit(*cur_)) { > + u32 *= 10; > + u32 += *cur_ - '0'; > + if (!u32.isValid()) > + break; If we break here, then we call u32.value() which MOZ_ASSERT(isValid()), so this will trigger an error. In non-debug builds, this will just yield the first digit. Can we fail instead here? ::: js/src/builtin/TestingFunctions.cpp @@ +3692,5 @@ > fuzzingSafe = true; > > disableOOMFunctions = disableOOMFunctions_; > > + if (!wasm::DefineTestingFunctions(cx, obj)) This isn't very idiomatic, but I guess you're doing this because wasm functions will get into a WASM global object later?
(In reply to Benjamin Bouvier [:bbouvier] from comment #32) Thanks! Great catches; I'll upload a new patch with all these fixed. > - startSection could write the number of bytes in an outparam uint32_t > - then finishSection could directly use this number of bytes, instead of > recording the offset to where the number of bytes can be read. That would almost work but if finishSection is given the number of bytes, it doesn't know how to verify that this is correct: you need to know where you were at startSection. A less trixie strategy would be to return/pass both offset and the numBytes, but that didn't seem worth it. > > + MOZ_WARN_UNUSED_RESULT bool skipSection(uint32_t numBytes) { > > + if (uintptr_t(end_ - cur_) < numBytes) > > + return false; > > + cur_ += numBytes; > > Arbitrary bits could overflow addition this while having the first assertion > being correct, I don't see how: if the test doesn't return false then cur_ should be in the range [beg_, end_]. > Would using TempObject from jit/JitAllocPolicy work here? Seems we could > create a TempAllocator from the LifoAlloc to the same effect. It could perhaps but I kinda like decoupling the two (this is outside the JIT) and the code-reuse is pretty minimal. E.g., I don't want to do the ballast thing. > > +struct WasmAstNop : WasmAstExpr > > no public inheritance for Nop? structs default to public inheritance > > + MOZ_ASSERT(end_[-1] == '"'); > > + return end_ - 1; > > Per the previous assertion this will be '"' all the time, is this off by one? The end in a [begin, end) pair is usually non-inclusive so "one past the end" which, for string literals is what we want. > What about signed numbers? (later patch?) Yeah, this is super-bare-bones; *lots* left to do here if you compare with https://github.com/WebAssembly/spec/blob/master/ml-proto/host/lexer.mll. > > + if (!wasm::DefineTestingFunctions(cx, obj)) > > This isn't very idiomatic, but I guess you're doing this because wasm > functions will get into a WASM global object later? That's right and this function would be called instead from the vm/GlobalObject machinery. Even then, though, if the device doesn't support wasm (no hardfp, old SSE) I was thinking we'd represent that by not having a 'WASM' object.
Attached patch baldrSplinter Review
Updated with comments addressed.
Attachment #8710818 - Attachment is obsolete: true
Attachment #8710818 - Flags: review?(bbouvier)
Attachment #8711210 - Flags: review?(bbouvier)
Attachment #8710250 - Attachment is obsolete: true
Comment on attachment 8711210 [details] [diff] [review] baldr Review of attachment 8711210 [details] [diff] [review]: ----------------------------------------------------------------- Great work! This is so nice that no other pieces in the module generation et al. don't change in this patch, so thank you for all the previous refactorings. This is very nice to read and to understand in general, we can feel the asmjs/ subdir style touch (maybe time to rename this directory?). I am excited to see this landing. Could we expose the wasmEval/wasmTextToBinary on Nightly only and/or behind a flag, so that we can demo it? ::: js/src/asmjs/WasmBinary.h @@ +345,5 @@ > bytecode_[pc] = uint8_t(v); > } > > + template <class T> > + static const T loadUnaligned(const uint8_t* p) { the "unaligned" is implicitly assumed in the Decoder class, maybe we could just assume it here too? ::: js/src/asmjs/WasmText.cpp @@ +269,5 @@ > + ValType valueType_; > + } u; > + public: > + WasmToken() = default; > + WasmToken(Kind kind, const char16_t* begin, const char16_t* end) Can we make it explicit as the other non-default ctors? @@ +331,5 @@ > + > +static bool > +IsWasmNewLine(char16_t c) > +{ > + return c == '\n' || c == '\r'; Hah, getting more information about EOL encoding, there could \r\n to indicate a single end of line, although this will be counted as two lines per the current impl. Can you fix this, please? @@ +337,5 @@ > + > +static bool > +IsWasmSpace(char16_t c) > +{ > + return IsWasmNewLine(c) || c == ' ' || c == '\t'; According the comment below FirstCharKind in TokenStream.cpp, there are also \v and \f in this category. Bonus points if you can share code with TokenStream.cpp/h. @@ +370,5 @@ > + uint32_t lookaheadIndex_; > + uint32_t lookaheadDepth_; > + WasmToken lookahead_[LookaheadSize]; > + > + bool finishKeyword(const char16_t* end, const char16_t* keyword) { Considering that this is much used in the context of a prefix tree, can you name it "startsWith" or "prefixedBy"? The name is misleading in the sense that you can do finishKeyword('e') and then finishKeyword('xport') but this is a single keyword. @@ +513,5 @@ > + WasmToken get() { > + if (lookaheadDepth_) { > + lookaheadDepth_--; > + uint32_t got = lookaheadIndex_; > + lookaheadIndex_ ^= 1; nit: please static_assert LookaheadSize == 2 or MOZ_ASSERT(lookaheadIndex <= 1) @@ +514,5 @@ > + if (lookaheadDepth_) { > + lookaheadDepth_--; > + uint32_t got = lookaheadIndex_; > + lookaheadIndex_ ^= 1; > + return lookahead_[got]; You can just store the WasmToken before xor-ing the lookaheadIndex, and return the token rather than storing the lookahead index? @@ +520,5 @@ > + return next(); > + } > + void unget(WasmToken token) { > + MOZ_ASSERT(lookaheadDepth_ <= 1); > + lookaheadDepth_++; Is it me or lookaheadDepth_ === lookaheadIndex_ + 1 everywhere? Maybe we could spare one variable here... @@ +614,5 @@ > + if (!args.append(valueType.valueType())) > + return nullptr; > + break; > + } > + case WasmToken::Result: { Can Param and Result be put in any order and interleaved? It seems that (func) (param) (result) (param) would validate, and it doesn't look very intuitive. Or is your intent to let wasmTextToBinary be a bit loose (thus not exposed to the open web), so that all errors get caught by wasmEval? In any cases, this needs tests. @@ +616,5 @@ > + break; > + } > + case WasmToken::Result: { > + if (result != ExprType::Void) { > + c.ts.generateError(field, c.error); Could we have a more explicit message, "can't have several result types"? @@ +685,5 @@ > + > + while (c.ts.getIf(WasmToken::OpenParen)) { > + WasmToken section = c.ts.get(); > + > + switch (section.kind()) { Sections can be put in any order/interleaved, and there can be several of them? In any cases, this needs tests! ::: js/src/jit-test/tests/wasm/basic.js @@ +27,5 @@ > +var ver1 = 0xff; > +var ver2 = 0xff; > +var ver3 = 0xff; > + > +assertErrorMessage(() => wasmEval(Uint8Array.of().buffer), Error, magicError); If wasmEval is supposed to be long lived and callable on the wasm global object, you might want to test it as well: 0 args, 3 args, 1 arg but not an array buffer, 2 args and second one isn't an object. Ditto for wasmTextToBinary. Of course, if these are just testing functions, no need to test them. @@ +38,5 @@ > +assertErrorMessage(() => wasmEval(Uint8Array.of(magic0, magic1, magic2, magic3, ver0, ver1, ver2).buffer), Error, versionError); > +assertErrorMessage(() => wasmEval(Uint8Array.of(magic0, magic1, magic2, magic3, ver0, ver1, ver2, ver3, 0, 1).buffer), Error, extraError); > + > +var o = wasmEval(Uint8Array.of(magic0, magic1, magic2, magic3, ver0, ver1, ver2, ver3, 0).buffer); > +assertEq(Object.getOwnPropertyNames(o).length, 0); Binary decoding can go wrong in so many ways that are untested here. Do you mind to move these first tests to a binary.js file, and add more binary tests there? This can be done in a later patch / bug; actually I'd be volunteering to write such test cases. @@ +44,5 @@ > +assertErrorMessage(() => wasmEvalText(""), Error, parsingError); > +assertErrorMessage(() => wasmEvalText("("), Error, parsingError); > +assertErrorMessage(() => wasmEvalText("(m"), Error, parsingError); > +assertErrorMessage(() => wasmEvalText("(module"), Error, parsingError); > +assertErrorMessage(() => wasmEvalText("(module"), Error, parsingError); Can you add a test that "(moduler)" is also a parse error (to try the prefix tree)?
Attachment #8711210 - Flags: review?(bbouvier) → review+
(In reply to Benjamin Bouvier [:bbouvier] from comment #35) Thanks, great comments. > I am excited to see this landing. Could we expose the > wasmEval/wasmTextToBinary on Nightly only and/or behind a flag, so that we > can demo it? The plan is to expose behind a pref (default off) however, that will likely be a separate WASM.compile/eval function which returns a Promise (which will be easier to do once we have bug 911216). Thus, wasmEval itself should remain a testing-only function. > Is it me or lookaheadDepth_ === lookaheadIndex_ + 1 everywhere? Maybe we > could spare one variable here... When the lookaheadDepth is > 0, the lookaheadIndex can be either 0 or 1. You can think of this as a circular buffer specialized to the case where the buffer size is 2. > Can Param and Result be put in any order and interleaved? Yep: https://github.com/WebAssembly/spec/blob/master/ml-proto/host/parser.mly#L293 > Or is your intent to let wasmTextToBinary be a bit loose (thus > not exposed to the open web), so that all errors get caught by wasmEval? This is ultimately a temporary text format (https://github.com/WebAssembly/design/blob/master/TextFormat.md#official-text-format) and so the goal is really just to have "enough to write tests". Later a more precisely-defined text language would be specified which gets a more careful treatment (and probably a switch to lex/yacc). Note that, in any case, this won't ever be content-visible. > Could we have a more explicit message, "can't have several result types"? Given the testing-only goals and possible rewrite to lex/yacc, I'm not too interested in building up the error reporting. Patches welcome, of course.
decoder wants to build the files in asmjs/ in non-unified mode. This patch fixes some compile issues blocking that.
Attachment #8712209 - Flags: review?(luke)
Comment on attachment 8712209 [details] [diff] [review] Fix non-unified build Review of attachment 8712209 [details] [diff] [review]: ----------------------------------------------------------------- Thanks! ::: js/src/asmjs/Wasm.cpp @@ +122,5 @@ > uintptr_t bodyLength = bodyEnd - bodyBegin; > if (!fg.bytecode().resize(bodyLength)) > return false; > > + mozilla::PodCopy(fg.bytecode().begin(), bodyBegin, bodyLength); Can you add a 'using mozilla::PodCopy' up at the top?
Attachment #8712209 - Flags: review?(luke) → review+
Whiteboard: [leave open]
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: