Cranelift: validate Wasm as we parse it
Categories
(Core :: JavaScript: WebAssembly, enhancement, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox83 | --- | fixed |
People
(Reporter: bbouvier, Assigned: cfallin)
References
Details
Attachments
(2 files, 3 obsolete files)
47 bytes,
text/x-phabricator-request
|
Details | Review | |
2.97 KB,
patch
|
Details | Diff | Splinter Review |
Cranelift will soon be able to validate wasm as we parse it. This requires passing some information down from the ModuleEnvironment to the translator; nothing too complicated, most information is already around or can easily get retrieved. (Before, Cranelift would call C++'s wasm::ValidateFunctionBody, and then after validation feed all the body bytes to Rust, which requires reading the function's body twice)
I've got a WIP patch that tries to support it. Right now we can't properly implement it using the traits defined in the current versions of wasmparser and the above PR. The reason is that there's a trait method that returns &SomeTraitType
and we'd like to return a primitive value there (TypeCode, which is implemented as an u32 enum). Once the API is supported, we should be able to do it correctly.
An optimization opportunity that I identified while looking at this code: right now the ModuleEnvironment will mostly recreate data types every times it gets a request of module-level information (eg a FuncType, etc.). Instead, it could build up and cache this information incrementally in a ModuleEnvironmentCache
, as we compile function batches: every time we request module-level information, if it's available we can just return a reference to it, otherwise we should be able to lazily retrieve it from the real ModuleEnvironment. To avoid sync'ing, each BatchCompiler would get its own cache instance, which should be fine, since there are as many BatchCompilers as there are threads on the machine.
Alternatively, we could preemptively precompute all this information in a simplistic C-like format, since when we start compiling the function bodies, the ModuleEnvironment is immutable (in particular, all the arrays have fixed lengths). The C-like format could be shared between the C++ and Rust implementation, instead of duplicating the information in Rust.
Implementing any of these two should be a small compile-time win, since this implies fewer mallocs, less contention on jemalloc lock, etc. (Julian did put a SmallVec in one of these ModuleEnvironment requests functions, so it must have shown in profiles)
Updated•4 years ago
|
Reporter | ||
Comment 1•4 years ago
|
||
One more thought about this: our test suite checks for very precise validation error messages, which might be an issue if Cranelift were to generate different error messages.
Reporter | ||
Comment 2•4 years ago
|
||
WIP patch that i had locally, will probably need to be rebased and updated now that the wasmparser and cranelift's API have changed upstream.
Assignee | ||
Comment 3•4 years ago
|
||
This patch pulls in an updated Cranelift with a new validation strategy,
introduced by bytecodealliance/wasmtime#2059. This new design validates
the Wasm module as it parses the function bodies. A subsequent patch
will adapt Baldrdash to work with this.
This is a vendored version of my GitHub fork of Cranelift; once the PR
lands, we can update this patch to refer to mainline before landing.
Assignee | ||
Comment 4•4 years ago
|
||
This patch is an update of Benjamin Bouvier's WIP patch that supports
Cranelift's new Wasm validator functionality, as introduced in
bytecodealliance/wasmtime#2059.
Previously, Baldrdash allowed SpiderMonkey to validate Wasm function
bodies before invoking Cranelift. The Cranelift PR now causes validation
to happen in the function-body compiler. Because the Cranelift backend
now takes on this responsibility, we no longer need to invoke the
SpiderMonkey function-body validator. This requires some significant new
glue in the Baldrdash bindings to allow Cranelift's Wasm frontend to
access module type information.
Co-authored-by: Benjamin Bouvier <benj@benj.me>
Depends on D90326
Assignee | ||
Comment 6•4 years ago
|
||
(In reply to Benjamin Bouvier [:bbouvier] from comment #1)
One more thought about this: our test suite checks for very precise validation error messages, which might be an issue if Cranelift were to generate different error messages.
:lth: I'm wondering what strategy you would prefer to deal with this; it is indeed the case that quite a few validation tests assert regex matches on specific errors, and Cranelift's validator emits different error text. Should we coarsen the check (in the Cranelift case only) to just one bit, i.e. pass/fail? Or would you prefer us to try for some degree of error-message compatibility here?
Comment 7•4 years ago
|
||
I think compatibility is best but the reality may be that it will not work in the long run. One trick we use elsewhere is to change the regex from /yaddayadda/ to /(yaddayadda)|(blahblah)/ to cope with this. This would be fine with me and probably is preferable to, say, hunting for some key words common to the messages and going with /.*a.a./, for eaxmple.
Updated•4 years ago
|
Assignee | ||
Comment 8•4 years ago
|
||
Latest patchset has a bunch of regex tweaks as suggested; it's ugly and messy (in a perfect world we might standardize the error text for Wasm validation errors -- I can dream, I guess?) but it mostly works. Updating this as a checkpoint, but there seems to be an issue with tables/reftypes that I'll debug more tomorrow.
Updated•4 years ago
|
Updated•4 years ago
|
Assignee | ||
Comment 9•4 years ago
|
||
This passes all jit-tests now (using an aarch64-simulator build with Cranelift). It requires a vendoring of Cranelift first, once acrichton's validate-while-parse PR lands there and Cranelift's wasmparser dep is bumped to 0.63. I'll create a separate bug for that.
Assignee | ||
Comment 10•4 years ago
|
||
This patch is an update of Benjamin Bouvier's WIP patch that supports
Cranelift's new Wasm validator functionality, as introduced in
bytecodealliance/wasmtime#2059.
Previously, Baldrdash allowed SpiderMonkey to validate Wasm function
bodies before invoking Cranelift. The Cranelift PR now causes validation
to happen in the function-body compiler. Because the Cranelift backend
now takes on this responsibility, we no longer need to invoke the
SpiderMonkey function-body validator. This requires some significant new
glue in the Baldrdash bindings to allow Cranelift's Wasm frontend to
access module type information.
Co-authored-by: Benjamin Bouvier <benj@benj.me>
Depends on D92503
Assignee | ||
Updated•4 years ago
|
Assignee | ||
Updated•4 years ago
|
Comment 11•4 years ago
|
||
Comment 12•4 years ago
|
||
Backed out for build bustages.
Comment 13•4 years ago
|
||
Reporter | ||
Comment 14•4 years ago
|
||
this additional patch makes the Rust code compile without warning, but then I'm seeing a few error messages mismatches again ("Some(xxx)" in front of types, mostly). ni? chris to make sure it's seen before an attempt to re-land.
Assignee | ||
Comment 15•4 years ago
|
||
Thanks -- here's a green try-run with this patch applied:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b3a0b7f8294166a1ee1865753e1e33a05ea77099
I'll try re-landing now.
Comment 16•4 years ago
|
||
Comment 17•4 years ago
|
||
bugherder |
Description
•