Closed Bug 1655042 Opened 4 years ago Closed 4 years ago

Cranelift: validate Wasm as we parse it

Categories

(Core :: JavaScript: WebAssembly, enhancement, P3)

enhancement

Tracking

()

RESOLVED FIXED
83 Branch
Tracking Status
firefox83 --- fixed

People

(Reporter: bbouvier, Assigned: cfallin)

References

Details

Attachments

(2 files, 3 obsolete files)

Cranelift will soon be able to validate wasm as we parse it. This requires passing some information down from the ModuleEnvironment to the translator; nothing too complicated, most information is already around or can easily get retrieved. (Before, Cranelift would call C++'s wasm::ValidateFunctionBody, and then after validation feed all the body bytes to Rust, which requires reading the function's body twice)

I've got a WIP patch that tries to support it. Right now we can't properly implement it using the traits defined in the current versions of wasmparser and the above PR. The reason is that there's a trait method that returns &SomeTraitType and we'd like to return a primitive value there (TypeCode, which is implemented as an u32 enum). Once the API is supported, we should be able to do it correctly.

An optimization opportunity that I identified while looking at this code: right now the ModuleEnvironment will mostly recreate data types every times it gets a request of module-level information (eg a FuncType, etc.). Instead, it could build up and cache this information incrementally in a ModuleEnvironmentCache, as we compile function batches: every time we request module-level information, if it's available we can just return a reference to it, otherwise we should be able to lazily retrieve it from the real ModuleEnvironment. To avoid sync'ing, each BatchCompiler would get its own cache instance, which should be fine, since there are as many BatchCompilers as there are threads on the machine.

Alternatively, we could preemptively precompute all this information in a simplistic C-like format, since when we start compiling the function bodies, the ModuleEnvironment is immutable (in particular, all the arrays have fixed lengths). The C-like format could be shared between the C++ and Rust implementation, instead of duplicating the information in Rust.

Implementing any of these two should be a small compile-time win, since this implies fewer mallocs, less contention on jemalloc lock, etc. (Julian did put a SmallVec in one of these ModuleEnvironment requests functions, so it must have shown in profiles)

Priority: -- → P3

One more thought about this: our test suite checks for very precise validation error messages, which might be an issue if Cranelift were to generate different error messages.

Attached patch wip.diff (obsolete) — Splinter Review

WIP patch that i had locally, will probably need to be rebased and updated now that the wasmparser and cranelift's API have changed upstream.

This patch pulls in an updated Cranelift with a new validation strategy,
introduced by bytecodealliance/wasmtime#2059. This new design validates
the Wasm module as it parses the function bodies. A subsequent patch
will adapt Baldrdash to work with this.

This is a vendored version of my GitHub fork of Cranelift; once the PR
lands, we can update this patch to refer to mainline before landing.

This patch is an update of Benjamin Bouvier's WIP patch that supports
Cranelift's new Wasm validator functionality, as introduced in
bytecodealliance/wasmtime#2059.

Previously, Baldrdash allowed SpiderMonkey to validate Wasm function
bodies before invoking Cranelift. The Cranelift PR now causes validation
to happen in the function-body compiler. Because the Cranelift backend
now takes on this responsibility, we no longer need to invoke the
SpiderMonkey function-body validator. This requires some significant new
glue in the Baldrdash bindings to allow Cranelift's Wasm frontend to
access module type information.

Co-authored-by: Benjamin Bouvier <benj@benj.me>

Depends on D90326

Try run for Cranelift/aarch64: link

(In reply to Benjamin Bouvier [:bbouvier] from comment #1)

One more thought about this: our test suite checks for very precise validation error messages, which might be an issue if Cranelift were to generate different error messages.

:lth: I'm wondering what strategy you would prefer to deal with this; it is indeed the case that quite a few validation tests assert regex matches on specific errors, and Cranelift's validator emits different error text. Should we coarsen the check (in the Cranelift case only) to just one bit, i.e. pass/fail? Or would you prefer us to try for some degree of error-message compatibility here?

Flags: needinfo?(lhansen)

I think compatibility is best but the reality may be that it will not work in the long run. One trick we use elsewhere is to change the regex from /yaddayadda/ to /(yaddayadda)|(blahblah)/ to cope with this. This would be fine with me and probably is preferable to, say, hunting for some key words common to the messages and going with /.*a.a./, for eaxmple.

Flags: needinfo?(lhansen)
Assignee: nobody → cfallin
Attachment #9175911 - Attachment description: Bug 1655042: Part 1 of 2: Vendor Cranelift d7b9a38ad3aed8ea29d855cc28d7dd0a418a41c5. GITHUB FORK; DO NOT CHECK IN. r=jseward → Bug 1655042: Part 1 of 2: Vendor Cranelift 51ac79a9adde9a23d2c541670d5ea8beaea6768a. GITHUB FORK; DO NOT CHECK IN. r=jseward
Status: NEW → ASSIGNED

Latest patchset has a bunch of regex tweaks as suggested; it's ugly and messy (in a perfect world we might standardize the error text for Wasm validation errors -- I can dream, I guess?) but it mostly works. Updating this as a checkpoint, but there seems to be an issue with tables/reftypes that I'll debug more tomorrow.

Attachment #9175912 - Attachment description: Bug 1655042: Part 2 of 2: Support Cranelift-based Wasm validation. r=jseward → Bug 1655042: Support Cranelift-based Wasm validation. r=jseward
Attachment #9175911 - Attachment is obsolete: true

This passes all jit-tests now (using an aarch64-simulator build with Cranelift). It requires a vendoring of Cranelift first, once acrichton's validate-while-parse PR lands there and Cranelift's wasmparser dep is bumped to 0.63. I'll create a separate bug for that.

Depends on: 1669055

This patch is an update of Benjamin Bouvier's WIP patch that supports
Cranelift's new Wasm validator functionality, as introduced in
bytecodealliance/wasmtime#2059.

Previously, Baldrdash allowed SpiderMonkey to validate Wasm function
bodies before invoking Cranelift. The Cranelift PR now causes validation
to happen in the function-body compiler. Because the Cranelift backend
now takes on this responsibility, we no longer need to invoke the
SpiderMonkey function-body validator. This requires some significant new
glue in the Baldrdash bindings to allow Cranelift's Wasm frontend to
access module type information.

Co-authored-by: Benjamin Bouvier <benj@benj.me>

Depends on D92503

Attachment #9175912 - Attachment is obsolete: true
Attachment #9167352 - Attachment is obsolete: true
Pushed by cfallin@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/0807415f1ea5
Support Cranelift-based Wasm validation. r=jseward
Backout by rmaries@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/3792ecb5de48
Backed out 2 changesets (bug 1655042, bug 1669055) for build bustages. CLOSED TREE

this additional patch makes the Rust code compile without warning, but then I'm seeing a few error messages mismatches again ("Some(xxx)" in front of types, mostly). ni? chris to make sure it's seen before an attempt to re-land.

Flags: needinfo?(chris)

Thanks -- here's a green try-run with this patch applied:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=b3a0b7f8294166a1ee1865753e1e33a05ea77099

I'll try re-landing now.

Flags: needinfo?(chris)
Pushed by cfallin@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/dc2e751e246b
Support Cranelift-based Wasm validation. r=jseward
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 83 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: