Closed Bug 1527871 Opened 1 year ago Closed 2 months ago

[meta] Syntax alignment with wabt / wat2wasm

Categories

(Core :: Javascript: WebAssembly, task, P3)

task

Tracking

()

RESOLVED FIXED

People

(Reporter: lth, Unassigned, Mentored)

References

Details

(Keywords: good-first-bug, meta)

There's a steady stream of bikeshedding on the wasm text format + various text handling bugs in SpiderMonkey. I don't think it's reasonable to be 100% compatible with what wabt accepts but we should at least get rid of the grossest mismatches in what wasmTextToBinary() can handle.

This meta bug is a locus for such text format fixing bugs; I don't expect we'll ever close it.

Basically everything hanging off of this is good-first-bug so I'll flag this one as one too.

Depends on: 1530273
Depends on: 1531662
Depends on: 1545755
Depends on: 1545761

Really it's becoming a headache that we're drifting away from wabt syntax; the closer we can get the better. When I write test code that I run in the JS shell I emit code for wasmTextToBinary, but when I'm testing in the browser I must compile to wasm with wabt, and in that case the code must be wabt-compatible.

Depends on: 1546074
Depends on: 1549926
Type: enhancement → task
Depends on: 1556779
Depends on: 1571230
Depends on: 1584097

(In reply to Lars T Hansen [:lth] from comment #1)

Really it's becoming a headache that we're drifting away from wabt syntax; the closer we can get the better. When I write test code that I run in the JS shell I emit code for wasmTextToBinary, but when I'm testing in the browser I must compile to wasm with wabt, and in that case the code must be wabt-compatible.

I think that for the browser use case, a pure JS implementation of wat2wasm has an advantage here: https://github.com/wingo/wassemble/blob/master/wassemble.mjs

Depends on: 1564760

So there's been a few (internal and Bugzilla) threads about this, so maybe we should step back and try to understand what's the problem to solve here.

The current implementation of wasmTextToBinary is in C++ and we in the Spidermonkey team have written it, so we know how it works and we can trust it. It's slightly more complicated than it ought to be, because of historical reasons: at some point we had wasmBinaryToText, and having the AST representation helped sharing code then, but now it might be slightly overkill. Having our own implementation also helps with adding our own syntax, as has been done for the GC proposal.

So the requirements for wasmTextToBinary that I can think of would be:

  1. it's possible to easily add new syntax, like the GC proposal. I suspect we may have a lot of our tests with some custom syntax that's never been official, so using a different wasmTextToBinary would either mean updating our test suite or including our quirks in the new parser.
  2. something we can trust, not especially in terms of security since this is only for testing, but in terms of having the right binary representation. Also this comment shows that wasmTextToBinary is used by Spidermonkey embedders.

Some nice-to-have but less strong requirements:

  1. something that's easy to understand and hack on;
  2. not having an external dependency (which requires synchronization, and/or maintaining our own patches if we want to add custom behaviors);

Andy was having issues with implementing multi-values, because our AST nodes assume only one result is possible per AST node, and this would require some important refactoring, hence the idea to have something else.

In the realm of possibilities, something else could be:

  • replacing wasmTextToBinary entirely with something else already existing. I don't know much about wabt/wassemble to tell how they would match (3) in the above list, but using something external in general would require us to maintain our own patches and sync with upstream, which is not ideal. Maybe wassemble's only goal is to be used in Spidermonkey, in which case it wouldn't be external anymore.
  • or write our own replacement. Dan had talked about wanting to write a new Rust parser for Cranelift/Wasmtime at some point, since we're depending on wabt for Cranelift to parse WAT text cases. Since it's now possible to add Rust dependencies to the JS engine, this new parser could also be used in Spidermonkey. That would still be an external dependency, though.
  • or adding support for supplementary parsers in addition to wasmTextToBinary. This sounds like a maintenance burden, and having duplicate code just for the sake of testing, possibilities to run out of sync between the two parsers, making it complicated to choose among one or the other. Not ideal either.

My opinion right now is that it would just be simpler to fix wasmTextToBinary, simplify it if we feel the need to, and keep on using it. In particular, the only con I've got against it is for point (1), as adding support for multi-value shows. But every other solution implies having to deal with every other item in the list of requirements, which looks like it could mean much more work in the long run.

Lars has already commented about the fact that a new parser be written in JS. Curious to know what other people's thoughts are about this topic.

Moving my comment here:

At the risk of coming across as negative:

I am strongly disinclined to accept a JS solution for this, primarily because it is a weak language with a lot of error-hiding behavior (dynamically typed, permissive data structures, late errors) and the less of it we have in our code the better. In general, I'm disinclined to replace wasmTextToBinary at all; it has served us well and it fits in well with the C++ codebase, options processing, conditional compilation, ad-hoc needs, and what have you. The point of bug 1564760 was that wabt could be provided as a service if it is already installed on the system, for those (external) shell users who find it more convenient to use.

Clearly it's possible for anyone who needs it / wants it to pick up wassemble.mjs, and use it as a prefix for their own projects, if that is more suitable than what the shell offers. Or to precompile using wat2wasm.

If we have to fix wasmTextToBinary to support multi-value I'm not sure why we shouldn't just do that. I'm open for arguments, but "use this JS library instead" isn't one I think.

And Andy's response to my comment:

Hi! I think this is the wrong conclusion. Hear me out a bit.

  1. I am sympathetic to wanting stronger guarantees but unless I misunderstand, wasmTextToBinary has no safety characteristics, as the output has to be validated -- the guarantees are already present.
  2. The argument about options, ad-hoc needs, etc is adequately addressed by a JS solution. wassemble (for example) can support the union of all options supported by SpiderMonkey, and leave it to the engine to validate/compile as appropriate.
  3. Integration with C++ is unnecessary -- the only interface is from JS, and only in shell builds.
  4. Making WasmAST work with multi-value is not straightforward. As a simple example: AstBinaryOperator doesn't make sense. It doesn't allow for (i32.add (block (result i32 i32) (i32.const 10) (i32.const 42))).
  5. The syntax supported by wasmTextToBinary should be the standard syntax -- the one that can express all grammatical productions.
  6. If we imported wabt (or a wasm-compiled routine from some language with stronger guarantees with JS), there's an overhead in terms of communication, management, updates, etc -- and all this for code that never ships in a browser and doesn't add value. A single JS file that you can just copy around, modify if needed for experimental standards, etc is easier to maintain.

What is your proposal for addressing problems (4) and (5), lth?

Following up first to Andy:

  1. Personally I just dislike dynamic typing and late errors, it makes software less reliable. Benjamin made a more reasoned case for the best reliability we can have: we depend on this function for our own testing (both positive tests and negative tests), so the quality of Firefox is in some sense directly dependent on it. That said, wasmTextToBinary is a big piece of code with a very (overly?) permissive parser, and the size and the all the moving parts together make it a little brittle.
  2. I guess so. I guess if properly embedded it's even ifdeffable (maybe moreso than a Rust module would be, say).
  3. It's true it's only for shell builds, but it's not quite true that the only interface is from JS -- we could use it in jsapi-tests, though we don't currently do that. I don't think much hinges on this argument; embedded JS should work OK even in that context.
  4. I believe you, but obviously the proposal for addressing that would be to change the data structure.
  5. I actually think that it's nice if we support the standard syntax (cf the subject of this very bug) but I think it's required that we support whatever syntax we want to support, so that we can easily play with extensions and variants. You get no argument from me about how (relatively) clunky it is to add syntax to wasmTextToBinary; but (apart from cross-cutting problems like multi-value) it's largely grunt work.
  6. I agree with this, but I don't think we should import a third-party library, I think wasmTextToBinary should be part of the engine and should be supported by us. There's a cost to that, and I think we should be willing to pay that cost. (In particular, I've never supported importing wabt, I've only suggested that if it's installed on the system the shell could provide a namespace/module that allows wabt to be used as a subroutine.)

I guess TypeScript could be brought to bear, if we wanted to increase reliability but embed some JS. I still think fixing wasmTextToBinary is the better option.

If you were to fix wasmTextToBinary, I think you would start by removing AstExprType and AstExpr from WasmAST.h. Because we want to test cases in which values flow in through stack parameters rather than restricting to nested parameters, this abstraction doesn't work. Of course this means lots of adaptation both in WasmTextToBinary but also the AsmJS compiler and the wasm encoder. You would get to keep most concrete classes (e.g. AstCall) but they would inherit from AstNode and not store a type and not reference subexpressions. Then I think I would move to implement the unfolding algorithm from the spec. It would be a big couple patches and I would estimate it to take about 10 working days or so.

See Also: → 1590920

As a side note, not sure how asm.js is affected by this possible rewrite (comment 7), as it reads input with a special parser and then generates wasm in binary form directly. This may have been different before. Anyhow I'll fork off a new bug to track this work and copy salient comments from here into that one.

Depends on: 1592926
Depends on: 1594652
Depends on: 1594655
Depends on: 1600557
Depends on: 1611080
Depends on: 1612534

Bug 1612534 nuked the old wasmTextToBinary implementation with the wat crate. I've looked through all the open dependent bugs here and they're all fixed by this switch. Therefore I'm going to close this bug. If we have syntax changes or more alignment work to do, let's file that work upstream [1].

[1] github.com/bytecodealliance/wat

Status: NEW → RESOLVED
Closed: 2 months ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.