Proper boxing/unboxing semantics for AnyRef (Rabaldr + interpreter stubs)

RESOLVED FIXED in Firefox 66

Status

()

enhancement
P2
normal
RESOLVED FIXED
6 months ago
5 months ago

People

(Reporter: lth, Assigned: lth)

Tracking

(Blocks 1 bug)

unspecified
mozilla66
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox66 fixed)

Details

Attachments

(1 attachment, 4 obsolete attachments)

Assignee

Description

6 months ago
When a JS value flows into wasm to an AnyRef parameter, or via an AnyRef return from JS, or is stored by JS into an AnyRef table, or into an AnyRef global, it is currently converted to an object by a modified (*) ToObject operation.  When wasm passes an AnyRef value to JS as a parameter, or returns one to JS, or JS reads an AnyRef table or global, JS just gets an object pointer (or null).  Thus exposing a JS value to wasm via anyref is not an identity operation and does not allow all JS values to be used.

The correct marshalling operation is more subtle: If the JS value is an Object value or null then the operation is an identity operation and in this case the  pointer is transmitted unchanged across the boundaries, just as it is now.  But if the JS value is not an Object value or null then it is not converted to an Object but is instead /boxed/ when it is exposed to wasm, and when a boxed value is exposed to JS it is /unboxed/ so as to retrieve the original value.

There are several possible schemes for representing boxes.  One attractive scheme is to use the low bit of an AnyRef as a tag bit.  A value of zero means a pointer-to-something, where the something is self-identifying; a value of one means that the upper bits represent a small signed integer, which is a very common case that we would strongly prefer to handle without storage allocation, especially in the future (**).  Values such as true, false, and undefined could be represented as distinguished singleton objects, or we could use the next-lowest bit of a non-integer to distinguish among a pointer ('0') and various immediate values ('1') (***).  Strings would also be object pointers, assuming they carry enough type information to tell them apart from other object types.  And objects carry type information.

We will need to amend our GC tracing machinery for wasm frames to distinguish traceable values from nontraceable ones in anyref slots, at a minimum.  (We can in principle treat anyref slots and non-anyref slots differently, though it's far from obvious that this is beneficial.)

(*) ToObject is supposed to throw on null and undefined, but we allow null to pass through.

(**) The type `i31ref` is coming; its relationship to boxed JS int values is a little unclear but we'll need this representation one way or the other.

(***) Thus we could use the second-lowest bit to indicate a non-integer non-pointer and we'd have 30 bits worth of payload we could slice in many ways, including for true/false/undefined but also for up-to-29-bit integers, if i31ref turns out not to be the same thing as an implicitly boxed integer value.
Nice writeup!

> Strings would also be object pointers, assuming they carry enough type information to tell them apart from other object types.

JSString doesn't have a Shape* as the first word, so we'd need to replicate the logic in Cell::getTraceKind() to distinguish JSStrings from all other js::ShapedObjects.  That could work, but we'd want to compare the perf of that vs. using the second-LSB to make the distinction instead.
Assignee

Updated

6 months ago
Assignee

Updated

6 months ago
No longer blocks: 1488199
Assignee

Updated

6 months ago
Assignee: nobody → lhansen
Status: NEW → ASSIGNED
Priority: -- → P2
Assignee

Comment 2

6 months ago
A couple more notes as I'm grubbing through the engine to look for places that need to be updated:

* If a pointer is represented as an anyref with no additional tag bits (so, the two or three low bits zero, etc) then upcasting from (ref T) to anyref is zero work in generated wasm code.  This is a fairly significant simplification, probably.  Also a null default value works as all-zero-bits, as now.

* Write barriers on anyref fields probably become even more complicated, as we may have to account for stores of non-pointer values and filter them out the way we now filter out null values in both the prebarrier and the postbarrier.  If we represent singletons true, false etc as non-pointers then there's an argument to be made here that the anyref representation of null should also be a (tagged) non-pointer value, though this moves complexity elsewhere, notably everywhere we use a null default value.  Even if the singletons are pointer values we will need to deal with the tagged ints.  I'm not sure how big a deal this really is, as reasonable people will use typed fields - possibly table.set is most affected and that's already dog-slow.

* I'll probably do anyref boxing/unboxing before I do stubs (bug 1508560) since the boxing/unboxing is not too hairy to get right in the context of our interpreter-only paths, and then we'll know how to do it also for the jit paths when we implement those.
Assignee

Comment 3

6 months ago
A patch that adds "TODO/AnyRef" comments to all places in the code that need to be updated (anyway all I've found so far).  It doesn't look too awful actually.
(In reply to Lars T Hansen [:lth] from comment #2)
> * If a pointer is represented as an anyref with no additional tag bits (so,
> the two or three low bits zero, etc) then upcasting from (ref T) to anyref
> is zero work in generated wasm code.  This is a fairly significant
> simplification, probably.  Also a null default value works as all-zero-bits,
> as now.

With non-coercive subtyping, we have a hard requirement that a (ref T) must be a valid anyref without changing any bits.  That's because a reference to a struct
  (struct (fields (mut int31ref) (mut (ref $blah))))
can be upcast to a reference to a struct
  (struct (fields anyref anyref))
which effectively reinterprets the bits of those two anyref-subtypes as anyref.

Thus, if we tag the low-bits of pointers, they must remain tagged even in (ref $T) and int31ref.  I think null can stay all-bits-0 in all cases.
Assignee

Comment 5

6 months ago
This is more or less a correct implementation of boxing and unboxing.  It transparently boxes and unboxes everything that isn't already a JSObject* or null, which means I have not had to worry about barriers and GC issues and I have also been able to paper over issues with TypedObject (I think).  But it does not handle the fast i31ref case, nor does it have optimizations for common immediates, and the boxing itself may be much slower than we want it to be.

Still, as a POC it takes us part of the way there, and it passes all tests, after removing those tests that assumed improper boxing and unboxing...
Assignee

Comment 6

6 months ago
Subsequent stages might thus be:

stage 1: optimize the boxed representation as much as is reasonable
stage 2: figure out how to pass strings by pointer without introducing tagging
stage 3: introduce tagging, notably for ints and maybe also for strings (big job)

We are interested in some performance numbers for microbenchmarks comparing 2 and 3, especially, for string and int data in particular.
Assignee

Comment 7

6 months ago
Cleaned up, with more test cases and some bug fixes.  See longish commit message for tech details.  Note the following:

* This patch sits on top of the stack scanning patch queue (bug 1476251,
  bug 1488162, and local adjustments) , mostly because I expected that queue
  to land before this patch, but the touch points aren't too many, just
  be sure you're not confused when reading because of it.

* Is "anyref_t" the type name we want to represent boxed AnyRef values in
  C++ code?  Given that anyref is always a pointerish/scalar type I'm inclined
  to think so but I'm open to arguments.

* Instead of making WasmValueBox a subtype of NativeObject we can probably
  make it a subtype of JSObject; however, once we stop wrapping strings in
  WasmValueBox we will need to deal with gc::Cell subtypes directly, at which
  point WasmValueBox might well be a gc::Cell subtype.  (Modulo the issue
  below being addressed.)  So I'm deferring this.

* This patch does not properly handle the cases where JS writes into a TO
  anyref field or reads out of a TO anyref field in a struct type.  In the
  former case, JS will use ToObject on non-null values, which means it'll
  throw on undefined and will wrap non-Object values in JS value objects,
  which will then be seen as such by wasm and JS.  In the latter case, if
  wasm has written a WasmValueBox object into an anyref field, JS reading
  from that field will see the box.  I want to fix this, but it'll mean
  somewhat broad changes to our TO system, and since the current situation
  should be safe and wasm GC types is a WIP anyway, it seemed to me we can
  possibly land this stage 0 without worrying about struct fields.
Attachment #9026455 - Attachment is obsolete: true
Attachment #9026696 - Attachment is obsolete: true
Attachment #9027851 - Flags: review?(luke)
Assignee

Updated

6 months ago
Depends on: 1510216
Assignee

Updated

6 months ago
Blocks: 1510216
No longer depends on: 1510216
Assignee

Updated

6 months ago
Blocks: 1507491
Assignee

Comment 8

6 months ago
Minor update coming to deal with proper rooting of anyref, but i want to get it past try first.
Assignee

Comment 9

6 months ago
Some minor bug fixes.
Attachment #9027851 - Attachment is obsolete: true
Attachment #9027851 - Flags: review?(luke)
Attachment #9028988 - Flags: review?(luke)
Assignee

Updated

6 months ago
Summary: Proper boxing/unboxing semantics for AnyRef → Proper boxing/unboxing semantics for AnyRef (Rabaldr + interpreter stubs)
Assignee

Comment 10

6 months ago
Note, AnyRefInvalid() is fine the way it is for Rabaldr (where we control code layout tightly) but not for Baldr (where we don't and the value can be exposed to the GC in various ways, at least in principle).  In the longer term we can fall back on special tagged values here, but it may be better to just avoid needing it.
Comment on attachment 9028988 [details] [diff] [review]
bug1505768-anyref-box-and-unbox-stage-0-v3.patch

Review of attachment 9028988 [details] [diff] [review]:
-----------------------------------------------------------------

I really like the ASSERT_ANYREF_IS_JSOBJECT idiom.  r+ with nits addressed

::: js/src/wasm/WasmTypes.cpp
@@ +112,5 @@
> +      if (u.anyref_) {
> +        // TODO/AnyRef-boxing: With boxed immediates and strings, the write
> +        // barrier is going to have to be more complicated.
> +        ASSERT_ANYREF_IS_JSOBJECT;
> +        JSObject::writeBarrierPost((JSObject**)dst, nullptr, u.anyref_);

pre-existing: because of the post-barrier, this function is now quite dependent on what 'dst' belongs to: only stable tenured memory (iiuc).  Rather than documenting the contract in the abstract, how about moving this function out of Val and to a static function right above Instance() which is the only use.

::: js/src/wasm/WasmTypes.h
@@ +528,5 @@
> +// An anyref_t is a boxed value that can represent any wasm reference type and
> +// any host type that the host system allows to flow into and out of wasm
> +// transparently.  It is a pointer-sized datum where there is a
> +// representation-invariant conversion between a JSObject pointer and anyref as
> +// well as between a (ref T) and an anyref in wasm.  In general, host types that

(ref T) and anyref must necessarily have the same representation but anyref/JSObject* having the same rep is just an impl choice; it might be good to call out this distinction.  I'd suggest wording it as "anyref_t is a pointer-sized datum that has the same representation as all its subtypes due to the non-coercive subtyping of the wasm type system. It's current representation is a plain JSObject*, using an internal class of JSObjects to box non-object JS values."

@@ +551,5 @@
> +// are technically no tags at all yet).  We use a simple boxing scheme that
> +// wraps a JS value that is not already JSObject in a distinguishable JSObject
> +// that holds the value, see WasmTypes.cpp for details.
> +
> +typedef JSObject* anyref_t;

nit: this doesn't (currently, who knows what future of style changes are in store) match the camel-case, no-_t-suffix convention.  How about just AnyRef, which matches the Rooted*/Handle*/MutableHandle* names below?

Second, how about preemptively making AnyRef a class holding a uintptr_t, for symmetry with JS::Value.  Then various of these global functions / constants below can be members.  I think use in unions shouldn't be a problem anymore (with enough punching).

@@ +578,5 @@
> +
> +// Given a void* that comes from compiled wasm code, turn it into anyref_t after
> +// sanity checking what we can.
> +
> +static inline anyref_t AsAnyRef(void* p) {

Despite the comment, this seems to be being used as a generic "convert a void* to an AnyRef" function.  How about splitting into the 3 use cases I see (assuming static members of class AnyRef):
  FromCompiledCode(void*)
  FromJSObject(JSObject*)
  Null()
which also matches the above:
  Invalid()

@@ +587,5 @@
> +
> +// Given an anyref_t, turn it into a void* for consumption by compiled wasm
> +// code.
> +
> +static inline void* FromAnyRef(anyref_t q) { return (void*)q; }

And this could be ForCompiledCode()

@@ +683,5 @@
>  
>   public:
> +  // TODO/AnyRef-boxing: IsAnyRef is a hack that will go away with a more
> +  // sophisticated representation for anyref_t.
> +  enum class IsAnyRef { True };

As long as class AnyRef has an explicit ctor, I think this isn't necessary?
Attachment #9028988 - Flags: review?(luke) → review+
Assignee

Comment 12

5 months ago
(In reply to Luke Wagner [:luke] from comment #11)
> Comment on attachment 9028988 [details] [diff] [review]
> bug1505768-anyref-box-and-unbox-stage-0-v3.patch
> 
> Review of attachment 9028988 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/wasm/WasmTypes.h
> @@ +528,5 @@
> > +// An anyref_t is a boxed value that can represent any wasm reference type and
> > +// any host type that the host system allows to flow into and out of wasm
> > +// transparently.  It is a pointer-sized datum where there is a
> > +// representation-invariant conversion between a JSObject pointer and anyref as
> > +// well as between a (ref T) and an anyref in wasm.  In general, host types that
> 
> (ref T) and anyref must necessarily have the same representation but
> anyref/JSObject* having the same rep is just an impl choice; it might be
> good to call out this distinction.  I'd suggest wording it as "anyref_t is a
> pointer-sized datum that has the same representation as all its subtypes due
> to the non-coercive subtyping of the wasm type system. It's current
> representation is a plain JSObject*, using an internal class of JSObjects to
> box non-object JS values."

IIUC all you're reacting to here is that JSObject* is a too-specific type; once we no longer put strings into WasmValueBox objects, for example, the type in question would be gc::Cell*.  Agreed this could be clearer.
Assignee

Comment 13

5 months ago
(In reply to Luke Wagner [:luke] from comment #11)
> Comment on attachment 9028988 [details] [diff] [review]
> bug1505768-anyref-box-and-unbox-stage-0-v3.patch
> 
> Review of attachment 9028988 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> @@ +551,5 @@
> > +// are technically no tags at all yet).  We use a simple boxing scheme that
> > +// wraps a JS value that is not already JSObject in a distinguishable JSObject
> > +// that holds the value, see WasmTypes.cpp for details.
> > +
> > +typedef JSObject* anyref_t;
> 
> nit: this doesn't (currently, who knows what future of style changes are in
> store) match the camel-case, no-_t-suffix convention.  How about just
> AnyRef, which matches the Rooted*/Handle*/MutableHandle* names below?

Although AnyRef is clearly the most natural name, I specifically avoided it because it is so generic that text searching is becoming impractical.  (We now use AnyRef for all sorts of type tags, for example.)  Additionally, I sort-of justified it by equating anyref_t with a primitive type.

> Second, how about preemptively making AnyRef a class holding a uintptr_t,
> for symmetry with JS::Value.  Then various of these global functions /
> constants below can be members.  I think use in unions shouldn't be a
> problem anymore (with enough punching).

OK, good point.  This didn't happen before because I was trying to use anyref_t at the VM boundary, but I could not risk structures of one element being passed using a different ABI.  But now that the ABI on callouts from the VM is strictly using literal void* it is more practical to contemplate a wrapper class.

> 
> @@ +578,5 @@
> > +
> > +// Given a void* that comes from compiled wasm code, turn it into anyref_t after
> > +// sanity checking what we can.
> > +
> > +static inline anyref_t AsAnyRef(void* p) {
> 
> Despite the comment, this seems to be being used as a generic "convert a
> void* to an AnyRef" function.  How about splitting into the 3 use cases I
> see (assuming static members of class AnyRef):
>   FromCompiledCode(void*)
>   FromJSObject(JSObject*)
>   Null()
> which also matches the above:
>   Invalid()

Well, Invalid has disappeared because it was a bad API, but yes, I will try to clean up.
Assignee

Comment 14

5 months ago
(In reply to Lars T Hansen [:lth] from comment #13)
> (In reply to Luke Wagner [:luke] from comment #11)
> > Comment on attachment 9028988 [details] [diff] [review]
> > bug1505768-anyref-box-and-unbox-stage-0-v3.patch
> > 
> > Review of attachment 9028988 [details] [diff] [review]:
> > -----------------------------------------------------------------
> 
> > Second, how about preemptively making AnyRef a class holding a uintptr_t,
> > for symmetry with JS::Value.  Then various of these global functions /
> > constants below can be members.  I think use in unions shouldn't be a
> > problem anymore (with enough punching).
> 
> OK, good point.  This didn't happen before because I was trying to use
> anyref_t at the VM boundary, but I could not risk structures of one element
> being passed using a different ABI.  But now that the ABI on callouts from
> the VM is strictly using literal void* it is more practical to contemplate a
> wrapper class.

This works reasonably well, actually.  Not a trivial delta but remarkably little ugliness.  And it does highlight another issue, which I am not planning to fix now, which is that (ref T) is exposed as a JSObject*.  We could probably hide that in a similar way.
Assignee

Comment 15

5 months ago
Addresses all comments, carrying r+.
Attachment #9028988 - Attachment is obsolete: true
Attachment #9032685 - Flags: review+

Comment 16

5 months ago
Pushed by rgurzau@mozilla.com:
https://hg.mozilla.org/mozilla-central/rev/c42679c7a94d
Box/unbox non-pointers for anyref as WasmValueBox objects. r=luke

Comment 17

5 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/c42679c7a94d
Status: ASSIGNED → RESOLVED
Last Resolved: 5 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla66
You need to log in before you can comment on or make changes to this bug.