Closed Bug 753609 Opened 12 years ago Closed 11 years ago

Design public handle API

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: dmandelin, Unassigned)

References

Details

(Whiteboard: [js:t])

We need to firm up the rooting/handle API for generational GC and use it in JSAPI. It's going to be C++: not using C++ for rooting would be way too inconvenient. Blog post to follow about the change to C++. This bug is just for discussion of the C++ API itself.

We have a current C++ rooting/handle API in gc/Root.h as a starting point, which uses the types Handle<T>, RootedVar<T>, and Root<T>. There is some documentation in gc/Root.h and also in https://developer.mozilla.org/En/SpiderMonkey/Internals/Garbage_collection#Explicit_rooting_API.

There are some unresolved issues for this API:

- We don't know the performance implications yet.
- We don't yet have a solution for rooting |this| in internal methods. Proposed so far is to switch to static methods that use handle type for the receiver parameter, but I suspect there are other options.
- Root<T> is less easy to use than the other types--should we keep it public, make it friendapi, or remove it?
- The names have been found to be confusing, at least initially, by some readers.
- We need better docs.

More important than all that, only engine implementers have seen and commented on the handle API so far, so we don't know whether it serves the needs of API users. We do have one bit of evidence that it might not be complete, which is that IonMonkey has found it necessary to layer a handle-scope-type concept on top.

I want the rooting/handle API to have good human factors and be pretty stable before we land it, so if you're going to be using it or have some ideas, please give the docs a read and comment here.
Depending on whether Mozilla wants to ever put some resources into Spidernode in the near-ish future, it might be worthwhile to try to stay somewhat close to the V8 API where possible.

The proposed API would make V8Monkey a lot easier in any case, though, so that's a great side effect already.
Some initial random thoughts from the perspective of embedding SpiderMonkey (originally 1.5, now 1.8.5) as the scripting engine for a game (with code at http://trac.wildfiregames.com/browser/ps/trunk/source/scriptinterface and thereabouts):

 * C++ is good.

 * We have a CScriptValRooted, which is basically a boost::shared_ptr<jsval>. We heap-allocate the jsval and call JS_Add[Named]ValueRoot when constructing the CScriptValRooted, and the shared_ptr destructor calls JS_RemoveValueRoot, and the shared_ptr copy constructor does thread-safe reference counting. (If I remember correctly, JS_AddValueRoot was pretty slow, so the shared_ptr gave better performance than having a non-heap-allocated value and calling JS_AddValueRoot every time it was copied). It's unclear how copying RootedVar is expected to work.

 * Very little of our code does any manual rooting - it mostly either uses CScriptValRooted or relies on the conservative stack scanner.

 * Currently the only case where performance really matters is where we have large arrays of jsvals in C++ (in some code that clones values from one runtime into another). In that code, we avoid rooting entirely and use JS_SetExtraGCRoots and JS_CALL_VALUE_TRACER, so that there's very little cost if the GC isn't triggered.

 * In several places we use std::map<JSObject*, ...>, when we recurse through an object's properties and want to detect objects we've already visited. Presumably that won't work at all if objects can move.

 * In almost all our code, performance doesn't matter much, and safety (avoiding bugs caused by forgetting to root) is more important; I'd prefer an API that encourages rooting too much rather than too little (as long as there's a way to avoid the cost of rooting in the cases where performance matters and we're willing to take more care). The stack scanner helped a lot with this, compared to the old C rooting API.

 * The wiki says "Functions that return gcthing pointers should return them raw, as JSObject*. But the call site should immediately store the value into a RootedVar". That sounds bad for porting our code to the new JSAPI: as I understand it, code that currently stores the value into a JSObject* or jsval, and relies on the stack scanner, will still compile fine but will be buggy. It'd probably be safer if it returned a new type that triggered compiler errors when accidentally storing into an unsafe raw pointer, then we can manually find and fix all the cases that need to use RootedVar.

E.g. hypothetically maybe JSAPI functions could return an UnrootedVar<JSObject> (just a simple wrapper around JSObject*), set up so you could write
  js::RootedVarObject obj(cx, JS_NewObject(cx, ...));
or
  JSObject* obj = JS_NewObject(cx, ...).RawPointer();
but you wouldn't be allowed to write
  JSObject* obj = JS_NewObject(cx, ...);

 * It sounds like RootedVar will be used a lot more frequently than Root. It seems backwards for the more common type to have the much longer name.
Depends on: 754664
(In reply to Philip Taylor from comment #3)
> Some initial random thoughts from the perspective of embedding SpiderMonkey
> (originally 1.5, now 1.8.5) as the scripting engine for a game (with code at
> http://trac.wildfiregames.com/browser/ps/trunk/source/scriptinterface and
> thereabouts):
>
>  * C++ is good.

Yay!

>  * We have a CScriptValRooted, which is basically a
> boost::shared_ptr<jsval>. We heap-allocate the jsval and call
> JS_Add[Named]ValueRoot when constructing the CScriptValRooted, and the
> shared_ptr destructor calls JS_RemoveValueRoot, and the shared_ptr copy
> constructor does thread-safe reference counting. (If I remember correctly,
> JS_AddValueRoot was pretty slow, so the shared_ptr gave better performance
> than having a non-heap-allocated value and calling JS_AddValueRoot every
> time it was copied). It's unclear how copying RootedVar is expected to work.

Interesting. Relating to your usage, I think the RootedVar/Handle API is organized around the assumption that there is an "owner" of the value, which would be a RootedVar, and every other place that sees that value would use a Handle. Does that fit your usage at all? (I would guess not if you are using refcounting, but not sure.)

I'm pretty sure that copying RootedVars is totally fine, though.

>  * Very little of our code does any manual rooting - it mostly either uses
> CScriptValRooted or relies on the conservative stack scanner.
> 
>  * Currently the only case where performance really matters is where we have
> large arrays of jsvals in C++ (in some code that clones values from one
> runtime into another). In that code, we avoid rooting entirely and use
> JS_SetExtraGCRoots and JS_CALL_VALUE_TRACER, so that there's very little
> cost if the GC isn't triggered.

That's an interesting case. It seems JS_SetExtraGCRoots is well designed for that.

>  * In several places we use std::map<JSObject*, ...>, when we recurse
> through an object's properties and want to detect objects we've already
> visited. Presumably that won't work at all if objects can move.

We're planning to adapt SpiderMonkey's hash tables to work with moving GC. Terrence knows more. It seems like it might work to use a handle as the key, and maybe a RootedVar too.

>  * In almost all our code, performance doesn't matter much, and safety
> (avoiding bugs caused by forgetting to root) is more important; I'd prefer
> an API that encourages rooting too much rather than too little (as long as
> there's a way to avoid the cost of rooting in the cases where performance
> matters and we're willing to take more care). The stack scanner helped a lot
> with this, compared to the old C rooting API.
> 
>  * The wiki says "Functions that return gcthing pointers should return them
> raw, as JSObject*. But the call site should immediately store the value into
> a RootedVar". That sounds bad for porting our code to the new JSAPI: as I
> understand it, code that currently stores the value into a JSObject* or
> jsval, and relies on the stack scanner, will still compile fine but will be
> buggy. It'd probably be safer if it returned a new type that triggered
> compiler errors when accidentally storing into an unsafe raw pointer, then
> we can manually find and fix all the cases that need to use RootedVar.
> 
> E.g. hypothetically maybe JSAPI functions could return an
> UnrootedVar<JSObject> (just a simple wrapper around JSObject*), set up so
> you could write
>   js::RootedVarObject obj(cx, JS_NewObject(cx, ...));
> or
>   JSObject* obj = JS_NewObject(cx, ...).RawPointer();
> but you wouldn't be allowed to write
>   JSObject* obj = JS_NewObject(cx, ...);

Makes sense to me.

>  * It sounds like RootedVar will be used a lot more frequently than Root. It
> seems backwards for the more common type to have the much longer name.

That was my thought as well. It seemed to me that Root<T*> was a perfectly good name for a T* that is a root. Maybe Root(ed)Loc(ation) for the less common concept? Any suggestions on names for the thing returned from JSAPI?
Note that this is going to break Servo. Rust can't use a C++ API.

We're happy to do the work to add a C API on top of the C++ one -- generational GC in Firefox is too important to justify holding it up for Servo's sake -- but we'll need some sort of C API.
The patch in bug 750733 allows a C API by calling e.g. JS_CreateHandleObject and JS_DestroyHandleObject, which must nest appropriately.  Pretty unwieldy except for glue and autogenerated code.
Additional bikeshedding on the names:

I have:

    JSBool mynative(JSContext *cx, unsigned argc, Value *vp)
    {
        CallArgs args = CallArgsFromVp(argc, vp);
        JSObject *obj = makeStuff(cx, &args[0].toObject());
        if (!obj)
            return false;
        vp->setObject(*obj);
        return true;
    }

IIUC, this should now be:

    JSBool mynative(JSContext *cx, unsigned argc, Value *vp)
    {
        CallArgs args = CallArgsFromVp(argc, vp);
        JSObject *obj = makeStuff(cx, RootedVarObject(cx, &args[0].toObject()));
        if (!obj)
            return false;
        vp->setObject(*obj);
        return true;
    }

(or, if you were really nervous, you'd put |obj| into a RootedVarObject too.)

In this example, the RootedVarObject name is deceptive, since I'm just using it to root the value of an expression during the call to makeStuff(); there is no "Var"iable involved.

Unless this example is wrong (which is perfectly possible), it's another vote for s/RootedVar/Root/.
I am only slightly paying attention to this at the edges, but presumably args[0] would be a handle or some other already-rooted type in that case, right?  And maybe it could be downcast to an already-rooted-object type too, right?  I'm not sure RootedVarObject is really needed there.
(In reply to Jeff Walden [:Waldo] (busy, try to prefer other reviewers if possible) from comment #8)
> I am only slightly paying attention to this at the edges, but presumably
> args[0] would be a handle or some other already-rooted type in that case,
> right?  And maybe it could be downcast to an already-rooted-object type too,
> right?  I'm not sure RootedVarObject is really needed there.

args[0] is pulled from the stack, and so is rooted by the stack for the "won't be collected" meaning of root, but it can still be moved. If it moves, then its value on the stack will be updated, but the JSObject* extracted from it won't be unless it's wrapped in a handle of some sort. (caveat: I don't really know what I'm talking about, and I used to have the brain the size of a peanut, but it shrunk.)

The problem is that &args[0].toObject() is an expression that severs the connection to any previous rooting-for-moving. It may as well be js_MakeArrayWithTwoCopiesOf(args[0].toObject()). You get back a relocatable JSObject* that you're passing into a function that takes a Handle<> and could trigger a GC. You need to keep it on some sort of Root* leash so you can find it when it wanders off.

Oh! Or are you saying that CallArgs should automatically root everything? If that's what you mean, that scares me, so I will defer to someone with a clue.
vp passed to a JSNative is a pointer to a spot on the JS stack, which is rooted by the engine.  So everything pointed to in there, within argc range (and a little before, for callee and this), is already rooted.  CallArgs is only used with vp/argc from JSNative calls (or at least if it's not used that way, it's something we should fix, and possibly an actual bug as well).  So any location abstracted by CallArgs is rooted.  So if extracting stuff from CallArgs only extracts handles that work through the usual indirection, where said indirection will do the right thing for incrementally-updated values, I would think you would be fine.

Oh, and my brain is mustard-seed-sized, so it's amateur hour, blind-leading-the-blind around here.  Who are Dunning and Kruger again?
Erm.  What I think maybe I meant more is that your makeStuff method shouldn't be passed a value/object/gcpointer directly, but should be passed something slightly more indirect so that rooting just works, and actual use requires a dereference.  Any way I can make this any muddier?
(In reply to David Mandelin from comment #4)
> Relating to your usage, I think the RootedVar/Handle API is
> organized around the assumption that there is an "owner" of the value, which
> would be a RootedVar, and every other place that sees that value would use a
> Handle. Does that fit your usage at all? (I would guess not if you are using
> refcounting, but not sure.)

The benefit of automatic memory management is largely that we shouldn't have to care about having an owner (and so we won't get dangling pointer bugs due to incorrect ownership logic) - that's as useful when writing in C++ as when writing in JS. In most cases we probably could identify an owner of each value if necessary, but it's easier and safer and usually not significantly slower to use an ownerless API instead (like always using shared_ptr<T> and almost never converting to T*; or equivalently always using RootedVar<T> and avoiding Handle<T>).

> >  * In several places we use std::map<JSObject*, ...>, when we recurse
> > through an object's properties and want to detect objects we've already
> > visited. Presumably that won't work at all if objects can move.

To be less handwavey: This is basically structured clones (with some minor changes), so it should be fine for us to do whatever the real structured clone implementation does, as long as that's supported by the public API.

> We're planning to adapt SpiderMonkey's hash tables to work with moving GC.
> Terrence knows more. It seems like it might work to use a handle as the key,
> and maybe a RootedVar too.

Do you mean it might work to do std::map<Handle, ...>? (I don't see how it can, since the only way to order the keys is by comparing their JSObject* as an integer, and the moving GC means the ordering might change over time, if I'm understanding correctly). I guess you mean something like js::HashMap<Handle, ...> instead, which seems fine once it's adapted to work (assuming js::HashMap can be used through the public API).

> It seemed to me that Root<T*> was a perfectly
> good name for a T* that is a root. Maybe Root(ed)Loc(ation) for the less
> common concept? Any suggestions on names for the thing returned from JSAPI?

Maybe the old Root could be called something like IndirectRoot, given that its purpose is seemingly to separate the rooting logic from the rooted value. But I don't have any good ideas for that or for the returned thing.
(In reply to Philip Taylor from comment #12)
> (In reply to David Mandelin from comment #4)
> > Relating to your usage, I think the RootedVar/Handle API is
> > organized around the assumption that there is an "owner" of the value, which
> > would be a RootedVar, and every other place that sees that value would use a
> > Handle. Does that fit your usage at all? (I would guess not if you are using
> > refcounting, but not sure.)
> 
> The benefit of automatic memory management is largely that we shouldn't have
> to care about having an owner (and so we won't get dangling pointer bugs due
> to incorrect ownership logic) - that's as useful when writing in C++ as when
> writing in JS. In most cases we probably could identify an owner of each
> value if necessary, but it's easier and safer and usually not significantly
> slower to use an ownerless API instead (like always using shared_ptr<T> and
> almost never converting to T*; or equivalently always using RootedVar<T> and
> avoiding Handle<T>).

The only supported usage of the Handle API's on the stack, where there is a definitive owner -- the first place where a unique thing pointer appears on the stack -- and thus a definitive lifetime.  If you need to root thing pointers where the lifetime is not known to be tied exclusively to the stack, then you will need to additionally use the existing rooting APIs, which I'm sure you are already doing. :-)

It's not quite clear from your description, but if CScriptValRooted is only stored on the stack, you may be able to just make it a boost::shared_ptr< RootedVarValue<jsval> >.  If it is also stored on the heap, you could easily make a new CStackValRooted variant for use exclusively on the stack that is automatically coercible and has the same shared_ptr goodness.

> > >  * In several places we use std::map<JSObject*, ...>, when we recurse
> > > through an object's properties and want to detect objects we've already
> > > visited. Presumably that won't work at all if objects can move.
> 
> To be less handwavey: This is basically structured clones (with some minor
> changes), so it should be fine for us to do whatever the real structured
> clone implementation does, as long as that's supported by the public API.
> 
> > We're planning to adapt SpiderMonkey's hash tables to work with moving GC.
> > Terrence knows more. It seems like it might work to use a handle as the key,
> > and maybe a RootedVar too.
> 
> Do you mean it might work to do std::map<Handle, ...>? (I don't see how it
> can, since the only way to order the keys is by comparing their JSObject* as
> an integer, and the moving GC means the ordering might change over time, if
> I'm understanding correctly). I guess you mean something like
> js::HashMap<Handle, ...> instead, which seems fine once it's adapted to work
> (assuming js::HashMap can be used through the public API).

You are quite correct.  Handle is not the right abstraction here: to deal with this case (and indeed any object pointer stored in the program heap) we have a collection of "Barrier" classes.  These classes wrap all pointers stored on the heap so that we can move them.  These are different from RootedVar objects in a number of ways:
  (1) They do not root -- you still need to trace and sweep them normally.
  (2) They are far more performant.

There are 3 "levels" of Barriers that are used for handling subsequently more complex classes of problems:

HeapPtr<T>:
    The general workhorse of heap barriers.  Use this for anything with no other special constraints.

RelocatablePtr<T>:
    The GC internally has a cache of pointers to values and pointers to object pointers.  If you or the system realloc, memcpy, free, or otherwise move a value or pointer, then the GC will walk off an invalid pointer.  This would be bad.  Use a RelocatablePtr to protect against this.

EncapsulatedPtr<T>:
    This class should only be used in the hopefully rare cases where the GC should not move a pointer without telling the pointer's owner that it is doing so.  This class does not maintain the GC's invariants itself, so it is vital that users of this class understand the implications and insert correct annotations around all updates into instances of this type.

To (finally) answer your question, to have a hash_map<JSObject *, JSObject *>, you would need to make this a hash_map<EncapsulatedPtrObject, RelocatablePtrObject> and add a manual write barrier around insertions into the map.  This type of write barrier includes a callback from the GC so you can update the pointer in the map safely and a checking routine for the GenerationalGC verifier.  This is abstracted for our builtin HashMap to about 6 lines of boilerplate (and I may be able to make it even easier in the future).  Implementing something equivalent for hash_map should not be too hard.

Our HashMap class is in js/public/HashTable.h, so I certainly hope it is exposed publicly somewhere.

I should probably go put all of this on our wiki.
No longer blocks: 750733
Depends on: 756823
(In reply to Terrence Cole [:terrence] from comment #13)
> I should probably go put all of this on our wiki.

I would really like to have comments in the code that explain the rules fully. I don't find the wiki easier to read or search than comments, and comments seem more likely to stay up to date.
(In reply to Philip Taylor from comment #12)
> (In reply to David Mandelin from comment #4)
> > Relating to your usage, I think the RootedVar/Handle API is
> > organized around the assumption that there is an "owner" of the value, which
> > would be a RootedVar, and every other place that sees that value would use a
> > Handle. Does that fit your usage at all? (I would guess not if you are using
> > refcounting, but not sure.)
> 
> The benefit of automatic memory management is largely that we shouldn't have
> to care about having an owner (and so we won't get dangling pointer bugs due
> to incorrect ownership logic) - that's as useful when writing in C++ as when
> writing in JS. In most cases we probably could identify an owner of each
> value if necessary, but it's easier and safer and usually not significantly
> slower to use an ownerless API instead (like always using shared_ptr<T> and
> almost never converting to T*; or equivalently always using RootedVar<T> and
> avoiding Handle<T>).

Makes sense. I figured as much, but I thought it wouldn't hurt to ask if the easy and obvious thing turned out to work. :-)

For refcounting GC holders, I think it would work to use RootedVar everywhere, although that would be somewhat inefficient because you'd be creating a root N times for the same underlying object.

An interesting question is how this would work in the V8 handle API. I think basically you would use a refcounting smart pointer class, except that you'd have to arrange for it to call Persistent:New to get started, and then in the dtor it would call Dispose instead of delete.

In the SpiderMonkey, you could do the same thing except with AddRoot and RemoveRoot. So I guess you wouldn't even use the handle/root API for that case.

> > >  * In several places we use std::map<JSObject*, ...>, when we recurse
> > > through an object's properties and want to detect objects we've already
> > > visited. Presumably that won't work at all if objects can move.
> 
> To be less handwavey: This is basically structured clones (with some minor
> changes), so it should be fine for us to do whatever the real structured
> clone implementation does, as long as that's supported by the public API.
> 
> > We're planning to adapt SpiderMonkey's hash tables to work with moving GC.
> > Terrence knows more. It seems like it might work to use a handle as the key,
> > and maybe a RootedVar too.
> 
> Do you mean it might work to do std::map<Handle, ...>? (I don't see how it
> can, since the only way to order the keys is by comparing their JSObject* as
> an integer, and the moving GC means the ordering might change over time, if
> I'm understanding correctly). I guess you mean something like
> js::HashMap<Handle, ...> instead, which seems fine once it's adapted to work
> (assuming js::HashMap can be used through the public API).

Ah, good point. Well, that does seem like a problem. V8 has a specialized hash table for object keys (which is needed to implement weak maps), and it turns out they create a hidden identify value on demand on objects that get used as keys.

I think the other main solution is to have a specialized hash table that rekeys when stuff gets moved, which is what Terrence has done.

> > It seemed to me that Root<T*> was a perfectly
> > good name for a T* that is a root. Maybe Root(ed)Loc(ation) for the less
> > common concept? Any suggestions on names for the thing returned from JSAPI?
> 
> Maybe the old Root could be called something like IndirectRoot, given that
> its purpose is seemingly to separate the rooting logic from the rooted
> value. But I don't have any good ideas for that or for the returned thing.

The IndirectRoot got taken out, I think.
Whiteboard: [js:t]
OK, gotta keep this moving forward. My sense is that the handle/root API is somewhat implementer-oriented, and has more concepts than the V8 handle API. And it would be nice to have handle API compat since it moves us toward C++ API compat for the future. Brian, how hard would it be to switch our existing stuff to the V8 handle API?

If that seems like too much, I suggest trying to write up good solid documentation for the API. If that can be done and the result makes sense to non-GC people and JSAPI users, then it's worth a try.
(In reply to David Mandelin from comment #16)
> OK, gotta keep this moving forward. My sense is that the handle/root API is
> somewhat implementer-oriented, and has more concepts than the V8 handle API.
> And it would be nice to have handle API compat since it moves us toward C++
> API compat for the future. Brian, how hard would it be to switch our
> existing stuff to the V8 handle API?
> 
> If that seems like too much, I suggest trying to write up good solid
> documentation for the API. If that can be done and the result makes sense to
> non-GC people and JSAPI users, then it's worth a try.

Can you explain your sense in a bit more detail?  V8 has handles and handle scopes, we have handles and rooted variables.  Discussions about reference counted heap things don't really belong here, since handles in both engines are around rooting things with stack based lifetimes.
So is this bug just about the stack-based rooting setup?  The obvious question I had, reading through the docs, is how I handle things that live on the heap, might get moved, but I am NOT trying to make into roots (though I might trace them from trace class hooks on other objects).

As far as the stack goes, the proposed API seems OK, modulo the naming of RootedVar/Root and the unsafe return value thing (both already mentioned in comment 3).
(In reply to Boris Zbarsky (:bz) from comment #18)
> So is this bug just about the stack-based rooting setup?  

Yes.

> The obvious
> question I had, reading through the docs, is how I handle things that live
> on the heap, might get moved, but I am NOT trying to make into roots (though
> I might trace them from trace class hooks on other objects).

HeapPtr.  Read comment 13 and the docs in gc/Barrier.h for the full details.
Unfortunately we don't have anything like HeapPtr that can be used outside the JS engine. We'll need to make something like that for moving GC.
(In reply to Brian Hackett (:bhackett) from comment #17)
> (In reply to David Mandelin from comment #16)
> > OK, gotta keep this moving forward. My sense is that the handle/root API is
> > somewhat implementer-oriented, and has more concepts than the V8 handle API.
> > And it would be nice to have handle API compat since it moves us toward C++
> > API compat for the future. Brian, how hard would it be to switch our
> > existing stuff to the V8 handle API?
> > 
> > If that seems like too much, I suggest trying to write up good solid
> > documentation for the API. If that can be done and the result makes sense to
> > non-GC people and JSAPI users, then it's worth a try.
> 
> Can you explain your sense in a bit more detail?  V8 has handles and handle
> scopes, we have handles and rooted variables.  Discussions about reference
> counted heap things don't really belong here, since handles in both engines
> are around rooting things with stack based lifetimes.

It's subjective. I see the user-visible "moving parts" of V8 handle API as HandleScope and Handle, with Handle also dividing into Local and Persistent. I see the moving parts of SM handle API as Root, Handle, AddRoot/RemoveRoot, HeapPtr, all being pretty much separate concepts, and then either a sharp edge for returned pointers or an Unrooted type.

Reference counted heap things do matter, because some users may need them, so we need to know that APIs can support them. My analysis was that they were of equal complexity in the 2 APIs.
Depends on: 790436
Depends on: 791322
No longer depends on: 754664
These points all seem to have been more or less addressed by now.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.