Closed Bug 863235 Opened 7 years ago Closed 4 years ago

Add a method in nsICycleCollectorListener to retrieve a JS object at a given address

Categories

(Core :: XPCOM, defect)

defect
Not set

Tracking

()

RESOLVED WONTFIX

People

(Reporter: ochameau, Unassigned)

References

(Blocks 1 open bug)

Details

It would be super handy to retrieve a javascript object at a given address reported by the cycle collector listener.
Currently, in a crazy prototype add-on, I'm using jsctypes to call tons of JSAPI methods in order to inspect the "JS Object (Object)" entries reported while using nsICycleCollectorListener.
It would be much easier to get a reference to the object instead.
We would have to be extremely careful not to leak any such object, but it would allow us to do any inspection work we can imagine.

Having said that, it also sounds scary, as the JS object can be in any compartment. It may not be possible, or not reasonably possible...

Gabor suggested he may be interested in looking into this.

Here is a testcase, given that we implement a "jsval nsICycleCollectorListener.getJSObject(in string address);":

// Chrome-privileged JS. getJSObject() is the method proposed in this bug;
// it does not exist on nsICycleCollectorListener.
const { classes: Cc, interfaces: Ci, utils: Cu } = Components;
Cu.import("resource://gre/modules/Services.jsm");

let listener = Cc["@mozilla.org/cycle-collector-logger;1"]
                 .createInstance(Ci.nsICycleCollectorListener);
listener = listener.allTraces();
// Keep the recorded events so that processNext() can replay them.
listener.wantAfterProcessing = true;

function dumpIfPlainObject(aAddress, aObjectDescription) {
  if (aObjectDescription == "JS Object (Object)") {
    let js = listener.getJSObject(aAddress); // proposed API
    dump(aAddress + " = " + JSON.stringify(js) + "\n");
  }
}

let obs = {
  noteRefCountedObject: function (aAddress, aRefCount, aObjectDescription) {
    dumpIfPlainObject(aAddress, aObjectDescription);
  },
  noteGCedObject: function (aAddress, aMarked, aObjectDescription) {
    dumpIfPlainObject(aAddress, aObjectDescription);
  },
  noteEdge: function (aFromAddress, aToAddress, aEdgeName) {},
  describeRoot: function (aAddress, aKnownEdges) {},
  describeGarbage: function (aAddress) {}
};

let window = Services.wm.getMostRecentWindow("navigator:browser");
let utils = window.QueryInterface(Ci.nsIInterfaceRequestor)
                  .getInterface(Ci.nsIDOMWindowUtils);
utils.garbageCollect(listener);
// processNext() returns false once there is nothing left to replay.
while (listener.processNext(obs)) {}
Duplicate of this bug: 729176
Converting from string to a JS object shouldn't really be on the CC listener interface.
I'm not sure how useful the resulting JS object will be, as I think you'll have to deal with all of the security wrapper stuff.  And you can't really just avoid that, as you have to worry about things like content-defined getters on objects you are probing. bholley might have a better idea of whether you could extract anything useful from a reference to an arbitrary object.
So my plan is to use raw addresses on the JS side only for logging purposes. For everything else we need a JSObject/jsval-based API, especially when passing objects back and forth between JS and C++ APIs.

Since all of this happens from a chrome compartment, wrappers are fairly transparent unless you are dealing with native objects, where you get Xrays of course. But currently these addresses are used like IDs in JS, so no operation is really done on them. On the C++ side we can always do the dirty work of unwrapping, or whatever else is the right thing to do. I will look into this soon; I might have gotten something wrong. I just want to find a way to avoid dealing with raw addresses from JS.
I'm fine with another kind of API, like the one described in bug 729176!
I was just wondering whether it is really worth always passing a JS reference. I expect it to be a costly operation, and we will most likely only inspect a few very specific objects. If it is costly, an "on demand" API would make more sense.

Otherwise, about wrappers: I don't think that's going to be a major issue, as we are in chrome code and shouldn't have a very limited view of the object. We may miss something, but in most cases we will be able to inspect the object easily.
In its current state, nsICycleCollectorListener isn't really useful for debugging JS objects; you get extremely little JavaScript information out of it, at best just property names. Getting an object reference is, to my mind, the easiest way to inspect the object.
For example, if you have a "JS Object (Function)", with this API you would be able to stringify it! If you have a "JS Object (Object)", you could get its keys and properties, and also do interesting queries like Cu.isDeadWrapper(jsref), ...
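As a rough illustration of the on-demand inspection described above, here is a sketch of a hypothetical `describeObject()` helper in plain JavaScript. It assumes you already hold an object reference (which is exactly what this bug asks for). It lists own property names and reports accessors without invoking them, since content-defined getters are one of the concerns raised here; note that a Proxy could still run code through its traps.

```javascript
// Hypothetical helper: summarize an object we already hold a reference to,
// without evaluating any getter it defines.
function describeObject(obj) {
  if (typeof obj === "function") {
    // Function.prototype.toString returns the source text
    // (or "[native code]" for built-ins).
    return {
      kind: "function",
      name: obj.name,
      source: Function.prototype.toString.call(obj),
    };
  }
  if (obj === null || typeof obj !== "object") {
    return { kind: obj === null ? "null" : typeof obj };
  }
  const props = {};
  for (const key of Object.getOwnPropertyNames(obj)) {
    const desc = Object.getOwnPropertyDescriptor(obj, key);
    // Data properties report the type of their value; accessor
    // properties are only reported, never called.
    props[key] = "value" in desc ? typeof desc.value : "accessor";
  }
  return { kind: "object", props };
}

const sample = {
  count: 3,
  get expensive() { throw new Error("getter should not run"); },
};
// props: count -> "number", expensive -> "accessor"; the getter never runs.
console.log(describeObject(sample));
```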
This seems totally incompatible with moving GC.
How so? If you know nothing has been moved (GC hasn't run), it should be OK.
(In reply to Bobby Holley (:bholley) from comment #6)
> This seems totally incompatible with moving GC.

What are you referring to with 'this' exactly? I can totally imagine an API that gives some information about the work of the GC in a way that is sane from JS, even if this info-collecting mode has to be turned on explicitly (because it comes with some, possibly huge, performance cost). The point here, I think, is that we need a tool that can help tell why this or that object is not GCed/CCed.

Alex is hacking into the engine with nsICycleCollectorListener and jsctypes, doing some dirty tricks to get all the info he can. I'm sure we can do something better on the C++ side and expose an API to JS that he can use more easily for collecting useful data. There is high demand for a tool like this. Obviously the current approach is not the way to do it, but the fact that there is no alternative tool right now is a good reason to work on an API for this.
Handing out actual references to JS objects should be compatible with moving GC. It is true that a string-to-C++-pointer mapping won't really work, because the address of things can change during GC, so the operation may fail, or return a different object.
(In reply to Olli Pettay [:smaug] from comment #7)
> How so? if you know nothing has been moved (gc hasn't run) it should be ok.

Well, my initial reaction was that storing object pointers as strings in JS and then trying to rehydrate those into pointers is inherently problematic, because JS code can always GC. But maybe there's some special sauce for CC listeners that guarantees that you can run some JS without a GC taking place?
(In reply to Bobby Holley (:bholley) from comment #10)
> Well, my initial reaction was that storing object pointers as strings in JS
> and then trying to rehydrate those into pointers is inherently problematic,
> because JS code can always GC. But maybe there's some special sauce for CC
> listeners that guarantees that you can run some JS without a GC taking place?

No, you are totally right.  The listeners generate their data without running a GC, but once we're running JS, that's basically just replaying the listener's data, and anything can happen.  It would be possible to do stuff like tenure all objects we hand out a stringified version of, or something silly like that, so they won't move, but it would be better to just hand out object references.  That also avoids other problems like the original object going away, and another totally different object being allocated at the same address.
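A toy model (not SpiderMonkey's actual heap) makes both failure modes concrete: a saved address string stops resolving after a moving collection, and later resolves to an unrelated object once that slot is reused. `ToyHeap` and its methods are hypothetical names used only for illustration.

```javascript
// Toy model only (not SpiderMonkey's heap): numeric "addresses" map to
// cells, and compact() relocates live cells the way a moving GC can.
class ToyHeap {
  constructor() {
    this.cells = new Map(); // address -> object
    this.next = 0x1000;     // next free address
  }
  alloc(obj) {
    const addr = this.next;
    this.next += 0x10;
    this.cells.set(addr, obj);
    return addr;
  }
  free(addr) {
    this.cells.delete(addr);
  }
  // Slide every live cell down to the lowest addresses, in allocation order.
  compact() {
    const live = [...this.cells.values()];
    this.cells.clear();
    this.next = 0x1000;
    for (const obj of live) this.alloc(obj);
  }
  lookup(addr) {
    return this.cells.get(addr);
  }
}

const heap = new ToyHeap();
const a = heap.alloc({ name: "a" });   // 0x1000
const b = heap.alloc({ name: "b" });   // 0x1010
const saved = b.toString(16);          // "1010", like an address in a CC log
heap.free(a);
heap.compact();                        // "b" now lives at 0x1000
console.log(heap.lookup(parseInt(saved, 16)));       // undefined
heap.alloc({ name: "c" });             // "c" takes over 0x1010
console.log(heap.lookup(parseInt(saved, 16)).name);  // c, not b
```

Handing out the object itself sidesteps both problems, because the collector, not the script, is responsible for keeping the reference pointed at the right cell.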
We should not hand out all the references, at least not for JS objects that the CC has detected as garbage.
Yeah, given that the main use is to examine things that are still around, that makes sense.  Olli also points out that we have to make sure the listener doesn't keep things alive forever, so the listener iterator will have to purge all the references it holds once it reaches the end.
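A minimal sketch of that behavior, modeled loosely on the `processNext()` convention of returning false when nothing is left. `ReferenceRecorder` is a hypothetical name, and the `isGarbage` flag stands in for whatever the CC would report; this is not an existing Gecko interface.

```javascript
// Hypothetical recorder: hands out recorded objects one at a time, never
// exposes objects flagged as garbage, and drops every reference it holds
// once iteration reaches the end.
class ReferenceRecorder {
  constructor() {
    this.entries = []; // { obj, isGarbage }
    this.index = 0;
  }
  record(obj, isGarbage) {
    this.entries.push({ obj, isGarbage });
  }
  // Returns false when there is nothing more to process.
  processNext(handler) {
    while (this.index < this.entries.length) {
      const { obj, isGarbage } = this.entries[this.index];
      this.entries[this.index] = null; // release each entry as we pass it
      this.index++;
      if (isGarbage) continue;         // skip objects the CC found to be garbage
      handler(obj);
      return true;
    }
    // Reached the end: purge everything so the recorder keeps nothing alive.
    this.entries = [];
    this.index = 0;
    return false;
  }
}

const recorder = new ReferenceRecorder();
recorder.record({ id: 1 }, false);
recorder.record({ id: 2 }, true); // garbage: never handed out
recorder.record({ id: 3 }, false);
while (recorder.processNext(obj => console.log(obj.id))) {}
// prints 1, then 3; afterwards recorder.entries is empty
```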
(In reply to Bobby Holley (:bholley) from comment #10)
> (In reply to Olli Pettay [:smaug] from comment #7)
> > How so? if you know nothing has been moved (gc hasn't run) it should be ok.
> 
> Well, my initial reaction was that storing object pointers as strings in JS
> and then trying to rehydrate those into pointers is inherently problematic,
> because JS code can always GC. But maybe there's some special sauce for CC
> listeners that guarantees that you can run some JS without a GC taking place?

No, we don't expose such a tool. With very few exceptions, any call into JS_* can trigger a GC. Of course, any running JS code can also trigger GC at any time.

Please note that with generational GC, we will be allocating objects from all compartments and of all types in the same store, with no alignment constraints. Any "pointer" interface that does not support relocation will be *monstrously* unsafe.

We've been working very hard for the last year and a half to give the GC more control of the pointers into its heap. Attaining complete pointer control is a requirement for us to implement non-trivial GC algorithms, such as generational GC.

That said, there is probably a safe way to do what you want. However, I don't know what it is you want. Could you please explain what exactly you're trying to accomplish at a higher level?
(In reply to Terrence Cole [:terrence] from comment #14)
> That said, there is probably a safe way to do what you want. However, I
> don't know what it is you want. Could you please explain what exactly you're
> trying to accomplish at a higher level?

Being able to inspect and print more information about a given JS object.
That can happen a long time after we actually query the nsICycleCollectorListener.
Here is one concrete use case:
1) I take a first snapshot of all JS objects for a given compartment.
2) I take the same snapshot again later.
3) I compare the two snapshots to extract all new JS objects that were allocated between 1 and 2.
4) I display that list to the user, with edge names between objects. So far we don't know much about these JS objects: they are all labeled JS Object (Object), JS Object (Function), or, if we are lucky, JS Object (Function - myFunctionName). We do want more information here, but only on demand, as saving every object's additional description during 1 and 2 would produce huge snapshots.
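Steps 3 and 4 are plain set operations once each snapshot maps an identifier to its CC description. A sketch, assuming a hypothetical snapshot shape (a `Map` from identifier to description string) and edges as `{ from, to, name }` records. As discussed later in this bug, using raw addresses as those identifiers is exactly the part that stops being safe once objects can move.

```javascript
// Hypothetical snapshot shape: Map<identifier, CC description string>.
// Returns the objects present in `after` but not in `before` (step 3).
function newObjects(before, after) {
  const added = new Map();
  for (const [id, description] of after) {
    if (!before.has(id)) added.set(id, description);
  }
  return added;
}

// Keeps only the edges whose endpoints are both new objects, so step 4
// can show how the new objects reference each other by edge name.
function edgesAmong(objects, edges) {
  return edges.filter(e => objects.has(e.from) && objects.has(e.to));
}

const snap1 = new Map([["0x1", "JS Object (Object)"]]);
const snap2 = new Map([
  ["0x1", "JS Object (Object)"],
  ["0x2", "JS Object (Function - myFunctionName)"],
  ["0x3", "JS Object (Object)"],
]);
const added = newObjects(snap1, snap2);
console.log([...added.keys()]); // the two new identifiers, "0x2" and "0x3"
```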
I've opened bug 850026, which would give super useful information to help the developer know what a given object is. But it would also be interesting to inspect the object.

Does that help?

Again, I can easily understand if that's just not possible, but it's definitely worth knowing what we can eventually fetch from the GC/CC...
I've done something kind of like this as an experiment -- I added an accessor that returned the entire JS heap as an array of JS objects. It "worked", but caused problems when some interior objects became visible.

If you return actual references to the objects, you're going to keep everything alive, which would totally obscure the information you're trying to gather (as in, all the temporary objects from your first snapshot will appear to be leaked.)

So if you went that route, you'd probably want to put all of the objects into a weak map, and then expose something that allows iterating over the weak map. It would screw with GC timing, but that doesn't seem like a big deal.
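In present-day JavaScript, the "iterate over a weak collection" idea can be sketched with `WeakRef` (ECMAScript 2021). This is a swapped-in technique that postdates this bug, not the engine-side weak map proposed above, and `WeakObjectSet` is a hypothetical name. It survives moving collections for the same reason a weak map does: the engine, not the script, keeps the pointer inside each `WeakRef` up to date.

```javascript
// Holds objects weakly: remembering an object here does not keep it alive,
// but the set can still be walked to find the objects that survived.
class WeakObjectSet {
  constructor() {
    this.refs = [];
  }
  add(obj) {
    this.refs.push(new WeakRef(obj));
  }
  // Yields the objects that have not been collected yet and, once the walk
  // completes, forgets the references whose targets are gone.
  *survivors() {
    const kept = [];
    for (const ref of this.refs) {
      const obj = ref.deref();
      if (obj !== undefined) {
        kept.push(ref);
        yield obj;
      }
    }
    this.refs = kept;
  }
}

const tracked = new WeakObjectSet();
const keepAlive = { label: "still referenced" };
tracked.add(keepAlive);
tracked.add({ label: "unreferenced, may be collected" });
// Within this synchronous run both targets are still alive; after a later
// GC, only objects reachable from elsewhere would be yielded.
for (const obj of tracked.survivors()) console.log(obj.label);
```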

We don't have any notion of object identity, so once we start moving things around during GC, anything based on pointer values is going to break unless we add something else to the engine. For example, we could record all moves and deletes in a buffer so you could "replay" the buffer over your first snapshot to update the pointers. But that would basically be reimplementing a weak map in a weird way, so it seems pointless unless the weak map solution doesn't work for some reason. (Which it very well might not!)
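For comparison, the move/delete buffer idea could look like the sketch below. The log format (one batch per GC, with `moves` as `[from, to]` pairs and `deleted` addresses, all given as pre-GC addresses) is a hypothetical illustration, not an existing engine feature. Applying each batch all at once matters, because within one compaction a cell can move into an address that another cell has just vacated.

```javascript
// Hypothetical log format: one batch per GC, with addresses given as they
// were before that GC:
//   { moves: [[from, to], ...], deleted: [addr, ...] }
// Returns the snapshot's addresses translated to where those objects live
// now; objects that died are dropped.
function replayLog(snapshot, batches) {
  let ids = new Set(snapshot);
  for (const batch of batches) {
    const moved = new Map(batch.moves ?? []);
    const dead = new Set(batch.deleted ?? []);
    // Build the new set from scratch so that all moves in one GC apply
    // simultaneously, even when one cell lands where another just left.
    const next = new Set();
    for (const id of ids) {
      if (dead.has(id)) continue;
      next.add(moved.has(id) ? moved.get(id) : id);
    }
    ids = next;
  }
  return ids;
}

const first = [0x10, 0x20, 0x30];
const gcLog = [{ deleted: [0x10], moves: [[0x20, 0x10], [0x30, 0x20]] }];
console.log([...replayLog(first, gcLog)]); // [ 16, 32 ]
```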
For a snapshot, I'd assume that you would iterate over objects of interest and record data about them, then not hold onto the objects themselves.  I guess you could use some kind of weak map to see what objects are still alive later in the snapshot.
In my current tool (which is unsafe and surely broken in some cases), I just store JS object addresses in snapshots, and only try to inspect objects that are still alive in the last snapshot.
So maybe we can just keep JS object identifiers in snapshots and only fetch metadata for the last one, when we have a list of 'still alive' objects.
(In reply to Alexandre Poirot (:ochameau) from comment #18)
> In my current tool (which is unsafe and surely broken in some cases), I just
> store JS objects addresses on snapshots, and only try to inspect object that
> are still alive on the last snapshot.

Right, and that'll work until we start moving objects around. Which we will, pretty soon.

> So may be we can just keep JS objects identifiers on snapshots and only
> fetch metadata on the last one, when we have a list of 'still alive' objects.

Yes, there's no need to store metadata, since you only want the metadata at the time of the 2nd snapshot. Identifying the 'still alive' objects is the tricky bit, since we don't have any notion of object identity that can survive pointer values changing. That's where the weak map would work, since the GC system already has to update those pointer values.
Perhaps the best option here is to just disable moving collections while the map is alive.
The stated use case seems to be solved by UbiNode now. In any case, I am WONTFIXing this, with prejudice: the requested interface is not safely implementable, and is therefore not something we are willing to implement.
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WONTFIX