Closed Bug 960786 Opened 10 years ago Closed 10 years ago

SpiderMonkey should provide an introspection API for memory heap analysis (ubi::Node)

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla33

People

(Reporter: jimb, Assigned: jimb)

References

Details

Attachments

(1 file, 6 obsolete files)

SpiderMonkey should define an abstract base class that provides a generic nodes-and-edges view of any sort of heap object, with introspection methods, for use in implementing developer tools.

We need to build tools that help web developers understand their content JS's memory usage. Since the web uses JS to access the web platform, these tools need to take into account the memory Gecko uses, not simply JS itself.

We can visualize a Firefox heap as a graph of different types of nodes, but the wide variety of nodes makes it difficult to implement complex algorithms (like computing objects' retained sizes by computing a dominator tree, say) on that graph: one would need to be an expert on both FF's data types and the algorithm. It would be a shame if these algorithms needed to be updated each time someone introduced a new kind of object to FF.

To decouple these problems, SpiderMonkey should define a type, which I'll call ubi::Node (for "ubiquitous node") that represents a reference to any type of node in the heap graph: strings, JSObjects, Shapes, BaseShapes, and so on; but also to non-SpiderMonkey types like XPCOM objects, nsINodes, and so on. 

SpiderMonkey itself should define an abstract base class with methods to obtain the name of the referent's type; how much memory it's using; enumerate its outgoing edges (with ubi::Nodes for their referents); and provide whatever other kinds of metadata we can come up with. Then concrete implementations scattered across Firefox of this abstract base class will implement these for all the different kinds of nodes our tools observer.

Once we've implemented concrete ubi::Node subclasses for the essentials (JS types; DOM nodes), the developer tools team can begin implementing web developer tools using ubi::Node. Then, domain experts can extend ubi::Node to understand more types (and edges that lead to them!) in parallel with tool development.

The analysis algorithms can be implemented on top of ubi::Node by people with only a general understanding of the specific kinds of nodes involved; while domain experts can add detail and depth to our tools' understanding of the heap without knowing the details of the algorithms that consume them.
Perhaps obvious, but it would be good to make sure that this class can easily use all the existing cycle collector machinery for tracing the heap.
What is the plan for coalescing nodes into objects that represent concepts a web developer would understand?  That seems to be where our mental progress on this has faltered in the past.
Yeah, the drawback of the CC tracing machinery is that it seems a lot of Gecko internals.
seems -> sees
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #2)
> What is the plan for coalescing nodes into objects that represent concepts a
> web developer would understand?  That seems to be where our mental progress
> on this has faltered in the past.

The right way to "collapse" internal structure depends on the specific case, and on the analysis, so I don't think there is a general principle to be discovered. We should figure out answers for some cases we know about, to get the feel of the problem, and then iterate. However, solutions will generally entail adding methods to ubi::Node that tell its consumers when to collapse things for web developer presentation, and how.

There are three analyses we know we're going to want:

- Heap census, where we produce counts and total sizes for categories of objects.

- Finding reference paths, where we explain to the JS developer how the object is held alive. This should produce a valid JS expression where possible. (It is not always possible, even in pure JS. For example, there is no JavaScript expression that one can evaluate in the global scope that can refer to a local variable captured in a nested function's environment. So perfection simply isn't within reach.)

- Retained object count and size: how much would go away if we deleted this object? This is a very intuitive and flexible measure of an object's "size", as it includes the object's private structure. To find retained size, we compute a dominator tree for the graph, showing which nodes can be reached only through which other nodes.


Case 1: DOM nodes and their JS shadows:

- Heap census should coalesce a DOM node and its shadow into a single object. A DOM node without a shadow is just a smaller DOM node.

- Reference paths should also coalesce a DOM node and its shadow. The edge from the shadow to the DOM node should simply be omitted.

- For retained size, it's more complicated, but I think it works out. I believe the DOM node has a pointer to its shadow; and shadows certainly point to the DOM nodes they represent. This means that they create a little cycle. Strongly connected subgraphs always appear in a dominator tree as siblings, or as immediate parents and children. This means that the presentation can easily coalesce them into a single node.


Case 2: SetTimeout callbacks, which are held by a linked list of nsTimeouts hung off the nsGlobalWindow. The nsTimeouts are themselves cycle-collected.

- Heap census should coalesce the JSObject, the nsGlobalWindow, and all its nsTimeouts.

- Reference paths will need to include non-JS edge names: perhaps "[[SetTimeout callback 17:42]].[[function captured local variables]].foo", presented as edges leaving the global object directly.

- For retained size, again, since nsTimeouts refer to their nsGlobalWindow, they form strongly connected subgraphs, and will appear together in the dominator tree. The tool should again be able to coalesce these.


It seems to me we can tell what behavior we'd like to see. I'll try to figure out how to support their implementation in the ubi::Node API.
Blocks: 961323
Just to get you on a shared terminological grounds, the standard Gecko name for the "JS shadow" is a "JS reflector" or (confusingly with similar JS engine concepts) a "wrapper".  A reflector always points to its reflectee, but the reverse is not always true.  If the reflectee points to its reflector, that is called "wrapper preservation", because the underlying object is ensuring that the reflector won't get GCed.

In addition to the stuff you've mentioned, one of the more annoying things for me to deal with when interpreting a raw view of the heap is various internal details of the JS engine, such as shapes (which I think can have data stuck on them in some cases, plus various getter/setter stuff I don't understand), various string stuff, and cross-compartment wrappers (CCWs).
"Reflectee" is not a standard term, I made that up just now.  I'm not sure what the right word is.
> - Reference paths will need to include non-JS edge names: perhaps
> "[[SetTimeout callback 17:42]].[[function captured local variables]].foo",
> presented as edges leaving the global object directly.
To a possible extent, it'd be great if new names weren't invented and the specs names were used. They aren't always glamourous, but at least, they're in a single document with unambiguous definitions.
For instance, the ES spec defines an internal [[Scope]] property for functions, etc.
In some occasions, it might make sense to abstract a couple of details out, I'm thinking of the difference between Window and WindowProxy.
(In reply to David Bruant from comment #8)
> To a possible extent, it'd be great if new names weren't invented and the
> specs names were used. They aren't always glamourous, but at least, they're
> in a single document with unambiguous definitions.
> For instance, the ES spec defines an internal [[Scope]] property for
> functions, etc.

Absolutely. I'm just riffing here. ubi::Node will have methods dedicated to finding JS-legible representations of actual paths, and those will try to use standard or documented notation whenever possible.
(In reply to Andrew McCreight [:mccr8] from comment #6)
> Just to get you on a shared terminological grounds, the standard Gecko name
> for the "JS shadow" is a "JS reflector" or (confusingly with similar JS
> engine concepts) a "wrapper".

Perfect --- I had intended to use the standard terminology, but I'd forgotten about "reflector". Thanks!
Work in progress, just for fun.
Assignee: nobody → jimb
Status: NEW → ASSIGNED
Depends on: 944176
This one builds; adds a JS shell function findPath to exercise ubi::Node in a breadth-first search; and includes tests!

Note that this depends on the patches in bug 944176; without that, you'll get lovely double-frees.


(Note especially the last test in the new js/src/jit-test/tests/heap-analysis/findPath.js, which shows findPath discovering a reference to an object in a function's captured scope.)
Attachment #8393823 - Attachment is obsolete: true
Jotting down a potential issue with the ubi::Node implementation while it's on my mind:

The intention is that every specialization of ubi::Concrete will have a distinct concreteTypeName object, so that if node->typeName() is pointer-equal to a given Concrete<T>::concreteTypeName, then we can know for a fact that node refers to a T. But does C++ promise that distinct const jschar arrays indeed have distinct addresses? Mightn't the linker unify immutable arrays whose contents are equal?
(In reply to Jim Blandy :jimb from comment #13)
> The intention is that every specialization of ubi::Concrete will have a
> distinct concreteTypeName object, so that if node->typeName() is
> pointer-equal to a given Concrete<T>::concreteTypeName, then we can know for
> a fact that node refers to a T. But does C++ promise that distinct const
> jschar arrays indeed have distinct addresses? Mightn't the linker unify
> immutable arrays whose contents are equal?

It does not promise that and the linker indeed might do that.
(In reply to Nathan Froyd (:froydnj) from comment #14)
> (In reply to Jim Blandy :jimb from comment #13)
> > The intention is that every specialization of ubi::Concrete will have a
> > distinct concreteTypeName object, so that if node->typeName() is
> > pointer-equal to a given Concrete<T>::concreteTypeName, then we can know for
> > a fact that node refers to a T. But does C++ promise that distinct const
> > jschar arrays indeed have distinct addresses? Mightn't the linker unify
> > immutable arrays whose contents are equal?
> 
> It does not promise that and the linker indeed might do that.

Thanks! May I press you for a proper solution?

Statement of problem:

A ubi::Node is a discriminated generic pointer type, which you can extend to be able to support whatever types you want. If you specialize the ubi::Concrete template for some type T such that ubi::Concrete<T> implements ubi::Base, then ubi::Nodes can point to T's.

In particular, you're supposed to be able to ask n.is<T>() and have that be true if n points to a T, false otherwise. The way I "solved" this was to require ubi::Concrete<T> specializations to have two members: a static data member we'll call TAG, and a virtual function overriding the one in Base that returns TAG's address, which we'll call GETTAG. Then, is<T>() is simply:

template<typename T>
bool is() const { return GETTAG() == Concrete<T>::TAG }

Because I already had a virtual function typeName to return the name of the referent type as a statically allocated string, I figured I'd make that GETTAG, and just require it to return the address of the string named Concrete<T>::concreteTypeName.

But if two concreteTypeNames happen to have the same text, then their tags will be equal. Fail.

Does it matter if I declare it:

template<> const jschar *Concrete<JSObject>::concreteTypeName = u"JSObject";

versus

template<> const jschar  Concrete<JSObject>::concreteTypeName[] = u"JSObject";

That is, I understand that the linker may unify string literals; but is it permitted to unify statically allocated arrays?
Flags: needinfo?(nfroyd)
(In reply to Jim Blandy :jimb from comment #15)
> Does it matter if I declare it:
> 
> template<> const jschar *Concrete<JSObject>::concreteTypeName = u"JSObject";
> 
> versus
> 
> template<> const jschar  Concrete<JSObject>::concreteTypeName[] =
> u"JSObject";
> 
> That is, I understand that the linker may unify string literals; but is it
> permitted to unify statically allocated arrays?

I think the linker can do that, just like the linker can merge identical function bodies.  Of course, the latter optimization is subject to certain constraints, due to the requirement of pointers to different functions comparing different.  (I can't locate that verbiage in the spec right now, though.)

I suspect there is similar verbiage requiring that pointers to different objects compare different.  And so unifying statically allocated (constant!) arrays would be subject to the same constraints as unifying functions.  Since the constraint on merging functions is essentially "you can't merge functions when one of them has its address taken", a similar constraint would apply to arrays.  Therefore, since you are taking pointers to the arrays to do your comparisons, you *ought* to be safe.
Flags: needinfo?(nfroyd)
Comment on attachment 8394569 [details] [diff] [review]
WIP v2: Revised ubi::Node, this time using virtual dispatch to support embedding types.

Review of attachment 8394569 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/public/UbiNode.h
@@ +328,5 @@
> +
> +  public:
> +    // This edge's name.
> +    //
> +    // The storage is owned by this Edge, and will be freed when ARF ARF ARF.

http://upload.wikimedia.org/wikipedia/en/f/f8/Internet_dog.jpg
Comment on attachment 8394569 [details] [diff] [review]
WIP v2: Revised ubi::Node, this time using virtual dispatch to support embedding types.

Review of attachment 8394569 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/public/UbiNode.h
@@ +316,5 @@
> +
> +
> +// Edge is the abstract base class representing an outgoing edge of a node.
> +// Edges are owned by EdgeRanges, and need not have assignment operators or copy
> +// constructors.

Should the assignment operator and copy constructor be MOZ_DELETE'd?

::: js/src/jit-test/tests/heap-analysis/findPath.js
@@ +11,5 @@
> +function C() {}
> +C.prototype.obj = {};
> +var c = new C;
> +Match.Pattern([{node: {}, edge: "type"},
> +               {node: Match.Pattern.ANY, edge: "type_proto"},

Would "__proto__" be a better name for these types of edges?

::: js/src/shell/js.cpp
@@ +4518,5 @@
>  "  If any references are found by the conservative scanner, the references\n"
>  "  object will have a property named \"edge: machine stack\"; the referrers will\n"
>  "  be 'null', because they are roots."),
>  
> +    JS_FN_HELP("findPath", FindPath, 1, 0,

Shouldn't |nargs| be 2? Or does target default to the global?

::: js/src/shell/jsheaptools.cpp
@@ +649,5 @@
> +bool
> +FindPath(JSContext *cx, unsigned argc, jsval *vp)
> +{
> +    CallArgs args = CallArgsFromVp(argc, vp);
> +    if (argc < 1) {

Reading below, it looks like you always expect both a |start| and a |target| (ie no default |target| like I asked above) so shouldn't this check be |argc < 2|?

@@ +673,5 @@
> +    // The nodes and edges of the path --- should we find one. The path is
> +    // stored in reverse order, because that's how it's easiest for us to
> +    // construct it:
> +    // - edge[i] is the name of the edge from node[i] to node[i-1].
> +    // - edge[0] is the name of the edge from node[0] to the target.

egde*s* and node*s* are the vars in the code, I assume you meant to match them.

@@ +719,5 @@
> +                const ubi::Edge &edge = range->front();
> +                const ubi::Node referent = edge.referent;
> +
> +                // Have we reached this edge's referent before?
> +                HashMap<ubi::Node, BackEdge>::AddPtr a = black.lookupForAdd(referent);

Maybe it's just me, and maybe this is beyond nit'ing, but I think typedef'ing |HashMap<ubi::Node, BackEdge>| would make this more readable.
I've been thinking this over a bit, and isn't it kinda crazy to make a |HashMap| for |black|? Won't we be running on memory constrained fxos devices where profiling memory shouldn't cause a memory overhead?

It seems like we could save a bit of memory if we could attach a random void * pointer directly to a |ubi::Node| and then we wouldn't have to have the empty space overhead that |HashTable|s imply.

I'm running into this in my dominators stuff as well where I am keeping track of even more meta data needed for the algorithm in various hash tables.

Am I worrying about nothing? Premature?
Well, you're not guaranteed to get the same ubi::Node instance for each pointer to a given object. ubi::Nodes are really value types, not object types. There's no overarching WeakMap like the one that helps us find the extant Debugger.Object instance for a given referent. So you could store a void * in a ubi::Node, but you wouldn't find it when you encountered that referent again.

Each ubi::Node is completely independent.
Attached file make check-style error output (obsolete) —
FYI: Jim, the style checker has a couple complaints about includes.
(In reply to Nick Fitzgerald [:fitzgen] from comment #21)
> FYI: Jim, the style checker has a couple complaints about includes.

/me is so disappointed in his alphabetization skills
Addresses fitzgen's comments, and the JS style checker's complaints.
Attachment #8394569 - Attachment is obsolete: true
Attachment #8397446 - Attachment is obsolete: true
(In reply to Nick Fitzgerald [:fitzgen] from comment #18)
> Should the assignment operator and copy constructor be MOZ_DELETE'd?

Sure; done.

> Would "__proto__" be a better name for these types of edges?

It would, but for now we're just going along with whatever JS_TraceChildren hands us; we have no control. When we actually specialize for JSObject (which we will!), then we can do better, and __proto__ seems entirely reasonable.

> Shouldn't |nargs| be 2? Or does target default to the global?

Yes - fixed this and the error-checking code.

> egde*s* and node*s* are the vars in the code, I assume you meant to match
> them.

Fixed - thanks.

> Maybe it's just me, and maybe this is beyond nit'ing, but I think
> typedef'ing |HashMap<ubi::Node, BackEdge>| would make this more readable.

Oops, I missed this comment. Yes, I agree.
The ScopedDeletePtr support (bug 944176) has landed.
Err, "Move construction and assignment support for ScopedDeletePtr" is what has landed.
What else does this patch need before someone reviews it and it can land?
Jim, FYI, v4 of this patch no longer compiles. I think a header file was renamed or removed.

/Users/fitzgen/src/spidermonkey/js/src/vm/UbiNode.cpp:15:10: fatal error: 'js/Tracer.h' file not found
#include "js/Tracer.h"
See bug 807168.  I think it is "TracingAPI.h" now.
Attached patch v5 (obsolete) — Splinter Review
Rebased to work with the JSTracer changes.

https://tbpl.mozilla.org/?tree=Try&rev=eaadf51a24ee
Attachment #8403665 - Attachment is obsolete: true
Attachment #8403665 - Flags: review?(sphink)
Attachment #8410427 - Flags: review?(sphink)
Comment on attachment 8410427 [details] [diff] [review]
v5

Review of attachment 8410427 [details] [diff] [review]:
-----------------------------------------------------------------

Basically r=me, except either I want you to convince me that you really need to be using ContextAllocPolicy, or I want to see it again after you remove the cx stuff.

::: js/public/UbiNode.h
@@ +15,5 @@
> +
> +#include "js/HashTable.h"
> +#include "js/TypeDecls.h"
> +
> +// js::ubi::Node

This is exported in js/public. Why isn't this JS::ubi::Node?

@@ +37,5 @@
> +// dynamically checked, properly typed pointers to the original objects via the
> +// as<T> method, or generate descriptions of the referent itself.
> +//
> +// ubi::Node instantiations are lightweight (two-word) value types. Instances:
> +// - compare equal when they refer to the same object;

when, or when and only when? If an object were to be freed and a new object allocated in its place, would ubi::Node equality say they're the same or different?

@@ +40,5 @@
> +// ubi::Node instantiations are lightweight (two-word) value types. Instances:
> +// - compare equal when they refer to the same object;
> +// - have hash values that respect their equality relation; and
> +// - can be serialized and deserialized in a way that preserves their identity
> +//   (assuming the referent is still at the same address).

This assumption is a big deal. The "Rooting Restrictions" addresses this. Maybe this needs a forward pointer to that section?

Although I'd kind of prefer to think of this in terms of a snapshot view or something. Is the deserialized node graph still valid, as a description of a previous point in time, even if things have moved? If you're deserializing into ubi::Nodes, then clearly you can no longer go from a ubi::Node to the underlying pointer. I'm not sure what your intent is (I haven't read the rest of the patch yet.)

Ok, now I have. I think you need to point out up front that the whole graph is only valid until a GC or something else happens that could muck with addresses (eg a delete/new of some C++ node), at which time it's totally meaningless and equality, hash values, etc., are completely undefined.

@@ +71,5 @@
> +// In many cases, a JavaScript developer's view of their data differs
> +// substantially from its actual implementation. For example, while JavaScript
> +// objects are putatively maps from property names to sets of attributes (like
> +// [[Value]]), in practice, many objects have only a pointer to a shape, shared
> +// with other similar objects, and indexed slots that [[Value]] attributes. As

...that contain [[Value]] attributes.

Wait, why are we using [[double bracket]] notation here? It makes me think of ES spec things, which doesn't work because Value is an internal implementation type.

@@ +73,5 @@
> +// objects are putatively maps from property names to sets of attributes (like
> +// [[Value]]), in practice, many objects have only a pointer to a shape, shared
> +// with other similar objects, and indexed slots that [[Value]] attributes. As
> +// another example, a string produced by concatenating two other strings may
> +// sometimes be represented simply by a structure that points to those other

s/simply //

(Or s/simply/complexly/, which I just learned is a word.)

Or if you really want your positive adjective, s/simply/efficiently/.

And maybe a "rope structure", just to get the term "rope" in there to explain this to people familiar with these data structures?

@@ +76,5 @@
> +// another example, a string produced by concatenating two other strings may
> +// sometimes be represented simply by a structure that points to those other
> +// strings.
> +//
> +// ubi::Node's goal is to accurately portray the memory that nodes consume.

...at one instant in time?

Wait, is this the one and only goal? I could imagine somebody using this to find large subgraphs of retained objects without caring exactly how much memory they consume. Or to figure out what a closure captures. Or to count the different types of objects, or detect cases where two objects could have the same shape but don't due to their history, or learn how cycles are formed between GC and CC controlled objects.

I think accurate portrayal of memory consumption is a goal, not the goal.

Ok, on reading the rest of the paragraph, it seems like you're saying that one of the objectives of this structure is to accurately portray the underlying data structures with as fine a granularity as possible. Well, within limits -- it doesn't tell you things like malloc fragmentation due to odd structure sizes or something.

I don't know why I'm picking on this sentence so much. It's just that when initially reading it, I thought "cool, this is the one and only motivation for ubi::Node, so I can judge everything in terms of whether or not it is needed for this goal." And that's not the case.

@@ +86,5 @@
> +//
> +// However, when a particular object is the exclusive owner of a separate block
> +// of memory, ubi::Node may present the object and its block as a single node,
> +// and add their sizes together when reporting the node's size, as there is no
> +// meaningfull loss of data in this case. Thus, for example, a ubi::Node

meangingfulllll

@@ +90,5 @@
> +// meaningfull loss of data in this case. Thus, for example, a ubi::Node
> +// referring to a JavaScript object, when asked for the object's size in bytes,
> +// includes the object's slot and element arrays' sizes in the total. There is
> +// no separate ubi::Node value representing the slot and element arrays, since
> +// they are owned exclusively by the object.

I bet if you had an object with a (non-rope) string object property, you'd never merge the string into the object, even though it wouldn't change the memory portrayal.

@@ +100,5 @@
> +// interface presenting those results will generally need to clean them up
> +// before they can be understood by JavaScript developers. For example,
> +// JavaScript developers should not need to understand shapes, only JavaScript
> +// objects. Similarly, they should not need to understand the distinction
> +// between DOM nodes and the JavaScript shadow objects that represent them.

Bold claims. I might ease off on these a bit, since they're not strictly true. A JS developer may care very much whether two objects with the same set of properties are able to share a shape or not.

But yes, you want to be able to present views that abstract away things like this.

@@ +125,5 @@
> +// - INTERNAL - the node is an internal implementation detail, unfamiliar to
> +//   JavaScript developers. Tools should attempt to 'collapse' these in with
> +//   DIRECT or WRAPPED objects that refer to them, or if that is not possible,
> +//   use the type name and edge names provided by ubi::Nodes and hope for the
> +//   best.

This is a little funky. The INTERNAL distinction seems very fuzzy, and dependent on the goal of the tool rather than the underlying ubi::Node's opinions.

But I can live with it.

Then again, after reading the rest of the patch, I see no mention of this in the code. Is this future work? If so, I'd rather not have the comment here until the matching code is present, so it can be evaluated in that context.

@@ +139,5 @@
> +// computation, we run the whole algorithm in the scope of an AutoAssertNoGC
> +// instance.)
> +//
> +// If this restriction is too annoying, we may teach the GC how to root
> +// ubi::Nodes, fix up hash tables that use them as keys, etc.

I would think it would be "if we want further yummy juicy functionality like comparing two graphs, we may teach..."

Please don't use those words verbatim. :)

@@ +142,5 @@
> +// If this restriction is too annoying, we may teach the GC how to root
> +// ubi::Nodes, fix up hash tables that use them as keys, etc.
> +
> +
> +// Forward declarations of SpiderMonkey's ubi::Node reference types.

Ok, now I'm going to be an ass. Why "ubi"? Ubiquitous means something that occurs everywhere. Why is that the salient characteristic of these nodes? I would understand generic::Node or general::Node or mem::Node or HeapNode or AbstractNode or something. That last is probably the most descriptive. abstract::Node?

I don't mind if you want to keep the name, it's just that I don't entirely understand it. I guess they *are* everywhere, and there's sort of an implied "once you start thinking of them abstractly, you'll see them everywhere"?

@@ +168,5 @@
> +class Base {
> +    friend class Node;
> +
> +    // Subclasses may not override the destructor.
> +    // How to enforce this?

Heavyweight way: ask jcranmer to add a static analysis for it. (Yeah, overkill.)

Why is this necessary? I haven't read far enough. Or thought hard enough.

@@ +180,5 @@
> +
> +  public:
> +    bool operator==(const Base &rhs) const {
> +        // There's no need to compare the vtables, because you can't have two
> +        // objects of different types at the same address.

Oh, great, you've broken my whole automatic type-independent deduplication scheme. Thanks a lot.

Seriously, I'm not sure if this is 100% true. If you

const Type1 a = 0;
const Type2 b = 0;

is the compiler/linker allowed to use the same address for them? But yeah, it's an academic question. This is fine.

@@ +197,5 @@
> +    // node owns exclusively that are not exposed as their own ubi::Nodes.
> +    virtual size_t size() const = 0;
> +
> +    // Return an EdgeRange that initially contains all the referent's outgoing
> +    // edges. It is dynamically allocated in |cx|, and should be freed with

in cx? What does that mean? With cx, I guess? (Contexts don't really contain data other than some unfortunate state.)

@@ +228,5 @@
> +    // implementation type.
> +    //
> +    // So, we delegate the actual construction to this specialization, which
> +    // knows Referent's details.
> +    static void construct(void *storage, Referent *referent);

This is crazy powerful. And scary.

@@ +250,5 @@
> +
> +    template<typename T>
> +    Node(T *ptr) {
> +        MOZ_ASSERT(sizeof(T) == sizeof(*base()),
> +                   "ubi::Base specializations must be the same size as ubi::Base");

Can this be done with a static assert?

@@ +285,5 @@
> +
> +    bool operator==(const Node &rhs) const { return *base() == *rhs.base(); }
> +    bool operator!=(const Node &rhs) const { return *base() != *rhs.base(); }
> +
> +    operator ConvertibleToBool() const { return ConvertibleToBool(!!base()->ptr); }

I'm surprised the compiler didn't demand

  return reinterpret_cast<ConvertibleToBool>(!!base()->ptr);

@@ +337,5 @@
> +
> +  public:
> +    // This edge's name.
> +    //
> +    // The storage is owned by this Edge, and will be freed when ARF ARF ARF.

I think I may need to know a little more about when the dog barks...

Can't this just be owned and destroyed by the Base destructor? I imagine I'm missing something.

Or maybe you need some sort of LifoAlloc thingie for this stuff.

@@ +348,5 @@
> +    // This edge's referent.
> +    Node referent;
> +
> +    // Whether the front edge of this range should be visible to
> +    // JavaScript developers.

The whozit whatzit?

@@ +398,5 @@
> +
> +// A reusable ubi::Concrete specialization base class for types supported by
> +// JS_TraceChildren.
> +template<typename Referent>
> +class TracerConcrete: public Base {

space before the ':'

::: js/src/moz.build
@@ +212,5 @@
>  # from <stdlib.h> on Windows through a preprocessor define.
>  # jsutil.cpp cannot be built in unified mode because it is needed for
>  # check-vanilla-allocations.
> +#
> +# vm/UbiNode.cpp is broken out for testing purposes; it should be moved to UNIFIED_SOURCES.

I won't tell anybody.

::: js/src/shell/js.cpp
@@ +4516,5 @@
> +"  { type: <string describing node>, edge: <string describing edge> }\n"
> +"if the node is some internal thing that is not a proper JavaScript value\n"
> +"(like a shape or a scope chain element). The destination of the i'th array\n"
> +"element's edge is the node of the i+1'th array element; the destination of\n"
> +"the last array element is implicitly |target|.\n"),

I guess it doesn't belong in this patch, but wouldn't it make sense to move this stuff (including jsheaptools) to builtin/ next to TestingFunctions.cpp?

::: js/src/shell/jsheaptools.cpp
@@ +626,5 @@
> +    BackEdge(ubi::Node predecessor, jschar *name)
> +        : predecessor_(predecessor), name_(name) { }
> +    BackEdge(BackEdge &&rhs) : predecessor_(rhs.predecessor_), name_(rhs.name_.forget()) { }
> +    BackEdge &operator=(BackEdge &&rhs) {
> +        JS_ASSERT(&rhs != this);

Why is this so awful? I mean, why assert instead of just handling the case? I guess you don't use it, but it seems like it would be simpler to allow than to restrict the interface.

@@ +657,5 @@
> +                             "findPath", "1", "");
> +        return false;
> +    }
> +
> +    if (!args[0].isObject() && !args[0].isString()) {

Maybe comment that you're using the identity of the string, which is why you don't ToString non-objects?

@@ +722,5 @@
> +                const ubi::Node referent = edge.referent;
> +
> +                // Have we reached this edge's referent before?
> +                BackEdgeMap::AddPtr a = black.lookupForAdd(referent);
> +                if (!a) {

Personally, I'd reduce the indentation depth with

  if (a)
      continue;

@@ +737,5 @@
> +
> +                    // If we've reached our target, then walk the backlinks
> +                    // to produce the (reversed) path, and save the path in
> +                    // |nodes| and |edges|.
> +                    if (referent == target) {

Same here, though it's a little more questionable.

@@ +752,5 @@
> +                                JSObject &obj = *predecessor.as<JSObject>();
> +                                if (obj.is<ScopeObject>()) {
> +                                    v.setUndefined();
> +                                } else if (obj.is<JSFunction>() && IsInternalFunctionObject(&obj)) {
> +                                    v.setUndefined();

This "JS should never see" test really needs to be extracted out into a function. (Especially since it's easy to imagine it varying.) Then this function will be mostly a straightforward BFS.

@@ +757,5 @@
> +                                } else {
> +                                    v.setObject(obj);
> +                                }
> +                            } else if (predecessor.is<JSString>()) {
> +                                v.setString(predecessor.as<JSString>());

So the comment lied about ropes? :-)

@@ +765,5 @@
> +                            if (!nodes.append(v) || !edges.append(p->value().forgetName()))
> +                                return false;
> +                            here = predecessor;
> +                        } while (here != start);
> +                        goto found;

There's too much RAII stuff going on here for me to be comfortable with a goto. I know it's a pain, but I think this would read better if you extracted out the found: stuff into a separate function and called it here.

@@ +796,5 @@
> +    // The |nodes| and |edges| array hold the path in reverse order. Walk it in
> +    // the stored order, and construct the result array in start-to-target order.
> +    for (size_t i = 0; i < length; i++) {
> +        // Build an object describing the node and edge.
> +        RootedObject obj(cx, NewBuiltinClassInstance(cx, &JSObject::class_));

I *think* you can do NewBuiltinClassInstance<JSObject>(cx).

@@ +800,5 @@
> +        RootedObject obj(cx, NewBuiltinClassInstance(cx, &JSObject::class_));
> +        if (!obj)
> +            return false;
> +
> +        RootedValue node(cx, nodes[i]);

No need to reroot. Use nodes.handleAt(i). And remind me why we didn't make this operator[] return a Handle.

@@ +805,5 @@
> +        if (!JS_DefineProperty(cx, obj, "node", node,
> +                               JSPROP_ENUMERATE, nullptr, nullptr))
> +            return false;
> +
> +        RootedString edge(cx, js_NewString<CanGC>(cx, edges[i], js_strlen(edges[i])));

Isn't this the same as js_NewStringCopyZ<CanGC>(cx, edges[i])?

::: js/src/shell/jsheaptools.h
@@ +14,3 @@
>  
>  bool FindReferences(JSContext *cx, unsigned argc, JS::Value *vp);
> +bool FindPath(JSContext *cx, unsigned argc, jsval *vp);

s/jsval/JS::Value/

::: js/src/vm/UbiNode.cpp
@@ +93,5 @@
> +    }
> +};
> +
> +
> +typedef mozilla::Vector<SimpleEdge, 8, js::ContextAllocPolicy> SimpleEdgeVector;

Why do you use ContextAllocPolicy (and JSContext*) for this stuff instead of SystemAllocPolicy? This stuff isn't (currently) GC-managed. The comment from Utility.h:

 * - If the lifetime of the allocation is tied to the lifetime of a GC-thing
 *   (that is, finalizing the GC-thing will free the allocation), call one of
 *   the following functions:
 *
 *     JSContext::{malloc_,realloc_,calloc_,new_}
 *     JSRuntime::{malloc_,realloc_,calloc_,new_}

@@ +130,5 @@
> +        // The simplest code is correct! The temporary SimpleEdge takes
> +        // ownership of name; if the append succeeds, the vector element
> +        // then takes ownership; if the append fails, then the temporary
> +        // retains it, and its destructor will free it.
> +        if (!vec->append(mozilla::Move(SimpleEdge(jsname, Node(kind, *thingp))))) {

don't you need to free jsname here?

@@ +192,5 @@
> +template<> const jschar TracerConcrete<js::Shape>::concreteTypeName[] = u"js::Shape";
> +template<> const jschar TracerConcrete<js::BaseShape>::concreteTypeName[] = u"js::BaseShape";
> +template<> const jschar TracerConcrete<js::types::TypeObject>::concreteTypeName[] = u"js::types::TypeObject";
> +
> +#if 0

Kill or make real.
Attachment #8410427 - Flags: review?(sphink)
> Review of attachment 8410427 [details] [diff] [review] [diff] [review]:
> -----------------------------------------------------------------
> 
> Basically r=me, except either I want you to convince me that you really need to be using ContextAllocPolicy, or I want to see it again after you remove the cx stuff.

I've switched to TempAllocPolicy, which also takes a cx.


> This is exported in js/public. Why isn't this JS::ubi::Node?

Moved as suggested.

> @@ +37,5 @@
> > +// dynamically checked, properly typed pointers to the original objects via the
> > +// as<T> method, or generate descriptions of the referent itself.
> > +//
> > +// ubi::Node instantiations are lightweight (two-word) value types. Instances:
> > +// - compare equal when they refer to the same object;
> 
> when, or when and only when? If an object were to be freed and a new object allocated in its place, would ubi::Node equality say they're the same or different?

I've added language:

    ubi::Node instances ...:
    - compare equal if and only if they refer to the same object;
    ...

    A ubi::Node is only valid for as long as its referent is alive; if its
    referent goes away, the ubi::Node is simply a dangling pointer. A ubi::Node
    that refers to a GC thing is not (automatically) a GC root; if the GC
    relocates its referent, the ubi::Node becomes invalid. A ubi::Node that
    refers to a reference-counted object does not bump the reference count.


> @@ +40,5 @@
> > +// ubi::Node instantiations are lightweight (two-word) value types. Instances:
> > +// - compare equal when they refer to the same object;
> > +// - have hash values that respect their equality relation; and
> > +// - can be serialized and deserialized in a way that preserves their identity
> > +//   (assuming the referent is still at the same address).
> 
> This assumption is a big deal. The "Rooting Restrictions" addresses this. Maybe this needs a forward pointer to that section?

That serialization comment promises more than we actually want to implement. We don't need deserialization at all. I've changed the comment to say:

    - have serializations that are only equal if the ubi::Nodes are equal.

(Some history: Originally ubi::Nodes were just tagged pointers, so we simply serialized the tag along with the address, and deserialization could produce an identical value. However, to support embeddings' types, we need an open-ended set of referent types. Virtual dispatch solves this in a natural way, so we switched from tagged pointers to instances of an abstract base class. However, unlike a tag, there's no way to serialize or deserialize a vtable pointer, so it's not possible to deserialize the new ubi::Nodes. Fortunately, the only property we actually need is that serializations of valid ubi::Nodes be distinct; that's sufficient for heap dumps, for example.)


> Ok, now I have. I think you need to point out up front that the whole graph is only valid until a GC or something else happens that could muck with addresses (eg a delete/new of some C++ node), at which time it's totally meaningless and equality, hash values, etc., are completely undefined.

I think the comment about ubi::Nodes being valid only as long as their referents are addresses these concerns.


> @@ +71,5 @@
> > +// In many cases, a JavaScript developer's view of their data differs
> > +// substantially from its actual implementation. For example, while JavaScript
> > +// objects are putatively maps from property names to sets of attributes (like
> > +// [[Value]]), in practice, many objects have only a pointer to a shape, shared
> > +// with other similar objects, and indexed slots that [[Value]] attributes. As
> 
> ...that contain [[Value]] attributes.
> 
> Wait, why are we using [[double bracket]] notation here? It makes me think of ES spec things, which doesn't work because Value is an internal implementation type.

Yes, I was citing the ES spec sense of [[Value]], which is just another attribute of a property, along with [[Writable]], etc. I'll clarify.


> And maybe a "rope structure", just to get the term "rope" in there to explain this to people familiar with these data structures?

I think I've clarified this.


> @@ +76,5 @@
> > +// another example, a string produced by concatenating two other strings may
> > +// sometimes be represented simply by a structure that points to those other
> > +// strings.
> > +//
> > +// ubi::Node's goal is to accurately portray the memory that nodes consume.
> 
> ...at one instant in time?

Well, for as long as the referent lives. I think this is also clarified by the "referent > ubi::Node" lifetime requirement.


> I think accurate portrayal of memory consumption is a goal, not the goal.

Yes, you're right. Clarified.


> Ok, on reading the rest of the paragraph, it seems like you're saying that one of the objectives of this structure is to accurately portray the underlying data structures with as fine a granularity as possible. 

> Well, within limits -- it doesn't tell you things like malloc fragmentation due to odd structure sizes or something.

Well, if it could obtain that information, it probably should make it available. I've tried to make this clearer, but I'd welcome another critical reading.



> I don't know why I'm picking on this sentence so much. It's just that when initially reading it, I thought "cool, this is the one and only motivation for ubi::Node, so I can judge everything in terms of whether or not it is needed for this goal." And that's not the case.

Well, good writing shouldn't set up expectations that then confound further interpretation, so this is all germane.


> meangingfulllll

WHOOOOA FEELINGS

(Fixed, thanks.)


> @@ +90,5 @@
> > +// meaningfull loss of data in this case. Thus, for example, a ubi::Node
> > +// referring to a JavaScript object, when asked for the object's size in bytes,
> > +// includes the object's slot and element arrays' sizes in the total. There is
> > +// no separate ubi::Node value representing the slot and element arrays, since
> > +// they are owned exclusively by the object.
> 
> I bet if you had an object with a (non-rope) string object property, you'd never merge the string into the object, even though it wouldn't change the memory portrayal.

I don't think the text promises that ubi::Node will or won't consolidate. It says "ubi::Node *may* present", and suggests that decisions will be made on a case-by-case basis.


> Then again, after reading the rest of the patch, I see no mention of this in the code. Is this future work? If so, I'd rather not have the comment here until the matching code is present, so it can be evaluated in that context.

Yes, it's future work. I included it because khuey and I have had discussions about how to handle these sorts of issues, and I wanted to sketch out the solution. But I'll drop the detailed comments for now.


> @@ +139,5 @@
> > +// computation, we run the whole algorithm in the scope of an AutoAssertNoGC
> > +// instance.)
> > +//
> > +// If this restriction is too annoying, we may teach the GC how to root
> > +// ubi::Nodes, fix up hash tables that use them as keys, etc.
> 
> I would think it would be "if we want further yummy juicy functionality like comparing two graphs, we may teach..."
> 
> Please don't use those words verbatim. :)

Used verbatim.


> @@ +142,5 @@
> > +// If this restriction is too annoying, we may teach the GC how to root
> > +// ubi::Nodes, fix up hash tables that use them as keys, etc.
> > +
> > +
> > +// Forward declarations of SpiderMonkey's ubi::Node reference types.
> 
> Ok, now I'm going to be an ass. Why "ubi"? Ubiquitous means something that occurs everywhere. Why is that the salient characteristic of these nodes? I would understand generic::Node or general::Node or mem::Node or HeapNode or AbstractNode or something. That last is probably the most descriptive. abstract::Node?

The salient characteristic of ubi::Node is that, with the appropriate steps taken, they
can point at anything. This is unlike Value, which points to JavaScript values only, or
nsISupports, which encompasses XPCOM objects only, for example. "Generic" or "universal"
are not bad. But "abstract" is completely vague, much worse than all the others. It's a
word to be forsworn, like "object".


> I don't mind if you want to keep the name, it's just that I don't entirely understand it. I guess they *are* everywhere, and there's sort of an implied "once you start thinking of them abstractly, you'll see them everywhere"?

No, it's meant to evoke the ability to point anywhere. This is why "generic" and "universal" are not bad candidates.


> @@ +168,5 @@
> > +class Base {
> > +    friend class Node;
> > +
> > +    // Subclasses may not override the destructor.
> > +    // How to enforce this?
> 
> Heavyweight way: ask jcranmer to add a static analysis for it. (Yeah, overkill.)
> 
> Why is this necessary? I haven't read far enough. Or thought hard enough.

I've changed the comment to:

    // For performance's sake, we'd prefer to avoid a virtual destructor; and
    // an empty constructor seems consistent with the 'lightweight value type'
    // visible behavior we're trying to achieve. But if the destructor isn't
    // virtual, and a subclass overrides it, the subclass's destructor will be
    // ignored. Is there a way to make the compiler catch that error?


> @@ +180,5 @@
> > +
> > +  public:
> > +    bool operator==(const Base &rhs) const {
> > +        // There's no need to compare the vtables, because you can't have two
> > +        // objects of different types at the same address.
> 
> Oh, great, you've broken my whole automatic type-independent deduplication scheme. Thanks a lot.
> 
> Seriously, I'm not sure if this is 100% true. If you
> 
> const Type1 a = 0;
> const Type2 b = 0;
> 
> is the compiler/linker allowed to use the same address for them?

As we found in the accepted answer to this StackOverflow question:
http://stackoverflow.com/questions/7525942/address-of-a-static-variable
it seems that sometimes MSVC will put distinct objects at the same address. I thought this was forbidden.

I still think that that definition of equality is going to be okay in practice. I've changed the comment to:

        // Some compilers will indeed place objects of different types at
        // the same address, so technically, we should include the vtable
        // in this comparison. But it seems unlikely to cause problems in
        // practice.


> @@ +197,5 @@
> > +    // node owns exclusively that are not exposed as their own ubi::Nodes.
> > +    virtual size_t size() const = 0;
> > +
> > +    // Return an EdgeRange that initially contains all the referent's outgoing
> > +    // edges. It is dynamically allocated in |cx|, and should be freed with
> 
> in cx? What does that mean? With cx, I guess? (Contexts don't really contain data other than some unfortunate state.)

I meant that the memory used is attributed to the counters that cx maintains. I wrote this when I was under the impression that it was good to use JSContext's MallocProvider methods for general storage. Since we're using TempAllocPolicy now, cx is nothing more than the place we report OOM.


> @@ +228,5 @@
> > +    // implementation type.
> > +    //
> > +    // So, we delegate the actual construction to this specialization, which
> > +    // knows Referent's details.
> > +    static void construct(void *storage, Referent *referent);
> 
> This is crazy powerful. And scary.

It's going to work great, once we add a JS::Class hook for it. JSObjects will have all kinds of wild edges.


> @@ +250,5 @@
> > +
> > +    template<typename T>
> > +    Node(T *ptr) {
> > +        MOZ_ASSERT(sizeof(T) == sizeof(*base()),
> > +                   "ubi::Base specializations must be the same size as ubi::Base");
> 
> Can this be done with a static assert?

Good suggestion --- done.


> @@ +285,5 @@
> > +
> > +    bool operator==(const Node &rhs) const { return *base() == *rhs.base(); }
> > +    bool operator!=(const Node &rhs) const { return *base() != *rhs.base(); }
> > +
> > +    operator ConvertibleToBool() const { return ConvertibleToBool(!!base()->ptr); }
> 
> I'm surprised the compiler didn't demand
> 
>   return reinterpret_cast<ConvertibleToBool>(!!base()->ptr);

That is mysterious. I guess one can construct pointers from bools?? Or maybe I'll only get an
error when this function is first used...


> @@ +337,5 @@
> > +
> > +  public:
> > +    // This edge's name.
> > +    //
> > +    // The storage is owned by this Edge, and will be freed when ARF ARF ARF.
> 
> I think I may need to know a little more about when the dog barks...

Oh, I was supposed to fix that. Sorry.

    // The storage is owned by this Edge, and will be freed when this Edge is
    // destructed.


> Can't this just be owned and destroyed by the Base destructor? I imagine I'm missing something.

Well, it belongs to the Edge, which you usually obtain via EdgeRange::front, which specifies the lifetime of the Edge it hands you a reference to.


> @@ +348,5 @@
> > +    // This edge's referent.
> > +    Node referent;
> > +
> > +    // Whether the front edge of this range should be visible to
> > +    // JavaScript developers.
> 
> The whozit whatzit?

I'm just going to take this out, since we're not even using it enough yet to give it a
proper specification.


> @@ +398,5 @@
> > +
> > +// A reusable ubi::Concrete specialization base class for types supported by
> > +// JS_TraceChildren.
> > +template<typename Referent>
> > +class TracerConcrete: public Base {
> 
> space before the ':'

Ah, thanks.


> ::: js/src/moz.build
> @@ +212,5 @@
> >  # from <stdlib.h> on Windows through a preprocessor define.
> >  # jsutil.cpp cannot be built in unified mode because it is needed for
> >  # check-vanilla-allocations.
> > +#
> > +# vm/UbiNode.cpp is broken out for testing purposes; it should be moved to UNIFIED_SOURCES.
> 
> I won't tell anybody.

Fixed; thanks.


> ::: js/src/shell/js.cpp
> @@ +4516,5 @@
> > +"  { type: <string describing node>, edge: <string describing edge> }\n"
> > +"if the node is some internal thing that is not a proper JavaScript value\n"
> > +"(like a shape or a scope chain element). The destination of the i'th array\n"
> > +"element's edge is the node of the i+1'th array element; the destination of\n"
> > +"the last array element is implicitly |target|.\n"),
> 
> I guess it doesn't belong in this patch, but wouldn't it make sense to move this stuff (including jsheaptools) to builtin/ next to TestingFunctions.cpp?

Yes, the new stuff could go in builtin.

The other jsheaptools stuff, perhaps. It was intended to make it unnecessary to write tests that call 'gc(); gc();' and garbage like that, by letting you actually specify which edges you wanted to be gone, making the tests insensitive to things like conservative and generational GC. But I don't think it really ended up being as valuable as I thought it would be. So maybe it should just be dropped.

But yes, let's leave that to another bug.


> ::: js/src/shell/jsheaptools.cpp
> @@ +626,5 @@
> > +    BackEdge(ubi::Node predecessor, jschar *name)
> > +        : predecessor_(predecessor), name_(name) { }
> > +    BackEdge(BackEdge &&rhs) : predecessor_(rhs.predecessor_), name_(rhs.name_.forget()) { }
> > +    BackEdge &operator=(BackEdge &&rhs) {
> > +        JS_ASSERT(&rhs != this);
> 
> Why is this so awful? I mean, why assert instead of just handling the case? I guess you don't use it, but it seems like it would be simpler to allow than to restrict the interface.

Well, self-moves need to be handled in one way or another: if we just proceed as normal, we destruct the object and then try to move the destructed object onto itself ("... you're gonna have a bad time.") It seemed to me that a self-move was more likely to be a mistake than intentional, so I chose to crash. It's a local fix if that's not the right choice.


> @@ +657,5 @@
> > +                             "findPath", "1", "");
> > +        return false;
> > +    }
> > +
> > +    if (!args[0].isObject() && !args[0].isString()) {
> 
> Maybe comment that you're using the identity of the string, which is why you don't ToString non-objects?

I commented:

    // We don't ToString non-objects given as 'start' or 'target'. We can't
    // see edges to non-string primitive values, and it doesn't make much
    // sense to ask for paths to or from a freshly allocated string, so
    // if a non-string primitive appears here it's probably a mistake.


> @@ +722,5 @@
> > +                const ubi::Node referent = edge.referent;
> > +
> > +                // Have we reached this edge's referent before?
> > +                BackEdgeMap::AddPtr a = black.lookupForAdd(referent);
> > +                if (!a) {
> 
> Personally, I'd reduce the indentation depth with
> 
>   if (a)
>       continue;

Good idea; done.


> @@ +737,5 @@
> > +
> > +                    // If we've reached our target, then walk the backlinks
> > +                    // to produce the (reversed) path, and save the path in
> > +                    // |nodes| and |edges|.
> > +                    if (referent == target) {
> 
> Same here, though it's a little more questionable.

Actually, we could move that entire block after the main loop, effectively pushing it
through the 'goto'. That's much better.


> @@ +752,5 @@
> > +                                JSObject &obj = *predecessor.as<JSObject>();
> > +                                if (obj.is<ScopeObject>()) {
> > +                                    v.setUndefined();
> > +                                } else if (obj.is<JSFunction>() && IsInternalFunctionObject(&obj)) {
> > +                                    v.setUndefined();
> 
> This "JS should never see" test really needs to be extracted out into a function. (Especially since it's easy to imagine it varying.) Then this function will be mostly a straightforward BFS.

I went a bit further and pulled the BFS code out into its own header file; it's now a
class template that takes a visitor type. Everything got a lot cleaner.

> 
> @@ +757,5 @@
> > +                                } else {
> > +                                    v.setObject(obj);
> > +                                }
> > +                            } else if (predecessor.is<JSString>()) {
> > +                                v.setString(predecessor.as<JSString>());
> 
> So the comment lied about ropes? :-)

How so? This isn't meant to touch the string structure --- does it?



> 
> @@ +796,5 @@
> > +    // The |nodes| and |edges| array hold the path in reverse order. Walk it in
> > +    // the stored order, and construct the result array in start-to-target order.
> > +    for (size_t i = 0; i < length; i++) {
> > +        // Build an object describing the node and edge.
> > +        RootedObject obj(cx, NewBuiltinClassInstance(cx, &JSObject::class_));
> 
> I *think* you can do NewBuiltinClassInstance<JSObject>(cx).

You can! Yay!!!


> @@ +800,5 @@
> > +        RootedObject obj(cx, NewBuiltinClassInstance(cx, &JSObject::class_));
> > +        if (!obj)
> > +            return false;
> > +
> > +        RootedValue node(cx, nodes[i]);
> 
> No need to reroot. Use nodes.handleAt(i). And remind me why we didn't make this operator[] return a Handle.

Seems like the operator[] has been fixed in the mean time. I'll use that. Thanks!


> @@ +805,5 @@
> > +        if (!JS_DefineProperty(cx, obj, "node", node,
> > +                               JSPROP_ENUMERATE, nullptr, nullptr))
> > +            return false;
> > +
> > +        RootedString edge(cx, js_NewString<CanGC>(cx, edges[i], js_strlen(edges[i])));
> 
> Isn't this the same as js_NewStringCopyZ<CanGC>(cx, edges[i])?

No, because it steals the jschar array. Hence the edges[i].forget() call below.


> ::: js/src/shell/jsheaptools.h
> @@ +14,3 @@
> >  
> >  bool FindReferences(JSContext *cx, unsigned argc, JS::Value *vp);
> > +bool FindPath(JSContext *cx, unsigned argc, jsval *vp);
> 
> s/jsval/JS::Value/

Ah, thanks.


> ::: js/src/vm/UbiNode.cpp
> @@ +93,5 @@
> > +    }
> > +};
> > +
> > +
> > +typedef mozilla::Vector<SimpleEdge, 8, js::ContextAllocPolicy> SimpleEdgeVector;
> 
> Why do you use ContextAllocPolicy (and JSContext*) for this stuff instead of SystemAllocPolicy? This stuff isn't (currently) GC-managed. The comment from Utility.h:

Everything uses TempAllocPolicy now. ContextAllocPolicy is no longer used here.

> @@ +130,5 @@
> > +        // The simplest code is correct! The temporary SimpleEdge takes
> > +        // ownership of name; if the append succeeds, the vector element
> > +        // then takes ownership; if the append fails, then the temporary
> > +        // retains it, and its destructor will free it.
> > +        if (!vec->append(mozilla::Move(SimpleEdge(jsname, Node(kind, *thingp))))) {
> 
> don't you need to free jsname here?

Nope --- The SimpleEdge constructor takes ownership of jsname, as documented. Vector<T>::append(T &&) (selected by the use of Move) is defined such that if and only if the append succeeds, the vector element takes ownership of the string. So in the failure case, the temporary SimpleEdge frees jsname. It's sweet.


> @@ +192,5 @@
> > +template<> const jschar TracerConcrete<js::Shape>::concreteTypeName[] = u"js::Shape";
> > +template<> const jschar TracerConcrete<js::BaseShape>::concreteTypeName[] = u"js::BaseShape";
> > +template<> const jschar TracerConcrete<js::types::TypeObject>::concreteTypeName[] = u"js::types::TypeObject";
> > +
> > +#if 0
> 
> Kill or make real.

Ach, forgot about that. I don't even remember what it was meant to be.
Revised per comments.
Attachment #8410427 - Attachment is obsolete: true
Attachment #8422920 - Flags: review?(sphink)
Comment on attachment 8422920 [details] [diff] [review]
v6: SpiderMonkey should provide an introspection API for memory heap analysis (ubi::Node)

Review of attachment 8422920 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/public/UbiNodeTraverse.h
@@ +77,5 @@
> +    bool init() { return visited.init(); }
> +
> +    // Add |node| as a starting point for the traversal. You may add
> +    // as many starting points as you like. Return false on OOM.
> +    bool addStart(Node node) { return pending.append(node); }

This is a neat trick :)
Comment on attachment 8422920 [details] [diff] [review]
v6: SpiderMonkey should provide an introspection API for memory heap analysis (ubi::Node)

Review of attachment 8422920 [details] [diff] [review]:
-----------------------------------------------------------------

I didn't go through this quite as deeply this time, but it seems plenty good enough to land.

::: js/public/UbiNode.h
@@ +28,5 @@
> +//   object, or an internal DOM node class instance
> +//
> +// A ubi::Node instance provides metadata about its referent, and can
> +// enumerate its referent's outgoing edges, so you can implement heap analysis
> +// algorithms that walk thre graph - finding paths between objects, or

"thre" must be a clever combination of "through the".

@@ +56,5 @@
> +// feasible for use in memory-constrained devices --- ideally, the memory
> +// requirements of the algorithm which uses them will be the limiting factor,
> +// not the demands of ubi::Node itself.
> +//
> +// One can construct a ubi::Node value given a pointer to a type that Ubi::Node

downcase Ubi

@@ +71,5 @@
> +// To add support for a new ubi::Node referent type R, you must define a
> +// specialization of the ubi::Concrete template, ubi::Concrete<R>, which
> +// inherits from ubi::Base. ubi::Node itself uses the specialization for
> +// compile-time information (i.e. the checked conversions between R * and
> +// ubi::node), and the inheritance for run-time dispatching.

upcase node

@@ +79,5 @@
> +//
> +// In many cases, a JavaScript developer's view of their data differs
> +// substantially from its actual implementation. For example, while the
> +// ECMAScript specification describes objects as maps from property names to
> +// sets of attributes (like ECMAScript's [[Value]]), in practice, many objects

can you kill the comma after "practice"?

@@ +115,5 @@
> +// interface presenting those results will generally need to clean them up
> +// before they can be understood by JavaScript developers. For example,
> +// JavaScript developers should not need to understand shapes, only JavaScript
> +// objects. Similarly, they should not need to understand the distinction
> +// between DOM nodes and the JavaScript shadow objects that represent them.

Not actually a comment on this patch, but this makes me wonder -- how can you avoid exposing shapes if you're exposing the dominator tree or information derived from it? Or rather, don't you need to suppress shapes before computing the dominator tree? Because it'll be very common for two totally independent high-level nodes to share shapes somewhere within their subgraphs, and that'll prevent them from dominating "their" stuff. You sort of want to do the dominator tree computation without the shapes, and maybe add their proportional memory usage back in afterwards.

@@ +126,5 @@
> +// run to completion and convert their results to some other rootable type, or
> +// save their intermediate state in some rooted structure if they must GC before
> +// they complete. (For algorithms like path-finding and dominator tree
> +// computation, we run the whole algorithm in the scope of an AutoAssertNoGC
> +// instance.)

As I read it, the implication of that parenthetical phrase is that AutoAssertNoGC prevents GC from happening, and that if you're writing some sort of analysis or traversal you can do the same and remove the chance of GCing for free. Which is backwards -- it only checks that you don't GC, and asserts if you do. Maybe s/we run/we do nothing that can GC, and verify this by running/?

@@ +193,5 @@
> +    // node owns exclusively that are not exposed as their own ubi::Nodes.
> +    virtual size_t size() const = 0;
> +
> +    // Return an EdgeRange that initially contains all the referent's outgoing
> +    // edges. The EdgeRange should be freed with 'delete'. (You could used

*use

@@ +282,5 @@
> +
> +    bool operator==(const Node &rhs) const { return *base() == *rhs.base(); }
> +    bool operator!=(const Node &rhs) const { return *base() != *rhs.base(); }
> +
> +    operator ConvertibleToBool() const { return ConvertibleToBool(!!base()->ptr); }

I guess you'd know soon enough if some compiler barfed on this or decided to zero out the bottom bits of your pointers (thus converting true -> 0x1 -> 0).

@@ +400,5 @@
> +// JS_TraceChildren.
> +template<typename Referent>
> +class TracerConcrete : public Base {
> +    virtual const jschar *typeName() const { return concreteTypeName; }
> +    virtual size_t size() const { return 0; } // not implemented yet

Can you file a bug for this and put it in the comment?

::: js/public/UbiNodeTraverse.h
@@ +28,5 @@
> +//      visited for its own use, so it is "free" to let the Handler store
> +//      its own metadata about each node in the HashMap. This must have a
> +//      default constructor. If this type owns any other resources, move
> +//      constructors and assignment operators are probably a good idea,
> +//      too.

So I guess if you want to print a shortest path to a node while visiting it, you would store the origin into NodeData during operator(). (Just talking to myself in print; it's very often what I want.)

@@ +57,5 @@
> +//      The visitor function may consult |traversal.visited| for information
> +//      about other nodes, but it should not add or remove entries.
> +//
> +//      The visitor function should return true on success, or false if an
> +//      error occurs.

Why?

I guess I'd expect this to say either (1) what it says, plus where the overall error report goes to, or (2) the effect of returning true or false (eg the traversal terminates if the visitor function returns false.)

Ok, from reading it, I'd say "The visitor function may return false to abort the traversal and return false from traverse()" or something.

@@ +141,5 @@
> +    // error.
> +    void stop() { stopRequested = true; }
> +
> +    // The context with which we were constructed.
> +    JSContext *cx;

This makes me think we should have an AllocOp or ErrorReportingOp to pair with FreeOp. (Why "Op", I wonder?)

::: js/src/moz.build
@@ +219,5 @@
>      'jsarray.cpp',
>      'jsatom.cpp',
>      'jsmath.cpp',
>      'jsutil.cpp',
> +    'vm/UbiNode.cpp',

Temporarily here, right?

::: js/src/shell/jsheaptools.cpp
@@ +734,5 @@
> +        FindPathHandler handler(start, target, nodes, edges);
> +        FindPathHandler::Traversal traversal(cx, handler, autoNoGC);
> +        if (!traversal.init() ||
> +            !traversal.addStart(start))
> +            return false;

Sorry, but needs to either have curly brackets or be on one line.

@@ +755,5 @@
> +    //
> +    // or, if the node is some internal thing, that isn't a proper
> +    // JavaScript value:
> +    //
> +    //   { node: undefined, edge: <string> }

I would almost expect this to be null instead of undefined, but nah, null is annoying.

::: js/src/vm/UbiNode.cpp
@@ +153,5 @@
> +
> +        // The simplest code is correct! The temporary SimpleEdge takes
> +        // ownership of name; if the append succeeds, the vector element
> +        // then takes ownership; if the append fails, then the temporary
> +        // retains it, and its destructor will free it.

Thanks for the comment.
Attachment #8422920 - Flags: review?(sphink) → review+
(In reply to Steve Fink [:sfink] from comment #36)
> Can you file a bug for this and put it in the comment?

I've already filed bug 1011300 :)
Comment on attachment 8422920 [details] [diff] [review]
v6: SpiderMonkey should provide an introspection API for memory heap analysis (ubi::Node)

Review of attachment 8422920 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/vm/UbiNode.cpp
@@ +197,5 @@
> +
> +template<typename Referent>
> +EdgeRange *
> +TracerConcrete<Referent>::edges(JSContext *cx) const {
> +    mozilla::ScopedDeletePtr<SimpleEdgeRange> r(new SimpleEdgeRange(cx));

I think this is causing my try pushes to fail the vanilla allocations checker. See https://bugzilla.mozilla.org/show_bug.cgi?id=1003302#c6

Have you done a try push for this patch itself yet?
(In reply to Steve Fink [:sfink] from comment #36)

Thanks for the copy-editing; I've made all the changes you suggest.

> @@ +115,5 @@
> > +// interface presenting those results will generally need to clean them up
> > +// before they can be understood by JavaScript developers. For example,
> > +// JavaScript developers should not need to understand shapes, only JavaScript
> > +// objects. Similarly, they should not need to understand the distinction
> > +// between DOM nodes and the JavaScript shadow objects that represent them.
> 
> Not actually a comment on this patch, but this makes me wonder -- how can
> you avoid exposing shapes if you're exposing the dominator tree or
> information derived from it? Or rather, don't you need to suppress shapes
> before computing the dominator tree? Because it'll be very common for two
> totally independent high-level nodes to share shapes somewhere within their
> subgraphs, and that'll prevent them from dominating "their" stuff. You sort
> of want to do the dominator tree computation without the shapes, and maybe
> add their proportional memory usage back in afterwards.

The way dominator trees work, shared shapes will "float up" to be dominated by the compartment global or something. We don't have plans at the moment to let people browse the dominator tree directly (although Google's tools have this feature); we're just using it to compute retained size. So presentation isn't an issue for us yet.

In general, I think we'll need to take these presentation issues case-by-case. I have to admit I don't actually know what to do with *this* case, so it's a good thing we don't need an answer immediately.


> @@ +126,5 @@
> > +// run to completion and convert their results to some other rootable type, or
> > +// save their intermediate state in some rooted structure if they must GC before
> > +// they complete. (For algorithms like path-finding and dominator tree
> > +// computation, we run the whole algorithm in the scope of an AutoAssertNoGC
> > +// instance.)
> 
> As I read it, the implication of that parenthetical phrase is that
> AutoAssertNoGC prevents GC from happening, and that if you're writing some
> sort of analysis or traversal you can do the same and remove the chance of
> GCing for free. Which is backwards -- it only checks that you don't GC, and
> asserts if you do. Maybe s/we run/we do nothing that can GC, and verify this
> by running/?

That's a good point. I've change it to say:

    (For algorithms like path-finding and dominator tree computation, we
    implement the algorithm avoiding any operation that could cause a GC ---
    and use AutoAssertNoGC to verify this.)


> > +    bool operator==(const Node &rhs) const { return *base() == *rhs.base(); }
> > +    bool operator!=(const Node &rhs) const { return *base() != *rhs.base(); }
> > +
> > +    operator ConvertibleToBool() const { return ConvertibleToBool(!!base()->ptr); }
> 
> I guess you'd know soon enough if some compiler barfed on this or decided to
> zero out the bottom bits of your pointers (thus converting true -> 0x1 -> 0).

I assume you're referring to the cast, as '!' is well-defined on pointers. I have no idea whether my choice of ConvertibleToBool is a good one. I've swapped it out for the one in HashTable.h, as that's luke and is thus beyond suspicion.


> @@ +400,5 @@
> > +// JS_TraceChildren.
> > +template<typename Referent>
> > +class TracerConcrete : public Base {
> > +    virtual const jschar *typeName() const { return concreteTypeName; }
> > +    virtual size_t size() const { return 0; } // not implemented yet
> 
> Can you file a bug for this and put it in the comment?

I've put in fitzgen's bug number here.


> ::: js/public/UbiNodeTraverse.h
> @@ +28,5 @@
> > +//      visited for its own use, so it is "free" to let the Handler store
> > +//      its own metadata about each node in the HashMap. This must have a
> > +//      default constructor. If this type owns any other resources, move
> > +//      constructors and assignment operators are probably a good idea,
> > +//      too.
> 
> So I guess if you want to print a shortest path to a node while visiting it,
> you would store the origin into NodeData during operator(). (Just talking to
> myself in print; it's very often what I want.)

Yes. The text now reads:

    //   typename NodeData;
    //
    //      The value type of |BreadthFirst<Handler>::visited|, the HashMap of
    //      ubi::Nodes that have been visited so far. Since the algorithm needs a
    //      hash table like this for its own use anyway, it is simple to let
    //      Handler store its own metadata about each node in the same table.
    //
    //      For example, if you want to find a shortest path to each node from any
    //      traversal starting point, your |NodeData| type could record the first
    //      edge to reach each node, and the node from which it originates. Then,
    //      when the traversal is complete, you can walk backwards from any node
    //      to some starting point, and the path recorded will be a shortest path.
    //
    //      This type must have a default constructor. If this type owns any other
    //      resources, move constructors and assignment operators are probably a
    //      good idea, too.


> @@ +57,5 @@
> > +//      The visitor function may consult |traversal.visited| for information
> > +//      about other nodes, but it should not add or remove entries.
> > +//
> > +//      The visitor function should return true on success, or false if an
> > +//      error occurs.
> 
> Why?

I've changed the comment:

    //      The visitor function should return true on success, or false if an
    //      error occurs. A false return value terminates the traversal
    //      immediately, and causes BreadthFirst<Handler>::traverse to return
    //      false.


> ::: js/src/moz.build
> @@ +219,5 @@
> >      'jsarray.cpp',
> >      'jsatom.cpp',
> >      'jsmath.cpp',
> >      'jsutil.cpp',
> > +    'vm/UbiNode.cpp',
> 
> Temporarily here, right?

Ah, right. Moved back. Sorry.


> ::: js/src/shell/jsheaptools.cpp
> @@ +734,5 @@
> > +        FindPathHandler handler(start, target, nodes, edges);
> > +        FindPathHandler::Traversal traversal(cx, handler, autoNoGC);
> > +        if (!traversal.init() ||
> > +            !traversal.addStart(start))
> > +            return false;
> 
> Sorry, but needs to either have curly brackets or be on one line.

Fixed; thanks.
I've also fixed the 'new SimpleEdgeRange' problem that Nick caught:
https://bugzilla.mozilla.org/show_bug.cgi?id=1003302#c6
I've moved findPath to TestingFunctions.cpp in this patch, and made it available even in non-DEBUG builds; there's no reason to make it conditionally available, and the tests need to recognize when it's not there, so it's an unhappy wrinkle.
Windows build is unhappy; I'll look into it tomorrow.
Seems like Windows doesn't like u'...' character literals or u"..." string literals. That's going to be a problem.
(In reply to Jim Blandy :jimb from comment #46)
> Seems like Windows doesn't like u'...' character literals or u"..." string
> literals. That's going to be a problem.

You want to use MOZ_UTF16 from mozilla/Char16.h; it deals with Windows wanting L"..." constants.
The try run shows failures on Win 8 running findPath. jandem was able to reproduce the crash, which pointed at 'free', so I tried valgrind and got this:

Entering FindPath (hi!)
FindPath: checked argc
FindPath: checked argv[0] type
FindPath: checked argv[1] type
FindPathHandler::operator(): 213172576 'type' 213125320
FindPathHandler::operator(): 213172576 'shape' 213222672
FindPathHandler::operator(): 213172576 'w' 213172624
==3967== Mismatched free() / delete / delete []
==3967==    at 0x4A078DE: operator delete(void*) (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==3967==    by 0xBC69EF: SimpleEdgeRange::~SimpleEdgeRange() (UbiNode.cpp:176)
==3967==    by 0x540BB2: mozilla::ScopedDeletePtrTraits<JS::ubi::EdgeRange>::release(JS::ubi::EdgeRange*) (in /home/jimb/moz/mem/js/src/debug~/js/src/shell/js)
==3967==    by 0x538BCC: mozilla::Scoped<mozilla::ScopedDeletePtrTraits<JS::ubi::EdgeRange> >::~Scoped() (Scoped.h:106)
==3967==    by 0x52BB8D: mozilla::ScopedDeletePtr<JS::ubi::EdgeRange>::~ScopedDeletePtr() (Scoped.h:246)
==3967==    by 0x52BF72: JS::ubi::BreadthFirst<heaptools::FindPathHandler>::traverse() (UbiNodeTraverse.h:115)
==3967==    by 0x4D5BCC: FindPath(JSContext*, unsigned int, JS::Value*) (TestingFunctions.cpp:1829)
==3967==    by 0xAE1BD3: js::CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), JS::CallArgs const&) (jscntxtinlines.h:241)
==3967==    by 0xAB7A30: js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) (Interpreter.cpp:455)
==3967==    by 0xAC5A7C: Interpret(JSContext*, js::RunState&) (Interpreter.cpp:2566)
==3967==    by 0xAB76D0: js::RunScript(JSContext*, js::RunState&) (Interpreter.cpp:402)
==3967==    by 0xAB8748: js::ExecuteKernel(JSContext*, JS::Handle<JSScript*>, JSObject&, JS::Value const&, js::ExecuteType, js::AbstractFramePtr, JS::Value*) (Interpreter.cpp:610)
==3967==  Address 0x4ff9ae0 is 0 bytes inside a block of size 328 alloc'd
==3967==    at 0x4A06409: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==3967==    by 0xB77944: js_malloc(unsigned long) (Utility.h:99)
==3967==    by 0xBB1A68: SimpleEdgeRange* js_new<SimpleEdgeRange, JSContext*&>(JSContext*&) (Utility.h:327)
==3967==    by 0xBC7D52: JS::ubi::TracerConcrete<JSObject>::edges(JSContext*) const (UbiNode.cpp:201)
==3967==    by 0x52353D: JS::ubi::Node::edges(JSContext*) const (UbiNode.h:332)
==3967==    by 0x52BC77: JS::ubi::BreadthFirst<heaptools::FindPathHandler>::traverse() (UbiNodeTraverse.h:110)
==3967==    by 0x4D5BCC: FindPath(JSContext*, unsigned int, JS::Value*) (TestingFunctions.cpp:1829)
==3967==    by 0xAE1BD3: js::CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), JS::CallArgs const&) (jscntxtinlines.h:241)
==3967==    by 0xAB7A30: js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) (Interpreter.cpp:455)
==3967==    by 0xAC5A7C: Interpret(JSContext*, js::RunState&) (Interpreter.cpp:2566)
==3967==    by 0xAB76D0: js::RunScript(JSContext*, js::RunState&) (Interpreter.cpp:402)
==3967==    by 0xAB8748: js::ExecuteKernel(JSContext*, JS::Handle<JSScript*>, JSObject&, JS::Value const&, js::ExecuteType, js::AbstractFramePtr, JS::Value*) (Interpreter.cpp:610)
==3967==
That seems to be a ScopedDeletePtr vs. ScopedJSDeletePtr issue. New try push:
https://tbpl.mozilla.org/?tree=Try&rev=815b3d2e4cc6
OMG, that was it. I'm so frustrated I could chew my way through an aardvark.

Fresh try with all the toppings:
https://tbpl.mozilla.org/?tree=Try&rev=b92bece48bba
Last-minute fixes are always great. Another try push:

https://tbpl.mozilla.org/?tree=Try&rev=a33ee769f7d9
Now there's a "Hazard: full browser" failure, but the log isn't very revealing.
Restarted that build, all green. Yay!!!

https://hg.mozilla.org/integration/mozilla-inbound/rev/3d405f960e94
Flags: in-testsuite+
Target Milestone: --- → mozilla33
https://hg.mozilla.org/mozilla-central/rev/3d405f960e94
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: