Open Bug 932156 Opened 11 years ago Updated 2 years ago

API to tag a JS object to a particular memory reporter

Categories: Core :: XPCOM, defect

People: Reporter: Felipe, Unassigned

Whiteboard: [MemShrink:P2]

I'd like to be able to give the memory reporters in about:memory some knowledge about the purpose of a specific JS object, instead of letting it just be counted towards the gc-heap (if I understand it correctly).

For example: previously, the add-ons repository was stored in a .sqlite file, and the runtime memory associated with that storage was properly reported under the "storage" section in about:memory.

With bug 853389, however, all that in-memory data is now stored as .json on disk and parsed into a JS object in memory. I'd like to still mark that object as "runtime memory for the add-ons repository storage" (or something to that effect).

(detailing for the curious: in this specific example, that object is AddonRepository.jsm's AddonDatabase.DB, which contains an array of various AddonSearchResult objects)
Whiteboard: [MemShrink]
Whiteboard: [MemShrink] → [MemShrink:P2]
Same as in bug 932152, an important question is whether we want to just mark the particular object, or the object and all the objects that it refers to.  The latter is harder.  Maybe I should just worry about the former;  it should still be enough to be quite usable.
(In reply to Nicholas Nethercote [:njn] from comment #1)
> Same as in bug 932152, an important question is whether we want to just mark
> the particular object, or the object and all the objects that it refers to. 
> The latter is harder.  Maybe I should just worry about the former;  it
> should still be enough to be quite usable.

For the scenario in comment 0, that won't suffice. In general, my guess is that in most cases this would only be useful if it also counted the objects referred to. However, the question then is how to count objects that are referred to by multiple "notable" objects. And do you call getters during traversal? (Probably not.)
(In reply to Nicholas Nethercote [:njn] from comment #1)
> Same as in bug 932152, an important question is whether we want to just mark
> the particular object, or the object and all the objects that it refers to. 
> The latter is harder.  Maybe I should just worry about the former;  it
> should still be enough to be quite usable.

I don't know how useful it would be for the use case I'm interested in. Would the measurements at least be able to count non-object properties (primitive values like numbers, strings, etc.)?

Most of the JSON storage I want to measure is structured like this:

obj = {
  schemaVersion: 5,
  data: [ { /* obj1 */ }, { /* obj2 */ }, { /* obj3 */ } /* , ... */ ]
}

And the data objects all hold plain data, give or take a few exceptions that don't matter much.


If the measuring API is not able to iterate into the array (because it's an object), I can happily iterate over it and add up all the relevant values manually. Even if I can't tag it in about:memory as this bug proposes, such a measurement would be very useful for debugging or for reporting through telemetry.


I suppose I can already approximate this to a certain degree by adding up the lengths of all strings in these objects (property names and values), knowing that they're stored as 16-bit characters, and doing the same for numbers and so on. But it's a bit of an ugly process, and it wouldn't include the object storage overhead, which would be nice to have.
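A rough sketch of the manual approximation described above, assuming the data is acyclic and JSON-parsed (so there are no getters to trip over). The 2-bytes-per-character and 8-bytes-per-number figures are assumptions taken from the 16-bit-string remark, and `approximateBytes` is a hypothetical helper; object and array overhead is left out, so it undercounts.

```javascript
// Hypothetical estimate for plain JSON-like data:
// strings cost 2 bytes per character (16-bit code units),
// numbers cost 8 bytes (a double). Object/array overhead is NOT counted.
// Assumes acyclic, JSON-parsed input (no cycles, no getters).
function approximateBytes(value) {
  if (typeof value === "string") return value.length * 2;
  if (typeof value === "number") return 8;
  // Booleans, null and undefined are ignored in this rough model.
  if (value === null || typeof value !== "object") return 0;
  let total = 0;
  if (Array.isArray(value)) {
    for (const item of value) total += approximateBytes(item);
    return total;
  }
  for (const key of Object.keys(value)) {
    total += key.length * 2;               // property name
    total += approximateBytes(value[key]); // property value
  }
  return total;
}
```

For `{ schemaVersion: 5, data: [{ name: "ab" }] }` this yields 54 bytes: the three property names and the one string value at 2 bytes per character, plus 8 bytes for the number.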
When the memory reporter runs, we want all the objects that require special handling to already have been tagged in some way, because the memory reporter just does a linear heap scan and we don't want to do any kind of object tracing at that point.

However, it should be possible to do the necessary tracing and tagging before that. It could be done manually, as Felipe suggested, but it also shouldn't be hard to have the API function do it for you.
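A minimal sketch of "trace and tag ahead of time, scan later", assuming a WeakMap (`memoryTags`, hypothetical) as the tag table and an explicit array of objects standing in for the reporter's linear heap scan. It reads properties normally, so it is only suitable for plain data without getters.

```javascript
// Hypothetical tag table: object -> tag string. A WeakMap stands in for
// whatever engine-side storage a real implementation would use.
const memoryTags = new WeakMap();

// Tracing pass, run BEFORE the reporter: tag everything reachable from root.
function tagReachable(root, tag) {
  const stack = [root];
  const seen = new Set();
  while (stack.length > 0) {
    const obj = stack.pop();
    if (obj === null || typeof obj !== "object" || seen.has(obj)) continue;
    seen.add(obj);
    memoryTags.set(obj, tag);
    for (const key of Object.keys(obj)) stack.push(obj[key]);
  }
}

// Stand-in for the reporter's linear scan: it never follows references,
// it only looks each object up in the tag table.
function categorize(objects) {
  const counts = new Map();
  for (const obj of objects) {
    const tag = memoryTags.get(obj) ?? "js-gc-heap";
    counts.set(tag, (counts.get(tag) ?? 0) + 1);
  }
  return counts;
}
```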

> However, the question is then how to count objects that
> are referred to by multiple "notable" objects. And do you call getters
> during traversal? (Probably not.)

Multiple references is an interesting case.  If we put them all in separate sub-trees in about:memory then repeated measurements of single objects wouldn't matter.  As for getters... if the auto-tracing code was done in JS, that should fall out naturally.
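A sketch of the separate-subtrees idea, assuming each notable root is reported under its own (illustrative) path: every root gets a fresh visited set, so an object shared between two roots is counted in both subtrees, which is harmless as long as the subtrees are not summed. `countReachableObjects` and `reportPerRoot` are hypothetical helpers.

```javascript
// Count objects reachable from one root, with a visited set private to
// that root (shared objects are therefore counted once per root).
function countReachableObjects(root) {
  const seen = new Set();
  const stack = [root];
  while (stack.length > 0) {
    const obj = stack.pop();
    if (obj === null || typeof obj !== "object" || seen.has(obj)) continue;
    seen.add(obj);
    for (const key of Object.keys(obj)) stack.push(obj[key]);
  }
  return seen.size;
}

// roots: Map of report path -> root object. Each path is its own subtree.
function reportPerRoot(roots) {
  const report = {};
  for (const [path, root] of roots) report[path] = countReachableObjects(root);
  return report;
}
```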

So I think there's a path forward here :)
(In reply to Nicholas Nethercote [:njn] from comment #4)
> As for getters... if the auto-tracing code was done in JS,
> that should fall out naturally.

My point was that I think getters shouldn't be invoked, to keep this side-effect-free (so we actually have to worry about .valueOf, too). This shouldn't be a problem, because a getter has to get a value from somewhere, too, and if it doesn't get it from the object but from somewhere completely different (say, a field on the global), then ISTM that the object isn't actually referring to the value.
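One way to keep the traversal side-effect-free in plain JS, sketched under the assumption that only own properties matter: read each property through `Object.getOwnPropertyDescriptor` and follow only data properties, so getters are never called and no `valueOf`/`toString` conversion happens. Proxies are not handled (their traps would still run), and `reachableWithoutGetters` is a hypothetical name.

```javascript
// Collect objects reachable through own DATA properties only.
function reachableWithoutGetters(root) {
  const seen = new Set();
  const stack = [root];
  while (stack.length > 0) {
    const obj = stack.pop();
    // Primitives are leaves; nothing is converted or called on them.
    if (obj === null || typeof obj !== "object" || seen.has(obj)) continue;
    seen.add(obj);
    for (const key of Reflect.ownKeys(obj)) {
      const desc = Object.getOwnPropertyDescriptor(obj, key);
      // Data property: follow its value. Accessor property: skipped,
      // so its getter is never invoked.
      if (desc !== undefined && "value" in desc) stack.push(desc.value);
    }
  }
  return seen;
}
```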

Another issue: what about closed-over things? Those can't be traced through in JS. I guess the answer to that is "don't do that, then".
(In reply to :Felipe Gomes from comment #0)
> (detailing for the curious: in this specific example, that object is
> AddonRepository.jsm's AddonDatabase.DB, which contains an array of various
> AddonSearchResult objects)

Since it's in a compartment, couldn't you just use some hint/annotation to say that "compartment([System Principal], resource://gre/modules/AddonManager.jsm)" should be tracked as belonging to the "Addon Manager" subsystem or something like that?

Ex: In my "real" Firefox 25.0.1 profile, I see 703,184B for that compartment.  In my current nightly Firefox with super-boring profile, I see 198,008B.
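A sketch of the compartment-level hint, assuming the reporter's output can be reduced to `{ name, bytes }` entries; `SUBSYSTEM_RULES` and `groupBySubsystem` are hypothetical, and substring matching on compartment names is just one possible rule format.

```javascript
// Hypothetical rules: compartment-name substring -> subsystem label.
const SUBSYSTEM_RULES = [
  ["resource://gre/modules/AddonManager.jsm", "Addon Manager"],
];

// Sum per-compartment sizes by subsystem; unmatched compartments go to "other".
function groupBySubsystem(compartments) {
  const totals = new Map();
  for (const { name, bytes } of compartments) {
    const rule = SUBSYSTEM_RULES.find(([needle]) => name.includes(needle));
    const label = rule ? rule[1] : "other";
    totals.set(label, (totals.get(label) ?? 0) + bytes);
  }
  return totals;
}
```

With this rule, the 703,184B AddonManager.jsm compartment from the profile above would be reported under "Addon Manager".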

Another potential way to measure for 'simple' data structures would be to use nsIStructuredCloneContainer to perform a structured clone of the data-structure.  Do initFromVariant(yourObj), then read "serializedNBytes", and you've got a number for relatively little work that will correlate somewhat with your actual memory usage.
https://developer.mozilla.org/en-US/docs/XPCOM_Interface_Reference/nsIStructuredCloneContainer
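Outside Gecko, a comparable quick estimate can be sketched with Node's `v8.serialize()`, which uses V8's structured-clone-style serializer. This is a swapped-in analogue, not nsIStructuredCloneContainer, and as with `serializedNBytes` the result only loosely correlates with actual heap usage.

```javascript
const v8 = require("v8");

// Rough size proxy: the number of bytes in V8's serialized form of value.
// Serialized bytes do not include object headers, shapes, or other
// in-heap overhead, so treat this as a correlate, not a measurement.
function serializedSizeBytes(value) {
  return v8.serialize(value).length;
}

console.log(serializedSizeBytes({ schemaVersion: 5, data: [{ name: "example" }] }));
```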

More thorough, and more wasteful, if you reallllly want to know how much memory your data is using: create a sandbox compartment, deserialize your data structure into that compartment, trigger a GC, and then ask the about:memory reporter about the compartment.
(In reply to Andrew Sutherland (:asuth) from comment #6)
> Since it's in a compartment, couldn't you just use some hint/annotation to
> say that "compartment([System Principal],
> resource://gre/modules/AddonManager.jsm)" should be tracked as belonging to
> the "Addon Manager" subsystem or something like that?

That's a great idea; I hadn't thought of it that way. Of course, it includes the code size of the module's API, which is unrelated to the data storage, but to an extent that is a better representation of the memory used by the feature.

Though whenever we move to reuseGlobal = true on desktop we'll lose that ability, right?
> Since it's in a compartment, couldn't you just use some hint/annotation to
> say that "compartment([System Principal],
> resource://gre/modules/AddonManager.jsm)" should be tracked as belonging to
> the "Addon Manager" subsystem or something like that?

If compartment- (i.e. global-) level tagging is all that's required, then we don't have to do anything;  we already have that.  I assume that there will be interesting use cases for which that's not sufficient, where a finer-grained tagging is required.
bhackett: this bug is about allowing users to tag particular JS objects in some way, so that about:memory can present their measurements separately from vanilla JS objects.

I can think of two ways to implement the tagging.  

- Have a hashtable that maps object addresses to annotations.  (GGC will complicate that.)

- Somehow tag individual objects.  Is this possible, e.g. the TypeObject might be a possible place for such a tag?

Ideally the tag value would be an arbitrary string, though we could restrict the number of distinct tags allowed in order to restrict the tag value to a small integer (or even just a single bit, if necessary).

We'd probably want to also allow tag clearing, which might complicate things.
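A sketch of the small-integer option, assuming tag strings are interned into indices capped at `maxTags` and a WeakMap stands in for the address-keyed hashtable (a real engine table would have to cope with objects moving under GGC). `TagRegistry` and its methods are hypothetical; `clear` covers tag clearing.

```javascript
// Hypothetical registry: interns tag strings as small integers and maps
// objects to those integers.
class TagRegistry {
  constructor(maxTags = 16) {
    this.maxTags = maxTags;
    this.names = [];           // index -> tag string
    this.indexOf = new Map();  // tag string -> index
    this.tags = new WeakMap(); // object -> index (stand-in for address table)
  }

  // Return the small-integer id for a tag, allocating one if needed.
  intern(name) {
    let index = this.indexOf.get(name);
    if (index === undefined) {
      if (this.names.length >= this.maxTags) {
        throw new RangeError("too many distinct tags");
      }
      index = this.names.length;
      this.names.push(name);
      this.indexOf.set(name, index);
    }
    return index;
  }

  tag(obj, name) { this.tags.set(obj, this.intern(name)); }

  // Tag clearing: forget the object's tag.
  clear(obj) { this.tags.delete(obj); }

  tagOf(obj) {
    const index = this.tags.get(obj);
    return index === undefined ? null : this.names[index];
  }
}
```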
Flags: needinfo?(bhackett1024)
(In reply to :Felipe Gomes from comment #7)
> Though whenever we move to reuseGlobal = true on desktop we'll lose that
> ability, right?

That's my understanding.

(In reply to Nicholas Nethercote [:njn] from comment #8)
> If compartment- (i.e. global-) level tagging is all that's required, then we
> don't have to do anything;  we already have that.  I assume that there will
> be interesting use cases for which that's not sufficient, where a
> finer-grained tagging is required.

It would be amazing to have the finer-grained tagging.  I'm just wondering if this would be more of a (chrome) devtools thing than an about:memory thing, though.  (Though an equally easy-to-use mechanism as about:memory would be awesome.)  JS Compartment memory reporting is cheap, but graph analysis seems less cheap, and an in-compartment ownership analysis that doesn't cross into XPCOM and other compartments is going to be unrealistic.  For example memory-backed Blobs (nsDOMMemoryFile) are XPCOM even in our increasingly magic WebIDL world.  (The WebIDL bindings just hold onto the XPCOM objects on workers, right now.)

Specifically, it sounds like the end-case would be having nsCycleCollectorLogger (and other nsICycleCollectorListener impls) be able to log object sizes. You get the traceAll log, you do the reachability analysis from the thing you're interested in, lopping off the bits of the graph that are boring/owned by other things/owned by a specific other thing, etc. https://github.com/janodvarko/ccdump could probably be used to prototype.
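A sketch of the reachability analysis on a toy object graph, assuming the log has already been parsed into an adjacency map plus per-node sizes (this format is hypothetical, not ccdump's or the CC log's). The retained size of a target is computed as everything reachable from the roots that becomes unreachable once the target is removed, which lops off objects that something else also holds.

```javascript
// Nodes reachable from roots, optionally pretending `removed` is gone.
function reachable(graph, roots, removed = null) {
  const seen = new Set();
  const stack = [...roots];
  while (stack.length > 0) {
    const node = stack.pop();
    if (node === removed || seen.has(node)) continue;
    seen.add(node);
    for (const next of graph.get(node) ?? []) stack.push(next);
  }
  return seen;
}

// Bytes kept alive ONLY through target (target itself included).
function retainedBytes(graph, sizes, roots, target) {
  const before = reachable(graph, roots);
  const after = reachable(graph, roots, target);
  let total = 0;
  for (const node of before) {
    if (!after.has(node)) total += sizes.get(node) ?? 0;
  }
  return total;
}
```

For example, if `root` points to `db` and `cache`, and both of those point to `shared`, removing `db` leaves `shared` reachable, so only `db` and the children it exclusively owns count toward its retained size.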

This could be automated via code shared with the devtools memory tools. The rules for the AddonManager could be packaged in-tree so that a checkbox-triggered "about:memory" mode or "about:memory-subsystems" could display it on demand, or something very opt-in like the new Test Pilot thing could let people who are okay with Firefox locking up for a while report their stats to the telemetry server, etc. It would be way too slow for the general telemetry opt-in or general about:memory (which can already be very slow), though.
(In reply to Nicholas Nethercote [:njn] from comment #9)
> bhackett: this bug is about allowing users tag particular JS objects in some
> way, so that about:memory can present their measurements separately from
> vanilla JS objects.
> 
> I can think of two ways to implement the tagging.  
> 
> - Have a hashtable that maps object addresses to annotations.  (GGC will
> complicate that.)
> 
> - Somehow tag individual objects.  Is this possible, e.g. the TypeObject
> might be a possible place for such a tag?
> 
> Ideally the tag value would be an arbitrary string, though we could restrict
> the number of distinct tags allowed in order to restrict the tag value to a
> small integer (or even just a single bit, if necessary).
> 
> We'd probably want to also allow tag clearing, which might complicate things.

Bug 850026 added the ability to attach metadata to JS objects in a lightweight manner, though the more distinct metadata values there are, the more the shape hierarchy will fragment (objects with different metadata values have different shapes). The purpose of that bug seems fairly similar to this one; also talk to jimb about how he's planning to use the metadata hook (I don't think it's actually in use in any context yet).

Right now, it should be fine for the metadata hook to be in place only if certain options are set, or possibly always on in all (or certain?) chrome compartments.  If the metadata hook is set for content compartments it will inhibit some important Ion optimizations (inlining object creation).
Flags: needinfo?(bhackett1024)
> Bug 850026 added the ability to attach metadata to JS objects in a
> lightweight manner

Perfect!  Thanks.
Yes, JSCompartment::objectMetadataCallback seems perfect for this. It seems to me that now is probably the right time to start deciding what the properties of the metadata objects themselves mean.

Perhaps, if an object has a metadata object whose 'aboutMemoryCategory' property is a string, then that could be something about:memory would consult.

At the C++ level, we could have an RAII type, AutoAboutMemoryCategory, that establishes the about:memory category for all objects allocated while it is the youngest live instance of the type:

{
   ...
   AutoAboutMemoryCategory aamc("storage"); // or perhaps a JSAtom
   ...
}

Then, at the JS level we could have a function that establishes that metadata value while calling its first argument, so that:

withAboutMemoryCategory("storage", function () { return JSON.parse(data); })

would return a tree of objects all sharing the same metadata object, whose 'aboutMemoryCategory' property was the string "storage".
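Plain JS cannot hook object allocation, so this is only an emulation under stated assumptions: `withAboutMemoryCategory` keeps a stack whose top entry plays the role of the youngest live `AutoAboutMemoryCategory`, and allocation sites must opt in. Here a hypothetical `parseJSONTracked` does so through a `JSON.parse` reviver, attaching one shared metadata object (kept in a WeakMap standing in for engine metadata) to every parsed object or array.

```javascript
// Stack of active metadata objects; the top is the "youngest live" category.
const categoryStack = [];
// Stand-in for engine-side object metadata: object -> metadata object.
const objectMetadata = new WeakMap();

function withAboutMemoryCategory(category, fn) {
  // One metadata object per call, shared by everything tagged inside it.
  categoryStack.push({ aboutMemoryCategory: category });
  try {
    return fn();
  } finally {
    categoryStack.pop(); // restored even if fn throws, like an RAII destructor
  }
}

// Opt-in allocation site: the reviver visits every parsed value.
function parseJSONTracked(text) {
  const metadata = categoryStack[categoryStack.length - 1];
  if (metadata === undefined) return JSON.parse(text);
  return JSON.parse(text, (key, value) => {
    if (value !== null && typeof value === "object") objectMetadata.set(value, metadata);
    return value;
  });
}
```

`withAboutMemoryCategory("storage", () => parseJSONTracked(data))` then returns a tree whose objects all map to the same `{ aboutMemoryCategory: "storage" }` object.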

The object metadata annotation works best if you re-use metadata objects whenever possible. So the hook with which AutoAboutMemoryCategory interacts might want to actually have a hash table from atoms to metadata objects, that drops entries with unmarked values after a GC. (MORE GC MAGIC)
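A sketch of the per-category metadata cache in JS terms, assuming `WeakRef` and `FinalizationRegistry` as stand-ins for "drops entries with unmarked values after a GC". `metadataFor` is hypothetical; it returns the same metadata object for a category for as long as that object is alive.

```javascript
// category -> WeakRef to its (shared) metadata object
const metadataCache = new Map();

// After the GC collects a metadata object, drop its stale cache entry.
const cleanup = new FinalizationRegistry((category) => {
  const ref = metadataCache.get(category);
  if (ref !== undefined && ref.deref() === undefined) metadataCache.delete(category);
});

function metadataFor(category) {
  const existing = metadataCache.get(category)?.deref();
  if (existing !== undefined) return existing; // re-use while still alive
  const metadata = Object.freeze({ aboutMemoryCategory: category });
  metadataCache.set(category, new WeakRef(metadata));
  cleanup.register(metadata, category);
  return metadata;
}
```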
AFAICT, the metadata object must be attached when the object is created, which doesn't fit with our requirements here.

Something simpler would be just to add a property with a special name, and if that's present, categorize the object specially based on that.  Clunky but simple...
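A sketch of the special-property approach, assuming a property name such as `aboutMemorySpecialProperty` (illustrative only). Defining it as non-enumerable keeps it out of `JSON.stringify` and `for...in`, and reading it through the own-property descriptor means an accessor can never run code during categorization.

```javascript
// Illustrative name for the special property.
const SPECIAL_PROPERTY = "aboutMemorySpecialProperty";

// Mark an object with a category, hidden from enumeration and JSON.
function markForAboutMemory(obj, category) {
  Object.defineProperty(obj, SPECIAL_PROPERTY, {
    value: category,
    enumerable: false,
    writable: true,
    configurable: true,
  });
  return obj;
}

// Reporter side: accept only a plain own data property holding a string;
// accessors are ignored, so reading the tag never runs user code.
function categoryOf(obj) {
  const desc = Object.getOwnPropertyDescriptor(obj, SPECIAL_PROPERTY);
  return desc !== undefined && typeof desc.value === "string" ? desc.value : null;
}
```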
When I measure the size of a JSObject in StatsCellCallback() within vm/MemoryMetrics.cpp, I want to check if that object has a property with a special name (something like "aboutMemorySpecialProperty").  I thought of calling JS_GetPropertyById, but that requires a JSContext, and I only have a JSRuntime there...
Shu suggested that js::GetPropertyPure() might fit the bill. I'll try that.
See Also: → 949218
Severity: normal → S3