Closed Bug 1123434 Opened 9 years ago Closed 2 years ago

Per-compartment memory usage

Categories

(Core :: JavaScript: GC, defect)

x86
Windows 7
defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: ehoogeveen, Unassigned)

References

Details

Attachments

(1 file)

For the purposes of triggering GC, we currently track memory usage on a per Arena basis. An Arena is only 4kiB in size, so this is a reasonable compromise between granularity and overhead. However, Arenas serve Zones, which can span many JSCompartments. As a result, we can't really track per-compartment memory usage (at least, not without walking the JSObjects in each Arena).

In this bug I'd like to change that, assuming it can be done without too much overhead. There are a few complicating factors that I'd like to list here first:

1) Nursery allocations. It's not hard to bump a counter whenever we allocate in the nursery, but tracking when garbage objects are freed would require walking the nursery to find all unmarked objects (I think) - which is exactly the sort of thing we avoid in order to keep minor GCs fast, if I'm not mistaken. We could instead just ignore nursery allocations, and only count objects when they are tenured, though this could potentially 'hide' a lot of per-compartment memory usage.

2) JITed allocations. As each JitContext has a JSCompartment pointer, these are pretty simple to handle - just load the current count, add the thingSize, and store it. Still, this might add a measurable amount of overhead. Nursery allocations also store the slots inline (up to a limit), but it seems easier to count that separately if we count it at all.

3) Strings are shared across a zone, I believe. Do we want to attribute a string to the first compartment in a zone to allocate it, or track them in a separate per-zone counter? If the former, we should be careful not to double count (I'm not sure where this is determined).

4) I'm not sure how Atoms are handled. There's an Atom compartment, but is this per-zone, or per-runtime? It seems like just naively attributing all atom allocations to the atom compartment should work, but maybe there's more to it.

5) Other than moving objects to tenured (if we ignore nursery allocations), I don't think relocation should affect the counters (after all, the net amount doesn't change).

6) Sweeping seems like the obvious place to decrease the count, but because of background finalization the count probably needs to be atomic in some way. That could also complicate things for the JITs, unless we store a separate atomic 'sweptBytes' per compartment, and only update the main counter on the main thread.

There might be other subtleties that I'm missing - for instance, doesn't off main thread compilation initially use a different compartment? If so the generated JITcode would need to be updated with the right pointer, or use more indirection.
Oh, and

7) Arrays. We do special things for these that I'm not really familiar with yet.
What is the reason for this change?  Even if we triggered GC based on compartment granularity information we still can only GC at a zone- or runtime-level granularity.
That makes sense. This would be useful for compartment memory usage profiling, which could be used by devtools. Yoric was asking about this on IRC a few days ago, so I should probably CC him.
I'm working on a proof of concept so we can see if this is doable (in terms of overhead).

First major snag: Most GC thing types passed to Arena::finalize do not have a compartment pointer. The ones that do are JSObject, JSScript, Shape, AccessorShape and BaseShape.

I assume this means that the others are shared, though I would have expected jit::JitCode and maybe types::TypeObject to be linked to a specific compartment. I've just hacked around the problem for now so I can keep playing around with this.
This implements per-compartment and per-zone counters. Consider this mostly untested - it compiles and doesn't crash (at least on a run of Octane), but it's not complete yet and I can't test performance at the moment; posting this to get some feedback on the approach.

The idea is that sweptGCThingBytes would be subtracted from allocatedGCThingBytes at the end of sweeping - but it needs to happen on the main thread to avoid making allocatedGCThingBytes atomic, and I'm not sure how to schedule this to happen.

Compacting does what I want - punt if the helper threads are still working, or wait for them if the GC is non-incremental - but normal GCs don't compact, and adding another phase seems like massive overkill. What would be the best way to do this?

Finally, assuming the basic approach is good, would it be a good idea to subtract and zero sweptGCThingBytes between slices? I'm wondering if that would be good information for a dev tool that tracks compartment memory usage, or if GCs should be considered a single unit.
Attachment #8551751 - Flags: feedback?(terrence)
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #0)
> 1) Nursery allocations. It's not hard to bump a counter whenever we allocate
> in the nursery, but tracking when garbage objects are freed would require
> walking the nursery to find all unmarked objects (I think) - which is
> exactly the sort of thing we avoid in order to keep minor GCs fast, if I'm
> not mistaken. We could instead just ignore nursery allocations, and only
> count objects when they are tenured, though this could potentially 'hide' a
> lot of per-compartment memory usage.

Alternatively, we could keep a separate counter and simply zero it on every minor GC. But I don't know how much value this would add for profiling, since it would presumably be changing rapidly.
Additional thoughts: we could differentiate by AllocKind without affecting overhead. Of course, this would add something like 42 additional 4/8 byte fields to each JSCompartment and Zone. 168/336 kB per 1000 compartments doesn't seem too bad to me, but I don't really know how many we tend to have. In any case, I should measure the overhead of my WIP patch first.
Sorry for the slow answer, I have been sick.

So, I would definitely use per-compartment memory information. I would use it to:
- upload telemetry information of memory-expensive add-ons;
- provide data for self-support;
- provide data for a `top`-style utility tracking resource usage by add-ons/pages.


I am personally interested in the following information, ranked by most important to least important for my use case:
- peak memory usage by the compartment;
- current memory usage by the compartment;
- number of times gc/cc was called on this compartment;
- total duration of gc/cc for the compartment.

As we now have a interface nsICompartment to expose such information, I believe that we can put it there.

Is there any way I can help?
Flags: needinfo?(emanuel.hoogeveen)
(In reply to David Rajchenbach-Teller [:Yoric] (away until Febrary - use "needinfo") from comment #8)
> I am personally interested in the following information, ranked by most
> important to least important for my use case:
> - peak memory usage by the compartment;

This is just a matter of taking the amount of memory per compartment at the start of a GC and adjusting the maximum if needed. It would need another counter to keep track, of course.

> - current memory usage by the compartment;

This is what the WIP patch implements. Several types of allocations don't have access to the compartment - these are accounted on the zone. I don't know if all of these are necessarily *shared*, or if they just don't need to know about their compartment directly, but as a first pass we can just attribute them to compartments' containing zone (as I've done).

I imagine you could present a tree-like structure with the runtime at the top, branching off into zones, branching off into compartments. A lot of memory in a zone might indicate that one of the compartments is using a lot of memory, and we might not be sharing it with anything else in the zone, so I think it's still interesting information.

> - number of times gc/cc was called on this compartment;

I guess compartments might get added to a zone at any point, so we could track this separately. We do track the number of GCs per zone. I'm not sure about CC.

> - total duration of gc/cc for the compartment.

I don't think we can really answer this for GC, since they work per zone. Per zone information should be doable though (I'm not sure if GC stats are always on, but they should cover this). I'm not sure about CC.

> As we now have a interface nsICompartment to expose such information, I
> believe that we can put it there.

Thanks for the suggestion. I haven't really looked at this since I uploaded the WIP patch - still need to do performance tests to see if simply keeping track is expensive at all. In the WIP patch I simply added some counters to each JS::Zone and JSCompartment; I don't know anything about nsICompartment yet, but it shouldn't be hard to walk all the compartments or store the information there.

Performance measurements aside, I'm still not sure *when* to update the counter after GC has finished. The problem is that I'm pretty sure GC can 'finish' before background finalization is complete, unless compacting waits for it - but we want to update the counters on the main thread to avoid making them atomic (which would be hard for the JITs to deal with).

The only alternative I can think of is to make the counters 64-bit and never decrement them, then simply combine their values (allocated memory - deallocated memory) when doing measurements. But that might make incrementing the counters from the JITs more expensive as well, and would complicate tracking the maximum memory usage slightly.
Flags: needinfo?(emanuel.hoogeveen)
The cycle collector does not work on a per-compartment basis, so there's nothing really to measure about pauses spent for a compartment.

The GC only does zonal GCs occasionally now, though I'm hoping to change that in the future once some GC problems are fixed.
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #9)
> I imagine you could present a tree-like structure with the runtime at the
> top, branching off into zones, branching off into compartments. A lot of
> memory in a zone might indicate that one of the compartments is using a lot
> of memory, and we might not be sharing it with anything else in the zone, so
> I think it's still interesting information.

I personally don't need this for the moment, but it sounds useful for memshrink efforts.

> > - number of times gc/cc was called on this compartment;
> 
> I guess compartments might get added to a zone at any point, so we could
> track this separately. We do track the number of GCs per zone. I'm not sure
> about CC.

I realize that I haven't mentioned that all this gc/cc stuff is in the "nice to have" category for my scenario. I will already be quite happy with just the current/max memory usage.

> > As we now have a interface nsICompartment to expose such information, I
> > believe that we can put it there.
> 
> Thanks for the suggestion. I haven't really looked at this since I uploaded
> the WIP patch - still need to do performance tests to see if simply keeping
> track is expensive at all. In the WIP patch I simply added some counters to
> each JS::Zone and JSCompartment; I don't know anything about nsICompartment
> yet, but it shouldn't be hard to walk all the compartments or store the
> information there.

I realize that my sentence was misleading. By all means, place the data in JS::Zone/JSCompartment. We should just update (as a followup bug) the implementation of nsICompartment so that it fetches the data from the JSCompartment.
I worked a bit more on this today and finally did some performance testing. The additions to the JIT, and merging sweptGCThingBytes into allocatedGCThingBytes (for each compartment and zone) at key points is pretty much performance neutral on Octane. However, actually updating sweptGCThingBytes during sweeping is costly (from ~33700 points to ~32400 points).

Most of this appears to be the result of making sweptGCThingBytes atomic - simply making all threads decrement the non-atomic allocatedGCThingBytes directly eliminates most of the difference (from ~33700 points to ~33500 points). I'm not quite sure yet how to get around this problem or how to eliminate the rest of the difference, though I feel like it should be doable.
Counters like this are very error-prone. In early about:memory days we used them more and I moved away as much as possible -- in favour of traversal-based measurements -- because of this.

I'll also point to the nsIMemoryReporter.sizeOfTab, which does something quite like what this bug is asking for, but for an entire tab. It's traversal-based, and reuses the memory reporter machinery. It was intended to be used in a devtools memory widget and Panos had a prototype but it never ended up landing.
Comment on attachment 8551751 [details] [diff] [review]
WIP: Track GCThing memory usage per compartment.

Review of attachment 8551751 [details] [diff] [review]:
-----------------------------------------------------------------

I agree with Nicholas that this is likely to be hard to get totally correct (and keep correct in the long term) because the compartment split is not really a fundamental unit of the system. As you noted, some allocations don't even conceptually have a compartment. On the other hand, the zone/runtime counters are required for system correctness, so they can't get out-of-whack without things falling over. I see the performance issues are a side-effect of this -- we are effectively creating a shadow structure here, duplicating the necessary work. 

This difficulty makes me think that this is the wrong approach. Specifically, I'd like to know why all of our addons have to be in the same zone; it seems that if we make this one change, Yoric could do everything he wants to do without needing any compartment numbers at all.
Attachment #8551751 - Flags: feedback?(terrence)
Bill, is there some specific reason why each addon has to go in the System zone and not into a separate System-like addon zone?
Flags: needinfo?(wmccloskey)
I think it makes sense to have a separate zone for each add-on. It'll be a bit of work to implement, but not too much.

I agree with Brian, Nick, and Terrence that it's not a good idea to try to try to keep per-compartment counters.
Flags: needinfo?(wmccloskey)
Zone per add-on would be perfect for me.
Is there anything I could do to help?
FWIW, I talked with Terrence about this prior to his comment and we agreed that the approach was too unwieldy. I still think having up-to-date per-zone counters could be useful, however: we wouldn't have to walk all the arenas in a zone just to get the heap usage, and could use it in the GC to (e.g.) decide whether to compact. We could also assert that the count is correct at the start of GC in debug builds.

Unfortunately, while ripping out the per-compartment stuff would certainly make the patch a lot simpler, it doesn't quite solve the problem of when to update the counters. One solution could be to count the live things on a zone during *marking*, then replace the total count with the live count at the end of the marking phase. We would have to make sure that this doesn't unduly slow down marking, though, and when I looked into this a bit I was scared off by the amount of marking paths. Perhaps Terrence's recent refactoring will make this more straightforward to attempt.
Updating stuff during marking is quite fine with me. I do not have realtime constraints for when the data is available, as long as it is eventually available. My main objective is to detect memory leaks, so detecting them a few seconds (or even a few minutes) late is quite acceptable.

The bug assignee is inactive on Bugzilla, so the assignee is being reset.

Assignee: emanuel.hoogeveen → nobody

We're not planning on implementing this.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: