Improve our malloc memory accounting framework

RESOLVED FIXED in Firefox 58

Status

enhancement · P2 · normal
RESOLVED FIXED
People

(Reporter: jonco, Assigned: jonco)

Tracking

55 Branch → mozilla58
Firefox Tracking Flags: firefox58 fixed

Attachments

(3 attachments)

Ideally we would track malloc memory precisely (bug 1395509) but that's going to be a lot of work.  There are some smaller changes we can make to improve the way we track this that will enable us to fix bug 1384049.
Currently, when we allocate malloc memory we sometimes add it to the zone counter only (if we use ZoneAllocPolicy) and sometimes to both the zone and runtime counters (if we call the updateMallocCounter methods). 

Instead, we should always add memory associated with the zone to the runtime counter.

The patch reverses the call direction: instead of the update method on the GC runtime calling the zone update method, the zone update method now calls the update method on the runtime, so that all zone memory gets added to the runtime counter.

I also replaced a few instances where we fetched the zone from the context before calling updateMallocCounter() with a direct call on the context, which does the same thing but is shorter.
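The new call direction can be sketched as follows. This is an illustrative mock, not SpiderMonkey's actual classes; the names RuntimeCounter, ZoneCounter, and update are assumptions standing in for the real counter types:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch of the delegation direction after the patch: the
// zone's counter forwards to the runtime's, so every byte attributed to
// a zone is also counted against the runtime-wide total.
struct RuntimeCounter {
    size_t bytes = 0;
    void update(size_t nbytes) { bytes += nbytes; }  // runtime-wide total
};

struct ZoneCounter {
    RuntimeCounter* runtime;
    size_t bytes = 0;
    void update(size_t nbytes) {
        bytes += nbytes;          // per-zone accounting
        runtime->update(nbytes);  // always reflected at the runtime too
    }
};
```

With this shape, code that only knows about a zone (e.g. ZoneAllocPolicy) still keeps the runtime counter accurate.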
Assignee: nobody → jcoppeard
Attachment #8914710 - Flags: review?(sphink)
Blocks: 1341093
Make malloc counters count up instead of down.

I had to remove the ActiveThreadData protection from maxBytes_ since this is read off the main thread.  It's only written to with the GC lock held.

I guess there is a race condition when changing the parameter if any off-thread compilation is running, but I don't think this is going to happen in normal operation or have any bad consequences if it does.  Let me know if you think I should do something about this though.
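A minimal sketch of the count-up scheme (names are hypothetical, not the actual SpiderMonkey implementation): the counter accumulates allocated bytes and is compared against a max, rather than decrementing a remaining budget toward zero. The atomic stands in for the relaxed cross-thread reads mentioned above:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Illustrative count-up malloc counter. maxBytes is read off the main
// thread in the real code (hence the dropped ActiveThreadData protection);
// here a plain member suffices for the sketch, with the byte count atomic.
class MallocCounter {
    std::atomic<size_t> bytes{0};
    size_t maxBytes;  // only written with the GC lock held in the real code
  public:
    explicit MallocCounter(size_t max) : maxBytes(max) {}
    void update(size_t nbytes) { bytes += nbytes; }
    bool isTooMuchMalloc() const { return bytes >= maxBytes; }
    void reset() { bytes = 0; }
};
```

Counting up avoids the awkwardness of an unsigned budget underflowing past zero.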
Attachment #8914718 - Flags: review?(sphink)
This patch adds a dynamic malloc bytes threshold.

The idea here is that, just like the GC bytes threshold, we don't want to trigger a GC for a zone every time it allocates a fixed amount of memory, but instead have a threshold that grows with its memory use.

The threshold starts at the current max bytes setting and doubles every time it is exceeded (and a GC is triggered).  If we GC without reaching it, we reduce it by 10% down to a minimum of the original max bytes setting.  These numbers are off the top of my head obviously.  A fair comment might be that we should make them configurable.

I moved the point where the counters are reset from the end of GC to the beginning.  This is because by the end of an incremental GC more memory will have been allocated.  So these now track malloc memory allocated since the last GC was triggered.

Zone counters are reset only when that zone was collected.  The runtime counter is reset to zero in a full GC, but for a zone GC we subtract the total byte count for the zones that were collected, so it continues to reflect allocated memory in zones that haven't been collected.
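The threshold adjustment described above can be sketched like this (the struct and member names are hypothetical; the real logic lives in the patch's GCRuntime.h changes, quoted in the review below):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Sketch of the dynamic threshold: on a GC triggered by exceeding the
// threshold it doubles; on a GC that starts below the threshold it decays
// by 10%, floored at the configured initial max bytes.
struct DynamicThreshold {
    size_t initialMaxBytes;
    size_t maxBytes;
    size_t bytes = 0;

    explicit DynamicThreshold(size_t initial)
        : initialMaxBytes(initial), maxBytes(initial) {}

    bool isTooMuchMalloc() const { return bytes >= maxBytes; }

    // Called when a GC starts: adjust the threshold, then reset the count,
    // so the counter tracks bytes allocated since the GC was triggered.
    void updateOnGCStart() {
        if (isTooMuchMalloc())
            maxBytes *= 2;
        else
            maxBytes = std::max(initialMaxBytes, size_t(maxBytes * 0.9));
        bytes = 0;
    }
};
```

This mirrors how the GC bytes threshold grows with heap use, applied to malloc memory instead.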
Attachment #8914722 - Flags: review?(sphink)
Comment on attachment 8914710 [details] [diff] [review]
bug1405274-update-both-zone-and-runtime

Review of attachment 8914710 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/vm/Runtime.cpp
@@ +803,4 @@
>  }
>  
>  void
>  JSRuntime::updateMallocCounter(JS::Zone* zone, size_t nbytes)

Does this function still need to exist? I see a single nontrivial caller, in JSContext::updateMallocCounter, and it seems like if it's really possible for its zone() to be nullptr, then it should do the check there.
Attachment #8914710 - Flags: review?(sphink) → review+
Comment on attachment 8914718 [details] [diff] [review]
bug1405274-malloc-count-up

Review of attachment 8914718 [details] [diff] [review]:
-----------------------------------------------------------------

Phew, thanks. I'm not sure why I didn't do this already.

As for threading, I hate to introduce more potential tsan hazards. Can you file a followup to cancel offthread work when changing this (or any) parameter? Low priority, but let's get it on file.

(Though I wonder if these settings should be asynchronous, where they get queued up and applied at the appropriate time. Obviously unnecessary for this trivial case, but something to consider for the eventual scheduler, where we might want to make smart choices between GCing vs letting some jitcode run a bit longer if we would discard it.)
Attachment #8914718 - Flags: review?(sphink) → review+
Comment on attachment 8914722 [details] [diff] [review]
bug1405274-malloc-thresholds

Review of attachment 8914722 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/gc/GCRuntime.h
@@ +680,5 @@
> +        if (isTooMuchMalloc())
> +            maxBytes_ *= 2;
> +        else
> +            maxBytes_ = std::max(initialMaxBytes_.ref(), size_t(maxBytes_ * 0.9));
> +        reset();

The state dependency here makes me a little nervous -- isTooMuchMalloc() must be meaningful at the time you call this. (So don't call it twice, or you'll end up only 90% of where you want to be.) But it's fine.

::: js/src/jsgc.cpp
@@ +4339,5 @@
> +GCRuntime::updateMallocCountersOnGC()
> +{
> +    AutoLockGC lock(rt);
> +
> +    size_t totalZoneMallocBytes = 0;

nit: total*Zone*MallocBytes makes me think it's all the bytes for one zone. I would prefer totalRuntimeMallocBytes or just totalMallocBytes. Except that now I realize that this is just the total for the zones that are collecting. Maybe bytesInCollectedZones?

@@ +4353,5 @@
> +    // zones we did collect.
> +    if (isFull)
> +        mallocCounter.updateOnGC(lock);
> +    else
> +        mallocCounter.decrement(totalZoneMallocBytes);

So let's see... say you run for a while and have 3 zones of equal size. One is idle and you never collect it. One keeps growing to its max size and then throwing out all malloc bytes on every GC. The third is long-lived stuff where the GCs are doing nothing to the malloc bytes. Assume the second and third zones here are active (where the third one is doing lots of stuff with GC bytes but not much with malloc bytes.)

The first is treated as fixed-size overhead that pushes the runtime closer to its limit. That seems good. The second ramps up to its limit, possibly pushing the runtime over as it climbs, but after every GC it and the runtime drop back down. That seems ok.

The third is a nuisance. The first time you GC, its contribution is removed from the runtime and all of our limits behave as if it doesn't exist. Same for the zone's own limits; you'll need to malloc a ton more memory before you hit the malloc trigger.

I guess this raises the question -- should we decrement malloc counters for non-malloc GCs? But I guess if we don't, then we'll just mix in some number of unnecessary malloc-triggered GCs.

Bleh. I guess this is all the fallout from not knowing how much memory we've freed. I suppose we could count the number of frees and estimate the amount of memory freed by making the bogus assumption that the mean freed memory size is the same as the mean allocated size (so bytes_freed_since_last_GC = num_frees / num_allocations * bytes_allocated_since_last_GC). But that seems pretty sketchy.
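The estimate floated above, written out (purely illustrative; the function name and signature are made up, and as noted the underlying assumption is dubious):

```cpp
#include <cassert>
#include <cstddef>

// Estimate bytes freed since the last GC by assuming the mean size of a
// freed allocation equals the mean size of an allocation made since then:
//   bytes_freed = num_frees / num_allocations * bytes_allocated
double estimateFreedBytes(size_t numFrees, size_t numAllocations,
                          size_t bytesAllocated) {
    if (numAllocations == 0)
        return 0.0;  // nothing allocated, nothing to scale against
    return double(numFrees) / double(numAllocations) * double(bytesAllocated);
}
```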

It seems like what you have here is the best thing to try.
Attachment #8914722 - Flags: review?(sphink) → review+
Priority: -- → P2
(In reply to Jon Coppeard (:jonco) from comment #3)
> Created attachment 8914722 [details] [diff] [review]
> bug1405274-malloc-thresholds
> 
> The threshold starts a the current max bytes setting and doubles every time
> it is exceeded (and a GC is triggered).  If we GC without reaching it, we
> reduce it by 10% down to a minimum of the original max bytes setting.  These
> numbers are off the top of my head obviously.  A fair comment might be that
> we should make them configurable.

I'm not sure about this.  Even if the heuristics are just guesses for now, I'd prefer to size the updated limit based on the used malloc size excluding garbage, i.e. the used size after a collection.

> I changed the way the counters are reset from the end of GC to the
> beginning.  This is because by the end of an incremental GC more memory will
> have been allocated.  So these now track malloc memory allocated since the
> last GC was triggered.

But as you point out incremental GC can complicate this.  So I'd like to do something fancier, and it should be straightforward but not trivial.  Perhaps a follow-up change?
(In reply to Steve Fink [:sfink] [:s:] from comment #4)
> Does this function still need to exist?

Nope, nice catch.

> Can you file a followup to cancel offthread work when changing this (or any)
> parameter? 

Sure, filed bug 1405386.

> Bleh. I guess this is all the fallout from not knowing how much memory we've
> freed. 

Yes, the real fix is to do precise tracking of malloc memory.  But that's a ton of work.  I think this is a clear improvement over what we have now.

Other comments addressed.
(In reply to Paul Bone [:pbone] from comment #7)
> I'd prefer to do the sizing for the updated limit based on the used malloc size
> excluding garbage, so the used size after a collection.

Me too, but we don't have this information.  That's what bug 1395509 is about.
Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/9a31f02ed47c
Always update runtime malloc counter too when malloc memory is associated with a zone r=sfink
https://hg.mozilla.org/integration/mozilla-inbound/rev/39f5d3ebf3ac
Make malloc counters count up instead of down r=sfink
https://hg.mozilla.org/integration/mozilla-inbound/rev/a7d845dd8d7a
Add a dynamic malloc bytes threshold r=sfink
Since these patches landed, we've seen regressions in proportion of non-incremental GCs (from 4% to 14%) and the mean max pause time (from 40ms to 64ms).  I'm going to re-open this bug and back out the third patch, which should hopefully fix this.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 1405980
Pushed by archaeopteryx@coole-files.de:
https://hg.mozilla.org/integration/mozilla-inbound/rev/55161cebaca0
Add a dynamic malloc bytes threshold: Reland to fix timeout in wpt /wasm/many-memories.window.html. r=sfink
The wpt test
> TIMEOUT | /wasm/many-memories.window.html
started failing with the backout (see https://treeherder.mozilla.org/logviewer.html#?job_id=135985845&repo=mozilla-inbound), so the patch got relanded.
Flags: needinfo?(jcoppeard)
https://hg.mozilla.org/mozilla-central/rev/55161cebaca0
Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Resolution: --- → FIXED
Depends on: 1408375
Flags: needinfo?(jcoppeard)