Improve our malloc memory accounting framework

RESOLVED FIXED in Firefox 58

Status

enhancement · P2 · normal
RESOLVED FIXED
People

(Reporter: jonco, Assigned: jonco)

Tracking

55 Branch → mozilla58
Firefox Tracking Flags: firefox58 fixed

Attachments

(3 attachments)

Ideally we would track malloc memory precisely (bug 1395509) but that's going to be a lot of work.  There are some smaller changes we can make to improve the way we track this that will enable us to fix bug 1384049.
Currently, when we allocate malloc memory we sometimes add it to the zone counter only (if we use ZoneAllocPolicy) and sometimes to both the zone and runtime counters (if we call the updateMallocCounter methods). 

Instead, we should always add memory associated with the zone to the runtime counter.

The patch reverses the call direction: instead of the update method on the GC runtime calling the zone update method, the zone update method now calls the update method on the runtime, so that all zone memory gets added to the runtime counter.

I also replaced a few instances where we fetched the zone from the context before calling updateMallocCounter() with a direct call on the context, which does the same thing but is shorter.
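The new call direction can be sketched as follows. This is an illustrative mock, not SpiderMonkey's actual classes; the names RuntimeCounter, ZoneCounter, and update are assumptions standing in for the real counter types:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch of the delegation direction after the patch: the
// zone's counter forwards to the runtime's, so every byte attributed to
// a zone is also counted against the runtime-wide total.
struct RuntimeCounter {
    size_t bytes = 0;
    void update(size_t nbytes) { bytes += nbytes; }  // runtime-wide total
};

struct ZoneCounter {
    RuntimeCounter* runtime;
    size_t bytes = 0;
    void update(size_t nbytes) {
        bytes += nbytes;          // per-zone accounting
        runtime->update(nbytes);  // always reflected at the runtime too
    }
};
```

With this shape, code that only knows about a zone (e.g. ZoneAllocPolicy) still keeps the runtime counter accurate.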
Assignee: nobody → jcoppeard
Attachment #8914710 - Flags: review?(sphink)
Blocks: 1341093
Make malloc counters count up instead of down.

I had to remove the ActiveThreadData protection from maxBytes_ since this is read off the main thread.  It's only written to with the GC lock held.

I guess there is a race condition when changing the parameter if any off-thread compilation is running, but I don't think this is going to happen in normal operation or have any bad consequences if it does.  Let me know if you think I should do something about this though.
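A minimal sketch of the count-up scheme (names are hypothetical, not the actual SpiderMonkey implementation): the counter accumulates allocated bytes and is compared against a max, rather than decrementing a remaining budget toward zero. The atomic stands in for the relaxed cross-thread reads mentioned above:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Illustrative count-up malloc counter. maxBytes is read off the main
// thread in the real code (hence the dropped ActiveThreadData protection);
// here a plain member suffices for the sketch, with the byte count atomic.
class MallocCounter {
    std::atomic<size_t> bytes{0};
    size_t maxBytes;  // only written with the GC lock held in the real code
  public:
    explicit MallocCounter(size_t max) : maxBytes(max) {}
    void update(size_t nbytes) { bytes += nbytes; }
    bool isTooMuchMalloc() const { return bytes >= maxBytes; }
    void reset() { bytes = 0; }
};
```

Counting up avoids the awkwardness of an unsigned budget underflowing past zero.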
Attachment #8914718 - Flags: review?(sphink)
This patch adds a dynamic malloc bytes threshold.

The idea here is that, just like the GC bytes threshold, we don't want to trigger a GC for a zone every time it allocates a fixed amount of memory, but instead have a threshold that grows with its memory use.

The threshold starts at the current max bytes setting and doubles every time it is exceeded (and a GC is triggered).  If we GC without reaching it, we reduce it by 10% down to a minimum of the original max bytes setting.  These numbers are off the top of my head obviously.  A fair comment might be that we should make them configurable.

I moved the point where the counters are reset from the end of GC to the beginning.  This is because by the end of an incremental GC more memory will have been allocated.  So these now track malloc memory allocated since the last GC was triggered.

Zone counters are reset only when that zone was collected.  The runtime counter is reset to zero in a full GC, but for a zone GC we subtract the total byte count for the zones that were collected, so it continues to reflect allocated memory in zones that haven't been collected.
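The threshold adjustment described above can be sketched like this (the struct and member names are hypothetical; the real logic lives in the patch's GCRuntime.h changes, quoted in the review below):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Sketch of the dynamic threshold: on a GC triggered by exceeding the
// threshold it doubles; on a GC that starts below the threshold it decays
// by 10%, floored at the configured initial max bytes.
struct DynamicThreshold {
    size_t initialMaxBytes;
    size_t maxBytes;
    size_t bytes = 0;

    explicit DynamicThreshold(size_t initial)
        : initialMaxBytes(initial), maxBytes(initial) {}

    bool isTooMuchMalloc() const { return bytes >= maxBytes; }

    // Called when a GC starts: adjust the threshold, then reset the count,
    // so the counter tracks bytes allocated since the GC was triggered.
    void updateOnGCStart() {
        if (isTooMuchMalloc())
            maxBytes *= 2;
        else
            maxBytes = std::max(initialMaxBytes, size_t(maxBytes * 0.9));
        bytes = 0;
    }
};
```

This mirrors how the GC bytes threshold grows with heap use, applied to malloc memory instead.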
Attachment #8914722 - Flags: review?(sphink)
Comment on attachment 8914710 [details] [diff] [review]
bug1405274-update-both-zone-and-runtime

Review of attachment 8914710 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/vm/Runtime.cpp
@@ +803,4 @@
>  }
>  
>  void
>  JSRuntime::updateMallocCounter(JS::Zone* zone, size_t nbytes)

Does this function still need to exist? I see a single nontrivial caller, in JSContext::updateMallocCounter, and it seems like if it's really possible for its zone() to be nullptr, then it should do the check there.
Attachment #8914710 - Flags: review?(sphink) → review+
Comment on attachment 8914718 [details] [diff] [review]
bug1405274-malloc-count-up

Review of attachment 8914718 [details] [diff] [review]:
-----------------------------------------------------------------

Phew, thanks. I'm not sure why I didn't do this already.

As for threading, I hate to introduce more potential tsan hazards. Can you file a followup to cancel offthread work when changing this (or any) parameter? Low priority, but let's get it on file.

(Though I wonder if these settings should be asynchronous, where they get queued up and applied at the appropriate time. Obviously unnecessary for this trivial case, but something to consider for the eventual scheduler, where we might want to make smart choices between GCing vs letting some jitcode run a bit longer if we would discard it.)
Attachment #8914718 - Flags: review?(sphink) → review+
Comment on attachment 8914722 [details] [diff] [review]
bug1405274-malloc-thresholds

Review of attachment 8914722 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/gc/GCRuntime.h
@@ +680,5 @@
> +        if (isTooMuchMalloc())
> +            maxBytes_ *= 2;
> +        else
> +            maxBytes_ = std::max(initialMaxBytes_.ref(), size_t(maxBytes_ * 0.9));
> +        reset();

The state dependency here makes me a little nervous -- isTooMuchMalloc() must be meaningful at the time you call this. (So don't call it twice, or you'll end up only 90% of where you want to be.) But it's fine.

::: js/src/jsgc.cpp
@@ +4339,5 @@
> +GCRuntime::updateMallocCountersOnGC()
> +{
> +    AutoLockGC lock(rt);
> +
> +    size_t totalZoneMallocBytes = 0;

nit: total*Zone*MallocBytes makes me think it's all the bytes for one zone. I would prefer totalRuntimeMallocBytes or just totalMallocBytes. Except that now I realize that this is just the total for the zones that are collecting. Maybe bytesInCollectedZones?

@@ +4353,5 @@
> +    // zones we did collect.
> +    if (isFull)
> +        mallocCounter.updateOnGC(lock);
> +    else
> +        mallocCounter.decrement(totalZoneMallocBytes);

So let's see... say you run for a while and have 3 zones of equal size. One is idle and you never collect it. One keeps growing to its max size and then throwing out all malloc bytes on every GC. The third is long-lived stuff where the GCs are doing nothing to the malloc bytes. Assume the second and third zones here are active (where the third one is doing lots of stuff with GC bytes but not much with malloc bytes.)

The first is treated as fixed-size overhead that pushes the runtime closer to its limit. That seems good. The second ramps up to its limit, possibly pushing the runtime over as it climbs, but after every GC it and the runtime drop back down. That seems ok.

The third is a nuisance. The first time you GC, its contribution is removed from the runtime and all of our limits behave as if it doesn't exist. Same for the zone's own limits; you'll need to malloc a ton more memory before you hit the malloc trigger.

I guess this raises the question -- should we decrement malloc counters for non-malloc GCs? But I guess if we don't, then we'll just mix in some number of unnecessary malloc-triggered GCs.

Bleh. I guess this is all the fallout from not knowing how much memory we've freed. I suppose we could count the number of frees and estimate the amount of memory freed by making the bogus assumption that the mean freed memory size is the same as the mean allocated size (so bytes_freed_since_last_GC = num_frees / num_allocations * bytes_allocated_since_last_GC). But that seems pretty sketchy.
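The estimate floated above, written out (purely illustrative; the function name and signature are made up, and as noted the underlying assumption is dubious):

```cpp
#include <cassert>
#include <cstddef>

// Estimate bytes freed since the last GC by assuming the mean size of a
// freed allocation equals the mean size of an allocation made since then:
//   bytes_freed = num_frees / num_allocations * bytes_allocated
double estimateFreedBytes(size_t numFrees, size_t numAllocations,
                          size_t bytesAllocated) {
    if (numAllocations == 0)
        return 0.0;  // nothing allocated, nothing to scale against
    return double(numFrees) / double(numAllocations) * double(bytesAllocated);
}
```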

It seems like what you have here is the best thing to try.
Attachment #8914722 - Flags: review?(sphink) → review+
Priority: -- → P2
(In reply to Jon Coppeard (:jonco) from comment #3)
> Created attachment 8914722 [details] [diff] [review]
> bug1405274-malloc-thresholds
> 
> The threshold starts a the current max bytes setting and doubles every time
> it is exceeded (and a GC is triggered).  If we GC without reaching it, we
> reduce it by 10% down to a minimum of the original max bytes setting.  These
> numbers are off the top of my head obviously.  A fair comment might be that
> we should make them configurable.

I'm not sure about this.  Even if the heuristics are just guesses for now, I'd prefer to size the updated limit based on the used malloc size excluding garbage, i.e. the used size after a collection.

> I changed the way the counters are reset from the end of GC to the
> beginning.  This is because by the end of an incremental GC more memory will
> have been allocated.  So these now track malloc memory allocated since the
> last GC was triggered.

But as you point out incremental GC can complicate this.  So I'd like to do something fancier, and it should be straightforward but not trivial.  Perhaps a follow-up change?
(In reply to Steve Fink [:sfink] [:s:] from comment #4)
> Does this function still need to exist?

Nope, nice catch.

> Can you file a followup to cancel offthread work when changing this (or any)
> parameter? 

Sure, filed bug 1405386.

> Bleh. I guess this is all the fallout from not knowing how much memory we've
> freed. 

Yes, the real fix is to do precise tracking of malloc memory.  But that's a ton of work.  I think this is a clear improvement over what we have now.

Other comments addressed.
(In reply to Paul Bone [:pbone] from comment #7)
> I'd prefer to do the sizing for the updated limit based on the used malloc size
> excluding garbage, so the used size after a collection.

Me too, but we don't have this information.  That's what bug 1395509 is about.
Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/9a31f02ed47c
Always update runtime malloc counter too when malloc memory is associated with a zone r=sfink
https://hg.mozilla.org/integration/mozilla-inbound/rev/39f5d3ebf3ac
Make malloc counters count up instead of down r=sfink
https://hg.mozilla.org/integration/mozilla-inbound/rev/a7d845dd8d7a
Add a dynamic malloc bytes threshold r=sfink
Since these patches landed, we've seen regressions in proportion of non-incremental GCs (from 4% to 14%) and the mean max pause time (from 40ms to 64ms).  I'm going to re-open this bug and back out the third patch, which should hopefully fix this.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 1405980
Pushed by archaeopteryx@coole-files.de:
https://hg.mozilla.org/integration/mozilla-inbound/rev/55161cebaca0
Add a dynamic malloc bytes threshold: Reland to fix timeout in wpt /wasm/many-memories.window.html. r=sfink
The wpt test
> TIMEOUT | /wasm/many-memories.window.html
started failing with the backout (see https://treeherder.mozilla.org/logviewer.html#?job_id=135985845&repo=mozilla-inbound), so the patch got relanded.
Flags: needinfo?(jcoppeard)
https://hg.mozilla.org/mozilla-central/rev/55161cebaca0
Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Resolution: --- → FIXED
Depends on: 1408375
Flags: needinfo?(jcoppeard)