Have heuristics to better predict when it is a good moment to purge
Categories
(Core :: Memory Allocator, enhancement)
Tracking
Release | Tracking | Status
---|---|---
firefox137 | --- | fixed
People
(Reporter: jstutte, Assigned: jstutte)
References
(Blocks 3 open bugs, Regressed 1 open bug)
Details
(Keywords: perf-alert, Whiteboard: [sp3])
Attachments
(3 files, 2 obsolete files)
Tests for bug 1903758 showed that most of the benefit of deferred purging comes from letting an arena re-use recently freed memory without ever giving it back to the OS.
We could introduce a concept of "last significant re-use" on each arena that tells us how much time has passed since we last successfully re-used a page after a purge request was issued, assuming that a recent re-use makes further re-uses more likely. It might be as easy as tracking the last time mNumDirty
went down due to re-use.
A similar concept could also be applied to chunk recycling.
Once we have that information, we can use it during moz_may_purge_one_now
to determine whether any arena (or the recycled chunk list) needs purging (cleanup), by iterating over the list of purge requests and comparing that timestamp with now, and (if requested) doing the work only for those.
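To make the idea concrete, here is a minimal, self-contained sketch; this is not the actual mozjemalloc code, and the request list, the field and constant names, and the grace-period value are all made up for illustration:

```cpp
#include <chrono>
#include <list>

// Hypothetical sketch only -- these are not the real mozjemalloc types or names.
using Clock = std::chrono::steady_clock;

struct PurgeRequest {
  Clock::time_point mLastSignificantReuse;  // bumped whenever the arena re-uses dirty memory
  void (*mDoPurge)(void* aArena);           // the actual purge work for this request
  void* mArena;
};

// Grace period after a significant re-use during which we leave the arena alone.
constexpr auto kReuseGracePeriod = std::chrono::seconds(1);

// Walk the pending purge requests and purge (at most) one whose arena, or the
// recycled chunk list, has not seen a significant re-use within the grace period.
// If aJustCheck is true we only report whether there is something to do.
bool MayPurgeOneNow(std::list<PurgeRequest>& aRequests, bool aJustCheck) {
  const auto now = Clock::now();
  for (auto it = aRequests.begin(); it != aRequests.end(); ++it) {
    if (now - it->mLastSignificantReuse < kReuseGracePeriod) {
      // Recent re-use: assume more re-uses are coming and keep the dirty pages.
      continue;
    }
    if (aJustCheck) {
      return true;
    }
    it->mDoPurge(it->mArena);
    aRequests.erase(it);
    return true;
  }
  return false;  // nothing is old enough to be worth purging right now
}
```

A real implementation would of course hook into the existing purge-request bookkeeping rather than a free-standing list.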
Assignee
Comment 1•5 months ago
A first sketch of how this could look.
The basic idea is:
- whenever we re-use more memory than the significance threshold, we raise the bar of significance to that level of usage and update the timestamp
- whenever we free memory, we reset the bar of significance to the new, lower level of usage but do not update the timestamp
- when we may want to schedule an idle purge, we consider only purge requests that have seen no re-use for longer than a grace period
This should ensure that we do not count every small up and down as a "significant re-use" but converge on a new average level that we want to purge for.
We do that per arena, using mNumDirty as input for the threshold, and for the recycled chunk list as a whole, using gRecycleSize.
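A minimal sketch of that bookkeeping, with illustrative names and logic (not the actual patch; the real code feeds mNumDirty per arena and gRecycleSize for the recycled chunk list into this kind of tracker):

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical per-arena (or per recycled-chunk-list) tracker; names are
// illustrative, not the actual mozjemalloc implementation.
class SignificantReuseTracker {
  using Clock = std::chrono::steady_clock;

  size_t mSignificanceBar = 0;           // usage level (e.g. mNumDirty) we compare against
  Clock::time_point mLastSignificantReuse{};

 public:
  // Usage grew because dirty pages (or recycled chunks) were re-used.
  void OnReuse(size_t aCurrentUsage) {
    if (aCurrentUsage > mSignificanceBar) {
      mSignificanceBar = aCurrentUsage;      // raise the bar to the new level...
      mLastSignificantReuse = Clock::now();  // ...and remember when it happened
    }
  }

  // Usage shrank because memory was freed.
  void OnFree(size_t aCurrentUsage) {
    if (aCurrentUsage < mSignificanceBar) {
      mSignificanceBar = aCurrentUsage;      // lower the bar, keep the old timestamp
    }
  }

  // An idle purge only considers this arena if it saw no significant re-use
  // for longer than the grace period.
  bool MayPurge(Clock::duration aGracePeriod) const {
    return Clock::now() - mLastSignificantReuse >= aGracePeriod;
  }
};
```

Small fluctuations below the bar neither move the bar up nor refresh the timestamp, which is what lets the tracker converge on the new usage level as described above.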
Assignee
Comment 2•3 months ago
Assignee
Comment 3•3 months ago
Assignee
Comment 4•3 months ago
So I looked at memory reports from the regressing test, and from the diff it seems that:
-7.95 MB (100.0%) -- heap
├──-15.70 MB (197.56%) -- decommitted
│ ├───-9.71 MB (122.17%) -- unused-pages
│ │ ├──-5.95 MB (74.83%) ── madvised [8]
│ │ └──-3.76 MB (47.35%) ── fresh [8]
│ └───-5.99 MB (75.39%) ── unmapped [8]
└───7.75 MB (-97.56%) -- committed
├──7.73 MB (-97.35%) ── unused-pages/dirty [8] <<<<< !!!!!!
└──0.02 MB (-00.22%) -- (3 tiny)
├──0.02 MB (-00.20%) ── bookkeeping [8]
├──0.01 MB (-00.11%) ── allocated [8]
└──-0.01 MB (00.08%) ++ bin-unused
After lazy purge we see more committed unused-pages/dirty
than without. The interesting thing is that bin-unused
is basically identical, so we do not see higher fragmentation inside committed bin pages, AFAICT. I wonder if purge was not able to coalesce enough dirty pages to make them become decommitted? Is there any magic constant we could tweak to be more aggressive about decommitting during purge? I assume that arena_t::PurgeInfo::FindDirtyPages
is somehow involved here?
Note that in turn we apparently keep fewer fresh pages around, which kind of makes sense, given we want to re-use pages more often. Please note also that these measurements are taken after a MozJemalloc::jemalloc_free_dirty_pages
call, which is supposed to free ALL dirty pages, IIUC.
Comment 5•3 months ago
Intriguing...
FindDirtyPages()
isn't changed by your patch and I can't imagine how it would be affected by it. It doesn't look for special pages or longer runs of pages; its cleverness is limited to trying to do as little scanning as possible. For anyone else reading this who didn't see our conversation on Slack: dirty pages are the only kind of page it makes sense to purge, since they're the only used pages that are guaranteed to be resident. Fresh and madvised pages might be resident, but we should assume they aren't, since that's what OSs usually/historically do. The exception is madvised pages on macOS, hence the double-purge logic.
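For illustration only (mozjemalloc's real page-state handling is more involved), the distinction described above boils down to something like:

```cpp
// Illustrative sketch; not mozjemalloc's actual page-state representation.
enum class PageState {
  Fresh,     // never touched since it was mapped: assume not resident
  Dirty,     // written to and then freed: guaranteed resident, so it costs RSS
  Madvised,  // already handed back via madvise(): assume not resident
};

// Only dirty pages are guaranteed to be resident, so they are the only ones
// worth purging. (On macOS madvised pages may linger until the OS reclaims
// them, which is why a second "double purge" pass exists there.)
bool WorthPurging(PageState aState) {
  return aState == PageState::Dirty;
}
```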
I suspect, based on the profiles I generated, that Purge just isn't being called as much. However, if the test does call jemalloc_free_dirty_pages()
, I don't see why it wouldn't work. Something is very strange with that Perfherder report: it is for "Base Content Resident Unique Memory Base Resident Unique Memory After tabs open [+30s, forced GC] opt fission", but AFAIK it can't be both "Base" and "After tabs open"? When I click the link to go back to the main report I don't see it, but I do find "Base Content Resident Unique Memory opt fission" with the same number. I think that is the correct one, because 7-9 MB fits "Base" and not "After tabs open", so I don't think jemalloc_free_dirty_pages()
is getting called in that test.
Assignee
Comment 6•3 months ago
Thanks for taking a look. Yesterday evening I did a detailed investigation thanks to this Pernosco profile and filed bug 1933648. tl;dr: There is no real regression, just noise from our measurements.
I think we can go forward with lazy purge (the patches might need some cleanup).
Assignee
Comment 7•3 months ago
WRT the effect on page faults, :denispal made some Android profiles with the patch stack from lazy purge; each profile ran through speedometer3 30 times. We see around a 30-40% reduction in page fault samples with lazy purge.
Base:
https://share.firefox.dev/3ZlED0X
https://share.firefox.dev/4923Uk5
Patches:
https://share.firefox.dev/4hX6hJ7
https://share.firefox.dev/3V6CRyr
Comment 8•16 days ago
Assignee
Comment 9•13 days ago
Comment 10•8 days ago
Comment 11•7 days ago
bugherder
Comment 12•6 days ago
4% improvement on Ares6-Air-first-iteration
Comment 13•1 day ago
(In reply to Pulsebot from comment #10)
> Pushed by jstutte@mozilla.com:
> https://hg.mozilla.org/integration/autoland/rev/f9e3f8796616
> Introduce a concept of last significant re-use to decide when to purge.
> r=pbone,smaug
Perfherder has detected a browsertime performance change from push f9e3f8796616bf24fd9d1c7260bc126fe531d0e0.
Improvements:
Ratio | Test | Platform | Options | Absolute values (old vs new) | Performance Profiles |
---|---|---|---|---|---|
8% | speedometer jQuery-TodoMVC/Adding100Items/Sync | windows11-64-24h2-shippable | fission webrender | 35.72 -> 32.75 | |
8% | speedometer jQuery-TodoMVC/Adding100Items/Sync | windows11-64-24h2-nightlyasrelease | fission webrender | 34.57 -> 31.71 | |
8% | speedometer jQuery-TodoMVC/Adding100Items | windows11-64-24h2-shippable | fission webrender | 37.82 -> 34.81 | |
8% | speedometer jQuery-TodoMVC/Adding100Items | windows11-64-24h2-nightlyasrelease | fission webrender | 36.61 -> 33.73 | |
8% | speedometer jQuery-TodoMVC/Adding100Items/Sync | windows11-64-nightlyasrelease-qr | fission webrender | 33.98 -> 31.40 | |
... | ... | ... | ... | ... | ... |
2% | speedometer Inferno-TodoMVC | windows11-64-shippable-qr | fission webrender | 56.54 -> 55.18 | |
Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.
If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.
You can run these tests on try with ./mach try perf --alert 44001
For more information on performance sheriffing please see our FAQ.