Closed Bug 1903758 Opened 1 year ago Closed 1 year ago

Experiment with deferring and/or avoiding arena_t::Purge where possible

Categories

(Core :: Memory Allocator, enhancement)

enhancement

Tracking

()

RESOLVED FIXED
136 Branch
Tracking Status
firefox136 --- fixed

People

(Reporter: jstutte, Assigned: jstutte)

References

(Blocks 2 open bugs, Regressed 1 open bug)

Details

(Keywords: perf-alert, Whiteboard: [sp3])

Attachments

(5 files, 6 obsolete files)

48 bytes, text/x-phabricator-request
Details
48 bytes, text/x-phabricator-request
Details
48 bytes, text/x-phabricator-request
Details
48 bytes, text/x-phabricator-request
Details
48 bytes, text/x-phabricator-request
Details

arena_t::Purge checks if if it reached our maximum of non-used but allocated memory and in case is responsible to update our internal book-keeping and give back memory chunks to the OS. It does so synchronously whenever a free or realloc happens. Both our internal book-keeping and giving back memory to the OS can be quite time consuming. We already have ways to influence how much dirty memory we allow before checking, but this comes with the price of memory overhead if taken too far.

There are situations where we know at higher levels of abstraction, that during certain operations a purge is not needed at all or can be done later in order to reduce or avoid this overhead, like:

  • process shutdown. If a process knows that it is on the straight path to shutdown, any purge operation can be entirely avoided.
  • memory drop and raise. When we destroy a bigger tree of objects, just to re-create another, new tree of other objects, any purge that happens in between could be somehow deferred to happen after the end of that cycle.

This bug wants to explore the potential of making moz_jemalloc aware of such conditions or find other ways to avoid their impact, if possible. A big part of this is probably to set up good ways of measuring the impact of what we are trying to do here.

See Also: → 1902728, 1809115

:smaug said, we already looked at this from various angles in the past, so we know there might be potential but we do not really know, how much and if it is worth it. Having some PoC patches might help to find out.

When a process is about to shutdown, there is no point in spending effort to purge our memory, once it exits it will be returned to the system, anyways. Rather than having explicit different code paths everywhere, we can signal jemalloc via a global flag, that it can stop to bother. Once flipped, jemalloc will not execute any purge, and also realloc will only have effect if the new size is bigger than the old.

For content processes, we flip the flag when shutdown is first notified. This notification also interrupts content JS execution, such that there should be no more significant allocations until shutdown.

For the parent process, this patch chooses the highest risk by associating the flip with entering shutdown phase AppShutdownConfirmed, just to verify the potential impact. FWIW, given we only block purge, we do not expect to see unnecessary allocations of system memory.

There might be room for doing more on huge object free, which currently is unaltered.

Note that in leak checking builds we do not skip anything, of course.

This is a PoC patch and might want some adjustment to integrate better with jemalloc. There might be relevant telemetry already in place to monitor the effect.
Note that the profiler currently gets fooled about memory consumption as we bail out at lower levels and would probably need some adjustment, if we want to go this route.

See Also: → 1903764

memory drop and raise. When we destroy a bigger tree of objects, just to re-create another, new tree of other objects, any purge that happens in between could be somehow deferred to happen after the end of that cycle.

As far as this case is concerned, there could be some mechanism that allows to give moz_jemalloc a callback that should be called on purge, something like:

  • The consuming process sets a callback std::function<...> cbRequestPurge for moz_jemalloc purge
    ...
  • Some free happens
  • moz_jemalloc notices it should purge, does setPurgePending() and calls cbRequestPurge([arena = this]() {arena->doPendingPurge(); }
  • the cbRequestPurge does an idle dispatch of the passed callback to the current serial event target
  • when that event is executed, the real purge happens.

The basic idea is to hook into jemalloc on purge and defer it to when we are less busy.

This patch is meant to just show that this may move some needles. It also shows that it is a a bit hairy to inject something like this into jemalloc and might not be the best way how to do this at all.

Attachment #9411253 - Attachment description: WIP: Bug 1903758 - PoC: Do a lazy purge for arenas on nsThread threads. → WIP: Bug 1903758 - PoC: Do a lazy purge for all arenas via idle processing on the main thread.

The lazy purge seems to have some positive effect on (not only) sp3 numbers, thus smaug suggested to link this bug also there to keep it on the radar for further investigations.

Blocks: speedometer3
Whiteboard: [sp3]
Attachment #9411253 - Attachment description: WIP: Bug 1903758 - PoC: Do a lazy purge for all arenas via idle processing on the main thread. → WIP: Bug 1903758 - PoC: Do a lazy purge for arenas on nsThread threads.
Attachment #9411253 - Attachment description: WIP: Bug 1903758 - PoC: Do a lazy purge for arenas on nsThread threads. → WIP: Bug 1903758 - PoC: Do a lazy purge for all arenas via idle processing on the main thread.

Bug 1488780 will be an interesting, complementary improvement: Once we can purge without holding an arena's lock, the impact of purging on the main thread while other threads do some work that uses such an arena should be minimal.

See Also: → 1488780

there could be some mechanism that allows to give moz_jemalloc a callback that should be called on purge

We do not need a callback, the updated patch avoids that.

Assignee: nobody → jstutte
Attachment #9421656 - Attachment description: WIP: Bug 1903758 - Make mozjemalloc have a lazy purge for all arenas via idle processing on the main thread. → Bug 1903758 - Make mozjemalloc have a lazy purge for all arenas via idle processing on the main thread. r?glandium,smaug
Status: NEW → ASSIGNED
Attachment #9422232 - Attachment description: WIP: Bug 1903758 - Disable dom.memory.foreground_content_processes_have_larger_page_cache but introduce enabled dom.memory.background_content_processes_have_smaller_page_cache. → Bug 1903758 - Disable dom.memory.foreground_content_processes_have_larger_page_cache but introduce enabled dom.memory.background_content_processes_have_smaller_page_cache. r?smaug
Attachment #9422232 - Attachment is obsolete: true
Attachment #9424062 - Attachment description: WIP: Bug 1903758 - Let chunk_dealloc do a lazy pages_unmap. → Bug 1903758 - Let chunk_dealloc do a lazy pages_unmap. r?glandium,pbone,smaug
Attachment #9421656 - Attachment description: Bug 1903758 - Make mozjemalloc have a lazy purge for all arenas via idle processing on the main thread. r?glandium,smaug → WIP: Bug 1903758 - Make mozjemalloc have a lazy purge for all arenas via idle processing on the main thread. r?glandium,smaug
Attachment #9421656 - Attachment description: WIP: Bug 1903758 - Make mozjemalloc have a lazy purge for all arenas via idle processing on the main thread. r?glandium,smaug → Bug 1903758 - Make mozjemalloc have a lazy purge for all arenas via idle processing on the main thread. r?glandium,smaug

Does this bug depend on Bug 1913753?

(In reply to Paul Bone [:pbone] from comment #11)

Does this bug depend on Bug 1913753?

I assume only for the order of landing the patches to avoid conflicts, but it's probably good to make it explicit. It definitely depends on bug 1488780 and that is already blocking Bug 1913753.

Depends on: 1913753
Attachment #9411253 - Attachment is obsolete: true
Blocks: 1920450

Comment on attachment 9408621 [details]
WIP: Bug 1903758 - PoC: Avoid purge in our memory allocator after shutdown started.

Revision D214381 was moved to bug 1920450. Setting attachment 9408621 [details] to obsolete.

Attachment #9408621 - Attachment is obsolete: true
Summary: Experiment with avoiding and/or deferring arena_t::Purge where possible → Experiment with deferring and/or avoiding arena_t::Purge where possible
Depends on: 1920451
Depends on: 1924148

I mentioned this in DMs, but I think you want to eagerly purge in situations like heap minimization, as there's probably tons of random stuff that will break, not just bug 1924148, if we delay things more. It should really be treated as a signal like "please free as much as possible, immediately". It might even come into play with situations where a process is on the verge of OOMing.

(In reply to Andrew McCreight [:mccr8] from comment #14)

I mentioned this in DMs, but I think you want to eagerly purge in situations like heap minimization, as there's probably tons of random stuff that will break, not just bug 1924148, if we delay things more. It should really be treated as a signal like "please free as much as possible, immediately". It might even come into play with situations where a process is on the verge of OOMing.

Yes, the code path that comes through nsMemoryPressureWatcher::Observe should remain intact with lazy purge (and I think the patch keeps it intact but might want some tweak).

The awsy tests indicate that D220616 is mainly responsible for regressing the Base Content Resident Unique Memory measurement.

I tried to exclude that we just need to give the OS a bit more time to update its book-keeping before taking the measurements, but that seems to not change anything, making it likely that we see a real regression.

I think we might see a result of a maybe poor strategy of recycling the memory for (probably small) allocations which causes us fragmentation and keeps resident more pages than would be strictly needed if we were able to pack the memory in an optimal way (which might be hard to impossible if we want to continue to be fast).

I believe during allocation this is mostly driven by arena_t::GetNonFullBinRun, but I am not sure I parse this comment well regarding "lowest in memory" when I look at ArenaRunTreeTrait - are we just trying to say "with the lowest address" here ?

I'd be inclined to spin-off a new bug for improving this and for now to reason about for which platforms we could possibly accept a slight regression in this area that buys us a good performance improvement and for which not.

Blocks: 1924444

(sorry for the noise, wrong bug number though it is supposed to be the same stack)

(In reply to Jens Stutte [:jstutte] from comment #17)

I believe during allocation this is mostly driven by arena_t::GetNonFullBinRun, but I am not sure I parse this comment well regarding "lowest in memory" when I look at ArenaRunTreeTrait - are we just trying to say "with the lowest address" here ?

Correct, it's sorted by memory order.

I'd be inclined to spin-off a new bug for improving this and for now to reason about for which platforms we could possibly accept a slight regression in this area that buys us a good performance improvement and for which not.

I have a set of patches that turn the tree of runs into a priority queue. It seems to improve things but it's a complex patch that currently involves adding another kind of base allocator into jemalloc. I should tidy it up and test it again.

(In reply to Paul Bone [:pbone] from comment #21)

I have a set of patches that turn the tree of runs into a priority queue. It seems to improve things but it's a complex patch that currently involves adding another kind of base allocator into jemalloc. I should tidy it up and test it again.

I went ahead on bug 1924444 and experimented with substituting the tree with a simple O(1) MRU list which seems to pay off quite well (and is a quite simple patch).

No longer blocks: 1924444
See Also: → 1924444
Attachment #9421656 - Attachment description: Bug 1903758 - Make mozjemalloc have a lazy purge for all arenas via idle processing on the main thread. r?glandium,smaug → WIP: Bug 1903758 - Make mozjemalloc have a lazy purge for all arenas via idle processing on the main thread. r?glandium,smaug
Attachment #9424062 - Attachment description: Bug 1903758 - Let chunk_dealloc do a lazy pages_unmap. r?glandium,pbone,smaug → WIP: Bug 1903758 - Let chunk_dealloc do a lazy pages_unmap. r?glandium,pbone,smaug
Depends on: 1933648
See Also: → 1933642
Attachment #9421656 - Attachment description: WIP: Bug 1903758 - Make mozjemalloc have a lazy purge for all arenas via idle processing on the main thread. r?glandium,smaug → Bug 1903758 - Make mozjemalloc have a lazy purge for all arenas via idle processing on the main thread. r?glandium,smaug
Attachment #9424062 - Attachment description: WIP: Bug 1903758 - Let chunk_dealloc do a lazy pages_unmap. r?glandium,pbone,smaug → Bug 1903758 - Let chunk_dealloc do a lazy pages_unmap. r?glandium,pbone,smaug
Blocks: 1935590

Comment on attachment 9424062 [details]
Bug 1903758 - Let chunk_dealloc do a lazy pages_unmap. r?glandium,pbone,smaug

Revision D221761 was moved to bug 1935590. Setting attachment 9424062 [details] to obsolete.

Attachment #9424062 - Attachment is obsolete: true
Blocks: 1920451
No longer depends on: 1920451

Introduce a list of outstanding purge requests and API calls to enable its use and hooks to actually purge.

Without enabling, the behavior remains unchanged.

Attachment #9421656 - Attachment description: Bug 1903758 - Make mozjemalloc have a lazy purge for all arenas via idle processing on the main thread. r?glandium,smaug → WIP: Bug 1903758 - Make mozjemalloc have a lazy purge for all arenas via idle processing on the main thread. r?glandium,smaug
Attachment #9421656 - Attachment description: WIP: Bug 1903758 - Make mozjemalloc have a lazy purge for all arenas via idle processing on the main thread. r?glandium,smaug → WIP: Bug 1903758 - Make mozjemalloc have a lazy purge for all arenas via idle processing on the main thread. r?pbone,smaug
Attachment #9443493 - Attachment description: WIP: Bug 1903758 - Add an API to mozjemalloc to allow to control when to purge from outside. r?pbone,smaug → Bug 1903758 - Add an API to mozjemalloc to allow to control when to purge from outside. r?pbone,smaug
Attachment #9443498 - Attachment description: WIP: Bug 1903758 - Add an extra jemalloc_free_dirty_pages to the MinimizeMemoryUsageRunnable. r?pbone,smaug → Bug 1903758 - Add an extra jemalloc_free_dirty_pages to the MinimizeMemoryUsageRunnable. r?pbone,smaug
Attachment #9421656 - Attachment description: WIP: Bug 1903758 - Make mozjemalloc have a lazy purge for all arenas via idle processing on the main thread. r?pbone,smaug → Bug 1903758 - Make mozjemalloc have a lazy purge for all arenas via idle processing on the main thread. r?pbone,smaug

This is meant to increase readability where we used to use EffectiveMaxDirty() directly.

Note that ExtraCommitPages() uses different thresholds than Purge(), for that use we do not change anything.

Attachment #9443688 - Attachment description: WIP: Bug 1903758 - Have helpers for purge thresholds. r?pbone → Bug 1903758 - Have helpers for purge thresholds. r?pbone
Attachment #9421656 - Attachment description: Bug 1903758 - Make mozjemalloc have a lazy purge for all arenas via idle processing on the main thread. r?pbone,smaug → WIP: Bug 1903758 - Make mozjemalloc have a lazy purge for all arenas via idle processing on the main thread. r?pbone,smaug
Attachment #9443498 - Attachment description: Bug 1903758 - Add an extra jemalloc_free_dirty_pages to the MinimizeMemoryUsageRunnable. r?pbone,smaug → WIP: Bug 1903758 - Add an extra jemalloc_free_dirty_pages to the MinimizeMemoryUsageRunnable. r?pbone,smaug
Attachment #9443688 - Attachment description: Bug 1903758 - Have helpers for purge thresholds. r?pbone → WIP: Bug 1903758 - Have helpers for purge thresholds. r?pbone
Attachment #9443498 - Attachment description: WIP: Bug 1903758 - Add an extra jemalloc_free_dirty_pages to the MinimizeMemoryUsageRunnable. r?pbone,smaug → Bug 1903758 - Add an extra jemalloc_free_dirty_pages to the MinimizeMemoryUsageRunnable. r?pbone,smaug
Attachment #9443688 - Attachment description: WIP: Bug 1903758 - Have helpers for purge thresholds. r?pbone → Bug 1903758 - Have helpers for purge thresholds. r?pbone
Attachment #9421656 - Attachment description: WIP: Bug 1903758 - Make mozjemalloc have a lazy purge for all arenas via idle processing on the main thread. r?pbone,smaug → Bug 1903758 - Make mozjemalloc have a lazy purge for all arenas via idle processing on the main thread. r?pbone,smaug
Attachment #9443688 - Attachment description: Bug 1903758 - Have helpers for purge thresholds. r?pbone → WIP: Bug 1903758 - Have helpers for purge thresholds. r?pbone
Attachment #9443493 - Attachment description: Bug 1903758 - Add an API to mozjemalloc to allow to control when to purge from outside. r?pbone,smaug → WIP: Bug 1903758 - Add an API to mozjemalloc to allow to control when to purge from outside. r?pbone,smaug
Attachment #9421656 - Attachment description: Bug 1903758 - Make mozjemalloc have a lazy purge for all arenas via idle processing on the main thread. r?pbone,smaug → WIP: Bug 1903758 - Make mozjemalloc have a lazy purge for all arenas via idle processing on the main thread. r?pbone,smaug
Attachment #9443493 - Attachment description: WIP: Bug 1903758 - Add an API to mozjemalloc to allow to control when to purge from outside. r?pbone,smaug → Bug 1903758 - Add an API to mozjemalloc to allow to control when to purge from outside. r?pbone,smaug
Attachment #9421656 - Attachment description: WIP: Bug 1903758 - Make mozjemalloc have a lazy purge for all arenas via idle processing on the main thread. r?pbone,smaug → Bug 1903758 - Make mozjemalloc have a lazy purge for all arenas via idle processing on the main thread. r?pbone,smaug
Attachment #9443688 - Attachment description: WIP: Bug 1903758 - Have helpers for purge thresholds. r?pbone → Bug 1903758 - Have helpers for purge thresholds. r?pbone

EffectiveMaxDirty() is called on every free and doing some calculations and reads memory from gArenas. After an arena is created, the value can only change when moz_set_max_dirty_page_modifier is called.

This showed up in profiles while working on Bug 1903758. During a full speedometer 3 run on that machine we used to spend 44ms in EffectiveMaxDirty(), with this patch only 4ms in ShouldStartPurge().

Attachment #9445099 - Attachment description: Bug 1903758 - Precalculate mEffectiveMaxNumDirty. r?pbone → Bug 1903758 - Precalculate mMaxDirty. r?pbone
Attachment #9443493 - Attachment description: Bug 1903758 - Add an API to mozjemalloc to allow to control when to purge from outside. r?pbone,smaug → WIP: Bug 1903758 - Add an API to mozjemalloc to allow to control when to purge from outside. r?pbone,smaug
Attachment #9443493 - Attachment description: WIP: Bug 1903758 - Add an API to mozjemalloc to allow to control when to purge from outside. r?pbone,smaug → Bug 1903758 - Add an API to mozjemalloc to allow to control when to purge from outside. r?pbone,smaug
Attachment #9459197 - Attachment description: WIP: Bug 1903758 - Make mozjemalloc have a lazy purge for all arenas via idle processing on the main thread. r?pbone,smaug → Bug 1903758 - Make mozjemalloc have a lazy purge for all arenas via idle processing on the main thread. r?pbone,smaug
Attachment #9459197 - Attachment is obsolete: true
Pushed by jstutte@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/0e771414af93 Add an extra jemalloc_free_dirty_pages to the MinimizeMemoryUsageRunnable. r=pbone,smaug https://hg.mozilla.org/integration/autoland/rev/c20c7fb738dc Have helpers for purge thresholds. r=pbone https://hg.mozilla.org/integration/autoland/rev/43a16bf722e2 Precalculate mMaxDirty. r=pbone https://hg.mozilla.org/integration/autoland/rev/38ff70a42938 Add an API to mozjemalloc to allow to control when to purge from outside. r=pbone,smaug https://hg.mozilla.org/integration/autoland/rev/0927646a79dd Make mozjemalloc have a lazy purge for all arenas via idle processing on the main thread. r=smaug

Probably improved Jetstream2-Bomb-workers by 14%

Perfherder has detected a devtools performance change from push 0927646a79dd3149e2e7463d01adb43e6295cba1.

Improvements:

Ratio Test Platform Options Absolute values (old vs new)
22% damp server.protocoljs.DAMP windows11-64-shippable-qr e10s fission stylo webrender 670.12 -> 520.97
13% damp server.protocoljs.DAMP linux1804-64-shippable-qr e10s fission stylo webrender 848.89 -> 736.22
10% damp browser-toolbox.debugger-ready.DAMP windows11-64-shippable-qr e10s fission stylo webrender 533.99 -> 480.37
7% damp custom.inspector.close.DAMP windows11-64-shippable-qr e10s fission stylo webrender 26.64 -> 24.73
6% damp simple.jsdebugger.close.DAMP windows11-64-shippable-qr e10s fission stylo webrender 9.41 -> 8.86
... ... ... ... ...
2% damp custom.jsdebugger.stepOut.DAMP windows11-64-shippable-qr e10s fission stylo webrender 451.00 -> 441.63

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.

If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.

You can run these tests on try with ./mach try perf --alert 43391

For more information on performance sheriffing please see our FAQ.

Keywords: perf-alert
Regressions: 1944005
Regressions: 1944040

While bug 1944005 tracks a (probably not too important) regression, it also contains:

Improvements:

Ratio Test Platform Options Absolute values (old vs new)
5% speedometer3 TodoMVC-jQuery/DeletingAllItems/total android-hw-a55-14-0-aarch64-shippable fission webrender 120.94 -> 115.24
4% speedometer3 TodoMVC-jQuery/total android-hw-a55-14-0-aarch64-shippable fission webrender 469.88 -> 448.78
4% speedometer3 TodoMVC-JavaScript-ES6-Webpack-Complex-DOM/Adding100Items/Sync android-hw-a55-14-0-aarch64-shippable fission webrender 56.77 -> 54.45
4% speedometer3 TodoMVC-jQuery/CompletingAllItems/Sync android-hw-a55-14-0-aarch64-shippable fission webrender 206.82 -> 199.18
4% speedometer3 TodoMVC-jQuery/DeletingAllItems/Sync android-hw-a55-14-0-aarch64-shippable fission webrender 116.13 -> 111.94
... ... ... ... ...
2% speedometer3 Editor-TipTap/total android-hw-a55-14-0-aarch64-shippable fission webrender 185.18 -> 181.17

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.

Regressions: 1949356
Regressions: 1950041
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: