Closed Bug 1203840 Opened 4 years ago Closed 4 years ago

Trigger dirty pages purge after CC

Categories

(Core :: Memory Allocator, defect)


Tracking


RESOLVED FIXED
mozilla43
Tracking Status
firefox43 --- fixed

People

(Reporter: glandium, Assigned: glandium)

Details

Attachments

(1 file, 2 obsolete files)

No description provided.
Jemalloc 4 purges dirty pages regularly during free() when the ratio of active
pages to dirty pages drops below 1 << lg_dirty_mult (i.e. when dirty pages pile
up beyond that fraction of active pages). We set lg_dirty_mult in
jemalloc_config to limit RSS usage, but it also has an impact on performance.

So instead of enforcing a high ratio to force more pages to be purged, we keep
jemalloc's default ratio of 8 and force a regular purge of all dirty pages
after cycle collection.

Keeping jemalloc's default ratio means the cycle-collection-triggered purge
doesn't have to walk through a huge backlog of dirty pages: when many have
accumulated, the normal jemalloc purge during free() will already have kicked
in. It also ensures that everything that doesn't run the cycle collector, like
plugins in the plugin-container, still gets some level of purging.
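
For illustration, the condition boils down to something like this (a sketch,
not jemalloc's actual code), with the default lg_dirty_mult of 3, i.e. an
allowed active:dirty ratio of 8:1:

#include <cstddef>

// Sketch only: free() triggers a purge once dirty pages exceed
// active pages divided by (1 << lgDirtyMult).
static bool ShouldPurgeDirtyPages(size_t activePages, size_t dirtyPages,
                                  unsigned lgDirtyMult = 3)
{
  return dirtyPages > (activePages >> lgDirtyMult);
}

Raising lg_dirty_mult shrinks that threshold, so free() purges more often;
that is the performance cost we avoid by keeping the default and purging
explicitly after CC instead.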

Andrew, what do you think of the location of the call to jemalloc_free_dirty_pages in nsCycleCollector::CleanupAfterCollection()? Should it maybe happen after the interval is calculated, so as not to influence the measured cycle collector time? Would you like me to add some timeLog too?
Attachment #8659733 - Flags: review?(n.nethercote)
Attachment #8659733 - Flags: review?(continuation)
Comment on attachment 8659733 [details] [diff] [review]
Trigger dirty pages purge after CC

Review of attachment 8659733 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/base/nsCycleCollector.cpp
@@ +3500,5 @@
>    timeLog.Checkpoint("CleanupAfterCollection::mGraph.Clear()");
>  
> +#if defined(MOZ_JEMALLOC4)
> +  jemalloc_free_dirty_pages();
> +#endif

A comment here -- perhaps part of the commit message -- would be useful.
Attachment #8659733 - Flags: review?(n.nethercote) → review+
How long is this purging going to take?
If it takes a lot of time, it would probably be better to have a separate slice for it.
(In reply to Andrew McCreight [:mccr8] from comment #3)
> How long is this purging going to take?
Flags: needinfo?(mh+mozilla)
It shouldn't take long, but that heavily depends on the OS. I know OSX tends to have slow virtual memory related syscalls... so it might be worth tracking. It's not clear to me whether timeLog is the right thing for that.
Flags: needinfo?(mh+mozilla)
Flags: needinfo?(continuation)
Flags: needinfo?(bugs)
timeLog just does some measurement if you are running it locally. You'd probably want telemetry or something. Could we maybe just spin out a runnable for this and then not worry about impacting the CC pause time?

Is this purging supposed to just be done once per process? We run a cycle collector on the main thread, and one on each worker thread.
Flags: needinfo?(continuation)
So the interesting thing is that this purge has always happened to some degree randomly during CC. That is, it's a purge that normally happens during free() depending on the ratio of active pages vs. dirty pages, and not all dirty pages are necessarily purged when that happens. So it can happen at any time free() is called, which CC is likely to do. The difference is that it would now happen voluntarily, and would apply to all dirty pages. Another difference is the exact thing that is being done during this purge. On Linux and Windows, it's unchanged, but on OSX, it's different from what it used to be with mozjemalloc.

I don't think spawning a runnable is going to make much of a difference, because the purge operation holds a lock in jemalloc that will essentially block malloc/free on other threads anyway.

> Is this purging supposed to just be done once per process? We run a cycle collector on the main thread, and one on each worker thread.

I guess it would make sense to add a condition about the CC being on the main thread? Calling the purge n times because there are n worker threads seems a bit extreme.
So we should purge after SnowWhiteKiller has deleted stuff, not actually during CC.
Somewhere around http://mxr.mozilla.org/mozilla-central/source/js/xpconnect/src/XPCJSRuntime.cpp#134,
but we don't want the purge to happen all the time, since SnowWhiteKiller also runs at times other than right after CC.
I guess nsCycleCollector_dispatchDeferredDeletion should have a param for purge, and only
nsCycleCollector::CollectWhite(), which is one of the callers of that method, would request the purge.
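Roughly something like this (signature and call site from memory, may not match the tree exactly):

// Hypothetical extra parameter; the default keeps existing callers unchanged.
void nsCycleCollector_dispatchDeferredDeletion(bool aContinuation = false,
                                               bool aPurge = false);

// ...and only the call made from nsCycleCollector::CollectWhite() would pass
// aPurge = true, so the purge happens right after CC-found garbage is freed:
nsCycleCollector_dispatchDeferredDeletion(/* aContinuation */ false,
                                          /* aPurge */ true);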
Flags: needinfo?(bugs)
Comment on attachment 8659733 [details] [diff] [review]
Trigger dirty pages purge after CC

As Olli said, cycle-collected objects generally don't actually get destroyed during cycle collection, but during a FreeSnowWhite that happens right after. Does it matter exactly how much is freed? I suppose that since we run the CC every 10 seconds it shouldn't matter too much exactly when it happens, and what Olli proposes sounds a little complicated for the possible gains. I'm not sure.

We at least want to run this only on the main thread, so I'm r-minusing for that.
Attachment #8659733 - Flags: review?(continuation) → review-
Attachment #8659733 - Attachment is obsolete: true
Attachment #8661694 - Flags: review?(n.nethercote)
Attachment #8661694 - Flags: review?(continuation)
Comment on attachment 8661694 [details] [diff] [review]
Trigger dirty pages purge after CC

Review of attachment 8661694 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/xpconnect/src/XPCJSRuntime.cpp
@@ +145,5 @@
> +#if defined(MOZ_JEMALLOC4)
> +      if (NS_IsMainThread()) {
> +          /* Jemalloc purges dirty pages regularly during free() when the ratio
> +           * of dirty pages compared to active pages is higher than
> +           * 1 << lg_dirty_mult. A high ration can have an impact on performance,

Nit: s/ration/ratio/
Attachment #8661694 - Flags: review?(n.nethercote) → review+
Comment on attachment 8661694 [details] [diff] [review]
Trigger dirty pages purge after CC

Review of attachment 8661694 [details] [diff] [review]:
-----------------------------------------------------------------

This looks much better, thanks. I'll defer the review to Olli who seemed to have some specific opinions about this. You don't need the main thread check here, because this particular code only runs on the main thread.
Attachment #8661694 - Flags: review?(continuation) → review?(bugs)
Comment on attachment 8661694 [details] [diff] [review]
Trigger dirty pages purge after CC

This is a main-thread-only runnable, so no need for the thread check.
Note that AsyncFreeSnowWhite is used whenever there are cycle-collectable objects to delete, so also after some objects have had their refcount drop to 0 (i.e. CC may not have run).

I would move jemalloc_free_dirty_pages(); inside the } else { and trigger it only
if (mContinuation) {
    jemalloc_free_dirty_pages();
}
That way we'd call it only if some objects were actually deleted.
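
So the tail of AsyncFreeSnowWhite::Run() would look roughly like this (from memory, names may not match the tree exactly):

  } else {
#if defined(MOZ_JEMALLOC4)
      if (mContinuation) {
          // Only purge when a previous slice actually deleted some objects.
          jemalloc_free_dirty_pages();
      }
#endif
      // ... existing end-of-runnable handling ...
  }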



But please measure how much time we spend in jemalloc_free_dirty_pages(), also
in some unusual cases like loading http://www.whatwg.org/specs/web-apps/current-work/ and then closing the tab.
Conditional r+; I want to see some numbers for how much time we spend in that method.
Attachment #8661694 - Flags: review?(bugs) → review+
From the patch, it looks like the time for this is being collected with MEMORY_FREE_PURGED_PAGES_MS, though telemetry isn't returning any results for it, which is concerning.
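Presumably the timing would sit right around the purge call, something like this (just a sketch; where the accumulation actually lives in the patch may differ):

  // Assumes the usual mozilla/TimeStamp.h and mozilla/Telemetry.h helpers.
  mozilla::TimeStamp start = mozilla::TimeStamp::Now();
  jemalloc_free_dirty_pages();
  mozilla::Telemetry::Accumulate(
      mozilla::Telemetry::MEMORY_FREE_PURGED_PAGES_MS,
      static_cast<uint32_t>(
          (mozilla::TimeStamp::Now() - start).ToMilliseconds()));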
I'd like to see some numbers _before_ we even land this. Since right now I have no idea whether we're talking about some microseconds or several milliseconds or what.
(In reply to Olli Pettay [:smaug] from comment #16)
> I'd like to see some numbers _before_ we even land this. Since right now I
> have no idea whether we're talking about some microseconds or several
> milliseconds or what.

Well, that's supposed to be collected already, but it looks broken on every branch except release, which is concerning. But anyway, on release it looks like it is mostly in the range of 5 to 20ms, with a median of 9.46ms. Maybe it will be faster if we run it more often.

https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2015-08-26&keys=__none__!__none__!__none__&max_channel_version=release%252F40&measure=MEMORY_FREE_PURGED_PAGES_MS&min_channel_version=null&product=Firefox&sanitize=1&sort_keys=submissions&start_date=2015-08-07&table=0&trim=1&use_submission_date=0
The current telemetry is for jemalloc_purge_freed_pages, and the new one for jemalloc_free_dirty_pages, right? Are those doing the same thing? Are they both performing the same way?
jemalloc_purge_freed_pages seems like it should be a no-op on Nightly [1]. There's also a comment saying it should be removed eventually (presumably after mozjemalloc is removed for good) [2]. jemalloc_free_dirty_pages is defined below [1], but I'm not sure how it works under the hood. I doubt it's covered by that telemetry measure though.

[1] https://dxr.mozilla.org/mozilla-central/rev/9ed17db42e3e46f1c712e4dffd62d54e915e0fac/memory/build/mozjemalloc_compat.c#176
[2] https://dxr.mozilla.org/mozilla-central/rev/9ed17db42e3e46f1c712e4dffd62d54e915e0fac/memory/build/jemalloc_config.cpp?offset=0#64
The metric in telemetry currently only collects on mac because mac is the only platform where we explicitly triggered a purge with jemalloc_purge_freed_pages. It is also only triggered in relation to about:memory, so dirty pages were allowed to pile up a lot.

With jemalloc4, however, that function does nothing, so keeping that telemetry collection is not useful, but since I'm adding something similar here, I figured I'd reuse the probe. I'd expect lower numbers than what MEMORY_FREE_PURGED_PAGES_MS currently shows on mac. FWIW, on linux, after running firefox for a few minutes, going to http://www.whatwg.org/specs/web-apps/current-work/, then about:memory and triggering all possible memory reductions, about:telemetry gives me this data:

MEMORY_FREE_PURGED_PAGES_MS
399 samples, average = 0, sum = 6

0 |#########################  393  98%
1 |  6  2%
2 |  0  0%

So, below 1ms for the vast majority, and below 2ms for a few. This is however with the current patch, not with the proposed changes from comment 14.
The changes I suggested shouldn't affect much. But ok, if the new method doesn't take more than that, then fine. Thanks for testing.
I want to check on other OSes, though. I'm going to try to collect the telemetry data from unittests on try.
With this version, I still haven't gotten above 2ms per purge on Windows, OSX, and Linux. I haven't tested Android, though.
Attachment #8661694 - Attachment is obsolete: true
Attachment #8662318 - Flags: review?(bugs)
Attachment #8662318 - Flags: review?(bugs) → review+
https://hg.mozilla.org/mozilla-central/rev/5559e86a2f3b
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla43