Bug 762448 (Closed) - Investigate jemalloc 3 RSS, dirty and waste regression
Opened 13 years ago • Closed 10 years ago
Categories: Core :: Memory Allocator, defect
Status: RESOLVED FIXED
Target Milestone: mozilla37
People: Reporter: glandium, Assigned: ggp
Attachments (5 files, 5 obsolete files):
- 811 bytes, patch (glandium: review+, RyanVM: checkin+)
- 847 bytes, patch (glandium: review+, RyanVM: checkin+)
- 200.99 KB, image/png
- 16.12 KB, text/plain
- 1.17 KB, patch (ggp: review+, RyanVM: checkin+)
Tests conducted in bug 580408 show that jemalloc 3 regresses our memory usage after tab close.
Reporter
Updated•13 years ago
Blocks: jemalloc4-by-default
Comment 1•11 years ago
Is this still the case with the newest jemalloc version?
Reporter
Comment 2•10 years ago
(In reply to Florian Bender from comment #1)
> Is this still the case with the newest jemalloc version?
Recent measurements show it's still the case.
Assignee
Comment 3•10 years ago
It looks like disabling the thread cache (tcache) helps: https://areweslimyet.com/?series=bbug762448&evenspacing . The builds in that run are, from left to right, jemalloc3 with tcache disabled [1], jemalloc3 with tcache enabled [2], and mozjemalloc [3].
Due to an oversight on my part, the jemalloc3 builds also include my heap partitioning patches. We know from previous experimentation that the overhead introduced by these patches over a plain jemalloc3 build is rather small (and certainly smaller than the jemalloc3 vs. mozjemalloc regression), but bear in mind that it is present in the graph too. I'll be happy to re-run the tests without my patches if necessary, though, to get a more accurate view of jemalloc3's isolated effects.
1- https://tbpl.mozilla.org/?tree=Try&rev=fd6445a170ae
2- https://tbpl.mozilla.org/?tree=Try&rev=b161078673c0
3- https://tbpl.mozilla.org/?tree=Try&rev=caf0405b7e99
Reporter
Comment 4•10 years ago
Mmmm, I thought I had disabled tcache, but memory/build/jemalloc_config.c tells me I'm wrong.
Reporter
Comment 5•10 years ago
That said, I *do* think I tested with tcache disabled back then, and while it was making a difference, it didn't fix the entire RSS regression. It looks like it does, now.
Reporter
Comment 6•10 years ago
(In reply to Mike Hommey [:glandium] from comment #5)
> It looks like it does, now.
Actually, I misread it; a small regression remains.
Assignee
Comment 7•10 years ago
Here's the patch for disabling the tcache.
Try: https://tbpl.mozilla.org/?tree=Try&rev=b4fe96ca4634 . I'll look at the oranges there today.
I also added the Try build to the AWSY run in comment 3 as the fourth column in the graph, so we can see what the RSS looks like without the partitioning patches. It looks like we regress by very little when not partitioning; are we willing to tolerate that for the purposes of this bug?
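For reference, a minimal sketch of what disabling the thread cache through jemalloc3's compile-time malloc_conf string can look like (a sketch only, not the attached patch; the narenas/lg_chunk options are assumed to be the ones already set in memory/build/jemalloc_config.c, and je_() is the symbol-prefix macro from the jemalloc3 build):

  /* Disable jemalloc3's per-thread cache by appending tcache:false to the
   * options string jemalloc parses at startup. */
  #include "mozilla/Types.h"

  MFBT_DATA const char* je_(malloc_conf) = "narenas:1,lg_chunk:20,tcache:false";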
Attachment #8510171 - Flags: review?(mh+mozilla)
Reporter
Updated•10 years ago
Attachment #8510171 - Flags: review?(mh+mozilla) → review+
Reporter
Comment 8•10 years ago
(In reply to Guilherme Gonçalves [:ggp] from comment #7)
> I also added the Try build to the AWSY run in comment 3 as the fourth column
> in the graph,
> so we can see what the RSS looks like without the partitioning patches.
> Looks like we regress
> by very little when not partitioning, are we willing to tolerate that for
> the purposes of this
> bug?
I'm going to land bug 1083686 once try confirms it doesn't end up burning the tree. Would you mind getting a log from an AWSY run and fooling around with logalloc-replay on both mozjemalloc and jemalloc3 for comparison? That would eliminate some of the noise due to AWSY runs not doing exactly the same allocations at exactly the same time, and would give us a clearer picture of what the regression actually looks like.
Reporter
Comment 9•10 years ago
(Note that the jemalloc_stats output /might/ be insufficient. Ping me if you need more)
Assignee
Comment 10•10 years ago
(In reply to Mike Hommey [:glandium] from comment #8)
> Would you mind getting a log from an AWSY run, and fool around
> with logalloc-replay with mozjemalloc and jemalloc3 for comparison?
Sure. I tried it locally and it does seem to run well enough; I can get :johns to help run it on AWSY tomorrow. A couple of questions, though:
- While preprocessing the allocation log locally, I got several ignored calls to free ("Did not see an alloc for pointer"). Is that expected?
- It looks like there's no way to replay allocations made by different threads, right? So this will be a good way to compare mozjemalloc and jemalloc3 with tcache disabled, but not enough to assess the regression itself.
Reporter
Comment 11•10 years ago
(In reply to Guilherme Gonçalves [:ggp] from comment #10)
> (In reply to Mike Hommey [:glandium] from comment #8)
> > Would you mind getting a log from an AWSY run, and fool around
> > with logalloc-replay with mozjemalloc and jemalloc3 for comparison?
>
> Sure. I tried it locally and it does seem to run well enough, I can get
> :johns to help run it on AWSY tomorrow. A couple of questions, though:
>
> - While preprocessing the allocation log locally, I got several ignored
> calls to free ("Did not see an alloc for pointer"). Is that expected?
On mac, /maybe/, but it sure could be worth investigating them.
> - Looks like there's no way to replay allocations made by different threads,
> right? So this will be a good way to compare mozjemalloc and jemalloc3 with
> tcache disabled, but not enough to assess the regression itself;
Well, the tcache regression is kind of obvious, since tcache adds per-thread data. But AWSY seems to show /some/ level of regression without tcache, and that's what we should be tracking here; replaying the same allocations would allow us to eliminate variations due to changes in the order of allocations, or due to other things.
Assignee
Comment 12•10 years ago
:johns has provided us with logs of a sample AWSY run. I'm hosting the log [1] and its munged version [2] for convenience.
While running this locally, I came across crashes in both mozjemalloc and jemalloc3. The crash in mozjemalloc was being caused by an assertion related to memory corruption [3]. It looked like Replay::Commit was overwriting jemalloc's metadata; the patch at bug 1093174 seems to solve that (though I'm not quite sure why).
I do manage to replay the run successfully with that patch applied; I'll have experimental results shortly.
1- https://people.mozilla.org/~ggoncalves/malloc-log-ggp.log.xz
2- https://people.mozilla.org/~ggoncalves/malloc-log-ggp.munged.xz
3- http://dxr.mozilla.org/mozilla-central/source/memory/mozjemalloc/jemalloc.c?from=memory/mozjemalloc/jemalloc.c#4416
Depends on: 1093174
Assignee
Comment 13•10 years ago
Here are the results of replaying the AWSY log in comment 12 over mozjemalloc and jemalloc3 with tcache disabled. The run contains 64 calls to jemalloc_stats(), and while it's not clear to me how they correspond to the 8 data points AWSY collects for each run, they should be enough for comparison of overall memory usage.
The image shows five pairs of graphs (three in the top row, two in the bottom row), each containing the 64 measurements for a different metric (labeled on the y axis). In each pair, the top graph shows the numbers in MiB, and the bottom shows the difference in values between mozjemalloc and jemalloc3.
Out of the stats returned by jemalloc_stats, I've omitted 'bookkeeping' and 'bin_unused', as they're not implemented in jemalloc3. On the other hand, I measured RSS in each jemalloc_stats() call using the task_info OSX syscall [1,2], so we have figures for that too.
While the stats for 'allocated', 'mapped' and 'rss' show that the remaining regression is rather small, I don't know how to explain the big difference in 'dirty' and 'waste', nor why 'waste' is always zero in mozjemalloc.
1- http://nadeausoftware.com/articles/2012/07/c_c_tip_how_get_process_resident_set_size_physical_memory_use
2- https://stackoverflow.com/questions/5839626/how-is-top-able-to-see-memory-usage
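For context, here is a minimal standalone sketch of the kind of task_info() call described in the first link above for sampling RSS on OSX (names and error handling are mine; the actual hook used to collect these numbers isn't shown here):

  /* Query the current process's resident set size on OSX. */
  #include <mach/mach.h>
  #include <stdio.h>

  static size_t CurrentRSS(void)
  {
    struct mach_task_basic_info info;
    mach_msg_type_number_t count = MACH_TASK_BASIC_INFO_COUNT;
    if (task_info(mach_task_self(), MACH_TASK_BASIC_INFO,
                  (task_info_t)&info, &count) != KERN_SUCCESS) {
      return 0; /* measurement failed */
    }
    return (size_t)info.resident_size; /* bytes currently resident */
  }

  int main(void)
  {
    printf("rss = %zu bytes\n", CurrentRSS());
    return 0;
  }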
Assignee
Comment 14•10 years ago
And here is the raw output of logalloc-replay for both allocators, concatenated in a single file.
Assignee
Comment 15•10 years ago
Mike, what do you think? Can we live with the current situation, or is the RSS difference peaking at 3MiB still too bad?
Assignee: nobody → ggoncalves
Flags: needinfo?(mh+mozilla)
Reporter
Comment 16•10 years ago
3MiB is tolerable, I guess. Did you check with jemalloc trunk?
Flags: needinfo?(mh+mozilla)
Reporter
Comment 17•10 years ago
(note we should investigate the weird values on e.g. waste, too)
Assignee
Comment 18•10 years ago
Not yet; I'll re-test when we settle on a revision to update to, after upstreaming bug 899126. But let's land the tcache patch for now.
https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=f483e65fe480
Depends on: 1094275
Keywords: checkin-needed, leave-open
Comment 19•10 years ago
Keywords: checkin-needed
Comment 20•10 years ago
Assignee
Updated•10 years ago
Summary: Investigate jemalloc 3 RSS regression → Investigate jemalloc 3 RSS, dirty and waste regression
Assignee
Comment 21•10 years ago
Looks like the regression in waste can be explained by the fact that jemalloc3's "stats.active" includes what we report as bin_unused. So in jemalloc3 we compute waste as:

  stats->waste = mallctl("stats.active") - mallctl("stats.allocated")

whereas in mozjemalloc, it is:

  for each arena:
    stats->waste += arena_committed -
                    arena_allocated - arena_dirty - arena_unused - arena_headers;
  (... compute total space occupied by chunk headers as chunk_header_size ...)
  stats->waste -= chunk_header_size;

jemalloc3's "stats.active" is equivalent to the sum over all arenas of arena_committed + arena_unused, and already discounts chunk_header_size, arena_headers and arena_dirty. Since the sum of arena_unused over all arenas is bin_unused, subtracting bin_unused from stats->waste should fix the regression.
I've tested this by updating to the latest jemalloc3, then applying the patch in bug 899126 to get bin_unused working, then applying this patch. As bug 899126 comment 5 already shows, waste is equal to bin_unused on a simple logalloc run, so with the deduction waste becomes zero. I have confirmed that this is also the case with our logged AWSY run, which matches what we get with mozjemalloc.
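A rough sketch of the adjusted computation, with names taken from the snippets above (the real change is in the attached patch; the jemalloc_stats_t field names are assumed):

  /* Compute waste for jemalloc3, deducting bin_unused since stats.active
   * already counts the unused regions of partially filled runs. */
  static void ComputeWaste(jemalloc_stats_t* stats)
  {
    size_t active = 0, allocated = 0;
    size_t len = sizeof(size_t);
    je_mallctl("stats.active", &active, &len, NULL, 0);
    je_mallctl("stats.allocated", &allocated, &len, NULL, 0);

    /* bin_unused is the sum of arena_unused over all arenas (bug 899126). */
    stats->waste = active - allocated - stats->bin_unused;
  }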
Attachment #8521655 - Flags: review?(mh+mozilla)
Assignee
Comment 22•10 years ago
And for reference, here's where I believe bin_unused gets added to "stats.active": https://github.com/jemalloc/jemalloc/blob/dev/src/arena.c#L247 . arena_run_split_remove gets called whenever a new run is allocated for an arena, and at that point need_pages is the total number of pages needed to accommodate the entire run, including unused regions.
Assignee
Comment 23•10 years ago
Comment on attachment 8521655 [details] [diff] [review]
Deduct bin_unused from waste in jemalloc3's stats.
Actually, sorry, the patch obviously depends on bug 899126 landing first. There's no point in reviewing it now.
Attachment #8521655 - Flags: review?(mh+mozilla)
Assignee
Comment 24•10 years ago
And the regression in dirty can be resolved by tweaking the "lg_dirty_mult" jemalloc3 option, which specifies the base 2 log of the acceptable ratio of active to dirty pages per arena. The default value is 3, meaning the allocator will attempt to keep a ratio of 8:1 active to dirty pages.
Given that:
1- We only use one arena, for both jemalloc3 and mozjemalloc;
2- For the entire recorded AWSY run, the value of allocated, and also of jemalloc3's "stats.active", never exceeds 250MiB (and there's actually some slack there);
3- We're currently willing to tolerate at most 4MiB of dirty pages with mozjemalloc, as per the DIRTY_MAX_DEFAULT macro;
I think we can use a conservative value of ceil(log2(250/4)) = 6. I'll upload a patch to do that shortly.
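A quick back-of-the-envelope check of that value (standalone arithmetic, not part of any patch):

  /* With lg_dirty_mult = 6, jemalloc3 purges once dirty > active / 2^6.
   * For active <= 250 MiB, that caps dirty pages at about 3.9 MiB, which
   * stays within mozjemalloc's 4 MiB DIRTY_MAX_DEFAULT budget. */
  #include <stdio.h>

  int main(void)
  {
    const double active_mib = 250.0; /* upper bound seen in the recorded AWSY run */
    const int lg_dirty_mult = 6;
    const double dirty_cap_mib = active_mib / (double)(1 << lg_dirty_mult);
    printf("dirty cap ~= %.2f MiB\n", dirty_cap_mib); /* ~3.91 */
    return 0;
  }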
I'm attaching the graphs for mozjemalloc vs. jemalloc3, after updating to latest jemalloc3 and applying the patches for waste and dirty. We're now within a 2MiB margin of mozjemalloc, and rss has dropped a little further now that we're being more aggressive about purging pages.
Attachment #8516891 - Attachment is obsolete: true
Assignee
Comment 25•10 years ago
The raw data used to generate the graphs.
Attachment #8516892 - Attachment is obsolete: true
Assignee
Comment 26•10 years ago
And the patch.
Assignee
Updated•10 years ago
Attachment #8521655 - Flags: review?(mh+mozilla)
Assignee
Updated•10 years ago
Attachment #8522445 - Flags: review?(mh+mozilla)
Comment 27•10 years ago
(In reply to Guilherme Gonçalves [:ggp] from comment #24)
> 3- We're currently willing to tolerate at most 4MiB of dirty pages with
> mozjemalloc, as per the DIRTY_MAX_DEFAULT macro;
>
> I think we can use a conservative value of ceil(log2(250/4)) = 6.
Important detail: in bug 811740 we limit to *1* MiB on B2G, so this value would need to be different for that platform.
http://hg.mozilla.org/mozilla-central/annotate/e7f3e6fee15e/memory/mozjemalloc/jemalloc.c#l155
Assignee
Comment 28•10 years ago
Good point, let's go with 8 on B2G.
Attachment #8532679 - Flags: review?(mh+mozilla)
Assignee
Updated•10 years ago
Attachment #8522445 - Attachment is obsolete: true
Attachment #8522445 - Flags: review?(mh+mozilla)
Reporter
Comment 29•10 years ago
Comment on attachment 8532679 [details] [diff] [review]
Bump opt.lg_dirty_mult in jemalloc3 to reduce number of dirty pages.
Review of attachment 8532679 [details] [diff] [review]:
-----------------------------------------------------------------
::: memory/build/jemalloc_config.c
@@ +17,5 @@
> +#endif
> +
> +#define MOZ_MALLOC_OPTIONS \
> + "narenas:1,lg_chunk:20,tcache:false" MOZ_MALLOC_PLATFORM_OPTIONS
> +MFBT_DATA const char * je_(malloc_conf) = MOZ_MALLOC_OPTIONS;
Make this je_(malloc_conf) = MOZ_MALLOC_OPTIONS MOZ_MALLOC_PLATFORM_OPTIONS; instead
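Putting this suggestion together with the values discussed in comments 24 and 28, the end result could look roughly like the following (a sketch under those assumptions, not the landed revision):

  /* memory/build/jemalloc_config.c, sketched after the review suggestion:
   * keep the common options in MOZ_MALLOC_OPTIONS and concatenate the
   * platform-specific part at the assignment. lg_dirty_mult values are
   * taken from the discussion: 6 by default, 8 on B2G. */
  #include "mozilla/Types.h"

  #ifdef MOZ_B2G
  /* B2G only tolerates ~1 MiB of dirty pages (bug 811740), so purge harder. */
  #define MOZ_MALLOC_PLATFORM_OPTIONS ",lg_dirty_mult:8"
  #else
  #define MOZ_MALLOC_PLATFORM_OPTIONS ",lg_dirty_mult:6"
  #endif

  #define MOZ_MALLOC_OPTIONS "narenas:1,lg_chunk:20,tcache:false"

  MFBT_DATA const char* je_(malloc_conf) =
      MOZ_MALLOC_OPTIONS MOZ_MALLOC_PLATFORM_OPTIONS;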
Attachment #8532679 - Flags: review?(mh+mozilla) → feedback+
Reporter
Updated•10 years ago
Attachment #8521655 - Flags: review?(mh+mozilla) → review+
Assignee
Comment 30•10 years ago
Sure, done.
Attachment #8532679 - Attachment is obsolete: true
Attachment #8539391 - Flags: review?(mh+mozilla)
Reporter
Comment 31•10 years ago
Comment on attachment 8539391 [details] [diff] [review]
Bump opt.lg_dirty_mult in jemalloc3 to reduce number of dirty pages.
Review of attachment 8539391 [details] [diff] [review]:
-----------------------------------------------------------------
::: memory/build/jemalloc_config.c
@@ +10,5 @@
> #include "mozilla/Types.h"
>
> /* Override some jemalloc defaults */
> +#ifdef MOZ_B2G
> +#define MOZ_MALLOC_PLATFORM_OPTIONS ",lg_dirty_mult:8"
Maybe add a comment here explaining why B2G has a different value. Sorry for the ping-pong.
Attachment #8539391 - Flags: review?(mh+mozilla) → review+
Assignee
Comment 32•10 years ago
Done. Carrying over r+.
Attachment #8539391 - Attachment is obsolete: true
Attachment #8540694 - Flags: review+
Assignee
Updated•10 years ago
Keywords: leave-open → checkin-needed
Updated•10 years ago
Attachment #8510171 - Flags: checkin+
Comment 33•10 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/9d2a1e594533
https://hg.mozilla.org/integration/mozilla-inbound/rev/495ecf183622
Keywords: checkin-needed
Updated•10 years ago
Attachment #8521655 - Flags: checkin+
Updated•10 years ago
Attachment #8540694 - Flags: checkin+
Comment 34•10 years ago
https://hg.mozilla.org/mozilla-central/rev/9d2a1e594533
https://hg.mozilla.org/mozilla-central/rev/495ecf183622
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla37