Closed Bug 762448 Opened 13 years ago Closed 11 years ago

Investigate jemalloc 3 RSS, dirty and waste regression

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla37

People

(Reporter: glandium, Assigned: ggp)

References

Details

Attachments

(5 files, 5 obsolete files)

Disable tcache in jemalloc3 11 years ago Guilherme Gonçalves [:ggp] 811 bytes, patch	glandium : review+ RyanVM : checkin+	Details \| Diff \| Splinter Review
logalloc-results.png 11 years ago Guilherme Gonçalves [:ggp] 214.10 KB, image/png		Details
jemalloc_stats 11 years ago Guilherme Gonçalves [:ggp] 16.18 KB, text/plain		Details
Deduct bin_unused from waste in jemalloc3's stats. 11 years ago Guilherme Gonçalves [:ggp] 847 bytes, patch	glandium : review+ RyanVM : checkin+	Details \| Diff \| Splinter Review
logalloc-results.png 11 years ago Guilherme Gonçalves [:ggp] 200.99 KB, image/png		Details
jemalloc_stats 11 years ago Guilherme Gonçalves [:ggp] 16.12 KB, text/plain		Details
Bump opt.lg_dirty_mult in jemalloc3 to reduce number of dirty pages. 11 years ago Guilherme Gonçalves [:ggp] 881 bytes, patch		Details \| Diff \| Splinter Review
Bump opt.lg_dirty_mult in jemalloc3 to reduce number of dirty pages. 11 years ago Guilherme Gonçalves [:ggp] 1.06 KB, patch	glandium : feedback+	Details \| Diff \| Splinter Review
Bump opt.lg_dirty_mult in jemalloc3 to reduce number of dirty pages. 11 years ago Guilherme Gonçalves [:ggp] 1.06 KB, patch	glandium : review+	Details \| Diff \| Splinter Review
Bump opt.lg_dirty_mult in jemalloc3 to reduce number of dirty pages. 11 years ago Guilherme Gonçalves [:ggp] 1.17 KB, patch	ggp : review+ RyanVM : checkin+	Details \| Diff \| Splinter Review

Mike Hommey [:glandium]

Reporter

Description

•

13 years ago

Tests conducted in bug 580408 show that jemalloc 3 regresses our memory usage after tab close.

Mike Hommey [:glandium]

Reporter

Updated

•

13 years ago

Blocks: jemalloc4-by-default

Florian Bender

Comment 1

•

11 years ago

Is this still the case with the newest jemalloc version?

Mike Hommey [:glandium]

Reporter

Comment 2

•

11 years ago

(In reply to Florian Bender from comment #1) > Is this still the case with the newest jemalloc version? Recent measuring shows it's still the case.

Mike Hommey [:glandium]

Reporter

Updated

•

11 years ago

Depends on: 1083686

Guilherme Gonçalves [:ggp]

Assignee

Comment 3

•

11 years ago

It looks like disabling the thread cache (tcache) helps: https://areweslimyet.com/?series=bbug762448&evenspacing . The builds in that run are, from left to right, jemalloc3 with tcache disabled [1], jemalloc3 with tcache enabled [2], and mozjemalloc [3]. Due to an oversight on my part, the jemalloc3 builds also include my heap partitioning patches. We know from previous experimentation that the overhead introduced by these patches over a plain jemalloc3 build is rather small (and certainly smaller than the jemalloc3 vs. mozjemalloc regression), but bear in mind that it is present in the graph too. I'll be happy to re-run the tests without my patches if necessary, though, to get a more accurate view of jemalloc3's isolated effects. 1- https://tbpl.mozilla.org/?tree=Try&rev=fd6445a170ae 2- https://tbpl.mozilla.org/?tree=Try&rev=b161078673c0 3- https://tbpl.mozilla.org/?tree=Try&rev=caf0405b7e99

Mike Hommey [:glandium]

Reporter

Comment 4

•

11 years ago

Mmmm I thought I had disabled tcache, but memory/build/jemalloc_config.c tells I'm wrong.

Mike Hommey [:glandium]

Reporter

Comment 5

•

11 years ago

That said, I *do* think I tested with tcache disabled back then, and while it was making a difference, it didn't fix the entire RSS regression. It looks like it does, now.

Mike Hommey [:glandium]

Reporter

Comment 6

•

11 years ago

(In reply to Mike Hommey [:glandium] from comment #5) > It looks like it does, now. Actually, I misread it, there's a small remainder of a regression.

Guilherme Gonçalves [:ggp]

Assignee

Comment 7

•

11 years ago

Attached patch Disable tcache in jemalloc3 — Details — Splinter Review

Here's the patch for disabling the tcache. Try: https://tbpl.mozilla.org/?tree=Try&rev=b4fe96ca4634 . I'll look at the oranges there today. I also added the Try build to the AWSY run in comment 3 as the fourth column in the graph, so we can see what the RSS looks like without the partitioning patches. Looks like we regress by very little when not partitioning, are we willing to tolerate that for the purposes of this bug?

Attachment #8510171 - Flags: review?(mh+mozilla)

Mike Hommey [:glandium]

Reporter

Updated

•

11 years ago

Attachment #8510171 - Flags: review?(mh+mozilla) → review+

Mike Hommey [:glandium]

Reporter

Comment 8

•

11 years ago

(In reply to Guilherme Gonçalves [:ggp] from comment #7) > I also added the Try build to the AWSY run in comment 3 as the fourth column > in the graph, > so we can see what the RSS looks like without the partitioning patches. > Looks like we regress > by very little when not partitioning, are we willing to tolerate that for > the purposes of this > bug? I'm going to land bug 1083686 once try confirms it doesn't end up burning the tree. Would you mind getting a log from an AWSY run, and fool around with logalloc-replay with mozjemalloc and jemalloc3 for comparison? That would eliminate some of the noise that would be due to AWSY runs not doing the exact same allocations at the same exact time, and would give us a clearer picture of what the regression actually looks like.

Mike Hommey [:glandium]

Reporter

Comment 9

•

11 years ago

(Note that the jemalloc_stats output /might/ be insufficient. Ping me if you need more)

Guilherme Gonçalves [:ggp]

Assignee

Updated

•

11 years ago

Depends on: 1088042

Guilherme Gonçalves [:ggp]

Assignee

Comment 10

•

11 years ago

(In reply to Mike Hommey [:glandium] from comment #8) > Would you mind getting a log from an AWSY run, and fool around > with logalloc-replay with mozjemalloc and jemalloc3 for comparison? Sure. I tried it locally and it does seem to run well enough, I can get :johns to help run it on AWSY tomorrow. A couple of questions, though: - While preprocessing the allocation log locally, I got several ignored calls to free ("Did not see an alloc for pointer"). Is that expected? - Looks like there's no way to replay allocations made by different threads, right? So this will be a good way to compare mozjemalloc and jemalloc3 with tcache disabled, but not enough to assess the regression itself;

Mike Hommey [:glandium]

Reporter

Comment 11

•

11 years ago

(In reply to Guilherme Gonçalves [:ggp] from comment #10) > (In reply to Mike Hommey [:glandium] from comment #8) > > Would you mind getting a log from an AWSY run, and fool around > > with logalloc-replay with mozjemalloc and jemalloc3 for comparison? > > Sure. I tried it locally and it does seem to run well enough, I can get > :johns to help run it on AWSY tomorrow. A couple of questions, though: > > - While preprocessing the allocation log locally, I got several ignored > calls to free ("Did not see an alloc for pointer"). Is that expected? On mac, /maybe/, but it sure could be worth investigating them. > - Looks like there's no way to replay allocations made by different threads, > right? So this will be a good way to compare mozjemalloc and jemalloc3 with > tcache disabled, but not enough to assess the regression itself; Well, the tcache regression is kind of obvious, since tcache adds per-thread data. But AWSY seems to show /some/ level of regression without tcache, and that's what we should be tracking here, and replaying the same allocations would allow to eliminate variations due to the change in order of allocations. Or due to some other things.

Guilherme Gonçalves [:ggp]

Assignee

Comment 12

•

11 years ago

:johns has provided us with logs of a sample AWSY run. I'm hosting the log [1] and its munged version [2] for convenience. While running this locally, I came across crashes in both mozjemalloc and jemalloc3. The crash in mozjemalloc was being caused by an assertion related to memory corruption [3]. It looked like Replay::Commit was overwriting jemalloc's metadata; the patch at bug 1093174 seems to solve that (though I'm not quite sure why). I do manage to replay the run successfully with that patch applied, I'll have experimental results shortly. 1- https://people.mozilla.org/~ggoncalves/malloc-log-ggp.log.xz 2- https://people.mozilla.org/~ggoncalves/malloc-log-ggp.munged.xz 3- http://dxr.mozilla.org/mozilla-central/source/memory/mozjemalloc/jemalloc.c?from=memory/mozjemalloc/jemalloc.c#4416

Depends on: 1093174

Guilherme Gonçalves [:ggp]

Assignee

Comment 13

•

11 years ago

Attached image logalloc-results.png (obsolete) — Details

Here are the results of replaying the AWSY log in comment 12 over mozjemalloc and jemalloc3 with tcache disabled. The run contains 64 calls to jemalloc_stats(), and while it's not clear to me how they correspond to the 8 data points AWSY collects for each run, they should be enough for comparison of overall memory usage. The image shows 5 pairs of graphs, three in the top row, two in the bottom row, each containing the 64 measurements for a different metric (labeled in the y axis). In each pair, the top graph shows the numbers in MiB, and the bottom shows the difference in values between mozjemalloc and jemalloc3. Out of the stats returned by jemalloc_stats, I've omitted 'bookkeeping' and 'bin_unused', as they're not implemented in jemalloc3. On the other hand, I measured RSS in each jemalloc_stats() call using the task_info OSX syscall [1,2], so we have figures for that too. While the stats for 'allocated', 'mapped' and 'rss' show that the remaining regression is rather small, I don't know how to explain the big difference in 'dirty' and 'waste', nor why 'waste' is always zero in mozjemalloc. 1- http://nadeausoftware.com/articles/2012/07/c_c_tip_how_get_process_resident_set_size_physical_memory_use 2- https://stackoverflow.com/questions/5839626/how-is-top-able-to-see-memory-usage

Guilherme Gonçalves [:ggp]

Assignee

Comment 14

•

11 years ago

Attached file jemalloc_stats (obsolete) — Details

And here is the raw output of logalloc-replay for both allocators, concatenated in a single file.

Guilherme Gonçalves [:ggp]

Assignee

Comment 15

•

11 years ago

Mike, what do you think? Can we live with the current situation, or is the RSS difference peaking at 3MiB still too bad?

Assignee: nobody → ggoncalves

Flags: needinfo?(mh+mozilla)

Mike Hommey [:glandium]

Reporter

Comment 16

•

11 years ago

3MiB is tolerable, I guess. Did you check with jemalloc trunk?

Flags: needinfo?(mh+mozilla)

Mike Hommey [:glandium]

Reporter

Comment 17

•

11 years ago

(note we should investigate the weird values on e.g. waste, too)

Guilherme Gonçalves [:ggp]

Assignee

Comment 18

•

11 years ago

Not yet, I'll re-test when we settle on a revision to update to, after upstreaming bug 899126. But let's land the tcache patch for now. https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=f483e65fe480

Depends on: 1094275

Keywords: checkin-needed, leave-open

Ryan VanderMeulen [:RyanVM]

Comment 19

•

11 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/18aee2cd999e

Keywords: checkin-needed

Ryan VanderMeulen [:RyanVM]

Comment 20

•

11 years ago

https://hg.mozilla.org/mozilla-central/rev/18aee2cd999e

Guilherme Gonçalves [:ggp]

Assignee

Updated

•

11 years ago

Summary: Investigate jemalloc 3 RSS regression → Investigate jemalloc 3 RSS, dirty and waste regression

Guilherme Gonçalves [:ggp]

Assignee

Comment 21

•

11 years ago

Attached patch Deduct bin_unused from waste in jemalloc3's stats. — Details — Splinter Review

Looks like the regression in waste can be explained by the fact that jemalloc3's "stats.active" includes what we report as bin_unused. So in jemalloc3 we compute waste as: stats->waste = mallctl("stats.active") - mallctl("stats.allocated") whereas in mozjemalloc, it is: for each arena: stats->waste += arena_committed - arena_allocated - arena_dirty - arena_unused - arena_headers; (... compute total space occupied by chunk headers as chunk_header_size ...) stats->waste -= chunk_header_size; jemalloc3's "stats.active" is equivalent to the sum over all arenas of arena_committed + arena_unused, and already discounts chunk_header_size, arena_headers and arena_dirty. Since the sum of arena_unused over all arenas is bin_unused, subtracting bin_unused from stats->waste should fix the regression. I've tested this by updating to latest jemalloc3, then applying the patch in bug 899126 to get bin_unused working, then applying this patch. As bug 899126 comment 5 already shows, waste is equal to binunused on a simple logalloc run, so waste is zero. I have confirmed that this is also the case with our logged AWSY run, which is the same we get with mozjemalloc.

Attachment #8521655 - Flags: review?(mh+mozilla)

Guilherme Gonçalves [:ggp]

Assignee

Comment 22

•

11 years ago

And for reference, here's where I believe bin_unused gets added to "stats.active": https://github.com/jemalloc/jemalloc/blob/dev/src/arena.c#L247 . arena_run_split_remove gets called whenever a new run is allocated for an arena, and at that point need_pages is the total number of pages needed to accommodate the entire run, including unused regions.

Guilherme Gonçalves [:ggp]

Assignee

Comment 23

•

11 years ago

Comment on attachment 8521655 [details] [diff] [review] Deduct bin_unused from waste in jemalloc3's stats. Actually, sorry, the patch obviously depends on bug 899126 landing first. There's no point in reviewing it now.

Attachment #8521655 - Flags: review?(mh+mozilla)

Guilherme Gonçalves [:ggp]

Assignee

Comment 24

•

11 years ago

Attached image logalloc-results.png — Details

And the regression in dirty can be resolved by tweaking the "lg_dirty_mult" jemalloc3 option, which specifies the base 2 log of the acceptable ratio of active to dirty pages per arena. The default value is 3, meaning the allocator will attempt to keep a ratio of 8:1 active to dirty pages. Given how: 1- We only use one arena, for both jemalloc3 and mozjemalloc; 2- For the entire recorded AWSY run, the value of allocated, and also of jemalloc3's "stats.active", never exceeds 250MiB (and there's actually some slack there); 3- We're currently willing to tolerate at most 4MiB of dirty pages with mozjemalloc, as per the DIRTY_MAX_DEFAULT macro; I think we can use a conservative value of ceil(log2(250/4)) = 6. I'll upload a patch to do that shortly. I'm attaching the graphs for mozjemalloc vs. jemalloc3, after updating to latest jemalloc3 and applying the patches for waste and dirty. We're now within a 2MiB margin of mozjemalloc, and rss has dropped a little further now that we're being more aggressive about purging pages.

Attachment #8516891 - Attachment is obsolete: true

Guilherme Gonçalves [:ggp]

Assignee

Comment 25

•

11 years ago

Attached file jemalloc_stats — Details

The raw data used to generate the graphs.

Attachment #8516892 - Attachment is obsolete: true

Guilherme Gonçalves [:ggp]

Assignee

Comment 26

•

11 years ago

Attached patch Bump opt.lg_dirty_mult in jemalloc3 to reduce number of dirty pages. (obsolete) — Details — Splinter Review

And the patch.

Guilherme Gonçalves [:ggp]

Assignee

Updated

•

11 years ago

Attachment #8521655 - Flags: review?(mh+mozilla)

Guilherme Gonçalves [:ggp]

Assignee

Updated

•

11 years ago

Attachment #8522445 - Flags: review?(mh+mozilla)

Eric Rahm [:erahm]

Comment 27

•

11 years ago

(In reply to Guilherme Gonçalves [:ggp] from comment #24) > 3- We're currently willing to tolerate at most 4MiB of dirty pages with > mozjemalloc, as per the DIRTY_MAX_DEFAULT macro; > > I think we can use a conservative value of ceil(log2(250/4)) = 6. Important detail: in bug 811740 we limit to *1* MiB on B2G, so this value would need to be different for that platform. http://hg.mozilla.org/mozilla-central/annotate/e7f3e6fee15e/memory/mozjemalloc/jemalloc.c#l155

Guilherme Gonçalves [:ggp]

Assignee

Comment 28

•

11 years ago

Attached patch Bump opt.lg_dirty_mult in jemalloc3 to reduce number of dirty pages. (obsolete) — Details — Splinter Review

Good point, let's go with 8 on B2G.

Attachment #8532679 - Flags: review?(mh+mozilla)

Guilherme Gonçalves [:ggp]

Assignee

Updated

•

11 years ago

Attachment #8522445 - Attachment is obsolete: true

Attachment #8522445 - Flags: review?(mh+mozilla)

Mike Hommey [:glandium]

Reporter

Comment 29

•

11 years ago

Comment on attachment 8532679 [details] [diff] [review] Bump opt.lg_dirty_mult in jemalloc3 to reduce number of dirty pages. Review of attachment 8532679 [details] [diff] [review]: ----------------------------------------------------------------- ::: memory/build/jemalloc_config.c @@ +17,5 @@ > +#endif > + > +#define MOZ_MALLOC_OPTIONS \ > + "narenas:1,lg_chunk:20,tcache:false" MOZ_MALLOC_PLATFORM_OPTIONS > +MFBT_DATA const char * je_(malloc_conf) = MOZ_MALLOC_OPTIONS; Make this je_(malloc_conf) = MOZ_MALLOC_OPTIONS MOZ_MALLOC_PLATFORM_OPTIONS; instead

Attachment #8532679 - Flags: review?(mh+mozilla) → feedback+

Mike Hommey [:glandium]

Reporter

Updated

•

11 years ago

Attachment #8521655 - Flags: review?(mh+mozilla) → review+

Guilherme Gonçalves [:ggp]

Assignee

Comment 30

•

11 years ago

Attached patch Bump opt.lg_dirty_mult in jemalloc3 to reduce number of dirty pages. (obsolete) — Details — Splinter Review

Sure, done.

Attachment #8532679 - Attachment is obsolete: true

Attachment #8539391 - Flags: review?(mh+mozilla)

Mike Hommey [:glandium]

Reporter

Comment 31

•

11 years ago

Comment on attachment 8539391 [details] [diff] [review] Bump opt.lg_dirty_mult in jemalloc3 to reduce number of dirty pages. Review of attachment 8539391 [details] [diff] [review]: ----------------------------------------------------------------- ::: memory/build/jemalloc_config.c @@ +10,5 @@ > #include "mozilla/Types.h" > > /* Override some jemalloc defaults */ > +#ifdef MOZ_B2G > +#define MOZ_MALLOC_PLATFORM_OPTIONS ",lg_dirty_mult:8" Maybe add a comment here why B2G has a different value. Sorry for the ping-pong.

Attachment #8539391 - Flags: review?(mh+mozilla) → review+

Guilherme Gonçalves [:ggp]

Assignee

Comment 32

•

11 years ago

Attached patch Bump opt.lg_dirty_mult in jemalloc3 to reduce number of dirty pages. — Details — Splinter Review

Done. Carrying over r+.

Attachment #8539391 - Attachment is obsolete: true

Attachment #8540694 - Flags: review+

Guilherme Gonçalves [:ggp]

Assignee

Updated

•

11 years ago

Keywords: leave-open → checkin-needed

Ryan VanderMeulen [:RyanVM]

Updated

•

11 years ago

Attachment #8510171 - Flags: checkin+

Ryan VanderMeulen [:RyanVM]

Comment 33

•

11 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/9d2a1e594533 https://hg.mozilla.org/integration/mozilla-inbound/rev/495ecf183622

Keywords: checkin-needed

Ryan VanderMeulen [:RyanVM]

Updated

•

11 years ago

Attachment #8521655 - Flags: checkin+

Ryan VanderMeulen [:RyanVM]

Updated

•

11 years ago

Attachment #8540694 - Flags: checkin+

Ryan VanderMeulen [:RyanVM]

Comment 34

•

11 years ago

https://hg.mozilla.org/mozilla-central/rev/9d2a1e594533 https://hg.mozilla.org/mozilla-central/rev/495ecf183622

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla37

You need to log in before you can comment on or make changes to this bug.