Closed Bug 693404 Opened 13 years ago Closed 13 years ago

MacOS overreports RSS of Firefox with jemalloc

Categories

(Core :: Memory Allocator, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla10

People

(Reporter: justin.lebar+bug, Assigned: justin.lebar+bug)

References

Details

(Whiteboard: [MemShrink:P1][see comment 5])

Attachments

(5 files, 1 obsolete file)

jemalloc is currently disabled on MacOS 10.5 due to bug 670492.

I think this may not be a true regression -- it doesn't make much sense that jemalloc would actually regress RSS by 20% on 10.5 but not on 10.6 -- but that it may be due to how mac measures RSS.

In particular, jemalloc may be releasing memory using madvise(), but mac may not be immediately releasing that memory.
Depends on: 670492, 414946
Josh, do you have any ideas for how we might be able to move forward here or who we might be able to ping?  The problem is explained from first principals in the stackoverflow post above.
One thing we could do is measure how much memory has been MADV_FREE'd, and subtract that from the RSS the OS gives us when we report our RSS.  I'm pretty confident now that MADV_FREE'd memory will be reclaimed when the system is under memory pressure, on both 10.5 and 10.6.

This doesn't help anyone who measures RSS directly through the operating system, of course.
If we did what's suggested in comment 3, the "adjusted RSS" we compute should be equal to the RSS pbiggar saw in bug 414946 comment 170, where he enabled DECOMMIT.  It was a sizable win (on 10.5 or 10.6, I don't know), whereas we didn't see an improvement from jemalloc on 10.6 and saw a regression on 10.6.
Er, we saw a regression on 10.5.

To be clear, the way we're currently measuring, I think the part of RSS due to jemalloc is effectively stuck at the high water mark, on both 10.5 and 10.6.
We can't just subtract from the RSS the number of MADV_FREE'd pages without first making sure that the MADV_FREE'd pages are actually in memory.

There's a syscall, mincore, to do this.  But using it requires us to keep track of not how many, but precisely *which* pages we've MADV_FREE'd.  Which would be unfortunate...
Whiteboard: [MemShrink]
Assignee: nobody → justin.lebar+bug
Whiteboard: [MemShrink] → [MemShrink:P1]
Summary: Enable jemalloc on MacOS 10.5 → MacOS overreports RSS Of Firefox with jemalloc
Whiteboard: [MemShrink:P1] → [MemShrink:P1][see comment 5]
Summary: MacOS overreports RSS Of Firefox with jemalloc → MacOS overreports RSS of Firefox with jemalloc
So it looks like jemalloc keeps track of exactly which pages have been decommitted (of course it has to, so it can recommit them if necessary).  With bug 681183, we now get this accounting whether or not DECOMMIT is defined.

Walking this data structure may be kind of a pain, but I'll see how hard it is.

Of course, even if Firefox can figure out which pages do and don't count against RSS, |top| and |ps| will still give us the wrong RSS.

I guess we could explicitly nuke (decommit and recommit) these pages when you minimize memory usage in about:memory.  We could even nuke the pages before you read the "resident" memory reporter, although Telemetry reads that and assumes it's a cheap operation...

An OS command which flushes these pages would be much better...
jasone says on the jemalloc mailing list that he doesn't think there's a way to force Mac to free these MADV_FREE'd pages, or to read the number of MADV_FREE'd pages a process has.
We now have a function we can call which causes jemalloc to find all MADV_FREE'd pages and purge them (by explicitly decommitting / recommitting the pages).

In my debug + jemalloc build, I loaded a few tabs with Google Docs and then closed the tabs.  After clicking minimize memory usage, my RSS was 210mb.  After purging, my RSS was 170mb.  \o/

There's some overhead associated with this approach; in particular, when you call madvise(MADV_FREE) on a chunk for the first time, you need to add it to a red-black tree.  But the overhead of this operation should be much smaller than the syscall.

I'm not sure yet how expensive the purge operation is.  If you call it twice in quick succession, the second call should be cheap, since we only examine chunks which have been madvised since the last purge.

I think we'll want to call this function in the memory reporter which gets our RSS, unless the purge is too expensive to make during a telemetry ping.
> in particular, when you call madvise(MADV_FREE) on a chunk for the first time, you need 
> to add it to a red-black tree.

You know, this could just as well be a linked list...


In any case, the trick now is figuring out how to get this integrated into TP5-RSS.  We'd ideally purge right before TP5 checks our RSS.
> In any case, the trick now is figuring out how to get this integrated into
> TP5-RSS.  We'd ideally purge right before TP5 checks our RSS.

I guess we could send Firefox a signal before we check its RSS?

It kind of seems better if we just let Firefox report its own RSS to Talos.  Then we could even do something like report the RSS right after each page finishes loading, which is probably a lot more reasonable than the time-based polling we currently do.

Joel, what are your thoughts?
My thoughts: give me asprin!!

Actually in bug 685632, we are talking of collecting RSS from inside of pageloader (in browser process).  Would we want to collect this for Linux and Windows?  

There are probably a lot of logistics here to resolve and a rollout will change all the RSS numbers (think various platforms and versions).  That requires a lot more time than checking in a patch.

Am I thinking about this correctly?
> Would we want to collect this for Linux and Windows?

I think we should see whether RSS-from-inside-pageloader works well.  But on face, yes, I'd like to measure RSS in the same way on all platforms.

Our numbers on Mac are currently pretty meaningless, so given the choice between

  * using a different methodology (temporarily or permanently) on Mac compared to the other desktop platforms and
  * delaying until we can move to a new methodology everywhere at once,

the first is preferable, since our current "uniform" methodology gives bogus Mac numbers.

Also, once we get more reasonable measurement infrastructure in place, we should be able to enable jemalloc for Mac 10.5, which should be a large memory win.

Note that purging after every pageload will probably have an effect on TP5 itself, not just the RSS numbers.  In fact, just measuring RSS in-process may affect TP5 scores.

> Am I thinking about this correctly?

Yes, I think so.  :)
So I think the way forward here is to purge before we collect RSS on Mac, and to see how long the purge takes.  (Maybe we can report it in telemetry.)  Then we can hopefully check that in, and use it in Talos.
in talos (tp5 specific) we poll the counters every 20 seconds (resolution is set at 20):
http://hg.mozilla.org/build/talos/file/tip/ttest.py#l308

The problem is changing what we do here will affect all tests and change the numbers.  

1) how do we purge from a python script that just launched the process?  is there a system command?
2) these machines do not have internet access, so I am not sure we can use telemetry.  I am not very familiar with telemetry, so I could be wrong here.
3) if we cannot purge from a system command, we need to build this into pageloader. 
4) any changes that affect the runtime of talos will require is to do side by side staging.  For example mozafterpaint is on by default for tp5 (bug 661918), but this took weeks to adjust all the graph servers and wait for the appropriate rollout window (started Sept 9th and still rolling out).

We can do something as an experiment without putting it in production, just want you to know it would take a few months to get this rolled out.
Blocks: 694335
> The problem is changing what we do here will affect all tests and change the numbers.  

Correct.

> 1) how do we purge from a python script that just launched the process?  is there a system command?

If we wanted to continue measuring RSS from |ps|, we'd have to first send Firefox a signal or something.  And then we'd have to wait for FF to actually do the purge before reading memory usage...  So I think modifying the pageloader is probably better, if we can make that work.

> 2) these machines do not have internet access, so I am not sure we can use telemetry.  I am not 
> very familiar with telemetry, so I could be wrong here.

No, we wouldn't want to use telemetry for this.  My concern wrt telemetry is that we'd need to purge before every telemetry ping, since telemetry reads RSS, and that could be expensive.  Hopefully it's not.

> 3) if we cannot purge from a system command, we need to build this into pageloader.

I think this is better, since we don't need to figure out a signaling mechanism.  Also, measuring after each page has loaded seems much more reasonable than measuring every 20s.

> We can do something as an experiment without putting it in production, just want you to know it 
> would take a few months to get this rolled out.

Okay.  We're currently blocking what I suspect are major memory improvements on 10.5 on getting this rolled out.  But maybe we shouldn't block on this, if we can demonstrate that the improvements are real.  I filed bug 694335 on turning on jemalloc for 10.5.
Attachment #566648 - Attachment is obsolete: true
I tested this on a debug build, and it never took more than 12ms to purge pages, even when there were a bunch of pages to purge.  We'll want to keep an eye on this, but I think it should be OK.
Attachment #566650 - Attachment description: Part 2, v1: For demonstration purposes only, add a button to about:memory which purges jemalloc's MADV_FREE'd pages. → For demonstration purposes only, add a button to about:memory which purges jemalloc's MADV_FREE'd pages.
(In reply to Justin Lebar [:jlebar] from comment #18)

> Okay.  We're currently blocking what I suspect are major memory improvements
> on 10.5 on getting this rolled out.  But maybe we shouldn't block on this,
> if we can demonstrate that the improvements are real.  I filed bug 694335 on
> turning on jemalloc for 10.5.

Oh goodness, I don't think you should block anything on this bug.  Pageloader changes are incredibly expensive to deploy, especially if they change the numbers.  We'd probably want to lump this together with the android RSS bug as well so that we only have to eat one deployment cost in getting both these landed.  Even if we *landed* the code tomorrow, if it changes numbers, we've got to do double runs for tp tests on *every branch*, and to roll that through the system will take some time.  (We're working on trying to minimize this time, but still, you should be aware that this is the case...).

Is memory usage on 10.5 so urgent that we need to block on this? It seems to me that this urgency is mis-guided, since I'd imagine the number of 10.5 machines are diminishing.  But I've done no querying and have no metrics for that assertion.  I'm just wondering how you see this bug fitting in the hierarchy of talos issues: i.e. is this more urgent than say android rss or talos android stability improvements?
> Is memory usage on 10.5 so urgent that we need to block on this?

Ah, I think I might not have been clear.  All that's blocked here is turning on jemalloc for 10.5.  That is, I suspect jemalloc on 10.5 is a major memory improvement, but our test infrastructure shows it as a regression.

Note that this is not exclusively a 10.5 issue.  Until we fix this issue with Talos, our RSS numbers on Mac 10.6 report basically the high water mark.  Memory usage changes will fly under the radar until we fix this.

It seems there are two questions:

 1. Should we land jemalloc on 10.5 with no test coverage?  We already have less-than-useful coverage on 10.6; I'm not sure if this is an argument for or against turning on jemalloc on 10.5.

 2. How urgent is fixing this issue with Talos?  The urgency might change depending on your answer to (1).

With respect to priorities, I don't think fixing Talos on Mac is more important than getting Talos working on Android.  We already get partial coverage on Mac via the other Talos memory metrics and via Talos RSS measurements on Windows and Linux.

I'm inclined to say that if it's going to take a long time to get Talos working, we should, pending manual verification of the memory gains, enable jemalloc on 10.5.  (With the patches in this bug, you'll be able to read the real RSS via about:memory.)  The only reason I can think of to leave jemalloc off on 10.5 is to act as a canary, warning us about Mac RSS regressions we'd otherwise miss.  But it seems wrong to use our 10.5 users in this way, especially because, as they're on less-powerful machines than 10.6 users, they're the ones with the most to gain from jemalloc.
so if I were to add this into pageloader, would the best method would be to use about:memory style nsIMemoryReporterManager and get the RSS?

in the android bug, there is example code to look at the /proc files and harvest the information from there.
(In reply to Joel Maher (:jmaher) from comment #24)
> so if I were to add this into pageloader, would the best method would be to
> use about:memory style nsIMemoryReporterManager and get the RSS?

Correct.

> in the android bug, there is example code to look at the /proc files and
> harvest the information from there.

You can do the same thing on Android as on Mac; nsIMemoryReporterManager is just going to read /proc.  But you need to get RSS for both the content and chrome processes, and I'm not sure offhand how to do that.
Comment on attachment 566942 [details] [diff] [review]
Part 1, v2: Add function to jemalloc to purge madvised pages.

Review of attachment 566942 [details] [diff] [review]:
-----------------------------------------------------------------

I was going to complain about the tabs but it looks like that's the prevailing style here :-(

This actually isn't as scary as I expected.

::: memory/jemalloc/jemalloc.c
@@ +837,5 @@
>  	 *                  pages.
>  	 *     Small: Run address.
>  	 *     Large: Run size for first page, unset for trailing pages.
>  	 * - : Unused.
> +	 * m : MADV_FREE or MADV_DONTNEED'ed?

To be clear, this bit is whether either has been called (dependent on the platform), not which one has been called?

@@ +897,5 @@
> +	 * have pages which have been madvise(MADV_FREE)'d but not explicitly
> +	 * purged.
> +	 *
> +	 * We're currently lazy and don't remove a chunk from this list when
> +	 * all its madvised pages are recommitted. */

Is that a problem?  What happens if we purge something from this list that's been recommitted?

@@ +3115,5 @@
>  		 * page is encountered, commit all needed adjacent decommitted
>  		 * pages in one operation, in order to reduce system call
>  		 * overhead.
>  		 */
> +		if (chunk->map[run_ind + i].bits & (CHUNK_MAP_DECOMMITTED | CHUNK_MAP_MADVISED)) {

Can we #define something for CHUNK_MAP_DECOMMITTED | CHUNK_MAP_MADVISED?

@@ +3127,5 @@
>  			for (j = 0; i + j < need_pages && (chunk->map[run_ind +
> +			    i + j].bits & (CHUNK_MAP_DECOMMITTED | CHUNK_MAP_MADVISED)); j++) {
> +				/* DECOMMITTED and MADVISED are mutually-exclusive. */
> +				assert(!(chunk->map[run_ind + i + j].bits & CHUNK_MAP_DECOMMITTED &&
> +					 chunk->map[run_ind + i + j].bits & CHUNK_MAP_MADVISED));

Yay assertions.

::: memory/jemalloc/jemalloc.h
@@ +78,5 @@
>  size_t	malloc_usable_size(const void *ptr);
>  #endif /* MOZ_MEMORY_LINUX */
>  
>  void	jemalloc_stats(jemalloc_stats_t *stats);
> +void    jemalloc_purge_freed_pages(); // XXX jlebar comment

You should do something about that comment ...

::: memory/jemalloc/linkedlist.h
@@ +1,4 @@
> +/* -*- Mode: C; tab-width: 8; c-basic-offset: 8; indent-tabs-mode: t -*- */
> +/* vim:set softtabstop=8 shiftwidth=8 noet: */
> +/*-
> + * Copyright (C) Justin Lebar <justin.lebar@gmail.com>.

Pretty sure this should be Mozilla Foundation.
Attachment #566942 - Flags: review?(khuey) → review+
Comment on attachment 567107 [details] [diff] [review]
Part 2: Purge pages before reading RSS.

Review of attachment 567107 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/base/nsMemoryReporterManager.cpp
@@ +155,5 @@
> +    // Purging these pages shouldn't take more than 10ms or so, but we want to
> +    // keep an eye on it since GetResident() is called on each Telemetry ping.
> +    {
> +      Telemetry::AutoTimer<Telemetry::MEMORY_FREE_PURGED_PAGES_MS> timer;
> +      jemalloc_purge_freed_pages();

Does this link on Linux?
Attachment #567107 - Flags: review?(khuey) → review+
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #26)
> > +	 * m : MADV_FREE or MADV_DONTNEED'ed?
> 
> To be clear, this bit is whether either has been called (dependent on the
> platform), not which one has been called?

Correct.  Would it be more clear if I said

 * m: MADV_FREE/MADV_DONTNEED'ed?

?

> > +	 * We're currently lazy and don't remove a chunk from this list when
> > +	 * all its madvised pages are recommitted. */
> 
> Is that a problem?  What happens if we purge something from this list that's
> been recommitted?

When we purge, we iterate over all the chunk's pages and only purge the ones
which are currently madvise'd.  So if there are no madvised pages, it's a nop.

If we wanted to track this, we'd need to keep a counter of the number of
madvised pages in the chunk.  But since it'd be Bad if I got this wrong, I
didn't screw with it.

> ::: memory/jemalloc/linkedlist.h
> @@ +1,4 @@
> > +/* -*- Mode: C; tab-width: 8; c-basic-offset: 8; indent-tabs-mode: t -*- */
> > +/* vim:set softtabstop=8 shiftwidth=8 noet: */
> > +/*-
> > + * Copyright (C) Justin Lebar <justin.lebar@gmail.com>.
> 
> Pretty sure this should be Mozilla Foundation.

Really?  I thought Mozilla developers own the copyright to their code.  Gerv?
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #27)
> Comment on attachment 567107 [details] [diff] [review] [diff] [details] [review]
> Part 2: Purge pages before reading RSS.
> 
> Review of attachment 567107 [details] [diff] [review] [diff] [details] [review]:
> -----------------------------------------------------------------
> 
> ::: xpcom/base/nsMemoryReporterManager.cpp
> @@ +155,5 @@
> > +    // Purging these pages shouldn't take more than 10ms or so, but we want to
> > +    // keep an eye on it since GetResident() is called on each Telemetry ping.
> > +    {
> > +      Telemetry::AutoTimer<Telemetry::MEMORY_FREE_PURGED_PAGES_MS> timer;
> > +      jemalloc_purge_freed_pages();
> 
> Does this link on Linux?

No, but it's mac-only code.
Work for hire done by Mozilla employees and contractors is (C) Mozilla Foundation (not Corporation). Work done by employees of other companies has a status defined by their employment agreement. Work done by volunteers is their own (unless they've signed the rights away or have a particularly evil employer). 

Gerv
(In reply to Justin Lebar [:jlebar] from comment #28)
> (In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #26)
> > > +	 * m : MADV_FREE or MADV_DONTNEED'ed?
> > 
> > To be clear, this bit is whether either has been called (dependent on the
> > platform), not which one has been called?
> 
> Correct.  Would it be more clear if I said
> 
>  * m: MADV_FREE/MADV_DONTNEED'ed?
> 
> ?

Yes, much.

> > > +	 * We're currently lazy and don't remove a chunk from this list when
> > > +	 * all its madvised pages are recommitted. */
> > 
> > Is that a problem?  What happens if we purge something from this list that's
> > been recommitted?
> 
> When we purge, we iterate over all the chunk's pages and only purge the ones
> which are currently madvise'd.  So if there are no madvised pages, it's a
> nop.

Ok.

> If we wanted to track this, we'd need to keep a counter of the number of
> madvised pages in the chunk.  But since it'd be Bad if I got this wrong, I
> didn't screw with it.

Yeah that seems fine.
(In reply to Justin Lebar [:jlebar] from comment #29)
> (In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #27)
> > Comment on attachment 567107 [details] [diff] [review] [diff] [details] [review] [diff] [details] [review]
> > Part 2: Purge pages before reading RSS.
> > 
> > Review of attachment 567107 [details] [diff] [review] [diff] [details] [review] [diff] [details] [review]:
> > -----------------------------------------------------------------
> > 
> > ::: xpcom/base/nsMemoryReporterManager.cpp
> > @@ +155,5 @@
> > > +    // Purging these pages shouldn't take more than 10ms or so, but we want to
> > > +    // keep an eye on it since GetResident() is called on each Telemetry ping.
> > > +    {
> > > +      Telemetry::AutoTimer<Telemetry::MEMORY_FREE_PURGED_PAGES_MS> timer;
> > > +      jemalloc_purge_freed_pages();
> > 
> > Does this link on Linux?
> 
> No, but it's mac-only code.

Oh, this is inside a bigger block that's Mac only?  Carry on then.
https://hg.mozilla.org/mozilla-central/rev/2da8044c4ad1
https://hg.mozilla.org/mozilla-central/rev/ead5abe2b5b9
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Whiteboard: [MemShrink:P1][see comment 5][inbound] → [MemShrink:P1][see comment 5]
Target Milestone: --- → mozilla10
coming back to this bug, should we be collecting RSS from pageloader inside of firefox for all versions of OSX or just 10.5?
Either all versions or just 10.6 and 10.7.  10.5 is the only one which *doesn't* have jemalloc on by default.

I'd use the pageloader for everything if we could verify that the 10.5 numbers match with and without pageloader.
In order to collect this and not interfere with other tests (and to satisfy a requirement to run side by side for a week), I have created a tp5r test.  This is a test for the --rss flag internally.  from a graph server perspective, we can make this look like tp5.

How this will work is we would run tp5r for all osx platforms, and turn those platforms off for tp5 (after the side by side staging).  The numbers reported would be tp5_main_rss_paint instead of tp5_rss_paint.  Otherwise all the other numbers would be the same.  

If this patch looks good, we can add in graph server definitions for tp5r and work on buildbot changes to get this rolled out.  I see results of 159MB of RSS on tbpl for tp5, and in testing this in staging with the tp5r I see about 151MB for RSS.
Attachment #586858 - Flags: review?(jhammel)
Comment on attachment 586858 [details] [diff] [review]
tp5r definition for collecting rss from pageloader for desktop builds (1.0)

looks good
Attachment #586858 - Flags: review?(jhammel) → review+
landed on talos:
http://hg.mozilla.org/build/talos

still need to do graph server stuff and buildbot changes.
patch to update the graph server with definitions for this new test.
Attachment #587068 - Flags: review?(bear)
Attachment #587068 - Flags: review?(bear) → review+
landed on graphs:
http://hg.mozilla.org/graphs/rev/e455bf0cf929

still need to get this deployed to the graph servers, and the buildbot change.
Depends on: 716758
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: