Bug 770612 - On x86/x64 Linux, transparent huge pages (memory pages, not Web pages) cause huge "resident" values

Status: NEW
Whiteboard: [MemShrink:P2]
Product: Core
Classification: Components
Component: Memory Allocator
Version: 16 Branch
Hardware: x86_64 Linux
Importance: -- normal with 1 vote
Assigned To: Nobody; OK to take it and work on it
Depends on: jemalloc3
Reported: 2012-07-03 11:29 PDT by Oliver Henshaw
Modified: 2016-02-13 10:48 PST

Attachments
verbose about:memory showing large resident memory after membench cleanup (154.08 KB, text/plain)
2012-07-03 11:29 PDT, Oliver Henshaw
verbose about:memory from desktop machine (17.86 KB, text/plain)
2012-08-06 08:25 PDT, Oliver Henshaw
verbose about:memory from 16.0a2 on KVM (139.45 KB, text/plain)
2012-08-06 08:33 PDT, Oliver Henshaw
about:memory from Test 1, after stress allocations push firefox into swap (145.58 KB, text/plain)
2012-08-13 07:52 PDT, Oliver Henshaw
about:memory from Test 2, stress allocated memory after firefox reaches minimum memory after all membench tabs closed (138.77 KB, text/plain)
2012-08-13 07:55 PDT, Oliver Henshaw
about:memory from Test 2, firefox resident memory rose after stress allocations freed (138.26 KB, text/plain)
2012-08-13 07:57 PDT, Oliver Henshaw
state of transparent huge pages after boot, before starting firefox (478 bytes, text/plain)
2012-08-22 08:14 PDT, Oliver Henshaw
state of transparent huge pages after starting firefox (68.70 KB, text/plain)
2012-08-22 08:16 PDT, Oliver Henshaw
state of transparent huge pages after running membench (185.01 KB, text/plain)
2012-08-22 08:17 PDT, Oliver Henshaw
state of transparent huge pages after running membench, closing all opened tabs and waiting for resident memory to reach a minimum (193.58 KB, text/plain)
2012-08-22 08:18 PDT, Oliver Henshaw
state of transparent huge pages after running membench, closing all opened tabs and waiting for resident memory to rise to 900MB after reaching a minimum (194.13 KB, text/plain)
2012-08-22 08:20 PDT, Oliver Henshaw
state of transparent huge pages after closing firefox (503 bytes, text/plain)
2012-08-22 08:21 PDT, Oliver Henshaw
state of transparent huge pages after closing firefox and waiting a while (503 bytes, text/plain)
2012-08-22 08:22 PDT, Oliver Henshaw

Description Oliver Henshaw 2012-07-03 11:29:22 PDT
Created attachment 638817 [details]
verbose about:memory showing large resident memory after membench cleanup

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:13.0) Gecko/20100101 Firefox/13.0.1
Build ID: 20120616215734

Steps to reproduce:

I ran MemBench from http://gregor-wagner.com/tmp/mem and clicked "Close windows opened during the test". This closed all the tabs and RSS dropped to about 450MB (observed with top) and then started growing again.

After it topped 1 GB RSS I took an about:memory snapshot (see attached file). I can't see any entries larger than 238MB in "Explicit Allocations" but "Resident Set Size" and "Proportional Set Size" were both around 1.2GB.

This was on a KVM virtual machine installed from the Fedora 17 LiveCD and updated to firefox-13.0-1.fc17.x86_64.
Comment 1 Justin Lebar (not reading bugmail) 2012-07-03 12:38:17 PDT
Yeouch.

The usual questions:

 * Did you have any extensions enabled?
 * Can you reproduce on a nightly build?

Also, do you have a non-KVM system to test on?  I don't know what kinds of changes the paravirtualization driver makes to the kernel, but it's quite likely that it's messing with VM routines, and that could conceivably be causing the problem.

In particular, I see

  333,373,440 B ── heap-committed
1,732,527,240 B ── heap-unallocated
1,224,572,928 B ── resident

If the heap was not being decommitted as we expect, that could be responsible for a lot of your RSS.
Comment 2 Nicholas Nethercote [:njn] 2012-07-03 18:11:12 PDT
I couldn't reproduce this on my 64-bit Ubuntu 12.04 box, with a trunk build and a profile that has PDF.js as the only add-on.  Resident peaked at 2330MB with 150 tabs open, and then dropped to and stayed at around 400MB.

I too suspect the virtual machine.
Comment 3 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-07-10 09:21:04 PDT
Oliver, can you please follow up with Justin's request in comment 1?
Comment 4 Nicholas Nethercote [:njn] 2012-07-10 17:05:49 PDT
I'm resolving this as incomplete.  Oliver, if you can get the requested information and it's still a problem, feel free to reopen this.  Thanks.
Comment 5 Oliver Henshaw 2012-08-06 08:25:04 PDT
Created attachment 649280 [details]
verbose about:memory from desktop machine

I managed to reproduce this on a desktop machine (with Firefox 13): memory use dropped to a minimum of 366MB and then climbed to around 620MB before seeming to stabilise. The behaviour has seemed intermittent at times, but I obtained more reliable results when I waited for all tabs to finish loading, i.e. for the favicons to stop spinning, not just for the Finished button to appear; this can take tens of minutes. You may then need to wait a few minutes after closing all tabs (and seeing memory stabilise just above its minimum value) for memory to begin to rise again.

The amount of free memory may be a factor: the KVM machines have 4GB and the host machine has 8GB (minus memory reserved for integrated graphics), but I obtained clearer results on the host when no other users were logged in. I obtained mixed results on a 4GB (minus graphics memory) Brazos machine, but mainly confined my testing to the more powerful system.

There are no extensions (or Flash plugins) installed for these tests, apart from whatever the Fedora packages provide, and I think that's only langpacks.
Comment 6 Oliver Henshaw 2012-08-06 08:33:35 PDT
Created attachment 649283 [details]
verbose about:memory from 16.0a2 on KVM

I then re-tested in KVM with the official tarballs of 13.0, 14.0.1, 15.0b3, 16.0a2 and reproduced this issue on all of tI then re-tested in KVM with the official tarballs of 13.0, 14.0.1, 15.0b3, 16.0a2 and reproduced this issue on all of them.
Comment 7 Justin Lebar (not reading bugmail) 2012-08-06 08:40:21 PDT
> I then re-tested in KVM with the official tarballs of 13.0, 14.0.1, 15.0b3, 16.0a2 and reproduced 
> this issue on all of tI then re-tested in KVM with the official tarballs of 13.0, 14.0.1, 15.0b3, 
> 16.0a2 and reproduced this issue on all of them.

What's the difference between these two sets of tests?  Was one outside KVM?
Comment 8 Justin Lebar (not reading bugmail) 2012-08-06 08:47:18 PDT
(In reply to Oliver Henshaw from comment #6)
> Created attachment 649283 [details]
> verbose about:memory from 16.0a2 on KVM

Thanks for expanding the PSS tree in this one.

It looks like all the memory usage is coming from the malloc or JS heaps (anonymous mappings).  But both of those heaps claim to be much smaller than the PSS shows.

I wonder if your kernel is doing something funky with madvise(MADV_DONTNEED).  What distro + kernel are the VMs and host running?

Could you try running a program which eats up all your RAM and see if that causes Firefox's memory usage to decrease?  Here's one you could use:

#include <stdio.h>
#include <stdlib.h>

int main()
{
  while (1) {
    char* buf = malloc(1024 * 1024);  /* grab 1MB at a time */
    if (!buf) {
      break;  /* malloc failed: memory is exhausted */
    }
    /* Touch one byte per 1KB so the kernel actually backs the
       allocation with physical pages. */
    for (int i = 0; i < 1024 * 1024; i += 1024) {
      buf[i] = i;
    }
  }
  fprintf(stderr, "Done allocating a bunch of memory.\n");
  return 0;
}
Comment 9 Oliver Henshaw 2012-08-06 09:00:02 PDT
(In reply to Justin Lebar [:jlebar] from comment #7)
> What's the difference between these two sets of tests?  Was one outside KVM?

Sorry, this was me pasting the text into the comment box twice and not realising. The correct text is simply:

"I then re-tested in KVM with the official tarballs of 13.0, 14.0.1, 15.0b3, 16.0a2 and reproduced this issue on all of them."
Comment 10 Nicholas Nethercote [:njn] 2012-08-07 16:13:08 PDT
Marking with MemShrink:P3 because this seems to be an obscure kernel issue and not widely experienced.  We can re-prioritize if we get more data.
Comment 11 Oliver Henshaw 2012-08-13 07:39:44 PDT
This is on a Fedora 17 live image + updates system, nothing particularly exotic. Looking through the yum history on the VM where I first discovered this, I'm guessing it was running kernel-3.3.4-5.fc17.x86_64; the current test VM has kernel-3.4.3-1.fc17.x86_64.

I ran two tests with 'stress' - http://weather.ou.edu/~apw/projects/stress/ - which I used to allocate (and hold on to) chunks of 256MB, to see whether it affected the results:

Test 1: I allocated 512MB before starting membench and started allocating further chunks of memory once firefox 'resident' had climbed to nearly 1GB. I allocated around 3.25GB and pushed firefox well into swap, but it didn't seem to drop any memory.

Test 2: I waited for firefox to hit its minimum of around 400MB after closing all membench tabs and then allocated 3GB with stress (this pushed the system a little into swap; perhaps I should have been more gentle). I left it for a while, and resident memory and swap both seemed to be stable. But once I killed 'stress' and freed the 3GB, firefox's resident memory started to rise again.
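
For anyone reproducing without 'stress', a minimal C equivalent of these runs might look like the sketch below (the command-line chunk count is a hypothetical convenience; memory is touched so it becomes resident, and held until the process is killed):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char** argv)
{
  /* Number of 256MB chunks to hold; default 12 (= 3GB), as in Test 2. */
  int chunks = argc > 1 ? atoi(argv[1]) : 12;
  size_t chunk_size = 256 * 1024 * 1024;
  for (int i = 0; i < chunks; i++) {
    char* buf = malloc(chunk_size);
    if (!buf) {
      fprintf(stderr, "malloc failed at chunk %d\n", i);
      break;
    }
    memset(buf, 1, chunk_size);  /* touch every page so it is resident */
  }
  fprintf(stderr, "Holding memory; kill the process to free it.\n");
  pause();  /* block until a signal arrives */
  return 0;
}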
Comment 12 Oliver Henshaw 2012-08-13 07:52:55 PDT
Created attachment 651385 [details]
about:memory from Test 1, after stress allocations push firefox into swap
Comment 13 Oliver Henshaw 2012-08-13 07:55:17 PDT
Created attachment 651387 [details]
about:memory from Test 2, stress allocated memory after firefox reaches minimum memory after all membench tabs closed
Comment 14 Oliver Henshaw 2012-08-13 07:57:13 PDT
Created attachment 651388 [details]
about:memory from Test 2, firefox resident memory rose after stress allocations freed
Comment 15 Justin Lebar (not reading bugmail) 2012-08-13 07:58:36 PDT
Random guess, but I wonder if transparent superpages are killing us.

LWN [1] tells me you can disable it by setting /sys/kernel/mm/transparent_hugepage/enabled to "never". I'm not sure how to see how many huge pages the Firefox process has (it's probably somewhere in /proc), but |hugeadm --pool-list| should tell you how superpages are allocated on your system...

[1] http://lwn.net/Articles/423584/
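
For what it's worth, per-process THP usage does show up in /proc on THP-capable kernels: each mapping in /proc/<pid>/smaps carries an AnonHugePages field. A rough sketch that sums it for the current process, assuming a kernel new enough to expose the field:

#include <stdio.h>

int main()
{
  FILE* f = fopen("/proc/self/smaps", "r");
  if (!f) {
    perror("fopen");
    return 1;
  }
  char line[256];
  long total_kb = 0, kb;
  while (fgets(line, sizeof(line), f)) {
    /* Lines look like "AnonHugePages:      2048 kB". */
    if (sscanf(line, "AnonHugePages: %ld kB", &kb) == 1) {
      total_kb += kb;
    }
  }
  fclose(f);
  printf("AnonHugePages total: %ld kB\n", total_kb);
  return 0;
}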
Comment 16 Justin Lebar (not reading bugmail) 2012-08-13 08:03:06 PDT
I'm beginning to strongly suspect something in the OS is causing this, based upon the results from your test 2.

Notice that when we move from attachment 651387 [details] to attachment 651388 [details], swap stays roughly the same, explicit (memory Firefox is aware of allocating) stays roughly the same, and RSS goes from 400mb to 800mb!

Anyway, we still need to figure it out...
Comment 17 Oliver Henshaw 2012-08-22 08:02:33 PDT
Transparent huge pages do seem to be the culprit. I couldn't reproduce the problem after disabling THP with "echo never >/sys/kernel/mm/transparent_hugepage/enabled".

I then ran membench again with THP enabled and printed out huge page statistics at significant points, in the hope that this will provide some insight (see following attachments).
Comment 18 Oliver Henshaw 2012-08-22 08:14:58 PDT
Created attachment 654211 [details]
state of transparent huge pages after boot, before starting firefox
Comment 19 Oliver Henshaw 2012-08-22 08:16:31 PDT
Created attachment 654214 [details]
state of transparent huge pages after starting firefox
Comment 20 Oliver Henshaw 2012-08-22 08:17:13 PDT
Created attachment 654215 [details]
state of transparent huge pages after running membench
Comment 21 Oliver Henshaw 2012-08-22 08:18:58 PDT
Created attachment 654218 [details]
state of transparent huge pages after running membench, closing all opened tabs and waiting for resident memory to reach a minimum
Comment 22 Oliver Henshaw 2012-08-22 08:20:24 PDT
Created attachment 654219 [details]
state of transparent huge pages after running membench, closing all opened tabs and waiting for resident memory to rise to 900MB after reaching a minimum
Comment 23 Oliver Henshaw 2012-08-22 08:21:23 PDT
Created attachment 654220 [details]
state of transparent huge pages after closing firefox
Comment 24 Oliver Henshaw 2012-08-22 08:22:06 PDT
Created attachment 654221 [details]
state of transparent huge pages after closing firefox and waiting a while
Comment 25 Justin Lebar (not reading bugmail) 2012-08-22 08:47:43 PDT
Marking for re-triage based on comment 17: AIUI THP is scheduled to be a mainline Linux feature, so I'd eventually expect to see this problem across most desktop Linux distros.
Comment 26 Justin Lebar (not reading bugmail) 2012-08-22 08:55:37 PDT
One way forward with this bug is to upgrade to jemalloc3 and re-measure with THP.

If THP is still killing us, then presumably we'll need to modify upstream jemalloc.
Comment 27 Nicholas Nethercote [:njn] 2012-09-04 16:11:01 PDT
We should find out if we can disable THP on a per-process basis.
Comment 28 Justin Lebar (not reading bugmail) 2012-09-04 18:41:52 PDT
Aha, we can madvise our way out of this!  There's an MADV_NOHUGEPAGE flag.
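
A minimal sketch of that advice on an anonymous mapping, assuming Linux 2.6.38+ where MADV_NOHUGEPAGE exists (the real jemalloc integration would apply it to the allocator's own mappings):

#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

int main()
{
  size_t len = 4 * 1024 * 1024;  /* a 4MB region, purely for illustration */
  void* p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) {
    perror("mmap");
    return 1;
  }
  /* Ask the kernel not to back this range with transparent huge pages.
     Fails with EINVAL on kernels built without THP support. */
  if (madvise(p, len, MADV_NOHUGEPAGE) != 0) {
    perror("madvise(MADV_NOHUGEPAGE)");
  }
  return 0;
}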
Comment 29 Karl Tomlinson (ni?:karlt) 2012-09-04 19:17:39 PDT
Is this actually a real problem?

If there is unused memory and it is migrated to huge pages to reduce TLB misses, isn't that a good thing?

It would be a real problem if splitting huge pages into small pages, but I imagine that is much easier/faster than gathering the huge pages.

I assume khugepaged has chosen to use huge pages because there are multiple small pages getting covered by the new huge page and so it believes there is benefit to the migration.
Comment 30 Karl Tomlinson (ni?:karlt) 2012-09-04 19:18:38 PDT
(In reply to Karl Tomlinson (:karlt) from comment #29)
> It would be a real problem if splitting huge pages into small pages

... were slow ...

> but I imagine that is much easier/faster than gathering the huge pages.
Comment 31 Justin Lebar (not reading bugmail) 2012-09-04 19:31:42 PDT
It's a problem if, upon memory pressure, the kernel doesn't split the huge pages to reclaim memory.  That's test 1 in comment 11.
Comment 32 Karl Tomlinson (ni?:karlt) 2012-09-04 19:56:50 PDT
(In reply to Justin Lebar [:jlebar] from comment #31)
> It's a problem if, upon memory pressure, the kernel doesn't split the huge
> pages to reclaim memory.  That's test 1 in comment 11.

Yes, that would be a problem, but http://lwn.net/Articles/423584/ says
"Rather than complicate the swapping code with an understanding of huge pages, Andrea simply splits a huge page back into its component small pages if that page needs to be reclaimed."

(In reply to Oliver Henshaw from comment #11)
> Test 1: I allocated 512MB before starting membench and started allocating
> further chunks of memory once firefox 'resident' had climbed to nearly 1GB.
> I allocated around 3.25GB and pushed firefox well into swap, but it didn't
> seem to drop any memory.

1GB + 3.25GB > 4GB, so I assume the rss went down.
But what kind of memory measurement didn't drop?
The per-process swap measurements that I've seen don't distinguish physical swap from virtual.
Was total system swap usage more than 250MB greater in test 1 than test 2?
Comment 33 Justin Lebar (not reading bugmail) 2012-09-04 20:08:14 PDT
> But what kind of memory measurement didn't drop?

You can measure RSS + space-on-swap.  That should drop upon memory pressure, as we break up huge pages.

But the point is, Firefox shouldn't have been pushed out of main memory in the first place until all its huge pages were broken up.  Firefox should only be taking up ~400mb without THP, and 3.25GB + .4GB + the rest of the system ~= 4gb.  So merely the fact that the kernel swapped Firefox onto disk (instead of splitting its pages) is indicative of a problem.

From your summary of the LWN article, it sounds like we might split a huge page into smaller pages before swapping it out to disk, but what if we keep a huge page in memory and swap the normal-sized pages to disk?
Comment 34 Karl Tomlinson (ni?:karlt) 2012-09-04 21:36:38 PDT
(In reply to Justin Lebar [:jlebar] from comment #33)
> > But what kind of memory measurement didn't drop?
> 
> You can measure RSS + space-on-swap.

I'd like to know how to do that.  Output from top includes virtual memory.
"size" from ps says "This number is very rough!" and it looks like it includes a lot of virtual memory.

> But the point is, Firefox shouldn't have been pushed out of main memory in
> the first place until all its huge pages were broken up.  Firefox should
> only be taking up ~400mb without THP, and 3.25GB + .4GB + the rest of the
> system ~= 4gb.  So merely the fact that the kernel swapped Firefox onto disk
> (instead of splitting its pages) is indicative of a problem.

The kernel was swapping in Test 2 also, apparently even before huge pages were allocated, and even with less stress, so it seems not all 4GB is available for Firefox and stress.

> From your summary of the LWN article, it sounds like we might split a huge
> page into smaller pages before swapping it out to disk, but what if we keep
> a huge page in memory and swap to disk the normal-sized pages?

It depends on the implementation and I don't know the details.  The kernel could know/notice that it can/has reclaim(ed) some space merely by splitting the huge page into used and unused small pages, or it may have a naive implementation like you suggest where huge pages are treated just like small pages (except for the means of storing on swap).

I don't think we should be disabling a system-configurable optimization unless we have clear evidence that it is causing a real problem.  Here I'm missing information about a non-hugepage test behaving significantly better.
Comment 35 Justin Lebar (not reading bugmail) 2012-09-05 06:36:16 PDT
> > You can measure RSS + space-on-swap.
> I'd like to know how to do that.

about:memory does so by parsing /proc/pid/smaps.  There's probably a more direct way if you're only interested in the total swap.
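
A minimal sketch of that parsing, summing the Rss and Swap fields of every mapping in /proc/<pid>/smaps (reading another process's smaps needs the same UID or root; taking the pid from the command line is just an illustrative convenience):

#include <stdio.h>

int main(int argc, char** argv)
{
  char path[64];
  snprintf(path, sizeof(path), "/proc/%s/smaps", argc > 1 ? argv[1] : "self");
  FILE* f = fopen(path, "r");
  if (!f) {
    perror("fopen");
    return 1;
  }
  char line[256];
  long rss_kb = 0, swap_kb = 0, kb;
  while (fgets(line, sizeof(line), f)) {
    if (sscanf(line, "Rss: %ld kB", &kb) == 1)
      rss_kb += kb;
    else if (sscanf(line, "Swap: %ld kB", &kb) == 1)
      swap_kb += kb;
  }
  fclose(f);
  printf("rss = %ld kB, swap = %ld kB, rss+swap = %ld kB\n",
         rss_kb, swap_kb, rss_kb + swap_kb);
  return 0;
}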

> I don't think we should be disabling a system-configurable optimization unless we have clear 
> evidence that it is causing a real problem.

Would the following test convince you?

 1) With THP disabled, load membench, close all tabs, open about:memory, click "minimize memory usage", let this value be A.
 2) With THP enabled, repeat step (1) and let the value be B.  We expect B ~= A.
 3) Leave the session open for a while.  We expect its RSS to grow as the kernel coalesces pages.  Let the resultant RSS be C.  We expect C >> B.
 4) Allocate a lot of memory on the system, until Firefox is pushed into swap.  At this point, let Dc and Ds be the amount of core memory and swap space we're using for Firefox, respectively.

We agree that Firefox should not start swapping until all its huge pages have been split, right?  In that case, if the kernel's optimization is benign, Dc + Ds ~= B.  On the other hand, if Dc + Ds >  B, then the kernel is not splitting some huge pages before swapping Firefox out to disk.

If you agree that this is a reasonable experiment, we just need to find someone to run it.  :)
Comment 36 Karl Tomlinson (ni?:karlt) 2012-09-05 16:33:47 PDT
(In reply to Justin Lebar [:jlebar] from comment #35)
> We agree that Firefox should not start swapping until all its huge pages
> have been split, right?

At least the huge pages that represent many unused small pages, yes.
There is some value in keeping huge pages that are mostly used.

> In that case, if the kernel's optimization is
> benign, Dc + Ds ~= B.  On the other hand, if Dc + Ds >  B, then the kernel
> is not splitting some huge pages before swapping Firefox out to disk.
> 
> If you agree that this is a reasonable experiment,

This sounds like as good an experiment as I can imagine.
Can swap cache be separated from Ds (or Dc) so that pages are not counted twice?

There may be some Firefox memory that is no longer being used and so deserves to be in swap if there is something more recent using more than 3/4 available memory.
However, we touch most of our pages pretty often, so if Dc + Ds ≫ B, that would be a bad sign.
Comment 37 Nicholas Nethercote [:njn] 2012-11-16 19:18:15 PST
> I don't think we should be disabling a system-configurable optimization
> unless we have clear evidence that it is causing a real problem.

If THPs cause measured RSS to be much larger than normal and have no other effect, that alone is a real problem.  We rely on accurate memory measurements.

Bug 811228 and bug 812704 may be related to this.  However, they both feature high memory consumption combined with a high "heap-unclassified" value, whereas this bug doesn't have a high "heap-unclassified" value.
Comment 38 Justin Lebar (not reading bugmail) 2012-11-16 21:41:18 PST
> However, they both feature high memory consumption combined with a high "heap-unclassified" value, 
> whereas this bug doesn't have a high "heap-unclassified" value.

Yeah, I don't see how THP could affect the heap-unclassified value one way or another.  heap-unclassified is computed without any interaction with the kernel.
Comment 39 Nicholas Nethercote [:njn] 2012-11-17 19:33:42 PST
*** Bug 812704 has been marked as a duplicate of this bug. ***
Comment 40 Justin Lebar (not reading bugmail) 2013-02-06 14:56:12 PST
We're going to downgrade this to a P2 because it doesn't seem like this is affecting a lot of distros.  In fact, it's not clear if non-server distros will even be able to turn THP on, since other apps may have this same bug.
