Open Bug 770612 Opened 12 years ago Updated 2 years ago

On x86/x64 Linux, transparent huge pages (memory pages, not Web pages) cause huge "resident" values

Categories

(Core :: Memory Allocator, defect)

Version: 16 Branch
Hardware: x86_64
OS: Linux

People

(Reporter: oliver.henshaw, Unassigned)

References

Details

(Whiteboard: [MemShrink:P2])

Attachments

(13 files)

154.08 KB, text/plain
17.86 KB, text/plain
139.45 KB, text/plain
145.58 KB, text/plain
138.77 KB, text/plain
138.26 KB, text/plain
478 bytes, text/plain
68.70 KB, text/plain
185.01 KB, text/plain
193.58 KB, text/plain
194.13 KB, text/plain
503 bytes, text/plain
503 bytes, text/plain
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:13.0) Gecko/20100101 Firefox/13.0.1
Build ID: 20120616215734

Steps to reproduce:

I ran MemBench from http://gregor-wagner.com/tmp/mem and clicked "Close windows opened during the test". This closed all the tabs and RSS dropped to about 450MB (observed with top) and then started growing again.

After RSS topped 1 GB I took an about:memory snapshot (see attached file). I couldn't see any entries larger than 238MB under "Explicit Allocations", but "Resident Set Size" and "Proportional Set Size" were both around 1.2GB.

This was on a KVM virtual machine installed from the Fedora 17 LiveCD and updated to firefox-13.0-1.fc17.x86_64.
Whiteboard: [MemShrink]
Yeouch.

The usual questions:

 * Did you have any extensions enabled?
 * Can you reproduce on a nightly build?

Also, do you have a non-KVM system to test on?  I don't know what kinds of changes the paravirtualization driver makes to the kernel, but it's quite likely that it's messing with VM routines, and that could conceivably be causing the problem.

In particular, I see

  333,373,440 B ── heap-committed
1,732,527,240 B ── heap-unallocated
1,224,572,928 B ── resident

If the heap was not being decommitted as we expect, that could be responsible for a lot of your RSS.
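
(For context, "decommitting" heap pages on Linux boils down to something like the sketch below. This is an illustration of the general technique, not the actual mozjemalloc code:)

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
  const size_t len = 64 * 1024 * 1024;  /* pretend this is a 64MB heap chunk */

  void* p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) {
    perror("mmap");
    return 1;
  }

  memset(p, 1, len);  /* use the chunk: RSS grows by ~64MB */

  /* "Decommit": the address range stays reserved, but the kernel is free to
     reclaim the physical pages, so RSS should drop back down. */
  if (madvise(p, len, MADV_DONTNEED) != 0) {
    perror("madvise(MADV_DONTNEED)");
  }

  munmap(p, len);
  return 0;
}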
I couldn't reproduce this on my 64-bit Ubuntu 12.04 box, with a trunk build and a profile that has PDF.js as the only add-on.  Resident peaked at 2330MB with 150 tabs open, and then dropped to and stayed at around 400MB.

I too suspect the virtual machine.
Oliver, can you please follow up with Justin's request in comment 1?
I'm resolving this as incomplete.  Oliver, if you can get the requested information and it's still a problem, feel free to reopen this.  Thanks.
Status: UNCONFIRMED → RESOLVED
Closed: 12 years ago
Resolution: --- → INCOMPLETE
I managed to reproduce this on a desktop machine (with firefox 13) - memory use dropped to a minimum of 366MB and then climbed to around 620MB before seeming to stabilise. This has seemed intermittent at times, but I obtained more reliable results when I waited for all tabs to finish loading, i.e. for the favicons to stop spinning, not just for the Finished button to appear (this can take tens of minutes). Then you may need to wait a few minutes after closing all tabs (and seeing memory stabilise just above its minimum value) for memory to begin to rise again.

Amount of free memory may be a factor: the KVM machines have 4GB and the host machine has 8GB (minus memory reserved for integrated graphics), but I obtained clearer results on the host when no other users were logged in. I obtained mixed results on a 4GB (minus graphics memory) Brazos machine but mainly confined my testing to the more powerful system.

There are no extensions (or Flash plugins) installed for these tests, apart from whatever the Fedora packages add - and I think that's only langpacks.
I then re-tested in KVM with the official tarballs of 13.0, 14.0.1, 15.0b3, 16.0a2 and reproduced this issue on all of tI then re-tested in KVM with the official tarballs of 13.0, 14.0.1, 15.0b3, 16.0a2 and reproduced this issue on all of them.
Status: RESOLVED → UNCONFIRMED
Resolution: INCOMPLETE → ---
Version: 13 Branch → 16 Branch
> I then re-tested in KVM with the official tarballs of 13.0, 14.0.1, 15.0b3, 16.0a2 and reproduced 
> this issue on all of tI then re-tested in KVM with the official tarballs of 13.0, 14.0.1, 15.0b3, 
> 16.0a2 and reproduced this issue on all of them.

What's the difference between these two sets of tests?  Was one outside KVM?
(In reply to Oliver Henshaw from comment #6)
> Created attachment 649283 [details]
> verbose about:memory from 16.0a2 on KVM

Thanks for expanding the PSS tree in this one.

It looks like all the memory usage is coming from the malloc or JS heaps (anonymous mappings).  But both of those heaps claim to be much smaller than the PSS shows.

I wonder if your kernel is doing something funky with madvise(MADV_DONTNEED).  What distro + kernel are the VMs and host running?

Could you try running a program which eats up all your RAM and see if that causes Firefox's memory usage to decrease?  Here's one you could use:

#include <stdio.h>
#include <stdlib.h>

int main()
{
  /* Keep allocating 1MB blocks until malloc fails. */
  while (1) {
    char* buf = malloc(1024 * 1024);
    if (!buf) {
      break;
    }
    /* Write to the block every 1KB so each page is actually backed by RAM. */
    for (int i = 0; i < 1024 * 1024; i += 1024) {
      buf[i] = i;
    }
  }
  fprintf(stderr, "Done allocating a bunch of memory.\n");
  return 0;
}
(In reply to Justin Lebar [:jlebar] from comment #7)
> What's the difference between these two sets of tests?  Was one outside KVM?

Sorry, this was me pasting the text into the comment box twice and not realising. The correct text is simply:

"I then re-tested in KVM with the official tarballs of 13.0, 14.0.1, 15.0b3, 16.0a2 and reproduced this issue on all of them."
Marking with MemShrink:P3 because this seems to be an obscure kernel issue and not widely experienced.  We can re-prioritize if we get more data.
Whiteboard: [MemShrink] → [MemShrink:P3]
This is on a Fedora 17 live image + updates system, nothing particularly exotic. Looking through the yum history on the VM where I first discovered this, I'm guessing it was running kernel-3.3.4-5.fc17.x86_64; the current test VM has kernel-3.4.3-1.fc17.x86_64.

I ran two tests with 'stress' - http://weather.ou.edu/~apw/projects/stress/ - which I used to allocate (and hold on to) chunks of 256MB to see if that affected the results:

Test 1: I allocated 512MB before starting membench and started allocating further chunks of memory once firefox 'resident' had climbed to nearly 1GB. I allocated around 3.25GB and pushed firefox well into swap, but it didn't seem to drop any memory.

Test 2: I waited for firefox to hit its minimum of around 400MB after closing all membench tabs and then allocated 3GB with stress (this pushed the system a little into swap, perhaps I should have been more gentle). I left it for a while and resident memory and swap both seemed to be stable. But once I killed 'stress' and freed the 3GB then firefox resident memory started to rise again.
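
(For reference, what 'stress' was doing in these tests amounts to something like the sketch below - a minimal stand-in, not the actual stress source; the 256MB chunk size and 3GB total are just the numbers from test 2:)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
  const size_t chunk = 256 * 1024 * 1024;  /* 256MB per allocation */
  const int nchunks = 12;                  /* 12 x 256MB = 3GB held at once */

  for (int i = 0; i < nchunks; i++) {
    char* p = malloc(chunk);
    if (!p) {
      fprintf(stderr, "allocation %d failed\n", i);
      break;
    }
    memset(p, 1, chunk);  /* touch every page so it is really resident (or pushed to swap) */
    /* Deliberately never freed: hold the memory until the process is killed. */
  }

  fprintf(stderr, "Holding memory; kill me to release it.\n");
  pause();  /* block until a signal arrives */
  return 0;
}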
Random guess, but I wonder if transparent superpages are killing us.

lwn [1] tells me you can disable it by setting /sys/kernel/mm/transparent_hugepage/enabled to "never".  I'm not sure how to see how many huge pages the Firefox process has (it's probably somewhere in /proc), but |hugeadm --pool-list| should tell you how superpages are allocated in your system...

[1] http://lwn.net/Articles/423584/
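
(As for how many transparent huge pages the Firefox process has: a minimal sketch, assuming the kernel exposes an AnonHugePages field per mapping in /proc/<pid>/smaps, which the kernels mentioned in this bug do:)

#include <stdio.h>

/* Sum the AnonHugePages fields in /proc/<pid>/smaps to get the total amount
   of the process's memory that is backed by transparent huge pages. */
int main(int argc, char** argv)
{
  if (argc != 2) {
    fprintf(stderr, "usage: %s <pid>\n", argv[0]);
    return 1;
  }

  char path[64];
  snprintf(path, sizeof(path), "/proc/%s/smaps", argv[1]);

  FILE* f = fopen(path, "r");
  if (!f) {
    perror("fopen");
    return 1;
  }

  char line[256];
  long kb, total_kb = 0;
  while (fgets(line, sizeof(line), f)) {
    if (sscanf(line, "AnonHugePages: %ld kB", &kb) == 1) {
      total_kb += kb;
    }
  }
  fclose(f);

  printf("AnonHugePages total: %ld kB\n", total_kb);
  return 0;
}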
I'm beginning to strongly suspect something in the OS is causing this, based upon the results from your test 2.

Notice that when we move from attachment 651387 [details] to attachment 651388 [details], swap stays roughly the same, explicit (memory Firefox is aware of allocating) stays roughly the same, and RSS goes from 400mb to 800mb!

Anyway, we still need to figure it out...
Transparent huge pages do seem to be the culprit. I couldn't reproduce the problem after disabling THP with "echo never >/sys/kernel/mm/transparent_hugepage/enabled".

I then ran membench again with THP enabled and printed out huge page statistics at significant points, in the hope that this will provide some insight (see following attachments).
Marking for re-triage based on comment 17: AIUI THP is scheduled to be a mainline Linux feature, so I'd eventually expect to see this problem across most desktop Linux distros.
Summary: resident memory starts growing again after run membench and close all tabs opened during the test. → On x86/x64 Linux, resident memory starts growing again after running membench and closing all tabs opened during the test, due to transparent huge pages.
Whiteboard: [MemShrink:P3] → [MemShrink]
One way forward with this bug is to upgrade to jemalloc3 and re-measure with THP.

If THP is still killing us, then presumably we'll need to modify upstream jemalloc.
Depends on: jemalloc3
We should find out if we can disable THP on a per-process basis.
Whiteboard: [MemShrink] → [MemShrink:P1]
Aha, we can madvise our way out of this!  There's an MADV_NOHUGEPAGE flag.
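
(Roughly, and only as an illustration - a minimal sketch of opting an anonymous mapping out of THP, assuming <sys/mman.h> defines MADV_NOHUGEPAGE (it does with 2.6.38+ kernel headers); this is not an actual jemalloc patch:)

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
  const size_t len = 16 * 1024 * 1024;  /* 16MB, a multiple of the 2MB huge page size */

  /* An anonymous mapping, like the ones backing the malloc and JS heaps. */
  void* p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) {
    perror("mmap");
    return 1;
  }

  /* Ask khugepaged not to collapse this range into transparent huge pages. */
  if (madvise(p, len, MADV_NOHUGEPAGE) != 0) {
    perror("madvise(MADV_NOHUGEPAGE)");
  }

  memset(p, 1, len);  /* the range stays backed by ordinary 4KB pages */

  munmap(p, len);
  return 0;
}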
Is this actually a real problem?

If there is unused memory and it is migrated to huge pages to reduce TLB misses, isn't that a good thing?

It would be a real problem if splitting huge pages into small pages, but I imagine that is much easier/faster than gathering the huge pages.

I assume khugepaged has chosen to use huge pages because there are multiple small pages getting covered by the new huge page and so it believes there is benefit to the migration.
(In reply to Karl Tomlinson (:karlt) from comment #29)
> It would be a real problem if splitting huge pages into small pages

... were slow ...

> but I imagine that is much easier/faster than gathering the huge pages.
It's a problem if, upon memory pressure, the kernel doesn't split the huge pages to reclaim memory.  That's test 1 in comment 11.
(In reply to Justin Lebar [:jlebar] from comment #31)
> It's a problem if, upon memory pressure, the kernel doesn't split the huge
> pages to reclaim memory.  That's test 1 in comment 11.

Yes, that would be a problem, but http://lwn.net/Articles/423584/ says
"Rather than complicate the swapping code with an understanding of huge pages, Andrea simply splits a huge page back into its component small pages if that page needs to be reclaimed."

(In reply to Oliver Henshaw from comment #11)
> Test 1: I allocated 512MB before starting membench and started allocating
> further chunks of memory once firefox 'resident' had climbed to nearly 1GB.
> I allocated around 3.25GB and pushed firefox well into swap, but it didn't
> seem to drop any memory.

1GB + 3.25GB > 4GB, so I assume the rss went down.
But what kind of memory measurement didn't drop?
The per-process swap measurements that I've seen don't distinguish physical swap from virtual.
Was total system swap usage more than 250MB greater in test 1 than test 2?
> But what kind of memory measurement didn't drop?

You can measure RSS + space-on-swap.  That should drop upon memory pressure, as we break up huge pages.

But the point is, Firefox shouldn't have been pushed out of main memory in the first place until all its huge pages were broken up.  Firefox should only be taking up ~400mb without THP, and 3.25GB + .4GB + the rest of the system ~= 4gb.  So merely the fact that the kernel swapped Firefox onto disk (instead of splitting its pages) is indicative of a problem.

From your summary of the LWN article, it sounds like we might split a huge page into smaller pages before swapping it out to disk, but what if we keep the huge pages in memory and swap the normal-sized pages to disk instead?
(In reply to Justin Lebar [:jlebar] from comment #33)
> > But what kind of memory measurement didn't drop?
> 
> You can measure RSS + space-on-swap.

I'd like to know how to do that.  Output from top includes virtual memory.
"size" from ps says "This number is very rough!" and it looks like it includes a lot of virtual memory.

> But the point is, Firefox shouldn't have been pushed out of main memory in
> the first place until all its huge pages were broken up.  Firefox should
> only be taking up ~400mb without THP, and 3.25GB + .4GB + the rest of the
> system ~= 4gb.  So merely the fact that the kernel swapped Firefox onto disk
> (instead of splitting its pages) is indicative of a problem.

The kernel was swapping in Test 2 also, apparently even before huge pages were allocated, and even with less stress, so it seems not all 4GB is available for Firefox and stress.

> From your summary of the LWN article, it sounds like we might split a huge
> page into smaller pages before swapping it out to disk, but what if we keep
> a huge page in memory and swap to disk the normal-sized pages?

It depends on the implementation and I don't know the details. The kernel could notice that it can reclaim some space merely by splitting a huge page into its used and unused small pages, or it may have a naive implementation like you suggest, where huge pages are treated just like small pages (except for how they are stored on swap).

I don't think we should be disabling a system-configurable optimization unless we have clear evidence that it is causing a real problem.  Here I'm missing information about a non-hugepage test behaving significantly better.
> > You can measure RSS + space-on-swap.
> I'd like to know how to do that.

about:memory does so by parsing /proc/pid/smaps.  There's probably a more direct way if you're only interested in the total swap.
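
(A minimal sketch of that more direct route, assuming the kernel reports VmRSS and VmSwap in /proc/<pid>/status, which reasonably recent kernels do:)

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Sum VmRSS + VmSwap (in kB) for one process, read from /proc/<pid>/status. */
static long rss_plus_swap_kb(long pid)
{
  char path[64], line[256];
  long rss = 0, swap = 0;

  snprintf(path, sizeof(path), "/proc/%ld/status", pid);
  FILE* f = fopen(path, "r");
  if (!f)
    return -1;

  while (fgets(line, sizeof(line), f)) {
    sscanf(line, "VmRSS: %ld", &rss);   /* resident set size */
    sscanf(line, "VmSwap: %ld", &swap); /* anonymous memory swapped out */
  }
  fclose(f);
  return rss + swap;
}

int main(int argc, char** argv)
{
  long pid = argc > 1 ? atol(argv[1]) : (long)getpid();
  printf("RSS + swap for pid %ld: %ld kB\n", pid, rss_plus_swap_kb(pid));
  return 0;
}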

> I don't think we should be disabling a system-configurable optimization unless we have clear 
> evidence that it is causing a real problem.

Would the following test convince you?

 1) With THP disabled, load membench, close all tabs, open about:memory, click "minimize memory usage", let this value be A.
 2) With THP enabled, repeat step (1) and let the value be B.  We expect B ~= A.
 3) Leave the session open for a while.  We expect its RSS to grow as the kernel coalesces pages.  Let the resultant RSS be C.  We expect C >> B.
 4) Allocate a lot of memory on the system, until Firefox is pushed into swap.  At this point, let Dc and Ds be the amount of core memory and swap space we're using for Firefox, respectively.

We agree that Firefox should not start swapping until all its huge pages have been split, right?  In that case, if the kernel's optimization is benign, Dc + Ds ~= B.  On the other hand, if Dc + Ds >  B, then the kernel is not splitting some huge pages before swapping Firefox out to disk.
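
(Purely illustrative numbers to make the inequality concrete: if B ~= 400mb, and after the squeeze Firefox shows Dc = 300mb resident plus Ds = 500mb on swap, then Dc + Ds = 800mb >> B, which would mean the kernel swapped huge pages out without splitting them first.)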

If you agree that this is a reasonable experiment, we just need to find someone to run it.  :)
(In reply to Justin Lebar [:jlebar] from comment #35)
> We agree that Firefox should not start swapping until all its huge pages
> have been split, right?

At least the huge pages that represent many unused small pages, yes.
There is some value in keeping huge pages that are mostly used.

> In that case, if the kernel's optimization is
> benign, Dc + Ds ~= B.  On the other hand, if Dc + Ds >  B, then the kernel
> is not splitting some huge pages before swapping Firefox out to disk.
> 
> If you agree that this is a reasonable experiment,

This sounds like as good an experiment as I can imagine.
Can swap cache be separated from Ds (or Dc) so that pages are not counted twice?

There may be some Firefox memory that is no longer being used and so deserves to be in swap if there is something more recent using more than 3/4 available memory.
However we touch most of our pages pretty often so if Dc + Ds ≫ B, then that would be a bad sign.
> I don't think we should be disabling a system-configurable optimization
> unless we have clear evidence that it is causing a real problem.

If THPs cause measured RSS to be much larger than normal and have no other effect, that alone is a real problem.  We rely on accurate memory measurements.

Bug 811228 and bug 812704 may be related to this.  However, they both feature high memory consumption combined with a high "heap-unclassified" value, whereas this bug doesn't have a high "heap-unclassified" value.
> However, they both feature high memory consumption combined with a high "heap-unclassified" value, 
> whereas this bug doesn't have a high "heap-unclassified" value.

Yeah, I don't see how THP could affect the heap-unclassified value one way or another.  heap-unclassified is computed without any interaction with the kernel.
Summary: On x86/x64 Linux, resident memory starts growing again after running membench and closing all tabs opened during the test, due to transparent huge pages. → On x86/x64 Linux, transparent huge pages cause huge "resident" values
Whiteboard: [MemShrink:P1] → [MemShrink:P2]
We're going to downgrade this to a P2 because it doesn't seem like this is affecting a lot of distros.  In fact, it's not clear if non-server distros will even be able to turn THP on, since other apps may have this same bug.
Summary: On x86/x64 Linux, transparent huge pages cause huge "resident" values → On x86/x64 Linux, transparent huge pages (memory pages, not Web pages) cause huge "resident" values
Status: UNCONFIRMED → NEW
Component: Untriaged → Memory Allocator
Ever confirmed: true
Product: Firefox → Core
Severity: normal → S3