Our heap gets fragmented after a bunch of tabs are closed.
See http://areweslimyet.com/ -- the red line is roughly twice the light blue/green lines; this is due to heap fragmentation. We don't really understand where this is coming from or what we can do about it.
Created attachment 615535 [details] [diff] [review]
Force jemalloc's stats printing on, and include "waste" in the bin stats.
Some instrumentation which lets you see which size classes are responsible for wasted space.
Here's the output from a short browsing session with this patch applied.
The table is (size class, wasted space [MB]).
Created attachment 615558 [details] [diff] [review]
Print detailed malloc stats (and include "waste") every time jemalloc_stats are run. (v2)
Created attachment 615559 [details] [diff] [review]
Print a stack trace on every allocation.
The data from comment 2, ordered by waste amount:
*** Bug 636220 has been marked as a duplicate of this bug. ***
*** Bug 637449 has been marked as a duplicate of this bug. ***
*** Bug 676007 has been marked as a duplicate of this bug. ***
How about investigating this with the new jemalloc?
(In reply to Mike Hommey [:glandium] from comment #9)
> How about investigating this with the new jemalloc?
Yeah, at the moment I think we can get somewhere by looking just at the allocation sites. But if I really dig into the allocator's behavior, it would be worthwhile to use the new one, for sure.
Created attachment 616044 [details]
1024-byte allocation sites
Created attachment 616045 [details]
2048-byte allocation sites
Mostly the HTML5 parser.
I wonder: if these 1024- and 2048-byte allocations contribute to fragmentation when we close lots of tabs, that means they are long-lived. Making them bigger is likely to increase memory footprint.
(In reply to Mike Hommey [:glandium] from comment #13)
> I wonder. If these 1024 and 2048 allocations contribute to fragmentation,
> when we close lots of tabs, it means they are long-lived. Making them bigger
> is likely to increase memory footprint.
Indeed it would, unless we usually allocate N 2048-byte chunks, and we'd be switching to N/2 4096-byte chunks. For example, the NSS PL_Arenas are, it seems, usually larger than 2048 bytes, so increasing the chunk size there shouldn't have much of an impact on memory usage. I don't know about SQLite or the HTML5 parser.
Alternatively (but less likely), we could reduce fragmentation by changing the size of some short- or medium-lived chunks which get allocated in-between long-lived chunks, spreading the long-lived chunks out.
> For example, the NSS PL_Arenas are, it
> seems, usually larger than 2048 bytes, so increasing the chunk size there
> shouldn't have much of an impact on memory usage.
I did some instrumentation of them and saw that they often were not larger than 2048 bytes :/
So it just allocates a bunch of separate arenas?
Yes, it seemed to. Enough so that I stopped looking at it closely.
Created attachment 616446 [details]
1024-byte allocs (with lifetimes)
The lifetime field is measured in "number of X-byte mallocs" -- that is, if an allocation has lifetime 10, it survived 10 further X-byte mallocs before being freed.
A lifetime of inf means the object was never freed (I believe that when I ran the browser to collect this data, I killed it after GC/CC'ing rather than shutting down cleanly). We exclude |inf| values when computing the mean lifetime.
I'm not handling realloc, which may throw this data off. But I verified a few points by hand, so I'm reasonably confident the numbers are meaningful, modulo that.
Created attachment 616447 [details]
2048-byte allocs (with lifetimes)
Created attachment 616468 [details]
I tried changing jemalloc so that any allocation request in the range 513..4095 bytes was rounded up to 4096, instead of to the usual 1024, 2048, or 4096 size classes. Then I did an AWSY run.
The idea was that we'll use some more memory, but suffer less fragmentation. The results weren't very good -- memory consumption went up significantly, except for the final "measure after closing all tabs" measurement which was flat. (I've attached a screenshot.) So, the fragmentation improved from a relative point of view, but the cure is worse than the disease.
Created attachment 616807 [details] [diff] [review]
test patch 1: convert 2KB arenas to 4KB
This patch converts all the NSS arenas that use 2KB chunks to use 4KB chunks. It does likewise with nsPersistentProperties.
Created attachment 616808 [details] [diff] [review]
test patch 2: convert 1KB arenas to 4KB
This patch converts all the NSS arenas that use 1KB chunks to use 4KB chunks.
I was pretty wrong about how jemalloc handles 512 B, 1 KB, and 2 KB allocations. Here's my updated understanding, with the proviso that I reserve the right to re-understand this again later.
We satisfy 512 B-2 KB allocations out of "runs" of 8 pages (32 KB). One run contains allocations from exactly one size class (i.e., 512 B, 1 KB, or 2 KB).
There's some bookkeeping at the beginning of each run. Because runs are exactly 8 pages, not 8 pages plus epsilon, the bookkeeping takes up space we might otherwise use to store one object. So a 32 KB run can hold only 15 2 KB allocations, 31 1 KB allocations, or 63 512 B allocations.
AFAICT, we never madvise/decommit part of a run.
I don't see a technical limitation against madvising/decommitting part of a run, but it might be slow, both to madvise/decommit and to page-fault in / recommit.
Created attachment 616889 [details] [diff] [review]
Test patch 3: Consider 1k and 2k allocations as "large"
John, would you mind running this patch through AWSY?
Comment on attachment 616889 [details] [diff] [review]
Test patch 3: Consider 1k and 2k allocations as "large"
Actually, I have no idea how my browser even stood up with this patch. There's no way it'll work. The smallest allowable "large" allocation is 1 page.
I talked with jlebar; this bug is serving no useful purpose at this point.