Bug 688979 (Closed) - Add trace-malloc-like functionality for jemalloc
Opened 13 years ago · Closed 7 years ago
Categories: Core :: Memory Allocator, defect
Status: RESOLVED DUPLICATE of bug 1094552
People: justin.lebar+bug (reporter); Unassigned
References: Blocks 3 open bugs
Whiteboard: [MemShrink:P2]
Attachments (2 files, 1 obsolete file):
- patch, 16.89 KB
- patch, 18.83 KB
I've been thinking about how we can get more information about how and why the heap is fragmented.
I think what would be helpful is a log which contains:
- for each malloc, the requested malloc size, the block's malloc_usable_size, the block's address, and a stack trace, and
- for each free, the free'd address.
We could parse this log to profile the heap and find dark matter, which is nice. But we could also use it to understand sources of heap fragmentation. Since we know the allocations' addresses, we can look at a page with few live allocations and ask "who allocated the objects which used to live on this page?".
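The bug never pins down an exact log syntax, but the proposed contents (requested size, malloc_usable_size, address, stack) are enough to sketch a replay pass. The `M`/`F` line format below is hypothetical, purely for illustration; the "slop" it accumulates is the per-block dark matter (usable minus requested bytes):

```python
# Hypothetical log replay: one "M <addr> <requested> <usable> <pcs...>" line
# per malloc, one "F <addr>" line per free. The format is an assumption; the
# bug does not specify one.

def parse_log(lines):
    """Replay malloc/free records; return the live-block map and total slop."""
    live = {}   # addr -> (requested, usable)
    slop = 0    # dark matter: sum over live blocks of (usable - requested)
    for line in lines:
        fields = line.split()
        if fields[0] == "M":
            addr = int(fields[1], 16)
            requested, usable = int(fields[2]), int(fields[3])
            live[addr] = (requested, usable)
            slop += usable - requested
        elif fields[0] == "F":
            requested, usable = live.pop(int(fields[1], 16))
            slop -= usable - requested
    return live, slop

log = [
    "M 0x1000 100 112 0xdeadbeef",
    "M 0x2000 24 32 0xcafebabe",
    "F 0x1000",
]
live, slop = parse_log(log)
print(len(live), slop)  # one live block, 8 bytes of slop
```

Because each record carries the block's address, the same replay can answer the "who allocated the objects which used to live on this page?" question: index freed records by page and look them up when a page turns out mostly empty.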
trace-malloc is almost what we want, but doesn't quite get us there because:
- its output format is impenetrable,
- it doesn't contain malloc_usable_size (and adding that would break all consumers, although I guess we could put it behind a flag),
- it calls into libc's allocator, not jemalloc, and
- it collects a lot of additional information, thus perturbing jemalloc.
The only real trick here, afaict, is figuring out how to call NS_StackWalk from either within jemalloc or from a wrapper.
Reporter
Updated•13 years ago
Whiteboard: [MemShrink]
Comment 1•13 years ago
Assuming you had this information, what would you do with it that would help lessen fragmentation?
Reporter
Comment 2•13 years ago
Presumably, callsites which are causing fragmentation allocate lots of small, short-lived chunks interspersed with some longer-lived chunks.
If we could identify those sites, we could either allocate the small chunks from an arena, as part of larger allocations, or perhaps on the stack.
This is really a generalization of the nsTArray --> nsAutoTArray work in bug 688532, except that we'd be able to focus on the callsites which are actually causing fragmentation, instead of (or, in addition to) trying to reduce the number of overall calls to malloc.
Assignee: nobody → justin.lebar+bug
Comment 3•13 years ago
I think I read somewhere that allocating stack traces are good
predictors of a block's lifetime (which is what you're after, right?)
Reporter
Comment 4•13 years ago
(In reply to Julian Seward from comment #3)
> I think I read somewhere that allocating stack traces are good
> predictors of a block's lifetime (which is what you're after, right?)
I guess I'm interested in more than just "how long do the allocations from a callsite live?"
A bunch of small, long-lived allocations made all in a row isn't so bad if they are all free'd around the same time. So long-lived allocations aren't necessarily the problem, unless the distribution of the chunks' lifetimes has a thick tail.
But also, a callsite which makes exclusively short-lived allocations could cause fragmentation by spreading out onto more pages the intervening long-lived allocations.
Reporter
Updated•13 years ago
Whiteboard: [MemShrink] → [MemShrink:P2]
Reporter
Comment 5•13 years ago
For my reference, changes to jemalloc.c don't get propagated correctly unless you apply attachment 529650 [details] [diff] [review].
Target Milestone: --- → mozilla9
Version: unspecified → Trunk
Reporter
Comment 6•13 years ago
This prints out backtraces which I think may be right.
The backtraces are just a list of PCs. To translate a PC into a file and line number, you need to use the data from /proc/maps (included in the dumps generated by this patch) to figure out which solib the PC belongs to, calculate the offset into the solib, and then run addr2line.
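That lookup can be sketched in a few lines: parse the executable file-backed regions out of the /proc/maps text included in the dump, find the region containing each PC, and emit `lib+offset` pairs for addr2line (or fix-linux-stack.pl, per comment 15). The sample maps line is canned; a real run would read the maps section from the dump. For simplicity this assumes the text segment is mapped at file offset 0, so the mapping offset column can be ignored:

```python
# Translate raw PCs to lib+offset using /proc/<pid>/maps data.
# Assumption: each lib's executable segment is mapped at file offset 0,
# so (pc - region start) is directly usable as an addr2line offset.

def parse_maps(text):
    """Return (start, end, path) for each executable, file-backed mapping."""
    regions = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6 or "x" not in fields[1]:
            continue  # skip non-executable or anonymous mappings
        start, end = (int(x, 16) for x in fields[0].split("-"))
        regions.append((start, end, fields[5]))
    return regions

def pc_to_lib_offset(pc, regions):
    """Find the mapping containing pc; return (lib path, offset into lib)."""
    for start, end, path in regions:
        if start <= pc < end:
            return path, pc - start
    return None, pc  # PC not in any known executable mapping

maps = "7f0000000000-7f0000100000 r-xp 00000000 08:01 1234 /usr/lib/libxul.so"
lib, off = pc_to_lib_offset(0x7f0000000ABC, parse_maps(maps))
print(lib, hex(off))
```

The resulting pair feeds straight into `addr2line -e <lib> <offset>` to recover file and line.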
Reporter
Updated•13 years ago
Target Milestone: mozilla9 → ---
Comment 7•13 years ago
(In reply to Justin Lebar [:jlebar] from comment #6)
> Created attachment 563797 [details] [diff] [review]
> WIP v1
>
> This prints out backtraces which I think may be right.
>
> The backtraces are just a list of PCs. To translate a PC into a file and
> line number, you need to use the data from /proc/maps (included in the dumps
> generated by this patch) to figure out which solib the PC belongs to,
> calculate the offset into the solib, and then run addr2line.
Note that this (using data from /proc/maps) won't work on Android.
Reporter
Comment 8•13 years ago
That's a shame. Why is that, and how do I get around it?
Comment 9•13 years ago
Because we don't map files for our libs.
What can work instead is to get struct r_debug during malloc_init. Once you have that, you can find the right library by walking the struct link_map list. See the simple_linker_init part of https://bug687446.bugzilla.mozilla.org/attachment.cgi?id=560887 ; this will get you struct r_debug. I can assist if necessary; I've been implementing that in the linker and breakpad.
Comment 10•13 years ago
Though, now that I think of it, if you want line numbers, you need actual files. Since the debug info is not mapped, libunwind won't find the necessary info anyway...
Reporter
Comment 11•13 years ago
> Because we don't map files for our libs.
Ah.
Let me see if this is even useful on desktop Linux, and then we can figure out how to get this to work on Android.
Reporter
Comment 12•13 years ago
Now with a python script which, miraculously, seems to translate the offsets properly.
Reporter
Updated•13 years ago
Attachment #563797 - Attachment is obsolete: true
Comment 13•13 years ago
(In reply to Justin Lebar [:jlebar] from comment #12)
> Created attachment 563849 [details] [diff] [review]
> WIP v2
>
> Now with a python script which, miraculously, seems to translate the offsets
> properly.
Speaking of a script that translates offsets, I seem to remember we have one in the tree already. Or maybe it was in the automation scripts.
Reporter
Comment 14•13 years ago
There's fix-linux-stack.pl, but that doesn't translate raw PCs; it only translates "lib+addr".
Comment 15•13 years ago
(In reply to Justin Lebar [:jlebar] from comment #14)
> There's fix-linux-stack.pl, but that doesn't translate raw PCs; it only
> translates "lib+addr".
Well, you have libs, you have their base address, you have pc... you could output lib+addr :)
Reporter
Comment 16•13 years ago
Well, yeah. But piping to fix-linux-stack.pl is about as hard as piping to addr2line. :)
Reporter
Comment 17•13 years ago
I've been thinking about figuring out how to assign "blame" for fragmentation.
The intuitive thing to do would be to look at the heap, find pages with just a few live objects, and blame those objects for fragmentation.
But I think this is wrong. Those objects have to live *somewhere*, and it's not their fault that they live on a mostly-empty page.
So we need to look at dead objects, not live objects. Probably the simplest heuristic is to blame the most-recently dead object at each address on each page which has at least one live allocation, but I'm not sure that's right, because it ignores the allocator's bucketing of allocations by size and whatnot...
Reporter
Comment 18•13 years ago
I guess the correct definition of "how bad is this allocation site?" is "how many fewer pages would be live if we hadn't made any allocations at that site?". Our goal is to approximate this tractably.
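The first half of that approximation, the occupancy pass from comment 17, is straightforward to sketch: group live allocations by 4 KiB page and flag pages that hold live data but are mostly empty. The live-block map below reuses the hypothetical `addr -> size` shape from the log replay; a real tool would additionally track recently freed blocks per page to assign blame to call sites:

```python
# Page-occupancy sketch: which pages hold live data but are mostly empty?
# The input dict (addr -> size) and the 25% threshold are illustrative
# assumptions, not anything specified in the bug.

PAGE = 4096

def page_occupancy(live_blocks):
    """live_blocks: dict addr -> size. Return live bytes per page index."""
    pages = {}
    for addr, size in live_blocks.items():
        # An allocation may span several pages; credit each page it touches.
        for page in range(addr // PAGE, (addr + size - 1) // PAGE + 1):
            lo = max(addr, page * PAGE)
            hi = min(addr + size, (page + 1) * PAGE)
            pages[page] = pages.get(page, 0) + (hi - lo)
    return pages

def fragmented_pages(live_blocks, threshold=0.25):
    """Pages holding live data but less than `threshold` full."""
    return [p for p, used in page_occupancy(live_blocks).items()
            if used < threshold * PAGE]

live = {0x1000: 64, 0x3000: 4096}  # one nearly-empty page, one full page
print(fragmented_pages(live))      # [1]
```

Note this only finds the symptom (mostly-empty pages); per the discussion above, the blame belongs to the dead neighbors that once shared those pages, not to the survivors.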
Reporter
Comment 19•13 years ago
I have no idea what these changes to rules.mk are for. But anyway, this works well enough.
Linux only.
Reporter
Comment 20•12 years ago
This is now simple to do with replace-malloc. It's what we rely on for the new DMD.
In any case I'm not looking at this anymore.
Assignee: justin.lebar+bug → nobody
Comment 21•7 years ago
DMD's cumulative heap profiling covers this.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → DUPLICATE