Closed Bug 688979 Opened 13 years ago Closed 7 years ago

Add trace-malloc-like functionality for jemalloc

Tracking

()

Status:

RESOLVED DUPLICATE of bug 1094552

People

(Reporter: justin.lebar+bug, Unassigned)

References

(Blocks 3 open bugs)

Details

(Whiteboard: [MemShrink:P2])

Attachments

(2 files, 1 obsolete file)

WIP v1 (obsolete), 13 years ago, Justin Lebar (not reading bugmail), 11.08 KB, patch
WIP v2, 13 years ago, Justin Lebar (not reading bugmail), 16.89 KB, patch
WIP v3, 13 years ago, Justin Lebar (not reading bugmail), 18.83 KB, patch

Justin Lebar (not reading bugmail)

Reporter

Description

•

13 years ago

I've been thinking about how we can get more information about how and why the heap is fragmented. I think what would be helpful is a log which contains:

- for each malloc, the requested malloc size, the block's malloc_usable_size, the block's address, and a stack trace, and
- for each free, the free'd address.

We could parse this log to profile the heap and find dark matter, which is nice. But we could also use it to understand sources of heap fragmentation. Since we know the allocations' addresses, we can look at a page with few live allocations and ask "who allocated the objects which used to live on this page?".

trace-malloc is almost what we want, but doesn't quite get us there because:

- its output format is impenetrable,
- it doesn't contain malloc_usable_size (and adding that would break all consumers, although I guess we could put it behind a flag),
- it calls into libc's allocator, not jemalloc, and
- it collects a lot of additional information, thus perturbing jemalloc.

The only real trick here, afaict, is figuring out how to call NS_StackWalk from either within jemalloc or from a wrapper.
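To make the proposed log concrete, here is a minimal sketch of a consumer for it. The text format (`malloc <addr> <requested> <usable> <stack>` and `free <addr>`), the 4 KiB page size, and the function names are all assumptions made for illustration; none of them come from the patches on this bug. A block is attributed only to the page containing its start address.

```python
from collections import defaultdict

PAGE_SIZE = 4096  # assumption: 4 KiB pages


def replay(lines):
    """Replay a hypothetical allocation log.

    Assumed line formats (illustrative, not the patch's real output):
      malloc <addr-hex> <requested> <usable> <frame;frame;...>
      free <addr-hex>

    Returns (live, dead_by_page):
      live         addr -> (requested, usable, stack)
      dead_by_page page number -> stacks of blocks freed on that page
    """
    live = {}
    dead_by_page = defaultdict(list)
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "malloc":
            addr = int(fields[1], 16)
            stack = fields[4] if len(fields) > 4 else ""
            live[addr] = (int(fields[2]), int(fields[3]), stack)
        elif fields[0] == "free":
            addr = int(fields[1], 16)
            block = live.pop(addr, None)
            if block is not None:
                dead_by_page[addr // PAGE_SIZE].append(block[2])
    return live, dead_by_page


def sparse_pages(live, max_live=1):
    """Pages that hold at least one but at most max_live live blocks."""
    counts = defaultdict(int)
    for addr in live:
        counts[addr // PAGE_SIZE] += 1
    return sorted(page for page, n in counts.items() if n <= max_live)
```

For each page returned by `sparse_pages`, looking it up in `dead_by_page` answers the "who used to live on this page?" question in terms of allocating stacks.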

Justin Lebar (not reading bugmail)

Reporter

Updated

•

13 years ago

Whiteboard: [MemShrink]

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 1

•

13 years ago

Assuming you had this information, what would you do with it that would help lessen fragmentation?

Justin Lebar (not reading bugmail)

Reporter

Comment 2

•

13 years ago

Presumably, callsites which are causing fragmentation allocate lots of small, short-lived chunks interspersed with some longer-lived chunks. If we could identify those sites, we could either allocate the small chunks from an arena, as part of larger allocations, or perhaps on the stack. This is really a generalization of the nsTArray --> nsAutoTArray work in bug 688532, except that we'd be able to focus on the callsites which are actually causing fragmentation, instead of (or, in addition to) trying to reduce the number of overall calls to malloc.

Assignee: nobody → justin.lebar+bug

Julian Seward [:jseward]

Comment 3

•

13 years ago

I think I read somewhere that allocating stack traces are good predictors of a block's lifetime (which is what you're after, right?)

Justin Lebar (not reading bugmail)

Reporter

Comment 4

•

13 years ago

(In reply to Julian Seward from comment #3)
> I think I read somewhere that allocating stack traces are good
> predictors of a block's lifetime (which is what you're after, right?)

I guess I'm interested in more than just "how long do the allocations from a callsite live?"

A bunch of small, long-lived allocations made all in a row isn't so bad if they are all free'd around the same time. So long-lived allocations aren't necessarily the problem, unless the distribution of the chunks' lifetimes has a thick tail.

But also, a callsite which makes exclusively short-lived allocations could cause fragmentation by spreading out onto more pages the intervening long-lived allocations.
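One way to look at the "thick tail" question is to compute per-callsite lifetime distributions from the log. The sketch below assumes events are already reduced to `("malloc", addr, site)` and `("free", addr)` tuples, measures lifetime on a logical clock (number of events elapsed), and uses max over median as a deliberately crude tail measure. All of these choices are assumptions for illustration.

```python
from collections import defaultdict
from statistics import median


def lifetimes_by_site(events):
    """Group block lifetimes by allocation site.

    events: iterable of ("malloc", addr, site) or ("free", addr).
    A lifetime is the number of events between malloc and free. Blocks
    never freed are counted as dying one event after the end of the log.
    """
    born = {}                   # addr -> (birth index, site)
    result = defaultdict(list)  # site -> list of lifetimes
    n = 0
    for n, ev in enumerate(events, start=1):
        if ev[0] == "malloc":
            born[ev[1]] = (n, ev[2])
        elif ev[0] == "free" and ev[1] in born:
            start, site = born.pop(ev[1])
            result[site].append(n - start)
    for start, site in born.values():
        result[site].append(n + 1 - start)
    return dict(result)


def tail_ratio(lifetimes):
    """Crude tail-heaviness: longest lifetime divided by the median."""
    return max(lifetimes) / median(lifetimes)
```

A site whose blocks mostly die quickly but occasionally live very long shows a large ratio; a site whose blocks all die together shows a ratio near 1. This captures only the first concern above, not the case of short-lived blocks pushing long-lived ones onto extra pages.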

Justin Lebar (not reading bugmail)

Reporter

Updated

•

13 years ago

Depends on: 688999

Justin Lebar (not reading bugmail)

Reporter

Updated

•

13 years ago

Whiteboard: [MemShrink] → [MemShrink:P2]

Justin Lebar (not reading bugmail)

Reporter

Comment 5

•

13 years ago

For my reference, changes to jemalloc.c don't get propagated correctly unless you apply attachment 529650.

Target Milestone: --- → mozilla9

Version: unspecified → Trunk

Justin Lebar (not reading bugmail)

Reporter

Comment 6

•

13 years ago

Attached patch WIP v1 (obsolete)

This prints out backtraces which I think may be right. The backtraces are just a list of PCs. To translate a PC into a file and line number, you need to use the data from /proc/maps (included in the dumps generated by this patch) to figure out which solib the PC belongs to, calculate the offset into the solib, and then run addr2line.
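The translation described above can be sketched roughly as follows. This is not the script attached to this bug; it assumes the dump contains mapping lines in the usual Linux maps layout (`start-end perms offset dev inode path`), and it assumes that the file offset of a PC is a usable address for addr2line, which holds for typical shared objects but is not guaranteed. The function names are hypothetical.

```python
import re

# One line of Linux maps output: start-end perms offset dev inode [path]
MAPS_RE = re.compile(
    r"^([0-9a-f]+)-([0-9a-f]+)\s+(\S+)\s+([0-9a-f]+)\s+\S+\s+\d+\s*(.*)$")


def parse_maps(text):
    """Return (start, end, file_offset, path) for each file-backed mapping."""
    mappings = []
    for line in text.splitlines():
        m = MAPS_RE.match(line.strip())
        # Skip anonymous mappings and pseudo-paths such as [heap] or [stack].
        if m and m.group(5).startswith("/"):
            start, end = int(m.group(1), 16), int(m.group(2), 16)
            mappings.append((start, end, int(m.group(4), 16), m.group(5)))
    return mappings


def locate(pc, mappings):
    """Map a raw PC to (path, offset into the file), or None if unmapped."""
    for start, end, file_offset, path in mappings:
        if start <= pc < end:
            return path, pc - start + file_offset
    return None


def addr2line_command(path, offset):
    """Build (but do not run) an addr2line invocation for one frame."""
    return ["addr2line", "-C", "-f", "-e", path, hex(offset)]
```

The resulting command list could be handed to `subprocess.run` once per frame; batching many addresses per library into one invocation would likely be much faster on large logs.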

Justin Lebar (not reading bugmail)

Reporter

Updated

•

13 years ago

Target Milestone: mozilla9 → ---

Mike Hommey [:glandium]

Comment 7

•

13 years ago

(In reply to Justin Lebar [:jlebar] from comment #6)
> Created attachment 563797
> WIP v1
>
> This prints out backtraces which I think may be right.
>
> The backtraces are just a list of PCs. To translate a PC into a file and
> line number, you need to use the data from /proc/maps (included in the dumps
> generated by this patch) to figure out which solib the PC belongs to,
> calculate the offset into the solib, and then run addr2line.

Note that this (using data from /proc/maps) won't work on Android.

Justin Lebar (not reading bugmail)

Reporter

Comment 8

•

13 years ago

That's a shame. Why is that, and how do I get around it?

Mike Hommey [:glandium]

Comment 9

•

13 years ago

Because we don't map files for our libs. What can work instead is to get struct r_debug during malloc_init. Once you have that, you can find the right library by walking its list of struct link_map entries. See the simple_linker_init part of https://bug687446.bugzilla.mozilla.org/attachment.cgi?id=560887 , which will get you struct r_debug. I can assist if necessary; I've been implementing that in the linker and breakpad.

Mike Hommey [:glandium]

Comment 10

•

13 years ago

Though, now that I think of it, if you want line numbers, you need actual files: since the debug info is not mapped, libunwind won't find the necessary info anyway...

Justin Lebar (not reading bugmail)

Reporter

Comment 11

•

13 years ago

> Because we don't map files for our libs.

Ah. Let me see if this is even useful on desktop Linux, and then we can figure out how to get this to work on Android.

Justin Lebar (not reading bugmail)

Reporter

Comment 12

•

13 years ago

Attached patch WIP v2

Now with a python script which, miraculously, seems to translate the offsets properly.

Justin Lebar (not reading bugmail)

Reporter

Updated

•

13 years ago

Attachment #563797 - Attachment is obsolete: true

Mike Hommey [:glandium]

Comment 13

•

13 years ago

(In reply to Justin Lebar [:jlebar] from comment #12)
> Created attachment 563849
> WIP v2
>
> Now with a python script which, miraculously, seems to translate the offsets
> properly.

Speaking of a script that translates offsets, I seem to remember we have one in the tree already. Or maybe it was in the automation scripts.

Justin Lebar (not reading bugmail)

Reporter

Comment 14

•

13 years ago

There's fix-linux-stack.pl, but that doesn't translate raw PCs; it only translates "lib+addr".

Mike Hommey [:glandium]

Comment 15

•

13 years ago

(In reply to Justin Lebar [:jlebar] from comment #14)
> There's fix-linux-stack.pl, but that doesn't translate raw PCs; it only
> translates "lib+addr".

Well, you have libs, you have their base address, you have pc... you could output lib+addr :)

Justin Lebar (not reading bugmail)

Reporter

Comment 16

•

13 years ago

Well, yeah. But piping to fix-linux-stack.pl is about as hard as piping to addr2line. :)

Justin Lebar (not reading bugmail)

Reporter

Updated

•

13 years ago

Blocks: 691174

Justin Lebar (not reading bugmail)

Reporter

Updated

•

13 years ago

Blocks: 691176

Justin Lebar (not reading bugmail)

Reporter

Updated

•

13 years ago

Blocks: 691189

Justin Lebar (not reading bugmail)

Reporter

Updated

•

13 years ago

Blocks: 691192

Justin Lebar (not reading bugmail)

Reporter

Comment 17

•

13 years ago

I've been thinking about figuring out how to assign "blame" for fragmentation. The intuitive thing to do would be to look at the heap, find pages with just a few live objects, and blame those objects for fragmentation. But I think this is wrong. Those objects have to live *somewhere*, and it's not their fault that they live on a mostly-empty page. So we need to look at dead objects, not live objects. Probably the simplest heuristic is to blame the most-recently dead object at each address on each page which has at least one live allocation, but I'm not sure that's right, because it ignores the allocator's bucketing of allocations by size and whatnot...
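The "most-recently dead object" heuristic could look roughly like this. The event tuples, the 4 KiB page size, and the choice to stop blaming an address once it is occupied again are my own assumptions for the sketch; as noted above, it also ignores the allocator's size-class bucketing.

```python
from collections import Counter

PAGE_SIZE = 4096  # assumption: 4 KiB pages


def blame_recently_dead(events):
    """Blame the last block freed at each address on still-live pages.

    events: ("malloc", addr, site) or ("free", addr), in log order.
    Returns a Counter mapping site -> number of blamed dead blocks.
    """
    live = {}       # addr -> site of the block currently at addr
    last_dead = {}  # addr -> site of the most recently freed block there
    for ev in events:
        if ev[0] == "malloc":
            live[ev[1]] = ev[2]
            last_dead.pop(ev[1], None)  # the hole has been refilled
        elif ev[0] == "free":
            site = live.pop(ev[1], None)
            if site is not None:
                last_dead[ev[1]] = site
    # Only pages with at least one live block are candidates for blame.
    live_pages = {addr // PAGE_SIZE for addr in live}
    blame = Counter()
    for addr, site in last_dead.items():
        if addr // PAGE_SIZE in live_pages:
            blame[site] += 1
    return blame
```

Weighting each blamed block by its usable size instead of counting blocks would be a natural variation.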

Justin Lebar (not reading bugmail)

Reporter

Comment 18

•

13 years ago

I guess the correct definition of "how bad is this allocation site?" is "how many fewer pages would be live if we hadn't made any allocations at that site?". Our goal is to approximate this tractably.
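As one very rough approximation of that counterfactual, suppose (unrealistically) that removing a site would leave every other block at the same address. Then the pages saved are exactly those whose live blocks all come from that site. The sketch below computes that under the stated assumption; the `live` mapping and the 4 KiB page size are illustrative.

```python
from collections import defaultdict

PAGE_SIZE = 4096  # assumption: 4 KiB pages


def pages_saved_without(live, site):
    """Count pages whose live blocks all come from the given site.

    live: dict mapping block address -> allocation site.
    Assumes other blocks would not move if this site never allocated,
    which a real allocator does not guarantee.
    """
    sites_on_page = defaultdict(set)
    for addr, s in live.items():
        sites_on_page[addr // PAGE_SIZE].add(s)
    return sum(1 for sites in sites_on_page.values() if sites == {site})
```

Because a real allocator would place the remaining blocks differently, and because this looks only at live blocks, this number says nothing about sites whose dead blocks left the holes; it is a starting point rather than an answer.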

Justin Lebar (not reading bugmail)

Reporter

Updated

•

13 years ago

Blocks: 746009

Justin Lebar (not reading bugmail)

Reporter

Comment 19

•

13 years ago

Attached patch WIP v3

I have no idea what these changes to rules.mk are for. But anyway, this works well enough. Linux only.

Justin Lebar (not reading bugmail)

Reporter

Comment 20

•

12 years ago

This is now simple to do with replace-malloc. It's what we rely on for new DMD. In any case I'm not looking at this anymore.

Assignee: justin.lebar+bug → nobody

Eric Rahm [:erahm]

Comment 21

•

7 years ago

DMD's cumulative heap profiling covers this.

Status: NEW → RESOLVED

Closed: 7 years ago

Resolution: --- → DUPLICATE
