Closed Bug 1014346 Opened 10 years ago Closed 9 years ago

DMD: add ability to find SCCs in the heap graph

Categories

(Core :: DMD, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: n.nethercote, Assigned: mccr8)

References

Details

(Whiteboard: [MemShrink:P2])

Attachments

(2 obsolete files)

dbaron described on dev-platform one way he still uses trace-malloc:

> ... being able to find the root strongly connected components
> of the memory graph, which is useful for finding leaks in other
> systems (e.g., leaks of trees of GTK widget objects) that aren't
> hooked up to cycle collection. It's occasionally even a faster way
> of debugging non-CC but nsTraceRefcnt-logged reference counted
> objects.
>
> [This feature works] roughly the same way a conservative collector would --
> assuming any word-aligned memory in one object in the heap that contains
> something that's the address of something else in the heap (including in the
> interior of the allocation) is a pointer to that object in the heap.
>
> (It's actually done in the leaksoup tool outside of trace-malloc.)
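A minimal sketch of the "root SCC" analysis described above. It assumes the heap graph has already been loaded as a Python dict that maps each block address to the addresses it points to. That format and the function names are hypothetical and are not leaksoup's or DMD's actual code. It finds the strongly connected components with Tarjan's algorithm, then keeps only the components that no other component points into:

```python
from typing import Dict, List, Set

Graph = Dict[int, List[int]]  # hypothetical format: block address -> addresses it points to


def tarjan_scc(graph: Graph) -> List[List[int]]:
    """Strongly connected components via Tarjan's algorithm (recursive, for clarity)."""
    index: Dict[int, int] = {}
    low: Dict[int, int] = {}
    on_stack: Set[int] = set()
    stack: List[int] = []
    sccs: List[List[int]] = []
    counter = 0

    def visit(v: int) -> None:
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            # v is the root of a component: pop everything above it off the stack.
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            sccs.append(component)

    for v in list(graph):
        if v not in index:
            visit(v)
    return sccs


def root_sccs(graph: Graph) -> List[List[int]]:
    """Components with no incoming edge from any other component."""
    sccs = tarjan_scc(graph)
    comp_of = {v: i for i, comp in enumerate(sccs) for v in comp}
    pointed_into = set()
    for v, targets in graph.items():
        for w in targets:
            if comp_of[w] != comp_of[v]:
                pointed_into.add(comp_of[w])
    return [sorted(c) for i, c in enumerate(sccs) if i not in pointed_into]


# Blocks 1 and 2 form a cycle that nothing else references; block 5 is unreferenced.
heap = {1: [2], 2: [1, 3], 3: [4], 4: [3], 5: [3]}
print(root_sccs(heap))  # prints [[1, 2], [5]]
```

Each root component is a candidate leak root: everything else in the graph is reachable from some root. For real heap graphs with millions of blocks, the recursion would need to be rewritten iteratively.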
I've started implementing this.
Assignee: nobody → continuation
The advantage of a DMD-style tool here is that it will work on all objects, subject to the limitations of allocation sampling. It could also be used to create an LSAN-style tool that gives you the allocation stacks for blocks that leak without any references.
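As a rough illustration of the LSAN-style idea, the sketch below reports the allocation stacks of blocks that no other heap block points to. The dict formats are hypothetical. It is also much cruder than real LeakSanitizer, which additionally treats stacks, globals, and registers as roots and computes reachability. Here only heap-to-heap edges are considered:

```python
from typing import Dict, List


def unreferenced_blocks(graph: Dict[int, List[int]],
                        stacks: Dict[int, List[str]]) -> Dict[int, List[str]]:
    """Return the allocation stacks of blocks that no other heap block points to."""
    referenced = {dst for targets in graph.values() for dst in targets}
    return {addr: stacks.get(addr, []) for addr in sorted(graph) if addr not in referenced}


# Hypothetical graph and stacks: 0x10 -> 0x20, and 0x30 points only to itself.
graph = {0x10: [0x20], 0x20: [], 0x30: [0x30]}
stacks = {0x10: ["Foo::Init"], 0x20: ["Bar::Bar"], 0x30: ["Baz::Make"]}
print(unreferenced_blocks(graph, stacks))  # prints {16: ['Foo::Init']}
```

A block that only references itself, like 0x30 here, is missed by this check. Catching that kind of self-sustaining structure is what the SCC analysis from comment 0 is for.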
(In reply to Andrew McCreight [:mccr8] from comment #2)
> The advantage of a DMD-style tool here is that it will work on all objects,

"all objects"... as opposed to what?

> subject to limitations of sampling of allocations.

The sampling can be avoided by using DMD='--sample-below=1'.
(In reply to Nicholas Nethercote [:njn] from comment #3)
> (In reply to Andrew McCreight [:mccr8] from comment #2)
> > The advantage of a DMD-style tool here is that it will work on all objects,
>
> "all objects"... as opposed to what?

nsTraceRefcnt (and the leak logging stuff in general) only works on objects that opt in. Now, that includes all refcounted things (so it is pretty thorough), but it doesn't include things like random char buffers. That's why LSAN has been able to find leaks missed by the other shutdown leak detection.

> The sampling can be avoided by using DMD='--sample-below=1'.

Yeah, erahm mentioned that. I'll just have to see how slow it is in practice. For most of the scenarios I'm interested in, there's a test case that doesn't take more than a few seconds to run, so I can tolerate a fairly huge slowdown. And I suppose there are probably additional speedups to be had in the no-sampling approach, should that be a problem.
But the comparison here is with trace-malloc, which hooks malloc (etc.) and looks at all allocations.
Oh, okay, I misread comment 0. I wish I'd known all this stuff trace-malloc can do earlier. ;)
This implements a new function, dmd::DumpHeapGraph. It first iterates over all of the live blocks, computing the largest and smallest block address. Then it adds all of the blocks to a splay tree, where each node in the tree is a range of addresses covering an entire block. Next we look at each individual block and scan every pointer-aligned word stored in it. For each word, we first check that its value is inside the global address range found in the first pass, then look it up in the splay tree. If the value falls within a block, we clamp it to that block's start address and add it to the set of edges for the block being scanned. Finally, we print out the block and its edges.

I added assertions to the splay tree insertion to check that we have no overlapping ranges. When sampling is on, we hit that assertion; disabling sampling fixes it. I don't understand enough about how sampling actually works to know whether that's bad or not.

Anyway, the next thing to do is to print out the allocation traces for every block. This is slightly different from what TraceRecord does right now, because that only records the number of blocks, not their actual addresses, but that shouldn't be hard to do.
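A rough Python model of the scanning pass described above. It assumes 64-bit little-endian words and a dict mapping each block's start address to its raw contents. Both are assumptions for illustration, not DMD's actual data structures. A sorted list searched with `bisect` stands in for the splay tree, since both answer the same "which block contains this address?" query:

```python
import bisect
import struct
from typing import Dict, Set

WORD = 8  # assumes 64-bit little-endian pointers


def scan_heap(blocks: Dict[int, bytes]) -> Dict[int, Set[int]]:
    """Conservatively infer edges: any aligned word that points into a block is an edge."""
    if not blocks:
        return {}
    starts = sorted(blocks)
    ends = [s + len(blocks[s]) for s in starts]
    for prev_end, start in zip(ends, starts[1:]):
        # Same invariant as the splay-tree assertion: block ranges must not overlap.
        assert prev_end <= start, "overlapping block ranges"
    lo, hi = starts[0], ends[-1]  # global address range of all live blocks

    graph: Dict[int, Set[int]] = {}
    for addr in starts:
        data = blocks[addr]
        edges: Set[int] = set()
        for off in range(0, len(data) - WORD + 1, WORD):
            (value,) = struct.unpack_from("<Q", data, off)
            if not lo <= value < hi:
                continue  # cheap rejection before the range lookup
            i = bisect.bisect_right(starts, value) - 1
            if i >= 0 and value < ends[i]:
                edges.add(starts[i])  # clamp interior pointers to the block start
        graph[addr] = edges
    return graph


blocks = {
    0x1000: struct.pack("<QQ", 0x2008, 42),      # interior pointer into 0x2000
    0x2000: struct.pack("<QQ", 0x1000, 0),       # points back at 0x1000
    0x3000: struct.pack("<QQ", 0x2000, 0x1800),  # 0x1800 falls in a gap between blocks
}
print({hex(a): sorted(hex(e) for e in es) for a, es in scan_heap(blocks).items()})
```

Like any conservative scan, this over-approximates: an integer that happens to look like a heap address becomes an edge.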
You can capture the graph by doing something like this:

MOZCONFIG=~/dmdconfig DYLD_INSERT_LIBRARIES=obj-dmd-dbg.noindex/dist/lib/libdmd.dylib LD_LIBRARY_PATH=obj-dmd-dbg.noindex/ DMD="--sample-below=1" ./mach run -P debug | grep OBJECT > graph.txt

graph.txt has entries that look like this:

OBJECT 0x111f08c40 EDGES 0x111f09300 0x10fedd870 0x111f747c0 0x1005e52d0 0x11ee31600
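Those OBJECT/EDGES lines are easy to load for further analysis. A minimal parser sketch, assuming exactly the line shape shown above (the function name is hypothetical):

```python
from typing import Dict, Iterable, List


def parse_graph(lines: Iterable[str]) -> Dict[int, List[int]]:
    """Parse "OBJECT <addr> EDGES <addr> <addr> ..." lines into an adjacency dict."""
    graph: Dict[int, List[int]] = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or fields[0] != "OBJECT" or fields[2] != "EDGES":
            continue  # skip any other output that slipped past the grep
        graph[int(fields[1], 16)] = [int(f, 16) for f in fields[3:]]
    return graph


sample = "OBJECT 0x111f08c40 EDGES 0x111f09300 0x10fedd870 0x111f747c0 0x1005e52d0 0x11ee31600"
g = parse_graph([sample])
print(len(g[0x111f08c40]))  # prints 5
```

With a real capture, `with open("graph.txt") as f: graph = parse_graph(f)` reads the whole file line by line.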
> I added assertions to the splay tree insertion to check that we have no
> overlapping ranges. When sampling is on, we hit that assertion. Disabling
> sampling fixes it. I don't understand enough about how sampling actually
> works to understand if that's bad or not.

That makes sense. When sampling, not all small heap blocks get recorded. For the small blocks that are recorded, the size we record for that block represents both itself and all the small heap blocks we recently skipped. So dmd::DumpHeapGraph() should check the sample-below value and fail immediately if it's not 1.
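To see why sampling trips the overlap assertion, the sketch below uses made-up addresses and sizes (not real DMD data). A recorded size that also accounts for skipped blocks can extend a block's range past the start of the next recorded block. Checking only neighboring pairs in sorted order is enough to detect whether any overlap exists:

```python
from typing import List, Tuple


def overlapping_pairs(records: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Given (address, recorded_size) pairs, return neighboring blocks whose ranges overlap."""
    recs = sorted(records)
    return [(a, b) for (a, size_a), (b, _) in zip(recs, recs[1:]) if a + size_a > b]


# Hypothetical numbers: four adjacent 16-byte blocks, all recorded with exact sizes.
exact = [(0x100, 16), (0x110, 16), (0x120, 16), (0x130, 16)]
# With sampling, suppose only 0x100, 0x110 and 0x130 were recorded, and 0x110's
# recorded size (64) also stands in for small blocks skipped since the last sample.
sampled = [(0x100, 16), (0x110, 64), (0x130, 16)]
print(overlapping_pairs(exact), overlapping_pairs(sampled))
```

The exact records produce no overlaps, while the sampled ones make 0x110 appear to cover 0x130.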
Oh, I see. I could also use the actual size, somehow. But yeah, it would still produce odd results, so better to just fail.
> I could also use the actual size, somehow.

Computing SCCs of an incomplete graph won't give useful results :)
This version logs the actual addresses of the blocks along with the stack traces, moves the logging into ShutdownXPCOM, and cleans up various other things. Note that the file paths for saving the logs are hard-coded. It creates two logs: one for stack traces (mostly like the current DMD logs) and one for the heap graph it infers using conservative scanning.

You run it with something like this:

MOZCONFIG=~/dmdconfig DYLD_INSERT_LIBRARIES=obj-dmd-dbg.noindex/dist/lib/libdmd.dylib LD_LIBRARY_PATH=obj-dmd-dbg.noindex/ DMD=1 ./mach run -P debug

I implemented a few Python scripts to parse the log files and do some crude analysis: https://github.com/amccreight/heapgraph/tree/master/dmd

I tried using this to investigate bug 884212. Poking around, I was able to figure out which object corresponded to the leaking MediaRule from a combination of the stack traces and the size of the object. Unfortunately, there are no references to it in the graph, so either the analysis is messed up somehow, or there's some mangled refcounting of the object. The former is more likely. So this wasn't really that useful.
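Picking out a suspect object "from a combination of the stack traces and the size of the object" can be expressed as a simple filter. This is a sketch with a hypothetical record layout (a list of dicts with `addr`, `size`, and `frames`). It is not the format of the actual log files or of the heapgraph scripts:

```python
from typing import Dict, List


def find_candidates(blocks: List[Dict], size: int, frame_substring: str) -> List[int]:
    """Addresses of blocks with the given size whose allocation stack mentions frame_substring."""
    return [
        b["addr"]
        for b in blocks
        if b["size"] == size and any(frame_substring in f for f in b["frames"])
    ]


# Hypothetical records with made-up addresses.
blocks = [
    {"addr": 0xA000, "size": 72, "frames": ["CSSParserImpl::ParseMediaRule", "CSSParserImpl::ParseAtRule"]},
    {"addr": 0xB000, "size": 64, "frames": ["CSSParserImpl::ParseMediaRule"]},
    {"addr": 0xC000, "size": 16, "frames": ["nsTArray_base::EnsureCapacity", "CSSParserImpl::GatherMedia"]},
]
print([hex(a) for a in find_candidates(blocks, 72, "ParseMediaRule")])  # prints ['0xa000']
```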
The output of the tool looks like this:

Logical-Framework:2 amccreight$ python ~/heapgraph/dmd/analyzer.py graph.txt outlive.txt 0x1297f6ba0
1 addr= 0x1297f6ba0 size= 72 --> 2, 5
  (anonymous namespace)::CSSParserImpl::ParseMediaRule(void (*)(mozilla::css::Rule*, void*), void*) (mozalloc.h:201, in XUL
  (anonymous namespace)::CSSParserImpl::ParseAtRule(void (*)(mozilla::css::Rule*, void*), void*, bool) (nsCSSParser.cpp:2591, in XUL
  (anonymous namespace)::CSSParserImpl::ParseGroupRule(mozilla::css::GroupRule*, void (*)(mozilla::css::Rule*, void*), void*) (nsCSSParser.cpp:2353, in
  (anonymous namespace)::CSSParserImpl::ParseMediaRule(void (*)(mozilla::css::Rule*, void*), void*) (nsCSSParser.cpp:3061, in XUL
2 addr= 0x12922bec0 size= 64 --> 3, 5
  (anonymous namespace)::CSSParserImpl::ParseMediaRule(void (*)(mozilla::css::Rule*, void*), void*) (mozalloc.h:201, in XUL
  (anonymous namespace)::CSSParserImpl::ParseAtRule(void (*)(mozilla::css::Rule*, void*), void*, bool) (nsCSSParser.cpp:2591, in XUL
  (anonymous namespace)::CSSParserImpl::ParseGroupRule(mozilla::css::GroupRule*, void (*)(mozilla::css::Rule*, void*), void*) (nsCSSParser.cpp:2353, in
  (anonymous namespace)::CSSParserImpl::ParseMediaRule(void (*)(mozilla::css::Rule*, void*), void*) (nsCSSParser.cpp:3061, in XUL
3 addr= 0x129790e40 size= 16 --> 4
  nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>::EnsureCapacity(unsigned long, unsigned long) (nsTArray.h:204, in XUL
  nsAutoPtr<nsMediaQuery>* nsTArray_Impl<nsAutoPtr<nsMediaQuery>, nsTArrayInfallibleAllocator>::AppendElements<nsMediaQuery*>(nsMediaQuery* const*, unsi
  (anonymous namespace)::CSSParserImpl::GatherMedia(nsMediaList*, bool) (nsCSSParser.cpp:2801, in XUL
  (anonymous namespace)::CSSParserImpl::ParseMediaRule(void (*)(mozilla::css::Rule*, void*), void*) (nsCSSParser.cpp:3056, in XUL
...

Object 1 points to objects 2 and 5, is at address 0x1297f6ba0, and has size 72.
Then I show the top 4 stack frames of its allocation, after filtering out useless frames like malloc.
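A sketch of how one entry in that output could be rendered: a header line with index, address, size, and out-edges, followed by the top four frames after dropping allocator frames. The `UNINTERESTING` filter list is a hypothetical stand-in. The actual analyzer.py may filter frames differently:

```python
from typing import List, Sequence

# Hypothetical list of allocator frames to hide; the real scripts may filter differently.
UNINTERESTING = ("malloc", "calloc", "realloc", "operator new")


def format_block(index: int, addr: int, size: int, out: Sequence[int],
                 frames: List[str], depth: int = 4) -> str:
    """Render one block: header line, then the top `depth` non-allocator frames."""
    header = f"{index} addr= {addr:#x} size= {size}"
    if out:
        header += " --> " + ", ".join(str(i) for i in out)
    useful = [f for f in frames if not any(s in f for s in UNINTERESTING)]
    return "\n".join([header] + ["  " + f for f in useful[:depth]])


print(format_block(1, 0x1297f6ba0, 72, [2, 5],
                   ["moz_xmalloc", "ParseMediaRule", "ParseAtRule", "ParseGroupRule",
                    "ParseMediaRule", "OuterCaller"]))
```

The first printed line is `1 addr= 0x1297f6ba0 size= 72 --> 2, 5`, followed by the four CSS parser frames. `moz_xmalloc` and the fifth frame are dropped.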
Attachment #8427362 - Attachment is obsolete: true
I used this tool in bug 1015662 to analyze an ownership cycle that passed through non-CCed stuff, if somebody wants an example of how it might be used.
Whiteboard: [MemShrink]
Whiteboard: [MemShrink] → [MemShrink:P2]
\o/ Refgraph can finally be retired in favor of DMD. More infrastructure sharing more better.
See Also: → 704240
The current version puts all outgoing edges into a set, which reduces the size of the graph but makes it impossible to figure out which field a pointer actually corresponds to. I hacked up a version that adds an alternate mode that outputs something for every field (and updated the scripts to more or less deal with it). If the conservative scan fails to find a block at some offset, it just outputs 0 for that field. It seems to produce more or less sensible output. This is kind of an intermediate point between the patch I posted above and what trace-malloc does, which is to output raw memory into the log.
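The per-field mode can be modeled as emitting one value per pointer-aligned word instead of a deduplicated set. Each value is the start of the block the word points into, or 0 when the scan finds no block. A sketch assuming 64-bit little-endian words and sorted, non-overlapping block ranges. The function name and argument layout are hypothetical:

```python
import bisect
import struct
from typing import List

WORD = 8  # assumes 64-bit little-endian pointers


def field_edges(data: bytes, starts: List[int], ends: List[int]) -> List[int]:
    """One entry per aligned word: the start of the block it points into, or 0."""
    out = []
    for off in range(0, len(data) - WORD + 1, WORD):
        (value,) = struct.unpack_from("<Q", data, off)
        i = bisect.bisect_right(starts, value) - 1
        out.append(starts[i] if i >= 0 and value < ends[i] else 0)
    return out


starts, ends = [0x1000, 0x2000], [0x1010, 0x2010]  # sorted, non-overlapping block ranges
block = struct.pack("<QQQ", 0x2004, 7, 0x1000)
print([hex(x) for x in field_edges(block, starts, ends)])  # prints ['0x2000', '0x0', '0x1000']
```

Because the position in the output list equals the word offset divided by the word size, the field a pointer came from stays recoverable. The set-based mode loses that information.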
I'll upload my patches and you can review them! :)
Depends on: 1058178
Comment on attachment 8428160: Implement conservative heap scanning for DMD-captured blocks.

Updated patches in bug 1058178.
Attachment #8428160 - Attachment is obsolete: true
mccr8: I think this can be closed now?
Flags: needinfo?(continuation)
I didn't fix this as filed, but I did add a conservative heap scanning option to DMD, and an analysis script that uses the heap log to show what objects point to a particular object. I've used these tools to investigate and fix a number of leaks. The difference is that there is no SCC analysis in there. I feel like in the post-cycle-collector world it isn't as useful, because most interesting objects are going to be hooked up to cycle collection. My leak investigations take a CC log and a DMD log, then use the former to find what is entraining things, which seems sufficient.
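The "what points to this object" query amounts to inverting the edge lists. A minimal sketch, assuming the heap log has already been parsed into a dict from block address to out-edges. That format is hypothetical, and the real script lives in the heapgraph repository linked earlier in this bug:

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_referrers(graph: Dict[int, List[int]]) -> Dict[int, Set[int]]:
    """Invert the heap graph: for each block, the set of blocks that point to it."""
    referrers: Dict[int, Set[int]] = defaultdict(set)
    for src, targets in graph.items():
        for dst in targets:
            referrers[dst].add(src)
    return referrers


def who_points_to(graph: Dict[int, List[int]], target: int) -> List[int]:
    """Sorted addresses of the blocks holding a pointer into `target`."""
    return sorted(build_referrers(graph).get(target, set()))


heap = {0xA0: [0xC0], 0xB0: [0xC0, 0xA0], 0xC0: []}
print([hex(a) for a in who_points_to(heap, 0xC0)])  # prints ['0xa0', '0xb0']
```

When querying many targets, it is cheaper to build the inverted index once and reuse it than to call `who_points_to` repeatedly.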
Status: NEW → RESOLVED
Closed: 9 years ago
Flags: needinfo?(continuation)
Resolution: --- → WONTFIX
Ah, I see the description in comment 0 talks about non-CCed stuff. I guess I'll think about whether having this might be useful. It would just be an analysis on top of my existing DMD heap scan analysis logs. Really, I think the next thing to improve here would be mapping heap block information back to source, both by figuring out the offsets of fields and the class of each object, somehow.