Closed Bug 1058178 Opened 10 years ago Closed 9 years ago

DMD: Add heap scanning mode

Categories

(Core :: DMD, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla42
Tracking Status
firefox42 --- fixed

People

(Reporter: mccr8, Assigned: mccr8)

References

Details

(Keywords: dev-doc-complete, Whiteboard: [MemShrink:P2])

Attachments

(3 files, 28 obsolete files)

2.47 KB, patch
Details | Diff | Splinter Review
19.83 KB, patch
n.nethercote
: review+
Details | Diff | Splinter Review
14.04 KB, patch
n.nethercote
: review+
Details | Diff | Splinter Review
Using information DMD stores about the set of live heap blocks, we can conservatively scan the set of live blocks to see how they may point to other live blocks, and log this to a file.  This, in combination with the allocation stacks DMD already records for live blocks, may be an easier way to investigate difficult leaks than refcount logging.

Spinning this off from bug 1014346, which isn't quite the same thing.
Whiteboard: [MemShrink]
Depends on: 1058169
I had a little fast path set up for the conservative scanner, which computes the largest and smallest address of any allocated block, so we can avoid a splay tree lookup.  It does fairly well, with about a 92% hit rate on a simple open-and-close browser session, but there was no measurable improvement to the speed, so I think I will omit it.  Something to keep in mind for the future.
I implemented a new script:
  https://github.com/amccreight/heapgraph/blob/master/dmd/clamp_to_block.py

This takes two files: a DMD log of live blocks and a random text file.  It looks in the text file for anything that looks like an address, then checks to see if the address is part of any of the live blocks.  If it is, it replaces the address with the address of the block.

The idea here is that a cycle collector log contains a bunch of pointers, but they are pointers into the middle of an object, due to vtables etc., so it is hard to relate DMD addresses to CC addresses.  The idea is that once you run the clamp script on the CC log, this can be done easily.  I haven't tried it in practice yet.
I've used the tools here in bug 1060526 to narrow down the problem.

bz points out in IRC that debugging information like pahole should be able to map offsets to field names, which would make the output more useful.
Depends on: 855878
My current thought is to just dump memory out a word at a time, and then do the conservative analysis in a post-processing script (I already have such a script written).  That's what trace malloc does.  That will make the logs bigger, but it will simplify the C++ DMD code, as most of the current complexity is building the splay tree of address ranges.
Attachment #8480137 - Attachment is obsolete: true
Attachment #8480138 - Attachment is obsolete: true
Attachment #8480139 - Attachment is obsolete: true
Whiteboard: [MemShrink] → [MemShrink:P2]
For some things, like nsGlobalWindow, the conservative heap scan and the CC log are matching up nicely, but I'm having trouble with XPCWrappedJS.  I'll have to investigate this, but I suspect I'm not doing the clamping-to-blocks correctly somehow.
Depends on: 1082128
This is very simple now.  When the scanning is set, it adds a new property "addr" to each object that is the address of the object.  It also adds a property "contents" that is a list of the word-aligned values in each object.  That's not the most compact representation in the world, but it is easy to deal with, and in my local experiments it is small enough for developer use.

The actual smarts about clamping the random values to particular addresses is done in a script:
  https://github.com/amccreight/heapgraph/blob/master/dmd/clamp_to_block.py

There's another script that tells you what other blocks point at a particular block:
  https://github.com/amccreight/heapgraph/blob/master/dmd/analyzer.py

I need to add a DMD option to enable scanning.
Attachment #8483034 - Attachment is obsolete: true
Attachment #8483035 - Attachment is obsolete: true
Attachment #8483036 - Attachment is obsolete: true
This is how I've been triggering it for shutdown leaks, though it isn't anywhere near landable yet.
> This is very simple now.

Wow, it really is. Nice.
I should probably write a test or something.  Or maybe just run it with scanning one time, until I get a script landed.
Attachment #8508365 - Attachment is obsolete: true
Attachment #8508366 - Attachment is obsolete: true
Thanks to erahm for pointing out that the shouldScan flag is more of an option for the log invocation, not the DMD invocation.
Yes please on the test front. memory/replace/dmd/test/test_dmd.js is where you should look.

But note that waiting until bug 1076446 re-lands will make things easier and avoid conflicts. It should happen on Monday. (Sorry for the delay; to summarize, the current incredible slowness of the Mac test machines is being an *enormous* pain w.r.t. that bug.)
There's no particular hurry.  Now that the patch is small, it is easy to rebase.
Slightly less hacky now. It uses environment variables to control where to log the file, and to control which kind of processes get logged.

Also, it now runs after bloat logging is shutdown, which removes a lot of noisiness from BloatView entries.
Attachment #8511198 - Attachment is obsolete: true
I changed the selection of scanning over to being a DMD mode. Maybe that doesn't entirely make sense, but it feels a little more consistent with what we're doing. Maybe that's wrong. It does at least avoid passing around a boolean everywhere. This patch also updates dmd.py to at least make mode=contents behave like mode=live, but does not change the output otherwise. I did update my two DMD heap scan scripts, the one that analyzes who points to a block, and the one that clamps the addresses in a log, in my github repo.
Attachment #8511197 - Attachment is obsolete: true
Nick, do you have a preference about having a new DMD mode or having a new flag that you pass in when generating the report?  The mode avoids adding an extra thing to tweak somehow, but doing a heap scan only affects the report, not the actual gathering of data before that, so it isn't entirely consistent.
Flags: needinfo?(n.nethercote)
I prefer a new mode. It's less flexible but keeps things simpler -- you choose the mode once at start-up, and then you don't have to make any further choices when dumping the data to file or when post-processing the data.

(We discussed this back when I first suggested adding modes, though that was a while ago.)
Flags: needinfo?(n.nethercote)
Depends on: 1171647
This is a new version of the patch, on top of the changes in bug 1171647. As you can see, there is now very little boilerplate to add a new mode.
Attachment #8511408 - Attachment is obsolete: true
Attachment #8609657 - Attachment is obsolete: true
I still need to add some testing.
Attachment #8615564 - Attachment is obsolete: true
New version for the simpler variant of bug 1171647.
I added some tests. It checks that the actual values in memory that are read out are correct. I also renamed it to scan mode, as that is less weird.
Attachment #8615570 - Attachment is obsolete: true
Attachment #8616416 - Attachment is obsolete: true
I changed my mind and added the address clamping to the default dmd.py script. I think it makes sense to have it there because it is akin to the stack fixing script, where you run it once per log. It is hideously slow on my machine (taking over a minute) mostly due to the JSON dumping call, so I left it off by default. I added a test that the clamping works correctly.
Summary: DMD: Conservative heap logs → DMD: Add heap scanning mode
This implements a new "scan" mode for DMD that records the address
and contents of every live unsampled block in the DMD log. This
enables the low-level analysis of references from one block to
another, which can help leak investigations.

try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=3c66f4735055

try with an older version: https://treeherder.mozilla.org/#/jobs?repo=try&revision=90371bf36d2d

I still need to make sure my latest version works on Windows but I think it should be okay.

This does not do the basic DMD tests on the scan log. The problem is that the test writes out a number of pointer sized values, so the actual size of the block varies from 32-bit to 64-bit. I could write out different values depending on the platform, but then the contents would vary. scan mode only adds a couple of fields, so I think it is okay to not test it.
Attachment #8621292 - Flags: review?(n.nethercote)
This adds a new option --clamp-contents to dmd.py. This replaces every value
contained in the memory contents in the log with a pointer to the start of a live
block, if the value is a pointer into the middle of that block. All other values
are replaced with 0. This conservative analysis makes it easier to determine
which blocks point to other blocks.
Attachment #8615571 - Attachment is obsolete: true
Attachment #8620687 - Attachment is obsolete: true
Attachment #8620688 - Attachment is obsolete: true
Attachment #8621294 - Flags: review?(n.nethercote)
Comment on attachment 8621292 [details] [diff] [review]
part 1 - Implement DMD heap scanning mode.

Review of attachment 8621292 [details] [diff] [review]:
-----------------------------------------------------------------

This all looks fine except for the testing stuff which is awful :)  Is there a problem with having non-deterministic addresses in the output? Or is the problem the fact that you are using a different script for the post-processing? If it's the latter, then that's a strong argument for merging that script with this one.

::: memory/replace/dmd/DMD.cpp
@@ +1780,5 @@
> +{
> +public:
> +  const char* ToPtrString(const void* aPtr)
> +  {
> +    snprintf(kPtrBuf, sizeof kPtrBuf - 1, "%" PRIxPTR, (uintptr_t)aPtr);

Please put parentheses around |kPtrBuf| to make the precedence clearer.

@@ +1786,5 @@
> +  }
> +
> +private:
> +  static const size_t kPtrBufLen = 32;
> +  char kPtrBuf[kPtrBufLen];

kPtrBufLen has a single use. Using 32 directly is fine.

@@ +1792,5 @@
> +
> +static void
> +WriteBlockContents(JSONWriter& aWriter, const LiveBlock& aBlock)
> +{
> +  MOZ_ASSERT(!aBlock.IsSampled(), "Sampled blocks do not have accurate sizes");

You should also have a check in Init() for if --mode=scan is used without --sample-below=1, so the user gets an immediate and more helpful error message.

::: memory/replace/dmd/DMD.h
@@ +198,5 @@
> +//       // recorded. Additionally, any such block that is at least one word
> +//       // long will contain an array "contents" of the values contained in
> +//       // the block, read out one word at a time, including only full words.
> +//       "addr": "4e4e4e4e",
> +//       "contents": ["0", "6", "7f7f7f7f", "0"]

The other field comments follow a particular form. Can you do something more like this, which is consistent with that form:

For "addr": "The address of the block. This is mandatory in "scan" mode, but omitted otherwise."

For "contents": "The contents of the block, read one word at a time. This is mandatory in "scan" mode if the block's length is at least one word, but omitted otherwise."
Attachment #8621292 - Flags: review?(n.nethercote) → feedback+
Comment on attachment 8621294 [details] [diff] [review]
part 2 - Implement address clamping analysis for DMD scan logs.

Review of attachment 8621294 [details] [diff] [review]:
-----------------------------------------------------------------

::: memory/replace/dmd/dmd.py
@@ +717,5 @@
> +# Code for clamping addresses using conservative pointer analysis.
> +##################################################################
> +
> +# Start is the address of the first byte of the block, while end is
> +# the first byte past the end of the block.

"end" is unclear... in SpiderMonkey "limit" is often used for "one past the end".

@@ +854,5 @@
> +    tmpFile = tempfile.NamedTemporaryFile(delete=False)
> +    tmpFilename = tmpFile.name
> +    if isZipped:
> +        tmpFile = gzip.GzipFile(filename='', fileobj=tmpFile)
> +    json.dump(j, tmpFile, sort_keys=True)

As you've noticed elsewhere, using Python's JSON dumping will completely change the whitespace layout of the file, and this will potentially screw up stack-fixing, because stack-fixing relies on the stack traces having a particular form. (The stack-fixing scripts don't care that the input is JSON, and just look directly for stack trace info that they rewrite in-place.)

Bleh, this is all so hacky but I'm having trouble thinking of a better way to do it :(
(In reply to Nicholas Nethercote [:njn] from comment #32)
> This all looks fine except for the testing stuff which is awful :)  Is there
> a problem with having non-deterministic addresses in the output? Or is the
> problem the fact that you are using a different script for the
> post-processing? If it's the latter, then that's a strong argument for
> merging that script with this one.

The only script I have is one that takes a block address as an argument and prints out information about each block that points to that block, if any, including the size, the offset of the pointers within that block, and allocation stack. That is not ideal from a testing perspective, because it only gives very weak information about what is actually in the log, particularly since the DMD testing regime is to elide the stack. The size and offsets will also be different depending on the platform. So you'd need some kind of testing mode that doesn't require you pass in an address (or dig around in the raw JSON for the address), and then doesn't print out the address, and somehow massages the output so that the output is in words instead of bytes or something.

> You should also have a check in Init() for if --mode=scan is used without
> --sample-below=1, so the user gets an immediate and more helpful error
> message.
The call to WriteBlockContents() is guarded on b.IsSampled(), so users should never hit this assert. The assert is there because at various points in the development I forgot about this restriction, and it produces really weird output, because you are reading far past the end of the actual memory block.

That said, the actual processing script throws immediately when given a scan log where sampling is disabled, so I should handle this better. The easiest thing to do, now that I think about it, is just to manually set mSampleBelowSize to 1 at the end of the Options ctor if scan mode is set. You always want to disable sampling when doing scanning. How does that sound?

> The other field comments follow a particular form. Can you do something more
> like this, which is consistent with that form:
Oh, sorry. I forgot about that, and I find the style really weird. ;) I've updated it mostly to follow your suggestion, thanks.

I've address the rest of the comments.
Attachment #8621292 - Attachment is obsolete: true
(In reply to Nicholas Nethercote [:njn] from comment #33)
> "end" is unclear... in SpiderMonkey "limit" is often used for "one past the
> end".
I'll update the comment to be more precise about what I mean.

> As you've noticed elsewhere, using Python's JSON dumping will completely
> change the whitespace layout of the file, and this will potentially screw up
> stack-fixing, because stack-fixing relies on the stack traces having a
> particular form. (The stack-fixing scripts don't care that the input is
> JSON, and just look directly for stack trace info that they rewrite
> in-place.)

Yeah, it isn't a big deal, because we can just do stack fixing first, and you are only going to do stack fixing and clamping once. The main drawback is that it renders it so that you can't really open the resulting file in Emacs because it puts everything on a single line. The python JSON thing does have some pretty printing options, but I played around with it a tiny bit and the results were not great.

One alternative would be to write a custom parser (maybe one that uses a JSON parser to read each individual block line), that would let us preserve the white space better, but of course that would make it slightly more fragile. I'll take a crack at that I guess.

> Bleh, this is all so hacky but I'm having trouble thinking of a better way
> to do it :(

Yeah...
Removing trailing zeros from objects seems to be pretty effective. It reduces the uncompressed size of this log file I have from 48MB to 33MB, which also reduces the processing time from about a minute to 44 seconds.
By "trailing zeros", I mean all entries at the end of the list that make up "contents" that are "0".
Attachment #8621294 - Flags: review?(n.nethercote)
Minor change to the test to make more of a difference when dropping trailing nulls.
Attachment #8621792 - Attachment is obsolete: true
In this version, I implemented a pretty printer for DMD JSON output. This means that the output will look more or less like the original output, so you can open the file without Emacs freezing, and Linux stack fixing should hopefully still work on the output.

I also trimmed trailing nulls, which shrinks the file and speeds up the clamping process. We only care about pointers from one block to another in a clamped file, so this should be okay. I suspect a lot of the size savings from this are giant blocks of memory used for graphics stuff.

It also fixes a bug in the previous version that meant I wasn't actually testing clamped mode. Oops!
Attachment #8621294 - Attachment is obsolete: true
In theory DMD's output format is JSON, which is great because it's so ubiquitous and easy to manipulate. But in practice that's a lie, and it's actually this weird JSON dialect that requires particular whitespace layout so that stack fixing can work. Which is pretty sucky. With effort it would probably be possible to tweak the fix-stacks script so that you can call into them to transform individual stack traces rather than running them wholesale across the input. Then the whitespace wouldn't matter.

> I implemented a pretty printer for DMD JSON output.

Oof. Sorry you had to do that. But Python putting lots of JSON on a single line is pretty daft.

> The size and offsets will also be different depending on the platform.

Would running that test only on 64-bit be an option? I don't know if that's a good idea or not. But having strict 64-bit only tests may be better than weak all-platform tests.
It sounds like I should check that my patch actually is compatible with stack fixing, if it is that fragile.

(In reply to Nicholas Nethercote [:njn] from comment #41)
> Oof. Sorry you had to do that. But Python putting lots of JSON on a single
> line is pretty daft.
Yeah, like I said there's some kind of pretty printing options for Json.dump, but I didn't like the output. I think the problem is that it puts the leaves each on their own line, whereas I'd prefer they are all on the same line, because there aren't that many.

The pretty printer was quite simple to do, and it should be reusable for any other JSON-level transformations we might come up with in the future.

> Would running that test only on 64-bit be an option? I don't know if that's
> a good idea or not. But having strict 64-bit only tests may be better than
> weak all-platform tests.

That sounds like a reasonable idea. I'll see if I can figure out how to do that.
This version detects whether it is 32- or 64-bit, which then lets it specify which test file to use. Hopefully that will work.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=dcefc8cbf35b
Attachment #8622762 - Attachment is obsolete: true
Attachment #8622767 - Attachment is obsolete: true
Forgot to address a few minor comments.
Attachment #8624488 - Attachment is obsolete: true
Attachment #8624489 - Attachment is obsolete: true
The reason you need basicScanTesting in dmd.py is that dmd.py does not produce output with any scan-specific data for scan mode. And if it did, it would contain pointer data that varies between runs, which makes it hard to match against expected output.

What about just getting rid of basicScanTesting and --basic-scan-testing? It would leave a hole in the test coverage, but the coupling between TestScan in SmokeDMD.cpp and basicScanTesting is so tight that it's hardly meaningful at the moment anyway.
Alternatively, --filter-stack-for-testing could be modified (and renamed) to filter addresses as well?
Update to the latest version. I think the main problem was a bug in basic-scan-32-expected.txt.
Attachment #8624518 - Attachment is obsolete: true
Attachment #8624519 - Attachment is obsolete: true
(In reply to Nicholas Nethercote [:njn] from comment #48)
> The reason you need basicScanTesting in dmd.py is that dmd.py does not
> produce output with any scan-specific data for scan mode. And if it did, it
> would contain pointer data that varies between runs, which makes it hard to
> match against expected output.

Yes, that is part of the problem. The larger reason for it is to make sure that the scan is producing the correct results. I don't have any particular idea of how that would be actually useful to a developer, but I think it is still important to check that we're producing the correct results.

> What about just getting rid of basicScanTesting and --basic-scan-testing? It
> would leave a hole in the test coverage, but the coupling between TestScan
> in SmokeDMD.cpp and basicScanTesting is so tight that it's hardly meaningful
> at the moment anyway.

The goal of basicScanTesting is twofold:
1. Does the C++ code in DMD.cpp produce the right output?
2. Does the address clamping code work (and not totally mangle the JSON)?

Aside from the address clamping, it doesn't really have anything to do with the usual job of dmd.py. I could split it out into a separate script if you prefer. I mostly just shoved the JSON checking in there out of convenience.
A separate script sounds good. Thank you.
I split the testing code into a new file. I'm waiting for try results before I put it up for review.
Attachment #8625776 - Attachment is obsolete: true
Attachment #8625777 - Attachment is obsolete: true
Comment on attachment 8627388 [details] [diff] [review]
part 1 - Implement DMD heap scanning mode.

L32 and L64 are green at least: https://treeherder.mozilla.org/#/jobs?repo=try&revision=efe447f9fe8b
Attachment #8627388 - Flags: review?(n.nethercote)
Attachment #8627389 - Flags: review?(n.nethercote)
Comment on attachment 8627388 [details] [diff] [review]
part 1 - Implement DMD heap scanning mode.

Review of attachment 8627388 [details] [diff] [review]:
-----------------------------------------------------------------

::: memory/replace/dmd/test/scan-test.py
@@ +24,5 @@
> +'''
> +    p = argparse.ArgumentParser(description=description)
> +
> +    p.add_argument('input_file',
> +                   help='a file produced by DMD')

Nit: odd line break?
Attachment #8627388 - Flags: review?(n.nethercote) → review+
Attachment #8627389 - Flags: review?(n.nethercote) → review+
Thank you for your patience :)
(In reply to Nicholas Nethercote [:njn] from comment #57)
> Nit: odd line break?

I just copied this from dmd.py. I think it looks weird when there aren't a ton of other options. I'll just leave it as is...

(In reply to Nicholas Nethercote [:njn] from comment #58)
> Thank you for your patience :)

Thanks for all of your feedback!
I should document this on the DMD page.
Flags: needinfo?(continuation)
Keywords: dev-doc-needed
Blocks: 1179042
I added some basic documentation of heap scan mode and --clamp-contents to https://developer.mozilla.org/en-US/docs/Mozilla/Performance/DMD
Flags: needinfo?(continuation)
Attachment #8615571 - Attachment is obsolete: false
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: