[The following text is from bug 563700 comment 23]
I was thinking some more about how to get *serious* about hunting down all the heap-unclassified bytes.
You'd add instrumentation code to maintain a data structure that records 1 bit of information for every malloc'd byte. A 0 means "not reported", a 1 means "reported". You'd also add instrumentation code to record, for every heap block, the stack trace at its allocation point.
Then, you'd modify every heap memory reporter so that when it is queried, it sets the "reported" bits of all the heap bytes it counts.
Once about:memory was loaded, you'd iterate over all the heap blocks. Any heap block fully reported would be ignored. Any heap block partially or not-at-all reported would be recorded. You'd aggregate repeated stack traces for recorded heap blocks and present the stack traces in order so that the ones responsible for the most unreported bytes are shown first.
Also, while the memory reporters are doing their thing, you'd complain about any byte on the heap whose "reported" bit was set more than once -- such bytes are double-counted. You could print out the allocation stack trace of the block containing it, along with the name of the reporter that was second to count it.
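To make the scheme above concrete, here is a toy sketch in Python. It is only a model of the idea, not the tool itself: the class name, method names and the frame names in the usage note are all made up for illustration, and a set of byte addresses stands in for the one-bit-per-byte metadata.

```python
from collections import defaultdict


class HeapTracker:
    """Toy model of per-byte 'reported' bits plus per-block stack traces.

    All names here are hypothetical; nothing is taken from a real tool.
    """

    def __init__(self):
        self.blocks = {}          # block start address -> (size, stack trace)
        self.reported = set()     # addresses whose 'reported' bit is 1
        self.double_reports = []  # (address, reporter name, stack trace)

    def malloc(self, addr, size, stack):
        # Record the allocation-point stack trace for this heap block.
        self.blocks[addr] = (size, tuple(stack))

    def _stack_for(self, addr):
        # Linear search is fine for a sketch; a real tool would use an
        # interval structure.
        for start, (size, stack) in self.blocks.items():
            if start <= addr < start + size:
                return stack
        return None

    def report(self, reporter, addr, size):
        # Called by a memory reporter for each byte range it counts.
        for a in range(addr, addr + size):
            if a in self.reported:
                # Bit already set: this byte is being double-counted.
                self.double_reports.append((a, reporter, self._stack_for(a)))
            else:
                self.reported.add(a)

    def unreported_by_stack(self):
        # Aggregate unreported bytes by allocation stack trace, with the
        # stack responsible for the most unreported bytes first.
        totals = defaultdict(int)
        for start, (size, stack) in self.blocks.items():
            missing = sum(
                1 for a in range(start, start + size) if a not in self.reported
            )
            if missing:  # fully reported blocks are ignored
                totals[stack] += missing
        return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

Usage would look something like this: call `malloc(addr, size, stack)` for each allocation, have each reporter call `report(name, addr, size)` for the ranges it counts, then read `unreported_by_stack()` and `double_reports` once all reporters have run.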
Tracking one-bit-per-byte metadata and per-heap-block stack traces is something that Valgrind excels at, BTW. A Valgrind tool that implemented this analysis would have to use its own heap allocator rather than jemalloc (that's just how Valgrind works), so it would have to be careful about the rounding up of request sizes, but it could definitely be made to work. Memory reporters would use client requests to tell the tool which bytes have been accounted for. A client request would also be used to tell the tool when all the memory reporters have been consulted.
In addition to the Valgrind tool, each memory reporter will have to be modified to include annotations that tell the tool which heap bytes have been accounted for. These annotations wouldn't be landed.
(In reply to Nicholas Nethercote [:njn] from comment #1)
> In addition to the Valgrind tool, each memory reporter will have to be
> modified to include annotations that tell the tool which heap bytes have
> been accounted for. These annotations wouldn't be landed.
Why not? If they're not landed, then:
* I have to patch FF to run the tool, and
* that patch probably will bitrot.
I presume this is something we're going to want to run every once in a while; it's not a fix-once-and-forget job.
Well, let's see how intrusive they are.
Created attachment 553103
This is an in-progress version of DMD. Features:
- Reports unreported heap blocks.
- Reports double-reported heap blocks.
- Reports how many bytes each annotated reporter reported. This is useful for cross-checking with about:memory.
- Tracks both requested and slop bytes; emulates jemalloc's round-up behaviour.
- Has a couple of regression tests.
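For anyone unfamiliar with the requested-versus-slop distinction in the feature list above, here is a small Python sketch. The size classes are an assumption made purely for illustration (a simplified scheme in the general spirit of jemalloc's rounding); they are not jemalloc's real table and not what the patch implements. The function names are hypothetical too.

```python
def usable_size(request):
    """Round a request up to an ASSUMED, simplified size class.

    Illustrative rules only (not jemalloc's actual classes):
      - up to 512 bytes: next multiple of 16
      - up to 2048 bytes: next power of two
      - larger: next multiple of 4096
    """
    request = max(request, 1)
    if request <= 512:
        return -(-request // 16) * 16
    if request <= 2048:
        return 1 << (request - 1).bit_length()
    return -(-request // 4096) * 4096


def slop(request):
    # Slop is the extra space the allocator hands back beyond the request.
    return usable_size(request) - request


for n in (16, 100, 600, 5000):
    print(n, usable_size(n), slop(n))
```

The point is just that a tool which replaces the allocator has to apply the same rounding, otherwise the slop it reports won't match what the real allocator would have wasted.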
This patch applies to an SVN trunk version of Valgrind (I have r11976). If you want to use it, you'll need to follow the usual Valgrind tricks for Firefox; see https://developer.mozilla.org/en/Debugging_Mozilla_with_Valgrind for details. Trunk builds of Valgrind are easy: just follow the instructions in Valgrind's top-level README file.
You'll need the --tool=exp-dmd flag, and you probably want to set --num-callers to something low like 6, otherwise lots of records that could be sensibly merged won't be. (Even with 6 you'll still get some like that, but if you ask for fewer than 6 sometimes the stack traces will be too shallow to be useful.)
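To illustrate why the stack depth matters for merging, here is a rough Python sketch. It assumes records are keyed on the first N frames of the allocation stack; the function name and the frame names are invented for the example and this is not the tool's actual code.

```python
from collections import Counter


def aggregate(records, num_callers):
    """Sum bytes per stack trace, keeping only the top num_callers frames.

    `records` is a list of (stack, nbytes) pairs, innermost frame first.
    """
    totals = Counter()
    for stack, nbytes in records:
        totals[tuple(stack[:num_callers])] += nbytes
    return totals


# Hypothetical allocations that share their innermost three frames.
records = [
    (["malloc", "alloc_wrapper", "grow_array", "parse_css", "main"], 100),
    (["malloc", "alloc_wrapper", "grow_array", "parse_html", "main"], 50),
]

deep = aggregate(records, 5)     # two separate records
merged = aggregate(records, 3)   # one record covering both allocations
shallow = aggregate(records, 1)  # everything collapses into "malloc"
```

With a depth of 3 the two records merge sensibly, with 5 they stay split, and with 1 the only frame left is the allocator itself, which tells you nothing.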
Created attachment 553104
Firefox annotations, v1
Some DMD annotations for Firefox. Many more need to be done.
Created attachment 554346
Created attachment 554347
Firefox annotations, v2
Did you mean to change this from [MemShrink:P1] to [MemShrink]? If so, why?
I did, I want to re-triage it. The bug obviously shouldn't be closed, but I don't think it needs to be a P1 any more now that it's in a state where it's spitting out useful numbers.
(In reply to Nicholas Nethercote [:njn] from comment #9)
> I did, I want to re-triage it. The bug obviously shouldn't be closed, but I
> don't think it needs to be a P1 any more now that it's in a state where it's
> spitting out useful numbers.
Might we call it fixed?
(In reply to Kyle Huey [:khuey] (firstname.lastname@example.org) from comment #10)
> Might we call it fixed?
I'll still be posting new versions of it, as well as new versions of the patch annotating Firefox. Marking it closed when I'll still be working on it doesn't feel right.
Created attachment 569249
This version also tracks allocations done with mmap, which led me to bug 696690.
Created attachment 570932
Firefox annotations, v3
Created attachment 571888
Created attachment 571889
Firefox annotations, v4
Comment on attachment 571888
Bug 704400 folded DMD into the tree, so this patch is no longer needed.
Comment on attachment 571889
Firefox annotations, v4
Ugh, wrong patch.
Created attachment 580310
DMDV (i.e. the Valgrind version of DMD) is now in the tree. There's no need to keep this bug open.