Closed Bug 822148 Opened 13 years ago Closed 12 years ago

DMD: More refactoring

Categories

(Core :: DMD, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla20
Tracking Status
firefox19 --- fixed
firefox20 --- fixed
b2g18 --- fixed

People

(Reporter: n.nethercote, Assigned: n.nethercote)

Details

Attachments

(9 files)

24.07 KB, patch
justin.lebar+bug
: review+
justin.lebar+bug
: approval-mozilla-aurora+
justin.lebar+bug
: approval-mozilla-b2g18+
Details | Diff | Splinter Review
50.23 KB, patch
justin.lebar+bug
: review+
Details | Diff | Splinter Review
12.90 KB, patch
justin.lebar+bug
: review+
Details | Diff | Splinter Review
5.09 KB, patch
justin.lebar+bug
: review+
Details | Diff | Splinter Review
44.45 KB, patch
justin.lebar+bug
: review+
Details | Diff | Splinter Review
1.36 KB, patch
justin.lebar+bug
: review+
Details | Diff | Splinter Review
8.14 KB, patch
justin.lebar+bug
: review+
Details | Diff | Splinter Review
1.14 KB, patch
justin.lebar+bug
: review+
Details | Diff | Splinter Review
2.07 KB, patch
justin.lebar+bug
: review+
Details | Diff | Splinter Review
I have more refactoring planned. This is something of a follow-up to bug 820652.
No longer depends on: 820652
This patch removes the reporter names from output. This saves space (and this will become more important in subsequent patches), and it's redundant info because the MallocSizeOf function's name always matches it, viz: Reported by 'js' at mozilla::dmd::Report(... JsMallocSizeOf(... Reported by 'startup-cache/data' at mozilla::dmd::Report(... mozilla::scache::StartupCacheDataMallocSizeOf(... Reported by 'images' at mozilla::dmd::Report(... ImagesMallocSizeOf(... Reported by 'layout/style-sheet-cache' at mozilla::dmd::Report(... LayoutStyleSheetCacheMallocSizeOf(... Note also that this gets rid of the "reporter names are always static" assumption that you never liked :)
Attachment #693200 - Flags: review?(justin.lebar+bug)
This patch does a big overhaul. It should be the last big one (I hope, except for some forthcoming renamings of things). - Currently double-reports are handled separately from single reports. We put double-reports into a DoubleReportBlockGroup as soon as they are detected, bypassing LiveBlock. This meant that a double-reported block would be listed in the double-report section and also the normal "reported" section. This patch changes this so that LiveBlock can record up to two reports. (This is more like what I used to have, with the union, but I think it's better now. Sorry about the back-and-forth.) As a result, with this patch double-reports aren't recorded immediately. Instead we just split LiveBlocks into three categories at Dump() time: unreported, once-reported, and twice-reported. The big benefit is that LiveBlockKey and DoubleReportBlockKey are merged into a single class (BlockGroupKey), and LiveBlockGroup and DoubleReportBlockGroup are merged into BlockGroup. This avoids a bunch of duplication in Print() methods, in particular. Overall, things are much more uniform and less repetitive. Also, gDoubleReportBlockGroupTable is no longer needed; it's been replaced with a local variable in Dump(). Note that I've changed terminology; I'm now using "unreported", "once-reported", "twice-reported". The only disadvantage from a functionality POV is that it's possible that a block would be twice-reported and then freed before Dump() is called, in which case we will now miss it. That seems very unlikely in the standard use cases, though, because DMD runs right after the memory reporters. - To avoid increasing the size of LiveBlock, I added a TaggedPtr class (which maybe should go in MFBT) and did some field packing. One benefit is that mReqSize is now word-size again, so there's no overflow risk. Note that I haven't backed myself into a corner, space-wise, by putting the 2nd report field in LiveBlock. There's a potential follow-up optimization which would make LiveBlock variable-length -- we'd only allocate space for that 2nd report if needed. This would require changing LiveBlockTable from a HashSet<LiveBlock> to HashSet<LiveBlock*>, but that would help again with space (at the cost of more calls to malloc) because we'd only be wasting one word per unused hash table slot, instead of the current five.
Attachment #693201 - Flags: review?(justin.lebar+bug)
Comment on attachment 693200 [details] [diff] [review] DMD: remove reporter names from output because they're redundant. > 4 files changed, 81 insertions(+), 147 deletions(-) Hard to argue with this! >diff --git a/xpcom/base/nsIMemoryReporter.idl b/xpcom/base/nsIMemoryReporter.idl >--- a/xpcom/base/nsIMemoryReporter.idl >+++ b/xpcom/base/nsIMemoryReporter.idl >-#define MOZ_REPORT(ptr, usable, name) mozilla::dmd::Report(ptr, name) >-#define MOZ_REPORT_ON_ALLOC(ptr, usable, name) mozilla::dmd::ReportOnAlloc(ptr, name) >+#define MOZ_REPORT(ptr, usable, name) mozilla::dmd::Report(ptr) >+#define MOZ_REPORT_ON_ALLOC(ptr, usable, name) mozilla::dmd::ReportOnAlloc(ptr) Is the plan to eventually get rid of the different names for different reporters, and then to get rid of the notion of declared reporters altogether? >diff --git a/memory/replace/dmd/DMD.cpp b/memory/replace/dmd/DMD.cpp >--- a/memory/replace/dmd/DMD.cpp >+++ b/memory/replace/dmd/DMD.cpp >+ const StackTrace* ReportStackTrace() const { return mReportStackTrace; } Maybe we should give up on this no-"Get" business -- ReportStackTrace() does not sound like a getter to me, particularly given the existence of a Report() method below. >+ void Report(Thread* aT, bool aReportedOnAlloc) const; > DoubleReportBlockKey(const StackTrace* aAllocStackTrace, > const StackTrace* aReportStackTrace1, >- const StackTrace* aReportStackTrace2, >- const char* aReporterName1, >- const char* aReporterName2) >+ const StackTrace* aReportStackTrace2) > : mAllocStackTrace(aAllocStackTrace), > mReportStackTrace1(aReportStackTrace1), >- mReportStackTrace2(aReportStackTrace2), >- mReporterName1(aReporterName1), >- mReporterName2(aReporterName2) >+ mReportStackTrace2(aReportStackTrace2) > { >- MOZ_ASSERT(IsSane()); >+ MOZ_ASSERT(mAllocStackTrace && mReportStackTrace1 && mReportStackTrace2); Nit: It's a bit easier to debug these assertions if they're split up into separate assertions around the &&s, but feel free to leave it if you prefer it this way. r+a=me
Attachment #693200 - Flags: review?(justin.lebar+bug)
Attachment #693200 - Flags: review+
Attachment #693200 - Flags: approval-mozilla-b2g18+
Attachment #693200 - Flags: approval-mozilla-aurora+
> Note that I haven't backed myself into a corner, space-wise, by putting the > 2nd report field in LiveBlock. There's a potential follow-up optimization > which would make LiveBlock variable-length -- we'd only allocate space for > that 2nd report if needed. This would require changing LiveBlockTable from > a HashSet<LiveBlock> to HashSet<LiveBlock*>, but that would help again with > space (at the cost of more calls to malloc) because we'd only be wasting one > word per unused hash table slot, instead of the current five. This still isn't optimal, though. Consider just the memory usage for single-reported live blocks, since that dominates (right?). With the approach above, we use One word for each hashtable block One word for each LiveBlock (assuming optimal packing) Whereas if we had a hashtable exclusively devoted to live blocks (i.e. HashTable<LiveBlock> == HashTable<StackTrace*>), we'd use One word for each hashtable block Assuming our hashtable is 2/3 full, the second approach uses 1/(1 + 2/3)) = 3/5 as much memory as the first one.
> Is the plan to eventually get rid of the different names for different > reporters, I'm planning to get rid of the |name| argument to NS_MEMORY_REPORTER_MALLOC_SIZEOF_FUN. That's waiting on the removal of DMDV (and you just reminded me to add a note about that to bug 819819). > and then to get rid of the notion of declared reporters altogether? I'm not sure which sense of "reporters" you're using here... but having a different MallocSizeOf function for each memory reporter (i.e. nsIMemoryReporter or nsIMemoryMultiReporter) is very useful because it lets you identify which memory reporter is responsible for reporting which blocks. This is especially useful when writing a new reporter.
> Assuming our hashtable is 2/3 full, the second approach uses 1/(1 + 2/3)) = > 3/5 as much memory as the first one. To summarize our IRC discussion: - At a minimum, for each LiveBlock we need to store 3 words: mPtr, mReqSize, mAllocStackTrace (plus a couple of other bits that can be squeezed in). - Many blocks are reported once, which requires storing an additional word. - JS hash tables are typically half full, and a half-full HashSet<Block*> with 4 word Blocks (which makes storing a single report straightforward) takes the same space as a half-full HashSet<Block> with 3 words Blocks (which require extra storage for reports). Specifically: HashSet<Block*>: N + 4*N/2 = 3*N HashSet<Block>: 3*N - The extra heap blocks will require some storage within jemalloc, presumably. Hopefully this will be less than the extra space that the 3-word approach would need for reports. - So the HashSet<Block*>-with-4-word-Block seems to be the best, space-wise. It requires extra calls to malloc/free, but hopefully they won't hurt performance noticeably.
Comment on attachment 693201 [details] [diff] [review] (part 2) - DMD: Treat twice-reported blocks more like other blocks. >+ TaggedPtr(T aPtr, bool aTag) >+ : mPtr(aPtr) >+ { >+ MOZ_ASSERT(IsTwoByteAligned(aPtr)); >+ MOZ_ASSERT(aTag <= kTagMask); >+ mUint |= (aTag & kTagMask); Maybe cast aTag to a uintptr_t before doing the assertion and bitwise op? Otherwise this test is relying on the fact that aTag gets cast to the same type when performing <= and & with kTagMask, and I don't know if that's true. >+ void Set(T aPtr, bool aTag) >+ { >+ MOZ_ASSERT(IsTwoByteAligned(aPtr)); >+ mPtr = aPtr; >+ MOZ_ASSERT(aTag <= kTagMask); >+ mUint |= (aTag & kTagMask); >+ } >+ >+ T Ptr() const { return reinterpret_cast<T>(mUint & kPtrMask); } >+ >+ bool Tag() const { return bool(mUint & kTagMask); } Have you considered defining operator* and operator-> on this class? Then you can treat it just like a normal pointer. But maybe the way you have it is good; explicit is better than implicit. > // A live heap block. > class LiveBlock > { > const void* mPtr; >+ const size_t mReqSize; // size requested >+ >+ // Ptr: |mAllocStackTrace| - stack trace where this block was allocated. >+ // Tag bit 0: |mSampled| - was this block sampled? (if so, slop == 0). >+ TaggedPtr<const StackTrace* const> >+ mAllocStackTrace_mSampled0; I guess the 0 at the end of the name is supposed to tell me that big 0 is for mSampled? It would be less distracting to me if that weren't there -- anyway the point of the tagged pointer abstraction is that you don't care where that bit lives. >+ // Ptr: |mReportStackTrace| - stack trace where this block was reported. >+ // Tag bit 0: |mReportedOnAlloc| - was the block reported immediately on >+ // allocation? If so, DMD must not clear the report at the end of Dump(). >+ // |mPtr| is used as the key in LiveBlockTable, so it's ok for this member > // to be |mutable|. >+ mutable TaggedPtr<const StackTrace*> mReportStackTrace_mReportedOnAlloc0[2]; It's clear from the comment why we have two of these in an array. Also, please update the comment ("mPtr"). Maybe mReportStackTraces_mReportedOnAlloc would be a better name. > //--------------------------------------------------------------------------- >-// Live and double-report block groups >+// Block groups > //--------------------------------------------------------------------------- > >-class LiveBlockKey >+class BlockGroupKey > { > public: > const StackTrace* const mAllocStackTrace; // never null > protected: >- const StackTrace* const mReportStackTrace; // nullptr if unreported >+ const StackTrace* const mReportStackTrace1; // nullptr if unreported >+ const StackTrace* const mReportStackTrace2; // nullptr if not 2x-reported It's kind of weird that you use an array[2] earlier and use two separate variables here. This /is/ a lot simpler.
Attachment #693201 - Flags: review?(justin.lebar+bug) → review+
> I guess the 0 at the end of the name is supposed to tell me that big 0 is for > mSampled? s/big/bit
> Have you considered defining operator* and operator-> on this class? Then > you can treat it just like a normal pointer. > > But maybe the way you have it is good; explicit is better than implicit. I left it explicit. > Also, please update the comment ("mPtr"). That comment is still valid.
Having removed the distinction between LiveBlockGroup and DoubleReportBlockGroup, the "Live" in "LiveBlock" is no longer needed. This patch removes it.
Attachment #694122 - Flags: review?(justin.lebar+bug)
> That comment is still valid. Ah, I see; I got confused by the diff. Sorry!
Trivial patch.
Attachment #694125 - Flags: review?(justin.lebar+bug)
Attachment #694125 - Flags: review?(justin.lebar+bug) → review+
Comment on attachment 694122 [details] [diff] [review] (part 3) - DMD: Rename |LiveBlock| as |Block|. > // A live heap block. >-class LiveBlock >+class Block Do you want to change the comment? (Now we might think "live as opposed to what?")
Attachment #694122 - Flags: review?(justin.lebar+bug) → review+
jlebar, in bug 717853 comment 128 you said: > The nomenclature and comments are confusing here. > > We say a FrameGroup is "a group of one or more stack frames with a common > PC." But it seems to me that it's actually an aggregation of one or more > /blocks/ whose stacks all share a common PC. > > Of course, if we describe it that way, it's not clear why this isn't called > a "block group", since it's a group of blocks. So that suggests BlockGroup > may not be the right name, either. > > Maybe something like BlocksAggregatedByStack and BlocksAggregatedByFrame > would be better? That would make explicit the parallels between these two > classes. You're right. BlockGroup would be more accurately named BlocksWithMatchingAllocAndReportTraces, and FrameGroup would be more accurately named BlocksWithMatchingAllocFrame. Those names are too long, esp. considering we need concise colloquial names for these things in output lines like this: Unreported: 1,773 blocks and 1,150 block groups in frame group 1 of 5,660 I think a better way forward is to think of these structures not as a collection of blocks, but as a unit of output, which I'll call a "record". Colloquially (i.e. in output messages) we would have "trace records" (which contain one or more traces) and "frame records" (which contain one frame). For example: Unreported: 1,773 blocks from 1,150 trace records in frame record 1 of 5,660 The necessary code changes are as follows: - BlockGroup --> TraceRecord - BlockGroupKey --> TraceRecordKey - BlockGroupTable --> TraceRecordTable - FrameGroup --> FrameRecord - FrameGroupTable --> FrameRecordTable - GroupSize --> RecordSize This patch just makes those changes, plus some related ones (e.g. variable names), but no functional changes.
Attachment #694132 - Flags: review?(justin.lebar+bug)
> > // A live heap block. > >-class LiveBlock > >+class Block > > Do you want to change the comment? (Now we might think "live as opposed to > what?") As opposed to a freed block. It's not a distinction worth making in the name, but is useful in the comment, IMO.
Occasionally we have a PC for which ns_DescribeCodeAddress() gives us nothing useful. In that case we currently print a frame like this: ???[ +0x0] This leads to fix-linux-stack.pl complaining about "" being a non-existent file. This patch just changes it so useless entries like this are instead printed as: ??? which fix-linux-stack.pl ignores.
Attachment #694135 - Flags: review?(justin.lebar+bug)
DMD's summary indicates how many bytes were {un,once-,twice-}reported. This patch augments that with the corresponding numbers of heap blocks, just for interest's sake. Here's an example after triggering memory reports just after start-up: Total: ~53,271,760 bytes (100.00%) in ~6,339 blocks (100.00%) Unreported: ~12,049,017 bytes ( 22.62%) in ~2,215 blocks ( 34.94%) Once-reported: ~41,222,743 bytes ( 77.38%) in ~4,124 blocks ( 65.06%) Twice-reported: ~0 bytes ( 0.00%) in ~0 blocks ( 0.00%) The 6,339 matches the block table count, which is encouraging: Block table: 786,512 bytes (16,384 entries, 6,339 used)
Attachment #694144 - Flags: review?(justin.lebar+bug)
Use Percent() where possible, because it avoids div-by-zero problems. (This patch was prompted by the "NaN%" on the last line of http://people.mozilla.com/~kgupta/bug/769761-dmd.txt.)
Attachment #694148 - Flags: review?(justin.lebar+bug)
jlebar wanted this. (And that's the last patch from me for this bug :)
Attachment #694151 - Flags: review?(justin.lebar+bug)
Comment on attachment 694151 [details] [diff] [review] (part 9) - DMD: remove a friend declaration. Yay, thanks.
Attachment #694151 - Flags: review?(justin.lebar+bug) → review+
Attachment #694148 - Flags: review?(justin.lebar+bug) → review+
Attachment #694144 - Flags: review?(justin.lebar+bug) → review+
Attachment #694135 - Flags: review?(justin.lebar+bug) → review+
> It's clear from the comment why we have two of these in an array. Sorry, I meant it's /not/ clear why we two of these in the array.
WRT part 5: Working backwards from the output seems like a reasonable approach to me. > Unreported: 1,773 blocks from 1,150 trace records in frame record 1 of 5,660 > 144 bytes (144 requested / 0 slop) > 10.11% of the heap (64.04% cumulative); 10.11% of unreported (64.04% cumulative) > Allocated at > malloc() at jemalloc.c:123 It's not at all clear to me what a "trace record" or a "frame record" is, particularly because nobody talks about "traces" and "frames" but instead we say "stack traces" and "stack frames". Keep in mind that that it's not obvious to some people what we mean by "block". Consider instead the following output: > 144 unreported bytes allocated under stack frame > malloc() at jemalloc.c:123. > > 144 bytes requested, 0 bytes slop > 1,773 allocations, 1,150 unique call stacks > 10.11% of the heap (64.04% cumulative); 10.11% of unreported (64.04% cumulative) > Record 1 of 5,660 This privileges the most important information; you can stop reading after the empty line if you want. Putting the backtrace in the second line might not make sense for the other report types. The point is more: 1) There's room to improve our output, and 2) Doing so doesn't necessarily require us to introduce more jargon. This doesn't answer your question about renaming the types in the code except to suggest that we don't necessarily want to reflect the code changes in the output. I'm still thinking about the class names...
> Consider instead the following output: > > > 144 unreported bytes allocated under stack frame > > malloc() at jemalloc.c:123. > > > > 144 bytes requested, 0 bytes slop > > 1,773 allocations, 1,150 unique call stacks > > 10.11% of the heap (64.04% cumulative); 10.11% of unreported (64.04% cumulative) > > Record 1 of 5,660 > > This privileges the most important information; you can stop reading after > the empty line if you want. You've moved things around but there's no fundamental change. It's still a unit of output and it still needs a name, so we can refer to it in the output and documentation. You used "Record" on the last line. We have two kinds of records. Any suggestions for what they should be called? > This doesn't answer your question about renaming the types in the code except > to suggest that we don't necessarily want to reflect the code changes in the > output. I'm still thinking about the class names... Those classes represent output records. It's sensible for the class names to reflect the names used in the output.
Would you be amenable to StackTraceRecord and StackFrameRecord? Note that I still want to try merging StackFrameRecords at some point, but we can cross that bridge. > Those classes represent output records. It's sensible for the class names to reflect the > names used in the output. Sure, that's sensible as a general rule; I'm just not sure we should necessarily be constrained by it here.
> Would you be amenable to StackTraceRecord and StackFrameRecord? I can live with it if it'll get me r+. Are you happy with "stack trace record" and "stack frame record" in the output? That'll cause the first line in each frame record to overflow 80 chars, but the stack traces already do that so I guess it's ok.
> Are you happy with "stack trace record" and "stack frame record" in the output? Sure, so long as I can come back and tweak the output later. > I can live with it if it'll get me r+. Heh, okay then. I'm happy to consider alternatives if you have some.
Comment on attachment 694132 [details] [diff] [review] (part 5) - DMD: rename "groups" as "records". a=me to land all this stuff.
Attachment #694132 - Flags: review?(justin.lebar+bug) → review+
Component: General → DMD
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: