Closed Bug 822148 Opened 12 years ago Closed 12 years ago

DMD: More refactoring

Categories

(Core :: DMD, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla20
Tracking Status
firefox19 --- fixed
firefox20 --- fixed
b2g18 --- fixed

People

(Reporter: n.nethercote, Assigned: n.nethercote)

Details

Attachments

(9 files)

24.07 KB, patch
justin.lebar+bug
: review+
justin.lebar+bug
: approval-mozilla-aurora+
justin.lebar+bug
: approval-mozilla-b2g18+
Details | Diff | Splinter Review
50.23 KB, patch
justin.lebar+bug
: review+
Details | Diff | Splinter Review
12.90 KB, patch
justin.lebar+bug
: review+
Details | Diff | Splinter Review
5.09 KB, patch
justin.lebar+bug
: review+
Details | Diff | Splinter Review
44.45 KB, patch
justin.lebar+bug
: review+
Details | Diff | Splinter Review
1.36 KB, patch
justin.lebar+bug
: review+
Details | Diff | Splinter Review
8.14 KB, patch
justin.lebar+bug
: review+
Details | Diff | Splinter Review
1.14 KB, patch
justin.lebar+bug
: review+
Details | Diff | Splinter Review
2.07 KB, patch
justin.lebar+bug
: review+
Details | Diff | Splinter Review
I have more refactoring planned.  This is something of a follow-up to bug 820652.
No longer depends on: 820652
This patch removes the reporter names from output.  This saves space (and this
will become more important in subsequent patches), and it's redundant info
because the MallocSizeOf function's name always matches it, viz:

 Reported by 'js' at
   mozilla::dmd::Report(...
   JsMallocSizeOf(...

 Reported by 'startup-cache/data' at
    mozilla::dmd::Report(...
    mozilla::scache::StartupCacheDataMallocSizeOf(...

 Reported by 'images' at
    mozilla::dmd::Report(...
    ImagesMallocSizeOf(...

 Reported by 'layout/style-sheet-cache' at
    mozilla::dmd::Report(...
    LayoutStyleSheetCacheMallocSizeOf(...

Note also that this gets rid of the "reporter names are always static"
assumption that you never liked :)
Attachment #693200 - Flags: review?(justin.lebar+bug)
This patch does a big overhaul.  It should be the last big one (I hope, except
for some forthcoming renamings of things).

- Currently double-reports are handled separately from single reports.  We put
  double-reports into a DoubleReportBlockGroup as soon as they are detected,
  bypassing LiveBlock.  This meant that a double-reported block would be
  listed in the double-report section and also the normal "reported" section.
  
  This patch changes this so that LiveBlock can record up to two
  reports.  (This is more like what I used to have, with the union, but I
  think it's better now.  Sorry about the back-and-forth.)
  
  As a result, with this patch double-reports aren't recorded immediately.
  Instead we just split LiveBlocks into three categories at Dump() time:
  unreported, once-reported, and twice-reported.
  
  The big benefit is that LiveBlockKey and DoubleReportBlockKey are merged
  into a single class (BlockGroupKey), and LiveBlockGroup and
  DoubleReportBlockGroup are merged into BlockGroup.  This avoids a bunch of
  duplication in Print() methods, in particular.  Overall, things are much
  more uniform and less repetitive.

  Also, gDoubleReportBlockGroupTable is no longer needed;  it's been replaced
  with a local variable in Dump().

  Note that I've changed terminology;  I'm now using "unreported",
  "once-reported", "twice-reported".

  The only disadvantage from a functionality POV is that it's possible that a
  block would be twice-reported and then freed before Dump() is called, in
  which case we will now miss it.  That seems very unlikely in the standard
  use cases, though, because DMD runs right after the memory reporters.

- To avoid increasing the size of LiveBlock, I added a TaggedPtr class (which
  maybe should go in MFBT) and did some field packing.  One benefit is that
  mReqSize is now word-size again, so there's no overflow risk.

  Note that I haven't backed myself into a corner, space-wise, by putting the
  2nd report field in LiveBlock.  There's a potential follow-up optimization
  which would make LiveBlock variable-length -- we'd only allocate space for
  that 2nd report if needed.  This would require changing LiveBlockTable from
  a HashSet<LiveBlock> to HashSet<LiveBlock*>, but that would help again with
  space (at the cost of more calls to malloc) because we'd only be wasting one
  word per unused hash table slot, instead of the current five.
Attachment #693201 - Flags: review?(justin.lebar+bug)
Comment on attachment 693200 [details] [diff] [review]
DMD: remove reporter names from output because they're redundant.

> 4 files changed, 81 insertions(+), 147 deletions(-)

Hard to argue with this!

>diff --git a/xpcom/base/nsIMemoryReporter.idl b/xpcom/base/nsIMemoryReporter.idl
>--- a/xpcom/base/nsIMemoryReporter.idl
>+++ b/xpcom/base/nsIMemoryReporter.idl
>-#define MOZ_REPORT(ptr, usable, name)          mozilla::dmd::Report(ptr, name)
>-#define MOZ_REPORT_ON_ALLOC(ptr, usable, name) mozilla::dmd::ReportOnAlloc(ptr, name)
>+#define MOZ_REPORT(ptr, usable, name)          mozilla::dmd::Report(ptr)
>+#define MOZ_REPORT_ON_ALLOC(ptr, usable, name) mozilla::dmd::ReportOnAlloc(ptr)

Is the plan to eventually get rid of the different names for different
reporters, and then to get rid of the notion of declared reporters altogether?

>diff --git a/memory/replace/dmd/DMD.cpp b/memory/replace/dmd/DMD.cpp
>--- a/memory/replace/dmd/DMD.cpp
>+++ b/memory/replace/dmd/DMD.cpp
>+  const StackTrace* ReportStackTrace() const { return mReportStackTrace; }

Maybe we should give up on this no-"Get" business -- ReportStackTrace() does
not sound like a getter to me, particularly given the existence of a Report()
method below.

>+  void Report(Thread* aT, bool aReportedOnAlloc) const;

>   DoubleReportBlockKey(const StackTrace* aAllocStackTrace,
>                        const StackTrace* aReportStackTrace1,
>-                       const StackTrace* aReportStackTrace2,
>-                       const char* aReporterName1,
>-                       const char* aReporterName2)
>+                       const StackTrace* aReportStackTrace2)
>     : mAllocStackTrace(aAllocStackTrace),
>       mReportStackTrace1(aReportStackTrace1),
>-      mReportStackTrace2(aReportStackTrace2),
>-      mReporterName1(aReporterName1),
>-      mReporterName2(aReporterName2)
>+      mReportStackTrace2(aReportStackTrace2)
>   {
>-    MOZ_ASSERT(IsSane());
>+    MOZ_ASSERT(mAllocStackTrace && mReportStackTrace1 && mReportStackTrace2);

Nit: It's a bit easier to debug these assertions if they're split up into
separate assertions around the &&s, but feel free to leave it if you prefer it
this way.

r+a=me
Attachment #693200 - Flags: review?(justin.lebar+bug)
Attachment #693200 - Flags: review+
Attachment #693200 - Flags: approval-mozilla-b2g18+
Attachment #693200 - Flags: approval-mozilla-aurora+
> Note that I haven't backed myself into a corner, space-wise, by putting the
> 2nd report field in LiveBlock.  There's a potential follow-up optimization
> which would make LiveBlock variable-length -- we'd only allocate space for
> that 2nd report if needed.  This would require changing LiveBlockTable from
> a HashSet<LiveBlock> to HashSet<LiveBlock*>, but that would help again with
> space (at the cost of more calls to malloc) because we'd only be wasting one
> word per unused hash table slot, instead of the current five.

This still isn't optimal, though.

Consider just the memory usage for single-reported live blocks, since that dominates (right?).  With the approach above, we use

 One word for each hashtable block
 One word for each LiveBlock (assuming optimal packing)

Whereas if we had a hashtable exclusively devoted to live blocks (i.e. HashTable<LiveBlock> == HashTable<StackTrace*>), we'd use

 One word for each hashtable block

Assuming our hashtable is 2/3 full, the second approach uses 1/(1 + 2/3)) = 3/5 as much memory as the first one.
> Is the plan to eventually get rid of the different names for different
> reporters,

I'm planning to get rid of the |name| argument to NS_MEMORY_REPORTER_MALLOC_SIZEOF_FUN.  That's waiting on the removal of DMDV (and you just reminded me to add a note about that to bug 819819).


> and then to get rid of the notion of declared reporters altogether?

I'm not sure which sense of "reporters" you're using here... but having a different MallocSizeOf function for each memory reporter (i.e. nsIMemoryReporter or nsIMemoryMultiReporter) is very useful because it lets you identify which memory reporter is responsible for reporting which blocks.  This is especially useful when writing a new reporter.
> Assuming our hashtable is 2/3 full, the second approach uses 1/(1 + 2/3)) =
> 3/5 as much memory as the first one.

To summarize our IRC discussion:

- At a minimum, for each LiveBlock we need to store 3 words:  mPtr, mReqSize, mAllocStackTrace (plus a couple of other bits that can be squeezed in).

- Many blocks are reported once, which requires storing an additional word.

- JS hash tables are typically half full, and a half-full HashSet<Block*> with 4 word Blocks (which makes storing a single report straightforward) takes the same space as a half-full HashSet<Block> with 3 words Blocks (which require extra storage for reports).  Specifically:

  HashSet<Block*>:  N + 4*N/2 = 3*N
  HashSet<Block>:   3*N

- The extra heap blocks will require some storage within jemalloc, presumably.  Hopefully this will be less than the extra space that the 3-word approach would need for reports.

- So the HashSet<Block*>-with-4-word-Block seems to be the best, space-wise.  It requires extra calls to malloc/free, but hopefully they won't hurt performance noticeably.
Comment on attachment 693201 [details] [diff] [review]
(part 2) - DMD: Treat twice-reported blocks more like other blocks.

>+  TaggedPtr(T aPtr, bool aTag)
>+    : mPtr(aPtr)
>+  {
>+    MOZ_ASSERT(IsTwoByteAligned(aPtr));
>+    MOZ_ASSERT(aTag <= kTagMask);
>+    mUint |= (aTag & kTagMask);

Maybe cast aTag to a uintptr_t before doing the assertion and bitwise op?
Otherwise this test is relying on the fact that aTag gets cast to the same type
when performing <= and & with kTagMask, and I don't know if that's true.

>+  void Set(T aPtr, bool aTag)
>+  {
>+    MOZ_ASSERT(IsTwoByteAligned(aPtr));
>+    mPtr = aPtr;
>+    MOZ_ASSERT(aTag <= kTagMask);
>+    mUint |= (aTag & kTagMask);
>+  }
>+
>+  T Ptr() const { return reinterpret_cast<T>(mUint & kPtrMask); }
>+
>+  bool Tag() const { return bool(mUint & kTagMask); }

Have you considered defining operator* and operator-> on this class?  Then you
can treat it just like a normal pointer.

But maybe the way you have it is good; explicit is better than implicit.

> // A live heap block.
> class LiveBlock
> {
>   const void*  mPtr;
>+  const size_t mReqSize;    // size requested
>+
>+  // Ptr: |mAllocStackTrace| - stack trace where this block was allocated.
>+  // Tag bit 0: |mSampled| - was this block sampled? (if so, slop == 0).
>+  TaggedPtr<const StackTrace* const>
>+    mAllocStackTrace_mSampled0;

I guess the 0 at the end of the name is supposed to tell me that big 0 is for
mSampled?  It would be less distracting to me if that weren't there -- anyway
the point of the tagged pointer abstraction is that you don't care where that
bit lives.

>+  // Ptr: |mReportStackTrace| - stack trace where this block was reported.
>+  // Tag bit 0: |mReportedOnAlloc| - was the block reported immediately on
>+  //   allocation?  If so, DMD must not clear the report at the end of Dump().
>+  // |mPtr| is used as the key in LiveBlockTable, so it's ok for this member
>   // to be |mutable|.
>+  mutable TaggedPtr<const StackTrace*> mReportStackTrace_mReportedOnAlloc0[2];

It's clear from the comment why we have two of these in an array.  Also, please
update the comment ("mPtr").

Maybe mReportStackTraces_mReportedOnAlloc would be a better name.

> //---------------------------------------------------------------------------
>-// Live and double-report block groups
>+// Block groups
> //---------------------------------------------------------------------------
> 
>-class LiveBlockKey
>+class BlockGroupKey
> {
> public:
>   const StackTrace* const mAllocStackTrace;   // never null
> protected:
>-  const StackTrace* const mReportStackTrace;  // nullptr if unreported
>+  const StackTrace* const mReportStackTrace1; // nullptr if unreported
>+  const StackTrace* const mReportStackTrace2; // nullptr if not 2x-reported

It's kind of weird that you use an array[2] earlier and use two separate
variables here.

This /is/ a lot simpler.
Attachment #693201 - Flags: review?(justin.lebar+bug) → review+
> I guess the 0 at the end of the name is supposed to tell me that big 0 is for
> mSampled?

s/big/bit
> Have you considered defining operator* and operator-> on this class?  Then
> you can treat it just like a normal pointer.
> 
> But maybe the way you have it is good; explicit is better than implicit.

I left it explicit.


> Also, please update the comment ("mPtr").

That comment is still valid.
Having removed the distinction between LiveBlockGroup and
DoubleReportBlockGroup, the "Live" in "LiveBlock" is no longer needed.  This
patch removes it.
Attachment #694122 - Flags: review?(justin.lebar+bug)
> That comment is still valid.

Ah, I see; I got confused by the diff.  Sorry!
Trivial patch.
Attachment #694125 - Flags: review?(justin.lebar+bug)
Attachment #694125 - Flags: review?(justin.lebar+bug) → review+
Comment on attachment 694122 [details] [diff] [review]
(part 3) - DMD: Rename |LiveBlock| as |Block|.

> // A live heap block.
>-class LiveBlock
>+class Block

Do you want to change the comment?  (Now we might think "live as opposed to
what?")
Attachment #694122 - Flags: review?(justin.lebar+bug) → review+
jlebar, in bug 717853 comment 128 you said:

> The nomenclature and comments are confusing here.
> 
> We say a FrameGroup is "a group of one or more stack frames with a common
> PC." But it seems to me that it's actually an aggregation of one or more
> /blocks/ whose stacks all share a common PC.
> 
> Of course, if we describe it that way, it's not clear why this isn't called
> a "block group", since it's a group of blocks.  So that suggests BlockGroup
> may not be the right name, either.
> 
> Maybe something like BlocksAggregatedByStack and BlocksAggregatedByFrame
> would be better?  That would make explicit the parallels between these two
> classes.

You're right.  BlockGroup would be more accurately named
BlocksWithMatchingAllocAndReportTraces, and FrameGroup would be more
accurately named BlocksWithMatchingAllocFrame.

Those names are too long, esp. considering we need concise colloquial names
for these things in output lines like this:

  Unreported: 1,773 blocks and 1,150 block groups in frame group 1 of 5,660

I think a better way forward is to think of these structures not as a
collection of blocks, but as a unit of output, which I'll call a "record".
Colloquially (i.e. in output messages) we would have "trace records" (which
contain one or more traces) and "frame records" (which contain one frame).
For example:

  Unreported: 1,773 blocks from 1,150 trace records in frame record 1 of 5,660

The necessary code changes are as follows:

- BlockGroup        --> TraceRecord
- BlockGroupKey     --> TraceRecordKey
- BlockGroupTable   --> TraceRecordTable

- FrameGroup        --> FrameRecord
- FrameGroupTable   --> FrameRecordTable

- GroupSize         --> RecordSize

This patch just makes those changes, plus some related ones (e.g. variable
names), but no functional changes.
Attachment #694132 - Flags: review?(justin.lebar+bug)
> > // A live heap block.
> >-class LiveBlock
> >+class Block
> 
> Do you want to change the comment?  (Now we might think "live as opposed to
> what?")

As opposed to a freed block.  It's not a distinction worth making in the name, but is useful in the comment, IMO.
Occasionally we have a PC for which ns_DescribeCodeAddress() gives us nothing
useful.  In that case we currently print a frame like this:

  ???[ +0x0]

This leads to fix-linux-stack.pl complaining about "" being a non-existent
file.

This patch just changes it so useless entries like this are instead printed
as:

  ???

which fix-linux-stack.pl ignores.
Attachment #694135 - Flags: review?(justin.lebar+bug)
DMD's summary indicates how many bytes were {un,once-,twice-}reported.  This
patch augments that with the corresponding numbers of heap blocks, just for
interest's sake.

Here's an example after triggering memory reports just after start-up:

  Total:           ~53,271,760 bytes (100.00%) in  ~6,339 blocks (100.00%)
  Unreported:      ~12,049,017 bytes ( 22.62%) in  ~2,215 blocks ( 34.94%)
  Once-reported:   ~41,222,743 bytes ( 77.38%) in  ~4,124 blocks ( 65.06%)
  Twice-reported:           ~0 bytes (  0.00%) in      ~0 blocks (  0.00%)

The 6,339 matches the block table count, which is encouraging:

  Block table:             786,512 bytes (16,384 entries, 6,339 used)
Attachment #694144 - Flags: review?(justin.lebar+bug)
Use Percent() where possible, because it avoids div-by-zero problems.  (This
patch was prompted by the "NaN%" on the last line of
http://people.mozilla.com/~kgupta/bug/769761-dmd.txt.)
Attachment #694148 - Flags: review?(justin.lebar+bug)
jlebar wanted this.

(And that's the last patch from me for this bug :)
Attachment #694151 - Flags: review?(justin.lebar+bug)
Comment on attachment 694151 [details] [diff] [review]
(part 9) - DMD: remove a friend declaration.

Yay, thanks.
Attachment #694151 - Flags: review?(justin.lebar+bug) → review+
Attachment #694148 - Flags: review?(justin.lebar+bug) → review+
Attachment #694144 - Flags: review?(justin.lebar+bug) → review+
Attachment #694135 - Flags: review?(justin.lebar+bug) → review+
> It's clear from the comment why we have two of these in an array.

Sorry, I meant it's /not/ clear why we two of these in the array.
WRT part 5:

Working backwards from the output seems like a reasonable approach to me.

> Unreported: 1,773 blocks from 1,150 trace records in frame record 1 of 5,660
>  144 bytes (144 requested / 0 slop)
>  10.11% of the heap (64.04% cumulative);  10.11% of unreported (64.04% cumulative)
>  Allocated at
>    malloc() at jemalloc.c:123

It's not at all clear to me what a "trace record" or a "frame record" is,
particularly because nobody talks about "traces" and "frames" but instead we
say "stack traces" and "stack frames".  Keep in mind that that it's not obvious
to some people what we mean by "block".

Consider instead the following output:

> 144 unreported bytes allocated under stack frame
>   malloc() at jemalloc.c:123.
> 
>   144 bytes requested, 0 bytes slop
>   1,773 allocations, 1,150 unique call stacks
>   10.11% of the heap (64.04% cumulative);  10.11% of unreported (64.04% cumulative)
>   Record 1 of 5,660

This privileges the most important information; you can stop reading after the
empty line if you want.

Putting the backtrace in the second line might not make sense for the other
report types.  The point is more:

 1) There's room to improve our output, and
 2) Doing so doesn't necessarily require us to introduce more jargon.

This doesn't answer your question about renaming the types in the code except
to suggest that we don't necessarily want to reflect the code changes in the
output.  I'm still thinking about the class names...
> Consider instead the following output:
> 
> > 144 unreported bytes allocated under stack frame
> >   malloc() at jemalloc.c:123.
> > 
> >   144 bytes requested, 0 bytes slop
> >   1,773 allocations, 1,150 unique call stacks
> >   10.11% of the heap (64.04% cumulative);  10.11% of unreported (64.04% cumulative)
> >   Record 1 of 5,660
> 
> This privileges the most important information; you can stop reading after
> the empty line if you want.

You've moved things around but there's no fundamental change.  It's still a unit of output and it still needs a name, so we can refer to it in the output and documentation.  You used "Record" on the last line.  We have two kinds of records.  Any suggestions for what they should be called?


> This doesn't answer your question about renaming the types in the code except
> to suggest that we don't necessarily want to reflect the code changes in the
> output.  I'm still thinking about the class names...

Those classes represent output records.  It's sensible for the class names to reflect the names used in the output.
Would you be amenable to StackTraceRecord and StackFrameRecord?

Note that I still want to try merging StackFrameRecords at some point, but we can cross that bridge.

> Those classes represent output records.  It's sensible for the class names to reflect the 
> names used in the output.

Sure, that's sensible as a general rule; I'm just not sure we should necessarily be constrained by it here.
> Would you be amenable to StackTraceRecord and StackFrameRecord?

I can live with it if it'll get me r+.  Are you happy with "stack trace record" and "stack frame record" in the output?  That'll cause the first line in each frame record to overflow 80 chars, but the stack traces already do that so I guess it's ok.
> Are you happy with "stack trace record" and "stack frame record" in the output?

Sure, so long as I can come back and tweak the output later.

> I can live with it if it'll get me r+.

Heh, okay then.  I'm happy to consider alternatives if you have some.
Comment on attachment 694132 [details] [diff] [review]
(part 5) - DMD: rename "groups" as "records".

a=me to land all this stuff.
Attachment #694132 - Flags: review?(justin.lebar+bug) → review+
Component: General → DMD
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: