Closed Bug 671299 Opened 13 years ago Closed 12 years ago

Add style sheet memory reporters

Categories

(Core :: CSS Parsing and Computation, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla13

People

(Reporter: bzbarsky, Assigned: n.nethercote)

References

(Blocks 1 open bug)

Details

(Whiteboard: [MemShrink:P2])

Attachments

(5 files, 2 obsolete files)

Right now about:memory counts the presshell arena.  This includes, I believe, nsStyleStruct_*, nsStyleContext, nsRuleNode.

What is not included, I believe, is sheets, rules (including selectors and data blocks), and rule cascade data.  David, anything else?  Any preferences for how this should be organized in about:memory?

This is probably going to become a tracking bug for those pieces....  I would expect separate patches for them.
(In reply to comment #0)
> Any preferences
> for how this should be organized in about:memory?

A good rule of thumb is to start with coarse buckets, and make them finer if they get big enough that you wonder what's going on inside them.
Whiteboard: [MemShrink]
Whiteboard: [MemShrink] → [MemShrink:P2]
Summary: Add missing style system memory reporters → [meta] Add missing style system memory reporters
Attached file DMD output
Here's some output from DMD for a browser run where I had five gmail tabs open.  12 of the top 14 unreported allocation points relate to CSS, and I've attached the list.  Note that they're listed in order of importance.

I haven't taught DMD about any of the existing style reporters so some of these might be covered already, but I doubt all of them are.  Together they probably account for ~15% of the entire heap.

I can re-run with bigger/smaller stack traces (giving less/more agglomeration of records) if necessary.
All of the allocations in that list are things that increase in proportion to the size of style sheets:  they're all either the parsed style sheet data structures or the cache of the result of cascading all the rules in the style sheets.

It would be interesting to know how much CSS gmail actually pulls in.  (Also Firefox, which could be a very significant chunk even with 5 gmail tabs open.)

We might want to look at sharing parsed style sheets across tabs, although that may get substantially less useful with e10s depending on what the process separation boundaries are.
dbaron, it might be worth thinking about whether the slop bytes can be reduced for some of those cases;  for many of them it's more than 10%, for the first one it's more than 20%.
We're accounting for several of the lower ones, but none of the first few.  I'll try to wire up some of these this week.
Not sure what your definition of "slop bytes" is, but we could definitely develop more compact storage for selectors.
(In reply to David Baron [:dbaron] from comment #6)
> Not sure what your definition of "slop bytes" is, but we could definitely
> develop more compact storage for selectors.

He means this: http://blog.mozilla.com/nnethercote/2011/08/05/clownshoes-available-in-sizes-2101-and-up/

basically, where you request (2^n)+5 bytes, and jemalloc rounds up and allocates (for certain values of n) 2^(n+1) bytes.
Looking at this list in terms of slop:

* The slop bytes for nsCSSCompressedDataBlock come from allocating a chunk of
  aBaseSize + aDataSize - 4.  aBaseSize is presumably
  sizeof(nsCSSCompressedDataBlock) which depends on the alignment requirements
  of void*, I think.  I'm not quite sure why block_chars is unconditionally 4
  even on 8-bit systems, actually...  The alignment comment is not making much
  sense to me.  On my 64-bit build, aBaseSize is coming out to 24, because we
  end up using 8 bytes each for mStyleBits and mBlock_; perhaps we should move
  mStyleBits to after mBlockEnd?

  In any case, aDataSize is variable, but always a multiple of
  sizeof(CDBValueStorage), which is also 24 in my case (8 bytes each for the
  nsCSSProperty enum, the nsCSSUnit enum, and the mValue union of the
  nsCSSValue.  The former two end up with 8 bytes each due to alignment
  issues, I bet, and there's not much we can do about that.  It's wasting 8
  bytes per CDBValueStorage, but short of nasty casting shenanigans we can't
  really change that.  Worth filing a bug on in any case?

  On a 32-bit system the corresponding numbers are aBaseSize == 12 and aDataSize
  is a multiple of 12.

  So on a 32-bit system the allocation size will be 8+(12*N) and on a 64-bit
  system it will be 20+(24*N).  As long as we're staying under 512 bytes (so
  20 or fewer longhand properties in the block on a 64-bit system) the slop
  will be worst when the allocation size is 4 mod 16, and in particular when N
  == 2 for 64-bit and N == 1 for 32-bit.  Those correspond to slop of 18% and
  60% respectively.  Since you're seeing numbers over 20%, I assume some of
  these allocations are ending up > 512 bytes, and then our slop can be pretty
  bad and depends solely on the number of CSS properties.  It's worth getting
  some data on how many properties compressed blocks typically have, I guess,
  to see what's going on here and whether we can improve the slop situation.

  In any case, I do think we should reorder mStyleBits and mBlockEnd; that will
  make the 64-bit case 12+(24*N), I would think, saving us some space in some
  cases.  Except won't that make the alignment of the union in the nsCSSValue
  inside CDBValueStorage weird, because the block's data will start on a 4-byte
  boundary?

* The slop bytes for nsCSSSelector (from nsCSSSelectorList::AddSelector) would
  presumably have to do with the size of nsCSSSelector.  This size is 8*(sizeof
  void*) + 8 bytes.  So on a 32-bit system we're talking 40 bytes and on a
  64-bit system we're talking 72 bytes.  Both will incur 8 bytes of slop with
  jemalloc, since it allocates multiples of 16 bytes.  I assume your log is for
  a 64-bit system, since the slop is exactly 1/9 of the allocation total, which
  is 8/72.  The only way to reduce the slop here is to either add or remove 8
  bytes to nsCSSSelector; the former may be difficult, while the latter is good
  to keep in mind for in case we need to add more members.  ;)

* For the ParseRuleSet callsite, I'm assuming that the allocation is that of
  StyleRule (not sure, because the line number reported is from the inlined
  mozalloc code).  If so, sizeof(StyleRule) == 80 on my 64-bit system; I have
  no idea why there would be slop here at all.  Is the data from a 32-bit build
  after all, or am I looking at the wrong allocation?  Or is something else
  going on with DMD?

* For ParseSelectorGroup, we're allocating an nsCSSSelectorList.  This is a
  24-byte allocation on 64-bit, so we get 33% slop.  A selector list is two
  pointers and an integer, so hard to shrink below 24 bytes....  though we have
  considered using arrays more and linked lists less here so as to fuse
  allocations better.  We have existing bugs on that.

* For the nsTArray realloc, I think we already have bugs on looking into the
  slop there.
(In reply to Andrew McCreight [:mccr8] from comment #7)
> 
> basically, where you request (2^n)+5 bytes, and jemalloc rounds up and
> allocates (for certain values of n) 2^(n+1) bytes.

That's a particular bad case of slop bytes, but I'm referring to all cases where the heap allocator rounds up.  Eg. if you ask jemalloc for anywhere between 2049 and 4096 bytes you'll get back 4096, resulting in up to 2047 slop bytes.  Internal fragmentation, in other words.
(In reply to Boris Zbarsky (:bz) from comment #8)
> Looking at this list in terms of slop:
> 
> * The slop bytes for nsCSSCompressedDataBlock come from allocating a chunk of
>   aBaseSize + aDataSize - 4.  aBaseSize is presumably
>   sizeof(nsCSSCompressedDataBlock) which depends on the alignment
> requirements
>   of void*, I think. 

I started looking at this one yesterday.  My DMD results are all from 64-bit builds.


> I'm not quite sure why block_chars is unconditionally 4
>   even on 8-bit systems, actually...  The alignment comment is not making
> much
>   sense to me.

I think it could prevent 4 bytes of padding at the end of the struct if the field arrangement were different.  But it doesn't seem necessary with the current field arrangement.

>  On my 64-bit build, aBaseSize is coming out to 24, because we
>   end up using 8 bytes each for mStyleBits and mBlock_; perhaps we should
> move
>   mStyleBits to after mBlockEnd?

Yes, that was my first idea.  We could also consider replacing mBlockEnd with a |PRUint32 mDataSize|, assuming that's safe and workable.
 
>   It's worth getting
>   some data on how many properties compressed blocks typically have, I guess,
>   to see what's going on here and whether we can improve the slop situation.

Here are the top 30 cases when starting the browser (64-bit) and loading 5 gmail tabs. (Nb: I just logged the size when each nsCSSCompressedDataBlock was created, so these aren't necessarily all live at the same time).  The format of the log lines is:

  CSS(aBaseSize, aDataSize): requested -> actual (slop)

and recall that aBaseSize is swollen by 4 bytes due to the block_chars.

( 1)  16990 (29.5%, 29.5%): CSS(24, 24): 44 -> 48 (4)
( 2)   6839 (11.9%, 41.4%): CSS(24, 192): 212 -> 224 (12)
( 3)   5044 ( 8.8%, 50.2%): CSS(24, 48): 68 -> 80 (12)
( 4)   4665 ( 8.1%, 58.3%): CSS(24, 72): 92 -> 96 (4)
( 5)   3109 ( 5.4%, 63.7%): CSS(24, 216): 236 -> 240 (4)
( 6)   2514 ( 4.4%, 68.1%): CSS(24, 240): 260 -> 272 (12)
( 7)   2319 ( 4.0%, 72.1%): CSS(24, 96): 116 -> 128 (12)
( 8)   1832 ( 3.2%, 75.3%): CSS(24, 0): 20 -> 32 (12)
( 9)   1707 ( 3.0%, 78.2%): CSS(24, 120): 140 -> 144 (4)
(10)   1307 ( 2.3%, 80.5%): CSS(24, 144): 164 -> 176 (12)
(11)   1162 ( 2.0%, 82.5%): CSS(24, 264): 284 -> 288 (4)
(12)    757 ( 1.3%, 83.8%): CSS(24, 168): 188 -> 192 (4)
(13)    705 ( 1.2%, 85.1%): CSS(24, 288): 308 -> 320 (12)
(14)    601 ( 1.0%, 86.1%): CSS(24, 384): 404 -> 416 (12)
(15)    561 ( 1.0%, 87.1%): CSS(24, 312): 332 -> 336 (4)
(16)    470 ( 0.8%, 87.9%): CSS(24, 408): 428 -> 432 (4)
(17)    425 ( 0.7%, 88.6%): CSS(24, 336): 356 -> 368 (12)
(18)    416 ( 0.7%, 89.4%): CSS(24, 696): 716 -> 1024 (308)
(19)    384 ( 0.7%, 90.0%): CSS(24, 432): 452 -> 464 (12)
(20)    371 ( 0.6%, 90.7%): CSS(24, 360): 380 -> 384 (4)
(21)    330 ( 0.6%, 91.3%): CSS(24, 456): 476 -> 480 (4)
(22)    314 ( 0.5%, 91.8%): CSS(24, 888): 908 -> 1024 (116)
(23)    314 ( 0.5%, 92.3%): CSS(24, 720): 740 -> 1024 (284)
(24)    304 ( 0.5%, 92.9%): CSS(24, 744): 764 -> 1024 (260)
(25)    281 ( 0.5%, 93.4%): CSS(24, 576): 596 -> 1024 (428)
(26)    266 ( 0.5%, 93.8%): CSS(24, 912): 932 -> 1024 (92)
(27)    228 ( 0.4%, 94.2%): CSS(24, 480): 500 -> 512 (12)
(28)    190 ( 0.3%, 94.6%): CSS(24, 528): 548 -> 1024 (476)
(29)    172 ( 0.3%, 94.9%): CSS(24, 1128): 1148 -> 2048 (900)
(30)    171 ( 0.3%, 95.1%): CSS(24, 768): 788 -> 1024 (236)

Hmm, if aBaseSize is 24 bytes, maybe block_chars is doing the exact opposite of what is intended -- it provides 4 bytes of used space but takes up 8?

Same results with mStyleBits and mBlockEnd swapped:

( 1)  16838 (29.4%, 29.4%): CSS(16, 24): 36 -> 48 (12)
( 2)   6839 (11.9%, 41.4%): CSS(16, 192): 204 -> 208 (4)
( 3)   5006 ( 8.7%, 50.1%): CSS(16, 48): 60 -> 64 (4)
( 4)   4652 ( 8.1%, 58.2%): CSS(16, 72): 84 -> 96 (12)
( 5)   3109 ( 5.4%, 63.7%): CSS(16, 216): 228 -> 240 (12)
( 6)   2514 ( 4.4%, 68.1%): CSS(16, 240): 252 -> 256 (4)
( 7)   2316 ( 4.0%, 72.1%): CSS(16, 96): 108 -> 112 (4)
( 8)   1756 ( 3.1%, 75.2%): CSS(16, 0): 12 -> 16 (4)
( 9)   1707 ( 3.0%, 78.2%): CSS(16, 120): 132 -> 144 (12)
(10)   1297 ( 2.3%, 80.4%): CSS(16, 144): 156 -> 160 (4)
(11)   1162 ( 2.0%, 82.4%): CSS(16, 264): 276 -> 288 (12)
(12)    757 ( 1.3%, 83.8%): CSS(16, 168): 180 -> 192 (12)
(13)    705 ( 1.2%, 85.0%): CSS(16, 288): 300 -> 304 (4)
(14)    601 ( 1.0%, 86.1%): CSS(16, 384): 396 -> 400 (4)
(15)    561 ( 1.0%, 87.0%): CSS(16, 312): 324 -> 336 (12)
(16)    470 ( 0.8%, 87.9%): CSS(16, 408): 420 -> 432 (12)
(17)    425 ( 0.7%, 88.6%): CSS(16, 336): 348 -> 352 (4)
(18)    416 ( 0.7%, 89.3%): CSS(16, 696): 708 -> 1024 (316)
(19)    384 ( 0.7%, 90.0%): CSS(16, 432): 444 -> 448 (4)
(20)    371 ( 0.6%, 90.6%): CSS(16, 360): 372 -> 384 (12)
(21)    330 ( 0.6%, 91.2%): CSS(16, 456): 468 -> 480 (12)
(22)    314 ( 0.5%, 91.8%): CSS(16, 888): 900 -> 1024 (124)
(23)    314 ( 0.5%, 92.3%): CSS(16, 720): 732 -> 1024 (292)
(24)    304 ( 0.5%, 92.8%): CSS(16, 744): 756 -> 1024 (268)
(25)    281 ( 0.5%, 93.3%): CSS(16, 576): 588 -> 1024 (436)
(26)    266 ( 0.5%, 93.8%): CSS(16, 912): 924 -> 1024 (100)
(27)    227 ( 0.4%, 94.2%): CSS(16, 480): 492 -> 496 (4)
(28)    190 ( 0.3%, 94.5%): CSS(16, 528): 540 -> 1024 (484)
(29)    172 ( 0.3%, 94.8%): CSS(16, 1128): 1140 -> 2048 (908)
(30)    171 ( 0.3%, 95.1%): CSS(16, 768): 780 -> 1024 (244)

So we save 8 bytes just from the rearrangement.  But then slop negates some savings (e.g. case (1)) and magnifies some others (e.g. case (2)).

The zero and one element cases ((8) and (1) respectively) are quite common, I wonder if they can be special-cased in any useful way.

>   In any case, I do think we should reorder mStyleBits and mBlockEnd; that
> will
>   make the 64-bit case 12+(24*N), I would think, saving us some space in some
>   cases.  Except won't that make the alignment of the union in the nsCSSValue
>   inside CDBValueStorage weird, because the block's data will start on a
> 4-byte
>   boundary?

But isn't that ok?  I thought block_chars was trying to ensure 4-byte boundary-ness in cases where that would help.

Anyway, I've opened bug 681161 for shrinking nsCompressedCSSBlock.
Assignee: khuey → n.nethercote
Summary: [meta] Add missing style system memory reporters → Add style sheet memory reporters
Attached patch patch, v0 (obsolete) — Splinter Review
The attached patch is not quite finished, but it measures all the style sheets hanging off nsDocument objects, and also all those in the nsLayoutStylesheetCache.  (I haven't yet done those in nsXULPrototypeCache nor those in nsDocument's binding managers, but they're much smaller and I'll do them in another bug if I get to them at all.)

My plan is to report this info per-window, and then give some summary stats in about:memory's "Other Measurements".  But the current patch just has two reports "explicit/layout/style-sheets" and "explicit/layout/style-sheet-cache".  It covers pretty much everything in the attached DMD profile, and more.

With Gmail and TechCrunch loaded, in my debug Linux64 build these new reports together account for about 7.3% of "explicit".  Combined with the patch in bug 712835, for the same workload "heap-unclassified" drops to about 16%.
Blocks: 687724
Blocks: 713799
No longer blocks: 687724
Preliminary patch 1:  This patch adds SizeOfExcludingThis() to nsVoidArray
and nsCOMArray, which is needed for the main patch.  It's based on similar 
code for PLDHash.

With this patch applied there are two copies of 
nsVoidArray::EnumerateForwards, identical except for the |const|s in the 
signature.  I can use a macro to avoid the duplication in the source code if
you like.
Attachment #584010 - Attachment is obsolete: true
Attachment #584520 - Flags: review?(bzbarsky)
Preliminary patch 2: This patch adds 
nsStringBuffer::SizeOfIncludingThisMustBeUnshared, which is needed for the 
main patch.
Attachment #584521 - Flags: review?(bzbarsky)
Patch 3: The main event.  This renames the "dom" sub-tree in about:memory 
as "dom+layout" and adds "dom" and "layout" children to each inner window's
entry.  (Previously the inner-window itself represented the DOM
measurements.)  It also adds "dom-total" and "layout-window-style-sheets" 
entries to the "other measurements" section of about:memory, which makes it
easy to see the total space used for each of them.  And it adds 
"explicit/layout/style-sheet-cache", which is usually pretty small.

Examples of these:

├───12,930,108 B (09.14%) -- dom+layout
│   └──12,930,108 B (09.14%) -- window-objects
│      └──12,930,108 B (09.14%) -- active
│         ├───8,341,581 B (05.90%) -- top=7 (inner=21)
│         │   ├──3,798,948 B (02.69%) -- inner-window(id=29, uri=https://mail.google.com/mail/u/0/?ui=2&view=bsp&ver=ohhl4rw8mbn4)
│         │   │  ├──2,918,968 B (02.06%) -- layout
│         │   │  └────879,980 B (00.62%) -- dom
│         │   ├──3,240,380 B (02.29%) -- inner-window(id=21, uri=https://mail.google.com/mail/u/0/?shva=1#inbox)
│         │   │  ├──2,875,416 B (02.03%) -- layout
│         │   │  └────364,964 B (00.26%) -- dom
...
│    └────147,064 B (00.10%) -- style-sheet-cache
...
  3,691,164 B -- dom-total-window
...
  9,238,944 B -- layout-total-window-style-sheets

In a Linux64 debug build, with Gmail and TechCrunch loaded the new reporters account for about 7% of "explicit" memory, which gets "heap-unclassified" down to about 16.6%.

The patch is quite big but not terribly interesting.  It follows the 
naming/structure conventions in https://wiki.mozilla.org/Memory_Reporting.
DMD tells me that it covers probably 95% of the memory usage relating to
style sheets and that nothing is being double-counted, at least for the
websites I've been testing with (mostly Gmail and TechCrunch).

The existing "layout" sub-tree (with "shell(http://www.foo.com)" entries)
will be merged into this "dom+layout" sub-tree in bug 713799.
Attachment #584522 - Flags: review?(dbaron)
Comment on attachment 584520 [details] [diff] [review]
patch 1: add ns{Void,COM}Array::SizeOfExcludingThis()

>+++ b/xpcom/glue/nsCOMArray.h

>+    size_t SizeOfExcludingThis(nsVoidArraySizeOfElementExcludingThisFunc
>+                               aSizeOfElement,
>+                               nsMallocSizeOfFun aMallocSizeOf, void* aData = NULL) const {

Might be better to do it like this:

      size_t SizeOfExcludingThis(
        nsVoidArraySizeOfElementExcludingThisFunc aSizeOfElement,
        nsMallocSizeOfFun aMallocSizeOf, void* aData = NULL) const {

because the way it is now made me think this took 4 arguments, at first glance.

>+++ b/xpcom/glue/nsVoidArray.cpp
>-      newSize = SIZEOF_IMPL(newCapacity);

Why this change?  This seems to just maintain the invariant that newSize and newCapacity correspond.  I know newSize is unused below at the moment, but why break the invariant?

r=me with those two nits.
Attachment #584520 - Flags: review?(bzbarsky) → review+
Comment on attachment 584521 [details] [diff] [review]
patch 2: add nsStringBuffer::SizeOfIncludingThisMustBeUnshared

Should the assert here perhaps be fatal?

r=me either way
Attachment #584521 - Flags: review?(bzbarsky) → review+
> >+++ b/xpcom/glue/nsVoidArray.cpp
> >-      newSize = SIZEOF_IMPL(newCapacity);
> 
> Why this change?  This seems to just maintain the invariant that newSize and
> newCapacity correspond.  I know newSize is unused below at the moment, but
> why break the invariant?

It's dead code, removing it quells a GCC warning.  You still want me to put it back in?

Alternatively, I could put "(void)newSize;" after the if-then-else.  That would maintain the invariant while quelling the warning, at the price of looking odd.
> It's dead code, removing it quells a GCC warning.

My mistake, no GCC warning there.  I'll add it back in.
> Should the assert here perhaps be fatal?
> 
> r=me either way

Having just read http://robert.ocallahan.org/2011/12/case-for-non-fatal-assertions.html, I'll leave it non-fatal -- a small inaccuracy in about:memory doesn't warrant bringing down the whole browser.
Comment on attachment 584522 [details] [diff] [review]
patch 3: add the style reporters

I don't see why your nsIStyleSheet::SizeOfExcludingThis is an "excluding
this" function.  Nothing has style sheets as sub-objects;
nsCOMArray<nsIStyleSheet> is an array of strongly-owned pointers.  I
think it should include the style sheet object and be renamed
appropriately.

(and renaming SizeOfStyleSheetsElementExcludingThis in nsDocument.cpp)

nsDocument.h:
  I don't see why SizeOfStyleSheets is NS_MUST_OVERRIDE.

nsDOMMemoryReporter.cpp:

  [ incorporate bug 713799 comment 1 by reference ]

  I'm ok with this for now, though.

nsPresContext.h:

  >+    // Measurement of the following members may be added later if DMD finds it is
  >+    // worthwhile:
  >+    // - mNotifyDidPaintTimer
  >+    // - mRegisteredPlugins
  >+    // - mWillPaintObservers
  >+    // - mWillPaintFallbackEvent
  >+    // - mUpdatePluginGeometryForFrame

  Drop mUpdatePluginGeometryForFrame from this list; it's non-owning.

AnimationCommon.cpp:
  > CommonAnimationManager::SizeOfExcludingThis(nsMallocSizeOfFun aMallocSizeOf) const
  > {
  >-  // XXX: could measure mProperytValuePairs here.  Bug 671299 may do this.
  >+  // Measurement of the following members may be added later if DMD finds it is
  >+  // worthwhile:
  >+  // - mPropertyValuePairs
  >+
  >   return 0;
  > }

  This doesn't make a whole lot of sense since mPropertyValuePairs are
  members of AnimValuesStyleRule.

  Really, I think all of the reporting for this class should be in the
  subclasses anyway.

GroupRule.h:

  I don't see why this (or its subclasses) need a *virtual* "excluding
  this" sizeof function.  It seems like this should only be needed for
  the subclasses to call, and the subclasses (MediaRule, DocumentRule)
  should have only an "including this" version (rather than both).

Rule.h:

  >+  // This is used to measure nsCOMArray<Rule>s.
  >+  static size_t SizeOfCOMArrayElementExcludingThis(css::Rule* aElement,
  >+                                                   nsMallocSizeOfFun aMallocSizeOf,
  >+                                                   void* aData);

  Calling this an "excluding this" function seems confusing.  (It
  does, in fact, include this.)

StyleRule.cpp:

  >+size_t
  >+nsAtomList::SizeOfIncludingThis(nsMallocSizeOfFun aMallocSizeOf) const
  >+{
  >+  size_t n = aMallocSizeOf(this, sizeof(nsAtomList));
  >+  n += mNext ? mNext->SizeOfIncludingThis(aMallocSizeOf) : 0;

  Could you write this iteratively so that there's no risk of stack
  overflow?

  >+
  >+  // Measurement of the following members may be added later if DMD finds it is
  >+  // worthwhile:
  >+  // - mAtom

  This doesn't make sense, since atoms are shared.

  >+size_t
  >+nsCSSSelector::SizeOfIncludingThis(nsMallocSizeOfFun aMallocSizeOf) const
  >+{
  >+  size_t n = aMallocSizeOf(this, sizeof(nsCSSSelector));
  >+  n += mClassList ? mClassList->SizeOfIncludingThis(aMallocSizeOf) : 0;
  >+  n += mNext ? mNext->SizeOfIncludingThis(aMallocSizeOf) : 0;

  The looping over mNext should be iterative, possibly at the caller.

  >+  // Measurement of the following members may be added later if DMD finds it is
  >+  // worthwhile:
  >+  // - mLowercaseTag;
  >+  // - mCasedTag;

  No, these are atoms.

  >+  // - mIDList;

  You should just do this; it's just like mClassList.

  >+  // - mPseudoClassList;
  >+  // - mAttrList;

  These are ok to list here.

  >+  // - mNegations;

  You should do mNegations; it's just like mNext.  Otherwise you're
  producing results that count selectors only when :not() isn't involved.

  (I'm insisting on adding this because, if authors start looking at
  about:memory to learn what reduces memory usage, this will mislead
  them.)

  >+size_t
  >+nsCSSSelectorList::SizeOfIncludingThis(nsMallocSizeOfFun aMallocSizeOf) const
  >+{
  >+  size_t n = aMallocSizeOf(this, sizeof(nsCSSSelectorList));
  >+  n += mSelectors ? mSelectors->SizeOfIncludingThis(aMallocSizeOf) : 0;
  >+  n += mNext ? mNext->SizeOfIncludingThis(aMallocSizeOf) : 0;

  Again, handle mNext iteratively (maybe at the caller).

StyleRule.h:

  Seems silly to use NS_MUST_OVERRIDE on the one class marked MOZ_FINAL
  but not on any of the ones that aren't.

  I'd either swap or eliminate entirely.

nsAnimationManager.cpp:

  Not sure why this has an "excluding this" function rather than
  an "including this" function.  I also don't see where it's called.

nsCSSRules.cpp:

  ImportRule shouldn't list mChildSheet or mMedia as something to
  traverse -- those should be handled by nsCSSStyleSheet::mMedia /
  nsCSSStyleSheetInner::mSheets.

nsCSSRules.h:

  Silly to use NS_MUST_OVERRIDE on nsCSSKeyframesRule, which is
  MOZ_FINAL.  (You don't use it on all the rest; they're also
  MOZ_FINAL.)

nsCSSStyleSheet.cpp:

  For the same reason as mNegations -- because it could seriously
  mislead authors about the memory implications of things -- I'm going
  to insist that nsCSSStyleSheetInner::SizeOfIncludingThis traverse the
  children (mFirstChild and their mNext).  That probably represents the
  bulk of the memory that you thought your patch should get to but that
  it's not.

  nsCSSStyleSheetInner's comment on what to do in the future should not
  mention mSheets or should say "(array storage only)" next to it.

  Likewise nsCSSStyleSheet's should not mention mNext or mOwnerRule.

  I don't see why nsCSSStyleSheet needs an "excluding this" function.

nsCSSValue:

  I don't see why nsCSSValuePairList_heap and nsCSSValueList_heap need
  to override SizeOfExcludingThis.

  I also don't think nsCSSValueList and nsCSSValuePairList (non-_heap)
  should need an "including this" function.
  nsCSSValue::SizeOfExcludingThis should instead not measure the
  ListDep/PairListDep types, since those are non-owning.

  I'm a little concerned about nsCSSValue not measuring the array types.
  It should be very easy, and catch a good bit of memory.

  nsCSSValueList::SizeOfExcludingThis (and same for nsCSSValuePairList)
  should traverse the list iteratively rather than recursively.

nsIStyleSheet.h:

  This seems like it should be an 'including' function rather than
  'excluding' (as mentioned above).

nsLayoutStylesheetCache:

  Should mSheetsReporter get deleted at the end?  (If so, use
  nsAutoPtr.)


r=dbaron with those things fixed
Attachment #584522 - Flags: review?(dbaron) → review+
Attached patch patch 3b (obsolete) — Splinter Review
Thanks for the fast and very detailed review!  There were enough changes
that I'd like another review, if you don't mind.  I've attached just the
diff against the first patch.


> I don't see why your nsIStyleSheet::SizeOfExcludingThis is an "excluding
> this" function.  Nothing has style sheets as sub-objects;
> nsCOMArray<nsIStyleSheet> is an array of strongly-owned pointers.  I
> think it should include the style sheet object and be renamed
> appropriately.
>
> (and renaming SizeOfStyleSheetsElementExcludingThis in nsDocument.cpp)

You're right that it should include the style sheet object.

About the naming, if you have nsTHashtable<T>, each element is a T, and the
callback function doesn't measure the space used for the T element itself,
just the things it points to.  So it uses names like
|sizeOfEntryExcludingThis|.

I carried that over to nsCOMArray<T> where each element is a T*.  So the
"This" in the "ElementExcludingThis" was meant to refer to the pointer
element.  But I can see that it's confusing (I confused myself!) so I've
changed nsCOMArray accordingly in my local copy of patch 1.

(I didn't need this for nsTArray<T> because they're so easy to iterate over
that a per-element callback function isn't needed.)


> nsDOMMemoryReporter.cpp:
>
>   [ incorporate bug 713799 comment 1 by reference ]
>
>   I'm ok with this for now, though.

I'll do that in bug 713799.


> AnimationCommon.cpp:
>   > CommonAnimationManager::SizeOfExcludingThis(nsMallocSizeOfFun aMallocSizeO
f) const
>   > {
>   >-  // XXX: could measure mProperytValuePairs here.  Bug 671299 may do this.
>   >+  // Measurement of the following members may be added later if DMD finds
it is
>   >+  // worthwhile:
>   >+  // - mPropertyValuePairs
>   >+
>   >   return 0;
>   > }
>
>   This doesn't make a whole lot of sense since mPropertyValuePairs are
>   members of AnimValuesStyleRule.
Right, I was looking at the wrong class.  It should be |mElementData| and
|mPresContext|.  I'm pretty sure the latter is non-owning, so I said that.

>   Really, I think all of the reporting for this class should be in the
>   subclasses anyway.

Why?  The subclasses measure the subclass-specific things;  this measures
the superclass-specific things.  That's all standard.


> GroupRule.h:
>
>   I don't see why this (or its subclasses) need a *virtual* "excluding
>   this" sizeof function.  It seems like this should only be needed for
>   the subclasses to call, and the subclasses (MediaRule, DocumentRule)
>   should have only an "including this" version (rather than both).

True.  And I missed that nsCSSKeyframesRule is a subclass of GroupRule, so I
modified nsKeyframesRule::SizeOfIncludingThis to call
GroupRule::SizeOfIncludingThis.


>   >+  // - mNegations;
>
>   You should do mNegations; it's just like mNext.  Otherwise you're
>   producing results that count selectors only when :not() isn't involved.

I've added it, but Gmail doesn't appear to use :not() at all because it
didn't measure anything extra.


> nsAnimationManager.cpp:
>
>   Not sure why this has an "excluding this" function rather than
>   an "including this" function.  I also don't see where it's called.

nsAnimationManager inherits from nsIStyleRuleProcessor.  We call
SizeOfIncludingThis() on each element in nsStyleSet::mRuleProcessors within
nsStyleSet::SizeOfIncludingThis.

And SizeOfExcludingThis() is needed to get the measurements right when
inheritance is involved, because each class must call its superclass's
SizeOfExcludingThis().


>   For the same reason as mNegations -- because it could seriously
>   mislead authors about the memory implications of things -- I'm going
>   to insist that nsCSSStyleSheetInner::SizeOfIncludingThis traverse the
>   children (mFirstChild and their mNext).  That probably represents the
>   bulk of the memory that you thought your patch should get to but that
>   it's not.

IIUC I need to measure nsCSStyleSheet::mNext and
nsCSSStyleSheetInner::mFirstChild.  I've added those.  DMD tells me that
accounted for maybe 40% of the missing style sheet data for Gmail.


> nsCSSValue:
>
>   I also don't think nsCSSValueList and nsCSSValuePairList (non-_heap)
>   should need an "including this" function.
>   nsCSSValue::SizeOfExcludingThis should instead not measure the
>   ListDep/PairListDep types, since those are non-owning.
>
>   I'm a little concerned about nsCSSValue not measuring the array types.
>   It should be very easy, and catch a good bit of memory.

I added measurements for Array, Rect, Pair, Triplet, Gradient and String.
Neither Gmail nor TechCrunch appear to use Array or Triplet at all.  I'm not
sure if a String value's buffer is shared or not, I played it safe and assumed
it's not.

I also reworked all this heap/Dep stuff for List and PairList.  I'm not
totally confident in it, so if you could check it closely I'd appreciate it.
I'm pretty sure that both nsCSSValueList and nsCSSValueList_heap need a
SizeOfIncluding() function, since the nsCSSValue points to the latter, and
the latter points to the former.


Finally, I also added measurement of nsCSSValue::mPseudoClassList.  With all
the extra measurements, Gmail+TechCrunch's heap-unclassified drops from ~16.5%
to ~15% on a Linux64 debug build.


> nsLayoutStylesheetCache:
>
>   Should mSheetsReporter get deleted at the end?  (If so, use
>   nsAutoPtr.)

I don't think so.  When a reporter is registered, a strong reference to it
is held.  That strong reference is released when it's unregistered.  Because
mSheetsReporter is weak, it'll be destroyed at that time.  This pattern is
used for several other memory reporters.
Attachment #585638 - Flags: review?(dbaron)
Attached patch patch 3b, v2Splinter Review
Patch 3b had a bug in nsCSSStyleSheet::SizeOfIncludingThis that was causing double-counting, which DMD detected.  This new patch fixes it.
Attachment #585638 - Attachment is obsolete: true
Attachment #586279 - Flags: review?(dbaron)
Attachment #585638 - Flags: review?(dbaron)
19 day review ping!
25 day review ping.  dbaron, do you have an ETA for this review?
I just saw that dbaron is away until Feb 12.  Given that I have r+ on the first patch, I think I'll land the second patch soon (but after the FF12 deadline) and anything dbaron wants changed we can do as a follow-up.
Comment on attachment 586279 [details] [diff] [review]
patch 3b, v2

The modified patch landed long ago, no point keeping this optional r? request open.
Attachment #586279 - Flags: review?(dbaron)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: