Last Comment Bug 681755 - Investigate which of the rulehash arrays should be nsAutoTArray
: Investigate which of the rulehash arrays should be nsAutoTArray
Status: RESOLVED FIXED
[MemShrink:P2]
:
Product: Core
Classification: Components
Component: CSS Parsing and Computation (show other bugs)
: unspecified
: x86 Mac OS X
: P2 normal with 1 vote (vote)
: mozilla10
Assigned To: Boris Zbarsky [:bz]
:
Mentors:
Depends on: 677571 687722 689433
Blocks: needs-more-autoarray 689443
  Show dependency treegraph
 
Reported: 2011-08-24 13:55 PDT by Boris Zbarsky [:bz]
Modified: 2011-10-06 08:42 PDT (History)
10 users (show)
bzbarsky: in‑testsuite-
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Measurement patch (3.19 KB, patch)
2011-08-24 13:55 PDT, Boris Zbarsky [:bz]
no flags Details | Diff | Splinter Review
RuleHashTableEntry data (658 bytes, text/plain)
2011-08-24 13:57 PDT, Boris Zbarsky [:bz]
no flags Details
Universal rule data (75 bytes, text/plain)
2011-08-24 13:57 PDT, Boris Zbarsky [:bz]
no flags Details
AtomSelectorEntry data (663 bytes, text/plain)
2011-08-24 13:58 PDT, Boris Zbarsky [:bz]
no flags Details
State selector data (227 bytes, text/plain)
2011-08-24 13:59 PDT, Boris Zbarsky [:bz]
no flags Details
Possibly-negated class data (67 bytes, text/plain)
2011-08-24 14:00 PDT, Boris Zbarsky [:bz]
no flags Details
Possibly-negated id data (34 bytes, text/plain)
2011-08-24 14:00 PDT, Boris Zbarsky [:bz]
no flags Details
PerWeightData data (1.17 KB, text/plain)
2011-08-24 14:02 PDT, Boris Zbarsky [:bz]
no flags Details
Attribute AtomSelectorEntry data (223 bytes, text/plain)
2011-09-15 14:16 PDT, Boris Zbarsky [:bz]
no flags Details
ID AtomSelectorEntry data (336 bytes, text/plain)
2011-09-15 14:17 PDT, Boris Zbarsky [:bz]
no flags Details
Class AtomSelectorEntry data (483 bytes, text/plain)
2011-09-15 14:18 PDT, Boris Zbarsky [:bz]
no flags Details
ID RuleHashTableEntry data (95 bytes, text/plain)
2011-09-15 14:20 PDT, Boris Zbarsky [:bz]
no flags Details
Class RuleHashTableEntry data (280 bytes, text/plain)
2011-09-15 14:20 PDT, Boris Zbarsky [:bz]
no flags Details
Namespace RuleHashTableEntry data (35 bytes, text/plain)
2011-09-15 14:22 PDT, Boris Zbarsky [:bz]
no flags Details
Tag RuleHashTableEntry data (552 bytes, text/plain)
2011-09-15 14:22 PDT, Boris Zbarsky [:bz]
no flags Details
Updated measurement patch (11.10 KB, patch)
2011-09-15 14:23 PDT, Boris Zbarsky [:bz]
no flags Details | Diff | Splinter Review
Test patch to make all these arrays auto arrays of size 1 (7.12 KB, patch)
2011-09-15 21:02 PDT, Boris Zbarsky [:bz]
dbaron: review+
Details | Diff | Splinter Review

Description Boris Zbarsky [:bz] 2011-08-24 13:55:52 PDT
Created attachment 555529 [details] [diff] [review]
Measurement patch

There are a bunch of arrays that rulehash uses.  Some of these are actually guaranteed to never be empty.  We should look into using nsTArray for these, perhaps.

The attached patch logs at destruction time the sizes of the following arrays:

1) Array of rulevalues in RuleHashTableEntry
2) Array of universal rules in RuleHash
3) Array of nsCSSSelector* in AtomSelectorEntry
4) Array of state selectors in RuleCascadeData
5) Array of possibly-negated class selectors in RuleCascadeData
6) Array of possible-negated id selectors in RuleCascadeData
7) Array of rules in PerWeightData

I then ran the log through some grep/sed/sort/uniq stuff to produce data files for each of those 7 with the following structure:  First line is the total number of arrays of the given type, the following lines are counts and lengths at destruction with the count listed first and the length second.  I'll attach all that stuff, then analysis of the results after loading google, gmail, cnn, slashdot, twitter, mozilla.org, mozilla.com, web.mit.edu.
Comment 1 Boris Zbarsky [:bz] 2011-08-24 13:57:11 PDT
Created attachment 555530 [details]
RuleHashTableEntry data
Comment 2 Boris Zbarsky [:bz] 2011-08-24 13:57:53 PDT
Created attachment 555531 [details]
Universal rule data
Comment 3 Boris Zbarsky [:bz] 2011-08-24 13:58:38 PDT
Created attachment 555532 [details]
AtomSelectorEntry data
Comment 4 Boris Zbarsky [:bz] 2011-08-24 13:59:27 PDT
Created attachment 555533 [details]
State selector data
Comment 5 Boris Zbarsky [:bz] 2011-08-24 14:00:03 PDT
Created attachment 555534 [details]
Possibly-negated class data
Comment 6 Boris Zbarsky [:bz] 2011-08-24 14:00:27 PDT
Created attachment 555535 [details]
Possibly-negated id data
Comment 7 Boris Zbarsky [:bz] 2011-08-24 14:02:02 PDT
Created attachment 555536 [details]
PerWeightData data
Comment 8 Boris Zbarsky [:bz] 2011-08-24 14:11:08 PDT
Analysis:

For the RuleHashTableEntry, the beginning of the file looks like this:

     59080
  36195  1
  11178  2
  3680  3
  2206  4
  1516  5

No 0s as expected: we don't create an entry if there is nothing to put in it.  So using at least an nsAutoTArray<1> or somethign that can hold one element is a no-brainer.  Not sure about auto-space for 2 elements; past that it's probably not that worthwhile.

For AtomSelectorEntry, we have:

     41761
  22027  1
  8164  2
  3172  3
  2037  4
  1333  5

Again, auto-space for one entry is a no-brainer, space for 2 I'm not sure about, things past that are probably not worth it.

For universal rules, we see that there are 867 total rulehashes of which 492 had no universal rules at all (yay!).  Not sure it's worth making this an auto array at all.

For possibly-negated classes, we have 248 RuleCascadeData of which 239 had no negated classes.  For ids, that number is 244.  So there's no point making these auto arrays either.

For state selectors, things are more interesting; out of 248 RuleCascadeData 84 had 4 state selectors, and 89 had 87 state selectors, with a smaller peak at 0 (15 instances) and 1 (11 instances).  I'm not sure it's worth doing an auto array here either.

For the PerWeightData, we have:

      7548
  1792  1
   450  2
   852  3
   280  4
   146  5

(again, as expected: no rules with a weight means no PerWeightData for that weight).  So an auto array holding one entry is totally worth it.  Past that point, I'm not sure; the distribution doesn't die off nearly as fast as the hashtable entry arrays (e.g. there are 302 instanced of 7, 224 instances of 15, 222 instances of 17, 128 instances of 35, 124 instances of 335).
Comment 9 Justin Lebar (not reading bugmail) 2011-08-24 14:20:05 PDT
AtomSelectorEntry holds pointers, but the other two lists don't.  That suggests we might want something like AutoTArray<1>, which allocates extra space for one element, and also something like SmallVoidArray, which stores the pointer in mHdr.

I'm hoping to make an AutoTArray<1> lookalike which has a smaller footprint and is also memmovable.  If we want AutoTArray<N> which is memmovable, that should also be doable...
Comment 10 Boris Zbarsky [:bz] 2011-08-26 22:34:42 PDT
Oh, on the other hand for the PerWeightData case the arrays are all temporary (they live in a hashtable owned by a stack-only CascadeEnumData).  So the only drawback of bigger tarrays is the cache locality loss; there's no long-term memory usage effect there.

Given that, an autoarray of size 4 (which would cover almost half the arrays in that data) might make sense.
Comment 11 Justin Lebar (not reading bugmail) 2011-09-06 11:45:20 PDT
Erm, this should *depend on* bug 682735, which makes nsAutoTArray memmove'able.
Comment 12 Boris Zbarsky [:bz] 2011-09-15 14:14:53 PDT
dbaron pointed out that the atomselector and rulehash entries are actually used in multiple tables with possibly different growth behavior.  I'll attach some data files showing breakdowns of that stuff.
Comment 13 Boris Zbarsky [:bz] 2011-09-15 14:16:04 PDT
Created attachment 560449 [details]
Attribute AtomSelectorEntry data
Comment 14 Boris Zbarsky [:bz] 2011-09-15 14:17:06 PDT
Created attachment 560451 [details]
ID AtomSelectorEntry data
Comment 15 Boris Zbarsky [:bz] 2011-09-15 14:18:16 PDT
Created attachment 560452 [details]
Class AtomSelectorEntry data
Comment 16 Boris Zbarsky [:bz] 2011-09-15 14:20:05 PDT
Created attachment 560453 [details]
ID RuleHashTableEntry data
Comment 17 Boris Zbarsky [:bz] 2011-09-15 14:20:53 PDT
Created attachment 560455 [details]
Class RuleHashTableEntry data
Comment 18 Boris Zbarsky [:bz] 2011-09-15 14:22:04 PDT
Created attachment 560456 [details]
Namespace RuleHashTableEntry data
Comment 19 Boris Zbarsky [:bz] 2011-09-15 14:22:51 PDT
Created attachment 560457 [details]
Tag RuleHashTableEntry data
Comment 20 Boris Zbarsky [:bz] 2011-09-15 14:23:42 PDT
Created attachment 560458 [details] [diff] [review]
Updated measurement patch
Comment 21 Boris Zbarsky [:bz] 2011-09-15 14:26:18 PDT
It looks to me like for the most part the breakdown is the same for all the table types.  Exceptions are the tiny Namespace RuleHashTableEntry table and the ID AtomSelectorEntry table (for which length 1 would only cover about 35% of the entries; lengths 1 and 2 would cover closer to 55%).

David, thoughts on the array sizing?  Should we just use auto-2 arrays?
Comment 22 Boris Zbarsky [:bz] 2011-09-15 21:02:31 PDT
Created attachment 560506 [details] [diff] [review]
Test patch to make all these arrays auto arrays of size 1

Justin, want to give this a cachegrind spin?
Comment 23 Justin Lebar (not reading bugmail) 2011-09-15 21:20:47 PDT
Comment on attachment 560506 [details] [diff] [review]
Test patch to make all these arrays auto arrays of size 1

> +  newEntry->mSelectors.SwapElements(oldEntry->mSelectors);

Don't you need to fix up nsTArray's SwapElements implementation first?
Comment 24 Boris Zbarsky [:bz] 2011-09-15 21:27:38 PDT
Er, yes.  Gah.  I forgot about that.
Comment 25 Boris Zbarsky [:bz] 2011-09-15 21:29:32 PDT
Not happening tonight.  :(
Comment 26 Justin Lebar (not reading bugmail) 2011-09-22 10:59:32 PDT
With the patch from bug 687722 applied, I measured the number of calls to malloc() when starting the browser to about:home, loading my gmail inbox, and closing the browser.

Before: 1 350 000
After:  1 280 000

That's a 1.05x change.
Comment 27 Justin Lebar (not reading bugmail) 2011-09-22 11:03:14 PDT
Pushed to Try to get some Talos results:

https://tbpl.mozilla.org/?tree=Try&rev=ae42d72ec2e3
Comment 28 Boris Zbarsky [:bz] 2011-09-22 11:07:17 PDT
So the before/after numbers there are both with bug 687722 but, but before is without the attached patch and after is with the attached patch?

How do things look if you use auto-2 arrays in the attached patch?
Comment 29 Justin Lebar (not reading bugmail) 2011-09-22 11:28:45 PDT
(In reply to Boris Zbarsky (:bz) from comment #28)
> So the before/after numbers there are both with bug 687722 but, but before
> is without the attached patch and after is with the attached patch?

Correct.

> How do things look if you use auto-2 arrays in the attached patch?

Argh; I can't reproduce those numbers anymore.  (I measured multiple times originally!)  I think I need a more static testcase.
Comment 30 Justin Lebar (not reading bugmail) 2011-09-22 13:26:10 PDT
New numbers.  These are from running an instrumented build on the first 10 pages in the TP5 manifest.  I ran the test 10 times for each build.  Here are the mean number of times we call malloc:

Unpatched        1138000
nsAutoTArray<1>  1113000
nsAutoTArray<2>  1106000

So nsAutoTArray<1> is a 2% decrease, but nsAutoArray<2> is only a 0.5% decrease over nsAutoTArray<1>.
Comment 31 Justin Lebar (not reading bugmail) 2011-09-22 20:41:46 PDT
If Talos is to be believed, this is good for a moderate memory improvement on Linux and Mac.  (It looks like the Windows Talos job was never started, for some reason.)

Results below are (patch) / (smallest recent value on m-i), in mb:

Linux-32:
  private: 395 / 400
  rss:     139 / 143

Linux-64:
  private: 453 / 461
  rss:     192 / 198

Mac 10.5:
  private: 909 / 902 (very noisy)
  rss:     173 / 179

Mac 10.6:
  private: 3500 / 3700 (this difference is too huge)
  rss:     279 / 280

I'll take a 2-3% drop in RSS!

I'll push to try to see what happens with size-2 nsAutoTArrays.
Comment 32 Justin Lebar (not reading bugmail) 2011-09-22 20:49:25 PDT
Try push with nsAutoTArray<2>: https://hg.mozilla.org/try/rev/718f3d42eef3
Comment 33 Justin Lebar (not reading bugmail) 2011-09-22 20:49:51 PDT
erm, I mean https://tbpl.mozilla.org/?tree=Try&rev=718f3d42eef3
Comment 34 Justin Lebar (not reading bugmail) 2011-09-23 06:19:16 PDT
nsAutoTArray<2> gives maybe a 1mb decrease in RSS on Linux-32 and maybe 4mb on Mac 10.6.  These could be noise, though; the mac numbers in particular are very noisy.

I don't know if the distribution of rulehash sizes is different on TP5 than on what Boris tested on; in some other tests, I've seen that TP5 is a lot simpler than many of the sites I visit.

I'm inclined to suggest we should go with nsAutoTArray<2> because some of Boris's data suggests that it would be useful, and since we're already paying an extra 16 bytes for the empty nsAutoTArray plus an extra 4 bytes for the one RuleValue or an extra word for the one nsCSSSelector*.  So the choice is between RuleValue entries which have size 20 versus 24 bytes, and AtomSelectorEntrys with size 20/24 versus 24/30 bytes (32/64-bit).

We're already paying so much more for these auto arrays, maybe should eat the extra 4 or 8 bytes.

But I think we'd do well with either nsAutoTArray<1> or nsAutoTArray<2>.
Comment 35 Justin Lebar (not reading bugmail) 2011-09-23 07:17:28 PDT
Comment on attachment 560506 [details] [diff] [review]
Test patch to make all these arrays auto arrays of size 1

dbaron, do you have a preference wrt size-1 or size-2?  It would be great to get this in before we branch.
Comment 36 Justin Lebar (not reading bugmail) 2011-09-23 07:36:28 PDT
Hm, I did a pretty bad job with those size computations.  Let me try again.  On 32-bit:

sizeof(RuleHashTableEntry)
  originally        8 bytes (PLDHashEntryHdr=4, nsTArray=4)
  nsAutoTArray<1>  32 bytes (PLDHashEntryHdr=4, nsAutoTArray<0>=16, RuleValue=12)
  nsAutoTArray<2>  44 bytes (PLDHashEntryHdr=4, nsAutoTArray<0>=16, 2 x RuleValue=24)

sizeof(AtomSelectorEntry)
  originally       12 bytes (PLDHashEntryHdr=4, nsIAtom*=4, nsTArray=4)
  nsAutoTArray<1>  28 bytes (PLDHashEntryHdr=4, nsIAtom*=4, nsAutoTArray<0>=16, nsCSSSelector*=4)
  nsAutoTArray<2>  32 bytes (PLDHashEntryHdr=4, nsIAtom*=4, nsAutoTArray<0>=16, 2 x nsCSSSelector*=8)
Comment 37 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2011-09-26 13:54:27 PDT
Comment on attachment 560506 [details] [diff] [review]
Test patch to make all these arrays auto arrays of size 1

>+static void
>+RuleHash_MoveEntry(PLDHashTable *table, const PLDHashEntryHdr *from,
>+                   PLDHashEntryHdr *to)
>+{

Please assert that from != to.

>+  RuleHashTableEntry *oldEntry =
>+    const_cast<RuleHashTableEntry*>(
>+      static_cast<const RuleHashTableEntry*>(from));
>+  RuleHashTableEntry *newEntry = new (to) RuleHashTableEntry();
>+  newEntry->mRules.SwapElements(oldEntry->mRules);

It seems like you need to call oldEntry->~RuleHashTableEntry(), or else you'll get leak stats misreporting.  Or am I missing something?

>+static void
>+RuleHash_TagTable_MoveEntry(PLDHashTable *table, const PLDHashEntryHdr *from,
>+                            PLDHashEntryHdr *to)

Same here (both comments).


>+static void
>+AtomSelector_MoveEntry(PLDHashTable *table, const PLDHashEntryHdr *from,
>+                       PLDHashEntryHdr *to)

and same 2 comments here


r=dbaron with that
Comment 38 Boris Zbarsky [:bz] 2011-09-26 14:11:08 PDT
Good catch on both of those.  Will fix.
Comment 39 Boris Zbarsky [:bz] 2011-09-26 14:53:15 PDT
Per discussion with jlebar, going to go with auto-1 on the rulehash entries and auto-2 on the atomselector ones.  The atomselector case has a larger fraction of length-2 lists, and the cost of an extra entry is less there as well.
Comment 40 Boris Zbarsky [:bz] 2011-09-26 15:07:50 PDT
Pushed https://hg.mozilla.org/integration/mozilla-inbound/rev/6e359c7e8080

I'll file a followup on the PerWeightData arrays.
Comment 41 Phil Ringnalda (:philor) 2011-09-26 19:45:56 PDT
Backed that out in https://hg.mozilla.org/integration/mozilla-inbound/rev/de97f86d572e - debug Windows mochitest-4 logs got bloated up to the point of being truncated (at 52428800 bytes), and this was my best guess out of that push.
Comment 42 Boris Zbarsky [:bz] 2011-09-26 20:04:52 PDT
Yeah, that log has about 65k lines with pldhash containing entries of size 48 and hence spewing warnings.

Double-checking our entry size numbers now.
Comment 43 Justin Lebar (not reading bugmail) 2011-09-26 20:42:46 PDT
So this is failing because our nsAutoTArrays are bigger on Windows than on Mac (and also maybe on Linux?), because Windows apparently aligns PRUint64 to 8 bytes but Mac aligns to 4 bytes.

The solution is to make nsAutoTArray<E> align E to its natural alignment.  I'll file a bug on this now.
Comment 44 Boris Zbarsky [:bz] 2011-09-26 20:46:03 PDT
So the key difference is apparently that the alignment requirement of PRUint64 is different on Mac (at least) and Windows. It's 4 words for the former and 8 for the latter.

So on Mac, RuleHashTagTableEntry is 4 bytes of PRUint32, then 4 bytes of padding so the array can align on a boundary, then 16 bytes or array header (of which 4 bytes is actually padding!), then 12 bytes of first array entry, then 4 bytes of padding so the array size can be a multiple of 8.  Then another nsCOMPtr<nsIAtom>.

If you're counting along at home, that's 4+4+16+12+4+4 = 44.  And presumably MSVC wants to align the whole struct on an 8-byte boundary so those paddings make sense in the _next_ struct, so we get 48.

pldhash warns when the entrysize is > 10*sizeof(void*), and 48 > 40, so we lose.

On gcc32, the array is 4-byte aligned.  So we drop the 4*4 == 16 bytes of padding and end up with a 32-byte struct, which is small enough that pldhash won't warn.  If we added another RuleValue, we'd be up at 44 bytes there and warning as well, so it's a good thing we didn't.  ;)

On 64-bit we have 80 bytes to play with, so things will be fine there as long as they're fine on 32-bit, since alignment requirements will not be able to create word-sized holes like they did here.

The plan is for jlebar to convert nsAutoTArray to natural alignment, then we reland this patch.
Comment 45 Boris Zbarsky [:bz] 2011-09-26 22:29:08 PDT
Filed bug 689443 on sorting out what to do with PerWeightData.
Comment 46 Justin Lebar (not reading bugmail) 2011-10-05 16:51:41 PDT
I pushed this; I hope you don't mind, bz!

https://hg.mozilla.org/integration/mozilla-inbound/rev/81ca82f27831
Comment 47 Justin Lebar (not reading bugmail) 2011-10-05 16:56:47 PDT
(In reply to comment 31)
> I'll take a 2-3% drop in RSS!

I fell for the oldest mistake in the book here.  TBPL shows RSS in megabytes (1024^2 bytes), but I computed megabytes from the graphs by looking at the top three digits.  So naturally, the TBPL results are 2-3% smaller than the results from the graphs...

I really should not have made that mistake.

We'll see whether there's an actual RSS decrease, now that I've pushed this, but I'm not too optimistic.

I'll go make some noise now in the bugs I filed long ago on graphserver's units (or lack thereof).
Comment 48 Justin Lebar (not reading bugmail) 2011-10-05 19:08:32 PDT
I originally landed the wrong version of this, but I backed out and relanded.

https://hg.mozilla.org/integration/mozilla-inbound/rev/2c4d7ffc1521
https://hg.mozilla.org/integration/mozilla-inbound/rev/531c05649fc9
Comment 49 Ed Morley [:emorley] 2011-10-06 08:42:31 PDT
Correct version: https://hg.mozilla.org/mozilla-central/rev/531c05649fc9

Note You need to log in before you can comment on or make changes to this bug.