Closed
Bug 681755
Opened 13 years ago
Closed 13 years ago
Investigate which of the rulehash arrays should be nsAutoTArray
Categories
(Core :: CSS Parsing and Computation, defect, P2)
Tracking
()
RESOLVED
FIXED
mozilla10
People
(Reporter: bzbarsky, Assigned: bzbarsky)
References
(Blocks 1 open bug)
Details
(Whiteboard: [MemShrink:P2])
Attachments
(17 files)
3.19 KB,
patch
|
Details | Diff | Splinter Review | |
658 bytes,
text/plain
|
Details | |
75 bytes,
text/plain
|
Details | |
663 bytes,
text/plain
|
Details | |
227 bytes,
text/plain
|
Details | |
67 bytes,
text/plain
|
Details | |
34 bytes,
text/plain
|
Details | |
1.17 KB,
text/plain
|
Details | |
223 bytes,
text/plain
|
Details | |
336 bytes,
text/plain
|
Details | |
483 bytes,
text/plain
|
Details | |
95 bytes,
text/plain
|
Details | |
280 bytes,
text/plain
|
Details | |
35 bytes,
text/plain
|
Details | |
552 bytes,
text/plain
|
Details | |
11.10 KB,
patch
|
Details | Diff | Splinter Review | |
7.12 KB,
patch
|
dbaron
:
review+
|
Details | Diff | Splinter Review |
There are a bunch of arrays that rulehash uses. Some of these are actually guaranteed to never be empty. We should look into using nsTArray for these, perhaps. The attached patch logs at destruction time the sizes of the following arrays: 1) Array of rulevalues in RuleHashTableEntry 2) Array of universal rules in RuleHash 3) Array of nsCSSSelector* in AtomSelectorEntry 4) Array of state selectors in RuleCascadeData 5) Array of possibly-negated class selectors in RuleCascadeData 6) Array of possible-negated id selectors in RuleCascadeData 7) Array of rules in PerWeightData I then ran the log through some grep/sed/sort/uniq stuff to produce data files for each of those 7 with the following structure: First line is the total number of arrays of the given type, the following lines are counts and lengths at destruction with the count listed first and the length second. I'll attach all that stuff, then analysis of the results after loading google, gmail, cnn, slashdot, twitter, mozilla.org, mozilla.com, web.mit.edu.
Assignee | ||
Comment 1•13 years ago
|
||
Assignee | ||
Comment 2•13 years ago
|
||
Assignee | ||
Comment 3•13 years ago
|
||
Assignee | ||
Comment 4•13 years ago
|
||
Assignee | ||
Comment 5•13 years ago
|
||
Assignee | ||
Comment 6•13 years ago
|
||
Assignee | ||
Comment 7•13 years ago
|
||
Assignee | ||
Comment 8•13 years ago
|
||
Analysis: For the RuleHashTableEntry, the beginning of the file looks like this: 59080 36195 1 11178 2 3680 3 2206 4 1516 5 No 0s as expected: we don't create an entry if there is nothing to put in it. So using at least an nsAutoTArray<1> or somethign that can hold one element is a no-brainer. Not sure about auto-space for 2 elements; past that it's probably not that worthwhile. For AtomSelectorEntry, we have: 41761 22027 1 8164 2 3172 3 2037 4 1333 5 Again, auto-space for one entry is a no-brainer, space for 2 I'm not sure about, things past that are probably not worth it. For universal rules, we see that there are 867 total rulehashes of which 492 had no universal rules at all (yay!). Not sure it's worth making this an auto array at all. For possibly-negated classes, we have 248 RuleCascadeData of which 239 had no negated classes. For ids, that number is 244. So there's no point making these auto arrays either. For state selectors, things are more interesting; out of 248 RuleCascadeData 84 had 4 state selectors, and 89 had 87 state selectors, with a smaller peak at 0 (15 instances) and 1 (11 instances). I'm not sure it's worth doing an auto array here either. For the PerWeightData, we have: 7548 1792 1 450 2 852 3 280 4 146 5 (again, as expected: no rules with a weight means no PerWeightData for that weight). So an auto array holding one entry is totally worth it. Past that point, I'm not sure; the distribution doesn't die off nearly as fast as the hashtable entry arrays (e.g. there are 302 instanced of 7, 224 instances of 15, 222 instances of 17, 128 instances of 35, 124 instances of 335).
Comment 9•13 years ago
|
||
AtomSelectorEntry holds pointers, but the other two lists don't. That suggests we might want something like AutoTArray<1>, which allocates extra space for one element, and also something like SmallVoidArray, which stores the pointer in mHdr. I'm hoping to make an AutoTArray<1> lookalike which has a smaller footprint and is also memmovable. If we want AutoTArray<N> which is memmovable, that should also be doable...
Assignee | ||
Comment 10•13 years ago
|
||
Oh, on the other hand for the PerWeightData case the arrays are all temporary (they live in a hashtable owned by a stack-only CascadeEnumData). So the only drawback of bigger tarrays is the cache locality loss; there's no long-term memory usage effect there. Given that, an autoarray of size 4 (which would cover almost half the arrays in that data) might make sense.
Updated•13 years ago
|
Comment 11•13 years ago
|
||
Erm, this should *depend on* bug 682735, which makes nsAutoTArray memmove'able.
Assignee | ||
Updated•13 years ago
|
Priority: -- → P2
Assignee | ||
Comment 12•13 years ago
|
||
dbaron pointed out that the atomselector and rulehash entries are actually used in multiple tables with possibly different growth behavior. I'll attach some data files showing breakdowns of that stuff.
Assignee | ||
Comment 13•13 years ago
|
||
Assignee | ||
Comment 14•13 years ago
|
||
Assignee | ||
Updated•13 years ago
|
Attachment #560449 -
Attachment description: Attribute AtomeSelectorEntry data → Attribute AtomSelectorEntry data
Assignee | ||
Comment 15•13 years ago
|
||
Assignee | ||
Comment 16•13 years ago
|
||
Assignee | ||
Comment 17•13 years ago
|
||
Assignee | ||
Comment 18•13 years ago
|
||
Assignee | ||
Comment 19•13 years ago
|
||
Assignee | ||
Comment 20•13 years ago
|
||
Assignee | ||
Comment 21•13 years ago
|
||
It looks to me like for the most part the breakdown is the same for all the table types. Exceptions are the tiny Namespace RuleHashTableEntry table and the ID AtomSelectorEntry table (for which length 1 would only cover about 35% of the entries; lengths 1 and 2 would cover closer to 55%). David, thoughts on the array sizing? Should we just use auto-2 arrays?
Assignee | ||
Comment 22•13 years ago
|
||
Justin, want to give this a cachegrind spin?
Comment 23•13 years ago
|
||
Comment on attachment 560506 [details] [diff] [review] Test patch to make all these arrays auto arrays of size 1 > + newEntry->mSelectors.SwapElements(oldEntry->mSelectors); Don't you need to fix up nsTArray's SwapElements implementation first?
Assignee | ||
Comment 24•13 years ago
|
||
Er, yes. Gah. I forgot about that.
Assignee | ||
Comment 25•13 years ago
|
||
Not happening tonight. :(
Comment 26•13 years ago
|
||
With the patch from bug 687722 applied, I measured the number of calls to malloc() when starting the browser to about:home, loading my gmail inbox, and closing the browser. Before: 1 350 000 After: 1 280 000 That's a 1.05x change.
Comment 27•13 years ago
|
||
Pushed to Try to get some Talos results: https://tbpl.mozilla.org/?tree=Try&rev=ae42d72ec2e3
Assignee | ||
Comment 28•13 years ago
|
||
So the before/after numbers there are both with bug 687722 but, but before is without the attached patch and after is with the attached patch? How do things look if you use auto-2 arrays in the attached patch?
Updated•13 years ago
|
Blocks: needs-more-autoarray
Comment 29•13 years ago
|
||
(In reply to Boris Zbarsky (:bz) from comment #28) > So the before/after numbers there are both with bug 687722 but, but before > is without the attached patch and after is with the attached patch? Correct. > How do things look if you use auto-2 arrays in the attached patch? Argh; I can't reproduce those numbers anymore. (I measured multiple times originally!) I think I need a more static testcase.
Comment 30•13 years ago
|
||
New numbers. These are from running an instrumented build on the first 10 pages in the TP5 manifest. I ran the test 10 times for each build. Here are the mean number of times we call malloc: Unpatched 1138000 nsAutoTArray<1> 1113000 nsAutoTArray<2> 1106000 So nsAutoTArray<1> is a 2% decrease, but nsAutoArray<2> is only a 0.5% decrease over nsAutoTArray<1>.
Comment 31•13 years ago
|
||
If Talos is to be believed, this is good for a moderate memory improvement on Linux and Mac. (It looks like the Windows Talos job was never started, for some reason.) Results below are (patch) / (smallest recent value on m-i), in mb: Linux-32: private: 395 / 400 rss: 139 / 143 Linux-64: private: 453 / 461 rss: 192 / 198 Mac 10.5: private: 909 / 902 (very noisy) rss: 173 / 179 Mac 10.6: private: 3500 / 3700 (this difference is too huge) rss: 279 / 280 I'll take a 2-3% drop in RSS! I'll push to try to see what happens with size-2 nsAutoTArrays.
Comment 32•13 years ago
|
||
Try push with nsAutoTArray<2>: https://hg.mozilla.org/try/rev/718f3d42eef3
Comment 33•13 years ago
|
||
erm, I mean https://tbpl.mozilla.org/?tree=Try&rev=718f3d42eef3
Comment 34•13 years ago
|
||
nsAutoTArray<2> gives maybe a 1mb decrease in RSS on Linux-32 and maybe 4mb on Mac 10.6. These could be noise, though; the mac numbers in particular are very noisy. I don't know if the distribution of rulehash sizes is different on TP5 than on what Boris tested on; in some other tests, I've seen that TP5 is a lot simpler than many of the sites I visit. I'm inclined to suggest we should go with nsAutoTArray<2> because some of Boris's data suggests that it would be useful, and since we're already paying an extra 16 bytes for the empty nsAutoTArray plus an extra 4 bytes for the one RuleValue or an extra word for the one nsCSSSelector*. So the choice is between RuleValue entries which have size 20 versus 24 bytes, and AtomSelectorEntrys with size 20/24 versus 24/30 bytes (32/64-bit). We're already paying so much more for these auto arrays, maybe should eat the extra 4 or 8 bytes. But I think we'd do well with either nsAutoTArray<1> or nsAutoTArray<2>.
Comment 35•13 years ago
|
||
Comment on attachment 560506 [details] [diff] [review] Test patch to make all these arrays auto arrays of size 1 dbaron, do you have a preference wrt size-1 or size-2? It would be great to get this in before we branch.
Attachment #560506 -
Flags: review?(dbaron)
Comment 36•13 years ago
|
||
Hm, I did a pretty bad job with those size computations. Let me try again. On 32-bit: sizeof(RuleHashTableEntry) originally 8 bytes (PLDHashEntryHdr=4, nsTArray=4) nsAutoTArray<1> 32 bytes (PLDHashEntryHdr=4, nsAutoTArray<0>=16, RuleValue=12) nsAutoTArray<2> 44 bytes (PLDHashEntryHdr=4, nsAutoTArray<0>=16, 2 x RuleValue=24) sizeof(AtomSelectorEntry) originally 12 bytes (PLDHashEntryHdr=4, nsIAtom*=4, nsTArray=4) nsAutoTArray<1> 28 bytes (PLDHashEntryHdr=4, nsIAtom*=4, nsAutoTArray<0>=16, nsCSSSelector*=4) nsAutoTArray<2> 32 bytes (PLDHashEntryHdr=4, nsIAtom*=4, nsAutoTArray<0>=16, 2 x nsCSSSelector*=8)
Comment on attachment 560506 [details] [diff] [review] Test patch to make all these arrays auto arrays of size 1 >+static void >+RuleHash_MoveEntry(PLDHashTable *table, const PLDHashEntryHdr *from, >+ PLDHashEntryHdr *to) >+{ Please assert that from != to. >+ RuleHashTableEntry *oldEntry = >+ const_cast<RuleHashTableEntry*>( >+ static_cast<const RuleHashTableEntry*>(from)); >+ RuleHashTableEntry *newEntry = new (to) RuleHashTableEntry(); >+ newEntry->mRules.SwapElements(oldEntry->mRules); It seems like you need to call oldEntry->~RuleHashTableEntry(), or else you'll get leak stats misreporting. Or am I missing something? >+static void >+RuleHash_TagTable_MoveEntry(PLDHashTable *table, const PLDHashEntryHdr *from, >+ PLDHashEntryHdr *to) Same here (both comments). >+static void >+AtomSelector_MoveEntry(PLDHashTable *table, const PLDHashEntryHdr *from, >+ PLDHashEntryHdr *to) and same 2 comments here r=dbaron with that
Attachment #560506 -
Flags: review?(dbaron) → review+
Assignee | ||
Comment 38•13 years ago
|
||
Good catch on both of those. Will fix.
Assignee | ||
Comment 39•13 years ago
|
||
Per discussion with jlebar, going to go with auto-1 on the rulehash entries and auto-2 on the atomselector ones. The atomselector case has a larger fraction of length-2 lists, and the cost of an extra entry is less there as well.
Assignee | ||
Comment 40•13 years ago
|
||
Pushed https://hg.mozilla.org/integration/mozilla-inbound/rev/6e359c7e8080 I'll file a followup on the PerWeightData arrays.
Flags: in-testsuite-
Target Milestone: --- → mozilla9
Comment 41•13 years ago
|
||
Backed that out in https://hg.mozilla.org/integration/mozilla-inbound/rev/de97f86d572e - debug Windows mochitest-4 logs got bloated up to the point of being truncated (at 52428800 bytes), and this was my best guess out of that push.
Updated•13 years ago
|
Whiteboard: [MemShrink]
Assignee | ||
Comment 42•13 years ago
|
||
Yeah, that log has about 65k lines with pldhash containing entries of size 48 and hence spewing warnings. Double-checking our entry size numbers now.
Comment 43•13 years ago
|
||
So this is failing because our nsAutoTArrays are bigger on Windows than on Mac (and also maybe on Linux?), because Windows apparently aligns PRUint64 to 8 bytes but Mac aligns to 4 bytes. The solution is to make nsAutoTArray<E> align E to its natural alignment. I'll file a bug on this now.
Assignee | ||
Comment 44•13 years ago
|
||
So the key difference is apparently that the alignment requirement of PRUint64 is different on Mac (at least) and Windows. It's 4 words for the former and 8 for the latter. So on Mac, RuleHashTagTableEntry is 4 bytes of PRUint32, then 4 bytes of padding so the array can align on a boundary, then 16 bytes or array header (of which 4 bytes is actually padding!), then 12 bytes of first array entry, then 4 bytes of padding so the array size can be a multiple of 8. Then another nsCOMPtr<nsIAtom>. If you're counting along at home, that's 4+4+16+12+4+4 = 44. And presumably MSVC wants to align the whole struct on an 8-byte boundary so those paddings make sense in the _next_ struct, so we get 48. pldhash warns when the entrysize is > 10*sizeof(void*), and 48 > 40, so we lose. On gcc32, the array is 4-byte aligned. So we drop the 4*4 == 16 bytes of padding and end up with a 32-byte struct, which is small enough that pldhash won't warn. If we added another RuleValue, we'd be up at 44 bytes there and warning as well, so it's a good thing we didn't. ;) On 64-bit we have 80 bytes to play with, so things will be fine there as long as they're fine on 32-bit, since alignment requirements will not be able to create word-sized holes like they did here. The plan is for jlebar to convert nsAutoTArray to natural alignment, then we reland this patch.
Assignee | ||
Updated•13 years ago
|
Whiteboard: [MemShrink] → [need landing][MemShrink]
Target Milestone: mozilla9 → ---
Assignee | ||
Comment 45•13 years ago
|
||
Filed bug 689443 on sorting out what to do with PerWeightData.
Blocks: 689443
Updated•13 years ago
|
Whiteboard: [need landing][MemShrink] → [need landing][MemShrink:P2]
Comment 46•13 years ago
|
||
I pushed this; I hope you don't mind, bz! https://hg.mozilla.org/integration/mozilla-inbound/rev/81ca82f27831
Whiteboard: [need landing][MemShrink:P2] → [MemShrink:P2][inbound]
Comment 47•13 years ago
|
||
(In reply to comment 31) > I'll take a 2-3% drop in RSS! I fell for the oldest mistake in the book here. TBPL shows RSS in megabytes (1024^2 bytes), but I computed megabytes from the graphs by looking at the top three digits. So naturally, the TBPL results are 2-3% smaller than the results from the graphs... I really should not have made that mistake. We'll see whether there's an actual RSS decrease, now that I've pushed this, but I'm not too optimistic. I'll go make some noise now in the bugs I filed long ago on graphserver's units (or lack thereof).
Assignee | ||
Updated•13 years ago
|
Target Milestone: --- → mozilla10
Comment 48•13 years ago
|
||
I originally landed the wrong version of this, but I backed out and relanded. https://hg.mozilla.org/integration/mozilla-inbound/rev/2c4d7ffc1521 https://hg.mozilla.org/integration/mozilla-inbound/rev/531c05649fc9
Target Milestone: mozilla10 → ---
Comment 49•13 years ago
|
||
Correct version: https://hg.mozilla.org/mozilla-central/rev/531c05649fc9
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Whiteboard: [MemShrink:P2][inbound] → [MemShrink:P2]
Target Milestone: --- → mozilla10
You need to log in
before you can comment on or make changes to this bug.
Description
•