Closed Bug 681755 Opened 14 years ago Closed 14 years ago

Investigate which of the rulehash arrays should be nsAutoTArray

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla10

People

(Reporter: bzbarsky, Assigned: bzbarsky)

References

Details

(Whiteboard: [MemShrink:P2])

Attachments

(17 files)

Measurement patch 14 years ago Boris Zbarsky [:bzbarsky] 3.19 KB, patch		Details \| Diff \| Splinter Review
RuleHashTableEntry data 14 years ago Boris Zbarsky [:bzbarsky] 658 bytes, text/plain		Details
Universal rule data 14 years ago Boris Zbarsky [:bzbarsky] 75 bytes, text/plain		Details
AtomSelectorEntry data 14 years ago Boris Zbarsky [:bzbarsky] 663 bytes, text/plain		Details
State selector data 14 years ago Boris Zbarsky [:bzbarsky] 227 bytes, text/plain		Details
Possibly-negated class data 14 years ago Boris Zbarsky [:bzbarsky] 67 bytes, text/plain		Details
Possibly-negated id data 14 years ago Boris Zbarsky [:bzbarsky] 34 bytes, text/plain		Details
PerWeightData data 14 years ago Boris Zbarsky [:bzbarsky] 1.17 KB, text/plain		Details
Attribute AtomSelectorEntry data 14 years ago Boris Zbarsky [:bzbarsky] 223 bytes, text/plain		Details
ID AtomSelectorEntry data 14 years ago Boris Zbarsky [:bzbarsky] 336 bytes, text/plain		Details
Class AtomSelectorEntry data 14 years ago Boris Zbarsky [:bzbarsky] 483 bytes, text/plain		Details
ID RuleHashTableEntry data 14 years ago Boris Zbarsky [:bzbarsky] 95 bytes, text/plain		Details
Class RuleHashTableEntry data 14 years ago Boris Zbarsky [:bzbarsky] 280 bytes, text/plain		Details
Namespace RuleHashTableEntry data 14 years ago Boris Zbarsky [:bzbarsky] 35 bytes, text/plain		Details
Tag RuleHashTableEntry data 14 years ago Boris Zbarsky [:bzbarsky] 552 bytes, text/plain		Details
Updated measurement patch 14 years ago Boris Zbarsky [:bzbarsky] 11.10 KB, patch		Details \| Diff \| Splinter Review
Test patch to make all these arrays auto arrays of size 1 14 years ago Boris Zbarsky [:bzbarsky] 7.12 KB, patch	dbaron : review+	Details \| Diff \| Splinter Review

Boris Zbarsky [:bzbarsky]

Assignee

Description

•

14 years ago

Attached patch Measurement patch — Details — Splinter Review

There are a bunch of arrays that rulehash uses. Some of these are actually guaranteed to never be empty. We should look into using nsTArray for these, perhaps. The attached patch logs at destruction time the sizes of the following arrays: 1) Array of rulevalues in RuleHashTableEntry 2) Array of universal rules in RuleHash 3) Array of nsCSSSelector* in AtomSelectorEntry 4) Array of state selectors in RuleCascadeData 5) Array of possibly-negated class selectors in RuleCascadeData 6) Array of possible-negated id selectors in RuleCascadeData 7) Array of rules in PerWeightData I then ran the log through some grep/sed/sort/uniq stuff to produce data files for each of those 7 with the following structure: First line is the total number of arrays of the given type, the following lines are counts and lengths at destruction with the count listed first and the length second. I'll attach all that stuff, then analysis of the results after loading google, gmail, cnn, slashdot, twitter, mozilla.org, mozilla.com, web.mit.edu.

Boris Zbarsky [:bzbarsky]

Assignee

Updated

•

14 years ago

Blocks: 677571

Boris Zbarsky [:bzbarsky]

Assignee

Comment 1

•

14 years ago

Attached file RuleHashTableEntry data — Details

Boris Zbarsky [:bzbarsky]

Assignee

Comment 2

•

14 years ago

Attached file Universal rule data — Details

Boris Zbarsky [:bzbarsky]

Assignee

Comment 3

•

14 years ago

Attached file AtomSelectorEntry data — Details

Boris Zbarsky [:bzbarsky]

Assignee

Comment 4

•

14 years ago

Attached file State selector data — Details

Boris Zbarsky [:bzbarsky]

Assignee

Comment 5

•

14 years ago

Attached file Possibly-negated class data — Details

Boris Zbarsky [:bzbarsky]

Assignee

Comment 6

•

14 years ago

Attached file Possibly-negated id data — Details

Boris Zbarsky [:bzbarsky]

Assignee

Comment 7

•

14 years ago

Attached file PerWeightData data — Details

Boris Zbarsky [:bzbarsky]

Assignee

Comment 8

•

14 years ago

Analysis: For the RuleHashTableEntry, the beginning of the file looks like this: 59080 36195 1 11178 2 3680 3 2206 4 1516 5 No 0s as expected: we don't create an entry if there is nothing to put in it. So using at least an nsAutoTArray<1> or somethign that can hold one element is a no-brainer. Not sure about auto-space for 2 elements; past that it's probably not that worthwhile. For AtomSelectorEntry, we have: 41761 22027 1 8164 2 3172 3 2037 4 1333 5 Again, auto-space for one entry is a no-brainer, space for 2 I'm not sure about, things past that are probably not worth it. For universal rules, we see that there are 867 total rulehashes of which 492 had no universal rules at all (yay!). Not sure it's worth making this an auto array at all. For possibly-negated classes, we have 248 RuleCascadeData of which 239 had no negated classes. For ids, that number is 244. So there's no point making these auto arrays either. For state selectors, things are more interesting; out of 248 RuleCascadeData 84 had 4 state selectors, and 89 had 87 state selectors, with a smaller peak at 0 (15 instances) and 1 (11 instances). I'm not sure it's worth doing an auto array here either. For the PerWeightData, we have: 7548 1792 1 450 2 852 3 280 4 146 5 (again, as expected: no rules with a weight means no PerWeightData for that weight). So an auto array holding one entry is totally worth it. Past that point, I'm not sure; the distribution doesn't die off nearly as fast as the hashtable entry arrays (e.g. there are 302 instanced of 7, 224 instances of 15, 222 instances of 17, 128 instances of 35, 124 instances of 335).

Justin Lebar (not reading bugmail)

Comment 9

•

14 years ago

AtomSelectorEntry holds pointers, but the other two lists don't. That suggests we might want something like AutoTArray<1>, which allocates extra space for one element, and also something like SmallVoidArray, which stores the pointer in mHdr. I'm hoping to make an AutoTArray<1> lookalike which has a smaller footprint and is also memmovable. If we want AutoTArray<N> which is memmovable, that should also be doable...

Boris Zbarsky [:bzbarsky]

Assignee

Comment 10

•

14 years ago

Oh, on the other hand for the PerWeightData case the arrays are all temporary (they live in a hashtable owned by a stack-only CascadeEnumData). So the only drawback of bigger tarrays is the cache locality loss; there's no long-term memory usage effect there. Given that, an autoarray of size 4 (which would cover almost half the arrays in that data) might make sense.

Justin Lebar (not reading bugmail)

Updated

•

14 years ago

Blocks: 682735

Justin Lebar (not reading bugmail)

Updated

•

14 years ago

No longer blocks: 677571

Depends on: 677571

Justin Lebar (not reading bugmail)

Comment 11

•

14 years ago

Erm, this should *depend on* bug 682735, which makes nsAutoTArray memmove'able.

No longer blocks: 682735

Depends on: 682735

Boris Zbarsky [:bzbarsky]

Assignee

Updated

•

14 years ago

Priority: -- → P2

Boris Zbarsky [:bzbarsky]

Assignee

Comment 12

•

14 years ago

dbaron pointed out that the atomselector and rulehash entries are actually used in multiple tables with possibly different growth behavior. I'll attach some data files showing breakdowns of that stuff.

Boris Zbarsky [:bzbarsky]

Assignee

Comment 13

•

14 years ago

Attached file Attribute AtomSelectorEntry data — Details

Boris Zbarsky [:bzbarsky]

Assignee

Comment 14

•

14 years ago

Attached file ID AtomSelectorEntry data — Details

Boris Zbarsky [:bzbarsky]

Assignee

Updated

•

14 years ago

Attachment #560449 - Attachment description: Attribute AtomeSelectorEntry data → Attribute AtomSelectorEntry data

Boris Zbarsky [:bzbarsky]

Assignee

Comment 15

•

14 years ago

Attached file Class AtomSelectorEntry data — Details

Boris Zbarsky [:bzbarsky]

Assignee

Comment 16

•

14 years ago

Attached file ID RuleHashTableEntry data — Details

Boris Zbarsky [:bzbarsky]

Assignee

Comment 17

•

14 years ago

Attached file Class RuleHashTableEntry data — Details

Boris Zbarsky [:bzbarsky]

Assignee

Comment 18

•

14 years ago

Attached file Namespace RuleHashTableEntry data — Details

Boris Zbarsky [:bzbarsky]

Assignee

Comment 19

•

14 years ago

Attached file Tag RuleHashTableEntry data — Details

Boris Zbarsky [:bzbarsky]

Assignee

Comment 20

•

14 years ago

Attached patch Updated measurement patch — Details — Splinter Review

Boris Zbarsky [:bzbarsky]

Assignee

Comment 21

•

14 years ago

It looks to me like for the most part the breakdown is the same for all the table types. Exceptions are the tiny Namespace RuleHashTableEntry table and the ID AtomSelectorEntry table (for which length 1 would only cover about 35% of the entries; lengths 1 and 2 would cover closer to 55%). David, thoughts on the array sizing? Should we just use auto-2 arrays?

Boris Zbarsky [:bzbarsky]

Assignee

Comment 22

•

14 years ago

Attached patch Test patch to make all these arrays auto arrays of size 1 — Details — Splinter Review

Justin, want to give this a cachegrind spin?

Justin Lebar (not reading bugmail)

Comment 23

•

14 years ago

Comment on attachment 560506 [details] [diff] [review] Test patch to make all these arrays auto arrays of size 1 > + newEntry->mSelectors.SwapElements(oldEntry->mSelectors); Don't you need to fix up nsTArray's SwapElements implementation first?

Boris Zbarsky [:bzbarsky]

Assignee

Comment 24

•

14 years ago

Er, yes. Gah. I forgot about that.

Boris Zbarsky [:bzbarsky]

Assignee

Comment 25

•

14 years ago

Not happening tonight. :(

Justin Lebar (not reading bugmail)

Updated

•

14 years ago

Depends on: 687722

Justin Lebar (not reading bugmail)

Comment 26

•

14 years ago

With the patch from bug 687722 applied, I measured the number of calls to malloc() when starting the browser to about:home, loading my gmail inbox, and closing the browser. Before: 1 350 000 After: 1 280 000 That's a 1.05x change.

Justin Lebar (not reading bugmail)

Comment 27

•

14 years ago

Pushed to Try to get some Talos results: https://tbpl.mozilla.org/?tree=Try&rev=ae42d72ec2e3

Boris Zbarsky [:bzbarsky]

Assignee

Comment 28

•

14 years ago

So the before/after numbers there are both with bug 687722 but, but before is without the attached patch and after is with the attached patch? How do things look if you use auto-2 arrays in the attached patch?

Justin Lebar (not reading bugmail)

Updated

•

14 years ago

Blocks: needs-more-autoarray

Justin Lebar (not reading bugmail)

Comment 29

•

14 years ago

(In reply to Boris Zbarsky (:bz) from comment #28) > So the before/after numbers there are both with bug 687722 but, but before > is without the attached patch and after is with the attached patch? Correct. > How do things look if you use auto-2 arrays in the attached patch? Argh; I can't reproduce those numbers anymore. (I measured multiple times originally!) I think I need a more static testcase.

Justin Lebar (not reading bugmail)

Comment 30

•

14 years ago

New numbers. These are from running an instrumented build on the first 10 pages in the TP5 manifest. I ran the test 10 times for each build. Here are the mean number of times we call malloc: Unpatched 1138000 nsAutoTArray<1> 1113000 nsAutoTArray<2> 1106000 So nsAutoTArray<1> is a 2% decrease, but nsAutoArray<2> is only a 0.5% decrease over nsAutoTArray<1>.

Justin Lebar (not reading bugmail)

Comment 31

•

14 years ago

If Talos is to be believed, this is good for a moderate memory improvement on Linux and Mac. (It looks like the Windows Talos job was never started, for some reason.) Results below are (patch) / (smallest recent value on m-i), in mb: Linux-32: private: 395 / 400 rss: 139 / 143 Linux-64: private: 453 / 461 rss: 192 / 198 Mac 10.5: private: 909 / 902 (very noisy) rss: 173 / 179 Mac 10.6: private: 3500 / 3700 (this difference is too huge) rss: 279 / 280 I'll take a 2-3% drop in RSS! I'll push to try to see what happens with size-2 nsAutoTArrays.

Justin Lebar (not reading bugmail)

Comment 32

•

14 years ago

Try push with nsAutoTArray<2>: https://hg.mozilla.org/try/rev/718f3d42eef3

Justin Lebar (not reading bugmail)

Comment 33

•

14 years ago

erm, I mean https://tbpl.mozilla.org/?tree=Try&rev=718f3d42eef3

Justin Lebar (not reading bugmail)

Comment 34

•

14 years ago

nsAutoTArray<2> gives maybe a 1mb decrease in RSS on Linux-32 and maybe 4mb on Mac 10.6. These could be noise, though; the mac numbers in particular are very noisy. I don't know if the distribution of rulehash sizes is different on TP5 than on what Boris tested on; in some other tests, I've seen that TP5 is a lot simpler than many of the sites I visit. I'm inclined to suggest we should go with nsAutoTArray<2> because some of Boris's data suggests that it would be useful, and since we're already paying an extra 16 bytes for the empty nsAutoTArray plus an extra 4 bytes for the one RuleValue or an extra word for the one nsCSSSelector*. So the choice is between RuleValue entries which have size 20 versus 24 bytes, and AtomSelectorEntrys with size 20/24 versus 24/30 bytes (32/64-bit). We're already paying so much more for these auto arrays, maybe should eat the extra 4 or 8 bytes. But I think we'd do well with either nsAutoTArray<1> or nsAutoTArray<2>.

Justin Lebar (not reading bugmail)

Comment 35

•

14 years ago

Comment on attachment 560506 [details] [diff] [review] Test patch to make all these arrays auto arrays of size 1 dbaron, do you have a preference wrt size-1 or size-2? It would be great to get this in before we branch.

Attachment #560506 - Flags: review?(dbaron)

Justin Lebar (not reading bugmail)

Comment 36

•

14 years ago

Hm, I did a pretty bad job with those size computations. Let me try again. On 32-bit: sizeof(RuleHashTableEntry) originally 8 bytes (PLDHashEntryHdr=4, nsTArray=4) nsAutoTArray<1> 32 bytes (PLDHashEntryHdr=4, nsAutoTArray<0>=16, RuleValue=12) nsAutoTArray<2> 44 bytes (PLDHashEntryHdr=4, nsAutoTArray<0>=16, 2 x RuleValue=24) sizeof(AtomSelectorEntry) originally 12 bytes (PLDHashEntryHdr=4, nsIAtom*=4, nsTArray=4) nsAutoTArray<1> 28 bytes (PLDHashEntryHdr=4, nsIAtom*=4, nsAutoTArray<0>=16, nsCSSSelector*=4) nsAutoTArray<2> 32 bytes (PLDHashEntryHdr=4, nsIAtom*=4, nsAutoTArray<0>=16, 2 x nsCSSSelector*=8)

David Baron :dbaron: (⌚️UTC-5, no longer working on Mozilla)

Comment 37

•

14 years ago

Comment on attachment 560506 [details] [diff] [review] Test patch to make all these arrays auto arrays of size 1 >+static void >+RuleHash_MoveEntry(PLDHashTable *table, const PLDHashEntryHdr *from, >+ PLDHashEntryHdr *to) >+{ Please assert that from != to. >+ RuleHashTableEntry *oldEntry = >+ const_cast<RuleHashTableEntry*>( >+ static_cast<const RuleHashTableEntry*>(from)); >+ RuleHashTableEntry *newEntry = new (to) RuleHashTableEntry(); >+ newEntry->mRules.SwapElements(oldEntry->mRules); It seems like you need to call oldEntry->~RuleHashTableEntry(), or else you'll get leak stats misreporting. Or am I missing something? >+static void >+RuleHash_TagTable_MoveEntry(PLDHashTable *table, const PLDHashEntryHdr *from, >+ PLDHashEntryHdr *to) Same here (both comments). >+static void >+AtomSelector_MoveEntry(PLDHashTable *table, const PLDHashEntryHdr *from, >+ PLDHashEntryHdr *to) and same 2 comments here r=dbaron with that

Attachment #560506 - Flags: review?(dbaron) → review+

Boris Zbarsky [:bzbarsky]

Assignee

Comment 38

•

14 years ago

Good catch on both of those. Will fix.

Boris Zbarsky [:bzbarsky]

Assignee

Comment 39

•

14 years ago

Per discussion with jlebar, going to go with auto-1 on the rulehash entries and auto-2 on the atomselector ones. The atomselector case has a larger fraction of length-2 lists, and the cost of an extra entry is less there as well.

Boris Zbarsky [:bzbarsky]

Assignee

Comment 40

•

14 years ago

Pushed https://hg.mozilla.org/integration/mozilla-inbound/rev/6e359c7e8080 I'll file a followup on the PerWeightData arrays.

Flags: in-testsuite-

Target Milestone: --- → mozilla9

Phil Ringnalda (:philor)

Comment 41

•

14 years ago

Backed that out in https://hg.mozilla.org/integration/mozilla-inbound/rev/de97f86d572e - debug Windows mochitest-4 logs got bloated up to the point of being truncated (at 52428800 bytes), and this was my best guess out of that push.

Ed Morley [:emorley]

Updated

•

14 years ago

Whiteboard: [MemShrink]

Boris Zbarsky [:bzbarsky]

Assignee

Comment 42

•

14 years ago

Yeah, that log has about 65k lines with pldhash containing entries of size 48 and hence spewing warnings. Double-checking our entry size numbers now.

Justin Lebar (not reading bugmail)

Comment 43

•

14 years ago

So this is failing because our nsAutoTArrays are bigger on Windows than on Mac (and also maybe on Linux?), because Windows apparently aligns PRUint64 to 8 bytes but Mac aligns to 4 bytes. The solution is to make nsAutoTArray<E> align E to its natural alignment. I'll file a bug on this now.

Justin Lebar (not reading bugmail)

Updated

•

14 years ago

Depends on: 689433

Boris Zbarsky [:bzbarsky]

Assignee

Comment 44

•

14 years ago

So the key difference is apparently that the alignment requirement of PRUint64 is different on Mac (at least) and Windows. It's 4 words for the former and 8 for the latter. So on Mac, RuleHashTagTableEntry is 4 bytes of PRUint32, then 4 bytes of padding so the array can align on a boundary, then 16 bytes or array header (of which 4 bytes is actually padding!), then 12 bytes of first array entry, then 4 bytes of padding so the array size can be a multiple of 8. Then another nsCOMPtr<nsIAtom>. If you're counting along at home, that's 4+4+16+12+4+4 = 44. And presumably MSVC wants to align the whole struct on an 8-byte boundary so those paddings make sense in the _next_ struct, so we get 48. pldhash warns when the entrysize is > 10*sizeof(void*), and 48 > 40, so we lose. On gcc32, the array is 4-byte aligned. So we drop the 4*4 == 16 bytes of padding and end up with a 32-byte struct, which is small enough that pldhash won't warn. If we added another RuleValue, we'd be up at 44 bytes there and warning as well, so it's a good thing we didn't. ;) On 64-bit we have 80 bytes to play with, so things will be fine there as long as they're fine on 32-bit, since alignment requirements will not be able to create word-sized holes like they did here. The plan is for jlebar to convert nsAutoTArray to natural alignment, then we reland this patch.

Boris Zbarsky [:bzbarsky]

Assignee

Updated

•

14 years ago

Whiteboard: [MemShrink] → [need landing][MemShrink]

Target Milestone: mozilla9 → ---

Boris Zbarsky [:bzbarsky]

Assignee

Comment 45

•

14 years ago

Filed bug 689443 on sorting out what to do with PerWeightData.

Blocks: 689443

Johnny Stenback (:jst)

Updated

•

14 years ago

Whiteboard: [need landing][MemShrink] → [need landing][MemShrink:P2]

Justin Lebar (not reading bugmail)

Updated

•

14 years ago

No longer depends on: 682735

Justin Lebar (not reading bugmail)

Comment 46

•

14 years ago

I pushed this; I hope you don't mind, bz! https://hg.mozilla.org/integration/mozilla-inbound/rev/81ca82f27831

Whiteboard: [need landing][MemShrink:P2] → [MemShrink:P2][inbound]

Justin Lebar (not reading bugmail)

Comment 47

•

14 years ago

(In reply to comment 31) > I'll take a 2-3% drop in RSS! I fell for the oldest mistake in the book here. TBPL shows RSS in megabytes (1024^2 bytes), but I computed megabytes from the graphs by looking at the top three digits. So naturally, the TBPL results are 2-3% smaller than the results from the graphs... I really should not have made that mistake. We'll see whether there's an actual RSS decrease, now that I've pushed this, but I'm not too optimistic. I'll go make some noise now in the bugs I filed long ago on graphserver's units (or lack thereof).

Boris Zbarsky [:bzbarsky]

Assignee

Updated

•

14 years ago

Target Milestone: --- → mozilla10

Justin Lebar (not reading bugmail)

Comment 48

•

14 years ago

I originally landed the wrong version of this, but I backed out and relanded. https://hg.mozilla.org/integration/mozilla-inbound/rev/2c4d7ffc1521 https://hg.mozilla.org/integration/mozilla-inbound/rev/531c05649fc9

Target Milestone: mozilla10 → ---

Ed Morley [:emorley]

Comment 49

•

14 years ago

Correct version: https://hg.mozilla.org/mozilla-central/rev/531c05649fc9

Status: NEW → RESOLVED

Closed: 14 years ago

Resolution: --- → FIXED

Whiteboard: [MemShrink:P2][inbound] → [MemShrink:P2]

Target Milestone: --- → mozilla10

You need to log in before you can comment on or make changes to this bug.