Bug 689443 - Use a linked list for PerWeightData
Status: RESOLVED FIXED
Product: Core
Classification: Components
Component: CSS Parsing and Computation
Version: unspecified
Hardware: All
OS: All
Priority: P2
Severity: normal
Target Milestone: mozilla11
Assigned To: Boris Zbarsky [:bz]
Depends on: 681755
Blocks: 700914
Reported: 2011-09-26 22:27 PDT by Boris Zbarsky [:bz]
Modified: 2011-12-08 08:29 PST
Flags: bzbarsky: in-testsuite-


Attachments
Switch PerWeightData back to using linked lists instead of arrays for the actual per-weight rule data, so as to reduce fragmentation.  Incidentally fixes a bug where we were copying the per-weight data very inefficiently. (11.87 KB, patch)
2011-10-28 12:36 PDT, Boris Zbarsky [:bz]
dbaron: review+

Description Boris Zbarsky [:bz] 2011-09-26 22:27:59 PDT
These arrays are never empty.  Unfortunately, in my measurements in bug 681755 there is a pretty long tail.  Converting the table in https://bug681755.bugzilla.mozilla.org/attachment.cgi?id=555536 to a table of length vs percentage of arrays that had that length or less, we have:

1  23.7%
2  29.7%
3  41.0%
4  44.7%
5  46.6%
6  47.7%
7  51.7%
8  52.9%
9  54.6%
10 57.8%
11 59.8%
12 61.1%
13 61.6%
14 62.3%
15 65.2%
16 66.0%

and it goes on like that.

Also important: these are temporary data structures, effectively owned by a hashtable allocated on the stack.  So there is probably not a huge amount of fragmentation win here... except for the fact that we deallocate that hashtable _after_ we've constructed the rule cascade.

One other note: it looks like we currently copy the PerWeightData out of the hashtable into an array.  But do we end up copying the mRules in the process?  I'd sure think so!  Just using Swap() here instead might be better than any sort of messing with auto arrays.

Thoughts?
Comment 1 David Baron [:dbaron] 2011-10-06 11:53:05 PDT
Perhaps we could arena-allocate them and make a linked list of them instead of an array?  (Same for RuleValue?)  (They were once a linked list, no?  But maybe not arena-allocated?)
Comment 2 Boris Zbarsky [:bz] 2011-10-06 14:35:08 PDT
> But do we end up copying the mRules in the process? 

Yes, we do.

Using an arena would help with the fragmentation issues, if we have any.  Justin, thoughts?
Comment 3 Justin Lebar 2011-10-06 14:47:28 PDT
> Also important: these are temporary data structures, effectively owned by a hashtable allocated on
> the stack.  So there is probably not a huge amount of fragmentation win here... except for the fact 
> that we deallocate that hashtable _after_ we've constructed the rule cascade.

Sounds like I need to work on a tool to help us pinpoint how much fragmentation an allocation site is responsible for (bug 688979).  If every time we do this computation we allocate so much temporary data that the rule cascade ends up on a fresh page, then this would be bad.  But my suspicion is that this wouldn't happen.

OTOH there are about 10x more RuleHashTableEntry allocations than PerWeightData allocations, so I suspect whatever we do here won't have a large impact...
Comment 4 Boris Zbarsky [:bz] 2011-10-06 15:00:00 PDT
OK.  So maybe we should just do whatever is fastest, then....
Comment 5 Boris Zbarsky [:bz] 2011-10-28 10:30:58 PDT
> (Same for RuleValue?)

RuleValue used to be an arena-allocated linked list.  We changed that in bug 576136 because we had to walk the rulevalue lists pretty often and the pointer-chasing led to cache misses.  In fact, the PerWeightData also used to have a linked list before then.

I think going back to an arena-allocated linked list for PerWeightData makes a lot of sense.  That should guarantee zero fragmentation, if we do it right, because I can clear the arena after every cascade construction.  Note that we never used to release things back to this arena, btw!
Comment 6 Boris Zbarsky [:bz] 2011-10-28 12:36:44 PDT
Created attachment 570330 [details] [diff] [review]
Switch PerWeightData back to using linked lists instead of arrays for the actual per-weight rule data, so as to reduce fragmentation.  Incidentally fixes a bug where we were copying the per-weight data very inefficiently.

Another option is to put the entire first PerWeightDataListItem in the PerWeightData, since we will never have an empty list.  But that would complicate the code in CascadeRuleEnumFunc somewhat.
Comment 7 David Baron [:dbaron] 2011-12-07 15:10:27 PST
Comment on attachment 570330 [details] [diff] [review]
Switch PerWeightData back to using linked lists instead of arrays for the actual per-weight rule data, so as to reduce fragmentation.  Incidentally fixes a bug where we were copying the per-weight data very inefficiently.

>+// We want page-sized arenas so there's no fragmentation involved.
>+#define NS_CASCADEENUMDATA_ARENA_BLOCK_SIZE (4096)

If you're not confident of this (I'm not), check with njn that this is the right number.

r=dbaron
Comment 8 David Baron [:dbaron] 2011-12-07 15:12:51 PST
Oh, and you probably want to update RuleCascadeData::SizeOf.
Comment 9 David Baron [:dbaron] 2011-12-07 15:13:56 PST
Er, never mind that... you're changing CascadeEnumData, which is temporary.
Comment 10 Nicholas Nethercote [:njn] 2011-12-07 15:35:32 PST
> >+// We want page-sized arenas so there's no fragmentation involved.
> >+#define NS_CASCADEENUMDATA_ARENA_BLOCK_SIZE (4096)
> 
> If you're not confident of this (I'm not), check with njn that this is the
> right number.

Answering that requires knowing more about jemalloc's internals than I do.  If you have lots of allocations in a row of the same size, you probably won't get any fragmentation, but who knows what other threads will be doing.

My philosophy in these cases is to assume jemalloc knows what it's doing.  So I'd choose the size based on what things are being allocated out of the pool and their size.  E.g. with 256 byte arenas you've got 256-4*sizeof(void*) bytes available, so if you're allocating things that are 90 bytes in size you'll waste a lot of space at the end of each arena.  But if you're allocating things that are 8 bytes it won't matter much.  And if you're allocating a lot of things, larger arenas are better because you have fewer calls to malloc/free.


(In reply to David Baron [:dbaron] from comment #8)
> Oh, and you probably want to update RuleCascadeData::SizeOf.

Hmm, this could clash with the patch in bug 705987...
Comment 11 Justin Lebar 2011-12-07 15:53:04 PST
There was some discussion of the 4096 on IRC.  If there are no intervening allocations in the 256-byte size class, then 256-byte allocations should have the same effect on fragmentation as 4096-byte ones.  And 256-byte may be faster, because it's more likely that there's some 256-byte hole sitting around we can fill than that there's a page-sized hole sitting around.

But 4096 shouldn't fragment under any circumstances, because jemalloc can always decommit that page.
Comment 12 Boris Zbarsky [:bz] 2011-12-07 16:57:40 PST
> If there are no intervening allocations in the 256-byte size class,

Can't guarantee that, obviously.

The objects being allocated out of the arena are 3 words, so 12- or 24-byte allocations.  And the number of objects allocated is typically "one for every selector", so hundreds to tens of thousands, depending on the page.

I'm going to stick with 4096, I guess.

> Hmm, this could clash with the patch in bug 705987...

It won't; see comment 9.
Comment 14 Ed Morley [:emorley] 2011-12-08 08:29:29 PST
https://hg.mozilla.org/mozilla-central/rev/79a0b1e9f6fd
