Use a linked list for PerWeightData

RESOLVED FIXED in mozilla11

Status

()

Core
CSS Parsing and Computation
P2
normal
RESOLVED FIXED
6 years ago
6 years ago

People

(Reporter: bz, Assigned: bz)

Tracking

unspecified
mozilla11
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite -

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

(Assignee)

Description

6 years ago
These arrays are never empty.  Unfortunately, in my measurements in bug 681755 there is a pretty long tail.  Converting the table in https://bug681755.bugzilla.mozilla.org/attachment.cgi?id=555536 to a table of length vs percentage of arrays that had that length or less, we have:

1  23.7%
2  29.7%
3  41.0%
4  44.7%
5  46.6%
6  47.7%
7  51.7%
8  52.9%
9  54.6%
10 57.8%
11 59.8%
12 61.1%
13 61.6%
14 62.3%
15 65.2%
16 66.0%

and it goes on like that.

Also important: these are temporary data structures, effectively owned by a hashtable allocated on the stack.  So there is probably not a huge amount of fragmentation win here... except for the fact that we deallocate that hashtable _after_ we've constructed the rule cascade.

One other note: we currently copy the PerWeightData out of the hashtable into an array, looks like.  But do we end up copying the mRules in the process?  I'd sure think so!  Just using Swap() here instead might be better than any sort of messing with auto arrays.

Thoughts?
(Assignee)

Updated

6 years ago
Depends on: 681755
Perhaps we could arena-allocate them and make a linked list of them instead of an array?  (Same for RuleValue?)  (They were once a linked list, no?  But maybe not arena-allocated?)
(Assignee)

Comment 2

6 years ago
> But do we end up copying the mRules in the process? 

Yes, we do.

Using an arena would help with the fragmentation issues, if we have any.  Justin, thoughts?
> Also important: these are temporary data structures, effectively owned by a hashtable allocated on
> the stack.  So there is probably not a huge amount of fragmentation win here... except for the fact 
> that we deallocate that hashtable _after_ we've constructed the rule cascade.

Sounds like I need to work on a tool to help us pinpoint how much fragmentation an allocation site is responsible for (bug 688979).  If every time we do this computation we allocate so much temporary data that the rule cascade ends up on a fresh page, then this would be bad.  But my suspicion is that this wouldn't happen.

OTOH there are about 10x more RuleHashTableEntry allocations than PerWeightData allocations, so I suspect whatever we do here won't have a large impact...
(Assignee)

Comment 4

6 years ago
OK.  So maybe we should just do whatever is fastest, then....
(Assignee)

Comment 5

6 years ago
> (Same for RuleValue?)

RuleValue used to be an arena-allocated linked list.  We changed that in bug 576136 because we had to walk the rulevalue lists pretty often and the pointer-chasing led to cache misses.  In fact, the PerWeightData also used to have a linked list before then.

I think going back to an arena-allocated linked list for PerWeightData makes a lot of sense.  That should guarantee zero fragmentation, if we do it right, because I can clear the arena after every cascade construction.  Note that we used to not ever release things back to this arena, btw!
(Assignee)

Comment 6

6 years ago
Created attachment 570330 [details] [diff] [review]
Switch PerWeightData back to using linked lists instead of arrays for the actual per-weight rule data, so as to reduce fragmentation.   incidentally fixes a bug where we were copying the per-weight data very inefficiently.

Another option is to put the entire first PerWeightDataListItem in the PerWeightdata, since we will never have an empty list.  But that would complicate the code in CascadeRuleEnumFunc somewhat.
Attachment #570330 - Flags: review?(dbaron)
(Assignee)

Updated

6 years ago
Whiteboard: [need review]
(Assignee)

Updated

6 years ago
Blocks: 700914
(Assignee)

Updated

6 years ago
Summary: Maybe use an auto array for the array of PerWeightData → Use a linked list for PerWeightData
Comment on attachment 570330 [details] [diff] [review]
Switch PerWeightData back to using linked lists instead of arrays for the actual per-weight rule data, so as to reduce fragmentation.   incidentally fixes a bug where we were copying the per-weight data very inefficiently.

>+// We want page-sized arenas so there's no fragmentation involved.
>+#define NS_CASCADEENUMDATA_ARENA_BLOCK_SIZE (4096)

If you're not confident of this (I'm not), check with njn that this is the right number.

r=dbaron
Attachment #570330 - Flags: review?(dbaron) → review+
OS: Mac OS X → All
Hardware: x86 → All
Oh, and you probably want to update RuleCascadeData::SizeOf.
Er, never mind that... you're changing CascadeEnumData, which is temporary.
> >+// We want page-sized arenas so there's no fragmentation involved.
> >+#define NS_CASCADEENUMDATA_ARENA_BLOCK_SIZE (4096)
> 
> If you're not confident of this (I'm not), check with njn that this is the
> right number.

Answering that requires knowing more about jemalloc's internals than I do.  If you have lots of allocations in a row of the same size, you probably won't get any fragmentation, but who knows what other threads will be doing.

My philosophy in these cases is to assume jemalloc knows what it's doing.  So I'd choose the size based on what things are being allocated out of the pool and their size.  E.g. with 256 byte arenas you've got 256-4*sizeof(void*) bytes available, so if you're allocating things that are 90 bytes in size you'll waste a lot of space at the end of each arena.  But if you're allocating things that are 8 bytes it won't matter much.  And if you're allocating a lot of things, larger arenas are better because you have fewer calls to malloc/free.


(In reply to David Baron [:dbaron] from comment #8)
> Oh, and you probably want to update RuleCascadeData::SizeOf.

Hmm, this could clash with the patch in bug 705987...
There was some discussion of the 4096 on IRC.  If there are no intervening allocations in the 256-byte size class, then 256-byte allocations should have the same effect on fragmentation as 4096-byte ones.  And 256-byte may be faster, because it's more likely that there's some 256-byte hole sitting around we can fill than that there's a page-sized hole sitting around.

But 4096 shouldn't fragment under any circumstances, because jemalloc can always decommit that page.
(Assignee)

Comment 12

6 years ago
> If there are no intervening allocations in the 256-byte size class,

Can't guarantee that, obviously.

The objects being allocated out of the arena are 3 words, so 12 or 24 byte allocations.  And the number of objects allocated is typically "one for every selector", so hundreds to tens of thousands, depending on the page.

I'm going to stick with 4096, I guess.

> Hmm, this could clash with the patch in bug 705987...

It won't; see comment 9.
(Assignee)

Updated

6 years ago
Whiteboard: [need review] → [need landing]
(Assignee)

Comment 13

6 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/79a0b1e9f6fd
Flags: in-testsuite-
Whiteboard: [need landing]
Target Milestone: --- → mozilla11
https://hg.mozilla.org/mozilla-central/rev/79a0b1e9f6fd
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.