stylo: stylist update spends a bunch of time in malloc

NEW
Assigned to

Status

()

Core
CSS Parsing and Computation
P1
normal
3 months ago
7 days ago

People

(Reporter: bz, Assigned: bholley)

Tracking

(Blocks: 2 bugs, {perf})

53 Branch
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

3 months ago
Looking at a profile of stylist::update for the myspace-orig testcase from bug 1358806, I'm seeing about 20% of the time under find_push and about half of that is malloc/realloc bits.

find_push does this:

    map.entry(key).or_insert_with(Vec::new).push(value)

where "map" is FnvHashMap<Str, Vec<V>, and this is going to heap-allocate storage for the vec the first time we end up inserting.

For comparison, the equivalent Gecko data structure is:

  struct RuleHashTableEntry : public PLDHashEntryHdr {
    // Auto length 1, because we always have at least one entry in mRules.
    AutoTArray<RuleValue, 1> mRules;
  };

which avoids the extra allocation when there is only one rule targeting the id/class/tag involved.

Might be worth checking what the actual lengths of these vecs end up being in practice; if "1" is a common length it might be worth doing something with an inline buffer.

When I initially did the measurements in Gecko, back in 2011 or so, I got numbers along the lines of https://bugzilla.mozilla.org/show_bug.cgi?id=681755#c8 (the RuleHashTableEntry numbers are the ones relevant here).  At the time, after loading a bunch of websites, about 60% of the arrays had length 1.
(Reporter)

Comment 1

3 months ago
Oh, and per IRC discussion the allocs coming from the hashtable storage themselves are more or less unavoidable, and Gecko has those too.  The hashtable starts off with no storage, then allocates a reasonable chunk all at once, then does doubling, so not much to improve there.
Nice idea! I will measure.
Assignee: nobody → bobbyholley
Priority: -- → P1
(Reporter)

Comment 3

3 months ago
https://perf-html.io/public/87f8162a9d4d24e519628b481ff30bbbbf3345df/calltree/?callTreeFilters=prefix-01234567xEa9Ey1bcDYqD.ofKrxAexx3jDYo.qmL5xMrxMszMnzBazBbzBczD0zD1zLozL5zUtzUuzUvzV0zV1zV2zV3&thread=3 is the profile I'm looking at.
Have a patch, but the results on my usual testcases are inconclusive. 100x myspace doesn't benefit at all from this optimization, because every selector is duplicated.

Going to try constructing a best-case and worst-case testcase.
(Reporter)

Updated

7 days ago
Blocks: 1382927
You need to log in before you can comment on or make changes to this bug.