stylo: 50% of wikipedia styling spent in match_revalidation_selectors

NEW
Unassigned

Status

()

Core
CSS Parsing and Computation
P2
normal
a month ago
8 days ago

People

(Reporter: bholley, Unassigned)

Tracking

(Depends on: 4 bugs, Blocks: 1 bug)

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(6 attachments)

Taken with STYLO_THREADS=1 on a local obama wikipedia copy: https://perfht.ml/2sxsgRu

The first call to Servo_TraverseSubtree takes 52 ms. Of that time, 26 ms is spent in match_revalidation_selectors. Half of _that_ time is spent in Gecko_MatchLang, which is what bug 1373357 is about. But there's still the other half.

In general, this is a good problem to have, because it means that we're sharing almost all the styles now (thanks to bz's great work in bug 1367862). Here are the results of DUMP_STYLE_STATISTICS:

[PERF],elements_traversed,11109
[PERF],elements_styled,11107
[PERF],elements_matched,1432
[PERF],styles_shared,9675
[PERF],selectors,2705
[PERF],revalidation_selectors,663
[PERF],dependency_selectors,3422
[PERF],declarations,10481

So this basically means that we spent half of the time computing style for 15% of the elements, and then the other half proving that we can reuse those styles for the other 85% of the elements. If we can fix bug 1373357, the numbers will look even better.

Nevertheless, it's still worth digging in at some point and seeing if there's anything we can optimize. Some things to look at:
* How many elements do we match revalidation selectors for?
* What is the work breakdown across those elements? Are some disproportionately expensive? If so, is that due to wide variance in the entries found in the SelectorMap, or due to some of those revalidation selectors being particularly expensive?
* In some cases, we don't use the bloom filter for revalidation selectors [1]. Is that hurting us?
* We don't have an nth-index cache yet (bug 1334730). Any such selector would be a revalidation selector. Is that hurting us?

I think we should first focus on bug 1373357, which seems like the easiest win, and then look at this.

[1] http://searchfox.org/mozilla-central/rev/c49a70b53f67dd5550eec8a08793805f2aca8d42/servo/components/style/sharing/mod.rs#181

Comment 1

a month ago
So I tried doing some profiles locally at higher resolution (0.1ms instead of the 1ms resolution I think the linked profile is done at) and I see Gecko_MatchLang being 19.9 out of a total of 32.4ms matching revalidation selectors in one profile [1] and 17.4ms out of 28.5ms in another [2] and 16.2ms out of 26.1ms in a third [3].  This is all still with STYLO_THREADS=1.

So I'm seeing about 61-62% of the revalidation selector time under Gecko_MatchLang on this page.  

One thing I did measure separately in those profiles is the "no bloom filter" case under revalidation_match_results().  You can see it if you filter for "nobloom": I made that case call an out-of-line function called revalidation_match_results_nobloom.  The time under there is pretty small for the moment; for example, under ConstructDocElementFrame bit linked at [1] it's 0.9ms out of the total 32.4ms matching revalidation selectors.  So it's not that we're spending a bunch of time dealing with the missing bloom filter and walking around the tree.

I should have some answers to the questions from comment 0 in a bit.

[1] https://perf-html.io/public/6d62bdc4621b3e7887a7b7e64d24ffaa67ee8303/calltree/?implementation=cpp&search=ConstructDocElementFrame&thread=2 which has 15.6ms and 4.4ms under MatchLang
[2] https://perf-html.io/public/6d62bdc4621b3e7887a7b7e64d24ffaa67ee8303/calltree/?implementation=cpp&search=ProcessAttachedQueueInternal&thread=2 which has 14ms and 13.4ms under MatchLang
[3] https://perf-html.io/public/8fe3affa41cdefbd1555f42b40ed940b64af5fe4/calltree/?search=ConstructDocElementFrame&thread=5 which has 12.2ms and 4ms under MatchLang.

Comment 2

a month ago
Created attachment 8878576 [details]
Timings of revalidation times on a per-element basis during initial frame construction

> * How many elements do we match revalidation selectors for?

If I focus on the initial frame construction, 10160 or so revalidations; the number varies across runs by plus or minus 10.  This is out of a total of 15293 on the page.  All are for unique elements.  I should note that these numbers are from a debug-servo-in-opt-gecko build, unlike the profiles in comment 1, which are from a fully-opt build.

Over the full pageload, I see 61815 revalidations for 10687 unique elements.  The number of revalidations per element ranges from 1 to 7, with the vast majority being 6:

  Revalidations  Elements
       1            50
       2           477
       3             0
       4            10
       5           139
       6         10001
       7            10

All this is with STYLO_THREADS=1.

> * In some cases, we don't use the bloom filter for revalidation selectors [1]. Is that hurting us?

Not horribly; see above.  It does account for about 3% of the current revalidation time, and more if we fix the lang stuff.  But that's the total time taken revalidating without bloom filter; some of that time would presumably remain even if we managed to use a bloom filter there.

> * What is the work breakdown across those elements? Are some disproportionately expensive?

I took some measurements using std::time::Instant, again under initial frame construction only, but in a fully-opt build; this time there were 10168 revalidations.  The time to revalidate a given element ranges from 547ns to almost 60000ns.  But the 60000 is fairly outlier-y.  The general distribution of revalidation times looks like this:

   <1us: 5822 elements
 1-10us: 3696 elements
10-20us:  446 elements
20-30us:  129 elements
30-40us:   68 elements
40-50us:    4 elements
50-60us:    3 elements

The log this data is from is attached in case people want to poke at it.

As another way of slicing the data, let's try by tag name:

      a: 3060 elements, 1.5-20us to revalidate, average 2.8us, stddev 1.5us, total 8.6ms
   span: 3319 elements, 0.5-14us to revalidate, average 0.7us, stddev 0.5us, total 2.5ms
     li:  927 elements, 0.8-60us to revalidate, average  15us, stddev  11us, total  14ms
   cite:  577 elements, 0.6-3us  to revalidate, average 0.7us, stddev 0.1us, total 0.4ms
    sup:  550 elements, 0.6-9us  to revalidate, average 0.7us, stddev 0.3us, total 0.4ms
      b:  525 elements, 0.5-3us  to revalidate, average 0.7us, stddev 0.2us, total 0.4ms
      i:  487 elements, 0.5-2us  to revalidate, average 0.7us, stddev 0.2us, total 0.3ms
    div:  230 elements, 0.7-10us to revalidate, average 1.0us, stddev 0.7us, total 0.2ms
      p:  150 elements, 0.6-2us  to revalidate, average 0.6us, stddev 0.1us, total 0.1ms
 others:  343 elements, 0.6-39us to revalidate, average 2.0us, stddev 2.3us, total 0.7ms

The <a> and <li> particularly stand out here, both for average numbers and totals (this last partly because there are so many of them).

Another comment coming up with details on what's going on with <a> and <li>; this one is getting a bit long.  ;)

> We don't have an nth-index cache yet (bug 1334730). Any such selector would be a revalidation selector. Is that hurting us?

I don't think so; the revalidation selectors on this page only involve :first-child and :last-child, which we presumably do quickly even without an nth-index cache.

Comment 3

a month ago
OK, so digging into what's going on with <li> elements.  There are 47 revalidation selectors that match the <li> elements on this page once we go through the selector map.  Those are:

    .catlinks li:first-child
    ol:lang(azb) li
    ol:lang(bcc) li
    ol:lang(bgn) li
    ol:lang(bqi) li
    ol:lang(fa) li
    ol:lang(glk) li
    ol:lang(kk-arab) li
    ol:lang(lrc) li
    ol:lang(luz) li
    ol:lang(mzn) li
    ol:lang(ckb) li
    ol:lang(sdh) li
    ol:lang(hi) li
    ol:lang(mr) li
    ol:lang(ne) li
    ol:lang(as) li
    ol:lang(bn) li
    ol:lang(or) li
    #p-personal li
    div#mw-panel div.portal div.body ul li
    div#footer ul li
    div#footer #footer-icons li
    div#footer #footer-info li
    div#footer #footer-places li
    .hlist li:last-child::after
    .hlist dd li:first-child::before
    .hlist dt li:first-child::before
    .hlist li li:first-child::before
    .hlist dd li:last-child::after
    .hlist dt li:last-child::after
    .hlist li li:last-child::after
    .hlist dd ol > li:first-child::before
    .hlist dt ol > li:first-child::before
    .hlist li ol > li:first-child::before
    .treeview li.emptyline > ul > li:first-child
    *|*:fullscreen:not(:root):not(:-moz-browser-frame)
    [dir]
    [dir = "rtl"]
    [dir = "ltr"]
    [dir = "auto"]:dir(ltr)
    [dir = "auto"]:dir(rtl)
    details > summary:first-of-type > *|*
    :root:-moz-locale-dir(rtl)
    [hidden = "true"]
    [collapsed = "true"]
    [moz-collapsed = "true"]

There are several interesting things going on there:

1) A bunch of the <li> on this page are under a <ul>, not <ol>, so all those "ol:lang" bits get filtered out by the bloom filter for those.  But for the ones that _are_ inside an <ol>, we end up with things like this (logging each selector we are revalidating and the simple selectors that get matched as a result):

  REVALIDATING SELECTOR: Selector(ol:lang(as) li, specificity = 0x402)
  SIMPLE SELECTOR: li
  SIMPLE SELECTOR: :lang(as)
  SIMPLE SELECTOR: :lang(as)
  SIMPLE SELECTOR: :lang(as)
  SIMPLE SELECTOR: :lang(as)
  SIMPLE SELECTOR: :lang(as)
  SIMPLE SELECTOR: :lang(as)
  SIMPLE SELECTOR: :lang(as)
  SIMPLE SELECTOR: :lang(as)

The "li" matched, the parent was "ol", but its lang was not "as", so we walk all the way up to the root doing the slow :lang compares, because in the selector ":lang(as)" comes after "ol" and we match right-to-left on that list.  In this case, it would be _much_ faster to match "ol" against ancestors first to determine that we don't match; the log would then most likely look like this:

  REVALIDATING SELECTOR: Selector(ol:lang(as) li, specificity = 0x402)
  SIMPLE SELECTOR: li
  SIMPLE SELECTOR: ol
  SIMPLE SELECTOR: :lang(as)
  SIMPLE SELECTOR: ol
  SIMPLE SELECTOR: ol
  SIMPLE SELECTOR: ol
  SIMPLE SELECTOR: ol
  SIMPLE SELECTOR: ol
  SIMPLE SELECTOR: ol
  SIMPLE SELECTOR: ol

We can actually measure the impact of this, because all the things under the "ol" have nice ids like "cite_note-NN".  If I run my timing statistics above for <li>:

  li with cite-note id: 473 elements,  18-60us to revalidate, average 20us, stddev  5us, total 9ms
  li w/o cite-note id:  454 elements, 0.8-60us to revalidate, average 10us, stddev 13us, total 4.7us

I should note that of those 454 elements in the second bucket 153 are in fact inside an <ol> (but not directly; they are inside the footnotes that contain an unordered list of references.  Given that, I would be willing to bet that almost all of the long revalidation time for <li> is in fact coming from the :lang bits.

2) There is one single <li> (the first child of the <ul> that's inside the class="catlinks" thing) for which we have the following log:

  REVALIDATING SELECTOR: Selector(.catlinks li:first-child, specificity = 0x801)
  SIMPLE SELECTOR: :first-child
  SIMPLE SELECTOR: li
  SIMPLE SELECTOR: .catlinks
  SIMPLE SELECTOR: .catlinks
  SIMPLE SELECTOR: .catlinks
  SIMPLE SELECTOR: .catlinks
  SIMPLE SELECTOR: .catlinks
  SIMPLE SELECTOR: .catlinks
  SIMPLE SELECTOR: .catlinks
  SIMPLE SELECTOR: .catlinks
  SIMPLE SELECTOR: .catlinks
  SIMPLE SELECTOR: .catlinks
  SIMPLE SELECTOR: .catlinks

Since it's a single element, it's not a big contributor to overall time, but it does highlight an interesting issue: in this case there is actually no need at all to keep matching once we've matched :first-child.  That's because the "li" and ".catlinks" parts are not revalidation selectors and we know they will in fact match based on our restrictions on parent and element.  The same sort of thing happens for that first-child li for the various ".hlist ... :first-child" selectors.  In a situation in which the revalidation selector matched more elements, this could be significant.  It _might_ be worth storing a range of simple selectors to match for each revalidation selector or something, maybe.

3) The "*|*:fullscreen:not(:root):not(:-moz-browser-frame)" selector is also matched right-to-left.  Which means that we end up matching all three pseudo-clases, since ":not(:root)" and ":not(:-moz-browser-frame)" typically match.  ;)  I don't know what the peformance impact of this is, offhand, but I think it's worth checking what other UAs do with ordering like this.  If they all match left-to-right, pages might be optimizing their CSS to put less-likely-to-match things first...  In any case, this rule is one of our UA rules, so we could reorder it if needed.

4) For the "[dir = "auto"]:dir(ltr)" we end up matching both simple selectors, again because we match right to left and most things do in fact match :dir(ltr), though most don't have a "dir" attribute.  This is, again, a UA style we could reorder if needed.

Comment 4

a month ago
Created attachment 8878592 [details] [diff] [review]
Patch for logging that the <li> data comes from

The "MATCH ATTEMPTS" bit is sadly bogus, because the bloom filter bits happen below that.  Ah, well.  It's pretty simple to see which revalidations get bloom filtered out: they're the ones without any "SIMPLE SELECTOR" following them.

Comment 5

a month ago
Created attachment 8878594 [details]
Log produced by that logging patch (gzipped text)

Comment 6

a month ago
OK, now <a> elements.  For <a> elements there are either 29 or 41 revalidation selectors, based on whether the <a> has class="external text" or not.  For the case when it does not, there are 29, like so:

    #mw-credits a
    a:not([href])
    a:lang(ar)
    a:lang(kk-arab)
    a:lang(mzn)
    a:lang(ps)
    a:lang(ur)
    #ca-unwatch.icon a
    #ca-watch.icon a
    #ca-unwatch.icon a
    #ca-watch.icon a
    #ca-unwatch.icon a:hover
    #ca-unwatch.icon a:focus
    #ca-watch.icon a:hover
    #ca-watch.icon a:focus
    #p-logo a
    div#mw-panel div.portal div.body ul li a
    div#mw-panel div.portal div.body ul li a:visited
    *|*:fullscreen:not(:root):not(:-moz-browser-frame)
    [dir]
    [dir = "rtl"]
    [dir = "ltr"]
    [dir = "auto"]:dir(ltr)
    [dir = "auto"]:dir(rtl)
    details > summary:first-of-type > *|*
    :root:-moz-locale-dir(rtl)
    [hidden = "true"]
    [collapsed = "true"]
    [moz-collapsed = "true"]

for the case when we have class="external text" it's those plus:

    div#content a[href $= ".pdf"].external
    div#content a[href *= ".pdf?"].external
    div#content a[href *= ".pdf#"].external
    div#content a[href $= ".PDF"].external
    div#content a[href *= ".PDF?"].external
    div#content a[href *= ".PDF#"].external
    div#mw_content a[href $= ".pdf"].external
    div#mw_content a[href *= ".pdf?"].external
    div#mw_content a[href *= ".pdf#"].external
    div#mw_content a[href $= ".PDF"].external
    div#mw_content a[href *= ".PDF?"].external
    div#mw_content a[href *= ".PDF#"].external

so it's the class="external" that matters.  Looking at the timings again:

  <a> with external: 833 elements,   3-20us to revalidate, average 4.5us, stddev 1.3us, total 3.7ms
  <a> w/o external: 2227 elements, 1.5-18us to revalidate, average 2.1us, stddev 0.9us, total 4.8ms

Looking at the log, typically the external bits fail to match on the [href] selectors there, so don't end up doing tree-walking to look for the #content/#mw_content bits.  Not surprising.

Looking at the selectors that are shared across all the <a>, we end up doing matching on all those :lang bits; I don't see a way to avoid that.  Similarly for the [href].  

I do see some tree-walking for the "div#mw-panel div.portal div.body ul li a" bit, though it looks like it never matches the .body, so I'm not sure why the bloom filter did not handle it.  Anyway, fixing bug 1369611 would fix that, because then "#mw-panel" would not be a revalidation selector.

Similar for "#mw-credits a".
Depends on: 1369611

Comment 7

a month ago
Created attachment 8878601 [details] [diff] [review]
Patch for logging how long each element takes to revalidate

This is the diff that produced the "Timings of revalidation times" log in attachment 8878576 [details].

Comment 8

a month ago
Created attachment 8878616 [details] [diff] [review]
Patch for measuring how long each selector takes to revalidate

Comment 9

a month ago
Created attachment 8878617 [details]
Gzipped per-selector revalidation timing log (in ns) for just initial frame construction

Comment 10

a month ago
Attachment 8878617 [details] has a total of 225k selectors being revalidated.  There were 10168 element revalidations involved (as you can tell because selectors like [dir] were revalidated 10168 times; selectors that could be filtered out based on selector maps were revalidated fewer times).  This counts selectors filtered out by bloom filters as being "revalidated", for what it's worth.

Some statistics of possible interest; all times in ns unless otherwise specified (e.g. totals are in ms)

                                          Overall: 225k selectors, min: 15, max: 31772, mean: 200, stddev: 400, total: 45ms
                                 All with :lang():  33k selectors, min: 15, max: 18999, mean: 700, stddev: 676, total: 23ms
                                  All w/o :lang(): 192k selectors, min: 15, max: 31772, mean: 113, stddev: 225, total: 22ms
           Selectors that use a nonexistent id[1]:  31k selectors, min: 15, max:  9124, mean:  46, stddev:  91, total:  1.4ms
             Selectors affected by bug 1369611[2]:  10k selectors, min: 18, max:  9037, mean: 102, stddev: 194, total:  1ms
            Selectors using :first/:last-child[3]:  11k selectors, min: 16, max:  2188, mean:  49, stddev:  69, total:  0.5ms
            details > summary:first-of-type > *|*:  10k selectors, min: 22, max: 31772, mean:  76, stddev: 574, total:  0.7ms
                       :root:-moz-locale-dir(rtl):  10k selectors, min: 75, max:  8374, mean: 135, stddev: 150, total:  1.3ms
*|*:fullscreen:not(:root):not(:-moz-browser-frame): 10k selectors, min: 86, max: 10926, mean: 201, stddev: 227, total:  2ms
                        span[_moz_quote = "true"]:   3k selectors, min: 91, max:  2803, mean: 158, stddev: 131, total:  0.5ms
                         Selectors for "[dir" [4]:  50k selectors, min: 50, max: 16000, mean: 134, stddev: 195, total:  6.8ms

For the two things involving :root, changing the order might help, maybe.

Of the 6.8ms doing the "[dir" thing, the "[dir='auto']:dir(ltr)" selector is 2ms.  The "[dir='ltr']" selector is a bit under 1ms.  So reordering might help a bit there too.

[1] "#ca-unwatch.icon a:focus", "#ca-unwatch.icon a:hover", "#ca-watch.icon a:focus", "#ca-watch.icon a:hover", "#ca-unwatch.icon a span", "#ca-watch.icon a span", "#ca-unwatch.icon a", "#ca-watch.icon a".  There are no elements that have those two ids on this page, afaict, so all these should be bloom filtered out.

[2] "div#mw-panel div.portal div.body","div#mw-panel div.portal","#mw-navigation h2","* html div#mw-head div.vectorMenu h3","div#mw-head div.vectorMenu h3","div#mw-head h3","div#mw-panel div.portal h3","div#mw-panel div.portal div.body ul","div#mw-panel div.portal div.body ul li","#mw-credits a","div#mw-panel div.portal div.body ul li a","div#mw-panel div.portal div.body ul li a:visited" which are all revalidation selectors only because they involve an id, but all those ids have rules for them, I'm pretty sure.

[3] ".client-js .collapsible.collapsed > tbody > tr:not(:first-child)",".catlinks li:first-child",".hlist dd li:first-child::before",".hlist dd li:last-child::after",".hlist dd ol > li:first-child::before",".hlist dt li:first-child::before",".hlist dt li:last-child::after",".hlist dt ol > li:first-child::before",".hlist li li:first-child::before",".hlist li li:last-child::after",".hlist li ol > li:first-child::before",".hlist li:last-child::after",".treeview li.emptyline > ul > li:first-child" and all of those, I'm pretty sure, don't need to do walks up past the right-hand-most bit but might right now.

[4]: "[dir = 'auto']:dir(ltr)", "[dir = 'auto']:dir(rtl)", "[dir = 'ltr']", "[dir = 'rtl']", [dir]

Comment 11

a month ago
I filed bug 1373798 on the [dir] thing.
Depends on: 1373798
Depends on: 1373800

Updated

a month ago
Depends on: 1374017

Updated

a month ago
Depends on: 1374019

Updated

a month ago
Depends on: 1374021
Priority: -- → P2
You need to log in before you can comment on or make changes to this bug.