Open Bug 1382927 Opened 7 years ago Updated 19 days ago

stylo: very slow handling of modification of stylesheets

Categories

(Core :: CSS Parsing and Computation, enhancement, P4)

53 Branch
enhancement

Tracking

()

Tracking Status
firefox57 --- wontfix
firefox58 --- wontfix
firefox59 --- ?

People

(Reporter: bzbarsky, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

Attachments

(1 file)

Attached file Testcase
On the attached testcase, I see times like so:

Gecko: ~600ms
Chrome: ~700ms
Safari: ~750ms
Stylo: ~6000ms (not a typo; 10x slower).

And yes, I did make sure to not use a debug-servo build for these measurements.  Basically, stylo needs 0.6ms for each sheet update there; that's a _long_ time.

Stylo profile at https://perfht.ml/2tM2PHH done at 0.1ms sampling interval (so roughly speaking all the times are doubled from the numbers above).

Corresponding Gecko profile at https://perfht.ml/2tM3edb

Comparing the two profiles, I see the following differences:

1)  Removing the old sheet takes negligible time in Gecko, as far as I can tell.  I can't even find it in the profile by eyeballing and searching for RemoveStyleSheet shows 28ms spent under it.  set_textContent takes about 600ms, almost all of which is _parsing_ sheets twice (once when the child textnode is removed, once when a new one is added).  And even there, a bunch of that time is spent doing things like generating uuids for the sheet's nullprincipal (about 260ms of those 600ms are under NullPrincipal::Init).

In contrast, Stylo spends nearly 2800ms under set_textContent.  Of that time, 1900ms are spent under RemoveStyleSheet, and almost all that time (1866ms) is spent under Stylist::clear.  And almost all of _that_ is spent in core::ptr::drop_in_place (arena_dalloc, locking, lots of self time, etc.

2)  Gecko spends 28ms under GatherRuleProcessors and another 111ms under RefreshRuleCascade.  The total time under get_BorderLeftWidth is 954ms (includes the RefreshRuleCascade but not the GatherRuleProcessors).  For Stylo, the time under get_BorderLeftWidth is 6000ms and a huge part of this (5500ms) is Stylist::rebuild, which corresponds to the two Gecko functions I mention at the beginning of this item.  Almost half of this is self time; the rest is various hashtable bits.

I expect the problem here is that Stylo is rebuilding the UA sheet part of the stylist each time.  Maybe fixing bug 1362538 and bug 1363789 would help some, and we should measure, but I suspect we want to preserve the UA sheet parts of the stylist, or have a separate stylist for UA bits (like we do for XBL) or something.  Filed bug 1382925 on this.

3) Gecko spends 718ms processing pending restyles when the computed style get happens.  Stylo spends 1779ms.  A bunch of this time is spent updating viewport anon boxes (426ms) even though none of the sheet changes could possibly have changed the viewport or its anon box styles.  Another 150ms are spent in other parts of post-traversal.  1165ms are spent under StyleDocument.

I'll try this with STYLO_THREADS=1 to get an idea of where time might be spent under there.
With STYLO_THREADS=1, profile is at https://perfht.ml/2tM2DZf and time under StyleDocument is only 664ms.  So the parallel traversal is really killing us on this small a DOM.

One interesting exercise is to compare the bottom-up profiles, restricted to stuff under get_BorderLeftWidth.  For Gecko: https://perfht.ml/2tLGVnZ which shows ruletree stuff, selector matching, expected things.

For stylo with STYLO_THREADS=1: https://perfht.ml/2tLGR7M which has tons of Stylist::rebuild self time, lots of hashtable bits, bunch of allocation, memmove, locks (from jemalloc, looks like), etc.
See Also: → 1362538
Blocks: 1386045
Priority: -- → P3
FWIW, with bug 1386443 stylo is ~4 seconds, which is a pretty nice improvement. Haven't tried with both that and https://github.com/servo/servo/pull/17939 yet.
(In reply to Emilio Cobos Álvarez [:emilio] from comment #2)
> FWIW, with bug 1386443 stylo is ~4 seconds, which is a pretty nice
> improvement. Haven't tried with both that and
> https://github.com/servo/servo/pull/17939 yet.

This is on Linux, where on current nightly I get ~22s on the test-case instead of ~60s, perhaps due to faster malloc stuff?
Err, I mean(In reply to Emilio Cobos Álvarez [:emilio] from comment #3)
> (In reply to Emilio Cobos Álvarez [:emilio] from comment #2)
> > FWIW, with bug 1386443 stylo is ~4 seconds, which is a pretty nice
> > improvement. Haven't tried with both that and
> > https://github.com/servo/servo/pull/17939 yet.
> 
> This is on Linux, where on current nightly I get ~22s on the test-case
> instead of ~60s, perhaps due to faster malloc stuff?

Hmm, the original measurement shows ~6s, though I get ~22 on linux... Anyway, the hasher change is about 5x improvement on that testcase for me.
(In reply to Emilio Cobos Álvarez [:emilio] from comment #4)
> Err, I mean(In reply to Emilio Cobos Álvarez [:emilio] from comment #3)
> > (In reply to Emilio Cobos Álvarez [:emilio] from comment #2)
> > > FWIW, with bug 1386443 stylo is ~4 seconds, which is a pretty nice
> > > improvement. Haven't tried with both that and
> > > https://github.com/servo/servo/pull/17939 yet.
> > 
> > This is on Linux, where on current nightly I get ~22s on the test-case
> > instead of ~60s, perhaps due to faster malloc stuff?
> 
> Hmm, the original measurement shows ~6s, though I get ~22 on linux...
> Anyway, the hasher change is about 5x improvement on that testcase for me.

That's awesome! Really glad to see improvement here (though per the GH comment I'd like to understand it a tiny bit more).
So I reprofiled here w/ the patches from bug 1382925, and now the news.

Good news is that they help a lot! I see about 1.3 seconds instead of 4 without those patches.

Not so good news, of course, is that that's still 1.3 seconds. With STYLO_THREADS=1 I get about 0.8 seconds though, which makes sense since this is a pretty small document.

Here's a profile:

  https://perfht.ml/2uyFCOz

And with STYLO_THREADS=1

  https://perfht.ml/2uyklod

So, some notes from the profile:

 * We seem to spend a fair amount of time updating the style of the viewport and related anon boxes. That can be optimized, though not sure if worth it because the root style doesn't usually change. In any case, it could be optimized in various ways:

   * First of all, don't keep the vector of property declaration, but a set of rule nodes. This would shave a bit of time.
   * Don't touch them at all, since the root style hasn't changed and the rules neither. Moving this bit of state from the traversal to the post-traversal isn't too hard (though it's tricky if you want to allow changing the style of anon boxes).
   * Make them non-inheriting anon boxes. This is related to bug 1388849. I don't think there's a case where that propagation matters, but there could be.

 * We also spend a good time during principal stuff during stylesheet creation.

 * Surprisingly enough compute_undisplayed_style_difference appears on the profile, but it only really cares about -moz-binding, so it can be optimized too.

There are a few other things like that, but except the first two points, it looks like a bunch of stuff adding up that we can micro-optimize, but haven't.

Not sure if we should re-purpose this bug for that or not.
It makes sense to file bugs blocking this one on those ideas.

It might also make sense to measure what happens with a 100-node DOM and a 1000-node DOM, just to see how that affects the numbers in both stylo and the other engines.
I now see about 575 vs 875 gecko-vs-stylo. That's probably acceptable as far as microbenchmarks are concerned, though it sounds like there are some additional improvements we could make.
Priority: P3 → P4
Severity: normal → S3

Both Nightly and Chrome perform the same on the testcase. On my machine, both take 150ms-170ms

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: