Closed Bug 1397976 Opened 7 years ago Closed 7 years ago

stylo: Do a second pass on the style sharing cache after computing the rule node

Categories

(Core :: CSS Parsing and Computation, defect, P3)

defect

Tracking

()

RESOLVED FIXED
Tracking Status
firefox-esr52 --- wontfix
firefox55 --- wontfix
firefox56 --- wontfix
firefox57 --- fixed

People

(Reporter: bholley, Assigned: bholley)

References

(Blocks 1 open bug)

Details

Attachments

(7 files, 6 obsolete files)

2.12 KB, patch
emilio
: review+
Details | Diff | Splinter Review
9.74 KB, patch
emilio
: review+
Details | Diff | Splinter Review
5.29 KB, patch
emilio
: review+
Details | Diff | Splinter Review
21.32 KB, patch
emilio
: review+
Details | Diff | Splinter Review
19.06 KB, patch
emilio
: review+
Details | Diff | Splinter Review
4.47 KB, patch
emilio
: review+
Details | Diff | Splinter Review
3.23 KB, patch
bholley
: review+
Details | Diff | Splinter Review
I had this idea yesterday.

The style sharing cache operates before selector matching, so that we can skip selector matching if we get a hit. But the cache's mechanism for determining which rules would match is inherently less precise than the rule node, which describes what rules _actually_ matched, and is what controls sharing in gecko (via the style context tree).

So the idea here would be to have a second look at the entries in the cache after computing the rule node. If any of the rule nodes match, then we reuse the ComputedValues. To avoid regressing performance with lots of superfluous lookups, we can use a bloom filter for rule node pointers.

I just hacked this up, and added instrumentation to measure the number of style contexts. The results on the HTML spec look really good:

> Gecko: 27600
> stylo 1 thread: 46000
> stylo 1 thread + 2ndPass: 27400
> stylo 3 threads: 83800
> stylo 3 threads + 2ndPass: 60000

Note that all the stylo runs have a few hundred gecko contexts floating around, which explains why we do a tiny bit better than gecko on stylo 1 thread + 2ndPass.

My patch doesn't currently support pseudos or visited styles (I haven't thought through exactly how those would work, or measured if it's worthwhile), and doesn't implement the bloom filter yet. But I'm posting it for feedback here in the interim and pushing to try.
Attached patch Style context count logging. v1 (obsolete) — Splinter Review
MozReview-Commit-ID: 7q9DGzE0mFQ
MozReview-Commit-ID: EcVQDLoxwFP
Attached patch Prototype the 2nd pass. v1 (obsolete) — Splinter Review
MozReview-Commit-ID: H67j3Sbt3gr
https://treeherder.mozilla.org/#/jobs?repo=try&revision=8e09344ec1b39e1d12796db74048fd5e4ba5690e

cam/bz - would be interesting to see numbers on how this moves the needle on its own, and how it interacts with the reset struct sharing.
Hm so adding a check for !self.element.is_native_anonymous() (which I added between the initial measurements and the posted patch) brings stylo 1 thread + 2ndPass from 27400 to 31503. So it seems like we should try to find a way to share those correctly.
Also, I'm missing the can_share_across_parents check. Will add that, but need to head out right now.
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #5)
> Hm so adding a check for !self.element.is_native_anonymous() (which I added
> between the initial measurements and the posted patch) brings stylo 1 thread
> + 2ndPass from 27400 to 31503. So it seems like we should try to find a way
> to share those correctly.

I was wrong - there aren't any native anonymous elements here that match, and the numbers are just noisy.
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #0)
> > Gecko: 27600
> > stylo 1 thread: 46000
> > stylo 1 thread + 2ndPass: 27400
> > stylo 3 threads: 83800
> > stylo 3 threads + 2ndPass: 60000

So, with the proper parent checks in place, I get:

1 thread: ~40k
3 threads: 71.7

So that's still an improvement, but a fair bit worse than gecko on sequential mode. The two hypotheses I have are that we're not handling pseudos yet, or that the cache is too small. I'll experiment with those tomorrow.
Attached patch Prototype the 2nd pass. v2 (obsolete) — Splinter Review
Now with correct parent checks.

MozReview-Commit-ID: H67j3Sbt3gr
Attachment #8905738 - Attachment is obsolete: true
Priority: -- → P3
I took some measurements a little while ago for different cache sizes:

127 entries:
* 1 thread: 31.4k
* 3 threads: panics for some reason

64 entries:
* 1 thread: 35.1k
* 3 threads: 65.3k

39 entries:
* 1 thread: 39.1k
* 3 threads: 69.1k

47 entries:
* 1 thread: 35.4k / 36.1k / 38.5k
* 3 threads: 70k / 68.6k / 67.3k

So it seems that cache size does make a difference. Something to consider tweaking, but that can probably be done separately. I'll file a bug so I don't forget.
Blocks: 1398328
Depends on: 1399011
So I've got this all done, including sharing on recascade. I _think_ it should be green now, but doing another try push to be sure. The only thing remaining is the measure the lookup performance on the cache and figure out whether we want a bloom filter or not.

Patches: https://github.com/bholley/gecko/commits/2ndpass_only
Try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=c921f26432066b910e27e20d2eb163821747ddc0
Using traversal_parent here is wrong.

MozReview-Commit-ID: GHCIjkgx4VE
Attachment #8907418 - Flags: review?(emilio)
This will make the linear probing faster. If we end up implementing the two-tier
cache setup, this code will be unnecessary and can go away.

MozReview-Commit-ID: BRfV5ump34n
Attachment #8907419 - Flags: review?(emilio)
This gives us more flexibility, and doesn't cost us anything.

MozReview-Commit-ID: CuvOEcLA3My
Attachment #8907420 - Flags: review?(emilio)
MozReview-Commit-ID: H67j3Sbt3gr
Attachment #8907421 - Flags: review?(emilio)
Attachment #8905733 - Attachment is obsolete: true
Attachment #8905737 - Attachment is obsolete: true
Attachment #8905791 - Attachment is obsolete: true
MozReview-Commit-ID: AFTwtzi4P93
Attachment #8907422 - Flags: review?(emilio)
Comment on attachment 8907418 [details] [diff] [review]
Part 1 - Use inheritance_parent to control style_sharing. v1

Review of attachment 8907418 [details] [diff] [review]:
-----------------------------------------------------------------

> Using traversal_parent here is wrong.

Well, it's not to big of a deal though, right?

I mean, it only differs from traversal_parent when we're NAC, and we don't share in that case anyway.
Attachment #8907418 - Flags: review?(emilio) → review+
Attachment #8907420 - Flags: review?(emilio) → review+
Comment on attachment 8907421 [details] [diff] [review]
Part 4 - Do a second pass on the sharing cache to reuse style by rule node identity. v2

Review of attachment 8907421 [details] [diff] [review]:
-----------------------------------------------------------------

All the PRIMARY_STYLE_REUSED_VIA_RULE_NODE stuff is super out-of-band in style_resolver.rs

You should make StyleResolver return something like:

struct ResolvedElementStyles {
    primary: PrimaryStyle,
    pseudos: EagerPseudoStyles,
}

And make PrimaryStyle, contain whether it was reused via rule node.

After that you can just query it directly, and implement From<ResolvedElementStyles> for ElementStyles.

You can make the cache return a ResolvedElementStyles instead of an element, too, by querying the element data.

::: servo/components/style/sharing/mod.rs
@@ +115,5 @@
>      fn from(cv: &ComputedValues) -> Self {
>          let p = NonZeroPtrMut::new(cv as *const ComputedValues as *const () as *mut ());
>          OpaqueComputedValues(p)
>      }
> +    fn eq(&self, cv: &ComputedValues) -> bool { Self::from(cv) == *self }

nit: I think this should move to the next line.

::: servo/components/style/traversal.rs
@@ +733,5 @@
>  
> +    if primary_style_reused_via_rule_node {
> +        data.flags.insert(data::PRIMARY_STYLE_REUSED_VIA_RULE_NODE);
> +    } else {
> +        data.flags.remove(data::PRIMARY_STYLE_REUSED_VIA_RULE_NODE);

This feels super out-of-band... This really belongs to ElementStyles.
Attachment #8907421 - Flags: review?(emilio) → review-
Comment on attachment 8907419 [details] [diff] [review]
Part 2 - Cache the parent CV identity on ValidationData. v1

Review of attachment 8907419 [details] [diff] [review]:
-----------------------------------------------------------------

I'm not sure I buy the "this is faster" claim. In general the fast path when comparing siblings doesn't require to go and grab the element data, and in this case it does?

Is this required for something else? I think that unless I'm missing something obvious, or you have measurements, I'd like to drop this patch, or move it to another bug.

::: servo/components/style/sharing/checks.rs
@@ +13,4 @@
>  use sharing::{StyleSharingCandidate, StyleSharingTarget};
>  
> +/// Determines whether a target and a candidate have compatible parents for sharing.
> +pub fn parents_allow_sharing<E>(target: &mut StyleSharingTarget<E>,

nit: Indentation.

@@ +24,3 @@
>  
> +    // Siblings can always share.
> +    let parent = match target.inheritance_parent() {

parent_style_identity calls unwrap() here, so you can just do the same. It's a bit unfortunate that this check is after the style_identity
Attachment #8907419 - Flags: review?(emilio)
Comment on attachment 8907422 [details] [diff] [review]
Part 5 - Share styles during recascades. v1

Review of attachment 8907422 [details] [diff] [review]:
-----------------------------------------------------------------

This patch doesn't insert into the cache when we're recascading, so this code will do close-to-nothing, I believe.

I'd need to reason about when would it be correct to share the whole ElementStyles and not, etc.

I've opened https://github.com/servo/servo/pull/18482 which should make trivial to share the code for the first part in a `cascade_primary_style` function, hopefully making what I proposed there easier.

Meanwhile, I propose to move this to another bug, since the insertion seems non-trivial to me.
Attachment #8907422 - Flags: review?(emilio) → review-
(In reply to Emilio Cobos Álvarez [:emilio] from comment #25)
> Comment on attachment 8907422 [details] [diff] [review]
> Part 5 - Share styles during recascades. v1
> 
> Review of attachment 8907422 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This patch doesn't insert into the cache when we're recascading, so this
> code will do close-to-nothing, I believe.

This is wrong, and I missed the traversal.rs changes.
Comment on attachment 8907419 [details] [diff] [review]
Part 2 - Cache the parent CV identity on ValidationData. v1

Review of attachment 8907419 [details] [diff] [review]:
-----------------------------------------------------------------

Per IRC discussion, r=me if you think this is a clear win. With the formatting comments addressed though.
Attachment #8907419 - Flags: review+
We'll use these next to propagate information about style reuse to the ElementDataFlags.

MozReview-Commit-ID: Dya6vgzydpL
Attachment #8907804 - Flags: review?(emilio)
Attachment #8907421 - Attachment is obsolete: true
Attachment #8907422 - Attachment is obsolete: true
MozReview-Commit-ID: H67j3Sbt3gr
Attachment #8907805 - Flags: review?(emilio)
MozReview-Commit-ID: AFTwtzi4P93
Attachment #8907806 - Flags: review?(emilio)
This is emilio's patch.

MozReview-Commit-ID: VuWHKzYVod
Attachment #8907809 - Flags: review+
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #31)
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=ff2e48c5d7a93c5df1bd866ef9e4bb2a1c22f940

I figured I might as well rebase. Cancelled the above and repushed:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=c8cc8bd0128193b87b188154ec2542ec873e4bca
Comment on attachment 8907804 [details] [diff] [review]
Part 4 - Add some wrapper types to propagate styles out of style resolver. v1

Review of attachment 8907804 [details] [diff] [review]:
-----------------------------------------------------------------

::: servo/components/style/style_resolver.rs
@@ +69,5 @@
> +impl ResolvedStyle {
> +    /// Creates a new ResolvedStyle.
> +    pub fn new(style: Arc<ComputedValues>) -> Self {
> +        ResolvedStyle {
> +            style: style,

nit: I'd just use the shorthand initializer (ResolvedStyle { style })

@@ +76,5 @@
> +}
> +
> +impl PrimaryStyle {
> +    /// Convenience accessor for the style.
> +    pub fn style(&self) -> &Arc<ComputedValues> {

I think we don't clone this anywhere, may it make sense just returning &ComputedValues here? It'd save a few &** in a few other places.
Attachment #8907804 - Flags: review?(emilio) → review+
Comment on attachment 8907805 [details] [diff] [review]
Part 5 - Do a second pass on the sharing cache to reuse style by rule node identity. v3

Review of attachment 8907805 [details] [diff] [review]:
-----------------------------------------------------------------

::: servo/components/style/data.rs
@@ +290,5 @@
>  
>      /// Sets a new set of styles, returning the old ones.
>      pub fn set_styles(&mut self, new_styles: ResolvedElementStyles) -> ElementStyles {
> +        if new_styles.primary.0.reused_via_rule_node {
> +            self.flags.insert(PRIMARY_STYLE_REUSED_VIA_RULE_NODE);

Ok, I can live with it here :)

::: servo/components/style/sharing/mod.rs
@@ +247,5 @@
>  /// FakeCandidate below.
>  #[derive(Debug)]
>  pub struct StyleSharingCandidate<E: TElement> {
>      /// The element.
> +    pub element: E,

Is this `pub` used?

@@ +756,5 @@
> +            // landed when computed values were fused, see
> +            // https://bugzilla.mozilla.org/show_bug.cgi?id=1381635.)
> +            // So at the moment, we need to additionally compare visitedness,
> +            // since that is not accounted for by rule nodes alone.
> +            // FIXME(jryans): This seems like it breaks the constant time

We already do the same for the normal sharing cache (comparing the element state without ignoring visitedness), and we ran it through jryans and dbaron iirc, and there didn't seem to be a problem.

See the:

        if target.element.get_state() != candidate.get_state() {

check in the sharing code.

::: servo/components/style/style_resolver.rs
@@ +74,3 @@
>          ResolvedStyle {
>              style: style,
> +            reused_via_rule_node: reused_via_rule_node,

ditto re. the shorthand syntax.

@@ +277,5 @@
> +        // Before doing the cascade, check the sharing cache and see if we can reuse
> +        // the style via rule node identity.
> +        let may_reuse = pseudo.is_none() && !self.element.is_native_anonymous() &&
> +                        parent_style.is_some() && inputs.rules.is_some();
> +        let cached = if !may_reuse { None } else {

nit: Let's indent this properly, even though it takes a bit more space.

but actually, since we don't use `cached` for anything else, why not:

if may_reuse {
    let cached = cache.lookup_by_rules(
        // ...
    );

    if let Some(el) = cached {
        styles_reused += 1;
        return ...
    }
}
Attachment #8907805 - Flags: review?(emilio) → review+
Comment on attachment 8907806 [details] [diff] [review]
Part 6 - Share styles during recascades. v3

Review of attachment 8907806 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with that fixed.

Much more obvious than the first series, thanks!

Kinda unrelated, but I'd prefer the part 0 to include the two commits, because the first one was logically separated.

You can use:

  curl -L https://github.com/servo/servo/pull/18482.patch | git am -

in your servo tree for that.

::: servo/components/style/sharing/mod.rs
@@ +438,3 @@
>          self.entries.insert(StyleSharingCandidate {
>              element: el,
> +            validation_data: validation_data,

You really dislike the shorthand syntax ;)

::: servo/components/style/traversal.rs
@@ +723,5 @@
> +            context.thread_local.sharing_cache.insert_if_possible(
> +                &element,
> +                new_styles.primary.style(),
> +                None,
> +                context.thread_local.bloom_filter.matching_depth(),

You probably want to use `traversal_data.current_dom_depth`, given we don't set up the bloom filter in this path.

We should use it in the other insert_if_possible, for consistency, I guess, and probably

   debug_assert_eq!(bloom_filter.matching_depth(), traversal_data.current_dom_depth)

in the MatchAndCascade path.
Attachment #8907806 - Flags: review?(emilio) → review+
(In reply to Emilio Cobos Álvarez [:emilio] from comment #34)
> Comment on attachment 8907804 [details] [diff] [review]
> Part 4 - Add some wrapper types to propagate styles out of style resolver. v1
> 
> Review of attachment 8907804 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: servo/components/style/style_resolver.rs
> @@ +69,5 @@
> > +impl ResolvedStyle {
> > +    /// Creates a new ResolvedStyle.
> > +    pub fn new(style: Arc<ComputedValues>) -> Self {
> > +        ResolvedStyle {
> > +            style: style,
> 
> nit: I'd just use the shorthand initializer (ResolvedStyle { style })

Fixed.

> 
> @@ +76,5 @@
> > +}
> > +
> > +impl PrimaryStyle {
> > +    /// Convenience accessor for the style.
> > +    pub fn style(&self) -> &Arc<ComputedValues> {
> 
> I think we don't clone this anywhere, may it make sense just returning
> &ComputedValues here? It'd save a few &** in a few other places.

It does mean that we can't have similar semantics with style_mut(), but I can probably just inline that. Fixed.

(In reply to Emilio Cobos Álvarez [:emilio] from comment #35)
> ::: servo/components/style/sharing/mod.rs
> @@ +247,5 @@
> >  /// FakeCandidate below.
> >  #[derive(Debug)]
> >  pub struct StyleSharingCandidate<E: TElement> {
> >      /// The element.
> > +    pub element: E,
> 
> Is this `pub` used?

Not anymore, removed.

> 
> @@ +756,5 @@
> > +            // landed when computed values were fused, see
> > +            // https://bugzilla.mozilla.org/show_bug.cgi?id=1381635.)
> > +            // So at the moment, we need to additionally compare visitedness,
> > +            // since that is not accounted for by rule nodes alone.
> > +            // FIXME(jryans): This seems like it breaks the constant time
> 
> We already do the same for the normal sharing cache (comparing the element
> state without ignoring visitedness), and we ran it through jryans and dbaron
> iirc, and there didn't seem to be a problem.
> 
> See the:
> 
>         if target.element.get_state() != candidate.get_state() {
> 
> check in the sharing code.

This part was written by jryans, who investigated the visited failures. I'm going to just leave it because I'm trying to avoid thinking about visited style, feel free to NI him if you want it changed.

> ::: servo/components/style/style_resolver.rs
> @@ +74,3 @@
> >          ResolvedStyle {
> >              style: style,
> > +            reused_via_rule_node: reused_via_rule_node,
> 
> ditto re. the shorthand syntax.

Fixed.

> 
> @@ +277,5 @@
> > +        // Before doing the cascade, check the sharing cache and see if we can reuse
> > +        // the style via rule node identity.
> > +        let may_reuse = pseudo.is_none() && !self.element.is_native_anonymous() &&
> > +                        parent_style.is_some() && inputs.rules.is_some();
> > +        let cached = if !may_reuse { None } else {
> 
> nit: Let's indent this properly, even though it takes a bit more space.
> 
> but actually, since we don't use `cached` for anything else, why not:
> 
> if may_reuse {
>     let cached = cache.lookup_by_rules(
>         // ...
>     );
> 
>     if let Some(el) = cached {
>         styles_reused += 1;
>         return ...
>     }
> }

I was trying to avoid too much indenting, but I guess this is fine.

(In reply to Emilio Cobos Álvarez [:emilio] from comment #36)
> Kinda unrelated, but I'd prefer the part 0 to include the two commits,
> because the first one was logically separated.

Ok.

> ::: servo/components/style/sharing/mod.rs
> @@ +438,3 @@
> >          self.entries.insert(StyleSharingCandidate {
> >              element: el,
> > +            validation_data: validation_data,
> 
> You really dislike the shorthand syntax ;)

Fixed.

> 
> ::: servo/components/style/traversal.rs
> @@ +723,5 @@
> > +            context.thread_local.sharing_cache.insert_if_possible(
> > +                &element,
> > +                new_styles.primary.style(),
> > +                None,
> > +                context.thread_local.bloom_filter.matching_depth(),
> 
> You probably want to use `traversal_data.current_dom_depth`, given we don't
> set up the bloom filter in this path.
> 
> We should use it in the other insert_if_possible, for consistency, I guess,
> and probably
> 
>    debug_assert_eq!(bloom_filter.matching_depth(),
> traversal_data.current_dom_depth)
> 
> in the MatchAndCascade path.

Good catch.
https://hg.mozilla.org/integration/autoland/rev/ff89cb83be0a2a1ba74099c839d4f796981dda67
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Depends on: 1401317
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: