stylo: implement an nth-index cache

VERIFIED FIXED in Firefox 57

Status

()

defect
P2
normal
VERIFIED FIXED
3 years ago
2 years ago

People

(Reporter: bholley, Assigned: bholley)

Tracking

(Blocks 2 bugs)

unspecified
mozilla58
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox57+ verified, firefox58 verified)

Details

Attachments

(7 attachments, 1 obsolete attachment)

> <emilio>	bholley: also something interesting we should look at is whether we need something like an nth-index cache
> <emilio>	bholley: but I guess we need at least a tls-version of it (actually ours may be way more efficient due to the breadth-first search), otherwise lots of siblings may result in pretty slow selector-matching
> <bholley>	emilio: if you're on your phone I can file ;-)
> <emilio>	bholley: that'd be helpful :). Also, if you do, please note that this (as with the style sharing) will also need to be handled differently for the sequential and parallel traversal
Priority: -- → P4
Duplicate of this bug: 1401547
Bug 1401547 shows a regression on a real site. Should be straightforward to hook up, I'll see if I can get it done today.
Assignee: nobody → bobbyholley
Priority: P4 → P2
MozReview-Commit-ID: BYPlePdKvAT
Attachment #8910595 - Flags: review?(emilio)
MozReview-Commit-ID: KouOaYTluRx
Attachment #8910596 - Flags: review?(emilio)
MozReview-Commit-ID: F5FbKFqpXEh
Attachment #8910597 - Flags: review?(emilio)
Some future refactoring here to pass fewer things as parameters would be nice.

MozReview-Commit-ID: KVOxLTUKl1B
Attachment #8910598 - Flags: review?(emilio)
MozReview-Commit-ID: ALeggMDh92m
Attachment #8910599 - Flags: review?(emilio)
MozReview-Commit-ID: Ee0um3QXkxl
Attachment #8910600 - Flags: review?(emilio)
Comment on attachment 8910595 [details] [diff] [review]
Part 1 - Hoist the LRU cache into its own crate to share it with selectors. v1

Review of attachment 8910595 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with that comment.

::: servo/components/lru_cache/Cargo.toml
@@ +2,5 @@
> +name = "lru_cache"
> +version = "0.0.1"
> +authors = ["The Servo Project Developers"]
> +license = "MPL-2.0"
> +publish = false

This should probably just be published. We should consider moving this to crates.io, can you leave a comment here?
Attachment #8910595 - Flags: review?(emilio) → review+
Comment on attachment 8910596 [details] [diff] [review]
Part 2 - LRUCache::new -> LRUCache::default. v1

Review of attachment 8910596 [details] [diff] [review]:
-----------------------------------------------------------------

::: servo/components/lru_cache/lib.rs
@@ +33,5 @@
>      next: u16,
>  }
>  
> +impl<T, A: Array<Item=Entry<T>>> Default for LRUCache<T, A> {
> +    fn default() -> Self {

Fwiw, you probably want to inline this in a separate commit (maybe a few other things?).

::: servo/components/style/sharing/mod.rs
@@ +414,5 @@
>  
>  impl<Candidate> Default for SharingCacheBase<Candidate> {
>      fn default() -> Self {
>          Self {
> +            entries: LRUCache::default(),

Just derive default here.
Attachment #8910596 - Flags: review?(emilio) → review+
Attachment #8910597 - Flags: review?(emilio) → review+
(In reply to Emilio Cobos Álvarez [:emilio] from comment #13)
> Comment on attachment 8910596 [details] [diff] [review]
> Part 2 - LRUCache::new -> LRUCache::default. v1
> 
> Review of attachment 8910596 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: servo/components/lru_cache/lib.rs
> @@ +33,5 @@
> >      next: u16,
> >  }
> >  
> > +impl<T, A: Array<Item=Entry<T>>> Default for LRUCache<T, A> {
> > +    fn default() -> Self {
> 
> Fwiw, you probably want to inline this in a separate commit (maybe a few
> other things?).

We don't need #[inline] for methods on generic types, right?
> 
> ::: servo/components/style/sharing/mod.rs
> @@ +414,5 @@
> >  
> >  impl<Candidate> Default for SharingCacheBase<Candidate> {
> >      fn default() -> Self {
> >          Self {
> > +            entries: LRUCache::default(),
> 
> Just derive default here.

We can't because Candidate doesn't implement Default and derive is dumb.
Comment on attachment 8910598 [details] [diff] [review]
Part 4 - Introduce an NthIndexCache type and pipe it from ThreadLocalStyleContext to MatchingContext. v1

Review of attachment 8910598 [details] [diff] [review]:
-----------------------------------------------------------------

::: servo/components/script/dom/node.rs
@@ +753,5 @@
>              // Step 2.
>              Err(_) => Err(Error::Syntax),
>              // Step 3.
>              Ok(selectors) => {
> +                let mut ctx = MatchingContext::new(MatchingMode::Normal, None, None,

Please mention that we probably want an nth-index cache here and above.

::: servo/components/style/invalidation/element/invalidator.rs
@@ +636,5 @@
>          let mut context =
>              MatchingContext::new_for_visited(
>                  MatchingMode::Normal,
>                  None,
> +                None,

Please mention that we may also want this here, and below.
Attachment #8910598 - Flags: review?(emilio) → review+
Comment on attachment 8910599 [details] [diff] [review]
Part 5 - Hoist index computation into a helper. v1

Review of attachment 8910599 [details] [diff] [review]:
-----------------------------------------------------------------

::: servo/components/selectors/matching.rs
@@ +804,5 @@
> +    element: &E,
> +    is_of_type: bool,
> +    is_from_end: bool,
> +) -> i32
> +where E: Element

nit:

where
    E: Element,
Attachment #8910599 - Flags: review?(emilio) → review+
Blocks: 1401855
Comment on attachment 8910600 [details] [diff] [review]
Part 6 - Implement an nth-index cache. v1

Review of attachment 8910600 [details] [diff] [review]:
-----------------------------------------------------------------

I'd rather rework it than landing on its current state, tbh.

I'd say let's do this right instead, please, even if it misses the merge...

::: servo/components/selectors/context.rs
@@ +75,2 @@
>  #[derive(Default)]
> +pub struct NthIndexCache {

Let's move this to its own module, and mention (with details, please) the issues about eviction we discussed on IRC in a comment (for which we probably want to use a HashMap).

@@ +85,5 @@
> +    pub fn get(
> +        &mut self,
> +        is_of_type: bool,
> +        is_from_end: bool
> +        ) -> &mut NthIndexCacheInner {

nit: Indentation is off.

::: servo/components/selectors/matching.rs
@@ +787,5 @@
>          HAS_SLOW_SELECTOR_LATER_SIBLINGS
>      });
>  
> +    // Grab a reference to the appropriate cache.
> +    let mut cache = context.shared.nth_index_cache.as_mut().map(

nit:

.map(|c| {
    c.get(..)
});

@@ +795,5 @@
> +    // Lookup or compute the index.
> +    let index = if let Some(i) = cache.as_mut().and_then(|c| c.lookup(element.opaque())) {
> +        i
> +    } else {
> +        let i = nth_child_index(element, is_of_type, is_from_end, reborrow(&mut cache));

Why not looking and inserting in the cache in `nth_child_index`? Looks cleaner.

@@ +822,5 @@
>  fn nth_child_index<E>(
>      element: &E,
>      is_of_type: bool,
>      is_from_end: bool,
> +    mut cache: Option<&mut NthIndexCacheInner>,

I thought you were going to make this not take the cache, and just make this the "uncached index", but if you do, let's just do the above.

FWIW, Blink caches indices using a "spread" value[1], and only when it finds more than 32 elements[2].

[1]: https://cs.chromium.org/chromium/src/third_party/WebKit/Source/core/dom/NthIndexCache.cpp?l=228&rcl=4f7eaed4b82ca184db6155afd743ac9cb4a4efb0
[2]: https://cs.chromium.org/chromium/src/third_party/WebKit/Source/core/dom/NthIndexCache.cpp?l=36&rcl=4f7eaed4b82ca184db6155afd743ac9cb4a4efb0

@@ +836,5 @@
>  
> +    // Do an explicit cache check for our previous sibling, which is the most
> +    // likely element to be in the cache. Checking this here allows the cache
> +    // to be used for NthLast/NthLastOfType, which otherwise would never walk
> +    // previous siblings.

Add a FIXME pointing to your eviction comment. This really feels like a hack.

@@ +901,5 @@
>  }
> +
> +// FIXME(bholley): Is there really no better way to do this?
> +fn deref<'a, 'b: 'a, T>(t: &'a mut &'b mut T) -> &'a mut T { *t }
> +fn reborrow<'a, 'b: 'a, T>(mut x: &'a mut Option<&'b mut T>)->Option<&'a mut T> {

nit: Spacing here is way off.

But why not just using: .as_mut().map(|s| &mut *s)

::: servo/components/selectors/tree.rs
@@ +12,4 @@
>  use std::fmt::Debug;
>  
> +/// Opaque representation of an Element, for identity comparisons. We use
> +/// NonZeroPtrMut to get the NonZero optimization.

Given we don't use Option<> I'd rather keep it simple, but...
Attachment #8910600 - Flags: review?(emilio) → review-
Happy to review a patch without the reborrow hack and the stuff cleaned up, though I'd seriously consider just getting it right now instead of rushing it and cleaning up afterwards.
Everything but the last patch landed as https://hg.mozilla.org/integration/autoland/rev/003763b2e730 (and made the merge to 57).
(In reply to Emilio Cobos Álvarez [:emilio] from comment #17)
> Comment on attachment 8910600 [details] [diff] [review]
> Part 6 - Implement an nth-index cache. v1
> 
> Review of attachment 8910600 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I'd rather rework it than landing on its current state, tbh.
> 
> I'd say let's do this right instead, please, even if it misses the merge...
> 
> ::: servo/components/selectors/context.rs
> @@ +75,2 @@
> >  #[derive(Default)]
> > +pub struct NthIndexCache {
> 
> Let's move this to its own module, and mention (with details, please) the
> issues about eviction we discussed on IRC in a comment (for which we
> probably want to use a HashMap).

Ok, I'll use a hashmap.

> 
> @@ +85,5 @@
> > +    pub fn get(
> > +        &mut self,
> > +        is_of_type: bool,
> > +        is_from_end: bool
> > +        ) -> &mut NthIndexCacheInner {
> 
> nit: Indentation is off.

Fixed.

> 
> ::: servo/components/selectors/matching.rs
> @@ +787,5 @@
> >          HAS_SLOW_SELECTOR_LATER_SIBLINGS
> >      });
> >  
> > +    // Grab a reference to the appropriate cache.
> > +    let mut cache = context.shared.nth_index_cache.as_mut().map(
> 
> nit:
> 
> .map(|c| {
>     c.get(..)
> });

Fixed.

> 
> @@ +795,5 @@
> > +    // Lookup or compute the index.
> > +    let index = if let Some(i) = cache.as_mut().and_then(|c| c.lookup(element.opaque())) {
> > +        i
> > +    } else {
> > +        let i = nth_child_index(element, is_of_type, is_from_end, reborrow(&mut cache));
> 
> Why not looking and inserting in the cache in `nth_child_index`? Looks
> cleaner.

The reason for doing it this way is so that we can unconditionally insert the result into the cache without re-inserting a duplicate entry.

> 
> @@ +822,5 @@
> >  fn nth_child_index<E>(
> >      element: &E,
> >      is_of_type: bool,
> >      is_from_end: bool,
> > +    mut cache: Option<&mut NthIndexCacheInner>,
> 
> I thought you were going to make this not take the cache, and just make this
> the "uncached index", but if you do, let's just do the above.

See above.

> 
> FWIW, Blink caches indices using a "spread" value[1], and only when it finds
> more than 32 elements[2].

This cache is modeled after Gecko's. Fiddling with the model later is fine, but it's not a blocker to ship.

> 
> [1]:
> https://cs.chromium.org/chromium/src/third_party/WebKit/Source/core/dom/
> NthIndexCache.cpp?l=228&rcl=4f7eaed4b82ca184db6155afd743ac9cb4a4efb0
> [2]:
> https://cs.chromium.org/chromium/src/third_party/WebKit/Source/core/dom/
> NthIndexCache.cpp?l=36&rcl=4f7eaed4b82ca184db6155afd743ac9cb4a4efb0
> 
> @@ +836,5 @@
> >  
> > +    // Do an explicit cache check for our previous sibling, which is the most
> > +    // likely element to be in the cache. Checking this here allows the cache
> > +    // to be used for NthLast/NthLastOfType, which otherwise would never walk
> > +    // previous siblings.
> 
> Add a FIXME pointing to your eviction comment. This really feels like a hack.

This doesn't have to do with eviction. It has to do with whether or not we store intermediate results.

> 
> @@ +901,5 @@
> >  }
> > +
> > +// FIXME(bholley): Is there really no better way to do this?
> > +fn deref<'a, 'b: 'a, T>(t: &'a mut &'b mut T) -> &'a mut T { *t }
> > +fn reborrow<'a, 'b: 'a, T>(mut x: &'a mut Option<&'b mut T>)->Option<&'a mut T> {
> 
> nit: Spacing here is way off.
> 
> But why not just using: .as_mut().map(|s| &mut *s)

Per IRC discussion, the magic turned out to be .as_mut().map(|s| &mut **s)
MozReview-Commit-ID: Ee0um3QXkxl
Attachment #8910871 - Flags: review?(emilio)
Attachment #8910600 - Attachment is obsolete: true
Comment on attachment 8910871 [details] [diff] [review]
Part 6 - Implement an nth-index cache. v2

Review of attachment 8910871 [details] [diff] [review]:
-----------------------------------------------------------------

Looks ok, but I'd still like to see the revised patch. I thought we agreed on me taking this tomorrow, but I guess this is fine too.

::: servo/components/selectors/matching.rs
@@ +787,5 @@
>      } else {
>          HAS_SLOW_SELECTOR_LATER_SIBLINGS
>      });
>  
> +    // Grab a reference to the appropriate cache.

This will be cleaner if it was just:

    let index = nth_child_index(elemnet, is_of_type, is_from_end, cache.as_mut().map(|s| &mut **s));

And make the cache lookup (and store) in nth_child_index.

@@ +800,5 @@
> +        let i = nth_child_index(element, is_of_type, is_from_end, cache.as_mut().map(|s| &mut **s));
> +        cache.as_mut().map(|c| c.insert(element.opaque(), i));
> +        i
> +    };
> +    debug_assert_eq!(index, nth_child_index(element, is_of_type, is_from_end, None), "bad cache");

"bad cache" is still a pretty bad assertion message, I'd just drop it.

@@ +835,5 @@
>      } else {
>          element.prev_sibling_element()
>      };
>  
> +    // Do an explicit cache check for our previous sibling, which is the most

Is this needed now? I think it's not.

::: servo/components/selectors/nth_index_cache.rs
@@ +6,5 @@
> +use tree::OpaqueElement;
> +
> +/// A cache to speed up matching of nth-index-like selectors.
> +#[derive(Default)]
> +pub struct NthIndexCache {

It'd be nice to document more the trade-offs of this approach, maybe compared to Blink's.

@@ +31,5 @@
> +}
> +
> +/// The concrete per-pseudo-class cache.
> +#[derive(Default)]
> +pub struct NthIndexCacheInner(FnvHashMap<OpaqueElement, i32>);

I'm not sure the wrapper type provides much value, but your call.

::: servo/components/selectors/tree.rs
@@ +22,5 @@
> +        OpaqueElement(NonZeroPtrMut::new(ptr as *const () as *mut ()))
> +    }
> +}
> +
> +pub trait Element: Sized + Clone + Debug {

Why the Clone bound? I don't think it's used.
Attachment #8910871 - Flags: review?(emilio)
(In reply to Emilio Cobos Álvarez [:emilio] from comment #22)
> Comment on attachment 8910871 [details] [diff] [review]
> Part 6 - Implement an nth-index cache. v2
> 
> Review of attachment 8910871 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Looks ok, but I'd still like to see the revised patch.

Which part requires a re-review?

> I thought we agreed
> on me taking this tomorrow, but I guess this is fine too.

Per bug 1401855 comment 3, I decided that it would be reasonable to swap out the backend and keep my patch intact. And I wanted to finish my patch. :-)

> 
> ::: servo/components/selectors/matching.rs
> @@ +787,5 @@
> >      } else {
> >          HAS_SLOW_SELECTOR_LATER_SIBLINGS
> >      });
> >  
> > +    // Grab a reference to the appropriate cache.
> 
> This will be cleaner if it was just:
> 
>     let index = nth_child_index(elemnet, is_of_type, is_from_end,
> cache.as_mut().map(|s| &mut **s));
> 
> And make the cache lookup (and store) in nth_child_index.

As I said in comment 20, there is a reason for doing it this way.

> 
> @@ +800,5 @@
> > +        let i = nth_child_index(element, is_of_type, is_from_end, cache.as_mut().map(|s| &mut **s));
> > +        cache.as_mut().map(|c| c.insert(element.opaque(), i));
> > +        i
> > +    };
> > +    debug_assert_eq!(index, nth_child_index(element, is_of_type, is_from_end, None), "bad cache");
> 
> "bad cache" is still a pretty bad assertion message, I'd just drop it.

I wanted to make it clear that we're double-checking any cached results by passing None, which isn't otherwise obvious. I'll change it to "Invalid cache".
> 
> @@ +835,5 @@
> >      } else {
> >          element.prev_sibling_element()
> >      };
> >  
> > +    // Do an explicit cache check for our previous sibling, which is the most
> 
> Is this needed now? I think it's not.

It is, because given how the memoization works, we're unlikely to have cache entries for later siblings. Gecko walks all the way to the left in the is_from_end case, where we just do a lookaside. I'll note the tradeoff in the comment.

> 
> ::: servo/components/selectors/nth_index_cache.rs
> @@ +6,5 @@
> > +use tree::OpaqueElement;
> > +
> > +/// A cache to speed up matching of nth-index-like selectors.
> > +#[derive(Default)]
> > +pub struct NthIndexCache {
> 
> It'd be nice to document more the trade-offs of this approach, maybe
> compared to Blink's.

I'll link to bug 1401855 comment 3.

> 
> @@ +31,5 @@
> > +}
> > +
> > +/// The concrete per-pseudo-class cache.
> > +#[derive(Default)]
> > +pub struct NthIndexCacheInner(FnvHashMap<OpaqueElement, i32>);
> 
> I'm not sure the wrapper type provides much value, but your call.

It made it easier to swap around backends, and IIRC the lookup method simplified some borrowing issues on the HashMap gets.

> 
> ::: servo/components/selectors/tree.rs
> @@ +22,5 @@
> > +        OpaqueElement(NonZeroPtrMut::new(ptr as *const () as *mut ()))
> > +    }
> > +}
> > +
> > +pub trait Element: Sized + Clone + Debug {
> 
> Why the Clone bound? I don't think it's used.

It is, in the next_sibling.clone() call.
Comment on attachment 8910871 [details] [diff] [review]
Part 6 - Implement an nth-index cache. v2

Review of attachment 8910871 [details] [diff] [review]:
-----------------------------------------------------------------

Per off-line discussion, this is fine to land with the nits addressed, maybe with a comment pointing to bug 1401855, and walking to the left through the previous siblings.
Attachment #8910871 - Flags: review+
Requesting tracking on all outstanding p2 stylo bugs.
Summary: stylo: Consider implementing nth-index cache → stylo: implement an nth-index cache
Blocks: 1402143
https://hg.mozilla.org/integration/autoland/rev/fb23ec3c896e
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Please request uplift if you want this in 57.
Target Milestone: --- → mozilla58
Posted file rev for uplift
Approval Request Comment
[Feature/Bug causing the regression]: Stylo bug
[User impact if declined]: Pages like hg.m.o will load unacceptably slowly, see bug 1401547.
[Is this code covered by automated tests?]: Not the optimization specifically, but all this code is heavily tested in automation.
[Has the fix been verified in Nightly?]: No.
[Needs manual test from QE? If yes, steps to reproduce]: Would be good. Bug 1401547 is a good STR, plus the nth-child tests in https://github.com/heycam/style-perf-tests
[List of other uplifts needed for the feature/fix]: None
[Is the change risky?]: not very risky
[Why is the change risky/not risky?]: It's an optimization to cache certain results. In debug builds, we verify that the cached result matches a freshly-computed result, so we know the optimization is correct for any selector matching that happens on CI, which is a lot.
[String changes made/needed]: None
Attachment #8911294 - Flags: approval-mozilla-beta?
\o/ I no longer have to disable Stylo to look at crashreports that link to hg.mozilla.org
Status: RESOLVED → VERIFIED
Comment on attachment 8911294 [details]
rev for uplift

Fix a severe perf regression, taking it.
Should be in 57b3
Attachment #8911294 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Flags: qe-verify+
Managed to reproduce the initial issue (the hang and the crash) on 57.0a1 (20170803134456), using the STR from bug 1401547. I can confirm that 57.0b13 (20171030163911) and 58.0a1 (2017-11-01) builds are fixed. Verified across platforms (Windows 10 x64, Ubuntu 16.04 x64 and macOS 10.13).
You need to log in before you can comment on or make changes to this bug.