Closed Bug 1373800 Opened 4 years ago Closed 4 years ago

stylo: Match compound selectors left-to-right

Categories

(Core :: CSS Parsing and Computation, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: bholley, Assigned: bholley)

References

(Blocks 1 open bug)

Details

Attachments

(7 files, 1 obsolete file)

Namespaces, local names, ids, and classes are relatively cheap to check, whereas pseudo-classes are more of a mixed bag. Additionally, authors almost always put those cheaper things on the left: people write .foo:first-child, not :firstchild.foo.

Discussing with bz, we think it may make sense to invert the order of compound selectors within the sequence so that we're more likely to be able to fast-reject on the cheap selectors before checking the expensive ones.
I assume we’d keep higher-level matching (across combinators) right-to-left. Given that we now store everything in flat slice/array, if we change the in-memory order to correspond to matching order, the memory representation would become quite subtle. So I’d be in favor of making the raw array private to the selectors crate (or ever the selectors::parser module) and providing multiple iterators methods with explicit names. (We still want to serialize components in the same order they were parsed, for example.)
Here's another thing that would be helped by this change.  minimal-xul.css has these three selectors:

  [hidden="true"]
  [collapsed="true"]
  [moz-collapsed="true"]

which end up getting matched, as revalidation selectors, against everything in sight.  These are all XUL-namespaced, so shouldn't get matched against HTML elements at all, ideally, or should fail to match immediately based on namespace, but we end up doing those attr gets instead.
Boris verified that my original approach of improves matching performance, but I unfortunately determined it to slow down parsing. I'm working on a nice abstraction to make it fast.
I should note that fixing bug 1374017 would also help with the selectors from comment 2.  It wouldn't help with other cases, though...
This makes the code easier to work with, and fixes a bug where we don't currently
reject pseudos within :not().

MozReview-Commit-ID: Cgl9w0PBsN3
Attachment #8878986 - Flags: review?(simon.sapin)
It probably makes more sense (eventually) to put it in SmallVec.

MozReview-Commit-ID: AIBKCLiMNN2
Attachment #8878987 - Flags: review?(simon.sapin)
MozReview-Commit-ID: IM9aAjSeJI9
Attachment #8878988 - Flags: review?(simon.sapin)
We add a slow in-place reverse during parsing to achieve this for now, which
gets fixed up later on.

MozReview-Commit-ID: 42QkOzSgV3T
Attachment #8878989 - Flags: review?(simon.sapin)
This patch doesn't modify any of the code because making a few things pub. I
did this first to make the next patch easier to audit.

MozReview-Commit-ID: 7PYxoS5bVGN
Attachment #8878990 - Flags: review?(simon.sapin)
MozReview-Commit-ID: JkBjqyYgbrB
Attachment #8878991 - Flags: review?(simon.sapin)
I verified that, on our parsing micobenchmark, this regresses parsing performance by about 1-2%. Given the wins that bz found with this patch, and given that selector parsing doesn't affect inline style performance, I think that's probably acceptable.
Note that this will also regress matching of the invalidation stuff though, which already happens left to right. Perhaps its acceptable.
Comment on attachment 8878986 [details] [diff] [review]
Part 1 - Stop using parse_compound_selector for negation parsing. v1

Review of attachment 8878986 [details] [diff] [review]:
-----------------------------------------------------------------

> "reject pseudos within :not()"

You mean pseudo-elements, not pseudo-classes, right?


r+ with unnecessary block removed, or explanation of why parse_nested_block is not enough.

::: servo/components/selectors/parser.rs
@@ +1448,5 @@
> +            }
> +        }
> +    }
> +
> +    // Additional selectors are forbiden.

I think this block is unnecessary. Additional *anything* is forbidden, and the parse_nested_block call in parse_one_simple_selector already takes care for returning Err if there is any remaining input before the `)` token marking the end of the `not(` function’s arguments.
Attachment #8878986 - Flags: review?(simon.sapin) → review+
Comment on attachment 8878987 [details] [diff] [review]
Part 2 - Hoist sink into selectors. v1

Review of attachment 8878987 [details] [diff] [review]:
-----------------------------------------------------------------

It first I thought sink.rs was missing from this diff but it looks like Splinter doesn’t show files renamed without changes…

Regardless, I think the Push<T> trait should be removed and replaced with FnMut(T), so I’d rather not add more usage of it.
Comment on attachment 8878988 [details] [diff] [review]
Part 3 - Stop creating unnecessarily-large SmallVecs for specific tasks. v1

Review of attachment 8878988 [details] [diff] [review]:
-----------------------------------------------------------------

As in previous commit, please use a closure instead.
Comment on attachment 8878990 [details] [diff] [review]
Part 5 - Hoist specificity computation into a new private builder module. v1

Review of attachment 8878990 [details] [diff] [review]:
-----------------------------------------------------------------

::: servo/components/selectors/builder.rs
@@ +6,5 @@
> +use std::cmp;
> +use std::ops::Add;
> +
> +#[derive(Copy, Clone, Debug, Eq, PartialEq)]
> +pub struct SpecificityAndFlags(pub u32);

Should the u32 field be private as well?
Attachment #8878990 - Flags: review?(simon.sapin) → review+
> and fixes a bug where we don't currently reject pseudos within :not()

Do we not have a test covering this?  Should add one as needed...
Comment on attachment 8878989 [details] [diff] [review]
Part 4 - Store selectors in matching order, rather than parse order. v1

Review of attachment 8878989 [details] [diff] [review]:
-----------------------------------------------------------------

matching.rs uses iter_raw_parse_order_from, and selector_parser.rs uses iter_raw_match_order. Isn’t this backward? I have no idea which is correct for invalidator.rs. Maybe each call site could use a code comment?

Also, the next patch changes the meaning of iter_raw_parse_order_from and makes it only in parse order up to the next combinator. Can we avoid exposing that API at all?
Depends on: 1374357
I filed bug 1374357 on comment 13, to make sure it does not get lost.
Comment on attachment 8878991 [details] [diff] [review]
Part 6 - Invert the order of each compound selector. v2

Review of attachment 8878991 [details] [diff] [review]:
-----------------------------------------------------------------

::: servo/components/selectors/builder.rs
@@ +68,5 @@
> +        self.combinators.push((c, self.current_len));
> +        self.current_len = 0;
> +    }
> +
> +    /// Returns true if no simple selectors have been pushed.

Please add to the doc-comment: "since the sink creation (not since the last push_combinator call)"

Is this really the desired behavior? I don’t understand how Combinator::PseudoElement is used.

@@ +160,5 @@
> +        Arc::into_thin(Arc::from_header_and_iter(header, self))
> +    }
> +}
> +
> +impl<Impl: SelectorImpl> ExactSizeIterator for SelectorBuilder<Impl> {

Although it probably ends up never being called in build_with_specificity_and_flags, when ExactSizeIterator is implemented, Iterator::size_hint should also be implemented and return (self.len(), Some(self.len()). (The default impl returns (0, None).)

@@ +182,5 @@
> +        if let Some((combinator, len)) = self.sink.combinators.pop() {
> +            // There's another compound selector. Reset the pointers to iterate it,
> +            // and then return the combinator.
> +            self.end = self.base;
> +            self.base = unsafe { self.end.offset(-(len as isize)) };

I think this is UB when called without calling build_with_specificity_and_flags. It looks like SelectorBuilder is a type mostly to be able to implement Iterator. This type should be private to the 'builder' module. Users would create a SelectorSink directly instead, then call build or build_with_specificity_and_flags which would be free functions or methods of SelectorSink. SelectorBuilder would contain &mut SelectorSink.

I also have some ideas to use slices instead of raw pointers to make this less unsafe. I can try it out if you push your branch to some repository.
(In reply to Simon Sapin (:SimonSapin) from comment #15)
> Comment on attachment 8878987 [details] [diff] [review]
> Part 2 - Hoist sink into selectors. v1
> 
> Review of attachment 8878987 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> It first I thought sink.rs was missing from this diff but it looks like
> Splinter doesn’t show files renamed without changes…
> 
> Regardless, I think the Push<T> trait should be removed and replaced with
> FnMut(T), so I’d rather not add more usage of it.

Per IRC discussion, I'm a bit nervous about a closure compiling down to the same thing, and would rather not take the time to measure. This seems like something we can tweak later.

(In reply to Simon Sapin (:SimonSapin) from comment #17)
> Comment on attachment 8878990 [details] [diff] [review]
> Part 5 - Hoist specificity computation into a new private builder module. v1

> > +#[derive(Copy, Clone, Debug, Eq, PartialEq)]
> > +pub struct SpecificityAndFlags(pub u32);
> 
> Should the u32 field be private as well?

The test code uses it. It's alright, because the module is private.


(In reply to Simon Sapin (:SimonSapin) from comment #19)
> Comment on attachment 8878989 [details] [diff] [review]
> Part 4 - Store selectors in matching order, rather than parse order. v1
> 
> Review of attachment 8878989 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> matching.rs uses iter_raw_parse_order_from, and selector_parser.rs uses
> iter_raw_match_order. Isn’t this backward? I have no idea which is correct
> for invalidator.rs. Maybe each call site could use a code comment?

I don't really know, but I asked emilio to look at it, and he said the usage was correct, so I just tried to preserve existing behavior and make the naming more clear. I think the code should definitely be made more understandable, but that's unrelated to this bug, and should be done in a followup.

> Also, the next patch changes the meaning of iter_raw_parse_order_from and
> makes it only in parse order up to the next combinator.

Yep, but thankfully that's ok, because the consumers only seem to care about ordering at the inter-compound-selector level (not the intra-compound-selector level).

> Can we avoid exposing that API at all?

Not without reworking the callers. I agree that would be nice to do, but again, doesn't need to block this.

(In reply to Simon Sapin (:SimonSapin) from comment #21)
> Comment on attachment 8878991 [details] [diff] [review]
> Part 6 - Invert the order of each compound selector. v2
> 
> Review of attachment 8878991 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: servo/components/selectors/builder.rs
> @@ +68,5 @@
> > +        self.combinators.push((c, self.current_len));
> > +        self.current_len = 0;
> > +    }
> > +
> > +    /// Returns true if no simple selectors have been pushed.
> 
> Please add to the doc-comment: "since the sink creation (not since the last
> push_combinator call)"

Sure.

> 
> Is this really the desired behavior? I don’t understand how
> Combinator::PseudoElement is used.

I think so, since it preserves the invariant that combinators do not appear at the endpoints of the selector.

> 
> @@ +160,5 @@
> > +        Arc::into_thin(Arc::from_header_and_iter(header, self))
> > +    }
> > +}
> > +
> > +impl<Impl: SelectorImpl> ExactSizeIterator for SelectorBuilder<Impl> {
> 
> Although it probably ends up never being called in
> build_with_specificity_and_flags, when ExactSizeIterator is implemented,
> Iterator::size_hint should also be implemented and return (self.len(),
> Some(self.len()). (The default impl returns (0, None).)

Ok, sure.

> 
> @@ +182,5 @@
> > +        if let Some((combinator, len)) = self.sink.combinators.pop() {
> > +            // There's another compound selector. Reset the pointers to iterate it,
> > +            // and then return the combinator.
> > +            self.end = self.base;
> > +            self.base = unsafe { self.end.offset(-(len as isize)) };
> 
> I think this is UB when called without calling
> build_with_specificity_and_flags.

Oh, that's a good point, I didn't consider the fact that somebody else might try to iterate it.

> It looks like SelectorBuilder is a type
> mostly to be able to implement Iterator. This type should be private to the
> 'builder' module.

I ended up needing to rework this to work around https://github.com/rust-lang/rust/issues/42763 . I think the new setup should satisfy your concerns here.

> I also have some ideas to use slices instead of raw pointers to make this
> less unsafe. I can try it out if you push your branch to some repository.

My patches are at [1]. That said, I don't think it's worth spending too much time on this. We have a ton of more-pressing things to do and we shouldn't rathole on these patches.

[1] https://github.com/bholley/gecko/commits/station_r
With this patch, the small parsing performance regression mentioned above
goes away.

MozReview-Commit-ID: JkBjqyYgbrB
Attachment #8879410 - Flags: review?(simon.sapin)
Attachment #8878991 - Attachment is obsolete: true
Attachment #8878991 - Flags: review?(simon.sapin)
Comment on attachment 8878987 [details] [diff] [review]
Part 2 - Hoist sink into selectors. v1

Review of attachment 8878987 [details] [diff] [review]:
-----------------------------------------------------------------

Ok, let’s keep the Push trait. ForgetfulSink is unused though, please remove it.
Attachment #8878987 - Flags: review?(simon.sapin) → review+
Attachment #8878988 - Flags: review?(simon.sapin) → review+
Attachment #8878989 - Flags: review?(simon.sapin) → review+
Comment on attachment 8879410 [details] [diff] [review]
Part 6 - Invert the order of each compound selector. v3

Review of attachment 8879410 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with additional patch that I’ll add next.
Attachment #8879410 - Flags: review?(simon.sapin) → review+
> That said, I don't think it's worth spending too much time on this. We have a ton of more-pressing things to do and we shouldn't rathole on these patches.

I disagree. Spending some time to avoid playing loose with raw pointers reduce unsafety is worth it, and not "rat-holing". In this case it wasn’t particularly hard.

The point of doing reviews is to improve patches, not just to rubber-stamp them.
Attachment #8879525 - Flags: review?(bobbyholley)
Can I see some numbers with these patches applied on autoland? This seems super-fragile, and I'd like to ensure it's well worth it after the :dir and :lang changes landed recently (which were the worst offenders).
(In reply to Emilio Cobos Álvarez [:emilio] from comment #28)
> Can I see some numbers with these patches applied on autoland? 

Here's a sequential wikipedia profile with all the recent patches except for this one: http://bit.ly/2sPnTBg
Here's a sequential wikipedia profile the above plus the patches in this bug: http://bit.ly/2sPz2BV

Note that we spend 13 ms in selectors::matching in the first profile and 9ms in the second. The overall numbers fluctuate somewhat, but this particular difference (in time spent matching selectors) seems pretty reliable across runs.

> This seems super-fragile

Why, exactly? The parsing logic is nicely encapsulated within builder.rs. The serialization handling is a bit annoying, but that's just a few lines of code in one function. And as far as everything else is concerned, there's no change, because matching order within a given compound selector is unobservable to consumers.
Comment on attachment 8879525 [details] [diff] [review]
Part 7 - Less unsafe in selectors::builder

Review of attachment 8879525 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Simon Sapin (:SimonSapin) from comment #26)
> Created attachment 8879525 [details] [diff] [review]
> Part 7 - Less unsafe in selectors::builder
> 
> > That said, I don't think it's worth spending too much time on this. We have a ton of more-pressing things to do and we shouldn't rathole on these patches.
> 
> I disagree. Spending some time to avoid playing loose with raw pointers
> reduce unsafety is worth it, and not "rat-holing". In this case it wasn’t
> particularly hard.

My apologies - I entirely misunderstood the scope of what you were proposing, and was also just cranky from spending too much weekend time on these patches and wanting to be done with them. This approach looks like a very nice improvement, and I appreciate you taking the time to do it. :-)
Attachment #8879525 - Flags: review?(bobbyholley) → review+
https://hg.mozilla.org/integration/autoland/rev/b29d23ddaa6a
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.