stylo: reduce tokenizer overhead

Status: NEW
Component: Core :: CSS Parsing and Computation
Priority: P1
Severity: normal
Reported 5 months ago, last modified a month ago
(Reporter: bholley, Assigned: SimonSapin)
(Blocks: 2 bugs)
Firefox Tracking Flags: (Not tracked)
(2 attachments)
Per bug 1331843 comment 23, this isn't a huge win on the testcase that I'm measuring, but it's easy and a small win.

I wrote a verification patch to ensure that we're hitting the cache in all the cases where parser.try() returns failure and rewinds.
Created attachment 8847406 [details] [diff] [review]
Verification patch (do not land). v1

This patch prints statistics to verify that the cache is working correctly. The number of cache hits tracks the number of try() failures.
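The caching pattern the patch implements can be sketched as follows. This is a hypothetical, simplified illustration, not the actual rust-cssparser code: `Token`, the toy tokenizer, and `try_parse` are stand-ins, and only the 1-entry cache and rewind behavior mirror the idea being verified.

```rust
#[derive(Clone, Debug, PartialEq)]
enum Token {
    Ident(String),
    Delim(char),
}

struct CachedToken {
    token: Token,
    start: usize, // position the token was tokenized at
    end: usize,   // position just after the token
}

struct Parser<'a> {
    input: &'a str,
    position: usize,
    cache: Option<CachedToken>,
}

impl<'a> Parser<'a> {
    fn new(input: &'a str) -> Self {
        Parser { input, position: 0, cache: None }
    }

    fn next(&mut self) -> Option<Token> {
        // Cache hit: the last token was produced at exactly this position,
        // so return a clone instead of re-tokenizing.
        if let Some(cached) = &self.cache {
            if cached.start == self.position {
                self.position = cached.end;
                return Some(cached.token.clone());
            }
        }
        let start = self.position;
        let token = self.tokenize()?;
        self.cache = Some(CachedToken { token: token.clone(), start, end: self.position });
        Some(token)
    }

    // Toy tokenizer: alphabetic idents and single-char delimiters.
    fn tokenize(&mut self) -> Option<Token> {
        let rest = &self.input[self.position..];
        let first = rest.chars().next()?;
        if first.is_alphabetic() {
            let len = rest.find(|c: char| !c.is_alphabetic()).unwrap_or(rest.len());
            self.position += len;
            Some(Token::Ident(rest[..len].to_owned()))
        } else {
            self.position += first.len_utf8();
            Some(Token::Delim(first))
        }
    }

    // Like Parser::try: run a closure, rewind on failure. After a rewind,
    // the next call to next() at the same position hits the cache.
    fn try_parse<T>(&mut self, f: impl FnOnce(&mut Self) -> Option<T>) -> Option<T> {
        let saved = self.position;
        let result = f(self);
        if result.is_none() {
            self.position = saved;
        }
        result
    }
}

fn main() {
    let mut parser = Parser::new("red!");
    // First attempt fails and rewinds...
    let failed: Option<()> = parser.try_parse(|p| match p.next() {
        Some(Token::Delim(_)) => Some(()),
        _ => None,
    });
    assert!(failed.is_none());
    // ...so this next() serves "red" from the cache instead of re-tokenizing.
    assert_eq!(parser.next(), Some(Token::Ident("red".to_owned())));
}
```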

MozReview-Commit-ID: Cpzf7eBSGyD
Created attachment 8847407 [details] [diff] [review]
Add a cache for the last token returned by Parser::next(). v1

MozReview-Commit-ID: AJDKMAemFLT
Attachment #8847407 - Flags: review?(simon.sapin)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=e9c33d1a6a1ef80b0872a80e0c94f39fdd0bd36c
https://treeherder.mozilla.org/#/jobs?repo=try&revision=281c3a3622b0658427ff7dffbd3adcd228b6b522
Comment on attachment 8847407 [details] [diff] [review]
Add a cache for the last token returned by Parser::next(). v1

Hm, the measurements are a bit noisy, but this doesn't seem to improve much (even with the verified cache hits), and _might_ actually make it slightly slower.

I'm operating under the assumption that clone()ing a Token should never be more expensive than just copying a word or two. Is that right, Simon? If so, I'm not sure why this would make things slower, but combined with the oranges on try, it might be best to just punt on this one.
Flags: needinfo?(simon.sapin)
Attachment #8847407 - Flags: review?(simon.sapin)
(Assignee) Comment 6, 4 months ago
A token can own a memory allocation, but only when there’s a backslash-escape which I expect is rare in practice. (Escapes are unnecessary when using UTF-8.)

Even without allocations, Token is more than a word or two. The largest variant is:

    Dimension(NumericValue, Cow<'a, str>),

Adding this to unit tests passes on x86_64:

    assert_eq!(::std::mem::size_of::<Token>(), 56);
    assert_eq!(::std::mem::size_of::<Cow<str>>(), 32);
    assert_eq!(::std::mem::size_of::<NumericValue>(), 16);

Enums, especially when nested, are space-inefficient because padding is often added after the discriminant to keep the variants' fields aligned. So the discriminant effectively takes 8 bytes in Token, 8 bytes in Cow, and 4 bytes for the Option<i32> inside NumericValue.

If we want to reduce this we can easily cut 4 bytes by replacing Option<i32> with (i32, bool) (there’s another bool in NumericValue whose space currently gets padded up to 4 bytes, so the new bool fits in there), and another 8 bytes by replacing Cow<str> with something that uses Box<str> instead of String. (String::into_boxed_str may reallocate, but again I expect backslash-escapes to be rare.)
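The padding arithmetic behind the Option<i32> → (i32, bool) change can be checked directly with size_of. The struct names below are hypothetical stand-ins for illustration, not the actual cssparser types; the sizes hold on 64-bit targets.

```rust
use std::mem::size_of;

// Mirrors the shape described above: Option<i32> needs its own
// discriminant word, padding the struct out to 16 bytes.
#[allow(dead_code)]
struct NumericValueOld {
    value: f32,
    int_value: Option<i32>, // 8 bytes: 4 for the value + 4 for the discriminant
    has_sign: bool,
}

// Splitting the Option into a plain i32 plus a bool lets that bool share
// the padding already needed for has_sign: 12 bytes instead of 16.
#[allow(dead_code)]
struct NumericValueNew {
    value: f32,
    int_value: i32,
    is_integer: bool, // replaces the Option discriminant
    has_sign: bool,
}

fn main() {
    assert_eq!(size_of::<NumericValueOld>(), 16);
    assert_eq!(size_of::<NumericValueNew>(), 12);
    // A Box<str> is a plain fat pointer (16 bytes), while Cow<str> also
    // needs room for the String variant plus a discriminant.
    assert_eq!(size_of::<Box<str>>(), 16);
}
```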

We could go further by "flattening" enums (more Token variants instead of nested enums or booleans) but this would negatively impact ergonomics for code matching Token.



… I kinda went down a rabbit hole here. I don’t know if reducing the size of Token would even have much impact on parsing performance.
Flags: needinfo?(simon.sapin)
Priority: -- → P1
Simon is investigating parser performance issues in bug 1331843, so assigning this one to him. We may or may not want to land something like this.
Assignee: bobbyholley → simon.sapin
Generalizing this a little bit to include the various approaches of:
(a) Shrinking tokens
(b) Changing the API to avoid retokenizing, and
(c) a 1-entry cache

Stylo spends 200-250 ms tokenizing the 100x myspace sheet, and Gecko spends around half that. So there's definitely room for improvement here.
Summary: stylo: add a 1-entry cache to cssparser to avoid tokenizing twice → stylo: reduce tokenizer overhead
Blocks: 1243581
(Assignee) Comment 9, a month ago
https://github.com/servo/rust-cssparser/pull/159 and https://github.com/servo/servo/pull/17345 take size_of::<Token>() from 56 bytes to 32 bytes, on 64-bit platforms.

Next I’ll look at maintaining a "current token" which can be borrowed without advancing the tokenizer’s position. Then, progressively moving parsing code to use that new API instead of Parser::try should reduce the amount of duplicated tokenization work.
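The "current token" idea might look roughly like this. This is a hedged sketch, not the eventual rust-cssparser API: the method names (`current`, `advance`) and the toy tokenizer are illustrative only. The point is that peeking borrows the stored token, so repeated peeks at the same position never re-tokenize.

```rust
#[derive(Debug, PartialEq)]
enum Token {
    Ident(String),
    Delim(char),
}

struct Parser<'a> {
    input: &'a str,
    position: usize,
    current: Option<Token>, // already-tokenized token at the current position
}

impl<'a> Parser<'a> {
    fn new(input: &'a str) -> Self {
        Parser { input, position: 0, current: None }
    }

    // Borrow the token at the current position without consuming it.
    // Tokenizes at most once per position.
    fn current(&mut self) -> Option<&Token> {
        if self.current.is_none() {
            self.current = self.tokenize();
        }
        self.current.as_ref()
    }

    // Consume the current token and move past it.
    fn advance(&mut self) {
        if self.current.is_none() {
            self.current = self.tokenize();
        }
        self.current = None;
    }

    // Toy tokenizer: alphabetic idents and single-char delimiters.
    fn tokenize(&mut self) -> Option<Token> {
        let rest = &self.input[self.position..];
        let first = rest.chars().next()?;
        if first.is_alphabetic() {
            let len = rest.find(|c: char| !c.is_alphabetic()).unwrap_or(rest.len());
            self.position += len;
            Some(Token::Ident(rest[..len].to_owned()))
        } else {
            self.position += first.len_utf8();
            Some(Token::Delim(first))
        }
    }
}

fn main() {
    let mut p = Parser::new("color:red");
    // Peeking twice borrows the same stored token; no second tokenization.
    assert_eq!(p.current(), Some(&Token::Ident("color".to_owned())));
    assert_eq!(p.current(), Some(&Token::Ident("color".to_owned())));
    p.advance();
    assert_eq!(p.current(), Some(&Token::Delim(':')));
}
```

Parsing code that today does try()/rewind/re-next could instead inspect the borrowed token and only advance() once it commits, avoiding the duplicated tokenization entirely.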