Per bug 1331843 comment 23 this isn't a huge win on the testcase that I'm measuring, but it's easy and a small win. I wrote a verification patch to ensure that we're hitting the cache in all the cases where parser.try() returns failure and rewinds.
Created attachment 8847406 [details] [diff] [review] Verification patch (do not land). v1 This patch prints statistics to verify that the cache is working correctly. The number of cache hits track the number of try() failures. MozReview-Commit-ID: Cpzf7eBSGyD
Created attachment 8847407 [details] [diff] [review] Add a cache for the last token returned by Parser::next(). v1 MozReview-Commit-ID: AJDKMAemFLT
Comment on attachment 8847407 [details] [diff] [review] Add a cache for the last token returned by Parser::next(). v1 Hm, the measurements are a bit noisy, but this doesn't seem to improve much (even with the verified cache hits), and _might_ actually make it slightly slower. I'm operating under the assumption that clone()ing a Token should never be more expensive than just copying a word or two. Is that right Simon? If so, I'm not sure why this would make things slower, but combined with the oranges on try, it might be best to just punt on this one.
A token can own a memory allocation, but only when there’s a backslash-escape which I expect is rare in practice. (Escapes are unnecessary when using UTF-8.) Even without allocations, Token is more than a word or two. The largest variant is: Dimension(NumericValue, Cow<'a, str>), Adding this to unit tests passes on x86_64: assert_eq!(::std::mem::size_of::<Token>(), 56); assert_eq!(::std::mem::size_of::<Cow<str>>(), 32); assert_eq!(::std::mem::size_of::<NumericValue>(), 16); Enums, especially nested, are space-inefficient because padding is often added to the discriminant to keep variant’s fields aligned. So discriminants take 8 bytes for Token, 8 bytes for Cow, and 4 bytes for Option in Option<i32> in NumericValue. If we want to reduce this we can easily cut 4 bytes by replacing Option<i32> with (i32, bool) (there’s another bool in NumericValue whose space currently gets padded up to 4 bytes, so the new bool fits in there), and another 8 bytes by replacing Cow<str> with something that uses Box<str> instead of String. (String::into_boxed_str may reallocate, but again I expect backslash-escapes to be rare.) We could go further by "flattening" enums (more Token variants instead of nested enums or booleans) but this would negatively impact ergonomics for code matching Token. … I kinda went down a rabbit hole here. I don’t know if reducing the size of Token would even have much impact on parsing performance.
Simon is investigating parser performance issues in bug 1331843, so assigning this one to him. We may or may not want to land something like this.
Generalizing this a little bit to include the various approaches of: (a) Shrinking tokens (b) Changing the API to avoid retokenizing, and (c) a 1-entry cache Stylo spends 200-250 ms tokenizing the 100x myspace sheet, and gecko spends around half that. So there's definitely room for improvement here.
https://github.com/servo/rust-cssparser/pull/159 and https://github.com/servo/servo/pull/17345 take size_of::<Token>() from 56 bytes to 32 bytes, on 64-bit platforms. Next I’ll look and maintaining a "current token" which can be borrowed without advancing the tokenizer’s position. Then, progressively moving parsing code to use that new API instead of Parser::try should reduce the amount of duplicated tokenization work.