stylo: parsing a single property is still somewhat slower than Gecko

NEW
Assigned to

Status

()

Core
CSS Parsing and Computation
P1
normal
2 months ago
9 days ago

People

(Reporter: bz, Assigned: SimonSapin, NeedInfo)

Tracking

(Depends on: 1 bug, Blocks: 3 bugs)

Trunk
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox55 affected)

Details

This is sort of a continuation of bug 1344136 but the parameters are different enough that I think a different bug is a good idea.

Looking at a profile of the testcase in bug 1344131, stylo spends about 12% of its time in  memmove, called mostly from declaration_block::parse_one_declaration and glue::set_property.  Another 4% is the UTF16-to-UTF8 copy of the to-be-parsed string.  The rest of the stylo-specific time is various parsing code.

I put a profile at https://perf-html.io/public/ca2114c54af0c8a917d476430772b924b2fa9046/calltree/?thread=2 in case something in there jumps out to someone more familiar with the servo CSS parser/tokenizer.  If not, I can dig some more and try to do more comparable head-to-head profiles vs Gecko to see which parts are slower in stylo.
Flags: needinfo?(simon.sapin)
Blocks: 1289864
There's probably significant overlap with SimonSapin's work in bug 1331843.
Assignee: nobody → simon.sapin
Priority: -- → P1
Happy to reprofile once that lands.
Depends on: 1331843
Blocks: 1243581
(Assignee)

Comment 3

2 months ago
> Another 4% is the UTF16-to-UTF8 copy of the to-be-parsed string.

On the output side of this conversion/copy, rust-cssparser’s tokenizer it its current incarnation fundamentally needs UTF-8 (Rust &str) for its input. Making it generic to support UTF-16 input would be a significant amount of work.

On the input side, is it really UTF-16 rather than Latin-1? IIRC SpiderMonkey can store JS strings as Latin-1 internally when they don’t contain code units beyond U+00FF, which is the case of "10px". Based on chatting with heycam, it sounds like the WebIDL bindings layer would first convert Latin-1 to UTF-16, then the CSSOM implementation for Stylo converts that to UTF-8.

If we add a WebIDL annotation that a given string parameter is wanted in UTF-8, we could make that one conversion instead of two. We only want UTF-8 when a string is going to Stylo/Servo though, so doing this may be tricky as long as we have two style systems.

Going further, we can avoid converting/copying entirely and use std::str::from_utf8_unchecked if the string happens to only contain ASCII code units (and is stored as Latin-1). However, systematically scanning Latin-1 inputs to check whether they’re ASCII also has a cost. So ideally, SpiderMonkey would track this for us with a "is ASCII" / "might not be ASCII" bit kept in Latin-1 strings.

(This is about CSSOM, I’ve filed Bug 1354989 separately for stylesheets.)
> Making it generic to support UTF-16 input would be a significant amount of work.

Right.  I'm not sure what we can do about this overhead so far.

> On the input side, is it really UTF-16 rather than Latin-1? 

It's UTF-16 (or WTF-16, I guess; more on this below) once it gets past the bindings layer.  The original JS string is in fact Latin-1 in this case.

We _could_ do something with bindings here to produce UTF-8 instead of UTF-16, yes.  As you note, that's easier to do once we don't have to worry about two style systems, or once we're ok taking a perf hit on the non-stylo style system.

> Going further, we can avoid converting/copying entirely and use std::str::from_utf8_unchecked
> if the string happens to only contain ASCII code units

As things stand, bindings will make a copy no matter what.  They do right now.  So if we implemented the "bindings output UTF-8" thing, ideally we would not need any more copies or scanning of the string or whatnot.

That said, does rust-cssparser actually expect UTF-8, or WTF-8?  That is, what happens for surrogate code points in the input?  Should bindings be outputting WTF-8 given unpaired surrogates in the WTF-16 input, or converting unpaired surrogates to replacement char, or something else?

All the encoding stuff could use a separate bug or three to track it, obviously.  ;)
(Assignee)

Comment 5

2 months ago
Filed bug 1355101 and bug 1355106.
(Assignee)

Comment 6

9 days ago
I’ve submitted https://github.com/servo/servo/pull/16954 which will hopefully reduce `memmove` traffic during parsing.
You need to log in before you can comment on or make changes to this bug.