Open
Bug 1354566
Opened 8 years ago
Updated 2 years ago
stylo: parsing a single property is still somewhat slower than Gecko
Categories
(Core :: CSS Parsing and Computation, defect, P4)
Core
CSS Parsing and Computation
Tracking
()
NEW
People
(Reporter: bzbarsky, Unassigned)
References
(Depends on 1 open bug, Blocks 1 open bug)
Details
This is sort of a continuation of bug 1344136 but the parameters are different enough that I think a different bug is a good idea.
Looking at a profile of the testcase in bug 1344131, stylo spends about 12% of its time in memmove, called mostly from declaration_block::parse_one_declaration and glue::set_property. Another 4% is the UTF16-to-UTF8 copy of the to-be-parsed string. The rest of the stylo-specific time is various parsing code.
I put a profile at https://perf-html.io/public/ca2114c54af0c8a917d476430772b924b2fa9046/calltree/?thread=2 in case something in there jumps out to someone more familiar with the servo CSS parser/tokenizer. If not, I can dig some more and try to do more comparable head-to-head profiles vs Gecko to see which parts are slower in stylo.
Flags: needinfo?(simon.sapin)
![]() |
Reporter | |
Updated•8 years ago
|
Blocks: stylo-perf
Comment 1•8 years ago
|
||
There's probably significant overlap with SimonSapin's work in bug 1331843.
Assignee: nobody → simon.sapin
Priority: -- → P1
Comment 3•8 years ago
|
||
> Another 4% is the UTF16-to-UTF8 copy of the to-be-parsed string.
On the output side of this conversion/copy, rust-cssparser’s tokenizer it its current incarnation fundamentally needs UTF-8 (Rust &str) for its input. Making it generic to support UTF-16 input would be a significant amount of work.
On the input side, is it really UTF-16 rather than Latin-1? IIRC SpiderMonkey can store JS strings as Latin-1 internally when they don’t contain code units beyond U+00FF, which is the case of "10px". Based on chatting with heycam, it sounds like the WebIDL bindings layer would first convert Latin-1 to UTF-16, then the CSSOM implementation for Stylo converts that to UTF-8.
If we add a WebIDL annotation that a given string parameter is wanted in UTF-8, we could make that one conversion instead of two. We only want UTF-8 when a string is going to Stylo/Servo though, so doing this may be tricky as long as we have two style systems.
Going further, we can avoid converting/copying entirely and use std::str::from_utf8_unchecked if the string happens to only contain ASCII code units (and is stored as Latin-1). However, systematically scanning Latin-1 inputs to check whether they’re ASCII also has a cost. So ideally, SpiderMonkey would track this for us with a "is ASCII" / "might not be ASCII" bit kept in Latin-1 strings.
(This is about CSSOM, I’ve filed Bug 1354989 separately for stylesheets.)
![]() |
Reporter | |
Comment 4•8 years ago
|
||
> Making it generic to support UTF-16 input would be a significant amount of work.
Right. I'm not sure what we can do about this overhead so far.
> On the input side, is it really UTF-16 rather than Latin-1?
It's UTF-16 (or WTF-16, I guess; more on this below) once it gets past the bindings layer. The original JS string is in fact Latin-1 in this case.
We _could_ do something with bindings here to produce UTF-8 instead of UTF-16, yes. As you note, that's easier to do once we don't have to worry about two style systems, or once we're ok taking a perf hit on the non-stylo style system.
> Going further, we can avoid converting/copying entirely and use std::str::from_utf8_unchecked
> if the string happens to only contain ASCII code units
As things stand, bindings will make a copy no matter what. They do right now. So if we implemented the "bindings output UTF-8" thing, ideally we would not need any more copies or scanning of the string or whatnot.
That said, does rust-cssparser actually expect UTF-8, or WTF-8? That is, what happens for surrogate code points in the input? Should bindings be outputting WTF-8 given unpaired surrogates in the WTF-16 input, or converting unpaired surrogates to replacement char, or something else?
All the encoding stuff could use a separate bug or three to track it, obviously. ;)
Comment 5•8 years ago
|
||
Filed bug 1355101 and bug 1355106.
Comment 6•8 years ago
|
||
I’ve submitted https://github.com/servo/servo/pull/16954 which will hopefully reduce `memmove` traffic during parsing.
Comment 7•8 years ago
|
||
https://github.com/servo/servo/pull/17820 should also help with this.
Updated•8 years ago
|
Priority: P1 → P4
Comment 9•7 years ago
|
||
status-firefox57=wontfix unless someone thinks this bug should block 57
status-firefox57:
--- → wontfix
Updated•4 years ago
|
Assignee: simon.sapin → nobody
Updated•2 years ago
|
Severity: normal → S3
You need to log in
before you can comment on or make changes to this bug.
Description
•