Open
Bug 1328497
(stylo-codesize)
Opened 8 years ago
Updated 2 years ago
stylo: Optimize codesize
Categories
(Core :: CSS Parsing and Computation, defect, P5)
Core
CSS Parsing and Computation
Tracking
()
NEW
Tracking | Status | |
---|---|---|
firefox57 | --- | wontfix |
People
(Reporter: bholley, Unassigned)
References
(Depends on 2 open bugs)
Details
Attachments
(6 files)
I did some investigation today on codesize. I'm writing up the takeaways here while they're in my head.
TL;DR: Stylo currently bloats libxul by 3.57M on linux64. This is ok but not great, and can probably be optimized further. Some overhead is inevitable while we're shipping two style systems at once, but we should minimize it.
The easiest way to measure codesize on linux is by running |size libxul.so| and looking at the text segment (the data segment doesn't change much with/without stylo).
I also investigated passing -Os instead of -O2 to rustc. This wins back about 0.7 megabytes, but I didn't check what impact it had on perf. It also requires nightly Rust, so we'll need to push on stabilization if we decide we want to take that path.
Here are dumps of the symbol tables without stylo, with stylo, and with stylo + -Os:
https://www.dropbox.com/s/9c8kfj8hm2m227p/readelf-nostylo.txt.bz2?dl=0
https://www.dropbox.com/s/1fztqnkmap1t1xu/readelf-stylo.txt.bz2?dl=0
https://www.dropbox.com/s/j2rk0t7pofpc9lk/stylo-size-with-Os.bz2?dl=0
Nathan did some analysis and found some interesting large-ish stylo functions in the top 100. I'll attach the results to the bug.
Some ways to move forward here:
* Look at the large functions Nathan found and see if we can make them smaller. A lot of them look like they ought to be a lot smaller.
* Look for small-but-numerous auto-generated functions.
* Investigate -Os more.
Reporter | ||
Comment 1•8 years ago
|
||
Here are the large functions. Some of these are almost 100k and probably shouldn't be.
Comment 2•8 years ago
|
||
The PropertyDeclaration ones *should* be implemented with a jump table; it is problematic if that's not the case. But given that we have 200-odd properties I wouldn't be surprised if it's large even with a jump table.
CSSColor makes sense. GradiantKind::parse_radial does not, but that could be due to a lot of inlining.
We're pulling in the entire regex crate because of env_logger. Perhaps we should turn that off and use a smaller logger that hooks into the Firefox logging infra?
I wonder why clip_path has a large cascade_property function. Aggressive inlining?
Comment 3•8 years ago
|
||
Is there a clear path to removing the old style system? I ask because I've been around long enough to have seen multiple cases where we introduced a new and improved way of doing something, but never got around to getting rid of all the uses of the old way...
Comment 4•8 years ago
|
||
(In reply to Nicholas Nethercote [:njn] (PTO until January 9th) from comment #3)
> Is there a clear path to removing the old style system?
I could be wrong, but I believe that only one is used at a time, so that should not be a problem.
Comment 5•8 years ago
|
||
This should help with one of them: https://github.com/servo/servo/pull/14844
I think I'll need to disassemble, but I think if we care a lot about code size we can make a bunch of the data static and index by discriminant value.
I don't know if rust generates that kind of code by default, but if it doesn't we should probably also do it for perf, to avoid unnecessary branches.
Reporter | ||
Comment 6•8 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #4)
> (In reply to Nicholas Nethercote [:njn] (PTO until January 9th) from comment
> #3)
> > Is there a clear path to removing the old style system?
>
> I could be wrong, but I believe that only one is used at a time, so that
> should not be a problem.
Yeah - the plan of record is that the old style system should be dead code modulo the pref.
Matt said he'd take over investigating this - thanks Matt!
Assignee: nobody → mbrubeck
Comment 7•8 years ago
|
||
> We're pulling in the entire regex crate because of env_logger. Perhaps we should turn that off and use a smaller logger that hooks into the Firefox logging infra?
Removing `env_logger` from the `geckoservo` dependencies reduces the text segment by 370 KiB, so this idea is very promising.
Comment 8•8 years ago
|
||
(In reply to Manish Goregaokar [:manishearth] from comment #2)
> We're pulling in the entire regex crate because of env_logger. Perhaps we
> should turn that off and use a smaller logger that hooks into the Firefox
> logging infra?
I wonder whether we would eventually need regex crate to support regex in @-moz-document rules.
Comment 9•8 years ago
|
||
This mostly shows the same functions as in the previous attachment, but in a different format and with slightly different size data.
Updated•8 years ago
|
Attachment #8823898 -
Attachment is patch: false
Comment 10•8 years ago
|
||
> I wonder whether we would eventually need regex crate to support regex in @-moz-document rules.
We would use the built in regex stuff. regex::Regex are not JS regexes. If we were to make that switch I'd prefer to move everything over to regex::Regex (it's possible that the regex parser could be configured to only accept JS regexes).
Reporter | ||
Comment 11•8 years ago
|
||
(In reply to Matt Brubeck (:mbrubeck) from comment #7)
> > We're pulling in the entire regex crate because of env_logger. Perhaps we should turn that off and use a smaller logger that hooks into the Firefox logging infra?
>
> Removing `env_logger` from the `geckoservo` dependencies reduces the text
> segment by 370 KiB, so this idea is very promising.
I do use the rust logging stuff pretty heavily when working on the style system, but rarely need the regexp matching. How much of the env_logger overhead is regexp? If the answer is "most", perhaps we could patch env_logger to make regexp log filters an optional feature?
Comment 12•8 years ago
|
||
(In reply to Manish Goregaokar [:manishearth] from comment #10)
> > I wonder whether we would eventually need regex crate to support regex in @-moz-document rules.
>
> We would use the built in regex stuff. regex::Regex are not JS regexes. If
> we were to make that switch I'd prefer to move everything over to
> regex::Regex (it's possible that the regex parser could be configured to
> only accept JS regexes).
What do you mean by "built in regex stuff"? Gecko currently uses SpiderMonkey's regex, which... causes problem like bug 1310335.
But I think we can ask Gecko to evulate @-moz-document rule like what we plan to do for media queries, so we probably don't need any regex stuff for that in the servo side.
Comment 13•8 years ago
|
||
"preexisting", not "built in", sorry :)
Comment 14•8 years ago
|
||
> How much of the env_logger overhead is regexp? If the answer is "most", perhaps we could patch env_logger to make regexp log filters an optional feature?
Yes, the answer is "most" (of the 370 KiB text size, 331 KiB is from regex), and it turns out that it is already an optional feature! Submitted https://github.com/servo/servo/pull/14866 to disable this feature.
Comment 15•8 years ago
|
||
Just for reference, I also measured a build with `opt-level=3` and found that the text segment is 230 KiB larger than the same revision compiled with the current default `opt-level=2`.
Comment 16•8 years ago
|
||
(In reply to Matt Brubeck (:mbrubeck) from comment #14)
> Yes, the answer is "most" (of the 370 KiB text size, 331 KiB is from regex),
I was going to ask whether regex could be slimmed down in any way, but then I checked re2 (https://github.com/google/re2) and libre2.so.0 is about 500KB (~490KB .text), so perhaps 331KB for a regex library isn't really unusual!
Reporter | ||
Comment 17•8 years ago
|
||
(In reply to Matt Brubeck (:mbrubeck) from comment #15)
> Just for reference, I also measured a build with `opt-level=3` and found
> that the text segment is 230 KiB larger than the same revision compiled with
> the current default `opt-level=2`.
Good to know! I filed bug 1328954 to evaluate the performance tradeoff sometime down the line.
Comment 18•8 years ago
|
||
Hi, some things off the top of my head:
- Are you compiling with full debug symbols even though you might only need line tables for backtraces?
- If stylo is compiled into a cdylib or staticlib, did you try compiling with LTO? This might give LLVM a chance to deduplicate some code.
- I'm vaguely aware of some functions generated by #[derive] being unnecessarily big. "doener" and "eddyb" over on the #rustc channel might know more about this.
Comment 19•8 years ago
|
||
Here's the equivalent list for Windows from Chromium's Windows Binary Sizes tools. It looks pretty similar, as you'd expect.
Reporter | ||
Comment 20•8 years ago
|
||
(In reply to michaelwoerister from comment #18)
> Hi, some things off the top of my head:
>
> - Are you compiling with full debug symbols even though you might only need
> line tables for backtraces?
The Gecko build system strips debug symbols, and our measurements are just for text size, so I doubt that impacts us here.
> - If stylo is compiled into a cdylib or staticlib, did you try compiling
> with LTO? This might give LLVM a chance to deduplicate some code.
We are compiling with LTO.
> - I'm vaguely aware of some functions generated by #[derive] being
> unnecessarily big. "doener" and "eddyb" over on the #rustc channel might
> know more about this.
I don't _think_ that #[derive]-ed functions are at the top of our list right now (it's more stuff like a 500k parse function). But check with mbrubeck and Manishearth, who have looked more recently than I.
Comment 21•8 years ago
|
||
(In reply to Matt Brubeck (:mbrubeck) from comment #15)
> Just for reference, I also measured a build with `opt-level=3` and found
> that the text segment is 230 KiB larger than the same revision compiled with
> the current default `opt-level=2`.
In the disassembly I'm seeing a _lot_ of unrolling of memcpy- and memset-like code sequences. I hear that LLVM can be quite eager with this stuff at opt-level 2 and 3.
Just as an experiment, could you try building with opt-level of s or z, to see how much smaller the code could in theory get?
Comment 22•8 years ago
|
||
Ah, I missed the discussion of -Os in comment 0.
Comment 23•8 years ago
|
||
Among other things, https://github.com/servo/rust-cssparser/pull/122 rewrites cssparser::parse_color_keyword to use a rust-phf map instead of a large `match` expression that compiled to an amount of code proportional to the number of keywords. Now there’s a &[(&str, Color)] table backing a static hash map, which takes 3576 bytes (+ storage for the keyword strings) for 149 color keywords.
Reporter | ||
Updated•8 years ago
|
Priority: -- → P5
Comment 24•7 years ago
|
||
Here's a neat visualization (open index.html) of a Stylo-enabled libxul, produced by https://github.com/rongjiecomputer/bloat-win. I haven't figured out how to get it working with "diff" mode to see just the stylo bits, but for that I can use the .txt file.
Reporter | ||
Comment 25•7 years ago
|
||
(In reply to David Major [:dmajor] from comment #24)
> Created attachment 8871389 [details]
> visualization with webtreemap
>
> Here's a neat visualization (open index.html) of a Stylo-enabled libxul,
> produced by https://github.com/rongjiecomputer/bloat-win. I haven't figured
> out how to get it working with "diff" mode to see just the stylo bits, but
> for that I can use the .txt file.
This is awesome!
Updated•7 years ago
|
Reporter | ||
Comment 26•7 years ago
|
||
Looks like I misinterpreted comment 24, and in particular didn't realize that the generated code was accounted for in objdir, rather than in servo/. So the overall codesize regression there is about 4MB. That was a month ago, and glandium suggests that it's currently 5.7MB on linux (though when he measured a build from a month ago, he got 4.6MB, so linux might be slightly worse on this front).
So anway, it's plausible that we're currently looking at a 5MB codesize regression on Windows, which is significant. Luckily dmajor has been digging into this, so hopefully he'll find some wins.
Comment 27•7 years ago
|
||
Do we know how much would be removed from removing the non-stylo CSS stuff from gecko?
Reporter | ||
Comment 28•7 years ago
|
||
from styloDelta.txt, looks like the codesize increase breaks down as follows:
* 790k in servo/
* 270k from third_party/rust
* 2250k from objdir (generated code)
* 155k of standard library code (not in the breakdown, but shows up in the delta).
That gets us to around 3.5M, which is 0.5M short of the total reported in the delta. There's probably some increased size just from various handles getting activated, but 0.5M looks like a lot.
Then again, the 4M was from "raw symbol differences". If I scroll down, I also see:
Merged Sections / Types
Merged Count : 5
--------------------------------------
Increases in Total Count
Total Size Total Count Name
3594061 4046 code
36592 1275 rdata
135 32 thunk
161 17 bss
248 14 data
So maybe 3.6 is more accurate?
Anyway, would be good to redo these measurements.
(In reply to Mike Hommey [:glandium] from comment #27)
> Do we know how much would be removed from removing the non-stylo CSS stuff
> from gecko?
The graph shows about 800k in layout/style, most but not all of which will be removable.
Comment 29•7 years ago
|
||
> Anyway, would be good to redo these measurements.
Here's a new diff based on yesterday's m-c, Win64 build.
File size difference in xul.dll:
77,340,672 with stylo
71,698,432 no stylo
5,642,240 difference
The SymbolSort tool shows almost the same number:
Total Size : 5632000
Of which, some notable directories are:
2979040 1526 e:\m-c\objdir\toolkit\library\x86_64-pc-windows-msvc\release\build\style-9b15ab90f9d7a34a\out\properties.rs
1258803 1007 e:\m-c\servo
294264 263 e:\m-c\third_party\rust
249767 1423 c:\projects\rust\src
112226 1174 e:\m-c\layout\style
Reporter | ||
Comment 30•7 years ago
|
||
Yikes. Seems like we need to do find some way to slim down that generated code.
Comment 31•7 years ago
|
||
It's probably worth checking where the code size is from.
One thing I'm suspecting is the serialization. ToCss could be a significant contributor of the code size, and since to_css is generic, we may unexpectedly double or even triple the code it generates by using multiple different types to get the result. That is one thing probably worth checking.
Comment 32•7 years ago
|
||
(In reply to Xidorn Quan [:xidorn] UTC+10 from comment #31)
There's a breakdown by function in the .txt file.
There's 547k in style::properties::PropertyDeclaration::parse_into (bug 1351737).
There's a total of 908k in functions like "style::properties::substitute_variables_XYZ_slow<closure>".
Grep for "to_css" sums up to only 202k.
Reporter | ||
Comment 33•7 years ago
|
||
(In reply to David Major [:dmajor] from comment #32)
> There's a total of 908k in functions like
> "style::properties::substitute_variables_XYZ_slow<closure>".
!!!!!
That definitely sounds wrong. File a bug and needinfo SimonSapin?
Comment 34•7 years ago
|
||
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #30)
> Yikes. Seems like we need to do find some way to slim down that generated code.
properties.rs seems to go heavy on the "for each CSS property, generate code specific to that property" approach. How feasible would it be to use a more table-based approach, where 200+ codepaths can collapse down to a single generic one that takes different branches based on some bits set in a table somewhere?
Reporter | ||
Comment 35•7 years ago
|
||
(In reply to David Major [:dmajor] from comment #34)
> (In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #30)
> > Yikes. Seems like we need to do find some way to slim down that generated code.
>
> properties.rs seems to go heavy on the "for each CSS property, generate code
> specific to that property" approach. How feasible would it be to use a more
> table-based approach, where 200+ codepaths can collapse down to a single
> generic one that takes different branches based on some bits set in a table
> somewhere?
In general that's the right idea here. We went with the codegen approach because there was a lot of exploration involved in hooking up the value computation to C++, and we had to figure out a lot of the requirements as we went.
Now that most of the code is written, we should try to figure out where the commonalities are and collapse them into shared code (especially in cases that are less performance-sensitive). In order to guide those efforts for maximal impact, it would be really helpful to get a breakdown of the big-ticket categories within the generated code. comment 29 is a great start, especially since it showed that one if the biggest offending categories is CSS variables, which are not at all performance-sensitive. If you can expand on that breakdown, it would be really helpful. Let me know know if you need help navigating the mako templates.
Comment 36•7 years ago
|
||
(In reply to David Major [:dmajor] from comment #34)
> properties.rs seems to go heavy on the "for each CSS property, generate code
> specific to that property" approach. How feasible would it be to use a more
> table-based approach, where 200+ codepaths can collapse down to a single
> generic one that takes different branches based on some bits set in a table
> somewhere?
For what it’s worth, this design is mostly mine from when the goal was to make something work at all in Servo, and I didn’t consider code size at the time. In particular, the goal to encode specified values in the type system with a different Rust type for potentially every CSS property means that there is not an easy way to manipulate just "a property value". There’s a giant Rust `enum` that also encodes which property it is (effectively the property name). And for a long time, PropertyId did not exist and the style system largely avoided needing it by generating per-property functions called in big `match` expressions.
Now that we *are* looking at code size, we can definitely improve on this design. It’s not all-or-nothing, though. I think we should look at each function and try to deduplicate them one by one, for example by taking a PropertyId parameter instead. It’s not gonna be "remove everything and replace it with some bits in a table" in one go.
I’ll work on this. substitute_variables_* (bug 1377262) seems to be a good starting point.
Reporter | ||
Comment 37•7 years ago
|
||
(In reply to Simon Sapin (:SimonSapin) from comment #36)
> I’ll work on this. substitute_variables_* (bug 1377262) seems to be a good
> starting point.
Great - thanks Simon!
Comment 38•7 years ago
|
||
With bug 1377262 landed, Stylo now weighs in at 4.2MB (DLL size).
Here's a new SymbolSort output for anyone curious. I haven't spent too long looking at it yet. The interesting bits start at line 5786, "Increases in Total Size".
I am guessing that the previously-547k PropertyDeclaration::parse_into is now split among:
214140 1 style::properties::LonghandId::parse_value
71669 1 style::properties::PropertyDeclaration::parse_into
(and maybe some other smaller ones further down the list?)
There's still a bunch of animated_properties code near the top (bug 1377594).
Comment 39•7 years ago
|
||
Note: After bug 1386371 lands, LTO for Rust code will be disabled by default in local builds, which causes a significant increase in code size. (Official release builds will still have LTO enabled, as before.)
If you are measuring code size, you will want LTO enabled. (This may also affect other performance measurements.) Add this line to the mozconfig for your optimized builds:
CARGO_RUSTCFLAGS=-C lto
...or this line, which enables Rust LTO along with other options that are used in official release builds but are too slow to enable by default (such as gcc's identical code folding):
ac_add_options --enable-release
Comment 40•7 years ago
|
||
https://github.com/servo/servo/pull/18396 yields a 23.4k win.
Comment 41•7 years ago
|
||
status-firefox57=wontfix unless someone thinks this bug should block 57
status-firefox57:
--- → wontfix
Updated•6 years ago
|
Assignee: mbrubeck → nobody
Updated•2 years ago
|
Severity: normal → S3
You need to log in
before you can comment on or make changes to this bug.
Description
•