Open Bug 1328497 (stylo-codesize) Opened 7 years ago Updated 2 years ago

stylo: Optimize codesize

Categories

(Core :: CSS Parsing and Computation, defect, P5)

defect

Tracking

()

Tracking Status
firefox57 --- wontfix

People

(Reporter: bholley, Unassigned)

References

(Depends on 2 open bugs)

Details

Attachments

(6 files)

I did some investigation today on codesize. I'm writing up the takeaways here while they're in my head.

TL;DR: Stylo currently bloats libxul by 3.57M on linux64. This is ok but not great, and can probably be optimized further. Some overhead is inevitable while we're shipping two style systems at once, but we should minimize it.

The easiest way to measure codesize on linux is by running |size libxul.so| and looking at the text segment (the data segment doesn't change much with/without stylo).

I also investigated passing -Os instead of -O2 to rustc. This wins back about 0.7 megabytes, but I didn't check what impact it had on perf. It also requires nightly Rust, so we'll need to push on stabilization if we decide we want to take that path.

Here are dumps of the symbol tables without stylo, with stylo, and with stylo + -Os:

https://www.dropbox.com/s/9c8kfj8hm2m227p/readelf-nostylo.txt.bz2?dl=0
https://www.dropbox.com/s/1fztqnkmap1t1xu/readelf-stylo.txt.bz2?dl=0
https://www.dropbox.com/s/j2rk0t7pofpc9lk/stylo-size-with-Os.bz2?dl=0

Nathan did some analysis and found some interesting large-ish stylo functions in the top 100. I'll attach the results to the bug.

Some ways to move forward here:
* Look at the large functions Nathan found and see if we can make them smaller. A lot of them look like they ought to be a lot smaller.
* Look for small-but-numerous auto-generated functions.
* Investigate -Os more.
Here are the large functions. Some of these are almost 100k and probably shouldn't be.
The PropertyDeclaration ones *should* be implemented with a jump table; it is problematic if that's not the case. But given that we have 200-odd properties I wouldn't be surprised if it's large even with a jump table.

CSSColor makes sense. GradiantKind::parse_radial does not, but that could be due to a lot of inlining.

We're pulling in the entire regex crate because of env_logger. Perhaps we should turn that off and use a smaller logger that hooks into the Firefox logging infra?


I wonder why clip_path has a large cascade_property function. Aggressive inlining?
Is there a clear path to removing the old style system? I ask because I've been around long enough to have seen multiple cases where we introduced a new and improved way of doing something, but never got around to getting rid of all the uses of the old way...
(In reply to Nicholas Nethercote [:njn] (PTO until January 9th) from comment #3)
> Is there a clear path to removing the old style system?

I could be wrong, but I believe that only one is used at a time, so that should not be a problem.
This should help with one of them: https://github.com/servo/servo/pull/14844

I think I'll need to disassemble, but I think if we care a lot about code size we can make a bunch of the data static and index by discriminant value.

I don't know if rust generates that kind of code by default, but if it doesn't we should probably also do it for perf, to avoid unnecessary branches.
(In reply to Andrew McCreight [:mccr8] from comment #4)
> (In reply to Nicholas Nethercote [:njn] (PTO until January 9th) from comment
> #3)
> > Is there a clear path to removing the old style system?
> 
> I could be wrong, but I believe that only one is used at a time, so that
> should not be a problem.

Yeah - the plan of record is that the old style system should be dead code modulo the pref.

Matt said he'd take over investigating this - thanks Matt!
Assignee: nobody → mbrubeck
> We're pulling in the entire regex crate because of env_logger. Perhaps we should turn that off and use a smaller logger that hooks into the Firefox logging infra?

Removing `env_logger` from the `geckoservo` dependencies reduces the text segment by 370 KiB, so this idea is very promising.
(In reply to Manish Goregaokar [:manishearth] from comment #2)
> We're pulling in the entire regex crate because of env_logger. Perhaps we
> should turn that off and use a smaller logger that hooks into the Firefox
> logging infra?

I wonder whether we would eventually need regex crate to support regex in @-moz-document rules.
This mostly shows the same functions as in the previous attachment, but in a different format and with slightly different size data.
Attachment #8823898 - Attachment is patch: false
> I wonder whether we would eventually need regex crate to support regex in @-moz-document rules.

We would use the built in regex stuff. regex::Regex are not JS regexes. If we were to make that switch I'd prefer to move everything over to regex::Regex (it's possible that the regex parser could be configured to only accept JS regexes).
(In reply to Matt Brubeck (:mbrubeck) from comment #7)
> > We're pulling in the entire regex crate because of env_logger. Perhaps we should turn that off and use a smaller logger that hooks into the Firefox logging infra?
> 
> Removing `env_logger` from the `geckoservo` dependencies reduces the text
> segment by 370 KiB, so this idea is very promising.

I do use the rust logging stuff pretty heavily when working on the style system, but rarely need the regexp matching. How much of the env_logger overhead is regexp? If the answer is "most", perhaps we could patch env_logger to make regexp log filters an optional feature?
(In reply to Manish Goregaokar [:manishearth] from comment #10)
> > I wonder whether we would eventually need regex crate to support regex in @-moz-document rules.
> 
> We would use the built in regex stuff. regex::Regex are not JS regexes. If
> we were to make that switch I'd prefer to move everything over to
> regex::Regex (it's possible that the regex parser could be configured to
> only accept JS regexes).

What do you mean by "built in regex stuff"? Gecko currently uses SpiderMonkey's regex, which... causes problem like bug 1310335.

But I think we can ask Gecko to evulate @-moz-document rule like what we plan to do for media queries, so we probably don't need any regex stuff for that in the servo side.
"preexisting", not "built in", sorry :)
> How much of the env_logger overhead is regexp? If the answer is "most", perhaps we could patch env_logger to make regexp log filters an optional feature?

Yes, the answer is "most" (of the 370 KiB text size, 331 KiB is from regex), and it turns out that it is already an optional feature!  Submitted https://github.com/servo/servo/pull/14866 to disable this feature.
Just for reference, I also measured a build with `opt-level=3` and found that the text segment is 230 KiB larger than the same revision compiled with the current default `opt-level=2`.
(In reply to Matt Brubeck (:mbrubeck) from comment #14)
> Yes, the answer is "most" (of the 370 KiB text size, 331 KiB is from regex),

I was going to ask whether regex could be slimmed down in any way, but then I checked re2 (https://github.com/google/re2) and libre2.so.0 is about 500KB (~490KB .text), so perhaps 331KB for a regex library isn't really unusual!
(In reply to Matt Brubeck (:mbrubeck) from comment #15)
> Just for reference, I also measured a build with `opt-level=3` and found
> that the text segment is 230 KiB larger than the same revision compiled with
> the current default `opt-level=2`.

Good to know! I filed bug 1328954 to evaluate the performance tradeoff sometime down the line.
Hi, some things off the top of my head:

- Are you compiling with full debug symbols even though you might only need line tables for backtraces?
- If stylo is compiled into a cdylib or staticlib, did you try compiling with LTO? This might give LLVM a chance to deduplicate some code.
- I'm vaguely aware of some functions generated by #[derive] being unnecessarily big. "doener" and "eddyb" over on the #rustc channel might know more about this.
Here's the equivalent list for Windows from Chromium's Windows Binary Sizes tools. It looks pretty similar, as you'd expect.
(In reply to michaelwoerister from comment #18)
> Hi, some things off the top of my head:
> 
> - Are you compiling with full debug symbols even though you might only need
> line tables for backtraces?

The Gecko build system strips debug symbols, and our measurements are just for text size, so I doubt that impacts us here.

> - If stylo is compiled into a cdylib or staticlib, did you try compiling
> with LTO? This might give LLVM a chance to deduplicate some code.

We are compiling with LTO.

> - I'm vaguely aware of some functions generated by #[derive] being
> unnecessarily big. "doener" and "eddyb" over on the #rustc channel might
> know more about this.

I don't _think_ that #[derive]-ed functions are at the top of our list right now (it's more stuff like a 500k parse function). But check with mbrubeck and Manishearth, who have looked more recently than I.
(In reply to Matt Brubeck (:mbrubeck) from comment #15)
> Just for reference, I also measured a build with `opt-level=3` and found
> that the text segment is 230 KiB larger than the same revision compiled with
> the current default `opt-level=2`.

In the disassembly I'm seeing a _lot_ of unrolling of memcpy- and memset-like code sequences. I hear that LLVM can be quite eager with this stuff at opt-level 2 and 3.

Just as an experiment, could you try building with opt-level of s or z, to see how much smaller the code could in theory get?
Ah, I missed the discussion of -Os in comment 0.
Among other things, https://github.com/servo/rust-cssparser/pull/122 rewrites cssparser::parse_color_keyword to use a rust-phf map instead of a large `match` expression that compiled to an amount of code proportional to the number of keywords. Now there’s a &[(&str, Color)] table backing a static hash map, which takes 3576 bytes (+ storage for the keyword strings) for 149 color keywords.
Depends on: 1345207
Depends on: 1345688
Priority: -- → P5
Depends on: 1349438
Depends on: 1351737
Here's a neat visualization (open index.html) of a Stylo-enabled libxul, produced by https://github.com/rongjiecomputer/bloat-win. I haven't figured out how to get it working with "diff" mode to see just the stylo bits, but for that I can use the .txt file.
(In reply to David Major [:dmajor] from comment #24)
> Created attachment 8871389 [details]
> visualization with webtreemap
> 
> Here's a neat visualization (open index.html) of a Stylo-enabled libxul,
> produced by https://github.com/rongjiecomputer/bloat-win. I haven't figured
> out how to get it working with "diff" mode to see just the stylo bits, but
> for that I can use the .txt file.

This is awesome!
Depends on: 1369420
Blocks: stylo-release
No longer blocks: stylo
Looks like I misinterpreted comment 24, and in particular didn't realize that the generated code was accounted for in objdir, rather than in servo/. So the overall codesize regression there is about 4MB. That was a month ago, and glandium suggests that it's currently 5.7MB on linux (though when he measured a build from a month ago, he got 4.6MB, so linux might be slightly worse on this front).

So anway, it's plausible that we're currently looking at a 5MB codesize regression on Windows, which is significant. Luckily dmajor has been digging into this, so hopefully he'll find some wins.
Do we know how much would be removed from removing the non-stylo CSS stuff from gecko?
from styloDelta.txt, looks like the codesize increase breaks down as follows:
* 790k in servo/
* 270k from third_party/rust
* 2250k from objdir (generated code)
* 155k of standard library code (not in the breakdown, but shows up in the delta).

That gets us to around 3.5M, which is 0.5M short of the total reported in the delta. There's probably some increased size just from various handles getting activated, but 0.5M looks like a lot.

Then again, the 4M was from "raw symbol differences". If I scroll down, I also see:

Merged Sections / Types
Merged Count  : 5
--------------------------------------
Increases in Total Count
  Total Size  Total Count  Name
     3594061         4046  code
       36592         1275  rdata
         135           32  thunk
         161           17  bss
         248           14  data


So maybe 3.6 is more accurate?

Anyway, would be good to redo these measurements.


(In reply to Mike Hommey [:glandium] from comment #27)
> Do we know how much would be removed from removing the non-stylo CSS stuff
> from gecko?


The graph shows about 800k in layout/style, most but not all of which will be removable.
Attached file styloDiffJun28.txt
> Anyway, would be good to redo these measurements.

Here's a new diff based on yesterday's m-c, Win64 build.

File size difference in xul.dll:
77,340,672 with stylo
71,698,432 no stylo
 5,642,240 difference

The SymbolSort tool shows almost the same number:
Total Size   : 5632000

Of which, some notable directories are:
     2979040    1526  e:\m-c\objdir\toolkit\library\x86_64-pc-windows-msvc\release\build\style-9b15ab90f9d7a34a\out\properties.rs
     1258803    1007  e:\m-c\servo
      294264     263  e:\m-c\third_party\rust
      249767    1423  c:\projects\rust\src
      112226    1174  e:\m-c\layout\style
Yikes. Seems like we need to do find some way to slim down that generated code.
It's probably worth checking where the code size is from.

One thing I'm suspecting is the serialization. ToCss could be a significant contributor of the code size, and since to_css is generic, we may unexpectedly double or even triple the code it generates by using multiple different types to get the result. That is one thing probably worth checking.
(In reply to Xidorn Quan [:xidorn] UTC+10 from comment #31)
There's a breakdown by function in the .txt file.

There's 547k in style::properties::PropertyDeclaration::parse_into (bug 1351737).
There's a total of 908k in functions like "style::properties::substitute_variables_XYZ_slow<closure>".
Grep for "to_css" sums up to only 202k.
(In reply to David Major [:dmajor] from comment #32)
> There's a total of 908k in functions like
> "style::properties::substitute_variables_XYZ_slow<closure>".

!!!!!

That definitely sounds wrong. File a bug and needinfo SimonSapin?
Depends on: 1377262
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #30)
> Yikes. Seems like we need to do find some way to slim down that generated code.

properties.rs seems to go heavy on the "for each CSS property, generate code specific to that property" approach. How feasible would it be to use a more table-based approach, where 200+ codepaths can collapse down to a single generic one that takes different branches based on some bits set in a table somewhere?
(In reply to David Major [:dmajor] from comment #34)
> (In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #30)
> > Yikes. Seems like we need to do find some way to slim down that generated code.
> 
> properties.rs seems to go heavy on the "for each CSS property, generate code
> specific to that property" approach. How feasible would it be to use a more
> table-based approach, where 200+ codepaths can collapse down to a single
> generic one that takes different branches based on some bits set in a table
> somewhere?

In general that's the right idea here. We went with the codegen approach because there was a lot of exploration involved in hooking up the value computation to C++, and we had to figure out a lot of the requirements as we went.

Now that most of the code is written, we should try to figure out where the commonalities are and collapse them into shared code (especially in cases that are less performance-sensitive). In order to guide those efforts for maximal impact, it would be really helpful to get a breakdown of the big-ticket categories within the generated code. comment 29 is a great start, especially since it showed that one if the biggest offending categories is CSS variables, which are not at all performance-sensitive. If you can expand on that breakdown, it would be really helpful. Let me know know if you need help navigating the mako templates.
(In reply to David Major [:dmajor] from comment #34)
> properties.rs seems to go heavy on the "for each CSS property, generate code
> specific to that property" approach. How feasible would it be to use a more
> table-based approach, where 200+ codepaths can collapse down to a single
> generic one that takes different branches based on some bits set in a table
> somewhere?

For what it’s worth, this design is mostly mine from when the goal was to make something work at all in Servo, and I didn’t consider code size at the time. In particular, the goal to encode specified values in the type system with a different Rust type for potentially every CSS property means that there is not an easy way to manipulate just "a property value". There’s a giant Rust `enum` that also encodes which property it is (effectively the property name). And for a long time, PropertyId did not exist and the style system largely avoided needing it by generating per-property functions called in big `match` expressions.

Now that we *are* looking at code size, we can definitely improve on this design. It’s not all-or-nothing, though. I think we should look at each function and try to deduplicate them one by one, for example by taking a PropertyId parameter instead. It’s not gonna be "remove everything and replace it with some bits in a table" in one go.

I’ll work on this. substitute_variables_* (bug 1377262) seems to be a good starting point.
(In reply to Simon Sapin (:SimonSapin) from comment #36)
> I’ll work on this. substitute_variables_* (bug 1377262) seems to be a good
> starting point.

Great - thanks Simon!
Depends on: 1377594
Attached file styloDiffJul14.txt
With bug 1377262 landed, Stylo now weighs in at 4.2MB (DLL size).

Here's a new SymbolSort output for anyone curious. I haven't spent too long looking at it yet. The interesting bits start at line 5786, "Increases in Total Size".

I am guessing that the previously-547k PropertyDeclaration::parse_into is now split among:
      214140            1  style::properties::LonghandId::parse_value
       71669            1  style::properties::PropertyDeclaration::parse_into
(and maybe some other smaller ones further down the list?)

There's still a bunch of animated_properties code near the top (bug 1377594).
Note: After bug 1386371 lands, LTO for Rust code will be disabled by default in local builds, which causes a significant increase in code size.  (Official release builds will still have LTO enabled, as before.)

If you are measuring code size, you will want LTO enabled.  (This may also affect other performance measurements.)  Add this line to the mozconfig for your optimized builds:

CARGO_RUSTCFLAGS=-C lto

...or this line, which enables Rust LTO along with other options that are used in official release builds but are too slow to enable by default (such as gcc's identical code folding):

ac_add_options --enable-release
Depends on: 1397386
status-firefox57=wontfix unless someone thinks this bug should block 57
Depends on: 1408300
Depends on: 1444097
Assignee: mbrubeck → nobody
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: