1328497 - (stylo-codesize) stylo: Optimize codesize

Reporter

Description

•

8 years ago

I did some investigation today on codesize. I'm writing up the takeaways here while they're in my head. TL;DR: Stylo currently bloats libxul by 3.57M on linux64. This is ok but not great, and can probably be optimized further. Some overhead is inevitable while we're shipping two style systems at once, but we should minimize it. The easiest way to measure codesize on linux is by running |size libxul.so| and looking at the text segment (the data segment doesn't change much with/without stylo). I also investigated passing -Os instead of -O2 to rustc. This wins back about 0.7 megabytes, but I didn't check what impact it had on perf. It also requires nightly Rust, so we'll need to push on stabilization if we decide we want to take that path. Here are dumps of the symbol tables without stylo, with stylo, and with stylo + -Os: https://www.dropbox.com/s/9c8kfj8hm2m227p/readelf-nostylo.txt.bz2?dl=0 https://www.dropbox.com/s/1fztqnkmap1t1xu/readelf-stylo.txt.bz2?dl=0 https://www.dropbox.com/s/j2rk0t7pofpc9lk/stylo-size-with-Os.bz2?dl=0 Nathan did some analysis and found some interesting large-ish stylo functions in the top 100. I'll attach the results to the bug. Some ways to move forward here: * Look at the large functions Nathan found and see if we can make them smaller. A lot of them look like they ought to be a lot smaller. * Look for small-but-numerous auto-generated functions. * Investigate -Os more.

Bobby Holley (:bholley)

Reporter

Comment 1

•

8 years ago

Attached file large functions (size is the third column) — Details

Here are the large functions. Some of these are almost 100k and probably shouldn't be.

Manish Goregaokar [:manishearth]

Comment 2

•

8 years ago

The PropertyDeclaration ones *should* be implemented with a jump table; it is problematic if that's not the case. But given that we have 200-odd properties I wouldn't be surprised if it's large even with a jump table. CSSColor makes sense. GradiantKind::parse_radial does not, but that could be due to a lot of inlining. We're pulling in the entire regex crate because of env_logger. Perhaps we should turn that off and use a smaller logger that hooks into the Firefox logging infra? I wonder why clip_path has a large cascade_property function. Aggressive inlining?

Nicholas Nethercote [inactive]

Comment 3

•

8 years ago

Is there a clear path to removing the old style system? I ask because I've been around long enough to have seen multiple cases where we introduced a new and improved way of doing something, but never got around to getting rid of all the uses of the old way...

Andrew McCreight [:mccr8]

Comment 4

•

8 years ago

(In reply to Nicholas Nethercote [:njn] (PTO until January 9th) from comment #3) > Is there a clear path to removing the old style system? I could be wrong, but I believe that only one is used at a time, so that should not be a problem.

Emilio Cobos Álvarez (:emilio)

Comment 5

•

8 years ago

This should help with one of them: https://github.com/servo/servo/pull/14844 I think I'll need to disassemble, but I think if we care a lot about code size we can make a bunch of the data static and index by discriminant value. I don't know if rust generates that kind of code by default, but if it doesn't we should probably also do it for perf, to avoid unnecessary branches.

Bobby Holley (:bholley)

Reporter

Comment 6

•

8 years ago

(In reply to Andrew McCreight [:mccr8] from comment #4) > (In reply to Nicholas Nethercote [:njn] (PTO until January 9th) from comment > #3) > > Is there a clear path to removing the old style system? > > I could be wrong, but I believe that only one is used at a time, so that > should not be a problem. Yeah - the plan of record is that the old style system should be dead code modulo the pref. Matt said he'd take over investigating this - thanks Matt!

Assignee: nobody → mbrubeck

Matt Brubeck (:mbrubeck)

Comment 7

•

8 years ago

> We're pulling in the entire regex crate because of env_logger. Perhaps we should turn that off and use a smaller logger that hooks into the Firefox logging infra? Removing `env_logger` from the `geckoservo` dependencies reduces the text segment by 370 KiB, so this idea is very promising.

Xidorn Quan [:xidorn] UTC+10

Comment 8

•

8 years ago

(In reply to Manish Goregaokar [:manishearth] from comment #2) > We're pulling in the entire regex crate because of env_logger. Perhaps we > should turn that off and use a smaller logger that hooks into the Firefox > logging infra? I wonder whether we would eventually need regex crate to support regex in @-moz-document rules.

Matt Brubeck (:mbrubeck)

Comment 9

•

8 years ago

Attached file large functions via Bloaty McBloatface — Details

This mostly shows the same functions as in the previous attachment, but in a different format and with slightly different size data.

Matt Brubeck (:mbrubeck)

Updated

•

8 years ago

Attachment #8823898 - Attachment is patch: false

Manish Goregaokar [:manishearth]

Comment 10

•

8 years ago

> I wonder whether we would eventually need regex crate to support regex in @-moz-document rules. We would use the built in regex stuff. regex::Regex are not JS regexes. If we were to make that switch I'd prefer to move everything over to regex::Regex (it's possible that the regex parser could be configured to only accept JS regexes).

Bobby Holley (:bholley)

Reporter

Comment 11

•

8 years ago

(In reply to Matt Brubeck (:mbrubeck) from comment #7) > > We're pulling in the entire regex crate because of env_logger. Perhaps we should turn that off and use a smaller logger that hooks into the Firefox logging infra? > > Removing `env_logger` from the `geckoservo` dependencies reduces the text > segment by 370 KiB, so this idea is very promising. I do use the rust logging stuff pretty heavily when working on the style system, but rarely need the regexp matching. How much of the env_logger overhead is regexp? If the answer is "most", perhaps we could patch env_logger to make regexp log filters an optional feature?

Xidorn Quan [:xidorn] UTC+10

Comment 12

•

8 years ago

(In reply to Manish Goregaokar [:manishearth] from comment #10) > > I wonder whether we would eventually need regex crate to support regex in @-moz-document rules. > > We would use the built in regex stuff. regex::Regex are not JS regexes. If > we were to make that switch I'd prefer to move everything over to > regex::Regex (it's possible that the regex parser could be configured to > only accept JS regexes). What do you mean by "built in regex stuff"? Gecko currently uses SpiderMonkey's regex, which... causes problem like bug 1310335. But I think we can ask Gecko to evulate @-moz-document rule like what we plan to do for media queries, so we probably don't need any regex stuff for that in the servo side.

Manish Goregaokar [:manishearth]

Comment 13

•

8 years ago

"preexisting", not "built in", sorry :)

Matt Brubeck (:mbrubeck)

Comment 14

•

8 years ago

> How much of the env_logger overhead is regexp? If the answer is "most", perhaps we could patch env_logger to make regexp log filters an optional feature? Yes, the answer is "most" (of the 370 KiB text size, 331 KiB is from regex), and it turns out that it is already an optional feature! Submitted https://github.com/servo/servo/pull/14866 to disable this feature.

Matt Brubeck (:mbrubeck)

Comment 15

•

8 years ago

Just for reference, I also measured a build with `opt-level=3` and found that the text segment is 230 KiB larger than the same revision compiled with the current default `opt-level=2`.

Nathan Froyd [:froydnj]

Comment 16

•

8 years ago

(In reply to Matt Brubeck (:mbrubeck) from comment #14) > Yes, the answer is "most" (of the 370 KiB text size, 331 KiB is from regex), I was going to ask whether regex could be slimmed down in any way, but then I checked re2 (https://github.com/google/re2) and libre2.so.0 is about 500KB (~490KB .text), so perhaps 331KB for a regex library isn't really unusual!

Bobby Holley (:bholley)

Reporter

Comment 17

•

8 years ago

(In reply to Matt Brubeck (:mbrubeck) from comment #15) > Just for reference, I also measured a build with `opt-level=3` and found > that the text segment is 230 KiB larger than the same revision compiled with > the current default `opt-level=2`. Good to know! I filed bug 1328954 to evaluate the performance tradeoff sometime down the line.

michaelwoerister

Comment 18

•

8 years ago

Hi, some things off the top of my head: - Are you compiling with full debug symbols even though you might only need line tables for backtraces? - If stylo is compiled into a cdylib or staticlib, did you try compiling with LTO? This might give LLVM a chance to deduplicate some code. - I'm vaguely aware of some functions generated by #[derive] being unnecessarily big. "doener" and "eddyb" over on the #rustc channel might know more about this.

(Away)

Comment 19

•

8 years ago

Attached file large functions on windows — Details

Here's the equivalent list for Windows from Chromium's Windows Binary Sizes tools. It looks pretty similar, as you'd expect.

Bobby Holley (:bholley)

Reporter

Comment 20

•

8 years ago

(In reply to michaelwoerister from comment #18) > Hi, some things off the top of my head: > > - Are you compiling with full debug symbols even though you might only need > line tables for backtraces? The Gecko build system strips debug symbols, and our measurements are just for text size, so I doubt that impacts us here. > - If stylo is compiled into a cdylib or staticlib, did you try compiling > with LTO? This might give LLVM a chance to deduplicate some code. We are compiling with LTO. > - I'm vaguely aware of some functions generated by #[derive] being > unnecessarily big. "doener" and "eddyb" over on the #rustc channel might > know more about this. I don't _think_ that #[derive]-ed functions are at the top of our list right now (it's more stuff like a 500k parse function). But check with mbrubeck and Manishearth, who have looked more recently than I.

(Away)

Comment 21

•

8 years ago

(In reply to Matt Brubeck (:mbrubeck) from comment #15) > Just for reference, I also measured a build with `opt-level=3` and found > that the text segment is 230 KiB larger than the same revision compiled with > the current default `opt-level=2`. In the disassembly I'm seeing a _lot_ of unrolling of memcpy- and memset-like code sequences. I hear that LLVM can be quite eager with this stuff at opt-level 2 and 3. Just as an experiment, could you try building with opt-level of s or z, to see how much smaller the code could in theory get?

(Away)

Comment 22

•

8 years ago

Ah, I missed the discussion of -Os in comment 0.

Simon Sapin (:SimonSapin)

Comment 23

•

8 years ago

Among other things, https://github.com/servo/rust-cssparser/pull/122 rewrites cssparser::parse_color_keyword to use a rust-phf map instead of a large `match` expression that compiled to an amount of code proportional to the number of keywords. Now there’s a &[(&str, Color)] table backing a static hash map, which takes 3576 bytes (+ storage for the keyword strings) for 149 color keywords.

Bobby Holley (:bholley)

Reporter

Updated

•

8 years ago

Depends on: 1345207

Bobby Holley (:bholley)

Reporter

Updated

•

8 years ago

Depends on: 1345688

Bobby Holley (:bholley)

Reporter

Updated

•

8 years ago

Priority: -- → P5

Bobby Holley (:bholley)

Reporter

Updated

•

8 years ago

Depends on: 1349438

Nathan Froyd [:froydnj]

Updated

•

8 years ago

Depends on: 1351737

(Away)

Comment 24

•

8 years ago

Attached file visualization with webtreemap — Details

Here's a neat visualization (open index.html) of a Stylo-enabled libxul, produced by https://github.com/rongjiecomputer/bloat-win. I haven't figured out how to get it working with "diff" mode to see just the stylo bits, but for that I can use the .txt file.

Bobby Holley (:bholley)

Reporter

Comment 25

•

8 years ago

(In reply to David Major [:dmajor] from comment #24) > Created attachment 8871389 [details] > visualization with webtreemap > > Here's a neat visualization (open index.html) of a Stylo-enabled libxul, > produced by https://github.com/rongjiecomputer/bloat-win. I haven't figured > out how to get it working with "diff" mode to see just the stylo bits, but > for that I can use the .txt file. This is awesome!

(Away)

Updated

•

7 years ago

Depends on: 1369420

Chris Peterson [:cpeterson]

Updated

•

7 years ago

Blocks: stylo-release
No longer blocks: stylo

Bobby Holley (:bholley)

Reporter

Comment 26

•

7 years ago

Looks like I misinterpreted comment 24, and in particular didn't realize that the generated code was accounted for in objdir, rather than in servo/. So the overall codesize regression there is about 4MB. That was a month ago, and glandium suggests that it's currently 5.7MB on linux (though when he measured a build from a month ago, he got 4.6MB, so linux might be slightly worse on this front). So anway, it's plausible that we're currently looking at a 5MB codesize regression on Windows, which is significant. Luckily dmajor has been digging into this, so hopefully he'll find some wins.

Mike Hommey [:glandium]

Comment 27

•

7 years ago

Do we know how much would be removed from removing the non-stylo CSS stuff from gecko?

Bobby Holley (:bholley)

Reporter

Comment 28

•

7 years ago

from styloDelta.txt, looks like the codesize increase breaks down as follows: * 790k in servo/ * 270k from third_party/rust * 2250k from objdir (generated code) * 155k of standard library code (not in the breakdown, but shows up in the delta). That gets us to around 3.5M, which is 0.5M short of the total reported in the delta. There's probably some increased size just from various handles getting activated, but 0.5M looks like a lot. Then again, the 4M was from "raw symbol differences". If I scroll down, I also see: Merged Sections / Types Merged Count : 5 -------------------------------------- Increases in Total Count Total Size Total Count Name 3594061 4046 code 36592 1275 rdata 135 32 thunk 161 17 bss 248 14 data So maybe 3.6 is more accurate? Anyway, would be good to redo these measurements. (In reply to Mike Hommey [:glandium] from comment #27) > Do we know how much would be removed from removing the non-stylo CSS stuff > from gecko? The graph shows about 800k in layout/style, most but not all of which will be removable.

(Away)

Comment 29

•

7 years ago

Attached file styloDiffJun28.txt — Details

> Anyway, would be good to redo these measurements. Here's a new diff based on yesterday's m-c, Win64 build. File size difference in xul.dll: 77,340,672 with stylo 71,698,432 no stylo 5,642,240 difference The SymbolSort tool shows almost the same number: Total Size : 5632000 Of which, some notable directories are: 2979040 1526 e:\m-c\objdir\toolkit\library\x86_64-pc-windows-msvc\release\build\style-9b15ab90f9d7a34a\out\properties.rs 1258803 1007 e:\m-c\servo 294264 263 e:\m-c\third_party\rust 249767 1423 c:\projects\rust\src 112226 1174 e:\m-c\layout\style

Bobby Holley (:bholley)

Reporter

Comment 30

•

7 years ago

Yikes. Seems like we need to do find some way to slim down that generated code.

Xidorn Quan [:xidorn] UTC+10

Comment 31

•

7 years ago

It's probably worth checking where the code size is from. One thing I'm suspecting is the serialization. ToCss could be a significant contributor of the code size, and since to_css is generic, we may unexpectedly double or even triple the code it generates by using multiple different types to get the result. That is one thing probably worth checking.

(Away)

Comment 32

•

7 years ago

(In reply to Xidorn Quan [:xidorn] UTC+10 from comment #31) There's a breakdown by function in the .txt file. There's 547k in style::properties::PropertyDeclaration::parse_into (bug 1351737). There's a total of 908k in functions like "style::properties::substitute_variables_XYZ_slow<closure>". Grep for "to_css" sums up to only 202k.

Bobby Holley (:bholley)

Reporter

Comment 33

•

7 years ago

(In reply to David Major [:dmajor] from comment #32) > There's a total of 908k in functions like > "style::properties::substitute_variables_XYZ_slow<closure>". !!!!! That definitely sounds wrong. File a bug and needinfo SimonSapin?

(Away)

Updated

•

7 years ago

Depends on: 1377262

(Away)

Comment 34

•

7 years ago

(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #30) > Yikes. Seems like we need to do find some way to slim down that generated code. properties.rs seems to go heavy on the "for each CSS property, generate code specific to that property" approach. How feasible would it be to use a more table-based approach, where 200+ codepaths can collapse down to a single generic one that takes different branches based on some bits set in a table somewhere?

Bobby Holley (:bholley)

Reporter

Comment 35

•

7 years ago

(In reply to David Major [:dmajor] from comment #34) > (In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #30) > > Yikes. Seems like we need to do find some way to slim down that generated code. > > properties.rs seems to go heavy on the "for each CSS property, generate code > specific to that property" approach. How feasible would it be to use a more > table-based approach, where 200+ codepaths can collapse down to a single > generic one that takes different branches based on some bits set in a table > somewhere? In general that's the right idea here. We went with the codegen approach because there was a lot of exploration involved in hooking up the value computation to C++, and we had to figure out a lot of the requirements as we went. Now that most of the code is written, we should try to figure out where the commonalities are and collapse them into shared code (especially in cases that are less performance-sensitive). In order to guide those efforts for maximal impact, it would be really helpful to get a breakdown of the big-ticket categories within the generated code. comment 29 is a great start, especially since it showed that one if the biggest offending categories is CSS variables, which are not at all performance-sensitive. If you can expand on that breakdown, it would be really helpful. Let me know know if you need help navigating the mako templates.

Simon Sapin (:SimonSapin)

Comment 36

•

7 years ago

(In reply to David Major [:dmajor] from comment #34) > properties.rs seems to go heavy on the "for each CSS property, generate code > specific to that property" approach. How feasible would it be to use a more > table-based approach, where 200+ codepaths can collapse down to a single > generic one that takes different branches based on some bits set in a table > somewhere? For what it’s worth, this design is mostly mine from when the goal was to make something work at all in Servo, and I didn’t consider code size at the time. In particular, the goal to encode specified values in the type system with a different Rust type for potentially every CSS property means that there is not an easy way to manipulate just "a property value". There’s a giant Rust `enum` that also encodes which property it is (effectively the property name). And for a long time, PropertyId did not exist and the style system largely avoided needing it by generating per-property functions called in big `match` expressions. Now that we *are* looking at code size, we can definitely improve on this design. It’s not all-or-nothing, though. I think we should look at each function and try to deduplicate them one by one, for example by taking a PropertyId parameter instead. It’s not gonna be "remove everything and replace it with some bits in a table" in one go. I’ll work on this. substitute_variables_* (bug 1377262) seems to be a good starting point.

Bobby Holley (:bholley)

Reporter

Comment 37

•

7 years ago

(In reply to Simon Sapin (:SimonSapin) from comment #36) > I’ll work on this. substitute_variables_* (bug 1377262) seems to be a good > starting point. Great - thanks Simon!

(Away)

Updated

•

7 years ago

Depends on: 1377594

(Away)

Comment 38

•

7 years ago

Attached file styloDiffJul14.txt — Details

With bug 1377262 landed, Stylo now weighs in at 4.2MB (DLL size). Here's a new SymbolSort output for anyone curious. I haven't spent too long looking at it yet. The interesting bits start at line 5786, "Increases in Total Size". I am guessing that the previously-547k PropertyDeclaration::parse_into is now split among: 214140 1 style::properties::LonghandId::parse_value 71669 1 style::properties::PropertyDeclaration::parse_into (and maybe some other smaller ones further down the list?) There's still a bunch of animated_properties code near the top (bug 1377594).

Matt Brubeck (:mbrubeck)

Comment 39

•

7 years ago

Note: After bug 1386371 lands, LTO for Rust code will be disabled by default in local builds, which causes a significant increase in code size. (Official release builds will still have LTO enabled, as before.) If you are measuring code size, you will want LTO enabled. (This may also affect other performance measurements.) Add this line to the mozconfig for your optimized builds: CARGO_RUSTCFLAGS=-C lto ...or this line, which enables Rust LTO along with other options that are used in official release builds but are too slow to enable by default (such as gcc's identical code folding): ac_add_options --enable-release

Josh Matthews [:jdm]

Comment 40

•

7 years ago

https://github.com/servo/servo/pull/18396 yields a 23.4k win.

Nathan Froyd [:froydnj]

Updated

•

7 years ago

Depends on: 1397386

Chris Peterson [:cpeterson]

Comment 41

•

7 years ago

status-firefox57=wontfix unless someone thinks this bug should block 57

status-firefox57: --- → wontfix

Chris Peterson [:cpeterson]

Updated

•

7 years ago

Depends on: 1408300

Matt Brubeck (:mbrubeck)

Updated

•

7 years ago

Depends on: 1444097

Matt Brubeck (:mbrubeck)

Updated

•

6 years ago

Assignee: mbrubeck → nobody

BMO Automation

Updated

•

2 years ago

Severity: normal → S3

large functions (size is the third column) 8 years ago Bobby Holley (:bholley) 2.87 KB, text/plain		Details
large functions via Bloaty McBloatface 8 years ago Matt Brubeck (:mbrubeck) 2.17 KB, text/plain		Details
large functions on windows 8 years ago (Away) 7.82 KB, text/plain		Details
visualization with webtreemap 8 years ago (Away) 305.52 KB, application/x-zip-compressed		Details
styloDiffJun28.txt 7 years ago (Away) 993.15 KB, text/plain		Details
styloDiffJul14.txt 7 years ago (Away) 1.01 MB, text/plain		Details