Closed
Bug 1351737
Opened 7 years ago
Closed 6 years ago
stylo: Large code size for property value parsing
Categories
(Core :: CSS Parsing and Computation, defect, P2)
Core
CSS Parsing and Computation
Tracking
()
RESOLVED
WORKSFORME
Tracking | Status | |
---|---|---|
firefox-esr52 | --- | unaffected |
firefox55 | --- | unaffected |
firefox56 | --- | unaffected |
firefox57 | --- | wontfix |
firefox58 | --- | wontfix |
firefox59 | --- | affected |
People
(Reporter: froydnj, Assigned: bholley)
References
(Depends on 3 open bugs, Blocks 1 open bug)
Details
Attachments
(1 file)
107.04 KB,
text/plain
|
Details |
A recent stylo build has the following in readelf output: 77977: 0000000003426fb0 0x49b0c FUNC LOCAL DEFAULT 12 _ZN5style10properties17ParsedDeclaration5parse17h038457228acc1e74E 77979: 000000000329e760 0x19a87 FUNC LOCAL DEFAULT 12 _ZN5style10properties17ParsedDeclaration6expand17h63f4d482f0751482E 77980: 00000000033f7ca0 0x190af FUNC LOCAL DEFAULT 12 _ZN5style10properties17ParsedDeclaration6expand17h75e46498d8a31402E 77978: 0000000003471490 95917 FUNC LOCAL DEFAULT 12 _ZN5style10properties17ParsedDeclaration6expand17h499a60559f4b27b0E 77981: 0000000003239d20 95581 FUNC LOCAL DEFAULT 12 _ZN5style10properties17ParsedDeclaration6expand17hae77ba661952b0e7E 80390: 00000000032628b0 90911 FUNC LOCAL DEFAULT 12 _ZN79_$LT$style..properties..PropertyDeclaration$u20$as$u20$core..cmp..PartialEq$GT$2eq17hf5d3ab3789e43715E 83820: 0000000004966de8 87888 OBJECT LOCAL DEFAULT 23 ref.Yf 80391: 0000000003411020 82195 FUNC LOCAL DEFAULT 12 _ZN79_$LT$style..properties..PropertyDeclaration$u20$as$u20$core..cmp..PartialEq$GT$2eq17hf5d3ab3789e43715E.911 Ignoring the ref.Yf bit, that's ~1MB of code just for 7 functions. There's got to be some bloat that be shaved off there. ParsedDeclaration::expand is probably the easiest place to start; why is its code being duplicated four times?
Comment 1•7 years ago
|
||
We have four of those because we're passing a generic push function to that thing. So it gets monomorphized four times. Perhaps we should accept a boxed closure? This means that the closure will no longer be inlined (which can get us some perf benefits like eliding capacity checks, but last I checked that wasn't happening *anyway*; ¯\_(ツ)_/¯ llvm). Also, not inlining the function will be a perf benefit anyway; but come to think of it LLVM should have realized the closure was being called a million times and not inlined it, but last I checked it wasn't, so we had the worst of both worlds; ¯\_(ツ)_/¯ llvm.
Comment 2•7 years ago
|
||
> We have four of those because we're passing a generic push function to that thing. So it gets monomorphized four times. Perhaps we should accept a boxed closure? That sounds overkill to me. Also, some of them are rather hot paths, so I don't think we'd like to malloc in there. https://github.com/servo/servo/pull/16182 gets rid of one of the copies.
Comment 3•7 years ago
|
||
(In reply to Emilio Cobos Álvarez [:emilio] from comment #2) > > We have four of those because we're passing a generic push function to that thing. So it gets monomorphized four times. Perhaps we should accept a boxed closure? > > That sounds overkill to me. Also, some of them are rather hot paths, so I > don't think we'd like to malloc in there. > https://github.com/servo/servo/pull/16182 gets rid of one of the copies. Actually, it gets rid of two of them.
Comment 4•7 years ago
|
||
These closures are all very similar. I submitted https://github.com/servo/servo/pull/16183 to make the expand method non-generic, so it should only appear once in generated code.
Comment 5•7 years ago
|
||
> Also, some of them are rather hot paths, so I don't think we'd like to malloc in there.
Pass a reference to a closure then :)
Updated•7 years ago
|
Summary: Stylo's ParsedDeclaration and PropertyDeclaration implementations are enormous → stylo: ParsedDeclaration and PropertyDeclaration implementations are enormous
Assignee | ||
Comment 6•7 years ago
|
||
Nathan, can you remeasure? How much did we win with comment 4?
Priority: -- → P4
Reporter | ||
Comment 7•7 years ago
|
||
Comment 4's pull request made things much better: 79332: 000000000347ca90 77836 FUNC LOCAL DEFAULT 12 _ZN5style10properties17ParsedDeclaration11expand_into17hd122e59b433e61b9E We only have one copy of the expansion code and as a bonus, it's smaller than any of the other previous instances (maybe we also removed some code in the intervening time?). ParsedDeclaration::parse is still huge, and has gotten huger over the intervening time, for whatever reason: 79333: 000000000349f860 0x4feb8 FUNC LOCAL DEFAULT 12 _ZN5style10properties17ParsedDeclaration5parse17h643988f121904484E About a 20K increase over the past week.
Assignee | ||
Comment 8•7 years ago
|
||
Ok. It may be that we just added a bunch of parsing code in the interim, and it's all landing in the same function, I'm not sure. Do you think it's worth closing this bug, or is there more investigation to do here?
Reporter | ||
Comment 9•7 years ago
|
||
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #8) > Do you think it's worth closing this bug, or is there more investigation to > do here? I think we should re-purpose the bug to look at ParsedDeclaration::parse; I have a hard time believing even an auto-generated function needs to be 300K+. Gecko's CSSParserImpl and what looks like supporting code weighs in at ~120K or so.
Summary: stylo: ParsedDeclaration and PropertyDeclaration implementations are enormous → stylo: ParsedDeclaration::parse is enormous
Comment 10•7 years ago
|
||
In a recent Win64 build the largest function by far is style::properties::PropertyDeclaration::parse_into, at 544K. Is that the successor to ParsedDeclaration::parse? In other words, should we continue with it here in this bug?
Flags: needinfo?(bobbyholley)
Assignee | ||
Comment 11•7 years ago
|
||
Redirecting this to Simon.
Flags: needinfo?(bobbyholley) → needinfo?(simon.sapin)
Comment 12•7 years ago
|
||
Yes, the named changed when the signature changed but it’s basically still the same method. grep counts 498 instances of #[inline] under components/style/properties and components/style/values combined. Some of these might be in Mako templates that end up generating multiple functions or methods. I think most of those were added out of habit more than for any specific reason. If some of those inlined functions or methods are called many times, they would end up duplicated in the machine code. We might want to consider removing all of them, see if it regressed parsing performance (run `./mach gtest Stylo*` for a benchmark), and if it does try to recover performance by adding back a small number of them. The default (no inlining-related attribute) is not #[inline(never)], it leaves LLVM’s heuristics to their default behavior.
Flags: needinfo?(simon.sapin)
Summary: stylo: ParsedDeclaration::parse is enormous → stylo: ParsedDeclaration::parse_into is enormous
Comment 13•7 years ago
|
||
By the way, it would be nice to decide a general guideline on when #[inline] should be use. I asked in 2014, the thread was inconclusive. https://internals.rust-lang.org/t/when-should-i-use-inline/598
Comment 14•7 years ago
|
||
> The default (no inlining-related attribute) is not #[inline(never)], it leaves LLVM’s heuristics to their default behavior. (only in the same crate, you need `#[inline]` for cross crate stuff. > more than for any specific reason. I looked at some of these in the past when I was investigating this issue and most are fine; they're functions which only get called once. But we can have a closer look again. But I suspect removing those inlines, while it may fix the size of this function, will not fix overall code size.
Comment 15•7 years ago
|
||
> But I suspect removing those inlines, while it may fix the size of this
> function, will not fix overall code size.
Indeed, it only shaved a few KB from xul.dll.
I did notice some other oddities though, for while I'll file dependent bugs shortly.
Comment 17•7 years ago
|
||
I’ve made the title more general since the function names tend to change over time as we move things around. In bug 1397386, Nathan notes that LonghandId::parse_value is 218K according to bloaty on Linux x86-64 nightly. I can reproduce this with official Nightly builds, but not when building mozilla-central on my machine: I get 59K with Rust 1.19 and 45K with 1.20. I can look into factoring some more code for similar properties, but such a difference in what we’re measuring is troubling. Nathan, any idea what could explain it? I build on Ubuntu 17.04, my mozconfig is: ac_add_options --enable-optimize ac_add_options --disable-debug ac_add_options --enable-profiling ac_add_options --enable-stylo mk_add_options AUTOCLOBBER=1 ac_add_options --with-ccache=/usr/bin/ccache mk_add_options 'export CCACHE_PREFIX=icecc' mk_add_options 'export RUSTC_WRAPPER=sccache' mk_add_options MOZ_MAKE_FLAGS="-j100"
Flags: needinfo?(nfroyd)
Summary: stylo: ParsedDeclaration::parse_into is enormous → stylo: Large code size for property value parsing
Comment 18•7 years ago
|
||
Rust LTO is disabled by default for development builds (bug 1386371). To re-enable it, see bug 1386371 comment 38.
Comment 19•7 years ago
|
||
Thanks Matt, I’ve added --enable--release and got LonghandId::parse_value at 225 KiB.
Flags: needinfo?(nfroyd)
Comment 20•7 years ago
|
||
status-firefox57=wontfix unless someone thinks this bug should block 57
status-firefox57:
--- → wontfix
Comment 21•7 years ago
|
||
Nathan says this bug is still an issue. Some excerpts from his recent analysis: LonghandId::parse is the largest function in libxul at ~225K and several other style-related functions make a showing in the top 50 biggest things: 0x1c01c FUNC LOCAL DEFAULT 11 _$LT$style..properties..PropertyDeclaration$u20$as$u20$style_traits..values..ToCss$GT$::to_css::hf4dc63cd1d21296c 83152 FUNC LOCAL DEFAULT 11 _$LT$style..properties..PropertyDeclaration$u20$as$u20$style_traits..values..ToCss$GT$::to_css::h4cd1bbb5253aecc6 56128 FUNC LOCAL DEFAULT 11 _$LT$style..properties..PropertyDeclaration$u20$as$u20$style_traits..values..ToCss$GT$::to_css::h95e73bec7530a735 Why do we have three copies? ToCss::to_css is parameterized over a W: writer, and we're monomorphizing each to_css implementation. It's not obvious to me why we're doing this, and it's possible we'd win ~100k of code size with a less aggressive monomorphization strategy. 36020 FUNC LOCAL DEFAULT 11 _$LT$style..properties..PropertyDeclaration$u20$as$u20$core..clone..Clone$GT$::clone::ha6cd0389aaba9666 35948 FUNC LOCAL DEFAULT 11 _ZN77_$LT$style..properties..PropertyDeclaration$u20$as$u20$core..clone..Clone$GT$5clone17ha6cd0389aaba9666E.8222 These, too, look like basically the same function as far as the name goes? The ".8222" usually indicates the compiler has split out some code for its own reasons, but perhaps there's a way to tell the compiler to not do that? 31124 FUNC LOCAL DEFAULT 11 _ZN79_$LT$style..properties..PropertyDeclaration$u20$as$u20$core..cmp..PartialEq$GT$2eq17he0ebb06942021e1bE.8376 30640 FUNC LOCAL DEFAULT 11 _$LT$style..properties..PropertyDeclaration$u20$as$u20$core..cmp..PartialEq$GT$::eq::he0ebb06942021e1b These, too, look like basically the same function as far as the name goes. And some animated_properties functions: 79128 FUNC LOCAL DEFAULT 11 _$LT$style..properties..animated_properties..AnimationValue$u20$as$u20$core..fmt..Debug$GT$::fmt::h4ee10d544b639464 70760 FUNC LOCAL DEFAULT 11 style::properties::animated_properties::AnimationValue::from_declaration::h7c5a09b4ed7df0a9 69224 FUNC LOCAL DEFAULT 11 style::properties::PropertyDeclaration::parse_into::h536d7f86f48913b6 43748 FUNC LOCAL DEFAULT 11 _$LT$style..properties..animated_properties..AnimationValue$u20$as$u20$style..values..animated..Animate$GT$::animate::ha601ea3ed9214b46 35764 FUNC LOCAL DEFAULT 11 style::properties::animated_properties::TransitionProperty::parse::hb9dc001b4160492b 33288 FUNC LOCAL DEFAULT 11 style::properties::UnparsedValue::substitute_variables::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::h42758ba3ba241feb 32208 FUNC LOCAL DEFAULT 11 style::properties::animated_properties::AnimationValue::from_computed_values::h3b1e28026534b89f 30932 FUNC LOCAL DEFAULT 11 style::values::specified::image::_$LT$impl$u20$style..parser..Parse$u20$for$u20$style..values..generics..image..Gradient$LT$style..values..specified..image..LineDirection$C$$u20$style..values..specified..length..Length$C$$u20$style..values..specified..length..LengthOrPercentage$C$$u20$style..values..specified..image..GradientPosition$C$$u20$style..values..specified..color..RGBAColor$C$$u20$style..values..specified..angle..Angle$GT$$GT$::parse::he617f438212b05cf 28556 FUNC LOCAL DEFAULT 11 _$LT$style..properties..animated_properties..AnimationValue$u20$as$u20$core..cmp..PartialEq$GT$::ne::h40352ce229891eeb
status-firefox56:
--- → unaffected
status-firefox58:
--- → affected
status-firefox-esr52:
--- → unaffected
Priority: P4 → P2
Comment 22•7 years ago
|
||
(In reply to Chris Peterson [:cpeterson] from comment #21) > 0x1c01c FUNC LOCAL DEFAULT 11 > _$LT$style..properties..PropertyDeclaration$u20$as$u20$style_traits..values.. > ToCss$GT$::to_css::hf4dc63cd1d21296c > 83152 FUNC LOCAL DEFAULT 11 > _$LT$style..properties..PropertyDeclaration$u20$as$u20$style_traits..values.. > ToCss$GT$::to_css::h4cd1bbb5253aecc6 > 56128 FUNC LOCAL DEFAULT 11 > _$LT$style..properties..PropertyDeclaration$u20$as$u20$style_traits..values.. > ToCss$GT$::to_css::h95e73bec7530a735 > > Why do we have three copies? ToCss::to_css is parameterized over a W: > writer, and we're monomorphizing each to_css implementation. It's not > obvious to me why we're doing this, and it's possible we'd win ~100k of code > size with a less aggressive monomorphization strategy. Hmm, property serialization can be super-hot though, so not sure virtual dispatch would be acceptable... > 79128 FUNC LOCAL DEFAULT 11 > _$LT$style..properties..animated_properties..AnimationValue$u20$as$u20$core.. > fmt..Debug$GT$::fmt::h4ee10d544b639464 This one at least should be easy to make it go away. I don't know where do we use debug formatting on release, but we should just stop doing that. > style::properties::animated_properties::TransitionProperty::parse:: > hb9dc001b4160492b The fact that this function shows up here is kinda worrying. This is basically just the expansion of a single, straight-forward, match_ignore_ascii_case macro. This is basically a `match` macro expansion, whose arms in this case always return the same kind of value: http://searchfox.org/mozilla-central/rev/c99d035f00dd894feff38e4ad28a73fb679c63a6/third_party/rust/cssparser/src/macros.rs#47 If this is generating so much code, this looks like something the Rust compiler devs could look at...
Reporter | ||
Comment 23•7 years ago
|
||
(In reply to Emilio Cobos Álvarez [:emilio] from comment #22) > (In reply to Chris Peterson [:cpeterson] from comment #21) > > 0x1c01c FUNC LOCAL DEFAULT 11 > > _$LT$style..properties..PropertyDeclaration$u20$as$u20$style_traits..values.. > > ToCss$GT$::to_css::hf4dc63cd1d21296c > > 83152 FUNC LOCAL DEFAULT 11 > > _$LT$style..properties..PropertyDeclaration$u20$as$u20$style_traits..values.. > > ToCss$GT$::to_css::h4cd1bbb5253aecc6 > > 56128 FUNC LOCAL DEFAULT 11 > > _$LT$style..properties..PropertyDeclaration$u20$as$u20$style_traits..values.. > > ToCss$GT$::to_css::h95e73bec7530a735 > > > > Why do we have three copies? ToCss::to_css is parameterized over a W: > > writer, and we're monomorphizing each to_css implementation. It's not > > obvious to me why we're doing this, and it's possible we'd win ~100k of code > > size with a less aggressive monomorphization strategy. > > Hmm, property serialization can be super-hot though, so not sure virtual > dispatch would be acceptable... Could we do something like have a single serialization routine that serializes to a string (so can be fast) and then the three copies simply call that one and then write out the string with W? I don't think we necessarily need to go full virtual dispatch. > > 79128 FUNC LOCAL DEFAULT 11 > > _$LT$style..properties..animated_properties..AnimationValue$u20$as$u20$core.. > > fmt..Debug$GT$::fmt::h4ee10d544b639464 > > This one at least should be easy to make it go away. I don't know where do > we use debug formatting on release, but we should just stop doing that. Yes please! :) > > style::properties::animated_properties::TransitionProperty::parse:: > > hb9dc001b4160492b > > The fact that this function shows up here is kinda worrying. This is > basically just the expansion of a single, straight-forward, > match_ignore_ascii_case macro. This is basically a `match` macro expansion, > whose arms in this case always return the same kind of value: I could believe that if-then-else chains with string matching (and maybe the string matching gets inlined) could be expensive. It does sound like this might be a rustc-side thing, though.
Comment 24•7 years ago
|
||
> and we're monomorphizing each to_css implementation
Are we? I'm not certain we call to_css with anything other than a string writer. Those could also be closure symbols. What are the monomorphizations?
Reporter | ||
Comment 25•7 years ago
|
||
(In reply to Manish Goregaokar [:manishearth] from comment #24) > > and we're monomorphizing each to_css implementation > > Are we? I'm not certain we call to_css with anything other than a string > writer. Those could also be closure symbols. What are the monomorphizations? I don't know. I filed https://github.com/rust-lang/rust/issues/45691 on the symbol names being incomplete, and then haven't tried to track down the particulars after that.
Comment 26•6 years ago
|
||
Cameron, this is the code size bug we discussed in the Stylo meeting. Nathan's notes (comment 21) say it looks like aggressive monomorphization is generating many copies of property parsing functions that are nearly identical.
Assignee | ||
Comment 27•6 years ago
|
||
Per discussion with Cameron I'll take a crack at this.
Assignee: nobody → bobbyholley
Flags: needinfo?(cam) → needinfo?(bobbyholley)
Assignee | ||
Comment 28•6 years ago
|
||
Here are all the functions with "style" in the name on today's m-c, with functions weighing less than 1k removed.
Assignee | ||
Comment 29•6 years ago
|
||
bholley@slice ~/Desktop $ cat styleweights.txt | grep Debug 0000000076911920 0000000000002074 t _ZN80_$LT$style..gecko..pseudo_element..PseudoElement$u20$as$u20$core..fmt..Debug$GT$3fmt17h2e23d3128aeccf4fE 0000000076935568 0000000000002295 t _ZN84_$LT$style..gecko_bindings..structs..root..nsCSSUnit$u20$as$u20$core..fmt..Debug$GT$3fmt17h3a8d8af8433d85f8E 0000000077445216 0000000000003157 t _ZN101_$LT$style..properties..longhands.._moz_appearance..computed_value..T$u20$as$u20$core..fmt..Debug$GT$3fmt17hd6d38b1f3b3928ccE 0000000077591152 0000000000012340 t _ZN90_$LT$style..gecko_bindings..structs..root..nsCSSPropertyID$u20$as$u20$core..fmt..Debug$GT$3fmt17hbe53e067632f265fE 0000000077572320 0000000000018708 t _ZN87_$LT$style..gecko_bindings..structs..root..nsCSSKeyword$u20$as$u20$core..fmt..Debug$GT$3fmt17hc6409f76c195f24aE 0000000077324208 0000000000042049 t _ZN91_$LT$style..properties..animated_properties..AnimationValue$u20$as$u20$core..fmt..Debug$GT$3fmt17h33a504ee7627710bE So we can save about 80k by eliminating debug stringifiers from release builds. I'll have a look and see what that involves.
Comment 30•6 years ago
|
||
> So we can save about 80k by eliminating debug stringifiers from release builds. I'll have a look and see what that involves. #[cfg(debug_assertions)] and #[cfg_attr(debug_assertions, derive(Debug))] are enabled by default for debug builds but not release. This can be configured in the workspace’s Cargo.toml: https://doc.rust-lang.org/cargo/reference/manifest.html#the-profile-sections
Assignee | ||
Comment 31•6 years ago
|
||
(In reply to Simon Sapin (:SimonSapin) from comment #30) > > So we can save about 80k by eliminating debug stringifiers from release builds. I'll have a look and see what that involves. > > #[cfg(debug_assertions)] and #[cfg_attr(debug_assertions, derive(Debug))] > are enabled by default for debug builds but not release. This can be > configured in the workspace’s Cargo.toml: > https://doc.rust-lang.org/cargo/reference/manifest.html#the-profile-sections Well, the linker should normally eliminate them even if they're derived, but the problem is that we have a few places where we include {:?} in panic messages, which I'm tracking down now. My understanding is that #[cfg_attr(debug_assertions, derive(Debug))] would cause compilation failures if the formatters were used in debug!() and warn!(), because those get typechecked even when they're not enabled. Is that right? If so, it would be nice to find some other way to enforce that this code doesn't end up in release builds. I wonder if there's some way the compiler could help us?
Flags: needinfo?(bobbyholley)
Assignee | ||
Comment 32•6 years ago
|
||
Tracked down most (but not quite all) of the Debug impls in the release binary: https://github.com/servo/servo/pull/19756
Assignee | ||
Comment 33•6 years ago
|
||
https://github.com/servo/servo/pull/19756 saved 80k https://github.com/servo/servo/pull/19779 saves 60k more
Assignee | ||
Comment 34•6 years ago
|
||
https://github.com/servo/servo/pull/19838 saved several hundred kilobytes. I filed https://github.com/rust-lang/rust/issues/47796 for the large PartialEq and Clone implementations, which might save another 100k+ if we can get the appropriate compiler fix. We could also consider working around it locally for PropertyDeclaration, but it would be gross.
Comment 35•6 years ago
|
||
https://github.com/rust-lang/rust/pull/47841 for reducing code size of debug impls.
Comment 36•6 years ago
|
||
That PR is now https://github.com/rust-lang/rust/pull/47867 . It works in the sense that we should be able to use it to measure codesize impact (it does the right thing with correct code, but it also makes some wrong code compile), however it is not in a state that can be landed yet. Landing involves fixing a soundness hole that I (knowingly) introduced there, some various little fixups, and perhaps doing a small compiler RFC.
Assignee | ||
Comment 37•6 years ago
|
||
(In reply to Bobby Holley (:bholley) from comment #34) > We could also consider working around it locally for PropertyDeclaration, but it would be gross. I think this ended up happening, right nox? Are there any other major avenues for codesize improvement here, or should we call this fixed? /me would needinfo nox, but needinfos are blocked
Assignee | ||
Comment 38•6 years ago
|
||
I'm going to call this WFM.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WORKSFORME
You need to log in
before you can comment on or make changes to this bug.
Description
•