Closed Bug 1351737 Opened 7 years ago Closed 6 years ago

stylo: Large code size for property value parsing

Categories

(Core :: CSS Parsing and Computation, defect, P2)

defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox-esr52 --- unaffected
firefox55 --- unaffected
firefox56 --- unaffected
firefox57 --- wontfix
firefox58 --- wontfix
firefox59 --- affected

People

(Reporter: froydnj, Assigned: bholley)

References

(Depends on 3 open bugs, Blocks 1 open bug)

Details

Attachments

(1 file)

A recent stylo build has the following in readelf output:

 77977: 0000000003426fb0 0x49b0c FUNC    LOCAL  DEFAULT   12 _ZN5style10properties17ParsedDeclaration5parse17h038457228acc1e74E
 77979: 000000000329e760 0x19a87 FUNC    LOCAL  DEFAULT   12 _ZN5style10properties17ParsedDeclaration6expand17h63f4d482f0751482E
 77980: 00000000033f7ca0 0x190af FUNC    LOCAL  DEFAULT   12 _ZN5style10properties17ParsedDeclaration6expand17h75e46498d8a31402E
 77978: 0000000003471490 95917 FUNC    LOCAL  DEFAULT   12 _ZN5style10properties17ParsedDeclaration6expand17h499a60559f4b27b0E
 77981: 0000000003239d20 95581 FUNC    LOCAL  DEFAULT   12 _ZN5style10properties17ParsedDeclaration6expand17hae77ba661952b0e7E
 80390: 00000000032628b0 90911 FUNC    LOCAL  DEFAULT   12 _ZN79_$LT$style..properties..PropertyDeclaration$u20$as$u20$core..cmp..PartialEq$GT$2eq17hf5d3ab3789e43715E
 83820: 0000000004966de8 87888 OBJECT  LOCAL  DEFAULT   23 ref.Yf
 80391: 0000000003411020 82195 FUNC    LOCAL  DEFAULT   12 _ZN79_$LT$style..properties..PropertyDeclaration$u20$as$u20$core..cmp..PartialEq$GT$2eq17hf5d3ab3789e43715E.911

Ignoring the ref.Yf bit, that's ~1MB of code just for 7 functions.  There's got to be some bloat that be shaved off there.  ParsedDeclaration::expand is probably the easiest place to start; why is its code being duplicated four times?
We have four of those because we're passing a generic push function to that thing. So it gets monomorphized four times. Perhaps we should accept a boxed closure?

This means that the closure will no longer be inlined (which can get us some perf benefits like eliding capacity checks, but last I checked that wasn't happening *anyway*; ¯\_(ツ)_/¯ llvm). Also, not inlining the function will be a perf benefit anyway; but come to think of it LLVM should have realized the closure was being called a million times and not inlined it, but last I checked it wasn't, so we had the worst of both worlds; ¯\_(ツ)_/¯ llvm.
> We have four of those because we're passing a generic push function to that thing. So it gets monomorphized four times. Perhaps we should accept a boxed closure?

That sounds overkill to me. Also, some of them are rather hot paths, so I don't think we'd like to malloc in there. https://github.com/servo/servo/pull/16182 gets rid of one of the copies.
(In reply to Emilio Cobos Álvarez [:emilio] from comment #2)
> > We have four of those because we're passing a generic push function to that thing. So it gets monomorphized four times. Perhaps we should accept a boxed closure?
> 
> That sounds overkill to me. Also, some of them are rather hot paths, so I
> don't think we'd like to malloc in there.
> https://github.com/servo/servo/pull/16182 gets rid of one of the copies.

Actually, it gets rid of two of them.
These closures are all very similar. I submitted https://github.com/servo/servo/pull/16183 to make the expand method non-generic, so it should only appear once in generated code.
>  Also, some of them are rather hot paths, so I don't think we'd like to malloc in there. 

Pass a reference to a closure then :)
Summary: Stylo's ParsedDeclaration and PropertyDeclaration implementations are enormous → stylo: ParsedDeclaration and PropertyDeclaration implementations are enormous
Nathan, can you remeasure? How much did we win with comment 4?
Priority: -- → P4
Comment 4's pull request made things much better:

 79332: 000000000347ca90 77836 FUNC    LOCAL  DEFAULT   12 _ZN5style10properties17ParsedDeclaration11expand_into17hd122e59b433e61b9E

We only have one copy of the expansion code and as a bonus, it's smaller than any of the other previous instances (maybe we also removed some code in the intervening time?).

ParsedDeclaration::parse is still huge, and has gotten huger over the intervening time, for whatever reason:

 79333: 000000000349f860 0x4feb8 FUNC    LOCAL  DEFAULT   12 _ZN5style10properties17ParsedDeclaration5parse17h643988f121904484E

About a 20K increase over the past week.
Ok. It may be that we just added a bunch of parsing code in the interim, and it's all landing in the same function, I'm not sure.

Do you think it's worth closing this bug, or is there more investigation to do here?
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #8)
> Do you think it's worth closing this bug, or is there more investigation to
> do here?

I think we should re-purpose the bug to look at ParsedDeclaration::parse; I have a hard time believing even an auto-generated function needs to be 300K+.

Gecko's CSSParserImpl and what looks like supporting code weighs in at ~120K or so.
Summary: stylo: ParsedDeclaration and PropertyDeclaration implementations are enormous → stylo: ParsedDeclaration::parse is enormous
In a recent Win64 build the largest function by far is style::properties::PropertyDeclaration::parse_into, at 544K.

Is that the successor to ParsedDeclaration::parse? In other words, should we continue with it here in this bug?
Flags: needinfo?(bobbyholley)
Redirecting this to Simon.
Flags: needinfo?(bobbyholley) → needinfo?(simon.sapin)
Yes, the named changed when the signature changed but it’s basically still the same method.

grep counts 498 instances of #[inline] under components/style/properties and components/style/values combined. Some of these might be in Mako templates that end up generating multiple functions or methods. I think most of those were added out of habit more than for any specific reason. If some of those inlined functions or methods are called many times, they would end up duplicated in the machine code.

We might want to consider removing all of them, see if it regressed parsing performance (run `./mach gtest Stylo*` for a benchmark), and if it does try to recover performance by adding back a small number of them.

The default (no inlining-related attribute) is not #[inline(never)], it leaves LLVM’s heuristics to their default behavior.
Flags: needinfo?(simon.sapin)
Summary: stylo: ParsedDeclaration::parse is enormous → stylo: ParsedDeclaration::parse_into is enormous
By the way, it would be nice to decide a general guideline on when #[inline] should be use. I asked in 2014, the thread was inconclusive. https://internals.rust-lang.org/t/when-should-i-use-inline/598
> The default (no inlining-related attribute) is not #[inline(never)], it leaves LLVM’s heuristics to their default behavior.

(only in the same crate, you need `#[inline]` for cross crate stuff.

> more than for any specific reason.

I looked at some of these in the past when I was investigating this issue and most are fine; they're functions which only get called once. But we can have a closer look again.

But I suspect removing those inlines, while it may fix the size of this function, will not fix overall code size.
> But I suspect removing those inlines, while it may fix the size of this
> function, will not fix overall code size.

Indeed, it only shaved a few KB from xul.dll.

I did notice some other oddities though, for while I'll file dependent bugs shortly.
Depends on: 1375222
Depends on: 1375225
Depends on: 1375234
I’ve made the title more general since the function names tend to change over time as we move things around.

In bug 1397386, Nathan notes that LonghandId::parse_value is 218K according to bloaty on Linux x86-64 nightly. I can reproduce this with official Nightly builds, but not when building mozilla-central on my machine: I get 59K with Rust 1.19 and 45K with 1.20.

I can look into factoring some more code for similar properties, but such a difference in what we’re measuring is troubling. Nathan, any idea what could explain it? I build on Ubuntu 17.04, my mozconfig is:

ac_add_options --enable-optimize
ac_add_options --disable-debug
ac_add_options --enable-profiling
ac_add_options --enable-stylo

mk_add_options AUTOCLOBBER=1
ac_add_options --with-ccache=/usr/bin/ccache
mk_add_options 'export CCACHE_PREFIX=icecc'
mk_add_options 'export RUSTC_WRAPPER=sccache'
mk_add_options MOZ_MAKE_FLAGS="-j100"
Flags: needinfo?(nfroyd)
Summary: stylo: ParsedDeclaration::parse_into is enormous → stylo: Large code size for property value parsing
Rust LTO is disabled by default for development builds (bug 1386371). To re-enable it, see bug 1386371 comment 38.
Thanks Matt, I’ve added --enable--release and got LonghandId::parse_value at 225 KiB.
Flags: needinfo?(nfroyd)
status-firefox57=wontfix unless someone thinks this bug should block 57
Nathan says this bug is still an issue. Some excerpts from his recent analysis:


LonghandId::parse is the largest function in libxul at ~225K and several other style-related functions make a showing in the top 50 biggest things:

0x1c01c FUNC    LOCAL  DEFAULT   11
_$LT$style..properties..PropertyDeclaration$u20$as$u20$style_traits..values..ToCss$GT$::to_css::hf4dc63cd1d21296c
83152 FUNC    LOCAL  DEFAULT   11
_$LT$style..properties..PropertyDeclaration$u20$as$u20$style_traits..values..ToCss$GT$::to_css::h4cd1bbb5253aecc6
56128 FUNC    LOCAL  DEFAULT   11
_$LT$style..properties..PropertyDeclaration$u20$as$u20$style_traits..values..ToCss$GT$::to_css::h95e73bec7530a735

Why do we have three copies? ToCss::to_css is parameterized over a W: writer, and we're monomorphizing each to_css implementation.  It's not obvious to me why we're doing this, and it's possible we'd win ~100k of code size with a less aggressive monomorphization strategy.


36020 FUNC    LOCAL  DEFAULT   11
_$LT$style..properties..PropertyDeclaration$u20$as$u20$core..clone..Clone$GT$::clone::ha6cd0389aaba9666
35948 FUNC    LOCAL  DEFAULT   11
_ZN77_$LT$style..properties..PropertyDeclaration$u20$as$u20$core..clone..Clone$GT$5clone17ha6cd0389aaba9666E.8222

These, too, look like basically the same function as far as the name goes?  The ".8222" usually indicates the compiler has split out some code for its own reasons, but perhaps there's a way to tell the compiler to not do that?



31124 FUNC    LOCAL  DEFAULT   11
_ZN79_$LT$style..properties..PropertyDeclaration$u20$as$u20$core..cmp..PartialEq$GT$2eq17he0ebb06942021e1bE.8376
30640 FUNC    LOCAL  DEFAULT   11
_$LT$style..properties..PropertyDeclaration$u20$as$u20$core..cmp..PartialEq$GT$::eq::he0ebb06942021e1b

These, too, look like basically the same function as far as the name goes.


And some animated_properties functions:

79128 FUNC    LOCAL  DEFAULT   11
_$LT$style..properties..animated_properties..AnimationValue$u20$as$u20$core..fmt..Debug$GT$::fmt::h4ee10d544b639464
70760 FUNC    LOCAL  DEFAULT   11
style::properties::animated_properties::AnimationValue::from_declaration::h7c5a09b4ed7df0a9
69224 FUNC    LOCAL  DEFAULT   11
style::properties::PropertyDeclaration::parse_into::h536d7f86f48913b6
43748 FUNC    LOCAL  DEFAULT   11
_$LT$style..properties..animated_properties..AnimationValue$u20$as$u20$style..values..animated..Animate$GT$::animate::ha601ea3ed9214b46
35764 FUNC    LOCAL  DEFAULT   11
style::properties::animated_properties::TransitionProperty::parse::hb9dc001b4160492b
33288 FUNC    LOCAL  DEFAULT   11
style::properties::UnparsedValue::substitute_variables::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::h42758ba3ba241feb
32208 FUNC    LOCAL  DEFAULT   11
style::properties::animated_properties::AnimationValue::from_computed_values::h3b1e28026534b89f
30932 FUNC    LOCAL  DEFAULT   11
style::values::specified::image::_$LT$impl$u20$style..parser..Parse$u20$for$u20$style..values..generics..image..Gradient$LT$style..values..specified..image..LineDirection$C$$u20$style..values..specified..length..Length$C$$u20$style..values..specified..length..LengthOrPercentage$C$$u20$style..values..specified..image..GradientPosition$C$$u20$style..values..specified..color..RGBAColor$C$$u20$style..values..specified..angle..Angle$GT$$GT$::parse::he617f438212b05cf
28556 FUNC    LOCAL  DEFAULT   11
_$LT$style..properties..animated_properties..AnimationValue$u20$as$u20$core..cmp..PartialEq$GT$::ne::h40352ce229891eeb
(In reply to Chris Peterson [:cpeterson] from comment #21)
> 0x1c01c FUNC    LOCAL  DEFAULT   11
> _$LT$style..properties..PropertyDeclaration$u20$as$u20$style_traits..values..
> ToCss$GT$::to_css::hf4dc63cd1d21296c
> 83152 FUNC    LOCAL  DEFAULT   11
> _$LT$style..properties..PropertyDeclaration$u20$as$u20$style_traits..values..
> ToCss$GT$::to_css::h4cd1bbb5253aecc6
> 56128 FUNC    LOCAL  DEFAULT   11
> _$LT$style..properties..PropertyDeclaration$u20$as$u20$style_traits..values..
> ToCss$GT$::to_css::h95e73bec7530a735
> 
> Why do we have three copies? ToCss::to_css is parameterized over a W:
> writer, and we're monomorphizing each to_css implementation.  It's not
> obvious to me why we're doing this, and it's possible we'd win ~100k of code
> size with a less aggressive monomorphization strategy.

Hmm, property serialization can be super-hot though, so not sure virtual dispatch would be acceptable...

> 79128 FUNC    LOCAL  DEFAULT   11
> _$LT$style..properties..animated_properties..AnimationValue$u20$as$u20$core..
> fmt..Debug$GT$::fmt::h4ee10d544b639464

This one at least should be easy to make it go away. I don't know where do we use debug formatting on release, but we should just stop doing that.

> style::properties::animated_properties::TransitionProperty::parse::
> hb9dc001b4160492b

The fact that this function shows up here is kinda worrying. This is basically just the expansion of a single, straight-forward, match_ignore_ascii_case macro. This is basically a `match` macro expansion, whose arms in this case always return the same kind of value:

  http://searchfox.org/mozilla-central/rev/c99d035f00dd894feff38e4ad28a73fb679c63a6/third_party/rust/cssparser/src/macros.rs#47

If this is generating so much code, this looks like something the Rust compiler devs could look at...
(In reply to Emilio Cobos Álvarez [:emilio] from comment #22)
> (In reply to Chris Peterson [:cpeterson] from comment #21)
> > 0x1c01c FUNC    LOCAL  DEFAULT   11
> > _$LT$style..properties..PropertyDeclaration$u20$as$u20$style_traits..values..
> > ToCss$GT$::to_css::hf4dc63cd1d21296c
> > 83152 FUNC    LOCAL  DEFAULT   11
> > _$LT$style..properties..PropertyDeclaration$u20$as$u20$style_traits..values..
> > ToCss$GT$::to_css::h4cd1bbb5253aecc6
> > 56128 FUNC    LOCAL  DEFAULT   11
> > _$LT$style..properties..PropertyDeclaration$u20$as$u20$style_traits..values..
> > ToCss$GT$::to_css::h95e73bec7530a735
> > 
> > Why do we have three copies? ToCss::to_css is parameterized over a W:
> > writer, and we're monomorphizing each to_css implementation.  It's not
> > obvious to me why we're doing this, and it's possible we'd win ~100k of code
> > size with a less aggressive monomorphization strategy.
> 
> Hmm, property serialization can be super-hot though, so not sure virtual
> dispatch would be acceptable...

Could we do something like have a single serialization routine that serializes to a string (so can be fast) and then the three copies simply call that one and then write out the string with W?  I don't think we necessarily need to go full virtual dispatch.

> > 79128 FUNC    LOCAL  DEFAULT   11
> > _$LT$style..properties..animated_properties..AnimationValue$u20$as$u20$core..
> > fmt..Debug$GT$::fmt::h4ee10d544b639464
> 
> This one at least should be easy to make it go away. I don't know where do
> we use debug formatting on release, but we should just stop doing that.

Yes please! :)

> > style::properties::animated_properties::TransitionProperty::parse::
> > hb9dc001b4160492b
> 
> The fact that this function shows up here is kinda worrying. This is
> basically just the expansion of a single, straight-forward,
> match_ignore_ascii_case macro. This is basically a `match` macro expansion,
> whose arms in this case always return the same kind of value:

I could believe that if-then-else chains with string matching (and maybe the string matching gets inlined) could be expensive.  It does sound like this might be a rustc-side thing, though.
>  and we're monomorphizing each to_css implementation

Are we? I'm not certain we call to_css with anything other than a string writer. Those could also be closure symbols. What are the monomorphizations?
(In reply to Manish Goregaokar [:manishearth] from comment #24)
> >  and we're monomorphizing each to_css implementation
> 
> Are we? I'm not certain we call to_css with anything other than a string
> writer. Those could also be closure symbols. What are the monomorphizations?

I don't know.  I filed https://github.com/rust-lang/rust/issues/45691 on the symbol names being incomplete, and then haven't tried to track down the particulars after that.
Cameron, this is the code size bug we discussed in the Stylo meeting. Nathan's notes (comment 21) say it looks like aggressive monomorphization is generating many copies of property parsing functions that are nearly identical.
Flags: needinfo?(cam)
Per discussion with Cameron I'll take a crack at this.
Assignee: nobody → bobbyholley
Flags: needinfo?(cam) → needinfo?(bobbyholley)
Attached file styleweights.txt
Here are all the functions with "style" in the name on today's m-c, with functions weighing less than 1k removed.
bholley@slice ~/Desktop $ cat styleweights.txt | grep Debug
0000000076911920 0000000000002074 t _ZN80_$LT$style..gecko..pseudo_element..PseudoElement$u20$as$u20$core..fmt..Debug$GT$3fmt17h2e23d3128aeccf4fE
0000000076935568 0000000000002295 t _ZN84_$LT$style..gecko_bindings..structs..root..nsCSSUnit$u20$as$u20$core..fmt..Debug$GT$3fmt17h3a8d8af8433d85f8E
0000000077445216 0000000000003157 t _ZN101_$LT$style..properties..longhands.._moz_appearance..computed_value..T$u20$as$u20$core..fmt..Debug$GT$3fmt17hd6d38b1f3b3928ccE
0000000077591152 0000000000012340 t _ZN90_$LT$style..gecko_bindings..structs..root..nsCSSPropertyID$u20$as$u20$core..fmt..Debug$GT$3fmt17hbe53e067632f265fE
0000000077572320 0000000000018708 t _ZN87_$LT$style..gecko_bindings..structs..root..nsCSSKeyword$u20$as$u20$core..fmt..Debug$GT$3fmt17hc6409f76c195f24aE
0000000077324208 0000000000042049 t _ZN91_$LT$style..properties..animated_properties..AnimationValue$u20$as$u20$core..fmt..Debug$GT$3fmt17h33a504ee7627710bE

So we can save about 80k by eliminating debug stringifiers from release builds. I'll have a look and see what that involves.
> So we can save about 80k by eliminating debug stringifiers from release builds. I'll have a look and see what that involves.

#[cfg(debug_assertions)] and #[cfg_attr(debug_assertions, derive(Debug))] are enabled by default for debug builds but not release. This can be configured in the workspace’s Cargo.toml: https://doc.rust-lang.org/cargo/reference/manifest.html#the-profile-sections
(In reply to Simon Sapin (:SimonSapin) from comment #30)
> > So we can save about 80k by eliminating debug stringifiers from release builds. I'll have a look and see what that involves.
> 
> #[cfg(debug_assertions)] and #[cfg_attr(debug_assertions, derive(Debug))]
> are enabled by default for debug builds but not release. This can be
> configured in the workspace’s Cargo.toml:
> https://doc.rust-lang.org/cargo/reference/manifest.html#the-profile-sections

Well, the linker should normally eliminate them even if they're derived, but the problem is that we have a few places where we include {:?} in panic messages, which I'm tracking down now.

My understanding is that #[cfg_attr(debug_assertions, derive(Debug))] would cause compilation failures if the formatters were used in debug!() and warn!(), because those get typechecked even when they're not enabled. Is that right? If so, it would be nice to find some other way to enforce that this code doesn't end up in release builds. I wonder if there's some way the compiler could help us?
Flags: needinfo?(bobbyholley)
Tracked down most (but not quite all) of the Debug impls in the release binary: https://github.com/servo/servo/pull/19756
Depends on: 1431268
https://github.com/servo/servo/pull/19838 saved several hundred kilobytes.

I filed https://github.com/rust-lang/rust/issues/47796 for the large PartialEq and Clone implementations, which might save another 100k+ if we can get the appropriate compiler fix. We could also consider working around it locally for PropertyDeclaration, but it would be gross.
That PR is now https://github.com/rust-lang/rust/pull/47867 . It works in the sense that we should be able to use it to measure codesize impact (it does the right thing with correct code, but it also makes some wrong code compile), however it is not in a state that can be landed yet. Landing involves fixing a soundness hole that I (knowingly) introduced there, some various little fixups, and perhaps doing a small compiler RFC.
(In reply to Bobby Holley (:bholley) from comment #34)
> We could also consider working around it locally for PropertyDeclaration, but it would be gross.

I think this ended up happening, right nox? Are there any other major avenues for codesize improvement here, or should we call this fixed?

/me would needinfo nox, but needinfos are blocked
I'm going to call this WFM.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: