stylo: Large code size for property value parsing

NEW
Unassigned

Status

()

Core
CSS Parsing and Computation
P2
normal
8 months ago
12 hours ago

People

(Reporter: froydnj, Unassigned)

Tracking

(Depends on: 3 bugs, Blocks: 2 bugs)

Trunk
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox-esr52 unaffected, firefox55 unaffected, firefox56 unaffected, firefox57 wontfix, firefox58 affected)

Details

A recent stylo build has the following in readelf output:

 77977: 0000000003426fb0 0x49b0c FUNC    LOCAL  DEFAULT   12 _ZN5style10properties17ParsedDeclaration5parse17h038457228acc1e74E
 77979: 000000000329e760 0x19a87 FUNC    LOCAL  DEFAULT   12 _ZN5style10properties17ParsedDeclaration6expand17h63f4d482f0751482E
 77980: 00000000033f7ca0 0x190af FUNC    LOCAL  DEFAULT   12 _ZN5style10properties17ParsedDeclaration6expand17h75e46498d8a31402E
 77978: 0000000003471490 95917 FUNC    LOCAL  DEFAULT   12 _ZN5style10properties17ParsedDeclaration6expand17h499a60559f4b27b0E
 77981: 0000000003239d20 95581 FUNC    LOCAL  DEFAULT   12 _ZN5style10properties17ParsedDeclaration6expand17hae77ba661952b0e7E
 80390: 00000000032628b0 90911 FUNC    LOCAL  DEFAULT   12 _ZN79_$LT$style..properties..PropertyDeclaration$u20$as$u20$core..cmp..PartialEq$GT$2eq17hf5d3ab3789e43715E
 83820: 0000000004966de8 87888 OBJECT  LOCAL  DEFAULT   23 ref.Yf
 80391: 0000000003411020 82195 FUNC    LOCAL  DEFAULT   12 _ZN79_$LT$style..properties..PropertyDeclaration$u20$as$u20$core..cmp..PartialEq$GT$2eq17hf5d3ab3789e43715E.911

Ignoring the ref.Yf bit, that's ~1MB of code just for 7 functions.  There's got to be some bloat that be shaved off there.  ParsedDeclaration::expand is probably the easiest place to start; why is its code being duplicated four times?
We have four of those because we're passing a generic push function to that thing. So it gets monomorphized four times. Perhaps we should accept a boxed closure?

This means that the closure will no longer be inlined (which can get us some perf benefits like eliding capacity checks, but last I checked that wasn't happening *anyway*; ¯\_(ツ)_/¯ llvm). Also, not inlining the function will be a perf benefit anyway; but come to think of it LLVM should have realized the closure was being called a million times and not inlined it, but last I checked it wasn't, so we had the worst of both worlds; ¯\_(ツ)_/¯ llvm.
> We have four of those because we're passing a generic push function to that thing. So it gets monomorphized four times. Perhaps we should accept a boxed closure?

That sounds overkill to me. Also, some of them are rather hot paths, so I don't think we'd like to malloc in there. https://github.com/servo/servo/pull/16182 gets rid of one of the copies.
(In reply to Emilio Cobos Álvarez [:emilio] from comment #2)
> > We have four of those because we're passing a generic push function to that thing. So it gets monomorphized four times. Perhaps we should accept a boxed closure?
> 
> That sounds overkill to me. Also, some of them are rather hot paths, so I
> don't think we'd like to malloc in there.
> https://github.com/servo/servo/pull/16182 gets rid of one of the copies.

Actually, it gets rid of two of them.
These closures are all very similar. I submitted https://github.com/servo/servo/pull/16183 to make the expand method non-generic, so it should only appear once in generated code.
>  Also, some of them are rather hot paths, so I don't think we'd like to malloc in there. 

Pass a reference to a closure then :)
Summary: Stylo's ParsedDeclaration and PropertyDeclaration implementations are enormous → stylo: ParsedDeclaration and PropertyDeclaration implementations are enormous
Nathan, can you remeasure? How much did we win with comment 4?
Priority: -- → P4
Comment 4's pull request made things much better:

 79332: 000000000347ca90 77836 FUNC    LOCAL  DEFAULT   12 _ZN5style10properties17ParsedDeclaration11expand_into17hd122e59b433e61b9E

We only have one copy of the expansion code and as a bonus, it's smaller than any of the other previous instances (maybe we also removed some code in the intervening time?).

ParsedDeclaration::parse is still huge, and has gotten huger over the intervening time, for whatever reason:

 79333: 000000000349f860 0x4feb8 FUNC    LOCAL  DEFAULT   12 _ZN5style10properties17ParsedDeclaration5parse17h643988f121904484E

About a 20K increase over the past week.
Ok. It may be that we just added a bunch of parsing code in the interim, and it's all landing in the same function, I'm not sure.

Do you think it's worth closing this bug, or is there more investigation to do here?
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #8)
> Do you think it's worth closing this bug, or is there more investigation to
> do here?

I think we should re-purpose the bug to look at ParsedDeclaration::parse; I have a hard time believing even an auto-generated function needs to be 300K+.

Gecko's CSSParserImpl and what looks like supporting code weighs in at ~120K or so.
Summary: stylo: ParsedDeclaration and PropertyDeclaration implementations are enormous → stylo: ParsedDeclaration::parse is enormous
Blocks: 1243581
In a recent Win64 build the largest function by far is style::properties::PropertyDeclaration::parse_into, at 544K.

Is that the successor to ParsedDeclaration::parse? In other words, should we continue with it here in this bug?
Flags: needinfo?(bobbyholley)
Redirecting this to Simon.
Flags: needinfo?(bobbyholley) → needinfo?(simon.sapin)
Yes, the named changed when the signature changed but it’s basically still the same method.

grep counts 498 instances of #[inline] under components/style/properties and components/style/values combined. Some of these might be in Mako templates that end up generating multiple functions or methods. I think most of those were added out of habit more than for any specific reason. If some of those inlined functions or methods are called many times, they would end up duplicated in the machine code.

We might want to consider removing all of them, see if it regressed parsing performance (run `./mach gtest Stylo*` for a benchmark), and if it does try to recover performance by adding back a small number of them.

The default (no inlining-related attribute) is not #[inline(never)], it leaves LLVM’s heuristics to their default behavior.
Flags: needinfo?(simon.sapin)
Summary: stylo: ParsedDeclaration::parse is enormous → stylo: ParsedDeclaration::parse_into is enormous
By the way, it would be nice to decide a general guideline on when #[inline] should be use. I asked in 2014, the thread was inconclusive. https://internals.rust-lang.org/t/when-should-i-use-inline/598
> The default (no inlining-related attribute) is not #[inline(never)], it leaves LLVM’s heuristics to their default behavior.

(only in the same crate, you need `#[inline]` for cross crate stuff.

> more than for any specific reason.

I looked at some of these in the past when I was investigating this issue and most are fine; they're functions which only get called once. But we can have a closer look again.

But I suspect removing those inlines, while it may fix the size of this function, will not fix overall code size.
> But I suspect removing those inlines, while it may fix the size of this
> function, will not fix overall code size.

Indeed, it only shaved a few KB from xul.dll.

I did notice some other oddities though, for while I'll file dependent bugs shortly.

Updated

5 months ago
Depends on: 1375222

Updated

5 months ago
Depends on: 1375225

Updated

5 months ago
Depends on: 1375234
Duplicate of this bug: 1397386
I’ve made the title more general since the function names tend to change over time as we move things around.

In bug 1397386, Nathan notes that LonghandId::parse_value is 218K according to bloaty on Linux x86-64 nightly. I can reproduce this with official Nightly builds, but not when building mozilla-central on my machine: I get 59K with Rust 1.19 and 45K with 1.20.

I can look into factoring some more code for similar properties, but such a difference in what we’re measuring is troubling. Nathan, any idea what could explain it? I build on Ubuntu 17.04, my mozconfig is:

ac_add_options --enable-optimize
ac_add_options --disable-debug
ac_add_options --enable-profiling
ac_add_options --enable-stylo

mk_add_options AUTOCLOBBER=1
ac_add_options --with-ccache=/usr/bin/ccache
mk_add_options 'export CCACHE_PREFIX=icecc'
mk_add_options 'export RUSTC_WRAPPER=sccache'
mk_add_options MOZ_MAKE_FLAGS="-j100"
Flags: needinfo?(nfroyd)
Summary: stylo: ParsedDeclaration::parse_into is enormous → stylo: Large code size for property value parsing
Rust LTO is disabled by default for development builds (bug 1386371). To re-enable it, see bug 1386371 comment 38.
Thanks Matt, I’ve added --enable--release and got LonghandId::parse_value at 225 KiB.
Flags: needinfo?(nfroyd)
status-firefox57=wontfix unless someone thinks this bug should block 57
status-firefox57: --- → wontfix
Nathan says this bug is still an issue. Some excerpts from his recent analysis:


LonghandId::parse is the largest function in libxul at ~225K and several other style-related functions make a showing in the top 50 biggest things:

0x1c01c FUNC    LOCAL  DEFAULT   11
_$LT$style..properties..PropertyDeclaration$u20$as$u20$style_traits..values..ToCss$GT$::to_css::hf4dc63cd1d21296c
83152 FUNC    LOCAL  DEFAULT   11
_$LT$style..properties..PropertyDeclaration$u20$as$u20$style_traits..values..ToCss$GT$::to_css::h4cd1bbb5253aecc6
56128 FUNC    LOCAL  DEFAULT   11
_$LT$style..properties..PropertyDeclaration$u20$as$u20$style_traits..values..ToCss$GT$::to_css::h95e73bec7530a735

Why do we have three copies? ToCss::to_css is parameterized over a W: writer, and we're monomorphizing each to_css implementation.  It's not obvious to me why we're doing this, and it's possible we'd win ~100k of code size with a less aggressive monomorphization strategy.


36020 FUNC    LOCAL  DEFAULT   11
_$LT$style..properties..PropertyDeclaration$u20$as$u20$core..clone..Clone$GT$::clone::ha6cd0389aaba9666
35948 FUNC    LOCAL  DEFAULT   11
_ZN77_$LT$style..properties..PropertyDeclaration$u20$as$u20$core..clone..Clone$GT$5clone17ha6cd0389aaba9666E.8222

These, too, look like basically the same function as far as the name goes?  The ".8222" usually indicates the compiler has split out some code for its own reasons, but perhaps there's a way to tell the compiler to not do that?



31124 FUNC    LOCAL  DEFAULT   11
_ZN79_$LT$style..properties..PropertyDeclaration$u20$as$u20$core..cmp..PartialEq$GT$2eq17he0ebb06942021e1bE.8376
30640 FUNC    LOCAL  DEFAULT   11
_$LT$style..properties..PropertyDeclaration$u20$as$u20$core..cmp..PartialEq$GT$::eq::he0ebb06942021e1b

These, too, look like basically the same function as far as the name goes.


And some animated_properties functions:

79128 FUNC    LOCAL  DEFAULT   11
_$LT$style..properties..animated_properties..AnimationValue$u20$as$u20$core..fmt..Debug$GT$::fmt::h4ee10d544b639464
70760 FUNC    LOCAL  DEFAULT   11
style::properties::animated_properties::AnimationValue::from_declaration::h7c5a09b4ed7df0a9
69224 FUNC    LOCAL  DEFAULT   11
style::properties::PropertyDeclaration::parse_into::h536d7f86f48913b6
43748 FUNC    LOCAL  DEFAULT   11
_$LT$style..properties..animated_properties..AnimationValue$u20$as$u20$style..values..animated..Animate$GT$::animate::ha601ea3ed9214b46
35764 FUNC    LOCAL  DEFAULT   11
style::properties::animated_properties::TransitionProperty::parse::hb9dc001b4160492b
33288 FUNC    LOCAL  DEFAULT   11
style::properties::UnparsedValue::substitute_variables::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::h42758ba3ba241feb
32208 FUNC    LOCAL  DEFAULT   11
style::properties::animated_properties::AnimationValue::from_computed_values::h3b1e28026534b89f
30932 FUNC    LOCAL  DEFAULT   11
style::values::specified::image::_$LT$impl$u20$style..parser..Parse$u20$for$u20$style..values..generics..image..Gradient$LT$style..values..specified..image..LineDirection$C$$u20$style..values..specified..length..Length$C$$u20$style..values..specified..length..LengthOrPercentage$C$$u20$style..values..specified..image..GradientPosition$C$$u20$style..values..specified..color..RGBAColor$C$$u20$style..values..specified..angle..Angle$GT$$GT$::parse::he617f438212b05cf
28556 FUNC    LOCAL  DEFAULT   11
_$LT$style..properties..animated_properties..AnimationValue$u20$as$u20$core..cmp..PartialEq$GT$::ne::h40352ce229891eeb
status-firefox55: affected → unaffected
status-firefox56: --- → unaffected
status-firefox58: --- → affected
status-firefox-esr52: --- → unaffected
Priority: P4 → P2
(In reply to Chris Peterson [:cpeterson] from comment #21)
> 0x1c01c FUNC    LOCAL  DEFAULT   11
> _$LT$style..properties..PropertyDeclaration$u20$as$u20$style_traits..values..
> ToCss$GT$::to_css::hf4dc63cd1d21296c
> 83152 FUNC    LOCAL  DEFAULT   11
> _$LT$style..properties..PropertyDeclaration$u20$as$u20$style_traits..values..
> ToCss$GT$::to_css::h4cd1bbb5253aecc6
> 56128 FUNC    LOCAL  DEFAULT   11
> _$LT$style..properties..PropertyDeclaration$u20$as$u20$style_traits..values..
> ToCss$GT$::to_css::h95e73bec7530a735
> 
> Why do we have three copies? ToCss::to_css is parameterized over a W:
> writer, and we're monomorphizing each to_css implementation.  It's not
> obvious to me why we're doing this, and it's possible we'd win ~100k of code
> size with a less aggressive monomorphization strategy.

Hmm, property serialization can be super-hot though, so not sure virtual dispatch would be acceptable...

> 79128 FUNC    LOCAL  DEFAULT   11
> _$LT$style..properties..animated_properties..AnimationValue$u20$as$u20$core..
> fmt..Debug$GT$::fmt::h4ee10d544b639464

This one at least should be easy to make it go away. I don't know where do we use debug formatting on release, but we should just stop doing that.

> style::properties::animated_properties::TransitionProperty::parse::
> hb9dc001b4160492b

The fact that this function shows up here is kinda worrying. This is basically just the expansion of a single, straight-forward, match_ignore_ascii_case macro. This is basically a `match` macro expansion, whose arms in this case always return the same kind of value:

  http://searchfox.org/mozilla-central/rev/c99d035f00dd894feff38e4ad28a73fb679c63a6/third_party/rust/cssparser/src/macros.rs#47

If this is generating so much code, this looks like something the Rust compiler devs could look at...
(In reply to Emilio Cobos Álvarez [:emilio] from comment #22)
> (In reply to Chris Peterson [:cpeterson] from comment #21)
> > 0x1c01c FUNC    LOCAL  DEFAULT   11
> > _$LT$style..properties..PropertyDeclaration$u20$as$u20$style_traits..values..
> > ToCss$GT$::to_css::hf4dc63cd1d21296c
> > 83152 FUNC    LOCAL  DEFAULT   11
> > _$LT$style..properties..PropertyDeclaration$u20$as$u20$style_traits..values..
> > ToCss$GT$::to_css::h4cd1bbb5253aecc6
> > 56128 FUNC    LOCAL  DEFAULT   11
> > _$LT$style..properties..PropertyDeclaration$u20$as$u20$style_traits..values..
> > ToCss$GT$::to_css::h95e73bec7530a735
> > 
> > Why do we have three copies? ToCss::to_css is parameterized over a W:
> > writer, and we're monomorphizing each to_css implementation.  It's not
> > obvious to me why we're doing this, and it's possible we'd win ~100k of code
> > size with a less aggressive monomorphization strategy.
> 
> Hmm, property serialization can be super-hot though, so not sure virtual
> dispatch would be acceptable...

Could we do something like have a single serialization routine that serializes to a string (so can be fast) and then the three copies simply call that one and then write out the string with W?  I don't think we necessarily need to go full virtual dispatch.

> > 79128 FUNC    LOCAL  DEFAULT   11
> > _$LT$style..properties..animated_properties..AnimationValue$u20$as$u20$core..
> > fmt..Debug$GT$::fmt::h4ee10d544b639464
> 
> This one at least should be easy to make it go away. I don't know where do
> we use debug formatting on release, but we should just stop doing that.

Yes please! :)

> > style::properties::animated_properties::TransitionProperty::parse::
> > hb9dc001b4160492b
> 
> The fact that this function shows up here is kinda worrying. This is
> basically just the expansion of a single, straight-forward,
> match_ignore_ascii_case macro. This is basically a `match` macro expansion,
> whose arms in this case always return the same kind of value:

I could believe that if-then-else chains with string matching (and maybe the string matching gets inlined) could be expensive.  It does sound like this might be a rustc-side thing, though.
>  and we're monomorphizing each to_css implementation

Are we? I'm not certain we call to_css with anything other than a string writer. Those could also be closure symbols. What are the monomorphizations?
(In reply to Manish Goregaokar [:manishearth] from comment #24)
> >  and we're monomorphizing each to_css implementation
> 
> Are we? I'm not certain we call to_css with anything other than a string
> writer. Those could also be closure symbols. What are the monomorphizations?

I don't know.  I filed https://github.com/rust-lang/rust/issues/45691 on the symbol names being incomplete, and then haven't tried to track down the particulars after that.
You need to log in before you can comment on or make changes to this bug.