Stylo: Lots of 48-byte stack locals being copied in style::properties::PropertyDeclaration::parse_into

NEW
Unassigned

Status

()

enhancement
P3
normal
2 years ago
Last year

People

(Reporter: dmajor, Unassigned)

Tracking

unspecified
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox57 wontfix, firefox58 wontfix, firefox59 ?)

Details

Reporter

Description

2 years ago
style::properties::PropertyDeclaration::parse_into is full of these six-instruction sequences in the pattern of:

00000001`8044480f 0f1085d0020000  movups  xmm0,xmmword ptr [rbp+2D0h]
00000001`80444816 0f108de0020000  movups  xmm1,xmmword ptr [rbp+2E0h]
00000001`8044481d 0f1095f0020000  movups  xmm2,xmmword ptr [rbp+2F0h]
00000001`80444824 0f299590040000  movaps  xmmword ptr [rbp+490h],xmm2
00000001`8044482b 0f298d80040000  movaps  xmmword ptr [rbp+480h],xmm1
00000001`80444832 0f298570040000  movaps  xmmword ptr [rbp+470h],xmm0

By eyeball, I'd guess that these instructions make up around 20% of the function's size. It's particularly bad because we have to use a longer 7-byte form of the instruction to encode rbp offsets greater than 0xFF.

(I'm ignoring the movaps/movups disagreement. It's amusing, but I don't think it really matters perf-wise, and certainly not for code size.)

My very vague understanding is that Rust code tends to do a lot of copy-by-value. There's a lot of inlining going on, and my debug info isn't that great, so I really don't know what source code is leading to this, other than it's somewhere under the tree of parse_into. Do any stylo experts know offhand what these 0x30-byte objects might be, and whether we can handle them differently, e.g. pass by reference?
My guess would be that it's PropertyDeclaration, though that should be 32-bytes, not 48-bytes: 

http://searchfox.org/mozilla-central/rev/714606a8145636d93b116943d3a65a6a49d2acf8/servo/tests/unit/stylo/size_of.rs#39

It might be PropertyDeclarationBlock: http://searchfox.org/mozilla-central/rev/714606a8145636d93b116943d3a65a6a49d2acf8/servo/components/style/properties/declaration_block.rs#72

I'm not sure how big LonghandIdSet ends up being.
LonghandIdSet is 40 bytes in Stylo. (Easiest way I found to find out is `mach cargo-geckolib doc` then looking at target/geckolib/doc/style/properties/struct.LonghandIdSet.html.) This makes PropertyDeclarationBlock 72 bytes on 64-bit.
https://gist.github.com/anonymous/d4e363941ede04fb6d5e45d71805c0d1 is the output of:

  mach cargo-geckolib -- rustc -p style -- -Z print-type-sizes|grep '48 bytes'|grep 'type:'|sort

There’s 531 types of that size in the style crate, including 317 closures in code generated by Mako. This is two more than the number of longhand properties (279) plus the number of shorthands (36). Based on the names of captured variables, I suspect the closures passed to substitute_variables_*.

This patch https://github.com/servo/servo/compare/closures-by-ref passes these closures by reference instead of by value. Does it help?
Reporter

Comment 4

2 years ago
(In reply to Simon Sapin (:SimonSapin) from comment #3)
> This patch https://github.com/servo/servo/compare/closures-by-ref passes
> these closures by reference instead of by value. Does it help?

This removed 74KB from libxul. Nice!

However, it didn't change the size of PropertyDeclaration::parse_into, and I still see tons of this six-instruction movups/movaps sequence. So the savings must have come from elsewhere.
I’ve opened https://github.com/servo/servo/pull/17477 with that patch, but it sounds like we should keep this open even after that lands.

The next things that seem suspicious are:

* cssparser::ParseError<()>
* cssparser::ParseError<selectors::parser::SelectorParseError<style_traits::StyleParseError>>

… both 48 bytes. They’re used in lots of return types, but inside a Result which would then be larger. So it’s maybe not what causes these 6 instructions sequences.
Reporter

Comment 6

2 years ago
Is it possible that this might be some sub-48-byte object with padding?
Reporter

Comment 7

2 years ago
I inspected one of these objects live in today's Win64 nightly (hooray!).

movups  xmm0,xmmword ptr [rbp+2C0h]
movups  xmm1,xmmword ptr [rbp+2D0h]
movups  xmm2,xmmword ptr [rbp+2E0h] 

rbp+2C0 0000009a`ba3fc6f0  0000009a`ba3fc701 --> nearby address, must be some other stack object
rbp+2C8 0000009a`ba3fc6f8  00000000`00000000
rbp+2D0 0000009a`ba3fc700  00000000`00000000
rbp+2D8 0000009a`ba3fc708  000001e8`77590017 --> points to 0xe5e5e5e5 garbage. uninitialized?
rbp+2E0 0000009a`ba3fc710  000001e8`78bae548 --> points to a string!
rbp+2E8 0000009a`ba3fc718  00000000`0000000c --> string length?

That string is:
000001e8`78bae548  "cubic-bezier(.07, .95, 0, 1);.}."
000001e8`78bae568  "./* ::::::::::.   :: Rules for '"
000001e8`78bae588  "hiding' portions of the chrome f"
000001e8`78bae5a8  "or special.   :: kinds of window"
000001e8`78bae5c8  "s (not JUST browser windows) wit"

If I take 0xC as the length, that would leave just "cubic-bezier".

Does this help identify the object?
Reporter

Comment 8

2 years ago
> rbp+2C0 0000009a`ba3fc6f0  0000009a`ba3fc701 --> nearby address, must be some other stack object

On closer inspection, that address is actually within this same object. But the fact that it ends in "1" is a little scary. Low-order-bit-smuggling?
(0x1e8_78bae5a8, 0xC) fits the memory layout of a &str. One type that is moved around a lot during parsing and contains &str slices of CSS is cssparser::Token, but I landed a change 21 days ago to shrink it from 56 bytes to 32, so it doesn’t fit 48.

Although Token::Function is the 24th variant in source order, and I’m not aware of rustc doing (yet) any fancy optimization in assigning discriminant values, so its discriminant is probably 0x17. This could be the low bytes of 0x1e8_77590017, with the rest of the bytes being un-initialized padding.

Now that I think about it, Parser::next returns (and so moves) not just Token but Result<Token, BasicParseError>, which I think might fit 48 bytes. (32 for Token inside BasicParseError::Token, plus two nested enum discriminants each padded to one word for field alignment.)

So yeah, I think that’s it.

I’m already in the middle of a redesign of Parser::next and related APIs. Ideally I’d like to get the return type down to Result<&Token, SomeZeroSizedErrorType> which would be represented as a single nullable pointer. The lack of non-lexical lifetimes in the language gets in the way, though. To get something to borrow-check in today’s language I often need to clone the Token behind that pointer anyway, which would bring back these 48-bytes (or maybe 32-bytes) copies.
(In reply to David Major [:dmajor] from comment #8)
> > rbp+2C0 0000009a`ba3fc6f0  0000009a`ba3fc701 --> nearby address, must be some other stack object
> 
> On closer inspection, that address is actually within this same object. But
> the fact that it ends in "1" is a little scary. Low-order-bit-smuggling?

Theory: the low 0x01 byte is an enum discriminant. The rest of this word is uninitialized padding (that previously might have contained a pointer), which exists to keep the rest of the type word-aligned.
Priority: -- → P3
Summary: Lots of 48-byte stack locals being copied in style::properties::PropertyDeclaration::parse_into → Stylo: Lots of 48-byte stack locals being copied in style::properties::PropertyDeclaration::parse_into
Priority: P3 → --
Priority: -- → P4
Priority: P4 → P3
You need to log in before you can comment on or make changes to this bug.