1344131 - stylo: Setting inline style can be a lot slower than Gecko

Reporter

Description

•

8 years ago

Take this microbenchmark: <script> var div = document.createElement("div"); var s = div.style; var count = 1000000; var start = new Date; for (var i = 0; i < count; ++i) { s.top = "10px"; } var stop = new Date; document.write(stop - start); </script> In Gecko I get numbers on the order of 600ms. In Stylo I get numbers on the order of 900ms. For scale, by the way, on that testcase Chrome is at ~300ms and Safari at ~130ms, so even the non-stylo number is not amazing.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 1

•

8 years ago

> In Gecko I get numbers on the order of 600ms. Correction, closer to 300ms if I do an apples to apples comparison (same profile, no random other tabs running, etc).

Boris Zbarsky [:bzbarsky]

Reporter

Comment 2

•

8 years ago

So I tried running this testcase (with a larger iteration count) for about 14s in both Gecko an stylo. In Gecko, I see time break down like so: 1s: nsDOMCSSAttributeDeclaration::GetCSSDeclaration 1s: under nsDocument::BeginUpdate 1.5s: under nsDocument::EndUpdate (NS_IsMainThread shows up under RemoveScriptBlocker here....) 1.5s: under nsDOMCSSAttributeDeclaration::GetCSSParsingEnvironment 7s: under nsCSSParser::ParseProperty and then some long-tail stuff. In stylo, it looks like this: 0.5s: nsDOMCSSAttributeDeclaration::GetCSSDeclaration 0.5s: AppendUTF16toUTF8 (converting the property value to UTF-8) 0.5s: under nsDocument::BeginUpdate 0.6s: under nsDocument::EndUpdate 0.7s: under nsDOMCSSAttributeDeclaration::GetCSSParsingEnvironment 3.5s: under nsStyledElement::SetInlineStyleDeclaration. 7s: ServoDeclarationBlock_SetPropertyById. Rescaling that to the Gecko times by using the overal timings, or the BeginUpdate/EndUpdate stuff, that means that for the same number of iterations: 1) ServoDeclarationBlock_SetPropertyById takes 14+s vs 7s for nsCSSParser::ParseProperty. 2) There's an extra cost of 7s for nsStyledElement::SetInlineStyleDeclaration. First, the simple bit. Why are we ending up under SetInlineStyleDeclaration? That means that "changed" was true near the tail end of nsDOMCSSDeclaration::ParsePropertyValue. And then we go to do a bunch of ~nsAttrValue, freeing misc containers, SetAttrAndNotify, constructing nsAttrValues and allocating their misc containers, etc. But why is "changed" true? It shouldn't be! But it looks like servo just returns whether the value parsed successfully. Whereas Gecko ends up in http://searchfox.org/mozilla-central/rev/546a05fec017cb785fca62a5199d42812a6a1fd3/layout/style/nsCSSDataBlock.cpp#37 and actually checks whether the value changed. Now the more complicated bit: why is ServoDeclarationBlock_SetPropertyById slow? Profile says a few things about the servo side: 1) Directly under glue::set_property we're doing malloc/free traffic. That accounts for about 1/6 of the time ServoDeclarationBlock_SetPropertyById takes (a bit over 1.1s out of 7s). 2) We're doing some spinlock stuff directly under glue::set_property. That takes 100-200ms (it's pretty scattered in the profile, hard to add it all up). 3) drop::hd805e44d693ee661 directly under glue::set_property is 150ms. 4) Under declaration_block::parse_one_declaration we have more malloc/free and spinlock traffic. That takes up about 2s. 5) The actual call to PropertyDeclaration::parse is about 2.7s, which is comparable to the time Gecko's parser bits take, or a bit less. It's hard to tell what the right apples to apples comparison is here. Anyway, I filed bug 1344135 on the boolean return value being wrong and bug 1344136 on nearly half the time under ServoDeclarationBlock_SetPropertyById being spent on malloc, free, and spinlocks.

Blocks: stylo

Depends on: 1344135, 1344136

Xidorn Quan [:xidorn] UTC+10

Comment 3

•

8 years ago

I'm more curious about the spinlock... why do we need spinlock there? We are on the main thread, and there shouldn't be any retention.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 4

•

8 years ago

The total spinlock time in my profile is order of 400ms out of 14s, so about 3%. A single set takes about 900ns. So about 26ns of spinlocking per call, at a guess, on 1.5-year-old laptop hardware. That does sound a little high.

Bobby Holley (:bholley)

Updated

•

8 years ago

Blocks: stylo-perf
No longer blocks: stylo

Bobby Holley (:bholley)

Updated

•

8 years ago

Priority: -- → P2

Boris Zbarsky [:bzbarsky]

Reporter

Updated

•

8 years ago

Depends on: 1346974

Boris Zbarsky [:bzbarsky]

Reporter

Updated

•

8 years ago

Depends on: 1354566

Manish Goregaokar [:manishearth]

Updated

•

8 years ago

Blocks: stylo

Bobby Holley (:bholley)

Updated

•

8 years ago

Assignee: nobody → simon.sapin

Xidorn Quan [:xidorn] UTC+10

Comment 5

•

8 years ago

It seems several optimizations have been landed since this bug filed. Probably we should profile again and see what's going on now? BTW, we probably should open a bug for having IDL binding pass in raw JS string for CSSOM, so that we can avoid double conversion in many cases.

Simon Sapin (:SimonSapin)

Assignee

Comment 6

•

8 years ago

Bug 1355106 is relevant to the double conversion.

Xidorn Quan [:xidorn] UTC+10

Updated

•

8 years ago

Depends on: 1355106

Boris Zbarsky [:bzbarsky]

Reporter

Comment 7

•

8 years ago

Passing raw JS string is rather annoying in various ways. But yes, we could temporarily do something like that. Bug 1355106 is a bit hard to do until we drop non-stylo or stop caring about its performance... So it's not going to be the bullet that saves us. And note that in the common case bindings don't really do "conversion" on a JS string: they either do a memcpy if it's non-latin1 or a fast byte-expanding copy if it is. So you will still end up doing just as much conversion as now even if you're coming from a JS string directly. Things will be a bit faster because you touch less memory, of course.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 8

•

8 years ago

Oh, and now that I have builds I can report on the remeasurement and reprofiling. In Gecko nightly, I get times of about 400ms on the microbenchmark from comment 0. Yes, I know comment 0 says 600ms. In Firefox 53 I get about 600ms. So we made some serious progress here at some point even for gecko. Looks like it happened between 53 and 54; I get 400ms in 54. In stylo nightly I get times of about 430ms. This is still a bit worse than (new) Gecko, but a good bit better than 53 was. For stylo, time breakdown under nsDOMCSSDeclaration::ParsePropertyValue looks like this now: 8% EndUpdate notifications. 8% AppendUTF16toUTF8 8% BeginUpdate notifiactions 6% under nsDOMCSSAttributeDeclaration::GetCSSDeclaration; about half is AttributeWillChange, 1% is nsContentUtils::HasMutationListeners, the rest is Element::GetInlineStyleDeclaration. 4% under nsDOMCSSAttributeDeclaration::GetServoCSSParsingEnvironment. This is 1% GetDefaultStyleAttrURLData, 1% GetBindingParent, 1% GetURLDataForStyleAttr, 1% self time. 57% is under Servo_DeclaraionBlock_SetPropertyById, breaking down as: 1% getting PropertyIds from nsCSSPropertyId. 8% under PropertyDeclarationBlock::extend_common. There's various vec::reserve stuff there, comparisons under push_common, etc. I dunno why the reserve() shows up at all; we should not be changing the length here. The rest of the time is pretty much under parse_one_declaration_into. The only weird bit there is that I see _two_ codepaths that land us in tokenizer::consume_numeric. One is: PropertyDeclaration::parse_into calls LengthOrPercentageOrAuto::parse_internal calls Parser::next_including_whitespace_and_comments calls tokenizer::next_token calls tokenizer::consume_numeric. The other one is: PropertyDeclaration::parse_into calls style..properties..CSSWideKeyword$u20$as$u20$style..parser..Parse$GT$::parse::heb90e588fd20a8e6 calls Parser::next_including_whitespace_and_comments calls tokenizer::next_token calls tokenizer::consume_numeric. I'm not sure what that symbol is that's being called from parse_into here, and hence whether we're really tokenizing the number twice or the profiler is just being confused. If there's a way to get better information about what that symbol is, I'd love to know!

Cameron McCormack (:heycam)

Comment 9

•

8 years ago

I'm guessing it's this: http://searchfox.org/mozilla-central/rev/d67ef71097da4d1aa344c9d9c672e49a7228e765/servo/components/style/properties/properties.mako.rs#510-515

Xidorn Quan [:xidorn] UTC+10

Comment 10

•

8 years ago

Also note that, in Gecko, we are still spending time on computing xml:base, which should be much expensive than GetServoCSSParsingEnvironment. It's worth checking how long would Gecko take if we change nsIContent::GetBaseURIForStyleAttr() to just return GetBaseURIWithoutXMLBase().

Cameron McCormack (:heycam)

Comment 11

•

8 years ago

We could give Parser a lookahead buffer to store any single token that we unget (when a try() function does a reset() upon detecting failure). But Tokens look pretty big, so I'm not sure we want to be cloning them out of that lookahead buffer all the time. Parser::expect_ident and friends return any unexpected token in the error part of their Result. Maybe we should have versions of these functions that don't need to know what token it did unexpectedly fine. Then we could have them do a quick check of the leading bytes to see what token we have next, so that when trying to parse a an "inherit" when the next byte is a digit, we could return quickly. (We would have to look past white space and comments when doing that.)

Simon Sapin (:SimonSapin)

Assignee

Comment 12

•

8 years ago

> The only weird bit there is that I see _two_ codepaths that land us in tokenizer::consume_numeric. Yes. cssparser’s `try` method works by rewinding the tokenizer, so we end up doing the same work twice. > We could give Parser a lookahead buffer to store any single token that we unget (when a try() function does a reset() upon detecting failure). But Tokens look pretty big, so I'm not sure we want to be cloning them out of that lookahead buffer all the time. We’ve been talking about adding that buffer and reducing the size of tokens. I’ll work on that next. Checking leading bytes also sounds interesting. It’ll break through the nice abstraction between tokenizer and parser, but it may be worth it.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 13

•

8 years ago

> It's worth checking how long would Gecko take if we change nsIContent::GetBaseURIForStyleAttr() to just return GetBaseURIWithoutXMLBase(). Then the Gecko time drops to about 360ms. > Yes. cssparser’s `try` method works by rewinding the tokenizer, so we end up doing the same work twice. That seems like something we should really fix; some tokens are pretty expensive to tokenize. Is there an existing bug/issue tracking this? (Gecko solves this by handing out a borrow to an internal token storage, but I'm pretty sure the borrow checker would correctly point out that this is insane in terms of guaranteeing safety statically.)

Flags: needinfo?(simon.sapin)

Simon Sapin (:SimonSapin)

Assignee

Comment 14

•

8 years ago

There is bug 1347408. Yes, separate "borrow the current token" and "advance to the next token" APIs would help eliminate this overhead. The current API gives ownership of the token to the caller. Token contains Cow<str> for ident-like things, which borrows the input when possible and has an allocated String when backslash-escapes are involves. Some callers take advantage of this ownership to avoid re-allocating. Although maybe backslash-escape are rare enough that we shouldn’t care about optimizing them? Another idea is to replace Cow<str> with something that always borrows the input and does unescaping on demand. This would eliminate String allocations in Token, making it cheap to clone.

Flags: needinfo?(simon.sapin)

Comment hidden (mozreview-request)

Simon Sapin (:SimonSapin)

Assignee

Updated

•

8 years ago

Keywords: leave-open

Emilio Cobos Álvarez (:emilio)

Comment 16

•

8 years ago

mozreview-review

Comment on attachment 8887360 [details] Bug 1344131 - Add a gtest bench for Servo_DeclarationBlock_SetPropertyById. https://reviewboard.mozilla.org/r/158194/#review163370 ::: layout/style/test/gtest/StyloParsingBench.cpp:71 (Diff revision 1) > + RefPtr<RawServoDeclarationBlock> block = Servo_DeclarationBlock_CreateEmpty().Consume(); > + RefPtr<URLExtraData> data = new URLExtraData( > + NullPrincipalURI::Create(), nullptr, NullPrincipal::Create()); > + > + NS_NAMED_LITERAL_CSTRING(css_, " 10px"); > + const nsACString& css = css_; You should be able to do: ``` const nsAutoCString css = NS_LITERAL_CSTRING(" 10px"); ``` Or something like that, and the `&css` above should still work. ::: layout/style/test/gtest/StyloParsingBench.cpp:83 (Diff revision 1) > + &css, > + /* is_important = */ false, > + data, > + ParsingMode::Default, > + eCompatibility_FullStandards, > + NULL nit: nullptr.

Attachment #8887360 - Flags: review+

Emilio Cobos Álvarez (:emilio)

Comment 17

•

8 years ago

mozreview-review

Comment on attachment 8887360 [details] Bug 1344131 - Add a gtest bench for Servo_DeclarationBlock_SetPropertyById. https://reviewboard.mozilla.org/r/158194/#review163380

Boris Zbarsky [:bzbarsky]

Reporter

Comment 18

•

8 years ago

I'm a bit confused. Why can't you just do: NS_NAMED_LITERAL_CSTRING(css, " 10px"); And then use &css as needed?

Comment hidden (mozreview-request)

Simon Sapin (:SimonSapin)

Assignee

Comment 20

•

8 years ago

Changed NULL to nullptr. I’m quite confused with ns*String APIs myself. This bit copy/pasted from the other micro-benchmark in that file, which I originally wrote pretty much by trying things at random until something compiled. Boris, when I try what you suggest I get this error message: "error: cannot convert ‘const nsLiteralCString*’ to ‘const nsACString*’". My (limited) understanding is that that conversion happens implicitly in an assignment to a new local variable, but not in a argument to a function call, or at least not "through" the & operator. Emilio, your suggestion gives: "error: conversion from 'const nsLiteralCString' to non-scalar type 'const nsAutoCString' requested".

Boris Zbarsky [:bzbarsky]

Reporter

Comment 21

•

8 years ago

Oh, nsLiteralCString does not inherit (any more?) from nsACString? That's annoying, ok.

Comment hidden (mozreview-request)

Pulsebot

Comment 23

•

8 years ago

Pushed by simon.sapin@exyr.org: https://hg.mozilla.org/integration/autoland/rev/d1f7d9c6394e Add a gtest bench for Servo_DeclarationBlock_SetPropertyById. r=emilio

(PTO June 19-July 7) Ryan VanderMeulen

Comment 24

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/d1f7d9c6394e

Simon Sapin (:SimonSapin)

Assignee

Updated

•

8 years ago

Depends on: 1382713

Simon Sapin (:SimonSapin)

Assignee

Comment 25

•

8 years ago

https://github.com/servo/servo/pull/17820 should help with this.

Bobby Holley (:bholley)

Updated

•

8 years ago

Priority: P2 → --

Boris Zbarsky [:bzbarsky]

Reporter

Updated

•

8 years ago

Priority: -- → P3

Bobby Holley (:bholley)

Comment 26

•

8 years ago

In the microbenchmark in comment 0, I get about 530 gecko vs 630 stylo. This is probably fine for shipping.

Priority: P3 → P4

Chris Peterson [:cpeterson]

Comment 27

•

8 years ago

status-firefox57=wontfix unless someone thinks this bug should block 57

status-firefox57: --- → wontfix

Hiroyuki Ikezoe (:hiro)

Updated

•

7 years ago

Comment 28

•

7 years ago

The leave-open keyword is there and there is no activity for 6 months. :SimonSapin, maybe it's time to close this bug?

Flags: needinfo?(simon.sapin)

Bobby Holley (:bholley)

Comment 29

•

7 years ago

I don't think this bug needs to be open at this point. We can file a new one if we want to improve the situation further.

Status: NEW → RESOLVED

Closed: 7 years ago

Flags: needinfo?(simon.sapin)

Resolution: --- → WORKSFORME

BugBot [:suhaib / :marco/ :calixte]

Updated

•

7 years ago

Keywords: leave-open