Closed Bug 1344131 Opened 8 years ago Closed 7 years ago

stylo: Setting inline style can be a lot slower than Gecko

Categories

(Core :: CSS Parsing and Computation, defect, P4)

defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox54 --- affected
firefox57 --- wontfix

People

(Reporter: bzbarsky, Assigned: SimonSapin)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

Attachments

(1 file)

Take this microbenchmark: <script> var div = document.createElement("div"); var s = div.style; var count = 1000000; var start = new Date; for (var i = 0; i < count; ++i) { s.top = "10px"; } var stop = new Date; document.write(stop - start); </script> In Gecko I get numbers on the order of 600ms. In Stylo I get numbers on the order of 900ms. For scale, by the way, on that testcase Chrome is at ~300ms and Safari at ~130ms, so even the non-stylo number is not amazing.
> In Gecko I get numbers on the order of 600ms. Correction, closer to 300ms if I do an apples to apples comparison (same profile, no random other tabs running, etc).
So I tried running this testcase (with a larger iteration count) for about 14s in both Gecko an stylo. In Gecko, I see time break down like so: 1s: nsDOMCSSAttributeDeclaration::GetCSSDeclaration 1s: under nsDocument::BeginUpdate 1.5s: under nsDocument::EndUpdate (NS_IsMainThread shows up under RemoveScriptBlocker here....) 1.5s: under nsDOMCSSAttributeDeclaration::GetCSSParsingEnvironment 7s: under nsCSSParser::ParseProperty and then some long-tail stuff. In stylo, it looks like this: 0.5s: nsDOMCSSAttributeDeclaration::GetCSSDeclaration 0.5s: AppendUTF16toUTF8 (converting the property value to UTF-8) 0.5s: under nsDocument::BeginUpdate 0.6s: under nsDocument::EndUpdate 0.7s: under nsDOMCSSAttributeDeclaration::GetCSSParsingEnvironment 3.5s: under nsStyledElement::SetInlineStyleDeclaration. 7s: ServoDeclarationBlock_SetPropertyById. Rescaling that to the Gecko times by using the overal timings, or the BeginUpdate/EndUpdate stuff, that means that for the same number of iterations: 1) ServoDeclarationBlock_SetPropertyById takes 14+s vs 7s for nsCSSParser::ParseProperty. 2) There's an extra cost of 7s for nsStyledElement::SetInlineStyleDeclaration. First, the simple bit. Why are we ending up under SetInlineStyleDeclaration? That means that "changed" was true near the tail end of nsDOMCSSDeclaration::ParsePropertyValue. And then we go to do a bunch of ~nsAttrValue, freeing misc containers, SetAttrAndNotify, constructing nsAttrValues and allocating their misc containers, etc. But why is "changed" true? It shouldn't be! But it looks like servo just returns whether the value parsed successfully. Whereas Gecko ends up in http://searchfox.org/mozilla-central/rev/546a05fec017cb785fca62a5199d42812a6a1fd3/layout/style/nsCSSDataBlock.cpp#37 and actually checks whether the value changed. Now the more complicated bit: why is ServoDeclarationBlock_SetPropertyById slow? Profile says a few things about the servo side: 1) Directly under glue::set_property we're doing malloc/free traffic. That accounts for about 1/6 of the time ServoDeclarationBlock_SetPropertyById takes (a bit over 1.1s out of 7s). 2) We're doing some spinlock stuff directly under glue::set_property. That takes 100-200ms (it's pretty scattered in the profile, hard to add it all up). 3) drop::hd805e44d693ee661 directly under glue::set_property is 150ms. 4) Under declaration_block::parse_one_declaration we have more malloc/free and spinlock traffic. That takes up about 2s. 5) The actual call to PropertyDeclaration::parse is about 2.7s, which is comparable to the time Gecko's parser bits take, or a bit less. It's hard to tell what the right apples to apples comparison is here. Anyway, I filed bug 1344135 on the boolean return value being wrong and bug 1344136 on nearly half the time under ServoDeclarationBlock_SetPropertyById being spent on malloc, free, and spinlocks.
Blocks: stylo
Depends on: 1344135, 1344136
I'm more curious about the spinlock... why do we need spinlock there? We are on the main thread, and there shouldn't be any retention.
The total spinlock time in my profile is order of 400ms out of 14s, so about 3%. A single set takes about 900ns. So about 26ns of spinlocking per call, at a guess, on 1.5-year-old laptop hardware. That does sound a little high.
Blocks: stylo-perf
No longer blocks: stylo
Priority: -- → P2
Depends on: 1346974
Depends on: 1354566
Assignee: nobody → simon.sapin
It seems several optimizations have been landed since this bug filed. Probably we should profile again and see what's going on now? BTW, we probably should open a bug for having IDL binding pass in raw JS string for CSSOM, so that we can avoid double conversion in many cases.
Bug 1355106 is relevant to the double conversion.
Depends on: 1355106
Passing raw JS string is rather annoying in various ways. But yes, we could temporarily do something like that. Bug 1355106 is a bit hard to do until we drop non-stylo or stop caring about its performance... So it's not going to be the bullet that saves us. And note that in the common case bindings don't really do "conversion" on a JS string: they either do a memcpy if it's non-latin1 or a fast byte-expanding copy if it is. So you will still end up doing just as much conversion as now even if you're coming from a JS string directly. Things will be a bit faster because you touch less memory, of course.
Oh, and now that I have builds I can report on the remeasurement and reprofiling. In Gecko nightly, I get times of about 400ms on the microbenchmark from comment 0. Yes, I know comment 0 says 600ms. In Firefox 53 I get about 600ms. So we made some serious progress here at some point even for gecko. Looks like it happened between 53 and 54; I get 400ms in 54. In stylo nightly I get times of about 430ms. This is still a bit worse than (new) Gecko, but a good bit better than 53 was. For stylo, time breakdown under nsDOMCSSDeclaration::ParsePropertyValue looks like this now: 8% EndUpdate notifications. 8% AppendUTF16toUTF8 8% BeginUpdate notifiactions 6% under nsDOMCSSAttributeDeclaration::GetCSSDeclaration; about half is AttributeWillChange, 1% is nsContentUtils::HasMutationListeners, the rest is Element::GetInlineStyleDeclaration. 4% under nsDOMCSSAttributeDeclaration::GetServoCSSParsingEnvironment. This is 1% GetDefaultStyleAttrURLData, 1% GetBindingParent, 1% GetURLDataForStyleAttr, 1% self time. 57% is under Servo_DeclaraionBlock_SetPropertyById, breaking down as: 1% getting PropertyIds from nsCSSPropertyId. 8% under PropertyDeclarationBlock::extend_common. There's various vec::reserve stuff there, comparisons under push_common, etc. I dunno why the reserve() shows up at all; we should not be changing the length here. The rest of the time is pretty much under parse_one_declaration_into. The only weird bit there is that I see _two_ codepaths that land us in tokenizer::consume_numeric. One is: PropertyDeclaration::parse_into calls LengthOrPercentageOrAuto::parse_internal calls Parser::next_including_whitespace_and_comments calls tokenizer::next_token calls tokenizer::consume_numeric. The other one is: PropertyDeclaration::parse_into calls style..properties..CSSWideKeyword$u20$as$u20$style..parser..Parse$GT$::parse::heb90e588fd20a8e6 calls Parser::next_including_whitespace_and_comments calls tokenizer::next_token calls tokenizer::consume_numeric. I'm not sure what that symbol is that's being called from parse_into here, and hence whether we're really tokenizing the number twice or the profiler is just being confused. If there's a way to get better information about what that symbol is, I'd love to know!
Also note that, in Gecko, we are still spending time on computing xml:base, which should be much expensive than GetServoCSSParsingEnvironment. It's worth checking how long would Gecko take if we change nsIContent::GetBaseURIForStyleAttr() to just return GetBaseURIWithoutXMLBase().
We could give Parser a lookahead buffer to store any single token that we unget (when a try() function does a reset() upon detecting failure). But Tokens look pretty big, so I'm not sure we want to be cloning them out of that lookahead buffer all the time. Parser::expect_ident and friends return any unexpected token in the error part of their Result. Maybe we should have versions of these functions that don't need to know what token it did unexpectedly fine. Then we could have them do a quick check of the leading bytes to see what token we have next, so that when trying to parse a an "inherit" when the next byte is a digit, we could return quickly. (We would have to look past white space and comments when doing that.)
> The only weird bit there is that I see _two_ codepaths that land us in tokenizer::consume_numeric. Yes. cssparser’s `try` method works by rewinding the tokenizer, so we end up doing the same work twice. > We could give Parser a lookahead buffer to store any single token that we unget (when a try() function does a reset() upon detecting failure). But Tokens look pretty big, so I'm not sure we want to be cloning them out of that lookahead buffer all the time. We’ve been talking about adding that buffer and reducing the size of tokens. I’ll work on that next. Checking leading bytes also sounds interesting. It’ll break through the nice abstraction between tokenizer and parser, but it may be worth it.
> It's worth checking how long would Gecko take if we change nsIContent::GetBaseURIForStyleAttr() to just return GetBaseURIWithoutXMLBase(). Then the Gecko time drops to about 360ms. > Yes. cssparser’s `try` method works by rewinding the tokenizer, so we end up doing the same work twice. That seems like something we should really fix; some tokens are pretty expensive to tokenize. Is there an existing bug/issue tracking this? (Gecko solves this by handing out a borrow to an internal token storage, but I'm pretty sure the borrow checker would correctly point out that this is insane in terms of guaranteeing safety statically.)
Flags: needinfo?(simon.sapin)
There is bug 1347408. Yes, separate "borrow the current token" and "advance to the next token" APIs would help eliminate this overhead. The current API gives ownership of the token to the caller. Token contains Cow<str> for ident-like things, which borrows the input when possible and has an allocated String when backslash-escapes are involves. Some callers take advantage of this ownership to avoid re-allocating. Although maybe backslash-escape are rare enough that we shouldn’t care about optimizing them? Another idea is to replace Cow<str> with something that always borrows the input and does unescaping on demand. This would eliminate String allocations in Token, making it cheap to clone.
Flags: needinfo?(simon.sapin)
Keywords: leave-open
Comment on attachment 8887360 [details] Bug 1344131 - Add a gtest bench for Servo_DeclarationBlock_SetPropertyById. https://reviewboard.mozilla.org/r/158194/#review163370 ::: layout/style/test/gtest/StyloParsingBench.cpp:71 (Diff revision 1) > + RefPtr<RawServoDeclarationBlock> block = Servo_DeclarationBlock_CreateEmpty().Consume(); > + RefPtr<URLExtraData> data = new URLExtraData( > + NullPrincipalURI::Create(), nullptr, NullPrincipal::Create()); > + > + NS_NAMED_LITERAL_CSTRING(css_, " 10px"); > + const nsACString& css = css_; You should be able to do: ``` const nsAutoCString css = NS_LITERAL_CSTRING(" 10px"); ``` Or something like that, and the `&css` above should still work. ::: layout/style/test/gtest/StyloParsingBench.cpp:83 (Diff revision 1) > + &css, > + /* is_important = */ false, > + data, > + ParsingMode::Default, > + eCompatibility_FullStandards, > + NULL nit: nullptr.
Attachment #8887360 - Flags: review+
Comment on attachment 8887360 [details] Bug 1344131 - Add a gtest bench for Servo_DeclarationBlock_SetPropertyById. https://reviewboard.mozilla.org/r/158194/#review163380
I'm a bit confused. Why can't you just do: NS_NAMED_LITERAL_CSTRING(css, " 10px"); And then use &css as needed?
Changed NULL to nullptr. I’m quite confused with ns*String APIs myself. This bit copy/pasted from the other micro-benchmark in that file, which I originally wrote pretty much by trying things at random until something compiled. Boris, when I try what you suggest I get this error message: "error: cannot convert ‘const nsLiteralCString*’ to ‘const nsACString*’". My (limited) understanding is that that conversion happens implicitly in an assignment to a new local variable, but not in a argument to a function call, or at least not "through" the & operator. Emilio, your suggestion gives: "error: conversion from 'const nsLiteralCString' to non-scalar type 'const nsAutoCString' requested".
Oh, nsLiteralCString does not inherit (any more?) from nsACString? That's annoying, ok.
Pushed by simon.sapin@exyr.org: https://hg.mozilla.org/integration/autoland/rev/d1f7d9c6394e Add a gtest bench for Servo_DeclarationBlock_SetPropertyById. r=emilio
Depends on: 1382713
Priority: P2 → --
Priority: -- → P3
In the microbenchmark in comment 0, I get about 530 gecko vs 630 stylo. This is probably fine for shipping.
Priority: P3 → P4
status-firefox57=wontfix unless someone thinks this bug should block 57
See Also: → 1430706
The leave-open keyword is there and there is no activity for 6 months. :SimonSapin, maybe it's time to close this bug?
Flags: needinfo?(simon.sapin)
I don't think this bug needs to be open at this point. We can file a new one if we want to improve the situation further.
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(simon.sapin)
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: