stylo: Setting inline style can be a lot slower than Gecko

NEW
Assigned to

Status

()

Core
CSS Parsing and Computation
P2
normal
5 months ago
3 days ago

People

(Reporter: bz, Assigned: SimonSapin)

Tracking

(Depends on: 2 bugs, Blocks: 2 bugs, {leave-open})

Trunk
leave-open
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox54 affected)

Details

MozReview Requests

()

Submitter Diff Changes Open Issues Last Updated
Loading...
Error loading review requests:

Attachments

(1 attachment)

(Reporter)

Description

5 months ago
Take this microbenchmark:

  <script>
  var div = document.createElement("div");
  var s = div.style;
  var count = 1000000;
  var start = new Date;
  for (var i = 0; i < count; ++i) {
    s.top = "10px";
  }
  var stop = new Date;
  document.write(stop - start);
  </script>                    

In Gecko I get numbers on the order of 600ms.  In Stylo I get numbers on the order of 900ms.

For scale, by the way, on that testcase Chrome is at ~300ms and Safari at ~130ms, so even the non-stylo number is not amazing.
(Reporter)

Comment 1

5 months ago
> In Gecko I get numbers on the order of 600ms.

Correction, closer to 300ms if I do an apples to apples comparison (same profile, no random other tabs running, etc).
(Reporter)

Comment 2

5 months ago
So I tried running this testcase (with a larger iteration count) for about 14s in both Gecko an stylo.

In Gecko, I see time break down like so:

1s: nsDOMCSSAttributeDeclaration::GetCSSDeclaration
1s: under nsDocument::BeginUpdate
1.5s: under nsDocument::EndUpdate (NS_IsMainThread shows up under RemoveScriptBlocker here....)
1.5s: under nsDOMCSSAttributeDeclaration::GetCSSParsingEnvironment
7s: under nsCSSParser::ParseProperty

and then some long-tail stuff.

In stylo, it looks like this:

0.5s: nsDOMCSSAttributeDeclaration::GetCSSDeclaration
0.5s: AppendUTF16toUTF8 (converting the property value to UTF-8)
0.5s: under nsDocument::BeginUpdate
0.6s: under nsDocument::EndUpdate
0.7s: under nsDOMCSSAttributeDeclaration::GetCSSParsingEnvironment
3.5s: under nsStyledElement::SetInlineStyleDeclaration.
7s: ServoDeclarationBlock_SetPropertyById.

Rescaling that to the Gecko times by using the overal timings, or the BeginUpdate/EndUpdate stuff, that means that for the same number of iterations:

1) ServoDeclarationBlock_SetPropertyById takes 14+s vs 7s for nsCSSParser::ParseProperty.
2) There's an extra cost of 7s for nsStyledElement::SetInlineStyleDeclaration.

First, the simple bit.  Why are we ending up under SetInlineStyleDeclaration?  That means that "changed" was true near the tail end of nsDOMCSSDeclaration::ParsePropertyValue.  And then we go to do a bunch of ~nsAttrValue, freeing misc containers, SetAttrAndNotify, constructing nsAttrValues and allocating their misc containers, etc.

But why is "changed" true?  It shouldn't be!  But it looks like servo just returns whether the value parsed successfully.  Whereas Gecko ends up in http://searchfox.org/mozilla-central/rev/546a05fec017cb785fca62a5199d42812a6a1fd3/layout/style/nsCSSDataBlock.cpp#37 and actually checks whether the value changed.

Now the more complicated bit: why is ServoDeclarationBlock_SetPropertyById slow?  Profile says a few things about the servo side:

1)  Directly under glue::set_property we're doing malloc/free traffic.  That accounts for about
    1/6 of the time ServoDeclarationBlock_SetPropertyById takes (a bit over 1.1s out of 7s).
2)  We're doing some spinlock stuff directly under glue::set_property.  That takes 100-200ms (it's
    pretty scattered in the profile, hard to add it all up).
3)  drop::hd805e44d693ee661 directly under glue::set_property is 150ms.
4)  Under declaration_block::parse_one_declaration we have more malloc/free and spinlock traffic.
    That takes up about 2s.
5)  The actual call to PropertyDeclaration::parse is about 2.7s, which is comparable to the time
    Gecko's parser bits take, or a bit less.  It's hard to tell what the right apples to apples
    comparison is here.

Anyway, I filed bug 1344135 on the boolean return value being wrong and bug 1344136 on nearly half the time under ServoDeclarationBlock_SetPropertyById being spent on malloc, free, and spinlocks.
Blocks: 1243581
Depends on: 1344135, 1344136
I'm more curious about the spinlock... why do we need spinlock there? We are on the main thread, and there shouldn't be any retention.
(Reporter)

Comment 4

5 months ago
The total spinlock time in my profile is order of 400ms out of 14s, so about 3%.  A single set takes about 900ns.  So about 26ns of spinlocking per call, at a guess, on 1.5-year-old laptop hardware.  That does sound a little high.
Blocks: 1289864
No longer blocks: 1243581
Priority: -- → P2
(Reporter)

Updated

5 months ago
Depends on: 1346974
(Reporter)

Updated

4 months ago
Depends on: 1354566
Blocks: 1243581
Assignee: nobody → simon.sapin
It seems several optimizations have been landed since this bug filed. Probably we should profile again and see what's going on now?

BTW, we probably should open a bug for having IDL binding pass in raw JS string for CSSOM, so that we can avoid double conversion in many cases.
(Assignee)

Comment 6

a month ago
Bug 1355106 is relevant to the double conversion.
Depends on: 1355106
(Reporter)

Comment 7

a month ago
Passing raw JS string is rather annoying in various ways.  But yes, we could temporarily do something like that.

Bug 1355106 is a bit hard to do until we drop non-stylo or stop caring about its performance...  So it's not going to be the bullet that saves us.  And note that in the common case bindings don't really do "conversion" on a JS string: they either do a memcpy if it's non-latin1 or a fast byte-expanding copy if it is.  So you will still end up doing just as much conversion as now even if you're coming from a JS string directly.  Things will be a bit faster because you touch less memory, of course.
(Reporter)

Comment 8

a month ago
Oh, and now that I have builds I can report on the remeasurement and reprofiling.

In Gecko nightly, I get times of about 400ms on the microbenchmark from comment 0.  Yes, I know comment 0 says 600ms.  In Firefox 53 I get about 600ms.  So we made some serious progress here at some point even for gecko.  Looks like it happened between 53 and 54; I get 400ms in 54.

In stylo nightly I get times of about 430ms.  This is still a bit worse than (new) Gecko, but a good bit better than 53 was.

For stylo, time breakdown under nsDOMCSSDeclaration::ParsePropertyValue looks like this now:

8% EndUpdate notifications.
8% AppendUTF16toUTF8
8% BeginUpdate notifiactions
6% under nsDOMCSSAttributeDeclaration::GetCSSDeclaration; about half
   is AttributeWillChange, 1% is nsContentUtils::HasMutationListeners,
   the rest is Element::GetInlineStyleDeclaration.
4% under nsDOMCSSAttributeDeclaration::GetServoCSSParsingEnvironment.  This is
   1% GetDefaultStyleAttrURLData, 1% GetBindingParent, 1% GetURLDataForStyleAttr,
   1% self time.
57% is under Servo_DeclaraionBlock_SetPropertyById, breaking down as:

1% getting PropertyIds from nsCSSPropertyId.
8% under PropertyDeclarationBlock::extend_common.  There's various vec::reserve
   stuff there, comparisons under push_common, etc.  I dunno why the reserve()
   shows up at all; we should not be changing the length here.

The rest of the time is pretty much under parse_one_declaration_into.  The only weird bit there is that I see _two_ codepaths that land us in tokenizer::consume_numeric.  One is:

  PropertyDeclaration::parse_into calls 
  LengthOrPercentageOrAuto::parse_internal calls 
  Parser::next_including_whitespace_and_comments calls 
  tokenizer::next_token calls 
  tokenizer::consume_numeric.

The other one is:
  
  PropertyDeclaration::parse_into calls
  style..properties..CSSWideKeyword$u20$as$u20$style..parser..Parse$GT$::parse::heb90e588fd20a8e6 calls
  Parser::next_including_whitespace_and_comments calls 
  tokenizer::next_token calls 
  tokenizer::consume_numeric.

I'm not sure what that symbol is that's being called from parse_into here, and hence whether we're really tokenizing the number twice or the profiler is just being confused.  If there's a way to get better information about what that symbol is, I'd love to know!
I'm guessing it's this: http://searchfox.org/mozilla-central/rev/d67ef71097da4d1aa344c9d9c672e49a7228e765/servo/components/style/properties/properties.mako.rs#510-515
Also note that, in Gecko, we are still spending time on computing xml:base, which should be much expensive than GetServoCSSParsingEnvironment. It's worth checking how long would Gecko take if we change nsIContent::GetBaseURIForStyleAttr() to just return GetBaseURIWithoutXMLBase().
We could give Parser a lookahead buffer to store any single token that we unget (when a try() function does a reset() upon detecting failure).  But Tokens look pretty big, so I'm not sure we want to be cloning them out of that lookahead buffer all the time.

Parser::expect_ident and friends return any unexpected token in the error part of their Result.  Maybe we should have versions of these functions that don't need to know what token it did unexpectedly fine.  Then we could have them do a quick check of the leading bytes to see what token we have next, so that when trying to parse a an "inherit" when the next byte is a digit, we could return quickly.  (We would have to look past white space and comments when doing that.)
(Assignee)

Comment 12

a month ago
> The only weird bit there is that I see _two_ codepaths that land us in tokenizer::consume_numeric.

Yes. cssparser’s `try` method works by rewinding the tokenizer, so we end up doing the same work twice.

> We could give Parser a lookahead buffer to store any single token that we unget (when a try() function does a reset() upon detecting failure).  But Tokens look pretty big, so I'm not sure we want to be cloning them out of that lookahead buffer all the time.

We’ve been talking about adding that buffer and reducing the size of tokens. I’ll work on that next. Checking leading bytes also sounds interesting. It’ll break through the nice abstraction between tokenizer and parser, but it may be worth it.
(Reporter)

Comment 13

a month ago
> It's worth checking how long would Gecko take if we change nsIContent::GetBaseURIForStyleAttr() to just return GetBaseURIWithoutXMLBase().

Then the Gecko time drops to about 360ms.

> Yes. cssparser’s `try` method works by rewinding the tokenizer, so we end up doing the same work twice.

That seems like something we should really fix; some tokens are pretty expensive to tokenize.  Is there an existing bug/issue tracking this?

(Gecko solves this by handing out a borrow to an internal token storage, but I'm pretty sure the borrow checker would correctly point out that this is insane in terms of guaranteeing safety statically.)
Flags: needinfo?(simon.sapin)
(Assignee)

Comment 14

a month ago
There is bug 1347408.

Yes, separate "borrow the current token" and "advance to the next token" APIs would help eliminate this overhead. The current API gives ownership of the token to the caller. Token contains Cow<str> for ident-like things, which borrows the input when possible and has an allocated String when backslash-escapes are involves. Some callers take advantage of this ownership to avoid re-allocating. Although maybe backslash-escape are rare enough that we shouldn’t care about optimizing them?

Another idea is to replace Cow<str> with something that always borrows the input and does unescaping on demand. This would eliminate String allocations in Token, making it cheap to clone.
Flags: needinfo?(simon.sapin)
Comment hidden (mozreview-request)
(Assignee)

Updated

9 days ago
Keywords: leave-open

Comment 16

9 days ago
mozreview-review
Comment on attachment 8887360 [details]
Bug 1344131 - Add a gtest bench for Servo_DeclarationBlock_SetPropertyById.

https://reviewboard.mozilla.org/r/158194/#review163370

::: layout/style/test/gtest/StyloParsingBench.cpp:71
(Diff revision 1)
> +  RefPtr<RawServoDeclarationBlock> block = Servo_DeclarationBlock_CreateEmpty().Consume();
> +  RefPtr<URLExtraData> data = new URLExtraData(
> +    NullPrincipalURI::Create(), nullptr, NullPrincipal::Create());
> +
> +  NS_NAMED_LITERAL_CSTRING(css_, " 10px");
> +  const nsACString& css = css_;

You should be able to do:

```
const nsAutoCString css = NS_LITERAL_CSTRING(" 10px");
```

Or something like that, and the `&css` above should still work.

::: layout/style/test/gtest/StyloParsingBench.cpp:83
(Diff revision 1)
> +      &css,
> +      /* is_important = */ false,
> +      data,
> +      ParsingMode::Default,
> +      eCompatibility_FullStandards,
> +      NULL

nit: nullptr.
Attachment #8887360 - Flags: review+

Comment 17

9 days ago
mozreview-review
Comment on attachment 8887360 [details]
Bug 1344131 - Add a gtest bench for Servo_DeclarationBlock_SetPropertyById.

https://reviewboard.mozilla.org/r/158194/#review163380
(Reporter)

Comment 18

9 days ago
I'm a bit confused.  Why can't you just do:

  NS_NAMED_LITERAL_CSTRING(css, " 10px");

And then use &css as needed?
Comment hidden (mozreview-request)
(Assignee)

Comment 20

9 days ago
Changed NULL to nullptr.

I’m quite confused with ns*String APIs myself. This bit copy/pasted from the other micro-benchmark in that file, which I originally wrote pretty much by trying things at random until something compiled.

Boris, when I try what you suggest I get this error message: "error: cannot convert ‘const nsLiteralCString*’ to ‘const nsACString*’". My (limited) understanding is that that conversion happens implicitly in an assignment to a new local variable, but not in a argument to a function call, or at least not "through" the & operator.

Emilio, your suggestion gives: "error: conversion from 'const nsLiteralCString' to non-scalar type 'const nsAutoCString' requested".
(Reporter)

Comment 21

9 days ago
Oh, nsLiteralCString does not inherit (any more?) from nsACString?  That's annoying, ok.
Comment hidden (mozreview-request)

Comment 23

9 days ago
Pushed by simon.sapin@exyr.org:
https://hg.mozilla.org/integration/autoland/rev/d1f7d9c6394e
Add a gtest bench for Servo_DeclarationBlock_SetPropertyById. r=emilio

Comment 24

8 days ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/d1f7d9c6394e
(Assignee)

Updated

7 days ago
Depends on: 1382713
(Assignee)

Comment 25

3 days ago
https://github.com/servo/servo/pull/17820 should help with this.
You need to log in before you can comment on or make changes to this bug.