stylo: Setting inline style can be a lot slower than Gecko

NEW
Unassigned

Status

()

Core
CSS Parsing and Computation
P2
normal
3 months ago
2 months ago

People

(Reporter: bz, Unassigned)

Tracking

(Depends on: 1 bug, Blocks: 2 bugs)

Trunk
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox54 affected)

Details

Take this microbenchmark:

  <script>
  var div = document.createElement("div");
  var s = div.style;
  var count = 1000000;
  var start = new Date;
  for (var i = 0; i < count; ++i) {
    s.top = "10px";
  }
  var stop = new Date;
  document.write(stop - start);
  </script>                    

In Gecko I get numbers on the order of 600ms.  In Stylo I get numbers on the order of 900ms.

For scale, by the way, on that testcase Chrome is at ~300ms and Safari at ~130ms, so even the non-stylo number is not amazing.
> In Gecko I get numbers on the order of 600ms.

Correction, closer to 300ms if I do an apples to apples comparison (same profile, no random other tabs running, etc).
So I tried running this testcase (with a larger iteration count) for about 14s in both Gecko an stylo.

In Gecko, I see time break down like so:

1s: nsDOMCSSAttributeDeclaration::GetCSSDeclaration
1s: under nsDocument::BeginUpdate
1.5s: under nsDocument::EndUpdate (NS_IsMainThread shows up under RemoveScriptBlocker here....)
1.5s: under nsDOMCSSAttributeDeclaration::GetCSSParsingEnvironment
7s: under nsCSSParser::ParseProperty

and then some long-tail stuff.

In stylo, it looks like this:

0.5s: nsDOMCSSAttributeDeclaration::GetCSSDeclaration
0.5s: AppendUTF16toUTF8 (converting the property value to UTF-8)
0.5s: under nsDocument::BeginUpdate
0.6s: under nsDocument::EndUpdate
0.7s: under nsDOMCSSAttributeDeclaration::GetCSSParsingEnvironment
3.5s: under nsStyledElement::SetInlineStyleDeclaration.
7s: ServoDeclarationBlock_SetPropertyById.

Rescaling that to the Gecko times by using the overal timings, or the BeginUpdate/EndUpdate stuff, that means that for the same number of iterations:

1) ServoDeclarationBlock_SetPropertyById takes 14+s vs 7s for nsCSSParser::ParseProperty.
2) There's an extra cost of 7s for nsStyledElement::SetInlineStyleDeclaration.

First, the simple bit.  Why are we ending up under SetInlineStyleDeclaration?  That means that "changed" was true near the tail end of nsDOMCSSDeclaration::ParsePropertyValue.  And then we go to do a bunch of ~nsAttrValue, freeing misc containers, SetAttrAndNotify, constructing nsAttrValues and allocating their misc containers, etc.

But why is "changed" true?  It shouldn't be!  But it looks like servo just returns whether the value parsed successfully.  Whereas Gecko ends up in http://searchfox.org/mozilla-central/rev/546a05fec017cb785fca62a5199d42812a6a1fd3/layout/style/nsCSSDataBlock.cpp#37 and actually checks whether the value changed.

Now the more complicated bit: why is ServoDeclarationBlock_SetPropertyById slow?  Profile says a few things about the servo side:

1)  Directly under glue::set_property we're doing malloc/free traffic.  That accounts for about
    1/6 of the time ServoDeclarationBlock_SetPropertyById takes (a bit over 1.1s out of 7s).
2)  We're doing some spinlock stuff directly under glue::set_property.  That takes 100-200ms (it's
    pretty scattered in the profile, hard to add it all up).
3)  drop::hd805e44d693ee661 directly under glue::set_property is 150ms.
4)  Under declaration_block::parse_one_declaration we have more malloc/free and spinlock traffic.
    That takes up about 2s.
5)  The actual call to PropertyDeclaration::parse is about 2.7s, which is comparable to the time
    Gecko's parser bits take, or a bit less.  It's hard to tell what the right apples to apples
    comparison is here.

Anyway, I filed bug 1344135 on the boolean return value being wrong and bug 1344136 on nearly half the time under ServoDeclarationBlock_SetPropertyById being spent on malloc, free, and spinlocks.
Blocks: 1243581
Depends on: 1344135, 1344136
I'm more curious about the spinlock... why do we need spinlock there? We are on the main thread, and there shouldn't be any retention.
The total spinlock time in my profile is order of 400ms out of 14s, so about 3%.  A single set takes about 900ns.  So about 26ns of spinlocking per call, at a guess, on 1.5-year-old laptop hardware.  That does sound a little high.
Blocks: 1289864
No longer blocks: 1243581
Priority: -- → P2
Depends on: 1346974
Depends on: 1354566
Blocks: 1243581
You need to log in before you can comment on or make changes to this bug.