Closed Bug 1421408 Opened 7 years ago Closed 6 years ago

stylo: Computing property values takes much longer time than Gecko

Categories

(Core :: CSS Parsing and Computation, defect, P2)

defect

Tracking

()

RESOLVED INCOMPLETE
Tracking Status
firefox59 --- affected

People

(Reporter: xidorn, Unassigned)

References

(Depends on 1 open bug, Blocks 2 open bugs)

Details

(Keywords: perf)

Attachments

(2 files)

In comparison of profile https://perfht.ml/2j0miSs (gecko) vs. https://perfht.ml/2iYJbpu (stylo), it can be seen that it takes much more time to compute property values in stylo (counting time spent in *::cascade_property) than in gecko (counting time spent in nsRuleNode::Compute*).

Stylo takes 22.9ms, while Gecko uses only 7.6ms, which seems to be a pretty significant difference.
Keywords: perf
Priority: -- → P2
heycam, this is something worth checking further.
Flags: needinfo?(cam)
I suspect that this check[1] is just effectively disabling style sharing, which means that we'll just style many more elements :)

[1]: https://searchfox.org/mozilla-central/rev/be78e6ea9b10b1f5b2b3b013f01d86e1062abb2b/servo/components/style/sharing/mod.rs#671
(In reply to Emilio Cobos Álvarez [:emilio] from comment #2)
> I suspect that this check[1] is just effectively disabling style sharing,
> which means that we'll just style many more elements :)

I did some profiling with that check removed, and I don't really see significant difference from that.

Profile with that check untouched: https://perfht.ml/2zBtivB
Profile with that check removed: https://perfht.ml/2zDYla3

The selected section in the two have some difference, but looking at other runs in the same profile, this difference is more like noises rather than something actually optimized.
Although removing the style scope check doesn't seem to be very helpful, but the guess that this is due to lack of style sharing can be a good starting point.

The attachment log is that from style::sharing on startup.

In this log, we have 3592 "Miss: Different style scopes". However, if I move the style scope to the last place before accept sharing, the total number of this is shrunk to 451, which means in majority of cases, style scope is probably not the only blocker for style sharing.
The two log is actually diffable (with hex numbers removed).

With a brief look, it seems that most of sharing rejected by style scope would then be rejected by local name / ID / class, which we don't really have much to do with, so probably style scope isn't the problem for us here.

(There are still 400+ misses due to different style scopes, though, it is not clear to me whether it should be considered significant in the log which contains almos 30k misses).
I did a memory measurement on my machine, and it seems for each browser.xul document, Display struct takes 0.11MB on stylo, while it takes 0.13MB on gecko, which probably means we do pretty well on style sharing, so it's probably not the reason we are slow here. (Or do we do any sharing after computing?)

However, when counting all style structs for browser.xul, stylo takes 0.34MB while gecko takes only 0.22MB. They look like:

─0.34 MB (00.15%) -- servo-style-structs
├──0.11 MB (00.05%) ── Display
├──0.11 MB (00.05%) ── Position
├──0.03 MB (00.02%) ── sundries
├──0.03 MB (00.01%) ── Border
├──0.02 MB (00.01%) ── Text
├──0.01 MB (00.01%) ── UIReset
├──0.01 MB (00.00%) ── Background
├──0.01 MB (00.00%) ── UserInterface
└──0.01 MB (00.00%) ── Font

─0.22 MB (00.09%) -- gecko-style-structs
├──0.13 MB (00.05%) ── Display
├──0.04 MB (00.02%) ── Position
├──0.03 MB (00.01%) ── sundries
└──0.02 MB (00.01%) ── Border

Looks like we are probably doing some wasteful work on some other style structs.

One possible reason comes to me is that Stylo eagerly computes all style structs / properties while it may not be really necessary in many cases in chrome. It may explain why Gecko generates fewer style structs. But in general if a property doesn't present in a rule matched, it doesn't need to be computed in stylo either. But why could we have lots of useless property declarations in chrome style?

That makes me suspect the reason could be display:none tree. In Gecko, we don't need to compute any style struct for elements in a display:none tree (since there would be no frame for them, and thus no other style struct would be queried). However, in Stylo, we may need to compute them all?

If that's the case, we may need to avoid cascading properties when we find any display:none... Not sure whether that's something doable.
emilio, heycam, what do you think about comment 7?
Flags: needinfo?(emilio)
(In reply to Xidorn Quan [:xidorn] UTC-6 (less responsive Nov 5 ~ Dec 16) from comment #7)
> I did a memory measurement on my machine, and it seems for each browser.xul
> document, Display struct takes 0.11MB on stylo, while it takes 0.13MB on
> gecko, which probably means we do pretty well on style sharing, so it's
> probably not the reason we are slow here. (Or do we do any sharing after
> computing?)

Maybe this is just a matter of the reset struct sharing...
 
> However, when counting all style structs for browser.xul, stylo takes 0.34MB
> while gecko takes only 0.22MB. They look like:
> 
> ─0.34 MB (00.15%) -- servo-style-structs
> ├──0.11 MB (00.05%) ── Display
> ├──0.11 MB (00.05%) ── Position
> ├──0.03 MB (00.02%) ── sundries
> ├──0.03 MB (00.01%) ── Border
> ├──0.02 MB (00.01%) ── Text
> ├──0.01 MB (00.01%) ── UIReset
> ├──0.01 MB (00.00%) ── Background
> ├──0.01 MB (00.00%) ── UserInterface
> └──0.01 MB (00.00%) ── Font
> 
> ─0.22 MB (00.09%) -- gecko-style-structs
> ├──0.13 MB (00.05%) ── Display
> ├──0.04 MB (00.02%) ── Position
> ├──0.03 MB (00.01%) ── sundries
> └──0.02 MB (00.01%) ── Border
> 
> Looks like we are probably doing some wasteful work on some other style
> structs.
> 
> One possible reason comes to me is that Stylo eagerly computes all style
> structs / properties while it may not be really necessary in many cases in
> chrome. It may explain why Gecko generates fewer style structs. But in
> general if a property doesn't present in a rule matched, it doesn't need to
> be computed in stylo either. But why could we have lots of useless property
> declarations in chrome style?

Maybe? I'd expect not, but we looked at this when looking at reducing stylo's memory usage and it was fairly non-trivial.

> That makes me suspect the reason could be display:none tree. In Gecko, we
> don't need to compute any style struct for elements in a display:none tree
> (since there would be no frame for them, and thus no other style struct
> would be queried). However, in Stylo, we may need to compute them all?

Well, note that we only style display: none roots (we don't style elements under a display: none subtree)... I don't think the browser has enough display: none roots to make this the root cause of the issue we see here.

> If that's the case, we may need to avoid cascading properties when we find
> any display:none... Not sure whether that's something doable.

I don't think it's particularly doable... But I'd be surprised if this would be the root cause really, because of the above.
Flags: needinfo?(emilio)
So I just did a quick dirty change which pulls display (as well as -moz-binding) to early category, and cut the cascading of other category when display:none is found.

This brings the size of servo-style-structs down from 0.34MB as shown in comment 7 to 0.25MB, which is much closer to Gecko's 0.22MB, so unnecessary property cascading on display:none root seems likely to be the reason of extra style structs generated, and potentially one of the reason why cascading takes much more time on stylo than gecko on chrome.

The memory usage becomes:
─0.25 MB (00.14%) -- servo-style-structs
├──0.10 MB (00.06%) ── Display
├──0.07 MB (00.04%) ── Position
├──0.04 MB (00.03%) ── sundries
├──0.01 MB (00.01%) ── Border
├──0.01 MB (00.01%) ── Text
└──0.01 MB (00.01%) ── UIReset

However, although size of Position is down from 0.11MB to 0.07MB, it still seems to be high, comparing to gecko's. We don't have any early category property in Position struct, so this theory doesn't explain why Position is high. Maybe we do lack of some kind of reset struct sharing?
Another thing which can be optimized is that, if we detect -moz-binding specified with non-null value in the initial styling, we can stop cascading immediately because we may need to restyle it anyway, and thus we can save us from cascading all the other properties. And for -moz-binding case, the result from getComputedValue wouldn't be affected because we need to restyle it anyway.
So I pushed the change mentioned in comment 10 to try, and it's not the magic silver bullet to save us from the big regression... but it does seem to reduce the regression by 2~5%.

See the comparison:
* gecko vs. stylo-chrome m-c: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=8fc35439a857&newProject=try&newRevision=a4843b80ed7c&framework=1
* gecko vs. stylo-chrome with patch: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=8fc35439a857&newProject=try&newRevision=023345a144d9&framework=1
* stylo-chrome m-c vs. stylo-chrome with patch: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=a4843b80ed7c&newProject=try&newRevision=023345a144d9&framework=1

Based on this comparison, I guess this is probably some optimization worth doing.

For getComputedStyle on display:none element, I guess we can put some flag into ComputedValues, and force restyle the given element if the cascading was stopped by a display:none.

I know it may add some complexity... but given its improvement, probably we should do it.

Another observation is that, although this change reduces regression of tpaint and slightly ts_paint, it doesn't seem to change much for sessionrestore (actually even regress more? how can it?) and tresize. The latter two may have some different performance issue which we may need to diagnose in addition.
Depends on: 1422615
Depends on: 1428285
I think the phenomenon observed in comment 0 was mostly because Stylo is slower on full document restyle, and we unfortunately trigger lots of full document restyle because of document state pseudo-classes.

We are still slower on full document restyle, but given that we have eliminated lots of full document restyle, I'm going to close this bug, and use bug 1420423 as the single meta bug for tracking all the talos regression related to stylo-chrome.
Status: NEW → RESOLVED
Closed: 6 years ago
Flags: needinfo?(cam)
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: