Get some telemetry probe indicating the expected maximum gain from stylo in the wild

RESOLVED INCOMPLETE

Status

()

P5
normal
RESOLVED INCOMPLETE
2 years ago
6 months ago

People

(Reporter: Ehsan, Unassigned)

Tracking

(Blocks: 1 bug)

unspecified
Points:
---

Firefox Tracking Flags

(firefox57 wontfix)

Details

(Reporter)

Description

2 years ago
It would be nice if we had some kind of measurement from the impact that we could get from turning on stylo in the wild.

I have been thinking about how to do this.  My rough understanding of the maximum expected gain we can expect is perfect parallelism across all available cores for the restyling phase.  Bug 1349315 covers a similar probe for WebRender, basing the measurement around the percentage of the rendering pipeline.  For stylo, the number that we are hoping to change, if my understanding is correct, would be GeckoRestyleManager::RebuildAllStyleData() or somewhere like that, and for the expected parallelism, we can use the available number of cores.

Bobby, does this sound sane to you?  Thanks!
Flags: needinfo?(bobbyholley)
(In reply to :Ehsan Akhgari from comment #0)
> It would be nice if we had some kind of measurement from the impact that we
> could get from turning on stylo in the wild.
> 
> I have been thinking about how to do this.  My rough understanding of the
> maximum expected gain we can expect is perfect parallelism across all
> available cores for the restyling phase.

Well, there are two parts: initial styling and restyling. Initial styling (which happens during pageload) is currently driven off of frame construction, which makes it rather hard to measure, because all the frame construction and styling bits are interleaved. Measuring time in the frame constructor would provide an upper bound on this, but it's hard to say exactly how much other stuff we'd be measuring.

Restyling in stock Gecko is driven off the RestyleTracker, which makes it easier to measure (this is why we have rudimentary markers for it in the profiler).

Then there's the added complication that initial styling itself is sometimes driven off the RestyleTracker in the case where we do lazy frame construction (which is the optimal and (I think) common case during pageload). That goes through CreateNeededFrames, which causes us to do lazy frame construction on various subtrees, and that frame construction triggers styling as discussed above. 

> Bug 1349315 covers a similar probe
> for WebRender, basing the measurement around the percentage of the rendering
> pipeline.  For stylo, the number that we are hoping to change, if my
> understanding is correct, would be
> GeckoRestyleManager::RebuildAllStyleData()

RebuildAllStyleData is mostly used when doing things like adding stylesheets to the DOM, which, while expensive and parallelizable, is not the common restyling case. ProcessPendingRestyles is more along the lines of what you want, but you need to be careful not to include change hint processing in there, and figure out how to model animation stuff, and various other things.

And _then_ you have the issue that all of the above only covers selector matching, which is only ~75% or so of style system overhead. Stylo does the property cascade eagerly in parallel, whereas Gecko does it on-demand during layout. This is probably less of an issue with restyle because restyle will recascade anything that was already cascaded in order to generate the correct change hints. But it does throw another wrench in our ability to measure the impact on pageload.

The overall problem is that (unlike the Stylo case), the work done by the Gecko style sy
> or somewhere like that, and for
> the expected parallelism, we can use the available number of cores.

For an upper bound, yes. But the extent to which we can parallelize depends a lot on the shape of the DOM and how much weird stuff (like XBL/NAC) is involved which needs some sequential processing.

> Bobby, does this sound sane to you?  Thanks!

It sounds tractable, but I think it would require a fair amount of tweaking and measuring from a relatively knowledgeable person to have much confidence in the data. And even then it's a complicated-enough measurement problem that I worry the data will be hard to interpret without spending a lot of time on it.

Especially given that the style system is not very platform-dependent, it seems safer to me to focus on controlled local measurements that we can reproduce (a la Hasal). Harald did some work on this for the style system last fall, but I'm not sure what came of it.
Flags: needinfo?(bobbyholley)
> Harald did some work on this for the style system last fall, but I'm not sure what came of it.

We looked at extracting that data from profiles. I'll have a prototype ready to show collect profiles in the wild.

What would be helpful is making sure we have markers for all parts that are affected by stylo, so aggregation doesn't depend on stacks.
(In reply to :Harald Kirschner :digitarald from comment #2)
> What would be helpful is making sure we have markers for all parts that are
> affected by stylo, so aggregation doesn't depend on stacks.

How expensive are markers? If they're cheap enough to add to all the calls to ResolveStyleFor that would work, but otherwise we have the interleaving problem described above.
:mstange can advise if this is the recommended way forward.
Flags: needinfo?(mstange)
Markers might be too expensive. PROFILER_LABELs are probably a better fit for this job: they're cheap to push and pop, and they're sampled at a time-based rate. And if you give them the category ProfileEntry::Category::CSS, they're easy to find by automated profile processing scripts.
Flags: needinfo?(mstange)
(Reporter)

Comment 6

2 years ago
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #1)
> (In reply to :Ehsan Akhgari from comment #0)
> > It would be nice if we had some kind of measurement from the impact that we
> > could get from turning on stylo in the wild.
> > 
> > I have been thinking about how to do this.  My rough understanding of the
> > maximum expected gain we can expect is perfect parallelism across all
> > available cores for the restyling phase.
> 
> Well, there are two parts: initial styling and restyling. Initial styling
> (which happens during pageload) is currently driven off of frame
> construction, which makes it rather hard to measure, because all the frame
> construction and styling bits are interleaved. Measuring time in the frame
> constructor would provide an upper bound on this, but it's hard to say
> exactly how much other stuff we'd be measuring.

Ugh, yeah you're right, I wasn't thinking about that.  :(

How would the initial styling change post-stylo?  Would it still happen during frame construction like Gecko does today?

> Restyling in stock Gecko is driven off the RestyleTracker, which makes it
> easier to measure (this is why we have rudimentary markers for it in the
> profiler).

Right.

> Then there's the added complication that initial styling itself is sometimes
> driven off the RestyleTracker in the case where we do lazy frame
> construction (which is the optimal and (I think) common case during
> pageload). That goes through CreateNeededFrames, which causes us to do lazy
> frame construction on various subtrees, and that frame construction triggers
> styling as discussed above. 

Yeah we do lazy frame construction by default for most content (notable current exception is editable regions, which bug 1348073 will hopefully fix).  But still the styling happening as part of the lazy frame constructions happens in the interleaved and hard to measure fashion you described above.

> > Bug 1349315 covers a similar probe
> > for WebRender, basing the measurement around the percentage of the rendering
> > pipeline.  For stylo, the number that we are hoping to change, if my
> > understanding is correct, would be
> > GeckoRestyleManager::RebuildAllStyleData()
> 
> RebuildAllStyleData is mostly used when doing things like adding stylesheets
> to the DOM, which, while expensive and parallelizable, is not the common
> restyling case. ProcessPendingRestyles is more along the lines of what you
> want, but you need to be careful not to include change hint processing in
> there, and figure out how to model animation stuff, and various other things.

OK, thanks.

> And _then_ you have the issue that all of the above only covers selector
> matching, which is only ~75% or so of style system overhead. Stylo does the
> property cascade eagerly in parallel, whereas Gecko does it on-demand during
> layout. This is probably less of an issue with restyle because restyle will
> recascade anything that was already cascaded in order to generate the
> correct change hints. But it does throw another wrench in our ability to
> measure the impact on pageload.

Ugh.  That's actually pretty bad since it the amount of overhead would be different depending on what properties change, if I'm understanding you correctly.  :(

> The overall problem is that (unlike the Stylo case), the work done by the
> Gecko style sy
> > or somewhere like that, and for
> > the expected parallelism, we can use the available number of cores.
> 
> For an upper bound, yes. But the extent to which we can parallelize depends
> a lot on the shape of the DOM and how much weird stuff (like XBL/NAC) is
> involved which needs some sequential processing.

Yeah of course.  And also on other things such as the number of actual free cores presumably and not just the number of present cores on the system.

> > Bobby, does this sound sane to you?  Thanks!
> 
> It sounds tractable, but I think it would require a fair amount of tweaking
> and measuring from a relatively knowledgeable person to have much confidence
> in the data. And even then it's a complicated-enough measurement problem
> that I worry the data will be hard to interpret without spending a lot of
> time on it.
> 
> Especially given that the style system is not very platform-dependent, it
> seems safer to me to focus on controlled local measurements that we can
> reproduce (a la Hasal). Harald did some work on this for the style system
> last fall, but I'm not sure what came of it.

Hmm, thinking about what you said above, it worries me that any naive measurement that I had in my mind would basically completely miss a big part of the picture.  :/
Blocks: 1243581
Priority: -- → P2
(In reply to :Ehsan Akhgari (busy) from comment #6)
> (In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #1)
> > (In reply to :Ehsan Akhgari from comment #0)
> > > It would be nice if we had some kind of measurement from the impact that we
> > > could get from turning on stylo in the wild.
> > > 
> > > I have been thinking about how to do this.  My rough understanding of the
> > > maximum expected gain we can expect is perfect parallelism across all
> > > available cores for the restyling phase.
> > 
> > Well, there are two parts: initial styling and restyling. Initial styling
> > (which happens during pageload) is currently driven off of frame
> > construction, which makes it rather hard to measure, because all the frame
> > construction and styling bits are interleaved. Measuring time in the frame
> > constructor would provide an upper bound on this, but it's hard to say
> > exactly how much other stuff we'd be measuring.
> 
> Ugh, yeah you're right, I wasn't thinking about that.  :(
> 
> How would the initial styling change post-stylo?  Would it still happen
> during frame construction like Gecko does today?

With Stylo, we kick off a parallel traversal as late as possible. With eager frame construction, this means eagerly traversing subtrees right before constructing frames for them, which can give us some parallelism if the subtree is large, but otherwise not much. The lazy frame construction case is much better, because in that case we basically traversal the entire DOM (just like we would with a restyle) and do the frame construction in the post-pass. See bug 1338921.

> 
> > Restyling in stock Gecko is driven off the RestyleTracker, which makes it
> > easier to measure (this is why we have rudimentary markers for it in the
> > profiler).
> 
> Right.
> 
> > Then there's the added complication that initial styling itself is sometimes
> > driven off the RestyleTracker in the case where we do lazy frame
> > construction (which is the optimal and (I think) common case during
> > pageload). That goes through CreateNeededFrames, which causes us to do lazy
> > frame construction on various subtrees, and that frame construction triggers
> > styling as discussed above. 
> 
> Yeah we do lazy frame construction by default for most content (notable
> current exception is editable regions, which bug 1348073 will hopefully
> fix).  But still the styling happening as part of the lazy frame
> constructions happens in the interleaved and hard to measure fashion you
> described above.
> 
> > > Bug 1349315 covers a similar probe
> > > for WebRender, basing the measurement around the percentage of the rendering
> > > pipeline.  For stylo, the number that we are hoping to change, if my
> > > understanding is correct, would be
> > > GeckoRestyleManager::RebuildAllStyleData()
> > 
> > RebuildAllStyleData is mostly used when doing things like adding stylesheets
> > to the DOM, which, while expensive and parallelizable, is not the common
> > restyling case. ProcessPendingRestyles is more along the lines of what you
> > want, but you need to be careful not to include change hint processing in
> > there, and figure out how to model animation stuff, and various other things.
> 
> OK, thanks.
> 
> > And _then_ you have the issue that all of the above only covers selector
> > matching, which is only ~75% or so of style system overhead. Stylo does the
> > property cascade eagerly in parallel, whereas Gecko does it on-demand during
> > layout. This is probably less of an issue with restyle because restyle will
> > recascade anything that was already cascaded in order to generate the
> > correct change hints. But it does throw another wrench in our ability to
> > measure the impact on pageload.
> 
> Ugh.  That's actually pretty bad since it the amount of overhead would be
> different depending on what properties change, if I'm understanding you
> correctly.  :(

Not so much which properties change (since any change in properties would result in a new rule node, and throwing away all the old style structs), but rather which computed values were needed by layout in order to render the page.


> 
> > The overall problem is that (unlike the Stylo case), the work done by the
> > Gecko style sy
> > > or somewhere like that, and for
> > > the expected parallelism, we can use the available number of cores.
> > 
> > For an upper bound, yes. But the extent to which we can parallelize depends
> > a lot on the shape of the DOM and how much weird stuff (like XBL/NAC) is
> > involved which needs some sequential processing.
> 
> Yeah of course.  And also on other things such as the number of actual free
> cores presumably and not just the number of present cores on the system.
> 
> > > Bobby, does this sound sane to you?  Thanks!
> > 
> > It sounds tractable, but I think it would require a fair amount of tweaking
> > and measuring from a relatively knowledgeable person to have much confidence
> > in the data. And even then it's a complicated-enough measurement problem
> > that I worry the data will be hard to interpret without spending a lot of
> > time on it.
> > 
> > Especially given that the style system is not very platform-dependent, it
> > seems safer to me to focus on controlled local measurements that we can
> > reproduce (a la Hasal). Harald did some work on this for the style system
> > last fall, but I'm not sure what came of it.
> 
> Hmm, thinking about what you said above, it worries me that any naive
> measurement that I had in my mind would basically completely miss a big part
> of the picture.  :/

That matches my view, in general.
Priority: P2 → P5
status-firefox57=wontfix unless someone thinks this bug should block 57
status-firefox57: --- → wontfix
Ehsan, I assume we're not still planning on doing this.  Can we close this bug?
Flags: needinfo?(ehsan)
(Reporter)

Comment 10

6 months ago
Sure, we no longer have anything to compare against, so this bug has lost its original point.  :-)
Status: NEW → RESOLVED
Last Resolved: 6 months ago
Flags: needinfo?(ehsan)
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.