Closed Bug 1324668 (stylo-perf-test) Opened 7 years ago Closed 7 years ago

Performance test for Stylo

Categories

(Core :: CSS Parsing and Computation, defect, P2)

x86_64
Linux
defect

Tracking

()

RESOLVED FIXED

People

(Reporter: shinglyu, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

We need a formal performance test for Stylo to justify building it.

# What do we want to measure?
There are many things we can measure, from low-level metrics to user-perceived performance:

* restyle counts
* (other style system internal metrics, please suggest)
* profiler time for the style system
* user-perceived page load time

# What tools do we have?
* Style system internal metrics
  * The early Stylo POC has a built-in restyle time monitor; we can reuse that and make it emit the number to a file or log.
  * We need more code in Stylo to measure other metrics.

* Profiler
  * Gecko profiler[0]: will impact the performance; need some effort to automate
  * Google's tracing framework[1]: looks promising, but the Hasal team says they had problems installing the Firefox extension
  * Talos[2]: our existing performance framework; not sure how much effort is required to add our Stylo test

* User-perceived page load time
  * The Hasal project[3]: Mozilla-internal project; measures the page load time from video recordings
  * WebPageTest[4]: external project; needs some effort to automate
  * Servo's performance test[5]: also has a Gecko test script; uses Selenium and shows data on Perfherder[6]

# Presentation
How are we going to present the data?

* Perfherder[6]: Standard graph dashboard for Talos[2], Hasal[3], and Servo[5]


# Plan
* Phase 1: Set up a baseline test with the Hasal team. Run our Stylo build with Hasal's page load tests (Wikipedia and Google search results).
* Phase 2: Get one easy low-level metric (to be identified) online and show it on Perfherder. Compare with the stock Gecko style system.
* Phase 3: Implement all remaining metrics for comparison

# References
[0] https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Profiling_with_the_Built-in_Profiler
[1] http://google.github.io/tracing-framework/
[2] https://wiki.mozilla.org/Buildbot/Talos
[3] https://wiki.mozilla.org/Hasal
[4] https://www.webpagetest.org/
[5] https://github.com/servo/servo/tree/master/etc/ci/performance
[6] https://treeherder.mozilla.org/perf.html#/graphs
Bobby, do you have any opinion on this? Especially on what kinds of internal metrics we need?
Flags: needinfo?(bobbyholley)
Depends on: 1322656
I think a few metrics that would be helpful for measuring general style system performance (not for comparing to Gecko) would be how many elements we have run selector matching for, and how many elements we have styled (with or without selector matching).

I don't know if that's within the scope of this bug; I can open another to discuss this.
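
To make the two counts concrete, here is a minimal sketch of the bookkeeping involved. `StyleCounters` and its methods are hypothetical names made up for illustration; nothing like this is claimed to exist in Stylo or Gecko, and real counters would live in the Rust style code rather than in Python.

```python
from dataclasses import dataclass


@dataclass
class StyleCounters:
    """Hypothetical counters for the two metrics suggested above."""

    elements_matched: int = 0  # elements we ran selector matching for
    elements_styled: int = 0   # elements styled, with or without matching

    def record(self, ran_selector_matching: bool) -> None:
        """Record one styled element."""
        self.elements_styled += 1
        if ran_selector_matching:
            self.elements_matched += 1

    def matched_ratio(self) -> float:
        """Fraction of styled elements that needed selector matching."""
        if self.elements_styled == 0:
            return 0.0
        return self.elements_matched / self.elements_styled
```

A falling ratio between two runs of the same page would suggest that more elements are being styled without a full selector match, but how to interpret it is an assumption on my part.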
(Sorry for being slow here, tied up with various other things. I'll respond here tomorrow).
Thanks for putting this together Shing! Here are my thoughts.

Broadly speaking, there are two reasons to do performance tests:
(1) To guide our engineering and prevent regressions
(2) To demonstrate the real-world impact of our project, and convince decision-makers that users will notice the difference.

Harald's team is worrying about (2), so I think we shouldn't spend too much time on it. We can certainly help them out when needed (and I think they plan to work closely with the Hasal team), but I don't think we need to drive it.

(1) is very important from an engineering perspective, and what I think we should focus on.

In that vein, the most useful thing (by far) would be a dead-simple mechanism for adding performance tests. Adding a basic performance test to Gecko should be as easy as adding a crashtest, and right now it isn't. We should have some mechanism where we can drop an html file in a directory (just like we do with crashtests) and have the time spent loading that file (or between start() and end() calls) tracked in a dashboard.

It doesn't need to be super precise. For a lot of the style system optimizations we do, it shouldn't be hard to write a test that runs 10x or 100x slower if the optimization is missing. We might be able to borrow some of the methodology from |cargo bench| to run the test multiple times to get the most accurate numbers. We could eventually extend this mechanism to track various metrics (like the ones in comment 2), but simple end-to-end time should be good enough for a first pass.
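
As a rough illustration of the repeat-and-aggregate idea (this is not how |cargo bench| or Talos actually work internally), a harness could run the same test several times, discard a few warm-up runs, and report robust statistics. The names `time_runs`, `summarize`, and `is_regression` are hypothetical, and `fn` stands in for whatever actually loads the HTML file.

```python
import statistics
import time


def time_runs(fn, runs=10, warmup=2):
    """Call fn repeatedly; return per-run wall-clock times in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up runs are discarded
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def summarize(samples):
    """Reduce the samples to a few statistics that tolerate outliers."""
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


def is_regression(baseline_ms, current_ms, factor=2.0):
    """Flag only large slowdowns; the 2x default is an arbitrary assumption."""
    return current_ms > baseline_ms * factor
```

Since the tests in question are expected to get 10x or 100x slower when an optimization is missing, a coarse threshold like this should be enough; it would not catch small regressions.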

It's probably worth talking to the Talos team to get their opinion about this, since maybe there's some reason nobody's done something like this before. But I can't think of one, so we might as well push forward unless someone gives us a reason to do otherwise.

There are other things that could be useful too (like hooking |cargo bench| up to perfherder), but the mechanism above would be the most useful to us, so I think we should start with that.

I'll be AFK the next week or so, but Cameron and I have talked about this, so he'd be a good person for any followup discussion in the meantime.
Flags: needinfo?(bobbyholley)
Depends on: 1326140
(In reply to Bobby Holley (PTO) (busy with Stylo) from comment #4)
> Harald's team is worrying about (2), so I think we shouldn't spend too much
> time on it. We can certainly help them out when needed (and I think they
> plan to work closely with the Hasal team), but I don't think we need to
> drive it.

I've been working with the Hasal team to get Stylo to run in their framework. I'll keep acting as the contact window for the Hasal team about Stylo-specific topics.

> We
> should have some mechanism where we can drop an html file in a directory
> (just like we do with crashtests) and have the time spent loading that file
> (or between start() and end() calls) tracked in a dashboard.

I have a simple test framework for that inside the Servo tree[0]. We could start with that, but it can only measure things that can be printed from JavaScript. Alternatively, I could make it easy to add probes (in Gecko) and log parsers so we can easily add tests. I'm not sure where we should put that in m-c; maybe I can put it in a GitHub repo first.
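
A log parser for such probes could be quite small. The sketch below assumes a made-up log line format, `STYLO_PERF <name>=<number>`; neither that prefix nor the metric names are emitted by Gecko or Servo today, so treat all of it as a placeholder.

```python
import re

# Hypothetical probe format: "STYLO_PERF restyle_ms=12.5"
PROBE_RE = re.compile(r"^STYLO_PERF (?P<name>[\w.]+)=(?P<value>\d+(?:\.\d+)?)$")


def parse_probes(lines):
    """Collect probe values per metric name, ignoring unrelated log lines."""
    metrics = {}
    for line in lines:
        match = PROBE_RE.match(line.strip())
        if match:
            value = float(match.group("value"))
            metrics.setdefault(match.group("name"), []).append(value)
    return metrics
```

Each metric keeps a list of values so that repeated runs can be aggregated later.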

> We might be able to borrow some
> of the methodology from |cargo bench| to run the test multiple times to get
> the most accurate numbers. 

That was already done in [0]

> We could eventually extend this mechanism to
> track various metrics (like the ones in comment 2), but simple end-to-end
> time should be good enough for a first pass.

Also done in [0]

> It's probably worth talking to the Talos team to get their opinion about
> this, since maybe there's some reason nobody's done something like this
> before. But I can't think of one, so we might as well push forward unless
> someone gives us a reason to do otherwise.

I believe the tp5 test in Talos already does the end-to-end test, but nobody has actually run Stylo in it yet. I'll work with the Talos team to get it running. It'll be much trickier to build an easy path to add tests. I'll also look into that, but it will probably take more time.

[0]: https://github.com/servo/servo/tree/master/etc/ci/performance
(In reply to Shing Lyu [:shinglyu] from comment #5)
> (In reply to Bobby Holley (PTO) (busy with Stylo) from comment #4)
> > Harald's team is worrying about (2), so I think we shouldn't spend too much
> > time on it. We can certainly help them out when needed (and I think they
> > plan to work closely with the Hasal team), but I don't think we need to
> > drive it.
> 
> I've been working with the Hasal team to get Stylo to run in their
> framework. I'll keep acting as the contact window for the Hasal team about
> Stylo-specific topics.

Great!

> 
> > We
> > should have some mechanism where we can drop an html file in a directory
> > (just like we do with crashtests) and have the time spent loading that file
> > (or between start() and end() calls) tracked in a dashboard.
> 
> I have a simple test framework for that inside the Servo tree[0]. We could
> start with that, but it can only measure things that can be printed from
> JavaScript. Alternatively, I could make it easy to add probes (in Gecko) and
> log parsers so we can easily add tests. I'm not sure where we should put that
> in m-c; maybe I can put it in a GitHub repo first.

I think we should put it in m-c. There are enough differences between Firefox and Servo that Quantum needs performance tests to run on Firefox proper.

As for where exactly this should live in m-c, we should probably discuss that with the a-team.

> 
> > We might be able to borrow some
> > of the methodology from |cargo bench| to run the test multiple times to get
> > the most accurate numbers. 
> 
> That was already done in [0]

Great! Those numbers shouldn't depend on Gecko vs Servo, so we can just write benchmarks and track them on the Servo dashboard. What's the link to that, by the way?

> 
> > We could eventually extend this mechanism to
> > track various metrics (like the ones in comment 2), but simple end-to-end
> > time should be good enough for a first pass.
> 
> Also done in [0]
> 
> > It's probably worth talking to the Talos team to get their opinion about
> > this, since maybe there's some reason nobody's done something like this
> > before. But I can't think of one, so we might as well push forward unless
> > someone gives us a reason to do otherwise.
> 
> I believe the tp5 test in Talos already does the end-to-end test, but nobody
> has actually run Stylo in it yet. I'll work with the Talos team to get it
> running.

Right, getting the existing tp5 pageset running on stylo is an important step. But tp5 is about real-world testcases, which are carefully chosen and curated. I _also_ want the ability to add simple, handwritten, relatively-synthetic testcases without adding them to tp5 (since adding things to tp5 is kind of a Big Deal).
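
For a sense of what a relatively-synthetic testcase might look like, here is a sketch that generates a page with many elements and many descendant-selector rules. `make_testcase` and the chosen selector shape are assumptions for illustration only; a real test would be written to target one specific optimization.

```python
def make_testcase(num_elements=1000, num_rules=100):
    """Return an HTML page that stresses selector matching (illustrative only)."""
    # One descendant-selector rule per class.
    rules = "\n".join(
        f".c{i} span {{ color: rgb({i % 256}, 0, 0); }}" for i in range(num_rules)
    )
    # Elements cycle through the classes so every rule matches something.
    body = "\n".join(
        f'<div class="c{i % num_rules}"><span>item {i}</span></div>'
        for i in range(num_elements)
    )
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n<meta charset=\"utf-8\">\n"
        f"<style>\n{rules}\n</style>\n</head>\n<body>\n{body}\n</body>\n</html>\n"
    )
```

Writing the returned string to a file in the test directory would then be the only step needed to add the test, if the drop-in mechanism described earlier existed.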

> It'll be much trickier to build an easy path to add tests. I'll also
> look into that, but it will probably take more time.

Yeah. I'm not sure whether it should be part of Talos somehow or something separate. Worth discussing with the a-team.
Alias: stylo-perf-test
Depends on: 1330069
Blocks: stylo-nightly
No longer blocks: stylo
Priority: -- → P2
Depends on: 1330550
Depends on: 1330589
Depends on: 1330592
Depends on: 1337643
Blocks: stylo-tooling
No longer blocks: stylo-nightly
Summary: [Stylo] Performance test for Stylo → Performance test for Stylo
Depends on: 1259311
Resolving this bug as fixed because we are already testing Stylo with Talos, tp6, Hasal, and AWSY.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED