Closed Bug 1013650 (mozbench) Opened 6 years ago Closed 4 years ago

Browser/Game Benchmark Automation (mozbench)

Categories

(Testing :: General, defect, P3)

All
Other
defect

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: cpeterson, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: meta)

>>Problem:
The Games and JS teams would like to run automated browser benchmarks on all release channels: Firefox vs. Chrome, and Firefox OS vs. Android.

>>Solution:
Alan Kligman (akligman) has written most of the automation framework (mozbench) for test running and reporting:

https://github.com/modeswitch/mozbench

Unfortunately, Alan has been reassigned to a different project. He has since been working with Joel Maher to bring him up to speed on mozbench. Alan estimates that there are "a couple weeks" of work left to finish the mozbench framework and then automate the test suites.

The primary test suites are those used by Tom's Hardware Guide's "Web Browser Grand Prix":

https://wiki.mozilla.org/Web_Browser_Grand_Prix

>>Mozilla Top Level Goal:
The goal of the mozbench project is to track performance improvements and regressions for the Firefox browser and Firefox OS compared to their competition (Chrome and Android, respectively).

>>Existing Bug:
No bug

>>Per-Commit:
The mozbench tests would be run either weekly or nightly. The goal is to track trends, not to bisect individual changesets.

>>Data other than Pass/Fail:
mozbench will report performance metrics and has its own data collection and reporting server owned by Kyle Lahnakoski (klahnakoski).

>>Prototype Date:
Flexible, but hopefully 1–2 months.

>>Production Date:
No hard deadline.

>>Most Valuable Piece:
Completing and standing up the mozbench framework. Automation of individual test suites can be done later.

>>Responsible Engineer:
akligman developed the framework, and Kamil Jozwiak (kjozwiak) will maintain the test hardware in Toronto.

>>Manager:
Project manager Chris Peterson (cpeterson) and JS engineering manager Naveed Ihsanullah (nihsanullah).

>>Other Teams/External Dependencies:
Martin Best (mbest) from the Games Initiative and Milan Sreckovic (msreckovic) from the GFx team are also interested in these tests.

>>Additional Info:
mozbench differs from Are We Fast Yet (AWFY) in that it runs the actual browser, not just the JS shell.
Summary: Brower Benchmark Automation (mozbench) → Browser Benchmark Automation (mozbench)
The current repo contains the skeleton of a new continuous-integration framework.  What's the motivation for creating this versus using an off-the-shelf solution like Jenkins?
Alan: can you please share your design rationale with Jonathan?

My understanding was that the test framework needed to be mobile-friendly for automation on device benchmarks.
Flags: needinfo?(akligman)
* small, lightweight (JS with only pure-JS deps)
* portable wherever Node is (covers Android & FxOS)
* plugs into our existing reporting system
* more scalable than perfy

Not sure which of those are covered by Jenkins.
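(Editorial illustration, not part of the original comment: none of the names below are mozbench's actual API — `runBenchmark`, the record fields, and the "noop" benchmark are invented to sketch what a small, pure-JS, dependency-free harness of the kind Alan describes might look like.)

```javascript
// Hypothetical sketch of a minimal pure-JS benchmark harness:
// run a named benchmark N times, collect timings, and build a
// JSON-serializable result record for a reporting server.
function runBenchmark(name, fn, iterations) {
  const timings = [];
  for (let i = 0; i < iterations; i++) {
    const start = Date.now();
    fn();                                  // the benchmark workload
    timings.push(Date.now() - start);      // elapsed ms for this run
  }
  const mean = timings.reduce((a, b) => a + b, 0) / timings.length;
  // This record is what would be POSTed to the reporting system.
  return { benchmark: name, iterations, mean_ms: mean, raw_ms: timings };
}

const result = runBenchmark("noop", () => {}, 5);
console.log(JSON.stringify(result));
```

Because it uses only language built-ins, a harness shaped like this runs anywhere Node does, which is the portability property claimed above for Android and FxOS.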
Flags: needinfo?(akligman)
(In reply to Alan K [:ack] from comment #3)
> * small, lightweight (JS with only pure-JS deps)
> * portable wherever Node is (covers Android & FxOS)
> * plugs into our existing reporting system
> * more scalable than perfy
> 
> Not sure which of those are covered by Jenkins.

What happened to the existing automation system we developed last summer, which we already had working on Android and Firefox OS for this exact effort? Beyond my general aversion to re-inventing wheels, I want to understand why it was evidently deemed insufficient; that might be important in deciding on the proper approach here, so that we don't repeat the same mistakes.
I think it might be helpful to better understand the ultimate goal, as well:

* What platforms do we want to run the tests on?
* Do we have existing hardware for this or will we need to order more?
* Where do the tests report results, and what interface is used for that?
(In reply to Jonathan Griffin (:jgriffin) from comment #5)
> I think it might be helpful to better understand the ultimate goal, as well:
> 
> * What platforms do we want to run the tests on?

Android, Firefox OS, Linux, OS X, and Windows. I believe the Android, Linux, and Windows tests are already running on an older version of Alan's test framework.

> * Do we have existing hardware for this or will we need to order more?

We have existing hardware (desktop machines and Android devices in Toronto) running an older version of Alan's test framework.

> * Where do the tests report results, and what interface is used for that?

The framework posts (JSON?) test results to a reporting database server maintained by Kyle Lahnakoski.
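(Editorial illustration, not part of the original comment: the bug doesn't document the result schema — the question mark after "JSON" above is the reporter's own uncertainty — so every field name below is a guess, shown only to make the idea of a posted result record concrete.)

```json
{
  "benchmark": "webaudio-demo",
  "browser": "firefox-nightly",
  "platform": "windows",
  "timestamp": "2014-05-21T12:00:00Z",
  "mean_ms": 42.3,
  "raw_ms": [41.9, 42.8, 42.2]
}
```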
Thanks.  Do we not care to run these tests on B2G?

For the desktop platforms, do we care which flavors we use?

For Android, do we want to test against a variety of hardware, or only one or two reference devices?

I agree with Clint that I'd prefer to stand this up using existing tools if possible, rather than developing a custom CI/runner.  I'd like to assign an engineer to examine the currently running tests, identify pain points, and come up with a least-cost solution to getting these running long-term.
Jonathan: sorry for the late reply.

(In reply to Jonathan Griffin (:jgriffin) from comment #7)
> Thanks.  Do we not care to run these tests on B2G?
> 
> For the desktop platforms, do we care which flavors we use?

We have three dimensions of testing: browsers, operating systems, and benchmarks. After talking with ack, we think the best place to start would be comparing Firefox Nightly and Chrome Canary running a webaudio benchmark on a recent version of Windows. ack has a particular webaudio benchmark in mind.


Operating system priorities:

1. Latest version of Windows
2. Compare current version of B2G to previous B2G releases
3. Compare current version of B2G to Android (Fennec and Chrome) on the same hardware
4. Latest version of OS X
5. Linux
6. Android
7. Older versions of Windows like 7 or XP?

Browser priorities:

1. Compare Firefox Nightly vs Chrome Canary
2. Compare Firefox's Nightly, Aurora, Beta, and Release channels
3. Compare Chrome's Canary, Beta, and Release channels

Benchmark priorities:

1. webaudio (waiting for link from ack)
2. WebGL and canvas
3. TBD: Other random benchmarks like azakai's MASSIVE (for asm.js) or Browsermark and Peacekeeper (for Tom's Hardware)


> For Android, do we want to test against a variety of hardware, or only one
> or two reference devices?

Only one or two reference Android devices would be needed, but Android is a lower priority platform than desktop.


> I agree with Clint in that I'd prefer to stand this up using existing tools
> if possible, rather than develop a custom CI/runner.  I'd like to assign an
> engineer to examine the currently running tests, identify pain points, and
> come up with a least-cost solution to getting these running long-term.

Yes, it makes sense to use the team's standard tools.

What are the next steps forward? Mark Cote scheduled a "Planning for games benchmarking" meeting for next week. What information should I bring to that meeting or share before then?
Blocks: WBGP
Flags: needinfo?(jgriffin)
Alan's performance results were sent to an ES cluster.  A small set of dashboards used the cluster to compare the various versions of FF and Google Chrome: http://people.mozilla.org/~klahnakoski/perfy/Perfy-Overview.html#
(In reply to Kyle Lahnakoski [:ekyle] from comment #9)
> Alan's performance results were sent to an ES cluster.  A small set of
> dashboards used the cluster to compare the various versions of FF and Google
> Chrome: http://people.mozilla.org/~klahnakoski/perfy/Perfy-Overview.html#

Ah, thanks, that was my last question.  :)
Flags: needinfo?(jgriffin)
Depends on: 974464
Keywords: meta
Assignee: jgriffin → dminor
Status: NEW → ASSIGNED
Depends on: 1039637
Depends on: 1050880
Depends on: 1050645
Alias: mozbench
Summary: Browser Benchmark Automation (mozbench) → Browser/Game Benchmark Automation (mozbench)
Depends on: 1051798
Depends on: 1056094
Depends on: 1066056
Depends on: 1066657
Depends on: 1066665
Depends on: 1067403
Depends on: 970432
Depends on: 1079438
Depends on: 1101551
Depends on: 1103035
Depends on: 1103060
Depends on: 1103062
Depends on: 1103063
Depends on: 1103064
Depends on: 1103134
Depends on: 1103995
Depends on: 1110270
Depends on: 1113227
Depends on: 1113611
Depends on: 1128908
Priority: -- → P3
I'm in maintenance mode for mozbench, so I'm unassigning myself in case others are interested in picking up the remaining open bugs.
Assignee: dminor → nobody
Status: ASSIGNED → NEW
We're replacing mozbench with arewefastyet.
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WONTFIX