Closed Bug 859155 Opened 12 years ago Closed 11 years ago

Experimenting around with "mochiperf" tests

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: jimm, Assigned: jimm)

References

Details

Attachments

(4 files, 6 obsolete files)

mochiperf tests wip 12 years ago Jim Mathies [:jimm] 42.73 KB, patch		Details \| Diff \| Splinter Review
mochiperf tests wip 12 years ago Jim Mathies [:jimm] 43.99 KB, patch		Details \| Diff \| Splinter Review
latest tests wip 12 years ago Jim Mathies [:jimm] 48.21 KB, patch		Details \| Diff \| Splinter Review
latest tests wip 12 years ago Jim Mathies [:jimm] 48.26 KB, patch		Details \| Diff \| Splinter Review
latest tests wip 12 years ago Jim Mathies [:jimm] 137.14 KB, patch		Details \| Diff \| Splinter Review
tests v.1 11 years ago Jim Mathies [:jimm] 26.72 KB, patch		Details \| Diff \| Splinter Review
test html content 11 years ago Jim Mathies [:jimm] 110.49 KB, patch		Details \| Diff \| Splinter Review
video and images 11 years ago Jim Mathies [:jimm] 2.23 MB, patch		Details \| Diff \| Splinter Review
helper functions .v1 11 years ago Jim Mathies [:jimm] 6.39 KB, patch	mbrubeck : review+	Details \| Diff \| Splinter Review
test set v.1 11 years ago Jim Mathies [:jimm] 19.87 KB, patch	mbrubeck : review+ sfoster : feedback+	Details \| Diff \| Splinter Review

Jim Mathies [:jimm]

Assignee

Description

•

12 years ago

Attached patch mochiperf tests wip (obsolete) — Details — Splinter Review

This is an experiment revolving around using mochitest to micro-benchmark the metro browser. The wip here is the mochitest side with a few test, plus the beginnings of a PerfTest object to facilitate measuring performance and reporting test data out to mochitest logs. The other side of this that I haven't put together yet is the graph server, which will likely be a simple python script that will monitor checkins on inbound and parse out mochitest-metro-chrome logs for test results. We don't have metro mochitest automated yet so I'll lok at this part after we do. The output from the perf tests currently looks like this: TEST-INFO | chrome://mochitests/content/metro/browser/metro/base/tests/perf/browser_tabs_01.js | PERF-TEST | DECLARE {"id":"FBD7A532-D63A-44B5-9744-5CB07CFD131A","name":"Tab Open","category":"Jim's Tests","subcategory":"UX","description":"Open twenty tabs in succession, closing each before the next is opened. Gives the browser time to settle in between. Lets the ui react however it is designed to. Strips outliers."} TEST-INFO | chrome://mochitests/content/metro/browser/metro/base/tests/perf/browser_tabs_01.js | PERF-TEST | RESULTS {"r1":201.31578947368422} The idea here is that all the data about the test displayed on the graph server is contained in the logs. Tests are identified by a unique id. Everything else is updated every time a log is parsed, so it's easy to change test names, categories, etc.. The web interface will render using a current snapshot of static json data dumped by the log parser. For maintenance, I figure the graph server will keep the historical data around (and append new data points to it) as long as the test is reporting. If any particular test stops reporting for a period of time (~1 month say) all the data is deleted and the test is dropped.

Jim Mathies [:jimm]

Assignee

Comment 1

•

12 years ago

Attached patch mochiperf tests wip (obsolete) — Details — Splinter Review

Attachment #734411 - Attachment is obsolete: true

Joel Maher ( :jmaher ) (UTC -8)

Comment 2

•

12 years ago

some changes to our existing harness which I think will make these tests more successful: * remove as many addons as possible (including specialpowers) * fresh start of the browser for each test I looked at doing this with mochitest plain+chrome, but allowing browser-chrome style tests would give us the best flexibility for writing test cases. The danger in hacking the harness to remove unecessary profile bits and to start the browser fresh for each test is that we end up with a confusing harness. I don't mind doing that up front to get tests going. Another option we have is to upload data from the test/testharness directly to datazilla instead of scraping. Adding into datazilla doesn't require a lot of work. To me that makes the most sense, but I would stand behind other decisions if they make as much or more sense.

Jim Mathies [:jimm]

Assignee

Comment 3

•

12 years ago

(In reply to Joel Maher (:jmaher) from comment #2) > some changes to our existing harness which I think will make these tests > more successful: > * remove as many addons as possible (including specialpowers) > * fresh start of the browser for each test > > I looked at doing this with mochitest plain+chrome, but allowing > browser-chrome style tests would give us the best flexibility for writing > test cases. The danger in hacking the harness to remove unecessary profile > bits and to start the browser fresh for each test is that we end up with a > confusing harness. I don't mind doing that up front to get tests going. Sure, lots we can do here. Maybe a mochitest variant would run these types of tests. Being able to control the profile that gets loaded might be really interesting too. Not worth worrying about while experimenting though. > Another option we have is to upload data from the test/testharness directly > to datazilla instead of scraping. Adding into datazilla doesn't require a > lot of work. To me that makes the most sense, but I would stand behind > other decisions if they make as much or more sense. I'm not at all familiar with datazilla. How does it work? Is it open ended enough such that checking in new tests / removing tests / serving up graphs could be 'automatic'? To make something like 'mochipref' really useful I think you want to avoid the *heavy* releng overhead we have in talos. Again though, this is still just an experiment, so I'm not looking to hook up a production system today. :) What I am interested in finding is a way to run a background log parser / web server someplace. I was wondering if people.moz supported that. Any ideas on where I might be able to hook something like that up?

Joel Maher ( :jmaher ) (UTC -8)

Comment 4

•

12 years ago

Datazilla allows us to post arbitrary data without predefined test names, machine names, etc. We would need to create a project. The good thing is this data can easily be uploaded to from the test machines and accessible via an api. We can then write any front end to manage the data (i.e. your own stuff, chart.io, excel, custom datazilla pages, etc...). My goal is to make this as open an hackable to developers instead of the tightly closed system we have for Talos. So any decisions we would make here, that is how I will approach it. I don't know about running via people.m.o, but there might be a way to do it. If it was all javascript I could see that working.

Jim Mathies [:jimm]

Assignee

Comment 5

•

12 years ago

(In reply to Joel Maher (:jmaher) from comment #4) > Datazilla allows us to post arbitrary data without predefined test names, > machine names, etc. We would need to create a project. The good thing is > this data can easily be uploaded to from the test machines and accessible > via an api. We can then write any front end to manage the data (i.e. your > own stuff, chart.io, excel, custom datazilla pages, etc...). My goal is to > make this as open an hackable to developers instead of the tightly closed > system we have for Talos. So any decisions we would make here, that is how > I will approach it. Hmm, this would solve the problem of the server middle man. Is this open and available now such that I could experiment around with it, or is it still under development? If it's ready for experimenting, how should I go about creating a project and sending some test data to it?

Jim Mathies [:jimm]

Assignee

Comment 6

•

12 years ago

I found this, curious if it's up to date - http://datazilla.readthedocs.org/en/latest/

Jim Mathies [:jimm]

Assignee

Comment 7

•

12 years ago

Attached patch latest tests wip (obsolete) — Details — Splinter Review

Attachment #734650 - Attachment is obsolete: true

Jim Mathies [:jimm]

Assignee

Comment 8

•

12 years ago

Attached patch latest tests wip (obsolete) — Details — Splinter Review

I was planning on emails you about this to figure out next steps with datazilla. FWIW, what I have thus far is: 1) this set of tests that generate about 5 different metrics in the metro browser 2) a basic log parser that gets the data and writes it out into a nice json format suitable for reading into a web page. 3) a graphs page that turns the json into flot graphs What I'm unsure of going forward is how datazilla best fits into this picture. I think maybe it initially fits in well as a replacement for the json data file, with the python log parser writing to datazilla and the web page reading from datazilla. But maybe taking it one step further and writing data directly to datazilla from tests is possible without a lot of work. I can post the python log parser and graphs page if you're curious.

Attachment #735950 - Attachment is obsolete: true

Jim Mathies [:jimm]

Assignee

Comment 9

•

12 years ago

Attached patch latest tests wip (obsolete) — Details — Splinter Review

Attachment #736900 - Attachment is obsolete: true

Jim Mathies [:jimm]

Assignee

Updated

•

12 years ago

Blocks: metro-omtc

Jim Mathies [:jimm]

Assignee

Updated

•

12 years ago

Blocks: 866852

Jim Mathies [:jimm]

Assignee

Updated

•

12 years ago

No longer blocks: metro-omtc

Jim Mathies [:jimm]

Assignee

Comment 11

•

11 years ago

Attached patch tests v.1 (obsolete) — Details — Splinter Review

Assignee: nobody → jmathies

Attachment #740820 - Attachment is obsolete: true

Jim Mathies [:jimm]

Assignee

Comment 12

•

11 years ago

Attached patch test html content — Details — Splinter Review

Jim Mathies [:jimm]

Assignee

Comment 13

•

11 years ago

Attached patch video and images — Details — Splinter Review

Jim Mathies [:jimm]

Assignee

Comment 14

•

11 years ago

Attached patch helper functions .v1 — Details — Splinter Review

Now that we have mochitests going, I'd like to land these tests to collect some data for a few weeks to see how reliable the data sets are.

Attachment #760853 - Attachment is obsolete: true

Attachment #760856 - Flags: review?(mbrubeck)

Jim Mathies [:jimm]

Assignee

Comment 15

•

11 years ago

Attached patch test set v.1 — Details — Splinter Review

This is a set of 12 starter perf tests that measure the following: - browser deck shift times - new tab open performance - basic canvas rendering - composition perf (for omtc) - scroll perf / jank - basic video playback - message manager perf ping me on irc if you have any questions.

Attachment #760857 - Flags: review?(mbrubeck)

Gregory Szorc [:gps]

Comment 16

•

11 years ago

This is relevant to bug 867742 (for unifying test harnesses) and for Chris Manchester's summer project, which is to add machine readable output from Automation.

Gregory Szorc [:gps]

Comment 17

•

11 years ago

Comment on attachment 760857 [details] [diff] [review] test set v.1 Review of attachment 760857 [details] [diff] [review]: ----------------------------------------------------------------- ::: browser/metro/base/tests/mochiperf/Makefile.in @@ +33,5 @@ > +libs:: $(PERF_TESTS) > + $(INSTALL) $(foreach f,$^,"$f") $(DEPTH)/_tests/testing/mochitest/metro/$(relativesrcdir) > + > +libs:: $(BROWSER_PERF_RESOURCES) > + $(INSTALL) $(foreach f,$^,"$f") $(DEPTH)/_tests/testing/mochitest/metro/$(relativesrcdir)/res Use INSTALL_TARGETS (don't write your own one-off rules). If this is a common pattern for Metro, we should add a moz.build variable for defining Metro mochitests.

Jim Mathies [:jimm]

Assignee

Comment 18

•

11 years ago

(In reply to Gregory Szorc [:gps] from comment #17) > > Use INSTALL_TARGETS (don't write your own one-off rules). Always freaking forget that exists. Sorry, will them up.

Matt Brubeck (:mbrubeck)

Comment 19

•

11 years ago

Comment on attachment 760856 [details] [diff] [review] helper functions .v1 Review of attachment 760856 [details] [diff] [review]: ----------------------------------------------------------------- ::: browser/metro/base/tests/mochiperf/perfhelpers.js @@ +27,5 @@ > + * Declare a test which the graph server will pick up and track. > + * Must be called by every test on startup. Graph server will > + * search for result data between this declaration and the next. > + * > + * @param UID string for this particular test, case sensitive. naming nit: should this be more specific and say "UUID"? @@ +135,5 @@ > + > + computeMedian: function computeMedian(aArray, aOptions) { > + aArray.sort(function (a, b) { > + return a - b; > + }); Isn't this equivalent to "aArray.sort();"? @@ +228,5 @@ > + * Returns the total time ellapsed in milliseconds. If the stopwatch > + * has not been stopped, stops it. Returns zero if no time has been > + * accumulated. > + */ > + time: function time() { "If the stopwatch has not been stopped, stops it" -- this appears to be false.

Attachment #760856 - Flags: review?(mbrubeck) → review+

Matt Brubeck (:mbrubeck)

Comment 20

•

11 years ago

Comment on attachment 760857 [details] [diff] [review] test set v.1 Review of attachment 760857 [details] [diff] [review]: ----------------------------------------------------------------- These look good to me. I'd love to have a second opinion from Sam on these patches just to get a few more eyes on the code. I agree with gps that we should add a special build variable for mochitest-metro-chrome files, similar to the ones for mochitest-browser-chrome et al. ::: browser/metro/base/tests/mochiperf/browser_miscgfx_01.js @@ +139,5 @@ > + "heavy graphics demo using canvas."); > + let stopwatch = new StopWatch(true); > + let event = yield waitForEvent(win, "test", 20000); > + let msec = stopwatch.stop(); > + PerfTest.declareNumericalResult((event.detail.frames / msec) * 1000.0, "fps"); There are a bunch of "fps" benchmarks in this patch; should they use PerfTest.declareFrameRateResult?

Attachment #760857 - Flags: review?(sfoster)

Attachment #760857 - Flags: review?(mbrubeck)

Attachment #760857 - Flags: review+

Jim Mathies [:jimm]

Assignee

Comment 21

•

11 years ago

ping - sfoster for review. bbondy would like to turn omtc on by default so before he does that I'd like to land these to collect some prelim data.

Sam Foster [:sfoster] (he/him)

Comment 22

•

11 years ago

Comment on attachment 760857 [details] [diff] [review] test set v.1 I dont feel like I (yet) have all the context I need to give a useful review here, so please accept some drive-by comments. I understand that this gets us an easy way to measure and track some metro-specific performance metrics over time with (vs. talos which know little about tbh). To that end, minus the grumbling below, this seems direct and pragmatic and we should start hooking something up asap to see how it plays out. Provided we evaluate each perf test to assure ourselves that it will provide meaningful insights (not just data for its own sake), I tend to think some data is better than none and its easier to iterate on a thing that's working. That said, I find the nesting of test syntaxes odd. We have test() and gTest.push({... }) doing one thing, with the ok/is/etc. assertions we know and love, then the PerfTest.declareTest apparently declaring a test inside a test. Its just a naming oddness, but maybe one we can tuck away in our helpers so we can at least interact with a more intuitive API, and start establishing cow-paths. Are there perf assertions we want to make, or is gathering this data just a side-effect for later perusal? There are some basic, default thresholds we could plug in that would fail a test if it took too long. E.g. feedback on a button click (toggling :active) should take < 100ms. Similarly with immediate consequence of any user action - like focusing a field, dragging to highlight text and so on. More than .1 second (roughly) tends to break the illusion of direct manipulation; we should indicate something, even if we can't always complete the outcome of the action in so short a time. Those thresholds would expect to change over time as our and our users' expectations are raised and as we go from being acceptable to slick/snappy/shit-of-shovel fast. For perf data, we may want to log std deviation, to know if the average is meaningful at all? Out of the corner of my eye, I'm watching bug 867742, which is a proposal for a new testsuite module. Its not relevant here while we are experimenting, just something to keep in mind as our own automating testing needs take shape.

Attachment #760857 - Flags: review?(sfoster) → feedback+

Jim Mathies [:jimm]

Assignee

Comment 23

•

11 years ago

(In reply to Sam Foster [:sfoster] from comment #22) > Comment on attachment 760857 [details] [diff] [review] > test set v.1 > > I dont feel like I (yet) have all the context I need to give a useful review > here, so please accept some drive-by comments. I understand that this gets > us an easy way to measure and track some metro-specific performance metrics > over time with (vs. talos which know little about tbh). To that end, minus > the grumbling below, this seems direct and pragmatic and we should start > hooking something up asap to see how it plays out. Provided we evaluate each > perf test to assure ourselves that it will provide meaningful insights (not > just data for its own sake), I tend to think some data is better than none > and its easier to iterate on a thing that's working. > > That said, I find the nesting of test syntaxes odd. We have test() and > gTest.push({... }) doing one thing, with the ok/is/etc. assertions we know > and love, then the PerfTest.declareTest apparently declaring a test inside a > test. Its just a naming oddness, but maybe one we can tuck away in our > helpers so we can at least interact with a more intuitive API, and start > establishing cow-paths. Yes it does seem somewhat hackish. I can ingrain these tests into our testing logic in head.js more. We have gTests that have methods on them that do specific things, we could have a gBenchmarks that have a similar execution model. Test info/output would be available via methods on the test object and spit out by the logic that controls test runs. Results could be returned in a tear down type method. head.js could also stand to be broken up into a shared library of test helpers and two shims that execute two different types of tests. Am I headed in the right direction here? I'd still like to get these landed as is and work on something like this in a follow up when I have time over the next week or so. > Are there perf assertions we want to make, or is gathering this data just a > side-effect for later perusal? There are some basic, default thresholds we > could plug in that would fail a test if it took too long. E.g. feedback on a > button click (toggling :active) should take < 100ms. Similarly with > immediate consequence of any user action - like focusing a field, dragging > to highlight text and so on. More than .1 second (roughly) tends to break > the illusion of direct manipulation; we should indicate something, even if > we can't always complete the outcome of the action in so short a time. Yep, if we find something like that useful. Although we probably wouldn't tie test failures to something as specific as a button state change, I think we'd end up with more random orange than we could handle. But I can see us setting a min/max on the composition test in here, so that we could pick up serious regressions in gfx. > For perf data, we may want to log std deviation, to know if the average is > meaningful at all? Don't see why not. This could be spit out with the other data and graphed along side other data points. > Out of the corner of my eye, I'm watching bug 867742, which is a proposal > for a new testsuite module. Its not relevant here while we are > experimenting, just something to keep in mind as our own automating testing > needs take shape. If and when there's a push to better standardize mozilla in-tree testing we'll be a part of it.

Sam Foster [:sfoster] (he/him)

Comment 24

•

11 years ago

(summarizing #windev discussion) Yes something like a gBenchmarks sounds like the right direction to me. The distinction is testing and making assertions about the results, vs. running benchmarks and gathering data over time. So they share some requirements from the harness but have very different goals otherwise and the code should reflect that. In either case, lets land this and follow up with issues/enhancements.

Jim Mathies [:jimm]

Assignee

Comment 25

•

11 years ago

remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/d3e29136b2c6 remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/cc479282354e remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/df8ec3acd29f remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/aa3f71ea0daa

Jim Mathies [:jimm]

Assignee

Updated

•

11 years ago

Blocks: 885503

Jim Mathies [:jimm]

Assignee

Updated

•

11 years ago

Depends on: 885536

Ed Morley [:emorley]

Comment 26

•

11 years ago

https://hg.mozilla.org/mozilla-central/rev/d3e29136b2c6 https://hg.mozilla.org/mozilla-central/rev/cc479282354e https://hg.mozilla.org/mozilla-central/rev/df8ec3acd29f https://hg.mozilla.org/mozilla-central/rev/aa3f71ea0daa

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → FIXED

Jim Mathies [:jimm]

Assignee

Updated

•

11 years ago

Blocks: metro-testing

Jim Mathies [:jimm]

Assignee

Comment 27

•

11 years ago

experimental graphs page: http://www.mathies.com/mozilla/mochiperf.html note this only loads in fx (nightly?) currently.

Nobody; OK to take it and work on it

Updated

•

10 years ago

OS: Windows 8 Metro → Windows 8.1

You need to log in before you can comment on or make changes to this bug.