Bug 1156907 - tracking bug to migrate counters from graph server into treeherder

Status: RESOLVED FIXED (Closed)
Opened 9 years ago; closed 8 years ago
Product/Component: Testing :: Talos (defect)
Tracking: firefox40 affected
Reporter: jmaher; Assignee: jmaher
Attachments: 2 files, 2 obsolete files
We have counters, and they all upload quite nicely to graph server, but we currently have no way to display them in treeherder. Graph server has a huge list of data points for each counter; those get summarized into a single data point and put on a graph. For treeherder, we have the huge list of data points in the talos json blob, but these are stored in aux_data, which is not ingested.

Here is a list of counters we collect:

tp5o:
* linux*: main_rss, private bytes, xres
* osx10: main_rss, responsiveness
* win: main_rss, private bytes, responsiveness, %cpu
* win7: main_rss, private bytes, responsiveness, %cpu, modlistbytes
* win8: responsiveness

xperf (win7 only):
* tp5n_main_normal_fileio_paint, tp5n_main_normal_netio_paint
* tp5n_main_startup_fileio_paint, tp5n_main_startup_netio_paint
* tp5n_nonmain_normal_fileio_paint, tp5n_nonmain_normal_netio_paint
* tp5n_main_startup_fileio_paint

A next step is to look at the raw data produced by these and see if we care about the replicates or if we can just summarize the data in a few points and upload that instead of all the replicates. One quirk is that treeherder will magically add these as subtests to the summary, which we want to avoid; it will take some thought. Once we understand the data we can start adjusting it and fixing talos, graphserver, and perfherder. We should also ensure this data is useful and assess how stable, reliable, and meaningful it is.
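The summarization question above (keep every replicate, or collapse them into a few points?) can be sketched as follows. The mean/max fields mirror what later shows up in the talos_counters blob; the sample values and the median choice are illustrative assumptions, not talos code:

```python
def summarize_counter(replicates):
    """Reduce a list of raw counter samples to a few summary statistics."""
    if not replicates:
        return {}
    ordered = sorted(replicates)
    return {
        "mean": sum(replicates) / float(len(replicates)),
        "max": max(replicates),
        "median": ordered[len(ordered) // 2],
    }

# e.g. a short run of Main_RSS samples (hypothetical values, in bytes)
print(summarize_counter([100.0, 110.0, 130.0, 120.0]))
```

Uploading three numbers per counter per run, instead of hundreds of replicates, is the trade-off this bug is weighing.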
Comment 1 (Assignee) • 9 years ago
:aklotz, we have a list of counters above for xperf. These are tracked, but we rarely alert on them for a regression. We find many issues with the mainthreadIO stuff - rad! What are your thoughts on these? Should we keep them, toss them, or modify them?
Flags: needinfo?(aklotz)
Comment 2 • 9 years ago
I *think* I know what you're asking: I vote for keeping them for now. Obviously for catching regressions we prefer that we just automatically go orange. OTOH I think that keeping that data can be useful for ensuring that the test itself is doing what we want it to do.
Flags: needinfo?(aklotz)
Comment 3 (Assignee) • 9 years ago
Hey Will. I am looking to make this a reality, and I have updated the output from talos to be better informed.

1) tp5 (http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jmaher@mozilla.com-beb62e8bce0d/try-win32/try_win7-ix_test-tp5o-bm111-tests1-windows-build232.txt.gz):

TALOSDATA: [{"talos_counters": {"responsiveness": {"max": "2565.56", "mean": "82.14"}, "Modified Page List Bytes": {"max": "108756992.00", "mean": "69332015.87"}, "% Processor Time": {"max": "100.00", "mean": "75.84"}, "Private Bytes": {"max": "222863360.00", "mean": "192576694.36"}, "Main_RSS": {"max": "239759360.00", "mean": "187039885.24"}},
 "test_machine": {"platform": "x86", "osversion": "6.1.7601", "os": "win", "name": "T-W732-IX-061"},
 "testrun": {"date": 1431722432, "suite": "tp5o", "options": {"responsiveness": true, "tpmozafterpaint": true, "tpchrome": true, "tppagecycles": 25, "tpcycles": 1, "tprender": false, "shutdown": false, "cycles": 1, "rss": true}},
 "results": {"tudou.com": [491.0, 253.0, 246.0, 242.0, 251.0, 245.0, 245.0, 244.0, 244.0, 251.0, 245.0, 245.0, 253.0, 245.0, 251.0, 246.0, ...}}]

2) xperf (http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jmaher@mozilla.com-beb62e8bce0d/try-win32/try_win7-ix_test-xperf-bm111-tests1-windows-build181.txt.gz), excerpt (the leading portion is truncated in this paste):

...37929860.0}, "main_normal_netio": {"mean": 7136403.0}, "main_startup_netio": {"mean": 13.0}, "nonmain_startup_fileio": {"mean": 763172.0}, "nonmain_normal_fileio": {"mean": 528476482.0}, "main_normal_fileio": {"mean": 40102144.0}},
 "test_machine": {"platform": "x86", "osversion": "6.1.7601", "os": "win", "name": "T-W732-IX-089"},
 "testrun": {"date": 1431967004, "suite": "tp5n", "options": {"responsiveness": false, "tpmozafterpaint": true, "tpchrome": true, "tppagecycles": 1, "tpcycles": 1, "tprender": false, "shutdown": false, "cycles": 1, "rss": true}},
 "results": {},
 "test_build": {"version": "40.0a1", "revision": "beb62e8bce0d", "id": "20150506182320", "branch": "Try-Non-PGO", "name": "Firefox"}}]

In addition, I have added a printout of the counters' raw values into the raw log in case there is a need to look for detailed information. Please let me know what you think.
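Pulling the per-counter summaries back out of a "TALOSDATA: ..." log line like the excerpts above could look roughly like this. The log-line format comes from this bug; the parsing helper itself is a hypothetical sketch, not part of talos or treeherder:

```python
import json

def parse_talosdata(log_line):
    """Extract {suite: talos_counters} from a TALOSDATA log line, or None."""
    prefix = "TALOSDATA: "
    if not log_line.startswith(prefix):
        return None
    blob = json.loads(log_line[len(prefix):])
    # one entry per test run; each may carry a talos_counters dict
    return {entry["testrun"]["suite"]: entry.get("talos_counters", {})
            for entry in blob}

# abbreviated example line using values from the tp5 excerpt above
line = ('TALOSDATA: [{"talos_counters": {"Main_RSS": {"max": "239759360.00", '
        '"mean": "187039885.24"}}, "testrun": {"suite": "tp5o"}}]')
counters = parse_talosdata(line)
print(counters["tp5o"]["Main_RSS"]["mean"])  # note: values are strings in the blob
```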
Comment 4 • 9 years ago
Comment on attachment 8607146 [details] [diff] [review]
initial stab at fixing talosdata json to have talos_counters (0.9)

Review of attachment 8607146 [details] [diff] [review]:
-----------------------------------------------------------------

As discussed on IRC, I'd prefer an approach that just adds a custom class that writes the TALOSDATA structure to the logs, since we're going to stop submitting to DZ so soon.
Attachment #8607146 - Flags: feedback?(wlachance) → feedback-
Comment 5 (Assignee) • 9 years ago
* add counters to the talosdata json blob in the log file
* print out counter raw data in the log
* removal of talos_aux and xperf_results from talosdata
* removal of datazilla from the output

NOTE: we still have oddities with datazilla-isms, especially as we still accept a datazilla-url. Once this is deployed on all branches, we can tackle removing the mozharness bits that add datazilla-url, and then we can clean up perfconfig, etc.
Attachment #8607146 - Attachment is obsolete: true
Attachment #8608856 - Flags: review?(wlachance)
Comment 6 • 9 years ago
Comment on attachment 8608856 [details] [diff] [review]
remove datazilla object and add a perfherder object (1.0)

Review of attachment 8608856 [details] [diff] [review]:
-----------------------------------------------------------------

This looks great! I would like to see one more iteration of this patch before landing, but it's quite close.

Do we still need to include the datazilla client libraries in talos? If not, then I'd just as soon remove the dependencies now:
http://hg.mozilla.org/build/talos/file/3618f52d7d30/requirements.txt#l8
http://hg.mozilla.org/build/talos/file/3618f52d7d30/create_talos_zip.py#l37

Might be worth grepping for other instances of datazilla in the source, if it's not too much trouble. I don't want to hold up this bug on account of that, but if it only takes 5-10 minutes...

::: talos/output.py
@@ +209,5 @@
>   # (jhammel: we probably should do this in e.g. results.py vs in graphserver-specific code anyway)
>
>   # exclude counters whose values are tuples (bad for graphserver)
>   if len(values) > 0 and isinstance(values[0], list):
> +     print "---------------------------------"

Are these separators really necessary? I don't care that much either way, but if they're not hugely helpful I'd prefer to leave them out.

@@ +386,5 @@
>   linkName = '%s: %s' % (testname, result)
>   print 'RETURN: %s' % link_format % (url, linkName)
>
>
> +class PerfHerderOutput(Output):

PerfHerder -> Perfherder :)

@@ +393,3 @@
>
>   def output(self, results, results_url, tbpl_output):
>       """output to the results_url

This docstring now seems to be a lie. It looks like we output the result as a log message, and optionally also to a file.

@@ +400,5 @@
>   # parse the results url
>   results_url_split = utils.urlsplit(results_url)
>   results_scheme, results_server, results_path, _, _ = results_url_split
>
> +     utils.info("TALOSDATA: %s" % json.dumps(results))

We should probably put a comment in here that this is what treeherder/perfherder actually parses.

@@ +408,1 @@
>   f.close()

Not that important, but while we're here, let's turn this into a one-liner: json.dump(results, file(results_path, 'w'), indent=2, sort_keys=True)

@@ +409,4 @@
>
>   def post(self, results, server, path, scheme, tbpl_output):
> +     """conform to current code- not needed for perfherder"""
> +     pass

Is there any way we can remove the call to this and take it out?

@@ +461,5 @@
> +     # build information
> +     browser_config = self.results.browser_config
> +
> +     # a place to put results
> +     retVal = []

Can you name this something more descriptive, like test_results? Then you wouldn't need the comment. :)

@@ +464,5 @@
> +     # a place to put results
> +     retVal = []
> +
> +     for test in self.results.results:
> +         testVal = {'test_machine': {},

Likewise, let's call this test_result.

::: talos/ttest.py
@@ +638,5 @@
>
>   # include global (cross-cycle) counters
>   test_results.all_counter_results.extend([{key: value} for key, value in global_counters.items()])
> +     print "Outputting all counters"
> +     print "-----------------------------------"

Again, I'd prefer to leave out the separators (and the "Outputting all ..." message) above if they're not genuinely helpful.
Attachment #8608856 - Flags: review?(wlachance) → review-
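A minimal sketch of what the reviewed output path boils down to: log the TALOSDATA line that treeherder/perfherder parses, and optionally dump the same structure to a file. The function shape here is illustrative (the patch's real code is a Perfherder output class in talos/output.py), and the reviewer's suggested one-liner used Python 2's file() builtin, which is spelled open() in Python 3:

```python
import json

def output(results, results_path=None):
    # Emit the log line that treeherder/perfherder parses.
    line = "TALOSDATA: %s" % json.dumps(results)
    print(line)
    # Optionally also write the blob to a file, per the reviewer's one-liner.
    if results_path:
        with open(results_path, "w") as f:
            json.dump(results, f, indent=2, sort_keys=True)
    return line

emitted = output([{"testrun": {"suite": "tp5o"}, "talos_counters": {}}])
```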
Comment 7 (Assignee) • 9 years ago
Feedback addressed! There are still a few instances of 'datazilla' in the code, specifically in perfconfigurator and things that use datazilla_urls. I am going for majority wins here, knowing full well that there is other work involved in removing these params.
Attachment #8608856 - Attachment is obsolete: true
Attachment #8608929 - Flags: review?(wlachance)
Comment 8 • 9 years ago
Comment on attachment 8608929 [details] [diff] [review]
remove datazilla object and add a perfherder object (1.1)

Review of attachment 8608929 [details] [diff] [review]:
-----------------------------------------------------------------

Ok, looks good! Haven't tested it of course. r+ assuming it still works as expected and with my last nit addressed.

::: talos/output.py
@@ +455,5 @@
> +
> +     # build information
> +     browser_config = self.results.browser_config
> +
> +     testResults = []

Nit: you should use test_results / test_result to be consistent with the rest of the surrounding code here.
Attachment #8608929 - Flags: review?(wlachance) → review+
Comment 9 (Assignee) • 9 years ago
https://hg.mozilla.org/build/talos/rev/a6052c33d420
Comment 10 (Assignee) • 9 years ago
now for the perfherder side of the equation!
Attachment #8609395 - Flags: review?(wlachance)
Comment 11 • 9 years ago
Comment on attachment 8609395 [details] [review]
https://github.com/mozilla/treeherder/pull/558

Still needs a bit of work, but it is on the right track.
Attachment #8609395 - Flags: review?(wlachance) → review-
Comment 12 (Assignee) • 9 years ago
Comment on attachment 8609395 [details] [review]
https://github.com/mozilla/treeherder/pull/558

Updated the PR.
Attachment #8609395 - Flags: review- → review?(wlachance)
Comment 13 • 9 years ago
Comment on attachment 8609395 [details] [review]
https://github.com/mozilla/treeherder/pull/558

Looks good except for a few remaining nits plus some minor problems (e.g. unrelated non-trivial changes which should be applied in a separate patch). r+ with those addressed.
Attachment #8609395 - Flags: review?(wlachance) → review+
Comment 14 • 9 years ago
Commits pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/f6a5073e1a10ccf5b80cc31e6150893083240647
Bug 1156907 - migrate counters from graph server into treeherder

https://github.com/mozilla/treeherder/commit/e2fdb69da9fbb77b21ba9fe0dbd588b0a0ea7b47
Merge pull request #558 from jmaher/perfcounters
Bug 1156907 - migrate counters from graph server into treeherder
Comment 15 (Assignee) • 9 years ago
Addressed issues and merged. Two of the main pieces are in place. I want to:
1) verify talosdata is what we expect from talos
2) see this show up as valid counters in production treeherder
3) validate (this Thursday) the counters and what we care about calculating
Comment 16 (Assignee) • 9 years ago
I have verified that what we are printing on trunk is indeed what we expect, so #1 is done. #2 and #3 remain.

For #3 I have a series of graphs to display the raw data from individual runs (i.e. not graphs over time, but graphs of the raw counter replicates collected during a single run): http://people.mozilla.org/~jmaher/counters/counters.html

These graphs take on odd shapes; I would like to start figuring out what is useful to measure so we can track it over time and file regressions on it. What we have so far with the average value is working OK. For each counter we should either:
* leave it alone
* disable it
* come up with a better calculation

Vladan, in our perftesting meeting today you asked to be included on a bug and have more information about it. Let's use this as a starting point.
Flags: needinfo?(vdjeric)
Comment 17 • 9 years ago
Ok, I took a look. The memory and process-time counters look meaningful; responsiveness is the only one that looks suspicious to me. Am I misinterpreting it?
Flags: needinfo?(vdjeric)
Comment 18 (Assignee) • 9 years ago
Vladan, thanks for taking some time to look at this. Yes, responsiveness is the only test which is different; it is outlined in more detail in bug 1170241. As it stands we are calculating the mean and max values of the set of results going into treeherder. This is in line with what we have been doing for graph server. The alerting takes place on the mean value.
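Since alerting happens on the mean value, a naive regression check on that mean might look like the sketch below. The 5% threshold and the sample means are assumptions for illustration, not the real alerting policy:

```python
def is_regression(baseline_mean, new_mean, threshold=0.05):
    """Flag a regression when the new mean grows more than `threshold` (fractional)."""
    if baseline_mean <= 0:
        return False
    return (new_mean - baseline_mean) / baseline_mean > threshold

# Main_RSS means from two hypothetical runs: roughly 6.9% growth trips the check.
print(is_regression(187039885.24, 200000000.0))
```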
Comment 19 (Assignee) • 8 years ago
counters are in perfherder now.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED