Closed Bug 1156907 Opened 9 years ago Closed 8 years ago

tracking bug to migrate counters from graph server into treeherder

Categories: Testing :: Talos
Type: defect
Priority: Not set
Severity: normal

Tracking:
Status: RESOLVED FIXED
firefox40 --- affected

People:
(Reporter: jmaher, Assigned: jmaher)

Attachments:
(2 files, 2 obsolete files)

We have counters, and they all get uploaded quite nicely to graph server, but we currently don't have a way to display them in treeherder.

Graph server has a huge list of data points for each counter; those get summarized into a single data point and put on a graph.

For treeherder, we have the huge list of data points in the talos json blob, but these are stored in aux_data, which is not ingested.

Here is a list of counters we collect:
tp5o:
linux*: main_rss, private bytes, xres
osx10: main_rss, responsiveness
win: main_rss, private bytes, responsiveness, %cpu
win7: main_rss, private bytes, responsiveness, %cpu, modlistbytes
win8: responsiveness

xperf (win7 only):
tp5n_main_normal_fileio_paint, tp5n_main_normal_netio_paint
tp5n_main_startup_fileio_paint, tp5n_main_startup_netio_paint
tp5n_nonmain_normal_fileio_paint, tp5n_nonmain_normal_netio_paint
tp5n_nonmain_startup_fileio_paint


The next step is to look at the raw data these produce and decide whether we care about the replicates, or whether we can summarize the data into a few points and upload those instead of all the replicates.
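By "summarize into a few points" I mean something like a mean/max reduction over the replicates; a rough sketch (the sample values are made up for illustration):

# Sketch: reduce a counter's replicates to a few summary values.
# The sample Main_RSS values (bytes) are made up for illustration.
def summarize_counter(replicates):
    return {"max": "%.2f" % max(replicates),
            "mean": "%.2f" % (sum(replicates) / float(len(replicates)))}

print summarize_counter([187039885.0, 190112256.0, 239759360.0])
# mean 205637167.00, max 239759360.00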

One quirk is that treeherder will magically add these as subtests to the summary, which we want to avoid; it will take some thought.

Once we understand the data we can start adjusting it and fixing talos, graphserver, and perfherder.  We should also ensure this data is useful and assess how stable, reliable, and meaningful it is.
:aklotz, we have a list of counters above for xperf.  These are tracked, but we rarely alert on them for a regression.  We find many issues with the mainthreadIO stuff- rad!  What are your thoughts on these?  Should we keep them, toss them, or modify them?
Flags: needinfo?(aklotz)
I *think* I know what you're asking: I vote for keeping them for now.

Obviously for catching regressions we prefer that we just automatically go orange.

OTOH I think that keeping that data can be useful for ensuring that the test itself is doing what we want it to do.
Flags: needinfo?(aklotz)
Hey Will.  I am looking to make this a reality, and I have updated the output from talos to be more informative.

1) tp5 (http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jmaher@mozilla.com-beb62e8bce0d/try-win32/try_win7-ix_test-tp5o-bm111-tests1-windows-build232.txt.gz):
TALOSDATA: [{"talos_counters": {"responsiveness": {"max": "2565.56", "mean": "82.14"}, "Modified Page List Bytes": {"max": "108756992.00", "mean": "69332015.87"}, "% Processor Time": {"max": "100.00", "mean": "75.84"}, "Private Bytes": {"max": "222863360.00", "mean": "192576694.36"}, "Main_RSS": {"max": "239759360.00", "mean": "187039885.24"}}, "test_machine": {"platform": "x86", "osversion": "6.1.7601", "os": "win", "name": "T-W732-IX-061"}, "testrun": {"date": 1431722432, "suite": "tp5o", "options": {"responsiveness": true, "tpmozafterpaint": true, "tpchrome": true, "tppagecycles": 25, "tpcycles": 1, "tprender": false, "shutdown": false, "cycles": 1, "rss": true}}, "results": {"tudou.com": [491.0, 253.0, 246.0, 242.0, 251.0, 245.0, 245.0, 244.0, 244.0, 251.0, 245.0, 245.0, 253.0, 245.0, 251.0, 246.0, ...}}]

2) xperf (http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jmaher@mozilla.com-beb62e8bce0d/try-win32/try_win7-ix_test-xperf-bm111-tests1-windows-build181.txt.gz):
37929860.0}, "main_normal_netio": {"mean": 7136403.0}, "main_startup_netio": {"mean": 13.0}, "nonmain_startup_fileio": {"mean": 763172.0}, "nonmain_normal_fileio": {"mean": 528476482.0}, "main_normal_fileio": {"mean": 40102144.0}}, "test_machine": {"platform": "x86", "osversion": "6.1.7601", "os": "win", "name": "T-W732-IX-089"}, "testrun": {"date": 1431967004, "suite": "tp5n", "options": {"responsiveness": false, "tpmozafterpaint": true, "tpchrome": true, "tppagecycles": 1, "tpcycles": 1, "tprender": false, "shutdown": false, "cycles": 1, "rss": true}}, "results": {}, "test_build": {"version": "40.0a1", "revision": "beb62e8bce0d", "id": "20150506182320", "branch": "Try-Non-PGO", "name": "Firefox"}}]

In addition, I have added a printout of the counters' raw values to the raw log in case there is a need to look at detailed information.
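If anyone wants to pull the blob back out of a raw log to poke at it, a rough sketch (the log file name here is hypothetical):

import json

# Sketch: extract the TALOSDATA blob from a raw talos log.
# The log file name below is hypothetical.
PREFIX = "TALOSDATA: "
with open("try_win7-ix_test-tp5o.txt") as f:
    for line in f:
        if PREFIX in line:
            blob = json.loads(line.split(PREFIX, 1)[1])
            for testrun in blob:
                print testrun["testrun"]["suite"], sorted(testrun.get("talos_counters", {}).keys())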

Please let me know what you think.
Assignee: nobody → jmaher
Status: NEW → ASSIGNED
Attachment #8607146 - Flags: feedback?(wlachance)
Comment on attachment 8607146 [details] [diff] [review]
initial stab at fixing talosdata json to have talos_counters (0.9)

Review of attachment 8607146 [details] [diff] [review]:
-----------------------------------------------------------------

As discussed on irc, I'd prefer an approach that just adds a custom class to write the TALOSDATA structure to the logs, since we're going to stop submitting to DZ so soon.
Attachment #8607146 - Flags: feedback?(wlachance) → feedback-
* add counters to talosdata json blob in the log file
* print out counter raw data in the log
* removal of talos_aux and xperf_results from talosdata
* removal of datazilla from the output

NOTE: we still have oddities with datazilla-isms, especially as we still accept a datazilla-url.  Once this is deployed on all branches, we can tackle removing the mozharness bits that add datazilla-url, and then we can clean up perfconfig, etc.
Attachment #8607146 - Attachment is obsolete: true
Attachment #8608856 - Flags: review?(wlachance)
Comment on attachment 8608856 [details] [diff] [review]
remove datazilla object and add a perfherder object (1.0)

Review of attachment 8608856 [details] [diff] [review]:
-----------------------------------------------------------------

This looks great! Would like to see one more iteration of this patch before landing, but it's quite close.

Do we still need to include the datazilla client libraries in talos? If not then I'd just as soon remove the dependencies now:

http://hg.mozilla.org/build/talos/file/3618f52d7d30/requirements.txt#l8
http://hg.mozilla.org/build/talos/file/3618f52d7d30/create_talos_zip.py#l37

Might be worth grepping for other instances of datazilla in the source, if it's not too much trouble. I don't want to hold up this bug on account of that, but if it only takes 5-10 minutes...

::: talos/output.py
@@ +209,5 @@
>  # (jhammel: we probably should do this in e.g. results.py vs in graphserver-specific code anyway)
>  
>                      # exclude counters whose values are tuples (bad for graphserver)
>                      if len(values) > 0 and isinstance(values[0], list):
> +                        print "---------------------------------"

Are these separators really necessary? I don't care that much either way, but if they're not hugely helpful I'd prefer to leave them out.

@@ +386,5 @@
>              linkName = '%s: %s' % (testname, result)
>              print 'RETURN: %s' % link_format % (url, linkName)
>  
>  
> +class PerfHerderOutput(Output):

PerfHerder -> Perfherder :)

@@ +393,3 @@
>  
>      def output(self, results, results_url, tbpl_output):
>          """output to the results_url

This comment now seems to be a lie. It looks like we output the result as a log message, and optionally also to a file.

@@ +400,5 @@
>          # parse the results url
>          results_url_split = utils.urlsplit(results_url)
>          results_scheme, results_server, results_path, _, _ = results_url_split
>  
> +        utils.info("TALOSDATA: %s" % json.dumps(results))

We should probably put a comment in here that this is what treeherder/perfherder actually parses.

@@ +408,1 @@
>              f.close()

Not that important, but while we're here, let's turn this into a one-liner:
json.dump(results, file(results_path, 'w'), indent=2, sort_keys=True)

@@ +409,4 @@
>  
>      def post(self, results, server, path, scheme, tbpl_output):
> +        """conform to current code- not needed for perfherder"""
> +        pass

Is there any way we can remove the call to this and take it out?

@@ +461,5 @@
> +        # build information
> +        browser_config = self.results.browser_config
> +
> +        # a place to put results
> +        retVal = []

can you name this something more descriptive, like test_results? Then you wouldn't need the comment. :)

@@ +464,5 @@
> +        # a place to put results
> +        retVal = []
> +
> +        for test in self.results.results:
> +            testVal = {'test_machine': {},

likewise, let's call this test_result

::: talos/ttest.py
@@ +638,5 @@
>  
>              # include global (cross-cycle) counters
>              test_results.all_counter_results.extend([{key: value} for key, value in global_counters.items()])
> +            print "Outputting all counters"
> +            print "-----------------------------------"

Again, I'd prefer to leave out the separators (and the "Outputting all ..." message) above if they're not genuinely helpful.
Attachment #8608856 - Flags: review?(wlachance) → review-
feedback addressed!

There are still a few instances of 'datazilla' in the code, specifically in perfconfigurator and things that use datazilla_urls.  I am going for majority wins here, knowing full well that there is other work involved in removing these params.
Attachment #8608856 - Attachment is obsolete: true
Attachment #8608929 - Flags: review?(wlachance)
Comment on attachment 8608929 [details] [diff] [review]
remove datazilla object and add a perfherder object (1.1)

Review of attachment 8608929 [details] [diff] [review]:
-----------------------------------------------------------------

Ok, looks good! Haven't tested it of course. r+ assuming it still works as expected and with my last nit addressed.

::: talos/output.py
@@ +455,5 @@
> +
> +        # build information
> +        browser_config = self.results.browser_config
> +
> +        testResults = []

Nit: you should use test_results, test_result to be consistent with the rest of the surrounding code here
Attachment #8608929 - Flags: review?(wlachance) → review+
Blocks: 1167287
now for the perfherder side of the equation!
Attachment #8609395 - Flags: review?(wlachance)
See Also: → 1167646
Comment on attachment 8609395 [details] [review]
https://github.com/mozilla/treeherder/pull/558

Still needs a bit of work, but on the right track
Attachment #8609395 - Flags: review?(wlachance) → review-
Comment on attachment 8609395 [details] [review]
https://github.com/mozilla/treeherder/pull/558

updated the PR
Attachment #8609395 - Flags: review- → review?(wlachance)
Comment on attachment 8609395 [details] [review]
https://github.com/mozilla/treeherder/pull/558

Looks good except for a few remaining nits plus some minor problems (e.g. unrelated non-trivial changes which should be applied in another patch). r+ with those addressed.
Attachment #8609395 - Flags: review?(wlachance) → review+
Commits pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/f6a5073e1a10ccf5b80cc31e6150893083240647
Bug 1156907 - migrate counters from graph server into treeherder

https://github.com/mozilla/treeherder/commit/e2fdb69da9fbb77b21ba9fe0dbd588b0a0ea7b47
Merge pull request #558 from jmaher/perfcounters

Bug 1156907 - migrate counters from graph server into treeherder
Addressed issues and merged.  Two of the main pieces are in place.

I want to:
1) verify talosdata is what we expect from talos
2) see this show up as valid counters in production treeherder
3) validate (this Thursday) the counters and what we care about calculating
I have verified that what we are printing on trunk is indeed what we expect, so #1 is done.

#2 and #3 remain.

For #3 I have a series of graphs to display the raw data from individual runs (i.e. not graphs over time, but graphs for the raw counter replicates collected during a single run):
http://people.mozilla.org/~jmaher/counters/counters.html

These graphs take on odd shapes; I would like to start figuring out what is useful to measure so we can track it over time and file regressions on it.  What we have so far with the average value is working ok.  For each counter we should either:
* leave it alone
* disable it
* come up with a better calculation (see the sketch below)
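As a starting point for "a better calculation", a rough sketch comparing candidate summaries over one run's replicates (the values below are made up, loosely modeled on the responsiveness numbers above, where one outlier dominates):

# Sketch: compare candidate summaries over one run's counter replicates.
# Replicate values are made up; note how a single outlier moves the mean
# and max but barely moves the median.
def candidate_summaries(replicates):
    ordered = sorted(replicates)
    n = len(ordered)
    return {"mean": sum(ordered) / float(n),      # what we alert on today
            "max": ordered[-1],
            "median": ordered[n // 2],
            "p95": ordered[int((n - 1) * 0.95)]}  # crude percentile

print candidate_summaries([82.1, 75.3, 90.2, 2565.6, 79.9, 81.0])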

Vladan, in our perftesting meeting today you asked to be included on a bug and have more information about it.  Let's use this as a starting point.
Flags: needinfo?(vdjeric)
Depends on: 1170241
Ok, I took a look. The memory and process time counters look meaningful; responsiveness is the only one that looks suspicious to me. Am I misinterpreting it?
Flags: needinfo?(vdjeric)
Vladan, thanks for taking some time to look at this. Yes, responsiveness is the only test which is different; it is outlined in more detail in bug 1170241.

As it stands, we are calculating the mean and max values of the set of results going into treeherder.  This is in line with what we have been doing for graph server.  The alerting takes place on the mean value.
Depends on: 1174724
counters are in perfherder now.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED