Closed
Bug 1416347
Opened 7 years ago
Closed 7 years ago
in compare view, add the ability to quickly determine the noise difference between both revisions
Categories
(Tree Management :: Perfherder, enhancement)
Tree Management
Perfherder
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: jmaher, Assigned: jmaher)
Details
Attachments
(1 file)
In trying to solve the problem of how stable are our numbers, we often discuss the term 'noise' which ends up being the stddev of a test. As a way to simplify the math here across a larger dataset, we should provide a simple way to aggregate the stddev into a number, called the 'noise factor'. This noise factor would then be used to easily compare a new config, hardware, patch, etc. After chatting with :ekyle, we believe that a somewhat useful, yet very simplistic view would be a sum of variances: noise = 0 foreach signature: noise += stddev%*stddev* noise = sqrt(noise) This gives us a simple number to track. In comparing two builds that have identical source (i.e. should be identical numbers): https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalProject=try&originalRevision=b76509fb7649&newProject=try&newRevision=144bb1e5b3a6d5c9eb1a91ad4df15cea288db75e&framework=1 we end up with: n=113 new=33.900 base=34.014 Now comparing data from old hardware to new hardware: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=b76509fb7649&newProject=try&newRevision=144bb1e5b3a6d5c9eb1a91ad4df15cea288db75e&framework=1 n=113 base=33.900 new=69.171 Above you can see that we have <1% noise, but with the new hardware vs old hardware it is 104% noiser- that is >2x the noise. for the fun of it, I wanted to compare two runs on the new hardware: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=b76509fb7649&newProject=try&newRevision=b501f311a510&framework=1 Total Data Points: 119 Base Noise: 69.17098878404477 New Noise: 68.39117808926056 so in general these retain the same level of noise. Do we track this as a derivative (aggregation really) of our pushes- when we have a complete run we summarize the noise levels and add them to a signature series- then we can track them over time? I would like to know a few things: 1) does the above make sense 2) would this be understandable to a random hacker who is presented with this data 3) will this be able to measure most changes in noise 4) where should this go? in the compare view? tracked over time? both? other?
Assignee | ||
Comment 1•7 years ago
|
||
:ekyle- I appreciate your help so far on IRC helping me get something halfway realistic for a solution- could you at least look over what I have listed above and outline any concerns you might have?
Flags: needinfo?(klahnakoski)
Assignee | ||
Comment 2•7 years ago
|
||
as a note, here are the changes to treeherder to get some basic data: https://github.com/jmaher/treeherder/commit/1b0b603a49cd273998518d4245149f98db7cc518 I believe this could be cleaner, but lets ensure the logic is good first. :rwood, if you could comment on how this general approach sounds and consider some of the questions above.
Flags: needinfo?(rwood)
Comment 3•7 years ago
|
||
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #0) > 1) does the above make sense Yes, I think it's a great idea especially when moving to new hardware like we are now for some platforms. > 2) would this be understandable to a random hacker who is presented with > this data I believe so; it should be added to our talos wiki page (and maybe a link to that incorporated in perfherder alongside the displayed values). > 3) will this be able to measure most changes in noise I'm leaving the math to the pros (you and ekyle!) but I would think so. Your comparisons look consistent especially with your two runs on the same hardware. > 4) where should this go? in the compare view? tracked over time? both? > other? I like your idea of having it in compare view. We could have it so that if say the noise > 2% then it could be highlighted in red. I left a couple other comments on the github link from comment 2. Ideally I think this should be tracked over time and alerted on, maybe a special kind of alert that does show up on our perfherder alerts page, because if the noise level were to change significantly, reducing talos' ability to detect regressions, then it should be made known and investigated.
Flags: needinfo?(rwood)
Assignee | ||
Comment 4•7 years ago
|
||
I suspected these would be the answers to my questions- I am thinking we should track this regularly and generate alerts- if we like how it is displayed in my update to the PR, we can discuss how to do this in a more programatic fashion.
Assignee | ||
Comment 5•7 years ago
|
||
Assignee: nobody → jmaher
Status: NEW → ASSIGNED
Attachment #8928197 -
Flags: review?(wlachance)
Attachment #8928197 -
Flags: review?(rwood)
Comment 6•7 years ago
|
||
Comment on attachment 8928197 [details] [review] implement a noise metric inside of perfherder compare Review of attachment 8928197 [details] [review]: ----------------------------------------------------------------- Yep LGTM, just a couple nits ::: ui/js/controllers/perf/compare.js @@ +167,5 @@ > + // TODO: ideally anything >10.0 is bad, but should we ignore anything? > + if (cmap.originalStddevPct < 50.0 && cmap.newStddevPct < 50.0) { > + $scope.oldStddevVariance[platform].values.push(Math.round(cmap.originalStddevPct * 100) / 100); > + $scope.newStddevVariance[platform].values.push(Math.round(cmap.newStddevPct * 100) / 100); > + } else if (cmap.originalStddevPct === undefined || cmap.newStddevPct === undefined) { Should this condition be checked for first, instead of the top condition "if (cmap...." ? @@ +554,5 @@ > + // TODO: ideally anything >10.0 is bad, but should we ignore anything? > + if (cmap.originalStddevPct < 50.0 && cmap.newStddevPct < 50.0) { > + $scope.oldStddevVariance.values.push(Math.round(cmap.originalStddevPct * 100) / 100); > + $scope.newStddevVariance.values.push(Math.round(cmap.newStddevPct * 100) / 100); > + } else if (cmap.originalStddevPct === undefined || cmap.newStddevPct === undefined) { Same comment as above, maybe this condition should be checked for first? ::: ui/js/services/perf/compare.js @@ +181,4 @@ > abs_t_value >= T_VALUE_CARE_MIN)); > cmap.needsMoreRuns = (cmap.isComplete && !cmap.isConfident && > cmap.originalRuns.length < 6); > + cmap.isNoiseMeric = false; typo, should be "isNoiseMetric" ::: ui/partials/perf/comparetable.html @@ +32,5 @@ > + <label> > + <input type="checkbox" > + ng-model="$ctrl.filterOptions.showOnlyNoise" > + ng-change="$ctrl.updateFilteredTestList()" /> > + Show Only Noise Maybe this should be "Only Show Tests with Noise Metric"? Or is that too long. To me it's a bit misleading as it's not just showing the Noise Metric, it's only showing tests that were included in the noise metric.
Attachment #8928197 -
Attachment is patch: true
Attachment #8928197 -
Attachment mime type: text/x-github-pull-request → text/plain
Attachment #8928197 -
Flags: review?(rwood) → review+
Comment 7•7 years ago
|
||
Comment on attachment 8928197 [details] [review] implement a noise metric inside of perfherder compare I'd recommend against using the Bugzilla interface for reviewing the PR contents - if anything but the r+/r- feature is used Bugzilla messes up the attachment metadata, which then breaks the PR link.
Attachment #8928197 -
Attachment is patch: false
Attachment #8928197 -
Attachment mime type: text/plain → text/x-github-pull-request
Comment 8•7 years ago
|
||
(In reply to Ed Morley [:emorley] from comment #7) > Comment on attachment 8928197 [details] [review] > implement a noise metric inside of perfherder compare > > I'd recommend against using the Bugzilla interface for reviewing the PR > contents - if anything but the r+/r- feature is used Bugzilla messes up the > attachment metadata, which then breaks the PR link. Ah, apologies, thanks for pointing that out!
Comment 9•7 years ago
|
||
Comment on attachment 8928197 [details] [review] implement a noise metric inside of perfherder compare See github pr for more comments
Attachment #8928197 -
Flags: review?(wlachance) → review+
Comment 10•7 years ago
|
||
Commit pushed to master at https://github.com/mozilla/treeherder https://github.com/mozilla/treeherder/commit/776d3329e804b531ac39fdc396c073f992706f06 Bug 1416347 - in compare view, add the ability to quickly determine the noise difference between both revisions (#2946)
Assignee | ||
Updated•7 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Flags: needinfo?(klahnakoski)
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•