in compare view, add the ability to quickly determine the noise difference between both revisions

RESOLVED FIXED

Status

RESOLVED FIXED
a year ago
a year ago

People

(Reporter: jmaher, Assigned: jmaher)

Tracking

Details

Attachments

(1 attachment)

(Assignee)

Description

a year ago
In trying to solve the problem of how stable are our numbers, we often discuss the term 'noise' which ends up being the stddev of a test.

As a way to simplify the math here across a larger dataset, we should provide a simple way to aggregate the stddev into a number, called the 'noise factor'.  This noise factor would then be used to easily compare a new config, hardware, patch, etc.

After chatting with :ekyle, we believe that a somewhat useful, yet very simplistic view would be a sum of variances:
noise = 0
foreach signature:
  noise += stddev%*stddev*
noise = sqrt(noise)

This gives us a simple number to track.  In comparing two builds that have identical source (i.e. should be identical numbers):
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalProject=try&originalRevision=b76509fb7649&newProject=try&newRevision=144bb1e5b3a6d5c9eb1a91ad4df15cea288db75e&framework=1

we end up with:
n=113
new=33.900
base=34.014

Now comparing data from old hardware to new hardware:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=b76509fb7649&newProject=try&newRevision=144bb1e5b3a6d5c9eb1a91ad4df15cea288db75e&framework=1

n=113
base=33.900
new=69.171


Above you can see that we have <1% noise, but with the new hardware vs old hardware it is 104% noiser- that is >2x the noise.


for the fun of it, I wanted to compare two runs on the new hardware:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=b76509fb7649&newProject=try&newRevision=b501f311a510&framework=1


Total Data Points: 119
Base Noise: 69.17098878404477
New Noise: 68.39117808926056

so in general these retain the same level of noise.  Do we track this as a derivative (aggregation really) of our pushes- when we have a complete run we summarize the noise levels and add them to a signature series- then we can track them over time?

I would like to know a few things:
1) does the above make sense
2) would this be understandable to a random hacker who is presented with this data
3) will this be able to measure most changes in noise
4) where should this go?  in the compare view?  tracked over time?  both?  other?
(Assignee)

Comment 1

a year ago
:ekyle- I appreciate your help so far on IRC helping me get something halfway realistic for a solution- could you at least look over what I have listed above and outline any concerns you might have?
Flags: needinfo?(klahnakoski)
(Assignee)

Comment 2

a year ago
as a note, here are the changes to treeherder to get some basic data:
https://github.com/jmaher/treeherder/commit/1b0b603a49cd273998518d4245149f98db7cc518

I believe this could be cleaner, but lets ensure the logic is good first.

:rwood, if you could comment on how this general approach sounds and consider some of the questions above.
Flags: needinfo?(rwood)
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #0)
> 1) does the above make sense

Yes, I think it's a great idea especially when moving to new hardware like we are now for some platforms.

> 2) would this be understandable to a random hacker who is presented with
> this data

I believe so; it should be added to our talos wiki page (and maybe a link to that incorporated in perfherder alongside the displayed values).

> 3) will this be able to measure most changes in noise

I'm leaving the math to the pros (you and ekyle!) but I would think so. Your comparisons look consistent especially with your two runs on the same hardware.

> 4) where should this go?  in the compare view?  tracked over time?  both? 
> other?

I like your idea of having it in compare view. We could have it so that if say the noise > 2% then it could be highlighted in red. I left a couple other comments on the github link from comment 2.

Ideally I think this should be tracked over time and alerted on, maybe a special kind of alert that does show up on our perfherder alerts page, because if the noise level were to change significantly, reducing talos' ability to detect regressions, then it should be made known and investigated.
Flags: needinfo?(rwood)
(Assignee)

Comment 4

a year ago
I suspected these would be the answers to my questions- I am thinking we should track this regularly and generate alerts- if we like how it is displayed in my update to the PR, we can discuss how to do this in a more programatic fashion.
(Assignee)

Comment 5

a year ago
Assignee: nobody → jmaher
Status: NEW → ASSIGNED
Attachment #8928197 - Flags: review?(wlachance)
Attachment #8928197 - Flags: review?(rwood)
Comment on attachment 8928197 [details] [review]
implement a noise metric inside of perfherder compare

Review of attachment 8928197 [details] [review]:
-----------------------------------------------------------------

Yep LGTM, just a couple nits

::: ui/js/controllers/perf/compare.js
@@ +167,5 @@
> +                    // TODO: ideally anything >10.0 is bad, but should we ignore anything?
> +                    if (cmap.originalStddevPct < 50.0 && cmap.newStddevPct < 50.0) {
> +                        $scope.oldStddevVariance[platform].values.push(Math.round(cmap.originalStddevPct * 100) / 100);
> +                        $scope.newStddevVariance[platform].values.push(Math.round(cmap.newStddevPct * 100) / 100);
> +                    } else if (cmap.originalStddevPct === undefined || cmap.newStddevPct === undefined) {

Should this condition be checked for first, instead of the top condition "if (cmap...." ?

@@ +554,5 @@
> +                // TODO: ideally anything >10.0 is bad, but should we ignore anything?
> +                if (cmap.originalStddevPct < 50.0 && cmap.newStddevPct < 50.0) {
> +                    $scope.oldStddevVariance.values.push(Math.round(cmap.originalStddevPct * 100) / 100);
> +                    $scope.newStddevVariance.values.push(Math.round(cmap.newStddevPct * 100) / 100);
> +                } else if (cmap.originalStddevPct === undefined || cmap.newStddevPct === undefined) {

Same comment as above, maybe this condition should be checked for first?

::: ui/js/services/perf/compare.js
@@ +181,4 @@
>                                       abs_t_value >= T_VALUE_CARE_MIN));
>                  cmap.needsMoreRuns = (cmap.isComplete && !cmap.isConfident &&
>                                        cmap.originalRuns.length < 6);
> +                cmap.isNoiseMeric = false;

typo, should be "isNoiseMetric"

::: ui/partials/perf/comparetable.html
@@ +32,5 @@
> +        <label>
> +            <input type="checkbox"
> +                   ng-model="$ctrl.filterOptions.showOnlyNoise"
> +                   ng-change="$ctrl.updateFilteredTestList()" />
> +            Show Only Noise

Maybe this should be "Only Show Tests with Noise Metric"? Or is that too long. To me it's a bit misleading as it's not just showing the Noise Metric, it's only showing tests that were included in the noise metric.
Attachment #8928197 - Attachment is patch: true
Attachment #8928197 - Attachment mime type: text/x-github-pull-request → text/plain
Attachment #8928197 - Flags: review?(rwood) → review+
Comment on attachment 8928197 [details] [review]
implement a noise metric inside of perfherder compare

I'd recommend against using the Bugzilla interface for reviewing the PR contents - if anything but the r+/r- feature is used Bugzilla messes up the attachment metadata, which then breaks the PR link.
Attachment #8928197 - Attachment is patch: false
Attachment #8928197 - Attachment mime type: text/plain → text/x-github-pull-request
(In reply to Ed Morley [:emorley] from comment #7)
> Comment on attachment 8928197 [details] [review]
> implement a noise metric inside of perfherder compare
> 
> I'd recommend against using the Bugzilla interface for reviewing the PR
> contents - if anything but the r+/r- feature is used Bugzilla messes up the
> attachment metadata, which then breaks the PR link.

Ah, apologies, thanks for pointing that out!
Comment on attachment 8928197 [details] [review]
implement a noise metric inside of perfherder compare

See github pr for more comments
Attachment #8928197 - Flags: review?(wlachance) → review+
Commit pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/776d3329e804b531ac39fdc396c073f992706f06
Bug 1416347 - in compare view, add the ability to quickly determine the noise difference between both revisions (#2946)
(Assignee)

Updated

a year ago
Status: ASSIGNED → RESOLVED
Last Resolved: a year ago
Flags: needinfo?(klahnakoski)
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.