If you think a bug might affect users in the 57 release, please set the correct tracking and status flags for Release Management.

Tune the values of fore_window, back_window, and threshold in analysis.cfg

RESOLVED FIXED

Status

Release Engineering
General
RESOLVED FIXED
5 years ago
2 years ago

People

(Reporter: mbrubeck, Assigned: mbrubeck)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [regression-detection])

Attachments

(3 attachments, 2 obsolete attachments)

(Assignee)

Description

5 years ago
Created attachment 734183 [details]
test script

One cause of false positives in analyze_talos.py seems to be the specific window sizes that are used.  The current defaults are:

back_window = 30
fore_window = 5
threshold = 9

Having back_window much larger than fore_window creates a lot of false positives.  Conceptually this makes sense, since the window can slide several points past the actual culprit and still appear to have a good fit for a regression.  A few "bad" points in the back_window section are easily overwhelmed by the large number of "good" points.

In the example datasets I've played with, a window of (30, 5) almost always produces multiple false positives, while a window of (10, 10) or (12, 12) produces a single true positive and no false positives.

For reference, here's a script (attached) that I'm using to experiment with different window sizes.  Save it alongside analyze.py from http://hg.mozilla.org/graphs/ and use it as follows:

wget -O runs.json http://graphs.mozilla.org/api/test/runs?id=232&branchid=63&platformid=14

test.py 30 5 runs.json
(Assignee)

Comment 1

5 years ago
Created attachment 734199 [details] [diff] [review]
patch

I downloaded a handful of example datasets from recent real-life regressions in various different Talos benchmarks, and analyzed them with a range of settings.

With the patches from bug 858735 and bug 858756 applied, I get perfect results on my example datasets (zero false positives or false negatives) using any window sizes between (10, 10) and (17, 17) inclusive.  I think that a good default setting is about (12, 12) since a smaller fore_window will let us detect regressions slightly earlier.

I'm working on testing this with more sample data, and I'll let you know if I find any more interesting results.  I'd also like to add some automated regression tests that use real-world data.
Attachment #734183 - Attachment is obsolete: true
Attachment #734199 - Flags: review?(catlee)
(Assignee)

Comment 2

5 years ago
Created attachment 734203 [details] [diff] [review]
real-world test data

Here are some regression tests based on some of the datasets I've been using.  All four of these datasets had incorrect changesets blamed without these patches, and now have the correct blame as far as I can tell.
Attachment #734203 - Flags: review?(catlee)
(Assignee)

Comment 3

5 years ago
Created attachment 734697 [details] [diff] [review]
real-world test data (v2)

Update the test config and add some named constants for clarity.
Attachment #734203 - Attachment is obsolete: true
Attachment #734203 - Flags: review?(catlee)
Attachment #734697 - Flags: review?(catlee)
The reason for the larger back_window and smaller fore_window was to be able to have a reasonable representation of the "correct" mean/variance while at the same time being able to detect regressions quickly. Increasing the fore_window in particular will make us slower to detect regressions. Is this ok?
(Assignee)

Comment 5

5 years ago
(In reply to Chris AtLee [:catlee] from comment #4)
> The reason for the larger back_window and smaller fore_window was to be able
> to have a reasonable representation of the "correct" mean/variance while at
> the same time being able to detect regressions quickly.

Unfortunately, a fore_window of 5 just seems too short to avoid false alarms from random noise, regardless of the back_window size.  In my test datasets, any fore_window setting below 10 produced frequent incorrect results.  So I think this trade-off is unfortunate but necessary.

> Increasing the fore_window in particular will make us slower to detect
> regressions. Is this ok?

Yes, I think the improved accuracy is well worth the added delay for consumers of these alerts.  Also, we are now pushing changes more than twice as fast as when this code was written, so the actual latency might not be any worse than it was then.  :)
Comment on attachment 734199 [details] [diff] [review]
patch

Review of attachment 734199 [details] [diff] [review]:
-----------------------------------------------------------------

alright, let's give it a shot!
Attachment #734199 - Flags: review?(catlee) → review+
(Assignee)

Comment 7

5 years ago
Comment on attachment 734199 [details] [diff] [review]
patch

http://hg.mozilla.org/graphs/rev/c9dc630d0749

This changes only the example analysis.cfg.template.  To deploy this, someone will need to edit the production analysis.cfg file (which is not part of the Mercurial repo).
Attachment #734199 - Flags: checked-in+
deployed

Updated

5 years ago
Attachment #734697 - Flags: review?(catlee) → review+
(Assignee)

Comment 9

5 years ago
Comment on attachment 734697 [details] [diff] [review]
real-world test data (v2)

http://hg.mozilla.org/graphs/rev/e328fe4e58c4
Attachment #734697 - Flags: checked-in+
(Assignee)

Updated

5 years ago
Status: ASSIGNED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
(Assignee)

Comment 10

5 years ago
Created attachment 738713 [details]
test script (v2)

This is an updated version of the test script from comment 0.
Product: mozilla.org → Release Engineering

Comment 11

2 years ago
Commits pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/640cefe17a69af8dcf7970b73d0de7f3d143209b
Bug 858877 - Tune fore_window and back_window config in the analysis.cfg template [r=catlee]

https://github.com/mozilla/treeherder/commit/ae7d940e69b9deccfa4e1b6f003932d096174eac
Bug 858877 (2/2) - Add regression tests using real-world Talos data [r=catlee]
You need to log in before you can comment on or make changes to this bug.