Bug 858877 Opened 11 years ago Closed 11 years ago

Tune the values of fore_window, back_window, and threshold in analysis.cfg

Categories

(Release Engineering :: General, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mbrubeck, Assigned: mbrubeck)

References

Details

(Whiteboard: [regression-detection])

Attachments

(3 files, 2 obsolete)

Attached file test script (obsolete) —
One cause of false positives in analyze_talos.py seems to be the specific window sizes that are used.  The current defaults are:

back_window = 30
fore_window = 5
threshold = 9

Having back_window much larger than fore_window creates a lot of false positives.  Conceptually this makes sense, since the window can slide several points past the actual culprit and still appear to have a good fit for a regression.  A few "bad" points in the back_window section are easily overwhelmed by the large number of "good" points.

In the example datasets I've played with, a window of (30, 5) almost always produces multiple false positives, while a window of (10, 10) or (12, 12) produces a single true positive and no false positives.
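
To make the window mechanics concrete, here is a minimal sketch of this kind of two-window detector, assuming a Welch-style t statistic over the two windows; the statistic and alert bookkeeping in the real analyze.py differ in detail:

import math

def detect_regressions(values, back_window=12, fore_window=12, threshold=9):
    # Slide a (back, fore) window pair over the series; flag index i when
    # the fore-window mean over [i, i + fore) differs from the back-window
    # mean over [i - back, i) by more than `threshold` standard errors.
    alerts = []
    for i in range(back_window, len(values) - fore_window + 1):
        back = values[i - back_window:i]
        fore = values[i:i + fore_window]
        back_mean = sum(back) / float(len(back))
        fore_mean = sum(fore) / float(len(fore))
        back_var = sum((v - back_mean) ** 2 for v in back) / (len(back) - 1)
        fore_var = sum((v - fore_mean) ** 2 for v in fore) / (len(fore) - 1)
        sem = math.sqrt(back_var / len(back) + fore_var / len(fore))
        t = abs(fore_mean - back_mean) / sem if sem else float("inf")
        if t > threshold:
            alerts.append((i, t))  # first fore-window index and its score
    return alerts

With a long back window and a short fore window, a few post-regression points can sit in the back window as the pair slides past the true change point and still leave t above the threshold, which is exactly the false-positive mode described above.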

For reference, here's a script (attached) that I'm using to experiment with different window sizes.  Save it alongside analyze.py from http://hg.mozilla.org/graphs/ and use it as follows:

wget -O runs.json 'http://graphs.mozilla.org/api/test/runs?id=232&branchid=63&platformid=14'

python test.py 30 5 runs.json
Attached patch patch — Splinter Review
I downloaded a handful of example datasets from recent real-life regressions in several different Talos benchmarks, and analyzed them with a range of settings.

With the patches from bug 858735 and bug 858756 applied, I get perfect results on my example datasets (zero false positives or false negatives) using any window sizes between (10, 10) and (17, 17) inclusive.  I think that a good default setting is about (12, 12) since a smaller fore_window will let us detect regressions slightly earlier.
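
For illustration, the tuned defaults in analysis.cfg would then read as below (threshold is shown at its old value, since only the window sizes are discussed here; see the attached patch for the actual template):

back_window = 12
fore_window = 12
threshold = 9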

I'm working on testing this with more sample data, and I'll let you know if I find any more interesting results.  I'd also like to add some automated regression tests that use real-world data.
Attachment #734183 - Attachment is obsolete: true
Attachment #734199 - Flags: review?(catlee)
Attached patch real-world test data (obsolete) — Splinter Review
Here are some regression tests based on some of the datasets I've been using.  All four of these datasets had incorrect changesets blamed without these patches, and now have the correct blame as far as I can tell.
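
The attached tests are not inlined in this comment, but in spirit each one looks something like the sketch below; the dataset file name, the flattened-list format, and the detect_regressions helper (from the sketch in comment 0) are all assumptions for illustration:

import json
import unittest

# hypothetical import: detect_regressions is the two-window sketch above
from detect_sketch import detect_regressions

class TestRealWorldData(unittest.TestCase):
    def test_single_true_positive(self):
        # values.json stands in for one of the four attached datasets,
        # pre-flattened to a plain list of per-run values
        with open("values.json") as f:
            values = json.load(f)
        alerts = detect_regressions(values, back_window=12, fore_window=12)
        # exactly one regression in this dataset, and no false positives
        self.assertEqual(len(alerts), 1)

if __name__ == "__main__":
    unittest.main()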
Attachment #734203 - Flags: review?(catlee)
Update the test config and add some named constants for clarity.
Attachment #734203 - Attachment is obsolete: true
Attachment #734203 - Flags: review?(catlee)
Attachment #734697 - Flags: review?(catlee)
The reason for the larger back_window and smaller fore_window was to be able to have a reasonable representation of the "correct" mean/variance while at the same time being able to detect regressions quickly. Increasing the fore_window in particular will make us slower to detect regressions. Is this ok?
(In reply to Chris AtLee [:catlee] from comment #4)
> The reason for the larger back_window and smaller fore_window was to be able
> to have a reasonable representation of the "correct" mean/variance while at
> the same time being able to detect regressions quickly.

Unfortunately, a fore_window of 5 just seems too short to avoid false alarms from random noise, regardless of the back_window size.  In my test datasets, any fore_window setting below 10 produced frequent incorrect results.  So I think this trade-off is unfortunate but necessary.

> Increasing the fore_window in particular will make us slower to detect
> regressions. Is this ok?

Yes, I think the improved accuracy is well worth the added delay for consumers of these alerts.  (An alert can only fire once fore_window data points exist after the regression, so the delay is roughly fore_window pushes.)  Also, we are now pushing changes more than twice as fast as when this code was written, so the actual latency might not be any worse than it was then.  :)
Comment on attachment 734199 [details] [diff] [review]
patch

Review of attachment 734199 [details] [diff] [review]:
-----------------------------------------------------------------

alright, let's give it a shot!
Attachment #734199 - Flags: review?(catlee) → review+
Comment on attachment 734199 [details] [diff] [review]
patch

http://hg.mozilla.org/graphs/rev/c9dc630d0749

This changes only the example analysis.cfg.template.  To deploy this, someone will need to edit the production analysis.cfg file (which is not part of the Mercurial repo).
Attachment #734199 - Flags: checked-in+
deployed
Attachment #734697 - Flags: review?(catlee) → review+
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Attached file test script (v2)
This is an updated version of the test script from comment 0.
Product: mozilla.org → Release Engineering
Commits pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/640cefe17a69af8dcf7970b73d0de7f3d143209b
Bug 858877 - Tune fore_window and back_window config in the analysis.cfg template [r=catlee]

https://github.com/mozilla/treeherder/commit/ae7d940e69b9deccfa4e1b6f003932d096174eac
Bug 858877 (2/2) - Add regression tests using real-world Talos data [r=catlee]