Closed Bug 858877 Opened 12 years ago Closed 12 years ago

Tune the values of fore_window, back_window, and threshold in analysis.cfg

Categories: Release Engineering :: General, defect
Tracking: (Not tracked)
Status: RESOLVED FIXED
People: (Reporter: mbrubeck, Assigned: mbrubeck)
Whiteboard: [regression-detection]
Attachments (3 files, 2 obsolete files)
- patch, 916 bytes (catlee: review+; mbrubeck: checked-in+)
- patch, 762.92 KB (catlee: review+; mbrubeck: checked-in+)
- text/x-python, 828 bytes
One cause of false positives in analyze_talos.py seems to be the specific window sizes that are used. The current defaults are:
back_window = 30
fore_window = 5
threshold = 9
Having back_window much larger than fore_window creates a lot of false positives. Conceptually this makes sense, since the window can slide several points past the actual culprit and still appear to have a good fit for a regression. A few "bad" points in the back_window section are easily overwhelmed by the large number of "good" points.
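To make that failure mode concrete, here's a rough sketch of this style of sliding-window check (illustrative only; the function name and structure are mine, not the actual analyze.py code): for each candidate index, compare the back_window points before it with the fore_window points at and after it using a Welch-style t score, and flag the index when the score exceeds threshold. With (30, 5), a few regressed points at the tail of the back window barely move its mean, so indices well past the culprit can still score above the threshold.

import math

def detect_regressions(values, back_window=30, fore_window=5, threshold=9):
    # Sketch of the technique, not the analyze.py implementation.
    def mean_var(xs):
        m = sum(xs) / len(xs)
        v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
        return m, v

    hits = []
    for i in range(back_window, len(values) - fore_window + 1):
        back = values[i - back_window:i]
        fore = values[i:i + fore_window]
        back_mean, back_var = mean_var(back)
        fore_mean, fore_var = mean_var(fore)
        spread = math.sqrt(back_var / len(back) + fore_var / len(fore))
        if spread == 0:
            continue  # both windows are flat; nothing to flag
        score = abs(fore_mean - back_mean) / spread
        if score > threshold:
            hits.append((i, score))
    return hits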
In the example datasets I've played with, a window of (30, 5) almost always produces multiple false positives, while a window of (10, 10) or (12, 12) produces a single true positive and no false positives.
For reference, here's a script (attached) that I'm using to experiment with different window sizes. Save it alongside analyze.py from http://hg.mozilla.org/graphs/ and use it as follows (note the quotes around the URL, so the shell doesn't treat the & characters as job control):
wget -O runs.json 'http://graphs.mozilla.org/api/test/runs?id=232&branchid=63&platformid=14'
python test.py 30 5 runs.json
Comment 1 (Assignee) • 12 years ago
I downloaded a handful of example datasets from recent real-life regressions in various different Talos benchmarks, and analyzed them with a range of settings.
With the patches from bug 858735 and bug 858756 applied, I get perfect results on my example datasets (zero false positives or false negatives) using any window sizes between (10, 10) and (17, 17) inclusive. I think that a good default setting is about (12, 12) since a smaller fore_window will let us detect regressions slightly earlier.
I'm working on testing this with more sample data, and I'll let you know if I find any more interesting results. I'd also like to add some automated regression tests that use real-world data.
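For anyone who wants to reproduce the sweep, the loop is simple (a sketch; detect_regressions is the illustrative helper from the description above, and values/expected are inputs you supply, with expected being a hand-labeled set of known-regression indices):

# Sweep square windows from (10, 10) through (17, 17) over one dataset.
for w in range(10, 18):
    hits = {i for i, score in detect_regressions(values, back_window=w,
                                                 fore_window=w, threshold=9)}
    print("window (%d, %d): %d false positives, %d false negatives"
          % (w, w, len(hits - expected), len(expected - hits)))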
Attachment #734183 - Attachment is obsolete: true
Attachment #734199 - Flags: review?(catlee)
Comment 2 (Assignee) • 12 years ago
Here are some regression tests based on some of the datasets I've been using. All four of these datasets had incorrect changesets blamed without these patches, and now have the correct blame as far as I can tell.
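Each test is shaped roughly like the following (a sketch: the JSON layout, file path, changeset value, and detect_regressions helper are illustrative placeholders, not the exact harness in the patch):

import json
import unittest

class TestRealWorldRegressions(unittest.TestCase):
    def check_dataset(self, path, expected_changeset):
        # Load a recorded Talos series and assert that analysis blames
        # the changeset we verified by hand.
        with open(path) as f:
            runs = json.load(f)["runs"]
        values = [point["value"] for point in runs]
        hits = detect_regressions(values, back_window=12, fore_window=12,
                                  threshold=9)
        self.assertEqual(len(hits), 1, "expected exactly one regression")
        index, _score = hits[0]
        self.assertEqual(runs[index]["changeset"], expected_changeset)

    def test_known_regression(self):
        # Hypothetical path and changeset, for illustration only.
        self.check_dataset("testdata/example.json", "abcdef123456")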
Attachment #734203 - Flags: review?(catlee)
Comment 3 (Assignee) • 12 years ago
Update the test config and add some named constants for clarity.
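(For clarity, "named constants" means something along these lines; the exact names in the patch may differ.)

# Analysis parameters used by the tests, named so their intent is obvious.
BACK_WINDOW = 12  # points sampled before each candidate revision
FORE_WINDOW = 12  # points sampled at and after each candidate revision
THRESHOLD = 9     # minimum score required to report a regression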
Attachment #734203 - Attachment is obsolete: true
Attachment #734203 - Flags: review?(catlee)
Attachment #734697 - Flags: review?(catlee)
Comment 4 • 12 years ago
The reason for the larger back_window and smaller fore_window was to get a reasonable estimate of the "correct" mean/variance while still being able to detect regressions quickly. Increasing the fore_window in particular will make us slower to detect regressions. Is this ok?
Comment 5 (Assignee) • 12 years ago
(In reply to Chris AtLee [:catlee] from comment #4)
> The reason for the larger back_window and smaller fore_window was to be able
> to have a reasonable representation of the "correct" mean/variance while at
> the same time being able to detect regressions quickly.
Unfortunately, a fore_window of 5 just seems too short to avoid false alarms from random noise, regardless of the back_window size. In my test datasets, any fore_window setting below 10 produced frequent incorrect results. So I think this trade-off is unfortunate but necessary.
> Increasing the fore_window in particular will make us slower to detect
> regressions. Is this ok?
Yes, I think the improved accuracy is well worth the added delay for consumers of these alerts. Also, we are now pushing changes more than twice as fast as when this code was written, so the actual latency might not be any worse than it was then. :)
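To put rough numbers on the latency trade-off (illustrative arithmetic only; the 30-minute push interval is an assumption, not a measured figure): an alert can't fire until fore_window data points have landed after the regression.

# Back-of-the-envelope detection latency for the old and new fore_window.
push_interval_minutes = 30  # assumed average time between pushes
for fore_window in (5, 12):
    hours = fore_window * push_interval_minutes / 60
    print("fore_window=%d -> ~%.1f hours to alert" % (fore_window, hours))
# fore_window=5  -> ~2.5 hours
# fore_window=12 -> ~6.0 hours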
Comment 6 • 12 years ago
Comment on attachment 734199 [details] [diff] [review]
patch
Review of attachment 734199 [details] [diff] [review]:
-----------------------------------------------------------------
alright, let's give it a shot!
Attachment #734199 - Flags: review?(catlee) → review+
Comment 7 (Assignee) • 12 years ago
Comment on attachment 734199 [details] [diff] [review]
patch
http://hg.mozilla.org/graphs/rev/c9dc630d0749
This changes only the example analysis.cfg.template. To deploy this, someone will need to edit the production analysis.cfg file (which is not part of the Mercurial repo).
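Concretely, the values to set there are the following (shown in the same key = value form as the template, as a sketch; double-check the section headers in the production file — threshold is unchanged from the current default):

back_window = 12
fore_window = 12
threshold = 9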
Attachment #734199 - Flags: checked-in+
Comment 8 • 12 years ago
deployed
Updated • 12 years ago
Attachment #734697 - Flags: review?(catlee) → review+
Comment 9 (Assignee) • 12 years ago
Comment on attachment 734697 [details] [diff] [review]
real-world test data (v2)
http://hg.mozilla.org/graphs/rev/e328fe4e58c4
Attachment #734697 - Flags: checked-in+
Updated (Assignee) • 12 years ago
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Comment 10 (Assignee) • 12 years ago
This is an updated version of the test script from comment 0.
Updated • 12 years ago
Product: mozilla.org → Release Engineering
Comment 11 • 10 years ago
Commits pushed to master at https://github.com/mozilla/treeherder
https://github.com/mozilla/treeherder/commit/640cefe17a69af8dcf7970b73d0de7f3d143209b
Bug 858877 - Tune fore_window and back_window config in the analysis.cfg template [r=catlee]
https://github.com/mozilla/treeherder/commit/ae7d940e69b9deccfa4e1b6f003932d096174eac
Bug 858877 (2/2) - Add regression tests using real-world Talos data [r=catlee]