Created attachment 734183 [details]
test script

One cause of false positives in analyze_talos.py seems to be the specific window sizes that are used. The current defaults are:

back_window = 30
fore_window = 5
threshold = 9

Having back_window much larger than fore_window creates a lot of false positives. Conceptually this makes sense: the window can slide several points past the actual culprit and still appear to have a good fit for a regression, because a few "bad" points in the back_window section are easily overwhelmed by the large number of "good" points.

In the example datasets I've played with, a window of (30, 5) almost always produces multiple false positives, while a window of (10, 10) or (12, 12) produces a single true positive and no false positives.

For reference, here's a script (attached) that I'm using to experiment with different window sizes. Save it alongside analyze.py from http://hg.mozilla.org/graphs/ and use it as follows (note that the URL must be quoted, or the shell will treat the "&" characters as command separators):

wget -O runs.json 'http://graphs.mozilla.org/api/test/runs?id=232&branchid=63&platformid=14'
test.py 30 5 runs.json
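To make the mechanism concrete, here is a minimal sketch of a sliding-window detector of the kind discussed above: at each candidate index it compares the mean of the preceding back_window points against the following fore_window points with a Welch-style t statistic, and flags the index when the statistic exceeds the threshold. This is an illustrative simplification, not the actual analyze.py code; the function name and the exact statistic are assumptions.

```python
import statistics

def detect_regressions(values, back_window=12, fore_window=12, threshold=9):
    """Flag index i as a suspected regression point when the fore_window
    points starting at i differ from the preceding back_window points by
    more than `threshold` in a Welch-style t statistic.

    Illustrative sketch only -- not the real analyze_talos.py logic.
    """
    suspects = []
    for i in range(back_window, len(values) - fore_window + 1):
        back = values[i - back_window:i]
        fore = values[i:i + fore_window]
        mean_b = statistics.fmean(back)
        mean_f = statistics.fmean(fore)
        var_b = statistics.pvariance(back)
        var_f = statistics.pvariance(fore)
        # Welch-style denominator; guard against zero variance in flat data
        denom = (var_b / len(back) + var_f / len(fore)) ** 0.5 or 1e-9
        t = abs(mean_f - mean_b) / denom
        if t > threshold:
            suspects.append(i)
    return suspects
```

With a clean step (e.g. 30 points at 100 followed by 30 points at 120), this flags the indices around the step and nothing else. It also shows why a tiny fore_window is fragile: with only 5 points in `fore`, a couple of noisy samples can push `t` over the threshold well away from any real change.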
Created attachment 734199 [details] [diff] [review]
patch

I downloaded a handful of example datasets from recent real-life regressions in various different Talos benchmarks, and analyzed them with a range of settings. With the patches from bug 858735 and bug 858756 applied, I get perfect results on my example datasets (zero false positives or false negatives) using any window sizes between (10, 10) and (17, 17) inclusive. I think a good default setting is about (12, 12), since a smaller fore_window will let us detect regressions slightly earlier.

I'm working on testing this with more sample data, and I'll let you know if I find any more interesting results. I'd also like to add some automated regression tests that use real-world data.
Created attachment 734203 [details] [diff] [review]
real-world test data

Here are some regression tests based on some of the datasets I've been using. All four of these datasets had incorrect changesets blamed without these patches, and now have the correct blame as far as I can tell.
Created attachment 734697 [details] [diff] [review]
real-world test data (v2)

Update the test config and add some named constants for clarity.
The reason for the larger back_window and smaller fore_window was to be able to have a reasonable representation of the "correct" mean/variance while at the same time being able to detect regressions quickly. Increasing the fore_window in particular will make us slower to detect regressions. Is this ok?
(In reply to Chris AtLee [:catlee] from comment #4)
> The reason for the larger back_window and smaller fore_window was to be able
> to have a reasonable representation of the "correct" mean/variance while at
> the same time being able to detect regressions quickly.

Unfortunately, a fore_window of 5 just seems too short to avoid false alarms from random noise, regardless of the back_window size. In my test datasets, any fore_window setting below 10 produced frequent incorrect results. So I think this trade-off is unfortunate but necessary.

> Increasing the fore_window in particular will make us slower to detect
> regressions. Is this ok?

Yes, I think the improved accuracy is well worth the added delay for consumers of these alerts. Also, we are now pushing changes more than twice as fast as when this code was written, so the actual latency might not be any worse than it was then. :)
Comment on attachment 734199 [details] [diff] [review]
patch

Review of attachment 734199 [details] [diff] [review]:
-----------------------------------------------------------------

alright, let's give it a shot!
Comment on attachment 734199 [details] [diff] [review]
patch

http://hg.mozilla.org/graphs/rev/c9dc630d0749

This changes only the example analysis.cfg.template. To deploy this, someone will need to edit the production analysis.cfg file (which is not part of the Mercurial repo).
Comment on attachment 734697 [details] [diff] [review]
real-world test data (v2)

http://hg.mozilla.org/graphs/rev/e328fe4e58c4
Created attachment 738713 [details]
test script (v2)

This is an updated version of the test script from comment 0.
Commits pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/640cefe17a69af8dcf7970b73d0de7f3d143209b
Bug 858877 - Tune fore_window and back_window config in the analysis.cfg template [r=catlee]

https://github.com/mozilla/treeherder/commit/ae7d940e69b9deccfa4e1b6f003932d096174eac
Bug 858877 (2/2) - Add regression tests using real-world Talos data [r=catlee]