Closed Bug 858735 Opened 11 years ago Closed 11 years ago

analyze.py blames the wrong regression in Talos tests with very low variance

Categories

(Release Engineering :: General, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mbrubeck, Assigned: mbrubeck)

References

Details

(Whiteboard: [regression-detection])

Attachments

(4 files)

Attached patch 1/4: basic testsSplinter Review
There are some specific bugs in analyze.py that make it return incorrect results when dealing with non-noisy tests, like "Number of Constructors" (see bug 721387 comment 0).  I wrote some patches to improve this.

This first patch just adds some very basic tests so I won't change anything unintentionally.
Attachment #734018 - Flags: review?(catlee)
Just removing some unused code and simplifying the analyze() function.  This patch is completely optional and should not change any results.  It's mostly just to help me understand the code.
Attachment #734020 - Flags: review?(catlee)
analyze_t should detect a potential regression at timestamp 6, but it doesn't, because it ignores the first 5 datapoints entirely and so can't perform t-tests for points 6-10.

We ought to process from the very beginning of the results.  We still can't perform t-tests for the first j points, but we can add them to the good_data array so that the next j points can be tested correctly.
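For illustration, here's a rough sketch of the loop shape I have in mind, using plain float values instead of datum objects.  The names good_data, calc_t, j, and threshold follow this comment, but the structure and signatures of the real analyze.py differ, and the simplified calc_t below still returns 0 on zero variance (that's the subject of patch 4/4):

    from statistics import mean, variance

    def calc_t(old, new):
        # Simplified, unweighted Welch-style t statistic between two windows.
        # It still returns 0 when both windows have zero variance; that is
        # the behaviour patch 4/4 changes.
        s = variance(old) / len(old) + variance(new) / len(new)
        return abs(mean(new) - mean(old)) / s ** 0.5 if s else 0.0

    def analyze_t(values, j=5, threshold=9.0):
        good_data = []            # baseline values, now including the first j
        results = []
        for i, v in enumerate(values):
            if len(good_data) < j or i + j > len(values):
                # Too early (or too late) to run a t-test, but still record
                # the point so the next j points have a baseline to compare
                # against.
                state = "good"
            else:
                t = calc_t(good_data[-j:], values[i:i + j])
                state = "regression" if t > threshold else "good"
            results.append((i, state))
            good_data.append(v)
        return results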
Attachment #734022 - Flags: review?(catlee)
Here's the main bug:  The actual regression in my test data happens at timestamp 8, but analyze_t returns "good" for that timestamp and blames the surrounding datapoints instead!

This is because calc_t bails out to avoid a divide-by-zero when both series have zero variance.  Instead of returning 0 when bailing out, it should return infinity: if the means differ while both variances are zero, that represents the maximum possible confidence of a regression.
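As a rough sketch of what I mean, assuming an unweighted Welch-style calc_t (the real calc_t is weighted and shaped differently, so the formula and the equal-means special case here are assumptions, not the actual patch):

    from statistics import mean, variance

    def calc_t(old, new):
        # Hypothetical Welch-style t statistic between two small windows.
        s = variance(old) / len(old) + variance(new) / len(new)
        if s == 0:
            # Both windows are perfectly flat.  Returning 0 (the old
            # behaviour) makes a real step look maximally "good"; returning
            # infinity expresses maximum confidence that the means differ.
            # Whether the real patch also special-cases equal means is an
            # assumption.
            return float('inf') if mean(old) != mean(new) else 0.0
        return abs(mean(new) - mean(old)) / s ** 0.5

    # With zero-variance data this now flags the step instead of looking
    # maximally "good":
    assert calc_t([100.0] * 5, [100.0] * 5) == 0.0
    assert calc_t([100.0] * 5, [120.0] * 5) == float('inf')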
Attachment #734025 - Flags: review?(catlee)
Attachment #734018 - Flags: review?(catlee) → review+
Attachment #734020 - Flags: review?(catlee) → review+
Attachment #734022 - Flags: review?(catlee) → review+
Comment on attachment 734025 [details] [diff] [review]
4/4: fix t-test when both series have variance == 0

Review of attachment 734025 [details] [diff] [review]:
-----------------------------------------------------------------

this looks fine, but how does the rest of the code handle inf? for example, does it make sense to append di to good_data at the end of analyze_t in the case of a regression?
(In reply to Chris AtLee [:catlee] from comment #4)
> this looks fine, but how does the rest of the code handle inf? for example,
> does it make sense to append di to good_data at the end of analyze_t in the
> case of a regression?

These "t" values are never used in except in comparisons.  (The current code compares the t score to a threshold, and bug 858756 adds code that compares t scores to other t scores.)  So "inf" should be as safe as any value.

One quirk, which I think is harmless: If two infinite values end up next to each other (I don't think this can even happen in practice), the code from bug 858756 will find that neither of them is greater than the other (because "float('inf') > float('inf')" is false), so it will flag both of them as regressions rather than choosing one as the "highest" regression.
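A plain-Python illustration of both points (nothing here is from analyze.py itself):

    t = float('inf')
    assert t > 9.0                  # an infinite t score exceeds any finite threshold
    assert not (t > float('inf'))   # but one inf is never strictly greater than another,
                                    # so two adjacent infinite scores would both be flagged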

It still makes sense to append di to good_data (as the current code does), since di.value should still form part of the new "baseline" average.
Attachment #734025 - Flags: review?(catlee) → review+
Thanks for the reviews!  Since these patches are all in easily-tested code, I'm confident enough to just push them.  In some other bugs where I make changes to analyze_talos.py, I'd like some advice on testing/staging/deployment.

https://hg.mozilla.org/graphs/rev/d4af2bc04ce7
https://hg.mozilla.org/graphs/rev/f8ce18a0db88
https://hg.mozilla.org/graphs/rev/baad446523cf
https://hg.mozilla.org/graphs/rev/6caa2530736d
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
This has been deployed now.
Product: mozilla.org → Release Engineering