Closed Bug 858735 Opened 12 years ago Closed 12 years ago

analyze.py blames the wrong regression in Talos tests with very low variance

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: mbrubeck, Assigned: mbrubeck)

References

Details

(Whiteboard: [regression-detection])

Attachments

(4 files)

1/4: basic tests 12 years ago Matt Brubeck (:mbrubeck) 1.75 KB, patch	catlee : review+	Details \| Diff \| Splinter Review
2/4: minor cleanup 12 years ago Matt Brubeck (:mbrubeck) 1.67 KB, patch	catlee : review+	Details \| Diff \| Splinter Review
3/4: don't ignore the first j datapoints 12 years ago Matt Brubeck (:mbrubeck) 1.96 KB, patch	catlee : review+	Details \| Diff \| Splinter Review
4/4: fix t-test when both series have variance == 0 12 years ago Matt Brubeck (:mbrubeck) 2.60 KB, patch	catlee : review+	Details \| Diff \| Splinter Review

Matt Brubeck (:mbrubeck)

Assignee

Description

•

12 years ago

Attached patch 1/4: basic tests — Details — Splinter Review

There are some specific bugs in analyze.py that make it return incorrect results when dealing with non-noisy tests, like "Number of Constructors" (see bug 721387 comment 0). I wrote some patches to improve this. This first patch just adds some very basic tests so I won't change anything unintentionally.

Attachment #734018 - Flags: review?(catlee)

Matt Brubeck (:mbrubeck)

Assignee

Comment 1

•

12 years ago

Attached patch 2/4: minor cleanup — Details — Splinter Review

Just removing some unused code, and simplifying the analyze() function. This patch is completely optional and should not change any results. It's mostly just to help me understand the code.

Attachment #734020 - Flags: review?(catlee)

Matt Brubeck (:mbrubeck)

Assignee

Comment 2

•

12 years ago

Attached patch 3/4: don't ignore the first j datapoints — Details — Splinter Review

analyze_t should detect a potential regression at timestamp 6, but it doesn't because it totally ignores the first 5 datapoints and so it can't perform t-tests for points 6-10. We ought to process from the very beginning of the results. We still can't be perform t-tests for the first j points, but it can add them to the good_data array so that the next j points can be tested correctly.

Attachment #734022 - Flags: review?(catlee)

Matt Brubeck (:mbrubeck)

Assignee

Comment 3

•

12 years ago

Attached patch 4/4: fix t-test when both series have variance == 0 — Details — Splinter Review

Here's the main bug: The actual regression in my test data happens at timestamp 8, but analyze_t return "good" for that timestamp and blames the surrounding datapoints instead! This is because calc_t bails out to avoid a divide-by-zero in this case. Instead of returning 0 when bailing out, it should return infinity, because this should give us the maximum possible confidence of a regression.

Attachment #734025 - Flags: review?(catlee)

Chris AtLee [:catlee]

Updated

•

12 years ago

Attachment #734018 - Flags: review?(catlee) → review+

Chris AtLee [:catlee]

Updated

•

12 years ago

Attachment #734020 - Flags: review?(catlee) → review+

Chris AtLee [:catlee]

Updated

•

12 years ago

Attachment #734022 - Flags: review?(catlee) → review+

Chris AtLee [:catlee]

Comment 4

•

12 years ago

Comment on attachment 734025 [details] [diff] [review] 4/4: fix t-test when both series have variance == 0 Review of attachment 734025 [details] [diff] [review]: ----------------------------------------------------------------- this looks fine, but how does the rest of the code handle inf? for example, does it make sense to append di to good_data at the end of analyze_t in the case of a regression?

Matt Brubeck (:mbrubeck)

Assignee

Comment 5

•

12 years ago

(In reply to Chris AtLee [:catlee] from comment #4) > this looks fine, but how does the rest of the code handle inf? for example, > does it make sense to append di to good_data at the end of analyze_t in the > case of a regression? These "t" values are never used in except in comparisons. (The current code compares the t score to a threshold, and bug 858756 adds code that compares t scores to other t scores.) So "inf" should be as safe as any value. One quirk, which I think is harmless: If two infinite values end up next to each other (I don't think this can even happen in practice), the code from bug 858756 will find that neither of them is greater than the other (because "float('inf') > float('inf')" is false), so it will flag both of them as regressions rather than choosing one as the "highest" regression. It still makes sense to append di to good_data (as the current code does), since di.value should still form part of the new "baseline" average.

Chris AtLee [:catlee]

Updated

•

12 years ago

Attachment #734025 - Flags: review?(catlee) → review+

Matt Brubeck (:mbrubeck)

Assignee

Comment 6

•

12 years ago

Thanks for the reviews! Since these patches are all in easily-tested code, I'm confident enough to just push them. In some other bugs where I make changes to analyze_talos.py, I'd like some advice on testing/staging/deployment. https://hg.mozilla.org/graphs/rev/d4af2bc04ce7 https://hg.mozilla.org/graphs/rev/f8ce18a0db88 https://hg.mozilla.org/graphs/rev/baad446523cf https://hg.mozilla.org/graphs/rev/6caa2530736d

Status: ASSIGNED → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

Chris AtLee [:catlee]

Comment 7

•

12 years ago

This has been deployed now.

Nobody; OK to take it and work on it

Updated

•

12 years ago

Product: mozilla.org → Release Engineering

Treeherder GitHub Bugbot

Comment 8

•

10 years ago

Commits pushed to master at https://github.com/mozilla/treeherder https://github.com/mozilla/treeherder/commit/8a091b499bb164ae271b054f1bef07ae151eb32e Bug 858735 (1/4) - Add some tests for analyze.py [r=catlee] https://github.com/mozilla/treeherder/commit/71046ff403b43dad9a58f934789e1da38537c51a Bug 858735 (2/4) - Some minor code cleanup in analyze.py [r=catlee] https://github.com/mozilla/treeherder/commit/7993d122dcd62a616dff5e2c4e76bc43dff585ae Bug 858735 (3/4) - Don't skip the first j results in analyze_t [r=catlee] https://github.com/mozilla/treeherder/commit/ceb5cedbe27d19f42cb2749dabbf9e0f8e11ec99 Bug 858735 (4/4) - Correctly process t-tests for series with zero variance [r=catlee]

You need to log in before you can comment on or make changes to this bug.

Bugzilla

analyze.py blames the wrong regression in Talos tests with very low variance

Categories

(Release Engineering :: General, defect)

Tracking

(Not tracked)

People

(Reporter: mbrubeck, Assigned: mbrubeck)

References

Details

(Whiteboard: [regression-detection])

Crash Data

Security

(public)

User Story

Attachments

(4 files)

Description

Comment 1

Comment 2

Comment 3

Updated

Updated

Updated

Comment 4

Comment 5

Updated

Comment 6

Comment 7

Updated

Comment 8

Attachment

General

Description

File Name

Content Type