Closed Bug 858756 Opened 11 years ago Closed 11 years ago

When a regression is detected, blame the changeset with the highest t-test score

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: mbrubeck, Assigned: mbrubeck)

References

Details

(Whiteboard: [regression-detection])

Attachments

(2 files, 1 obsolete file)

1/2: add tests, 11 years ago, Matt Brubeck (:mbrubeck), 9.75 KB, patch, catlee: review+
2/2: find local maxima in the t-tests (obsolete), 11 years ago, Matt Brubeck (:mbrubeck), 12.66 KB, patch
2/2: find local maxima in the t-tests (v2), 11 years ago, Matt Brubeck (:mbrubeck), 12.79 KB, patch, catlee: review+

Matt Brubeck (:mbrubeck)

Assignee

Description

•

11 years ago

Currently, analyze.py walks through a data series and flags any points where regressions might have occurred according to a t-test.  Then analyze_talos.py looks at each of those points, and sends an alert about the *first* point where each regression might have occurred.  It ignores later possible points for the same regression, even if the later point has a higher t-value:
http://hg.mozilla.org/graphs/file/862ee36fb719/server/analysis/analyze_talos.py#l987

Instead, the alert should either flag *all* of the points where the t-value is over the threshold, or it should choose the point with the *highest* t-value (instead of the first point with a "high enough" t-value).
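The difference between the current behavior and the two proposed alternatives can be sketched roughly as follows. This is only an illustration, not the code in analyze_talos.py: the function names, the plain list of t-values, and the threshold of 7 are all assumptions made for the example.

```python
def runs_over_threshold(t_values, threshold):
    """Yield lists of consecutive indices whose t-value exceeds the threshold."""
    run = []
    for i, t in enumerate(t_values):
        if t > threshold:
            run.append(i)
        elif run:
            yield run
            run = []
    if run:
        yield run


def blame_first(t_values, threshold):
    """Current behavior: blame the first point of each run."""
    return [run[0] for run in runs_over_threshold(t_values, threshold)]


def blame_all(t_values, threshold):
    """Alternative 1: flag every point over the threshold."""
    return [i for run in runs_over_threshold(t_values, threshold) for i in run]


def blame_highest(t_values, threshold):
    """Alternative 2: blame the point with the highest t-value in each run."""
    return [max(run, key=lambda i: t_values[i])
            for run in runs_over_threshold(t_values, threshold)]


# Hypothetical t-values for seven consecutive changesets.
t_values = [1.0, 8.0, 12.0, 9.0, 2.0, 7.5, 3.0]
print(blame_first(t_values, 7))    # [1, 5]
print(blame_all(t_values, 7))      # [1, 2, 3, 5]
print(blame_highest(t_values, 7))  # [2, 5]
```

In this made-up series, the first-point rule blames index 1 even though index 2 has a much higher score.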

Matt Brubeck (:mbrubeck)

Assignee

Updated

•

11 years ago

Summary: When a regression is detected, blame find the changeset with the highest t-test score → When a regression is detected, blame the changeset with the highest t-test score

Matt Brubeck (:mbrubeck)

Assignee

Comment 1

•

11 years ago

There's also a similar case where we blame a *later* changeset than we should, because the changeset with the highest score is in the warning_history.json file, but some later changeset has a score over the threshold.  Each time we process new data, we may warn on yet another changeset for the same regression, until we fall back below the threshold.
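A simplified model of that failure mode, assuming all of the scores belong to a single regression and using an in-memory set as a stand-in for warning_history.json (the function name and data are hypothetical, not taken from analyze_talos.py):

```python
def run_old_alerting(scores, threshold, warning_history):
    """One analysis pass under the old rule (simplified).

    `scores` is an ordered list of (changeset, t_value) pairs.
    `warning_history` is a set standing in for warning_history.json;
    it is updated in place. Returns the changeset warned about, or None.
    """
    for changeset, t_value in scores:
        if t_value > threshold and changeset not in warning_history:
            warning_history.add(changeset)
            return changeset
    return None


# Hypothetical scores for one regression spread over changesets b, c, d.
scores = [("a", 2.0), ("b", 8.0), ("c", 12.0), ("d", 9.0), ("e", 1.0)]
history = set()
alerts = [run_old_alerting(scores, 7, history) for _ in range(4)]
print(alerts)  # ['b', 'c', 'd', None]
```

Each pass skips whatever is already in the history and warns about the next changeset over the threshold, so one regression produces three separate warnings.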

Matt Brubeck (:mbrubeck)

Assignee

Comment 2

•

11 years ago

Attached patch 1/2: add tests

This refactors analyze_talos.py to make it possible to add some very basic tests.

Attachment #734197 - Flags: review?(catlee)

Matt Brubeck (:mbrubeck)

Assignee

Comment 3

•

11 years ago

Attached patch 2/2: find local maxima in the t-tests (obsolete)

Don't blame a changeset if there's a better candidate right next to it.  Basically this finds the "high point" within any region where a regression is suspected.

Now we no longer need to set "skip = True" for consecutive 'regression' changesets in analyze_talos.py, since the maximum-t-value logic in analyze.py has already reduced each of these ranges to a single changeset.
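A rough sketch of the local-maximum idea, not the patch itself: the function name, the list-of-floats input, and the tie-breaking rule (a tie with the previous neighbor counts as a maximum, a tie with the next neighbor does not, so a plateau blames its last point) are assumptions for illustration.

```python
def local_maxima(t_values, threshold):
    """Return indices over the threshold that no immediate neighbor beats."""
    blamed = []
    for i, t in enumerate(t_values):
        if t <= threshold:
            continue
        # Missing neighbors at the ends are treated as infinitely low here.
        prev_t = t_values[i - 1] if i > 0 else float("-inf")
        next_t = t_values[i + 1] if i + 1 < len(t_values) else float("-inf")
        if t >= prev_t and t > next_t:
            blamed.append(i)
    return blamed


# Hypothetical t-values: indices 1-3 form one suspect region, index 5 another.
print(local_maxima([1.0, 8.0, 12.0, 9.0, 2.0, 7.5, 3.0], 7))  # [2, 5]
```

Because each suspect region is reduced to its high point here, the caller would not need a separate "skip" flag for consecutive flagged changesets.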

Attachment #734198 - Flags: review?(catlee)

Chris AtLee [:catlee]

Updated

•

11 years ago

Attachment #734197 - Flags: review?(catlee) → review+

Matt Brubeck (:mbrubeck)

Assignee

Comment 4

•

11 years ago

Attached patch 2/2: find local maxima in the t-tests (v2)

From IRC:

  <catlee> I'm not sure how attachment 734198 will behave as it processes new data. wouldn't we possibly get multiple regression notices?

  <mbrubeck> We use warning_history.json to prevent re-sending the exact same warning email -- that will still work the same after this patch.  In fact, it will work better, since currently we might flag several changesets in a row for the same regression.  On the first run, we'll send an email for the first changeset.  On the next run, we skip that but send an email for the second changeset... and so on.  With the patch, we will choose just one of those changesets to flag, send an email for it, and put it in warning_history.json.

  <catlee> right, but won't the worst t-value change as you get more data?

I thought about this some more.  For the case where new data arrives out of order (i.e. retriggers add new datapoints for an old changeset), the new data can definitely change the score of existing changesets.  This is expected and desired, and we don't need to prevent this, since it may help us identify previously-missed regressions.

For the case where new data arrives *in order* (i.e. only for new changesets), it can change a previous score only if it's within the fore_window of the previous changeset.  But we don't perform t-tests for a changeset until it already has a full fore_window datapoints following it, so new data added after those can't change its t-score.
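To illustrate why in-order data can't change an already-computed score, here is a sketch that assumes a Welch-style t statistic over a back window and a fore window. The function name, the exact statistic, and the choice to start the fore window at the changeset itself are assumptions; analyze.py may compute its score differently.

```python
from math import sqrt
from statistics import mean, variance


def t_score(data, i, back_window, fore_window):
    """Score point i, or return None until both windows are full.

    Assumes both windows hold at least two datapoints.
    """
    if i < back_window or i + fore_window > len(data):
        return None  # not enough data on one side yet
    before = data[i - back_window:i]
    after = data[i:i + fore_window]
    std_err = sqrt(variance(before) / len(before) + variance(after) / len(after))
    diff = abs(mean(after) - mean(before))
    if std_err == 0:
        return float("inf") if diff else 0.0
    return diff / std_err


# Hypothetical series with a step at index 4.
data = [10.0, 10.2, 9.8, 10.1, 12.0, 12.1, 11.9, 12.2]
score_before = t_score(data, 4, 4, 4)
data.append(12.0)  # new data arrives in order
score_after = t_score(data, 4, 4, 4)
print(score_before == score_after)  # True
```

The score at index 4 only reads `data[0:8]`, so appending a ninth point leaves it unchanged; the appended point only makes index 5 testable.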

However, with the old version of this patch, new data *could* change the final state of the very last changeset in the good_data array.  We can't yet compare its t-score to its "next neighbor" because the next neighbor hasn't been tested yet.  Instead, we should ignore it for this run.  When new data is added and the next neighbor is tested, then we'll be able to make a decision on it that won't change.

I modified the patch to set di.state only when di *and both its neighbors* have had their t-tests (which means that di and its next neighbor already have at least k datapoints after them).
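A minimal sketch of that rule, assuming untested points are represented as None and using made-up state names; the real patch works on datapoint objects (di.state), and its details may differ.

```python
def assign_states(t_values, threshold):
    """Decide a state only for points whose two neighbors are also tested.

    `t_values` holds a float for each tested point and None for points
    that have not had a t-test yet. Undecided points stay None.
    """
    states = [None] * len(t_values)
    for i in range(1, len(t_values) - 1):
        prev_t, t, next_t = t_values[i - 1], t_values[i], t_values[i + 1]
        if prev_t is None or t is None or next_t is None:
            continue  # wait for a later run instead of guessing
        if t > threshold and t >= prev_t and t > next_t:
            states[i] = "regression"
        else:
            states[i] = "good"
    return states


# First run: index 2 is over the threshold but its next neighbor is untested.
print(assign_states([1.0, 3.0, 9.0, None, None], 7))
# [None, 'good', None, None, None]

# Later run: the neighbor turns out to score higher, so it is blamed instead.
print(assign_states([1.0, 3.0, 9.0, 11.0, 4.0, None], 7))
# [None, 'good', 'good', 'regression', None, None]
```

Deferring the decision in the first run avoids blaming index 2 and then changing the verdict once index 3 has been tested.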

Attachment #734198 - Attachment is obsolete: true

Attachment #734198 - Flags: review?(catlee)

Attachment #736590 - Flags: review?(catlee)

Chris AtLee [:catlee]

Comment 5

•

11 years ago

Comment on attachment 736590
2/2: find local maxima in the t-tests (v2)

Review of attachment 736590:
-----------------------------------------------------------------

Great, thanks!

Attachment #736590 - Flags: review?(catlee) → review+

Matt Brubeck (:mbrubeck)

Assignee

Comment 6

•

11 years ago

http://hg.mozilla.org/graphs/rev/1d982ff52990
http://hg.mozilla.org/graphs/rev/75f91c5b3037

This has less automated test coverage than my other patches and I couldn't fully test it locally, so please do what you can to verify this before or after deploying it.  Let me know if I can help!

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → FIXED

Chris AtLee [:catlee]

Comment 7

•

11 years ago

How confident are you in the results? I can deploy now and we can keep an eye on things.

Or, I can set up a parallel instance with the new code and have it mail just you the results.

Matt Brubeck (:mbrubeck)

Assignee

Comment 8

•

11 years ago

I'm confident in the results; I'm more concerned that I broke some of the unrelated code (like the code to send emails or read the config file) without knowing it.  So I think deploying now and keeping an eye on it seems reasonable.

Chris AtLee [:catlee]

Comment 9

•

11 years ago

Ok, I've just deployed the new code.

Matt Brubeck (:mbrubeck)

Assignee

Comment 10

•

11 years ago

It's getting to be suspiciously long since the last regression email (though there's always a chance we just haven't regressed anything since this morning).  Is there any way to check logs to make sure analyze_talos.py is running successfully?  Or can it be run manually on the graph server and its output checked?

Nick Thomas [:nthomas] (UTC+12)

Comment 11

•

11 years ago

I was just looking at the logs when it sent mail
 (Improvement) Mozilla-Inbound - Tp5 Optimized (Main RSS) - Ubuntu HW 12.04 x64 - 2.66%

Can't see any errors there.

Nobody; OK to take it and work on it

Updated

•

11 years ago

Product: mozilla.org → Release Engineering

Treeherder GitHub Bugbot

Comment 12

•

9 years ago

Commits pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/b2989cb8e685b543312dbef3ac79ac21a950c92f
Bug 858756 (1/2): Add basic tests for analyze_talos.py [r=catlee]

https://github.com/mozilla/treeherder/commit/da3dc7de296395b566c753baadbe844dda655ad0
Bug 858756 (2/2) - Blame the changeset whose t score is a local maximum [r=catlee]

When several changesets in a row are potential causes of a regression,
blame the one with the highest t-test score rather than the first one.

Jeff Hammel

Updated

•

9 years ago

QA Contact: mshal


Categories

(Release Engineering :: General, defect)
