Bug 1186078
Opened 9 years ago
Closed 7 years ago
[Meta] Tracking bug to make 24-hour backouts a reality
Categories
(Testing :: Talos, defect)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: vaibhav1994, Unassigned)
We currently have some things in place to make 24-hour backouts a reality for perf regressions, but a lot of work remains. Let's use this bug as a tracker.
:jmaher points out the stages in the life of perf sheriff:
> -1: needs attention
> 0: new
> - possibly do something different if this is a merge (look on other branches, etc. - for automation we don't need to)
> 1: backfilling: needs backfilling (could be the same as #1)
> - mozci to verify that rev +- 2 (rev-2, rev-1, rev, rev+1, rev+2) has data
> - mozci to schedule builds/jobs so that each rev in a +- 2 has 6 data points (might need a repeat if there are no builds)
> - need to do this in 2 parts: 1. ensure we have builds, 2. ensure we have tests
> - move to stage -1 if we cannot fill in the holes 100% (i.e. build bustage, dontbuild, trees closed, etc.)
> 2: has more data for specific test
> - somehow verify we have a non merge revision and that revision 'a' is where we shift (we could script this in perfherder/alertmanager)
> 3: needs all-talos run
> - mozci: given revision A showing a regression, schedule all-talos (6 runs) for all tests/platforms for Rev A and A-1.
> - mozci: might have to wait for builds
> 4: has all-talos data for revision a and a-1
> - sanity check we have the full set of data
> 5: bug filed
> 6: closed (wontfix, backout, fixed)
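The stage list and the backfilling window in stage 1 can be sketched in Python as follows. This is only an illustration: the `Stage` member names, `backfill_window`, and `missing_runs` are hypothetical labels that do not come from mozci or perfherder, and the pushlog is assumed to be a plain ordered list of revision ids.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Alert stages from the list above (member names are illustrative)."""
    NEEDS_ATTENTION = -1
    NEW = 0
    BACKFILLING = 1
    HAS_TEST_DATA = 2
    NEEDS_ALL_TALOS = 3
    HAS_ALL_TALOS_DATA = 4
    BUG_FILED = 5
    CLOSED = 6


def backfill_window(pushlog, rev, radius=2):
    """Return rev-radius .. rev+radius from an ordered list of revisions."""
    i = pushlog.index(rev)
    return pushlog[max(0, i - radius): i + radius + 1]


def missing_runs(window, runs_per_rev, wanted=6):
    """Map each under-sampled revision to the number of runs still needed."""
    return {
        r: wanted - runs_per_rev.get(r, 0)
        for r in window
        if runs_per_rev.get(r, 0) < wanted
    }
```

For example, with a pushlog of `r1` to `r7` and an alert on `r4`, the window is `r2` to `r6`, and any revision in it with fewer than 6 data points would be a candidate for retriggering.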
Comment 1 • 9 years ago (Reporter)
A rough state machine suggested by :jmaher:

    for alert in alerts:
        startRev = getPushLog(alert.rev) - 2
        endRev = getPushLog(alert.rev) + 2
        dataPoints = perfherder.query(alert.branch, alert.platform, alert.test, startRev, endRev)
        switch alert.stage:
            case 0: # new
                if getRevision(alert.rev).merge:
                    alert.stage = -1
                    break
                if alert.branch.endswith('pgo'):
                    alert.stage = -1
                    break
                alert.stage = 1
            case 1: # backfilling
                if len(dataPoints) < 5:
                    status = mozci.trigger(alert.buildername, startRev, endRev, times=6)
                    if len(status.builds) > 0:
                        alert.stage = 1 # we are waiting on builds, need to run this again
                    else:
                        alert.stage = 2
                    break
                alert.stage = 2
            case 2: # enough data after initial backfilling, verify
                status = mozci.trigger(alert.buildername, startRev, endRev, times=6)
                if status.builds > 0 or status.pending > 0 or status.running > 0:
                    alert.stage = 1 # waiting on builds/tests
                    break
                if len(dataPoints) < 5:
                    alert.stage = -1 # all builds are done, missing jobs for revisions
                    break
                for data in dataPoints:
                    if len(data) < 6:
                        alert.stage = -1 # all builds are done, missing data for jobs
                        break
                # analyze the data, find specific revision:
                pl = getPushLog()
                badRevisions = []
                for rev in pl[startRev:endRev]:
                    results = perfherder.compare(pl[rev], pl[rev-1], alert.branch, alert.platform, alert.test)
                    if results.change < -2.0:
                        badRevisions.append(rev)
                if len(badRevisions) != 1:
                    alert.stage = -1 # too noisy, other issues
                    break
                if getRevision(badRevisions[0]).merge:
                    alert.stage = -1
                    break
                if alert.rev != badRevisions[0]:
                    alert.rev = badRevisions[0] # we misreported initially, possibly update other tools/status
                alert.stage = 3
            case 3:
                mozci.trigger_all_talos(alert.rev, alert.branch, times=6)
                previous_rev = getPushLog(alert.rev) - 2
                mozci.trigger_all_talos(previous_rev, alert.branch, times=6)
                alert.stage = 4
                break
            case 4: # verify all data exists, i.e. jobs are completed
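The "find specific revision" step in case 2 can be illustrated with a small self-contained Python sketch. It makes several assumptions that the bug does not confirm: `perfherder.compare` is replaced by a plain percent change of the mean between adjacent revisions, a negative change counts as a regression (mirroring the `-2.0` threshold in the pseudocode), and `percent_change` and `find_bad_revision` are hypothetical helper names.

```python
from statistics import mean


def percent_change(before, after):
    """Signed percent change of the mean from `before` to `after`.

    Assumes the mean of `before` is non-zero.
    """
    base = mean(before)
    return (mean(after) - base) / base * 100.0


def find_bad_revision(window, samples, threshold=-2.0):
    """Return the one revision that shifts past `threshold`, else None.

    `window` is an ordered list of revisions and `samples` maps each
    revision to its data points. Zero or several candidates both return
    None, which corresponds to stage -1 (needs attention).
    """
    bad = []
    for prev, rev in zip(window, window[1:]):
        if percent_change(samples[prev], samples[rev]) < threshold:
            bad.append(rev)
    return bad[0] if len(bad) == 1 else None
```

Returning `None` for anything other than exactly one candidate follows the "too noisy, other issues" branch of the pseudocode rather than trying to guess between several shifts.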
Comment 2 • 9 years ago (Reporter)
We had a meeting, and these are some things to take action on: https://etherpad.mozilla.org/perf-backouts
Comment 3 • 7 years ago
Closing out old bugs that haven't been a priority.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX