Closed Bug 1109243 Opened 10 years ago Closed 9 years ago

Talos could benefit from the ability of turning a job orange on try server

Categories

(Testing :: Talos, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: jmaher, Assigned: jmaher)

References

Details

Attachments

(1 file, 3 obsolete files)

When a new feature is in development, or a fix for a known performance regression, talos is likely to be run on try server.  Wouldn't it be great if we could see orange on try server when we suspect a regression?

This is doable, and we can be smart about it to increase the certainty that a detected change is a real regression.  Of course there are many unique situations to think about, but it is something we can move forward on and maybe tighten the loop.
A few things to sort out:
* how do we deal with improvements?
* do we ensure all the tests in a suite are run?  right now we don't
* can we return a value to buildbot to retry the job?
* we need to make this try-only - possibly a flag via trychooser?
* I require 3+ data points and remove an outlier - is that valid?
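The "3+ data points, drop an outlier" idea from the list above could look roughly like the sketch below. This is illustrative only, not code from the actual patch; the function name and the median-distance heuristic are assumptions.

```python
# Hypothetical sketch of the "require 3+ data points, remove one
# outlier" rule; not taken from the talos source.

def filter_outlier(values):
    """Given 3+ replicate values, drop the single value farthest
    from the median and return the remaining values sorted."""
    if len(values) < 3:
        raise ValueError("need at least 3 data points")
    values = sorted(values)
    median = values[len(values) // 2]
    outlier = max(values, key=lambda v: abs(v - median))
    values.remove(outlier)
    return values
```

With fewer than three replicates there is no way to tell the outlier from the signal, which is why the rule insists on a minimum count before judging a run.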


This would be great to make a requirement for folks looking to land on beta/aurora.  The danger there: folks may not actually push to try for those branches, and we only run PGO on them, which could be frustrating.
Uploading my latest patch; this has been working great on try server.  Here are some features (example: https://treeherder.mozilla.org/#/jobs?repo=try&revision=c794046975a4):
* if the geometric mean of values for a test is outside the 7-day range, auto-retry
* if we are re-triggering a specific job, automatically capture an SPS profile
* require 3 data points to post a sustained regression|improvement
* the job is retried automatically until we get to 3 data points (serially)
* the job goes orange as either test-unexpected-fail (regression) or test-unexpected-pass (improvement) to draw attention to the great work
* we default to mozilla-central as the source of truth
* this doesn't work with android, just desktop tests
* this is hacked to only work on branches that match Try*
* tested with all desktop tests/platforms, but not PGO or e10s
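The core decision the feature list describes (compare the geometric mean of the new replicates against the historical low/high range, then classify the result) can be sketched as follows. This is an assumption-laden illustration, not the patch itself; `classify` and its return values are hypothetical names.

```python
# Illustrative sketch of the check described above: flag the job when
# the geometric mean of the new values falls outside the historical
# low/high range from the last 7 days.
import math

def geomean(values):
    """Geometric mean of a list of positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def classify(new_values, hist_low, hist_high, reverse=False):
    """Return 'regression', 'improvement', or None (in range).
    reverse=True for tests where higher numbers are better."""
    gm = geomean(new_values)
    if hist_low <= gm <= hist_high:
        return None
    worse = gm > hist_high  # higher is worse for most talos tests
    if reverse:
        worse = not worse
    return 'regression' if worse else 'improvement'
```

A run inside the historical range produces no status and the job stays green; only a sustained excursion (3 data points, per the list above) would turn it orange.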

That is a lot of stuff.  Please weigh in on concerns with this and other things which we should be aware of.
Assignee: nobody → jmaher
Status: NEW → ASSIGNED
Attachment #8538511 - Flags: review?(wlachance)
Attachment #8538511 - Flags: feedback?(avihpit)
Depends on: 1113146
Comment on attachment 8538511 [details] [diff] [review]
turn talos orange when numbers are out of the ordinary (1.0)

Review of attachment 8538511 [details] [diff] [review]:
-----------------------------------------------------------------

The patch adds lots of unrelated general updates and features, and applies decisions which deserve explanation (such as 3 data points instead of 12 for regression detection).

Also, it seemingly changes the concept of talos as a test-numbers generator and makes it reference historical data, etc.  This is a very big deviation from how the system works now.

- Please divide the patch into a few parts, including one for the general unrelated changes.

- Please describe how the talos changes interact with Treeherder.

- etc.
Attachment #8538511 - Flags: feedback?(avihpit) → feedback-
Depends on: 1113225
Depends on: 1113273
Comment on attachment 8538511 [details] [diff] [review]
turn talos orange when numbers are out of the ordinary (1.0)

going to split this into a few sub patches
Attachment #8538511 - Flags: review?(wlachance)
Depends on: 1113576
removing sps profiling, auto retries, etc.

This adds a little bit of code to run_tests.py as well as some support functions in compare.py.
Attachment #8533891 - Attachment is obsolete: true
Attachment #8538511 - Attachment is obsolete: true
Attachment #8539377 - Flags: review?(wlachance)
Comment on attachment 8539377 [details] [diff] [review]
turn a job orange on try if the data values are out of the norm (1.0)

Review of attachment 8539377 [details] [diff] [review]:
-----------------------------------------------------------------

r- because of my concern over android.

::: talos/compare.py
@@ +279,5 @@
>      return {'low': low, 'high': high, 'avg': average, 'geomean': geomean, 'count': count, 'data': vals}
>  
>  
> +def getCurrentPlatform(e10s):
> +    #TODO: when we add android support to compare.py, add it here

Will talos try jobs continue to work as expected on Android? From our discussion on IRC, it sounds like this is not the case...

@@ +314,5 @@
> +
> +    platform = getCurrentPlatform(e10s)
> +    return compareResults(revision, branch, master_branch, 0, [platform], [testname], pgo)
> +
> +def makeStatusMessage(results, testname, master_branch, min_count):

I think this method would go better in run_tests.py. It really has nothing to do with comparisons.
Attachment #8539377 - Flags: review?(wlachance) → review-
Updated: we now have support for Android, so that shouldn't be an issue anymore (yay winter-break contributors!)
Attachment #8539377 - Attachment is obsolete: true
Attachment #8543026 - Flags: review?(wlachance)
Comment on attachment 8543026 [details] [diff] [review]
turn a job orange on try if the data values are out of the norm (2.0)

Review of attachment 8543026 [details] [diff] [review]:
-----------------------------------------------------------------

Two things I'd like addressed before pushing (along with plenty of testing). Sorry for missing them the last time through.

::: talos/compare.py
@@ +311,5 @@
> +            platform = 'WinXP'
> +    if mozinfo.os == 'mac':
> +        if mozinfo.version == "OS X 10.6.8":
> +            platform = "OSX64"
> +        #TODO: verify this

Please actually verify this (and take the comment out) before pushing. :)

@@ +394,5 @@
>                      status = ':('
>                      if t in reverse_tests:
>                          status = ':)'
>  
> +                platVal['status'] = None

I think we should just define a global "REGRESSION" and "IMPROVEMENT" enum and use that (using a method like this: http://stackoverflow.com/a/702903). Trying to overload "True" and "False" to mean regression or improvement is even more confusing than doing comparisons against ':)' and ':('.
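A minimal sketch of the suggestion above, using module-level constants in the style of the linked Stack Overflow answer (the constant and function names are hypothetical, not from the patch):

```python
# Module-level status constants instead of overloading True/False.
REGRESSION = 'REGRESSION'
IMPROVEMENT = 'IMPROVEMENT'

def status_symbol(status, reverse_test=False):
    """Map a status constant to the ':(' / ':)' markers used in
    compare.py; reverse_test flips the meaning for tests where a
    numeric increase is good."""
    if status is None:
        return ''
    bad = (status == REGRESSION) != reverse_test
    return ':(' if bad else ':)'
```

Comparing against named constants (`status == REGRESSION`) reads unambiguously at every call site, which is the point of the review comment.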
Attachment #8543026 - Flags: review?(wlachance) → review+
Blocks: 1119444
Do we still want to do this? I would be inclined to drop this in favor of further improvements to the compare view in perfherder.
Flags: needinfo?(jmaher)
I think we should drop this for now.  We can revisit it later, but the concept of integrating compare view into our normal workflow should solve these issues :)
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Flags: needinfo?(jmaher)
Resolution: --- → WONTFIX