Closed Bug 1399224 Opened 7 years ago Closed 7 years ago

OF Robot should comment when 200 failures accumulated in 30 days

Categories

(Tree Management Graveyard :: OrangeFactor, enhancement)

enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: gbrown, Assigned: gbrown)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

We have proposed a new policy, where tests are disabled once they accumulate 200 failures. OF Robot comments could provide a warning that a test is due to be disabled, and that could make it easy to find bugs due for action.
I am thinking [stockwell disabled:pending] or [stockwell disabled:soon] ?
I wonder if some people might interpret those to mean "this is disabled, but work is pending" or something like that. I don't want to go all the way to [stockwell this test will be disabled soon], but maybe [stockwell disable-recommended]?
:jmaher wrote in https://groups.google.com/forum/#!topic/mozilla.dev.platform/uJVTekj2l7I:

Over the last 9 months a few of us have really watched intermittent test failures almost daily and done a lot to pester people as well as fix many.  While there are over 420 bugs that have been fixed since the beginning of the year, there are half that many (211+) which have been disabled in some form (including turning off the jobs).

We don't like to disable and have been pretty relaxed in recommending disabling a test.  Overall we have tried to adhere to a policy of:
* >=30 failures/week- ask for owner to look at failure and fix it, if this persists for a few weeks with no real traction we would go ahead [and recommend] disabling it.
* >= 75 failures/week- ask for people to fix this in a shorter time frame and recommend disabling the test in a week or so
* >= 150 failures/week- often just disable the test

This is confusing and hard to manage.  Since then we have started adjusting triage queries and some teams are doing their own triage and we are ignoring those bugs (while they are getting prioritized properly).

What we are looking to start doing this month is adopting a simpler policy:
* any bug that has >=200 instances in the last 30 days will be disabled
** this will be a manual process, so it will happen a couple times/week

We expect the outcome of this to be a similar amount of disabling, just an easier method for doing so.  It is very possible we might recommend disabling a test before it hits the threshold- keep in mind a disabled test is easy to re-enable (so feel free to disable for that one platform until you have time to look at fixing it)

To be clear we (and some component owners) will continue triaging bugs and trying to get fixes in place as often as possible and prefer a fix, not a disabled test!

Please raise any concerns, otherwise we will move forward with this in the coming weeks.
Summary: OF Robot should comment when 200 failures accumulated → OF Robot should comment when 200 failures accumulated in 30 days
I think [stockwell disable-recommended] is good!
An initial implementation of this idea finds these bugs (as well as others that currently fail frequently):

bug 1387923 (very frequent intermittent disabled on worst platforms 2017-08-22)
bug 1395469 (very frequent intermittent fixed 2017-09-01)

I find myself wondering if we should exclude bugs that haven't failed recently...but maybe that is an unnecessary complication.
Another interesting case:

bug 1392690: marked disabled, but still failing frequently

My current implementation comments:

49 failures in 972 pushes (0.05 failures/push) were associated with this bug in the last 7 days.    

** This test has failed more than 200 times in the last 30 days. It should be disabled until it can be fixed. **

but does not modify the whiteboard (since if already contains "disabled").
on the second case of the [disabled] bug, would OF change that back to [stockwell needswork] ?

I think we need to be careful if something is fixed (intentional or unknown) - so annotating as [stockwell disable-me-soon] or whatever tag will allow for human intervention to ensure it is still occurring.
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #7)
> on the second case of the [disabled] bug, would OF change that back to
> [stockwell needswork] ?

Currently, it will change [stockwell disabled] or [stockwell fixed] to [stockwell needswork:owner] if the daily/weekly failure threshold is surpassed and it belongs to an owner-managed component. (The commenter does not set [stockwell needswork] for non-owned component bugs currently -- an inconsistency we might reconsider.) I find that behavior irritating: A frequent failure is disabled or fixed and the whiteboard updated accordingly, then the bot comes along that night and changes it to needswork (or now disable-recommended)...and then a human comes back the next day and fixes the whiteboard again. I think the commenter should not modify the whiteboard if it is disabled, fixed, or disable-recommended.
Currently this recommends disabling for these bugs:

# Bug 1204281
# Bug 1261598
# Bug 1282262
# Bug 1285090
# Bug 1292134
# Bug 1359247
# Bug 1365409
# Bug 1370056
# Bug 1370783
# Bug 1373347
# Bug 1373578
# Bug 1379401
# Bug 1380968
# Bug 1387827
# Bug 1388970
# Bug 1389983
# Bug 1391975
# Bug 1392797
# Bug 1393934
# Bug 1394428
# Bug 1397069

I introduced stockwell_whiteboard() to clean up some repetitive code.

I introduced another sleep() after encountering some bmo failures; we hit bmo more often now because of the whiteboard checks.
Attachment #8908389 - Flags: review?(jmaher)
Comment on attachment 8908389 [details] [diff] [review]
recommend disabling tests which fail 200 times in 30 days

Review of attachment 8908389 [details] [diff] [review]:
-----------------------------------------------------------------

I imagine a lot of future code cleanup, right now I think this is the best use of time to get this in as is.

::: woo_commenter.py
@@ +33,5 @@
>  PRIORITY1_THRESHOLD = 75
>  PRIORITY2_THRESHOLD = 30
> +# Recommend disabling when more than 200 failures tracked over 30 days
> +SKIP_THRESHOLD = 200
> +SKIP_DAYS = 30

why are these called skip, instead of DISABLE_THRESHOLD, etc.?

@@ +203,5 @@
>      else:
>          top = []
>          needswork_bugs = []
> +    # Fetch failure counts for the skip threshold period
> +    skip_tb = TopBugs(local_server_url, skip_start_date, skip_end_date, tree='all')

do we want all or 'trunk' ?
Attachment #8908389 - Flags: review?(jmaher) → review+
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #10)
> why are these called skip, instead of DISABLE_THRESHOLD, etc.?

That's a bit odd, isn't it? Changed to DISABLE_...

> do we want all or 'trunk' ?

I've kept 'all' for consistency with the rest of the commenter, but it's a good point: We wouldn't want to recommend disabling on central for something failing only on beta. Then again 'disable-recommended' with failure counts on beta only would suggest ... disabling on beta? Let's see how this goes...

https://hg.mozilla.org/automation/orangefactor/rev/bab5e465940d4ad84ec86be9b6fad5e62dd954de
Deployed.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
This ran last night and marked 20 bugs [stockwell disable-recommended]. Example comment: https://bugzilla.mozilla.org/show_bug.cgi?id=1204281#c414.

I think it is working as expected. Sadly, many of the bugs are meta-bugs, or otherwise resist disabling.
Blocks: 1307197
Product: Tree Management → Tree Management Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: