Close old and inactive intermittent-failure bugs

RESOLVED FIXED

People

(Reporter: emorley, Assigned: emorley)

(Assignee)

Description

4 years ago
We've had numerous rounds of closing old and inactive intermittent failure bugs, however most of the time, we've limited searches to bugs that have had comments by tbplbot, in order to reduce false positives.

However this ignores:
* bugs where there was only one occurrence, that were created before treeherder
* bugs that never worked for bug suggestions, and haven't occurred since we've been using treeherder
(Treeherder comments on bugs, even if they were not suggested, unlike TBPL)

In addition, we likely have a number of bugs that do have tbplbot comments, but are just new candidates since the last round of closures.

Currently...

Total intermittent failure bugs: 16018
Number still open: 5018
(https://bugzilla.mozilla.org/bzapi/count?keywords=intermittent-failure&resolution=---)

And so the bugspam begins (sorry!)...
(Assignee)

Comment 1

4 years ago
The criteria I'm using for now are:
* Has a resolution of "---"
* Has the keyword "intermittent-failure"
* Hasn't been modified in the last 3 months
* Doesn't contain "leave-open" or "leave open" in the whiteboard
* Doesn't have the keyword "leave-open" set
* Contains none of the strings "random disabled marked fuzzy todo fails annotated time-bomb" in the whiteboard
* A comment contains either "getParsedLog" or "treeherder"
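
As a rough illustration only, the criteria above could be expressed as a single predicate. This is a sketch, not the query that was actually run: the `Bug` record, the `is_closure_candidate` name, the 90-day approximation of "3 months" and the case-insensitive substring matching are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Whiteboard strings that exempt a bug from closure (taken from the list above).
EXEMPT_WHITEBOARD = (
    "leave-open", "leave open", "random", "disabled", "marked",
    "fuzzy", "todo", "fails", "annotated", "time-bomb",
)
# At least one comment must mention one of these strings.
BOT_MARKERS = ("getparsedlog", "treeherder")


@dataclass
class Bug:
    """Hypothetical in-memory bug record; not a Bugzilla API type."""
    resolution: str
    keywords: list
    whiteboard: str
    last_modified: datetime
    comments: list = field(default_factory=list)


def is_closure_candidate(bug, now, max_age=timedelta(days=90)):
    """Return True if the bug matches every criterion in the list above.

    Assumptions: "3 months" is approximated as 90 days, and substring
    matches are case-insensitive.
    """
    whiteboard = bug.whiteboard.lower()
    return (
        bug.resolution == "---"
        and "intermittent-failure" in bug.keywords
        and "leave-open" not in bug.keywords
        and now - bug.last_modified > max_age
        and not any(s in whiteboard for s in EXEMPT_WHITEBOARD)
        and any(m in c.lower() for c in bug.comments for m in BOT_MARKERS)
    )
```

A bug that fails any one of the checks (for example, one with "disabled" in the whiteboard) is left alone, which is why marking bugs correctly matters for avoiding false positives.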
(Assignee)

Comment 2

4 years ago
Bug 1156105 would allow me to avoid spamming everyone, but unfortunately it's not landed yet.

Comment 3

4 years ago
I wonder if this really should apply also to other trees, like Thunderbird.
Are those failures no longer appearing just due to migrating to treeherder?
Flags: needinfo?(vseerror)
(Assignee)

Comment 4

4 years ago
After several passes using the criteria in comment 1, I dropped the "comment contains" clause, lowered the timeframe from 3 months to ~4-6 weeks (depending on how many comments the bug had had) & just vetted the resultant bugs more carefully.

Remaining open intermittent bugs: 1611
Bugs closed (or keyword removed) today: 3407 \o/

Search showing bugs that were closed as WFM (you'll need to use show-all to see beyond the first 500):
http://mzl.la/1NBiwpq

(If you're coming here after a bug was incorrectly closed - the likely reason is that either the "leave-open" keyword was omitted after eg disabling a test, or you're using the "intermittent-failure" keyword for things that don't report to Treeherder [contrary to the keyword's description]. I've attempted to fix these cases prior to mass-closing, but there may be the odd one that slipped past.)

(In reply to :aceman from comment #3)
> I wonder if this really should apply also to other trees, like Thunderbird.
> Are those failures no longer appearing just due to migrating to treeherder?

Thunderbird still uses Treeherder, so if someone classifies failures there, the bugs will receive bug comments. The main thing is that the "intermittent-failure" keyword means "this is an intermittent failure on treeherder", so if the bug in question isn't about a test that is reporting to treeherder, it should use another keyword (I'm happy to create new ones if desired). Otherwise it can cause false positives in Treeherder's bug suggestions, as well as by liable for mass-closure.
(Assignee)

Comment 5

4 years ago
s/by liable/be liable/
(Assignee)

Updated

4 years ago
Status: ASSIGNED → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED

Comment 6

4 years ago
(In reply to Ed Morley [:emorley] from comment #4)
>...
> (In reply to :aceman from comment #3)
> > I wonder if this really should apply also to other trees, like Thunderbird.
> > Are those failures no longer appearing just due to migrating to treeherder?
> 
> Thunderbird still uses Treeherder, so if someone classifies failures there,
> the bugs will receive bug comments. The main thing is that the
> "intermittent-failure" keyword means "this is an intermittent failure on
> treeherder", so if the bug in question isn't about a test that is reporting
> to treeherder, it should use another keyword (I'm happy to create new ones
> if desired). Otherwise it can cause false positives in Treeherder's bug
> suggestions, as well as by liable for mass-closure.

I can't speak to this. Perhaps Magnus or others can.

34 bugs closed WFM https://bugzilla.mozilla.org/buglist.cgi?keywords=intermittent-failure%2C%20&keywords_type=allwords&list_id=12367031&o1=nowordssubstr&chfieldfrom=2015-07-01&f1=component&o3=anywordssubstr&resolution=WORKSFORME&classification=Client%20Software&classification=Components&chfieldto=2015-07-04&query_format=advanced&f3=short_desc&f2=OP&product=MailNews%20Core&product=Thunderbird
Flags: needinfo?(vseerror) → needinfo?(mkmelin+mozilla)

Comment 7

4 years ago
(In reply to Ed Morley [:emorley] from comment #4)
> (In reply to :aceman from comment #3)
> > I wonder if this really should apply also to other trees, like Thunderbird.
> > Are those failures no longer appearing just due to migrating to treeherder?
> 
> Thunderbird still uses Treeherder, so if someone classifies failures there,
> the bugs will receive bug comments. The main thing is that the
> "intermittent-failure" keyword means "this is an intermittent failure on
> treeherder", so if the bug in question isn't about a test that is reporting
> to treeherder, it should use another keyword (I'm happy to create new ones
> if desired). Otherwise it can cause false positives in Treeherder's bug
> suggestions, as well as by liable for mass-closure.

More like I think the bugs were filed BEFORE TB was migrated to treeherder. So I don't know whether the tests are "reporting to treeherder". But I question that they are suddenly closed as if the underlying problems in tests are fixed, or no longer occurring. I am not sure migration to treeherder had that effect. If a keyword was a problem in those bugs, couldn't it just be removed?
(Assignee)

Comment 8

4 years ago
(In reply to :aceman from comment #7)
> More like I think the bugs were filed BEFORE TB was migrated to treeherder.
> So I don't know whether the tests are "reporting to treeherder". 

You're welcome to do what you want with them - but of the ones I spot checked, there were reports from Treeherder. 

> But I
> question that they are suddenly closed as if the underlying problems in
> tests are fixed, or no longer occurring. I am not sure migration to
> treeherder had that effect. If a keyword was a problem in those bugs,
> couldn't it just be removed?

That's not what WORKSFORME means. 

In this case, if a failure of a test that reports to Treeherder (or TBPL) stops occurring, then as far as Treeherder is concerned, the test failure bug is now worksforme - ie: what could be reproduced before, cannot now.
(Assignee)

Comment 9

4 years ago
To clarify a bit - I'm totally open to other projects setting different terms for these bugs - and if it weren't for the fact that they use the same keyword as the main products (and so can cause false positives in Treeherder), I wouldn't touch them.

We're having a major drive to make intermittent-failure bugs actionable - eg soon we will only file bugs for intermittents that occur more than N times (see bug 1179263) - and closing out old bugs is a part of this.

If there were bugs you didn't want closed, then my apologies and please do reopen (and either drop the "intermittent-failure" keyword or else add the "leave-open" keyword).

Comment 10

4 years ago
I have nothing to add - thunderbird uses treeherder just like other projects now. TBPL doesn't seem to even exist anymore.
Anyway, we can (re)create bugs if they start appearing again.
Flags: needinfo?(mkmelin+mozilla)

Comment 11

4 years ago
This has incorrectly closed bugs (e.g., bug 1123979) that were annotated as failing where the bug was being left open to cover fixing the underlying problem.  (I suspect there were more.)
(Assignee)

Comment 12

4 years ago
(In reply to David Baron [:dbaron] ⌚UTC-7 from comment #11)
> This has incorrectly closed bugs (e.g., bug 1123979) that were annotated as
> failing where the bug was being left open to cover fixing the underlying
> problem.  (I suspect there were more.)

That bug was incorrectly marked. It needed one or both of: keywords:leave-open , or whiteboard:annotated*

A false positive rate is both expected and unavoidable, particularly if people haven't marked bugs correctly. Each time we've done these closing marathons, I've used a variety of saved searches to try and find cases like this (by searching for attachment descriptions that have things like "disable *" or similar in them), however looking at bug 1123979 the attachment there is really not named helpfully.

The odd bug being marked incorrectly is better than not closing any of the bugs at all.
(Assignee)

Comment 13

4 years ago
(In reply to Ed Morley [:emorley] from comment #12)
> That bug was incorrectly marked. It needed one or both of:
> keywords:leave-open , or whiteboard:annotated*

In fact, there was an even wider range of attributes that would have prevented it from being closed, but it was missing all of them (see comment 1).

Comment 14

4 years ago
Arguably, this also bolsters the idea that intermittent-failure bugs should be resolved when they're not failing intermittently in CI, regardless of the "fix" that was used. Follow-up bugs are cheap.
(Assignee)

Comment 15

4 years ago
Yeah agreed :-) 
(Also, note I wasn't blaming the sheriffs in comment 12, it's going to be unavoidable at times :-))