Closed Bug 1180138 Opened 9 years ago Closed 9 years ago

Close old and in-active intermittent-failure bugs

Categories

(Tree Management Graveyard :: OrangeFactor, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: emorley)

Details

We've had numerous rounds of closing old and inactive intermittent-failure bugs; however, most of the time we've limited searches to bugs that have had comments by tbplbot, in order to reduce false positives.

However this ignores:
* bugs where there was only one occurrence, that were created before treeherder
* bugs that never worked for bug suggestions, and haven't occurred since we've been using treeherder
(Treeherder comments on bugs, even if they were not suggested, unlike TBPL)

In addition, we likely have a number of bugs that do have tbplbot comments, but are just new candidates since the last round of closures.

Currently...

Total intermittent failure bugs: 16018
Number still open: 5018
(https://bugzilla.mozilla.org/bzapi/count?keywords=intermittent-failure&resolution=---)
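
For anyone wanting to re-run these counts, here is a minimal sketch of how the query URL above can be built. The `count_url` helper is hypothetical, and it is an assumption that the "total" figure comes from the same query with the `resolution` parameter dropped. It only builds the URL and does not make a request.

```python
from urllib.parse import urlencode

# Endpoint taken from the URL quoted above.
BZAPI_COUNT = "https://bugzilla.mozilla.org/bzapi/count"


def count_url(keyword, resolution=None):
    """Build a bzapi count URL (hypothetical helper, no network access)."""
    params = {"keywords": keyword}
    if resolution is not None:
        # "---" is Bugzilla's value for "no resolution", i.e. still open.
        params["resolution"] = resolution
    return BZAPI_COUNT + "?" + urlencode(params)


open_url = count_url("intermittent-failure", "---")
# Assumption: omitting the resolution counts bugs in every state.
total_url = count_url("intermittent-failure")
```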

And so the bugspam begins (sorry!)...
The criteria I'm using for now are:
* Has a resolution of "---"
* Has the keyword "intermittent-failure"
* Hasn't been modified in the last 3 months
* Doesn't contain "leave-open" or "leave open" in the whiteboard
* Doesn't have the keyword "leave-open" set
* Contains none of the strings "random disabled marked fuzzy todo fails annotated time-bomb" in the whiteboard
* A comment contains either "getParsedLog" or "treeherder"
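
As a rough illustration, the criteria above can be expressed as a single predicate. This is only a sketch, not the actual search that was run: the bug is a plain dict whose field names (`resolution`, `keywords`, `whiteboard`, `last_change_time`, `comments`) are hypothetical, "3 months" is approximated as 90 days, and the case-insensitive substring matching is an assumption.

```python
from datetime import datetime, timedelta

# Whiteboard strings from the criteria above; any of them exempts the bug.
SKIP_WHITEBOARD = ("random", "disabled", "marked", "fuzzy", "todo",
                   "fails", "annotated", "time-bomb")
LEAVE_OPEN_WHITEBOARD = ("leave-open", "leave open")
COMMENT_MARKERS = ("getparsedlog", "treeherder")


def is_closure_candidate(bug, now, max_age=timedelta(days=90)):
    """Return True if `bug` (hypothetical dict shape) meets every criterion."""
    whiteboard = bug["whiteboard"].lower()
    keywords = set(bug["keywords"])
    comments = [c.lower() for c in bug["comments"]]
    return (
        bug["resolution"] == "---"
        and "intermittent-failure" in keywords
        # Not modified within the cut-off window (assumed to be 90 days).
        and now - bug["last_change_time"] > max_age
        and not any(s in whiteboard for s in LEAVE_OPEN_WHITEBOARD)
        and "leave-open" not in keywords
        and not any(s in whiteboard for s in SKIP_WHITEBOARD)
        # At least one comment mentions a log parser or Treeherder.
        and any(m in c for c in comments for m in COMMENT_MARKERS)
    )


# Made-up example bug: stale, unannotated, with a Treeherder comment.
example = {
    "resolution": "---",
    "keywords": ["intermittent-failure"],
    "whiteboard": "",
    "last_change_time": datetime(2015, 1, 10),
    "comments": ["log: treeherder job (example text)"],
}
candidate = is_closure_candidate(example, now=datetime(2015, 7, 3))
```

Passing `max_age` as a parameter makes it easy to tighten the window later without touching the other checks.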
Bug 1156105 would allow me to avoid spamming everyone, but unfortunately it's not landed yet.
I wonder if this really should apply also to other trees, like Thunderbird.
Are those failures no longer appearing just due to migrating to treeherder?
Flags: needinfo?(vseerror)
After several passes using the criteria in comment 1, I dropped the "comment contains" clause, lowered the timeframe from 3 months to ~4-6 weeks (depending on how many comments the bug had had) & just vetted the resultant bugs more carefully.

Remaining open intermittent bugs: 1611
Bugs closed (or keyword removed) today: 3407 \o/

Search showing bugs that were closed as WFM (you'll need to use show-all to see beyond the first 500):
http://mzl.la/1NBiwpq

(If you're coming here after a bug was incorrectly closed - the likely reason is that either the "leave-open" keyword was omitted after eg disabling a test, or you're using the "intermittent-failure" keyword for things that don't report to Treeherder [contrary to the keyword's description]. I've attempted to fix these cases prior to mass-closing, but there may be the odd one that slipped past.)

(In reply to :aceman from comment #3)
> I wonder if this really should apply also to other trees, like Thunderbird.
> Are those failures no longer appearing just due to migrating to treeherder?

Thunderbird still uses Treeherder, so if someone classifies failures there, the bugs will receive bug comments. The main thing is that the "intermittent-failure" keyword means "this is an intermittent failure on treeherder", so if the bug in question isn't about a test that is reporting to treeherder, it should use another keyword (I'm happy to create new ones if desired). Otherwise it can cause false positives in Treeherder's bug suggestions, as well as by liable for mass-closure.
s/by liable/be liable/
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
(In reply to Ed Morley [:emorley] from comment #4)
>...
> (In reply to :aceman from comment #3)
> > I wonder if this really should apply also to other trees, like Thunderbird.
> > Are those failures no longer appearing just due to migrating to treeherder?
> 
> Thunderbird still uses Treeherder, so if someone classifies failures there,
> the bugs will receive bug comments. The main thing is that the
> "intermittent-failure" keyword means "this is an intermittent failure on
> treeherder", so if the bug in question isn't about a test that is reporting
> to treeherder, it should use another keyword (I'm happy to create new ones
> if desired). Otherwise it can cause false positives in Treeherder's bug
> suggestions, as well as by liable for mass-closure.

I can't speak to this. Perhaps magnus or others

34 bugs closed WFM https://bugzilla.mozilla.org/buglist.cgi?keywords=intermittent-failure%2C%20&keywords_type=allwords&list_id=12367031&o1=nowordssubstr&chfieldfrom=2015-07-01&f1=component&o3=anywordssubstr&resolution=WORKSFORME&classification=Client%20Software&classification=Components&chfieldto=2015-07-04&query_format=advanced&f3=short_desc&f2=OP&product=MailNews%20Core&product=Thunderbird
Flags: needinfo?(vseerror) → needinfo?(mkmelin+mozilla)
(In reply to Ed Morley [:emorley] from comment #4)
> (In reply to :aceman from comment #3)
> > I wonder if this really should apply also to other trees, like Thunderbird.
> > Are those failures no longer appearing just due to migrating to treeherder?
> 
> Thunderbird still uses Treeherder, so if someone classifies failures there,
> the bugs will receive bug comments. The main thing is that the
> "intermittent-failure" keyword means "this is an intermittent failure on
> treeherder", so if the bug in question isn't about a test that is reporting
> to treeherder, it should use another keyword (I'm happy to create new ones
> if desired). Otherwise it can cause false positives in Treeherder's bug
> suggestions, as well as by liable for mass-closure.

More like I think the bugs were filed BEFORE TB was migrated to treeherder. So I don't know whether the tests are "reporting to treeherder". But I question that they are suddenly closed as if the underlying problems in tests are fixed, or no longer occurring. I am not sure migration to treeherder had that effect. If a keyword was a problem in those bugs, couldn't it just be removed?
(In reply to :aceman from comment #7)
> More like I think the bugs were filed BEFORE TB was migrated to treeherder.
> So I don't know whether the tests are "reporting to treeherder". 

You're welcome to do what you want with them - but of the ones I spot checked, there were reports from Treeherder. 

> But I
> question that they are suddenly closed as if the underlying problems in
> tests are fixed, or no longer occurring. I am not sure migration to
> treeherder had that effect. If a keyword was a problem in those bugs,
> couldn't it just be removed?

That's not what WORKSFORME means. 

In this case, if a failure of a test that reports to Treeherder (or TBPL) stops occurring, then as far as Treeherder is concerned, the test failure bug is now worksforme - ie: what could be reproduced before, cannot now.
To clarify a bit - I'm totally open to other projects setting different terms for these bugs - and if it weren't for the fact that they use the same keyword as the main products (and so can cause false positives in Treeherder), I wouldn't touch them.

We're having a major drive to make intermittent-failure bugs actionable - eg soon we will only file bugs for intermittents that occur more than N times (see bug 1179263) - and closing out old bugs is a part of this.

If there were bugs you didn't want closed, then my apologies and please do reopen (and either drop the "intermittent-failure" keyword or else add the "leave-open" keyword).
I have nothing to add - thunderbird uses treeherder just like other projects now. TBPL doesn't seem to even exist anymore.
Anyway, we can (re)create bugs if they start appearing again.
Flags: needinfo?(mkmelin+mozilla)
This has incorrectly closed bugs (e.g., bug 1123979) that were annotated as failing where the bug was being left open to cover fixing the underlying problem.  (I suspect there were more.)
(In reply to David Baron [:dbaron] ⌚UTC-7 from comment #11)
> This has incorrectly closed bugs (e.g., bug 1123979) that were annotated as
> failing where the bug was being left open to cover fixing the underlying
> problem.  (I suspect there were more.)

That bug was incorrectly marked. It needed one or both of: keywords:leave-open , or whiteboard:annotated*

A false positive rate is both expected and unavoidable, particularly if people haven't marked bugs correctly. Each time we've done these closing marathons, I've used a variety of saved searches to try and find cases like this (by searching for attachment descriptions that have things like "disable *" or similar in them), however looking at bug 1123979 the attachment there is really not named helpfully.

The odd bug being marked incorrectly is better than not closing any of the bugs at all.
(In reply to Ed Morley [:emorley] from comment #12)
> That bug was incorrectly marked. It needed one or both of:
> keywords:leave-open , or whiteboard:annotated*

In fact, there was an even wider range of attributes that would have prevented it from being closed, but it was missing all of them (see comment 1).
Arguably, this also bolsters the idea that intermittent-failure bugs should be resolved when they're not failing intermittently in CI, regardless of the "fix" that was used. Follow-up bugs are cheap.
Yeah agreed :-) 
(Also, note I wasn't blaming the sheriffs in comment 12, it's going to be unavoidable at times :-))
Product: Tree Management → Tree Management Graveyard