Bug filer should create new intermittent-failure bugs with priority P5

RESOLVED FIXED


People

(Reporter: gbrown, Assigned: KWierso)

Tracking

(Blocks: 2 bugs)

Attachments

(1 attachment)

Currently, new bugs for intermittent failures are created by the bug filer with Priority='--'; I am proposing we change that to Priority='P5'.


Priority='--' signals "needs triage" for bugs not marked as intermittent-failure. Intermittent-failure bugs have generally been excluded from triage procedures because their volume is overwhelming. :jmaher and I want to change that by making intermittent-failure bugs P5 by default, and then having the OrangeFactor (OF) robot clear the priority for high-frequency failures.

We are starting a trial of this idea, limited to a group of DOM components. Unfortunately, in the first 24 hours of the trial, we already see more new, likely low-frequency intermittent failures with P=-- than high-frequency failures identified by the OF robot. To avoid "cluttering" triage of intermittent failures with new bugs, we want the bug filer to create new bugs for intermittent failures with Priority P5.
For the DOM trial, we would only need the P5 behavior for intermittent-failure bugs belonging to certain components.

But in the long term, we'll likely want this behavior for all intermittent-failure bugs, regardless of component. I think using P5 is generally "safe" and reasonable for all new intermittent failure bugs, and would simplify the implementation.
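
The two options differ only in whether the component is consulted when the bug is filed. A minimal sketch in Python, assuming hypothetical names (`initial_priority`, `trial_components`) and example component names that are not taken from the Treeherder code base:

```python
from typing import Iterable, Optional, Set

INTERMITTENT_KEYWORD = "intermittent-failure"


def initial_priority(
    keywords: Iterable[str],
    component: str,
    trial_components: Optional[Set[str]] = None,
) -> str:
    """Return the priority a newly filed bug would start with.

    Hypothetical helper for illustration only.
    With trial_components=None, every intermittent-failure bug gets P5.
    With a set, only intermittent-failure bugs in those components do.
    Everything else keeps '--', which signals "needs triage".
    """
    if INTERMITTENT_KEYWORD not in set(keywords):
        return "--"
    if trial_components is not None and component not in trial_components:
        return "--"
    return "P5"


# Option 1: P5 for all intermittent failures, regardless of component.
all_components = initial_priority(["intermittent-failure"], "Networking")

# Option 2: P5 only inside the trial group ("DOM" is an example name).
trial_only = initial_priority(["intermittent-failure"], "Networking", {"DOM"})
```

Dropping the component check removes one input and one branch, which is the simplification mentioned above.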

:jmaher -- What do you think: P5 for all intermittent failures, or limit it to the trial components for now?
Flags: needinfo?(jmaher)
(Assignee)

Updated

a year ago
Assignee: nobody → wkocher
P5 indicates a bug which Mozilla staff will not fix, but would accept a patch from the community. Do you mean P3 instead?
Flags: needinfo?(gbrown)
I prefer P5, but I am open to P3.

There are too many intermittent failure bugs; we can only possibly investigate a small percentage of them. I'd like to see Mozilla staff spending their time on frequent intermittent failures while generally ignoring less-frequent failures: the frequent ones are easier to reproduce and debug and have a better chance of making a bigger impact on the product. When new bugs are filed, we have no history yet, so I'd like them to have the lowest priority until it's proven that they are important (at which time the OF bot should mark them for triage).

Your thoughts?
Flags: needinfo?(gbrown)
P3 indicates it's part of a backlog of future work to be reviewed. 

We have a process in place to mark intermittent bugs to be resolved as incomplete if they go three weeks without a comment.

Between the two processes, we should be down to a backlog of intermittent bugs that I hope would be reasonable.
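
The three-week rule can be sketched as a simple age check. This is only an illustration: the function name is hypothetical, and treating exactly three weeks as stale is an assumption, not a description of the actual triage script.

```python
from datetime import datetime, timedelta

# From the comment above: three weeks without a comment.
STALE_AFTER = timedelta(weeks=3)


def should_resolve_incomplete(last_comment_at: datetime, now: datetime) -> bool:
    """Hypothetical check: has the bug gone three weeks without a comment?

    Assumption: a bug whose last comment is exactly three weeks old
    already counts as stale.
    """
    return now - last_comment_at >= STALE_AFTER


now = datetime(2017, 7, 31)
stale = should_resolve_incomplete(datetime(2017, 7, 1), now)    # 30 days old
active = should_resolve_incomplete(datetime(2017, 7, 20), now)  # 11 days old
```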
(Assignee)

Comment 5

a year ago
This will all be better whenever we can get OFv2 up and running and bugs stop being the source of truth for tracking intermittent failures. Then we'd only file bugs for the intermittent failures that pop up frequently enough to become an issue. 

Happy to change what the filer does for now to whatever you end up deciding. At the moment, it doesn't set a priority, but it does bump 'severity' up to 'critical' for failures that include a crash signature.

Are the triage processes you describe used across the board, Emma, or are only some products/components using them? I suppose we could opt in to setting priorities to 3 or 5 based on product/component, but that might become unmanageable pretty quickly.
I am fine with all bugs being P5. We know that filing a bug for every issue is the current, undesirable behavior, and 90% of the intermittents will never get attention from developers. If we need a different priority, then yes, let's choose a different value.

Our goal for doing this is to allow triagers to stop ignoring the intermittent-failure keyword and to surface only bugs that are frequent enough to consider investigating and resolving. If these are all P3, then we need to make the queries more complex for the DOM team (which is doable). Ideally, this is something we can prove out as common for all teams, so they can stop ignoring intermittent-failure in their triage queries.
Flags: needinfo?(jmaher)
Created attachment 8887283 [details] [review]
[treeherder] KWierso:b1381587 > mozilla:master
(Assignee)

Comment 8

a year ago
Comment on attachment 8887283 [details] [review]
[treeherder] KWierso:b1381587 > mozilla:master

Totally untested beyond whatever Travis says, but I think this should work. :)

Or would it be better to not send a priority from the UI in all cases and just have the api's end assume priority is P5 unless otherwise set?
Attachment #8887283 - Flags: review?(emorley)
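
The alternative raised in comment 8, where the API side assumes P5 unless the caller sets something else, might look roughly like the sketch below. It is not the attached patch: the function name and the dictionary layout are hypothetical. The severity bump for crash signatures follows the behavior described in comment 5.

```python
from typing import Any, Dict, Optional

DEFAULT_PRIORITY = "P5"


def build_bug_fields(
    summary: str,
    priority: Optional[str] = None,
    crash_signature: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical server-side helper: assemble fields for a new bug.

    The priority falls back to P5 when the caller (e.g. the UI) sends none.
    """
    fields: Dict[str, Any] = {
        "summary": summary,
        "keywords": ["intermittent-failure"],
        "priority": priority or DEFAULT_PRIORITY,
    }
    if crash_signature:
        # Comment 5: failures with a crash signature get severity 'critical'.
        fields["severity"] = "critical"
    return fields


default_fields = build_bug_fields("Intermittent test failure")
explicit_fields = build_bug_fields(
    "Intermittent crash", priority="P3", crash_signature="example_signature"
)
```

Keeping the default in one place on the server means the UI does not need to know the policy, and a later change of value touches a single constant.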
(In reply to Wes Kocher (:KWierso) from comment #8)

> Or would it be better to not send a priority from the UI in all cases and
> just have the api's end assume priority is P5 unless otherwise set?

That would work for me, my triage reports ignore intermittent bugs.
Comment on attachment 8887283 [details] [review]
[treeherder] KWierso:b1381587 > mozilla:master

(In reply to Emma Humphries ☕️ (she/her) [:emceeaich] (UTC-8) +needinfo me from comment #9)
> (In reply to Wes Kocher (:KWierso) from comment #8)
> 
> > Or would it be better to not send a priority from the UI in all cases and
> > just have the api's end assume priority is P5 unless otherwise set?
> 
> That would work for me, my triage reports ignore intermittent bugs.

Wes was referring to where in Treeherder to set the priority as P5, rather than anything to do with the triage scripts.

That said, it sounds like we're still not sure what we want the priority set to? Cancelling review until that's confirmed.
Attachment #8887283 - Flags: review?(emorley)
(In reply to Emma Humphries ☕️ (she/her) [:emceeaich] (UTC-8) +needinfo me from comment #4)
> P3 indicates it's part of a backlog of future work to be reviewed. 
> 
> We have a process in place to mark intermittent bugs to be resolved as
> incomplete (if they go three weeks without a comment.)
> 
> Between the two processes, we should be down to a backlog of intermittent
> bugs that I hope would be reasonable.

I checked in with :jmaher today and we feel P5 is the most appropriate choice here. 

We are only trying to get triage owners to look at the worst intermittent failures, as identified by OrangeFactor. Are the remainder P3 or P5? I do not think it is realistic to view those (new and low-frequency) bugs as a backlog that will get attention eventually, and I don't see much value in reviewing them, so I say P5. Is that reasonable, or do you feel strongly for P3?
Flags: needinfo?(ehumphries)
I think that would be okay, since we're keeping these bugs out of the "main sequence" for triage purposes.

If any intermittent becomes an issue requiring engineering attention, then the priority should be reset to '--' so that it shows up in a queue of bugs to triage.

This will also require some changes to my queries, so I'll track that in a related bug.
Flags: needinfo?(ehumphries)
(In reply to Emma Humphries ☕️ (she/her) [:emceeaich] (UTC-8) +needinfo me from comment #12)
> I think that would be okay, since we're keeping these bugs out of the "main
> sequence" for triage purposes.

Great. Thanks.
 
> If any intermittent becomes an issue requiring engineering attention,
> then the priority should be reset to '--' so that it shows up in a queue of
> bugs to triage. 

Absolutely. That is exactly what we've taught the OF bot to do for frequent intermittents (bug 1380439); that behavior is initially limited - for a trial period - to a set of components overseen by :overholt and :hsinyi.
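
As a rough illustration of that reset (not the OF bot's actual code), the rule could be sketched as follows. The function name, the threshold value, and the component names are hypothetical, and only clearing a priority that is still the default P5 is an assumption made here so that a priority set deliberately by a triager is left alone.

```python
from typing import Set


def priority_after_frequency_check(
    current_priority: str,
    component: str,
    failure_count: int,
    threshold: int,
    trial_components: Set[str],
) -> str:
    """Hypothetical sketch: reset a frequent intermittent to '--'.

    The reset applies only within the trial components, and (assumption)
    only when the bug still carries the default P5.
    """
    is_frequent = failure_count >= threshold
    in_trial = component in trial_components
    if in_trial and is_frequent and current_priority == "P5":
        return "--"  # '--' puts the bug back in the "needs triage" queue
    return current_priority


# Example values only; the real threshold is not stated in this bug.
frequent = priority_after_frequency_check("P5", "DOM", 40, 30, {"DOM"})
rare = priority_after_frequency_check("P5", "DOM", 5, 30, {"DOM"})
```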
 
> This will also require some changes to my queries, so I'll track that in a
> related bug.

We would like to eventually see the intermittent-failure exclusion removed from your triage queries (or perhaps have 2 queries, one for intermittents and one for everything else), but we want to see how things go with our trial first.
(In reply to Geoff Brown [:gbrown] from comment #13)
> We would like to eventually see the intermittent-failure exclusion removed
> from your triage queries (or perhaps have 2 queries, one for intermittents
> and one for everything else), but we want to see how things go with our
> trial first.

Bug 1381958 will do that.
Attachment #8887283 - Flags: review?(emorley)
Attachment #8887283 - Flags: review?(emorley) → review+

Comment 15

a year ago
Commit pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/2ce4b08d7a090a4640be419db334076d887f6c54
Bug 1381587 - Make the bug filer set new bugs to P5 priority (#2640) r=emorley
(Assignee)

Updated

a year ago
Status: NEW → RESOLVED
Last Resolved: a year ago
Resolution: --- → FIXED
I'm still seeing new intermittent-failure bugs created with Priority=--. For example, bug 1383546, filed today. Is that expected?
Flags: needinfo?(wkocher)
(Assignee)

Comment 17

a year ago
I don't believe there has been a Treeherder production push recently to pick this up.[1]

Should definitely happen sometime this coming week.



1. https://whatsdeployed.io/?owner=mozilla&repo=treeherder&name[]=Stage&url[]=https://treeherder.allizom.org/revision.txt&name[]=Prod&url[]=https://treeherder.mozilla.org/revision.txt
Flags: needinfo?(wkocher)