Closed Bug 1380439 Opened 6 years ago Closed 6 years ago

OrangeFactor enhancements to support intermittent triage of DOM bugs

Categories

(Tree Management Graveyard :: OrangeFactor, enhancement)

enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: gbrown, Assigned: gbrown)

References

(Blocks 1 open bug)

Details

(Whiteboard: [PI:July])

Attachments

(1 file, 1 obsolete file)

We are still discussing what is needed here, but perhaps something like setting Priority or a keyword for high-frequency intermittents.
we should list the specific components that will be used for this type of triage.
I cannot find product/component in OF, but I think we could just pull from bugzilla:

https://bugzilla.mozilla.org/rest/bug/<bugid>?include_fields=product,component
Yeah the bug data in ES is populated by bzcache, so only the fields here are present:
https://github.com/jonallengriffin/bzcache/blob/7a1d5044ac15fb783dd428084c7da8f7e4cd252d/bzcache/bzcache.py#L67
Whiteboard: [PI:July]
(In reply to Joel Maher ( :jmaher) (UTC-8) from comment #1)
> we should list the specific components that will be used for this type of
> triage.

Should they be the same as those used by https://bugzilla.mozilla.org/buglist.cgi?cmdtype=dorem&remaction=run&namedcmd=DOM-triage-withoutIPCorE10s&sharer_id=434964&list_id=13675201 ?
yes, lets do that
Using criteria:
 - >= 30 failures in 7 days
 - product/component as in triage query (comment 4)
 - priority != '--'
I currently find just a few bugs (which is probably good!):
 - bug 1292134 (stockwell needswork, P3)
 - bug 1222282 (no whiteboard, P3)
 - bug 1262224 (stockwell disabled, P3)

Of course we haven't "reset" priority on existing bugs yet, so there may be a lot of neglected Priority='--' bugs out there.
Your careful review is appreciated! 

Am I implementing the procedure you described to :overholt?

Output in test mode:

# Bug 1292134: {u'priority': u'P3', u'keywords': [u'intermittent-failure'], u'product': u'Core', u'whiteboard': u'[stockwell needswork]', u'component': u'DOM: Events'} :: set Priority=-- and whiteboard=[stockwell needswork:DOM]

# Bug 1222282: {u'priority': u'P3', u'keywords': [u'intermittent-failure'], u'product': u'Core', u'whiteboard': u'', u'component': u'DOM: IndexedDB'} :: set Priority=-- and whiteboard=[stockwell needswork:DOM]

# Bug 1262224: {u'priority': u'P3', u'keywords': [u'intermittent-failure', u'leave-open', u'mlk'], u'product': u'Core', u'whiteboard': u'[MemShrink:P2][stockwell disabled]', u'component': u'DOM: Service Workers'} :: set Priority=-- and whiteboard=[MemShrink:P2][stockwell needswork:DOM]
Attachment #8886397 - Flags: review?(jmaher)
Comment on attachment 8886397 [details] [diff] [review]
update priority and whiteboard for DOM triage trial

Review of attachment 8886397 [details] [diff] [review]:
-----------------------------------------------------------------

a few concerns, a great start so far.

::: woo_commenter.py
@@ +126,5 @@
> +        bug_info = bug_info['bugs'][0]
> +        if ((bug_info['product'] in products) and
> +            (bug_info['component'] in components) and
> +            ('intermittent-failure' in bug_info['keywords']) and
> +            (bug_info['priority'] != '--')):

this will reset priorities if they are already P1, P2, etc.  I believe we want to ignore P1, P2, and edit all the rest, unless we want the first round to be a larger volume.

Going into future runs of this code, will we want to ignore any other PX flags given that they would be set manually?

@@ +131,5 @@
> +
> +            # remove [stockwell xxx] from whiteboard, then add
> +            # [stockwell needswork:DOM]
> +            whiteboard = bug_info['whiteboard']
> +            whiteboard = re.sub("\[stockwell.*?\]", "", whiteboard)

this removes all stockwell tags, we only want [stockwell needswork] to be modifed, not [stockwell fixed] or [stockwell disabled], etc.

@@ +213,5 @@
>          print "Submitting comment to bug %s (%d occurrences)" % (bug_id, counts['total'])
>          submit_bug_comment(bmo_session, bug_id, text)
>  
> +    if options.weekly_mode:
> +        update_bugs_for_triage(options, bmo_session, bug_stats)

this will only update once/week, is that a good enough frequency?  Possibly for now and next week work on refining this?  I am thinking if the daily rate is posting (>15/day) then we should be marking this as needswork/P--
Attachment #8886397 - Flags: review?(jmaher) → review-
(In reply to Joel Maher ( :jmaher) (UTC-8) from comment #8)
> ::: woo_commenter.py
> @@ +126,5 @@
> > +        bug_info = bug_info['bugs'][0]
> > +        if ((bug_info['product'] in products) and
> > +            (bug_info['component'] in components) and
> > +            ('intermittent-failure' in bug_info['keywords']) and
> > +            (bug_info['priority'] != '--')):
> 
> this will reset priorities if they are already P1, P2, etc.  I believe we
> want to ignore P1, P2, and edit all the rest, unless we want the first round
> to be a larger volume.

Per https://wiki.mozilla.org/Platform/DOM_Bug_Triage, P2 bugs will be reviewed for "priority nomination" only "at least once a release". If a P2 is a frequent intermittent, I think it needs triage right away.

P1 bugs should be triaged weekly, so I agree we should not change priority for P1's:

 +            (bug_info['priority'] != '--') and
 +            (bug_info['priority'] != 'P1')):

> Going into future runs of this code, will we want to ignore any other PX
> flags given that they would be set manually?

I don't think so. I think we generally want action on frequent intermittents, regardless of previously assigned priority (except P1). Ignoring P2,P3,P4 would risk triage not noticing when, for instance, an intermittent that was dormant, assigned P2/3/4, later flared up as a frequent failure.
 
> @@ +131,5 @@
> > +
> > +            # remove [stockwell xxx] from whiteboard, then add
> > +            # [stockwell needswork:DOM]
> > +            whiteboard = bug_info['whiteboard']
> > +            whiteboard = re.sub("\[stockwell.*?\]", "", whiteboard)
> 
> this removes all stockwell tags, we only want [stockwell needswork] to be
> modifed, not [stockwell fixed] or [stockwell disabled], etc.

Are you sure? In my manual stockwell triaging, if a bug is [stockwell fixed] or [stockwell disabled] and now it is failing frequently, I change it to [stockwell needswork], not "[stockwell fixed][stockwell needswork]".

> @@ +213,5 @@
> >          print "Submitting comment to bug %s (%d occurrences)" % (bug_id, counts['total'])
> >          submit_bug_comment(bmo_session, bug_id, text)
> >  
> > +    if options.weekly_mode:
> > +        update_bugs_for_triage(options, bmo_session, bug_stats)
> 
> this will only update once/week, is that a good enough frequency?  Possibly
> for now and next week work on refining this?  I am thinking if the daily
> rate is posting (>15/day) then we should be marking this as needswork/P--

I thought I would consider a daily change next week, in a separate bug. I think it is higher risk, and wasn't mentioned in your email to overholt/htsai.
Comment on attachment 8886397 [details] [diff] [review]
update priority and whiteboard for DOM triage trial

Review of attachment 8886397 [details] [diff] [review]:
-----------------------------------------------------------------

::: woo_commenter.py
@@ +32,5 @@
>  PRIORITY1_THRESHOLD = 75
>  PRIORITY2_THRESHOLD = 30
>  
> +BZ_API_URL = 'https://bugzilla.mozilla.org/rest/bug/%s'
> +COMMENT_API_URL = BZ_API_URL+'/comment'

Sadly we have no teston on this repo - but if we were running flake8 it would say to add spaces around the `+` operator here and below :-)

@@ +33,5 @@
>  PRIORITY2_THRESHOLD = 30
>  
> +BZ_API_URL = 'https://bugzilla.mozilla.org/rest/bug/%s'
> +COMMENT_API_URL = BZ_API_URL+'/comment'
> +TRIAGE_FIELDS = '?include_fields=product,component,priority,whiteboard,keywords'

With the requests library it's possible (and slightly more common) to pass the parameters via the `params` kwarg rather than by adding to the URL. This is slightly more readable from a code point of view, and an added bonus is that requests does automatic URL encoding for it. See:
http://docs.python-requests.org/en/master/user/quickstart/#passing-parameters-in-urls

@@ +85,5 @@
> +
> +def submit_bug_triage_update(bmo_session, bug_id, priority, whiteboard):
> +    try:
> +        params = {'priority': priority, 'whiteboard': whiteboard}
> +        r = bmo_session.post(BZ_API_URL % bug_id, json=params, timeout=30)

This can probably be generalised given there's also a `submit_bug_comment` method? Perhaps rename it to `submit_bug_change` and make it accept an additional parameter that will be passed to the `json` kwarg?
(In reply to Ed Morley [:emorley] from comment #10)
> @@ +85,5 @@
> > +
> > +def submit_bug_triage_update(bmo_session, bug_id, priority, whiteboard):
> > +    try:
> > +        params = {'priority': priority, 'whiteboard': whiteboard}
> > +        r = bmo_session.post(BZ_API_URL % bug_id, json=params, timeout=30)
> 
> This can probably be generalised given there's also a `submit_bug_comment`
> method? Perhaps rename it to `submit_bug_change` and make it accept an
> additional parameter that will be passed to the `json` kwarg?

About that...

submit_bug_comment() posts to "https://bugzilla.mozilla.org/rest/bug/<bug-id>/comment"; do you know why "/comment" is there?. My understanding of https://bmo.readthedocs.io/en/latest/api/core/v1/bug.html#update-bug is that it shouldn't be...but it obviously works. I was going to make one general method, but then realized this special /comment was there and didn't want to change it.


Thanks for the drive-by Ed!
good point on the priority for -- and P1.

As for the whiteboard, there are many cases where we have fixed a bug on Thursday or Friday and the weekly count is still >30, but there is no work todo, it has been marked [stockwell fixed], and we move on.  I did overlook what was mentioned about a bug becoming more active despite and previous whiteboard tags.  Could we exclude bugs that:
* are marked as fixed (not stockwell fixed, but the resolution flag fixed in bugzilla)
* have a pending needinfo or review

I suspect it is ok to retriage, this way it is clear that we have outstanding intermittent failures and they need attention!!


One other thought, this will make a second pass through the bugs, so there will be 2 comments from orange factor:
1) weekly summary
2) adjusting the priority/whiteboard

should we find some method to make this a single comment from orange factor?  Maybe a followup bug?
(In reply to Geoff Brown [:gbrown] from comment #11)
> submit_bug_comment() posts to
> "https://bugzilla.mozilla.org/rest/bug/<bug-id>/comment"; do you know why
> "/comment" is there?. My understanding of
> https://bmo.readthedocs.io/en/latest/api/core/v1/bug.html#update-bug is that
> it shouldn't be...but it obviously works. I was going to make one general
> method, but then realized this special /comment was there and didn't want to
> change it.

Ah there are two ways to post a comment, the one I'd used before was:
https://bmo.readthedocs.io/en/latest/api/core/v1/comment.html#create-comments

...however via the main bug endpoint works too, so guess both can use that?
Blocks: 1381122
(In reply to Joel Maher ( :jmaher) (UTC-8) from comment #12)
> As for the whiteboard, there are many cases where we have fixed a bug on
> Thursday or Friday and the weekly count is still >30, but there is no work
> todo, it has been marked [stockwell fixed], and we move on.  I did overlook
> what was mentioned about a bug becoming more active despite and previous
> whiteboard tags.  Could we exclude bugs that:
> * are marked as fixed (not stockwell fixed, but the resolution flag fixed in
> bugzilla)
> * have a pending needinfo or review

But if the bug was fixed, it should be a P1 and already excluded from our priority/whiteboard modification. Similarly, if there's a patch under review, presumably it is a P1. Pending ni...probably someone is working the bug and it is a P1, or it is awaiting triage, with P=--; in either case, it is already excluded.

I'd prefer to try this with the very simple exclusion based only on Priority. I think it might work well enough, and I am concerned that some of these other exclusions might make us miss some bugs that need triage attention (even though they are marked as fixed, or someone is ni'd, etc).

> I suspect it is ok to retriage, this way it is clear that we have
> outstanding intermittent failures and they need attention!!

Exactly!

> One other thought, this will make a second pass through the bugs, so there
> will be 2 comments from orange factor:
> 1) weekly summary
> 2) adjusting the priority/whiteboard
> 
> should we find some method to make this a single comment from orange factor?
> Maybe a followup bug?

Bug 1381122.
ok, I am on board with trying this out by only filtering on P-- and P1 :)
I've added in the P1 exclusion and implemented submit_bug_change().
Attachment #8886397 - Attachment is obsolete: true
Attachment #8886701 - Flags: review?(jmaher)
Comment on attachment 8886701 [details] [diff] [review]
update priority and whiteboard for DOM triage trial

Review of attachment 8886701 [details] [diff] [review]:
-----------------------------------------------------------------

lets go with this.  if we deploy today, that means that we need to have the existing ~800 bugs edited properly so next week we can update the queries properly.
Attachment #8886701 - Flags: review?(jmaher) → review+
~800 bugs have been updated with P5 priority.
Deployed.

sudo su webtools -
[webtools@brasstacks1.dmz.scl3 gbrown]$ cd ~/apps/orangefactor/src/orangefactor/ && hg pull -uv
pulling from https://hg.mozilla.org/automation/orangefactor/
warning: hg.mozilla.org certificate with fingerprint 73:7f:ef:ab:68:0f:49:3f:88:91:f0:b7:06:69:fd:8f:f2:55:c9:56 not verified (check hostfingerprints or web.cacerts config setting)
searching for changes
all local heads known remotely
adding changesets
adding manifests
adding file changes
added 1 changesets with 1 changes to 1 files
resolving manifests
getting woo_commenter.py
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Depends on: 1381504
Depends on: 1381535
See Also: → 1381587
Blocks: 1307197
No longer blocks: 1381122
Depends on: 1381122
Depends on: 1382273
See Also: → 1390558
Product: Tree Management → Tree Management Graveyard
You need to log in before you can comment on or make changes to this bug.