Closed Bug 790571 Opened 7 years ago Closed 7 years ago

Intermittent failures should use the keyword 'intermittent-failure', not whiteboard '[orange]', for faster bzapi searches

Categories

(Tree Management Graveyard :: TBPL, defect)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: emorley)

References

(Blocks 1 open bug)

Details

(Whiteboard: [sheriff-want])

Attachments

(4 obsolete files)

tl;dr: For improved TBPL bzapi performance we should switch from whiteboard [orange] to using a keyword.

--

Currently intermittent failures are filed with:
* [orange] in the whiteboard
* Marked as blocking bug with alias 'randomorange'
* [Optionally]: with [red]/[purple] in the whiteboard in addition to the [orange], to indicate the colour(s) under which they appear on TBPL. (This leads to the silly situation, where we have bugs filed with whiteboard "[orange][red]", when they never show up as orange).

TBPL uses the following bzapi call to find bug suggestions:
> https://api-dev.bugzilla.mozilla.org/latest/bug?whiteboard=orange&summary=" . urlencode($fileName)
(http://hg.mozilla.org/users/mstange_themasta.com/tinderboxpushlog/file/b3f9daa143ce/php/inc/AnnotatedSummaryGenerator.php#l100)

These calls take ~5 seconds each (looking at tbpl-dev's pre-fetching log output: https://tbpl-dev.allizom.org/cache/), which means that the annotated summary generation sometimes hits the 30 second timeout, for failures with multiple filenames that need bug suggestions. We now have a bugscache (about to be pushed to tbpl production), but we have to clear out entries older than 3 hours (so we can detect newly filed bugs) & there is also a long tail of results - so even if we handled refreshing results more efficiently, we would still gain a lot from improving the initial bzapi call time.
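
(Editorial sketch, not part of the original comment.) The 3-hour expiry described above could look roughly like the following. The class name `BugsCache`, the in-memory dict and the injectable `clock` are assumptions made for illustration; TBPL's real bugscache is not shown in this bug and may work quite differently.

```python
import time

# 3 hours, the expiry window mentioned in the comment above.
CACHE_TTL_SECONDS = 3 * 60 * 60


class BugsCache:
    """Hypothetical in-memory stand-in for a bug-suggestion cache."""

    def __init__(self, ttl=CACHE_TTL_SECONDS, clock=time.time):
        self._ttl = ttl
        self._clock = clock      # injectable so expiry can be tested
        self._entries = {}       # filename -> (stored_at, bugs)

    def get(self, filename):
        """Return cached bugs, or None if missing or older than the TTL."""
        entry = self._entries.get(filename)
        if entry is None:
            return None
        stored_at, bugs = entry
        if self._clock() - stored_at > self._ttl:
            # Expired: drop it so newly filed bugs get picked up on refetch.
            del self._entries[filename]
            return None
        return bugs

    def put(self, filename, bugs):
        """Store the bug suggestions for a filename with the current time."""
        self._entries[filename] = (self._clock(), bugs)
```

Every expired entry costs a fresh bzapi call on the next lookup, which is why the speed of the underlying search still matters even with a cache in front of it.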

As such, I believe we should:

* Mass add <keyword of bikeshedding choice> to all bugs that have [orange] in the whiteboard, ideally without causing bugspam (glob,dkl: presume we can run a script on the server to avoid bugmail?) - leaving [orange] for now, so we can test/fallback easily.

* Update the bzapi call in AnnotatedSummaryGenerator.php: s/whiteboard=orange/keyword:whatever_chosen/

* Update OrangeFactor http://hg.mozilla.org/automation/orangefactor/file/0894762c1567/server/handlers.py#l365 , http://hg.mozilla.org/automation/orangefactor/file/0894762c1567/server/handlers.py#l438 and http://hg.mozilla.org/automation/orangefactor/file/0894762c1567/server/handlers.py#l839 (ideally reducing some of that duplication whilst we're at it).

* Update the new bugzilla OrangeFactor extension: http://bzr.mozilla.org/bmo/4.2/annotate/head:/extensions/OrangeFactor/Extension.pm#L33

* Post to dev.{platform,planning,...} informing developers of the change - so they know to file new oranges under the new keyword instead of whiteboard.

* See if it is possible to either mass-update saved searches (if there are [m]any) referring to [orange], or at least make a list so we can email devs. I know Ehsan's whine set up to catch bugs marked as blocking alias:randomorange but missing sw:[orange] will need updating etc.

* Once everything is working as expected, remove [orange] from all bugs, ideally without causing bugmail + add the keyword to any stragglers that have crept in since step #1.

* I will set up a whine to email me if people try and use [orange] in the future, so I can update the bugs & remind them about the keyword.
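
(Editorial sketch, not part of the original comment.) The two bulk-update passes in the plan above, first adding the keyword while leaving [orange] in place, then later stripping [orange], can be illustrated on plain dictionaries. The field names `whiteboard` and `keywords`, the helper names, and the use of 'intermittent-failure' (the keyword this bug eventually settled on) are assumptions for illustration; the real server-side script is not shown here, and variants such as [tb-orange] are not handled.

```python
import re

KEYWORD = "intermittent-failure"
ORANGE_TAG = re.compile(r"\[orange\]", re.IGNORECASE)


def add_keyword(bug):
    """First pass: add the keyword to [orange] bugs, leaving the whiteboard alone."""
    updated = dict(bug)
    keywords = list(bug.get("keywords", []))
    if ORANGE_TAG.search(bug.get("whiteboard", "")) and KEYWORD not in keywords:
        keywords.append(KEYWORD)
    updated["keywords"] = keywords
    return updated


def strip_orange(bug):
    """Final pass: catch stragglers, then remove [orange] from the whiteboard."""
    updated = add_keyword(bug)
    updated["whiteboard"] = ORANGE_TAG.sub("", bug.get("whiteboard", "")).strip()
    return updated
```

Keeping the passes separate mirrors the plan: the first can run early so the new search can be tested with the old whiteboard tag still available as a fallback.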

glob,dkl: 

* Do the bugzilla parts of that sound ok? 
* Should we go with the new format used by releng (/proposed on the newsgroups a while ago, although doesn't seem to have taken off as much), ie like: #relman/triage/defer-to-group ?

Everyone: 

* for keyword choice, I think we should go with something other than 'orange' to make it clear that it includes red/purple too. 'intermittent-failure' ? (or '#{tree,sheriff}/tbpl/intermittent-failure' if we go with that format)

* Other than TBPL, OrangeFactor & the b.m.o OrangeFactor extension, does anything else need updating to support the keyword?
Summary: Switch intermittent failures from whiteboard '[orange]' to using a keyword, to improve TBPL bzapi performance → Intermittent failures should use a keyword, not whiteboard '[orange]', for faster bzapi searches
Whiteboard: [sheriff-want]
(In reply to Ed Morley [:edmorley UTC+1] from comment #0)
> * Mass add <keyword of bikeshedding choice> to all bugs that have [orange]
> in the whiteboard, ideally without causing bugspam (glob,dkl: presume we can
> run a script on the server to avoid bugmail?) - leaving [orange] for now, so
> we can test/fallback easily.

yes, we can throw together a script to perform this action without triggering bugmail.

> * See if it is possible to either mass-update saved searches (if there are
> [m]any) referring to [orange], or at least make a list so we can email devs.
> I know Ehsan's whine set up to catch bugs marked as blocking
> alias:randomorange but missing sw:[orange] will need updating etc.

bulk updating saved searches is possible, but i don't think it's something we've done before, and would require careful testing.

> * Once everything is working as expected, remove [orange] from all bugs,
> ideally without causing bugmail + add the keyword to any stragglers that
> have crept in since step #1.

no problems.


> * Should we go with the new format used by releng (/proposed on the
> newsgroups a while ago, although doesn't seem to have taken off as much), ie
> like: #relman/triage/defer-to-group ?

that's up to you; however as you noted the proposal never caught on, so it's probably better to opt for a simple and easy to remember keyword instead.
Currently we use tb-orange for Thunderbird orange bugs. Please can we accommodate that here as well?
(In reply to Byron Jones ‹:glob› from comment #1)
> yes, we can throw together a script to perform this action without
> triggering bugmail.

Great :-)

> bulk updating saved searches is possible, but i don't think it's something
> we've done before, and would require careful testing.

In that case let's skip that and instead rely on announcing this widely.

> > * Should we go with the new format used by releng (/proposed on the
> > newsgroups a while ago, although doesn't seem to have taken off as much), ie
> > like: #relman/triage/defer-to-group ?
> 
> that's up to you; however as you noted the proposal never caught on, so it's
> probably better to opt for a simple and easy to remember keyword instead.

Agreed.

(In reply to Mark Banner (:standard8) (slow responses until 18th Sept) from comment #2)
> Currently we use tb-orange for Thunderbird orange bugs. Please can we
> accommodate that here as well?

Do we need to separate them on keyword, or will bugzilla product be sufficient?

If we do still want to use a different keyword, then we might as well get TBPL to be smart about the searches - and use a different search depending on the tree. (Which would avoid the current situation where we occasionally get mailnews/seamonkey oranges bug suggestions showing up on mozilla-central etc)

Obviously subject to change due to the above, but I propose:

keyword: intermittent-failure (or {intermittent-failure,intermittent-failure-tb,...})

blocks: randomorange 
(though we could possibly stop doing this? would save trying to keep the two in sync, and we could set up saved searches/whines based on keywords to replace the use cases that CCing to the randomorange bug facilitates)

whiteboard: Do we still need to add [red] [purple] (rarely searched, just for info); or should we just ditch this?
Attached patch TBPL Part (obsolete) — Splinter Review
Assignee: nobody → bmo
Status: NEW → ASSIGNED
Attached patch OrangeFactor part (obsolete) — Splinter Review
Summary: Intermittent failures should use a keyword, not whiteboard '[orange]', for faster bzapi searches → Intermittent failures should use the keyword 'intermittent-failure', not whiteboard '[orange]', for faster bzapi searches
Attachment #661801 - Flags: review?(arpad.borsos)
Attachment #661803 - Flags: review?(mcote)
Attachment #661801 - Flags: review?(arpad.borsos) → review+
Thank you Arpad :-)

Left to do:
* Bugzilla's OrangeFactor extension.
* Request for feedback on the newsgroups, to ensure nothing else needs patching.
* Creating & running the script to add the keyword to existing bugs.
* Land everything, push to prod
* Test then announce to newsgroups etc.
* Create/run script to remove [orange] from existing bugs.
(In reply to Ed Morley (Away 18th-20th) [:edmorley UTC+1] from comment #6)
> Thank you Arpad :-)
> 
> Left to do:
> * Bugzilla's OrangeFactor extension.

There is an "Extensions: OrangeFactor" component under the bugzilla.mozilla.org product. Please file the request there to update the extension based on the new criteria. Make it dependent on the script bug to add the keyword and clean the whiteboard.

dkl

> * Request for feedback on the newsgroups, to ensure nothing else needs
> patching.
> * Creating & running the script to add the keyword to existing bugs.
> * Land everything, push to prod
> * Test then announce to newsgroups etc.
> * Create/run script to remove [orange] from existing bugs.
Depends on: 791709
Depends on: 791721
No longer depends on: 791721
Depends on: 791758
(In reply to Ed Morley (Away 18th-20th) [:edmorley UTC+1] from comment #6)
> * Request for feedback on the newsgroups, to ensure nothing else needs
> patching.

https://groups.google.com/d/topic/mozilla.dev.platform/3-QnNjxlXdg/discussion
Blocks: 790889
Comment on attachment 661803 [details] [diff] [review]
OrangeFactor part

Review of attachment 661803 [details] [diff] [review]:
-----------------------------------------------------------------

Fine aside from the mistake in the bzAPI URL below. Don't forget to change bzcache appropriately. :) r+ with the URL fix.

::: server/handlers.py
@@ +441,2 @@
>  
> +        apiURL = BZAPI_SERVER + searchType + bzquery + "&include_fields=id,summary,status&keyword=intermittent-failure"

"keyword" should be "keywords". The query as it is above doesn't actually filter by keyword. :)
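
(Editorial sketch, not part of the original review.) The corrected query can be built so that the parameter name lives in one place. The server URL and `bug` path come from comment 0, and the field list from the patch line quoted above; the function name `build_search_url` is hypothetical, and the actual patch concatenates strings rather than using `urlencode`.

```python
from urllib.parse import urlencode

# Server URL as quoted in comment 0.
BZAPI_SERVER = "https://api-dev.bugzilla.mozilla.org/latest/"


def build_search_url(file_name):
    """Build a bzapi bug search filtered by keyword and summary text."""
    params = {
        # Must be "keywords" (plural); per the review, "keyword" does not filter.
        "keywords": "intermittent-failure",
        "summary": file_name,
        "include_fields": "id,summary,status",
    }
    return BZAPI_SERVER + "bug?" + urlencode(params)
```

`urlencode` also escapes the filename, which the TBPL call in comment 0 does with PHP's `urlencode($fileName)`.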
Attachment #661803 - Flags: review?(mcote) → review+
Hrm, I've just spotted this, what are we going to do for Thunderbird bugs that have been using "[tb-orange]" up until now? This worked because tbpl only looked for the string of "orange".

If we're happy to re-use just "intermittent-failure" as a keyword, then I'm happy for that from a Thunderbird perspective, but I can't assess how that will impact the other systems that use the orange notification.
(In reply to Mark Banner (:standard8) from comment #12)
> Hrm, I've just spotted this

You asked in comment 2 already about this, to which I replied in comment 3 with a few questions... :-)
(In reply to Ed Morley (Away 18th-20th) [:edmorley UTC+1] from comment #13)
> You asked in comment 2 already about this, to which I replied in comment 3
> with a few questions... :-)

Gah, sorry.

(In reply to Ed Morley (Away 18th-20th) [:edmorley UTC+1] from comment #3)
> (In reply to Mark Banner (:standard8) (slow responses until 18th Sept) from
> comment #2)
> > Currently we use tb-orange for Thunderbird orange bugs. Please can we
> > accommodate that here as well?
> 
> Do we need to separate them on keyword, or will bugzilla product be
> sufficient?

There are the cases where we get a Thunderbird specific orange caused by FF tests, and the bug belongs to somewhere in core, though normally that's a permanent orange but using the orange annotation has been useful for picking it up on TBPL.

> 
> If we do still want to use a different keyword, then we might as well get
> TBPL to be smart about the searches - and use a different search depending
> on the tree. (Which would avoid the current situation where we occasionally
> get mailnews/seamonkey oranges bug suggestions showing up on mozilla-central
> etc)

That's also a possibility, though sometimes it can be useful to see if something has migrated from single app to cross app, also note that there's various core oranges that all apps see.

> Obviously subject to change due to the above, but I propose:
> 
> keyword: intermittent-failure (or
> {intermittent-failure,intermittent-failure-tb,...})

I guess I'm easy either way; it is more a question of whether we need to keep these separate for things like OrangeFactor, or if they will just handle it anyway.

> blocks: randomorange 
> (though we could possibly stop doing this? would save trying to keep the two
> in sync, and we could set up saved searches/whines based on keywords to
> replace the use cases that CCing to the randomorange bug facilitates)

To be honest, I've never understood the randomorange bug given the whiteboard annotations - seems like the same thing to me (and does it really achieve anything?)
Attached patch OrangeFactor part v2 (obsolete) — Splinter Review
(In reply to Mark Côté ( :mcote ) from comment #11)
> "keyword" should be "keywords". The query as it is above doesn't actually
> filter by keyword. :)

Ah; bugzilla quicksearch allows both, presuming bzapi doesn't then?

Adjusted :-)
Attachment #661803 - Attachment is obsolete: true
Attached patch TBPL Part v2 (obsolete) — Splinter Review
Same as v1, except s/keyword/keywords/ for the bzapi URL.
Attachment #661801 - Attachment is obsolete: true
Depends on: 793131
(In reply to Mark Banner (:standard8) from comment #14)
> There are the cases where we get a Thunderbird specific orange caused by FF
> tests, and the bug belongs to somewhere in core, though normally that's a
> permanent orange but using the orange annotation has been useful for picking
> it up on TBPL.

Would using summary/whiteboard to denote this be acceptable? The more I think about this, the more I think we should put everything under the one keyword. I believe OrangeFactor should cope with this, as it has to be told which trees to parse the logs for, so shouldn't start including thunderbird orange on the OF "trunk" view etc.

> That's also a possibility, though sometimes it can be useful to see if
> something has migrated from single app to cross app, also note that there's
> various core oranges that all apps see.

Agreed; by splitting to two keywords we lose that.

> To be honest, I've never understood the randomorange bug given the
> whiteboard annotations - seems like the same thing to me (and does it really
> achieve anything?)

The only use is so that people can CC to the bug and see filed/closed/reopened bugmail. I think we should just see if we can cover those use-cases with bugzilla whines/saved searches/custom reports perhaps. Would save people getting mixed up and not adding the more important intermittent-failure keyword. (And save Ehsan/whomever from having to continually sync them up manually).

(In reply to Ed Morley (Away 18th-20th) [:edmorley UTC+1] from comment #6)
> Left to do:

No objections on the newsgroup thread (https://groups.google.com/d/topic/mozilla.dev.platform/3-QnNjxlXdg/discussion), so think we are good to proceed, pending bug 793131 & bug 791758.
(In reply to Ed Morley [:edmorley UTC+1] from comment #17)
> (In reply to Mark Banner (:standard8) from comment #14)
> > There are the cases where we get a Thunderbird specific orange caused by FF
> > tests, and the bug belongs to somewhere in core, though normally that's a
> > permanent orange but using the orange annotation has been useful for picking
> > it up on TBPL.
> 
> Would using summary/whiteboard to denote this be acceptable?

Yes, that's fine.

> The more I
> think about this, the more I think we should put everything under the one
> keyword.

Agreed.
Blocks: 799534
Depends on: 807113
Depends on: 807114
Comment on attachment 663353 [details] [diff] [review]
OrangeFactor part v2

Unbitrotted version broken out to bug 807114.
Attachment #663353 - Attachment is obsolete: true
Depends on: 807120
Comment on attachment 663354 [details] [diff] [review]
TBPL Part v2

Unbitrotted version broken out to bug 807120, to make it easier to tie with the TBPL push to production.

Keeping this bug to track overall & for the docs/newsgroup/updating whines work afterwards.
Attachment #663354 - Attachment is obsolete: true
Depends on: 813494
Depends on: 814083
Depends on: 815112
Apart from a few small cleanups, we're all done here, so closing.

New bugs should:
* Use the keyword 'intermittent-failure'.
* Not have [orange] in the whiteboard.
* Not block the alias:randomorange bug, since we've stopped using it (for search/whine suggestions, see bug 438871 comment 26).

I have whines set up to catch mis-filed bugs, to help with the transition.
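
(Editorial sketch, not part of the original comment.) The three filing rules above can be expressed as a small check over a bug record that is already known to be an intermittent failure. The field names `keywords`, `whiteboard` and `blocks`, and the function name, are assumptions for illustration; the actual whines are Bugzilla saved searches and are not reproduced here.

```python
def misfiled_reasons(bug):
    """Return the reasons an intermittent-failure bug breaks the new filing rules."""
    reasons = []
    if "intermittent-failure" not in bug.get("keywords", []):
        reasons.append("missing keyword 'intermittent-failure'")
    if "[orange]" in bug.get("whiteboard", "").lower():
        reasons.append("whiteboard still contains [orange]")
    if "randomorange" in bug.get("blocks", []):
        reasons.append("still blocks alias randomorange")
    return reasons
```

An empty list means the bug follows all three rules.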
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Blocks: 821270
Depends on: 828871
Product: Webtools → Tree Management
Product: Tree Management → Tree Management Graveyard