Closed Bug 886497 Opened 7 years ago Closed 6 years ago

mdn: set up bug paging

Categories

(mozilla.org Graveyard :: Server Operations, task)

x86
macOS
task
Not set

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: groovecoder, Assigned: ashish)

Attachments

(2 files)

We'd like to set up bug paging for 'blocker' bugs in the Mozilla Developer Network product. What are our options?
Who would you like to page?
Assignee: server-ops → server-ops-webops
Component: Server Operations → Server Operations: Web Operations
QA Contact: shyam → nmaul
Ali Spivak and me.

At first we want to page for bugs marked as 'blocker' that are unassigned for ... 30 minutes? Can we time-box it or is it a 24-hour thing?
Any status update on this? IRC pages or email pages?
Assignee: server-ops-webops → server-ops
Component: Server Operations: Web Operations → Server Operations
QA Contact: nmaul → shyam
1) Give me the product and component you'd like to be paged on
2) Give me the list of people (email/phone) if you'd like that OR
3) Give me an IRC channel to notify on. 
4) Do you want this to page you 24/7? 

This will probably need some tweaking on our bugzilla check script and then we'd have to do some testing. This is not an overnight thing :) and might need some co-ordination/work with the Bugzilla devs as well.
Assignee: server-ops → ashish
1. Mozilla Developer Network product, all components
3. #mdndev irc channel
4. Let's just do 7am-2pm PT at first

Thanks!
Default assignee for "Mozilla Developer Network" bugs is nobody@mozilla.org, which makes it non-trivial to set this up using our current scripts (which alert based on assignees rather than specific products/components).

That said, I'll wait for :glob to respond since he is already CC'd.
Attached patch: patch v1
this updates nagios_blocker_checker.pl to support filters by product as well as assignee.

FILTERS

  the filter determines which bugs to check, either by assignee or product.
  for backward compatibility, if just an email address is provided, it will be
  used as the assignee.

  --assignee <email>    filter bugs by assignee
  --product <name>      filter bugs by product name
  --unassigned <email>  set the unassigned user (default: nobody@mozilla.org)

TIMING

  time in hours to wait before paging or warning

  --major_alarm <hours> (default: 24)
  --major_warn  <hours> (default: 20)
  --critical_alarm <hours> (default: 8)
  --critical_warn  <hours> (default: 5)
  --blocker_alarm <hours> (default: 0)
  --blocker_warn  <hours> (default: 0)

EXAMPLES

  nagios_blocker_checker.pl --assignee server-ops@mozilla-org.bugs
  nagios_blocker_checker.pl server-ops@mozilla-org.bugs
  nagios_blocker_checker.pl --product 'mozilla developer network'
Attachment #770668 - Flags: review?(ashish)
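One way to read the TIMING options above is as a warn/alarm pair of age thresholds per severity. The following is a minimal Python sketch of that reading, not the Perl code in nagios_blocker_checker.pl. The function name `classify`, the returned state names, and the use of `>=` for the comparison are assumptions; only the default hour values are taken from the help text.

```python
# Hypothetical illustration; default hours copied from the help text above.
DEFAULTS = {
    "major": {"warn": 20, "alarm": 24},
    "critical": {"warn": 5, "alarm": 8},
    "blocker": {"warn": 0, "alarm": 0},
}


def classify(severity, age_hours, thresholds=DEFAULTS):
    """Return 'alarm', 'warn', or 'ok' for a single bug.

    severity  -- bug severity name, e.g. 'blocker'
    age_hours -- how long the bug has been waiting, in hours
    """
    limits = thresholds.get(severity)
    if limits is None:
        # Severities without thresholds are ignored in this sketch.
        return "ok"
    # Check the alarm threshold first so it wins over the warning.
    if age_hours >= limits["alarm"]:
        return "alarm"
    if age_hours >= limits["warn"]:
        return "warn"
    return "ok"
```

Under these assumptions a blocker alarms as soon as it is seen, because both of its defaults are 0, while a critical bug warns after 5 hours and alarms after 8.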
ping. we've been abusing our mdn-drivers list for blocker bugs and it's making the channel more noisy than we would like.
Comment on attachment 770668 (patch v1)

Please commit this and let's get the code out. I'll test in prod since there isn't an easy way to run a standalone test on this.
Attachment #770668 - Flags: review?(ashish) → review+
Committing to: bzr+ssh://bjones%40mozilla.com@bzr.mozilla.org/bmo/4.2/
modified contrib/nagios_blocker_checker.pl
Committed revision 9113.
Depends on: 932685
Alright, this is ready to be tested. Please file a blocker in the "Mozilla Developer Network" product and it should trigger emails to :groovecoder and :aspivak. Once that's confirmed, I'll add the bots to #mdndev as well.
Status: NEW → ASSIGNED
\o/ I got 16 emails like so:

http://pastebin.mozilla.org/3381464

It looks like I'm going to get them every 10 minutes now? :( If it checks every 10 minutes, does it send email *only* when there are blocker bugs?
I resolved or bumped down all the 'blocker' bugs, so the blocker count is now 0, but I'm still getting emails for critical and major bugs:

http://pastebin.mozilla.org/3381746

We should only send pages/alerts if there are 1+ "blocker"-level bugs.
ah, there isn't any way to disable the alerts for non-blocker states.
that'll be easy to add however, i'll do that now.
Attached patch: 886497_2.patch
adds a --severity switch.  eg:

> nagios_blocker_checker.pl --product bugzilla.mozilla.org --severity blocker

also simplifies the output when multiple bugs are found:

> bugs CRITICAL: 1 critical bug(s) found https://bugzil.la/829358 11 major bug(s) found https://bugzil.la/530990,584757,652500,692431,737883,740385,819290,834345,843274,850920,853108
Attachment #824652 - Flags: review?(ashish)
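To make the effect of the severity switch and the simplified output concrete, here is a rough Python sketch that groups bugs by severity and builds a one-line summary in the shape quoted above. It is an illustration only, not the patch itself. The function name `summarize`, the input shape (a list of `(bug_id, severity)` tuples), the caller-supplied `state` argument, and the "bugs OK" message are all assumptions.

```python
# Hypothetical illustration of the summary line format quoted above.
SEVERITY_ORDER = ["blocker", "critical", "major"]


def summarize(bugs, state, severity=None):
    """Build a one-line summary for a list of (bug_id, severity) tuples.

    state    -- status word chosen by the caller, e.g. 'CRITICAL'
    severity -- if given, only bugs of this severity are reported
    """
    grouped = {}
    for bug_id, sev in bugs:
        if severity is not None and sev != severity:
            continue  # dropped by the severity filter
        grouped.setdefault(sev, []).append(bug_id)

    parts = []
    # Severities outside SEVERITY_ORDER are not reported in this sketch.
    for sev in SEVERITY_ORDER:
        ids = sorted(grouped.get(sev, []))
        if ids:
            id_list = ",".join(str(i) for i in ids)
            parts.append(f"{len(ids)} {sev} bug(s) found https://bugzil.la/{id_list}")

    if not parts:
        # Assumed wording for the no-match case.
        return "bugs OK: no matching bugs found"
    return f"bugs {state}: " + " ".join(parts)
```

With `severity="blocker"`, a queue that holds only critical and major bugs yields the no-match line, which is the behaviour requested for the MDN check.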
(In reply to Luke Crouch [:groovecoder] from comment #12)
> \o/ I got 16 emails like so:
> 
> http://pastebin.mozilla.org/3381464
> 
> It looks like I'm going to get them every 10 minutes now? :( If it checks
> every 10 minutes, does it send email *only* when there are blocker bugs?

The bug queue checks run every minute and alert every 10 mins. Let me know if you would like the notification interval to be bumped.

Also, you and :aspivak should be able to login to the Nagios web UI to manage the check:

https://nagios.mozilla.org/scl3/cgi-bin/extinfo.cgi?type=2&host=bugzillaadm.private.scl3.mozilla.com&service=mdn_bugs
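The difference between the one-minute check and the ten-minute alert cadence described above can be sketched as follows. This is a simplified Python model, not how Nagios itself is implemented: it ignores recoveries and state changes, and the function name `notification_minutes` and its parameters are assumptions made for illustration.

```python
def notification_minutes(failing_minutes, notification_interval=10):
    """Return the minutes at which an alert would be sent.

    failing_minutes       -- minutes at which a check ran and found a problem
    notification_interval -- minimum gap, in minutes, between two alerts
    """
    sent = []
    last = None
    for minute in sorted(failing_minutes):
        # Alert on the first failure, then only once the interval has elapsed.
        if last is None or minute - last >= notification_interval:
            sent.append(minute)
            last = minute
    return sent
```

In this model, a check that fails every minute for half an hour produces alerts at minutes 0, 10 and 20; raising the interval to 30 reduces that to a single alert in the same period.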
Comment on attachment 824652 (886497_2.patch)

Review of attachment 824652:
-----------------------------------------------------------------

Good to push out
Attachment #824652 - Flags: review?(ashish) → review+
Committing to: bzr+ssh://bjones%40mozilla.com@bzr.mozilla.org/bmo/4.2/
modified contrib/nagios_blocker_checker.pl
Committed revision 9117.
Depends on: 933126
Pushed out Comment 18 in Bug 933126.
Alright, I've set up paging for 7am-2pm PT as requested in Comment 4, with email every 30 mins. I do not want to bring out paging to #mdndev yet, pending some bot access/authorisation concerns. For the time being, the pages should alert :groovecoder and :alispivak and can be accessed via the URL in Comment 16. Closing this out. Please file a new bug for changes in notification times.
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Can we change this to 6am-5pm PT? The alert recipients span from PT to ET, so that gives us better coverage.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Done.

Please file a new bug for any further changes. Thanks!
Status: REOPENED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
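For reference, the 6am-5pm PT paging window agreed above (9am-8pm ET) amounts to a local-hour check. Below is a minimal Python sketch of such a check, not the actual Nagios time period configuration. The function name `in_paging_window`, the half-open interval (6:00 inclusive, 17:00 exclusive), and the fixed UTC-8 offset standing in for Pacific time are assumptions.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed UTC-8 offset for illustration; it ignores daylight saving.
PST = timezone(timedelta(hours=-8))


def in_paging_window(now, local_tz=PST, start_hour=6, end_hour=17):
    """Return True if the aware datetime `now` falls inside the paging window.

    The window is [start_hour, end_hour) in `local_tz`.
    """
    local = now.astimezone(local_tz)
    return start_hour <= local.hour < end_hour
```

A real deployment would pass a zone that tracks daylight saving, for example `zoneinfo.ZoneInfo("America/Los_Angeles")`, as `local_tz` instead of the fixed offset used here.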
Product: mozilla.org → mozilla.org Graveyard