Closed
Bug 864877
Opened 12 years ago
Closed 11 years ago
Nagios paging changes for vcs-sync machines
Categories
(mozilla.org Graveyard :: Server Operations, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: hwine, Assigned: rbryce)
Details
(Whiteboard: [reit-ops])
We've updated our usage of the github-sync machines. Please adjust nagios as follows:
github-sync2.dmz.scl3.mozilla.com
- is now "owned" by :aki for development of next tools
- should not page on nagios alerts (email/irc only)
github-sync1-dev.dmz.scl3.mozilla.com
github-sync1.dmz.scl3.mozilla.com
github-sync3.dmz.scl3.mozilla.com
- continue to be production machines
- should page hwine on critical alerts
Thanks!
| Assignee | ||
Updated•12 years ago
Assignee: server-ops → rbryce
| Assignee | ||
Comment 1•12 years ago
Changes made. I added 2 extra contactgroups in nagios to direct alerts for just these hosts.
"githubsync" for asasaki
"releng" for hwine
Hal, I can make it so you only receive "Critical" alerts, but that would apply to other systems you get alerts for as well. Is that ok?
Updated•12 years ago
Flags: needinfo?(hwine)
| Assignee | ||
Comment 2•12 years ago
Worked this out with Hal on IRC.
Hal, you are set to receive only critical SMS alerts for these hosts. Your email alerts will remain the same.
Flags: needinfo?(hwine)
| Assignee | ||
Updated•12 years ago
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Unfortunately, I did not get paged for the event in bug 872333 comment 0
Please adjust so I would have.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
| Assignee | ||
Comment 4•12 years ago
(In reply to Hal Wine [:hwine] from comment #3)
> Unfortunately, I did not get paged for the event in bug 872333 comment 0
>
> Please adjust so I would have.
Just sent some test pages to hwine. The config seems to be working as expected. This could be a lost SMS in the carrier system. At hwine's request, I updated his pager number to a gvoice number.
| Assignee | ||
Comment 5•12 years ago
Paging should be good to go now.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Comment 6•11 years ago
github-sync2.dmz.scl3 just paged oncall for low inodes. It looks like that host is set for IRC-only notifications as well as the stuff for :aki:
'github-sync2.dmz.scl3.mozilla.com' => {
parents => 'seamicro-b1.r101-3.console.scl3.mozilla.com',
contact_groups => 'sysalertsonly,githubsync',
Perhaps that doesn't override the generic disk check's contact_groups?
Status: RESOLVED → REOPENED
Flags: needinfo?(ashish)
Resolution: FIXED → ---
Comment 7•11 years ago
(In reply to Eric Ziegenhorn :ericz from comment #6)
> github-sync2.dmz.scl3 just paged oncall for low inodes. It looks like that
> host is set for IRC-only notifications as well as the stuff for :aki:
>
> 'github-sync2.dmz.scl3.mozilla.com' => {
> parents => 'seamicro-b1.r101-3.console.scl3.mozilla.com',
> contact_groups => 'sysalertsonly,githubsync',
>
> Perhaps that doesn't override the generic disk check's contact_groups?
That is correct - service checks default to 'sysalerts' unless changed and do not inherit the host's contact_groups.
Flags: needinfo?(ashish)
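The behaviour described in comment 7 can be modelled roughly as follows. This is only a sketch of the rule as stated in this bug, not the real config generator: the function name and dict layout are made up for illustration, and only the host name and the 'sysalerts', 'sysalertsonly' and 'githubsync' group names come from the comments above.

```python
# Hypothetical model of the rule in comment 7: a service check uses its own
# contact_groups if set, otherwise 'sysalerts'. The host's contact_groups
# are never consulted.
DEFAULT_SERVICE_CONTACT_GROUPS = ["sysalerts"]


def service_contact_groups(host, service):
    """Return the contact groups notified for a service check on a host."""
    explicit = service.get("contact_groups")
    if explicit:
        return explicit.split(",")
    # No inheritance from the host entry: fall back to the default.
    return list(DEFAULT_SERVICE_CONTACT_GROUPS)


host = {
    "name": "github-sync2.dmz.scl3.mozilla.com",
    "contact_groups": "sysalertsonly,githubsync",
}
disk_check = {"name": "disk"}  # generic check with no override (assumed shape)

print(service_contact_groups(host, disk_check))
```

Under that rule the generic disk check still notifies 'sysalerts', which would explain why oncall was paged even though the host entry lists only 'sysalertsonly,githubsync'.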
| Assignee | ||
Comment 8•11 years ago
(In reply to Eric Ziegenhorn :ericz from comment #6)
> github-sync2.dmz.scl3 just paged oncall for low inodes. It looks like that
> host is set for IRC-only notifications as well as the stuff for :aki:
>
> 'github-sync2.dmz.scl3.mozilla.com' => {
> parents => 'seamicro-b1.r101-3.console.scl3.mozilla.com',
> contact_groups => 'sysalertsonly,githubsync',
>
> Perhaps that doesn't override the generic disk check's contact_groups?
Eric, I'm not sure what the action is here.
Can you answer comment 8, please?
Flags: needinfo?(eziegenhorn)
Comment 10•11 years ago
The action here is to remove these systems that should have irc-only alerts from the generic hostgroup and put them in / make irc-only versions of that same group of checks.
Flags: needinfo?(eziegenhorn)
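A rough sketch of the change proposed in comment 10, for illustration only. The "generic-irc-only" hostgroup name, the data layout and both helper functions are assumptions; only the "generic" hostgroup and the 'sysalerts' / 'sysalertsonly' contact groups appear in this bug, and treating 'sysalertsonly' as the IRC-only group is inferred from comment 6.

```python
# Hypothetical hostgroup layout: each hostgroup carries the contact groups
# used by its set of checks.
hostgroups = {
    "generic": {
        "contact_groups": ["sysalerts"],  # default that pages oncall
        "members": {
            "github-sync1.dmz.scl3.mozilla.com",
            "github-sync2.dmz.scl3.mozilla.com",
        },
    },
    "generic-irc-only": {  # assumed name for the IRC-only copy of the checks
        "contact_groups": ["sysalertsonly"],
        "members": set(),
    },
}


def move_host(groups, host, src, dst):
    """Move a host from hostgroup src to dst so it picks up dst's checks."""
    groups[src]["members"].remove(host)  # raises KeyError if host is not in src
    groups[dst]["members"].add(host)


def contacts_for(groups, host):
    """Collect the contact groups of every hostgroup the host belongs to."""
    found = set()
    for group in groups.values():
        if host in group["members"]:
            found.update(group["contact_groups"])
    return sorted(found)


move_host(hostgroups, "github-sync2.dmz.scl3.mozilla.com", "generic", "generic-irc-only")
print(contacts_for(hostgroups, "github-sync2.dmz.scl3.mozilla.com"))
```

The point of the move is that the host no longer belongs to any group whose checks notify 'sysalerts', so there is nothing left to override per service.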
| Reporter | ||
Comment 11•11 years ago
Latest machine usage list:
(In reply to Hal Wine [:hwine] (use needinfo) from comment #0)
> We've updated our usage of the github-sync machines. Please adjust nagios as
> follows:
>
> github-sync2.dmz.scl3.mozilla.com
> - is again a production machine
> - should page on nagios alerts (email/irc only)
> - should page hwine on critical alerts
>
The following remain production:
> github-sync1-dev.dmz.scl3.mozilla.com
> github-sync1.dmz.scl3.mozilla.com
> github-sync3.dmz.scl3.mozilla.com
> - continue to be production machines
> - should page hwine on critical alerts
New member of pod:
github-sync4.dmz.scl3.mozilla.com
- is a "spare" box, and likely to be in production shortly
My vote would be to keep things simple and treat all 5 boxes as production at this time. We will not have the kind of development again that led to this bug being required.
| Assignee | ||
Comment 12•11 years ago
Sorry for the delay here. The ultimate solution is to move github-sync2 & 4 from the generic hostgroup to generic-preprod. This will suppress after-hours alerts and retain IRC alerts for system-level services.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 11 years ago
Resolution: --- → FIXED
Updated•10 years ago
Product: mozilla.org → mozilla.org Graveyard