Closed Bug 997775 Opened 12 years ago Closed 11 years ago

set nagios to alert on XX% swap used for git1.dmz.scl3.mozilla.com, both buildduty and sheriffs

Categories

(Infrastructure & Operations :: MOC: Service Requests, task)

Type: task, Priority: Not set, Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: hwine, Assigned: vinh)

bug 997646 will change amount of swap, so actual trigger level may need to be adjusted. Almost any swap usage seems to indicate a pending event according to historical graphite reports. Certainly over 10% of the existing 2GB swap would be a notification point.

The alert should go to #buildduty and however sheriffs have their alerts routed.
Assignee: server-ops-webops → bkero
Agreed in triage that 10% swap usage is the correct limit. This should page oncall, as action is needed, and notify the other groups as in comment 0.
Component: WebOps: Source Control → Server Operations
Product: Infrastructure & Operations → mozilla.org
QA Contact: nmaul → shyam
Off to the server ops folks. I don't have much insight into how nagios is operated anymore.
Assignee: bkero → server-ops
Assignee: server-ops → nobody
Component: Server Operations → MOC: Service Requests
Product: mozilla.org → Infrastructure & Operations
QA Contact: shyam → dmoore
QA Contact: dmoore → lypulong
Summary: set nagios to alert on XX% swap used, both buildduty and sheriffs → set nagios to alert on XX% swap used for git1.dmz.scl3.mozilla.com, both buildduty and sheriffs
Assignee: nobody → vhua
(In reply to Hal Wine [:hwine] (use needinfo) from comment #0)
> bug 997646 will change amount of swap, so actual trigger level may need to
> be adjusted. Almost any swap usage seems to indicate a pending event
> according to historical graphite reports. Certainly over 10% of the existing
> 2GB swap would be a notification point.
>
> The alert should go to #buildduty and however sheriffs have their alerts
> routed.

:hwine - I'm still seeing swap is only 2gb instead of 16gb

[root@git1.dmz.scl3 vhua]# free -m
             total       used       free     shared    buffers     cached
Mem:         60372      51710       8661          0       1262      40764
-/+ buffers/cache:       9683      50688
Swap:         2047         29       2018

Current threshold is set at check_swap!50%!25% (50% warning, 25% critical). I can change it to 20% warning, 10% critical.
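For anyone wanting to sanity-check the numbers, here is a rough sketch of how the swap figures from the `free -m` output above translate into a "percent used" value that can be compared with the 10% notification point from comment 0. The function name `swap_used_percent` and the `SAMPLE` string are made up for illustration; this is not how the nagios check itself works.

```python
def swap_used_percent(free_output: str) -> float:
    """Return swap used as a percentage of total swap, parsed from `free -m` text."""
    for line in free_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "Swap:":
            # Columns on the Swap line are: total, used, free (in MB with -m).
            total, used = int(fields[1]), int(fields[2])
            if total == 0:
                return 0.0  # no swap configured, so nothing can be in use
            return 100.0 * used / total
    raise ValueError("no 'Swap:' line found in free output")


# Hypothetical sample mirroring the figures pasted in this comment.
SAMPLE = """\
             total       used       free     shared    buffers     cached
Mem:         60372      51710       8661          0       1262      40764
-/+ buffers/cache:       9683      50688
Swap:         2047         29       2018
"""

pct = swap_used_percent(SAMPLE)
print(f"swap used: {pct:.1f}%")
# 10% is the notification point suggested in comment 0.
print("over 10% used" if pct > 10.0 else "under 10% used")
```

With 29 MB used out of 2047 MB, the box is well under the 10% point at the time of this paste.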
Flags: needinfo?(hwine)
:bkero - your change in bug 997646 didn't stick -- do we need to redo that bug, or adjust nagios limits? Also, do we need same change for git2?
Flags: needinfo?(hwine)
swap check updated for git[1,2] to 20% warning, 10% critical

nagios-scl3> vinh: git1.dmz.scl3.mozilla.com:Git Swap is OK - SWAP OK - 100% free (18017 MB out of 18047 MB) Last Checked: 2015-04-01 10:37:19 PDT
10:39 <vinh> nagios-scl3: status git2.dmz.scl3.mozilla.com:git swap
10:39 <nagios-scl3> vinh: git2.dmz.scl3.mozilla.com:Git Swap is OK - SWAP OK - 100% free (18431 MB out of 18431 MB) Last Checked: 2015-04-01 10:39:13 PDT
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Just pasting the changes I made a while back per sal's request.

"check_swap_git" => {
    service_description => "Git Swap",
    normal_check_interval => 5,
    check_command => 'check_swap!20%!10%',
    contact_groups => 'build, sheriffs, sysalerts',
    hostgroups => $::fqdn ? {
        'nagios1.private.scl3.mozilla.com' => [ 'git-web', ],
        default => [ ]
    }
},
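A small sketch of how the `check_command` string in this config could be read. Nagios separates the command name from its arguments with `!`. What those arguments mean depends on the `check_swap` command definition, which is not shown in this bug; the sketch assumes they are passed to the stock `check_swap` plugin as `-w` and `-c`, where the percentages are thresholds on free swap. The helper names and the exact boundary handling (`<` rather than `<=`) are guesses for illustration only.

```python
def parse_check_command(check_command: str):
    """Split a Nagios check_command string into (command_name, [args])."""
    name, *args = check_command.split("!")
    return name, args


def classify_swap(free_pct: float, warn: str, crit: str) -> str:
    """Classify a free-swap percentage against warning/critical thresholds.

    ASSUMPTION: thresholds are percent of swap *free*, as in the stock
    check_swap plugin; the command definition used here is not shown.
    """
    warn_pct = float(warn.rstrip("%"))
    crit_pct = float(crit.rstrip("%"))
    if free_pct < crit_pct:
        return "CRITICAL"
    if free_pct < warn_pct:
        return "WARNING"
    return "OK"


name, (warn, crit) = parse_check_command("check_swap!20%!10%")
# Figures from the git1 status line: 18017 MB free out of 18047 MB.
free_pct = 100.0 * 18017 / 18047
print(name, warn, crit, classify_swap(free_pct, warn, crit))
```

If that assumption holds, the 10% critical threshold corresponds to about 90% of swap in use, so it may be worth confirming against the command definition that this matches the 10% used point from comment 0.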