Closed
Bug 997775
Opened 12 years ago
Closed 11 years ago
set nagios to alert on XX% swap used for git1.dmz.scl3.mozilla.com, both buildduty and sheriffs
Categories
(Infrastructure & Operations :: MOC: Service Requests, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: hwine, Assigned: vinh)
bug 997646 will change amount of swap, so actual trigger level may need to be adjusted. Almost any swap usage seems to indicate a pending event according to historical graphite reports. Certainly over 10% of the existing 2GB swap would be a notification point.
The alert should go to #buildduty and however sheriffs have their alerts routed.
Updated•12 years ago
Assignee: server-ops-webops → bkero
Agreed in triage that 10% swap usage is the correct limit. This should page oncall, since action is needed, and notify the other groups as in comment 0.
Component: WebOps: Source Control → Server Operations
Product: Infrastructure & Operations → mozilla.org
QA Contact: nmaul → shyam
Comment 2•11 years ago
Off to the server ops folks. I don't have much insight into how nagios is operated anymore.
Assignee: bkero → server-ops
Updated•11 years ago
Assignee: server-ops → nobody
Component: Server Operations → MOC: Service Requests
Product: mozilla.org → Infrastructure & Operations
QA Contact: shyam → dmoore
Updated•11 years ago
QA Contact: dmoore → lypulong
Updated•11 years ago
Summary: set nagios to alert on XX% swap used, both buildduty and sheriffs → set nagios to alert on XX% swap used for git1.dmz.scl3.mozilla.com, both buildduty and sheriffs
Updated•11 years ago
Assignee: nobody → vhua
Comment 3•11 years ago
(In reply to Hal Wine [:hwine] (use needinfo) from comment #0)
> bug 997646 will change amount of swap, so actual trigger level may need to
> be adjusted. Almost any swap usage seems to indicate a pending event
> according to historical graphite reports. Certainly over 10% of the existing
> 2GB swap would be a notification point.
>
> The alert should go to #buildduty and however sheriffs have their alerts
> routed.
:hwine - I'm still seeing only 2 GB of swap instead of 16 GB:
[root@git1.dmz.scl3 vhua]# free -m
             total       used       free     shared    buffers     cached
Mem:         60372      51710       8661          0       1262      40764
-/+ buffers/cache:       9683      50688
Swap:         2047         29       2018
The current threshold is check_swap!50%!25% (50% warning, 25% critical). I can change it to 20% warning, 10% critical.
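A rough sketch of how the proposed thresholds relate to the `free -m` numbers above. It assumes that percentage thresholds passed to check_swap refer to swap that is still free (which matches the "% free" wording in the plugin's status text), so a check fires when free swap drops below the threshold. The function names are hypothetical and this is not the plugin's actual implementation.

```python
def swap_percent_free(free_output: str) -> float:
    """Return the percentage of swap that is free, parsed from `free -m` text."""
    for line in free_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "Swap:":
            total, free = int(fields[1]), int(fields[3])
            if total == 0:
                raise ValueError("no swap configured")
            return 100.0 * free / total
    raise ValueError("no 'Swap:' line found")


def classify(percent_free: float, warn: float, crit: float) -> str:
    """Assumed semantics: alert when free swap falls below a threshold."""
    if percent_free < crit:
        return "CRITICAL"
    if percent_free < warn:
        return "WARNING"
    return "OK"


# Sample taken from the `free -m` output quoted in this comment.
SAMPLE = """\
             total       used       free     shared    buffers     cached
Mem:         60372      51710       8661          0       1262      40764
-/+ buffers/cache:       9683      50688
Swap:         2047         29       2018
"""

pct = swap_percent_free(SAMPLE)
print(f"{pct:.1f}% free ->", classify(pct, warn=20, crit=10))
```

Under that assumption, 20%/10% would only alert once 80% or more of swap is in use, which is worth comparing against the "10% of swap used" level discussed in comment 0.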
Flags: needinfo?(hwine)
:bkero - your change in bug 997646 didn't stick -- do we need to redo that bug, or adjust nagios limits? Also, do we need same change for git2?
Flags: needinfo?(hwine)
Comment 5•11 years ago
swap check updated for git[1,2] to 20% warning, 10% critical
nagios-scl3> vinh: git1.dmz.scl3.mozilla.com:Git Swap is OK - SWAP OK - 100% free (18017 MB out of 18047 MB) Last Checked: 2015-04-01 10:37:19 PDT
10:39 <vinh> nagios-scl3: status git2.dmz.scl3.mozilla.com:git swap
10:39 <nagios-scl3> vinh: git2.dmz.scl3.mozilla.com:Git Swap is OK - SWAP OK - 100% free (18431 MB out of 18431 MB) Last Checked: 2015-04-01 10:39:13 PDT
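For anyone wanting to pull the numbers out of bot replies like the ones above, a small sketch follows. The regular expression is written against the two status lines in this comment only; it is an assumption that other replies follow the same shape, and `parse_swap_status` is a hypothetical helper name.

```python
import re

# Matches the plugin portion of the reply, e.g.
# "SWAP OK - 100% free (18017 MB out of 18047 MB)"
STATUS_RE = re.compile(
    r"SWAP (?P<state>\w+) - (?P<pct>\d+)% free "
    r"\((?P<free>\d+) MB out of (?P<total>\d+) MB\)"
)


def parse_swap_status(line: str) -> dict:
    """Extract state, free MB, total MB and derived used MB from a status line."""
    match = STATUS_RE.search(line)
    if match is None:
        raise ValueError("not a recognised swap status line")
    free_mb = int(match.group("free"))
    total_mb = int(match.group("total"))
    return {
        "state": match.group("state"),
        "percent_free": int(match.group("pct")),
        "free_mb": free_mb,
        "total_mb": total_mb,
        "used_mb": total_mb - free_mb,
    }


git1 = parse_swap_status(
    "git1.dmz.scl3.mozilla.com:Git Swap is OK - SWAP OK - 100% free "
    "(18017 MB out of 18047 MB) Last Checked: 2015-04-01 10:37:19 PDT"
)
print(git1)
```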
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Comment 6•10 years ago
Just pasting the changes I made a while back per sal's request.
"check_swap_git" => {
    service_description => "Git Swap",
    normal_check_interval => 5,
    check_command => 'check_swap!20%!10%',
    contact_groups => 'build, sheriffs, sysalerts',
    hostgroups => $::fqdn ? {
        'nagios1.private.scl3.mozilla.com' => [
            'git-web',
        ],
        default => [
        ]
    }
},
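As a reading aid for the `check_command` value above: in Nagios object configuration, `!` separates the command name from its arguments, which are exposed to the command definition as `$ARG1$`, `$ARG2$`, and so on. The sketch below just performs that split. How the check_swap command definition on this server uses the two arguments (presumably as warning and critical thresholds) is an assumption, and `split_check_command` is a hypothetical helper.

```python
def split_check_command(check_command: str):
    """Split 'name!arg1!arg2' into the command name and its $ARGn$ macros."""
    name, *args = check_command.split("!")
    macros = {f"$ARG{i}$": arg for i, arg in enumerate(args, start=1)}
    return name, macros


name, macros = split_check_command("check_swap!20%!10%")
print(name, macros)
```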