Closed
Bug 723815
Opened 12 years ago
Closed 12 years ago
Redo nagios checks for rsync module checks on surf
Categories
(Infrastructure & Operations :: RelOps: General, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: nthomas, Assigned: arich)
Details
That's
  mozilla-releases rsync current
  mozilla-releases rsync pre-releases
  mozilla-releases rsync releases
in the web/irc interfaces, and
  check_rsync_current_size
  check_rsync_releases_size
  check_rsync_prereleases_size
when defined in surf:/etc/nagios/nrpe.d/check-release-size.cfg

These checks will be adding load to the netapps by doing rsync -n to get module sizes, although probably less than the machines actually fetching the files. Mainly I don't like the nagios behavior of checking more frequently after a check goes WARNING/CRITICAL. It tends to pound a server which is already behind.
Comment 1 (Assignee) • 12 years ago
I agree that the less load on surf/the netapp the better, but is there some other way we intend to monitor this? I presume we had reason to in the past.
Updated (Assignee) • 12 years ago
Assignee: server-ops-releng → arich
Comment 2 (Reporter) • 12 years ago
Over in bug 725711 I've enhanced the rsync motd so that it gives sizes for mozilla-prereleases, mozilla-releases, and mozilla-current, updated daily at about 4am Pacific. We can use that to add checks on mozilla-prereleases and mozilla-current, and replace the current one for mozilla-releases.

At surf:/usr/lib/nagios/plugins/contrib/check_rsync_releases_size.py there is a simple python script to read the motd file and extract the size. It is based on rtucker's version, which did an rsync -n but took too long for nagios to cope with (backed up at ~nthomas/check_rsync_releases_size.py.rtucker). I'm hoping puppet won't come along and wipe the new version, but there's a copy in ~nthomas if it does.

I suggest we call the commands defined in surf:/etc/nagios/nrpe.d/check-release-size.cfg, where I've adjusted to sensible thresholds. If you prefer to consolidate that to one definition and pass the three arguments using the nagios server then feel free. And we should deprecate the call to check_moz_rel_rsync (defined in surf:/etc/nagios/nrpe.d/rsync-size.cfg).
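For readers without access to surf, the motd-based approach described above might look roughly like the sketch below. It is not the actual check_rsync_releases_size.py. The motd line format ("module: N GB"), the GB unit, and all function names are assumptions made for illustration, since the real motd layout is not shown in this bug. Only the exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) are standard Nagios plugin conventions.

```python
import re

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# ASSUMPTION: the motd carries one line per module, e.g. "mozilla-releases: 118 GB".
# The real format produced in bug 725711 is not shown here.
SIZE_LINE = re.compile(r"^\s*(?P<module>[\w-]+):\s*(?P<size>\d+(?:\.\d+)?)\s*GB\s*$")


def module_size_gb(motd_text, module):
    """Return the size in GB that the motd reports for module, or None."""
    for line in motd_text.splitlines():
        match = SIZE_LINE.match(line)
        if match and match.group("module") == module:
            return float(match.group("size"))
    return None


def check(motd_text, module, warn_gb, crit_gb):
    """Return (exit_code, status_line) for one rsync module."""
    size = module_size_gb(motd_text, module)
    if size is None:
        return UNKNOWN, f"UNKNOWN: no size for {module} in motd"
    if size >= crit_gb:
        return CRITICAL, f"CRITICAL: {module} is {size:.1f} GB (critical at {crit_gb} GB)"
    if size >= warn_gb:
        return WARNING, f"WARNING: {module} is {size:.1f} GB (warning at {warn_gb} GB)"
    return OK, f"OK: {module} is {size:.1f} GB"


# Hypothetical motd contents, made up for this example.
sample_motd = "Welcome to the rsync server\nmozilla-releases: 118 GB\nmozilla-current: 45.5 GB\n"
code, message = check(sample_motd, "mozilla-current", 30, 40)
print(code, message)
```

Reading a file that is refreshed once a day keeps the check itself cheap, so nagios retries cost the netapps nothing extra, which is the point of moving away from rsync -n.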
Comment 3 (Assignee) • 12 years ago
To make things clear and sane (unlike the old rsync checks which had all sorts of different names pointing at different things), the configuration below is now being used. nthomas, can you please verify that the checks on surf match what you were after (and that I haven't disabled anything that should still be there)? Also note that prereleases is currently over size and that releases is *under* size, but only prereleases is alerting. Was that the intended behavior of your script?

====
The individual service check definitions for each category in /etc/nagios/mpt/services.cfg:

define service{
        use                     generic-service
        host_name               surf
        service_description     rsync size mozilla-current
        contact_groups          build
        notification_options    u,c,r
        normal_check_interval   360
        max_check_attempts      4
        retry_check_interval    360
        notification_interval   1440
        check_command           check_rsync_releases_size!mozilla-current!30!40
}

define service{
        use                     generic-service
        host_name               surf
        service_description     rsync size mozilla-prereleases
        contact_groups          build
        notification_options    u,c,r
        normal_check_interval   360
        max_check_attempts      4
        retry_check_interval    360
        notification_interval   1440
        check_command           check_rsync_releases_size!mozilla-prereleases!230!250
}

define service{
        use                     generic-service
        host_name               surf
        service_description     rsync size mozilla-releases
        contact_groups          build
        notification_options    u,c,r
        normal_check_interval   360
        max_check_attempts      4
        retry_check_interval    360
        notification_interval   1440
        check_command           check_rsync_releases_size!mozilla-releases!125!140
}

========================
The global check definition in /etc/nagios/checkcommands.cfg on mradm01 and dm-nagios01:

define command{
        command_name    check_rsync_releases_size
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -t 15 -c check_rsync_releases_size -a $ARG1$ $ARG2$ $ARG3$
}

============================
The check definition in /etc/nagios/nrpe.d/check_rsync_releases_size.cfg on surf:

command[check_rsync_releases_size]=/usr/lib/nagios/plugins/contrib/check_rsync_releases_size.py $ARG1$ $ARG2$ $ARG3$

============================
And then the script itself is located on surf at:

/usr/lib/nagios/plugins/contrib/check_rsync_releases_size.py
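To see how one of these service definitions plays out, here is a small sketch that mimics two things Nagios does with it: splitting check_command on "!" to fill $ARG1$..$ARG3$, and working out how long a failing check takes to reach a HARD state. The helper names are made up for illustration, and the timing assumes Nagios's default interval_length of 60 seconds (so an interval of 360 means six hours); if that setting was changed on mradm01 or dm-nagios01 the numbers differ.

```python
def expand_check_command(check_command, command_line):
    """Fill $ARGn$ macros from a '!'-separated check_command value."""
    name, *args = check_command.split("!")
    for index, value in enumerate(args, start=1):
        command_line = command_line.replace(f"$ARG{index}$", value)
    return name, command_line


def minutes_until_hard_state(max_check_attempts, retry_check_interval, interval_length=60):
    """Minutes from the first failing check until the state goes HARD.

    The first failure counts as attempt 1; every later attempt waits one
    retry interval. Intervals are multiples of interval_length seconds.
    """
    retries = max_check_attempts - 1
    return retries * retry_check_interval * interval_length / 60


command_line = (
    "$USER1$/check_nrpe -H $HOSTADDRESS$ -t 15 "
    "-c check_rsync_releases_size -a $ARG1$ $ARG2$ $ARG3$"
)
name, expanded = expand_check_command(
    "check_rsync_releases_size!mozilla-current!30!40", command_line
)
print(name)
print(expanded)  # $USER1$ and $HOSTADDRESS$ are left for Nagios to resolve
print(minutes_until_hard_state(max_check_attempts=4, retry_check_interval=360) / 60, "hours")
```

Because retry_check_interval equals normal_check_interval, a module that goes WARNING or CRITICAL is rechecked no more often than a healthy one, which avoids the faster re-polling the original description objected to.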
Comment 4 (Reporter) • 12 years ago
This all looks great to me, thanks for cleaning it all up and documenting it here. I'm not surprised prereleases is alerting at the moment given the recent gaggle of releases, and the limits on it are an estimate because we haven't monitored it before now. We've had issues with cn-adm01.cn getting low on space from this module, so I don't want to increase them until I've looked at doing some cleanup first. There should be lots there that can be removed as the update traffic dies away on the older releases.
Summary: Disable nagios checks for rsync module checks on surf → Redo nagios checks for rsync module checks on surf
Updated (Assignee) • 12 years ago
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updated • 11 years ago
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations