Closed
Bug 674504
Opened 13 years ago
Closed 13 years ago
please add nagios check for stage-rsync.m.o:mozilla-prereleases module size
Categories
(mozilla.org Graveyard :: Server Operations, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: bhearsum, Assigned: rtucker)
Details
We've got an existing nagios check on releases-rsync.m.o:mozilla-releases that watches that module size, can we get one set up for stage-rsync.m.o:mozilla-prereleases with the same thresholds?
Comment 1•13 years ago
Could you point me at the existing check? I couldn't find anything obvious by looking at the web interface or the config files.
Assignee: server-ops-releng → arich
Comment 2•13 years ago
I'm not sure if it's part of the check_releasesrsynclag plugin or something else, but it occurs to me that I'll need to hand this over to infra since I don't have permissions to install plugins on these hosts, anyway.
Assignee: arich → server-ops
Component: Server Operations: RelEng → Server Operations
QA Contact: zandr → mrz
Updated•13 years ago
Assignee: server-ops → rtucker
Comment 3•13 years ago (Reporter)
(In reply to comment #1)
> Could you point me at the existing check? I couldn't find anything obvious
> by looking at the web interface or the config files.
Whoops, it's on surf, apparently:
https://nagios.mozilla.org/nagios/cgi-bin/extinfo.cgi?type=2&host=surf&service=mozilla-releases+rsync+size
Comment 4•13 years ago (Assignee)
I just want to make sure that I'm understanding this properly.
Duplicate this check:
https://nagios.mozilla.org/nagios/cgi-bin/extinfo.cgi?type=2&host=surf&service=mozilla-releases+rsync+size
for host:
stage-rsync.m.o:mozilla-prereleases
Once this is confirmed, I'll set it up.
Comment 5•13 years ago
Right now we're syncing modules on stage so we can use du and populate /pub/mozilla.org/zz/rsyncd-motd, and then use that in the nagios check. That takes up as much disk space as the rsync modules, so for -releases and -current that's 135G at the moment, and if we add prereleases more like 255G.
I think we can do better by having a cron job that does rsync -nav, and uses the info in the last couple of lines, eg for mozilla-releases:
sent 242564 bytes received 2027041 bytes 6744.74 bytes/sec
total size is 114676442966 speedup is 50527.05 (DRY RUN)
That total size (106G) is what we want to know for the motd/nagios, without using up the space.
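A minimal sketch of pulling that number out of the rsync summary, assuming the output is already captured as text. The function names are hypothetical, and the tolerance for thousands separators is a defensive assumption rather than something shown in the output above.

```python
import re

# Matches the summary line rsync prints at the end of a run, e.g.
# "total size is 114676442966 speedup is 50527.05 (DRY RUN)".
# Commas are accepted in case the rsync build groups digits (assumption).
TOTAL_RE = re.compile(r"^total size is ([\d,]+)", re.MULTILINE)


def total_size_bytes(rsync_output: str) -> int:
    """Return the byte count from rsync's 'total size is N' line."""
    match = TOTAL_RE.search(rsync_output)
    if match is None:
        raise ValueError("no 'total size' line in rsync output")
    return int(match.group(1).replace(",", ""))


def to_gib(size_bytes: int) -> float:
    """Convert bytes to binary gigabytes (2**30 bytes each)."""
    return size_bytes / 2**30


# The two summary lines quoted above for mozilla-releases.
sample = (
    "sent 242564 bytes received 2027041 bytes 6744.74 bytes/sec\n"
    "total size is 114676442966 speedup is 50527.05 (DRY RUN)\n"
)
print(f"{to_gib(total_size_bytes(sample)):.1f}G")  # prints 106.8G
```

Because only the summary line is read, nothing has to be synced to disk to learn the module size.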
Comment 6•13 years ago
Also, we need to be careful how stuff is set up:
stage.m.o::mozilla-releases is really mozilla-prereleases (for stage-rsync)
stage.m.o::mozilla-releases-mirrors is really mozilla-releases (for motd)
stage-rsync.m.o:mozilla-releases is mozilla-releases
mozilla-current is mozilla-current everywhere
Comment 7•13 years ago (Assignee)
Nick,
Did you ever figure out all the criteria for this?
Comment 8•13 years ago
Rob, my thoughts are
* we need a (new?) nagios plugin which does a rsync -nav on a given host::module, and looks for a line starting 'total size', and reports the fourth word in that line converted from bytes into gigabytes. Eg
total size is 114676442966 speedup is 50527.05 (DRY RUN)
becomes
module size is 107G
* three checks using that plugin, running on surf.m.o
* 'mozilla-releases rsync size', using localhost::mozilla-releases-mirrors
* 'mozilla-prereleases rsync size', using localhost::mozilla-releases
* 'mozilla-current rsync size', using localhost::mozilla-current
If you can give me the limits for the existing check ('mozilla-releases rsync size') we can work up some for the latter two.
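The plugin described above might look roughly like the sketch below. This is not the plugin that was deployed: the function names, the 300-second default timeout and the decision to report UNKNOWN on failure are all assumptions. The check-to-module mapping is copied from the list above, and no threshold comparison is done here.

```python
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Check name -> rsync module on surf, as listed above.
CHECKS = {
    "mozilla-releases rsync size": "mozilla-releases-mirrors",
    "mozilla-prereleases rsync size": "mozilla-releases",
    "mozilla-current rsync size": "mozilla-current",
}


def rsync_command(module, host="localhost"):
    """Dry-run listing of host::module; -n means nothing is transferred."""
    return ["rsync", "-nav", f"{host}::{module}", "."]


def module_size_line(output):
    """Report the fourth word of the 'total size' line in gigabytes."""
    for line in output.splitlines():
        if line.startswith("total size"):
            size_bytes = int(line.split()[3].replace(",", ""))
            return f"module size is {round(size_bytes / 2**30)}G"
    return None


def run_check(module, timeout_s=300):
    """Run the dry run and return (exit_code, message). timeout_s is assumed."""
    try:
        result = subprocess.run(
            rsync_command(module),
            capture_output=True, text=True, timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError) as exc:
        return UNKNOWN, f"UNKNOWN: rsync did not complete: {exc}"
    line = module_size_line(result.stdout)
    if line is None:
        return UNKNOWN, "UNKNOWN: no 'total size' line in rsync output"
    return OK, f"OK: {line}"
```

A wrapper script would call `run_check(CHECKS["mozilla-current rsync size"])`, print the message and exit with the returned code.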
Comment 9•13 years ago (Assignee)
Nick,
So the threshold for the existing check is 110GB. The check is simply doing:
SIZE_REL=`grep releases /pub/mozilla.org/zz/rsyncd-motd | sed -re 's/.*: ([0-9 ]+)GB/\1/'`
I don't see the directories you're referring to on surf (mozilla-releases, mozilla-prereleases). Do you have any additional information?
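For readers less familiar with sed, here is a rough Python equivalent of that grep/sed pipeline. The motd layout in `sample_motd` is invented for illustration, since the real file contents are not shown in this bug, and the function name is hypothetical.

```python
import re

# Same pattern as the sed expression: everything up to ": ",
# then digits (and spaces), then the literal "GB".
MOTD_SIZE_RE = re.compile(r".*: ([0-9 ]+)GB")


def motd_size_gb(motd_text, keyword="releases"):
    """Return the size in GB from the first motd line containing keyword."""
    for line in motd_text.splitlines():
        if keyword in line:  # same substring match grep would do
            match = MOTD_SIZE_RE.search(line)
            if match:
                return int(match.group(1).replace(" ", ""))
    return None


# Hypothetical motd contents; the real layout is an assumption.
sample_motd = (
    "Welcome to the rsync server\n"
    "  mozilla-releases: 106GB\n"
    "  mozilla-current:  34GB\n"
)
print(motd_size_gb(sample_motd))             # size for the releases line
print(motd_size_gb(sample_motd, "current"))  # size for the current line
```

Note that a substring match on "releases" would also match a "prereleases" line if the motd had one, which is worth keeping in mind when adding more modules.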
Comment 10•13 years ago (Assignee)
Nick,
So I did some more digging, on pv-mirror01:
[root@pv-mirror01 tmp]# rsync -nav /root/mozilla-current
sending incremental file list
-rw-r--r-- 495713 2010/04/20 08:32:35 mozilla-current
sent 40 bytes received 12 bytes 104.00 bytes/sec
total size is 495713 speedup is 9532.94 (DRY RUN)
[root@pv-mirror01 tmp]# rsync -nav /root/mozilla-releases
sending incremental file list
-rw-r--r-- 6207751 2010/04/20 08:38:46 mozilla-releases
sent 41 bytes received 12 bytes 106.00 bytes/sec
total size is 6207751 speedup is 117127.38 (DRY RUN)
Still nothing for mozilla-prereleases though
Comment 11•13 years ago
The double colon notation in comment #8 is rsync's way of specifying host::module.
eg: this is prereleases
nthomas@surf:~$ time rsync -nav localhost::mozilla-releases . | grep ^total
total size is 129286759754 speedup is 55571.09 (DRY RUN)
real 1m13.785s
NB: in comment #8, the nagios check name isn't the same as the module name for historical reasons.
Comment 12•13 years ago (Assignee)
Nick,
I've got nrpe checks that work for both localhost::mozilla-current and localhost::mozilla-releases, but nothing for localhost::mozilla-prereleases, as I can't seem to get rsync -nav to work against that one.
Do you know of a different way to access that rsync module?
I changed the nrpe script execution timeout to 300 for this to work since the default value of 60 seconds isn't long enough. I'm not sure if this is going to get clobbered by puppet or not. Here are the script executions from mradm01 and the responses
check_nrpe -H 10.2.74.116 -t 300 -c check_rsync_releases_size
OK: RSYNC SIZE is 132.77GB
/usr/lib/nagios/plugins/check_nrpe -H 10.2.74.116 -t 120 -c check_rsync_current_size
OK: RSYNC SIZE is 34.22GB
How often should these be checked?
What should the warning and critical values for each be set at?
Comment 13•13 years ago
(In reply to Rob Tucker [:rtucker] from comment #12)
$ rsync localhost::
<snip motd>
mozilla-all Mozilla FTP
mozilla-releases Mozilla Software Releases
mozilla-releases-mirrors Mozilla Software Releases (for Mirrors)
mozilla-current Mozilla Current Release Only - high bandwidth low disk space
releases-com Mozilla Corporation Partner Releases
$ grep pre /etc/rsyncd.conf
<nothing relevant>
There doesn't seem to be a mozilla-prereleases rsync module on stage.m.o.
Comment 14•13 years ago (Assignee)
So that leads me to believe that we're good with just mozilla-current and mozilla-releases, though I may well be incorrect.
If that is the case I just need to know how often to check, the thresholds and possibly fix puppet clobbering my updated config file.
Comment 15•13 years ago
Please take another look at the second paragraph of comment #8 for the details of the names for the checks and modules.
A limit of 110GB is fine for the mozilla-prereleases and mozilla-releases modules, and let's use 40G for mozilla-current (for now anyway). They can go directly to CRITICAL on crossing those values. Checking once a day will still be fine. Notifications should go to #build and RelEng people (see the existing check).
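Those limits can be expressed as a small lookup with no WARNING state, as in the sketch below. The names are hypothetical, the "(limit ...)" suffix is an addition for illustration, and treating "crossing" as strictly greater than the limit is an assumption. The sample sizes are the ones reported in comment 12.

```python
# Standard Nagios exit codes; WARNING is skipped on purpose because the
# checks go straight to CRITICAL.
OK, CRITICAL = 0, 2

# Critical limits in GB, taken from the paragraph above.
CRITICAL_GB = {
    "mozilla-prereleases": 110,
    "mozilla-releases": 110,
    "mozilla-current": 40,
}


def evaluate(module, size_gb):
    """Return (exit_code, message) for a module of the given size."""
    limit = CRITICAL_GB[module]
    if size_gb > limit:  # assumption: exactly at the limit is still OK
        return CRITICAL, f"CRITICAL: RSYNC SIZE is {size_gb:.2f}GB (limit {limit}GB)"
    return OK, f"OK: RSYNC SIZE is {size_gb:.2f}GB"


print(evaluate("mozilla-current", 34.22))    # (0, 'OK: RSYNC SIZE is 34.22GB')
print(evaluate("mozilla-releases", 132.77))  # exit code 2, over the 110GB limit
```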
Comment 16•13 years ago (Assignee)
I just finished adding the 3 checks; they are green and set up as requested. Closing this one out!
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Comment 17•13 years ago
Almost there; I would just like a few more tweaks:
* 'mozilla-releases rsync pre-releases' is 'OK: RSYNC SIZE is 133.98GB', but that is over the 110G limit, so shouldn't it be CRITICAL?
* verify how frequently the checks run; it seems to be more often than once a day. Does it depend on the state of the check?
* at one point I saw all three checks with 'Service Check Timed Out', may need a longer timeout and/or to make sure the checks don't run at the same time
* the old check 'mozilla-releases rsync size' can be removed
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 18•13 years ago (Assignee)
I originally raised the warning and critical thresholds so that these wouldn't page. Evidently you want the checks set up while the services are in a failed state, so I have set them exactly where you wanted them; they will now page.
I also did as you said and copied the settings from the existing rsync check; I have now backtracked on that and hard-set them at 1440.
I cannot increase the timeout any further; it's at 5 minutes. If a check takes longer than that, the problem isn't the check, it's the box.
Status: REOPENED → RESOLVED
Closed: 13 years ago → 13 years ago
Resolution: --- → FIXED
Updated•10 years ago
Product: mozilla.org → mozilla.org Graveyard