Closed Bug 1075635 Opened 10 years ago Closed 10 years ago

nagios checks for proxxy

Categories

(Infrastructure & Operations :: RelOps: Puppet, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: gmiroshnykov, Assigned: arich)

Attachments

(2 files, 1 obsolete file)

No description provided.
We need checks that the host is up, and answering on ports 80 and 443.
Assignee: gmiroshnykov → nobody
Our current proxxy hosts are:

proxxy1.srv.releng.use1.mozilla.com 10.134.100.191
proxxy1.srv.releng.usw2.mozilla.com 10.132.100.234
proxxy1.srv.releng.scl3.mozilla.com 10.26.48.46

If we need to pick a static URL to check rather than just a port check, we could use this one: http://tooltool.pvt.build.mozilla.org.$HOSTNAME/build/
Attached patch nagios.patch (obsolete) — Splinter Review
Ashish confirmed the check_tcp works...

[ashish@nagios1.private.releng.scl3 ~]$ /usr/lib64/nagios/plugins/check_tcp -H proxxy1.srv.releng.scl3.mozilla.com -p 80
TCP OK - 0.002 second response time on port 80|time=0.001639s;;;0.000000;10.000000
[ashish@nagios1.private.releng.scl3 ~]$ /usr/lib64/nagios/plugins/check_tcp -H proxxy1.srv.releng.use1.mozilla.com -p 80
TCP OK - 0.079 second response time on port 80|time=0.079111s;;;0.000000;10.000000
[ashish@nagios1.private.releng.scl3 ~]$ /usr/lib64/nagios/plugins/check_tcp -H proxxy1.srv.releng.usw2.mozilla.com -p 80
TCP OK - 0.026 second response time on port 80|time=0.025532s;;;0.000000;10.000000
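The manual check above covers port 80 only, while the request is for ports 80 and 443. A minimal shell sketch of running the same plugin against every host and both ports might look like this. The function name check_proxxy_ports and the CHECK_TCP override variable are hypothetical; the plugin path and the -H / -p flags are the ones used in the transcript above.

```shell
#!/bin/sh
# Plugin path as seen on nagios1 in this bug; override CHECK_TCP for other layouts.
CHECK_TCP="${CHECK_TCP:-/usr/lib64/nagios/plugins/check_tcp}"

# check_proxxy_ports HOST... (hypothetical helper)
# Runs check_tcp against ports 80 and 443 on each host given.
# Returns 0 only if every individual check succeeds.
check_proxxy_ports() {
    rc=0
    for host in "$@"; do
        for port in 80 443; do
            "$CHECK_TCP" -H "$host" -p "$port" || rc=1
        done
    done
    return "$rc"
}
```

Calling check_proxxy_ports with the three proxxy1 host names listed earlier would run six checks in total. This is only a hand-run convenience, not a substitute for the Nagios service definitions.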
Assignee: nobody → bugspam.Callek
Status: NEW → ASSIGNED
Attachment #8598317 - Flags: review?(arich)
Comment on attachment 8598317: nagios.patch

Review of attachment 8598317:
-----------------------------------------------------------------

You don't need a servicegroup unless you're going to set up a cluster check (not really appropriate here). You care if any one individual host goes down, not a "pool" of them.

What you DO want to add are some basic system checks (disk, swap, load, ntp), though, similar to the hostgroup log-aggregator-servers in services.pp.

Since these are http/https checks, we actually want to use a better check than just a TCP connect (it gives a better indication of whether or not things are working, and we don't clutter up the error logs with disconnects with no request). There are http/https specific checks where we can check the return code against a specific page using check_http and check_https or check_http_expect and check_https_expect. The mozilla/services.pp file has lots of examples, and releng/services.pp has a couple of check_http_expect (to look for specific return codes).
Attachment #8598317 - Flags: review?(arich) → review-
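Run by hand, the kind of HTTP-level check the review asks for could look roughly like the following. This is a sketch only: the function name check_proxxy_http, the CHECK_HTTP variable, and the default expected string 200 are assumptions, and it is not the check_http_expect command definition from services.pp. The virtual host name and /build/ path come from the static URL suggested earlier in this bug, assuming $HOSTNAME there means the proxxy host's name. -I, -H, -u and -e are standard check_http options.

```shell
#!/bin/sh
# Plugin path mentioned in this bug; override CHECK_HTTP for other layouts.
CHECK_HTTP="${CHECK_HTTP:-/usr/lib64/nagios/plugins/check_http}"

# check_proxxy_http HOST [EXPECT] (hypothetical helper)
# Connects to HOST (-I), sends the tooltool virtual host name (-H),
# requests /build/ (-u), and requires EXPECT to appear in the first
# line of the server response (-e).
check_proxxy_http() {
    host=$1
    expect=${2:-200}   # assumed status; adjust to what the page really returns
    "$CHECK_HTTP" -I "$host" \
        -H "tooltool.pvt.build.mozilla.org.$host" \
        -u /build/ \
        -e "$expect"
}
```

Unlike a bare TCP connect, this sends a real request, so a proxy that accepts connections but serves errors would be reported as failing.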
So I learned <url>/heartbeat exists as a non-404 page I can point to...

:catlee So I want to use check_http @ localhost on proxxy servers. This of course requires the proxxy servers to have the nagios plugins installed; afaict they don't. Can you find someone who knows the proxxy setup/code to make this happen? /usr/lib64/nagios/plugins/check_http should exist. http://mxr.mozilla.org/build/source/puppet/modules/nrpe/ is a good launching point.

:arr I'm open to other suggestions/thoughts though.
Flags: needinfo?(catlee)
Flags: needinfo?(arich)
Looking into it more, nagios-plugins is installed via puppet, however we do

install_options => [ '--no-install-recommends' ];

And afaict nagios-plugins has

"Depends: nagios-plugins-basic, nagios-plugins-standard
Suggests: nagios3 | icinga, nagios-plugins-contrib"

But -basic and -standard are not installed:

[jwood@proxxy1 ~]$ dpkg --get-selections | grep nagios
nagios-nrpe-plugin install
nagios-nrpe-server install

Which likely explains the lack of check_http
err nagios-nrpe-plugin is different than nagios-plugins, so not sure how we should proceed here...
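Because grep nagios matches nagios-nrpe-plugin as readily as nagios-plugins, an exact-name match makes the difference easier to spot. A small sketch, assuming the two-column dpkg --get-selections format shown above; the function name has_selected_pkg is hypothetical, and on multiarch systems the package name may carry an architecture suffix, which this does not handle.

```shell
#!/bin/sh
# has_selected_pkg NAME (hypothetical helper)
# Reads `dpkg --get-selections` output on stdin and succeeds only if a
# package named exactly NAME is marked "install".
has_selected_pkg() {
    awk -v pkg="$1" '$1 == pkg && $2 == "install" { found = 1 } END { exit !found }'
}
```

For example, piping dpkg --get-selections into has_selected_pkg nagios-plugins would fail for the selections listed above, while has_selected_pkg nagios-nrpe-plugin would succeed.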
Looks to me like there was a mistake in the original addition. I think nagios-nrpe-plugin (which I believe is the plugin you want on the SERVER to perform remote checks) was meant to be nagios-plugins. And I bet we never noticed because we've never used nrpe on ubuntu because it's only been testslaves. So, step 1, fix puppet to install nagios-plugins and not nagios-nrpe-plugin.
Flags: needinfo?(arich)
Flags: needinfo?(catlee)
Assignee: bugspam.Callek → relops
Status: ASSIGNED → NEW
Component: Tools → RelOps: Puppet
Product: Release Engineering → Infrastructure & Operations
QA Contact: hwine → dustin
Version: unspecified → other
Patch to install the correct package for nagios plugins on ubuntu.
Attachment #8606461 - Flags: review?(bugspam.Callek)
Comment on attachment 8606461: nagios-plugins.diff

Review of attachment 8606461:
-----------------------------------------------------------------

sure
Attachment #8606461 - Flags: review?(bugspam.Callek) → review+
callek: the plugins are installed now, but I'm not sure what the "heartbeat" URL you refer to is. Can you give me an explicit static URL which should return a valid page?
Flags: needinfo?(bugspam.Callek)
Ah, it's http, not https, I've got it.
Flags: needinfo?(bugspam.Callek)
Attachment #8598317 - Attachment is obsolete: true
Host and service checks added. No swap check for the AWS nodes, since they seem to not have swap configured.
Assignee: relops → arich
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED