Bug 1075635
Opened 10 years ago
Closed 10 years ago
nagios checks for proxxy
Categories
(Infrastructure & Operations :: RelOps: Puppet, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: gmiroshnykov, Assigned: arich)
Attachments
(2 files, 1 obsolete file)
554 bytes, patch | Callek: review+
6.79 KB, patch
No description provided.
Comment 1•10 years ago
We need checks that each host is up and answering on ports 80 and 443.
Assignee: gmiroshnykov → nobody
Comment 2•10 years ago
Our current proxxy hosts are:
proxxy1.srv.releng.use1.mozilla.com 10.134.100.191
proxxy1.srv.releng.usw2.mozilla.com 10.132.100.234
proxxy1.srv.releng.scl3.mozilla.com 10.26.48.46
If we need to pick a static URL to check rather than just a port check, we could use this one:
http://tooltool.pvt.build.mozilla.org.$HOSTNAME/build/
Comment 3•10 years ago
Ashish confirmed that check_tcp works:
[ashish@nagios1.private.releng.scl3 ~]$ /usr/lib64/nagios/plugins/check_tcp -H proxxy1.srv.releng.scl3.mozilla.com -p 80
TCP OK - 0.002 second response time on port 80|time=0.001639s;;;0.000000;10.000000
[ashish@nagios1.private.releng.scl3 ~]$ /usr/lib64/nagios/plugins/check_tcp -H proxxy1.srv.releng.use1.mozilla.com -p 80
TCP OK - 0.079 second response time on port 80|time=0.079111s;;;0.000000;10.000000
[ashish@nagios1.private.releng.scl3 ~]$ /usr/lib64/nagios/plugins/check_tcp -H proxxy1.srv.releng.usw2.mozilla.com -p 80
TCP OK - 0.026 second response time on port 80|time=0.025532s;;;0.000000;10.000000
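Comment 1 asks for ports 80 and 443, while the transcript above only exercises port 80. As a rough sketch, the full host/port matrix could be generated as follows. The host names come from comment 2 and the plugin path from the transcript; the function name list_tcp_checks is hypothetical, and the script only prints the commands rather than running them.

```shell
#!/bin/sh
# Plugin path as seen in the transcript; hosts as listed in comment 2.
PLUGIN_DIR=/usr/lib64/nagios/plugins
HOSTS="proxxy1.srv.releng.use1.mozilla.com
proxxy1.srv.releng.usw2.mozilla.com
proxxy1.srv.releng.scl3.mozilla.com"
PORTS="80 443"

# Hypothetical helper: print one check_tcp command per host/port pair.
# This is a dry run; nothing is executed against the hosts.
list_tcp_checks() {
    for host in $HOSTS; do
        for port in $PORTS; do
            echo "$PLUGIN_DIR/check_tcp -H $host -p $port"
        done
    done
}

list_tcp_checks
```

Piping the output to sh on a Nagios server that has the plugins installed would run all six checks.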
Comment 4 (Assignee)•10 years ago
Comment on attachment 8598317 [details] [diff] [review]
nagios.patch
Review of attachment 8598317 [details] [diff] [review]:
-----------------------------------------------------------------
You don't need a servicegroup unless you're going to set up a cluster check (not really appropriate here). You care if any one individual host goes down, not a "pool" of them. What you DO want to add are some basic system checks (disk, swap, load, ntp), though, similar to the hostgroup log-aggregator-servers in services.pp.
Since these are HTTP/HTTPS services, we want a better check than a bare TCP connect: it gives a better indication of whether things are actually working, and it doesn't clutter the error logs with connections that disconnect without sending a request. There are HTTP/HTTPS-specific checks that test the return code for a specific page: check_http and check_https, or check_http_expect and check_https_expect. The mozilla/services.pp file has lots of examples, and releng/services.pp has a couple of check_http_expect uses (to look for specific return codes).
Attachment #8598317 -
Flags: review?(arich) → review-
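To illustrate the review comment, here is a minimal sketch of an HTTP-level check built on the stock check_http plugin and its -H, -u, -e and -S options. How check_http_expect is defined in services.pp is not shown in this bug, so this does not reproduce it. The helper name http_check_cmd, the virtual host and /build/ path (borrowed from comment 2) and the expected status 200 are assumptions for illustration only.

```shell
#!/bin/sh
PLUGIN_DIR=/usr/lib64/nagios/plugins

# Hypothetical helper: build a check_http command line that requires a given
# string in the HTTP status line of the response.
#   $1 = host name, $2 = URL path, $3 = expected status string (e.g. 200)
#   $4 = "ssl" to request an HTTPS check (adds -S)
http_check_cmd() {
    cmd="$PLUGIN_DIR/check_http -H $1 -u $2 -e $3"
    if [ "${4:-}" = "ssl" ]; then
        cmd="$cmd -S"
    fi
    echo "$cmd"
}

# Dry run only: print the command instead of contacting the host.
# The expected status 200 is an assumption, not taken from the bug.
http_check_cmd tooltool.pvt.build.mozilla.org.proxxy1.srv.releng.scl3.mozilla.com /build/ 200
```

Unlike check_tcp, a check of this kind sends a real request, so it fails when the port is open but the service returns an unexpected status.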
Comment 5•10 years ago
So I learned <url>/heartbeat exists as a non-404 page I can point to...
:catlee
So I want to use check_http @ localhost on proxxy servers.
This requires the proxxy servers to have the Nagios plugins installed, and as far as I can tell they don't. Can you find someone who knows the proxxy setup/code to make this happen?
/usr/lib64/nagios/plugins/check_http
should exist.
http://mxr.mozilla.org/build/source/puppet/modules/nrpe/ is a good launching point.
:arr I'm open to other suggestions/thoughts though.
Flags: needinfo?(catlee)
Flags: needinfo?(arich)
Comment 6•10 years ago
Looking into it more: nagios-plugins is installed via puppet, but we set install_options => [ '--no-install-recommends' ].
As far as I can tell, nagios-plugins has "Depends: nagios-plugins-basic, nagios-plugins-standard
Suggests: nagios3 | icinga, nagios-plugins-contrib"
But -basic and -standard are not installed:
[jwood@proxxy1 ~]$ dpkg --get-selections | grep nagios
nagios-nrpe-plugin install
nagios-nrpe-server install
That likely explains the lack of check_http.
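A quick way to confirm a missing plugin on a host is to test for the binary directly. The sketch below assumes nothing beyond POSIX sh; the function name plugin_present is hypothetical, and the default directory is the path quoted in comment 5. On Debian-family systems the plugins typically live under /usr/lib/nagios/plugins instead, so the directory is a parameter.

```shell
#!/bin/sh
# Hypothetical helper: report whether a Nagios plugin binary is present and
# executable.
#   $1 = plugin name
#   $2 = plugin directory (defaults to the path quoted in comment 5)
plugin_present() {
    dir="${2:-/usr/lib64/nagios/plugins}"
    if [ -x "$dir/$1" ]; then
        echo "present: $dir/$1"
        return 0
    fi
    echo "missing: $dir/$1"
    return 1
}

# Example: look for check_http in the default location.
plugin_present check_http || true
```

Running dpkg --get-selections, as in the transcript above, shows which package would have supplied the file.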
Comment 7•10 years ago
Err, nagios-nrpe-plugin is a different package from nagios-plugins, so I'm not sure how we should proceed here...
Comment 8 (Assignee)•10 years ago
Looks to me like there was a mistake in the original addition. I think nagios-nrpe-plugin (which I believe is the plugin you want on the SERVER to perform remote checks) was meant to be nagios-plugins. I bet we never noticed because we've never used NRPE on Ubuntu: the only Ubuntu hosts so far have been test slaves.
So, step 1, fix puppet to install nagios-plugins and not nagios-nrpe-plugin.
Flags: needinfo?(arich)
Updated•10 years ago
Flags: needinfo?(catlee)
Updated (Assignee)•10 years ago
Assignee: bugspam.Callek → relops
Status: ASSIGNED → NEW
Component: Tools → RelOps: Puppet
Product: Release Engineering → Infrastructure & Operations
QA Contact: hwine → dustin
Version: unspecified → other
Comment 9 (Assignee)•10 years ago
Patch to install the correct package for nagios plugins on ubuntu.
Attachment #8606461 -
Flags: review?(bugspam.Callek)
Comment 10•10 years ago
Comment on attachment 8606461 [details] [diff] [review]
nagios-plugins.diff
Review of attachment 8606461 [details] [diff] [review]:
-----------------------------------------------------------------
sure
Attachment #8606461 -
Flags: review?(bugspam.Callek) → review+
Comment 11 (Assignee)•10 years ago
Comment 12 (Assignee)•10 years ago
callek: the plugins are installed now, but I'm not sure what the "heartbeat" URL you refer to is. Can you give me an explicit static URL which should return a valid page?
Updated (Assignee)•10 years ago
Flags: needinfo?(bugspam.Callek)
Comment 13 (Assignee)•10 years ago
Ah, it's http, not https, I've got it.
Flags: needinfo?(bugspam.Callek)
Comment 14 (Assignee)•10 years ago
Attachment #8598317 -
Attachment is obsolete: true
Comment 15 (Assignee)•10 years ago
Host and service checks added.
No swap check for the AWS nodes, since they don't seem to have swap configured.
Assignee: relops → arich
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
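Whether a host has swap at all can be read from /proc/meminfo on Linux, which is one way to decide if a swap check makes sense for a node. The following is only a sketch: the function names swap_total_kb and has_swap are hypothetical, and nothing here reflects how the actual Nagios configuration for these hosts was written.

```shell
#!/bin/sh
# Hypothetical helper: print SwapTotal in kB from a meminfo-style file.
#   $1 = file to read (defaults to /proc/meminfo); prints 0 if no SwapTotal line
swap_total_kb() {
    awk '$1 == "SwapTotal:" { print $2; found = 1 }
         END { if (!found) print 0 }' "${1:-/proc/meminfo}"
}

# Hypothetical helper: succeed only if swap is configured, i.e. a swap check
# would have something to measure.
has_swap() {
    [ "$(swap_total_kb "${1:-/proc/meminfo}")" -gt 0 ]
}

# Report on the local host when /proc/meminfo is available (Linux only).
if [ -r /proc/meminfo ]; then
    if has_swap; then
        echo "swap configured: a swap check is meaningful"
    else
        echo "no swap configured: skip the swap check"
    fi
fi
```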