Closed
Bug 905616
Opened 11 years ago
Closed 10 years ago
Add redis health check to redis01.build.scl1
Categories
(mozilla.org Graveyard :: Server Operations, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: nthomas, Assigned: ashish)
Attachments
(3 files, 2 obsolete files)
There's a check on the redis process count, but nothing to check that redis is actually responsive. Please add a health check for responsiveness. If there's nothing already in the NRPE toolbox, then let's talk about writing something that telnets in or similar.
Assignee
Comment 1•11 years ago
AMO/SUMO redis have checks that do a TCP check against the corresponding redis port. It's trivial to set that up. Let me know whether that works and I can set that up in a jiffy :)
Assignee: server-ops → ashish
Reporter
Comment 2•11 years ago
Does that just try to open a socket? Bug 735252 indicates that may not be enough for our case.
Comment 3•11 years ago
(In reply to Nick Thomas [:nthomas] from comment #2)
> Does that just try to open a socket? Bug 735252 indicates that may not be
> enough for our case.
How else can we help here?
Reporter
Comment 4•11 years ago
A pointer to the code + config for what AMO/SUMO does would be great.
Comment 5•11 years ago
Needinfo'ing Jeremy, Jason and Ricky.
Flags: needinfo?(rrosario)
Flags: needinfo?(oremj)
Flags: needinfo?(jthomas)
Comment 6•11 years ago
And Ashish to see what the SUMO/AMO nagios configs are.
Flags: needinfo?(ashish)
Comment 7•11 years ago
On SUMO, we have the services monitor page: https://support.mozilla.org/services/monitor
For redis, it just connects and calls the EXISTS command, since that is cheap: http://redis.io/commands/exists
That should return 0 or 1 and not blow up.
Flags: needinfo?(rrosario)
Reporter
Comment 8•11 years ago
Ok. We don't already have a predictable key to call EXISTS on, so I suggest
* do a 'SET nagios:<timestamp> "nagios woz here" EX 1', then do an EXISTS on that
* use PING, look for PONG response
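The PING suggestion above can be sketched without any client library. This is a minimal Python 3 illustration, not the script later attached to this bug; the function names `encode_command`, `read_reply_line` and `ping` are hypothetical. It assumes the server speaks the standard Redis request protocol, where PING is answered with the status line `+PONG`.

```python
import socket


def encode_command(*args):
    """Encode a command as a Redis protocol array of bulk strings."""
    parts = ["*%d\r\n" % len(args)]
    for arg in args:
        data = str(arg)
        # Bulk string length is counted in bytes, not characters.
        parts.append("$%d\r\n%s\r\n" % (len(data.encode("utf-8")), data))
    return "".join(parts).encode("utf-8")


def read_reply_line(sock):
    """Read one CRLF-terminated reply line, e.g. b'+PONG' or b':1'."""
    buf = b""
    while not buf.endswith(b"\r\n"):
        chunk = sock.recv(1)
        if not chunk:  # peer closed the connection early
            break
        buf += chunk
    return buf.rstrip(b"\r\n")


def ping(host, port, timeout=5.0):
    """Return True if the server answers PING with +PONG (hypothetical helper)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(encode_command("PING"))
        return read_reply_line(sock) == b"+PONG"
```

Unlike a bare TCP connect, `ping` only succeeds if the server actually parses a command and replies, which is the distinction comment 2 is asking about.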
Comment 9•11 years ago
Here's the zamboni redis check: https://github.com/mozilla/zamboni/blob/master/apps/amo/monitors.py#L153
It just runs the info command and returns OK if it succeeds.
Flags: needinfo?(oremj)
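For illustration, the "run INFO and return OK if it succeeds" approach could look roughly like the sketch below. This is not the zamboni code. It assumes a hypothetical `client` object exposing an `info()` method that returns a dict (redis-py's `Redis.info()` behaves this way), and it maps the result to the usual Nagios exit codes of 0 for OK and 2 for CRITICAL.

```python
OK, CRITICAL = 0, 2  # standard Nagios plugin exit codes


def check_info(client):
    """Return (status, message) depending on whether INFO succeeds.

    `client` is an assumption: any object with an info() method
    that returns a dict of server fields.
    """
    try:
        info = client.info()
    except Exception as exc:  # any failure means the server is unhealthy
        return CRITICAL, "CRITICAL: INFO failed: %s" % exc
    return OK, "OK: INFO returned %d fields" % len(info)
```

Passing the client in keeps the check testable with a stub, so it can be exercised without a live redis instance.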
Updated•11 years ago
Flags: needinfo?(jthomas)
Comment 10•11 years ago
Hal, Nick: Seems like we don't have a ready-made script for this. The AMO one is part of the monitor webapp that they use for their checks. If someone can whip up a script, we'd be happy to hook it up to nagios. CC'ing Rob to see if this is something he can pick up, though I'm not sure he has the time.
Flags: needinfo?(nthomas)
Flags: needinfo?(hwine)
Flags: needinfo?(ashish)
Comment 11•11 years ago
I should be able to whip this up soon. Can someone point me at a dev redis instance that I can use for testing?
Comment 12•11 years ago
I set up a local instance of redis to play around with. I wrote a check script that should be configurable to do the simple checks we need. Attaching it now.
Comment 13•11 years ago
Simple linear redis check script.
Updated•11 years ago
Attachment #8338808 -
Attachment mime type: text/x-python-script → text/plain
Comment 14•11 years ago
Updated with proper exit(0) and output text.
Attachment #8338808 -
Attachment is obsolete: true
Comment 15•11 years ago
Rob - cool. Thanks! Nick - does this look like it'll do the job? Shyam, is this to be an IT plugin, or releng only? If the latter, we'll drop the script in http://hg.mozilla.org/build/nagios-tools/ to get it wrapped for NRPE.
Flags: needinfo?(hwine) → needinfo?(shyam)
Comment 16•11 years ago
Hal, I'll let Ashish decide. I think we can use it in other places too. I don't see why it can't be shared...
Flags: needinfo?(shyam)
Reporter
Comment 17•11 years ago
This ran fine against redis01.build.mozilla.org.
> 'statement': "set nagios:%s foo" % this_second,
> 'response' : 'OK'
It would be good to set an expiry on this, given that the cleanup doesn't get run if EXISTS fails, e.g.
"SETEX nagios:%s 60 foo" % this_second,
for a 60 second expiry. We can't use the 'SET ... EX' form from comment 8 because our redis doesn't support it.
Flags: needinfo?(nthomas)
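To make the SETEX suggestion concrete, here is a hedged Python 3 sketch of a statement table using the same 'statement' and 'response' keys quoted above. The attached script's real structure is not shown in this bug, so `build_statements`, `normalize_reply`, the DEL cleanup step and the expected values for EXISTS and DEL are assumptions made for illustration.

```python
import time


def build_statements(this_second=None, ttl=60):
    """Build the check sequence; the key expires on its own after `ttl` seconds."""
    if this_second is None:
        this_second = int(time.time())
    key = "nagios:%s" % this_second
    return [
        # SETEX key seconds value: set and expire in one command.
        {"statement": "SETEX %s %d foo" % (key, ttl), "response": "OK"},
        {"statement": "EXISTS %s" % key, "response": "1"},
        # Cleanup; if this step is never reached, the expiry removes the key.
        {"statement": "DEL %s" % key, "response": "1"},
    ]


def normalize_reply(raw):
    """Strip CRLF and the leading '+' or ':' type marker from a reply line."""
    line = raw.strip()
    if line[:1] in ("+", ":"):
        return line[1:]
    return line
```

Error replies (which start with '-') are deliberately left intact by `normalize_reply`, so they never match an expected 'OK' or '1'.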
Assignee
Comment 18•11 years ago
(In reply to Shyam Mani [:fox2mike] from comment #16)
> Hal, I'll let Ashish decide. I think we can use it in other places too. I
> don't see why it can't be shared...
I would have this script shared so that it can be used for other redis instances as well.
Comment 19•11 years ago
Here is a new version of the check script that allows a sleep interval to be set, so that we can confirm that keys have expired.
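One way a sleep interval might be used to confirm expiry is sketched below in Python 3. This is an assumption about the approach, not the attached script: `expiry_steps` and `run_steps` are hypothetical names, and `send` stands for any callable that sends a statement and returns the normalized reply as a string.

```python
import time


def expiry_steps(key, ttl, wait):
    """Set a short-lived key, see it exist, wait, then expect it to be gone."""
    return [
        {"statement": "SETEX %s %d foo" % (key, ttl), "response": "OK", "sleep": 0},
        {"statement": "EXISTS %s" % key, "response": "1", "sleep": wait},
        {"statement": "EXISTS %s" % key, "response": "0", "sleep": 0},
    ]


def run_steps(send, steps, sleep=time.sleep):
    """Run each step in order; return (ok, message) at the first mismatch."""
    for step in steps:
        reply = send(step["statement"])
        if reply != step["response"]:
            return False, "%r returned %r, expected %r" % (
                step["statement"], reply, step["response"])
        if step["sleep"]:
            sleep(step["sleep"])  # give the key time to expire
    return True, "all %d steps passed" % len(steps)
```

The wait has to be longer than the TTL for the final EXISTS to return 0, so a check built this way runs for at least that long on every invocation.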
Comment 20•11 years ago
Added optparse to pass in host and port via -H and -P respectively.
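The -H and -P options described here could be wired up with optparse roughly as follows. This is a sketch rather than the attached script: the long option names, the `parse_args` wrapper and the defaults of localhost and 6379 (the stock redis port) are assumptions.

```python
from optparse import OptionParser


def parse_args(argv):
    """Parse -H/--host and -P/--port from an argument list."""
    parser = OptionParser(usage="%prog [-H host] [-P port]")
    parser.add_option("-H", "--host", dest="host", default="localhost",
                      help="redis host to check (default: %default)")
    parser.add_option("-P", "--port", dest="port", type="int", default=6379,
                      help="redis port to check (default: %default)")
    options, _args = parser.parse_args(argv)
    return options
```

For example, `parse_args(["-H", "redis01.build.mozilla.org", "-P", "6379"])` returns an options object with `host` and `port` attributes. optparse treats `-H` and the built-in `-h` help flag as distinct options.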
Assignee
Comment 21•11 years ago
:nthomas Can you verify the script in Comment 20? If this looks good, I shall import it into NRPE/Nagios. Thanks!
Flags: needinfo?(nthomas)
Reporter
Comment 22•11 years ago
It works fine. I would suggest these though:
* making the Debug variable default to off and have a -v argument to swap that
* s/set/SET/g in the statements definitions
* for debugging, when the output doesn't match the expected response print out the actual response
Flags: needinfo?(nthomas)
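The first and third suggestions above might be implemented along these lines. This Python 3 sketch is illustrative only; `parse_verbosity` and `check_reply` are hypothetical names, and the way the attached script stores its Debug variable is not shown in this bug.

```python
import sys
from optparse import OptionParser


def parse_verbosity(argv):
    """Debug output is off unless -v is given."""
    parser = OptionParser()
    parser.add_option("-v", "--verbose", action="store_true", dest="debug",
                      default=False, help="print each statement and reply")
    options, _args = parser.parse_args(argv)
    return options.debug


def check_reply(statement, expected, actual, debug=False, out=sys.stderr):
    """Compare a reply with the expected one; report the actual reply on mismatch."""
    if debug:
        out.write("sent %r, got %r\n" % (statement, actual))
    if actual == expected:
        return True, "OK"
    return False, "%s: expected %r but got %r" % (statement, expected, actual)
```

Including the actual reply in the failure message means it ends up in the Nagios service output even when -v is not set.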
Comment 23•11 years ago
Added requested features from nthomas
Attachment #8339283 -
Attachment is obsolete: true
Reporter
Comment 24•11 years ago
Works fine against redis01.build.mozilla.org. All set to go ahead with installing and using this?
Assignee
Comment 25•11 years ago
Installed this: https://nagios.mozilla.org/releng-scl3/cgi-bin/extinfo.cgi?type=2&host=redis01.build.scl1.mozilla.com&service=redis
As a last thought, could the plugin have a timeout, since it isn't run via NRPE? Nagios' timeout is much longer (close to 180s) and it would be nice to have "-t <timeout seconds>" in the plugin itself.
Status: NEW → ASSIGNED
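A plugin-side timeout of the kind requested in comment 25 could be sketched as below. This is a hypothetical Python 3 illustration, not a change that was made to the attached script: `check_with_timeout` is an invented name, the timeout value would come from a "-t" option, and the check relies on redis accepting PING as an inline command.

```python
import socket

OK, CRITICAL = 0, 2  # standard Nagios plugin exit codes


def check_with_timeout(host, port, timeout):
    """PING redis, giving up after `timeout` seconds on connect or read."""
    try:
        # The timeout passed here applies to the connect and to later recv calls.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"PING\r\n")
            reply = sock.recv(64)
    except socket.timeout:
        return CRITICAL, "CRITICAL: no reply from %s:%d within %ss" % (
            host, port, timeout)
    except OSError as exc:  # refused, unreachable, DNS failure, ...
        return CRITICAL, "CRITICAL: %s" % exc
    if reply.startswith(b"+PONG"):
        return OK, "OK: redis replied PONG"
    return CRITICAL, "CRITICAL: unexpected reply %r" % reply
```

A caller would print the message and exit with the returned status, so a hung redis is reported as CRITICAL after the plugin's own timeout rather than after Nagios' much longer one.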
Assignee
Updated•10 years ago
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Updated•9 years ago
Product: mozilla.org → mozilla.org Graveyard