Closed
Bug 709108
Opened 13 years ago
Closed 13 years ago
nagios checks for the signing servers
Categories: Infrastructure & Operations :: RelOps: General, task
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: bhearsum, Assigned: arich
Attachments (1 file): patch, 777 bytes (catlee: review+)
Now that things have settled down in signing server land, I think we're ready to add nagios checks for the server processes. Each server has two instances, each of which runs as 3 processes in total because of forking. So each machine should have 6 processes running where one of the arguments is "tools/release/signing/signing-server.py". I tried to verify this locally, but Nagios seems to think there's an extra one:

./check_nrpe -H localhost -c check_procs_regex -a tools/release/signing/signing-server.py 6 6
PROCS CRITICAL: 7 processes with regex args 'tools/release/signing/signing-server.py'

[cltsign@signing1 plugins]$ ps auxwww | grep tools/release/signing/signing-server.py
cltsign 1469 0.0 0.0 61176 752 pts/0 R+ 08:57 0:00 grep tools/release/signing/signing-server.py
cltsign 31198 0.0 0.2 199488 10756 ? Sl 07:19 0:00 bin/python tools/release/signing/signing-server.py signing.ini -v -l signing.log -d
cltsign 31199 0.0 0.2 178340 9580 ? S 07:19 0:00 bin/python tools/release/signing/signing-server.py signing.ini -v -l signing.log -d
cltsign 31200 0.0 0.2 178340 9580 ? S 07:19 0:00 bin/python tools/release/signing/signing-server.py signing.ini -v -l signing.log -d
cltsign 31225 0.0 0.3 201112 13352 ? Sl 07:20 0:03 bin/python tools/release/signing/signing-server.py signing.ini -v -l signing.log -d --restart
cltsign 31226 2.9 0.2 179324 10740 ? S 07:20 2:51 bin/python tools/release/signing/signing-server.py signing.ini -v -l signing.log -d --restart
cltsign 31227 3.0 0.2 179328 10740 ? S 07:20 2:53 bin/python tools/release/signing/signing-server.py signing.ini -v -l signing.log -d --restart

(I wouldn't expect the grep one to show up through check_procs.)
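A small Python sketch of what a regex-on-arguments count does with the `ps auxwww` listing above. This is plain Python over the pasted argument strings, not check_procs itself. It only shows that a search for the script path matches every command line that contains it, including the `grep` that produced the listing. It also writes the expected total as 2 instances × 3 processes. Escaping the path with `re.escape` is a choice made here so that the dots match literally; the original check passes the path as an unescaped regex.

```python
import re

# Command lines copied from the `ps auxwww` listing above: the grep that
# produced the listing, then the two signing-server.py instances (3 processes each).
BASE = "bin/python tools/release/signing/signing-server.py signing.ini -v -l signing.log -d"
PS_ARGS = (["grep tools/release/signing/signing-server.py"]
           + [BASE] * 3
           + [BASE + " --restart"] * 3)

def count_matching(args_list, pattern):
    """Count command lines in which the regex `pattern` matches anywhere."""
    regex = re.compile(pattern)
    return sum(1 for args in args_list if regex.search(args))

PATTERN = re.escape("tools/release/signing/signing-server.py")
EXPECTED = 2 * 3  # two instances per machine, three processes each after forking

total = count_matching(PS_ARGS, PATTERN)
servers_only = count_matching([a for a in PS_ARGS if not a.startswith("grep ")], PATTERN)
print(total, servers_only, EXPECTED)  # 7 6 6
```

Whether the grep line explains the 7 that check_nrpe reported is not established in this bug. The sketch only shows that such a line would match.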
Comment 1•13 years ago
Hm, I'm not sure that counting # of processes is going to be a good thing to do since the # of worker processes can vary. Can we check the process referred to by the .pid file instead?
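A minimal sketch of the pidfile approach, assuming each server instance writes its PID as plain text to a pidfile. The thread doesn't name the pidfile path, so the `signing.pid` name used in testing is hypothetical. With two instances per machine, you would run it once per pidfile. It follows the standard Nagios plugin exit codes (0 OK, 2 CRITICAL, 3 UNKNOWN).

```python
import os

# Standard Nagios plugin exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3

def check_pidfile(path):
    """Return (exit_code, message) for the process whose PID is stored in `path`."""
    try:
        with open(path) as f:
            pid = int(f.read().strip())
    except FileNotFoundError:
        return CRITICAL, f"PIDFILE CRITICAL: {path} not found"
    except ValueError:
        return UNKNOWN, f"PIDFILE UNKNOWN: {path} does not contain a PID"
    if pid <= 0:
        # os.kill(0, ...) would target the whole process group, so reject it.
        return UNKNOWN, f"PIDFILE UNKNOWN: {path} contains invalid PID {pid}"
    try:
        # Signal 0 delivers nothing; it only checks that the PID exists.
        os.kill(pid, 0)
    except ProcessLookupError:
        return CRITICAL, f"PIDFILE CRITICAL: PID {pid} from {path} is not running"
    except PermissionError:
        pass  # the process exists but is owned by another user
    return OK, f"PIDFILE OK: PID {pid} is running"
```

A wrapper script would print the message and `sys.exit()` with the code. One caveat: signal 0 only proves that *some* process holds that PID, which could have been reused after a crash. A stricter version could also compare `/proc/<pid>/cmdline` against the script path.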
Assignee
Comment 2•13 years ago
You can tell nagios to check for a min and a max range of processes, but check_procs can't determine the number to check for from an external source. I had already set the check for 6 as bhearsum requested, but I can change that.
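A simplified Python re-implementation of the Nagios plugin threshold range syntax (`N`, `N:`, `~:N`, `N:M`, `@N:M`), which is the kind of min/max range check_procs accepts for `-w` and `-c`. This is not the plugin's own code, and the messages only mimic the format quoted in this bug. The sketch assumes the "6 6" in the original check end up as warning and critical ranges of `6:6`. Under that assumption, a count of 7 comes out CRITICAL, as reported.

```python
import math

def parse_range(spec):
    """Parse a Nagios threshold range into (low, high, alert_inside)."""
    inside = spec.startswith("@")
    if inside:
        spec = spec[1:]
    if ":" in spec:
        low_s, high_s = spec.split(":", 1)
        low = -math.inf if low_s == "~" else float(low_s or 0)
        high = math.inf if high_s == "" else float(high_s)
    else:
        low, high = 0.0, float(spec)  # plain "N" means 0..N
    return low, high, inside

def breaches(value, spec):
    """True if `value` should trigger an alert for range `spec`."""
    low, high, inside = parse_range(spec)
    within = low <= value <= high
    return within if inside else not within

def procs_status(count, warn, crit):
    """Return (exit_code, message), checking the critical range first."""
    if breaches(count, crit):
        return 2, f"PROCS CRITICAL: {count} processes"
    if breaches(count, warn):
        return 1, f"PROCS WARNING: {count} processes"
    return 0, f"PROCS OK: {count} processes"

print(procs_status(7, "6:6", "6:6"))  # (2, 'PROCS CRITICAL: 7 processes')
```

As the comment says, the ranges are fixed in the config. Nothing here can adjust them to a varying number of workers.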
Assignee
Updated•13 years ago
Assignee: server-ops-releng → arich
Reporter
Comment 3•13 years ago
Looks like we can use the "-p" flag to only find the root processes:

-p, --ppid=PPID
    Only scan for children of the parent process ID indicated.

[cltsign@signing1 plugins]$ ./check_procs -p 1 --ereg-argument-array tools/release/signing/signing-server.py
PROCS OK: 2 processes with PPID = 1, regex args 'tools/release/signing/signing-server.py'

We'll probably need to add a new command in the config, since we use check_procs_regex on Buildbot masters and elsewhere, IIRC. How does this sound to you two?
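A Linux-only Python sketch of the same filter: keep processes whose parent PID is 1 and whose arguments match the regex, reading `/proc` directly. It is not how check_procs is implemented. It assumes the two top-level daemonized servers have been reparented to init while their forked workers are their children, which is what the `PROCS OK: 2 processes` result above suggests. The `proc_root` parameter exists only so the function can be pointed at a test tree.

```python
import os
import re

def list_procs(proc_root="/proc"):
    """Yield (pid, ppid, args) for each numeric entry under proc_root (Linux layout)."""
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue
        base = os.path.join(proc_root, name)
        try:
            with open(os.path.join(base, "stat")) as f:
                stat = f.read()
            with open(os.path.join(base, "cmdline"), "rb") as f:
                cmdline = f.read()
        except OSError:
            continue  # process exited while we were scanning
        # Field 2 (comm) is in parentheses and may contain spaces, so split after
        # the last ')': the remaining fields start with state, then ppid.
        ppid = int(stat.rsplit(")", 1)[1].split()[1])
        # cmdline is NUL-separated; join the arguments with spaces for matching.
        args = " ".join(a.decode(errors="replace") for a in cmdline.split(b"\0") if a)
        yield int(name), ppid, args

def count_children(pattern, ppid, proc_root="/proc"):
    """Count processes with parent `ppid` whose arguments match regex `pattern`."""
    regex = re.compile(pattern)
    return sum(1 for _, parent, args in list_procs(proc_root)
               if parent == ppid and regex.search(args))
```

On a signing server, `count_children(re.escape("tools/release/signing/signing-server.py"), 1)` should then give one per instance, whatever the number of workers.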
Assignee
Comment 4•13 years ago
Okay, based on our IRC conversation:

define command{
    command_name check_nrpe_child_procs_regex
    command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_child_procs_regex -a $ARG1$ $ARG2$ $ARG3$ $ARG4$
}

child_procs_regex&contact_groups build:signing-server::$signing-servers:tools/release/signing/signing-server.py!1!2!2

This matches up with the client-side definition you posted in IRC.
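A Python sketch of how Nagios-style `$MACRO$` substitution turns the `define command` above into the actual check_nrpe call. It assumes the trailing `!`-separated values in the service line are passed as `$ARG1$` through `$ARG4$` of `check_nrpe_child_procs_regex`. What those values mean on the client side is set by the check_child_procs_regex command in nrpe.cfg, which isn't reproduced here. The `USER1` and `HOSTADDRESS` values are hypothetical placeholders.

```python
import re

# command_line from the `define command` above.
COMMAND_LINE = ("$USER1$/check_nrpe -H $HOSTADDRESS$ -c check_child_procs_regex "
                "-a $ARG1$ $ARG2$ $ARG3$ $ARG4$")

def expand_check(command_line, check_command, macros):
    """Split a `name!arg1!arg2...` check string and substitute $MACRO$ references."""
    name, *args = check_command.split("!")
    values = dict(macros)
    for i, arg in enumerate(args, start=1):
        values[f"ARG{i}"] = arg
    # Single pass, so substituted values are never re-expanded;
    # unknown macros are left as they are.
    expanded = re.sub(r"\$([A-Z0-9_]+)\$",
                      lambda m: values.get(m.group(1), m.group(0)),
                      command_line)
    return name, expanded

# Hypothetical macro values, for illustration only.
MACROS = {"USER1": "/usr/lib/nagios/plugins", "HOSTADDRESS": "192.0.2.10"}
name, cmd = expand_check(
    COMMAND_LINE,
    "check_nrpe_child_procs_regex!tools/release/signing/signing-server.py!1!2!2",
    MACROS,
)
print(cmd)
```

With these placeholders, the server-side call ends with `-a tools/release/signing/signing-server.py 1 2 2`, and NRPE hands those four arguments to the client-side command.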
Reporter
Comment 5•13 years ago
Attachment #581273 - Flags: review?(catlee)
Updated•13 years ago
Attachment #581273 - Flags: review?(catlee) → review+
Reporter
Updated•13 years ago
Attachment #581273 - Attachment description: update nrpe.cfg template with check_child_procs_regex command → [checked in] update nrpe.cfg template with check_child_procs_regex command
Reporter
Comment 6•13 years ago
Both signing servers picked up the change, and the checks are now green. I think we're all done here?
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Updated•11 years ago
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations