Closed
Bug 669229
Opened 13 years ago
Closed 13 years ago
Race condition in puppet nagios config
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: nthomas, Assigned: catlee)
References
Details
(Whiteboard: [puppet])
The nrpe.cfg changes in bug 656413 led to a lot of alerts like this:
moz2-linux-slave07.build.sjc1:disk - /var is CRITICAL: NRPE: Command check_disk not defined
moz2-linux-slave07.build.sjc1:disk - / is CRITICAL: NRPE: Command check_disk not defined
moz2-linux-slave07.build.sjc1:disk - /builds is CRITICAL: NRPE: Command check_disk not defined
moz2-linux-slave07.build.sjc1:buildbot is CRITICAL: NRPE: Command check_buildbot not defined
Only some linux32/linux64 hosts are affected so far, and a 'service nrpe restart' or a reboot fixes it.
The template for /etc/nagios/nrpe.cfg changed in
http://hg.mozilla.org/build/puppet-manifests/rev/369888bba343
Turns out it's a race condition (moz2-linux-slave07 again):
Jul 4 13:32:27 moz2-linux-slave07 puppetd[2189]: Starting catalog run
...
Jul 4 13:32:36 moz2-linux-slave07 puppetd[2189]: (//Node[moz2-linux-slave07]/buildslave/nagios/nagios::service/File[/etc/nagios/nrpe.cfg]/content) content changed '{md5}74e04c65fcd07eca040415ea87ab1449' to '{md5}ced55880e90e540b14c576892d3554e6'
Jul 4 13:32:36 moz2-linux-slave07 puppetd[2189]: (//Node[moz2-linux-slave07]/buildslave/nagios/nagios::service/Service[nrpe]) Triggering 'refresh' from 1 dependencies
We got the new nrpe.cfg and restarted the service ...
Jul 4 13:32:37 moz2-linux-slave07 nrpe[2027]: Caught SIGTERM - shutting down...
Jul 4 13:32:37 moz2-linux-slave07 nrpe[2027]: Cannot remove pidfile '/var/run/nrpe.pid' - check your privileges.
Jul 4 13:32:37 moz2-linux-slave07 nrpe[2027]: Daemon shutdown
Jul 4 13:32:37 moz2-linux-slave07 nrpe[2603]: Could not open config directory '/etc/nagios/nrpe.d' for reading.
... but didn't create /etc/nagios/nrpe.d yet ...
Jul 4 13:32:37 moz2-linux-slave07 nrpe[2604]: Starting up daemon
Jul 4 13:32:37 moz2-linux-slave07 nrpe[2604]: Warning: Daemon is configured to accept command arguments from clients!
Jul 4 13:32:37 moz2-linux-slave07 nrpe[2604]: Listening for connections on port 5666
Jul 4 13:32:37 moz2-linux-slave07 nrpe[2604]: Allowing connections from: <redacted>
... here it goes ...
Jul 4 13:32:37 moz2-linux-slave07 puppetd[2189]: (//Node[moz2-linux-slave07]/buildslave/nagios/nagios::service/File[/etc/nagios/nrpe.d]/ensure) created
...
Jul 4 13:32:51 moz2-linux-slave07 puppetd[2189]: Finished catalog run in 24.49 seconds
Puppet bug? Puppet config bug?
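For illustration only: if this is a config bug, one common way to close this kind of race is to declare the ordering explicitly, so that /etc/nagios/nrpe.d exists before nrpe.cfg is written and before nrpe is refreshed. The sketch below is an assumption, not the contents of puppet-manifests. The resource titles come from the log above, while the template path, ownership, and mode are hypothetical.

```puppet
# Hypothetical sketch; resource titles are taken from the log, everything
# else (template path, owner/group/mode) is assumed for illustration.

# The include directory that the new nrpe.cfg points at.
file { "/etc/nagios/nrpe.d":
    ensure => directory,
    owner  => "root",
    group  => "root",
    mode   => "0755",
}

# Write nrpe.cfg only once the directory exists, then refresh nrpe.
file { "/etc/nagios/nrpe.cfg":
    content => template("nagios/nrpe.cfg.erb"),  # assumed template name
    require => File["/etc/nagios/nrpe.d"],
    notify  => Service["nrpe"],
}

# Also order the service itself after the directory, so a refresh
# cannot run before /etc/nagios/nrpe.d has been created.
service { "nrpe":
    ensure  => running,
    require => File["/etc/nagios/nrpe.d"],
}
```

With the `require` edges in place, Puppet should apply File[/etc/nagios/nrpe.d] before both the config file and the service, instead of choosing an arbitrary order between resources that have no declared relationship.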
Comment 1 (Assignee) • 13 years ago
Landed http://hg.mozilla.org/build/puppet-manifests/rev/c6647469e99c to see if it helps.
Comment 2 (Assignee) • 13 years ago
I think this is fixed now.
Assignee: nobody → catlee
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Updated • 11 years ago
Product: mozilla.org → Release Engineering