Closed Bug 785270 Opened 12 years ago Closed 12 years ago

nagios bots not updating downtime

Categories

(mozilla.org Graveyard :: Server Operations, task)

x86_64
FreeBSD
task
Not set
normal

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: afernandez, Assigned: rtucker)

Details

Hello,

It seems the nagios bots are not updating the Nagios GUI with the correct "downtime" message.
Instead of the intended message, a generic one is used:

Entry Time: 08-23-2012 17:06:33	
Author: (Nagios Process)
Comment: This service has been scheduled for fixed downtime from 08-23-2012 17:06:33 to 08-23-2012 17:07:33. Notifications for the service will not be sent out during that time period.

The message actually sent:
nagios-phx1: downtime support4.webapp.phx1.mozilla.com:HP Log 1m testing downtime

Acknowledging, however, does work.
Looks like the comment field is missing the \n at the end. I added that and will test/confirm/close this out.
Just tested with:
nagios-phx1: downtime support4.webapp.phx1.mozilla.com:HP Log 1m testing downtime

It still does not seem to be working:
Entry Time: 08-24-2012 09:32:59	
Author: (Nagios Process)	
Comment: This service has been scheduled for fixed downtime from 08-24-2012 09:32:47 to 08-24-2012 09:33:47. Notifications for the service will not be sent out during that time period.

Thanks
Hmm, it looks like one of two things:
1. The nagios user permissions are messed up and it doesn't know who you are.
2. The command file spec is now different than it used to be.
Aj,
I think that this is some kind of nagios configuration issue.
I've created a new test to confirm the syntax that I'm writing to the file and it follows the spec exactly.

https://github.com/rtucker-mozilla/mozilla-nagios-bot/commit/8d8fe800a2dcf316a34f13d3f420a0d1da45f799#L2R94

I'm replacing the integer timestamps for testability, but you can see what I'm getting at with the test.

According to the docs the command needs to follow this format:
"[%lu] SCHEDULE_HOST_DOWNTIME;host1;1110741500;1110748700;0;0;7200;Some One;Some Downtime Comment\n"

My test confirms the message is as follows:
self.assertEqual(cmd, '[000000] SCHEDULE_HOST_DOWNTIME;test-host.fake.mozilla.com;1234;5678;2;0;60;%s;blah blah\n' % (self.my_nick ))

Which using variable expansion turns into:
"[000000] SCHEDULE_HOST_DOWNTIME;test-host.fake.mozilla.com;1234;5678;2;0;60;rtucker;blah blah\n"
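For reference, here is a minimal sketch of how a command line in that format could be assembled. It is not the bot's actual code: the function name `build_host_downtime_command` and its parameters are hypothetical, and the field order is taken from the format string quoted above.

```python
import time


def build_host_downtime_command(host, start, end, author, comment,
                                fixed=1, trigger_id=0, duration=0, now=None):
    """Return one SCHEDULE_HOST_DOWNTIME line for the Nagios command file.

    Hypothetical helper for illustration only. Field order follows the
    format quoted in this bug:
    [time] SCHEDULE_HOST_DOWNTIME;host;start;end;fixed;trigger_id;duration;author;comment
    """
    if now is None:
        now = int(time.time())
    # A newline inside the comment would end the command early, so flatten it.
    comment = " ".join(str(comment).splitlines())
    fields = [host, int(start), int(end), int(fixed),
              int(trigger_id), int(duration), author, comment]
    body = ";".join(str(field) for field in fields)
    # The trailing newline terminates the command.
    return "[%d] SCHEDULE_HOST_DOWNTIME;%s\n" % (now, body)


if __name__ == "__main__":
    print(build_host_downtime_command(
        "test-host.fake.mozilla.com", 1234, 5678, "rtucker", "blah blah",
        fixed=2, duration=60, now=0), end="")
```

Passing `now` explicitly keeps the output deterministic, which is the same reason the test above swaps out the integer timestamps.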
I did some further investigating on this. Nagios is not honoring either the comment or the user who is setting the downtime. I've confirmed this through both the web interface as well as from a command line script.

Example written to nagios cmd file:
[1346856100] SCHEDULE_HOST_DOWNTIME;host.here.mozilla.com;1346856100;1346856200;1;0;60;rtucker@mozilla.com;Some Downtime Comment
SCHEDULE_HOST_DOWNTIME;host.here.mozilla.com;1346856100;1346856200;1;0;60;rtucker;Some Downtime Comment
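A command-line script of the kind mentioned above might look roughly like the sketch below. The helper name `submit_external_command` is hypothetical, and the command file path is an assumption that depends on the local Nagios install (the `command_file` setting in the main configuration), so it is passed in rather than hard-coded.

```python
import sys


def submit_external_command(command, command_file):
    """Write a single external command line to the Nagios command file.

    Hypothetical helper for illustration. `command_file` is normally a
    named pipe whose location is site-specific.
    """
    # Make sure the line is newline-terminated, as discussed earlier in this bug.
    if not command.endswith("\n"):
        command += "\n"
    with open(command_file, "w") as handle:
        handle.write(command)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: script.py <command_file> "<command line>"
    submit_external_command(sys.argv[2], sys.argv[1])
```

This only delivers the line to the command file. It does not explain why Nagios would ignore the author and comment fields.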
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → INVALID
Product: mozilla.org → mozilla.org Graveyard