nagios bots not updating downtime

RESOLVED INVALID

Status

mozilla.org Graveyard
Server Operations
RESOLVED INVALID
6 years ago
3 years ago

People

(Reporter: Aj, Assigned: rtucker)

(Reporter)

Description

6 years ago
Hello,

It seems the nagios bots are not updating the Nagios GUI with the correct "downtime" message.
Instead of the intended message, a generic one is used:

Entry Time: 08-23-2012 17:06:33	
Author: (Nagios Process)
Comment: This service has been scheduled for fixed downtime from 08-23-2012 17:06:33 to 08-23-2012 17:07:33. Notifications for the service will not be sent out during that time period.

The actual message sent to the bot:
nagios-phx1: downtime support4.webapp.phx1.mozilla.com:HP Log 1m testing downtime

Acknowledging, however, does work.
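
For reference, the request above appears to carry a host, an optional service, a duration and a free-text comment. The following is only a rough sketch of how such a line could be split up; the grammar, the unit letters (s, m, h, d) and the name `parse_downtime_request` are assumptions made for illustration and are not taken from the bot's source. It does agree with the Nagios entry above, where "1m" produced a 60-second window.

```python
import re

# Hypothetical unit table; only "m" is actually seen in this report.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

# Assumed shape: [botnick: ]downtime host[:service] <N><unit> <comment>
REQUEST_RE = re.compile(
    r"^(?:\S+:\s+)?downtime\s+"            # optional "nagios-phx1: " prefix
    r"(?P<host>[^:\s]+)"                   # host name, up to ':' or whitespace
    r"(?::(?P<service>.+?))?"              # optional service, may contain spaces
    r"\s+(?P<amount>\d+)(?P<unit>[smhd])"  # duration such as "1m"
    r"\s+(?P<comment>.+)$"                 # everything else is the comment
)


def parse_downtime_request(message):
    """Split a downtime request into host, service, seconds and comment."""
    match = REQUEST_RE.match(message.strip())
    if match is None:
        raise ValueError("not a downtime request: %r" % message)
    return {
        "host": match.group("host"),
        "service": match.group("service"),  # None when no service is given
        "duration_seconds": int(match.group("amount")) * UNIT_SECONDS[match.group("unit")],
        "comment": match.group("comment"),
    }
```

With the message above, this sketch yields the service "HP Log", a duration of 60 seconds and the comment "testing downtime", which is the text that should have shown up in the GUI.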
(Assignee)

Comment 1

6 years ago
Looks like the comment field is missing the \n at the end. I added that and will test/confirm/close this out.
(Reporter)

Comment 2

6 years ago
Just tested with:
nagios-phx1: downtime support4.webapp.phx1.mozilla.com:HP Log 1m testing downtime

It still does not seem to be working:
Entry Time: 08-24-2012 09:32:59	
Author: (Nagios Process)	
Comment: This service has been scheduled for fixed downtime from 08-24-2012 09:32:47 to 08-24-2012 09:33:47. Notifications for the service will not be sent out during that time period.

Thanks
(Assignee)

Comment 3

6 years ago
Hmm, it looks like one of two things:
1. Nagios user permissions are messed up and it doesn't know who you are.
2. The command file spec is now different from what it used to be.
(Assignee)

Comment 4

6 years ago
Aj,
I think this is some kind of Nagios configuration issue.
I've created a new test to confirm that the syntax I'm writing to the command file follows the spec exactly.

https://github.com/rtucker-mozilla/mozilla-nagios-bot/commit/8d8fe800a2dcf316a34f13d3f420a0d1da45f799#L2R94

I'm replacing the integer timestamps for testability, but you can see what I'm getting at with the test.

According to the docs the command needs to follow this format:
"[%lu] SCHEDULE_HOST_DOWNTIME;host1;1110741500;1110748700;0;0;7200;Some One;Some Downtime Comment\n"

My test confirms the message is as follows:
self.assertEqual(cmd, '[000000] SCHEDULE_HOST_DOWNTIME;test-host.fake.mozilla.com;1234;5678;2;0;60;%s;blah blah\n' % (self.my_nick ))

Which, after variable expansion, turns into:
"[000000] SCHEDULE_HOST_DOWNTIME;test-host.fake.mozilla.com;1234;5678;2;0;60;rtucker;blah blah\n"
(Assignee)

Comment 5

6 years ago
I did some further investigation on this. Nagios is not honoring either the comment or the user who is setting the downtime. I've confirmed this through both the web interface and a command-line script.

Example written to nagios cmd file:
[1346856100] SCHEDULE_HOST_DOWNTIME;host.here.mozilla.com;1346856100;1346856200;1;0;60;rtucker@mozilla.com;Some Downtime Comment
SCHEDULE_HOST_DOWNTIME;host.here.mozilla.com;1346856100;1346856200;1;0;60;rtucker;Some Downtime Comment
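
To make it easier to see which positions hold the author and the comment, here is a minimal sketch that builds a line in the format quoted in Comment 4 and splits one back into named fields. The field names are labels I am assuming from the documented argument order, and the function names are hypothetical; this is not the bot's code, and it does not reproduce how Nagios itself parses the file. The leading timestamp is treated as optional only so that both example lines above can be split.

```python
import re

# Argument order as in the format string quoted in Comment 4 (labels assumed).
FIELDS = ("host_name", "start_time", "end_time", "fixed",
          "trigger_id", "duration", "author", "comment")

# "[timestamp] " is optional here purely as a convenience of this sketch.
LINE_RE = re.compile(r"^(?:\[(\d+)\] )?SCHEDULE_HOST_DOWNTIME;(.*)$")


def format_host_downtime(entry_time, host_name, start_time, end_time,
                         fixed, trigger_id, duration, author, comment):
    """Build one command line, terminated by a newline."""
    args = (host_name, start_time, end_time, fixed,
            trigger_id, duration, author, comment)
    return "[%s] SCHEDULE_HOST_DOWNTIME;%s\n" % (
        entry_time, ";".join(str(arg) for arg in args))


def parse_host_downtime(line):
    """Split a command line into named fields (all values stay strings)."""
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        raise ValueError("not a SCHEDULE_HOST_DOWNTIME line: %r" % line)
    entry_time, rest = match.groups()
    # Limit the split so a ';' inside the comment stays in the comment.
    values = rest.split(";", len(FIELDS) - 1)
    if len(values) != len(FIELDS):
        raise ValueError("expected %d fields, got %d"
                         % (len(FIELDS), len(values)))
    parsed = dict(zip(FIELDS, values))
    parsed["entry_time"] = entry_time  # None when the prefix is absent
    return parsed
```

Parsing the first example line with this sketch gives the author "rtucker@mozilla.com" and the comment "Some Downtime Comment", so by this reading both values are in the expected positions, even though they are not what Nagios ends up displaying.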
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → INVALID
Product: mozilla.org → mozilla.org Graveyard