Closed
Bug 1463779
Opened 7 years ago
Closed 6 years ago
nagiosctl feature request: Add ability to Undowntime hosts to nagiosctl
Categories
(Infrastructure & Operations :: MOC: Service Requests, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: phrozyn, Assigned: Usul)
Details
Attachments
(2 files, 5 obsolete files)
52.00 KB, patch | Usul: checked-in+
5.88 KB, patch | ryanc: review+ | Usul: checked-in+
Per our IRC conversation I'm making this bug for :Usul
@Usul> phrozyn: file a bug , assign it to me I'll work on that
This request is to add functionality to nagiosctl to be able to programmatically undowntime hosts.
Thanks!
Reporter
Updated•7 years ago
Assignee: nobody → ludovic
Updated•7 years ago
Severity: normal → enhancement
Status: NEW → ASSIGNED
Assignee
Comment 1•7 years ago
Looking at the script, we use https://old.nagios.org/developerinfo/externalcommands/commandlist.php?category_id=1 to ack hosts.
We use https://old.nagios.org/developerinfo/externalcommands/commandinfo.php?command_id=118 to schedule the downtime.
Removing the downtime is done with https://old.nagios.org/developerinfo/externalcommands/commandinfo.php?command_id=125, but we need to figure out the downtime_id and I don't see a way to query for these.
Maybe https://old.nagios.org/developerinfo/externalcommands/commandinfo.php?command_id=53 will do the job, but my knowledge of Nagios internals is limited.
Keegan, do you think ENABLE_HOST_CHECK would override the downtime?
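Editor's sketch: the external commands linked above are plain text lines written to the Nagios command file, in the form `[timestamp] COMMAND;arg1;arg2`. The Python below shows roughly what removing a downtime could look like once the downtime ID is known. It assumes the command behind command_id=125 is DEL_HOST_DOWNTIME, and the `command_file` path is a hypothetical parameter, not something taken from nagiosctl.

```python
import time


def format_external_command(name, args, now=None):
    """Return one line in the Nagios external command file format."""
    timestamp = int(time.time() if now is None else now)
    fields = ";".join(str(value) for value in [name, *args])
    return f"[{timestamp}] {fields}\n"


def delete_host_downtime(downtime_id, command_file, now=None):
    """Write a DEL_HOST_DOWNTIME command to command_file.

    command_file is an assumption: on a real server it is normally a
    named pipe whose location depends on the local Nagios configuration.
    """
    line = format_external_command("DEL_HOST_DOWNTIME", [downtime_id], now)
    with open(command_file, "w") as handle:
        handle.write(line)
    return line
```

The only argument DEL_HOST_DOWNTIME needs is the numeric downtime ID, which is why the rest of this bug is mostly about how to look that ID up.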
Flags: needinfo?(kferrando)
Comment 2•7 years ago
(In reply to Ludovic Hirlimann [:Usul] from comment #1)
> keegan do you think ENABLE_HOST_CHECK would override the downtime?
Nah, that's not the way.
I modified the downtime report to spit out the downtime ID based on a hostname search. See if you can make use of this.
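Editor's sketch: the modified downtime report itself is not attached here, so the following is only a guess at the kind of lookup it performs. Comment 16 mentions the status file, so this assumes the usual Nagios `status.dat` layout, where each host downtime is a `hostdowntime { ... }` block of `key=value` lines. The function name is hypothetical.

```python
def find_host_downtime_ids(status_text, hostname):
    """Return the downtime_id of every hostdowntime block for hostname."""
    ids = []
    block = None  # dict while inside a hostdowntime block, else None
    for raw in status_text.splitlines():
        line = raw.strip()
        if line.startswith("hostdowntime") and line.endswith("{"):
            block = {}
        elif line == "}" and block is not None:
            if block.get("host_name") == hostname and "downtime_id" in block:
                ids.append(int(block["downtime_id"]))
            block = None
        elif block is not None and "=" in line:
            key, value = line.split("=", 1)
            block[key] = value
    return ids
```

Service downtimes live in separate `servicedowntime` blocks, which this sketch deliberately ignores.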
Flags: needinfo?(kferrando)
Assignee
Comment 3•7 years ago
Thanks, I also found this https://stackoverflow.com/questions/37247772/automate-schedule-and-cancel-downtime-in-nagios but auth0 is in the way. Will try to use your script.
Assignee
Comment 4•7 years ago
Attachment #8982500 -
Flags: review?(rchilds)
Comment 5•7 years ago
Comment on attachment 8982500 [details] [diff] [review]
Add nagiosctl to our stage instances so I can work from there.
I don't see what is actually being modified here.
Attachment #8982500 -
Flags: review?(rchilds)
Assignee
Comment 6•7 years ago
Comment on attachment 8982500 [details] [diff] [review]
Add nagiosctl to our stage instances so I can work from there.
I am copying the scripts from prod to stage to have a similar setup, because nagiosctl is not present on stage, only on prod machines.
Comment 7•7 years ago
(In reply to Ludovic Hirlimann [:Usul] from comment #6)
> Comment on attachment 8982500 [details] [diff] [review]
> Add nagiosctl to our stage instances so I can work from there.
>
> I am copying the scripts from prod to stage to have a similar setup,
> because nagiosctl is not present on stage, only on prod machines.
WFM if this is only going to stage
Assignee
Comment 8•7 years ago
Comment on attachment 8982500 [details] [diff] [review]
Add nagiosctl to our stage instances so I can work from there.
pushed in 7bcc74daaf1245bc8cd03cb887484f300ba178a1
Attachment #8982500 -
Flags: checked-in+
Assignee
Comment 9•7 years ago
(In reply to Ludovic Hirlimann [:Usul] from comment #8)
> Comment on attachment 8982500 [details] [diff] [review]
> Add nagiosctl to our stage instances so I can work from there.
>
> pushed in 7bcc74daaf1245bc8cd03cb887484f300ba178a1
I forgot something:
Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find template 'nagios4/stage/data-bin/change-mailman-oncall.sh.erb' at /etc/puppet/modules/nagios4/manifests/stage/scripts.pp:68 on node nagios1.stage.private.mdc1.mozilla.com
Info: Not using expired catalog for nagios1.stage.private.mdc1.mozilla.com from cache; expired at 2018-04-20 09:21:28 +0000
Notice: Using cached catalog
Error: Could not retrieve catalog; skipping run
Fixed by adding two missing scripts from prod to stage.
Assignee
Comment 10•7 years ago
Using your Python script, I came up with this. Thoughts?
Attachment #8986749 -
Flags: review?(kferrando)
Comment 11•7 years ago
I'm ok with this if we've tested it all works on stage first.
Ryan, are you good here?
Flags: needinfo?(rchilds)
Updated•7 years ago
Attachment #8986749 -
Flags: review?(kferrando)
Attachment #8986749 -
Flags: review+
Attachment #8986749 -
Flags: feedback?(rchilds)
Comment 12•7 years ago
Going over this,
1. Where is the following script located?
> + '/data/bin/get_downtime.py':
2. This is to undowntime anything
> + echo " undowntime Undowntime a host"
3. This shouldn't even be touched and looks like a mistake
> + info=`curl --request GET --max-time 15 --silent --write-out %{http_code} --fail --basic -u '<%= scope.function_hiera(['secrets_inventory_oncall_username'])
> + %>:<%= scope.function_hiera(['secrets_inventory_oncall_password']) %>' --url "https://inventory.mozilla.org/en-US/systems/getoncall/${group}/?format=delimited" --output ${TMPDIR}/oncall-info 2>/dev/null`
I can provide further feedback once I see the script from bullet 1.
Flags: needinfo?(rchilds)
Updated•7 years ago
Attachment #8986749 -
Flags: feedback?(rchilds)
Assignee
Comment 13•7 years ago
(In reply to Ryan C [:ryanc] from comment #12)
> Going over this,
>
> 1. Where is the following script located?
> > + '/data/bin/get_downtime.py':
**** forgot to add them to the diff - here you go.
> 3. This shouldn't even be touched and looks like a mistake
> > + info=`curl --request GET --max-time 15 --silent --write-out %{http_code} --fail --basic -u '<%= scope.function_hiera(['secrets_inventory_oncall_username'])
> > + %>:<%= scope.function_hiera(['secrets_inventory_oncall_password']) %>' --url "https://inventory.mozilla.org/en-US/systems/getoncall/${group}/?format=delimited" --output ${TMPDIR}/oncall-info 2>/dev/null`
That's because I did not want to leak credentials: I did my development on a machine that had the key.
There is some cleanup to do in nagiosctl. I'll do it once this is in.
Attachment #8986749 -
Attachment is obsolete: true
Attachment #8987017 -
Flags: review?(kferrando)
Attachment #8987017 -
Flags: feedback?(rchilds)
Comment 14•7 years ago
(In reply to Ludovic Hirlimann [:Usul] from comment #13)
> Created attachment 8987017 [details] [diff] [review]
> addundowntimetonagiosctl_v2.patch
+GETDOWNTIME='/data/bin/get_downtime.py'
...
+ downtimeid=`$GETDOWNTIME ${host}`
This works? I would imagine you would need to pass that through the python interpreter explicitly.
E.G.
downtimeid=$(/usr/bin/python ${GETDOWNTIME} ${host})
Assignee
Comment 15•7 years ago
(In reply to Keegan Ferrando [:fauweh] from comment #14)
> (In reply to Ludovic Hirlimann [:Usul] from comment #13)
> > Created attachment 8987017 [details] [diff] [review]
> > addundowntimetonagiosctl_v2.patch
>
> +GETDOWNTIME='/data/bin/get_downtime.py'
> ...
> + downtimeid=`$GETDOWNTIME ${host}`
>
> This works? I would imagine you would need to pass that through the python
> interpreter explicitly.
>
> E.G.
> downtimeid=$(/usr/bin/python ${GETDOWNTIME} ${host})
It does. Feel free to play with it on nagios1.stage in mdc1, where I did the development.
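Editor's sketch: one likely reason the direct call works is that the script is executable and starts with a shebang line, so the kernel hands it to the interpreter without the caller naming it. That is an assumption about get_downtime.py, which is not shown in this bug. The demo below builds a throwaway script (hypothetical name and contents) and runs it directly.

```python
import os
import subprocess
import sys
import tempfile


def run_directly(script_body, argument):
    """Write an executable script with a shebang and run it by path."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "demo_get_downtime.py")
        with open(path, "w") as handle:
            # The shebang points at the current interpreter for the demo.
            handle.write(f"#!{sys.executable}\n{script_body}")
        os.chmod(path, 0o755)  # executable bit is what allows the direct call
        result = subprocess.run(
            [path, argument], capture_output=True, text=True, check=True
        )
        return result.stdout.strip()


DEMO_BODY = "import sys\nprint('downtime id for ' + sys.argv[1])\n"
```

Calling the interpreter explicitly, as suggested in comment 14, is still the more defensive choice because it does not depend on the file mode or the shebang being correct.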
Comment 16•7 years ago
Comment on attachment 8987017 [details] [diff] [review]
addundowntimetonagiosctl_v2.patch
The downtime should be queried from livestatus, not the status file
Attachment #8987017 -
Flags: feedback?(rchilds)
Assignee
Comment 17•7 years ago
/var/log/nagios/rw/live
https://mathias-kettner.de/checkmk_livestatus.html
and Nagios bot has some interesting stuff in it.
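Editor's sketch: a Livestatus lookup of host downtime IDs could look roughly like the Python below. It uses the Livestatus `downtimes` table with the `id`, `host_name` and `is_service` columns. Treating `/var/log/nagios/rw/live` as the Unix socket path is an assumption based on the line above, and the function names are hypothetical; this is not the undowntime.py that was eventually checked in.

```python
import json
import socket


def build_downtime_query(hostname):
    """Build a Livestatus query for host (not service) downtimes."""
    return (
        "GET downtimes\n"
        "Columns: id host_name\n"
        f"Filter: host_name = {hostname}\n"
        "Filter: is_service = 0\n"
        "OutputFormat: json\n"
        "\n"  # a blank line ends the request
    )


def parse_downtime_ids(response_text):
    """Extract the id column from a JSON list-of-rows response."""
    rows = json.loads(response_text) if response_text.strip() else []
    return [int(row[0]) for row in rows]


def query_downtime_ids(hostname, socket_path):
    """Send the query over the Livestatus Unix socket and return the ids."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as conn:
        conn.connect(socket_path)
        conn.sendall(build_downtime_query(hostname).encode("utf-8"))
        conn.shutdown(socket.SHUT_WR)  # signal that the request is complete
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return parse_downtime_ids(b"".join(chunks).decode("utf-8"))
```

Each returned ID could then be passed to the downtime removal command linked in comment 1.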
Comment 18•7 years ago
Comment on attachment 8987017 [details] [diff] [review]
addundowntimetonagiosctl_v2.patch
Review of attachment 8987017 [details] [diff] [review]:
-----------------------------------------------------------------
r- until Ryan's feedback is applied.
Attachment #8987017 -
Flags: review?(kferrando) → review-
Reporter
Comment 19•6 years ago
is this done?
Assignee
Comment 20•6 years ago
Nope, I'm having an issue with livestatus and have been working on other things.
Assignee
Comment 21•6 years ago
Attachment #9023994 -
Flags: review?(rchilds)
Comment 22•6 years ago
Comment on attachment 9023994 [details] [diff] [review]
1463779_v3.patch undowntime using mklive status
I'd say first remove anything related to your prior script, "get_downtime.py".
Once you've done that, I only want to see a diff for the new stuff on stage with a working example that I can poke at.
Attachment #9023994 -
Flags: review?(rchilds)
Assignee
Comment 23•6 years ago
moc-test1.private.mdc1.mozilla.com is downtimed on nagios1.stage.private.mdc1. Scripts are deployed there for testing.
Attachment #8980752 -
Attachment is obsolete: true
Attachment #8987017 -
Attachment is obsolete: true
Attachment #9023994 -
Attachment is obsolete: true
Attachment #9024196 -
Flags: review?(rchilds)
Comment 24•6 years ago
(In reply to Ludovic Hirlimann [:Usul] from comment #23)
> Created attachment 9024196 [details] [diff] [review]
> undowntime3.patch
>
> moc-test1.private.mdc1.mozilla.com is downtimed on
> nagios1.stage.private.mdc1. Scripts are deployed there for testing.
> diff --git a/modules/nagios4/files/prod/data-bin/undowntime.py b/modules/nagios4/files/prod/data-bin/undowntime.py
> diff --git a/modules/nagios4/manifests/prod/scripts.pp b/modules/nagios4/manifests/prod/scripts.pp
> diff --git a/modules/nagios4/templates/prod/data-bin/nagiosctl.erb b/modules/nagios4/templates/prod/data-bin/nagiosctl.erb
Stage only for now, please
>+ '/data/bin/nagiosctl':
This will conflict since "/data/bin/nagiosctl" is already defined above for "nagios4/stage/data-bin/nagiosctl.erb"
>+ require => File['/data/bin'],
>+ ensure => present,
>+ mode => '0755',
>+ source => "puppet:///modules/nagios4/prod/data-bin/undowntime.py";
What is the purpose of the "schedule-host-downtime.sh" script? Overall undowntime.py seems alright
Flags: needinfo?(ludovic)
Updated•6 years ago
Attachment #9024196 -
Flags: review?(rchilds)
Assignee
Comment 25•6 years ago
(In reply to Ryan C [:ryanc] (UTC-4) from comment #24)
> (In reply to Ludovic Hirlimann [:Usul] from comment #23)
> > Created attachment 9024196 [details] [diff] [review]
> > undowntime3.patch
> >
> > moc-test1.private.mdc1.mozilla.com is downtimed on
> > nagios1.stage.private.mdc1. Scripts are deployed there for testing.
>
> > diff --git a/modules/nagios4/files/prod/data-bin/undowntime.py b/modules/nagios4/files/prod/data-bin/undowntime.py
> > diff --git a/modules/nagios4/manifests/prod/scripts.pp b/modules/nagios4/manifests/prod/scripts.pp
> > diff --git a/modules/nagios4/templates/prod/data-bin/nagiosctl.erb b/modules/nagios4/templates/prod/data-bin/nagiosctl.erb
>
> Stage only for now, please
Here you go, stage only.
>
>
> What is the purpose of the "schedule-host-downtime.sh" script? Overall
> undowntime.py seems alright
It is one of the many downtime scripts in that module; see also schedule-downtime.pl.
Attachment #9024196 -
Attachment is obsolete: true
Flags: needinfo?(ludovic)
Attachment #9025345 -
Flags: review?(rchilds)
Comment 26•6 years ago
Comment on attachment 9025345 [details] [diff] [review]
stageonlyctl.patch
LGTM, let's poke it on stage for a minute before pushing it to prod
Attachment #9025345 -
Flags: review?(rchilds) → review+
Assignee
Comment 27•6 years ago
Comment on attachment 9025345 [details] [diff] [review]
stageonlyctl.patch
checked-in 8e2fb0970ab5211498c20af83561fb0dc9925c61
Attachment #9025345 -
Flags: checked-in+
Assignee
Comment 28•6 years ago
pushed in 5a7e9ae0209f6b3bf37187fc045bbb192ac65a9e.
Assignee
Updated•6 years ago
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED