Closed Bug 1463779 Opened 7 years ago Closed 6 years ago

nagiosctl feature request: Add ability to Undowntime hosts to nagiosctl

Categories

(Infrastructure & Operations :: MOC: Service Requests, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: phrozyn, Assigned: Usul)

Details

Attachments

(2 files, 5 obsolete files)

Per our IRC conversation, I'm filing this bug for :Usul.

<@Usul> phrozyn: file a bug, assign it to me, I'll work on that

This request is to add functionality to nagiosctl to programmatically undowntime hosts. Thanks!
Assignee: nobody → ludovic
Severity: normal → enhancement
Status: NEW → ASSIGNED
Looking at the script, we use https://old.nagios.org/developerinfo/externalcommands/commandlist.php?category_id=1 to ack hosts. We use https://old.nagios.org/developerinfo/externalcommands/commandinfo.php?command_id=118 to schedule the downtime. Removing the downtime is done with https://old.nagios.org/developerinfo/externalcommands/commandinfo.php?command_id=125, but we need to figure out the downtime_id, and I don't see a way to query for it. Maybe https://old.nagios.org/developerinfo/externalcommands/commandinfo.php?command_id=53 will do the job, but my knowledge of Nagios internals is limited. Keegan, do you think ENABLE_HOST_CHECK would override the downtime?
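For illustration, here is a minimal sketch of what removing a downtime through the external command interface could look like once the downtime ID is known. It assumes the command behind command_id=125 is DEL_HOST_DOWNTIME, which takes a single downtime ID. The function names and the command file path passed to `submit` are hypothetical and are not taken from nagiosctl.

```python
import time


def del_host_downtime_command(downtime_id, now=None):
    """Format one external command line: [timestamp] DEL_HOST_DOWNTIME;<downtime_id>."""
    timestamp = int(time.time() if now is None else now)
    return "[%d] DEL_HOST_DOWNTIME;%d\n" % (timestamp, int(downtime_id))


def submit(command_line, command_file):
    """Write the command to the Nagios command file (a named pipe).

    The location of the command file is site-specific, so it is a
    required argument here rather than a guessed default.
    """
    with open(command_file, "w") as handle:
        handle.write(command_line)
```

The hard part, as noted above, is finding the downtime ID to pass in; the formatting itself is a one-liner.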
Flags: needinfo?(kferrando)
Attached file get_downtimes.py (obsolete)
(In reply to Ludovic Hirlimann [:Usul] from comment #1)
> keegan do you think ENABLE_HOST_CHECK would override the downtime?

Nah, that's not the way. I modified the downtime report to spit out the downtime ID based on a hostname search. See if you can make use of this.
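The attached script is not reproduced in this bug, so the following is only a rough sketch of the idea of a hostname-based downtime ID lookup. It assumes the data comes from the Nagios status file, where each host downtime is a `hostdowntime { ... }` block of `key=value` lines that includes `host_name` and `downtime_id`. The function name is hypothetical.

```python
import re

# Matches one "hostdowntime { ... }" block. This does not match
# "servicedowntime" blocks. A "}" inside a comment field would cut the
# block short, but host_name and downtime_id appear before the comment.
HOSTDOWNTIME_RE = re.compile(r"hostdowntime\s*\{(.*?)\}", re.DOTALL)


def host_downtime_ids(status_text, hostname):
    """Return the downtime IDs of all host downtimes recorded for hostname."""
    ids = []
    for body in HOSTDOWNTIME_RE.findall(status_text):
        fields = {}
        for line in body.splitlines():
            key, sep, value = line.strip().partition("=")
            if sep:
                fields[key] = value
        if fields.get("host_name") == hostname and "downtime_id" in fields:
            ids.append(int(fields["downtime_id"]))
    return ids
```

A host can have several downtimes scheduled at once, which is why the sketch returns a list rather than a single ID.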
Flags: needinfo?(kferrando)
Thanks, I also found this https://stackoverflow.com/questions/37247772/automate-schedule-and-cancel-downtime-in-nagios but auth0 is in the way. Will try to use your script.
Comment on attachment 8982500 [details] [diff] [review] Add nagiosctl to ur stage instances so I can work from there. I don't see what is actually being modified here.
Attachment #8982500 - Flags: review?(rchilds)
Comment on attachment 8982500 [details] [diff] [review] Add nagiosctl to ur stage instances so I can work from there. I am copying the scripts from prod to stage to have something similar, because nagiosctl is not present on stage, only on prod machines.
(In reply to Ludovic Hirlimann [:Usul] from comment #6)
> Comment on attachment 8982500 [details] [diff] [review]
> Add nagiosctl to ur stage instances so I can work from there.
>
> I am copying the scripts from prod to stage to have something similar,
> because nagiosctl is not present on stage, only on prod machines.

WFM if this is only going to stage.
Comment on attachment 8982500 [details] [diff] [review] Add nagiosctl to ur stage instances so I can work from there. pushed in 7bcc74daaf1245bc8cd03cb887484f300ba178a1
Attachment #8982500 - Flags: checked-in+
(In reply to Ludovic Hirlimann [:Usul] from comment #8)
> Comment on attachment 8982500 [details] [diff] [review]
> Add nagiosctl to ur stage instances so I can work from there.
>
> pushed in 7bcc74daaf1245bc8cd03cb887484f300ba178a1

I forgot something:

Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find template 'nagios4/stage/data-bin/change-mailman-oncall.sh.erb' at /etc/puppet/modules/nagios4/manifests/stage/scripts.pp:68 on node nagios1.stage.private.mdc1.mozilla.com
Info: Not using expired catalog for nagios1.stage.private.mdc1.mozilla.com from cache; expired at 2018-04-20 09:21:28 +0000
Notice: Using cached catalog
Error: Could not retrieve catalog; skipping run

Fixed by adding two missing scripts from prod to stage.
Attached patch addundowntimetonagiosctl.patch (obsolete)
Using your python script I came up with this. Thoughts?
Attachment #8986749 - Flags: review?(kferrando)
I'm ok with this if we've tested it all works on stage first. Ryan, are you good here?
Flags: needinfo?(rchilds)
Attachment #8986749 - Flags: review?(kferrando)
Attachment #8986749 - Flags: review+
Attachment #8986749 - Flags: feedback?(rchilds)
Going over this,

1. Where is the following script located?
> + '/data/bin/get_downtime.py':

2. This is to undowntime anything
> + echo " undowntime Undowntime a host"

3. This shouldn't even be touched and looks like a mistake
> + info=`curl --request GET --max-time 15 --silent --write-out %{http_code} --fail --basic -u '<%= scope.function_hiera(['secrets_inventory_oncall_username'])
> + %>:<%= scope.function_hiera(['secrets_inventory_oncall_password']) %>' --url "https://inventory.mozilla.org/en-US/systems/getoncall/${group}/?format=delimited" --output ${TMPDIR}/oncall-info 2>/dev/null`

I can provide further feedback once I see the bullet 1.
Flags: needinfo?(rchilds)
Attachment #8986749 - Flags: feedback?(rchilds)
(In reply to Ryan C [:ryanc] from comment #12)
> Going over this,
>
> 1. Where is the following script located?
> > + '/data/bin/get_downtime.py':

**** forgot to add them to the diff - here you go.

> 3. This shouldn't even be touched and looks like a mistake
> > + info=`curl --request GET --max-time 15 --silent --write-out %{http_code} --fail --basic -u '<%= scope.function_hiera(['secrets_inventory_oncall_username'])
> > + %>:<%= scope.function_hiera(['secrets_inventory_oncall_password']) %>' --url "https://inventory.mozilla.org/en-US/systems/getoncall/${group}/?format=delimited" --output ${TMPDIR}/oncall-info 2>/dev/null`

That's due to me not wanting to leak credentials, as I did my dev on a machine that had the key. There is some cleanup to do in nagiosctl. I'll do it once this is in.
Attachment #8986749 - Attachment is obsolete: true
Attachment #8987017 - Flags: review?(kferrando)
Attachment #8987017 - Flags: feedback?(rchilds)
(In reply to Ludovic Hirlimann [:Usul] from comment #13)
> Created attachment 8987017 [details] [diff] [review]
> addundowntimetonagiosctl_v2.patch

+GETDOWNTIME='/data/bin/get_downtime.py'
...
+ downtimeid=`$GETDOWNTIME ${host}`

This works? I would imagine you would need to pass that through the python interpreter explicitly.

E.G.
downtimeid=$(/usr/bin/python ${GETDOWNTIME} ${host})
(In reply to Keegan Ferrando [:fauweh] from comment #14)
> (In reply to Ludovic Hirlimann [:Usul] from comment #13)
> > Created attachment 8987017 [details] [diff] [review]
> > addundowntimetonagiosctl_v2.patch
>
> +GETDOWNTIME='/data/bin/get_downtime.py'
> ...
> + downtimeid=`$GETDOWNTIME ${host}`
>
> This works? I would imagine you would need to pass that through the python
> interpreter explicitly.
>
> E.G.
> downtimeid=$(/usr/bin/python ${GETDOWNTIME} ${host})

It does. Feel free to play with it on nagios1.stage in mdc1, where I did the development.
Comment on attachment 8987017 [details] [diff] [review] addundowntimetonagiosctl_v2.patch The downtime should be queried from livestatus, not the status file
Attachment #8987017 - Flags: feedback?(rchilds)
/var/log/nagios/rw/live and https://mathias-kettner.de/checkmk_livestatus.html; Nagios bot also has some interesting stuff in it.
Comment on attachment 8987017 [details] [diff] [review]
addundowntimetonagiosctl_v2.patch

Review of attachment 8987017 [details] [diff] [review]:
-----------------------------------------------------------------

r- until Ryan's feedback is applied.
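As a rough sketch of the livestatus approach suggested in the review, the downtime IDs for a host can be requested from the livestatus `downtimes` table over its UNIX socket. The function names here are hypothetical, and the socket path is left as a parameter; /var/log/nagios/rw/live from the comment above is presumably the socket on these hosts, but that is an assumption. This is not the undowntime.py that was eventually checked in.

```python
import json
import socket


def build_downtime_query(hostname):
    """Build a livestatus query for the host-level downtimes of one host."""
    return (
        "GET downtimes\n"
        "Columns: id host_name\n"
        "Filter: host_name = %s\n"
        "Filter: is_service = 0\n"
        "OutputFormat: json\n"
        "\n"
    ) % hostname


def query_livestatus(query, socket_path):
    """Send one query to the livestatus socket and decode the JSON reply."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(socket_path)
        sock.sendall(query.encode("utf-8"))
        # Closing the write side tells livestatus the request is complete.
        sock.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    finally:
        sock.close()
    return json.loads(b"".join(chunks).decode("utf-8"))


def downtime_ids(rows):
    """Extract the id column (first column requested) from the reply rows."""
    return [int(row[0]) for row in rows]
```

Each returned ID could then be fed to a DEL_HOST_DOWNTIME external command. Querying livestatus avoids parsing the status file, which is the concern raised in the feedback.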
Attachment #8987017 - Flags: review?(kferrando) → review-
Is this done?
Nope, I'm having an issue with livestatus and have been working on other things.
Attachment #9023994 - Flags: review?(rchilds)
Comment on attachment 9023994 [details] [diff] [review]
1463779_v3.patch undowntime using mklive status

I'd say first remove anything related to your prior script, "get_downtime.py". Once you've done that, I only want to see a diff for the new stuff on stage with a working example that I can poke at.
Attachment #9023994 - Flags: review?(rchilds)
Attached patch undowntime3.patch (obsolete)
moc-test1.private.mdc1.mozilla.com is downtimed on nagios1.stage.private.mdc1. Scripts are deployed there for testing.
Attachment #8980752 - Attachment is obsolete: true
Attachment #8987017 - Attachment is obsolete: true
Attachment #9023994 - Attachment is obsolete: true
Attachment #9024196 - Flags: review?(rchilds)
(In reply to Ludovic Hirlimann [:Usul] from comment #23)
> Created attachment 9024196 [details] [diff] [review]
> undowntime3.patch
>
> moc-test1.private.mdc1.mozilla.com is downtimed on
> nagios1.stage.private.mdc1. Scripts are deployed there for testing.

> diff --git a/modules/nagios4/files/prod/data-bin/undowntime.py b/modules/nagios4/files/prod/data-bin/undowntime.py
> diff --git a/modules/nagios4/manifests/prod/scripts.pp b/modules/nagios4/manifests/prod/scripts.pp
> diff --git a/modules/nagios4/templates/prod/data-bin/nagiosctl.erb b/modules/nagios4/templates/prod/data-bin/nagiosctl.erb

Stage only for now, please

>+ '/data/bin/nagiosctl':

This will conflict since "/data/bin/nagiosctl" is already defined above for "nagios4/stage/data-bin/nagiosctl.erb"

>+ require => File['/data/bin'],
>+ ensure => present,
>+ mode => '0755',
>+ source => "puppet:///modules/nagios4/prod/data-bin/undowntime.py";

What is the purpose of the "schedule-host-downtime.sh" script? Overall undowntime.py seems alright
Flags: needinfo?(ludovic)
Attachment #9024196 - Flags: review?(rchilds)
(In reply to Ryan C [:ryanc] (UTC-4) from comment #24)
> (In reply to Ludovic Hirlimann [:Usul] from comment #23)
> > Created attachment 9024196 [details] [diff] [review]
> > undowntime3.patch
> >
> > moc-test1.private.mdc1.mozilla.com is downtimed on
> > nagios1.stage.private.mdc1. Scripts are deployed there for testing.
>
> > diff --git a/modules/nagios4/files/prod/data-bin/undowntime.py b/modules/nagios4/files/prod/data-bin/undowntime.py
> > diff --git a/modules/nagios4/manifests/prod/scripts.pp b/modules/nagios4/manifests/prod/scripts.pp
> > diff --git a/modules/nagios4/templates/prod/data-bin/nagiosctl.erb b/modules/nagios4/templates/prod/data-bin/nagiosctl.erb
>
> Stage only for now, please

Here you go, stage only.

> What is the purpose of the "schedule-host-downtime.sh" script? Overall
> undowntime.py seems alright

It is one of the many downtime scripts in that module; also see schedule-downtime.pl.
Attachment #9024196 - Attachment is obsolete: true
Flags: needinfo?(ludovic)
Attachment #9025345 - Flags: review?(rchilds)
Comment on attachment 9025345 [details] [diff] [review] stageonlyctl.patch LGTM, let's poke it on stage for a minute before pushing it to prod
Attachment #9025345 - Flags: review?(rchilds) → review+
Comment on attachment 9025345 [details] [diff] [review] stageonlyctl.patch checked-in 8e2fb0970ab5211498c20af83561fb0dc9925c61
Attachment #9025345 - Flags: checked-in+
pushed in 5a7e9ae0209f6b3bf37187fc045bbb192ac65a9e.
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED