Closed Bug 1479619 Opened 7 years ago Closed 6 years ago

Move bouncer-check and related nagios alert from scl3 to mdc

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dlabici, Unassigned)

Attachments

(1 file)

With BuildBot and SCL3 EOL approaching rapidly, we need to move the bouncer nagios checks from scl3 to mdc. Based on the emails between Jordan and Nick, we will need to fix up the nagios configuration in the IT puppet repo to point to the new datacenter and the appropriate host locations. @fubar, do you or the team happen to know where the services/checks will live after BB/SCL3 dies? That way we can work on an IT puppet patch for the move.
Flags: needinfo?(klibby)
Most likely, we'll just need to copy stuff from modules/nagios4/manifests/prod/releng/scl3.pp to modules/nagios4/manifests/prod/releng/mdc1.pp, and from prod/releng/services/scl3.pp to mdc1.pp. It's all managed the same way, just with separate files for mdc1.
Flags: needinfo?(klibby)
:nthomas I have checked the two files mdc1.pp and scl3.pp, as I believe the nagios checks are set up there. I searched for the bouncer, since that was one nagios check explicitly called out in your mail as something that needs to live beyond buildbot. I found this:

    'buildbot-master81.bb.releng.scl3.mozilla.com' => {
        parents => 'fw1.private.releng.scl3.mozilla.net',
        contact_groups => 'build',
        hostgroups => [
            'scl3-production-buildbot-masters',
            'bouncer-checks',
            'selfserve-agents'

I'm planning on moving that to a host in the mdc1 file. Do you have any suggestions for which host? Do you know of any other checks that obviously should move?
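The search described above can be sketched in Python. This is a minimal illustration, not the tooling actually used in this bug: `find_hosts_with_hostgroup` is a hypothetical helper, the sample text closes the brackets that are cut off in the quoted snippet, and the second host entry is invented purely to show a non-match. It assumes host entries follow the `'hostname' => { ... hostgroups => [ ... ] }` layout quoted in the comment.

```python
import re

# Matches a quoted host name followed by "=> {" and captures the body up to the
# first closing brace. Assumes entry bodies contain no nested braces, which
# holds for the quoted snippet but may not hold for every manifest.
HOST_ENTRY = re.compile(r"'([^']+)'\s*=>\s*\{([^{}]*)\}", re.S)
HOSTGROUPS = re.compile(r"hostgroups\s*=>\s*\[([^\]]*)\]", re.S)


def find_hosts_with_hostgroup(manifest_text, hostgroup):
    """Return host names whose hostgroups list contains `hostgroup`."""
    hosts = []
    for name, body in HOST_ENTRY.findall(manifest_text):
        match = HOSTGROUPS.search(body)
        if not match:
            continue  # entry has no hostgroups list
        groups = re.findall(r"'([^']+)'", match.group(1))
        if hostgroup in groups:
            hosts.append(name)
    return hosts


# Illustrative input only: the first entry mirrors the snippet quoted in the
# comment (with closing brackets assumed); the second entry is made up.
SAMPLE = """
'buildbot-master81.bb.releng.scl3.mozilla.com' => {
    parents => 'fw1.private.releng.scl3.mozilla.net',
    contact_groups => 'build',
    hostgroups => [
        'scl3-production-buildbot-masters',
        'bouncer-checks',
        'selfserve-agents',
    ]
},
'example-host.invalid' => {
    parents => 'fw1.private.releng.scl3.mozilla.net',
    contact_groups => 'build',
    hostgroups => [ 'some-other-group' ]
},
"""

print(find_hosts_with_hostgroup(SAMPLE, 'bouncer-checks'))
```

Running the same helper over the text of scl3.pp and mdc1.pp would list which hosts currently carry a given hostgroup, which may help when deciding what has to be recreated on the mdc1 side.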
Flags: needinfo?(nthomas)
I don't know what we have in mdc1 to help pick a new spot; that's up to relops, I think. I'd like to point out that the bouncer check is implemented (in the releng puppet) as a python package and needs a py27 virtualenv:
https://github.com/mozilla-releng/build-puppet/blob/master/manifests/moco-nodes.pp#L626
https://github.com/mozilla-releng/build-puppet/blob/master/modules/bouncer_check/manifests/init.pp
A stop-gap solution could be to keep a buildbot master alive in usw2/use1 and move the check there. If we happen to be keeping bm01/bm71 alive for bug 1479620, then they could be a natural place. The nagios config would also need to move from scl3 to mdc1 in IT puppet. Re other checks, I think there's only bug 1479620.
Flags: needinfo?(nthomas)
I will raise this with releng. We should figure out which hosts we want to keep or add to after the scl3 migration. Perhaps ciduty can then help reconfigure things once we know more.
Flags: needinfo?(jlund)
answering in tracking bug: 1478215
Flags: needinfo?(jlund)
Summary: Move bouncer nagios checks from scl3 to mdc → Move bouncer-check and related nagios alert from scl3 to mdc
Assignee: nthomas → nobody
I don't see a patch on bug 1484880 for moving bouncer's nagios check (search for check_bouncer). Could ciduty work one up asap? NB: there's a pending ni on bug 1484880 about the right bot to use.
:nthomas, I've just made a patch for that. Would you mind taking a look?
Status:
- the patch here moved the check to buildbot-master01, and it runs OK locally
- the nagios code changes happened in bug 1484880, but there's a network flow issue somewhere between the nagios server and bm01, so the check times out
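A timeout like the one described can often be narrowed down by testing whether a TCP connection opens at all from the monitoring side. The following is a minimal sketch, not the procedure used in this bug: `can_connect` is a hypothetical helper, and the transport between the nagios server and bm01 is not stated here, so any port passed to it (for example 5666, the default NRPE port) is an assumption.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds."""
    try:
        # create_connection resolves the host name and applies the timeout
        # to the connection attempt.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures,
        # all of which are OSError subclasses.
        return False
```

Calling `can_connect(<bm01 address>, 5666)` from the nagios server and getting `False` after the full timeout would be consistent with a blocked flow, whereas an immediate `False` more likely points to a refused connection or a name that does not resolve.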
This is working now.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard