Bug 1479619: Move bouncer-check and related nagios alert from scl3 to mdc
Opened 7 years ago
Closed 6 years ago
Categories: Infrastructure & Operations Graveyard :: CIDuty (task)
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: dlabici; Assignee: Unassigned
Attachments (1 file): GitHub pull request (55 bytes, text/x-github-pull-request); jlorenzo: review+, jlorenzo: checked-in+
With the BuildBot and SCL3 end-of-life approaching fast, we need to move the bouncer nagios checks from scl3 to mdc.
Based on the emails between Jordan and Nick, we will need to update the nagios configuration in the IT puppet repo to point to the new datacenter and the appropriate host locations.
@fubar Do you or the team know where the services/checks will live after BB/SCL3 is shut down? That way we can work on an IT Puppet patch for the move.
Reporter
Updated • 7 years ago
Flags: needinfo?(klibby)
Comment 1 • 7 years ago
Most likely, we'll just need to copy the relevant definitions from modules/nagios4/manifests/prod/releng/scl3.pp to modules/nagios4/manifests/prod/releng/mdc1.pp, and from prod/releng/services/scl3.pp to mdc1.pp.
It's all managed the same way, just separate files for mdc1.
Flags: needinfo?(klibby)
Comment 2 • 7 years ago
:nthomas I have checked the two files, mdc1.pp and scl3.pp, since I believe the nagios checks are set up there. I searched for bouncer, as that was the one nagios check your mail explicitly called out as needing to live beyond buildbot.
I found this:
    'buildbot-master81.bb.releng.scl3.mozilla.com' => {
        parents        => 'fw1.private.releng.scl3.mozilla.net',
        contact_groups => 'build',
        hostgroups     => [
            'scl3-production-buildbot-masters',
            'bouncer-checks',
            'selfserve-agents'
        ]
    }
I'm planning to move that to a host in the mdc1 file. Do you have any suggestions for which host? Do you know of any other checks that should obviously move?
Flags: needinfo?(nthomas)
Comment 3 • 7 years ago
I don't know what we have in mdc1, so I can't help pick a new spot; I think that's up to relops. I'd like to point out that the bouncer check is implemented (in the releng puppet) as a python package and needs a py27 virtualenv:
https://github.com/mozilla-releng/build-puppet/blob/master/manifests/moco-nodes.pp#L626
https://github.com/mozilla-releng/build-puppet/blob/master/modules/bouncer_check/manifests/init.pp
A stop-gap solution could be to keep a buildbot master alive in usw2/use1 and move the check there. If we end up keeping bm01/bm71 alive for bug 1479620, they would be a natural place. The nagios config would also need to move from scl3 to mdc1 in IT puppet.
Regarding other checks, I think there's only bug 1479620.
Flags: needinfo?(nthomas)
Comment 4 • 7 years ago
I will raise this with releng. We should figure out which hosts we want to keep or add to after the scl3 migration. Perhaps ciduty can then help reconfigure things once we know more.
Flags: needinfo?(jlund)
Comment 5 • 7 years ago
Answering in tracking bug 1478215.
Flags: needinfo?(jlund)
Summary: Move bouncer nagios checks from scl3 to mdc → Move bouncer-check and related nagios alert from scl3 to mdc
Comment 6 • 6 years ago
Assignee: nobody → nthomas
Updated • 6 years ago
Assignee: nthomas → nobody
Comment 7 • 6 years ago
I don't see a patch on bug 1484880 for moving bouncer's nagios check (search for check_bouncer). Could ciduty work one up ASAP? NB: there's a pending needinfo on bug 1484880 about the right bot to use.
Comment 9 • 6 years ago
:nthomas, I've just made a patch for that. Would you mind taking a look?
Comment 10 • 6 years ago
Comment on attachment 9007646
GitHub Pull Request - add test code on buildbot-master01
r+'d at https://github.com/mozilla-releng/build-puppet/pull/200#pullrequestreview-153691137
Landed on master at https://github.com/mozilla-releng/build-puppet/commit/50826477ff80eaa30fffd7cd3ee76754c9d61436
Attachment #9007646 - Flags: review+
Attachment #9007646 - Flags: checked-in+
Comment 11 • 6 years ago
Status:
- The patch here moved the check to buildbot-master01, and it runs OK locally.
- The nagios code changes happened in bug 1484880, but there's a network flow issue somewhere between the nagios server and bm01, so the check times out.
Comment 12 • 6 years ago
This is working now.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Updated • 5 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard