Closed
Bug 1479620
Opened 7 years ago
Closed 6 years ago
Move l10n nagios checks from scl3 to mdc
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: dlabici, Assigned: dlabici)
Attachments
(2 files)
1.84 KB, patch
2.11 KB, patch (fubar: review+)
With BuildBot and SCL3 EOL approaching, we will need to move l10n bumper from bm71 & bm01.
Based on the emails between Jordan and Nick, we will need to fix up the nagios configuration in the IT puppet repo to point to the new datacenter and the appropriate host locations.
@fubar Do you or the team happen to know where the services/checks will be living after BB/SCL3 dies?
Assignee | ||
Updated•7 years ago
Flags: needinfo?(klibby)
Comment 1•7 years ago
Same answer as in https://bugzilla.mozilla.org/show_bug.cgi?id=1479619#c1
Flags: needinfo?(klibby)
Comment 2•7 years ago
Also, since we are doing this move, I think it's worth asking: would it also be a good idea to remove the buildbot-specific checks from nagios?
For example, PING for all the IX machines and the buildbot master checks (command queue, mysql connectivity, puppet freshness, buildbot master age and process, and many more).
Is there a bug where all of these things mentioned above are being tracked?
Comment 3•7 years ago
Luckily bm01 & bm71 are both in use1, so they could live on for a time once scl3 expires. In the long run it would make sense to move l10n-bumper into https://github.com/mozilla-releng/treescript, since that's our modern scriptworker approach to pushing things into the tree. Lando might change that, tbd!
(In reply to Bogdan Crisan [:bcrisan] (UTC +3, EEST) from comment #2)
> Also I think that it's worth mentioning that since we are doing this move,
> would also be a good idea to remove the buildbot specific checks from nagios?
>
> For example PING for all the IX machines, buildbot masters checks (command
> queue, mysql connectivity, puppet freshness, buildbot masters age and
> process, and many more..)
Yes, there's a lot of teardown to be done, but I don't know of any bug tracking that yet. We've kind of been using bug 1478215, even though that's for a specific service.
Comment 4•6 years ago
FTR, bug 1488913 tracks turning off buildbot.
I don't see a patch on bug 1484880 for l10n bumper's nagios check (l10n_bumper_lock). Could ciduty work one up ASAP?
Assignee | ||
Comment 5•6 years ago
Assignee: nobody → dlabici
Attachment #9007699 -
Flags: review?(nthomas)
Comment 6•6 years ago
(In reply to Danut Labici [:dlabici] from comment #5)
> Created attachment 9007699 [details] [diff] [review]
> l10n_bumper_check.patch
I think what we need to do is to move the l10n-bumper-servers hostgroup from the scl3 configs to mdc1, move bm01 and bm77 from releng/scl3.pp to releng/mdc1.pp, and also move the l10n_bumper_lock check from releng/services/scl3.pp to releng/services/mdc1.pp.
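The three moves described above can be sketched as a single operation on a simplified model of the config. This is only an illustration in Python: the real data lives in the Puppet manifests named in the comment, and the dict layout and the `move_check` helper are hypothetical, not taken from the IT puppet repo.

```python
# Hypothetical, simplified model of the per-datacenter nagios config.
# The real definitions live in Puppet manifests (releng/scl3.pp,
# releng/mdc1.pp, releng/services/scl3.pp, releng/services/mdc1.pp);
# this dict layout is an assumption made purely for illustration.
from copy import deepcopy

config = {
    "scl3": {
        "hosts": ["bm01", "bm77"],
        "hostgroups": {"l10n-bumper-servers": ["bm01", "bm77"]},
        "services": {"l10n_bumper_lock": {"hostgroups": ["l10n-bumper-servers"]}},
    },
    "mdc1": {"hosts": [], "hostgroups": {}, "services": {}},
}


def move_check(cfg, src, dst, hostgroup, service):
    """Return a copy of cfg with the hostgroup, its member hosts,
    and the service check moved from datacenter src to dst."""
    new = deepcopy(cfg)

    # 1. Move the hostgroup definition.
    members = new[src]["hostgroups"].pop(hostgroup)
    new[dst]["hostgroups"][hostgroup] = members

    # 2. Move each member host.
    for host in members:
        if host in new[src]["hosts"]:
            new[src]["hosts"].remove(host)
        if host not in new[dst]["hosts"]:
            new[dst]["hosts"].append(host)

    # 3. Move the service check itself.
    new[dst]["services"][service] = new[src]["services"].pop(service)
    return new


moved = move_check(config, "scl3", "mdc1", "l10n-bumper-servers", "l10n_bumper_lock")
```

Doing all three steps together avoids leaving the check defined in one datacenter config while its hostgroup or hosts are defined in the other.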
Assignee | ||
Comment 7•6 years ago
Adding myself to NI so I have a reminder for tomorrow.
Flags: needinfo?(dlabici)
Comment 8•6 years ago
Comment on attachment 9007699 [details] [diff] [review]
l10n_bumper_check.patch
Obsoleted by fubar's comment. I'm not a good reviewer for those changes, suggest RelOps/Moc instead.
Attachment #9007699 -
Flags: review?(nthomas)
Comment 9•6 years ago
agree; :ryanc reviewed the other checks, so if he's amenable to doing these that'd be great. otherwise jake or I could do it.
Assignee | ||
Comment 10•6 years ago
I will be on PTO, and I somehow missed this bug.
@ciduty, can you please check what the current status is?
Flags: needinfo?(dlabici) → needinfo?(ciduty)
Comment 11•6 years ago
Attached is the patch for moving the l10n-bumper-lock check from releng/services/scl3.pp to releng/services/mdc1.pp.
Also, I've checked the following:
- bm01 and bm77 have been moved to releng/mdc1.pp (bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1484880#c26)
- the l10n-bumper-servers hostgroup is set to bm01 and bm77 in mdc1
Could you take a look, please?
Attachment #9010841 -
Flags: review?(klibby)
Comment 12•6 years ago
Comment on attachment 9010841 [details] [diff] [review]
l10n-bumper-lock-check.patch
Review of attachment 9010841 [details] [diff] [review]:
-----------------------------------------------------------------
Looks good, other than that extra blank line!
::: modules/nagios4/manifests/prod/releng/services/mdc1.pp
@@ +857,5 @@
> + default => [
> + ]
> + }
> + },
> +
Extra blank line here, to remove
Attachment #9010841 -
Flags: review?(klibby) → review+
Comment 13•6 years ago
FYI, when this check lands it's likely to hit the same 'CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds' error that the bouncer check on bm01 is hitting over in bug 1484880.
Comment 14•6 years ago
CIDuty does not have access to push to this repository. Could someone push this patch or give us write access to the nagios module?
Patch: https://bug1479620.bmoattachments.org/attachment.cgi?id=9010841
Thank you!
Assignee | ||
Comment 15•6 years ago
Did this get landed?
Comment 16•6 years ago
The patch has landed.
commit a9250d0c17f73d5ebb0820e074d986945c01d974
Comment 17•6 years ago
These alerts came from bm77 and bm01 after the patch landed. I have acknowledged them: bug 1495920
Fri 23:20:29 UTC [8499] [] buildbot-master77.bb.releng.use1.mozilla.com:L10n bumper lock age is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 15 seconds
Fri 23:24:49 UTC [8500] [] buildbot-master01.bb.releng.use1.mozilla.com:L10n bumper lock age is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 15 seconds.
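For triage, alert lines in the shape shown above can be split into host, service, and state. The following is a minimal Python sketch, assuming every alert follows the layout of these two lines; the regular expression, the `seq` name for the bracketed number, and both helper functions are illustrative assumptions and not part of any nagios tooling.

```python
import re

# Assumed layout, inferred only from the two alert lines in this bug:
#   <day> <HH:MM:SS> UTC [<number>] [] <host>:<service> is <STATE>: <output>
# The meaning of the bracketed number is not stated in the bug; it is
# captured here as "seq" purely as a label.
ALERT_RE = re.compile(
    r"^(?P<day>\w{3}) (?P<time>\d{2}:\d{2}:\d{2}) UTC "
    r"\[(?P<seq>\d+)\] \[\] "
    r"(?P<host>[^:]+):(?P<service>.+?) is "
    r"(?P<state>OK|WARNING|CRITICAL|UNKNOWN): "
    r"(?P<output>.*)$"
)


def parse_alert(line):
    """Return a dict of alert fields, or None if the line does not match."""
    match = ALERT_RE.match(line.strip())
    return match.groupdict() if match else None


def is_nrpe_timeout(alert):
    """True when the plugin output reports an NRPE socket timeout."""
    return alert is not None and "Socket timeout" in alert["output"]


lines = [
    "Fri 23:20:29 UTC [8499] [] buildbot-master77.bb.releng.use1.mozilla.com:"
    "L10n bumper lock age is CRITICAL: CHECK_NRPE STATE CRITICAL: "
    "Socket timeout after 15 seconds",
    "Fri 23:24:49 UTC [8500] [] buildbot-master01.bb.releng.use1.mozilla.com:"
    "L10n bumper lock age is CRITICAL: CHECK_NRPE STATE CRITICAL: "
    "Socket timeout after 15 seconds.",
]
alerts = [parse_alert(line) for line in lines]
timed_out_hosts = [a["host"] for a in alerts if is_nrpe_timeout(a)]
```

Separating NRPE timeouts from genuine lock-age failures matters here because a timeout says the check could not run, not that the l10n bumper lock is actually stale.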
Comment 18•6 years ago
Bug 1495920 fixed this.
Status: NEW → RESOLVED
Closed: 6 years ago
Flags: needinfo?(ciduty)
Resolution: --- → FIXED
Updated•5 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard