
Figure out holes in nagios coverage and plug them

RESOLVED FIXED

Status

Release Engineering
General
RESOLVED FIXED
8 years ago
4 years ago

People

(Reporter: nthomas, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

8 years ago
E.g. there is nothing for talos-master, which would have been good during bug 505153. Also the recent try slaves.
(Reporter)

Comment 1

8 years ago
Oh, and make sure we always request nagios tests when new machines are cloned/set up.
Futuring until someone is ready to take this on.
Component: Release Engineering → Release Engineering: Future
(Reporter)

Comment 3

8 years ago
Some examples:
production-patrocles (win32 + buildbot)
fx-linux-1.9-slave03,04  (linux + buildbot)
fx-win32-1.9-slave03,04  (win32 + buildbot)
(Reporter)

Comment 4

8 years ago
talos-master, talos-staging-master
moz2-darwin9-slave09 -> 18 are missing from nagios entirely
talos-master and talos-staging-master are being handled in bug 509700.
(Reporter)

Comment 7

8 years ago
(In reply to comment #4)
> talos-master, talos-staging-master

Bug 509700 for this.
(Reporter)

Updated

8 years ago
Depends on: 514979
(Reporter)

Comment 8

8 years ago
I went through the inventory and these are the machines without nagios checks:

moz2-darwin9-slave09 thru 18  - this whole block filed as bug 514979
try-linux-slave10 thru 19
try-mac-slave10 thru 19
try-w32-slave10 thru 19
moz2-linux64-slave02

moz2-linux-slave26 thru 35 - reopened bug 510601
moz2-win32-slave40 thru 49 - reopened bug 510597

talos-master, talos-staging-master - reopened bug 509700
production-1.8-master, production-crazyhorse, production-patrocles
staging-1.9-master

cruncher
staging/production-opsi
staging/production-puppet

staging-nightly-updates (copy prometheus-vm)
staging-stage (disk checks!)

fx-linux-1.9-slave03,04
fx-win32-1.9-slave03,04
egg  (this may not be worth it, really old OS)

qm-pleopard-try05,06
qm-ptiger-try05,06
qm-pubuntu-try05,06
qm-pvista-try05,06
qm-pxp-try05,06
talos-rev1-leopard01,02
talos-rev2-vista15

The inventory could do with some love too, e.g. talos machines that became talos-rev-FOO but still have a qm-BAR entry, qm-rhel02, etc.
Depends on: 510601, 510597, 509700
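
The audit in comment 8 (compare the inventory against what nagios knows about) can be sketched roughly as below. This is only an illustration, not the process actually used: the function names `expand_hosts` and `missing_checks` are hypothetical, and it assumes both the inventory and the list of nagios-monitored hosts are available as plain lists of strings. It also expands the "slave10 thru 19" and "slave03,04" shorthand used in this bug.

```python
import re


def expand_hosts(spec):
    """Expand shorthand such as 'try-linux-slave10 thru 19' or
    'fx-linux-1.9-slave03,04' into individual hostnames.

    Hypothetical helper; the shorthand is the informal notation from
    this bug, not a nagios or inventory format.
    """
    spec = spec.strip()
    # "prefixNN thru MM": inclusive numeric range, zero padding preserved
    m = re.fullmatch(r"(.*?)(\d+)\s+thru\s+(\d+)", spec)
    if m:
        prefix, start, end = m.group(1), m.group(2), m.group(3)
        width = len(start)
        return [f"{prefix}{n:0{width}d}" for n in range(int(start), int(end) + 1)]
    # "prefixNN,MM": explicit list of numeric suffixes
    m = re.fullmatch(r"(.*?)(\d+(?:,\d+)+)", spec)
    if m:
        prefix = m.group(1)
        return [prefix + suffix for suffix in m.group(2).split(",")]
    # Anything else is treated as a single hostname
    return [spec]


def missing_checks(inventory_specs, monitored_hosts):
    """Return inventory hosts that do not appear in the monitored list,
    in inventory order."""
    monitored = set(monitored_hosts)
    missing = []
    for spec in inventory_specs:
        for host in expand_hosts(spec):
            if host not in monitored:
                missing.append(host)
    return missing


# Example with made-up monitoring data (assumption, not real nagios output):
inventory = ["try-linux-slave10 thru 12", "fx-linux-1.9-slave03,04", "cruncher"]
monitored = ["try-linux-slave11", "fx-linux-1.9-slave03"]
print(missing_checks(inventory, monitored))
```

Keeping the comparison as a plain set difference over expanded hostnames means the same check could be rerun whenever new machines are cloned, which is the gap comment 1 points at.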
(Reporter)

Comment 9

8 years ago
Revisiting the status here.

(In reply to comment #8)
> moz2-darwin9-slave09 thru 18  - this whole block filed as bug 514979
> try-linux-slave10 thru 19
> try-mac-slave10 thru 19
Verified fixed.

> try-w32-slave10 thru 19
Verified fixed, but we didn't get nagios for later slaves 20 thru 29. Reopened bug 532714.

> moz2-linux64-slave02
Verified fixed, and the new VMs to slave12 are also added.

> moz2-linux-slave26 thru 35 - reopened bug 510601
Verified fixed, and moz2-linux-slave50 also has checks so all recent slaves should be OK too.

> moz2-win32-slave40 thru 49 - reopened bug 510597
Verified fixed, and moz2-win32-slave59 also has checks so newer slaves should be OK too.

For mac we have everything up to and including moz2-darwin9-slave26 covered. The new slaves from bug 528281 (29-37) are in MV so no nagios until bug 527814 is fixed (noted there). 27 & 28 got missed out somehow when allocating hostnames.

> talos-master, talos-staging-master - reopened bug 509700
Verified fixed. Added request for talos-staging-master02 to bug 527814.

> production-1.8-master, production-crazyhorse, production-patrocles
bug 539859.

> staging-1.9-master
VM deleted in the meantime.
 
> cruncher
> staging/production-opsi
> staging/production-puppet
> staging-nightly-updates (copy prometheus-vm)
> staging-stage (disk checks!)
All bug 539859. We'll need to set up nagios on the opsi machines (Debian).

> fx-linux-1.9-slave03,04
> fx-win32-1.9-slave03,04
The 03's got deleted. 04's in bug 539859.

> egg  (this may not be worth it, really old OS)
Don't care.

> qm-ptiger-try05,06
> talos-rev1-leopard01,02
No longer in DNS.

> qm-pleopard-try05,06
> qm-pubuntu-try05,06
> qm-pvista-try05,06
> qm-pxp-try05,06
> talos-rev2-vista15
All fixed elsewhere. I didn't attempt to check that every Talos host is in nagios.
Depends on: 532714, 527814, 539859
(Reporter)

Comment 10

8 years ago
All deps are closed so let's consider this FIXED.

Phong, could you make sure that adding nagios checks is on the list for new machine setup?
Status: NEW → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → FIXED
Moving closed Future bugs into Release Engineering in preparation for removing the Future component.
Component: Release Engineering: Future → Release Engineering
(Assignee)

Updated

4 years ago
Product: mozilla.org → Release Engineering