Closed Bug 505157 Opened 15 years ago Closed 14 years ago

Figure out holes in nagios coverage and plug them

Categories

(Release Engineering :: General, defect)

x86
All
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nthomas, Unassigned)

E.g. there is nothing for talos-master (which would have been good during bug 505153). Also the recent try slaves.
Oh, and make sure we always request nagios tests when new machines are cloned/set up.
Futuring until someone is ready to take this on.
Component: Release Engineering → Release Engineering: Future
Some examples:
production-patrocles (win32 + buildbot)
fx-linux-1.9-slave03,04  (linux + buildbot)
fx-win32-1.9-slave03,04  (win32 + buildbot)
talos-master, talos-staging-master
moz2-darwin9-slave09 -> 18 are missing from nagios entirely
talos-master and talos-staging-master are being handled in bug 509700.
(In reply to comment #4)
> talos-master, talos-staging-master

Bug 509700 for this.
Depends on: 514979
I went through the inventory and these are the machines without nagios checks:

moz2-darwin9-slave09 thru 18  - this whole block filed as bug 514979
try-linux-slave10 thru 19
try-mac-slave10 thru 19
try-w32-slave10 thru 19
moz2-linux64-slave02

moz2-linux-slave26 thru 35 - reopened bug 510601
moz2-win32-slave40 thru 49 - reopened bug 510597

talos-master, talos-staging-master - reopened bug 509700
production-1.8-master, production-crazyhorse, production-patrocles
staging-1.9-master

cruncher
staging/production-opsi
staging/production-puppet

staging-nightly-updates (copy prometheus-vm)
staging-stage (disk checks!)

fx-linux-1.9-slave03,04
fx-win32-1.9-slave03,04
egg  (this may not be worth it, really old OS)

qm-pleopard-try05,06
qm-ptiger-try05,06
qm-pubuntu-try05,06
qm-pvista-try05,06
qm-pxp-try05,06
talos-rev1-leopard01,02
talos-rev2-vista15

The inventory could do with some love too, e.g. talos machines that became talos-rev-FOO but still have a qm-BAR entry, qm-rhel02 etc.
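The audit above amounts to expanding the shorthand host ranges and subtracting whatever nagios already monitors. A minimal sketch of that comparison follows. The function names are hypothetical, and it assumes the inventory and the set of monitored hosts are available as plain lists of strings, which the bug does not state.

```python
import re

def expand_hosts(spec):
    """Expand shorthand such as 'try-linux-slave10 thru 19' or
    'fx-linux-1.9-slave03,04' into individual hostnames.
    (Hypothetical helper; the shorthand is the one used in this comment.)"""
    spec = spec.strip()
    # "<prefix><start> thru <end>": inclusive numeric range, zero padding kept
    m = re.fullmatch(r"(.*?)(\d+)\s+thru\s+(\d+)", spec)
    if m:
        prefix, start, end = m.groups()
        width = len(start)
        return [f"{prefix}{n:0{width}d}" for n in range(int(start), int(end) + 1)]
    # "<prefix>NN,MM": explicit list of numeric suffixes
    m = re.fullmatch(r"(.*?)(\d+(?:,\d+)+)", spec)
    if m:
        prefix, suffixes = m.groups()
        return [prefix + s for s in suffixes.split(",")]
    # Anything else is treated as a single hostname
    return [spec]

def hosts_without_checks(inventory_specs, monitored_hosts):
    """Return inventory hosts that have no nagios checks (set difference)."""
    inventory = {h for spec in inventory_specs for h in expand_hosts(spec)}
    return sorted(inventory - set(monitored_hosts))

# Example with made-up monitoring data:
specs = ["qm-pxp-try05,06", "cruncher", "moz2-darwin9-slave09 thru 11"]
monitored = ["cruncher", "moz2-darwin9-slave10"]
missing = hosts_without_checks(specs, monitored)
```

Keeping the expansion separate from the comparison means the same helper could be reused when filing the per-block bugs.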
Depends on: 510601, 510597, 509700
Revisiting the status here.

(In reply to comment #8)
> moz2-darwin9-slave09 thru 18  - this whole block filed as bug 514979
> try-linux-slave10 thru 19
> try-mac-slave10 thru 19
Verified fixed.

> try-w32-slave10 thru 19
Verified fixed, but we didn't get nagios for later slaves 20 thru 29. Reopened bug 532714.

> moz2-linux64-slave02
Verified fixed, and the new VMs up to slave12 have also been added.

> moz2-linux-slave26 thru 35 - reopened bug 510601
Verified fixed, and moz2-linux-slave50 also has checks so all recent slaves should be OK too.

> moz2-win32-slave40 thru 49 - reopened bug 510597
Verified fixed, and moz2-win32-slave59 also has checks so newer slaves should be OK too.

For mac we have everything up to and including moz2-darwin9-slave26 covered. The new slaves from bug 528281 (29-37) are in MV so no nagios until bug 527814 is fixed (noted there). 27 & 28 got missed out somehow when allocating hostnames.

> talos-master, talos-staging-master - reopened bug 509700
Verified fixed. Added request for talos-staging-master02 to bug 527814.

> production-1.8-master, production-crazyhorse, production-patrocles
bug 539859.

> staging-1.9-master
VM deleted in the meantime.
 
> cruncher
> staging/production-opsi
> staging/production-puppet
> staging-nightly-updates (copy prometheus-vm)
> staging-stage (disk checks!)
All bug 539859. We'll need to set up nagios on the opsi machines (Debian).

> fx-linux-1.9-slave03,04
> fx-win32-1.9-slave03,04
The 03s got deleted. The 04s are in bug 539859.

> egg  (this may not be worth it, really old OS)
Don't care.

> qm-ptiger-try05,06
> talos-rev1-leopard01,02
No longer in DNS.

> qm-pleopard-try05,06
> qm-pubuntu-try05,06
> qm-pvista-try05,06
> qm-pxp-try05,06
> talos-rev2-vista15
All fixed elsewhere. I didn't attempt to check that every Talos host is in nagios.
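A check that every Talos host is in nagios could be scripted rather than done by hand. The sketch below assumes the nagios object configuration is readable as text and uses the standard `define host { ... host_name ... }` layout. The function name and the sample config are hypothetical, and inline `;` comments are not handled.

```python
import re

# A "define host { ... }" block; "define hostgroup" does not match
HOST_BLOCK = re.compile(r"define\s+host\s*\{(.*?)\}", re.DOTALL)
# The host_name directive inside a block; commented-out lines are skipped
HOST_NAME = re.compile(r"^\s*host_name\s+(\S+)", re.MULTILINE)

def monitored_hosts(config_text):
    """Collect the host_name of every host definition in the config text."""
    hosts = set()
    for block in HOST_BLOCK.findall(config_text):
        m = HOST_NAME.search(block)
        if m:
            hosts.add(m.group(1))
    return hosts

# Made-up sample config for illustration only
sample = """
define host {
    use        generic-host
    host_name  talos-master
}
define host{
    # host_name  old-name
    host_name  talos-rev2-vista15
}
define hostgroup {
    hostgroup_name  talos
}
"""

known = monitored_hosts(sample)
# Talos hosts from a (hypothetical) inventory list that nagios does not know
talos_inventory = ["talos-master", "talos-staging-master02", "talos-rev2-vista15"]
unmonitored = [h for h in talos_inventory if h not in known]
```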
Depends on: 532714, 527814, 539859
All deps are closed so let's consider this FIXED.

Phong, could you make sure that adding nagios checks is on the list for new machine setup?
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Moving closed Future bugs into Release Engineering in preparation for removing the Future component.
Component: Release Engineering: Future → Release Engineering
Product: mozilla.org → Release Engineering