Closed Bug 682290 Opened 14 years ago Closed 14 years ago

scl1 nagios alerts storm

Categories

(Infrastructure & Operations :: RelOps: General, task)

x86
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: arich, Assigned: arich)

Details

Host affected: admin1.infra.scl1.mozilla.com only
Service affected: scl1 nagios
Time of issue: 08/26/2011 07:28 PDT
Length of outage: 2 minutes
Issue: When admin1 was failed over to the new kvm instance to free up kvm2 for upgrade, it came up with a bad default route (since fixed), which caused a storm of nagios alerts. admin1 was using dhcp for all of its interfaces and had interfaces on multiple VLANs, so it took its default route and resolver hosts from the wrong dhcp response. Nagios therefore sent out a large number of false alerts until the route and resolver hosts were corrected.
Resolution: I made all of the interface, route, and resolver information on admin1 static and rebooted it to make sure things came up correctly.
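A quick way to confirm after a failover or reboot that a multi-homed host came up with the intended default route is to read the kernel routing table and compare it against what is expected. The following is a minimal sketch only, not what was run on admin1: it assumes a Linux host exposing routes in the /proc/net/route format, and the function names, interface names, and addresses in the sample are hypothetical.

```python
import socket
import struct


def default_routes(route_table_text):
    """Return (interface, gateway) pairs for every default route found in
    text laid out like Linux's /proc/net/route."""
    routes = []
    for line in route_table_text.strip().splitlines()[1:]:  # skip header row
        fields = line.split()
        if len(fields) < 8:
            continue
        iface, dest, gateway, mask = fields[0], fields[1], fields[2], fields[7]
        # A default route has an all-zero destination and an all-zero mask.
        if dest == "00000000" and mask == "00000000":
            # Addresses are stored as little-endian hex.
            gw = socket.inet_ntoa(struct.pack("<L", int(gateway, 16)))
            routes.append((iface, gw))
    return routes


def check_default_route(route_table_text, expected_iface, expected_gateway):
    """True only if there is exactly one default route and it matches."""
    return default_routes(route_table_text) == [(expected_iface, expected_gateway)]


# Hypothetical sample: the default route landed on eth1 instead of eth0.
SAMPLE = """Iface Destination Gateway Flags RefCnt Use Metric Mask MTU Window IRTT
eth1 00000000 0114A8C0 0003 0 0 0 00000000 0 0 0
eth0 000AA8C0 00000000 0001 0 0 0 00FFFFFF 0 0 0
"""

if __name__ == "__main__":
    print(default_routes(SAMPLE))
    print(check_default_route(SAMPLE, "eth0", "192.168.10.1"))
```

On a live host the same functions could be fed the contents of /proc/net/route. Requiring exactly one matching default route also catches the case where several dhcp leases each install their own.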
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations