Closed
Bug 682290
Opened 14 years ago
Closed 14 years ago
scl1 nagios alerts storm
Categories
(Infrastructure & Operations :: RelOps: General, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: arich, Assigned: arich)
Details
Host affected: only admin1.infra.scl1.mozilla.com
Service affected: scl1 nagios
Time of issue: 08/26/2011 07:28 PDT
Length of outage: 2 minutes
Issue: When admin1 was failed over to the new kvm instance to free up kvm2 for an upgrade, a bad default route (since fixed) caused a storm of nagios alerts. admin1 was using dhcp for all of its interfaces and had interfaces on multiple VLANs, so it picked its route and resolver hosts from the wrong dhcp information. Nagios therefore sent out a large number of false alerts until the route and resolver hosts were corrected.
Resolution: I made all of the interface, route, and resolver information on admin1 static and rebooted it to make sure things came up correctly.
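A check like the following could be run after such a reboot to confirm that the default route and resolver hosts are the intended ones. This is only a minimal sketch, not what was actually run on admin1: it assumes a Linux host where the routing table is readable in the /proc/net/route format and resolvers are listed in /etc/resolv.conf. The function names and the sample addresses are hypothetical.

```python
import socket
import struct


def default_routes(route_table: str):
    """Return (interface, gateway) pairs for default routes.

    Expects text in the /proc/net/route format: a header line, then
    whitespace-separated fields with little-endian hex addresses.
    """
    routes = []
    for line in route_table.strip().splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 8:
            continue
        iface, dest, gateway, mask = fields[0], fields[1], fields[2], fields[7]
        # A default route has destination 0.0.0.0 and netmask 0.0.0.0.
        if dest == "00000000" and mask == "00000000":
            gw = socket.inet_ntoa(struct.pack("<L", int(gateway, 16)))
            routes.append((iface, gw))
    return routes


def nameservers(resolv_conf: str):
    """Return the nameserver addresses listed in resolv.conf-style text."""
    servers = []
    for line in resolv_conf.splitlines():
        parts = line.split("#", 1)[0].split()  # drop trailing comments
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


if __name__ == "__main__":
    # Paths assumed for a typical Linux host.
    with open("/proc/net/route") as f:
        print("default routes:", default_routes(f.read()))
    with open("/etc/resolv.conf") as f:
        print("nameservers:", nameservers(f.read()))
```

More than one default route, or a default route on an unexpected interface, would point to the kind of misconfiguration described in this bug.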
Updated 14 years ago
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Updated 12 years ago
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations