Closed Bug 802450 Opened 13 years ago Closed 13 years ago

Network outage in build.mtv1

Categories

(Infrastructure & Operations Graveyard :: NetOps, task)

x86
All
task
Not set
normal

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: nthomas, Unassigned)

Details

<nagios-scl1> [14] kvm3.build.mtv1:PING is CRITICAL: CRITICAL - Plugin timed out after 10 seconds
<nagios-scl1> [59] ganglia3.build.mtv1 is DOWN: PING CRITICAL - Packet loss = 100%
[65] kvm1.build.mtv1 is DOWN: PING CRITICAL - Packet loss = 100%
[81] kvm.build.mtv1 is DOWN: PING CRITICAL - Packet loss = 100%
[84] ns1b.build.mtv1 is DOWN: PING CRITICAL - Packet loss = 100%
<nagios-svc-scl2> Tue 17:50:56 PDT [151] mon1.scl2.svc:cepmon_sync_503 is OK: (null)
<nagios-scl1> [93] ns1a.build.mtv1 is DOWN: PING CRITICAL - Packet loss = 100%
[37] kvm2.build.mtv1 is DOWN: PING CRITICAL - Packet loss = 100%
<Aj> nagios-scl3: ack 533 Bug 773129
<nagios-scl1> [19] ganglia3.build.mtv1 is DOWN: PING CRITICAL - Packet loss = 100%
[25] kvm1.build.mtv1 is DOWN: PING CRITICAL - Packet loss = 100%
<nagios-phx1> Tue 17:56:58 PDT [132] bouncer1.webapp.phx1.mozilla.com:Disk - All is OK: DISK OK (http://m.allizom.org/Disk+-+All)
<nagios-scl1> [38] kvm.build.mtv1 is DOWN: (Host Check Timed Out)
[43] ns1b.build.mtv1 is DOWN: CRITICAL - Plugin timed out after 10 seconds
[67] ns1a.build.mtv1 is DOWN: PING CRITICAL - Packet loss = 100%
[84] kvm2.build.mtv1 is DOWN: (Host Check Timed Out)

In #buildduty a great many slaves are also reported PING CRITICAL.
Assignee: server-ops → ravi
Assignee: ravi → server-ops
Severity: blocker → normal
Whiteboard: [buildduty][outage][treeclosure]
Assignee: server-ops → network-operations
Component: Server Operations → Server Operations: Netops
Priority: -- → P1
QA Contact: shyam → ravi
Whiteboard: [buildduty][outage][treeclosure]
These hosts recovered later, though that wasn't noted in the bug. I saw no cause on the firewall, and because of the previous power outage and the pending maintenance to refresh software on core1, there are no logs available there. There's not enough information available to determine a cause.
Status: NEW → RESOLVED
Closed: 13 years ago
Priority: P1 → --
Resolution: --- → INCOMPLETE
Product: mozilla.org → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard