Closed
Bug 448908
Opened 16 years ago
Closed 16 years ago
bm-vmware05 crashed, taking down a bunch of VMs
Categories
(mozilla.org Graveyard :: Server Operations, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: armenzg, Assigned: mrz)
Details
I tried to ssh or VNC to it, but no luck. Any suggestions?
Comment 1•16 years ago
There's a lot of this on the console:
  sd 0:0:0:0 timing out command, waited 360s
  end_request: I/O error, dev sda, sector 20291311
for assorted sectors, and some
  Read-error on swap-device (0:0:0:20291319)
but not always paired with the first set. According to the Performance info in the VI Client, it's not using much CPU or RAM, or doing much disk or network access. Can't get a prompt on the console, so I'm trying a reboot.
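For anyone triaging similar console output, here is a minimal sketch that collects the failing sectors from captured text. It assumes the message shapes quoted in the comment above; the sample input, the second sector value, and the names `CONSOLE` and `summarize` are hypothetical, and real kernel formatting may differ.

```python
import re

# Hypothetical capture modeled on the messages quoted in this bug.
# The second I/O error sector is made up for illustration.
CONSOLE = """\
end_request: I/O error, dev sda, sector 20291311
Read-error on swap-device (0:0:0:20291319)
end_request: I/O error, dev sda, sector 20291327
"""

IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")
SWAP_ERROR = re.compile(r"Read-error on swap-device \(([\d:]+)\)")


def summarize(text):
    """Return (I/O error sectors keyed by device, swap read-error offsets)."""
    io_errors = {}
    swap_errors = []
    for line in text.splitlines():
        m = IO_ERROR.search(line)
        if m:
            io_errors.setdefault(m.group(1), []).append(int(m.group(2)))
            continue
        m = SWAP_ERROR.search(line)
        if m:
            # Assumption: the last colon-separated field is the offset.
            swap_errors.append(int(m.group(1).split(":")[-1]))
    return io_errors, swap_errors


io_errors, swap_errors = summarize(CONSOLE)
print(io_errors)
print(swap_errors)
```

Grouping the sectors by device makes it easier to see whether the errors cluster on one disk, which is the kind of pattern that would point at a single backing LUN.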
Assignee: nobody → nthomas
Comment 2•16 years ago
The gentle method (using "Reboot Guest VM") got no response. The more determined method (using "Reset") is hung at 95% completion, so we need help from Server Ops. There isn't a good place to reclone this machine from, so let's try to recover it first.
Assignee: nthomas → server-ops
Severity: normal → major
Component: Release Engineering → Server Operations
QA Contact: release → mrz
Comment 3•16 years ago
FWIW, looks like it failed sometime after 09:10 PDT on Wed 30 Jul.
Comment 4•16 years ago
The management process on bm-vmware05 died; trying to figure out how to restart it.
Comment 5•16 years ago
Everything is pointing to an iSCSI issue - can't tell which iSCSI LUN was at fault. While debugging with VMware on the phone, the box crashed, taking down the following VMs:
  fx-linux-1.9-slave1
  moz2-win32-slave06
  prometheus-vm
  tb-linux-tbox
  try-master
Punting over to RE to bring things back up. Still working with VMware on root cause.
Assignee: server-ops → nobody
Component: Server Operations → Release Engineering
QA Contact: mrz → release
Comment 6•16 years ago
Talked to mrz:
1) Filed bug 449059 to track reviving the downed VMs.
2) Pushing this bug back to IT to track fixing the root cause of the kernel panic on bm-vmware05.
Assignee: nobody → server-ops
Component: Release Engineering → Server Operations
QA Contact: release → mrz
Summary: fx-linux-1.9-slave1 is unreachable → bm-vmware05 crashed, taking down a bunch of VMs
Updated•16 years ago
Assignee: server-ops → mrz
Comment 7•16 years ago
VMware blames HP's management tools. HP doesn't think so but does recommend upgrading from 8.0 to 8.1. I doubt it's related to that, since it only crashed while the VMware tech was trying to fix some other SAN issues, but I'll upgrade.
Status: NEW → ASSIGNED
Comment 8•16 years ago
Updated.
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Updated•9 years ago
Product: mozilla.org → mozilla.org Graveyard