Closed
Bug 726384
Opened 13 years ago
Closed 13 years ago
bm-vmware01.build.sjc1 Down
Categories
(mozilla.org Graveyard :: Server Operations, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: rbryce, Assigned: rbryce)
Details
The server started alerting on iLO communication errors. I restarted the iLO card hoping to regain access to Blade 7, but the blade appears to have failed.
Assignee
Comment 1•13 years ago
The blade seemed to crash as soon as I logged in after the iLO reset.
Comment 2•13 years ago
That host was running the following VMs:
dev-stage1
moz2-linux-slave19
moz2-linux-slave27
moz2-linux-slave34
moz2-linux-slave37
moz2-linux-slave39
moz2-linux64-template
preproduction-master
preproduction-stage
try-linux-slave02
try-linux-slave03
try-linux-slave22
Of those, we can survive until working hours without the slaves. That leaves
preproduction-master
preproduction-stage
dev-stage1
all of which are part of the releng dev (aka preprod aka staging) environment and not mission-critical, but they should be up sooner rather than later, as releng can't stage changes without those components. If possible, we should restart the latter three VMs on another ESX host. I don't know how to do that in vSphere, but I think it's possible.
Assignee
Comment 3•13 years ago
preproduction-master
preproduction-stage
have been moved to other VM hosts. dev-stage1 is still copying files and should finish in an hour or so.
Updated•13 years ago
Group: infra
Assignee
Comment 4•13 years ago
The VMs listed above are running again. bm-vmware01 suddenly came back hours later. I don't trust the hardware and am reluctant to bring the host back into the ESX cluster until a hardware diagnostic can be run.
Comment 5•13 years ago
Of the VMs in comment #2, only preproduction-master and preproduction-stage are responding to ping. Could we have dev-stage1 back ASAP, and the others at your convenience?
Severity: normal → critical
Updated•13 years ago
Severity: critical → major
Updated•13 years ago
Assignee: server-ops → rbryce
Assignee
Comment 6•13 years ago
:nthomas I forgot to change the MAC address in DHCP when I migrated the VM. The server should be up now.
Comment 7•13 years ago
Thanks for fixing up dev-stage1. What's the plan for investigating the host problem and bringing up the remaining VMs?
Assignee
Comment 8•13 years ago
I have tried, without success, to find any problems with this server.
Comment 9•13 years ago
rbryce: can we bring it back up along with the Linux builder VMs?
Assignee
Comment 10•13 years ago
bm-vmware01 is back online. I started the slaves below. I still have no clue what caused bm-vmware01 to act so badly for 12 hours and then suddenly come back. For now I think preproduction-master, preproduction-stage, and dev-stage1 should stay on other ESX hosts.
--booted vms --
moz2-linux-slave19
moz2-linux-slave27
moz2-linux-slave34
moz2-linux-slave37
moz2-linux-slave39
moz2-linux64-template
try-linux-slave02
try-linux-slave03
try-linux-slave22
--
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Updated•10 years ago
Product: mozilla.org → mozilla.org Graveyard