Closed Bug 726384 Opened 12 years ago Closed 12 years ago

bm-vmware01.build.sjc1 Down

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: rbryce, Assigned: rbryce)

Details

Rick Bryce [:rbryce]

Assignee

Description

12 years ago

The server started alerting with iLO communication errors. I restarted the iLO card in hopes of regaining access to Blade 7; the blade appears to have failed.

Rick Bryce [:rbryce]

Assignee

Comment 1

12 years ago

The blade seemed to crash as soon as I logged in after the iLO reset.

Dustin J. Mitchell [:dustin] (he/him)

Comment 2

12 years ago

That host was running the following VMs:

dev-stage1
moz2-linux-slave19
moz2-linux-slave27
moz2-linux-slave34
moz2-linux-slave37
moz2-linux-slave39
moz2-linux64-template
preproduction-master
preproduction-stage
try-linux-slave02
try-linux-slave03
try-linux-slave22

Of those, we can survive until working hours without the slaves.  That leaves

preproduction-master
preproduction-stage
dev-stage1

all of which are part of the releng dev (aka preprod aka staging) environment. They are not mission-critical, but they should be up sooner rather than later, since releng can't stage changes without those components. If possible, we should restart these three VMs on another ESX host. I don't know how to do that in vSphere, but I think it's possible.

Rick Bryce [:rbryce]

Assignee

Comment 3

12 years ago

preproduction-master
preproduction-stage

have been moved to other VM hosts. dev-stage1 is still copying files and should finish in an hour or so.

matthew zeier [:mrz]

Updated

12 years ago

Group: infra

Rick Bryce [:rbryce]

Assignee

Comment 4

12 years ago

The VMs listed above are running again. bm-vmware01 suddenly came back hours later. I don't trust the hardware and am reluctant to bring the host back into the ESX cluster until a hardware diagnostic can be run.

Nick Thomas [:nthomas] (UTC+12)

Comment 5

12 years ago

Of the VMs in comment #2, only preproduction-master and preproduction-stage are responding to ping. Could we have dev-stage1 back ASAP, and the others at your convenience?

Severity: normal → critical

Nick Thomas [:nthomas] (UTC+12)

Updated

12 years ago

Severity: critical → major

Dumitru Gherman [:dumitru]

Updated

12 years ago

Assignee: server-ops → rbryce

Rick Bryce [:rbryce]

Assignee

Comment 6

12 years ago

:nthomas: I forgot to update the MAC address in DHCP when I migrated the VM. The server should be up now.

Nick Thomas [:nthomas] (UTC+12)

Comment 7

12 years ago

Thanks for fixing up dev-stage1. What's the plan for investigating the host problem and bringing up the remaining VMs?

Rick Bryce [:rbryce]

Assignee

Comment 8

12 years ago

I have tried, without success, to find any problems with this server.

Amy Rich [:arr] [:arich]

Comment 9

12 years ago

rbryce: can we bring it back up along with the Linux builder VMs?

Rick Bryce [:rbryce]

Assignee

Comment 10

12 years ago

bm-vmware01 is back online, and I started the slaves below. I still have no clue what caused bm-vmware01 to misbehave for 12 hours and then suddenly recover. For now, I think preproduction-master, preproduction-stage, and dev-stage1 should stay on other ESX hosts.

-- booted VMs --
moz2-linux-slave19
moz2-linux-slave27
moz2-linux-slave34
moz2-linux-slave37
moz2-linux-slave39
moz2-linux64-template
try-linux-slave02
try-linux-slave03
try-linux-slave22
--

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Updated

9 years ago

Product: mozilla.org → mozilla.org Graveyard

Categories

(mozilla.org Graveyard :: Server Operations, task)

Security

(public)