Closed Bug 598535 Opened 14 years ago Closed 14 years ago

tm-b01-master01 array controller battery failure

Categories

(mozilla.org Graveyard :: Server Operations, task)

All
Other
task
Not set
blocker

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: justdave, Assigned: jlaz)

Details

(Whiteboard: 09/26/2010 @ 2pm)

Smart Array P400i in Slot 0 (Embedded)
   Cache Status: Temporarily Disabled
   Battery/Capacitor Status: Failed (Replace Batteries)

This is production-impacting: database writes will be slow with the write cache disabled. The battery swap will require downtime.
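For anyone who wants to check for this condition automatically, here is a minimal sketch that parses status text in the format pasted above and flags a failed battery or a disabled cache. The function names are hypothetical. The sketch assumes only the "Key: Value" layout and the field values seen in this bug ("Temporarily Disabled", "Failed (Replace Batteries)", "OK"), and it does not run any controller utility itself.

```python
# Sample text copied from the status output quoted in this bug.
SAMPLE = """Smart Array P400i in Slot 0 (Embedded)
   Cache Status: Temporarily Disabled
   Battery/Capacitor Status: Failed (Replace Batteries)
"""


def parse_controller_status(text):
    """Parse indented 'Key: Value' lines into a dict; other lines are skipped."""
    status = {}
    for line in text.splitlines():
        # Only indented lines carry fields; the unindented line names the controller.
        if not line.startswith((" ", "\t")):
            continue
        key, sep, value = line.strip().partition(": ")
        if sep:
            status[key] = value
    return status


def needs_attention(status):
    """Return True if the battery is not reported OK or the cache is disabled.

    Assumption: a healthy battery reads exactly "OK", as in the later IRC
    paste, and a disabled cache contains the word "Disabled".
    """
    battery = status.get("Battery/Capacitor Status", "")
    cache = status.get("Cache Status", "")
    return battery != "OK" or "Disabled" in cache


if __name__ == "__main__":
    parsed = parse_controller_status(SAMPLE)
    print(parsed)
    print("needs attention:", needs_attention(parsed))
```

A missing battery field is treated as a problem, which also covers the case where the controller reports no battery installed at all.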
This server supports the following services:

bonsai.mozilla.org
bonsai-l10n.mozilla.org
bonsai-www.mozilla.org
Buildbot
Despot (CVS access management)
developer.mozilla.org
graphs.mozilla.org
litmus.mozilla.org
viewvc.svn.mozilla.org
Flags: needs-downtime+
Flags: colo-trip+
Assignee: server-ops → jlazaro
jlaz - get that HP case opened as soon as you can.
Severity: major → blocker
Case ID: 4620075056
Submitted: 2010-09-21 21:57:39 PDT
Problem description:

Smart Array P400i in Slot 0 (Embedded)
   Cache Status: Temporarily Disabled
   Battery/Capacitor Status: Failed (Replace Batteries)

We'll need to replace the battery for the smart array, which will
need to be rushed/expedited to us since this impacts
production/user-facing services.
Dear Justin, 

Thank you for contacting HP e-Solutions. 

This email is with reference to case number 4620075056.

The part has been shipped. 

Note: It is always recommended to use Support Case Manager (link given below) to update your case, as it will generate a sub-case that will get our immediate attention even if I am not available. Replying directly to this mail does not generate a sub-case.

Please use the link provided, in order to receive prompt action on this case. 

http://www2.itrc.hp.com/service/mcm/homepageRequest.do

Regards,
Rakhee Achuthan
Technical Solutions Consultant, HP e-Solutions 
Mon-Fri: 8.30-17.30 [EST]
UPS Tracking number: 1Z4295AR0121145656
Received the battery today. I'm ready to head to MPT, but we'll need to coordinate downtime for this machine.
When's the best time to schedule downtime for this? It'll take out Buildbot and devmo for about half an hour (as well as the other services listed in comment 1).
Weekends are really the best; otherwise, Tuesday night. Who can handle doing it?
We also have a hard drive to replace on dm-ftp01 (bug 586285). The drive is internal rather than hot-swap, so the machine will have to be taken apart to replace it, which will also require taking down stage.mozilla.org because of the NFS mounts for Firefox. It's probably good to do both at the same time.
I can do this over the weekend. What time would be ideal?
Whiteboard: 09/26/2010 @ 2pm
(In reply to comment #1)
> This server supports the following services:
> 
> bonsai.mozilla.org
> bonsai-l10n.mozilla.org
> bonsai-www.mozilla.org
> Buildbot
> Despot (CVS access management)
> developer.mozilla.org
> graphs.mozilla.org
> litmus.mozilla.org
> viewvc.svn.mozilla.org

None of this has an announced downtime except for Buildbot. Can we still go ahead with it?
(In reply to comment #11)
> None of this has an announced downtime except for Buildbot. Can we still go
> ahead with it?

The answer to that question should have been No, but I didn't see your comment in time, and stupidly didn't check myself. :(

It's done now, pending verification.
It's not seeing the new battery; it now shows that no battery is installed. We'll have to wait until a downtime notice has been sent before trying again.
yey.

[15:20]  <justdave> we have battery
[15:20]  <justdave>    Battery/Capacitor Count: 1
[15:20]  <justdave>    Battery/Capacitor Status: OK
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Since it's apparently near impossible to get downtime for buildbot, we opted (with justin's okay) to go ahead and try again now; second attempt succeeded.  This is done.
Product: mozilla.org → mozilla.org Graveyard