Closed Bug 462693 Opened 16 years ago Closed 16 years ago

bm-xserve02 has a corrupted disk ?

Categories

(Infrastructure & Operations :: RelOps: General, task)

PowerPC
macOS
task
Not set
critical

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nthomas, Assigned: phong)

Something went rogue on bm-xserve02 sometime after 07:23 Saturday (PDT), probably consuming all the CPU. I can't reach it with VNC or ssh (the connection opens and it hangs after sending identity files). 

This box is doing Tb2.0.0.x nightlies (en-US and locales), and Firefox 2.0.0.x localized nightlies, and is marked Tier 1 in the inventory, but it's not that urgent. Monday will be fine.

Please reboot it and watch out for any errors.
Assignee: server-ops → oremj
GNI has rebooted bm-xserve02.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Thanks!

The end of the build log (from when it died) has
 cvs -q -z 3 co -P -r MOZILLA_1_8_BRANCH -D 11/01/2008 14:23 +0000 SeaMonkeyAll
which is pretty similar to what happened to balsa-18branch in bug 461685. Did anything change on the cvs server recently? Maybe an update to the cvs package? Perhaps there's a bad interaction with the older clients on these Fx2 boxes.
Passed Disk Verify on the RAID array, restarted tinderbox.
... and it promptly stopped responding again. Could we please have some hardware diagnostics done on boot, and a more exhaustive disk/RAID check?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Summary: bm-xserve02 needs a reboot → bm-xserve02 has a corrupted disk ?
Assignee: oremj → phong
This will require downtime for this box.  When can I do this?
Please go ahead and do this ASAP - this box is useless to us right now.
I can't even boot the diagnostic CD. It has been stuck at the boot screen for the past 15 minutes.
If this turns out to be a hardware failure, will we be able to rebuild this server?
We have images we can use to recreate it on the software side, if the hardware is replaceable/fixable. We have just the one spare machine we could use to sub, so we'd like to try to fix it.
I will have to call this in for service.
order #1725346

case #110033042
Any news from Apple?
I was able to restore from the bm-xserve03 image, but it failed again after a reboot. It was running fine from a clean install from the disk. I'm going to try another re-image.
So what's the story with this CRITICAL bug? It's now a month since this Tier 1 box went down, and we're getting close to Tb2.0.0.19, so QA needs builds to verify bug fixes.
I'm going to try and image this xserve again.  If this doesn't work, then we might have to go with a fresh install.
Nagios says this machine is up; status update, please.
I found an image of bm-xserve02 from around April of this year and restored the machine from it. It is up again. Please reopen if we run into errors again.
Status: REOPENED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Thanks Phong, glad to have it back.

Tinderbox config checked and restarted.
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations