Closed
Bug 462693
Opened 16 years ago
Closed 16 years ago
bm-xserve02 has a corrupted disk ?
Categories
(Infrastructure & Operations :: RelOps: General, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: nthomas, Assigned: phong)
References
Details
Something went rogue on bm-xserve02 sometime after 07:23 Saturday (PDT), probably consuming all the CPU. I can't reach it with VNC or ssh (the connection opens and it hangs after sending identity files). This box is doing Tb2.0.0.x nightlies (en-US and locales), and Firefox 2.0.0.x localized nightlies, and is marked Tier 1 in the inventory, but it's not that urgent. Monday will be fine. Please reboot it and watch out for any errors.
Updated•16 years ago
Assignee: server-ops → oremj
Comment 1•16 years ago
GNI has rebooted bm-xserve02.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Reporter
Comment 2•16 years ago
Thanks! The end of the build log (from when it died) has
  cvs -q -z 3 co -P -r MOZILLA_1_8_BRANCH -D 11/01/2008 14:23 +0000 SeaMonkeyAll
which is pretty similar to what happened to balsa-18branch in bug 461685. Did anything change on the cvs server recently? Maybe an update to the cvs package? Perhaps there's a bad interaction with the older clients on these Fx2 boxes.
Reporter
Comment 3•16 years ago
Passed Disk Verify on the RAID array, restarted tinderbox.
Reporter
Comment 4•16 years ago
... and it promptly stopped responding again. Could we please have some hardware diagnostics done on boot, and a more exhaustive disk/RAID check?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Summary: bm-xserve02 needs a reboot → bm-xserve02 has a corrupted disk ?
Updated•16 years ago
Assignee: oremj → phong
Assignee
Comment 5•16 years ago
This will require downtime for this box. When can I do this?
Comment 6•16 years ago
Please go ahead and do this ASAP - this box is useless to us right now.
Assignee
Comment 7•16 years ago
I can't even boot the diagnostic CD. It has been stuck at the boot screen for the past 15 minutes.
Assignee
Comment 8•16 years ago
If this turns out to be a hardware failure, will you be able to rebuild this server?
Reporter
Comment 9•16 years ago
We have images we can use to recreate it on the software side, if the hardware is replaceable/fixable. We have just the one spare machine we could use to sub, so we'd like to try to fix it.
Assignee
Comment 11•16 years ago
I will have to call this in for service.
Assignee
Comment 12•16 years ago
order #1725346 case #110033042
Reporter
Comment 13•16 years ago
Any news from Apple?
Assignee
Comment 14•16 years ago
I was able to restore from the bm-xserve03 image, but it failed again after a reboot. It was running fine from a clean install from the disk. I'm going to try another re-image.
Reporter
Comment 15•16 years ago
So what's the story with this CRITICAL bug? It's now a month since this Tier 1 box went down, and we're getting close to Tb2.0.0.19, so QA needs builds to verify bug fixes.
Assignee
Comment 16•16 years ago
I'm going to try and image this xserve again. If this doesn't work, then we might have to go with a fresh install.
Reporter
Comment 17•16 years ago
Nagios says this machine is up. Status update, please.
Assignee
Comment 18•16 years ago
I found an image of bm-xserve02 from around April of this year. I restored it from that image. It is up again. Please reopen if we run into errors again.
Status: REOPENED → RESOLVED
Closed: 16 years ago → 16 years ago
Resolution: --- → FIXED
Reporter
Comment 19•16 years ago
Thanks Phong, glad to have it back. Tinderbox config checked and restarted.
Updated•11 years ago
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations