Bug 462693
bm-xserve02 has a corrupted disk ?
Status: RESOLVED FIXED (opened 17 years ago, closed 17 years ago)
Categories: Infrastructure & Operations :: RelOps: General, task
Tracking: Not tracked
People: Reporter: nthomas, Assigned: phong
Something went rogue on bm-xserve02 sometime after 07:23 Saturday (PDT), probably consuming all the CPU. I can't reach it with VNC or ssh (the connection opens and it hangs after sending identity files).
This box is doing Tb2.0.0.x nightlies (en-US and locales), and Firefox 2.0.0.x localized nightlies, and is marked Tier 1 in the inventory, but it's not that urgent. Monday will be fine.
Please reboot it and watch out for any errors.
Updated • 17 years ago
Assignee: server-ops → oremj
Comment 1 • 17 years ago
GNI has rebooted bm-xserve02.
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Comment 2 (Reporter) • 17 years ago
Thanks!
The end of the build log (from when it died) shows:
cvs -q -z 3 co -P -r MOZILLA_1_8_BRANCH -D 11/01/2008 14:23 +0000 SeaMonkeyAll
This is very similar to what happened to balsa-18branch in bug 461685. Did anything change on the CVS server recently? Maybe an update to the cvs package? Perhaps there is a bad interaction with the older clients on these Fx2 boxes.
Comment 3 (Reporter) • 17 years ago
Passed Disk Verify on the RAID array, restarted tinderbox.
Comment 4 (Reporter) • 17 years ago
...and it promptly stopped responding again. Could we please have hardware diagnostics run on boot, and a more exhaustive disk/RAID check?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Summary: bm-xserve02 needs a reboot → bm-xserve02 has a corrupted disk ?
Updated • 17 years ago
Assignee: oremj → phong
Comment 5 (Assignee) • 17 years ago
This will require downtime for this box. When can I do this?
Comment 6 • 17 years ago
Please go ahead and do this ASAP - this box is useless to us right now.
Comment 7 (Assignee) • 17 years ago
I can't even boot the diagnostic CD. It has now been stuck at the boot screen for the past 15 minutes.
Comment 8 (Assignee) • 17 years ago
If this turns out to be a hardware failure, will we be able to rebuild this server?
Comment 9 (Reporter) • 17 years ago
We have images we can use to recreate it on the software side if the hardware can be replaced or fixed. We have only the one spare machine we could substitute, so we'd like to try to fix this one.
Comment 11 (Assignee) • 17 years ago
I will have to call this in for service.
Comment 12 (Assignee) • 17 years ago
order #1725346
case #110033042
Comment 13 (Reporter) • 17 years ago
Any news from Apple?
Comment 14 (Assignee) • 17 years ago
I was able to restore it from the bm-xserve03 image, but it failed again after a reboot. It ran fine from a clean install from the disk. I'm going to try re-imaging it again.
Comment 15 (Reporter) • 17 years ago
So what's the story with this CRITICAL bug? It's now a month since this Tier 1 box went down, and we're getting close to Tb2.0.0.19, so QA needs builds to verify bug fixes.
Comment 16 (Assignee) • 17 years ago
I'm going to try to image this Xserve again. If that doesn't work, we might have to go with a fresh install.
Comment 17 (Reporter) • 17 years ago
Nagios says this machine is up. Status update, please?
Comment 18 (Assignee) • 17 years ago
I found an image of bm-xserve02 from around April of this year and restored the machine from it. It is up again. Please reopen if we run into errors again.
Status: REOPENED → RESOLVED
Closed: 17 years ago → 17 years ago
Resolution: --- → FIXED
Comment 19 (Reporter) • 17 years ago
Thanks Phong, glad to have it back.
Tinderbox config checked and restarted.
Updated • 12 years ago
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations