Closed
Bug 438811
Opened 16 years ago
Closed 16 years ago
bm-xserve11 wont boot
Categories
(Release Engineering :: General, defect, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: nthomas, Assigned: coop)
Details
(Keywords: fixed1.9.0.1, verified1.9.0.1)
bm-xserve11 does debug builds for Firefox 3.0 on Mac, so it's on the Firefox tinderbox tree. It's normally solid as a rock, but the build starting at 2008/06/11 23:25 PDT never completed (normally takes 10 minutes). There were no checkins, so this is looking like a random glitch or hardware problem.
Reporter
Comment 1•16 years ago
Can't get a login prompt with ssh despite the initial connection being made, and doesn't respond to VNC either. Over to Server Ops for someone to look at the console. It's a Tier 1 box but the tree is closed at the moment, so no need to set blocker severity.
Assignee: nobody → server-ops
Severity: normal → critical
Component: Release Engineering: Maintenance → Server Operations: Tinderbox Maintenance
QA Contact: release → justin
Updated•16 years ago
Assignee: server-ops → mrz
Flags: colo-trip+
Comment 2•16 years ago
Console was hung (all keyboard LEDs lit up but nothing on the monitor). Power cycled. Box is up at login prompt.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Reporter
Comment 3•16 years ago
Looks like it was doing a checkout when it went boom. Saved at /builds/tinderbox/Fx-Trunk-test_mem/Darwin_8.8.4_Depend/Darwin_8.8.4_Depend.log.20080611-hang
Tinderbox restarted.
Reporter
Comment 4•16 years ago
Gah, it did it again. Could you do a disk check, please?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 5•16 years ago
called for remote hands to power cycle.
Comment 6•16 years ago
power cycled but it's down again?
Comment 7•16 years ago
I can keep power cycling but it keeps going dark, usually after you start doing something. Something to fix on your end?
Assignee: mrz → nobody
Status: REOPENED → NEW
Component: Server Operations: Tinderbox Maintenance → Release Engineering
Flags: colo-trip+
QA Contact: justin → release
Reporter
Comment 8•16 years ago
Maybe. Tinderbox says a build started at 2008/06/12 09:49, so that's comment #3 & 4. As far as I can tell no-one touched it after that, but CC'ing people that might've. If it was touched then we should definitely clobber the build and have a general poke around. If not, then I think we're requesting some hardware diagnostics and a disk verify.
Comment 9•16 years ago
I haven't touched it
Reporter
Comment 10•16 years ago
Ok, I'm on the hook from the build side. Please reboot this again when you can get hold of me on IRC, then I can look at it immediately.
Assignee: nobody → server-ops
Component: Release Engineering → Server Operations
Priority: -- → P2
QA Contact: release → justin
Comment 12•16 years ago
Box rebooted.
Status: NEW → RESOLVED
Closed: 16 years ago → 16 years ago
Resolution: --- → FIXED
Comment 13•16 years ago
Host won't boot off local disks.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 14•16 years ago
Nick - OS reload?
Reporter
Comment 15•16 years ago
It's not too bad if we have to put the 10.4 Intel image on there and set up tinderbox again, but I'd really like to be confident in the hardware first. Can we boot off a diagnostic CD and try to figure out whether it's a problem with the RAID setup, a disk failure, or some other hardware fault? If one disk has died, then we could replace that disk and rebuild the RAID. If the RAID is corrupted, could we look at the SMART status on the disks in case there's a failure right around the corner?
Comment 16•16 years ago
Over to Phong to check.
Assignee: mrz → phong.tran
Status: REOPENED → NEW
Flags: colo-trip+
Updated•16 years ago
Status: NEW → ASSIGNED
Comment 17•16 years ago
Verifying volume "untitled RAID Set 1"
Checking HFS Plus Volume
Checking Extents Overflows file.
Checking Catalog file.
Invalid extent entry
Incorrect block count for file Compose.strings
(It should be 0 instead of 49247)
Invalid extent entry
Checking multi-linked files.
Checking catalog hierarchy
Checking Extent Attributes file.
Checking volume bitmap.
Volume Bit Map needs minor repair.
Checking volume information.
Invalid volume free block count.
(It should be 13608976 instead of 13608963).
The volume Server RAID needs to be repaired.
Error: The underlying task reported failure on exit
1 HFS volume checked
Volume needs repair
Comment 18•16 years ago
volume was successfully repaired, but it still won't boot to OS.
Comment 19•16 years ago
What else can we try here to get this machine back online? Now that the disk has been repaired, maybe we could try imaging from another similar xserve instead of imaging from a clean OS install?
Summary: bm-xserve11 is hung → bm-xserve11 wont boot
Comment 20•16 years ago
Do you have a server I can take down to create an image from?
Reporter
Comment 21•16 years ago
Please use Build's "gold image" for Intel 10.4 rather than a running machine.
Comment 22•16 years ago
if needed, we can return qm-xserve06 for re-imaging. It's redundant on the unittest farm.
Comment 23•16 years ago
I'll try the 10.4 image first.
Comment 24•16 years ago
bm-xserve11 has been imaged with gold 10.4 image from bm-xserve09.
Reporter
Comment 25•16 years ago
Thanks Phong, looks good. Taking back to RelEng for tinderbox setup. I've set the hostname to bm-xserve11.build.m.o. Coop, anything special for a debug tinderbox setup? Do you have time to get this going again?
Assignee: phong → nobody
Status: ASSIGNED → NEW
Component: Server Operations → Release Engineering
Flags: colo-trip+
QA Contact: justin → release
Comment 26•16 years ago
This is a tier 1 machine on the 1.9.0 branch... why isn't the tree closed?
Comment 27•16 years ago
(In reply to comment #26)
> This is a tier 1 machine on the 1.9.0 branch... why isn't the tree closed?
Tree is now closed.
Comment 28•16 years ago
This is blocking our work on Gecko 1.9.0.1, and should be considered a P1 blocker. To whom should it be assigned?
Flags: blocking1.9.0.1+
Priority: P2 → P1
Assignee
Updated•16 years ago
Assignee: nobody → ccooper
Assignee
Updated•16 years ago
Status: NEW → ASSIGNED
Assignee
Comment 29•16 years ago
A build is in progress now, but we're running in one-off mode until I'm sure the config is right. Don't let me forget to get a multi-config.pl setup!
Assignee
Updated•16 years ago
Status: ASSIGNED → RESOLVED
Closed: 16 years ago → 16 years ago
Resolution: --- → FIXED
Updated•16 years ago
Keywords: fixed1.9.0.1, verified1.9.0.1
Comment 30•16 years ago
(In reply to comment #29)
> Don't let me forget to get a multi-config.pl setup!
Coop reminded me that it's done, and that it was done when I asked last time too. Adding this note so I don't stumble across comment #29 and worry anymore. :-)
Updated•11 years ago
Product: mozilla.org → Release Engineering