Closed
Bug 423816
Opened 17 years ago
Closed 17 years ago
tinderbox cb-xserve03 unreachable after reboot
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
VERIFIED
FIXED
People
(Reporter: ause, Unassigned)
Details
ssh login from the jumphost fails with "no route to host", which, as far as i understand, points to a network problem rather than a problem with the machine itself
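For context on the error text: "no route to host" is the message for `EHOSTUNREACH`, which the network stack reports when it cannot reach the target at all. It can also appear when the target itself is down and nothing answers for its address, so it does not rule out a problem with the box. A minimal sketch of how the common connect errors could be told apart; the function name `describe_connect_error` and the wording of the returned hints are assumptions made for illustration:

```python
import errno


def describe_connect_error(err_no: int) -> str:
    """Map a connect() errno to a rough hint about where the fault may lie."""
    if err_no == errno.EHOSTUNREACH:
        # Reported by the local stack or a router; the host may also simply be down.
        return "network path: no route to host (routing, link, or host down)"
    if err_no == errno.ENETUNREACH:
        return "network path: network unreachable"
    if err_no == errno.ECONNREFUSED:
        # The host answered, so the machine and the route to it are up.
        return "host reachable: nothing listening on the port"
    if err_no == errno.ETIMEDOUT:
        return "no answer: host down or packets dropped"
    return "other: " + errno.errorcode.get(err_no, "unknown")


print(describe_connect_error(errno.EHOSTUNREACH))
```

Only "connection refused" positively shows that the machine is up; the other cases leave both the network and the host as suspects.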
Comment 1•17 years ago
Phong - box just needs a reboot. It's in 101.05-9.
Assignee: server-ops → phong.tran
Flags: colo-trip+
Comment 2•17 years ago
server won't boot up. it is stuck at the apple screen.
Status: NEW → ASSIGNED
Comment 3•17 years ago
I was able to boot it to the cd after a very long wait. I am leaving it here.
Comment 4•17 years ago
Phong, could you please help us to get the machine up and running again? Ause mentioned it's still *not* accessible from outside. We're just about to release the next calendar version and this whole outage hits us hard...
Comment 5•17 years ago
i still cannot reach it and have no physical access to that machine. i don't even know where this box is physically located. it's the tinderbox we need for our 0.8 release, which was scheduled for this week.
any suggestions on how to proceed?
Updated•17 years ago
Severity: normal → critical
Priority: -- → P1
Comment 6•17 years ago
<lowering the sev as not to page people>
This is a hardware issue from the sounds of it, and it will have to be sent in to apple for repair (usually 5-7 days), assuming it is still under warranty. Unfortunately this won't be a quick fix.
Severity: critical → major
Comment 7•17 years ago
Apologies if I'm telling folks things they already know but before sending it in, the following might help:
1. Attempt to boot with a keyboard connected (DIRECTLY... not via a KVM switch), holding down Shift while booting. If this works, it will boot in Safe Mode, and you can run Disk First Aid from there.
2. If that doesn't work, attempt to boot with Apple-V held down. This is verbose mode and should allow you to see kernel messages, etc. Perhaps you can see where it's hanging up.
3. Once booted from the OSXS install CD, run Disk Utility, and First Aid on it to see if there are errors. If so, attempt to fix them. If Disk First Aid can't fix it, try Alsoft DiskWarrior.
4. Run the Apple Hardware Diagnostics CD. It usually can find what's broke.
5. Put the drives in another Xserve and see if it boots there.
6. Put each drive of the mirror separately in another Xserve and see if it boots there.
Comment 8•17 years ago
One more thing:
If the mirror is still intact, but just the OS is borked, don't wipe both mirror drives when installing, so we can attempt to copy over all the tinderbox bits, rather than having to do so from scratch.
Comment 9•17 years ago
> 1. Attempt to boot with a keyboard connected (DIRECTLY... not via a KVM
> switch), holding down Shift while booting. If this works, it will boot in Safe
> Mode, and you can run Disk First Aid from there.
>
> 2. If that doesn't work, attempt to boot with Apple-V held down. This is
> verbose mode and should allow you to see kernel messages, etc. Perhaps you can
> see where it's hanging up.
>
> 3. Once booted from the OSXS install CD, run Disk Utility, and First Aid on it
> to see if there are errors. If so, attempt to fix them. If Disk First Aid
> can't fix it, try Alsoft DiskWarrior.
All done - no help, no disk errors.
> 4. Run the Apple Hardware Diagnostics CD. It usually can find what's broke.
We could do this, but it still has to go to Apple.
> 5. Put the drives in another Xserve and see if it boots there.
>
> 6. Put each drive of the mirror separately in another Xserve and see if it
> boots there.
All others are in use - no spares - do you have a spare to try (or someone in the community)?
Also, your next comment is a good one. If we have to go down that route, will do.
Comment 10•17 years ago
Justin, has the machine been sent to Apple yet? We really need a quick turn around on this, since it is blocking the next calendar release (supposed to have released *this* week).
Thanks for your help.
Comment 11•17 years ago
This machine is defined as having tier3 support per (http://wiki.mozilla.org/Build:Farm) - we have a lot else going on with betas/outages, so may be sometime next week before we get this back.
Copying John Oduinn so he's in the loop.
Comment 12•17 years ago
(In reply to comment #11)
> This machine is defined as having tier3 support per
> (http://wiki.mozilla.org/Build:Farm) - we have a lot else going on with
> betas/outages, so may be sometime next week before we get this back.
>
> Copying John Oduinn so he's in the loop.
>
We know everyone is busy with the upcoming beta. Sam S. offered to loan us a tinderbox from the Camino project in the meantime. If that pans out, this bug won't block the calendar release. We'll report back here if we can use the tinderbox he has available.
Comment 13•17 years ago
(In reply to comment #11)
> This machine is defined as having tier3 support per
> (http://wiki.mozilla.org/Build:Farm) - we have a lot else going on with
> betas/outages, so may be sometime next week before we get this back.
>
> Copying John Oduinn so he's in the loop.
aiui, cb-xserve03 is a PPC-based xserve.
We do have a PPC-based xserve (xserve04) that we just mothballed. Our plan was to let xserve04 sit idle for a little bit in case anyone needed it turned back on urgently, then do backup and reimage so we would have a spare PPC-xserve in case one of ours died. If the loaner from Sam doesn't work out, let me know and we can speed up the backup/reimage/switchnetworks/re-key and let you use xserve04 while Apple does repairs on cb-xserve03... Keep in mind, with all the other releases going on this week, it's still going to be at least mid-late next week though.
(We also have an intel-based xserve with 10.5, but not sure that's useful to you).
(...and yuk, lousy timing on the hardware blowout. Between this and the colo fun, it seems like the photocopier gremlins have gone high tech this week.)
Comment 14•17 years ago
Ok, so looks like this may be a corrupt OS image - apple wants a fresh install of the OS - can we do this? We can't really keep one copy of the drive as you won't have a raid'd device to install the OS onto. Can you re-create your dev env from a fresh install?
If so, someone can do the install tonight.
Reporter | ||
Comment 15•17 years ago
to be honest, i simply don't know. i'm not aware of anything magic and it has been done once, so it should be possible again.
on the other hand i haven't set up this machine. lilmatt?
Severity: major → normal
Priority: P1 → --
Comment 16•17 years ago
coop, what's the best way forward here ? Can we use our PPC image and re-scrub ?
Comment 17•17 years ago
(In reply to comment #16)
> coop, what's the best way forward here ? Can we use our PPC image and re-scrub
> ?
>
Nick: yes, that's what we've done in the past.
Comment 18•17 years ago
As a reminder, the original src ppc image is still available on one of the external FW drives.
Comment 19•17 years ago
The new IP is 10.2.73.252
Comment 20•17 years ago
Ignore that last comment. I was updating the wrong bug.
Comment 21•17 years ago
The first restore took over 3 hours and failed with 2 files not copied. I recreated the RAID and am trying the restore again. I will leave it running overnight and come back to check on it tomorrow.
Comment 22•17 years ago
cb-xserve03 is up and running again. What IP address should I assign it?
Comment 23•17 years ago
I have it attached to one of the consoles in 101.05.
Comment 24•17 years ago
I think you left - any idea what console channel that was?
Comment 25•17 years ago
IP: 63.245.210.20 / 255.255.255.224
GW: 63.245.210.1
Nameservers:
64.127.100.12
64.235.225.10
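As a sanity check on the settings above, the netmask 255.255.255.224 is a /27, so the address and the gateway should fall in the same 32-address block. A minimal sketch using Python's standard `ipaddress` module; the variable names are chosen for illustration only:

```python
import ipaddress

# Address and netmask as given in the comment above.
iface = ipaddress.ip_interface("63.245.210.20/255.255.255.224")
gateway = ipaddress.ip_address("63.245.210.1")

network = iface.network
print(network)                    # 63.245.210.0/27
print(gateway in network)         # True: the gateway is directly reachable
print(network.broadcast_address)  # 63.245.210.31
```

The two nameservers are outside this block, so lookups depend on the gateway being reachable first.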
Comment 26•17 years ago
cb-xserve03 is up and running again.
Reporter | ||
Comment 27•17 years ago
i can connect to that machine now but i cannot log in as calbld. how do i initially access this machine?
Reporter | ||
Comment 28•17 years ago
reading the comments in this bug, i'm not sure if this machine is now "ready to build up a tinderbox again" or "ready to be given to apple".
anyone got a hint for me?
Comment 29•17 years ago
Email me offline username/password - this is a clone of a build image with their logins. I'll add that account for you.
(or send me your keys and I'll dump you into root's key file)
Reporter | ||
Comment 30•17 years ago
(In reply to comment #29)
> Email me offline username/password - this is a clone of a build image with
> their logins. I'll add that account for you.
>
> (or send me your keys and I'll dump you into root's key file)
>
done
Comment 31•17 years ago
resolving
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Comment 32•17 years ago
I don't see cb-xserve03 back on the tinderbox pages, neither trunk nor mozilla1.8 branch. Is there another bug to make it work again?
Reporter | ||
Comment 33•17 years ago
(In reply to comment #32)
> I don't see cb-xserve03 back on the tinderbox pages, neither trunk nor
> mozilla1.8 branch. Is there another bug to make it work again?
>
i can log in as root now, but the user calbld does not exist and vnc does not work for me (black on black).
also there are lots of complaints in system.log and asl.log that reverse dns and hostname do not match, which may or may not have something to do with my vnc problem. i found this in windows.log:
Apr 03 04:19:46 [113] kCGErrorCannotComplete: CGXPostNotification2 : Time out waiting for reply from "" for notification type 102 (CID 0x9fff, PID 18851)
i think that's worth reopening...
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 34•17 years ago
No longer blocks Bug 420840 due to the tinderbox borrowed from the Camino project.
No longer blocks: 420840
Comment 35•17 years ago
VNC works for me on OSX to that box - verified that the other day with you.
DNS matches to me:
cb-xserve03:/var/log root# host 63.245.210.20
20.210.245.63.in-addr.arpa domain name pointer cb-xserve03.mozilla.com.
cb-xserve03:/var/log root# host cb-xserve03
cb-xserve03.mozilla.com has address 63.245.210.20
cb-xserve03:/var/log root# host cb-xserve03.mozilla.com
cb-xserve03.mozilla.com has address 63.245.210.20
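The three `host` lookups above amount to a forward-confirmed reverse DNS check: the PTR record for the address names a host, and that name resolves back to the same address. A minimal sketch of the same check in Python, assuming the system resolver is used; the function names are hypothetical, and the lookups are passed in as parameters so the logic can be exercised without network access:

```python
import ipaddress
import socket


def reverse_name(ip: str) -> str:
    """Return the in-addr.arpa name that a PTR query for `ip` would use."""
    return ipaddress.ip_address(ip).reverse_pointer


def ptr_lookup(ip: str) -> str:
    """Resolve an address to its primary host name via the system resolver."""
    return socket.gethostbyaddr(ip)[0]


def a_lookup(name: str) -> str:
    """Resolve a host name to an IPv4 address via the system resolver."""
    return socket.gethostbyname(name)


def forward_confirmed(ip: str, ptr=ptr_lookup, a=a_lookup) -> bool:
    """True if the PTR name for `ip` resolves back to `ip`."""
    return a(ptr(ip)) == ip


print(reverse_name("63.245.210.20"))  # 20.210.245.63.in-addr.arpa
```

A check like this only covers what DNS says. The log complaints mention the hostname as well, so comparing the PTR name against the name the machine itself reports (for example via `socket.gethostname()`) may be the more relevant test; that is an assumption, not something the thread confirms.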
IIRC, you're not on OSX - can you try a different VNC client?
Blocks: 420840
Reporter | ||
Comment 36•17 years ago
meanwhile i tried four different clients on three different machines (linux and windows). the result is always the same: the window is black on black when limiting the protocol version to 3.x, and the connection is lost after some garbage with 4.x.
which account did you use for tunneling? could you try as root, which is currently the only option for me?
regarding dns, the logs are full of these messages. no idea what's wrong.
Reporter | ||
Comment 37•17 years ago
ok, found a mac and tried a mac vnc client (tunneled vnc over vnc...) - only to find myself on a display belonging to cltbld that requires its password for whatever i need.
Comment 38•17 years ago
The standard setup for a community box is
* set root password to the one known by Build and community admins
* set up a calbld user with the defined password for that account
* set VNC password to the calbld password
* delete the cltbld user
I'll try to catch mrz to do step one, and will then do the rest.
Assignee: phong.tran → nrthomas
Status: REOPENED → NEW
Comment 39•17 years ago
root password's been changed. Nothing left for IT to do.
Assignee: nrthomas → nobody
Component: Server Operations → Release Engineering
Flags: colo-trip+
QA Contact: justin → release
Comment 40•17 years ago
All done, and also installed calbld keys and fixed up the CVS/Root files in /builds/tinderbox/mozilla.
Status: NEW → RESOLVED
Closed: 17 years ago → 17 years ago
Resolution: --- → FIXED
Updated•17 years ago
Status: RESOLVED → VERIFIED
Assignee | ||
Updated•11 years ago
Product: mozilla.org → Release Engineering