Closed Bug 408761 Opened 17 years ago Closed 17 years ago

cb-xserve02 is AWOL

Categories

(Infrastructure & Operations :: RelOps: General, task, P2)

PowerPC
macOS

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: kairo, Assigned: coop)

Details

I noticed that cb-xserve02 has been building for 4 hours on the SeaMonkey tinderbox tree, while builds usually take ~15 min. I wanted to check what the machine was doing, but got this from the jumphost:

[kairo@cb-jumphost01 ~]$ ssh -l seabld cb-xserve02.mozilla.com
ssh: connect to host cb-xserve02.mozilla.com port 22: No route to host
Assignee: server-ops → sean
Can't get this machine to boot with the current memory configuration (new DIMM), the previous memory configuration (old bad DIMM that used to work), or a minimal memory configuration (removed new DIMM and corresponding DIMM in the opposite bank). Pretty sure it's not memory at this point, but the machine still isn't getting past the Apple screen when booted off the HDD. I guess the next step would be to wipe the drives and reinstall. Thoughts?
Status: NEW → ASSIGNED
From our (SeaMonkey) side there's nothing out of the ordinary on that box, so I see no problem with wiping and reinstalling, as long as we can get the box back...
Actually, it would be easier to re-image if possible. Which box can I take an image of that would work for cb-xserve02?

Would cb-xserve03 work?
You'll have to clear it with the Calendar guys in terms of downtime (for cb-xserve03), and then I'll need to scrub the Calendar-specific keys, etc. from cb-xserve02 prior to handing it back to the SeaMonkey guys.

If this gets it back up faster, go for it.
I wouldn't need any downtime to take an image of cb-xserve03, but if there are things on that box that you're not supposed to have access to then that's a different story. Was there another machine with a like configuration that I could just take an image of?
The scrubbing and account re-creation process would be fairly fast, probably about an hour once the machine is back up. Improper access wouldn't be an issue; the SeaMonkey people couldn't log in until their key was in place anyway, and that's the final step in the procedure.

The only other similarly configured machine is cb-xserve01, which is used by Camino.
(In reply to comment #6)
> The only other similarly configured machine is cb-xserve01, which is used by
> Camino.

Which would need the scrubbing as well, I'm guessing.

(In reply to comment #7)
> Which would need the scrubbing as well, I'm guessing.

Correct. 

Would be nice if we got some progress here, as SeaMonkey has been missing Mac coverage for a few days now and it would be good to get back testing on that platform...
I'm taking an image of cb-xserve03 now.
Tried twice to take an image of cb-xserve03 and it failed both times; trying cb-xserve01 now.
cb-xserve01 also failed to give a complete image; guess this is going to require an erase/install after all.
coop, was this box originally set up from the build image, with the accounts then scrubbed, etc.?
Sean got the machine itself up and running again, but it still needs the tinderbox stuff set up in the new seabld user account.
KaiRo: I'll get it set up for you in a couple of hours.
Assignee: sean → ccooper
Status: ASSIGNED → NEW
Status: NEW → ASSIGNED
Priority: -- → P2
Build is back reporting again.

KaiRo: everything *should* be as it was before, but let me know if it is not. Login info is the same as previously.
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
It reports to the tinderbox waterfall page, produces nightlies and I can log in, so everything seems to be OK. Thanks for your help!
Status: RESOLVED → VERIFIED
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations