Closed
Bug 690051
Opened 13 years ago
Closed 12 years ago
rust-mac1.mv.mozilla.com is unresponsive
Categories
(mozilla.org Graveyard :: Server Operations, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: andersrb, Unassigned)
References
Details
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:9.0a1) Gecko/20110926 Firefox/9.0a1
Build ID: 20110926030901
Steps to reproduce:
This machine hasn't been performing its duties for a few days. I can ping it but can't shell in. Can we get a reboot?
Updated•13 years ago
Assignee: server-ops-labs → gozer
Updated•13 years ago
Assignee: gozer → server-ops-labs
Comment 1•13 years ago
I'll smack it on Monday. This machine is falling over a lot; are you doing something particularly evil to it?
Assignee: server-ops-labs → zandr
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Reporter
Comment 2•13 years ago
I'm not aware of any evil being committed against this machine.
Comment 3•13 years ago
Kicking over to the Relops queue since that lets me set colo-trip: mtv1
Assignee: zandr → server-ops-releng
Component: Server Operations: Labs → Server Operations: RelEng
Updated•13 years ago
colo-trip: --- → mtv1
Updated•13 years ago
Assignee: server-ops-releng → jwatkins
Comment 4•13 years ago
This machine remains offline. Can we get someone to poke at it? Might not take much more than a power-cycle.
Comment 5•13 years ago
Machine has been poked (power-cycled)
Comment 6•13 years ago
Thanks, back online.
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Comment 7•13 years ago
This machine keeps going "unresponsive". I am able to ssh in, and everything appears normal, but it's dead until I ssh in.
I've dug further into its syslogs and it appears to be ... powering off? Idling? I think it's waking up only when I ssh to it. It's a build slave, so it actually has to _not sleep_, even if it's not busy at the moment. I would run 'pmset sleep 0' on it, but I do not have root access, only administrator access.
I compared configuration to the other mac we're using (rust-mac2), which does _not_ randomly fall off the net, and it seems like they have the same power settings, but mac2 is constantly (every 5 seconds) associating and disassociating with a bluetooth mouse someone has left near the machine. So I think mac2 is only staying awake due to its mistaken belief that someone is "eternally wiggling its mouse". This can't be healthy.
Can someone do the following:
- Move the 'moco mouse' bluetooth mouse away from mac2
- log in to both mac1 and mac2 and 'pmset sleep 0' the machines
(or whatever else is "normal make-the-mac-not-sleep configuration" for
build slaves in mozilla clusters)
- Tell me what was done
- Possibly: email me the root password or put my ssh pubkey on root as
well so I can do this sort of thing myself?
Thanks
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
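The sleep setting discussed in comment 7 can be checked without root by reading the output of `pmset -g`. The following is a minimal Python sketch of that check, not something taken from this bug: the helper names (`parse_pmset`, `sleep_disabled`, `current_settings`) and the `SAMPLE` text are hypothetical, and the sample is illustrative rather than captured from rust-mac1 or rust-mac2. It assumes setting rows have the form "name integer", optionally followed by a note.

```python
import shutil
import subprocess


def parse_pmset(output):
    """Parse `pmset -g` style output into {setting: value} for integer-valued rows."""
    settings = {}
    for line in output.splitlines():
        parts = line.split()
        # Assumed row shape: "<name> <integer> [optional note]"; other lines are skipped.
        if len(parts) >= 2 and parts[1].isdigit():
            settings[parts[0]] = int(parts[1])
    return settings


def sleep_disabled(settings):
    """System sleep is off when the 'sleep' timer is 0 minutes."""
    return settings.get("sleep") == 0


def current_settings():
    """Run `pmset -g` if it is installed (macOS only); return None elsewhere."""
    if shutil.which("pmset") is None:
        return None
    result = subprocess.run(
        ["pmset", "-g"], capture_output=True, text=True, check=True
    )
    return parse_pmset(result.stdout)


# Illustrative sample only; not real output from either build slave.
SAMPLE = """\
Currently in use:
 sleep                10
 displaysleep         5
 womp                 1
"""

if __name__ == "__main__":
    # Fall back to the sample when pmset is unavailable (e.g. not on a Mac).
    settings = current_settings() or parse_pmset(SAMPLE)
    print("sleep disabled:", sleep_disabled(settings))
```

Changing the value still requires root, as the comment notes; this sketch only reads the current state.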
Comment 8•13 years ago
Hi, Graydon, you should be able to change the power settings as Administrator (via the GUI at the very least) and change the root password as well. We don't have the password to these machines since they were built and handed off.
Comment 9•13 years ago
I forgot to mention that you should be able to turn off bluetooth as well.
Comment 10•13 years ago
So it turns out that you never changed the Administrator password since these machines were built. I VNCed in and turned off sleep and bluetooth for you.
Status: REOPENED → RESOLVED
Closed: 13 years ago → 13 years ago
Resolution: --- → FIXED
Comment 11•12 years ago
This has gone from "randomly sleeping but available to wake-on-LAN" to "totally unreachable". Can someone take another look at it? Whatever was attempted last time made matters worse.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 12•12 years ago
FYI, there are lots of network problems w/ the entire labs cluster as part of the move. It's nothing rust specific in other words =)
Comment 13•12 years ago
(In reply to David Ascher (:davida) from comment #12)
> FYI, there are lots of network problems w/ the entire labs cluster as part
> of the move. It's nothing rust specific in other words =)
I think it is. Unless by "move" you mean a longer-term move (in MV). I've been meaning to point this out for a week or two; it's been unreachable for some time.
Comment 14•12 years ago
This isn't a labs box, afaik. It's a one-off, basically.
We should get you admin on this mini, and turn it back on. Luckily, all of the flags are still set for that from the last time it was reopened.
Comment 15•12 years ago
I'm pretty sure this mini is toast. It powers up but doesn't get past the grey boot screen. Sending it to Desktop for repairs.
Comment 16•12 years ago
rust-mac2.mv.mozilla.com has also become unreachable. Just now. That's all our macs, so rust development is now at "no test coverage" on macs.
Can we get some loaner macs? Or can someone take a look at what happened to mac2?
Comment 17•12 years ago
rust-mac2 was asleep. I woke it up and it seems to be running now.
Connection to 10.250.1.239 22 port [tcp/ssh] succeeded!
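The "Connection ... succeeded!" line above reports a successful TCP connection to port 22. A rough Python equivalent of that reachability check is sketched below; the function name `port_open` and its defaults are assumptions made for illustration and are not part of any tooling mentioned in this bug.

```python
import socket


def port_open(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable or unresolvable hosts.
        return False


if __name__ == "__main__":
    # Address taken from the comment above; it is only reachable inside that network.
    print("ssh reachable:", port_open("10.250.1.239", 22, timeout=2.0))
```

A True result only shows that something accepted the connection on that port. It does not confirm that login works or that the machine will stay awake.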
Comment 18•12 years ago
Ok. In https://bugzilla.mozilla.org/show_bug.cgi?id=690051#c10 the sleep function on mac2 was supposedly turned off, yet it slept here. Is there a normal routine of setting macs so they don't fall asleep?
(I realize this stuff is fussy, I read the instructions concerning sleep-inhibition a dozen times and couldn't make heads or tails of them. There appear to be multiple overlapping tools and settings. Just wondering if IT has a standard incantation...)
Comment 19•12 years ago
More than likely the power button on rust-mac2 got knocked while extracting rust-mac1. Both minis were mounted in the same sonnet chassis and it was close to impossible to remove one without disturbing the other. Especially since the power and network cables are so taut. I did my best, my apologies.
Comment 20•12 years ago
No worries. I mostly just want to stop bugging y'all. Appreciate the ongoing babysitting. I have a mini here as well that I'll probably put back into service once the Vancouver move completes; it should add a little redundancy to our party.
Comment 21•12 years ago
rust-mac1 has returned from repair, where the hard drive was replaced. Can someone provide some insight as to what needs to be installed (e.g., which OS)?
Comment 22•12 years ago
Per IRC with :graydon:
I've installed OS X Lion and set the username/password.
Status: REOPENED → RESOLVED
Closed: 13 years ago → 12 years ago
Resolution: --- → FIXED
Comment 23•12 years ago
Apologies for the run-around. While attempting to put the machine back in service, I appear to have disabled the machine again. This was due to my mistaken belief that the GCC packages made by third-party packagers were still usable on Lion. This is no longer true. I need to use Xcode, and (sadly) install it via the App Store, remotely operated by VNC.
Unfortunately in the process of discovering and attempting to repair (un-install) the broken GCC package, I wound up deleting substantial portions of the OS. I suspect it won't even boot anymore, due to my own damage. It needs to be re-imaged.
I'm sorry. I would prefer to be doing all this myself, but at a distance I have to keep asking for help.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 24•12 years ago
With the reorg and relops no longer being part of specops, I'm going to pass this bug off to the SRE group to see who should be handling things for these machines going forward.
Assignee: jwatkins → server-ops
Component: Server Operations: RelEng → Server Operations
QA Contact: zandr → phong
Comment 25•12 years ago
This got taken to SCL1 yesterday to have a new image put on it again. It is now up and running back in 3.mdf.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Updated•10 years ago
Product: mozilla.org → mozilla.org Graveyard