Bug 795017 (Closed). Opened 12 years ago, closed 12 years ago.

nss-vm-darwin9-1.community.scl3.mozilla.com and nss-vm-darwin10-1 are offline

Categories

(mozilla.org Graveyard :: Server Operations, task)

Platform: All, macOS
Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: KaiE, Assigned: mburns)

Details

Both nss-vm-darwin9-1.community.scl3.mozilla.com and nss-vm-darwin10-1.community.scl3.mozilla.com are offline.

When I try to connect to them from jump1.community.scl3.mozilla.com
I get "no route to host".

Could you please have a look? Routing issue or VMs down? Thank you.
Looks like parallels2 *and* parallels3 are down, but not parallels1, which suggests that the Sonnet shelf got unplugged or something similar.


dmitchell@jump1 ~ $ ping parallels3
PING parallels3.community.scl3.mozilla.com (63.245.223.18) 56(84) bytes of data.
From jump1.community.scl3.mozilla.com (63.245.223.8) icmp_seq=1 Destination Host Unreachable
From jump1.community.scl3.mozilla.com (63.245.223.8) icmp_seq=2 Destination Host Unreachable
From jump1.community.scl3.mozilla.com (63.245.223.8) icmp_seq=3 Destination Host Unreachable
^C
--- parallels3.community.scl3.mozilla.com ping statistics ---
5 packets transmitted, 0 received, +3 errors, 100% packet loss, time 4091ms
pipe 3
dmitchell@jump1 ~ $ ping parallels2
PING parallels2.community.scl3.mozilla.com (63.245.223.17) 56(84) bytes of data.
From jump1.community.scl3.mozilla.com (63.245.223.8) icmp_seq=1 Destination Host Unreachable
From jump1.community.scl3.mozilla.com (63.245.223.8) icmp_seq=2 Destination Host Unreachable
From jump1.community.scl3.mozilla.com (63.245.223.8) icmp_seq=3 Destination Host Unreachable
^C
--- parallels2.community.scl3.mozilla.com ping statistics ---
4 packets transmitted, 0 received, +3 errors, 100% packet loss, time 3516ms
pipe 3
dmitchell@jump1 ~ $ ping parallels1
PING parallels1.community.scl3.mozilla.com (63.245.223.22) 56(84) bytes of data.
64 bytes from parallels1.community.scl3.mozilla.com (63.245.223.22): icmp_seq=1 ttl=64 time=0.282 ms
64 bytes from parallels1.community.scl3.mozilla.com (63.245.223.22): icmp_seq=2 ttl=64 time=0.312 ms
^C
--- parallels1.community.scl3.mozilla.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1566ms
rtt min/avg/max/mdev = 0.282/0.297/0.312/0.015 ms
Component: Server Operations → Server Operations: DCOps
QA Contact: jdow → dmoore
Both parallels2 and parallels3 had frozen, having run out of swap.  Googling around for the error message didn't really find any good explanations.

We restarted both, and upgraded them to 10.7.5.  Let's see if that's more stable.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Thank you, but darwin10 is still offline ("no route to host").
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Hm, it had been stuck in the "Starting" state.  I rebooted parallels3 again, and now the VM is up.
Status: REOPENED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Thank you.

It's really surprising how unreliable these VMs are.
Status: RESOLVED → VERIFIED
That's a kind word for it!
Again, both are offline.
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
dcops/oncall, maybe it's time for a support call to parallels?

Or should we just abandon Parallels altogether? It's not clear how we'd find hardware for a Snow Leopard system, though. Maybe we should just stop supporting OS X, since Apple seems determined to make it impossible for us to ship software for it :(
colo-trip: --- → scl3
Oncall, we're kinda out of ideas here.  What do you think the best fix is?

I *think* this hardware can run Darwin10 (Lion).  I think it can't run Snow Leopard, but I'm not certain.  I suspect Darwin10 would run better outside of parallels.
Component: Server Operations: DCOps → Server Operations
QA Contact: dmoore → jdow
The problem is that we are about to ship NSS 3.14, a new major release, by Thursday. We have received reports of a potential Mac OS X-specific bug (specific either to 64-bit or to 10.7), and we currently have zero test coverage for those environments.
Blocks: 799572
DCOps, can you reboot these?
Component: Server Operations → Server Operations: DCOps
QA Contact: jdow → dmoore
Hosts have been rebooted and are pingable.

[vle@admin1b.private.scl3 ~]$ fping parallels3.community.scl3.mozilla.com
parallels3.community.scl3.mozilla.com is alive
[vle@admin1b.private.scl3 ~]$ fping parallels2.community.scl3.mozilla.com
parallels2.community.scl3.mozilla.com is alive
Status: REOPENED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
[kaie@jump1.community.scl3 ~]$ ssh tinderbox@nss-vm-darwin10-1.community.scl3.mozilla.com
ssh: connect to host nss-vm-darwin10-1.community.scl3.mozilla.com port 22: No route to host
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I'll get that.  Thanks, Van!
They're both up, and I logged in on the console on both.  Sorry we got into long-term solutions and forgot the short-term solution.

Back to server-ops to figure out that long-term solution, or at least the next step.
Status: REOPENED → NEW
Component: Server Operations: DCOps → Server Operations
QA Contact: dmoore → jdow
No longer blocks: 799572
Assignee: server-ops → mburns
Whatever you did, thanks a lot for keeping both machines online and stable since the last comment (except for darwin9 being shut off once).

Closing this bug for now.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard