Closed Bug 1069119 Opened 11 years ago Closed 11 years ago

sea-vm-win32-{1-4} are not responding to rdp connections

Categories: mozilla.org Graveyard :: Server Operations (task)
Platform: x86, Windows Server 2003
Type: task
Priority: Not set
Severity: normal

Tracking: Not tracked

RESOLVED FIXED

People

(Reporter: ewong, Assigned: cknowles)

References

Details

+++ This bug was initially created as a clone of Bug #1069117 +++

I can ping and ssh into sea-vm-win32-{1-4} from jump1. However, RDPing into them via the jump1 tunnel doesn't connect.
All four of them were complaining that they'd exhausted memory, even swap. Rebooted, and RDP started responding on all of them.
Assignee: server-ops-virtualization → cknowles
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
(In reply to Chris Knowles [:cknowles] from comment #1)
> All four of them were complaining that they'd exhausted memory, even swap -
> rebooted and RDP started responding on all of them.

Hi Chris, I'm sorry to report but they're back down. :(
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Moving to the general queue (just because it's a VM doesn't mean it's a virtualization bug). Copying :callek (bless 'im, he seems to know if anyone else needs copying on seamonkey things). Leaving :cknowles as assigned... no real reason, probably not the right choice.

I saw swap exhaustion on one node, and based on Chris' earlier diagnosis of memory exhaustion I gave all 4 a reboot. That this is starting to look like a pattern suggests more triage is needed as to what's eating all the RAM.
Status: REOPENED → NEW
Component: Server Operations: Virtualization → Server Operations
QA Contact: cshields → shyam
(In reply to Greg Cox [:gcox] (plz don't needinfo me) from comment #3)
> Moving to the general queue (just because it's a VM doesn't mean it's a
> virtualization bug).

Thanks Greg. I'll keep that in mind.

> Copying :callek (bless 'im, he seems to know if anyone else needs copying on
> seamonkey things).
> Leaving :cknowles as assigned... no real reason, probably not the right
> choice.
>
> I saw swap exhaustion on one node, and based on Chris' earlier diagnosis of
> memory exhaustion I gave all 4 a reboot. That this is starting to be
> patternistic suggests more triage is needed as to what's eating all the RAM.

I'm wondering if a run-away process is horking the system.
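The runaway-process hunch could be checked from an ssh session before the next reboot. A minimal sketch (not from the bug; the helper name and the assumption that a Python interpreter is available on or near the w2k3 hosts are mine) that ranks entries from Windows' `tasklist /FO CSV` output by memory usage:

```python
import csv
import io

def top_memory_processes(csv_text, n=5):
    """Parse `tasklist /FO CSV` output and return the n largest
    processes as (image_name, mem_kb) pairs, descending by memory."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    procs = []
    for row in rows[1:]:  # skip the header row
        if len(row) < 5:
            continue
        name, mem = row[0], row[4]
        # "Mem Usage" looks like "12,345 K"; strip separators and units.
        kb = int(mem.replace(",", "").replace("K", "").strip())
        procs.append((name, kb))
    return sorted(procs, key=lambda p: p[1], reverse=True)[:n]

# Usage sketch: on the host, run `tasklist /FO CSV > tasks.csv`, then feed
# the file contents to top_memory_processes() and eyeball the top entries.
```

The "Mem Usage" column format is locale-dependent, so the parsing here is an assumption for an English-locale w2k3 box.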
Hey Greg, can you/someone help us get these hosts back online? Sadly it's a hard-to-diagnose issue on our end (we just switched to "mozmake" instead of MSYS-based GNU Make). The setup is running fine on moco systems (w2k8, while we have w2k3) and seems to run fine on non-debug SeaMonkey builds; it's just the debug builds that seem to cause this issue. We are also running tests on these systems, unlike moco jobs.

Diagnosing and actually fixing it will take some elbow grease on our part; I'm not aware of any smoking gun at this point. Unfortunately that time and elbow grease may mean a few repeated bugs for these being down (or at least reopens of this one). I ask for your patience while we do so, and am particularly open to screams/concerns from your end if this even comes close to destabilizing the ESX cluster, which AIUI it's not currently doing.

I'm also *willing* to get the access and a brief training on how to do this very work so we don't bother you guys, but I'm unsure if there are political/procedural hurdles to cross that make it an untenable option, since we hope this is a short-term solution.

TL;DR: please reboot these from the admin console, and don't worry about diagnosing any current state; we'll try to handle that on our end for now.
Flags: needinfo?(gcox)
Couldn't give them a graceful reboot; they'd gone unresponsive. Kicked harder, and they seem back. From what I've seen on the console / VM resources, it looks like they're running out of RAM. It doesn't seem to be adversely affecting the cluster. It's more that they're in that "community, so it's not really supported" vs. "building product, so it kinda is" limbo that is going to have to be sorted out someday, for exactly cases like this.
Flags: needinfo?(gcox)
They've been back up for a few days now.
Any word on these, or do you need anything so that we can close this out?
Flags: needinfo?(ewong)
(In reply to Chris Knowles [:cknowles] from comment #8)
> Any word on these, or do you need anything so that we can close this out?

Ah yes, sorry. This bug is good for resolution. Thanks.
Flags: needinfo?(ewong)
Alright - let us know if you need anything further.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard