Closed
Bug 1069119
Opened 11 years ago
Closed 11 years ago
sea-vm-win32-{1-4} are not responding to rdp connections
Categories
(mozilla.org Graveyard :: Server Operations, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: ewong, Assigned: cknowles)
References
Details
+++ This bug was initially created as a clone of Bug #1069117 +++
I can ping and ssh into sea-vm-win32-{1-4} from jump1.
However, RDP'ing to them via the jump1 tunnel doesn't connect.
Comment 1•11 years ago (Assignee)
All four of them were complaining that they'd exhausted memory, even swap - rebooted and RDP started responding on all of them.
Assignee: server-ops-virtualization → cknowles
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Comment 2•11 years ago (Reporter)
(In reply to Chris Knowles [:cknowles] from comment #1)
> All four of them were complaining that they'd exhausted memory, even swap -
> rebooted and RDP started responding on all of them.
Hi Chris,
I'm sorry to report but they're back down. :(
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 3•11 years ago
Moving to the general queue (just because it's a VM doesn't mean it's a virtualization bug).
Copying :callek (bless 'im, he seems to know if anyone else needs copying on seamonkey things).
Leaving :cknowles as assigned... no real reason, probably not the right choice.
I saw swap exhaustion on one node, and based on Chris' earlier diagnosis of memory exhaustion I gave all 4 a reboot. That this is starting to be patternistic suggests more triage is needed as to what's eating all the RAM.
Status: REOPENED → NEW
Component: Server Operations: Virtualization → Server Operations
QA Contact: cshields → shyam
Comment 4•11 years ago (Reporter)
(In reply to Greg Cox [:gcox] (plz don't needinfo me) from comment #3)
> Moving to the general queue (just because it's a VM doesn't mean it's a
> virtualization bug).
Thanks Greg. I'll keep that in mind.
> Copying :callek (bless 'im, he seems to know if anyone else needs copying on
> seamonkey things).
> Leaving :cknowles as assigned... no real reason, probably not the right
> choice.
>
> I saw swap exhaustion on one node, and based on Chris' earlier diagnosis of
> memory exhaustion I gave all 4 a reboot. That this is starting to be
> patternistic suggests more triage is needed as to what's eating all the RAM.
I'm wondering if a runaway process is horking the system.
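One way to check for a runaway process before the hosts go unresponsive is to rank processes by memory use. The following is a minimal sketch, not something used in this bug: it assumes the default English-locale column layout of `tasklist /FO CSV /NH` (image name, PID, session name, session number, memory usage), and the process names in `SAMPLE` are hypothetical.

```python
import csv
import io
import subprocess


def parse_tasklist_csv(text):
    """Parse `tasklist /FO CSV /NH` output into (image_name, pid, mem_kb) tuples.

    Assumes five columns per row with memory usage last, e.g. "52,316 K".
    """
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        name, pid, mem = fields[0], fields[1], fields[4]
        # Keep only the digits so thousands separators and the unit are ignored.
        digits = "".join(ch for ch in mem if ch.isdigit())
        if not digits or not pid.isdigit():
            continue
        rows.append((name, int(pid), int(digits)))
    return rows


def top_memory_processes(text, limit=5):
    """Return the `limit` largest processes by memory usage, biggest first."""
    ranked = sorted(parse_tasklist_csv(text), key=lambda row: row[2], reverse=True)
    return ranked[:limit]


def capture_tasklist():
    """Run tasklist on the local Windows host and return its CSV output."""
    result = subprocess.run(
        ["tasklist", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Hypothetical sample data, used so the sketch runs on any platform.
SAMPLE = '''"System Idle Process","0","Services","0","24 K"
"hypothetical-build.exe","4120","Console","1","1,843,200 K"
"python.exe","2210","Console","1","52,316 K"
'''

if __name__ == "__main__":
    for name, pid, mem_kb in top_memory_processes(SAMPLE, limit=2):
        print(f"{name} (pid {pid}): {mem_kb} K")
```

On one of the affected hosts, passing `capture_tasklist()` instead of `SAMPLE` and logging the result periodically would show which process grows before memory and swap run out.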
Comment 5•11 years ago
Hey Greg, can you/someone help us get these hosts back online?
Sadly it's a hard-to-diagnose issue on our end (we just switched to "mozmake" instead of MSYS-based GNU Make).
The system is running fine on MoCo systems (w2k8, while we have w2k3) and seems to run fine on non-debug SeaMonkey builds; it's just the debug builds that seem to cause this issue. We are also running tests on these systems, unlike MoCo jobs. Diagnosing and actually fixing it will take some elbow grease on our part. I'm not aware of any smoking gun at this point.
Unfortunately that time and elbow grease may mean a few repeated bugs for these being down (or at least reopens of this one). I ask for your patience while we do so, and I'm particularly open to screams/concerns from your end if this even comes close to destabilizing the ESX cluster, which, aiui, it's not currently doing.
I'm also *willing* to get the access and a brief training on how to do this very work so we don't bother you guys, but I'm unsure if there are political/procedural hurdles to cross that make it an untenable option, since we hope this is a short-term solution.
TL;DR: please reboot these from the admin console, and don't worry about diagnosing any current state, we'll try and handle that on our end for now.
Flags: needinfo?(gcox)
Comment 6•11 years ago
Couldn't give them a graceful reboot, they'd gone unresponsive. Kicked harder, they seem back.
From what I've seen on the console / VM resources, it looks like it's running out of RAM.
It doesn't seem to be adversely affecting the cluster. It's more that it's in that "community, so it's not really supported" vs "building product, so it kinda is" limbo that is going to have to be sorted out someday, for exactly cases like this.
Flags: needinfo?(gcox)
Comment 7•11 years ago (Reporter)
They've been back up for a few days now.
Comment 8•11 years ago (Assignee)
Any word on these, or do you need anything so that we can close this out?
Updated•11 years ago
Flags: needinfo?(ewong)
Comment 9•11 years ago (Reporter)
(In reply to Chris Knowles [:cknowles] from comment #8)
> Any word on these, or do you need anything so that we can close this out?
Ah yes, sorry. This bug is good for resolution.
Thanks
Flags: needinfo?(ewong)
Comment 10•11 years ago (Assignee)
Alright - let us know if you need anything further.
Status: NEW → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Updated•10 years ago
Product: mozilla.org → mozilla.org Graveyard