Closed Bug 718339 Opened 14 years ago Closed 14 years ago

please give cb-parallels01 a kick

Categories

(mozilla.org Graveyard :: Server Operations, task)

task
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Callek, Assigned: mlarrain)

References

Details

+++ This bug was initially created as a clone of Bug #717205 +++

cb-parallels01 seems to be stuck.

ssh: connect to host cb-seamonkey-linuxmaster-01 port 22: No route to host

I tested all other hosts on parallels and all had No Route To Host... we need to reboot the whole host machine (and possibly run checkdisk/fsck on the slaves if they don't come back up).
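The "test every guest, then blame the host" check described above can be sketched roughly as follows. This is only an illustration: the helper names (`ssh_port_reachable`, `summarize`) are hypothetical, and the host list in the usage note is an assumption built from hostnames mentioned in this bug, not the real inventory of cb-parallels01.

```python
import socket


def ssh_port_reachable(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "No route to host", connection refused, DNS failure, timeout.
        return False


def summarize(results):
    """Classify a {hostname: reachable} mapping into a rough diagnosis."""
    if not results:
        return "no guests checked"
    if not any(results.values()):
        # Every guest is down at once: the VM host itself is the likely culprit.
        return "all guests unreachable: suspect the host machine"
    if all(results.values()):
        return "all guests reachable"
    return "some guests unreachable: check those VMs individually"
```

Something like `summarize({h: ssh_port_reachable(h) for h in ["cb-seamonkey-linuxmaster-01", "cb-seamonkey-linux-01", "cb-seamonkey-linux-02"]})` would then give a one-line answer for the guests listed.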
Apart from this obviously needing to be done, this happening twice in such short order makes me wonder whether anything we have been doing recently to that host makes this failure more likely. If we can avoid it, that would be really great going forward.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #1)
> Apart from this obviously needing to be done, this happening twice in such a
> short order makes me wonder if there is anything we have been doing recently
> to that host that makes this failure more likely. If we can avoid it, that
> would be really great going into the future.

Yeah, I have a few theories here that I hope to look into this week. A few potential causes, and why I suspect them:

* cb-seamonkey-linux-{01,02} are expanding their HD space at the same time.
** The VMs on this host are currently all configured to use HD space ONLY when actually needed, and the host will expand the HD space reserved for the VMs as necessary, up to the configured maximum.
** Parallels has been known to have weird issues under resource contention, and I suspect this may be one of them.
** SOLUTION: Artificially increase the used HD space on each slave until it fills the available space, and then clean that space back out, so that the expansion happens at a point where this failure won't occur. --or-- have IT change the HD type on the VM itself (I'm not sure whether it is possible to make it non-expandable without losing all data).

* cb-seamonkey-linuxmaster-01 is becoming overloaded, causing weird behavior on the host system overall.
** We have been getting weird twisted log messages about a particular http page being loaded that none of us three SeaMonkey releng guys are loading. This page in particular (and many other pages on the site) causes high load, and if many loads of the http interface are done at once, it can block the host from doing other work, since it will begin paging, and the master is only a 1 CPU/core setup, in a VM of course.
** SOLUTION: Figure out what is being loaded, when, and from where, and investigate. It may be necessary to add a robots.txt and an htaccess password prompt for our master http exposure.

To do any of the above, we need the host itself brought back up, though.
And of course it could be something else. My plan to start getting Nagios checks added this week should take the surprise out of when this stuff happens (we'd get notified early when PINGs time out, for example).
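For illustration, the decision a ping-style check makes could be sketched as below. The exit codes follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN); the function name `classify_ping` and the threshold defaults are hypothetical and not taken from any actual Nagios configuration for these hosts.

```python
# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify_ping(rtt_ms, warn_ms=100.0, crit_ms=500.0):
    """Map a round-trip time in ms (None means timeout) to (exit_code, message).

    The thresholds are illustrative assumptions only.
    """
    if warn_ms > crit_ms:
        return UNKNOWN, "PING UNKNOWN - warning threshold exceeds critical threshold"
    if rtt_ms is None:
        return CRITICAL, "PING CRITICAL - no reply (timeout)"
    if rtt_ms >= crit_ms:
        return CRITICAL, f"PING CRITICAL - rtt {rtt_ms:.1f} ms"
    if rtt_ms >= warn_ms:
        return WARNING, f"PING WARNING - rtt {rtt_ms:.1f} ms"
    return OK, f"PING OK - rtt {rtt_ms:.1f} ms"
```

A wrapper script would print the message and exit with the returned code, which is how the monitoring side decides whether to notify anyone.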
(In reply to Justin Wood (:Callek) from comment #2)
> * cb-seamonkey-linux-{01,02} are expanding their HD space at the same time.

I remember such things have caused Parallels to die previously. We should not assign more (maximum) HD space to slaves than the actual available space on the host machine, and assign fixed space if possible.

> * cb-seamonkey-linuxmaster-01 is becoming overloaded causing weird behavior
> on the Host system overall.

This shouldn't bring down the whole host, but at worst only this one VM, AFAIK.
Matt brought this host back up (he said the RAID controller/battery/whatever was having issues). He left cb-seamonkey-linux-02 down for now; we'll get that triaged in another bug, and I'll get him to file a bug about the RAID issue he saw. THANKS Matt!
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard