Closed Bug 757421 Opened 13 years ago Closed 13 years ago

[HELPDESK]- Machine vc1.qa.mtv1.mozilla.com seems to have not rebooted after last week's power outage

Categories

(mozilla.org Graveyard :: Server Operations, task)

x86_64
Linux
task
Not set
blocker

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: cmtalbert, Assigned: phong)

Details

Last week's Mountain View power outage seems to have taken down our ESXi server vc1.qa.mtv1.mozilla.com. I don't know where this box lives, but I know it is somewhere in the MV office. Can someone go switch it on? (Or, if it is on, please reboot it, because it isn't responding to pings.) Thanks.
Assignee: desktop-support → hlangi
Summary: Machine vc1.qa.mtv1.mozilla.com seems to have not rebooted after last week's power outage → [HELPDESK]- Machine vc1.qa.mtv1.mozilla.com seems to have not rebooted after last week's power outage
Not sure why it takes more than 3 days to turn on a machine. Meanwhile, this has been blocking our work for the last two days. Raising bug severity to blocker.
Severity: major → blocker
Henry has been out sick. I can reboot this machine, but there are a lot of machines in Mountain View. I will need more details about its location.
Assignee: henry → ghuerta
Status: NEW → ASSIGNED
Not sure that Desktop even manages this box. Who is the owner?
Status: ASSIGNED → NEW
We found the machine in the QA lab and restarted it. Note that it was not labeled at all, but it now has a Post-it note on it. That machine should move out of the QA lab to a location where there is a power backup.
Whimboo, can you please update the bug so we know we found the right machine?
No, I still can't connect to that box. Not sure which box you turned on in the QA lab. AFAIK the ESX box should not be in the QA lab. Al, can you point us to the location of the vsphere center box? Thanks.
I don't know anything about this anymore. I haven't thought about this in six months. I documented everything at https://wiki.mozilla.org/QA/Infrastructure/Automation_Servers. All QA boxes were in 2.IDF and 3.IDF in MV for the mac minis and the racked servers. The only exception was the old qa-mozmill box in the QA Lab. Dan was involved in vc1. Maybe he can say where it is physically. I expect it is in 3.IDF.
vc1.qa.mtv1.mozilla.com is a VM, hosted by release11.qa.mtv1.mozilla.com
Huh? So how can I log into that machine then? From what I can see, release11.qa.mtv1.mozilla.com is managed by the vsphere client, which lives at vc1.qa.mtv1.mozilla.com. Looks like a circular reference to me. Any hint?
You can point your vc client directly at release11.qa.mtv1.mozilla.com and log in as root. From there, you can start vc1.
I'm on OS X and I run it via the web UI. So I don't have a way to start the box. Clint, can you do it please?
I get "cannot connect to server" from vsphere when I try to access release11.qa.mtv1.mozilla.com, either using its name or with the IP that it resolves to.
Repeated ping. This is an unacceptable situation. It's a blocker for us, and given that none of us can connect to the machine, please, please boot this box for us. Besides that, please give us clear steps for starting this machine so we are not blocked by issues like this in the future. It would block Firefox releases once our Mozmill tests are running on that box. Thanks.
The ESX servers in Mountain View are officially not supported by IT. The most we can do is best effort break/fix activities. This was explained clearly, by me, to whoever requested this server a few months ago, when I built it. This is why it's taking so long to get going again. I'm sorry if this wasn't made clear to you when you started relying on vc1.qa.mtv1. There are a few reasons why the ESX servers in Mountain View are unsupported, including lack of appropriate network & storage resources and reliable infrastructure. Additionally, please do not run anything that might block a Firefox release or anything else really important on any server in Mountain View. That sort of stuff needs to run on servers in real datacenters, like scl3 or phx1. :phong, can you verify release11.qa.mtv1 is powered on?
Dan, thanks for the clarification. But given that issue, we should find a solution until the hardware is in a datacenter. We will discuss possible solutions on our side in our Automation Development meeting later today. So release11.qa.mtv1.mozilla.com is up and reachable, but it looks like the vc1 VM hasn't been started. I think if we could make sure that the server reboots after a power outage and vc1 gets started automatically, we should be fine. Is that possible as a short-term solution?

(In reply to Clint Talbert ( :ctalbert ) from comment #12)
> I get "cannot connect to server" from vsphere when I try to access
> release11.qa.mtv1.mozilla.com, either using its name or with the IP that it
> resolves to.

Oh wait. I think you should use a VNC client to do that. vsphere is not running on the release11 host itself. Sorry for not noticing that earlier.
So I tried to connect with a VNC client, but I'm not sure which port the server is running on. I can't connect on the default port or on the following ones (5900, 5901, or 5902).
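When it is unclear which port a service is listening on, as in the comment above, a quick TCP probe can narrow things down before trying a full client. The following is a minimal Python sketch and not something taken from this bug: `probe_ports` is a hypothetical helper name, and the host and port list are whatever the caller supplies.

```python
import socket


def probe_ports(host, ports, timeout=2.0):
    """Return {port: bool}, True where a TCP connection to host:port succeeds."""
    results = {}
    for port in ports:
        try:
            # create_connection resolves the host name and attempts a TCP connect.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers refused connections, timeouts, and DNS resolution failures.
            results[port] = False
    return results
```

For the ports mentioned above, a call would look like `probe_ports("release11.qa.mtv1.mozilla.com", [5900, 5901, 5902])`. A `False` entry only shows that nothing accepted a TCP connection on that port within the timeout; it does not say which client is the right one for the host.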
I'll be back in the office tomorrow. I can take a look at this first thing in the morning.
Assignee: ghuerta → phong
Component: Server Operations: Desktop Issues → Server Operations
QA Contact: tfairfield → phong
You'll need to run actual, real vSphere Client for Windows and connect to release11 directly, in order to start up the vc1 VM. I just tried doing this myself however, and release11 seems quite dead. It's not responding to pings or anything. Looks like :phong will be able to look at it soon.
Looks like the install CD was still in the drive when the server recovered from the power outage. It booted to the CD. I removed the CD and the server is up again. I don't have the root password to power the VMs up again. If someone can give me that via IRC, then I can power everything up for you.
Never mind, I remembered the original password I set up for this.
Everything should be online again.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Looks good, Phong! Does this mean that when another power outage hits us, the box will now start up automatically and launch the vsphere VM?
Now I tried to connect to the vSphere Web Client at: https://vc1.qa.mtv1.mozilla.com:9443/vsphere-client/# The error I get now is: vSphere Web Client could not connect to the vCenter Server "localhost". So it seems like something isn't set up correctly.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Try using the actual Windows vSphere Client.
This was resolved.
Status: REOPENED → RESOLVED
Closed: 13 years ago → 13 years ago
Resolution: --- → FIXED
Phong, can you please answer my question from comment 22?
Not sure what the issue is this time, whether the old problem has returned or it's a new one. But once again the machine is not available. I can't connect to https://vc1.qa.mtv1.mozilla.com:9443/vsphere-client/ in the MV network. release11.qa.mtv1.mozilla.com, which hosts the vsphere VM, works just fine.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
It looks like release11 got rebooted again or something similar; none of its VMs were running. I powered vc1 back on. There is no way to make it automatically do this. Again, this is an unsupported system, and all these difficulties are part of continuing to use an unsupported system.
(In reply to Henrik Skupin (:whimboo) from comment #26)
> Phong, can you please answer my question from comment 22?

This was answered by Dan in comment 28. Looks like it's all up again so I am going to close this bug.
Status: REOPENED → RESOLVED
Closed: 13 years ago → 13 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard