Closed
Bug 915243
Opened 11 years ago
Closed 11 years ago
Please investigate bld-lion-r5-040
Categories
(Infrastructure & Operations :: DCOps, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: coop, Assigned: arich)
References
Details
This slave is pingable, and ssh and VNC seem to be running, but I cannot connect via either protocol. I'm guessing this is due to an aborted re-image. Can someone from DCOps please investigate, and kick off a new re-image if required?
Updated•11 years ago
Assignee: server-ops-dcops → eramirez
colo-trip: --- → scl3
Comment 1•11 years ago
Vinh was having trouble netbooting this machine, and I'm wondering if something broke the firewall rules for contacting install.build.releng.scl3.mozilla.com. The symptom was that he would netboot and get the flashing globe, then the sad folder (both of the disks were just replaced). He tried swapping the cable, network ports, etc. When he put it on the test VLAN, he got DS to answer (but the install didn't continue to work because it was on the wrong VLAN).

The address is offered and acked in the DHCP logs on admin1a/b. A tcpdump on install.build.releng.scl3.mozilla.com only shows the BOOTP transaction, nothing after that:

15:22:36.199304 IP 10.26.52.1.bootps > install.build.releng.scl3.mozilla.com.bootps: BOOTP/DHCP, Request from 3c:07:54:72:50:27 (oui Unknown), length 292
15:22:36.272694 IP 10.26.52.1.bootps > install.build.releng.scl3.mozilla.com.bootps: BOOTP/DHCP, Request from 3c:07:54:72:50:27 (oui Unknown), length 304
15:22:36.285950 IP 10.26.52.1.bootps > install.build.releng.scl3.mozilla.com.bootps: BOOTP/DHCP, Request from 3c:07:54:72:50:27 (oui Unknown), length 293
15:22:37.177240 IP 10.26.52.1.bootps > install.build.releng.scl3.mozilla.com.bootps: BOOTP/DHCP, Request from 3c:07:54:72:50:27 (oui Unknown), length 312
15:22:36.286195 IP install.build.releng.scl3.mozilla.com.bootps > bld-lion-r5-040.try.releng.scl3.mozilla.com.bootpc: BOOTP/DHCP, Reply, length 316

I also tried rebooting the install machine in case something non-obvious was wedged there (same result). He started a recovery install over the internet, and that seems to be functioning just fine (and the machine gets the proper IP).

We don't currently have another machine to test in this VLAN, but I suspect that would confirm or rule out the problem. Some more in-depth troubleshooting with netops is probably required. Dustin, can you take a look at this Friday?
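For anyone repeating this capture, the "only BOOTP, nothing after" observation can be checked mechanically. The following is a minimal sketch, not part of the original investigation: it counts BOOTP/DHCP requests and replies in tcpdump text output and flags any other lines. The function name `summarize_bootp` and the `SAMPLE` variable are hypothetical, and the regular expression assumes the exact line layout quoted in this comment.

```python
import re
from collections import Counter

# Assumed line layout, taken from the capture quoted in this comment:
# "<time> IP <src> > <dst>: BOOTP/DHCP, Request|Reply ..."
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d+) IP (?P<src>\S+) > (?P<dst>\S+): "
    r"BOOTP/DHCP, (?P<kind>Request|Reply)"
)


def summarize_bootp(lines):
    """Count BOOTP/DHCP requests, replies, and any non-BOOTP lines."""
    counts = Counter()
    other = 0
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # ignore blank lines
        match = LINE_RE.match(stripped)
        if match:
            counts[match.group("kind")] += 1
        else:
            other += 1  # traffic that is not a BOOTP/DHCP request or reply
    return {"requests": counts["Request"], "replies": counts["Reply"], "other": other}


# The five lines from the capture above, verbatim.
SAMPLE = """\
15:22:36.199304 IP 10.26.52.1.bootps > install.build.releng.scl3.mozilla.com.bootps: BOOTP/DHCP, Request from 3c:07:54:72:50:27 (oui Unknown), length 292
15:22:36.272694 IP 10.26.52.1.bootps > install.build.releng.scl3.mozilla.com.bootps: BOOTP/DHCP, Request from 3c:07:54:72:50:27 (oui Unknown), length 304
15:22:36.285950 IP 10.26.52.1.bootps > install.build.releng.scl3.mozilla.com.bootps: BOOTP/DHCP, Request from 3c:07:54:72:50:27 (oui Unknown), length 293
15:22:37.177240 IP 10.26.52.1.bootps > install.build.releng.scl3.mozilla.com.bootps: BOOTP/DHCP, Request from 3c:07:54:72:50:27 (oui Unknown), length 312
15:22:36.286195 IP install.build.releng.scl3.mozilla.com.bootps > bld-lion-r5-040.try.releng.scl3.mozilla.com.bootpc: BOOTP/DHCP, Reply, length 316
""".splitlines()

print(summarize_bootp(SAMPLE))
```

A nonzero `other` count on a fresh capture would suggest the client got past the DHCP exchange, which would point away from the firewall theory.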
Assignee: eramirez → dustin
Comment 2•11 years ago
I have the Mac mini hooked up to a spider KVM for remote console: 10.22.0.155

If you need to do a PDU reboot (the mini is connected to a temporary outlet):
pdu1b.r101-22.ops.releng.scl3.mozilla.com
Outlet: BB9

Ping me if you need the generic login credentials for the Mac mini.
Comment 3•11 years ago
When I had a look at this a while back, it was failing to puppetize because set_hostname.sh had set its hostname to NXDOMAIN. Also, I power-cycled the mini, and the spider went away.
Comment 4•11 years ago
Well, I tried blessing and rebooting, and I can't access the machine anymore via spider, ssh, or VNC. However, I did see a good deal of traffic - the same kind of traffic that I saw when r4-mini-001 successfully imaged. I tried power-cycling and saw the same traffic, but still no image in the spider ("No signal"), no ssh, and no VNC. I don't see any activity in DeployStudio. I'll wait an hour or so and see what happens.
Comment 5•11 years ago
Now nothing. No video, no port 22, no port 5900. From here in NY, this machine is a black hole. If the spider is telling the truth about the video output, then this machine needs more hardware TLC. Otherwise, it needs some help reimaging with something other than a spider. Probably the best way to learn more here is to try re-imaging another system in this VLAN, as Amy suggested. That will at least rule out network issues.
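The "no port 22, no port 5900" check can be repeated from any host with a route to the machine. This is a minimal sketch using only the Python standard library; the function name `probe_ports` and its defaults are hypothetical and not a tool used in this bug. Note that a successful TCP connect only shows the port is accepting connections; it does not prove that an ssh or VNC login would work.

```python
import socket


def probe_ports(host, ports=(22, 5900), timeout=3.0):
    """Return {port: True/False} for whether a TCP connection could be opened."""
    results = {}
    for port in ports:
        try:
            # create_connection resolves the name and performs the TCP handshake.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers refused connections, timeouts, and name-resolution failures.
            results[port] = False
    return results


if __name__ == "__main__":
    # Hostname taken from this bug; run from a host that can reach the try VLAN.
    print(probe_ports("bld-lion-r5-040.try.releng.scl3.mozilla.com"))
```

Running the same probe against a known-good mini in the same VLAN would help separate a host problem from a network problem, as suggested above.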
Assignee: dustin → arich
Comment 6•11 years ago
Attached a monitor and the host is sitting at the login prompt. Did it reimage properly?

[vle@admin1b.private.scl3 ~]$ fping !$
fping bld-lion-r5-040.try.releng.scl3.mozilla.com
bld-lion-r5-040.try.releng.scl3.mozilla.com is alive
[vle@admin1b.private.scl3 ~]$ ssh !$
ssh bld-lion-r5-040.try.releng.scl3.mozilla.com
The authenticity of host 'bld-lion-r5-040.try.releng.scl3.mozilla.com (10.26.64.60)' can't be established.
RSA key fingerprint is 71:2c:bb:c2:7c:ba:4a:90:3d:6c:6e:a5:25:5e:ab:02.
Are you sure you want to continue connecting (yes/no)?
Comment 7•11 years ago
We have another mini that needs netbooting but has failed to connect to DeployStudio. It is on the same VLAN (vlan264). See bug 918082 (please run hardware diagnostics on bld-lion-r5-035).
Comment 8•11 years ago
No, no known password gets me into -040. I think we need to find a known-good mini in the try VLAN (or move one to that VLAN) and netboot that to verify whether this is a network or host issue. The common thread with -040 and -035 is that they had new disks.
Comment 9•11 years ago
Ah, the password that was set when doing the recovery install in comment 1 gets me in. So -040 hasn't successfully netbooted.
Comment 10•11 years ago
Host reimaged successfully.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Updated•10 years ago
Product: mozilla.org → Infrastructure & Operations