Closed
Bug 1472683
Opened 7 years ago
Closed 6 years ago
t-yosemite-r7-189.test.releng.mdc2.mozilla.com. is unreachable
Categories
(Infrastructure & Operations :: DCOps, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: relops-bug-generator, Assigned: van)
References
Details
(Whiteboard: REQ0245333, REQ0235357 ,REQ0235524 )
Reboot t-yosemite-r7-189.test.releng.mdc2.mozilla.com. 10.51.56.80
Requested by mozilla-auth0/ad|Mozilla-LDAP|dhouse
Relops controller action failed:
2018-07-02T13:32:39.633531 ssh_reboot -l roller -i ssh.key TimeoutExpired
2018-07-02T13:32:39.635769 ipmi ipmi_reset KeyError
2018-07-02T13:32:39.637727 ipmi ipmi_cycle KeyError
2018-07-02T13:32:45.658640 snmp_reboot pdu1.gc132.ops.releng.mdc2.mozilla.com ba3 CalledProcessError
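The log lines above read like a fallback chain: each reboot method is tried in turn and the exception class name is recorded when it fails. The controller's code is not shown in this bug, so the following is only a hypothetical sketch of that pattern. `try_methods` and the stand-in actions are invented for illustration, and the failure causes in the stand-ins are assumptions.

```python
import subprocess
from datetime import datetime


def try_methods(methods):
    """Run each (label, callable) in order and stop at the first success.

    Returns (label_that_succeeded_or_None, log_lines). Each failure is logged
    as '<ISO timestamp> <label> <ExceptionClassName>', mirroring the format
    seen in this bug. Hypothetical sketch, not the relops controller itself.
    """
    log = []
    for label, action in methods:
        try:
            action()
        except Exception as exc:  # record the failure, then fall through
            log.append(f"{datetime.now().isoformat()} {label} {type(exc).__name__}")
        else:
            return label, log
    return None, log


# Stand-in actions that fail with the same exception types the log shows.
def ssh_reboot():
    raise subprocess.TimeoutExpired(cmd="ssh", timeout=30)


def ipmi_reset():
    # Assumed cause: the host has no entry in some IPMI lookup table.
    {}["t-yosemite-r7-189"]


def snmp_reboot():
    raise subprocess.CalledProcessError(returncode=1, cmd="snmpset")


result, lines = try_methods([
    ("ssh_reboot", ssh_reboot),
    ("ipmi ipmi_reset", ipmi_reset),
    ("snmp_reboot", snmp_reboot),
])
for line in lines:
    print(line)
```

When every method fails, as in this bug, `result` is `None` and the collected lines are all there is to report.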
I tried a manual SNMP call to the PDU and got a timeout:
```
[root@roller-dev1 ~]# snmpget -v 2c -c communitystring pdu1.gc132.ops.releng.mdc2.mozilla.com 1.3.6.1.4.1.1718.3.2.3.1.11.2.1.3
Timeout: No Response from pdu1.gc132.ops.releng.mdc2.mozilla.com.
[dhouse@roller1.srv.releng.mdc1.mozilla.com ~]$ snmpset -v 2c -c communitystring pdu1.gc132.ops.releng.mdc2.mozilla.com 1.3.6.1.4.1.1718.3.2.3.1.11.2.1.3 i 3
Timeout: No Response from pdu1.gc132.ops.releng.mdc2.mozilla.com
```
However, SNMP works when called with the public community string:
```
[dhouse@roller1.srv.releng.mdc1.mozilla.com ~]$ snmpget -v 2c -c public pdu1.gc132.ops.releng.mdc2.mozilla.com 1.3.6.1.4.1.1718.3.2.3.1.11.2.1.3
iso.3.6.1.4.1.1718.3.2.3.1.11.2.1.3 = INTEGER: 0
```
I also tried from roller-dev1.srv.releng.mdc2 to see if I could reach it from within mdc2, but I get the same results.
I tested a reboot of another machine on this same PDU: t-yosemite-r7-187
And I get the same failure: https://bugzilla.mozilla.org/show_bug.cgi?id=1472713
So I think the SNMP community string on this PDU may not match.
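The inference here rests on comparing two outcomes: a request with the public community string gets an answer, while requests with the private one time out. SNMPv2c agents generally do not reply to a request carrying a community string they do not recognize, so a mismatch looks like a timeout. A minimal sketch of that comparison over captured command output follows; `classify` and `diagnose` are hypothetical helper names, and the rule encoded is only the reasoning from this comment.

```python
def classify(output: str) -> str:
    """Classify captured snmpget/snmpset output as 'timeout' or 'response'."""
    if output.strip().startswith("Timeout: No Response"):
        return "timeout"
    return "response"


def diagnose(public_out: str, private_out: str) -> str:
    """Apply the comment's inference to one public and one private result."""
    pub, priv = classify(public_out), classify(private_out)
    if pub == "response" and priv == "timeout":
        return "device reachable; private community string likely mismatched"
    if pub == "timeout" and priv == "timeout":
        return "no SNMP answer with either community string"
    return "no community mismatch indicated"


# Outputs copied from the transcripts in this bug.
public_out = "iso.3.6.1.4.1.1718.3.2.3.1.11.2.1.3 = INTEGER: 0"
private_out = "Timeout: No Response from pdu1.gc132.ops.releng.mdc2.mozilla.com."
print(diagnose(public_out, private_out))
```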
See Also: → 1472713
Comment 3 (Reporter) • 7 years ago
Reboot t-yosemite-r7-189.test.releng.mdc2.mozilla.com. 10.51.56.80
Requested by mozilla-auth0/ad|Mozilla-LDAP|dhouse
Relops controller action failed:
2018-07-02T17:26:24.271370 ssh_reboot -l roller -i ssh.key TimeoutExpired
2018-07-02T17:26:24.276580 ipmi ipmi_reset KeyError
2018-07-02T17:26:24.279591 ipmi ipmi_cycle KeyError
This machine needs to be physically netbooted/reimaged.
The SNMP issue is fixed, but neither this machine nor t-yosemite-r7-187 responds to ping after its outlet is power-cycled or turned on over SNMP. I tried turning the power off and back on and then waiting; neither machine becomes pingable.
```
# snmpget -v 2c -c public pdu1.gc132.ops.releng.mdc2.mozilla.com 1.3.6.1.4.1.1718.3.2.3.1.5.2.1.2
iso.3.6.1.4.1.1718.3.2.3.1.5.2.1.2 = INTEGER: 1
# snmpget -v 2c -c public pdu1.gc132.ops.releng.mdc2.mozilla.com 1.3.6.1.4.1.1718.3.2.3.1.5.2.1.3
iso.3.6.1.4.1.1718.3.2.3.1.5.2.1.3 = INTEGER: 1
```
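The check described in this comment (switch the outlet on, wait, see whether the host answers ping) can be sketched as a polling loop. `wait_until_reachable` and `ping_once` are hypothetical names, the attempt count and delay are arbitrary, and the `ping -c 1 -W 2` flags assume Linux iputils.

```python
import subprocess
import time


def ping_once(host: str) -> bool:
    """Send one ICMP echo request; True if the host replied.

    Assumes Linux iputils ping, where -W is a timeout in seconds.
    """
    proc = subprocess.run(
        ["ping", "-c", "1", "-W", "2", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return proc.returncode == 0


def wait_until_reachable(probe, attempts=30, delay=10.0, sleep=time.sleep):
    """Call probe() up to `attempts` times, sleeping `delay` seconds between tries.

    Returns the 1-based attempt number that succeeded, or None if none did.
    """
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        if attempt < attempts:
            sleep(delay)
    return None


# Example call (not executed here, since it needs network access):
# wait_until_reachable(
#     lambda: ping_once("t-yosemite-r7-189.test.releng.mdc2.mozilla.com"))
```

Passing the probe and the sleep function as arguments keeps the loop testable without touching the network.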
Comment 5 (Assignee) • 7 years ago
power cycling didn't work, opened REQ0235357 with QTS for remote hands.
Assignee: server-ops-dcops → vle
Whiteboard: REQ0235357
Comment 6 (Reporter) • 7 years ago
Reboot t-yosemite-r7-189.test.releng.mdc2.mozilla.com. 10.51.56.80
Requested by mozilla-auth0/ad|Mozilla-LDAP|zfay
Relops controller action failed:
2018-07-03T10:52:16.701682 ssh_reboot -l roller -i ssh.key TimeoutExpired
2018-07-03T10:52:16.703472 ipmi ipmi_reset KeyError
2018-07-03T10:52:16.704844 ipmi ipmi_cycle KeyError
Comment 7 (Assignee) • 7 years ago
opened REQ0235524 with QTS to reimage mac mini
Whiteboard: REQ0235357 → REQ0235357 ,REQ0235524
Comment 8 (Reporter) • 7 years ago
Reboot t-yosemite-r7-189.test.releng.mdc2.mozilla.com. 10.51.56.80
Requested by mozilla-auth0/ad|Mozilla-LDAP|zfay
Relops controller action failed:
2018-07-08T12:46:19.987782 ssh_reboot -l roller -i ssh.key TimeoutExpired
2018-07-08T12:46:19.989639 ipmi ipmi_reset KeyError
2018-07-08T12:46:19.991406 ipmi ipmi_cycle KeyError
Comment 9 (Assignee) • 7 years ago
QTS reimaged the mini, but it looks like there is still an issue. Will check next time I'm on site.
Comment 10 (Assignee) • 7 years ago
bad cable, back online.
[vle@admin2a.private.mdc1 ~]$ fping !$
fping t-yosemite-r7-189.test.releng.mdc2.mozilla.com
t-yosemite-r7-189.test.releng.mdc2.mozilla.com is alive
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Comment 12 • 7 years ago
I'm seeing problems with the machine. Could you check it again? It kept crashing and rebooting.
I tried shutting off its power and bringing it back on, but now it isn't coming back online (no ping/ssh, no logs).
```
# snmpget -v 2c -c public pdu1.gc132.ops.releng.mdc2.mozilla.com 1.3.6.1.4.1.1718.3.2.3.1.5.2.1.3
iso.3.6.1.4.1.1718.3.2.3.1.5.2.1.3 = INTEGER: 1
# snmpset -v 2c -c secret pdu1.gc132.ops.releng.mdc2.mozilla.com 1.3.6.1.4.1.1718.3.2.3.1.11.2.1.3 i 2
iso.3.6.1.4.1.1718.3.2.3.1.11.2.1.3 = INTEGER: 2
# snmpget -v 2c -c public pdu1.gc132.ops.releng.mdc2.mozilla.com 1.3.6.1.4.1.1718.3.2.3.1.5.2.1.3
iso.3.6.1.4.1.1718.3.2.3.1.5.2.1.3 = INTEGER: 0
# snmpset -v 2c -c secret pdu1.gc132.ops.releng.mdc2.mozilla.com 1.3.6.1.4.1.1718.3.2.3.1.11.2.1.3 i 1
iso.3.6.1.4.1.1718.3.2.3.1.11.2.1.3 = INTEGER: 1
# snmpget -v 2c -c public pdu1.gc132.ops.releng.mdc2.mozilla.com 1.3.6.1.4.1.1718.3.2.3.1.5.2.1.3
iso.3.6.1.4.1.1718.3.2.3.1.5.2.1.3 = INTEGER: 1
[dhouse@rejh2.srv.releng.mdc1.mozilla.com ~]$ ping t-yosemite-r7-189.test.releng.mdc2.mozilla.com
PING t-yosemite-r7-189.test.releng.mdc2.mozilla.com (10.51.56.80) 56(84) bytes of data.
^C
--- t-yosemite-r7-189.test.releng.mdc2.mozilla.com ping statistics ---
81 packets transmitted, 0 received, 100% packet loss, time 80458ms
```
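The transcript above follows a set-then-verify pattern: write a value to the control OID, read the status OID back, then ping the host. In this transcript, writing 2 was followed by a status of 0 and writing 1 by a status of 1. Reading those as "off" and "on" is an inference from the transcript, not something confirmed against the PDU's MIB. A small sketch that parses those lines and combines the readings; `snmp_integer`, `packet_counts` and `summarize` are hypothetical names.

```python
import re

# Matches the value at the end of a net-snmp line such as
# "iso.3.6.1.4.1.1718.3.2.3.1.5.2.1.3 = INTEGER: 0".
INTEGER_RE = re.compile(r"=\s*INTEGER:\s*(-?\d+)\s*$")
# Matches the counts in a ping summary line.
COUNTS_RE = re.compile(r"(\d+) packets transmitted, (\d+) received")


def snmp_integer(line: str) -> int:
    """Extract the integer value from one line of snmpget/snmpset output."""
    match = INTEGER_RE.search(line)
    if match is None:
        raise ValueError(f"no INTEGER value in: {line!r}")
    return int(match.group(1))


def packet_counts(summary: str):
    """Return (transmitted, received) from a ping summary line."""
    match = COUNTS_RE.search(summary)
    if match is None:
        raise ValueError(f"no packet counts in: {summary!r}")
    return int(match.group(1)), int(match.group(2))


def summarize(status_after_off: int, status_after_on: int, received: int) -> str:
    """Combine the readings (assumed meaning: status 0 = off, 1 = on)."""
    if (status_after_off, status_after_on) != (0, 1):
        return "outlet did not follow the off/on commands"
    if received == 0:
        return "outlet cycled and reports on, but host does not answer ping"
    return "outlet cycled and host answers ping"


# Lines copied from the transcript above.
off = snmp_integer("iso.3.6.1.4.1.1718.3.2.3.1.5.2.1.3 = INTEGER: 0")
on = snmp_integer("iso.3.6.1.4.1.1718.3.2.3.1.5.2.1.3 = INTEGER: 1")
_, received = packet_counts(
    "81 packets transmitted, 0 received, 100% packet loss, time 80458ms")
print(summarize(off, on, received))
```

An outlet that follows the commands while the host stays silent points away from the PDU and toward the machine or its cabling.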
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 13 • 6 years ago
Hey Van!
This one has been missing from Taskcluster for quite a while, and any attempt to ssh into it just hangs. It also doesn't show up in Papertrail.
Flags: needinfo?(vle)
Comment 14 (Assignee) • 6 years ago
opened REQ0245333 with QTS for a reimage.
Flags: needinfo?(vle)
Whiteboard: REQ0235357 ,REQ0235524 → REQ0245333, REQ0235357 ,REQ0235524
Comment 15 (Assignee) • 6 years ago
back online.
vle@DESKTOP-3HK51T3:~$ fping t-yosemite-r7-189.test.releng.mdc2.mozilla.com
t-yosemite-r7-189.test.releng.mdc2.mozilla.com is alive
Status: REOPENED → RESOLVED
Closed: 7 years ago → 6 years ago
Resolution: --- → FIXED