Closed Bug 1477779 Opened 7 years ago Closed 7 years ago

[MDC2] t-yosemite-r7-229.test.releng.mdc2.mozilla.com. is unreachable

Categories

(Infrastructure & Operations :: DCOps, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dhouse, Assigned: van)

References

Details

(Whiteboard: REQ0260221, REQ0259960, requires on site vist, REQ0239167)

Please physically check and reboot+reimage t-yosemite-r7-229.test.releng.mdc2.mozilla.com. It does not respond to ping or ssh, and cycling its power over snmp (off, then on) did not bring it back.

Checking power:
```
# snmpget -v 2c -c public pdu1.gc131.ops.releng.mdc2.mozilla.com 1.3.6.1.4.1.1718.3.2.3.1.5.1.3.4
iso.3.6.1.4.1.1718.3.2.3.1.5.1.3.4 = INTEGER: 1
```
Then I powered it off, waited a few seconds, powered it back on, and finally started pinging for it to come alive (no response):
```
# snmpget -v 2c -c public pdu1.gc131.ops.releng.mdc2.mozilla.com 1.3.6.1.4.1.1718.3.2.3.1.5.1.3.4
iso.3.6.1.4.1.1718.3.2.3.1.5.1.3.4 = INTEGER: 1
# snmpset -v 2c -c secret pdu1.gc131.ops.releng.mdc2.mozilla.com 1.3.6.1.4.1.1718.3.2.3.1.11.1.3.4 i 2
iso.3.6.1.4.1.1718.3.2.3.1.11.1.3.4 = INTEGER: 2
# snmpget -v 2c -c public pdu1.gc131.ops.releng.mdc2.mozilla.com 1.3.6.1.4.1.1718.3.2.3.1.5.1.3.4
iso.3.6.1.4.1.1718.3.2.3.1.5.1.3.4 = INTEGER: 0
# snmpget -v 2c -c public pdu1.gc131.ops.releng.mdc2.mozilla.com 1.3.6.1.4.1.1718.3.2.3.1.5.1.3.4
iso.3.6.1.4.1.1718.3.2.3.1.5.1.3.4 = INTEGER: 0
# snmpset -v 2c -c secret pdu1.gc131.ops.releng.mdc2.mozilla.com 1.3.6.1.4.1.1718.3.2.3.1.11.1.3.4 i 1
iso.3.6.1.4.1.1718.3.2.3.1.11.1.3.4 = INTEGER: 1
# snmpget -v 2c -c public pdu1.gc131.ops.releng.mdc2.mozilla.com 1.3.6.1.4.1.1718.3.2.3.1.5.1.3.4
iso.3.6.1.4.1.1718.3.2.3.1.5.1.3.4 = INTEGER: 1
# ping t-yosemite-r7-229.test.releng.mdc2.mozilla.com
PING t-yosemite-r7-229.test.releng.mdc2.mozilla.com (10.51.56.45) 56(84) bytes of data.
^C
--- t-yosemite-r7-229.test.releng.mdc2.mozilla.com ping statistics ---
60 packets transmitted, 0 received, 100% packet loss, time 59297ms
```
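For anyone repeating this check, the sequence above could be wrapped roughly as follows. This is only a sketch: the OIDs, community strings, and the control values 2 (off) and 1 (on) are copied from the transcript, while the function names, the `DRY_RUN` switch, and the 5-second wait are hypothetical choices and not part of any existing tooling.

```shell
#!/usr/bin/env bash
# Sketch only: power-cycle one PDU outlet over SNMP, then ping the host.
# OIDs, community strings, and values 1/2 come from the transcript above;
# function names, DRY_RUN, and the 5 s wait are hypothetical.

PDU="pdu1.gc131.ops.releng.mdc2.mozilla.com"
HOST="t-yosemite-r7-229.test.releng.mdc2.mozilla.com"
STATUS_OID="1.3.6.1.4.1.1718.3.2.3.1.5.1.3.4"
CONTROL_OID="1.3.6.1.4.1.1718.3.2.3.1.11.1.3.4"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Extract the integer from a line such as "iso.3.6... = INTEGER: 1".
parse_snmp_int() {
  printf '%s\n' "$1" | sed -n 's/.*INTEGER: \([0-9][0-9]*\).*/\1/p'
}

# Read the outlet status (the transcript shows 1 with power, 0 without).
outlet_status() { run snmpget -v 2c -c public "$PDU" "$STATUS_OID"; }

# Write the outlet control value passed as $1.
outlet_set() { run snmpset -v 2c -c secret "$PDU" "$CONTROL_OID" i "$1"; }

power_cycle() {
  outlet_status
  outlet_set 2      # 2 = off, as observed in the transcript
  run sleep 5       # "waited a few seconds"
  outlet_status
  outlet_set 1      # 1 = on
  outlet_status
  run ping -c 60 "$HOST"
}
```

Running `DRY_RUN=1 power_cycle` only prints the commands, which makes it easy to confirm the OIDs and outlet before touching a live PDU.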
Please physically check this machine and reboot it again. It does not need to be reimaged.
I'm not sure if anything was done, but this machine is responding today. I cycled it again with snmp power off and then on (no ping/ssh before, though the PDU showed it had power), and then it came up and responded to ping and then ssh (successful login).
I'll close this. If we have a repeat of the machine getting stuck and not responding to ping/ssh again, I'll open a new bug.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
The problem has repeated. This machine appears fine after a reboot, but then stops responding to ping/ssh and no logs are forwarded. Please physically inspect this machine and netboot/reimage it.
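Since the machine looks fine right after a reboot and only later goes quiet, it may help to record when the change happens. A minimal sketch, assuming a monitoring host that can ping the worker; `report_change`, `watch_host`, and the 60-second default interval are hypothetical names and values:

```shell
# Sketch: poll a host and log only when its ping state changes, to narrow
# down how long after a reboot it stops responding.

# Print a log line only when the new state differs from the old one.
report_change() {
  old=$1
  new=$2
  stamp=$3
  if [ "$old" != "$new" ]; then
    echo "$stamp state: $old -> $new"
  fi
}

# Poll forever; the interval in seconds is the optional second argument.
watch_host() {
  host=$1
  interval=${2:-60}
  prev="unknown"
  while true; do
    if ping -c 1 "$host" >/dev/null 2>&1; then
      cur="up"
    else
      cur="down"
    fi
    report_change "$prev" "$cur" "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    prev=$cur
    sleep "$interval"
  done
}
```

For example, `watch_host t-yosemite-r7-229.test.releng.mdc2.mozilla.com 30 >> r7-229.log` would leave one timestamped line per up/down transition.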
opened REQ0239167 for reimage.
Assignee: server-ops-dcops → vle
Whiteboard: REQ0239167
will need to check next visit.
07-26-2018 18:04 EDT - Nicholas Trout
Additional comments: Could not pull up display on any of the three affected mac minis. Spoke to Van. He will correct upon his next visit.
Whiteboard: REQ0239167 → requires on site vist, REQ0239167
Summary: t-yosemite-r7-229.test.releng.mdc2.mozilla.com. is unreachable → [MDC2] -yosemite-r7-229.test.releng.mdc2.mozilla.com. is unreachable
back online after reimage.
vle@DESKTOP-3HK51T3:~$ fping t-yosemite-r7-229.test.releng.mdc2.mozilla.com
t-yosemite-r7-229.test.releng.mdc2.mozilla.com is alive
Status: REOPENED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Looks like the worker is not ssh-able again. However, it is pingable.
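The "pingable but not ssh-able" state is easy to miss if only ping is checked. A rough sketch that separates the cases, assuming a netcat build that supports `-z` and `-w`; the function names and state labels are hypothetical, and an open port 22 does not prove that login works:

```shell
# Sketch: tell "pingable but no ssh" apart from fully up or fully down.

# classify_host takes two exit codes (0 = success): ping, then TCP port 22.
classify_host() {
  p_rc=$1
  s_rc=$2
  if [ "$p_rc" -eq 0 ] && [ "$s_rc" -eq 0 ]; then
    echo "up"
  elif [ "$p_rc" -eq 0 ]; then
    echo "pingable-no-ssh"
  elif [ "$s_rc" -eq 0 ]; then
    echo "ssh-only"
  else
    echo "unreachable"
  fi
}

# Probe one host with ping and a TCP connect to port 22, then classify it.
check_host() {
  host=$1
  ping -c 3 "$host" >/dev/null 2>&1
  ping_rc=$?
  nc -z -w 5 "$host" 22 >/dev/null 2>&1
  ssh_rc=$?
  echo "$host: $(classify_host "$ping_rc" "$ssh_rc")"
}
```

`check_host t-yosemite-r7-229.test.releng.mdc2.mozilla.com` would then print the host name followed by one of the four labels.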
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Summary: [MDC2] -yosemite-r7-229.test.releng.mdc2.mozilla.com. is unreachable → [MDC2] t-yosemite-r7-229.test.releng.mdc2.mozilla.com. is unreachable
opened REQ0259960 with QTS for reimage.
Whiteboard: requires on site vist, REQ0239167 → REQ0259960, requires on site vist, REQ0239167
QTS might have missed this one, opened REQ0260221 for reimage.
Whiteboard: REQ0259960, requires on site vist, REQ0239167 → REQ0260221, REQ0259960, requires on site vist, REQ0239167
QTS reimaged mini.
vle@DESKTOP-3HK51T3:~$ fping t-yosemite-r7-229.test.releng.mdc2.mozilla.com
t-yosemite-r7-229.test.releng.mdc2.mozilla.com is alive
Status: REOPENED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Seems like the worker is not reachable once again.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
The machine seems to be up and running and taking jobs:
https://tools.taskcluster.net/provisioners/releng-hardware/worker-types/gecko-t-osx-1010/workers/mdc2/t-yosemite-r7-229
We will close the bug for now. If the problem persists, we will reopen this bug.
Status: REOPENED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED