[MDC1] moon-chassis-4 connection issues
Categories
(Infrastructure & Operations :: RelOps: Hardware, task)
Tracking
(Not tracked)
People
(Reporter: dhouse, Assigned: dhouse)
Details
We have seen connection drops (reported by nagios) for the moon-chassis-4 iLO starting yesterday.
Yesterday, I asked QTS to check the iLO network cable connection for this moonshot's management module.
They found no problems and reported that the cable has solid connections at both ends.
Today the problem has repeated intermittently.
I opened another case with QTS (REQ0273399) asking them to re-seat the module, verify that no secondary network cable is in place, and confirm that the health LED is green.
QTS found no problems. They reseated the module. However, nagios found it down again twice since then (once just now).
I updated the case with QTS and asked them to replace the network cable.
QTS replaced the network cable and the problem did not repeat over the weekend. So, I think it was a bad cable.
Reopening; CIDuty found that the workers on this chassis are all unavailable. They show as on, but there is no response to ping.
Maybe we broke all the workers with the troubleshooting for the iLO.
I powered off, and then back on, the switches (A and B) for moon-chassis-4 and that recovered the network connection for the cartridges.
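The "powered on but no response to ping" symptom described above could be checked across a chassis with a small sweep like the one below. This is only a sketch, not the tooling actually used on this ticket: the function names `ping_once` and `sweep` are hypothetical, the host list must be supplied by the caller, and the ping flags assume either Windows or Linux (iputils) `ping`.

```python
import platform
import subprocess
from typing import Callable, Dict, Iterable, List


def ping_once(host: str, timeout_s: int = 2) -> bool:
    """Return True if a single ICMP echo request to `host` gets a reply."""
    # Windows ping: -n is the count, -w is the timeout in milliseconds.
    # Linux (iputils) ping: -c is the count, -W is the timeout in seconds.
    if platform.system() == "Windows":
        cmd = ["ping", "-n", "1", "-w", str(timeout_s * 1000), host]
    else:
        cmd = ["ping", "-c", "1", "-W", str(timeout_s), host]
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 5,  # hard stop in case ping itself hangs
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def sweep(
    hosts: Iterable[str],
    probe: Callable[[str], bool] = ping_once,
) -> Dict[str, List[str]]:
    """Split `hosts` into reachable and unreachable lists using `probe`."""
    report: Dict[str, List[str]] = {"reachable": [], "unreachable": []}
    for host in hosts:
        key = "reachable" if probe(host) else "unreachable"
        report[key].append(host)
    return report
```

Calling `sweep(hosts)` with the worker hostnames for the chassis returns two lists, which makes it easy to see whether an outage affects every cartridge (pointing at the chassis switches) or only a few workers. The `probe` parameter is injectable so the sweep logic can be exercised without sending real pings.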
Comment 7•7 years ago
I've just checked the Windows machines on moon-chassis-4: 17 are not available on Taskcluster, 12 are available and have taken jobs, and 1 is in quarantine.
Comment 8•7 years ago
I've checked the Windows machines again; now only one is unavailable (T-W1064-MS-164). I have started a reimage on it and will keep monitoring it.
Comment 9•7 years ago
I checked T-W1064-MS-164 and all the other machines on chassis 4, and everything is fine. I will close the ticket. If the problem recurs, we will reopen it.