Closed Bug 1028956 Opened 10 years ago Closed 10 years ago

b-2008-sm machines being incorrectly flagged as unreachable

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

x86_64
Windows Server 2008
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: coop, Assigned: Callek)

Details

Amy flagged this in bug 1028852. Some seamicro nodes are incorrectly being labeled as 'unreachable.' I just checked this particular node and it's up and running a job, despite the bug filed by the slaverebooter.

Callek: can you dig into this as buildduty this week, please? Maybe slaveapi or slaverebooter need a logic tweak.
I had a quick look at this today. 

It seems that ssh login on these machines slows down *a lot* when we're also trying to access the twistd.log to see whether the buildslave shutdown is complete. I don't know if that's actually the root cause of the slowdown, but the timing is suspicious. Both Amy and I tried to connect via ssh, and it took multiple minutes to connect, although it *did* eventually connect. 

My guess is that slaveapi times out on the ssh attempt, assumes the slave is busted, and files the intervention bug.
The assumptions in comment 1 are very likely correct.
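
To illustrate why a slow login can look like a dead host, here is a minimal sketch (not slaveapi's actual code) of a probe that keeps "timed out" separate from "failed". The function name `probe` and the three result strings are hypothetical, introduced only for this example.

```python
import subprocess


def probe(cmd, timeout_s):
    """Run `cmd` and classify the outcome.

    Returns 'ok' if the command exits 0, 'failed' if it exits non-zero,
    and 'timed_out' if it is still running after `timeout_s` seconds.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # subprocess.run kills the child on expiry
        )
    except subprocess.TimeoutExpired:
        # The host may simply be slow; this is not proof that it is down.
        return "timed_out"
    return "ok" if result.returncode == 0 else "failed"
```

A caller could pass something like `["ssh", "-o", "BatchMode=yes", host, "exit"]` as `cmd` (assuming key-based login is set up) and treat `timed_out` as "retry or check ping first" rather than as "unreachable".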

Unless there are objections before tomorrow, I'm going to patch slaverebooter to ignore the seamicros for now.
...so part of this problem:

slaveapi.log:2014-06-26 09:10:59,064 - DEBUG - b-2008-sm-0014.wintry.releng.scl3.mozilla.com - ping was successful
slaveapi.log:2014-06-26 09:20:11,789 - INFO - b-2008-sm-0014 - Waiting 60 seconds for reboot.
slaveapi.log:2014-06-26 09:20:11,820 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:12,850 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:13,882 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:14,911 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:15,941 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:16,970 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:18,000 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:19,030 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:20,058 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:21,087 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:22,119 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:23,139 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:24,171 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:25,200 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:26,229 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:27,259 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:28,287 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:29,317 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:30,342 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:31,373 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:32,402 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:33,431 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:34,462 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:35,492 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:36,522 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:37,551 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:38,583 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:39,613 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:40,643 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:41,671 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:42,699 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:43,729 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:44,758 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:45,790 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:46,819 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:47,853 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:48,883 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:49,913 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:50,944 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:51,975 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:53,005 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:54,036 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:55,064 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:56,093 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:57,123 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:58,156 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:59,184 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:21:00,212 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:21:01,243 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:21:02,273 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:21:03,302 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:21:04,331 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:21:05,360 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:21:06,394 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:21:07,425 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:21:08,454 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:21:09,483 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:21:10,512 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:21:11,540 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:21:12,541 - ERROR - b-2008-sm-0014 - Machine didn't go down in allotted time, assuming it didn't reboot.

This might also indicate a problem with rebooting, not merely an ssh slowdown. (The "is not down yet" check is based on ping, to ensure the host actually shuts down when we try to reboot.)
I'm going to call this fixed, with the knowledge that we're no longer planning to deal with SeaMicros in releng production.
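
The ping-based wait seen in the log (poll roughly once per second, give up after 60 seconds) can be sketched as follows. This is an assumption-laden illustration, not the slaveapi implementation: the names `ping` and `wait_for_down` are hypothetical, and the `ping` flags shown are the Linux iputils ones.

```python
import subprocess
import time


def ping(host, timeout_s=1):
    """Return True if one ICMP echo to `host` succeeds (Linux `ping` flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def wait_for_down(host, wait=60, interval=1.0, is_up=ping,
                  clock=time.monotonic, sleep=time.sleep):
    """Poll until `host` stops answering, or give up after `wait` seconds.

    Returns True if the host was seen down before the deadline, False if
    it kept answering the whole time (i.e. it probably never rebooted).
    """
    deadline = clock() + wait
    while clock() < deadline:
        if not is_up(host):
            return True
        sleep(interval)
    return False
```

With this shape, a `False` result corresponds to the "didn't go down in allotted time" error in the log: the host answered every ping for the full window, so the reboot request most likely never took effect.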
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard