b-2008-sm machines being incorrectly flagged as unreachable

RESOLVED FIXED

Status

Release Engineering
Platform Support
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: coop, Assigned: Callek)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

3 years ago
Amy flagged this in bug 1028852. Some SeaMicro nodes are being incorrectly labeled as 'unreachable'. I just checked this particular node and it's up and running a job, despite the bug filed by slaverebooter.

Callek: can you dig into this as buildduty this week, please? Maybe slaveapi or slaverebooter needs a logic tweak.
(Reporter)

Comment 1

3 years ago
I had a quick look at this today. 

It seems that ssh login on these machines slows down *a lot* when we're also trying to access the twistd.log to see whether the buildslave shutdown is complete. I don't know if that's actually the root cause of the slowdown, but the timing is suspicious. Both Amy and I tried to connect via ssh, and it took multiple minutes to connect, although it *did* eventually connect. 

I'm guessing slaveapi times out on the ssh attempt, assumes the slave is busted, and files the intervention bug.
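To illustrate the suspected failure mode, here is a minimal sketch of a check that retries with a longer timeout before declaring a host unreachable, so that a slow login is reported differently from a dead machine. This is not slaveapi's actual code: `ssh_login_ok`, `classify_host`, the timeout values, and the remote `exit` command are all assumptions made up for the example.

```python
import subprocess
from typing import Callable, Sequence


def ssh_login_ok(host: str, timeout: float) -> bool:
    """Hypothetical probe: True if a non-interactive ssh command finishes in time."""
    try:
        result = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", host, "exit"],
            timeout=timeout,  # bounds the whole login, not just the TCP connect
            capture_output=True,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def classify_host(
    host: str,
    probe: Callable[[str, float], bool] = ssh_login_ok,
    timeouts: Sequence[float] = (30.0, 300.0),  # assumed values, in seconds
) -> str:
    """Try progressively longer timeouts before giving up on the host."""
    for attempt, timeout in enumerate(timeouts):
        if probe(host, timeout):
            # Only a first-attempt success counts as healthy.
            return "reachable" if attempt == 0 else "slow"
    return "unreachable"
```

With a check like this, a caller could file an intervention bug only for "unreachable" and merely log "slow" hosts. Whether that is the right policy for these machines is a separate question.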
(Assignee)

Comment 2

3 years ago
The assumptions in comment 1 are pretty likely correct.

Unless there are objections before tomorrow, I'm going to patch slaverebooter to ignore the SeaMicros for now.
(Assignee)

Comment 3

3 years ago
...so part of this problem:

slaveapi.log:2014-06-26 09:10:59,064 - DEBUG - b-2008-sm-0014.wintry.releng.scl3.mozilla.com - ping was successful
slaveapi.log:2014-06-26 09:20:11,789 - INFO - b-2008-sm-0014 - Waiting 60 seconds for reboot.
slaveapi.log:2014-06-26 09:20:11,820 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:12,850 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:13,882 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:14,911 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:15,941 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:16,970 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:18,000 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:19,030 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:20,058 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:21,087 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:22,119 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:23,139 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:24,171 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:25,200 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:26,229 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:27,259 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:28,287 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:29,317 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:30,342 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:31,373 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:32,402 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:33,431 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:34,462 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:35,492 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:36,522 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:37,551 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:38,583 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:39,613 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:40,643 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:41,671 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:42,699 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:43,729 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:44,758 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:45,790 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:46,819 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:47,853 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:48,883 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:49,913 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:50,944 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:51,975 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:53,005 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:54,036 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:55,064 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:56,093 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:57,123 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:58,156 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:59,184 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:21:00,212 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:21:01,243 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:21:02,273 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:21:03,302 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:21:04,331 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:21:05,360 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:21:06,394 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:21:07,425 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:21:08,454 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:21:09,483 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:21:10,512 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:21:11,540 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:21:12,541 - ERROR - b-2008-sm-0014 - Machine didn't go down in allotted time, assuming it didn't reboot.

This might also indicate a problem with rebooting, not merely an ssh slowdown. (The "is not down yet" check uses ping to confirm that the host actually shuts down when we try to reboot it.)
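For readers unfamiliar with this check, the log above is consistent with a polling loop along the following lines. This is a sketch inferred from the log output only, not the slaveapi source: the function name, the parameters, and the injected `is_pingable` callable are assumptions. The 60-second window and the roughly one-second interval are read off the timestamps.

```python
import time
from typing import Callable


def wait_for_down(
    is_pingable: Callable[[], bool],
    timeout: float = 60.0,   # "Waiting 60 seconds for reboot."
    interval: float = 1.0,   # log lines are about one second apart
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True if the host stops answering ping within `timeout` seconds."""
    deadline = clock() + timeout
    while clock() < deadline:
        if not is_pingable():
            return True  # host went down, so the reboot presumably started
        sleep(interval)  # one "Machine is not down yet..." iteration
    return False  # "Machine didn't go down in allotted time"
```

A host that keeps answering ping for the whole window makes the function return False, which matches the ERROR line at 09:21:12. Note that this says nothing about ssh: a machine that never reboots fails this check even if ssh is merely slow.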
(Assignee)

Comment 4

3 years ago
Excluded the SeaMicros from slaverebooter:

https://hg.mozilla.org/build/puppet/rev/dca4416c98d2
https://hg.mozilla.org/build/puppet/rev/b632587a2661
(Assignee)

Comment 5

3 years ago
I'm going to call this fixed, with the knowledge that we're no longer planning to deal with SeaMicros in releng production.
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED