Closed
Bug 1028956
Opened 10 years ago
Closed 10 years ago
b-2008-sm machines being incorrectly flagged as unreachable
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
x86_64
Windows Server 2008
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: coop, Assigned: Callek)
Details
Amy flagged this in bug 1028852. Some SeaMicro nodes are being incorrectly labeled as 'unreachable'. I just checked this particular node and it's up and running a job, despite the bug filed by slaverebooter. Callek: can you dig into this as buildduty this week, please? Maybe slaveapi or slaverebooter needs a logic tweak.
Comment 1 • 10 years ago (Reporter)
I had a quick look at this today. It seems that ssh login on these machines slows down *a lot* when we're also trying to access twistd.log to see whether the buildslave shutdown is complete. I don't know if that's actually the root cause of the slowdown, but the timing is suspicious. Both Amy and I tried to connect via ssh, and it took multiple minutes to connect, although it *did* eventually connect. I'm guessing slaveapi times out on the ssh attempt, assumes the slave is busted, and files the intervention bug.
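To illustrate the distinction drawn above between a host that is down and one that is merely slow to answer, here is a minimal sketch. It is not slaveapi's code: `classify_host`, `tcp_probe`, and both timeout values are hypothetical names and numbers chosen for illustration. A bare TCP connect is also only a stand-in; the slowdown described here was in the full ssh login, so a real check would have to attempt an actual login.

```python
import socket
from typing import Callable


def tcp_probe(host: str, port: int, timeout: float) -> bool:
    """Hypothetical probe: True if a TCP connection opens within `timeout` seconds.

    This only checks that the port accepts connections; it does not
    perform an ssh login.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def classify_host(
    probe: Callable[[float], bool],
    quick_timeout: float = 30.0,      # assumed value, not from slaveapi
    patient_timeout: float = 300.0,   # assumed value, "multiple minutes"
) -> str:
    """Try a quick probe first, then a patient one, before giving up.

    Returns "reachable", "slow", or "unreachable". Only the last result
    would justify filing an intervention bug.
    """
    if probe(quick_timeout):
        return "reachable"
    if probe(patient_timeout):
        return "slow"
    return "unreachable"
```

A caller could wire the two together with something like `classify_host(lambda t: tcp_probe("b-2008-sm-0014", 22, t))`. Passing the probe in as a function keeps the retry policy separate from how the connection is actually made.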
Comment 2 • 10 years ago (Assignee)
The assumptions in comment 1 are pretty likely correct. Unless there are objections before tomorrow, I'm going to patch slaverebooter to ignore the SeaMicros for now.
Comment 3 • 10 years ago (Assignee)
...so here is part of the problem:
slaveapi.log:2014-06-26 09:10:59,064 - DEBUG - b-2008-sm-0014.wintry.releng.scl3.mozilla.com - ping was successful
slaveapi.log:2014-06-26 09:20:11,789 - INFO - b-2008-sm-0014 - Waiting 60 seconds for reboot.
slaveapi.log:2014-06-26 09:20:11,820 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:12,850 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:13,882 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:14,911 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:15,941 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:16,970 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:18,000 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:19,030 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:20,058 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:21,087 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:22,119 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:23,139 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:24,171 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:25,200 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:26,229 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:27,259 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:28,287 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:29,317 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:30,342 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:31,373 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:32,402 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:33,431 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:34,462 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:35,492 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:36,522 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:37,551 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:38,583 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:39,613 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:40,643 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:41,671 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:42,699 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:43,729 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:44,758 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:45,790 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:46,819 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:47,853 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:48,883 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:49,913 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:50,944 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:51,975 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:53,005 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:54,036 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:55,064 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:56,093 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:57,123 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:58,156 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:20:59,184 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:21:00,212 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:21:01,243 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:21:02,273 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:21:03,302 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:21:04,331 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:21:05,360 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:21:06,394 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:21:07,425 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:21:08,454 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:21:09,483 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:21:10,512 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:21:11,540 - DEBUG - b-2008-sm-0014 - Machine is not down yet...
slaveapi.log:2014-06-26 09:21:12,541 - ERROR - b-2008-sm-0014 - Machine didn't go down in allotted time, assuming it didn't reboot.
This might additionally indicate a problem with rebooting itself, not merely the ssh slowdown.
(The "Machine is not down yet" check is based on ping; it ensures the host actually shuts down when we try to reboot it.)
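The log above shows a check repeated about once per second for roughly 60 seconds before the error is raised. A minimal sketch of that kind of wait-for-down loop follows. It is not the actual slaveapi implementation: `wait_for_down`, the `is_alive` callable (standing in for a ping), and the injectable `clock` and `sleep` parameters are all hypothetical names introduced for illustration.

```python
import time
from typing import Callable


def wait_for_down(
    is_alive: Callable[[], bool],
    timeout: float = 60.0,    # matches "Waiting 60 seconds for reboot" in the log
    interval: float = 1.0,    # the log shows roughly one check per second
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the host stops answering or the timeout expires.

    Returns True if the host went down within `timeout` seconds,
    False if it was still answering when time ran out.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        if not is_alive():
            return True        # host stopped responding: shutdown began
        sleep(interval)        # "Machine is not down yet..."
    return False               # "Machine didn't go down in allotted time"
```

Note that a False result only says the host kept answering ping for the whole window. It does not tell a failed reboot request apart from a reboot that simply took longer than the window to begin.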
Comment 4 • 10 years ago (Assignee)
Excluded these machines from slaverebooter:
https://hg.mozilla.org/build/puppet/rev/dca4416c98d2
https://hg.mozilla.org/build/puppet/rev/b632587a2661
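The contents of the linked changesets are not reproduced in this bug, so the following is only a sketch of how a hostname-based exclusion could look, not the actual change. `IGNORE_PATTERNS`, `should_skip`, and the glob pattern are assumptions made for illustration. The one detail taken from the bug is the `b-2008-sm-` hostname prefix.

```python
from fnmatch import fnmatch
from typing import Iterable

# Hypothetical exclusion list; the real configuration lives in the
# puppet changesets linked above and may look quite different.
IGNORE_PATTERNS = ["b-2008-sm-*"]


def should_skip(hostname: str, patterns: Iterable[str] = IGNORE_PATTERNS) -> bool:
    """Return True if the host matches any exclusion pattern.

    Only the short name is compared, so both "b-2008-sm-0014" and its
    fully qualified form are treated the same way.
    """
    short_name = hostname.split(".", 1)[0]
    return any(fnmatch(short_name, pattern) for pattern in patterns)
```

Matching on the short name means the check works the same way whether the caller has the bare hostname or the fully qualified one, both of which appear in the log in comment 3.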
Comment 5 • 10 years ago (Assignee)
I'm going to call this fixed, with the caveat that we're no longer planning to deal with SeaMicros in releng production.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Updated • 6 years ago
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Updated • 4 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard