Closed
Bug 691244
Opened 14 years ago
Closed 14 years ago
SeaMonkey Idle Slaves don't reboot properly...
Categories
(SeaMonkey :: Release Engineering, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: Callek, Assigned: Callek)
Details
Attachments
(1 file)
2.69 KB, patch
OK, so I recently deployed Buildbot-0.8.4-pre-moz2 to the Linux slaves and enabled Idleizer on them.
Our master is 0.8.2, just like the MoCo masters at the moment.
I just noticed that all but one of the slaves are currently disconnected.
We should figure out what is wrong with this setup and fix it, even though SeaMonkey machines rarely go idle.
Dustin, do you have any ideas, given the data I am about to provide below?
twistd.log:
2011-10-01 12:19:20-0700 [-] command finished with signal None, exit code 0, elapsedTime: 0.322054
2011-10-01 12:19:20-0700 [-] SlaveBuilder.commandComplete <buildslave.commands.shell.SlaveShellCommand instance at 0x8fe79cc>
2011-10-01 19:19:43-0700 [-] I feel very idle and was thinking of rebooting as soon as the buildmaster says it's OK
2011-10-01 19:19:43-0700 [-] Telling the master we want to shutdown after any running builds are finished
2011-10-01 19:19:43-0700 [Broker,client] Master does not support slave initiated shutdown. Upgrade master to 0.8.3 or later to use this feature.
2011-10-01 19:19:43-0700 [Broker,client] rebooting NOW, since the master won't talk to us
2011-10-01 19:19:43-0700 [Broker,client] Invoking platform-specific reboot command
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] lost remote
2011-10-01 19:19:44-0700 [Broker,client] Lost connection to cb-seamonkey-linuxmaster-01.mozilla.org:9010
2011-10-01 19:19:44-0700 [Broker,client] Stopping factory <buildslave.bot.BotFactory instance at 0x8de00ec>
2011-10-01 19:19:44-0700 [-] Main loop terminated.
2011-10-01 19:19:44-0700 [-] Server Shut Down.
[seabld@cb-sea-linux-tbox ~]$
==============
buildbot.tac (sanitized)
==============
[seabld@cb-sea-linux-tbox ~]$ cat /builds/slave/buildbot.tac
from twisted.application import service
from buildslave.bot import BuildSlave
maxdelay = 300
buildmaster_host = r<masterURL>
passwd = 'somethingS3CRET'
maxRotatedFiles = None
basedir = r'/builds/slave'
umask = 002
slavename = 'cb-sea-linux-tbox'
usepty = False
rotateLength = 1000000
port = 90210 # Yes not really
keepalive = None
application = service.Application('buildslave')
try:
    from twisted.python.logfile import LogFile
    from twisted.python.log import ILogObserver, FileLogObserver
    logfile = LogFile.fromFullPath("twistd.log", rotateLength=rotateLength,
                                   maxRotatedFiles=maxRotatedFiles)
    application.setComponent(ILogObserver, FileLogObserver(logfile).emit)
except ImportError:
    pass  # old Twisted install - mostly on geriatric slaves
s = BuildSlave(buildmaster_host, port, slavename, passwd, basedir,
               keepalive, usepty, umask=umask, maxdelay=maxdelay)
s.setServiceParent(application)
# enable idleizer
from buildslave import idleizer
idlz = idleizer.Idleizer(s,
        # 7 hours idle time before a reboot
        max_idle_time=3600*7,
        # 1 hour disconnect from a master before a reboot
        max_disconnected_time=3600*1)
idlz.setServiceParent(application)
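As a sanity check, the slave's twistd.log lines up with the `max_idle_time=3600*7` setting in buildbot.tac. The sketch below (Python 3, standard library only) assumes the command that finished at 12:19:20 was the slave's last activity; both timestamps share the same -0700 offset, so it is dropped. The gap comes out to 7 hours and 23 seconds, which suggests the idle timer itself fired as configured and the problem lies later, in the reboot path.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# Timestamps copied from twistd.log (shared -0700 offset omitted).
last_command = datetime.strptime("2011-10-01 12:19:20", FMT)  # commandComplete
idle_message = datetime.strptime("2011-10-01 19:19:43", FMT)  # "I feel very idle..."

# Threshold taken from buildbot.tac: max_idle_time=3600*7
max_idle_time = timedelta(seconds=3600 * 7)

idle_for = idle_message - last_command
overshoot = idle_for - max_idle_time

print("idle for:", idle_for)
print("past the threshold by:", overshoot)
```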
=======================
Exceptions on master
=======================
The following exceptions (total 3) were detected on cb-seamonkey-linuxmaster-01 master01:
Exception in /builds/buildbot/master01/master/twistd.log.1:
2011-10-01 19:19:32-0700 [Broker,819,63.245.212.102] Unhandled Error
Traceback (most recent call last):
  File "/builds/buildbot/master01/lib/python2.6/site-packages/twisted/spread/banana.py", line 153, in gotItem
    self.callExpressionReceived(item)
  File "/builds/buildbot/master01/lib/python2.6/site-packages/twisted/spread/banana.py", line 116, in callExpressionReceived
    self.expressionReceived(obj)
  File "/builds/buildbot/master01/lib/python2.6/site-packages/twisted/spread/pb.py", line 514, in expressionReceived
    method(*sexp[1:])
  File "/builds/buildbot/master01/lib/python2.6/site-packages/twisted/spread/pb.py", line 826, in proto_message
    self._recvMessage(self.localObjectForID, requestID, objectID, message, answerRequired, netArgs, netKw)
--- <exception caught here> ---
  File "/builds/buildbot/master01/lib/python2.6/site-packages/twisted/spread/pb.py", line 840, in _recvMessage
    netResult = object.remoteMessageReceived(self, message, netArgs, netKw)
  File "/builds/buildbot/master01/lib/python2.6/site-packages/twisted/spread/pb.py", line 223, in perspectiveMessageReceived
    method = getattr(self, "perspective_%s" % message)
exceptions.AttributeError: BuildSlave instance has no attribute 'perspective_shutdown'
--------------------------------------------------------------------------------
Exception in /builds/buildbot/master01/master/twistd.log.1:
2011-10-01 19:19:48-0700 [Broker,571,63.245.210.16] Unhandled Error
Traceback (most recent call last):
  File "/builds/buildbot/master01/lib/python2.6/site-packages/twisted/spread/banana.py", line 153, in gotItem
    self.callExpressionReceived(item)
  File "/builds/buildbot/master01/lib/python2.6/site-packages/twisted/spread/banana.py", line 116, in callExpressionReceived
    self.expressionReceived(obj)
  File "/builds/buildbot/master01/lib/python2.6/site-packages/twisted/spread/pb.py", line 514, in expressionReceived
    method(*sexp[1:])
  File "/builds/buildbot/master01/lib/python2.6/site-packages/twisted/spread/pb.py", line 826, in proto_message
    self._recvMessage(self.localObjectForID, requestID, objectID, message, answerRequired, netArgs, netKw)
--- <exception caught here> ---
  File "/builds/buildbot/master01/lib/python2.6/site-packages/twisted/spread/pb.py", line 840, in _recvMessage
    netResult = object.remoteMessageReceived(self, message, netArgs, netKw)
  File "/builds/buildbot/master01/lib/python2.6/site-packages/twisted/spread/pb.py", line 223, in perspectiveMessageReceived
    method = getattr(self, "perspective_%s" % message)
exceptions.AttributeError: BuildSlave instance has no attribute 'perspective_shutdown'
--------------------------------------------------------------------------------
Exception in /builds/buildbot/master01/master/twistd.log.1:
2011-10-01 19:24:44-0700 [Broker,567,63.245.210.36] Unhandled Error
Traceback (most recent call last):
  File "/builds/buildbot/master01/lib/python2.6/site-packages/twisted/spread/banana.py", line 153, in gotItem
    self.callExpressionReceived(item)
  File "/builds/buildbot/master01/lib/python2.6/site-packages/twisted/spread/banana.py", line 116, in callExpressionReceived
    self.expressionReceived(obj)
  File "/builds/buildbot/master01/lib/python2.6/site-packages/twisted/spread/pb.py", line 514, in expressionReceived
    method(*sexp[1:])
  File "/builds/buildbot/master01/lib/python2.6/site-packages/twisted/spread/pb.py", line 826, in proto_message
    self._recvMessage(self.localObjectForID, requestID, objectID, message, answerRequired, netArgs, netKw)
--- <exception caught here> ---
  File "/builds/buildbot/master01/lib/python2.6/site-packages/twisted/spread/pb.py", line 840, in _recvMessage
    netResult = object.remoteMessageReceived(self, message, netArgs, netKw)
  File "/builds/buildbot/master01/lib/python2.6/site-packages/twisted/spread/pb.py", line 223, in perspectiveMessageReceived
    method = getattr(self, "perspective_%s" % message)
exceptions.AttributeError: BuildSlave instance has no attribute 'perspective_shutdown'
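The tracebacks come from name-based dispatch: the master looks up a method called `perspective_<message>` with `getattr`, and a 0.8.2 master has no `perspective_shutdown`. The toy below is a simplified Python 3 sketch of that lookup only; it is not Twisted's real `perspectiveMessageReceived` signature, and the class names and the `perspective_keepalive` method are hypothetical.

```python
class ToyPerspective:
    """Simplified name-based dispatch, loosely modelled on the getattr call
    in the traceback: remote message "foo" is routed to perspective_foo."""

    def dispatch(self, message, *args):
        # Raises AttributeError when no perspective_<message> method exists,
        # which the 0.8.2 master logged as an Unhandled Error.
        method = getattr(self, "perspective_%s" % message)
        return method(*args)


class OldMasterSideSlave(ToyPerspective):
    """Hypothetical stand-in for a master that predates slave-initiated shutdown."""

    def perspective_keepalive(self):
        return "ok"


class NewMasterSideSlave(OldMasterSideSlave):
    """Hypothetical stand-in for a 0.8.3+ master that accepts the request."""

    def perspective_shutdown(self):
        return "shutdown scheduled after running builds"


try:
    OldMasterSideSlave().dispatch("shutdown")
except AttributeError as exc:
    # The message names the missing perspective_shutdown method.
    print("old master:", exc)

print("new master:", NewMasterSideSlave().dispatch("shutdown"))
```

On an old master the error is logged and the request simply fails; nothing on the master side breaks, which is consistent with these exceptions being harmless.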
Comment 1 (Assignee) • 14 years ago
I rebooted all the slaves except cn-sea-qm-centos5-01, which I'm leaving in its "twisted/buildbot is shut down but machine still up" state for now, in case it helps isolate the issue.
Comment 2 • 14 years ago
So the exceptions are normal. The slave tries to gracefully shut itself down, but the old masters don't support it, so they log an exception and the slave falls back to just rebooting.
My guess is that the command the slaves use to reboot isn't working, for whatever reason. On Linux, that's running 'sudo reboot'. Should that work?
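A simplified, synchronous Python 3 sketch of the idle path described above: ask the master for a graceful shutdown, and if the master cannot handle it, run the reboot command right away. The function name, the use of `AttributeError` as the "master does not support it" signal, and the return convention are all hypothetical; the real buildslave Idleizer is Twisted-based and its exact error handling may differ. In this sketch, checking the reboot command's exit status is what surfaces a failure that would otherwise leave the machine up with no buildslave running.

```python
import subprocess
import sys
from typing import Callable, List


def reboot_when_idle(request_shutdown: Callable[[], None],
                     reboot_cmd: List[str]) -> int:
    """Hypothetical idle handler.

    Returns the reboot command's exit status, or -1 if the master
    accepted the graceful-shutdown request instead.
    """
    try:
        # Ask the master to let us go once running builds finish.
        request_shutdown()
    except AttributeError:
        # Pre-0.8.3 masters lack perspective_shutdown: reboot immediately.
        result = subprocess.run(reboot_cmd, capture_output=True, text=True)
        if result.returncode != 0:
            # Report the failure instead of silently staying up.
            print("reboot failed with exit status", result.returncode,
                  result.stderr.strip(), file=sys.stderr)
        return result.returncode
    return -1
```

On Linux, `reboot_cmd` would presumably be `["sudo", "reboot"]`; substituting a harmless command that exits non-zero is one way to exercise the failure branch without rebooting anything.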
Comment 3 (Assignee) • 14 years ago
Yes, it certainly should, and it's what I used when rebooting them manually myself.
Comment 4 • 14 years ago
So I would recommend shortening the time-scale (in buildbot.tac), and then watching a machine try to reboot. Does it go down for reboot and then abort?
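One way to shorten the time-scale is to keep the Idleizer keyword arguments in one place and swap in test values temporarily. The keyword names below match the `idleizer.Idleizer(...)` call in buildbot.tac; the dictionary names and the short test values are hypothetical and only illustrative.

```python
# Production values, as in buildbot.tac.
PROD_IDLEIZER_KWARGS = {
    "max_idle_time": 3600 * 7,          # 7 hours idle before a reboot
    "max_disconnected_time": 3600 * 1,  # 1 hour disconnected before a reboot
}

# Hypothetical test values: short enough to watch the reboot path happen.
TEST_IDLEIZER_KWARGS = {
    "max_idle_time": 60 * 10,          # 10 minutes idle before a reboot
    "max_disconnected_time": 60 * 5,   # 5 minutes disconnected before a reboot
}

# In buildbot.tac this would be used as (needs buildslave and `s`, so it is
# not runnable on its own):
#   idlz = idleizer.Idleizer(s, **TEST_IDLEIZER_KWARGS)
```

Keeping both sets side by side makes it harder to leave the short test thresholds in place by accident.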
Comment 5 (Assignee) • 14 years ago
OK, with Dustin's help I learned that the problem was buildbot failing to run 'sudo reboot' properly.
It did work from the command line, but buildbot was configured not to use a tty, and of course sudoers was configured to require one. This was fixed on Firefox's end in Bug 649683.
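buildbot.tac sets `usepty = False`, which presumably corresponds to the "configured to not use a tty" part; sudo's `requiretty` option then refuses to run without a terminal. The Python 3 sketch below only illustrates the tty difference by checking whether a child process's stdin is a terminal, with and without a pseudo-terminal from the standard `pty` module (Unix only). It does not reproduce sudo's own check, which looks for a controlling terminal rather than a terminal on stdin. The helper name is hypothetical.

```python
import os
import pty
import subprocess
import sys

# The child only reports whether its stdin is a terminal.
CHILD = "import sys; print(sys.stdin.isatty())"


def child_sees_tty(use_pty: bool) -> bool:
    """Run a Python child with or without a pseudo-terminal on its stdin."""
    if not use_pty:
        # Roughly the situation for commands run without a pty.
        proc = subprocess.run([sys.executable, "-c", CHILD],
                              stdin=subprocess.DEVNULL,
                              capture_output=True, text=True)
        return proc.stdout.strip() == "True"
    master_fd, slave_fd = pty.openpty()
    try:
        # Hand the pty's slave end to the child as its stdin.
        proc = subprocess.run([sys.executable, "-c", CHILD],
                              stdin=slave_fd,
                              capture_output=True, text=True)
    finally:
        os.close(master_fd)
        os.close(slave_fd)
    return proc.stdout.strip() == "True"


print("without pty:", child_sees_tty(False))
print("with pty:", child_sees_tty(True))
```

This is also why the command worked from an interactive shell: a login session has a terminal, while a daemonized buildslave does not.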
When I updated these slaves, I didn't catch that sudoers was now hosted in puppet (http://mxr.mozilla.org/build/source/puppet-manifests/modules/sudoers/templates/sudoers.erb), and Dustin grabbed the file from production-puppet-files, which in this case was a bit older.
Attached is the diff of the sudoers file I just deployed against what was on the machines.
I made sure it was owned by root/root with mode 0440 (chmod 0440).
Then I rebooted the last slave.
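sudo checks the ownership and mode of its sudoers file and may refuse to run if they look unsafe, so the root/root, 0440 step mentioned above matters. A small Python 3 sketch for verifying that on a slave follows; the helper names are hypothetical, and it uses only `os.stat` and `stat.S_IMODE` from the standard library.

```python
import os
import stat


def describe_perms(path: str):
    """Return (uid, gid, mode) for path, with the mode as an octal string."""
    st = os.stat(path)
    return st.st_uid, st.st_gid, oct(stat.S_IMODE(st.st_mode))


def sudoers_perms_ok(path: str) -> bool:
    """True if path is owned by root:root (uid 0, gid 0) with mode 0440."""
    st = os.stat(path)
    return (st.st_uid == 0 and st.st_gid == 0
            and stat.S_IMODE(st.st_mode) == 0o440)
```

For example, `sudoers_perms_ok("/etc/sudoers")` should be true on a correctly deployed slave.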
Updated (Assignee) • 14 years ago
Assignee: nobody → bugspam.Callek
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED