Bug 918932 (Closed)
Opened 12 years ago; closed 12 years ago
slaveapi doesn't escalate reboots properly when exceptions are thrown
Categories: Release Engineering :: General, defect
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: bhearsum, Assigned: bhearsum
Attachments (1 file)
Attachment #817823: reboot-escalation-slaveapi.diff (1.59 KB, patch); jhopkins: review+; bhearsum: checked-in+
Description
2013-09-20 11:17:43,278 - INFO - talos-r4-lion-003 - Getting inventory info
2013-09-20 11:17:43,717 - INFO - talos-r4-lion-003 - Getting bug info
2013-09-20 11:17:43,717 - INFO - Sending request: GET https://bugzilla-dev.allizom.org/rest/bug/talos-r4-lion-003
2013-09-20 11:17:44,005 - INFO - Got response: 200
2013-09-20 11:17:44,006 - INFO - 10.12.51.165 - Attempting to reboot
2013-09-20 11:18:05,006 - ERROR - talos-r4-lion-003 - Caught exception.
Traceback (most recent call last):
  File "/builds/slaveapi/dev/lib/python2.7/site-packages/slaveapi/actions/reboot.py", line 19, in reboot
    console.reboot()
  File "/builds/slaveapi/dev/lib/python2.7/site-packages/slaveapi/clients/ssh.py", line 137, in reboot
    rc, output = self.run_cmd(cmd)
  File "/builds/slaveapi/dev/lib/python2.7/site-packages/slaveapi/clients/ssh.py", line 81, in run_cmd
    self.connect()
  File "/builds/slaveapi/dev/lib/python2.7/site-packages/slaveapi/clients/ssh.py", line 50, in connect
    self.client.connect(hostname=self.fqdn, username=username, password=p, timeout=timeout, look_for_keys=False, allow_agent=False)
  File "/builds/slaveapi/dev/lib/python2.7/site-packages/paramiko/client.py", line 305, in connect
    retry_on_signal(lambda: sock.connect(addr))
  File "/builds/slaveapi/dev/lib/python2.7/site-packages/paramiko/util.py", line 278, in retry_on_signal
    return function()
  File "/builds/slaveapi/dev/lib/python2.7/site-packages/paramiko/client.py", line 305, in <lambda>
    retry_on_signal(lambda: sock.connect(addr))
  File "/builds/slaveapi/dev/lib/python2.7/site-packages/gevent/socket.py", line 384, in connect
    raise error(err, strerror(err))
error: [Errno 110] Connection timed out
2013-09-20 11:18:05,006 - INFO - pdu1.r102-1.build.scl1.mozilla.com - Powercycling via PDU.
2013-09-20 11:18:06,210 - INFO - 10.26.48.43 - - [2013-09-20 11:18:06] "GET /slave/talos-r4-lion-003/action/reboot HTTP/1.1" 200 190 0.000954
2013-09-20 11:18:11,024 - INFO - 10.26.48.43 - - [2013-09-20 11:18:11] "GET /slave/bld-centos6-hp-008/action/reboot HTTP/1.1" 200 829 0.001024
2013-09-20 11:18:11,043 - ERROR - Something went wrong while processing!
Traceback (most recent call last):
  File "/builds/slaveapi/dev/lib/python2.7/site-packages/slaveapi/processor.py", line 58, in _worker
    res, msg = action(slave, *args, **kwargs)
  File "/builds/slaveapi/dev/lib/python2.7/site-packages/slaveapi/actions/reboot.py", line 35, in reboot
    slave.pdu.powercycle()
  File "/builds/slaveapi/dev/lib/python2.7/site-packages/slaveapi/clients/pdu.py", line 34, in powercycle
    self.poweroff()
  File "/builds/slaveapi/dev/lib/python2.7/site-packages/slaveapi/clients/pdu.py", line 27, in poweroff
    self._run_cmd(self.off_cmd)
  File "/builds/slaveapi/dev/lib/python2.7/site-packages/slaveapi/clients/pdu.py", line 45, in _run_cmd
    return check_output(full_cmd, stderr=STDOUT)
  File "/builds/slaveapi/dev/lib/python2.7/site-packages/gevent_subprocess/gevent_subprocess.py", line 371, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
CalledProcessError: Command '['snmpset', '-v', '1', '-c', 'private', u'pdu1.r102-1.build.scl1.mozilla.com', u'1.3.6.1.4.1.1718.3.2.3.1.11.1.2.5', 'i', '2']' returned non-zero exit status 1
It should escalate to a bug modification (updating the slave's bug) in this case. Probably just needs a try/except somewhere around the PDU powercycle call.
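A minimal sketch of the escalation the report asks for, assuming each reboot method is a callable that returns True on success. The names `reboot_with_escalation`, `ssh_reboot`, `pdu_reboot`, and the `escalate` callback are hypothetical and are not slaveapi's actual API. The point is that every attempt, not only the SSH one, sits inside its own try/except, so an exception like the `CalledProcessError` above falls through to the next step instead of escaping to the worker.

```python
import logging
import subprocess

log = logging.getLogger("reboot-escalation-sketch")


def reboot_with_escalation(slave_name, methods, escalate):
    """Try each (label, callable) reboot method in order (hypothetical helper).

    Each callable returns True on success. An exception from any method is
    logged and treated as a failure, so later methods and the final
    escalation still run. Returns the label that worked, or "bug" after
    calling escalate(slave_name) when every method failed.
    """
    for label, method in methods:
        try:
            if method():
                log.info("%s - rebooted via %s", slave_name, label)
                return label
            log.warning("%s - %s reboot reported failure", slave_name, label)
        except Exception:
            # Without this handler the exception would escape to the caller,
            # skipping the remaining methods and the escalation entirely.
            log.exception("%s - %s reboot raised, trying next step", slave_name, label)
    escalate(slave_name)
    return "bug"


# Usage: both hypothetical methods fail by raising, so the slave is escalated.
def ssh_reboot():
    raise OSError(110, "Connection timed out")


def pdu_reboot():
    raise subprocess.CalledProcessError(1, ["snmpset"])


escalated = []
result = reboot_with_escalation(
    "talos-r4-lion-003", [("ssh", ssh_reboot), ("pdu", pdu_reboot)], escalated.append
)
print(result, escalated)  # bug ['talos-r4-lion-003']
```

Catching the broad `Exception` per step is a deliberate choice in this sketch: any failure in one reboot mechanism should only mean "try the next one", never abort the whole action.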
Comment 1 • 12 years ago (Assignee)
Looks like this is only true for PDU/IPMI reboots; SSH reboots are already wrapped in a try/except. This patch should fix the PDU/IPMI ones and improve the logging a little.
Attachment #817823 - Flags: review?(jhopkins)
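The PDU/IPMI fix described in Comment 1 amounts to turning a failed PDU command into an ordinary "reboot didn't work" result. The sketch below assumes a hypothetical `PDUSketch` class whose off/on commands are passed in as argument lists; it uses the standard library's `subprocess.check_output` in place of the `gevent_subprocess` version seen in the traceback, and it is not the contents of attachment 817823.

```python
import logging
import subprocess
import sys

log = logging.getLogger("pdu-sketch")


class PDUSketch(object):
    """Hypothetical PDU client whose powercycle() reports failure instead of raising."""

    def __init__(self, name, off_cmd, on_cmd):
        self.name = name
        self.off_cmd = off_cmd  # argument list, e.g. an snmpset invocation
        self.on_cmd = on_cmd

    def _run_cmd(self, cmd):
        # Raises CalledProcessError on a non-zero exit status, which is how
        # the snmpset failure in the traceback above surfaced.
        return subprocess.check_output(cmd, stderr=subprocess.STDOUT)

    def powercycle(self):
        # Only command failures are converted to False here; other errors
        # (e.g. OSError when the binary is missing) still propagate.
        try:
            self._run_cmd(self.off_cmd)
            self._run_cmd(self.on_cmd)
        except subprocess.CalledProcessError as e:
            # Log the command's combined stdout/stderr so the failure is diagnosable.
            log.error("%s - powercycle failed (exit %d): %r", self.name, e.returncode, e.output)
            return False
        return True


# Usage: a stand-in "off" command that exits 1, like the failing snmpset call.
failing = PDUSketch(
    "pdu1-example",
    [sys.executable, "-c", "import sys; sys.exit(1)"],
    [sys.executable, "-c", "pass"],
)
print(failing.powercycle())  # False
```

Returning False keeps the caller's control flow simple: the reboot action can check the result and move on to the next escalation step.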
Updated • 12 years ago
Attachment #817823 - Flags: review?(jhopkins) → review+
Comment 2 • 12 years ago (Assignee)
Comment on attachment 817823 (reboot-escalation-slaveapi.diff)
Landed. Will need to roll a new version of the package to deploy; going to wait to see if there are more changes in the near future before doing that.
Attachment #817823 - Flags: checked-in+
Comment 3 • 12 years ago (Assignee)
The slaveapi hosts are getting upgraded for this along with bug 922858.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updated • 8 years ago
Component: Tools → General