Closed
Bug 869447
Opened 13 years ago
Closed 9 years ago
zombie jobs when sql query fails
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
INCOMPLETE
People
(Reporter: catlee, Unassigned)
Details
(Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2075] [buildbot])
http://cruncher.build.mozilla.org/~catlee/reportor/2013-05-07:14/long_jobs/long_jobs.html
lists jobs that have been running for more than 12 hours.
Many of these jobs have actually completed. However, the update to the DB failed:
2013-05-06 11:20:56-0700 [Broker,21007,10.26.56.167] <Build Ubuntu HW 12.04 birch pgo talos chromez>: build finished
2013-05-06 11:21:03-0700 [Broker,21007,10.26.56.167] setting expectations for next time
2013-05-06 11:21:03-0700 [Broker,21007,10.26.56.167] new expectations: 518.070086718 seconds
2013-05-06 11:21:04-0700 [Broker,21007,10.26.56.167] rollback failed, will reconnect next query
2013-05-06 11:21:04-0700 [Broker,21007,10.26.56.167] Unhandled Error
Traceback (most recent call last):
  File "/builds/buildbot/tests1-linux/lib/python2.7/site-packages/twisted/internet/defer.py", line 441, in _runCallbacks
    self.result = callback(self.result, *args, **kw)
  File "/builds/buildbot/tests1-linux/lib/python2.7/site-packages/buildbot-0.8.2_hg_bccbfc2a314f_production_0.8-py2.7.egg/buildbot/process/builder.py", line 934, in buildFinished
    self.db.builds_finished(bids)
  File "/builds/buildbot/tests1-linux/lib/python2.7/site-packages/buildbot-0.8.2_hg_bccbfc2a314f_production_0.8-py2.7.egg/buildbot/db/connector.py", line 931, in builds_finished
    return self.runInteractionNow(self._txn_build_finished, bids)
  File "/builds/buildbot/tests1-linux/lib/python2.7/site-packages/buildbot-0.8.2_hg_bccbfc2a314f_production_0.8-py2.7.egg/buildbot/db/connector.py", line 212, in runInteractionNow
    return self._runInteractionNow(interaction, *args, **kwargs)
--- <exception caught here> ---
  File "/builds/buildbot/tests1-linux/lib/python2.7/site-packages/buildbot-0.8.2_hg_bccbfc2a314f_production_0.8-py2.7.egg/buildbot/db/connector.py", line 244, in _runInteractionNow
    conn.rollback()
_mysql_exceptions.OperationalError: (2006, 'MySQL server has gone away')
The current state of the DB is:
mysql> select * from buildrequests where id=23866634;
+----------+------------+-----------------------------------------+----------+------------+------------------------------------------------------------------------------------+------------------------+----------+---------+--------------+-------------+
| id       | buildsetid | buildername                             | priority | claimed_at | claimed_by_name                                                                    | claimed_by_incarnation | complete | results | submitted_at | complete_at |
+----------+------------+-----------------------------------------+----------+------------+------------------------------------------------------------------------------------+------------------------+----------+---------+--------------+-------------+
| 23866634 |    6294394 | Ubuntu HW 12.04 birch pgo talos chromez |        0 | 1367936901 | buildbot-master52.srv.releng.use1.mozilla.com:/builds/buildbot/tests1-linux/master | pid621-boot1366727682  |        0 |    NULL |   1367863931 |        NULL |
+----------+------------+-----------------------------------------+----------+------------+------------------------------------------------------------------------------------+------------------------+----------+---------+--------------+-------------+
1 row in set (0.00 sec)
mysql> select * from builds where brid=23866634;
+----------+--------+----------+------------+-------------+
| id       | number | brid     | start_time | finish_time |
+----------+--------+----------+------------+-------------+
| 24100785 |      5 | 23866634 | 1367863938 |        NULL |
+----------+--------+----------+------------+-------------+
1 row in set (0.00 sec)
Interestingly, the master is still claiming the build, so it's not getting automatically re-built.
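The stuck state shown above (request claimed, complete = 0, build row with finish_time NULL) can be searched for directly. What follows is only a sketch under stated assumptions, not something buildbot ships: find_zombie_candidates and build_demo_db are hypothetical names, the demo schema keeps only the columns visible in the query output above, and it assumes an unclaimed request stores claimed_at = 0. It runs against an in-memory SQLite database so it is self-contained; against the real MySQL database the placeholder style would be %s rather than ?.

```python
import sqlite3

# Hypothetical helper, not part of buildbot. Table and column names are
# copied from the query output above; the 12-hour cutoff mirrors the
# long_jobs report.
ZOMBIE_SQL = """
SELECT br.id, br.buildername, b.id, b.start_time
FROM buildrequests AS br
JOIN builds AS b ON b.brid = br.id
WHERE br.complete = 0
  AND br.claimed_at != 0      -- assumption: 0 means "not claimed"
  AND b.finish_time IS NULL
  AND b.start_time < ?
ORDER BY b.start_time
"""


def find_zombie_candidates(conn, now, max_age=12 * 3600):
    """Return (request id, builder name, build id, age in seconds) tuples
    for builds that started more than max_age seconds before now and
    were never marked finished."""
    cutoff = now - max_age
    rows = conn.execute(ZOMBIE_SQL, (cutoff,)).fetchall()
    return [(brid, name, bid, now - start) for brid, name, bid, start in rows]


def build_demo_db():
    """Create an in-memory database holding the two rows from this bug."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE buildrequests (
            id INTEGER PRIMARY KEY, buildername TEXT,
            claimed_at INTEGER, complete INTEGER, complete_at INTEGER);
        CREATE TABLE builds (
            id INTEGER PRIMARY KEY, number INTEGER, brid INTEGER,
            start_time INTEGER, finish_time INTEGER);
    """)
    conn.execute(
        "INSERT INTO buildrequests VALUES (?, ?, ?, ?, ?)",
        (23866634, "Ubuntu HW 12.04 birch pgo talos chromez",
         1367936901, 0, None))
    conn.execute(
        "INSERT INTO builds VALUES (?, ?, ?, ?, ?)",
        (24100785, 5, 23866634, 1367863938, None))
    return conn


if __name__ == "__main__":
    # Use the row's claimed_at value as "now" purely for illustration.
    for row in find_zombie_candidates(build_demo_db(), now=1367936901):
        print(row)
```

A row returned by this query is only a candidate: a job that really has been running for more than 12 hours looks the same in the database, so the master's log still has to confirm that the build finished.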
Updated • 13 years ago (Reporter)
Whiteboard: [buildbot]
Comment 1 • 13 years ago (Reporter)
All of the jobs above finished around 11:21 am yesterday. The SQL server or network must have hiccuped at that time.
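If the cause was a transient hiccup like this, one possible mitigation is to retry the whole interaction on a fresh connection instead of letting a failed rollback end the attempt. The block below is a minimal sketch of that idea, not buildbot's connector code: run_interaction_with_retry, TransientDBError, and FakeConnection are hypothetical names, and with MySQLdb the transient tuple would presumably hold its OperationalError class. It is only safe when the interaction is idempotent, which marking a build finished should be.

```python
import time


class TransientDBError(Exception):
    """Stand-in for a driver error such as MySQL error 2006."""


def run_interaction_with_retry(connect, interaction, *args, attempts=3,
                               delay=0.0, transient=(TransientDBError,)):
    """Run interaction(cursor, *args) in a transaction, retrying on a
    fresh connection when a transient error occurs (hypothetical helper).
    Non-transient errors propagate immediately."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        conn = None
        try:
            conn = connect()
            result = interaction(conn.cursor(), *args)
            conn.commit()
            return result
        except transient as exc:
            last_error = exc
            if conn is not None:
                try:
                    conn.rollback()
                except transient:
                    pass  # connection is already dead; discard it below
        finally:
            if conn is not None:
                try:
                    conn.close()
                except transient:
                    pass
        if attempt + 1 < attempts:
            time.sleep(delay)
    raise last_error


class FakeConnection:
    """Tiny fake used only to exercise the helper without a database."""

    def __init__(self, broken):
        self.broken = broken
        self.committed = False

    def cursor(self):
        return self

    def execute(self, sql, params=()):
        if self.broken:
            raise TransientDBError(2006, "MySQL server has gone away")

    def commit(self):
        self.committed = True

    def rollback(self):
        if self.broken:
            raise TransientDBError(2006, "MySQL server has gone away")

    def close(self):
        pass
```

In the fake, a broken connection fails both the query and the rollback, which is the combination seen in the traceback; the helper swallows the rollback error and moves on to a new connection. The retry count and delay are arbitrary illustration values.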
Comment 2 • 13 years ago
What time did you reconfig yesterday?
Comment 3 • 13 years ago (Reporter)
On bm52, the reconfig happened from 10:54:37 to 10:59:34
Updated • 12 years ago (Assignee)
Product: mozilla.org → Release Engineering
Updated • 11 years ago
Whiteboard: [buildbot] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2065] [buildbot]
Updated • 11 years ago
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2065] [buildbot] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2075] [buildbot]
Updated • 9 years ago (Reporter)
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → INCOMPLETE
Updated • 8 years ago (Assignee)
Component: General Automation → General