Closed Bug 869447 Opened 13 years ago Closed 9 years ago

zombie jobs when sql query fails

Categories

(Release Engineering :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: catlee, Unassigned)

Details

(Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2075] [buildbot])

http://cruncher.build.mozilla.org/~catlee/reportor/2013-05-07:14/long_jobs/long_jobs.html has a list of jobs that have been running for more than 12 hours. Many of these jobs have actually completed. However, the update to the DB failed:

2013-05-06 11:20:56-0700 [Broker,21007,10.26.56.167] <Build Ubuntu HW 12.04 birch pgo talos chromez>: build finished
2013-05-06 11:21:03-0700 [Broker,21007,10.26.56.167] setting expectations for next time
2013-05-06 11:21:03-0700 [Broker,21007,10.26.56.167] new expectations: 518.070086718 seconds
2013-05-06 11:21:04-0700 [Broker,21007,10.26.56.167] rollback failed, will reconnect next query
2013-05-06 11:21:04-0700 [Broker,21007,10.26.56.167] Unhandled Error
    Traceback (most recent call last):
      File "/builds/buildbot/tests1-linux/lib/python2.7/site-packages/twisted/internet/defer.py", line 441, in _runCallbacks
        self.result = callback(self.result, *args, **kw)
      File "/builds/buildbot/tests1-linux/lib/python2.7/site-packages/buildbot-0.8.2_hg_bccbfc2a314f_production_0.8-py2.7.egg/buildbot/process/builder.py", line 934, in buildFinished
        self.db.builds_finished(bids)
      File "/builds/buildbot/tests1-linux/lib/python2.7/site-packages/buildbot-0.8.2_hg_bccbfc2a314f_production_0.8-py2.7.egg/buildbot/db/connector.py", line 931, in builds_finished
        return self.runInteractionNow(self._txn_build_finished, bids)
      File "/builds/buildbot/tests1-linux/lib/python2.7/site-packages/buildbot-0.8.2_hg_bccbfc2a314f_production_0.8-py2.7.egg/buildbot/db/connector.py", line 212, in runInteractionNow
        return self._runInteractionNow(interaction, *args, **kwargs)
    --- <exception caught here> ---
      File "/builds/buildbot/tests1-linux/lib/python2.7/site-packages/buildbot-0.8.2_hg_bccbfc2a314f_production_0.8-py2.7.egg/buildbot/db/connector.py", line 244, in _runInteractionNow
        conn.rollback()
    _mysql_exceptions.OperationalError: (2006, 'MySQL server has gone away')

The current state of the DB is:

mysql> select * from buildrequests where id=23866634;
+----------+------------+-----------------------------------------+----------+------------+------------------------------------------------------------------------------------+------------------------+----------+---------+--------------+-------------+
| id       | buildsetid | buildername                             | priority | claimed_at | claimed_by_name                                                                    | claimed_by_incarnation | complete | results | submitted_at | complete_at |
+----------+------------+-----------------------------------------+----------+------------+------------------------------------------------------------------------------------+------------------------+----------+---------+--------------+-------------+
| 23866634 |    6294394 | Ubuntu HW 12.04 birch pgo talos chromez |        0 | 1367936901 | buildbot-master52.srv.releng.use1.mozilla.com:/builds/buildbot/tests1-linux/master | pid621-boot1366727682  |        0 |    NULL |   1367863931 |        NULL |
+----------+------------+-----------------------------------------+----------+------------+------------------------------------------------------------------------------------+------------------------+----------+---------+--------------+-------------+
1 row in set (0.00 sec)

mysql> select * from builds where brid=23866634;
+----------+--------+----------+------------+-------------+
| id       | number | brid     | start_time | finish_time |
+----------+--------+----------+------------+-------------+
| 24100785 |      5 | 23866634 | 1367863938 |        NULL |
+----------+--------+----------+------------+-------------+
1 row in set (0.00 sec)

Interestingly, the master is still claiming the build, so it isn't getting rebuilt automatically.
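The zombie state in the two rows above can be recognised mechanically: a build with no finish_time whose build request is still claimed and not complete. A minimal sketch of such a check, assuming SQLite in place of MySQL and only the columns shown in the mysql output; the function names, the "claimed means claimed_by_name is set" rule, and the 12-hour cutoff (borrowed from the long_jobs report) are illustrative assumptions, not the query reportor actually runs.

```python
import sqlite3

# Only the columns needed here; names come from the mysql output in this bug,
# the SQLite types are an assumption.
SCHEMA = """
CREATE TABLE buildrequests (
    id INTEGER PRIMARY KEY,
    buildername TEXT,
    claimed_at INTEGER,
    claimed_by_name TEXT,
    complete INTEGER,
    complete_at INTEGER
);
CREATE TABLE builds (
    id INTEGER PRIMARY KEY,
    number INTEGER,
    brid INTEGER,
    start_time INTEGER,
    finish_time INTEGER
);
"""

MASTER = ("buildbot-master52.srv.releng.use1.mozilla.com:"
          "/builds/buildbot/tests1-linux/master")


def make_example_db():
    """In-memory DB holding the buildrequest and build rows quoted above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO buildrequests VALUES (?, ?, ?, ?, ?, ?)",
                 (23866634, "Ubuntu HW 12.04 birch pgo talos chromez",
                  1367936901, MASTER, 0, None))
    conn.execute("INSERT INTO builds VALUES (?, ?, ?, ?, ?)",
                 (24100785, 5, 23866634, 1367863938, None))
    return conn


def find_zombie_builds(conn, now, max_age=12 * 3600):
    """Builds with no finish_time that started more than max_age seconds
    before `now` and whose request is still claimed (assumed: a non-NULL
    claimed_by_name) but not complete."""
    return conn.execute(
        """
        SELECT b.id, b.brid, br.buildername, br.claimed_by_name
        FROM builds AS b
        JOIN buildrequests AS br ON br.id = b.brid
        WHERE b.finish_time IS NULL
          AND br.complete = 0
          AND br.claimed_by_name IS NOT NULL
          AND b.start_time < ?
        ORDER BY b.start_time
        """,
        (now - max_age,),
    ).fetchall()


if __name__ == "__main__":
    conn = make_example_db()
    # Using the row's claimed_at as "now": about 20 hours after start_time.
    for row in find_zombie_builds(conn, now=1367936901):
        print(row)
```

With claimed_at as the reference time the build is reported; one hour after start_time it is not, since it has not yet crossed the cutoff.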
Whiteboard: [buildbot]
All of the jobs above finished around 11:21 AM yesterday. The SQL server or the network must have hiccuped at that time.
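The traceback shows conn.rollback() itself raising error 2006 inside _runInteractionNow, so the rollback failure escapes as an Unhandled Error and the finish_time is never written. One hedged way to ride out a transient hiccup like this is to ignore a failing rollback on a dead connection, reconnect immediately, and retry the interaction once. This is a sketch, not buildbot's code: OperationalError is a local stand-in for _mysql_exceptions.OperationalError, and run_interaction_with_retry, FakeConnection, make_connect and mark_builds_finished are hypothetical names. Retrying is only safe if the interaction is idempotent, which is assumed here.

```python
# Hypothetical stand-in for _mysql_exceptions.OperationalError so the sketch
# runs without MySQLdb; args are (error code, message) as in the traceback.
class OperationalError(Exception):
    pass


SERVER_GONE_AWAY = 2006  # "MySQL server has gone away"


def run_interaction_with_retry(connect, interaction, *args, retries=1):
    """Run interaction(cursor, *args) in a transaction.

    If the server has gone away, reconnect immediately and retry up to
    `retries` times. A rollback that fails on a dead connection is ignored,
    so it cannot replace the original error.
    """
    attempt = 0
    conn = connect()
    while True:
        try:
            result = interaction(conn.cursor(), *args)
            conn.commit()
            return result
        except Exception as exc:
            try:
                conn.rollback()
            except OperationalError:
                pass  # connection already dead; keep the original error
            gone_away = (isinstance(exc, OperationalError)
                         and exc.args[:1] == (SERVER_GONE_AWAY,))
            if gone_away and attempt < retries:
                attempt += 1
                conn = connect()  # reconnect now, not on the "next query"
                continue
            raise  # re-raises exc, not the rollback error


# --- Fake connection used only to exercise the sketch (hypothetical) ---
class FakeConnection:
    def __init__(self, dead):
        self.dead = dead          # a dead connection fails every operation
        self.executed = []
        self.committed = False

    def _check(self):
        if self.dead:
            raise OperationalError(SERVER_GONE_AWAY, "MySQL server has gone away")

    def cursor(self):
        return self               # the fake doubles as its own cursor

    def execute(self, sql, params=()):
        self._check()
        self.executed.append((sql, params))

    def commit(self):
        self._check()
        self.committed = True

    def rollback(self):
        self._check()


def make_connect(dead_connections):
    """Return connect() whose first `dead_connections` connections are dead."""
    made = []

    def connect():
        conn = FakeConnection(dead=len(made) < dead_connections)
        made.append(conn)
        return conn

    return connect, made


def mark_builds_finished(cursor, bids, finish_time):
    """Hypothetical interaction: an idempotent UPDATE, so retrying is safe."""
    for bid in bids:
        cursor.execute("UPDATE builds SET finish_time = %s WHERE id = %s",
                       (finish_time, bid))
    return len(bids)


if __name__ == "__main__":
    connect, made = make_connect(dead_connections=1)
    # 1367864456 is 2013-05-06 11:20:56-0700, when the log says the build finished
    n = run_interaction_with_retry(connect, mark_builds_finished,
                                   [24100785], 1367864456)
    print(n, len(made), made[-1].committed)  # prints: 1 2 True
```

If the reconnected attempt also fails, the caller sees the original 2006 error rather than a secondary rollback failure.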
What time did you reconfig yesterday?
On bm52, the reconfig happened from 10:54:37 to 10:59:34
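Decoding the epoch columns from the DB rows and lining them up with the master log and the reconfig window gives a rough timeline. A minimal sketch, assuming the reconfig times quoted for bm52 are in the same -0700 offset as the master log; the variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Offset shown in the master log lines (-0700).
PDT = timezone(timedelta(hours=-7))


def to_local(epoch):
    """Convert a Unix timestamp from the DB to an aware datetime at -0700."""
    return datetime.fromtimestamp(epoch, tz=PDT)


# Epoch values copied from the buildrequests and builds rows.
submitted_at = to_local(1367863931)
start_time = to_local(1367863938)
claimed_at = to_local(1367936901)

# Wall-clock times from the log and from the reconfig comment
# (the reconfig times are assumed to be -0700 as well).
reconfig_start = datetime(2013, 5, 6, 10, 54, 37, tzinfo=PDT)
reconfig_end = datetime(2013, 5, 6, 10, 59, 34, tzinfo=PDT)
build_finished = datetime(2013, 5, 6, 11, 20, 56, tzinfo=PDT)
rollback_failed = datetime(2013, 5, 6, 11, 21, 4, tzinfo=PDT)

if __name__ == "__main__":
    print("reconfig:", reconfig_start, "to", reconfig_end)
    print("submitted:", submitted_at)
    print("started:", start_time)
    print("started after reconfig ended:", start_time > reconfig_end)
    print("build duration:", build_finished - start_time)
    print("reconfig end -> DB error:", rollback_failed - reconfig_end)
    print("claimed_at:", claimed_at)
```

Under that assumption, the build started at 11:12:18, after the reconfig had finished, and ran 518 seconds, close to the 518.07 seconds in the "new expectations" log line. The DB error came 21.5 minutes after the reconfig ended. The claimed_at value decodes to 07:28:21 the next morning, well after the build finished, in line with the observation that the master is still claiming the build.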
Product: mozilla.org → Release Engineering
Whiteboard: [buildbot] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2065] [buildbot]
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2065] [buildbot] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2075] [buildbot]
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → INCOMPLETE
Component: General Automation → General