Closed
Bug 869447
Opened 11 years ago
Closed 8 years ago
zombie jobs when sql query fails
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
INCOMPLETE
People
(Reporter: catlee, Unassigned)
Details
(Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2075] [buildbot])
http://cruncher.build.mozilla.org/~catlee/reportor/2013-05-07:14/long_jobs/long_jobs.html

Has a list of jobs that have been running for more than 12 hours. Many of these jobs have actually completed. However, the update to the DB failed:

2013-05-06 11:20:56-0700 [Broker,21007,10.26.56.167] <Build Ubuntu HW 12.04 birch pgo talos chromez>: build finished
2013-05-06 11:21:03-0700 [Broker,21007,10.26.56.167] setting expectations for next time
2013-05-06 11:21:03-0700 [Broker,21007,10.26.56.167] new expectations: 518.070086718 seconds
2013-05-06 11:21:04-0700 [Broker,21007,10.26.56.167] rollback failed, will reconnect next query
2013-05-06 11:21:04-0700 [Broker,21007,10.26.56.167] Unhandled Error
    Traceback (most recent call last):
      File "/builds/buildbot/tests1-linux/lib/python2.7/site-packages/twisted/internet/defer.py", line 441, in _runCallbacks
        self.result = callback(self.result, *args, **kw)
      File "/builds/buildbot/tests1-linux/lib/python2.7/site-packages/buildbot-0.8.2_hg_bccbfc2a314f_production_0.8-py2.7.egg/buildbot/process/builder.py", line 934, in buildFinished
        self.db.builds_finished(bids)
      File "/builds/buildbot/tests1-linux/lib/python2.7/site-packages/buildbot-0.8.2_hg_bccbfc2a314f_production_0.8-py2.7.egg/buildbot/db/connector.py", line 931, in builds_finished
        return self.runInteractionNow(self._txn_build_finished, bids)
      File "/builds/buildbot/tests1-linux/lib/python2.7/site-packages/buildbot-0.8.2_hg_bccbfc2a314f_production_0.8-py2.7.egg/buildbot/db/connector.py", line 212, in runInteractionNow
        return self._runInteractionNow(interaction, *args, **kwargs)
    --- <exception caught here> ---
      File "/builds/buildbot/tests1-linux/lib/python2.7/site-packages/buildbot-0.8.2_hg_bccbfc2a314f_production_0.8-py2.7.egg/buildbot/db/connector.py", line 244, in _runInteractionNow
        conn.rollback()
    _mysql_exceptions.OperationalError: (2006, 'MySQL server has gone away')

The current state of the db is

mysql> select * from buildrequests where id=23866634;
+----------+------------+-----------------------------------------+----------+------------+------------------------------------------------------------------------------------+------------------------+----------+---------+--------------+-------------+
| id       | buildsetid | buildername                             | priority | claimed_at | claimed_by_name                                                                    | claimed_by_incarnation | complete | results | submitted_at | complete_at |
+----------+------------+-----------------------------------------+----------+------------+------------------------------------------------------------------------------------+------------------------+----------+---------+--------------+-------------+
| 23866634 |    6294394 | Ubuntu HW 12.04 birch pgo talos chromez |        0 | 1367936901 | buildbot-master52.srv.releng.use1.mozilla.com:/builds/buildbot/tests1-linux/master | pid621-boot1366727682  |        0 |    NULL |   1367863931 |        NULL |
+----------+------------+-----------------------------------------+----------+------------+------------------------------------------------------------------------------------+------------------------+----------+---------+--------------+-------------+
1 row in set (0.00 sec)

mysql> select * from builds where brid=23866634;
+----------+--------+----------+------------+-------------+
| id       | number | brid     | start_time | finish_time |
+----------+--------+----------+------------+-------------+
| 24100785 |      5 | 23866634 | 1367863938 |        NULL |
+----------+--------+----------+------------+-------------+
1 row in set (0.00 sec)

Interestingly, the master is still claiming the build, so it's not getting automatically re-built.
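For anyone trying to spot rows in this state, the query below is a rough sketch of how the stuck requests could be listed. It is not the code behind the long_jobs report. It uses an in-memory SQLite database with a cut-down copy of the two tables shown above (only some of the columns), and the function names and the second "finished" row are made up for illustration.

```python
import sqlite3

TWELVE_HOURS = 12 * 60 * 60  # threshold taken from the report description


def find_stale_requests(conn, now, max_age=TWELVE_HOURS):
    """Return (request id, buildername, start_time) for builds that started
    more than max_age seconds before `now` and never recorded a finish."""
    query = """
        SELECT br.id, br.buildername, b.start_time
        FROM buildrequests AS br
        JOIN builds AS b ON b.brid = br.id
        WHERE br.complete = 0
          AND b.finish_time IS NULL
          AND b.start_time < ?
        ORDER BY b.start_time
    """
    return conn.execute(query, (now - max_age,)).fetchall()


def make_demo_db():
    """Build a tiny stand-in database (assumption: reduced schema, SQLite)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE buildrequests (
            id INTEGER PRIMARY KEY, buildername TEXT, claimed_at INTEGER,
            complete INTEGER, results INTEGER, complete_at INTEGER);
        CREATE TABLE builds (
            id INTEGER PRIMARY KEY, number INTEGER, brid INTEGER,
            start_time INTEGER, finish_time INTEGER);
    """)
    # The row from this bug: build started, finish never written.
    conn.execute(
        "INSERT INTO buildrequests VALUES (?, ?, ?, 0, NULL, NULL)",
        (23866634, "Ubuntu HW 12.04 birch pgo talos chromez", 1367936901))
    conn.execute(
        "INSERT INTO builds VALUES (24100785, 5, 23866634, 1367863938, NULL)")
    # Hypothetical finished build, included only for contrast.
    conn.execute(
        "INSERT INTO buildrequests VALUES "
        "(1, 'hypothetical finished builder', 1367863000, 1, 0, 1367863500)")
    conn.execute("INSERT INTO builds VALUES (2, 1, 1, 1367863000, 1367863500)")
    return conn
```

Calling `find_stale_requests(make_demo_db(), now)` with `now` more than 12 hours after 1367863938 returns only the request from this bug. Note that a query like this cannot tell a zombie from a job that is genuinely still running; it only narrows down the candidates.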
Reporter • Updated 11 years ago
Whiteboard: [buildbot]
Reporter • Comment 1 • 11 years ago
All of the jobs above finished around 11:21am yesterday. The SQL server or network must have hiccuped at that time.
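If a short hiccup like that is the cause, one possible mitigation is to retry the "build finished" write on a fresh connection instead of letting the failed rollback surface as an unhandled error. The sketch below is only an illustration of that idea, not Buildbot's `_runInteractionNow`. `TransientDBError`, `run_with_reconnect`, `connect`, and `interaction` are hypothetical names; with MySQLdb one would presumably catch `OperationalError` and check for code 2006 instead.

```python
import time


class TransientDBError(Exception):
    """Stand-in for errors such as MySQL's 2006 'server has gone away'
    (hypothetical; a real driver raises its own exception type)."""


def run_with_reconnect(connect, interaction, max_tries=3, delay=1.0,
                       sleep=time.sleep):
    """Run interaction(conn) and commit, opening a fresh connection for
    each attempt. Re-raises the last transient error if every try fails."""
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    last_error = None
    for attempt in range(max_tries):
        conn = connect()  # new connection per attempt
        try:
            result = interaction(conn)
            conn.commit()
            return result
        except TransientDBError as exc:
            last_error = exc
            try:
                conn.rollback()
            except TransientDBError:
                # The connection is already dead, so the rollback cannot
                # succeed. Swallow it and let the next attempt reconnect.
                pass
            if attempt + 1 < max_tries:
                sleep(delay)
    raise last_error
```

Here `connect` is any zero-argument callable returning an object with `commit()` and `rollback()`, and `interaction` is the function that performs the update. Retrying is only safe if the interaction is idempotent, which marking a build as finished plausibly is; a production version would also close dead connections and log each failed attempt.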
Comment 2 • 11 years ago
What time did you reconfig yesterday?
Reporter • Comment 3 • 11 years ago
On bm52, the reconfig happened from 10:54:37 to 10:59:34
Assignee • Updated 11 years ago
Product: mozilla.org → Release Engineering
Updated 10 years ago
Whiteboard: [buildbot] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2065] [buildbot]
Updated 10 years ago
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2065] [buildbot] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2075] [buildbot]
Reporter • Updated 8 years ago
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → INCOMPLETE
Assignee • Updated 6 years ago
Component: General Automation → General