
zombie jobs when sql query fails

RESOLVED INCOMPLETE

Status

Product: Release Engineering
Component: General Automation
Status: RESOLVED INCOMPLETE
Opened: 4 years ago
Updated: a year ago

People

(Reporter: catlee, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2075] [buildbot])

(Reporter)

Description

4 years ago
http://cruncher.build.mozilla.org/~catlee/reportor/2013-05-07:14/long_jobs/long_jobs.html lists jobs that have been running for more than 12 hours.

Many of these jobs have actually completed. However, the DB update that records the build as finished failed:
2013-05-06 11:20:56-0700 [Broker,21007,10.26.56.167]  <Build Ubuntu HW 12.04 birch pgo talos chromez>: build finished
2013-05-06 11:21:03-0700 [Broker,21007,10.26.56.167]  setting expectations for next time
2013-05-06 11:21:03-0700 [Broker,21007,10.26.56.167] new expectations: 518.070086718 seconds
2013-05-06 11:21:04-0700 [Broker,21007,10.26.56.167] rollback failed, will reconnect next query 
2013-05-06 11:21:04-0700 [Broker,21007,10.26.56.167] Unhandled Error 
    Traceback (most recent call last):
      File "/builds/buildbot/tests1-linux/lib/python2.7/site-packages/twisted/internet/defer.py", line 441, in _runCallbacks
        self.result = callback(self.result, *args, **kw) 
      File "/builds/buildbot/tests1-linux/lib/python2.7/site-packages/buildbot-0.8.2_hg_bccbfc2a314f_production_0.8-py2.7.egg/buildbot/process/builder.py", line 934, in buildFinished
        self.db.builds_finished(bids)
      File "/builds/buildbot/tests1-linux/lib/python2.7/site-packages/buildbot-0.8.2_hg_bccbfc2a314f_production_0.8-py2.7.egg/buildbot/db/connector.py", line 931, in builds_finished
        return self.runInteractionNow(self._txn_build_finished, bids) 
      File "/builds/buildbot/tests1-linux/lib/python2.7/site-packages/buildbot-0.8.2_hg_bccbfc2a314f_production_0.8-py2.7.egg/buildbot/db/connector.py", line 212, in runInteractionNow
        return self._runInteractionNow(interaction, *args, **kwargs)
    --- <exception caught here> ---
      File "/builds/buildbot/tests1-linux/lib/python2.7/site-packages/buildbot-0.8.2_hg_bccbfc2a314f_production_0.8-py2.7.egg/buildbot/db/connector.py", line 244, in _runInteractionNow
        conn.rollback()
    _mysql_exceptions.OperationalError: (2006, 'MySQL server has gone away')
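
For illustration only, here is a minimal sketch of one way to keep a dropped connection from losing the "build finished" update: treat a failed rollback as a dead connection, reconnect, and retry the interaction. This is not the buildbot code. `ConnectionLost`, `FakeConnection`, and `run_interaction` are hypothetical names standing in for the driver error (2006), a DB-API connection, and `_runInteractionNow`. Retrying is only safe if the interaction is idempotent, which is assumed here.

```python
class ConnectionLost(Exception):
    """Hypothetical stand-in for a driver error such as MySQL error 2006."""


class FakeConnection:
    """Hypothetical DB-API-like connection used only to exercise the sketch."""

    def __init__(self, alive):
        self.alive = alive
        self.committed = False

    def commit(self):
        if not self.alive:
            raise ConnectionLost(2006, "MySQL server has gone away")
        self.committed = True

    def rollback(self):
        if not self.alive:
            raise ConnectionLost(2006, "MySQL server has gone away")


def run_interaction(connect, interaction, max_attempts=3):
    """Run interaction(conn) and commit, reconnecting if the connection is lost.

    connect is a zero-argument factory returning a fresh connection.
    The interaction is assumed to be idempotent, so re-running it is safe.
    """
    last_error = None
    for _ in range(max_attempts):
        conn = connect()
        try:
            result = interaction(conn)
            conn.commit()
            return result
        except ConnectionLost as exc:
            last_error = exc
            try:
                conn.rollback()
            except ConnectionLost:
                # The connection is already dead, so there is nothing to roll
                # back. Swallow the error and retry on a new connection.
                pass
    raise last_error


# First connection is dead (as in the log above), the second one works.
connections = [FakeConnection(alive=False), FakeConnection(alive=True)]
pool = iter(connections)
result = run_interaction(lambda: next(pool), lambda conn: "finished")
```

In the traceback above, the exception raised by `conn.rollback()` escapes and the update is never retried. The sketch differs mainly in catching that second failure and trying again.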

The current state of the DB is:
mysql> select * from buildrequests where id=23866634;
+----------+------------+-----------------------------------------+----------+------------+------------------------------------------------------------------------------------+------------------------+----------+---------+--------------+-------------+
| id       | buildsetid | buildername                             | priority | claimed_at | claimed_by_name                                                                    | claimed_by_incarnation | complete | results | submitted_at | complete_at |
+----------+------------+-----------------------------------------+----------+------------+------------------------------------------------------------------------------------+------------------------+----------+---------+--------------+-------------+
| 23866634 |    6294394 | Ubuntu HW 12.04 birch pgo talos chromez |        0 | 1367936901 | buildbot-master52.srv.releng.use1.mozilla.com:/builds/buildbot/tests1-linux/master | pid621-boot1366727682  |        0 |    NULL |   1367863931 |        NULL |
+----------+------------+-----------------------------------------+----------+------------+------------------------------------------------------------------------------------+------------------------+----------+---------+--------------+-------------+
1 row in set (0.00 sec)

mysql> select * from builds where brid=23866634;
+----------+--------+----------+------------+-------------+
| id       | number | brid     | start_time | finish_time |
+----------+--------+----------+------------+-------------+
| 24100785 |      5 | 23866634 | 1367863938 |        NULL |
+----------+--------+----------+------------+-------------+
1 row in set (0.00 sec)

Interestingly, the master is still claiming the build, so it's not getting automatically re-built.
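
As a rough sketch of how such candidates could be listed, the query below selects builds with no finish_time whose request is still incomplete and which started more than 12 hours ago. It uses an in-memory SQLite database with a cut-down version of the two tables shown above, not the production MySQL schema. The second job, the `find_long_running` helper, and the choice of `now` (the claimed_at value from the row above) are assumptions for the example.

```python
import sqlite3

TWELVE_HOURS = 12 * 60 * 60


def find_long_running(conn, now, threshold=TWELVE_HOURS):
    """Return (request id, builder name, build id, start_time) for unfinished
    builds that started more than `threshold` seconds before `now`."""
    cursor = conn.execute(
        """
        SELECT br.id AS brid, br.buildername, b.id AS build_id, b.start_time
        FROM builds AS b
        JOIN buildrequests AS br ON br.id = b.brid
        WHERE b.finish_time IS NULL
          AND br.complete = 0
          AND b.start_time < ?
        ORDER BY b.start_time
        """,
        (now - threshold,),
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
# Simplified subset of the columns shown in the mysql output above.
conn.executescript(
    """
    CREATE TABLE buildrequests (
        id INTEGER PRIMARY KEY, buildername TEXT,
        claimed_at INTEGER, complete INTEGER, complete_at INTEGER);
    CREATE TABLE builds (
        id INTEGER PRIMARY KEY, number INTEGER, brid INTEGER,
        start_time INTEGER, finish_time INTEGER);
    """
)
# The stuck job from this bug.
conn.execute(
    "INSERT INTO buildrequests VALUES "
    "(23866634, 'Ubuntu HW 12.04 birch pgo talos chromez', 1367936901, 0, NULL)"
)
conn.execute("INSERT INTO builds VALUES (24100785, 5, 23866634, 1367863938, NULL)")
# A hypothetical job that finished normally, to show that it is filtered out.
conn.execute(
    "INSERT INTO buildrequests VALUES (1, 'example builder', 1367863000, 1, 1367864000)"
)
conn.execute("INSERT INTO builds VALUES (2, 1, 1, 1367863100, 1367864000)")

stuck = find_long_running(conn, now=1367936901)
```

A query like this cannot tell a genuinely long job from a zombie on its own. Each hit would still need to be checked against the master log for a "build finished" line.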
(Reporter)

Updated

4 years ago
Whiteboard: [buildbot]
(Reporter)

Comment 1

4 years ago
All of the jobs above finished around 11:21am yesterday. The SQL server or network must have hiccuped at that time.
What time did you reconfig yesterday?
(Reporter)

Comment 3

4 years ago
On bm52, the reconfig happened from 10:54:37 to 10:59:34.
(Assignee)

Updated

4 years ago
Product: mozilla.org → Release Engineering

Updated

3 years ago
Whiteboard: [buildbot] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2065] [buildbot]

Updated

3 years ago
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2065] [buildbot] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2075] [buildbot]
(Reporter)

Updated

a year ago
Status: NEW → RESOLVED
Last Resolved: a year ago
Resolution: --- → INCOMPLETE