Closed
Bug 1231560
Opened 8 years ago
Closed 8 years ago
selfserve-agent is CRITICAL: PROCS CRITICAL: 0 processes with args bin/selfserve-agent
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: nthomas, Assigned: catlee)
Details
Attachments
(3 files)
14.83 KB, text/plain
1.99 KB, patch (nthomas: review+, catlee: checked-in+)
921 bytes, patch (nthomas: review+, catlee: checked-in+)
nagios has been saying this for various masters, then recovering by itself. In the logs there's this:

2015-12-09 14:49:03,106 Received {u'action': u'new_build_for_builder', u'body': {u'files': [None, None], u'when': 1449301401, u'priority': 0, u'builder_name': u'b2g_fx-team_emulator-debug_dep', u'branch': u'fx-team-selfserve', u'request_id': 1674813, u'properties': {u'mozci_request': {u'builders': [u'b2g_emulator_vm fx-team debug test mochitest-debug-16'], u'type': u'manual_backfill'}, u'branch': u'fx-team', u'revision': u'378ef42875ed296adb790020ebeee26490c026d8'}, u'revision': u'378ef42875ed296adb790020ebeee26490c026d8'}, u'who': u'mozci-bot@mozilla.com'}
2015-12-09 14:49:03,107 Loading masters from https://hg.mozilla.org/build/tools/raw-file/default/buildfarm/maintenance/production-masters.json
2015-12-09 14:49:03,349 Loading branches from https://hg.mozilla.org/build/tools/raw-file/default/buildfarm/maintenance/production-branches.json
2015-12-09 14:49:03,519 New build for b2g_fx-team_emulator-debug_dep by mozci-bot@mozilla.com of fx-team-selfserve 378ef42875ed296adb790020ebeee26490c026d8
2015-12-09 14:49:03,520 INSERT INTO sourcestamps (`branch`, `revision`, `patchid`, `repository`, `project`) VALUES (:real_branch, :revision, NULL, '', '')
2015-12-09 14:49:03,551 Created sourcestamp 12739101
2015-12-09 14:49:03,551 INSERT INTO changes (`author`, `comments`, `is_dir`, `branch`, `revision`, `revlink`, `when_timestamp`, `category`, `repository`, `project`) VALUES (:who, '', 0, :branch, :revision, NULL, :when, NULL, '', '')
2015-12-09 14:49:03,556 Created change 7182009
2015-12-09 14:49:03,557 INSERT INTO change_files (`changeid`, `filename`) VALUES (:cid, :f)
2015-12-09 14:49:03,562 Error processing message
Traceback (most recent call last):
  File "/builds/selfserve-agent/lib/python2.7/site-packages/buildapi/scripts/selfserve_agent.py", line 249, in receive_message
    retval = action_func(message_data, message)
  File "/builds/selfserve-agent/lib/python2.7/site-packages/buildapi/scripts/selfserve_agent.py", line 659, in do_new_build_for_builder
    r = self.db.execute(q, cid=cid, f=f)
  File "/builds/selfserve-agent/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1788, in execute
    return connection.execute(statement, *multiparams, **params)
  File "/builds/selfserve-agent/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1191, in execute
    params)
  File "/builds/selfserve-agent/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1271, in _execute_clauseelement
    return self.__execute_context(context)
  File "/builds/selfserve-agent/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1302, in __execute_context
    context.parameters[0], context=context)
  File "/builds/selfserve-agent/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1401, in _cursor_execute
    context)
  File "/builds/selfserve-agent/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1394, in _cursor_execute
    context)
  File "/builds/selfserve-agent/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 299, in do_execute
    cursor.execute(statement, parameters)
  File "/builds/selfserve-agent/lib/python2.7/site-packages/MySQLdb/cursors.py", line 174, in execute
    self.errorhandler(self, exc, value)
  File "/builds/selfserve-agent/lib/python2.7/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
    raise errorclass, errorvalue
OperationalError: (OperationalError) (1048, "Column 'filename' cannot be null") 'INSERT INTO change_files\n (`changeid`, `filename`)\n VALUES\n (%s, %s)\n ' (7182009L, None)

As a result the agents are restarting several times a minute.

armenzg, could you please look to see where the |'files': [None, None]| is coming from. catlee, how do we go about purging this message from the queue?
Flags: needinfo?(catlee)
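For anyone reading the traceback later, here is a minimal sketch of the failure mode. It uses an in-memory SQLite table as a stand-in for the MySQL change_files table (an assumption made so the snippet runs anywhere; only the NOT NULL constraint on filename is modelled). The insert_change_files helper is hypothetical and only mirrors what the per-file execute call in the traceback appears to do.

```python
import sqlite3

# Stand-in for the real MySQL table; only the NOT NULL constraint matters here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE change_files (changeid INTEGER NOT NULL, filename TEXT NOT NULL)"
)


def insert_change_files(conn, changeid, files):
    """Hypothetical helper: insert one row per entry in `files`."""
    for f in files:
        conn.execute(
            "INSERT INTO change_files (changeid, filename) VALUES (:cid, :f)",
            {"cid": changeid, "f": f},
        )


try:
    # Same shape as the logged message: 'files': [None, None]
    insert_change_files(conn, 7182009, [None, None])
    outcome = "inserted"
except sqlite3.IntegrityError as exc:
    # SQLite reports IntegrityError where MySQLdb reported error 1048.
    outcome = "rejected: %s" % exc

print(outcome)
```

The first None already trips the constraint, so nothing is written to change_files, while the sourcestamp and change rows created just before it are a separate matter.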
Comment 1•8 years ago
That's a backfill from the first push that got a buildbot b2g emulator build after bug 1227277 shut them off for a while, during which only taskcluster builds were being built. The backfill went back into the period when there weren't any buildbot builds, so perhaps it got confused either by the existence of the unusable taskcluster-built emulator build, or by a seemingly fresh cached allthethings that didn't admit to the existence of b2g builds and tests on fx-team.
Comment 2•8 years ago
I'm adding some logs for that job. I will be analyzing it.
Comment 3•8 years ago
nthomas: I've fixed buildapi_client and updated pulse_actions to not do this anymore.
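The actual changes to buildapi_client and pulse_actions are not shown in this bug, so the following is only a sketch of the general idea: drop unusable entries from the files list before a request is sent. The names clean_files and build_request_body, and the example URL, are hypothetical and not part of either project. Whether the real API prefers an empty list or an omitted key is also an assumption.

```python
def clean_files(files):
    """Return only usable file entries; drop None and blank strings."""
    return [f for f in (files or []) if isinstance(f, str) and f.strip()]


def build_request_body(builder_name, branch, revision, files=None):
    """Assemble a request body whose 'files' list never contains None."""
    return {
        "builder_name": builder_name,
        "branch": branch,
        "revision": revision,
        "files": clean_files(files),  # assumption: an empty list is acceptable
    }


body = build_request_body(
    "b2g_fx-team_emulator-debug_dep",
    "fx-team",
    "378ef42875ed296adb790020ebeee26490c026d8",
    files=[None, None],
)
print(body["files"])
```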
Assignee
Comment 4•8 years ago
I don't know how to purge this from the queue. We could adjust the agent to log the error and ack the message in this case.
Flags: needinfo?(catlee)
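A rough sketch of the "log the error and ack the message" idea, not the patch that later landed. FakeMessage is a hypothetical stand-in for a broker message object with an ack() method, and the dispatcher below only borrows the receive_message and action_func names from the traceback. Acking on success inside the dispatcher is also an assumption.

```python
import logging

log = logging.getLogger("selfserve-agent-sketch")


class FakeMessage:
    """Hypothetical stand-in for a broker message exposing ack()."""

    def __init__(self):
        self.acked = False

    def ack(self):
        self.acked = True


def receive_message(actions, message_data, message):
    """Dispatch a message; on any failure, log it and ack so it leaves the queue."""
    action = message_data.get("action")
    try:
        action_func = actions[action]  # KeyError for unknown actions
        retval = action_func(message_data, message)
    except Exception:
        log.exception("Error processing message %r; acking to drop it", message_data)
        message.ack()
        return None
    message.ack()
    return retval


def failing_action(message_data, message):
    raise RuntimeError("Column 'filename' cannot be null")


def working_action(message_data, message):
    return "ok"


actions = {"new_build_for_builder": failing_action, "rebuild": working_action}
bad, good = FakeMessage(), FakeMessage()
bad_result = receive_message(actions, {"action": "new_build_for_builder"}, bad)
good_result = receive_message(actions, {"action": "rebuild"}, good)
```

The trade-off is that the failed request is dropped and the only trace of it is the log line, so the requester gets no build and no error.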
Comment 5•8 years ago
Retriggering now takes up to five tries before it actually happens. Depending on whether you know this bug exists and remember that, this is somewhere between major and critical.
Severity: normal → critical
Assignee
Comment 6•8 years ago
Hmm, I don't understand why other requests would be affected...
Assignee
Comment 7•8 years ago
Attachment #8697641 -
Flags: review?(nthomas)
Assignee
Updated•8 years ago
Assignee: nobody → catlee
Assignee
Comment 8•8 years ago
I think perhaps what was happening was these badly formed messages were always at the top of the queue. Other agents were only able to process other requests in the short time when the bad messages were being handled.
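To illustrate that theory, here is a deliberately simplified simulation with a single consumer. It assumes that an unacked message is redelivered from the head of the queue, which is an assumption about the broker and not something confirmed in this bug. A message of None models a malformed request that makes the handler raise. With several agents, as in production, some good requests would slip through while the bad ones were in flight, which would match the "up to five tries" reports.

```python
from collections import deque


def drain(messages, ack_on_failure, max_deliveries=20):
    """Deliver messages from the head of the queue to one consumer.

    Returns (handled, deliveries). max_deliveries bounds the loop so the
    no-ack case terminates.
    """
    queue = deque(messages)
    handled = []
    deliveries = 0
    while queue and deliveries < max_deliveries:
        msg = queue.popleft()
        deliveries += 1
        if msg is None:  # handler would raise on this message
            if not ack_on_failure:
                # Assumption: an unacked message goes back to the head.
                queue.appendleft(msg)
            continue
        handled.append(msg)
    return handled, deliveries


pending = [None, None, "retrigger-1", "cancel-2"]
stuck = drain(pending, ack_on_failure=False)
fixed = drain(pending, ack_on_failure=True)
print(stuck)
print(fixed)
```

Without the ack, the consumer spends every delivery on the first bad message and never reaches the retrigger or the cancel. With the ack, both bad messages are dropped and the queue drains in four deliveries.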
Reporter
Updated•8 years ago
Attachment #8697641 -
Flags: review?(nthomas) → review+
Comment 9•8 years ago
I've also been having trouble cancelling jobs.
Assignee
Comment 10•8 years ago
https://hg.mozilla.org/build/buildapi/rev/9a9039732de95f26acf757ae7fdfcf9feb2960dc Bug 1231560: Log and ack messages that we can't handle r=nthomas
Assignee
Updated•8 years ago
Attachment #8697641 -
Flags: checked-in+
Assignee
Comment 11•8 years ago
Attachment #8697668 -
Flags: review?(nthomas)
Reporter
Updated•8 years ago
Attachment #8697668 -
Flags: review?(nthomas) → review+
Assignee
Comment 12•8 years ago
https://hg.mozilla.org/build/puppet/rev/2c84f0c9f986f25a57fb843dd5bba93e8e841ad0 Bug 1231560: Deploy new selfserve agent to log and ack failing messages r=nthomas
Assignee
Updated•8 years ago
Attachment #8697668 -
Flags: checked-in+
Assignee
Updated•8 years ago
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Updated•6 years ago
Product: Release Engineering → Infrastructure & Operations
Updated•4 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard