Bug 946196: self-serve agent on bm66 eating jobs
Status: Closed (RESOLVED FIXED)
Opened 11 years ago; closed 10 years ago
Categories: Release Engineering :: General, defect
Tracking: Not tracked
People: Reporter: catlee; Unassigned
Attachments (2 files)
2.09 KB, patch | catlee: review+ | rail: checked-in+
897 bytes, patch | catlee: review+ | rail: checked-in+
The self-serve agent on bm66 was taking messages and failing to operate on them due to SQL exceptions:

2013-12-04 05:05:54,385 rebuilding build by ryanvm@gmail.com of 33597270
2013-12-04 05:06:24,385 Error processing message
Traceback (most recent call last):
  File "/builds/selfserve-agent/lib/python2.7/site-packages/buildapi/scripts/selfserve_agent.py", line 233, in receive_message
    retval = action_func(message_data, message)
  File "/builds/selfserve-agent/lib/python2.7/site-packages/buildapi/scripts/selfserve_agent.py", line 292, in do_rebuild_build
    ), bid=bid).fetchone()
  File "/builds/selfserve-agent/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1787, in execute
    connection = self.contextual_connect(close_with_result=True)
  File "/builds/selfserve-agent/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1829, in contextual_connect
    self.pool.connect(),
  File "/builds/selfserve-agent/lib/python2.7/site-packages/sqlalchemy/pool.py", line 182, in connect
    return _ConnectionFairy(self).checkout()
  File "/builds/selfserve-agent/lib/python2.7/site-packages/sqlalchemy/pool.py", line 369, in __init__
    rec = self._connection_record = pool.get()
  File "/builds/selfserve-agent/lib/python2.7/site-packages/sqlalchemy/pool.py", line 213, in get
    return self.do_get()
  File "/builds/selfserve-agent/lib/python2.7/site-packages/sqlalchemy/pool.py", line 722, in do_get
    (self.size(), self.overflow(), self._timeout))
TimeoutError: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30
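The numbers in the error match SQLAlchemy's QueuePool defaults (pool_size=5, max_overflow=10, pool_timeout=30): at most 15 connections can be checked out at once, and the next checkout waits 30 seconds before failing. The log agrees, since the error is stamped exactly 30 seconds after the rebuild request was logged. The sketch below is a toy model, not SQLAlchemy's code: it uses a semaphore to show how connections that are never returned to the pool eventually make every new checkout time out. The bug does not establish that leaked connections were the cause; that is an assumption here, and `ToyBoundedPool` and `PoolTimeoutError` are hypothetical names.

```python
import threading


class PoolTimeoutError(Exception):
    """Hypothetical stand-in for sqlalchemy.exc.TimeoutError."""


class ToyBoundedPool:
    """Toy model of a pool with a base size plus overflow.

    Not SQLAlchemy's implementation: it only caps concurrent checkouts
    at size + overflow and waits up to `timeout` seconds for a free slot.
    """

    def __init__(self, size=5, overflow=10, timeout=30.0):
        self.size = size
        self.overflow = overflow
        self.timeout = timeout
        self._slots = threading.BoundedSemaphore(size + overflow)

    def checkout(self):
        # Block until a slot frees up or the timeout expires.
        if not self._slots.acquire(timeout=self.timeout):
            raise PoolTimeoutError(
                "pool limit of size %d overflow %d reached, "
                "connection timed out, timeout %s"
                % (self.size, self.overflow, self.timeout)
            )
        return object()  # placeholder for a real DB connection

    def checkin(self, conn):
        # Returning a connection frees its slot for the next caller.
        self._slots.release()


# Usage: leak every connection and watch the 16th checkout fail.
pool = ToyBoundedPool(timeout=0.05)  # short timeout so the demo is quick
leaked = [pool.checkout() for _ in range(15)]
try:
    pool.checkout()
except PoolTimeoutError as exc:
    print(exc)
```

Once all 15 slots are held, no amount of waiting helps inside the same process; only returning connections (or restarting the process) frees the pool.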
Comment 1 • 11 years ago (Reporter)
I killed the agent, and supervisor started it back up.
Comment 2 • 11 years ago
Any idea how to fix the root cause?
Comment 3 • 11 years ago (Reporter)
A simple way would be to catch this and then abort the process.
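A minimal sketch of that idea, assuming the agent dispatches each message through a function shaped like `receive_message` in the traceback. The actual change is attachment 8342442 and is not reproduced here. `PoolTimeoutError` stands in for `sqlalchemy.exc.TimeoutError`, the logger name is hypothetical, and exiting with status 1 assumes supervisor restarts the agent, as it did in comment 1.

```python
import logging
import sys

log = logging.getLogger("selfserve-agent")  # logger name is hypothetical


class PoolTimeoutError(Exception):
    """Hypothetical stand-in for sqlalchemy.exc.TimeoutError."""


def receive_message(action_func, message_data, message):
    """Run one action; abort the whole process if the DB pool is exhausted."""
    try:
        return action_func(message_data, message)
    except PoolTimeoutError:
        # An exhausted pool is unlikely to recover inside this process,
        # so log it and exit; a process supervisor is expected to restart us.
        log.exception("DB connection pool exhausted, exiting")
        sys.exit(1)
```

Only the pool timeout triggers the exit; any other exception still propagates to the caller unchanged, so unrelated failures are not turned into restarts.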
Comment 4 • 11 years ago
[10:52] <rail> catlee: http://hg.mozilla.org/build/buildapi/file/b60098b51361/buildapi/scripts/selfserve-agent.py#l232 it fails here. it sounds like the messages shouldn't be acked in case of exception and shouldn't be dropped in this case. Am I following the code?
[10:54] <catlee> rail: yeah, I wonder if they get retried eventually...

(In reply to Chris AtLee [:catlee] from comment #3)
> A simple way would be to catch this and then abort the process.

This will help with the following requests. Let's doo eet!
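rail's point is that a message should only be acknowledged once its action has succeeded, so a failure leaves it on the queue instead of silently dropping it. A sketch of that ordering follows; `ToyMessage` and its `ack()`/`requeue()` methods are hypothetical and only loosely modeled on AMQP-style acknowledgements, not on the agent's actual messaging code.

```python
class ToyMessage:
    """Hypothetical message with AMQP-style acknowledgement methods."""

    def __init__(self, body):
        self.body = body
        self.state = "pending"

    def ack(self):
        self.state = "acked"  # broker may now forget the message

    def requeue(self):
        self.state = "requeued"  # broker should redeliver it later


def handle(message, action_func):
    """Acknowledge only after action_func succeeds; requeue on failure."""
    try:
        result = action_func(message.body)
    except Exception:
        message.requeue()  # keep the job instead of dropping it
        raise              # still surface the error to the caller
    message.ack()
    return result
```

Acking after the work gives at-least-once delivery: a crash between the action and `ack()` can cause a duplicate run, which is usually preferable to losing a rebuild request.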
Comment 5 • 11 years ago
Attachment #8342442 - Flags: review?(catlee)
Updated • 11 years ago (Reporter)
Attachment #8342442 - Flags: review?(catlee) → review+
Comment 6 • 11 years ago
Comment on attachment 8342442: catch it!
https://hg.mozilla.org/build/buildapi/rev/d0be6ec810ba
Attachment #8342442 - Flags: checked-in+
Comment 7 • 11 years ago
... once the tarball is synced
Attachment #8342531 - Flags: review?(catlee)
Updated • 11 years ago
Attachment #8342531 - Flags: review?(catlee)
Updated • 11 years ago (Reporter)
Attachment #8342531 - Flags: review?(catlee) → review+
Comment 8 • 11 years ago
Comment on attachment 8342531: puppet
https://hg.mozilla.org/build/puppet/rev/31c9f6f3b1ee
Attachment #8342531 - Flags: checked-in+
Updated • 10 years ago
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Updated • 6 years ago (Assignee)
Component: General Automation → General