Closed
Bug 980086
Opened 12 years ago
Closed 8 years ago
self-serve agents' error handling needs more care
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: dustin, Unassigned)
References
Details
(Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2025] )
Attachments
(1 file, 2 obsolete files)
633 bytes, patch | rail: review+ | dustin: checked-in+
While debugging a while back, I added some invalid messages to the queue. They're still circulating:
2014-03-05 14:38:29,457 Channel open
2014-03-05 14:40:04,426 Received {u'action': u'reprioritize', u'body': {u'priority': 4, u'brid': -1}, u'who': u'dmitchell@mozilla.com (testing)'}
2014-03-05 14:40:04,427 Loading masters from https://hg.mozilla.org/build/tools/raw-file/default/buildfarm/maintenance/production-masters.json
2014-03-05 14:40:05,067 Loading branches from https://hg.mozilla.org/build/tools/raw-file/default/buildfarm/maintenance/production-branches.json
2014-03-05 14:40:05,572 reprioritizing request by dmitchell@mozilla.com (testing) of request -1 to priority 4
2014-03-05 14:40:06,851 No request with id -1, giving up
2014-03-05 14:40:06,852 Error processing message
Traceback (most recent call last):
  File "/builds/selfserve-agent/lib/python2.7/site-packages/buildapi/scripts/selfserve_agent.py", line 236, in receive_message
    msg['request_id'] = message_data['body']['request_id']
KeyError: 'request_id'
2014-03-05 14:40:06,853 Received {u'action': u'reprioritize', u'body': {u'priority': 2, u'brid': -1}, u'who': u'dmitchell@mozilla.com (testing)'}
2014-03-05 14:40:06,854 reprioritizing request by dmitchell@mozilla.com (testing) of request -1 to priority 2
2014-03-05 14:40:07,023 No request with id -1, giving up
2014-03-05 14:40:07,023 Error processing message
Traceback (most recent call last):
  File "/builds/selfserve-agent/lib/python2.7/site-packages/buildapi/scripts/selfserve_agent.py", line 236, in receive_message
    msg['request_id'] = message_data['body']['request_id']
KeyError: 'request_id'
2014-03-05 14:40:11,031 Received {u'action': u'reprioritize', u'body': {u'priority': 1, u'brid': -1}, u'who': u'dmitchell@mozilla.com (testing)'}
2014-03-05 14:40:11,032 reprioritizing request by dmitchell@mozilla.com (testing) of request -1 to priority 1
2014-03-05 14:40:11,201 No request with id -1, giving up
2014-03-05 14:40:11,201 Error processing message
Traceback (most recent call last):
  File "/builds/selfserve-agent/lib/python2.7/site-packages/buildapi/scripts/selfserve_agent.py", line 236, in receive_message
    msg['request_id'] = message_data['body']['request_id']
KeyError: 'request_id'
2014-03-05 14:40:11,202 Received {u'action': u'reprioritize', u'body': {u'priority': 4, u'brid': -1}, u'who': u'dmitchell@mozilla.com (testing)'}
2014-03-05 14:40:11,202 reprioritizing request by dmitchell@mozilla.com (testing) of request -1 to priority 4
2014-03-05 14:40:11,371 No request with id -1, giving up
2014-03-05 14:40:11,372 Error processing message
Traceback (most recent call last):
  File "/builds/selfserve-agent/lib/python2.7/site-packages/buildapi/scripts/selfserve_agent.py", line 236, in receive_message
    msg['request_id'] = message_data['body']['request_id']
KeyError: 'request_id'
The problem is that every time a self-serve agent starts up, it receives some of these messages and then leaves them un-acked. So when that agent disconnects, RabbitMQ re-queues the messages and the whole cycle starts over again.
Instead, self-serve should log and ack any messages that cannot be processed, preferably sending a response message with error information.
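A minimal sketch of that log-and-ack behaviour, in Python (the agent's own language). It assumes only that a message object has an ack() method. process and send_reply are hypothetical callables standing in for the agent's real dispatch and reply code. The reply shape {'request_id': ..., 'error': ...} is likewise an illustrative assumption, not the actual self-serve protocol. The sketch reads request_id with .get(), since the unconditional lookup is exactly what raised the KeyError in the log.

```python
import logging

log = logging.getLogger("selfserve_agent")


class FakeMessage:
    """Stand-in for a broker message; only ack() is assumed."""

    def __init__(self):
        self.acked = False

    def ack(self):
        self.acked = True


def receive_message(message_data, msg, process, send_reply):
    """Process one message and always ack it, even on failure.

    process and send_reply are hypothetical callables supplied by the
    caller in place of the agent's real dispatch and reply code.
    """
    try:
        process(message_data)
    except Exception as exc:
        # Log the traceback; the ack in the finally clause below keeps the
        # broker from redelivering a message that will just fail again.
        log.exception("Error processing message %r", message_data)
        body = message_data.get("body")
        # .get() rather than ["request_id"]: the unconditional lookup is
        # what raised the KeyError in the log above.
        request_id = body.get("request_id") if isinstance(body, dict) else None
        if request_id is not None:
            send_reply({"request_id": request_id, "error": str(exc)})
    finally:
        msg.ack()
```

Catching every exception here matches the "any messages that cannot be processed" wording. The trade-off is that genuine bugs are also acked away, and the later discussion in this bug refines that.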
Comment 1•12 years ago
WIP patch
Comment 2•12 years ago (Reporter)
Comment on attachment 8386433
fix leaks of VoiceEngines in getUserMedia
I think this is the wrong bug, Randell.
Attachment #8386433 -
Attachment is obsolete: true
Comment 3•12 years ago
Something like this maybe?
Comment 4•12 years ago (Reporter)
Comment on attachment 8390837
ss.diff
That's a very specific fix for these "trapped" messages. I think we could address this specific issue much more directly by checking for and discarding such messages before calling action_func.
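Here is a sketch of that narrower check, in Python. It validates the message shape first and discards (logs and acks) anything that can never succeed, before action_func is called. REQUIRED_BODY_KEYS is a hypothetical table. Only the reprioritize action and its priority and brid fields appear in the log above; requiring request_id simply mirrors the lookup that raised the KeyError. msg is assumed to have an ack() method.

```python
import logging

log = logging.getLogger("selfserve_agent")

# Hypothetical table of known actions and the body keys each one needs.
REQUIRED_BODY_KEYS = {
    "reprioritize": ("request_id", "brid", "priority"),
}


def validation_error(message_data):
    """Return a reason string if the message can never succeed, else None."""
    if not isinstance(message_data, dict):
        return "message is not a JSON object"
    action = message_data.get("action")
    if action not in REQUIRED_BODY_KEYS:
        return "unknown action %r" % (action,)
    body = message_data.get("body")
    if not isinstance(body, dict):
        return "missing or malformed body"
    missing = [k for k in REQUIRED_BODY_KEYS[action] if k not in body]
    if missing:
        return "body is missing %s" % ", ".join(missing)
    return None


def receive_message(message_data, msg, action_func):
    reason = validation_error(message_data)
    if reason is not None:
        # Discard: log and ack so the broker never redelivers it, and
        # never hand it to action_func.
        log.warning("Discarding invalid message %r: %s", message_data, reason)
        msg.ack()
        return
    action_func(message_data)
    msg.ack()
```

Run against the messages in the log, validation_error reports "body is missing request_id", so they would be acked and dropped on first delivery instead of cycling between agents.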
More generally, though, we need a way to distinguish "this will never work" errors (which should be acked and, if a request_id is present, replied to), transient failures (which should result in a msg.requeue() followed by a brief pause), connection errors (which should be raised to the ReliableConsumer for reconnect), and other Python exceptions (which shouldn't be handled at all). The code's doing a pretty terrible job of all of that right now.
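One way to express that four-way split, sketched in Python under stated assumptions. PermanentError and TransientError are hypothetical exception classes that action functions would raise. msg is assumed to offer ack() and requeue(); requeue() is the call named above. send_reply and the reply shape are illustrative, and the 5-second retry_delay is an arbitrary placeholder.

```python
import logging
import time

log = logging.getLogger("selfserve_agent")


class PermanentError(Exception):
    """Hypothetical: "this will never work"; ack and, if possible, reply."""


class TransientError(Exception):
    """Hypothetical: may succeed later; requeue after a brief pause."""


def handle(message_data, msg, action_func, send_reply, retry_delay=5.0):
    """Dispatch one message, treating each failure class differently.

    ConnectionError and every other exception are deliberately not caught:
    connection errors propagate to the consumer (ReliableConsumer in the
    agent) so it can reconnect, and anything else surfaces as a bug.
    """
    try:
        action_func(message_data)
    except PermanentError as exc:
        log.warning("Giving up on %r: %s", message_data, exc)
        msg.ack()
        body = message_data.get("body")
        request_id = body.get("request_id") if isinstance(body, dict) else None
        if request_id is not None:
            send_reply({"request_id": request_id, "error": str(exc)})
    except TransientError as exc:
        log.info("Transient failure, requeueing: %s", exc)
        msg.requeue()
        time.sleep(retry_delay)  # brief pause so the agent doesn't spin
    else:
        msg.ack()  # success
```

The design choice is to have each action function say which kind of failure it hit, rather than having the receive loop guess from generic exceptions. The loop then only maps categories to broker actions.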
Attachment #8390837 -
Flags: review-
Comment 5•12 years ago (Reporter)
Assignee: nobody → dustin
Attachment #8390837 -
Attachment is obsolete: true
Attachment #8391170 -
Flags: review?(rail)
Updated•12 years ago
Attachment #8391170 -
Flags: review?(rail) → review+
Updated•12 years ago (Reporter)
Attachment #8391170 -
Flags: checked-in+
Updated•12 years ago (Reporter)
Summary: self-serve doesn't ack invalid messages → self-serve agents' error handling needs more care
Comment 6•11 years ago (Reporter)
Let's try to fix this up while moving buildapi into relengapi.
Depends on: 1026110
Updated•11 years ago
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2025]
Updated•11 years ago
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2025] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2032]
Updated•11 years ago
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2032] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2025]
Updated•11 years ago (Reporter)
Assignee: dustin → nobody
Updated•9 years ago (Assignee)
Component: Tools → General
Updated•8 years ago
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX