Bug 1221538
Opened 9 years ago
Closed 9 years ago
Autophone - AutophonePulseMonitor falls down and can't get up after SSLError: [Errno 8] _ssl.c:510: EOF occurred in violation of protocol
Categories
(Testing Graveyard :: Autophone, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: bc, Assigned: bc)
Attachments
(2 files)
2.13 KB, patch (mcote: review-)
5.99 KB, patch (mcote: review+)
Yesterday Autophone suffered a Pulse failure that left it unable to consume Pulse messages. Autophone did not recover properly from the error. The queue filled and was deleted, and new builds were not detected afterwards.

2015-11-03 14:23:29,160|76|PulseMonitorThread|root|ERROR|AutophonePulseMonitor Exception
Traceback (most recent call last):
  File "/mozilla/projects/autophone/src/autophone/autophonepulsemonitor.py", line 245, in listen
    auto_declare=False)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/kombu/connection.py", line 652, in Consumer
    return Consumer(channel or self, queues, *args, **kwargs)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/kombu/messaging.py", line 359, in __init__
    self.revive(self.channel)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/kombu/messaging.py", line 364, in revive
    channel = self.channel = maybe_channel(channel)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/kombu/connection.py", line 1054, in maybe_channel
    return channel.default_channel
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/kombu/connection.py", line 756, in default_channel
    self.connection
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/kombu/connection.py", line 741, in connection
    self._connection = self._establish_connection()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/kombu/connection.py", line 696, in _establish_connection
    conn = self.transport.establish_connection()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/kombu/transport/pyamqp.py", line 112, in establish_connection
    conn = self.Connection(**opts)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/amqp/connection.py", line 165, in __init__
    self.transport = self.Transport(host, connect_timeout, ssl)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/amqp/connection.py", line 186, in Transport
    return create_transport(host, connect_timeout, ssl)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/amqp/transport.py", line 297, in create_transport
    return SSLTransport(host, connect_timeout, ssl)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/amqp/transport.py", line 199, in __init__
    super(SSLTransport, self).__init__(host, connect_timeout)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/amqp/transport.py", line 102, in __init__
    self._setup_transport()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/amqp/transport.py", line 206, in _setup_transport
    self.sock = ssl.wrap_socket(self.sock)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ssl.py", line 392, in wrap_socket
    ciphers=ciphers)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ssl.py", line 148, in __init__
    self.do_handshake()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ssl.py", line 310, in do_handshake
    self._sslobj.do_handshake()
SSLError: [Errno 8] _ssl.c:510: EOF occurred in violation of protocol

2015-11-03 14:23:33,857|76|PulseMonitorThread|root|ERROR|AutophonePulseMonitor Exception
2015-11-03 14:23:33,857|76|PulseMonitorThread|root|ERROR|AutophonePulseMonitor Exception
Traceback (most recent call last):
  File "/mozilla/projects/autophone/src/autophone/autophonepulsemonitor.py", line 245, in listen
    auto_declare=False)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/kombu/connection.py", line 652, in Consumer
    return Consumer(channel or self, queues, *args, **kwargs)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/kombu/messaging.py", line 359, in __init__
    self.revive(self.channel)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/kombu/messaging.py", line 364, in revive
    channel = self.channel = maybe_channel(channel)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/kombu/connection.py", line 1054, in maybe_channel
    return channel.default_channel
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/kombu/connection.py", line 758, in default_channel
    self._default_channel = self.channel()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/kombu/connection.py", line 242, in channel
    chan = self.transport.create_channel(self.connection)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/kombu/transport/pyamqp.py", line 88, in create_channel
    return connection.channel()
AttributeError: 'NoneType' object has no attribute 'channel'
Comment 1 • 9 years ago (Assignee)
mcote: do you have any insights into what is happening on Pulse Guardian? Have there been known issues or upgrades recently?
Flags: needinfo?(mcote)
Updated • 9 years ago (Assignee)
Summary: Autophone - AutophonePulseMonitor falls down and can't get uip after SSLError: [Errno 8] _ssl.c:510: EOF occurred in violation of protocol → Autophone - AutophonePulseMonitor falls down and can't get up after SSLError: [Errno 8] _ssl.c:510: EOF occurred in violation of protocol
Comment 2 • 9 years ago (Assignee)
mcote: also, can we increase the size of the queues before they are deleted? The current load means they are deleted if the consumer is down for an hour. I frequently get warnings during restarts when the consumer is paused for the orderly shutdown of the workers.
Comment 3 • 9 years ago (Assignee)
mcote increased the queue size to 16,000 and the warning limit to 4,000. https://github.com/mozilla/autophone/blob/master/autophonepulsemonitor.py#L268 releases the Kombu Connection when an error occurs but does not recreate it. I'm thinking that I just need to re-create it afterwards as in https://github.com/mozilla/autophone/blob/master/autophonepulsemonitor.py#L197
Flags: needinfo?(mcote)
Comment 4 • 9 years ago (Assignee)
This patch recovers from exceptions by releasing the previous connection, recreating it, then starting a new listening thread as the old one exits. Tested locally with mcote's help.
Attachment #8683179 -
Flags: review?(mcote)
Comment 5 • 9 years ago
Comment on attachment 8683179
bug-1221538-pulse-connection-errors.patch

Review of attachment 8683179:
-----------------------------------------------------------------

As discussed on IRC, starting a new thread from within the existing thread just before it exits is kind of weird, and its side effects are not entirely obvious. It would be better to loop in listen(), ideally creating the connection and related objects at the beginning of the function.
Attachment #8683179 -
Flags: review?(mcote) → review-
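A minimal sketch of the looping structure suggested in Comment 5, not the patch that landed. It assumes nothing about Kombu: `FakeConnection`, `connect`, `handle`, `should_stop`, and `retry_delay` are hypothetical names used only for illustration. The point is that the connection is created at the top of each attempt inside `listen()`, so a failure releases it and the next loop iteration builds a fresh one in the same thread.

```python
import logging
import time

logger = logging.getLogger("pulse-monitor-sketch")


class FakeConnection(object):
    """Hypothetical stand-in for a broker connection (not the Kombu API)."""

    def __init__(self, fail):
        self.fail = fail
        self.released = False

    def consume(self):
        # Simulate the SSL EOF seen in the bug on a "bad" connection.
        if self.fail:
            raise IOError("simulated EOF during handshake")
        return "message"

    def release(self):
        self.released = True


def listen(connect, handle, should_stop, retry_delay=1.0, sleep=time.sleep):
    """Consume until should_stop() is true, reconnecting after any error."""
    while not should_stop():
        connection = None
        try:
            # Create the connection at the start of every attempt.
            connection = connect()
            while not should_stop():
                handle(connection.consume())
        except Exception:
            logger.exception("listen failed; will reconnect")
            sleep(retry_delay)
        finally:
            # connection is still None if connect() itself raised.
            if connection is not None:
                connection.release()


# Usage: the first connection fails, the second delivers two messages.
attempts = []
handled = []


def connect():
    attempts.append(1)
    return FakeConnection(fail=(len(attempts) == 1))


listen(connect, handled.append, lambda: len(handled) >= 2,
       retry_delay=0, sleep=lambda seconds: None)
```

Because the retry lives in the loop, no second thread is started from the exiting one, which is the side effect the review objected to.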
Comment 6 • 9 years ago (Assignee)
Attachment #8683270 -
Flags: review?(mcote)
Comment 7 • 9 years ago
Comment on attachment 8683270
bug-1221538-pulse-connection-errors-v2.patch

Review of attachment 8683270:
-----------------------------------------------------------------

::: autophonepulsemonitor.py
@@ +274,5 @@
> + logger.exception('AutophonePulseMonitor Exception')
> + if connection:
> +     connection.release()
> + restart = True
> + time.sleep(1)

This should probably have some sort of back-off timer, but I'm fine with that being a follow-up.
Attachment #8683270 -
Flags: review?(mcote) → review+
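One possible shape for the back-off timer mentioned in Comment 7, as a sketch only. The bug does not say which scheme the follow-up uses; capped exponential back-off is an assumption here, and `backoff_delays`, `base`, `factor`, and `cap` are hypothetical names.

```python
import itertools


def backoff_delays(base=1.0, factor=2.0, cap=60.0):
    """Yield retry delays in seconds that grow geometrically up to cap."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


# First eight delays with the default (assumed) parameters.
first_delays = list(itertools.islice(backoff_delays(), 8))
```

In a retry loop, the fixed `time.sleep(1)` would become `time.sleep(next(delays))`, with a fresh generator created after each successful reconnect so the delay starts again from `base`.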
Comment 8 • 9 years ago (Assignee)
https://github.com/mozilla/autophone/commit/44f7029f481dc9f38eb8aa70c6019a35b902ad5b
Filed Bug 1221723.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Updated • 2 years ago
Product: Testing → Testing Graveyard