
Autophone - AutophonePulseMonitor falls down and can't get up after SSLError: [Errno 8] _ssl.c:510: EOF occurred in violation of protocol

RESOLVED FIXED

Status

Testing
Autophone
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: bc, Assigned: bc)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments)

(Assignee)

Description

2 years ago
Yesterday Autophone suffered a Pulse failure that left it unable to consume Pulse messages. Autophone did not recover properly from the error. The queue filled and was deleted, and new builds were not detected afterwards.

2015-11-03 14:23:29,160|76|PulseMonitorThread|root|ERROR|AutophonePulseMonitor Exception
Traceback (most recent call last):
  File "/mozilla/projects/autophone/src/autophone/autophonepulsemonitor.py", line 245, in listen
    auto_declare=False)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/kombu/connection.py", line 652, in Consumer
    return Consumer(channel or self, queues, *args, **kwargs)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/kombu/messaging.py", line 359, in __init__
    self.revive(self.channel)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/kombu/messaging.py", line 364, in revive
    channel = self.channel = maybe_channel(channel)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/kombu/connection.py", line 1054, in maybe_channel
    return channel.default_channel
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/kombu/connection.py", line 756, in default_channel
    self.connection
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/kombu/connection.py", line 741, in connection
    self._connection = self._establish_connection()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/kombu/connection.py", line 696, in _establish_connection
    conn = self.transport.establish_connection()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/kombu/transport/pyamqp.py", line 112, in establish_connection
    conn = self.Connection(**opts)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/amqp/connection.py", line 165, in __init__
    self.transport = self.Transport(host, connect_timeout, ssl)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/amqp/connection.py", line 186, in Transport
    return create_transport(host, connect_timeout, ssl)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/amqp/transport.py", line 297, in create_transport
    return SSLTransport(host, connect_timeout, ssl)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/amqp/transport.py", line 199, in __init__
    super(SSLTransport, self).__init__(host, connect_timeout)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/amqp/transport.py", line 102, in __init__
    self._setup_transport()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/amqp/transport.py", line 206, in _setup_transport
    self.sock = ssl.wrap_socket(self.sock)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ssl.py", line 392, in wrap_socket
    ciphers=ciphers)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ssl.py", line 148, in __init__
    self.do_handshake()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ssl.py", line 310, in do_handshake
    self._sslobj.do_handshake()
SSLError: [Errno 8] _ssl.c:510: EOF occurred in violation of protocol
2015-11-03 14:23:33,857|76|PulseMonitorThread|root|ERROR|AutophonePulseMonitor Exception
Traceback (most recent call last):
  File "/mozilla/projects/autophone/src/autophone/autophonepulsemonitor.py", line 245, in listen
    auto_declare=False)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/kombu/connection.py", line 652, in Consumer
    return Consumer(channel or self, queues, *args, **kwargs)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/kombu/messaging.py", line 359, in __init__
    self.revive(self.channel)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/kombu/messaging.py", line 364, in revive
    channel = self.channel = maybe_channel(channel)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/kombu/connection.py", line 1054, in maybe_channel
    return channel.default_channel
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/kombu/connection.py", line 758, in default_channel
    self._default_channel = self.channel()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/kombu/connection.py", line 242, in channel
    chan = self.transport.create_channel(self.connection)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/kombu/transport/pyamqp.py", line 88, in create_channel
    return connection.channel()
AttributeError: 'NoneType' object has no attribute 'channel'
(Assignee)

Comment 1

2 years ago
mcote: do you have any insights into what is happening on Pulse Guardian? Have there been known issues or upgrades recently?
Flags: needinfo?(mcote)
(Assignee)

Updated

2 years ago
Summary: Autophone - AutophonePulseMonitor falls down and can't get uip after SSLError: [Errno 8] _ssl.c:510: EOF occurred in violation of protocol → Autophone - AutophonePulseMonitor falls down and can't get up after SSLError: [Errno 8] _ssl.c:510: EOF occurred in violation of protocol
(Assignee)

Comment 2

2 years ago
mcote: also, can we increase the size the queues are allowed to reach before they are deleted? With the current load, they are deleted if the consumer is down for an hour. I frequently get warnings during restarts, when the consumer is paused for the orderly shutdown of the workers.
(Assignee)

Comment 3

2 years ago
mcote increased the queue size to 16,000 and the warning limit to 4,000.

https://github.com/mozilla/autophone/blob/master/autophonepulsemonitor.py#L268 releases the Kombu Connection when an error occurs but does not recreate it. I'm thinking that I just need to re-create it afterwards as in https://github.com/mozilla/autophone/blob/master/autophonepulsemonitor.py#L197
Flags: needinfo?(mcote)
(Assignee)

Comment 4

2 years ago
Created attachment 8683179 [details] [diff] [review]
bug-1221538-pulse-connection-errors.patch

This patch recovers from exceptions by releasing the previous connection, recreating it, then starting a new listening thread as the old one exits.

Tested locally with mcote's help.
Attachment #8683179 - Flags: review?(mcote)

Comment 5

2 years ago
Comment on attachment 8683179 [details] [diff] [review]
bug-1221538-pulse-connection-errors.patch

Review of attachment 8683179 [details] [diff] [review]:
-----------------------------------------------------------------

As discussed on IRC, starting a new thread from within the existing thread just before it exits is kind of weird, and its side effects are not entirely obvious. It would be better to loop in listen(), ideally creating the connection and related objects at the beginning of the function.
Attachment #8683179 - Flags: review?(mcote) → review-
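
For illustration only, a minimal sketch of the "loop in listen()" shape suggested in the review. It is not the code from either attachment: `make_connection`, `consume`, and `should_stop` are hypothetical callables standing in for the Kombu connection setup, the message-draining loop, and the monitor's shutdown flag, and the only connection method assumed is `release()`, which the thread above already mentions.

```python
import logging
import time

logger = logging.getLogger(__name__)


def listen(make_connection, consume, should_stop, retry_delay=1.0, sleep=time.sleep):
    """Consume until should_stop() is true, reconnecting after any error.

    make_connection, consume and should_stop are hypothetical stand-ins,
    not names from autophonepulsemonitor.py.
    """
    while not should_stop():
        connection = None
        try:
            # Create the connection at the top of every pass, so a failed
            # handshake never leaves a stale or None connection in use.
            connection = make_connection()
            consume(connection)
        except Exception:
            logger.exception('AutophonePulseMonitor Exception')
            sleep(retry_delay)
        finally:
            # Release whatever was created on this pass, on both the
            # success path and the error path.
            if connection is not None:
                connection.release()
```

Because the retry happens inside the same thread, there is no hand-off between an exiting thread and its replacement, which is the side effect the review was worried about.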
(Assignee)

Updated

2 years ago
Blocks: 1220762
(Assignee)

Comment 6

2 years ago
Created attachment 8683270 [details] [diff] [review]
bug-1221538-pulse-connection-errors-v2.patch
Attachment #8683270 - Flags: review?(mcote)

Comment 7

2 years ago
Comment on attachment 8683270 [details] [diff] [review]
bug-1221538-pulse-connection-errors-v2.patch

Review of attachment 8683270 [details] [diff] [review]:
-----------------------------------------------------------------

::: autophonepulsemonitor.py
@@ +274,5 @@
> +                logger.exception('AutophonePulseMonitor Exception')
> +                if connection:
> +                    connection.release()
> +                restart = True
> +                time.sleep(1)

This should probably have some sort of back-off timer, but I'm fine with that being a follow-up.
Attachment #8683270 - Flags: review?(mcote) → review+
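
A sketch of what the back-off timer mentioned in the review could look like, assuming a simple capped exponential scheme. The `Backoff` class and its `base`, `factor`, and `cap` parameters are hypothetical and are not taken from the follow-up bug or from the Autophone source.

```python
class Backoff(object):
    """Capped exponential back-off: base, base*factor, ... up to cap."""

    def __init__(self, base=1.0, factor=2.0, cap=60.0):
        self.base = base
        self.factor = factor
        self.cap = cap
        self._current = base

    def next_delay(self):
        """Return the delay to sleep now and advance to the next one."""
        delay = self._current
        self._current = min(self.cap, self._current * self.factor)
        return delay

    def reset(self):
        """Go back to the base delay, e.g. after a successful reconnect."""
        self._current = self.base
```

In the hunk quoted above, `time.sleep(1)` would become `time.sleep(backoff.next_delay())`, with `backoff.reset()` called once the connection is re-established, so that a long outage does not turn into a reconnect attempt every second.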
(Assignee)

Comment 8

2 years ago
https://github.com/mozilla/autophone/commit/44f7029f481dc9f38eb8aa70c6019a35b902ad5b

Filed Bug 1221723.
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED