Connection reset by peer and timeouts when AMO talks to RabbitMQ

RESOLVED FIXED

Status

mozilla.org Graveyard
Server Operations
--
blocker
7 years ago
3 years ago

People

(Reporter: kumar, Assigned: cshields)

Tracking

Details

(Whiteboard: [Add-on upload fails])

We're seeing a lot of 'Connection reset by peer' exceptions in production right now. The traceback points to the code where AMO talks to RabbitMQ to queue up a celery task. Any ideas? The celery queue graphs do not look abnormal, however.

Users are seeing this on the add-on upload page (bug 688793), but the logs show that it also affects other pages that queue up tasks.

Traceback (most recent call last):

 File "/data/www/addons.mozilla.org/zamboni/vendor/src/django/django/core/handlers/base.py", line 111, in get_response
   response = callback(request, *callback_args, **callback_kwargs)

 File "/data/www/addons.mozilla.org/zamboni/apps/amo/decorators.py", line 102, in wrapper
   return f(*args, **kw)

 File "/data/www/addons.mozilla.org/zamboni/apps/amo/decorators.py", line 94, in wrapper
   return f(*args, **kw)

 File "/data/www/addons.mozilla.org/zamboni/apps/amo/decorators.py", line 49, in wrapper
   return f(request, *args, **kw)

 File "/data/www/addons.mozilla.org/zamboni/apps/addons/decorators.py", line 23, in wrapper
   return f(request, addon, *args, **kw)

 File "/data/www/addons.mozilla.org/zamboni/apps/amo/decorators.py", line 28, in wrapper
   return func(request, *args, **kw)

 File "/data/www/addons.mozilla.org/zamboni/apps/devhub/decorators.py", line 32, in wrapper
   return fun()

 File "/data/www/addons.mozilla.org/zamboni/apps/devhub/decorators.py", line 24, in <lambda>
   **kw)

 File "/data/www/addons.mozilla.org/zamboni/apps/devhub/views.py", line 703, in upload_for_addon
   return upload(request, addon_slug=addon.slug)

 File "/data/www/addons.mozilla.org/zamboni/apps/amo/decorators.py", line 28, in wrapper
   return func(request, *args, **kw)

 File "/data/www/addons.mozilla.org/zamboni/apps/amo/decorators.py", line 49, in wrapper
   return f(request, *args, **kw)

 File "/data/www/addons.mozilla.org/zamboni/apps/devhub/views.py", line 656, in upload
   tasks.validator.delay(fu.pk)

 File "/data/www/addons.mozilla.org/zamboni/vendor/lib/python/celery/task/base.py", line 338, in delay
   return self.apply_async(args, kwargs)

 File "/data/www/addons.mozilla.org/zamboni/vendor/src/nuggets/celeryutils.py", line 22, in apply_async
   return super(Task, self).apply_async(args, kwargs, **options)

 File "/data/www/addons.mozilla.org/zamboni/vendor/lib/python/celery/task/base.py", line 448, in apply_async
   exchange_type=exchange_type)

 File "/data/www/addons.mozilla.org/zamboni/vendor/lib/python/celery/task/base.py", line 301, in get_publisher
   **options)

 File "/data/www/addons.mozilla.org/zamboni/vendor/lib/python/celery/app/amqp.py", line 328, in TaskPublisher
   return TaskPublisher(*args, **self.app.merge(defaults, kwargs))

 File "/data/www/addons.mozilla.org/zamboni/vendor/lib/python/celery/app/amqp.py", line 156, in __init__
   super(TaskPublisher, self).__init__(*args, **kwargs)

 File "/data/www/addons.mozilla.org/zamboni/vendor/lib/python/kombu/compat.py", line 79, in __init__
   self.backend = connection.channel()

 File "/data/www/addons.mozilla.org/zamboni/vendor/lib/python/kombu/connection.py", line 124, in channel
   chan = self.transport.create_channel(self.connection)

 File "/data/www/addons.mozilla.org/zamboni/vendor/lib/python/kombu/connection.py", line 439, in connection
   self._connection = self._establish_connection()

 File "/data/www/addons.mozilla.org/zamboni/vendor/lib/python/kombu/connection.py", line 405, in _establish_connection
   conn = self.transport.establish_connection()

 File "/data/www/addons.mozilla.org/zamboni/vendor/lib/python/kombu/transport/pyamqplib.py", line 245, in establish_connection
   connect_timeout=conninfo.connect_timeout)

 File "/data/www/addons.mozilla.org/zamboni/vendor/lib/python/kombu/transport/pyamqplib.py", line 51, in __init__
   super(Connection, self).__init__(*args, **kwargs)

 File "/data/www/addons.mozilla.org/zamboni/vendor/lib/python/amqplib/client_0_8/connection.py", line 131, in __init__
   (10, 10), # start

 File "/data/www/addons.mozilla.org/zamboni/vendor/lib/python/amqplib/client_0_8/abstract_channel.py", line 89, in wait
   self.channel_id, allowed_methods)

 File "/data/www/addons.mozilla.org/zamboni/vendor/lib/python/amqplib/client_0_8/connection.py", line 198, in _wait_method
   self.method_reader.read_method()

 File "/data/www/addons.mozilla.org/zamboni/vendor/lib/python/amqplib/client_0_8/method_framing.py", line 215, in read_method
   raise m

error: [Errno 104] Connection reset by peer
Severity: normal → blocker
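The traceback shows the failure happening inside `tasks.validator.delay(fu.pk)`: celery asks kombu for a channel, kombu establishes a new broker connection through amqplib, and the reset surfaces as an unhandled `[Errno 104]` that propagates up through the Django view. As an illustration only (this is not what AMO did; the fix in this bug was resetting the OOM-killed celery1 and celery2 nodes), a caller could retry the enqueue a few times with exponential backoff before giving up. `enqueue_with_retry`, `flaky_publish`, and their parameters are hypothetical names, and the sketch assumes Python 3, where a socket error with errno 104 is raised as `ConnectionResetError`.

```python
import errno
import socket
import time


def enqueue_with_retry(publish, *args, retries=3, base_delay=0.5, sleep=time.sleep):
    """Call publish(*args), retrying on connection resets and socket timeouts.

    `publish` is a hypothetical stand-in for something like
    `tasks.validator.delay`; this helper is not part of celery or kombu.
    """
    for attempt in range(retries + 1):
        try:
            return publish(*args)
        except (ConnectionResetError, socket.timeout):
            if attempt == retries:
                raise  # out of retries: let the caller report the failure
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * (2 ** attempt))


# Usage with a fake publisher that is reset twice, then succeeds.
calls = []


def flaky_publish(pk):
    calls.append(pk)
    if len(calls) < 3:
        # In Python 3, OSError with errno 104 is constructed as ConnectionResetError.
        raise OSError(errno.ECONNRESET, "Connection reset by peer")
    return "queued %d" % pk


print(enqueue_with_retry(flaky_publish, 42, sleep=lambda seconds: None))
```

Backoff only smooths over short blips. When the broker hosts are down for an extended period, as they were here, the request still fails after sleeping roughly `base_delay * (2**retries - 1)` seconds, so the view should still show the user a clear error.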
(Assignee)

Updated

7 years ago
Assignee: server-ops → cshields
(Assignee)

Updated

7 years ago
Depends on: 688853
(Assignee)

Comment 1

7 years ago
(Didn't realize that my previous post hit a mid-air collision and sat there unsubmitted.)

celery1 and celery2 had both run out of memory, and the OOM killer had done a number on the nodes.

Reset both; they are up now. Filed bug 688853 to investigate why we were not notified of these issues automatically.
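Bug 688853 was filed because nobody was alerted when celery1 and celery2 ran out of memory. As a hedged sketch of the kind of check that could flag this earlier, the functions below read `/proc/meminfo` and map available memory to the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The thresholds, the function names, and the assumption that the monitoring system runs Nagios-style plugins are all hypothetical; the `MemAvailable` field requires Linux 3.14 or newer.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def parse_meminfo(text):
    """Parse /proc/meminfo contents into {field: value in kB}."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def check_memory(text, warn_pct=20.0, crit_pct=10.0):
    """Return (exit_code, message) from MemAvailable as a % of MemTotal.

    The default thresholds are hypothetical; tune them per host.
    """
    info = parse_meminfo(text)
    total = info.get("MemTotal", 0)
    avail = info.get("MemAvailable")
    if total <= 0 or avail is None:
        return UNKNOWN, "UNKNOWN: MemTotal/MemAvailable not found"
    pct = 100.0 * avail / total
    if pct < crit_pct:
        return CRITICAL, "CRITICAL: %.1f%% memory available" % pct
    if pct < warn_pct:
        return WARNING, "WARNING: %.1f%% memory available" % pct
    return OK, "OK: %.1f%% memory available" % pct


def main(path="/proc/meminfo"):
    """Print the check result and return the Nagios exit code."""
    try:
        with open(path) as f:
            code, message = check_memory(f.read())
    except OSError as exc:
        code, message = UNKNOWN, "UNKNOWN: cannot read %s: %s" % (path, exc)
    print(message)
    return code
```

A plugin wrapper would end with `sys.exit(main())`; keeping the parsing in pure functions makes the check easy to test against captured `/proc/meminfo` text before pointing it at celery1 and celery2.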
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED

Updated

7 years ago
Whiteboard: [Add-on upload fails]
Product: mozilla.org → mozilla.org Graveyard