Timeout on AMQP for SUMO's stage servers.

RESOLVED FIXED

Status

Infrastructure & Operations
WebOps: Other
RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: mythmon, Assigned: cyliang)

Details

(Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/15] )

(Reporter)

Description

4 years ago
I've gotten about 200 of these tracebacks in my email over the weekend. The first one I can find is from 6:20pm PST on Friday. This appears to affect only SUMO's stage environment.

Traceback (most recent call last):
  File "manage.py", line 22, in <module>
    execute_from_command_line(sys.argv)
  File "/data/support-stage/src/support.allizom.org/kitsune/virtualenv/lib/python2.6/site-packages/django/core/management/__init__.py", line 399, in execute_from_command_line
    utility.execute()
  File "/data/support-stage/src/support.allizom.org/kitsune/virtualenv/lib/python2.6/site-packages/django/core/management/__init__.py", line 392, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/data/support-stage/src/support.allizom.org/kitsune/virtualenv/lib/python2.6/site-packages/django/core/management/base.py", line 242, in run_from_argv
    self.execute(*args, **options.__dict__)
  File "/data/support-stage/src/support.allizom.org/kitsune/virtualenv/lib/python2.6/site-packages/django/core/management/base.py", line 285, in execute
    output = self.handle(*args, **options)
  File "/data/support-stage/src/support.allizom.org/kitsune/virtualenv/lib/python2.6/site-packages/cronjobs/management/commands/cron.py", line 64, in handle
    registered[script](*args)
  File "/data/support-stage/www/support.allizom.org/kitsune/kitsune/wiki/cron.py", line 38, in generate_missing_share_links
    tasks.add_short_links.delay(list(document_ids))
  File "/data/support-stage/src/support.allizom.org/kitsune/virtualenv/lib/python2.6/site-packages/celery/app/task.py", line 357, in delay
    return self.apply_async(args, kwargs)
  File "/data/support-stage/src/support.allizom.org/kitsune/virtualenv/lib/python2.6/site-packages/celery/app/task.py", line 474, in apply_async
    **options)
  File "/data/support-stage/src/support.allizom.org/kitsune/virtualenv/lib/python2.6/site-packages/celery/app/amqp.py", line 250, in publish_task
    **kwargs
  File "/data/support-stage/src/support.allizom.org/kitsune/virtualenv/lib/python2.6/site-packages/kombu/messaging.py", line 164, in publish
    routing_key, mandatory, immediate, exchange, declare)
  File "/data/support-stage/src/support.allizom.org/kitsune/virtualenv/lib/python2.6/site-packages/kombu/connection.py", line 470, in _ensured
    interval_max)
  File "/data/support-stage/src/support.allizom.org/kitsune/virtualenv/lib/python2.6/site-packages/kombu/connection.py", line 396, in ensure_connection
    interval_start, interval_step, interval_max, callback)
  File "/data/support-stage/src/support.allizom.org/kitsune/virtualenv/lib/python2.6/site-packages/kombu/utils/__init__.py", line 217, in retry_over_time
    return fun(*args, **kwargs)
  File "/data/support-stage/src/support.allizom.org/kitsune/virtualenv/lib/python2.6/site-packages/kombu/connection.py", line 246, in connect
    return self.connection
  File "/data/support-stage/src/support.allizom.org/kitsune/virtualenv/lib/python2.6/site-packages/kombu/connection.py", line 761, in connection
    self._connection = self._establish_connection()
  File "/data/support-stage/src/support.allizom.org/kitsune/virtualenv/lib/python2.6/site-packages/kombu/connection.py", line 720, in _establish_connection
    conn = self.transport.establish_connection()
  File "/data/support-stage/src/support.allizom.org/kitsune/virtualenv/lib/python2.6/site-packages/kombu/transport/pyamqp.py", line 115, in establish_connection
    conn = self.Connection(**opts)
  File "/data/support-stage/src/support.allizom.org/kitsune/virtualenv/lib/python2.6/site-packages/amqp/connection.py", line 165, in __init__
    self.transport = create_transport(host, connect_timeout, ssl)
  File "/data/support-stage/src/support.allizom.org/kitsune/virtualenv/lib/python2.6/site-packages/amqp/transport.py", line 275, in create_transport
    return TCPTransport(host, connect_timeout)
  File "/data/support-stage/src/support.allizom.org/kitsune/virtualenv/lib/python2.6/site-packages/amqp/transport.py", line 89, in __init__
    raise socket.error(last_err)
socket.error: timed out

Possibly related: bug 1097118

Updated

4 years ago
Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/15]
(Assignee)

Comment 2

4 years ago
mythmon: Based on the first part of the traceback, I assume these errors come from some sort of task that can be invoked from the CLI. Could you point me at what those tasks might be so I can try to reproduce the errors deliberately?

Looking through the load balancer logs, I can't find any corresponding errors for attempts to reach either of the new SUMO rabbit nodes. =\

Comment 3

(In reply to C. Liang [:cyliang] from comment #2)

These are tasks that run on cron from the sumo admin node. From what I can tell, the webheads are able to talk to rabbit. Maybe it's just a network issue between the admin node and rabbit?
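
One way to test the network theory is to try opening a plain TCP connection to the broker from the admin node and again from a webhead. The following is only a minimal sketch: the helper name `can_reach` is hypothetical, and the host and port must be supplied by whoever runs it (5672 is the default AMQP port, but the port actually used by this cluster is not stated in this bug).

```python
import socket


def can_reach(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds.

    Hypothetical helper for illustration; it checks TCP reachability only and
    does not speak AMQP or authenticate against the broker.
    """
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except (socket.timeout, socket.error):
        # A timeout here would match the "socket.error: timed out" in the
        # traceback; a refused connection also lands in this branch.
        return False
    conn.close()
    return True
```

Calling `can_reach(broker_host, 5672)` on both machines, with `broker_host` replaced by the real rabbit address, would show whether only the admin node is blocked. A False result on the admin node alongside True on a webhead would be consistent with a missing ACL rather than a broker problem.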
(Assignee)

Updated

4 years ago
Depends on: 1121752
(Assignee)

Comment 4

4 years ago
Indeed! Bug filed for an ACL to the new rabbit cluster.

Updated

4 years ago
Assignee: server-ops-webops → cliang
(Assignee)

Comment 5

4 years ago
Have there been any additional AMQP timeouts to the new SUMO rabbitMQ cluster in staging since the ACL went into effect? (There should be no timeouts since this past Friday.)

Comment 6

(In reply to C. Liang [:cyliang] from comment #5)
> Have there been any additional AMQP timeouts to the new SUMO rabbitMQ
> cluster in staging since the ACL went into effect?  (There should be no
> timeouts since this past Friday.)

I haven't seen any in the past few days. bd
(Reporter)

Comment 7

4 years ago
I haven't got any of these for 5 days. I think we're done here, thank you!
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED