Closed Bug 588910 Opened 15 years ago Closed 15 years ago

Occasional timeouts from Rabbit when voting on SUMO

Categories

(mozilla.org Graveyard :: Server Operations, task)

All
Other
task
Not set
trivial

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: paulc, Assigned: oremj)

Details

Low priority, but I'm filing this because it's happening several times per day, every day, so it's worth investigating.

Traceback (most recent call last):
  File "/data/virtualenvs/kitsune/src/django/django/core/handlers/base.py", line 100, in get_response
    response = callback(request, *callback_args, **callback_kwargs)
  File "/data/virtualenvs/kitsune/src/django/django/views/decorators/http.py", line 37, in inner
    return func(request, *args, **kwargs)
  File "/data/www/support.mozilla.com/kitsune/apps/questions/views.py", line 338, in question_vote
    vote.save()
  File "/data/virtualenvs/kitsune/src/django/django/db/models/base.py", line 435, in save
    self.save_base(using=using, force_insert=force_insert, force_update=force_update)
  File "/data/virtualenvs/kitsune/src/django/django/db/models/base.py", line 543, in save_base
    created=(not record_exists), raw=raw)
  File "/data/virtualenvs/kitsune/src/django/django/dispatch/dispatcher.py", line 162, in send
    response = receiver(signal=self, sender=sender, **named)
  File "/data/www/support.mozilla.com/kitsune/apps/questions/models.py", line 385, in send_vote_update_task
    update_question_votes.delay(q)
  File "/data/virtualenvs/kitsune/lib/python2.6/site-packages/celery/task/base.py", line 304, in delay
    return self.apply_async(args, kwargs)
  File "/data/virtualenvs/kitsune/lib/python2.6/site-packages/celery/task/base.py", line 321, in apply_async
    return apply_async(self, args, kwargs, **options)
  File "/data/virtualenvs/kitsune/lib/python2.6/site-packages/celery/messaging.py", line 229, in _inner
    return fun(*args, **kwargs)
  File "/data/virtualenvs/kitsune/lib/python2.6/site-packages/celery/execute/__init__.py", line 87, in apply_async
    countdown=countdown, eta=eta, **options)
  File "/data/virtualenvs/kitsune/lib/python2.6/site-packages/celery/messaging.py", line 80, in delay_task
    self.send(message_data, **extract_msg_options(kwargs))
  File "/data/virtualenvs/kitsune/lib/python2.6/site-packages/carrot/messaging.py", line 762, in send
    headers=headers)
  File "/data/virtualenvs/kitsune/lib/python2.6/site-packages/carrot/backends/pyamqplib.py", line 330, in publish
    ret = self.channel.basic_publish(message, exchange=exchange,
  File "/data/virtualenvs/kitsune/lib/python2.6/site-packages/carrot/backends/pyamqplib.py", line 179, in channel
    self._channel_ref = weakref.ref(self.connection.get_channel())
  File "/data/virtualenvs/kitsune/lib/python2.6/site-packages/carrot/connection.py", line 150, in get_channel
    return self.connection.channel()
  File "/data/virtualenvs/kitsune/lib/python2.6/site-packages/carrot/connection.py", line 120, in connection
    self._connection = self._establish_connection()
  File "/data/virtualenvs/kitsune/lib/python2.6/site-packages/carrot/connection.py", line 133, in _establish_connection
    return self.create_backend().establish_connection()
  File "/data/virtualenvs/kitsune/lib/python2.6/site-packages/carrot/backends/pyamqplib.py", line 195, in establish_connection
    connect_timeout=conninfo.connect_timeout)
  File "/data/virtualenvs/kitsune/lib/python2.6/site-packages/amqplib/client_0_8/connection.py", line 125, in __init__
    self.transport = create_transport(host, connect_timeout, ssl)
  File "/data/virtualenvs/kitsune/lib/python2.6/site-packages/amqplib/client_0_8/transport.py", line 220, in create_transport
    return TCPTransport(host, connect_timeout)
  File "/data/virtualenvs/kitsune/lib/python2.6/site-packages/amqplib/client_0_8/transport.py", line 58, in __init__
    self.sock.connect((host, port))
  File "<string>", line 1, in connect
timeout: timed out
This is a timeout when connecting from the web node to Rabbit. (You can tell because the top of the stack trace is the base handler.)
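The traceback shows a model save signal receiver (`send_vote_update_task`) calling `update_question_votes.delay(q)` inline, so a broker connection timeout propagates up and fails the whole vote request. One possible mitigation, sketched here and not something proposed in this bug, is to catch socket-level errors around the enqueue call so the vote is still saved. `enqueue_safely` is a hypothetical helper name, not part of kitsune, celery, or carrot.

```python
import logging
import socket

log = logging.getLogger(__name__)


def enqueue_safely(enqueue, *args, **kwargs):
    """Call enqueue(*args, **kwargs), swallowing broker connection errors.

    Hypothetical helper: returns True if the call succeeded and False if a
    socket-level error (including a connect timeout) was raised.
    """
    try:
        enqueue(*args, **kwargs)
    except (socket.timeout, socket.error) as exc:
        # The task is lost in this sketch; a real fix might retry or
        # record the work somewhere for a later catch-up job.
        log.warning("Could not queue task: %s", exc)
        return False
    return True
```

In the receiver this would be used roughly as `enqueue_safely(update_question_votes.delay, q)`. The trade-off is that a dropped task means the vote count is not refreshed until the next successful update.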
Summary: Daily timeouts from celery/carrot when voting on SUMO → Occasional timeouts from Rabbit when voting on SUMO
Can you tell which machines are having trouble connecting? We have been having a bunch of problems with rabbit on amo, so this doesn't surprise me.
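One rough way to check which machines have trouble reaching the broker is to run a small TCP connect probe from each web node. The sketch below is an assumption about how that could be done, not something used in this bug; `can_connect` is a hypothetical name, and the default port 5672 is the standard AMQP port, which may differ from the actual deployment.

```python
import socket


def can_connect(host, port=5672, timeout=3.0):
    """Return True if a TCP connection to (host, port) opens within timeout.

    Hypothetical probe: it only tests TCP reachability, not AMQP login.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
    sock.close()
    return True
```

Running this periodically on each node and logging the failures with the local hostname would show whether the timeouts cluster on particular machines.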
Here is one of the IP addresses from the error email: 'SERVER_ADDR': '10.2.81.56'. If the problem isn't specific to one of the generic0* machines, then we might need to find a way to get that info into the WSGI environment.
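Getting the machine identity into the WSGI environment could be done with a small middleware, so that it appears in the request environment alongside `SERVER_ADDR`. This is a minimal sketch under assumptions: `HostnameMiddleware` and the `x.server_hostname` key are hypothetical names, not existing kitsune code.

```python
import socket


class HostnameMiddleware(object):
    """WSGI middleware that records the local hostname in the environ.

    Hypothetical sketch: the environ key name is an arbitrary choice.
    """

    def __init__(self, app, key="x.server_hostname"):
        self.app = app
        self.key = key
        # Looked up once at startup; the hostname does not change per request.
        self.hostname = socket.gethostname()

    def __call__(self, environ, start_response):
        environ[self.key] = self.hostname
        return self.app(environ, start_response)
```

Wrapping the WSGI application object, for example `application = HostnameMiddleware(application)`, would be enough for the key to appear wherever the environ is reported.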
Assignee: server-ops → jeremy.orem+bugs
Is this still happening? The only thing I can recommend is upgrading to rabbit 2. Do you want me to do that?
I haven't seen it in a while. It seems like the upgrade to Rabbit2 worked out for AMO, though, correct?
Yeah, slightly different problem though. Their rabbit instance was getting completely overloaded and crashing.
SUMO was upgraded to rabbit 2 when we switched it to the new cluster.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard