Closed Bug 588469 Opened 15 years ago Closed 14 years ago

Email notifications are not being sent out through celery

Categories

(support.mozilla.org :: Knowledge Base Software, task)

All
Other
task
Not set
blocker

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: paulc, Unassigned)

Details

We've had several people say that no email notifications have been sent out since this morning. Looking at the celery logs might help figure out why. There have been some rabbitmq timeouts as well. Rebecca says that emails were being sent out after our 2.2.1 push, so it's unlikely to be a code issue.
Here's a traceback of a timeout (although not email-related):
Traceback (most recent call last):
  File "/data/virtualenvs/kitsune/src/django/django/core/handlers/base.py", line 100, in get_response
    response = callback(request, *callback_args, **callback_kwargs)
  File "/data/virtualenvs/kitsune/src/django/django/views/decorators/http.py", line 37, in inner
    return func(request, *args, **kwargs)
  File "/data/www/support.mozilla.com/kitsune/apps/questions/views.py", line 338, in question_vote
    vote.save()
  File "/data/virtualenvs/kitsune/src/django/django/db/models/base.py", line 435, in save
    self.save_base(using=using, force_insert=force_insert, force_update=force_update)
  File "/data/virtualenvs/kitsune/src/django/django/db/models/base.py", line 543, in save_base
    created=(not record_exists), raw=raw)
  File "/data/virtualenvs/kitsune/src/django/django/dispatch/dispatcher.py", line 162, in send
    response = receiver(signal=self, sender=sender, **named)
  File "/data/www/support.mozilla.com/kitsune/apps/questions/models.py", line 385, in send_vote_update_task
    update_question_votes.delay(q)
  File "/data/virtualenvs/kitsune/lib/python2.6/site-packages/celery/task/base.py", line 304, in delay
    return self.apply_async(args, kwargs)
  File "/data/virtualenvs/kitsune/lib/python2.6/site-packages/celery/task/base.py", line 321, in apply_async
    return apply_async(self, args, kwargs, **options)
  File "/data/virtualenvs/kitsune/lib/python2.6/site-packages/celery/messaging.py", line 229, in _inner
    return fun(*args, **kwargs)
  File "/data/virtualenvs/kitsune/lib/python2.6/site-packages/celery/execute/__init__.py", line 87, in apply_async
    countdown=countdown, eta=eta, **options)
  File "/data/virtualenvs/kitsune/lib/python2.6/site-packages/celery/messaging.py", line 80, in delay_task
    self.send(message_data, **extract_msg_options(kwargs))
  File "/data/virtualenvs/kitsune/lib/python2.6/site-packages/carrot/messaging.py", line 762, in send
    headers=headers)
  File "/data/virtualenvs/kitsune/lib/python2.6/site-packages/carrot/backends/pyamqplib.py", line 330, in publish
    ret = self.channel.basic_publish(message, exchange=exchange,
  File "/data/virtualenvs/kitsune/lib/python2.6/site-packages/carrot/backends/pyamqplib.py", line 179, in channel
    self._channel_ref = weakref.ref(self.connection.get_channel())
  File "/data/virtualenvs/kitsune/lib/python2.6/site-packages/carrot/connection.py", line 150, in get_channel
    return self.connection.channel()
  File "/data/virtualenvs/kitsune/lib/python2.6/site-packages/carrot/connection.py", line 120, in connection
    self._connection = self._establish_connection()
  File "/data/virtualenvs/kitsune/lib/python2.6/site-packages/carrot/connection.py", line 133, in _establish_connection
    return self.create_backend().establish_connection()
  File "/data/virtualenvs/kitsune/lib/python2.6/site-packages/carrot/backends/pyamqplib.py", line 195, in establish_connection
    connect_timeout=conninfo.connect_timeout)
  File "/data/virtualenvs/kitsune/lib/python2.6/site-packages/amqplib/client_0_8/connection.py", line 125, in __init__
    self.transport = create_transport(host, connect_timeout, ssl)
  File "/data/virtualenvs/kitsune/lib/python2.6/site-packages/amqplib/client_0_8/transport.py", line 220, in create_transport
    return TCPTransport(host, connect_timeout)
  File "/data/virtualenvs/kitsune/lib/python2.6/site-packages/amqplib/client_0_8/transport.py", line 58, in __init__
    self.sock.connect((host, port))
  File "<string>", line 1, in connect
timeout: timed out
Assignee: server-ops → shyam
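The traceback above ends in a plain TCP connect timing out while amqplib opens its transport to the broker. A minimal sketch of checking that step in isolation from a web host is below. It uses only the Python standard library (modern Python 3, not the 2.6 environment in the traceback); the function name and the 4-second timeout are assumptions for illustration, and 5672 is the default AMQP port, which may not match the real deployment.

```python
import socket


def broker_reachable(host, port=5672, timeout=4.0):
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers connect timeouts, refused connections, and DNS lookup failures.
        return False
    sock.close()
    return True
```

Calling something like `broker_reachable("rabbitmq.example.internal")` (a hypothetical hostname) from each web head would only show whether the TCP connect succeeds from that machine. It says nothing about AMQP authentication or queue state.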
So Jeremy and I looked at it but can't find any obvious issues. Pushing this to Jeremy since I'm heading to bed in a bit.
Assignee: shyam → jeremy.orem+bugs
So, as James suggested, it's worth having someone physically watch all the logs stream in while a couple people (QA, sumodev) work to trigger both types of notifications. That way we can get a better sense of what's happening and where exactly something goes wrong.
Looks like this was a case of celery not being restarted after the last site upgrade.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Sorry to rain on this parade but emails are still not being received for questions/answers. They work on the question we tested, though! (Wtf?)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
How do you want to debug this? Can you add more logging around this area?
We can add more logging, but I'm still not sure what to look for. Some thoughts:
* Are there any differences between stage and production that could be causing this (other than the volume of traffic)?
* Is it possible that some of the processes haven't been killed and celery is still not picking up tasks?
Tasks are logged when they run, so if we confirm that tasks fire off every time, then we can try pinning celery to master, like here: http://github.com/jsocol/kitsune/commit/b3037299dfbd7d86271c9482e5518cdcea631a6b
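As a rough illustration of the "confirm that tasks fire off every time" idea, here is a minimal sketch of a logging wrapper. It is plain Python with the standard `logging` module, not celery's API. The logger name `task_audit`, the decorator `logged`, and the stand-in function `send_notification` are all hypothetical and do not come from the kitsune code.

```python
import functools
import logging

log = logging.getLogger("task_audit")  # hypothetical logger name


def logged(func):
    """Log when the wrapped callable starts, finishes, or raises."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        log.info("start %s args=%r kwargs=%r", func.__name__, args, kwargs)
        try:
            result = func(*args, **kwargs)
        except Exception:
            # Record the traceback, then let the caller see the failure too.
            log.exception("failed %s", func.__name__)
            raise
        log.info("done %s", func.__name__)
        return result
    return wrapper


@logged
def send_notification(question_id):
    # Stand-in for a real task body; it only builds a string.
    return "notified %d" % question_id
```

If every queued notification produces a matching start/done pair in the log, the problem is more likely downstream of the task (mail delivery) than in task pickup.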
Shyam mentioned that we were using generic06's sendmail facility instead of configuring the normal outgoing servers in settings_local.py. He also mentioned that he was seeing outgoing mail in the sendmail log, with stat=Sent.
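One way to match notifications against the sendmail log is to tally delivery status per recipient. The sketch below assumes each delivery line carries a `to=<address>` field and a `stat=<status>` field, as the `stat=Sent` entries mentioned above suggest. The regular expression, the function name, and the sample lines are illustrative assumptions; the samples are made up and are not real log output from generic06.

```python
import re
from collections import Counter

# Assumption: delivery lines contain "to=<address>" followed later by "stat=<status>".
LINE_RE = re.compile(r"to=<?(?P<to>[^>,\s]+)>?.*?\bstat=(?P<stat>\w+)")


def tally_delivery_status(lines):
    """Count (recipient, status) pairs found in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[(match.group("to"), match.group("stat"))] += 1
    return counts


# Made-up sample lines, for illustration only.
SAMPLE_LINES = [
    "sendmail[100]: id1: to=<alice@example.com>, delay=00:00:01, stat=Sent (ok)",
    "sendmail[101]: id2: to=<bob@example.com>, delay=00:05:00, stat=Deferred: timed out",
    "sendmail[102]: id3: from=<noreply@example.com>, size=1024",
]
```

Running `tally_delivery_status` over the real log and comparing the recipients with the users who should have been notified would show whether mail is missing before or after it reaches sendmail.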
What changes should I make? Zamboni's only email config in settings_local.py is:
EMAIL_BACKEND = 'django.core.mail.backends.smtp.EmailBackend'
Looks like kitsune is the same.
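For context, when only EMAIL_BACKEND is set, Django's SMTP backend falls back to its defaults of EMAIL_HOST = 'localhost' and EMAIL_PORT = 25, which is consistent with mail going through the local sendmail instance. A sketch of what pointing at an explicit outgoing server might look like follows. The hostname is a placeholder and not one of the real outgoing servers.

```python
# settings_local.py (sketch; the host below is a placeholder, not a real server)
EMAIL_BACKEND = 'django.core.mail.backends.smtp.EmailBackend'
EMAIL_HOST = 'smtp.example.com'  # hypothetical relay; Django's default is 'localhost'
EMAIL_PORT = 25                  # Django's default SMTP port
```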
(In reply to comment #9)
> What changes should i make?
Shyam? ^ You're the one that pointed out we shouldn't(?) be using the sendmail instance on generic06.
(In reply to comment #10)
> Shyam? ^ You're the one that pointed out we shouldn't(?) be using the sendmail
> instance on generic06.
I just don't like sendmail, but it looks like it's configured alright so I'm not sure what more we can do here apart from replacing sendmail with postfix. I've had pretty bad experiences with sendmail in the past and debugging is a real pain sometimes.
I also agree that sendmail is horrible, but it is using the same base sendmail configs as AMO, so I doubt that is the problem here.
Logging is in place. Let me know if you need help matching up stuff in the mail logs.
Assignee: jeremy.orem+bugs → nobody
Component: Server Operations → Knowledge Base Software
Product: mozilla.org → support.mozilla.com
QA Contact: mrz → kb-software
Version: other → unspecified
Everything seems to be working (knock on wood).
Status: REOPENED → RESOLVED
Closed: 15 years ago → 14 years ago
Resolution: --- → FIXED