Closed Bug 588469 Opened 14 years ago Closed 13 years ago

Email notifications are not being sent out through celery

Product: support.mozilla.org :: Knowledge Base Software
Platform: All (hardware), Other (OS)
Type: task
Priority: Not set
Severity: blocker
Tracking: Not tracked
Status: RESOLVED FIXED
Reporter: paulc
Assignee: Unassigned

We've had several people say that no email notifications have been sent out since this morning.

Perhaps looking at the celery logs would help figure out why this happens. There have been some rabbitmq timeouts as well.

Rebecca says that emails were being sent out post our 2.2.1-push, so it's unlikely to be a code issue.
Here's a traceback of a timeout (although not email-related):

Traceback (most recent call last):

  File "/data/virtualenvs/kitsune/src/django/django/core/handlers/base.py", line 100, in get_response
    response = callback(request, *callback_args, **callback_kwargs)

  File "/data/virtualenvs/kitsune/src/django/django/views/decorators/http.py", line 37, in inner
    return func(request, *args, **kwargs)

  File "/data/www/support.mozilla.com/kitsune/apps/questions/views.py", line 338, in question_vote
    vote.save()

  File "/data/virtualenvs/kitsune/src/django/django/db/models/base.py", line 435, in save
    self.save_base(using=using, force_insert=force_insert, force_update=force_update)

  File "/data/virtualenvs/kitsune/src/django/django/db/models/base.py", line 543, in save_base
    created=(not record_exists), raw=raw)

  File "/data/virtualenvs/kitsune/src/django/django/dispatch/dispatcher.py", line 162, in send
    response = receiver(signal=self, sender=sender, **named)

  File "/data/www/support.mozilla.com/kitsune/apps/questions/models.py", line 385, in send_vote_update_task
    update_question_votes.delay(q)

  File "/data/virtualenvs/kitsune/lib/python2.6/site-packages/celery/task/base.py", line 304, in delay
    return self.apply_async(args, kwargs)

  File "/data/virtualenvs/kitsune/lib/python2.6/site-packages/celery/task/base.py", line 321, in apply_async
    return apply_async(self, args, kwargs, **options)

  File "/data/virtualenvs/kitsune/lib/python2.6/site-packages/celery/messaging.py", line 229, in _inner
    return fun(*args, **kwargs)

  File "/data/virtualenvs/kitsune/lib/python2.6/site-packages/celery/execute/__init__.py", line 87, in apply_async
    countdown=countdown, eta=eta, **options)

  File "/data/virtualenvs/kitsune/lib/python2.6/site-packages/celery/messaging.py", line 80, in delay_task
    self.send(message_data, **extract_msg_options(kwargs))

  File "/data/virtualenvs/kitsune/lib/python2.6/site-packages/carrot/messaging.py", line 762, in send
    headers=headers)

  File "/data/virtualenvs/kitsune/lib/python2.6/site-packages/carrot/backends/pyamqplib.py", line 330, in publish
    ret = self.channel.basic_publish(message, exchange=exchange,

  File "/data/virtualenvs/kitsune/lib/python2.6/site-packages/carrot/backends/pyamqplib.py", line 179, in channel
    self._channel_ref = weakref.ref(self.connection.get_channel())

  File "/data/virtualenvs/kitsune/lib/python2.6/site-packages/carrot/connection.py", line 150, in get_channel
    return self.connection.channel()

  File "/data/virtualenvs/kitsune/lib/python2.6/site-packages/carrot/connection.py", line 120, in connection
    self._connection = self._establish_connection()

  File "/data/virtualenvs/kitsune/lib/python2.6/site-packages/carrot/connection.py", line 133, in _establish_connection
    return self.create_backend().establish_connection()

  File "/data/virtualenvs/kitsune/lib/python2.6/site-packages/carrot/backends/pyamqplib.py", line 195, in establish_connection
    connect_timeout=conninfo.connect_timeout)

  File "/data/virtualenvs/kitsune/lib/python2.6/site-packages/amqplib/client_0_8/connection.py", line 125, in __init__
    self.transport = create_transport(host, connect_timeout, ssl)

  File "/data/virtualenvs/kitsune/lib/python2.6/site-packages/amqplib/client_0_8/transport.py", line 220, in create_transport
    return TCPTransport(host, connect_timeout)

  File "/data/virtualenvs/kitsune/lib/python2.6/site-packages/amqplib/client_0_8/transport.py", line 58, in __init__
    self.sock.connect((host, port))

  File "<string>", line 1, in connect

timeout: timed out
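The last frames above show the failure in a plain TCP connect to the broker, before any AMQP traffic is exchanged. A minimal sketch for checking that from a web host, assuming the broker listens on the default AMQP port 5672. The helper name `broker_reachable` is hypothetical, and the code targets current Python 3 rather than the Python 2.6 seen in the traceback:

```python
import socket


def broker_reachable(host, port=5672, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout seconds.

    This only tests the network path that timed out in the traceback;
    it does not speak AMQP or check credentials.
    """
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
    conn.close()
    return True
```

Running something like this in a loop from each web head while the timeouts occur could help tell a network or RabbitMQ problem apart from a celery worker problem.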
Assignee: server-ops → shyam
So Jeremy and I looked at it but can't find any obvious issues. Pushing this to Jeremy since I'm heading to bed in a bit.
Assignee: shyam → jeremy.orem+bugs
So, as James suggested, it's worth having someone physically watch all the logs stream in while a couple people (QA, sumodev) work to trigger both types of notifications. That way we can get a better sense of what's happening and where exactly something goes wrong.
Looks like this was a case of celery not being restarted after the last site upgrade.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Sorry to rain on this parade but emails are still not being received for questions/answers. They work on the question we tested, though! (Wtf?)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
How do you want to debug this? Can you add more logging around this area?
We can add more logging, but I'm still not sure what to look for. Some thoughts:

* Are there any differences between stage and production that could be causing this (other than traffic volume)?
* Is it possible that some of the processes haven't been killed and celery is still not picking up tasks?

Tasks are logged when they run, so if we confirm that tasks fire off every time, then we can try pinning celery to master, like here:
http://github.com/jsocol/kitsune/commit/b3037299dfbd7d86271c9482e5518cdcea631a6b
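To confirm that tasks fire every time, the task bodies could be wrapped in explicit start/finish logging. This is only a sketch using the standard `logging` module; the decorator `logged_task`, the logger name, and the stand-in task `send_notification` are hypothetical and not taken from kitsune or celery:

```python
import functools
import logging

log = logging.getLogger("tasks")


def logged_task(func):
    """Log when a task body starts, finishes, or raises."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        log.info("task %s started args=%r kwargs=%r", func.__name__, args, kwargs)
        try:
            result = func(*args, **kwargs)
        except Exception:
            # Record the traceback, then let the worker see the failure.
            log.exception("task %s failed", func.__name__)
            raise
        log.info("task %s finished", func.__name__)
        return result
    return wrapper


@logged_task
def send_notification(question_id):
    # Stand-in for the real email task body.
    return "queued mail for question %d" % question_id
```

A "started" line with no matching "finished" line would point at the task itself; no "started" line at all would point at the worker not picking the task up.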
Shyam mentioned that we were using generic06's sendmail facility instead of configuring the normal outgoing servers in settings_local.py. He also mentioned that he was seeing outgoing mail in the sendmail log, with stat=Sent.
What changes should I make?

Zamboni's only email config in settings_local.py is:

EMAIL_BACKEND = 'django.core.mail.backends.smtp.EmailBackend'


Looks like kitsune is the same.
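With only `EMAIL_BACKEND` set, Django's SMTP backend falls back to its defaults, which means mail is handed to whatever listens on localhost port 25 (here, the local sendmail). A sketch of what spelling those defaults out in settings_local.py would look like; the values shown are Django's documented defaults, not settings copied from either site:

```python
# settings_local.py (sketch): the SMTP backend with Django's defaults made explicit.
EMAIL_BACKEND = 'django.core.mail.backends.smtp.EmailBackend'

# These match Django's built-in defaults. Changing EMAIL_HOST would be the
# way to point at a dedicated outgoing mail server instead of local sendmail.
EMAIL_HOST = 'localhost'
EMAIL_PORT = 25
EMAIL_HOST_USER = ''
EMAIL_HOST_PASSWORD = ''
EMAIL_USE_TLS = False
```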
(In reply to comment #9)
> What changes should i make?

Shyam? ^ You're the one that pointed out we shouldn't(?) be using the sendmail instance on generic06.
(In reply to comment #10)

> Shyam? ^ You're the one that pointed out we shouldn't(?) be using the sendmail
> instance on generic06.

I just don't like sendmail, but it looks like it's configured alright so I'm not sure what more we can do here apart from replacing sendmail with postfix. I've had pretty bad experiences with sendmail in the past and debugging is a real pain sometimes.
I also agree that sendmail is horrible, but it is using the same base sendmail configs as AMO, so I doubt that is the problem here.
Logging is in place. Let me know if you need help matching up stuff in the mail logs.
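For matching application logs against the mail logs, one option is to tally delivery status per recipient. The sketch below assumes the usual sendmail syslog layout with `to=<address>` and `stat=...` fields; the function name `delivery_stats` and the sample lines are made up for illustration and are not taken from generic06:

```python
import re
from collections import Counter

# Matches the recipient and the first word of the status on a delivery line.
LINE_RE = re.compile(r"to=<?(?P<to>[^>,\s]+)>?,.*\bstat=(?P<stat>\w+)")

# Made-up sample lines in the assumed sendmail log format.
SAMPLE = [
    "Aug 18 10:00:01 mailhost sendmail[1234]: o7IH01ab001234: "
    "to=<alice@example.com>, delay=00:00:01, mailer=esmtp, stat=Sent (OK)",
    "Aug 18 10:00:02 mailhost sendmail[1235]: o7IH02cd001235: "
    "to=<bob@example.com>, delay=00:05:00, mailer=esmtp, "
    "stat=Deferred: Connection timed out",
    "Aug 18 10:00:03 mailhost sendmail[1236]: o7IH03ef001236: "
    "from=<notifications@example.com>, size=1024, nrcpts=1",
]


def delivery_stats(lines):
    """Count (recipient, status) pairs across sendmail delivery lines."""
    stats = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            stats[(match.group("to"), match.group("stat"))] += 1
    return stats
```

Comparing these counts with the recipients logged by the application would show whether a missing notification never reached sendmail or was accepted and then lost downstream.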
Assignee: jeremy.orem+bugs → nobody
Component: Server Operations → Knowledge Base Software
Product: mozilla.org → support.mozilla.com
QA Contact: mrz → kb-software
Version: other → unspecified
Everything seems to be working (knock on wood).
Status: REOPENED → RESOLVED
Closed: 14 years ago → 13 years ago
Resolution: --- → FIXED