Closed Bug 1339288 Opened 7 years ago Closed 6 years ago

Consider using Celery acks_late to prevent loss of tasks if worker crashes

Tracking

(Not tracked)

Status:

RESOLVED WONTFIX

People

(Reporter: emorley, Unassigned)

Details

Ed Morley [:emorley]

Reporter

Description

•

7 years ago

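Turns out that if a Celery worker crashes mid-task we can lose that task, since by default the message is acknowledged just before the task runs rather than after it completes.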

Whilst that should be rare, I wonder if we can also hit this case when Heroku terminates dynos, either during deploys or at the 24-hourly dyno restart.

"""
Should I use retry or acks_late?

Answer: Depends. It’s not necessarily one or the other, you may want to use both.

Task.retry is used to retry tasks, notably for expected errors that is catchable with the try: block. The AMQP transaction is not used for these errors: if the task raises an exception it is still acknowledged!

The acks_late setting would be used when you need the task to be executed again if the worker (for some reason) crashes mid-execution. It’s important to note that the worker is not known to crash, and if it does it is usually an unrecoverable error that requires human intervention (bug in the worker, or task code).

...

So use retry for Python errors, and if your task is idempotent combine that with acks_late if that level of reliability is required.
"""

See:
http://docs.celeryproject.org/en/3.1/faq.html#faq-acks-late-vs-retry
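
To illustrate the difference the FAQ describes, here is a minimal toy model in plain Python. It is not Celery code and not how Celery or the broker are implemented: `ToyBroker`, `WorkerCrash`, `run_worker` and `simulate` are hypothetical names invented for this sketch, and the only assumption is that an unacknowledged message is redelivered once the worker is gone. In Celery itself the late behaviour is opted into per task with `acks_late=True`.

```python
from collections import deque


class WorkerCrash(Exception):
    """Stands in for a worker process dying mid-task (hypothetical)."""


class ToyBroker:
    """Toy in-memory queue: a message is gone once acked, and any
    delivered-but-unacked message is requeued when the worker dies."""

    def __init__(self, messages):
        self.queue = deque(messages)
        self.unacked = []

    def get(self):
        msg = self.queue.popleft()
        self.unacked.append(msg)
        return msg

    def ack(self, msg):
        self.unacked.remove(msg)

    def worker_lost(self):
        # Redeliver everything the dead worker never acknowledged.
        self.queue.extend(self.unacked)
        self.unacked.clear()


def run_worker(broker, handler, acks_late):
    """Process messages until the queue is empty or the worker crashes."""
    while broker.queue:
        msg = broker.get()
        if not acks_late:
            broker.ack(msg)      # early ack: acknowledged before the task runs
        try:
            handler(msg)
        except WorkerCrash:
            broker.worker_lost()
            return False         # the worker died
        if acks_late:
            broker.ack(msg)      # late ack: acknowledged only after the task ran
    return True


def simulate(acks_late):
    """Crash once on task "b", restart the worker, return completed tasks."""
    done = []
    crashed = set()

    def handler(msg):
        if msg == "b" and msg not in crashed:
            crashed.add(msg)
            raise WorkerCrash(msg)
        done.append(msg)

    broker = ToyBroker(["a", "b", "c"])
    while not run_worker(broker, handler, acks_late):
        pass  # "restart" the worker after a crash
    return done


print(simulate(acks_late=False))  # task "b" is lost
print(simulate(acks_late=True))   # task "b" is redelivered and completes
```

With early acknowledgement the crashed task never completes. With late acknowledgement it is run a second time after the restart, which is why the FAQ only recommends `acks_late` for idempotent tasks.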

Ed Morley [:emorley]

Reporter

Updated

•

7 years ago

Component: Treeherder → Treeherder: Infrastructure

Ed Morley [:emorley]

Reporter

Comment 1

•

6 years ago

We never see worker crashes.

Status: NEW → RESOLVED

Closed: 6 years ago

Resolution: --- → WONTFIX

Categories

(Tree Management :: Treeherder: Infrastructure, defect, P2)
