Closed Bug 1337717 Opened 7 years ago Closed 5 years ago

Update to Celery/Kombu 4.x

Categories

(Tree Management :: Treeherder, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: emorley)

Attachments

(1 file, 1 obsolete file)

Broken out of bug 1333079 since the original landing had to be reverted.

We're currently using Celery/Kombu 3.x and at some point should try again switching to the latest 4.x series.

One of the smaller but helpful fixes this brings for us is the change to py-amqp that displays clearer error messages when the RabbitMQ password is incorrect (which has bitten us many times in the past). Only Celery/Kombu 4.x are compatible with this newer py-amqp.

Since the original landing attempt in bug 1333079, some possible regressions in Celery 4.x were reported in the celery issue tracker:
https://github.com/celery/celery/issues/3814
https://github.com/celery/celery/issues/3737

Given that, I'm inclined to wait a bit longer for Celery 4.x to mature first.

When we do attempt this again, we'll need to reland:
https://github.com/mozilla/treeherder/commit/f09694e3ccb60393ea171e2923a0633186049124
https://github.com/mozilla/treeherder/commit/1db3e2baf46c5df426dc234657d94e9585668456

...and then fix the cause of everything being put in the "default" queue (bug 1333079 comment 9).
(In reply to Ed Morley [:emorley] from comment #0)
> Since the original landing attempt in bug 1333079, some possible regressions
> in Celery 4.x were reported in the celery issue tracker:
> https://github.com/celery/celery/issues/3814
> https://github.com/celery/celery/issues/3737

Both of those issues are now resolved.
In addition to improving the password rotation UX (see comment 0) and Python 3 support, I believe upgrading will also improve performance due to:
https://github.com/celery/celery/pull/4292

...and also mean startup exceptions aren't silently ignored:
https://github.com/celery/celery/pull/4146
Assignee: nobody → emorley
Status: NEW → ASSIGNED
Priority: P3 → P1
Even after trying with the newer versions of celery/kombu and following the now-updated docs for the celery 3->4 migration, I encountered the same issue locally that caused us to have to revert the update the first time.

If this is a Celery bug I would have expected others to hit it too, so I can only think we've always been using queues/exchanges incorrectly, and this has just exposed the issue.

I've not been able to figure out what the issue is so far, so I'm going to punt on this for now.

Assignee: emorley → nobody
Status: ASSIGNED → NEW
Priority: P1 → P2
Priority: P2 → P3
Blocks: 1470243
Blocks: 1529243
Blocks: 1529404

I managed to figure out the issue that caused the problems the last two times I tried to upgrade Celery/Kombu. There is an undocumented breaking change in how apply_async() can be called: if you specify only routing_key (and not also queue), Celery/Kombu 4.x now ignore the routing key entirely, whereas 3.x honoured it.
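
For anyone hitting the same thing, here is a rough sketch of one way to guard against it: always pass queue explicitly when scheduling a task. This is not necessarily the change that landed for this bug. The helper name publish_to_queue, the RecordingTask stand-in and the queue name "example_queue" are all hypothetical, and the sketch assumes each queue's routing key is the same as its name.

```python
class RecordingTask:
    """Hypothetical stand-in for a Celery task; it only records the options
    passed to apply_async() so the sketch runs without a broker."""

    def __init__(self):
        self.calls = []

    def apply_async(self, args=None, kwargs=None, **options):
        self.calls.append({"args": args, "kwargs": kwargs, "options": options})


def publish_to_queue(task, queue_name, *args, **kwargs):
    """Schedule `task`, naming the queue explicitly.

    Passing only routing_key is what this bug reports as being ignored under
    Celery/Kombu 4.x, so queue is always supplied here. Assumption: the
    routing key matches the queue name.
    """
    return task.apply_async(
        args=args,
        kwargs=kwargs,
        queue=queue_name,
        routing_key=queue_name,
    )


task = RecordingTask()
publish_to_queue(task, "example_queue", {"id": 1})
```

With a real Celery task in place of RecordingTask, the same helper would be called as publish_to_queue(some_task, "example_queue", payload), and a grep for apply_async calls that pass routing_key without queue should find the call sites that need updating.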

I'm keen to get this landed before I leave since it unblocks Python 3.7, will hopefully help prevent the pulse listener hang seen in bug 1529404, and also unblocks bug 1470243 which can help avoid task loss in certain cases during restarts/deploys.

Assignee: nobody → emorley
Status: NEW → ASSIGNED
Priority: P3 → P1
Attachment #9030679 - Attachment is obsolete: true
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED