[remo] Celery not working after upgrade on reps-dev.allizom.org

VERIFIED FIXED

Status

Infrastructure & Operations Graveyard
WebOps: Engagement
2 years ago
2 years ago

People

(Reporter: nemo, Assigned: ericz)

Tracking

Details

(Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/2995] )

(Reporter)

Description

2 years ago
As part of the ongoing Django upgrade [1], we upgraded celery to version 3.1.23 because the old version was not compatible. While debugging the deployment on dev, it looks like the celery service is not working. Unfortunately, chief doesn't give us any logs. Here are the chief log entries that show the celery service failing:

[2016-05-18 17:45:55] Running update_celery
[2016-05-18 17:45:57] [python1.dev.webapp.phx1.mozilla.com] running: /sbin/service celeryd-reps-dev restart
[2016-05-18 17:45:59] [python1.dev.webapp.phx1.mozilla.com] finished: /sbin/service celeryd-reps-dev restart (2.306s)
[python1.dev.webapp.phx1.mozilla.com] out: Restarting celery-reps-dev: celery-reps-dev: ERROR (not running) 

* Can you tell us which command spawns the celery workers, so we can debug this locally?
* Does supervisord show any logs related to the celery service failures?

Mozillians.org has exactly the same setup, and celery is working fine there. Can you adapt the supervisord configuration for `celeryd-reps-dev` based on `celeryd-mozillians-dev`? The same applies to celery beat.

FYI, for now this only affects dev (reps-dev.allizom.org), since the changes have not been propagated to the other environments yet.

[1] bug 1173705
(Reporter)

Updated

2 years ago
Blocks: 1173705

Updated

2 years ago
Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/2995]
(Assignee)

Updated

2 years ago
Assignee: server-ops-webops → eziegenhorn
(Assignee)

Comment 1

2 years ago
The supervisord configs are the same for reps-dev and mozillians-dev. The immediate issue was that the newrelic-admin script had not been made relocatable in the virtualenv, so it pointed to a bad interpreter path, which produced this confusing error:

  env: /data/www/reps-dev.allizom.org/venv/bin/newrelic-admin: No such file or directory

In fact that file exists; the problem was the interpreter specified in it. Now that that is fixed, I see another issue:

  No handlers could be found for logger "product_details"
  Unknown command: 'celeryd'
  Type 'manage.py help' for usage.

which I'm betting you know how to fix.  Thoughts?
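Why that error is so confusing: on Linux, when a script's `#!` line names an interpreter that doesn't exist, execve() fails with ENOENT, and env reports that failure against the script's own path, so the message names a file that is actually there. A quick way to spot this is to read the shebang and check the interpreter directly. Minimal sketch, written for Python 3 (shutil.which isn't available on 2.7); `missing_interpreter` is a hypothetical helper, not part of virtualenv or the deploy scripts:

```python
import os
import shutil


def missing_interpreter(script_path):
    """Return the interpreter named on a script's '#!' line if it can't be found, else None."""
    with open(script_path, "rb") as f:
        first = f.readline().decode("utf-8", "replace").strip()
    if not first.startswith("#!"):
        return None  # no shebang, nothing to check
    parts = first[2:].split()
    if not parts:
        return None
    interp = parts[0]
    if os.path.basename(interp) == "env" and len(parts) > 1:
        # "#!/usr/bin/env NAME" looks NAME up on PATH at run time.
        return None if shutil.which(parts[1]) else parts[1]
    # An absolute interpreter path must exist and be executable.
    return None if os.access(interp, os.X_OK) else interp
```

Running it over every file in venv/bin after a deploy would flag scripts like newrelic-admin that still point at an old interpreter path.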
Comment 2

2 years ago
Celery should be fixed now. Regarding newrelic, is this something we need to handle on our end, in the update script, to make it relocatable?
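On the `Unknown command: 'celeryd'` error: since Celery 3.1, Django is supported by celery itself (django-celery is no longer required), and the documented way to start processes is the `celery` program with `-A <app>` and a `worker` or `beat` subcommand, rather than django-celery's `manage.py celeryd`. A rough sketch of the argv a supervisord program entry could run, assuming the Celery app is importable as `remo` and that the worker is wrapped with `newrelic-admin run-program` (both are assumptions; the actual supervisord config isn't shown in this bug). `celery_argv` is a hypothetical helper:

```python
import os


def celery_argv(venv_dir, app_module, process="worker", loglevel="INFO", newrelic=False):
    """Build the argv a process manager could run for a Celery 3.1 process."""
    if process not in ("worker", "beat"):
        raise ValueError("process must be 'worker' or 'beat'")
    bin_dir = os.path.join(venv_dir, "bin")
    # Celery 3.1 style: `celery -A <app> worker|beat`, not `manage.py celeryd`.
    argv = [os.path.join(bin_dir, "celery"), "-A", app_module, process,
            "--loglevel=" + loglevel]
    if newrelic:
        # The New Relic Python agent wraps a command via `newrelic-admin run-program`.
        argv = [os.path.join(bin_dir, "newrelic-admin"), "run-program"] + argv
    return argv
```

For example, `" ".join(celery_argv("/data/www/reps-dev.allizom.org/venv", "remo"))` gives `/data/www/reps-dev.allizom.org/venv/bin/celery -A remo worker --loglevel=INFO`; passing `process="beat"` gives the celery beat equivalent.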
(Assignee)

Comment 3

2 years ago
That is something done when we set up the virtualenv in the first place, so I don't know how it went unnoticed before or why it broke recently. Perhaps the update scripts rebuild the virtualenv but don't run the "virtualenv-2.7 --relocatable venv" step?
Comment 4

2 years ago
The virtualenv is rebuilt on each deployment [0], and the "virtualenv-2.7 --relocatable venv" step is executed on each push [1]. Given step [0], maybe this was just a one-off glitch the first time chief tried to deploy the new code. Everything seems to be working fine now. Thoughts?

[0] https://github.com/mozilla/remo/blob/master/bin/update/update.py#L107
[1] https://github.com/mozilla/remo/blob/master/bin/update/update.py#L117
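One thing worth checking in the update script: `--relocatable` only rewrites the scripts that exist when it runs, so anything installed afterwards keeps the absolute interpreter path pip wrote for it. The step therefore has to come after the last install and before celery is restarted. Below is a minimal Python 3 sketch of that ordering plus a check for it; the rebuild and pip invocations and the `requirements` path are assumptions, since update.py's actual steps aren't reproduced here, and both function names are hypothetical:

```python
import os


def deploy_commands(venv_dir, requirements, service="celeryd-reps-dev"):
    """Hypothetical ordered commands for one deploy (not update.py itself)."""
    return [
        # Rebuild the virtualenv (assumed invocation).
        ["virtualenv-2.7", venv_dir],
        # Install everything, newrelic included, before the fix-up step.
        [os.path.join(venv_dir, "bin", "pip"), "install", "-r", requirements],
        # Rewrite script shebangs; must follow the last install.
        ["virtualenv-2.7", "--relocatable", venv_dir],
        # Restart the worker only once the venv is final.
        ["/sbin/service", service, "restart"],
    ]


def relocatable_after_installs(commands):
    """True if a --relocatable step exists and no install runs after the last one."""
    reloc = [i for i, cmd in enumerate(commands) if "--relocatable" in cmd]
    if not reloc:
        return False
    return not any("install" in cmd for cmd in commands[reloc[-1] + 1:])
```

With this ordering `relocatable_after_installs(deploy_commands("venv", "requirements.txt"))` is True; appending another pip install after the `--relocatable` step makes it False, which is the situation that would leave a script like newrelic-admin with a stale interpreter path.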
(Reporter)

Comment 5

2 years ago
It looks like the same issue affects celery on stage (and I guess it will also appear on prod). For some reason, though, celery beat on stage works fine.
(Assignee)

Comment 6

2 years ago
I forced stage to be relocatable as well. Hopefully this is just a one-time glitch; if it recurs, we can dig deeper. Does anything else need touching here?
Comment 7

2 years ago
I believe we are good. The only thing left is to do the same for prod. The release will take place on Monday at 7am PST. I guess we can use the other bug [0] for this. Thanks for all the help!

[0] https://bugzilla.mozilla.org/show_bug.cgi?id=1274548
(Assignee)

Comment 8

2 years ago
I just checked prod on the admin node and it doesn't have this problem.
(Assignee)

Comment 9

2 years ago
Ah, but I guess you're saying it will have this problem on Monday morning? If so, just ping me then.
(Reporter)

Comment 10

2 years ago
Django upgrade is now live on prod. We verified that the current celery version is working fine.
Thanks for the help!
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED
(Reporter)

Updated

2 years ago
Status: RESOLVED → VERIFIED
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard