Closed Bug 690360 Opened 14 years ago Closed 14 years ago

Stuck cron jobs on AMO?

Categories

(mozilla.org Graveyard :: Server Operations, task)

All
Other
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: clouserw, Assigned: nmaul)

Details

We've been getting mail every few minutes for the past few days about some failing cron jobs. I was hoping it would recover, but it appears to not be the case. It's for production AMO and these are the crons:

    /usr/bin/python2.6 manage.py cron update_collections_subscribers
    /usr/bin/python2.6 manage.py cron update_addons_current_version
    /usr/bin/python2.6 manage.py cron update_collections_votes

Before the cron jobs run, they check to see if a previous copy is still running (in which case they won't run). In the case of these three, they are claiming previous copies are running all the time. The lock files are these, respectively:

`/tmp/django_cron.lock.update_collections_subscribers`
`/tmp/django_cron.lock.update_addons_current_version`
`/tmp/django_cron.lock.update_collections_votes`

I don't have clear steps here so I'm filing this bug to feel out the situation. Can you give us the timestamps on those files? Can you confirm that there aren't more copies of AMO cron jobs running on addonsadm.private.phx1? In the past we've had stage and prod on the same box and the cron jobs collided. Anything else fishy you can think of?
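For readers unfamiliar with the check described above, here is a minimal sketch of a lock-file guard. It is not the actual AMO cron code: the names `lock_path` and `run_exclusively` are hypothetical, and only the `django_cron.lock.<job>` file naming is taken from this report.

```python
import errno
import os
import tempfile

LOCK_DIR = tempfile.gettempdir()  # the lock files in this bug live in /tmp


def lock_path(job_name, lock_dir=LOCK_DIR):
    # Hypothetical helper; mirrors the file names quoted in the report.
    return os.path.join(lock_dir, "django_cron.lock." + job_name)


def run_exclusively(job_name, job, lock_dir=LOCK_DIR):
    """Run job() unless a lock file for job_name already exists.

    Returns True if the job ran, False if it was skipped.
    """
    path = lock_path(job_name, lock_dir)
    try:
        # O_EXCL makes creation atomic: it fails if the file already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except OSError as exc:
        if exc.errno == errno.EEXIST:
            return False  # a previous copy appears to be running
        raise
    os.close(fd)
    try:
        job()
    finally:
        os.remove(path)  # only reached once job() returns or raises
    return True
```

With a guard like this, a job that hangs never reaches the cleanup step, so its lock file stays in place and every later run is skipped, which matches the symptom reported here.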
    apache apache 0 Sep 23 12:00 django_cron.lock.update_addons_current_version
    apache apache 0 Sep 23 12:05 django_cron.lock.update_collections_subscribers
    apache apache 0 Sep 23 12:25 django_cron.lock.update_collections_votes

And it looks like they really are still running:

    [root@addonsadm.private.phx1 tmp]# ps axo pid,etime,command
    <snip>
    27195 6-00:26:47 /usr/bin/python2.6 manage.py cron update_collections_votes
    25549 6-00:47:29 /usr/bin/python2.6 manage.py cron update_collections_subscribers
    21470 6-00:52:44 /usr/bin/python2.6 manage.py cron update_addons_current_version

They're all for prod, by the way. None of them currently running for stage or dev. Want me to kill them and rm those 3 lock files?
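The `etime` column above uses the `[[dd-]hh:]mm:ss` format, so `6-00:26:47` means a bit over six days. As a rough sketch, output in that shape could be scanned for long-running jobs like this; `parse_etime` and `long_running` are hypothetical helper names, not part of any existing tool.

```python
def parse_etime(etime):
    """Convert a ps etime value ([[dd-]hh:]mm:ss) to seconds."""
    days = 0
    if "-" in etime:
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in etime.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0)  # pad a missing hours field
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def long_running(ps_lines, max_seconds):
    """Return (pid, seconds, command) for rows older than max_seconds.

    Expects rows shaped like `ps axo pid,etime,command` output.
    """
    found = []
    for line in ps_lines:
        fields = line.split(None, 2)
        if len(fields) < 3 or not fields[0].isdigit():
            continue  # skip the header row and anything malformed
        pid, etime, command = fields
        seconds = parse_etime(etime)
        if seconds > max_seconds:
            found.append((int(pid), seconds, command))
    return found
```

Feeding it the three rows above with a one-hour threshold would flag all of them.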
Assignee: server-ops → nmaul
Wow, yes please. We can keep an eye on them when they rerun to make sure it's not something that is going to keep happening. Thanks.
All 3 killed and their lockfiles removed. Also killed the job and lockfile for 'manage.py cron deliver_hotness', per IRC conversation. So far, things look to be back to normal. I'm not seeing any long-running "manage.py cron" processes hanging around, and at least update_collections_votes just ran. I think this is fixed.
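To keep an eye on whether this recurs, one option is a periodic check for lock files that are older than any job should take. The following is only a sketch under that assumption: `stale_locks` is a hypothetical name and the age threshold is whatever the operator considers too long.

```python
import glob
import os
import time


def stale_locks(lock_dir, max_age_seconds, now=None):
    """Return (path, age_in_seconds) for lock files older than the limit."""
    now = time.time() if now is None else now
    pattern = os.path.join(lock_dir, "django_cron.lock.*")
    stale = []
    for path in sorted(glob.glob(pattern)):
        age = now - os.path.getmtime(path)  # time since last modification
        if age > max_age_seconds:
            stale.append((path, age))
    return stale
```

This only reports candidates; whether to kill the owning process and remove the file is still a judgment call, as it was in this bug.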
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard