Closed Bug 1069736 Opened 10 years ago Closed 10 years ago

Fix possible colliding cron jobs.

Categories

(support.mozilla.org :: Code Quality, task, P1)

Tracking

(Not tracked)

RESOLVED FIXED
2014Q3

People

(Reporter: mythmon, Assigned: rrosario)

Details

(Whiteboard: u=dev c=cron p=1 s=2014.17)

The peep+virtualenv upgrade included a few package upgrades, including an upgrade to django-cronjobs that adds cronjob locking by default. Either because of a false positive or actual colliding cron jobs, 3 jobs are spewing lots of locking warnings like:

    Script run multiple times. If this isn't true, delete `/tmp/django_cron.lock.collect_tweets`.

The 3 tasks I've seen do this are collect_tweets, enqueue_lag_monitor_task, and record_queue_size.

Based on the number and timing of the emails, I suspect there is a stale lock file. This would also explain the celery report of the queue size not going down.

I suspect we need to remove the lock files, and/or disable locking on these tasks if that is safe to do so.
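Before deleting anything, it may help to see which lock files are actually stale. A minimal sketch, assuming the lock files sit in one directory and match the name in the warning above (`django_cron.lock.<job>`); the helper name `find_stale_locks` and the age threshold are hypothetical, not part of django-cronjobs:

```python
import glob
import os
import time


def find_stale_locks(directory, max_age_seconds, pattern='django_cron.lock.*'):
    """Return lock files in `directory` last modified more than `max_age_seconds` ago.

    Hypothetical helper: the default pattern is only inferred from the
    warning message, not taken from the django-cronjobs source.
    """
    now = time.time()
    stale = []
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        # A lock much older than the cron interval is unlikely to belong to
        # a job that is still running.
        if now - os.path.getmtime(path) > max_age_seconds:
            stale.append(path)
    return stale
```

For example, `find_stale_locks('/tmp', 600)` would list candidate lock files untouched for more than 10 minutes. It only reports paths; removing them is left as a manual step.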

This is important and we should look at it right away; it is probably worth a Friday deploy. Hopefully it should be pretty easy: 1pt, and in this sprint.
Something fishy here. Here is one of the tasks that kicks off every 10 minutes from a cron job:

    @task()
    @timeit
    def measure_queue_lag(queued_time):
        """A task that measures the time it was sitting in the queue.

        It saves the data to graphite via statsd.
        """
        lag = datetime.now() - queued_time
        lag = (lag.days * 3600 * 24) + lag.seconds
        statsd.gauge('rabbitmq.lag', max(lag, 0))


I don't see how this can be taking 10 minutes to collide with the next one. Maybe removing the lock files is the first thing to test.
Or possibly the best way forward, short term, is to go back to what we had before, which is not locking tasks.
Oh, my bad. This isn't related to tasks.

The problem is that all the cronjobs run on the same machine, so their locks can collide. We need to use this setting:

If you run multiple sets of cronjobs on the same file system and need the locks to not collide, set CRONJOB_LOCK_PREFIX to something unique in your Django settings.
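A rough sketch of what that could look like. `CRONJOB_LOCK_PREFIX` is the setting named above; the value `'lock.sumo-prod'` is a made-up example, and `lock_path` is a hypothetical helper that only mimics the path in the warning (`/tmp/django_cron.lock.collect_tweets`) to show why a shared prefix collides. It is not the library's actual code, and the default prefix `'lock'` is an assumption inferred from that path:

```python
import os
import tempfile

# settings.py: give each set of cronjobs on the machine its own prefix.
# The value is an illustrative assumption, not the one deployed.
CRONJOB_LOCK_PREFIX = 'lock.sumo-prod'


def lock_path(job_name, prefix='lock'):
    """Hypothetical helper: build a lock path shaped like the one in the warning."""
    filename = 'django_cron.%s.%s' % (prefix, job_name)
    return os.path.join(tempfile.gettempdir(), filename)


# Two sets of cronjobs with the same prefix share one lock file per job...
shared_a = lock_path('collect_tweets')
shared_b = lock_path('collect_tweets')

# ...while a unique prefix gives this set its own lock file.
unique = lock_path('collect_tweets', CRONJOB_LOCK_PREFIX)
```

With a distinct prefix per set of cronjobs, a job in one set no longer sees the other set's lock and should stop emitting the "Script run multiple times" warning.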
Priority: -- → P1
Target Milestone: --- → 2014Q3
Filed Webops bug 1071036
Depends on: 1071036
yay emails stopped
Assignee: nobody → rrosario
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED