Closed Bug 1339093 Opened 7 years ago Closed 5 years ago

The way we schedule Celery beat periodic tasks is broken

Categories

(Tree Management :: Treeherder: Infrastructure, defect, P1)

defect

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: emorley, Unassigned)

References

Details

A number of our tasks are queued on a schedule rather than in response to something we ingest, for example polling Hg json-pushes and builds-4hr, or running cycle data.

We currently use Celery's persistent file-based schedule (`celerybeat-schedule`) along with time deltas (e.g. `timedelta(days=1)`), and we also use `relative=True` mode.

This is problematic. Dyno filesystems are ephemeral, so the schedule is lost on every dyno restart (after each deployment, and every 24 hours). In addition, relative mode makes a task run X minutes/hours after celery beat started, rather than at a fixed time each day.

As such, we may run the daily tasks more or less often than we expect, depending on when we happen to deploy.
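To illustrate how this can starve a daily task, here is a simplified model (not Celery's actual implementation). It assumes that every restart wipes the schedule and that an interval-based entry then counts its interval afresh from the moment beat started, whereas a crontab-style entry fires at a fixed wall-clock hour. The function names and the restart times are hypothetical.

```python
"""Simplified model of how lost schedule state affects a daily task.

Assumptions (hypothetical, not Celery's real code):
* times are hours since an arbitrary midnight;
* each restart wipes the schedule, and an interval-based entry then
  counts its interval afresh from that start time;
* restarts are instantaneous, so beat is up at any fixed wall-clock time.
"""


def interval_runs(starts, interval, horizon):
    """Run times for 'every `interval` hours since beat started'."""
    runs = []
    bounds = sorted(starts) + [horizon]
    for start, next_start in zip(bounds, bounds[1:]):
        t = start + interval
        # A restart before `t` discards all progress towards the next run.
        while t < next_start:
            runs.append(t)
            t += interval
    return runs


def fixed_time_runs(hour, horizon):
    """Run times for 'every day at `hour`:00' (crontab-style)."""
    return [day * 24 + hour
            for day in range(int(horizon // 24) + 1)
            if day * 24 + hour < horizon]


if __name__ == "__main__":
    # Hypothetical beat start times over five days: roughly daily dyno
    # restarts plus deployments.
    restarts = [0, 20, 43, 66, 90]
    print(interval_runs(restarts, 24, 120))  # → [114]
    print(fixed_time_runs(3, 120))           # → [3, 27, 51, 75, 99]
```

Under these assumptions, a `timedelta(days=1)` entry fires only once in five days, because beat is restarted before each 24-hour interval elapses; the fixed-time entry fires every day regardless of restarts.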

In addition, several of the high-load tasks are set to run at the same time, which isn't ideal.
Suggestions:
* Consider using Django's DB schedule
* Or else use the crontab type rather than timedelta (i.e. set a specific time of day rather than just a frequency)
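A minimal sketch of the crontab suggestion, which also staggers the heavy daily tasks so they no longer start at the same time. The app name, task names and times are hypothetical placeholders; `crontab`, `beat_schedule`, `timezone` and `beat_scheduler` are standard Celery APIs/settings, and the commented-out `DatabaseScheduler` line assumes the separate django-celery-beat package is installed.

```python
from celery import Celery
from celery.schedules import crontab

app = Celery("treeherder")  # hypothetical app name

# crontab entries are evaluated against this timezone.
app.conf.timezone = "UTC"

app.conf.beat_schedule = {
    # Task names are hypothetical placeholders.
    "cycle-data": {
        "task": "cycle-data",
        # Fixed time of day, so a restart doesn't shift when it runs.
        "schedule": crontab(hour=3, minute=0),
    },
    "another-heavy-daily-task": {
        "task": "another-heavy-daily-task",
        # Staggered from cycle-data to avoid overlapping load.
        "schedule": crontab(hour=5, minute=30),
    },
}

# Alternative (the DB schedule suggestion): keep schedule state in the
# database instead of the ephemeral celerybeat-schedule file.
# app.conf.beat_scheduler = "django_celery_beat.schedulers:DatabaseScheduler"
```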
See Also: → 1176492
Priority: P2 → P1
Component: Treeherder → Treeherder: Infrastructure
Assignee: emorley → nobody

Wontfix since we're moving away from Celery beat tasks (see bug 1176492 and deps).

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX