The way we schedule Celery beat periodic tasks is broken

Status: NEW
Assignee: Unassigned
Product: Tree Management
Component: Treeherder: Infrastructure
Priority: P1
Severity: normal
Reported: a year ago
Modified: 7 months ago
Reporter: emorley

Description (Reporter), a year ago
Several of our tasks run on a schedule rather than in response to something we ingest. Examples include polling Hg json-pushes/builds-4hr and running cycle data.

We currently use Celery's persistent file-based schedule (`celerybeat-schedule`) together with time deltas (e.g. `timedelta(days=1)`), and we also use `relative=True` mode.

This is problematic for two reasons. First, dyno filesystems are ephemeral, so the schedule file is lost on every dyno restart (after each deployment, and every 24 hours). Second, relative mode makes each task run X minutes/hours after celery beat started, rather than at a fixed time each day.

As a result, the daily tasks may run more or less often than we expect, depending on when we happen to deploy.
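To make the failure mode concrete, here is a simplified Python model. It is an assumption for illustration, not Celery's scheduler code: it assumes, as described above, that each interval task first fires one interval after beat starts, and that losing the schedule file on every restart resets this. The function `simulate_daily_runs` and the 20-hour restart cadence are hypothetical.

```python
from datetime import datetime, timedelta


def simulate_daily_runs(restart_times, interval, end):
    """Return the times an interval task would run, assuming (simplified model)
    that beat loses its schedule file on every restart and the task first
    fires one `interval` after each start."""
    runs = []
    for i, start in enumerate(restart_times):
        # Beat runs until the next dyno restart, or until the end of the model.
        stop = restart_times[i + 1] if i + 1 < len(restart_times) else end
        last_run = start  # state reset: "last run" is treated as start-up
        while last_run + interval < stop:
            last_run += interval
            runs.append(last_run)
    return runs


base = datetime(2016, 1, 1)
day = timedelta(days=1)
# Hypothetical: a restart every 20 hours (deploys on top of daily cycling).
frequent = [base + timedelta(hours=20 * n) for n in range(5)]
print(simulate_daily_runs(frequent, day, base + 4 * day))  # → []
# A single beat process that is never restarted.
print(len(simulate_daily_runs([base], day, base + 4 * day)))  # → 3
```

Under this model, restarting more often than the interval means the daily task never fires at all, while a long-lived beat process fires it once per day at a time determined by when it started.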

In addition, several of the high-load tasks are set to run at the same time, which isn't ideal, since their load then peaks at once.

Comment 1 (Reporter), a year ago
Suggestions:
* Consider using Django's DB schedule, since a database-backed schedule survives dyno restarts
* Or else use the crontab type rather than timedelta (i.e. set a specific time of day rather than just a frequency)
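The crontab suggestion can be sketched without Celery. In Celery itself a fixed time of day is expressed with `crontab(hour=1, minute=0)` from `celery.schedules` in place of a `timedelta`. The stdlib-only Python below is a simplified model, not Celery's implementation; the functions `next_fixed_run` and `find_collisions`, and the task names and times, are hypothetical. It shows the two properties that matter here: the next run depends only on the wall clock, not on when beat started, and the times can be staggered so high-load tasks don't start together.

```python
from datetime import datetime, time, timedelta

# Hypothetical task names and times of day (UTC), staggered so heavy tasks
# do not start together; not Treeherder's real configuration.
DAILY_SCHEDULE = {
    "cycle-data": time(1, 0),
    "other-heavy-daily-task": time(2, 30),
}


def next_fixed_run(now, at):
    """Next occurrence of wall-clock time `at` strictly after `now`.
    Unlike an interval measured from start-up, this ignores when beat started."""
    candidate = datetime.combine(now.date(), at)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


def find_collisions(schedule):
    """Return sorted groups of task names that share the same time of day."""
    by_time = {}
    for name, at in schedule.items():
        by_time.setdefault(at, []).append(name)
    return [sorted(names) for names in by_time.values() if len(names) > 1]


# Two beat processes started at different times agree on the next run.
for started in (datetime(2016, 1, 1, 0, 5), datetime(2016, 1, 1, 0, 55)):
    print(next_fixed_run(started, DAILY_SCHEDULE["cycle-data"]))  # → 2016-01-01 01:00:00
print(find_collisions(DAILY_SCHEDULE))  # → []
```

Because the run time is anchored to the clock rather than to process start-up, losing the schedule file on a restart no longer shifts or skips the daily run in this model.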

Updated (Reporter), a year ago
See Also: → bug 1176492

Updated (Reporter), 9 months ago
Priority: P2 → P1

Updated (Reporter), 8 months ago
Component: Treeherder → Treeherder: Infrastructure

Updated (Reporter), 7 months ago
Assignee: emorley → nobody