A number of our tasks are queued on a schedule, rather than in response to something we ingest; for example, polling Hg json-pushes/builds-4hr or running cycle data. We currently use Celery's persistent file-based schedule (`celerybeat-schedule`) along with time deltas (eg `timedelta(days=1)`), and also use `relative=True` mode. This is problematic: dyno filesystems are ephemeral, so the schedule is lost on every dyno restart (after each deployment, or every 24 hours), and relative mode makes a task run X minutes/hours after celery beat started, not at a fixed time each day. As a result, the daily tasks may run more or less often than we expect, depending on when we happen to deploy. In addition, several of the high-load tasks are set to run at the same time, which isn't ideal.
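To illustrate the drift described above, here is a minimal stdlib-only sketch. It is a simplified model of the behaviour, not Celery's actual scheduling code: the function names and the restart times are made up for the example, and it assumes that a lost schedule file means the interval starts counting again from the moment beat starts.

```python
from datetime import datetime, timedelta


def next_run_interval(beat_started, interval):
    """Interval schedule with no surviving state: due one interval after start."""
    return beat_started + interval


def next_run_fixed(now, hour, minute=0):
    """Fixed time-of-day schedule: next occurrence of hour:minute at or after now."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate < now:
        candidate += timedelta(days=1)
    return candidate


# Two hypothetical dyno restarts on the same day (eg two deployments).
restarts = [datetime(2015, 1, 1, 9, 30), datetime(2015, 1, 1, 16, 45)]

# The interval-based run time moves with every restart...
interval_runs = [next_run_interval(r, timedelta(days=1)) for r in restarts]

# ...whereas the fixed time-of-day run time does not.
fixed_runs = [next_run_fixed(r, hour=2) for r in restarts]

for restart, by_interval, by_clock in zip(restarts, interval_runs, fixed_runs):
    print(restart, "->", by_interval, "vs", by_clock)
```

In this model, each restart pushes the daily task another day into the future, so a task can be delayed indefinitely if restarts happen at least once every 24 hours, while the fixed-time schedule is unaffected by when the restart occurs.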
Suggestions:
* Consider using Django's DB schedule
* Or else use the crontab type rather than timedelta (ie: set a specific time of day rather than just a frequency)
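As a rough sketch of the crontab suggestion, the schedule could look something like the following. The entry names, task names and times are placeholders chosen for illustration, not the real Treeherder configuration, and the setting name assumes the old-style Celery 3.x `CELERYBEAT_SCHEDULE` (newer Celery versions call it `beat_schedule`).

```python
from celery.schedules import crontab

# Hypothetical entries: task names and times are illustrative only.
CELERYBEAT_SCHEDULE = {
    "cycle-data-daily": {
        "task": "cycle-data",
        # Runs at 02:00 every day, regardless of when beat was last restarted.
        "schedule": crontab(hour=2, minute=0),
    },
    "another-high-load-task-daily": {
        "task": "another-high-load-task",
        # Offset by an hour so the heavy jobs do not all start together.
        "schedule": crontab(hour=3, minute=0),
    },
    "fetch-push-logs": {
        "task": "fetch-push-logs",
        # Frequent polling tasks can use a crontab step expression.
        "schedule": crontab(minute="*/5"),
    },
}
```

Because a crontab entry is evaluated against the wall clock rather than against a stored last-run time, losing the `celerybeat-schedule` file on restart should matter much less, and giving each high-load task its own hour spreads them out.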
Component: Treeherder → Treeherder: Infrastructure