Closed Bug 1307782 Opened 8 years ago Closed 8 years ago

Raise the Celery task time_limit for the buildbot ingestion tasks

Categories

(Tree Management :: Treeherder: Data Ingestion, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: camd)

During the prod Heroku migration, when ingestion was resumed on Heroku, the initial couple of builds-4hr task runs failed with:

 Oct 05 14:03:00 treeherder-prod app/worker_buildapi_4hr.1: TimeLimitExceeded: TimeLimitExceeded(180,) 
(https://papertrailapp.com/systems/treeherder-prod/events?centered_on_id=720310299303125003)

This is because, with an empty memcached, the builds-4hr ingestion can't skip previously seen jobs, so each run takes longer. Combined with the high load from the simultaneous catch-up of Pulse jobs ingestion, this pushed the task's run time past the 180s limit currently set here:

https://github.com/mozilla/treeherder/blob/f7e2c5cd423244d2963055fd2603e650ada845c3/treeherder/etl/tasks/buildapi_tasks.py#L31
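
To illustrate why a cold cache makes the first run slower, here is a minimal sketch of a "skip previously seen jobs" check. It is not Treeherder's actual ingestion code: the `ingest` function, the `guid` key and the plain `set` standing in for memcached are all assumptions made for the example.

```python
def ingest(jobs, seen_cache):
    """Process only jobs whose guid is not already in the cache.

    `seen_cache` is a plain set standing in for memcached (assumption).
    Returns the number of jobs that actually had to be processed.
    """
    processed = 0
    for job in jobs:
        guid = job["guid"]
        if guid in seen_cache:
            # Already ingested on an earlier run, so skip the costly work.
            continue
        # The real (expensive) per-job processing would happen here.
        seen_cache.add(guid)
        processed += 1
    return processed


# Hypothetical payload: 1000 jobs with made-up guids.
jobs = [{"guid": "job-%d" % i} for i in range(1000)]

cache = set()                     # empty cache, as after the migration
first_run = ingest(jobs, cache)   # nothing can be skipped
second_run = ingest(jobs, cache)  # everything is skipped
```

With an empty cache the first run has to process every job in the payload, whereas a later run over the same payload skips all of them, so only the first run is at risk of hitting the time limit.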

It also appears that any Celery timeouts are not reported in New Relic.

We should raise the timeout and find out why these timeouts aren't caught by the New Relic agent.
The above issue was worked around during the Heroku migration by switching the worker_buildapi_4hr dyno type to a P-M dyno instead of a P2.
builds-4hr ingestion is still timing out occasionally:
https://papertrailapp.com/systems/treeherder-prod/events?q=TimeLimitExceeded+program%3Aworker_buildapi
Priority: P2 → P1
Plus fetch-allthethings:
https://papertrailapp.com/systems/treeherder-prod/events?centered_on_id=721067034422829071

Cameron, I don't suppose you could raise all the time limits in buildapi_tasks.py for me, land on master and then cherry-pick just that commit to the production branch? I have to head out shortly.
Flags: needinfo?(cdawson)
Depends on: 1308549
Commit pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/c88a9fb3fa5fd4d60dc6d3438fee5c65c9e91dae
Bug 1307782 - Raise the Celery task time_limit for the buildbot ingestion tasks
Yep, working on that now.  Wasn't sure what numbers would be best, but tried 10 for each, and 15 for fetch-allthethings.
OK, deployed to production now.
Flags: needinfo?(cdawson)
Many thanks! :-)
Assignee: nobody → cdawson
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED