Closed Bug 1340132 Opened 8 years ago Closed 6 years ago

Stop using --maxtasksperchild with celery now datasource isn't leaking

Categories

(Tree Management :: Treeherder: Infrastructure, defect, P2)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: emorley)

References

Details

Attachments

(1 file)

Bug 1312007 found datasource was leaking for web requests. This is also presumably why the background tasks used to leak too, and so why we had had to use the `--maxtasksperchild` option for celery workers until now. However since we're no longer using datasource, we should try removing the option since: 1) It will improve celery worker performance (some of our workers are using a --maxtasksperchild value of 20, which is really inefficient when some tasks only take a second or two each) 2) It reduces the number of background tasks that don't end up reported to New Relic (since celery forcibly kills the worker, see https://docs.newrelic.com/docs/agents/python-agent/back-end-services/python-agent-celery) If testing shows us to still have small leaks we could always still use the option but at a much higher threshold.
Blocks: 1059325
Testing for 30 minutes on stage showed: * pushlog: unchanged (previously maxtasksperchild=500) * buildapi_pending: unchanged (previously maxtasksperchild=20) * buildapi_running: unchanged (previously maxtasksperchild=20) * buildapi_4hr: unchanged (previously maxtasksperchild=20) * store_pulse_jobs: unchanged RAM usage, lower dyno load (previously maxtasksperchild=20) * store_pulse_resultsets: unchanged (previously maxtasksperchild=20) * default: unchanged (previously maxtasksperchild=50) * hp: unchanged (previously maxtasksperchild=50) * log_parser: increase in RAM usage, would need to investigate (previously maxtasksperchild=50) 30 minutes isn't long enough for a proper test, but at first glance it seems that most workers are unaffected by leaks any more, store_pulse_jobs performs much better when not having to restart every 20 jobs, and the log parser may need to be excluded initially, until further investigated.
Comment on attachment 8864662 [details] [review] [treeherder] mozilla:celery-no-maxtasksperchild > mozilla:master This removes `--maxtasksperchild` usage from all tasks apart from the logparser, since everything but that seemed fine with it when previously tested (comment 1).
Attachment #8864662 - Flags: review?(cdawson)
Comment on attachment 8864662 [details] [review] [treeherder] mozilla:celery-no-maxtasksperchild > mozilla:master Awesome!
Attachment #8864662 - Flags: review?(cdawson) → review+
Commit pushed to master at https://github.com/mozilla/treeherder https://github.com/mozilla/treeherder/commit/f6e320aa92e2af7acf788a665c9111d36bddf02c Bug 1340132 - Reduce usage of --maxtasksperchild with celery Now we're no longer using datasource, the memory leaks previously seen have gone. As such there is no need to make the celery worker processes restart so frequently (they will now only be restarted at the daily Heroku dyno restart), which will improve performance. In addition, this will reduce the number of transactions that don't get reported to New Relic (when the maxtasksperchild threshold is reached, the worker is forcibly killed before it can submit the New Relic payload). The log parser tasks have been left unchanged for now, since they appear to still be leaking.
Component: Treeherder → Treeherder: Infrastructure
Depends on: 1513506
I've split out the remaining log parser part to bug 1513506.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: