Closed
Bug 1340132
Opened 8 years ago
Closed 6 years ago
Stop using --maxtasksperchild with celery now datasource isn't leaking
Categories
(Tree Management :: Treeherder: Infrastructure, defect, P2)
Tree Management
Treeherder: Infrastructure
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: emorley, Assigned: emorley)
References
Details
Attachments
(1 file)
Bug 1312007 found datasource was leaking for web requests. This is also presumably why the background tasks used to leak too, and so why we had had to use the `--maxtasksperchild` option for celery workers until now.
However since we're no longer using datasource, we should try removing the option since:
1) It will improve celery worker performance (some of our workers are using a --maxtasksperchild value of 20, which is really inefficient when some tasks only take a second or two each)
2) It reduces the number of background tasks that don't end up reported to New Relic (since celery forcibly kills the worker, see https://docs.newrelic.com/docs/agents/python-agent/back-end-services/python-agent-celery)
If testing shows us to still have small leaks we could always still use the option but at a much higher threshold.
Assignee | ||
Comment 1•8 years ago
|
||
Testing for 30 minutes on stage showed:
* pushlog: unchanged (previously maxtasksperchild=500)
* buildapi_pending: unchanged (previously maxtasksperchild=20)
* buildapi_running: unchanged (previously maxtasksperchild=20)
* buildapi_4hr: unchanged (previously maxtasksperchild=20)
* store_pulse_jobs: unchanged RAM usage, lower dyno load (previously maxtasksperchild=20)
* store_pulse_resultsets: unchanged (previously maxtasksperchild=20)
* default: unchanged (previously maxtasksperchild=50)
* hp: unchanged (previously maxtasksperchild=50)
* log_parser: increase in RAM usage, would need to investigate (previously maxtasksperchild=50)
30 minutes isn't long enough for a proper test, but at first glance it seems that most workers are unaffected by leaks any more, store_pulse_jobs performs much better when not having to restart every 20 jobs, and the log parser may need to be excluded initially, until further investigated.
Comment 2•8 years ago
|
||
Assignee | ||
Comment 3•8 years ago
|
||
Comment on attachment 8864662 [details] [review]
[treeherder] mozilla:celery-no-maxtasksperchild > mozilla:master
This removes `--maxtasksperchild` usage from all tasks apart from the logparser, since everything but that seemed fine with it when previously tested (comment 1).
Attachment #8864662 -
Flags: review?(cdawson)
Comment 4•8 years ago
|
||
Comment on attachment 8864662 [details] [review]
[treeherder] mozilla:celery-no-maxtasksperchild > mozilla:master
Awesome!
Attachment #8864662 -
Flags: review?(cdawson) → review+
Comment 5•8 years ago
|
||
Commit pushed to master at https://github.com/mozilla/treeherder
https://github.com/mozilla/treeherder/commit/f6e320aa92e2af7acf788a665c9111d36bddf02c
Bug 1340132 - Reduce usage of --maxtasksperchild with celery
Now we're no longer using datasource, the memory leaks previously
seen have gone. As such there is no need to make the celery worker
processes restart so frequently (they will now only be restarted at
the daily Heroku dyno restart), which will improve performance.
In addition, this will reduce the number of transactions that don't get
reported to New Relic (when the maxtasksperchild threshold is
reached, the worker is forcibly killed before it can submit the
New Relic payload).
The log parser tasks have been left unchanged for now, since they
appear to still be leaking.
Assignee | ||
Updated•7 years ago
|
Component: Treeherder → Treeherder: Infrastructure
Assignee | ||
Comment 6•6 years ago
|
||
I've split out the remaining log parser part to bug 1513506.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•