Closed Bug 1176492 (Opened 10 years ago, Closed 6 years ago)
Consider moving the less frequent periodic tasks on Heroku to use the scheduler addon
Categories: Tree Management :: Treeherder: Infrastructure, defect, P2
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: emorley, Assigned: emorley
Attachments: (1 file)
We currently have periodic tasks like cycle_data running on the "worker_default" dyno:
https://dashboard.heroku.com/apps/treeherder-heroku/resources
This seems problematic for a few reasons:
1) Since dynos are restarted once every 24 hours, we may end up interrupting long-running tasks like cycle_data mid-run.
2) The load on that dyno can vary considerably depending on which periodic tasks are running at that point in time, so we'll either overload the dyno or pay for more capacity than we need 90% of the time.
3) Long-running but low-importance tasks like cycle_data can block more urgent tasks.
It seems like the scheduler addon might be a better fit for things like cycle-data & fetch-bugs:
https://devcenter.heroku.com/articles/scheduler
https://elements.heroku.com/addons/scheduler
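For context, these tasks are currently scheduled via Celery beat on the worker dyno. A minimal sketch of how such entries are typically declared (the task names and intervals below are illustrative, not Treeherder's actual config):

```python
# settings.py -- hypothetical Celery beat schedule for periodic tasks.
# Task names and intervals are illustrative only.
from celery.schedules import crontab

CELERYBEAT_SCHEDULE = {
    # Long-running, low-urgency: a candidate for the scheduler addon.
    'cycle-data': {
        'task': 'cycle-data',
        'schedule': crontab(hour=0, minute=0),  # once a day
    },
    # Shorter, more frequent task.
    'fetch-bugs': {
        'task': 'fetch-bugs',
        'schedule': crontab(minute='*/15'),  # every 15 minutes
    },
}
```

Under the scheduler addon these would instead run as one-off dynos, so a slow or interrupted run can't starve the worker's queue.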
Assignee | Updated • 10 years ago
Summary: Consider moving the periodic tasks on Heroku to use the scheduler addon → Consider moving the less frequent periodic tasks on Heroku to use the scheduler addon
Assignee | Comment 1 • 10 years ago
One limitation of this scheduler addon is that the job frequency has to be one of: every 10 minutes, every hour, or every day.
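If we ever need a frequency lower than daily, one possible workaround is to schedule the job daily and have the command exit early on the other days. A sketch only; this guard is hypothetical, not something Treeherder does:

```python
# Hypothetical guard for a daily-scheduled job that should only do
# real work once a week: exit early on the other six days.
import datetime
import sys

if datetime.date.today().weekday() != 0:  # 0 == Monday
    sys.exit(0)  # cheap no-op run

# ... actual task logic would run here ...
```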
Assignee | Comment 3 • 8 years ago
Fixing this would mean the cycle_data task gets its own dyno, making it less likely to run out of RAM, as seen in bug 1346567.
Assignee | Updated • 7 years ago
Assignee: emorley → nobody
Priority: P1 → P2
Comment 4 • 6 years ago
Updated • 6 years ago
Attachment #9008136 - Flags: review?(emorley)
Comment 5 • 6 years ago
I have added the scheduler add-on to proto, stage and prod already. I also scheduled the tasks since running more often won't hurt anything, and this ensures we don't forget to do it if/when we merge the PR. :)
Assignee | Comment 6 • 6 years ago
(In reply to Cameron Dawson [:camd] from comment #5)
> I have added the scheduler add-on to proto, stage and prod already. I also
> scheduled the tasks since running more often won't hurt anything, and this
> ensures we don't forget to do it if/when we merge the PR. :)
Ah thank you :-)
We'll need to see how the tasks get on; I suspect the cycle_data task might run out of RAM on the smaller P1 dyno (the default worker currently uses a P2, which has double the RAM), but we might as well start small and work our way up.
Comment 7 • 6 years ago
Commit pushed to master at https://github.com/mozilla/treeherder
https://github.com/mozilla/treeherder/commit/982c1ec2fe24bab45e32179ba6f0e1cd42b2b64a
Bug 1176492 - Move fetch_bugs and cycle_data to heroku scheduler (#4019)
Comment 8 • 6 years ago
I'll leave this open to consider moving ``seta-analyze-failures`` and the intermittents commenter tasks. I suppose we could even move over the ``fetch-push-logs-every-5-minutes`` if we could change it to every 10, which I imagine we could.
Assignee | Comment 9 • 6 years ago
Comment on attachment 9008136 [details] [review]
Link to GitHub pull-request: https://github.com/mozilla/treeherder/pull/4019
(Was reviewed on GitHub and merged already; forgot to sync the r+ back to Bugzilla too)
Attachment #9008136 - Flags: review?(emorley) → review+
Assignee | Comment 10 • 6 years ago
This worked great at fixing bug 1484642 :-)
Something we need to keep an eye on is whether the tasks need larger dyno sizes (the default worker was a P2 dyno and these new tasks are using a P1). Now that the tasks are separated out, we can fine-tune sizes more than before. The dyno specs are listed here:
https://devcenter.heroku.com/articles/dyno-types
Also, since these tasks don't run via a permanent dyno, they don't show up in metrics (https://dashboard.heroku.com/apps/treeherder-prod/metrics), so we'll need to monitor via Papertrail instead.
Logs:
* Prod: https://papertrailapp.com/systems/treeherder-prod/events?q=program%3Ascheduler
* Stage: https://papertrailapp.com/systems/treeherder-stage/events?q=program%3Ascheduler
* Prototype: https://papertrailapp.com/systems/treeherder-prototype/events?q=program%3Ascheduler
For `./manage.py update_bugscache`, the peak memory usage is only 128MB (so the smallest P1 dyno seems fine):
Sep 12 10:47:56 treeherder-prod heroku/scheduler.7655: source=scheduler.7655 dyno=<SNIP> sample#memory_total=128.48MB sample#memory_rss=124.78MB sample#memory_cache=3.70MB sample#memory_swap=0.00MB ...
For `./manage.py cycle_data`, the task is being killed since it exceeded the 512MB RAM of the P1 dyno (it reached 1271MB usage before it was killed):
https://papertrailapp.com/systems/treeherder-stage/events?centered_on_id=976239624282349590&q=program%3Aheroku%2Fscheduler.7040
I've bumped it to a Performance-M (2.5GB RAM) for now (which I hope will be enough?), but we should see about reducing usage in bug 1346567 to save credits later on.
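Reducing cycle_data's peak memory would likely mean deleting expired rows in bounded chunks rather than materialising everything at once. A rough sketch of that pattern in Django ORM terms (a hypothetical illustration; not Treeherder's actual schema or implementation):

```python
# Hypothetical chunked deletion to cap peak RAM: only CHUNK_SIZE
# primary keys are held in memory at any one time.
CHUNK_SIZE = 5000

def cycle_expired(queryset):
    """Delete the rows in `queryset`, CHUNK_SIZE primary keys at a time."""
    while True:
        # values_list avoids materialising full model instances;
        # slicing bounds how many ids are fetched per iteration.
        pks = list(queryset.values_list('id', flat=True)[:CHUNK_SIZE])
        if not pks:
            break
        queryset.model.objects.filter(id__in=pks).delete()
```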
At the moment the tasks won't appear in New Relic. However, we should be able to make that happen by changing the command to `newrelic-admin run-program ./manage.py ...` and adding the relevant management commands to the list here:
https://github.com/mozilla/treeherder/blob/b5a6736f9b26ac7c6441fb5da3a95831933e7dd7/newrelic.ini#L28-L31
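Assuming the agent's standard Django hooks are in use, the newrelic.ini change would look roughly like this (the setting name is New Relic's; the exact command list is an assumption on my part):

```ini
# Assumed shape of the newrelic.ini addition: list the management
# commands run via the scheduler so the agent records their runs as
# background transactions. Existing entries in that list may differ.
[import-hook:django]
instrumentation.scripts.django_admin = update_bugscache cycle_data
```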
Finally, since cycle_data is no longer hogging RAM, I've dropped the "default" worker dyno type down from a P2 to a P1, and reduced the count from 2 to 1 for prototype/stage (but not prod, since the commenter has more to do there as the API key is set, and we don't want it blocking perf alert generation). Across prototype+stage+prod, that saves us another 8 dyno credits, lowering total Treeherder Heroku usage (after bug 1443251 comment 6) from 88 to 80 credits/month.
Blocks: 1484642
Assignee | Comment 11 • 6 years ago
(In reply to Ed Morley [:emorley] from comment #10)
> Finally, since cycle_data is no longer hogging RAM, I've dropped the
> "default" worker dyno type down from a P2 to a P1, and reduced the count
> from 2 to 1 for prototype/stage (but not prod, since the commenter has more
> to do there as the API key is set, and we don't want it blocking perf alert
> generation)
I've raised stage's default worker count from `1` back to `2` since there were a few queue spikes causing alerts. It's still a P1 dyno, so still uses fewer credits than before these changes.
Assignee | Comment 12 • 6 years ago
The cycle_data task is still exceeding the RAM limits:
Sep 14 06:37:13 treeherder-prod heroku/scheduler.4138: Process running mem=5325M(208.0%)
Sep 14 06:37:13 treeherder-prod heroku/scheduler.4138: Error R15 (Memory quota vastly exceeded)
(https://papertrailapp.com/systems/treeherder-prod/events?centered_on_id=977131112423915535&q=program%3Aheroku%2Fscheduler.4138)
I've bumped it to a Performance-L for now (which has 14GB RAM instead of 2.5GB).
Assigning this to me to remind me to check back on cycle_data and also look at moving the remaining tasks at some point.
Assignee: nobody → emorley
Assignee | Comment 13 • 6 years ago
I've updated the tasks to use the New Relic wrapper (i.e. prefixed with `newrelic-admin run-program`).
We will also need to update newrelic.ini to add these commands to the ones that are instrumented.
Assignee | Comment 14 • 6 years ago
Most tasks have now been migrated. I've filed dep bugs for the remaining two (that are less urgent).
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED