Closed Bug 1176492 Opened 9 years ago Closed 5 years ago

Consider moving the less frequent periodic tasks on Heroku to use the scheduler addon

Categories: Tree Management :: Treeherder: Infrastructure, defect, P2

Tracking: (Not tracked)

Status: RESOLVED FIXED

People: (Reporter: emorley, Assigned: emorley)

Attachments: (1 file)
We currently have periodic tasks like cycle data running on the "worker_default" dyno:
https://dashboard.heroku.com/apps/treeherder-heroku/resources

This seems problematic for a few reasons:
1) Since dynos are restarted once every 24 hours, we may end up interrupting long-running tasks like cycle-data.
2) The load on that dyno varies considerably depending on which periodic tasks are running at any given time, so we'll either overload the dyno or pay for more capacity than we need 90% of the time.
3) Long-running but low-importance tasks like cycle-data can block more urgent tasks.

It seems like the scheduler addon might be a better fit for things like cycle-data & fetch-bugs:
https://devcenter.heroku.com/articles/scheduler
https://elements.heroku.com/addons/scheduler
Summary: Consider moving the periodic tasks on Heroku to use the scheduler addon → Consider moving the less frequent periodic tasks on Heroku to use the scheduler addon
One limitation of this scheduler addon is that the job frequency has to be one of every {10 minutes, hour, day}.
This can wait until after the main move.
See Also: → 1339093
Fixing this would mean the cycle_data task gets its own dyno, making it less likely to run out of RAM, as seen in bug 1346567.
Assignee: nobody → emorley
Blocks: 1346567
Priority: P3 → P1
Assignee: emorley → nobody
Priority: P1 → P2
Attachment #9008136 - Flags: review?(emorley)
I have added the scheduler add-on to proto, stage and prod already.  I also scheduled the tasks since running more often won't hurt anything, and this ensures we don't forget to do it if/when we merge the PR.  :)
(In reply to Cameron Dawson [:camd] from comment #5)
> I have added the scheduler add-on to proto, stage and prod already.  I also
> scheduled the tasks since running more often won't hurt anything, and this
> ensures we don't forget to do it if/when we merge the PR.  :)

Ah thank you :-)

We'll need to see how the tasks get on -- I suspect the cycle_data task might run out of RAM on the smaller P1 dyno (the default worker currently uses a P2 that has double the RAM) - but might as well start small and work our way up.
I'll leave this open to consider moving ``seta-analyze-failures`` and the intermittents commenter tasks.  I suppose we could even move over the ``fetch-push-logs-every-5-minutes`` if we could change it to every 10, which I imagine we could.
Comment on attachment 9008136 [details] [review]
Link to GitHub pull-request: https://github.com/mozilla/treeherder/pull/4019

(Was reviewed on GitHub and merged already; forgot to sync the r+ back to Bugzilla too)
Attachment #9008136 - Flags: review?(emorley) → review+
This worked great at fixing bug 1484642 :-)

Something we need to keep an eye on, is whether the tasks need larger sizes of dynos (the default worker was a P2 dyno and these new tasks are using a P1) - though now we can fine tune more than before, since tasks are separated out. The dyno specs are listed here:
https://devcenter.heroku.com/articles/dyno-types

Also, since these tasks don't run via a permanent dyno, they don't show up in metrics (https://dashboard.heroku.com/apps/treeherder-prod/metrics), so we'll need to monitor via Papertrail instead.

Logs:
* Prod: https://papertrailapp.com/systems/treeherder-prod/events?q=program%3Ascheduler
* Stage: https://papertrailapp.com/systems/treeherder-stage/events?q=program%3Ascheduler
* Prototype: https://papertrailapp.com/systems/treeherder-prototype/events?q=program%3Ascheduler

For `./manage.py update_bugscache`, the peak memory usage is only 128MB (so the smallest P1 dyno seems fine):

Sep 12 10:47:56 treeherder-prod heroku/scheduler.7655: source=scheduler.7655 dyno=<SNIP> sample#memory_total=128.48MB sample#memory_rss=124.78MB sample#memory_cache=3.70MB sample#memory_swap=0.00MB ...
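These `sample#` metrics follow Heroku's logfmt-style runtime-metrics format. A minimal Python sketch (illustrative only, not part of Treeherder) for extracting the memory figures from such a line:

```python
import re

def parse_memory_samples(line):
    """Extract Heroku sample#memory_* metrics (in MB) from a log line."""
    samples = {}
    # Matches e.g. "sample#memory_total=128.48MB"
    for key, value in re.findall(r"sample#(memory_\w+)=([\d.]+)MB", line):
        samples[key] = float(value)
    return samples

line = ("source=scheduler.7655 dyno=<SNIP> sample#memory_total=128.48MB "
        "sample#memory_rss=124.78MB sample#memory_cache=3.70MB "
        "sample#memory_swap=0.00MB")
print(parse_memory_samples(line)["memory_total"])  # 128.48
```

Feeding the scheduler log lines through something like this makes it easy to chart peak usage per task without a permanent dyno's metrics page.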

For `./manage.py cycle_data`, the task is being killed since it exceeded the 512MB RAM of the P1 dyno (it reached 1271MB usage before it was killed):
https://papertrailapp.com/systems/treeherder-stage/events?centered_on_id=976239624282349590&q=program%3Aheroku%2Fscheduler.7040

I've bumped it to a Performance-M (2.5GB RAM) for now (which I hope will be enough?), but we should see about reducing usage in bug 1346567 to save credits later on.

At the moment the tasks won't appear in New Relic. However we should be able to make that happen by changing the command to `newrelic-admin run-program ./manage.py ...` and adding the relevant management commands to the list here:
https://github.com/mozilla/treeherder/blob/b5a6736f9b26ac7c6441fb5da3a95831933e7dd7/newrelic.ini#L28-L31
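A hedged sketch of what that newrelic.ini change might look like, assuming the agent's `instrumentation.scripts.django_admin` setting (the command names shown are illustrative examples taken from this bug, not the actual list in the repo):

```ini
[newrelic]
# Space-separated list of Django management commands the agent should
# instrument when invoked via `newrelic-admin run-program ./manage.py ...`.
# Command names here are illustrative.
instrumentation.scripts.django_admin = cycle_data update_bugscache
```

Each scheduler job's command would then be prefixed accordingly, e.g. `newrelic-admin run-program ./manage.py cycle_data`.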

Finally, since cycle_data is no longer hogging RAM, I've dropped the "default" worker dyno type down from a P2 to a P1, and reduced the count from 2 to 1 for prototype/stage (but not prod, since the commenter has more to do there as the API key is set, and we don't want it blocking perf alert generation). Across prototype+stage+prod, that saves us another 8 dyno credits, lowering total Treeherder Heroku usage (after bug 1443251 comment 6) from 88 to 80 credits/month.
Blocks: 1484642
(In reply to Ed Morley [:emorley] from comment #10)
> Finally, since cycle_data is no longer hogging RAM, I've dropped the
> "default" worker dyno type down from a P2 to a P1, and reduced the count
> from 2 to 1 for prototype/stage (but not prod, since the commenter has more
> to do there as the API key is set, and we don't want it blocking perf alert
> generation)

I've raised stage's default worker count from `1` back to `2` since there were a few queue spikes causing alerts. It's still a P1 dyno, so it's still using fewer credits than prior to these changes.
The cycle_data task is still exceeding the RAM limits:
Sep 14 06:37:13 treeherder-prod heroku/scheduler.4138: Process running mem=5325M(208.0%) 
Sep 14 06:37:13 treeherder-prod heroku/scheduler.4138: Error R15 (Memory quota vastly exceeded) 
(https://papertrailapp.com/systems/treeherder-prod/events?centered_on_id=977131112423915535&q=program%3Aheroku%2Fscheduler.4138)
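As a sanity check on the figures above, the "208.0%" in the R15 error is consistent with the 2.5GB Performance-M quota the dyno had at the time (a quick illustrative calculation, not Treeherder code):

```python
quota_mb = 2.5 * 1024         # Performance-M dyno quota: 2.5 GB = 2560 MB
used_mb = 5325                # "mem=5325M" from the log line above
overage = used_mb / quota_mb  # ratio of usage to quota
print(f"{overage:.0%}")       # prints "208%", matching "(208.0%)" in the log
```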

I've bumped it to a Performance-L for now (which has 14GB RAM instead of 2.5GB).

Assigning this to me to remind me to check back at cycle_data and also look at moving the remaining tasks at some point.
Assignee: nobody → emorley
I've updated the tasks to use the New Relic wrapper (i.e. prefixed with `newrelic-admin run-program`).
We will also need to update newrelic.ini to add these commands to the ones that are instrumented.
Depends on: 1503576
Depends on: 1508228
Depends on: 1518780
Depends on: 1518782

Most tasks have now been migrated. I've filed dep bugs for the remaining two (that are less urgent).

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED