Closed Bug 1663474 Opened 4 years ago Closed 3 years ago

cron hooks occasionally exceed Github rate limiting

Categories

(Release Engineering :: Release Automation: Other, defect)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mtabara, Assigned: mozilla)

References

(Blocks 1 open bug)

Details

Attachments

(3 files)

Two hooks failed today because we hit GitHub rate limiting. Example logs here. The failing hooks share similar characteristics:

[taskcluster 2020-09-07 14:30:44.040Z] Task ID: KWd2wpp9SeGaU1ywi56Tww
[taskcluster 2020-09-07 14:30:44.040Z] Worker ID: i-0a5b9e6ac5d134bb6
[taskcluster 2020-09-07 14:30:44.040Z] Worker Group: us-west-2
[taskcluster 2020-09-07 14:30:44.040Z] Worker Node Type: c5a.large
[taskcluster 2020-09-07 14:30:44.040Z] Worker Type: build-decision
[taskcluster 2020-09-07 14:30:44.040Z] Public IP: 34.213.202.103
[taskcluster 2020-09-07 14:30:44.040Z] Hostname: ip-10-144-42-57

Two unrelated hooks failed, one for application services and the other for android-l10n, and both ran on the same instance of the same worker type.

Johan and I looked into this today and realized that the cron schedule generates a fair number of these tasks per hour; see this for example. Each hour, most of them are handled by the same instance with the same IP address. If more than 15 are generated every 15 minutes, or whatever the cron time slot is, it's easy to exceed GitHub's limit of 60 unauthenticated requests per hour.
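To make the arithmetic above concrete, here is a minimal sketch. It assumes each cron task makes exactly one unauthenticated GitHub API call and that all tasks share one worker IP; the function names are hypothetical and not part of build-decision.

```python
# GitHub's limit for unauthenticated API requests, per hour and per IP address.
GITHUB_UNAUTHENTICATED_LIMIT = 60


def requests_per_hour(tasks_per_slot, slot_minutes, calls_per_task=1):
    """Estimate GitHub API calls per hour coming from a single worker IP.

    calls_per_task=1 is an assumption; a real task may make more calls.
    """
    slots_per_hour = 60 / slot_minutes
    return tasks_per_slot * slots_per_hour * calls_per_task


def exceeds_limit(tasks_per_slot, slot_minutes, calls_per_task=1):
    """Return True when the estimate goes over the unauthenticated limit."""
    estimate = requests_per_hour(tasks_per_slot, slot_minutes, calls_per_task)
    return estimate > GITHUB_UNAUTHENTICATED_LIMIT


if __name__ == "__main__":
    # 15 tasks every 15 minutes sits exactly at the limit; one more goes over.
    print(requests_per_hour(15, 15), exceeds_limit(15, 15))
    print(requests_per_hour(16, 15), exceeds_limit(16, 15))
```

Under these assumptions, 15 tasks per 15-minute slot already consume the whole hourly budget, so any retry or extra call from the same IP tips it over.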

Johan checked the logs of the failed cron hooks, and it turns out this is really intermittent: we see it once every few weeks or so, so it's not really burning. But if we decide to add more hooks to the system, this could become a problem.

Solutions:

  1. ignore for now as it's intermittent
  2. branch out the worker type definition that we use in the cron job templates from build-decision to something more specific, e.g. per project
  3. bake a GitHub token into taskgraph's code so that we raise these limits and no longer hit this in the future.

IMO it's pretty clear that we should be using tokens.

We hit this again today: 6 cron tasks failed.

Hit again today: 3 cron tasks failed. This may be accelerating.

We probably need to update https://hg.mozilla.org/ci/ci-configuration/file/tip/cron-task-template.yml to add a tc secret to the cron command, and update build-decision to accept that option, download the tc secret via the tc proxy, and use it for any GitHub API calls. Then we would take the new build-decision image, update the cron-task-template to use it, and test.
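A rough sketch of the "download the tc secret via the tc proxy and use it for GitHub API calls" step might look like the following. This is not the actual build-decision code. The proxy URL layout (`TASKCLUSTER_PROXY_URL` plus `/api/secrets/v1/secret/<name>`), the `"token"` key inside the secret, and all function names are assumptions made for illustration.

```python
import json
import os
import urllib.request


def secret_url(secret_name, proxy_url=None):
    """Build the secret URL as seen through the Taskcluster proxy.

    Assumption: the proxy is reachable at TASKCLUSTER_PROXY_URL (falling back
    to http://taskcluster) and exposes the secrets service under this path.
    """
    base = proxy_url or os.environ.get("TASKCLUSTER_PROXY_URL", "http://taskcluster")
    return f"{base.rstrip('/')}/api/secrets/v1/secret/{secret_name}"


def token_from_payload(payload, key="token"):
    """Extract the token from the secrets-service JSON response body.

    Assumption: the secret is stored as {"secret": {"token": "..."}}.
    """
    return json.loads(payload)["secret"][key]


def fetch_github_token(secret_name, key="token"):
    """Download the secret through the proxy and return the GitHub token."""
    with urllib.request.urlopen(secret_url(secret_name)) as response:
        return token_from_payload(response.read(), key)


def github_headers(token=None):
    """Return headers for a GitHub API call, authenticated when a token is given."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"token {token}"
    return headers
```

Keeping the token optional in `github_headers` means repos without a configured secret would still work, just at the lower unauthenticated limit.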

This patch needs to land first, to build a new build-decision docker image.
Then we need to bump the decision docker image used in https://hg.mozilla.org/ci/ci-configuration/file/default/cron-task-template.yml .
Finally we can land the generate/cron_tasks.py changes to use the new --github-token-secret option.

Assignee: nobody → aki
Status: NEW → ASSIGNED

Also add another cron commandline option if this is a github repo.
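As an illustration of adding the option only for GitHub repos, here is a minimal sketch. Only the `--github-token-secret` flag name comes from this bug; the base command, the secret name, and the helper names are hypothetical placeholders, not the real cron_tasks.py code.

```python
from urllib.parse import urlparse


def is_github_repo(repo_url):
    """Return True when the repository URL points at github.com."""
    host = urlparse(repo_url).hostname or ""
    return host == "github.com"


def cron_command(base_command, repo_url, github_token_secret=None):
    """Append --github-token-secret only for GitHub repos with a secret set.

    base_command is a placeholder list of arguments; it is copied so the
    caller's list is never mutated.
    """
    command = list(base_command)
    if github_token_secret and is_github_repo(repo_url):
        command += ["--github-token-secret", github_token_secret]
    return command


if __name__ == "__main__":
    base = ["cron-command"]  # hypothetical placeholder
    print(cron_command(base, "https://github.com/example/repo", "example/secret"))
    print(cron_command(base, "https://hg.mozilla.org/ci/ci-configuration", "example/secret"))
```

With this shape, hg-based hooks keep their existing command line unchanged, which should limit the blast radius of the change.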

Depends on D110187

Attachment #9212386 - Attachment description: WIP: Bug 1663474 - add github token secret to github cron hooks. → Bug 1663474 - add github token secret to github cron hooks. r=#releng-reviewers
Pushed by asasaki@mozilla.com:
https://hg.mozilla.org/ci/ci-admin/rev/9330bd5ca635
add --github-token-secret to build-decision cron. r=releng-reviewers,jmaher
Pushed by asasaki@mozilla.com:
https://hg.mozilla.org/ci/ci-configuration/rev/28baeba47781
use the new build-decision image with --github-token-secret support. r=releng-reviewers,jmaher
Pushed by asasaki@mozilla.com:
https://hg.mozilla.org/ci/ci-admin/rev/949eacedadb8
add github token secret to github cron hooks. r=releng-reviewers,mtabara
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Blocks: 1702846