Closed Bug 1593751 Opened 5 years ago Closed 5 years ago

Migrate code-coverage CI to community taskcluster deployment, hooks to firefox deployment

Categories: Taskcluster :: Operations and Service Requests, task
Type: task
Priority: Not set
Severity: normal

Tracking: Not tracked
Status: RESOLVED FIXED

People: Reporter: bastien; Assignee: Unassigned

Attachments: 7 files

The project uses Taskcluster for its CI needs, and needs access to the Mozilla CI instance to:

  • trigger some hooks to process coverage

As discussed on IRC, the CI for the project will be in the community deployment, the hooks will be in the Firefox deployment.

Summary: Migrate code-coverage to community taskcluster deployment → Migrate code-coverage CI to community taskcluster deployment, hooks to firefox deployment
Depends on: 1594010

I merged the PR to run CI on the community instance.

We still need a way to update the hooks on Firefox CI.

Depends on: 1594102
Blocks: 1594681

These hooks need to be migrated to the Firefox-CI instance before the 9th so that code-coverage generation keeps running continuously:

Each of them needs access to the secret for its environment, on the same instance:

  • project/relman/code-coverage/runtime-testing
  • project/relman/code-coverage/runtime-production
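To make the per-environment secret requirement concrete, here is a minimal Python sketch. It assumes the usual Taskcluster convention that reading a secret requires the scope secrets:get:<secret name> on the role the hook runs under. The helpers secret_name and secret_scope are hypothetical, for illustration only, and are not part of the code-coverage project or the Taskcluster client.

```python
# Hypothetical helpers for illustration; not part of the code-coverage
# project or of the Taskcluster client library.
SECRET_PREFIX = "project/relman/code-coverage/runtime-"
ENVIRONMENTS = ("testing", "production")


def secret_name(environment: str) -> str:
    """Return the secret path used by one environment's hook."""
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    return SECRET_PREFIX + environment


def secret_scope(environment: str) -> str:
    """Return the scope assumed to be needed to read that secret."""
    return "secrets:get:" + secret_name(environment)


if __name__ == "__main__":
    for env in ENVIRONMENTS:
        print(env, "->", secret_scope(env))
```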

The cron tasks are triggered by the following schedule: 0 0 0 * * *
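The schedule has six fields rather than the classic five. Assuming they are read as second, minute, hour, day of month, month and day of week (the extended cron format Taskcluster hooks accept), 0 0 0 * * * fires once a day at 00:00:00. The sketch below checks a timestamp against such a schedule. It is a simplified, hypothetical helper, not Taskcluster's own parser: it only understands '*' and plain integers (no ranges, lists or steps).

```python
from datetime import datetime

# Assumed field order of a six-field cron expression (seconds first).
FIELDS = ("second", "minute", "hour", "day", "month", "weekday")


def field_values(dt: datetime) -> tuple:
    # Cron counts weekdays from Sunday=0; Python's weekday() uses Monday=0.
    cron_weekday = (dt.weekday() + 1) % 7
    return (dt.second, dt.minute, dt.hour, dt.day, dt.month, cron_weekday)


def matches(schedule: str, dt: datetime) -> bool:
    """Return True if dt fires for a schedule made only of '*' and integers."""
    parts = schedule.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    for part, value in zip(parts, field_values(dt)):
        if part != "*" and int(part) != value:
            return False
    return True


if __name__ == "__main__":
    schedule = "0 0 0 * * *"
    print(matches(schedule, datetime(2024, 1, 15, 0, 0, 0)))   # True
    print(matches(schedule, datetime(2024, 1, 15, 0, 0, 30)))  # False
```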

The generated tasks run on a specific workerType relman-svc-memory using r5d.large EC2 instances.

Flags: needinfo?(mozilla)
Flags: needinfo?(bstack)

I copied the worker-type definition from the existing configuration, and it includes lots of instance types, among them r5d.large.

The patches above manage the hooks, but do not manage the secrets. I think those will remain "manually" managed. I'm happy to copy those over if someone gives me scopes on the firefox-ci deployment to write to them -- Callek, perhaps?

Flags: needinfo?(mozilla)
Flags: needinfo?(bugspam.Callek)
Flags: needinfo?(bstack)

I think the next big step here is to land the patches for this repo, code-review, and bugzilla-dashboard-backend. If you need to tag a release to build new images that include the new rootUrls, we can then deploy those images to these hooks.

The code-coverage hooks look OK to me, but a scheduled cron job failed due to missing scopes.
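A missing-scopes failure is easier to reason about with Taskcluster's scope-satisfaction rule: a granted scope covers a required one if the two are equal, or if the granted scope ends in * and the required scope starts with everything before the *. The minimal Python sketch below applies only that rule. It does not expand assume: roles the way the real auth service does, and the function names and example scopes are hypothetical.

```python
from typing import Iterable, List


def satisfies(granted: str, required: str) -> bool:
    """True if a single granted scope covers the required scope."""
    if granted.endswith("*"):
        # A trailing '*' matches any scope starting with the prefix before it.
        return required.startswith(granted[:-1])
    return granted == required


def missing_scopes(granted: Iterable[str], required: Iterable[str]) -> List[str]:
    """Return every required scope that no granted scope covers."""
    granted_list = list(granted)
    return [r for r in required if not any(satisfies(g, r) for g in granted_list)]


if __name__ == "__main__":
    # Hypothetical role scopes: only the testing secret is granted.
    role_scopes = ["secrets:get:project/relman/code-coverage/runtime-testing"]
    needed = [
        "secrets:get:project/relman/code-coverage/runtime-testing",
        "secrets:get:project/relman/code-coverage/runtime-production",
    ]
    print(missing_scopes(role_scopes, needed))
```

Listing the uncovered scopes, rather than returning a single yes/no, shows directly which grant a copy-paste error in a role definition left out.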

Flags: needinfo?(bstack)

Ah, oops! Copy-paste error. I believe https://phabricator.services.mozilla.com/D52464 will fix this.

Flags: needinfo?(bstack)

The latest code-coverage run (11 hours ago) still had those issues.
I do not have the scopes needed to trigger the task and check it now...

The task was successfully triggered now, but failed with:

taskcluster.exceptions.TaskclusterRestFailure: Secret not found

(the secret being project/relman/code-coverage/runtime-production)

(In reply to Marco Castelluccio [:marco] from comments #14 and #15)

So it turns out I can't access this secret in the legacy Taskcluster deployment at the moment (the secrets service and clientIds are disabled, so I can't log in to view it).

Do you have the original secret? If so, can you send it to me via Firefox Send and message me the URL on Slack?

Flags: needinfo?(bugspam.Callek) → needinfo?(mcastelluccio)

Tom has a copy of the old secrets in SOPS, if that helps.

(In reply to Justin Wood (:Callek) from comment #16)

Unfortunately I don't have a copy of the original secret. Maybe Bastien has, or Tom can get it from SOPS.

(project/relman/code-coverage/runtime-production and project/relman/code-coverage/runtime-testing)

Flags: needinfo?(mozilla)
Flags: needinfo?(mcastelluccio)
Flags: needinfo?(bastien)

My suspicion is that this was already copied, but only after the task referenced in this comment ran...

Assignee: nobody → dustin

I do not have a backup of our secrets.

We also need several Taskcluster auth clients on the firefox-ci instance to run parts of our stack on Heroku.
Fortunately, the old clients are still accessible on the taskcluster.net instance; a firefox-ci admin needs to copy them (I don't have the scopes on project/relman!):

Could you create those 4 clients and send me their access tokens by email (here is my GPG public key)?

Flags: needinfo?(bastien) → needinfo?(bugspam.Callek)

Johan created these clients, and I was able to cleanly restart the backend and events Heroku instances.

I confirm that the secrets are available, and work as expected.

I'm not marking this as fixed until a hook is triggered, but it should be OK soon.

Flags: needinfo?(mozilla)
Flags: needinfo?(bugspam.Callek)

Would it be possible to retrigger the job which failed?

needinfo me if I can help with this, but it looks like it's mostly on the firefox-ci side now.

Assignee: dustin → nobody

My understanding is that this is now done.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED

We still have some OOM issues on some workers, but most tasks run successfully. Thanks all for your help.
