Bug 1574668 (Closed) · Opened 2 years ago · Closed 2 years ago

Migrate wpt to community taskcluster deployment

Categories

(Taskcluster :: Operations and Service Requests, task)


RESOLVED FIXED

People

(Reporter: dustin, Assigned: dustin)


Make a plan to move this project to the new community deployment.

  • have a GH webhook integration that looks for taskcluster.net
  • have a dedicated worker-type, but might need provisionerId changed
  • could do with some preparation in https://github.com/web-platform-tests/wpt.fyi, e.g., using a TC client
      • using custom workers; might need to upgrade them?
  • looking for bare-metal / Android support (asked jgraham to wait until migrated)

Assignee: dustin → nobody
Assignee: nobody → wcosta
Status: NEW → ASSIGNED

OK! Here's what I think needs to happen, based on our conversations and some judicious application of git grep.

Set up a wpt project in https://github.com/mozilla/community-tc-config/

In the wpt repo's .taskcluster.yml:

  • provisionerId -> proj-wpt
  • workerType -> ci
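Concretely, the change above would amount to something like the following fragment. This is an illustrative sketch only: the key names other than `provisionerId`/`workerType`, and the surrounding structure, are assumed rather than copied from wpt's actual .taskcluster.yml.

```yaml
# Hypothetical sketch of the relevant part of .taskcluster.yml after the switch.
tasks:
  - provisionerId: proj-wpt   # the community-tc provisioner for the wpt project
    workerType: ci            # the project's dedicated worker type
```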

In tools/ci/run_tc.py, QUEUE_BASE can be replaced with a calculation based on TASKCLUSTER_ROOT_URL, which is automatically supplied within a running task. Taskcluster-lib-urls could do this, but it appears that tools/ci is dependency-free, so I'll just duplicate the logic. It's very simple, and will get a whole lot simpler after November 9 when taskcluster.net is no longer a going concern.
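As a sketch of what that duplicated logic might look like (function name is mine, not from the actual patch; the legacy special case follows the taskcluster-lib-urls convention of per-service subdomains on taskcluster.net versus `{rootUrl}/api/...` on newer deployments):

```python
import os

LEGACY_ROOT_URL = "https://taskcluster.net"

def queue_base(root_url=None):
    """Build the Queue service's base URL from the Taskcluster root URL.

    Inside a running task, TASKCLUSTER_ROOT_URL is supplied automatically,
    so callers can usually omit the argument.
    """
    root_url = (root_url or os.environ["TASKCLUSTER_ROOT_URL"]).rstrip("/")
    if root_url == LEGACY_ROOT_URL:
        # The legacy deployment used per-service subdomains.
        return "https://queue.taskcluster.net/v1"
    # Newer deployments serve all APIs under {rootUrl}/api/{service}/{version}.
    return root_url + "/api/queue/v1"
```

After November 9 the `LEGACY_ROOT_URL` branch can simply be deleted.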

tools/ci/tcdownload.py and tools/wpt/browser.py look similar, but I can't tell if these are run in a task or not. If not, what would be the best way to figure out the rootUrl, or should we just land a hard-coded change to the rootUrl when we make the switch?

I see a wpt-actions-test repo, but that looks like a fork of wpt presumably to test GH actions, so I will leave it alone. Have I missed anything else? I'm not sure what I meant about "custom workers" above!

Given answers to those questions, I'll submit PRs for all of the above.

Once those are merged, and before November 9, someone with permissions (not me):

Flags: needinfo?(james)

Oh, one more question: is there a GitHub team which should have admin access to the TC resources involved here?

https://github.com/mozilla/community-tc-config/pull/49

> I see a wpt-actions-test repo, but that looks like a fork of wpt presumably to test GH actions, so I will leave it alone.

That's correct.

> Oh, one more question: is there a GitHub team which should have admin access to the TC resources involved here?

Yes! https://github.com/orgs/web-platform-tests/teams/admins
Thanks!

BTW, have we decided on the domain of the community deployment, assuming it's going to be a different one?

Yes, it's up and running and everything - https://community-tc.services.mozilla.com/

Assignee: wcosta → dustin

(In reply to Dustin J. Mitchell [:dustin] (he/him) from comment #2)

> OK! Here's what I think needs to happen, based on our conversations and some judicious application of git grep.
>
> Set up a wpt project in https://github.com/mozilla/community-tc-config/
>
> In the wpt repo's .taskcluster.yml:
>
>   • provisionerId -> proj-wpt
>   • workerType -> ci
>
> In tools/ci/run_tc.py, QUEUE_BASE can be replaced with a calculation based on TASKCLUSTER_ROOT_URL, which is automatically supplied within a running task. Taskcluster-lib-urls could do this, but it appears that tools/ci is dependency-free, so I'll just duplicate the logic. It's very simple, and will get a whole lot simpler after November 9 when taskcluster.net is no longer a going concern.
>
> tools/ci/tcdownload.py and tools/wpt/browser.py look similar, but I can't tell if these are run in a task or not. If not, what would be the best way to figure out the rootUrl, or should we just land a hard-coded change to the rootUrl when we make the switch?

Those don't necessarily run in a task (although they can also run in tasks). For browser.py we don't want the instance for the task anyway; we want the instance gecko is using, since we're downloading artifacts from mozilla-central. I think a hardcoded change there once gecko switches is fine (we don't care much about historical artifacts and don't use the android bits in CI anyway).

tcdownload.py is afaik not used in a task at the moment and is just for interactive fetching of artifacts. I don't know if there's some clever way to look this up based on the supplied repo, but just hardcoding here probably isn't too bad. We could probably also switch this to use the taskcluster library (there's a PR to create a decision task that makes this stuff depend on the taskcluster library).
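The "hardcode, but allow an override" approach for an interactive tool like tcdownload.py could look something like this (a sketch under my own naming; `DEFAULT_ROOT_URL` and `get_root_url` are hypothetical, not from the wpt codebase):

```python
import os

# Hardcode the community deployment's root URL for interactive use,
# but let TASKCLUSTER_ROOT_URL override it when set (e.g. inside a task).
DEFAULT_ROOT_URL = "https://community-tc.services.mozilla.com"

def get_root_url():
    """Return the Taskcluster root URL, preferring the task environment."""
    return os.environ.get("TASKCLUSTER_ROOT_URL", DEFAULT_ROOT_URL).rstrip("/")
```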

> I see a wpt-actions-test repo, but that looks like a fork of wpt presumably to test GH actions, so I will leave it alone. Have I missed anything else? I'm not sure what I meant about "custom workers" above!

We were talking about adding bare metal instances to the repo for running the android emulator, but I've been waiting on this switch to happen to push on that.

> Given answers to those questions, I'll submit PRs for all of the above.

Thanks!

> Once those are merged, and before November 9, someone with permissions (not me):

I can be responsible for that.

Flags: needinfo?(james)

PSA: The existing (https://taskcluster.net) deployment will be shut down a week from today, on November 9. After that point, any CI not migrated to the new community cluster will stop functioning. The TC team is ready and eager to help get everything migrated by that time, but the deadline is firm.

Apologies for failing to communicate this as broadly and loudly as necessary, and for the bugspam now.

I just enabled the community-tc instance for wpt; however, it seems we are seeing some issues related to the workers. Chrome jobs are showing crashes in the new configuration but not the old one. E.g., for https://github.com/web-platform-tests/wpt/pull/20119, compare https://community-tc.services.mozilla.com/tasks/groups/NrNItA0DQSWvRM9-MWSc2g with https://tools.taskcluster.net/groups/CVSlWtiyRpyYm_JH1DNqjQ

We landed some changes yesterday to run those tasks in "privileged" mode. There was some wondering about "how this ever worked": the older workers were running an older Docker version and an older kernel version, so it's possible that either Docker wasn't as restrictive or the Chrome sandbox didn't try to use the feature being blocked.

Hi friends -- today's the last day :)

I know we missed the moment on Tuesday due to the GitHub issues. I hope I just missed the switchover in the last few days?

We got Chrome working with --no-sandbox yesterday, after taking advice from the relevant Google engineers about the correct tradeoff. Everything seems to be working now; we just need to flip wpt.fyi over and then disable the old app. The main remaining problem is that the status check's name has changed, so every PR that people want to land will need to be re-tested before it meets the requirements for landing.

Got it -- sorry about that! It did help Servo, and it also reduced a lot of confusion over which deployment was posting which errors. If that means pushing a bunch of buttons, I have almost 20 years of experience in the button-pushing industry and would be happy to help.

Per Hexcles, this is done!

Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED