Migrate servo to community taskcluster deployment
Categories
(Taskcluster :: Operations and Service Requests, task)
Tracking
(Not tracked)
People
(Reporter: dustin, Assigned: bstack)
References
Details
Make a plan to move servo to the new community deployment.
Notes:
- a manually set up macOS builder in Toronto (tor) running 11.x
- some MacStadium workers running 14.1.x
- a custom Windows AMI
- Treeherder integration
Comment 1•5 years ago
I think WebRender and Servo should be considered separately.
CI for https://github.com/servo/webrender is owned by the Gecko gfx team and uses two worker types:
- aws-provisioner-v1/github-worker
- localprovisioner/webrender-ci-osx, with one Mac mini in the Toronto office. I don’t know who has SSH access to that machine.
CI for https://github.com/servo/servo/ is owned by the Servo team (managed in tree as much as possible) and has:
- aws-provisioner-v1/servo-docker-worker, set up by :wcosta
- aws-provisioner-v1/servo-docker-untrusted, a copy of servo-docker-worker with fewer scopes. Used for pre-review testing of pull requests (so anyone can run anything there)
- aws-provisioner-v1/servo-win2016, with a custom AMI running generic-worker
- proj-servo/macos, with multiple machines from Macstadium running generic-worker
- proj-servo/docker-worker-kvm, disabled at the moment because of a perma-failure. So it’s not a priority, but we may want to bring it back at some point. This is for running tests in an Android emulator capable of OpenGL ES 3, which requires CPU acceleration, which requires running KVM, which requires VT-x CPU instructions that are not available inside AWS EC2 VMs. So we used https://github.com/taskcluster/taskcluster-infrastructure/tree/master/modules/docker-worker to deploy docker-worker to dedicated hardware from Packet.net
Reporter
Comment 2•5 years ago
Notes from meeting with :SimonSapin and :jdm:
The two docker-worker worker types are configured identically (or soon will be, when staging's docker-worker is upgraded); the difference in scopes is associated with the roles tc-github assigns to the task, and the different worker-types serve merely as a boundary to prevent cross-contamination.
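A minimal sketch of that trust boundary, assuming the decision task knows whether it was triggered by a push or by a pull request. The Trigger enum and worker_type_for function are hypothetical; only the two worker-type names come from this bug.

```python
from enum import Enum


class Trigger(Enum):
    """What caused the decision task to run (hypothetical labels)."""
    PUSH = "push"
    PULL_REQUEST = "pull-request"


# Both worker types share the same docker-worker configuration; what differs
# is the set of scopes tc-github grants, via roles, to the tasks they run.
TRUSTED_WORKER_TYPE = "servo-docker-worker"
UNTRUSTED_WORKER_TYPE = "servo-docker-untrusted"


def worker_type_for(trigger):
    """Keep unreviewed pull-request code off the workers that run push tasks."""
    if trigger is Trigger.PULL_REQUEST:
        return UNTRUSTED_WORKER_TYPE
    return TRUSTED_WORKER_TYPE
```

Separate worker types mean that even if a pull request compromises a worker, that worker never goes on to run tasks holding the more powerful push scopes.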
The servo-win2016* worker-types (including -staging) are based on a custom AMI residing in the servo AWS account and generated by a Python script that runs a PowerShell script in an instance. It currently only generates an AMI in one region.
The macstadium workers are provisioned using salt: https://github.com/servo/servo/tree/master/etc/taskcluster/macos
The packet worker-type can probably be left more-or-less intact. If we get bare-metal working in EC2, we could transition to that, but later.
There is a daily hook -- https://tools.taskcluster.net/hooks/project-servo/daily -- that runs the decision task, just as pushes and PRs do.
The "Treeherder Question" could have one of three answers:
- Treeherder can ingest these messages
- Servo could run in the Firefox-CI deployment
- We could run a distinct Treeherder instance for the community deployment.
The first option is preferred, and that's bug 1574651.
As for administration, servo would prefer to be able to admin their own resources within the larger scope of the community deployment, rather than waiting for PRs to some configuration repository to be approved and merged.
It may be beneficial to do a partial migration, running both in parallel for a while. The decision task could be modified to, for example, only run the docker-worker tasks in the new deployment while running everything in the old deployment.
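The partial migration could be expressed in the decision task as a filter over the tasks it would otherwise create. A minimal sketch: select_tasks and the "worker" key on each task dict are hypothetical, and how the decision task learns which deployment it runs in is left to the caller.

```python
def select_tasks(all_tasks, in_new_deployment):
    """Choose which tasks to create while both deployments run in parallel.

    all_tasks: list of dicts, each with a "worker" key naming the worker
    implementation the task needs (hypothetical structure).
    """
    if in_new_deployment:
        # New deployment: only the docker-worker tasks, to validate that setup first.
        return [task for task in all_tasks if task["worker"] == "docker-worker"]
    # Old deployment: keep running everything, so CI coverage is unchanged.
    return list(all_tasks)
```

Once the docker-worker tasks are reliable in the new deployment, the filter can be widened one worker type at a time.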
Comment 3•5 years ago
As for administration, servo would prefer to be able to admin their own resources within the larger scope of the community deployment, rather than waiting for PRs to some configuration repository to be approved and merged.
It would also be fine to use a shared configuration repository if we (the Servo team) have access to review and deploy PRs to it. (With the understanding that we only use this access for PRs that only affect Servo’s CI.)
Reporter
Comment 4•5 years ago
It looks like the Treeherder changes are tractable, and I'm working on them now. One question, though: do you use actions like retrigger in Treeherder, or just use it as a status display?
Comment 5•5 years ago
Assuming "you" means the Servo team: I personally don’t use actions from Treeherder and usually don’t log into Treeherder at all. Josh, how about you? Or, as far as you know, other people on the team?
Reporter
Comment 6•5 years ago
Haha, yeah, I guess the informal English pronoun should have been y'all :)
Comment 7•5 years ago
My attempts to use actions from the treeherder interface have been thwarted by https://github.com/servo/servo/issues/23217, so all my actions happen through the taskcluster interface.
Reporter
Updated•5 years ago
Comment 8•5 years ago
Hi Dustin. With bug 1574651 fixed, what are the next steps? Is the new deployment available at least for testing? Can we have both enabled on the same GitHub repository, during a transition period?
Reporter
Comment 9•5 years ago
Pete will be working with you on this.
Reporter
Updated•5 years ago
Comment 10•5 years ago
Can we have both enabled on the same GitHub repository, during a transition period?
I’ve submitted https://github.com/taskcluster/taskcluster/pull/1738 as an attempt to make this possible.
Reporter
Comment 11•5 years ago
Thanks! That the idea was invented twice in the same day is a confirmation that the change is a good idea. So yes, let's get that landed and we can make a TC release. Those generally go live in about a day.
The new integration is https://github.com/apps/community-tc-integration.
About 16h ago we switched again and bstack is now assigned (sorry for the churn). We're aware that this is work that we've created for you, so we are happy to make PRs, test changes, etc. I'll leave it to bstack to figure out the next steps.
If it wasn't clear from above, the Treeherder issue was resolved to "Treeherder can ingest these messages" -- treeherder will listen to messages from both the firefox-ci and community-tc deployments, and will "remember" which one is which.
Comment 12•5 years ago
Is there a timeline for the current https://taskcluster.net deployment going away?
we are happy to make PRs, test changes, etc.
Since you’re offering :) I guess some good next steps would be:
- Ensure that we can distinguish GitHub Status API entries from the two deployments. A different context string would be best. In handlers.js this appears to be based on this.context.cfg.app.statusContext
  - Alternatively, maybe Servo needs to migrate to the Checks API first
- Set up a servo “project” and an initial worker pool that runs docker-worker
- Figure out how to give administrative access to the above to select Servo contributors, ideally even if they don’t have a Mozilla LDAP account.
- A PR to https://github.com/servo/servo:
  - Making .taskcluster.yml use https://github.com/taskcluster/taskcluster/pull/1738 so that GitHub push events and PR events trigger a decision task in both deployments
  - Making etc/taskcluster/decision_task.py run pprint.pprint(os.environ), then exit early when running on the new deployment
  - Making
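The smoke-test item for decision_task.py could look roughly like this. A sketch under assumptions: it relies on the TASKCLUSTER_ROOT_URL environment variable that Taskcluster workers set for tasks, assumes the community deployment's root URL is https://community-tc.services.mozilla.com, and uses a hypothetical schedule_tasks callable standing in for the existing decision-task logic.

```python
import os
import pprint

# Assumption: root URL of the community deployment; check it before relying on it.
COMMUNITY_ROOT_URL = "https://community-tc.services.mozilla.com"


def main(schedule_tasks, environ=os.environ):
    """Run the decision task; return True if it stopped early.

    schedule_tasks is a hypothetical stand-in for the existing logic in
    etc/taskcluster/decision_task.py.
    """
    root_url = environ.get("TASKCLUSTER_ROOT_URL", "").rstrip("/")
    if root_url == COMMUNITY_ROOT_URL:
        # New deployment: print what the task received, then stop
        # before creating any tasks.
        pprint.pprint(dict(environ))
        return True
    # Old deployment: behave exactly as before.
    schedule_tasks()
    return False
```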
I think the first point is most important to safely enable https://github.com/apps/community-tc-integration on servo/servo without disrupting its CI.
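Why a distinct context matters: GitHub's combined status for a commit keeps only the most recent status per context string, so two deployments reporting under the same context would overwrite each other. A minimal sketch, assuming hypothetical context strings of the form "Taskcluster (<deployment>)". The payload fields (state, target_url, description, context) are those of GitHub's "create a commit status" endpoint; latest_per_context only imitates GitHub's behaviour locally.

```python
VALID_STATES = {"error", "failure", "pending", "success"}


def status_payload(state, deployment, target_url, description):
    """Body for GitHub's POST /repos/{owner}/{repo}/statuses/{sha}.

    The context string includes the deployment name (hypothetical format),
    so each deployment gets its own entry on the commit.
    """
    if state not in VALID_STATES:
        raise ValueError(f"invalid status state: {state!r}")
    return {
        "state": state,
        "target_url": target_url,
        "description": description,
        "context": f"Taskcluster ({deployment})",
    }


def latest_per_context(statuses):
    """Local imitation of GitHub's combined status: the newest status per context wins.

    statuses must be ordered oldest first.
    """
    latest = {}
    for status in statuses:
        latest[status["context"]] = status
    return latest
```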
treeherder will listen to messages from both the firefox-ci and community-tc deployments
And also the current deployment, as long as it still exists?
Unfortunately, the docs for treeherder integration seem to have been removed :( Is specifying a tc-treeherder.v2._/${tree}.${sha} route on tasks still what we need in community-tc?
Reporter
Comment 13•5 years ago
Set up a servo “project” and an initial worker pool that runs docker-worker
Done - https://github.com/mozilla/community-tc-config/pull/34. That's already applied, too, in the interest of expediency.
Figure out how to give administrative access to the above to select Servo contributors, ideally even if they don’t have a Mozilla LDAP account.
Done -- community-tc uses GitHub auth!
Reporter
Comment 14•5 years ago
Comment 13 was based on an irc conversation this morning, in an attempt to be expedient. So I didn't answer the rest of the questions in comment 12. Sorry to interrupt, brian!
Comment 15•5 years ago
After spending some time looking at tc-admin and community-tc-config (and with IRC help, thanks Dustin!) I came up with https://github.com/servo/taskcluster-config. I’m glad to be able to version-control all this.
The cloud-provider-specific parts of worker pools are one aspect that is still not clear to me. I think I’ll likely cargo-cult community-tc-config’s workers.py.
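For orientation, the cloud-specific part of a worker pool can be sketched as a plain Python dict. This is a sketch under assumptions: the field names (minCapacity, maxCapacity, launchConfigs, region, capacityPerInstance, launchConfig, workerConfig) are modelled on worker-manager's AWS provider configuration as we understand it and should be checked against the worker-manager documentation and community-tc-config’s workers.py; aws_worker_pool_config is a hypothetical helper. Because AMI IDs are region-specific, it takes one image per region.

```python
def aws_worker_pool_config(image_ids, instance_types, min_capacity=0, max_capacity=10):
    """Build an AWS worker-pool config (hypothetical helper).

    image_ids maps AWS region -> AMI ID, because an AMI only exists in the
    region where it was created or copied to.
    """
    if not 0 <= min_capacity <= max_capacity:
        raise ValueError("need 0 <= min_capacity <= max_capacity")
    launch_configs = []
    for region, image_id in image_ids.items():
        for instance_type in instance_types:
            launch_configs.append({
                "region": region,
                # Assumed: one unit of capacity per instance.
                "capacityPerInstance": 1,
                # Parameters passed through to EC2 RunInstances.
                "launchConfig": {"ImageId": image_id, "InstanceType": instance_type},
                # Worker-implementation settings (docker-worker, generic-worker) go here.
                "workerConfig": {},
            })
    return {
        "minCapacity": min_capacity,
        "maxCapacity": max_capacity,
        "launchConfigs": launch_configs,
    }
```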
Reporter
Comment 16•5 years ago
I think that's fine.
Note, too, that if you wish we can continue to manage your worker pools while you manage everything else -- the advantage is that we then take care of upgrading worker versions, mitigating AWS or GCP issues, etc.
Comment 17•5 years ago
Could we do that for only some worker pools?
- For pools running docker-worker that sounds great. We’d like to have some control over instance types[1] and min/max capacity, but changes there should be infrequent enough that going through a PR to mozilla/community-tc-config sounds fine. We might need more frequent changes to the scopes granted to those workers, but that can be managed separately, right?
- For static workers (such as those running macOS) I don’t know which is preferable. There isn’t much configuration for them in worker-manager, is there?
- For Windows it may be better that we manage the VM image, so that we can, for example, install another MSVC component in it. Deploying a new image sounds easier if that worker pool is managed in servo/taskcluster-config.
[1] By the way, I’d like at some point to benchmark different instance types. Compiling Servo benefits from more CPU cores, but only up to a point, so there’s likely a sweet spot that balances speed vs. cost.
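The speed/cost trade-off in the footnote comes down to simple arithmetic: cost per build is the hourly instance price times the build time. A minimal sketch, assuming you have measured build times and looked up prices yourself (no real measurements are included here) and that billing is proportional to run time; Measurement and cheapest_within are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    instance_type: str
    hourly_price_usd: float  # price you looked up for this instance type
    build_minutes: float     # measured wall-clock time of one Servo build

    @property
    def cost_per_build_usd(self):
        # Assumes billing proportional to run time: price per hour * hours used.
        return self.hourly_price_usd * self.build_minutes / 60


def cheapest_within(measurements, max_build_minutes):
    """Cheapest instance type whose build finishes within max_build_minutes, or None."""
    fast_enough = [m for m in measurements if m.build_minutes <= max_build_minutes]
    if not fast_enough:
        return None
    return min(fast_enough, key=lambda m: m.cost_per_build_usd)
```

Past the sweet spot, extra cores raise the hourly price faster than they shorten the build, so cost per build climbs again; the time limit expresses how much latency contributors will tolerate.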
Reporter
Comment 18•5 years ago
Could we do that for only some worker pools?
No problem! And the distinction you've described sounds like a good one.
Assignee
Updated•5 years ago
Reporter
Comment 19•5 years ago
Simon, two things:
- How's it going? Can we help? Are we on-track to turn off https://taskcluster.net in a week?
- Since we are considering webrender a different project, there's a bit of a conflict inherent in configuring the servo project to manage the entire servo org, which includes servo/webrender. In https://github.com/mozilla/community-tc-config/pull/50/files#diff-70800a36c38d85e21390f3af255c8420L154 :pmoore has changed that to just manage servo/servo. Are there other repos in that org that should be included as well?
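The conflict can be illustrated with a toy matcher. This is a hypothetical illustration, not how community-tc-config expresses it: projects are mapped to shell-style repository patterns, and a repository claimed by two projects is ambiguous.

```python
from fnmatch import fnmatchcase


def projects_for_repo(repo, project_repos):
    """Return, sorted, the projects whose patterns match repo ("owner/name").

    project_repos maps a project name to a list of shell-style patterns
    (hypothetical format). More than one result means the repository is
    claimed by several projects.
    """
    return sorted(
        project
        for project, patterns in project_repos.items()
        if any(fnmatchcase(repo, pattern) for pattern in patterns)
    )
```

With "servo/*" for the servo project, servo/webrender belongs to both servo and webrender; narrowing the pattern to "servo/servo" leaves each repository with exactly one owner.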
Comment 20•5 years ago
Are we on-track to turn off https://taskcluster.net in a week?
No, not at all. I was not aware of this target date, despite asking in comment 12 :/ Bug 1591591 is still something I hoped we could do before starting the migration for servo/servo.
:pmoore has changed that to just manage servo/servo.
That sounds OK.
Are there other repos in that org that should be included as well?
No, as far as I remember servo/servo and servo/webrender are the only two repositories under servo/ using Taskcluster at the moment. When we want more, PRs to mozilla/community-tc-config to add them on a case-by-case basis sound fine.
Reporter
Comment 21•5 years ago
https://github.com/mozilla/community-tc-config/pull/54 for worker pools
Reporter
Comment 22•5 years ago
I gave Simon a user (SimonSapin) in the community workers AWS account with EC2 Read-Only access, in order to set up and debug the win2016 images expeditiously. We can remove that once it's in place (and once we have better mechanics for debugging worker instances that don't require EC2 access).
Comment 23•5 years ago
With aws-provisioner I used this when instances running a new AMI were starting but not picking up tasks, in order to find their public IP address and RDP in so that I could read generic-worker log files.
We can remove that once it's in place
I may need this again for future AMI updates, until there’s some other way to find the IP address or read logs.
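With read-only EC2 access, finding the public IP of a new-AMI instance that is not picking up tasks can be scripted. A minimal sketch, assuming boto3 and credentials for the account; public_ips is a hypothetical helper that only parses the response dict returned by EC2's DescribeInstances (Reservations, then Instances, each with InstanceId, PublicIpAddress and State.Name).

```python
def public_ips(describe_instances_response):
    """Map instance ID to public IP for running instances in a DescribeInstances response.

    Expects the dict shape returned by boto3's ec2.describe_instances().
    Instances without a public IP (or not yet running) are skipped.
    """
    ips = {}
    for reservation in describe_instances_response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            ip = instance.get("PublicIpAddress")
            state = instance.get("State", {}).get("Name")
            if ip and state == "running":
                ips[instance["InstanceId"]] = ip
    return ips
```

For example, pass it the result of boto3.client("ec2", region_name="us-east-1").describe_instances(Filters=[{"Name": "image-id", "Values": [ami_id]}]), where ami_id is the new image and the region is wherever the pool runs; with many instances, boto3's describe_instances paginator yields pages of the same shape.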
Comment 24•5 years ago
With help on IRC from Dustin and tomprince I’ve now managed to configure a Windows / AWS worker pool and run a task.
Bug 1591591 is the next blocker.
Reporter
Comment 25•5 years ago
I believe bug 1593543 is now fixed, and anyhow had a temporary workaround.
Is there anything else I can do to assist or unblock?
Reporter
Comment 26•5 years ago
Per irc, this is all set! THANK YOU!
Comment 27•5 years ago
- https://treeherder.allizom.org/#/jobs?repo=servo-auto is showing task data as expected (and I hear treeherder.mozilla.org will too soon)
- https://github.com/servo/servo/pull/24689 has landed
- https://github.com/servo/saltfs/pull/986 is deployed
- https://github.com/apps/taskcluster is uninstalled from servo/servo
- https://github.com/servo/servo/pull/24697 cleans up some loose ends
I think we’re done! Just in time for tomorrow.