Can't connect to GCP workers for live logs or interactive tasks
Categories
(Infrastructure & Operations :: RelOps: General, defect)
Tracking
(Not tracked)
People
(Reporter: glandium, Assigned: miles)
The devtools network console shows SEC_ERROR_EXPIRED_CERTIFICATE errors on the $random.taskcluster-worker.net:$port websocket connections.
Comment 1•5 years ago
It sounds like the SSL certificate with which that worker image was built has expired. Can you point to a task so we can identify which worker image?
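A quick way to confirm an expired certificate outside the browser is to attempt a verified TLS handshake and look at why verification fails. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and the host and port have to be copied from the failing websocket URL in the network console.

```python
import socket
import ssl
import time
from typing import Optional

# OpenSSL verification code X509_V_ERR_CERT_HAS_EXPIRED.
CERT_HAS_EXPIRED = 10


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days until a certificate's notAfter time (negative once expired)."""
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires - current) / 86400.0


def probe_certificate(host: str, port: int, timeout: float = 10.0) -> str:
    """Attempt a verified handshake and summarize the certificate state."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                cert = tls.getpeercert()
    except ssl.SSLCertVerificationError as exc:
        if exc.verify_code == CERT_HAS_EXPIRED:
            return "expired"
        return f"verification failed: {exc.verify_message}"
    return f"valid for {days_until_expiry(cert['notAfter']):.1f} more days"
```

Calling `probe_certificate(host, port)` against a worker would be expected to return `"expired"` if the diagnosis above is right; any other verification failure is reported with OpenSSL's own message.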
Comment 2•5 years ago
This was renewed April 9, and updated in the docker-worker images in bug 1619278, so it should be working.
Comment 3•5 years ago (Reporter)
Here's a task I just triggered and that still can't be accessed: https://firefox-ci-tc.services.mozilla.com/tasks/HqDGRyDaSQamsNF_9mIq4w
Comment 4•5 years ago
https://firefox-ci-tc.services.mozilla.com/worker-manager/gecko-1%2Fb-linux-gcp
"sourceImage": "projects/taskcluster-imaging/global/images/docker-worker-gcp-googlecompute-2020-02-07t09-14-17z"
so that image is from February. I'll bake a new one.
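The build date can be read straight off the image name, since the name ends in a timestamp. As a rough sketch (the helper names are hypothetical, and the year of the April 9 renewal is assumed to be 2020 to match the image names), the comparison against the renewal date looks like this:

```python
import re
from datetime import datetime, timezone

# Trailing timestamp such as "2020-02-07t09-14-17z" at the end of an image name.
_STAMP = re.compile(r"(\d{4})-(\d{2})-(\d{2})t(\d{2})-(\d{2})-(\d{2})z$")


def image_build_time(source_image: str) -> datetime:
    """Parse the UTC build timestamp embedded in a worker image name."""
    match = _STAMP.search(source_image)
    if match is None:
        raise ValueError(f"no build timestamp in {source_image!r}")
    return datetime(*map(int, match.groups()), tzinfo=timezone.utc)


def predates(source_image: str, cutoff: datetime) -> bool:
    """True if the image was built before the given cutoff."""
    return image_build_time(source_image) < cutoff


# Assumption: the certificate renewal on April 9 refers to 2020.
CERT_RENEWED = datetime(2020, 4, 9, tzinfo=timezone.utc)
IMAGE = (
    "projects/taskcluster-imaging/global/images/"
    "docker-worker-gcp-googlecompute-2020-02-07t09-14-17z"
)
print(predates(IMAGE, CERT_RENEWED))
```

An image that predates the renewal cannot contain the renewed certificate, which is why a rebuild is needed.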
Comment 5•5 years ago
Actually, I don't know how to do that. I've asked in #firefox-ci, and will needinfo someone here if I don't hear back.
Comment 6•5 years ago
Miles said:
> dustin: the naming scheme is lacking, looks like we are indeed missing production-l1
> -new and -old was from the rotation in march
> wander baked some images 5/27 that haven't been entered, we should probably re-do that at this point
> because CoT isn't used for L1 I think the staging-l1 yaml has been used for all L1 images
So I think we can delete -old and rename -new to drop the suffix.
So it seems I should update
monopacker-docker-worker-current: monopacker-docker-worker-2020-02-07t09-14-17z
monopacker-docker-worker-trusted-current: monopacker-docker-worker-gcp-trusted-2020-02-13t03-22-56z
with the docker_worker_gcp (l1) and docker_worker_gcp_trusted (l3) builders? Or both with trusted, just with different secrets?
Also, I see
# Note: this project disallows port 22, so baking images requires
# temporarily allowing access
builder_vars:
  project_id: fxci-production-level3-workers
how do I do that?
I'll try to write all of this down in worker-images.yml when I get it figured out. And hopefully not close the trees in the interim.
Comment 7•5 years ago (Assignee)
I agree we can drop the suffix now.
You're correct: docker_worker_gcp is for L1 and docker_worker_gcp_trusted is for L3. The L3 images are created in the fxci-production-level3-workers account, so releng will need to grant you access to that account (Compute Engine Instance Admin (v1) covers everything needed, noted here: https://github.com/taskcluster/monopacker/#pre-requisites - you don't need everything there though).
From there, you'll need to whitelist your IP for ssh in VPC Network => Firewalls in the Google Cloud Console.
Comment 8•5 years ago
I appear to have access to the account, but ..
> From there, you'll need to whitelist your IP for ssh in VPC Network => Firewalls in the Google Cloud Console.
I don't appear to have permission to do this.
Kendall, does it make sense for me to get permissions and learn how to do this, or would it make more sense for someone on your team to learn the ropes? If the former, can you grant me the necessary permissions?
Comment 9•5 years ago
Miles is working on this :D
I will write up the last few comments as docs in the community-tc-config repo.
Comment 10•5 years ago
(In reply to Dustin J. Mitchell [:dustin] (he/him) from comment #8)
> I appear to have access to the account, but ..
> > From there, you'll need to whitelist your IP for ssh in VPC Network => Firewalls in the Google Cloud Console.
> I don't appear to have permission to do this.
> Kendall, does it make sense for me to get permissions and learn how to do this, or would it make more sense for someone on your team to learn the ropes? If the former, can you grant me the necessary permissions?
relops owns the fxci-*workers accounts in GCP, and I'm happy with us maintaining the FW rules. I see we've got one allowance for Miles; lemme know who else to add (+ IP, obv) and I'll make it go.
Comment 11•5 years ago
me: lamport.r.igoro.us has address 54.148.125.226
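For anyone else who needs an allowance, the same lookup can be scripted. This is only a sketch: the helper names are hypothetical, and it assumes the firewall rule wants the source as a single-address CIDR range.

```python
import ipaddress
import socket


def ssh_source_range(address: str) -> str:
    """Format one IP address as a single-host CIDR range."""
    return str(ipaddress.ip_network(address))


def resolve_source_range(hostname: str) -> str:
    """Resolve a hostname to IPv4 and format it as a single-host range."""
    return ssh_source_range(socket.gethostbyname(hostname))


print(ssh_source_range("54.148.125.226"))
```

`resolve_source_range` does a live DNS lookup, so its result depends on what the name resolves to at the time it is run.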
Comment 13•5 years ago
(In reply to Dustin J. Mitchell [:dustin] (he/him) from comment #11)
> me: lamport.r.igoro.us has address 54.148.125.226
added
Comment 15•5 years ago
I'm building 32.0.0 images in bug 1650813 which should fix this.
Comment 16•5 years ago
fubar:
==> docker_worker_gcp_trusted: Error creating instance: googleapi: Error 403: Required 'compute.zones.get' permission for 'projects/fxci-production-level3-workers/zones/us-west1-a', forbidden
Build 'docker_worker_gcp_trusted' errored: Error creating instance: googleapi: Error 403: Required 'compute.zones.get' permission for 'projects/fxci-production-level3-workers/zones/us-west1-a', forbidden
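When asking for a role update it helps to state exactly which permission and resource were refused. A small sketch for pulling those out of the build output follows; the names are hypothetical, and the pattern is based only on the message format shown above.

```python
import re
from typing import NamedTuple, Optional


class MissingPermission(NamedTuple):
    permission: str
    resource: str


# Matches the 403 message format seen in the build output above.
_PATTERN = re.compile(r"Error 403: Required '([^']+)' permission for '([^']+)'")


def parse_missing_permission(line: str) -> Optional[MissingPermission]:
    """Extract the refused permission and resource, or None if absent."""
    match = _PATTERN.search(line)
    if match is None:
        return None
    return MissingPermission(match.group(1), match.group(2))


LOG_LINE = (
    "==> docker_worker_gcp_trusted: Error creating instance: googleapi: "
    "Error 403: Required 'compute.zones.get' permission for "
    "'projects/fxci-production-level3-workers/zones/us-west1-a', forbidden"
)
print(parse_missing_permission(LOG_LINE))
```

Lines that do not contain a 403 permission message yield `None`, so the function can be run over the whole build log.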
Comment 17•5 years ago
Sorry for dropping the ball on this; Dustin, is that error from your account or one of the service accounts?
Comment 19•5 years ago
Ok, updated your role. Let me know if you run into other errors (miles has two other roles assigned on the project but I'm not certain if they're required).
Comment 20•5 years ago
I was able to build an image -- thanks!
It looks like v36.0.0 was deployed in bug 1657412, so I expect that this issue is fixed. Please re-open if not!