Closed Bug 1643562 Opened 5 years ago Closed 5 years ago

Can't connect to GCP workers for live logs or interactive tasks

Categories

(Infrastructure & Operations :: RelOps: General, defect)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: glandium, Assigned: miles)

Attachments

(1 file)

The devtools network console shows SEC_ERROR_EXPIRED_CERTIFICATE errors on the $random.taskcluster-worker.net:$port websocket connections.

It sounds like the SSL certificate with which that worker image was built has expired. Can you point to a task so we can identify which worker image?
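
For anyone wanting to confirm the expiry outside the browser, here is a minimal sketch using only the Python standard library. The function names are made up for illustration, and the host and port you pass in have to come from the failing websocket URL; nothing here is part of docker-worker or Taskcluster.

```python
import socket
import ssl
from datetime import datetime, timezone

# OpenSSL's verify code for "certificate has expired".
X509_V_ERR_CERT_HAS_EXPIRED = 10


def not_after_to_datetime(not_after: str) -> datetime:
    """Convert a certificate notAfter string (e.g. "Jan  5 09:34:43 2018 GMT")
    into a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)


def is_expired(not_after: str, now: datetime) -> bool:
    """Return True if the notAfter timestamp is at or before `now`."""
    return not_after_to_datetime(not_after) <= now


def probe(host: str, port: int, timeout: float = 10.0) -> str:
    """Attempt a verified TLS handshake and describe the outcome.

    Hypothetical helper: it only reports what the handshake shows.
    """
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return "valid until " + tls.getpeercert()["notAfter"]
    except ssl.SSLCertVerificationError as exc:
        if exc.verify_code == X509_V_ERR_CERT_HAS_EXPIRED:
            return "expired"
        return "verification failed: " + exc.verify_message
```

Calling `probe(host, port)` against the worker's live-log endpoint should return "expired" if the diagnosis above is right; any other verification failure is reported with OpenSSL's own message.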

This was renewed April 9, and updated in the docker-worker images in bug 1619278, so it should be working.

Here's a task I just triggered and that still can't be accessed: https://firefox-ci-tc.services.mozilla.com/tasks/HqDGRyDaSQamsNF_9mIq4w

https://firefox-ci-tc.services.mozilla.com/worker-manager/gecko-1%2Fb-linux-gcp
"sourceImage": "projects/taskcluster-imaging/global/images/docker-worker-gcp-googlecompute-2020-02-07t09-14-17z"

so that image is from February. I'll bake a new one.
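
The same check can be scripted for other pools. This is a rough sketch that assumes image names always end in a `YYYY-MM-DDtHH-MM-SSz` build stamp, as the one above does, and that the "April 9" renewal mentioned earlier was in 2020. Both the helper names and the renewal constant are assumptions for illustration.

```python
import re
from datetime import datetime, timezone

# Trailing build stamp as seen in the sourceImage above, e.g. 2020-02-07t09-14-17z.
_STAMP = re.compile(r"(\d{4})-(\d{2})-(\d{2})t(\d{2})-(\d{2})-(\d{2})z$")

# Assumption: the certificate renewal "April 9" refers to 2020.
CERT_RENEWED = datetime(2020, 4, 9, tzinfo=timezone.utc)


def image_build_time(image_name: str) -> datetime:
    """Extract the UTC build time from the trailing stamp of an image name."""
    match = _STAMP.search(image_name)
    if match is None:
        raise ValueError(f"no build stamp found in {image_name!r}")
    return datetime(*map(int, match.groups()), tzinfo=timezone.utc)


def predates_renewal(image_name: str) -> bool:
    """True if the image was built before the certificate was renewed,
    so it cannot contain the renewed certificate."""
    return image_build_time(image_name) < CERT_RENEWED
```

An image that predates the renewal is a candidate for re-baking; one built afterwards would need its certificate inspected directly, since the build date alone does not prove the new certificate was included.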

Actually, I don't know how to do that. I've asked in #firefox-ci, and will needinfo someone here if I don't hear back.

Assignee: nobody → relops
Component: General → RelOps: General
Product: Taskcluster → Infrastructure & Operations
QA Contact: klibby

Miles said:

dustin: the naming scheme is lacking, looks like we are indeed missing production-l1
-new and -old was from the rotation in march
wander baked some images 5/27 that haven't been entered, we should probably re-do that at this point
because CoT isn't used for L1 I think the staging-l1 yaml has been used for all L1 images

So I think we can delete -old and rename -new to drop the suffix.

So it seems I should update

monopacker-docker-worker-current: monopacker-docker-worker-2020-02-07t09-14-17z
monopacker-docker-worker-trusted-current: monopacker-docker-worker-gcp-trusted-2020-02-13t03-22-56z

with the docker_worker_gcp (l1) and docker_worker_gcp_trusted (l3) builders? Or both with trusted, just with different secrets?

Also, I see

# Note: this project disallows port 22, so baking images requires
# temporarily allowing access
builder_vars:
  project_id: fxci-production-level3-workers

how do I do that?

I'll try to write all of this down in worker-images.yml when I get it figured out. And hopefully not close the trees in the interim.

Flags: needinfo?(miles)

I agree we can drop the suffix now.

You're correct: docker_worker_gcp is for L1 and docker_worker_gcp_trusted is for L3. The L3 images are created in the fxci-production-level3-workers account, so releng will need to grant you access to that account. The Compute Engine Instance Admin (v1) role covers everything needed; the full list is at https://github.com/taskcluster/monopacker/#pre-requisites, though you don't need everything there.

From there, you'll need to whitelist your IP for ssh in VPC Network => Firewalls in the Google Cloud Console.
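
If the console is inconvenient, the same firewall change could probably be made with the gcloud CLI. The sketch below only assembles the command line and does not run it. The rule name, the `default` network, and the example address are placeholders, so check them against the project's actual VPC setup before using anything like this.

```python
import ipaddress
import shlex


def ssh_allow_rule_command(project: str, rule_name: str, source_ip: str,
                           network: str = "default") -> list:
    """Build a `gcloud compute firewall-rules create` command that allows
    inbound SSH (tcp:22) from a single address.

    rule_name and network are hypothetical values chosen by the caller.
    """
    address = ipaddress.ip_address(source_ip)  # raises ValueError on bad input
    prefix = 32 if address.version == 4 else 128  # single-host range
    return [
        "gcloud", "compute", "firewall-rules", "create", rule_name,
        f"--project={project}",
        f"--network={network}",
        "--direction=INGRESS",
        "--allow=tcp:22",
        f"--source-ranges={address}/{prefix}",
    ]


if __name__ == "__main__":
    # 203.0.113.7 is a documentation-only address used as a stand-in.
    command = ssh_allow_rule_command(
        "fxci-production-level3-workers", "allow-ssh-example", "203.0.113.7")
    print(shlex.join(command))
```

Since the project disallows port 22 by default, any rule created this way should be deleted again once the image has been baked.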

Flags: needinfo?(miles)
Assignee: relops → dustin

I appear to have access to the account, but ..

> From there, you'll need to whitelist your IP for ssh in VPC Network => Firewalls in the Google Cloud Console.

I don't appear to have permission to do this.

Kendall, does it make sense for me to get permissions and learn how to do this, or would it make more sense for someone on your team to learn the ropes? If the former, can you grant me the necessary permissions?

Flags: needinfo?(klibby)

Miles is working on this :D

I will write up the last few comments as docs in the community-tc-config repo.

Assignee: dustin → miles

(In reply to Dustin J. Mitchell [:dustin] (he/him) from comment #8)

> I appear to have access to the account, but ..
>
> > From there, you'll need to whitelist your IP for ssh in VPC Network => Firewalls in the Google Cloud Console.
>
> I don't appear to have permission to do this.
>
> Kendall, does it make sense for me to get permissions and learn how to do this, or would it make more sense for someone on your team to learn the ropes? If the former, can you grant me the necessary permissions?

relops owns the fxci-*workers accounts in GCP, and I'm happy with us maintaining the FW rules. I see we've got one allowance for Miles; lemme know who else to add (+ IP, obv) and I'll make it go.

Flags: needinfo?(klibby)

me: lamport.r.igoro.us has address 54.148.125.226

(In reply to Dustin J. Mitchell [:dustin] (he/him) from comment #11)

> me: lamport.r.igoro.us has address 54.148.125.226

added

Pushed by dmitchell@mozilla.com: https://hg.mozilla.org/ci/ci-configuration/rev/a5864c02db3b add comments about how docker-worker images are generated r=milescrabill

I'm building 32.0.0 images in bug 1650813 which should fix this.

Blocks: 1650813

fubar:

==> docker_worker_gcp_trusted: Error creating instance: googleapi: Error 403: Required 'compute.zones.get' permission for 'projects/fxci-production-level3-workers/zones/us-west1-a', forbidden
Build 'docker_worker_gcp_trusted' errored: Error creating instance: googleapi: Error 403: Required 'compute.zones.get' permission for 'projects/fxci-production-level3-workers/zones/us-west1-a', forbidden
Flags: needinfo?(klibby)
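
When several of these 403s show up in a build log, it can help to pull out exactly which permission and resource are being refused before asking for a role change. The following is a small sketch that assumes the message format stays as in the two lines above; `parse_missing_permission` is an illustrative helper, not part of monopacker or Packer.

```python
import re
from typing import NamedTuple, Optional


class MissingPermission(NamedTuple):
    permission: str  # e.g. compute.zones.get
    resource: str    # e.g. projects/<project>/zones/<zone>


# Matches the googleapi 403 wording seen in the log lines above.
_PATTERN = re.compile(
    r"Error 403: Required '(?P<permission>[^']+)' permission "
    r"for '(?P<resource>[^']+)'"
)


def parse_missing_permission(line: str) -> Optional[MissingPermission]:
    """Return the permission and resource named in a 403 log line,
    or None if the line does not match the expected wording."""
    match = _PATTERN.search(line)
    if match is None:
        return None
    return MissingPermission(match["permission"], match["resource"])
```

Running this over each line of the build output and collecting the distinct results gives a short list of permissions to hand to whoever manages IAM on the project.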

Sorry for dropping the ball on this; Dustin, is that error from your account or one of the service accounts?

Flags: needinfo?(klibby) → needinfo?(dustin)

My account when trying to build workers

Flags: needinfo?(dustin)

Ok, updated your role. Let me know if you run into other errors (miles has two other roles assigned on the project but I'm not certain if they're required).

I was able to build an image -- thanks!

It looks like v36.0.0 was deployed in bug 1657412, so I expect that this issue is fixed. Please re-open if not!

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED