1520588 - Deploy shipitscript into GCP

testing (docker tag: scriptworker/shipit-docker-testing-latest)

staging (docker tag: scriptworker/shipit-docker-staging-latest)

production (docker tag: scriptworker/shipit-docker-production-latest)

I think we need to fix these tags, slashes are not allowed.

Rok Garbas [:garbas]

Comment 3

•

7 years ago

:autrilla

new tags are:

scriptworker_shipitscript_docker_testing
scriptworker_shipitscript_docker_staging
and those images are pushed to https://hub.docker.com/r/mozilla/release-services/tags

for now we can skip the production, until we figure out how to handle secrets.

Flags: needinfo?(autrilla)

Rail Aliiev [:rail]

Assignee

Comment 4

•

7 years ago

I updated the images and they are ready (tested on my laptop, TM) to go. I have the secrets, just need to hand them over.

We also need to figure out how to properly configure the network, so we can connect to both ship-it APIs (until we retire v1).

ship-it v1 is hosted by IT and requires a VPN connection (vpn_shipit for prod and vpn_shipitdev LDAP groups I believe). ericz may have better info.
ship-it v1 is hosted by cloudops and restricted by IP and vpn_cloudops_shipit is the LDAP group we use to add users. I'm not sure if the LDAP group is relevant in this case.

Adrian Utrilla [:autrilla]

Comment 5

•

7 years ago

(In reply to rail@mozilla.com from comment #4)

I updated the images and they are ready (tested on my laptop, TM) to go. I have the secrets, just need to hand them over.

Great! Is there any difference to how the image should be ran on each environment, other than the secrets?

We also need to figure out how to properly configure the network, so we can connect to both ship-it APIs (until we retire v1).

ship-it v1 is hosted by IT and requires a VPN connection (vpn_shipit for prod and vpn_shipitdev LDAP groups I believe). ericz may have better info.

This might be a bit problematic, I thought we only needed to talk to v2. I imagine IT would want us to have a single static IP from which we talk to ship-it, is that so :ericz?

I haven't done anything like this before on GCP, but someone from my team has, and AIUI it was for applications we control, not for something ran by IT.

ship-it v2 is hosted by cloudops and restricted by IP and vpn_cloudops_shipit is the LDAP group we use to add users. I'm not sure if the LDAP group is relevant in this case.

Talking to ship-it v2 won't be an issue since they're both in the same cluster and we won't need to cross into the internet.

Flags: needinfo?(autrilla) → needinfo?(eziegenhorn)

Rail Aliiev [:rail]

Assignee

Comment 6

•

7 years ago

(In reply to Adrian Utrilla [:autrilla] from comment #5)

Great! Is there any difference to how the image should be ran on each
environment, other than the secrets?

The command line is the same (the default CMD directive). They are configured to use different configs depending on the env/secrets.

Eric Ziegenhorn :ericz

Comment 7

•

7 years ago

To talk to ship-it v1 I think we'd have to set it up on a public-facing load balancer with a different DNS name and then we could potentially limit it by IP address (or maybe something else but offhand I can't think of anything better).

Flags: needinfo?(eziegenhorn)

Adrian Utrilla [:autrilla]

Comment 8

•

7 years ago

:ericz, all our traffic from our nonprod (staging and testing) environments will come from 35.197.23.59. Could you whitelist this so we can talk to shipitv1 from it? Let me know if you need any more information to do this.

Flags: needinfo?(eziegenhorn)

Eric Ziegenhorn :ericz

Updated

•

7 years ago

Comment 9

•

7 years ago

I'm spinning off that work in new bug 1525746.

Flags: needinfo?(eziegenhorn)

Rail Aliiev [:rail]

Assignee

Comment 10

•

6 years ago

Sigh, the name won't resolve, because we use split-horizon DNS.

https://tools.taskcluster.net/groups/VCn9f5PISjORFOZu84jhqA/tasks/arEl7VusQymG7S3Tt0yJDA/runs/0/logs/public%2Flogs%2Flive_backing.log

2019-02-15 03:04:59,899 - urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): ship-it-dev.allizom.org:443
Traceback (most recent call last):
File "/nix/store/sfx431rh4x09nv0sgripmn01rf6pwdb6-python3.7-urllib3-1.24.1/lib/python3.7/site-packages/urllib3/connection.py", line 159, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw)
File "/nix/store/sfx431rh4x09nv0sgripmn01rf6pwdb6-python3.7-urllib3-1.24.1/lib/python3.7/site-packages/urllib3/util/connection.py", line 57, in create_connection
for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
File "/nix/store/sh0rq55jaambzqx59g0kdk59g23vj8m6-python3-3.7.0/lib/python3.7/socket.py", line 748, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

Rail Aliiev [:rail]

Assignee

Comment 11

•

6 years ago

The good thing that the worker takes the tasks from the queue. \o/

We may want to tweak the worker name a bit, it uses the hostname (which is not unique) and equals to 22 first characters of the k8s workload name (scriptworker-stage-shipitapi-app-1 -> scriptworker-stage-shi). I'm not even sure that the name would be somehow useful...

Rail Aliiev [:rail]

Assignee

Comment 12

•

6 years ago

Adrian, do we autodeploy the docker images to stage/testing now? I tried to test a workaround, but it looks like scriptworker-stage-shipitapi-app-1 is still using the images from Feb 6.

Flags: needinfo?(autrilla)

Adrian Utrilla [:autrilla]

Comment 13

•

6 years ago

We did not when you commented, but we do now. There's an up-to-date image in staging now.

Flags: needinfo?(autrilla)

Rail Aliiev [:rail]

Assignee

Comment 14

•

6 years ago

Thank you!

Rok Garbas [:garbas]

Updated

•

6 years ago

Blocks: 1533337

Bastien Abadie [:bastien]

Updated

•

6 years ago

Comment 15

•

6 years ago

We finally dropped ship-it v1 and don't need any special routes to MDC1/2. We can undo the special settings.

Now I'm getting a 403 from https://api.shipit.staging.mozilla-releng.net/ when I try to run shipitscript. The idea was that they are in the same cluster, so the IP-based whitelisting works either out of the box or without any extra setup.

Adrian, can you

get rid of the customization made in comment #8, no need to communicate to ship-it v1 anymore. No rush with this.
make sure that scriptworker-stage-shipitapi-app-1 is whitelisted in either in shipitapi-dev-shipitapi-app-1 or shipitapi-stage-shipitapi-app-1. I always forget which one corresponds to our staging :/ Maybe it'll resolve itself if you get rid of 1)

Probably it'd be better to align the names at some point, to get rid of this dev/stage/staging confusion with shipit.

Thank you in advance!

Flags: needinfo?(autrilla)

Adrian Utrilla [:autrilla]

Comment 16

•

6 years ago

Regarding the 403, it's because you're trying to talk to it through the public IP. You should be able to connect to the Kubernetes service directly over HTTP (not HTTPS, since we terminate that at the edge).

In stage.shipitapi.nonprod.cloudops.mozgcp.net, this is http://shipitapi-stage-shipitapi-app-1.

In testing.shipitapi.nonprod.cloudops.mozgcp.net, this is http://shipitapi-testing-shipitapi-app-1.

Rail Aliiev [:rail]

Assignee

Comment 17

•

6 years ago

D'oh... We enforce HTTPS in our apps, so I get a 302 to the HTTPS:

2019-04-23T14:56:33 INFO - 2019-04-23 14:56:33,862 - urllib3.connectionpool - DEBUG - http://shipitapi-stage-shipitapi-app-1:80 "PATCH /releases/Fennec-67.0b3-build1 HTTP/1.1" 302 345
E
2019-04-23T14:56:33 INFO - 2019-04-23 14:56:33,864 - urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): shipitapi-stage-shipitapi-app-1:443

Need to think what to do....

Adrian Utrilla [:autrilla]

Comment 18

•

6 years ago

This could be our NGINX redirecting you. If you send an X-Forwarded-Proto header set to https, that should prevent NGINX from redirecting. Not sure if that's doable. Otherwise we could expose the application directly instead of through NGINX through a Kubernetes Service, but that's not ideal.

Rail Aliiev [:rail]

Assignee

Comment 19

•

6 years ago

Yeah, it's getting a bit hairier that I thought. :)

Rok, maybe you have some ideas?

There are a couple issues:

When I use FQDN to access the API endpoint, the requests end up hitting the public IP, which requires whitelisting of the k8s replicas, what defeats the idea that we should bypass public routes in the same cluster.

I wonder if the source IPs of the requests coming from the same cluster should be the same with the public IP of that cluster, so we can easily whitelist it.

If I use the k8s names (e.g. shipitapi-stage-shipitapi-app-1), then I have to use http instead of https, but flask-talisman redirects to https in our case. Then the request times out.

I can hack the client requests and set the X-Forwarded-Proto header to "https". In this case we bypass flask-talisman, but now I hit an issue with mohawk, which verifies the auth headers, but falls back to using port 443 for some reason instead of 80. Probably it fails to properly guess the port in https://github.com/mozilla/release-services/blob/30fe29c037cb2a58d64ebdbf6dcf5b1456e14820/lib/backend_common/backend_common/auth.py#L391.

TBH, 2) sounds a bit dirty and hacky. :/

Any other alternatives?

Flags: needinfo?(rgarbas)

Rail Aliiev [:rail]

Assignee

Comment 20

•

6 years ago

We chatted about this today with Rok and I think I'm going to take the second route. It will not require any special changes neither in ship-it or GCP/k8s. This way we don't rely on special setup, but only on the client.

Flags: needinfo?(rgarbas)

Flags: needinfo?(autrilla)

Rail Aliiev [:rail]

Assignee

Comment 21

•

6 years ago

I submitted a couple PRs to address this:
https://github.com/mozilla-releng/shipitscript/pull/30
https://github.com/mozilla-releng/shipitapi/pull/15

Rail Aliiev [:rail]

Assignee

Updated

•

6 years ago

Depends on: 1547317

Rail Aliiev [:rail]

Assignee

Comment 22

•

6 years ago

Looks like we are ready to go with prod in bug 1547317. Let's do eet! :)

Mihai Tabara [:mtabara]⌚️GMT

Comment 23

•

6 years ago

Found in triaging. We moved shipitscriptworkers into GCP a while ago in bug 1581149.
I think we can close this for now. Feel free to re-open should I'm wrong.

Status: NEW → RESOLVED

Closed: 6 years ago

Resolution: --- → FIXED