Closed
Bug 1132355
Opened 9 years ago
Closed 9 years ago
docker-worker: Serve livelogs over HTTPS using custom DNS server
Categories
(Taskcluster :: Workers, defect)
Taskcluster
Workers
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: jonasfj, Unassigned)
Details
Attachments
(1 file)
It would be really nice if we could serve live logs over HTTPS. HTTPS support would also be nice when we start playing with interactive tasks that expose websockets and whatnot... though that might require a reverse proxy thing we can hide behind a feature flag.

Implementation is fairly simple: we set up a custom DNS server that, when queried for:
  ec2-54-213-239-253.us-west-2.ec2.taskcluster.net
returns a CNAME record pointing to:
  ec2-54-213-239-253.us-west-2.compute.amazonaws.com
Then we configure docker-worker with the SSL certificate for:
  *.ec2.taskcluster.net
(I haven't tried it, but I've heard something like wildcard certificates is possible.)

Implementing the DNS server is probably not so hard; just use something like:
https://www.npmjs.com/package/node-named
or one of the other modules.

A few notes:
 - We should ping a health check endpoint on the worker before we answer the DNS query.
 - We should limit DNS records to a 5 minute lifetime.
 - When implementing exposed ports, we should make a reverse proxy that allows serving dynamic content from the task over HTTPS, without exposing the SSL certificate to the task container.

Security considerations:
 - If always accessed over HTTPS, there is nothing to fear.
 - If accessed over HTTP, there is a risk.

Remark: it might also be possible to do this with dynamic DNS. We can likely find a hosted dynamic DNS service, but these also seem expensive and likely require preconfigured hostnames. If they had an API we could just use a uuid as the subdomain and register it. But one of the nice things about the custom DNS server hack is that we return a CNAME, so there is a pretty good chance it is already cached. And we don't have to do additional setup on docker-worker, i.e. inform a dynamic DNS server about our IP when we launch an EC2 instance. Granted, a custom DNS server requires UDP access, so we have to run it on EC2 directly. Though we might try using dotCloud or one of the other docker providers, like tutum.co (which lets us bring our own AWS/Azure/DigitalOcean subscription).

---

Anyways, I think this could totally work, and it doesn't seem impossibly hard to set up and test. Though I'm not sure it's possible to direct a full subdomain like ec2.tc.net to a single DNS server. If not, we can just buy a domain like taskcluster-worker.net and set our custom DNS server as authority for that domain. (Hmm... I probably ought to take the time to figure out exactly how DNS records work.)
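The rewriting such a DNS server would do is trivial; a minimal sketch of just the mapping logic (the node-named server wiring is omitted, and the function name is hypothetical):

```javascript
// Sketch of the CNAME rewriting proposed above: a query for a name under
// .ec2.taskcluster.net is answered with a CNAME pointing at the matching
// EC2 public hostname. Domain names here are the proposed ones, nothing
// deployed.
function cnameForQuery(name) {
  const suffix = '.ec2.taskcluster.net';
  if (!name.endsWith(suffix)) {
    return null; // not our zone; refuse the query
  }
  // ec2-54-213-239-253.us-west-2.ec2.taskcluster.net
  //   -> ec2-54-213-239-253.us-west-2.compute.amazonaws.com
  return name.slice(0, -suffix.length) + '.compute.amazonaws.com';
}
```

In a node-named query handler this result would be returned as a CNAME record with a short TTL (per the 5 minute lifetime note above).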
Reporter
Comment 1•9 years ago
One of these days I'm really going to learn DNS... Anyways, we don't need a custom DNS server, or dynamic DNS... we just need a DNAME record, as follows:

  ec2.taskcluster.net IN DNAME compute.amazonaws.com

And then put the SSL certificate for *.ec2.taskcluster.net into docker-worker and everybody wins. Ignore the title but see:
http://blacka.com/david/2006/12/04/dns-dname-is-almost-useless/
(just read the first part)

I'm not sure DNAME is widely supported, but if it's a problem we can find a DNS server that emulates it, which is exactly what I described above.

Note, this will mean that anyone launching an EC2 node can host an HTTP server with a hostname suffixed .ec2.taskcluster.net, i.e. we cannot trust *.ec2.taskcluster.net domains. Well, unless they are over HTTPS, in which case everything is awesome. And since mixed content errors will protect us from accidentally accessing over HTTP, this should work well :)
Reporter
Comment 2•9 years ago
Remark, we can probably use:
https://github.com/jwilder/nginx-proxy
to do the reverse proxying... then we just give docker containers the environment variables:

  VIRTUAL_HOST = <prefix-from-hostname>.ec2.taskcluster.net
  VIRTUAL_PORT = <random port>

It can probably proxy our existing live log stuff too. No need to reinvent any wheels.
Reporter
Comment 3•9 years ago
Blocked on bug 1133131, figuring out whether Mozilla DNS supports DNAME records. If not, we can probably find somewhere else to host the DNS... it seems pretty simple.
Depends on: 1133131
Reporter
Comment 4•9 years ago
So I deployed bind9 on tutum.co and got a DNAME record working perfectly fine... well, except for the OpenDNS server closest to me... presumably I gave that one a bad record or something :) Anyways, we just need to figure out a good domain name, buy an SSL certificate, and then I can deploy a custom DNS server for it...
No longer depends on: 1133131
Reporter
Comment 5•9 years ago
I deployed bind9 servers, and filed bug 1133994 for registration of the domain name and bug 1134000 for the SSL certificate.
Reporter
Comment 6•9 years ago
webops had issues making the SSL certificate we needed to use a DNAME record. So I deployed a custom DNS server. The names will now have to be in the format:

  ec2-52-11-22-87-dot-us-west-2-ec2.taskcluster-worker.net

Basically, take the hostname, replace ".compute.amazonaws.com" with "-ec2.taskcluster-worker.net", and replace any "." before that with "-dot-". The good news is that we'll only need a single wildcard certificate.
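The rewrite described in this comment can be sketched as a small helper (the function name is hypothetical, for illustration only):

```javascript
// Rewrite an EC2 public hostname into the taskcluster-worker.net form:
// strip ".compute.amazonaws.com", encode the remaining dots as "-dot-",
// and append "-ec2.taskcluster-worker.net".
function toWorkerHostname(ec2Hostname) {
  const prefix = ec2Hostname.replace(/\.compute\.amazonaws\.com$/, '');
  return prefix.replace(/\./g, '-dot-') + '-ec2.taskcluster-worker.net';
}

// toWorkerHostname('ec2-52-11-22-87.us-west-2.compute.amazonaws.com')
//   -> 'ec2-52-11-22-87-dot-us-west-2-ec2.taskcluster-worker.net'
```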
Summary: docker-worker: Serve livelogs over HTTPS → docker-worker: Serve livelogs over HTTPS using custom DNS server
Reporter
Comment 7•9 years ago
I got the cert!!! :)

We can start playing with:
https://github.com/jwilder/nginx-proxy

As far as I can tell we just need to give it the certs, and set the environment variables:
 - VIRTUAL_HOST
 - VIRTUAL_PORT
on the livelog serving container. Then nginx-proxy will reload when containers are started and stopped. Granted, we'll have to forbid others from setting the VIRTUAL_HOST and VIRTUAL_PORT variables. We can also change the env var names read by nginx-proxy and modify the docker image; maybe TASKCLUSTER_VIRTUAL_HOST and TASKCLUSTER_VIRTUAL_PORT are better.

IMO we should also extend task.payload with:

  task.payload = {
    image: ...
    ...  // the usual stuff
    exposeWebInterface: true | false
    // or something like that, such that when exposeWebInterface === true, we
    // require the scope docker-worker:expose-https
    // and define the env vars:
    //   TASKCLUSTER_VIRTUAL_HOST and TASKCLUSTER_VIRTUAL_PORT
    // which will auto-link with nginx-proxy and inject PORT and hostname
    // into the container.
  }

Note, for now we should probably just decide what env var names to use, or whether we want to inject config into nginx-proxy by other means, for example a file.

Anyways, I noticed a minor issue with the us-east-1 nodes not being under .compute.amazonaws.com. I'll update the custom DNS server to be more flexible and update here with what the pattern for rewriting ec2 hostnames to taskcluster-worker.net hostnames should be.
Reporter
Comment 8•9 years ago
Looking at jwilder/nginx-proxy it seems we don't have to use the magic env vars. We just mount nginx configs from the host into the official nginx image at:
  /etc/nginx/conf.d
and certificates at:
  /etc/nginx/certs
And whenever we update the nginx config, because we've added a new livelog container that needs a reverse HTTPS proxy, we make nginx reload its config with:
  $ docker kill -s HUP <nginx-container-id>

---

Okay, thinking about this I'm not sure it'll work, because we need to --link things correctly, and in this case I think it's the nginx container that has to be the client of the --link operation. So maybe we need to bake the HTTPS part into the livelog hosting image:
http://golang.org/pkg/net/http/#ListenAndServeTLS
That might actually be the easiest; then we can play with nginx when we want to expose ports from task containers to the world over HTTPS.
Reporter
Comment 9•9 years ago
FYI, the livelog thing lives at:
https://github.com/taskcluster/livelog
We just need to inject:
 - cert
 - key
and probably a token through an env var to fix bug 1132221. (We might as well do that when fixing this.)
Reporter
Comment 10•9 years ago
I updated our custom DNS server to serve the following mapping:

  <prefix>-ec2.taskcluster-worker.net => <prefix-with-dots>.amazonaws.com

Meaning that <prefix>-ec2.taskcluster-worker.net is CNAMEd to <prefix-with-dots>.amazonaws.com, where:
  <prefix-with-dots> = <prefix>.replace(/-dot-/g, '.')
i.e. all instances of '-dot-' replaced with '.'

So a node with a hostname like:
  ec2-54-213-239-253.us-west-2.compute.amazonaws.com
can also be accessed using the hostname:
  ec2-54-213-239-253-dot-us-west-2-dot-compute-ec2.taskcluster-worker.net
(for which we have an SSL certificate)
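The server-side inverse of this mapping is equally small; a sketch with a hypothetical function name (note this flexible version only strips ".amazonaws.com", so it also covers us-east-1 nodes not under .compute.amazonaws.com):

```javascript
// DNS-server side of the mapping: strip the "-ec2.taskcluster-worker.net"
// suffix, turn "-dot-" back into ".", and append ".amazonaws.com" to get
// the CNAME target.
function toEc2Hostname(workerHostname) {
  const suffix = '-ec2.taskcluster-worker.net';
  if (!workerHostname.endsWith(suffix)) {
    return null; // not in our zone
  }
  const prefix = workerHostname.slice(0, -suffix.length);
  return prefix.replace(/-dot-/g, '.') + '.amazonaws.com';
}
```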
Reporter
Comment 11•9 years ago
So the CNAME pattern described in comment 10 was deployed yesterday. But I have this lovely/crazy idea.

The thing that is a bit sketchy with the custom DNS server is that any EC2 node will be able to serve content under a CNAME of the form:
  ...-ec2.taskcluster-worker.net

But we could actually avoid this, with some clever hacks and an HMAC signature in the domain name. All EC2 nodes have a public IP address, all livelog URLs have an expiration (maxRuntime), and we can embed a shared secret in docker-worker and our custom DNS server. So the <prefix> before -ec2.taskcluster-worker.net could be constructed as follows:

  base32(<ip> + <timestamp> + HMAC-SHA256(<ip> + <timestamp>, secret)) + '-ec2.taskcluster-worker.net'

When the DNS server gets a request for this kind of domain, it would then extract the IP, timestamp and HMAC, validate the HMAC, and return an A record for the IP if the HMAC signature is valid.

With a 4 byte IP, 8 byte timestamp and 32 byte HMAC signature, the domain names would be of the form:
  aaaaaaaaaabbbbbbbbbbaaaaaaaaaabbbbbbbbbbaaaaaaaaaabbbbbbbbbbaaaaaaaaaac-ec2.taskcluster-worker.net
And if we truncate the HMAC signature to 16 bytes (128 bits), we get an even shorter domain name:
  aaaaaaaaaabbbbbbbbbbaaaaaaaaaabbbbbbbbbbccccc-ec2.taskcluster-worker.net

The timestamp is obviously the expiration of the livelog, such that the DNS record isn't worth much after our spot instance goes away. We could also tie the timestamp to the expiration of the worker node, and set it 72 hours into the future. Then have the aws-provisioner generate the domain name, such that the HMAC secret only has to live in the AWS provisioner and the custom DNS server. Otherwise it would be enough to compromise docker-worker to get both the HMAC secret and the SSL certificates.

Note: truncation of HMAC-SHA256 to 128 bits **might** be reasonable. It's hard to find a source who dares to claim it is perfectly okay, but various google searches suggest that it's been done before :)

I'm not sure it's worth guarding against this. It's not like it's a huge security issue if someone else starts hosting HTTP content under -ec2.taskcluster-worker.net, but it would be nice if others didn't host under this domain.

Note, we could also do it without any expiration or signature, and then have a very short domain name for workers, because we only need to embed the public IP and not the ec2 hostname:
  aaaaaaa-ec2.taskcluster.net
where aaaaaaa encodes the public IP in base32. Note this would be less secure than the current approach, which is already questionable, but at least bound to nodes with a hostname under amazonaws.com.

---

Anyways, just another crazy thought/suggestion. Let me know what people think.
Reporter
Comment 12•9 years ago
So we have deployed:
https://github.com/taskcluster/stateless-dns-server
(also published to npm; it contains the logic to generate the hostname)

And I have updated livelog:
https://quay.io/repository/mozilla/livelog

garndt has the secret for the DNS server. Support for a secret accessToken (bug 1132221) will land along with this. I have the SSL certificate and key; I'll email garndt so he also has them. We'll have to volume mount the certificate and key file, then set env vars:
 * ACCESS_TOKEN    secret access token required for access (required)
 * SERVER_CRT_FILE path to SSL certificate file (optional)
 * SERVER_KEY_FILE path to SSL private key file (optional)
Comment 13•9 years ago
This implements both secure logging as well as a token appended to the log URL for obscurity.
Attachment #8607302 - Flags: review?(jopsen)
Reporter
Comment 14•9 years ago
Comment on attachment 8607302 [details] [review]
Worker PR 88

Yay, HTTPS livelogs...
Attachment #8607302 - Flags: review?(jopsen) → review+
Comment 15•9 years ago
https://github.com/taskcluster/docker-worker/commit/a824b35490aeff0533c0b9fc178f3dc78ae05632
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Updated•9 years ago
Component: TaskCluster → Docker-Worker
Product: Testing → Taskcluster
Assignee
Updated•5 years ago
Component: Docker-Worker → Workers