Closed Bug 1336050 Opened 7 years ago Closed 7 years ago

[tracker] create and deploy host-secrets service

Categories

(Infrastructure & Operations :: RelOps: General, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: grenade, Assigned: dustin)

References

()

Details

Attachments

(2 files)

61 bytes, text/x-github-pull-request
jhford: review+
62 bytes, text/x-github-pull-request
dustin: review+
grenade: feedback+
Clients request temporary credentials; the service should validate reverse DNS (IP/MAC) and issue temporary scopes.
Assignee: relops → rthijssen
Hi Rob, I've started a simple project for this here: https://github.com/jhford/taskcluster-hardware-secrets I can maybe hack on this a bit more, but I'm also happy to do reviews if you have time to hack on it.

This is mostly a rough skeleton for how this could be implemented using TC APIs and tooling.  Building on those makes this service a lot easier to integrate from an admin standpoint.  It also means that it works similarly to other taskcluster services.

The things that need to be done still:

1. Docs.  They're placeholders now, but need to be written
2. An implementation of src/api.js:isIpAllowed(ip, allowed)
3. Re-enable scopes (turned off for easier development)
4. Write unit tests -- basically reimplement test.sh in a better way
5. JSON Schemas for input and output
6. Design structure for allowed field of the Secret entity

To run what I've done so far, clone the repo and run 'npm install'.  The server can then be started with:

npm run compile && NODE_ENV=development PORT=8080 node ./lib/main development

and then, in another terminal, run test.sh after changing the host names so they don't point at my work desktop :)

Regarding number 2 above, I'm not an expert on matching IPs or DNS/reverse-DNS.  I'm also not sure how to structure the allowed field.

Let me know if you have any questions
This is awesome John. Thank you!
Nice!  It's exciting to think we could build something like this so quickly!

To summarize my understanding from looking at the repo, including
comments at the top of api.js:

 * CRUD endpoints for updating named secrets, using normal auth scopes
 * getting a secret allows either taskcluster, per-IP, or per-token
authentication
 * secrets will have an "allow" configuration that determines what IPs
and/or tokens can get them.

Did I miss anything?

Storing secrets directly in this service seems to duplicate the work
of the secrets service.  Could we design a service that just issues TC
credentials, which hosts can then use to fetch secrets (or run a
worker, or anything else)?

I suspect such a service could operate with no Azure tables.  It would
just issue temporary credentials with a role based on the requesting
hostname.  If this occurred on every reboot, then the 30-day maximum
for temporary credentials wouldn't be a problem.  Then the role could
determine which secrets are available to which hosts, in an
inspectable fashion.  Something like
`assume:project:releng:host:com.mozilla.scl3.releng.build.b-3-win2012-0010`,
with scopes assigned to prefixes like
`project:releng:host:com.mozilla.scl3.releng.build.b-3-*`

I can definitely help with the forward/reverse DNS resolution.  I can
help with deployment, too.
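
For illustration, issuing such credentials with taskcluster-client might
look roughly like this (just a sketch -- the reversed-hostname scope follows
the example above, and the option names should be double-checked against
the client library):

  // Rough sketch: given a hostname that has already passed the
  // forward/reverse DNS check, issue short-lived credentials whose only
  // scope is the corresponding host role.
  let taskcluster = require('taskcluster-client');

  function issueCredentialsFor(hostname) {
    // b-3-win2012-0010.build.releng.scl3.mozilla.com
    //   -> com.mozilla.scl3.releng.build.b-3-win2012-0010
    let reversed = hostname.split('.').reverse().join('.');
    return taskcluster.createTemporaryCredentials({
      start: new Date(),
      expiry: taskcluster.fromNow('1 day'),
      scopes: [`assume:project:releng:host:${reversed}`],
      credentials: {  // the issuing service's own permanent credentials
        clientId: process.env.TASKCLUSTER_CLIENT_ID,
        accessToken: process.env.TASKCLUSTER_ACCESS_TOKEN,
      },
    });
  }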
A few notes from our conversation just now:

 - the factoring in comment 3 probably does make sense: issue credentials, and let the host get secrets from the secrets service
 - in addition to forward/reverse DNS, we should also whitelist IP ranges and require a shared token as additional protection measures
 - this will need to be deployed within the releng network; perhaps we could deploy it on the puppetmasters, which are already high-security, redundant, and well-maintained

And it occurred to me after our conversation that we don't have a name!  Ideas: `datacenter-auth`, `host-credentials`, `datacenter-credentials`...
Oh, "hardware-secrets", from the repo in comment 1, is good.  So we do have a name :)
I threw together some code to verify forward/reverse DNS:
  https://gist.github.com/djmitche/6977bd19fb37b99510af4fb4c9a1dc2c

I don't think it makes sense to release this as an actual npm library -- it was just easiest to set it up and run it that way.  I admit I didn't write a lot of tests, because I can't find any public IPs that don't have good forward/reverse.  In production, we can write a fake dns module and test against that.
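
The core idea there is forward-confirmed reverse DNS.  A minimal sketch of that check with Node's built-in dns module (not the gist code itself, just the shape of it):

  // Accept an IP only if it has exactly one PTR record, and that hostname
  // resolves back to the same IP (forward-confirmed reverse DNS).
  const dns = require('dns');
  const util = require('util');
  const reverse = util.promisify(dns.reverse);
  const resolve4 = util.promisify(dns.resolve4);

  async function fcrdns(ip) {
    let names = await reverse(ip);            // PTR lookup
    if (names.length !== 1) {
      throw new Error(`expected exactly one PTR record for ${ip}`);
    }
    let addrs = await resolve4(names[0]);     // forward lookup
    if (!addrs.includes(ip)) {
      throw new Error(`${names[0]} does not resolve back to ${ip}`);
    }
    return names[0];                          // the validated hostname
  }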
One thought about the name is that I'd like to not let hardware diverge too far from our cloud instance configuration. I'm thinking that perhaps our existing Windows cloud instances should be adapted to also use this service to obtain their secrets. If we did that, it might make sense to lose the "hardware" element in the service name. Since taskcluster-secrets would be confusing, perhaps taskcluster-secrets-proxy?
Some thoughts on how we might implement secret acquisition from hardware in the dc:

- on boot, hardware-instance checks whether it has an unexpired gpg key; if not, it creates one and publishes it to the mozilla keyserver or to some new git repo that accepts commits from untrusted sources
- something monitors the keyserver/git-repo for newly added keys and alerts in an IRC channel
- human monitors IRC channel and adds public key (if trusted) to metadata-service-encryption-key-list
- datacentre-metadata-service publishes metadata-secrets as .gpg files encrypted with all public keys in the metadata-service-encryption-key-list
- hardware-instance obtains encrypted taskcluster-secrets-token from datacentre-metadata-service
- hardware-instance decrypts taskcluster-secrets-token and uses it to access taskcluster-secrets-proxy
- taskcluster-secrets-proxy validates the token and reverse DNS and, if appropriate, makes taskcluster-secrets available to hardware-instance

hardware-instance: a hardware instance in one of our datacentres
human: anyone in the release operations team (maybe releng/buildduty/taskcluster too?)
datacentre-metadata-service: a service hosted at ip 169.254.169.254 inside the datacentre and on the same local network as the hardware-instance
metadata-service-encryption-key-list: a list of trusted hardware-instance public keys
taskcluster-secrets-token: a shared token trusted by the taskcluster-secrets-proxy
metadata-secrets:
 - root user credentials for hardware-instance
 - worker user credentials for hardware-instance
 - taskcluster-secrets-token
 - build/test secrets (oauth and api keys used by builds and tests)
taskcluster-secrets:
 - livelog credentials
 - generic-worker configuration secrets
dustin, jhford, markco: do the ideas above gel with how you envisaged this to work?
Flags: needinfo?(mcornmesser)
Flags: needinfo?(jhford)
Flags: needinfo?(dustin)
In general I like it. 

The only part I am concerned about is the "service hosted at ip 169.254.169.254" piece. That part might get a little sticky network-wise.
Flags: needinfo?(mcornmesser)
We could certainly run the same service in EC2 and in the datacentre.  But the situation is a little different in each:
 - in the datacenter, there's a strong association between IP and role; in EC2, the IP is basically random
 - in the datacenter, re-deploying hosts from scratch is slow and time-consuming, so we can't make a new token on every startup, whereas in EC2 re-deploying is the default and we can issue a distinct token to each instance.  I think we have a good thing going in the AWS provisioner, and we should probably not try to re-solve that problem right now.

I tend to agree about "hardware" in the name.  Maybe "host-credentials" was better.

I don't think this service covers the gpg key generation - at least not directly.  Perhaps it provides access to some required secrets (maybe an ssh key that can push to this git repo..) in the secrets API.

Also, agreed wrt use of 169.254.169.254.  That will be problematic in the datacenter (requiring some tricky networking config on the server and special-casing on the routers) and impossible in Amazon (where EC2 already provides a service at that IP).  I see a fixed IP like that as being useful for early system startup, where the system is getting its first taste of its unique data.  This service, on the other hand, comes later in the startup process and clients can use a regular old domain name to find it.
Flags: needinfo?(dustin)
I probably wasn't clear enough. My thinking is to have two services: one is the host-credentials/tc-secrets-proxy, providing access to taskcluster secrets, and the other is a separate service providing metadata in the dc. There's no need for the second (metadata) service in ec2, where it's provided already.
And yes, the use of the ip 169.254.169.254 in the dc is necessarily complicated, but it's intentional, since that address is conventionally used to provide just this sort of metadata.
johnb: It was suggested that you might be the person to talk to regarding use of the 169.254.169.254 ip address within the datacenter. Refer to the discussion above starting at comment 8.

Would using that address be feasible? If so, are there any pitfalls you see in using it?
Flags: needinfo?(jbircher)
It looks like we are going to move away from trying to use the 169 address.
Flags: needinfo?(jbircher)
I've made a few changes to the tool I put in the comments earlier.  I don't think setting up a proxy is how we'd want to do this; rather, I think a good approach is to use this service to generate a set of taskcluster credentials which can be used with the taskcluster-client libraries (js, python, go and java) to interact directly with taskcluster-secrets to obtain secrets.  Taskcluster-secrets could be used to store whichever secrets are involved in starting up the machine and getting it into production.
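
On the host side, using the issued credentials would then look roughly like this (a sketch; the secret name is made up):

  let taskcluster = require('taskcluster-client');

  // `credentials` is the {clientId, accessToken, certificate} blob returned
  // by this service after it has validated the host's reverse DNS.
  async function fetchHostSecret(credentials, secretName) {
    let secrets = new taskcluster.Secrets({credentials});
    let {secret} = await secrets.get(secretName);  // e.g. 'project/releng/hardware/livelog'
    return secret;
  }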

Rob, regarding matching the cloud-setup, I think the issue is that we don't want to duplicate what's done in the cloud.  I think what we really want is to move the cloud to using basically what we have here.  In the cloud, the work that this tool does would be implemented inside of the provisioner and in the future, we'd remove the secrets storage from the provisioner altogether.

I think that the metadata service is a cool idea, but should be unrelated to the auth/scopes/credentials issue.  The concern I have is that it's an unauthenticated service in EC2, and we definitely don't want to have an unauthenticated endpoint that supplies credentials.

Dustin, do you think you could open a PR implementing that check in the tool?  https://github.com/jhford/taskcluster-hardware-secrets/blob/master/src/api.js#L14-L16 is where I currently have stub code.  I'm not sure that we care to have an IP whitelist, but it might be nice as an extra layer.  I'm also not sure of the format for that -- whether it'd be simple wildcard matching or something smarter based on CIDR. The https://www.npmjs.com/package/ip library looks pretty neat for doing these checks.  If you don't mind taking a quick look at the tool in general, I'd appreciate that.
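
If we do add that layer, a CIDR-based check with the ip package could be about this simple (a sketch; treating `allowed` as an array of CIDR strings is just one possible format):

  const ip = require('ip');

  function isIpAllowed(addr, allowed) {
    // true if the address falls inside any of the allowed CIDR ranges
    return allowed.some(cidr => ip.cidrSubnet(cidr).contains(addr));
  }

  // e.g. isIpAllowed('10.26.48.10', ['10.26.0.0/16', '10.132.0.0/16']) === true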

As things are currently implemented, I issue scopes for all DNS reverse resolution records, but based on the snippet, I should only do it if there's a single record.  I also haven't written tests yet, but that shouldn't be too much work.
Flags: needinfo?(rthijssen)
Flags: needinfo?(jhford)
Flags: needinfo?(dustin)
I'll work with whatever you build. My initial motive for the metadata service evaporates if we don't need to supply a token/secret to your service but can rely on just the reverse DNS for auth. So when it's ready, we'll take the service for a spin and see how it goes.
Flags: needinfo?(rthijssen)
Attached file pr 1 - ip checking
John, there was no way to flag you on your own repo for this PR, so I figured I would add the flag here for it.  Mind giving it a gander?  Thanks!
Attachment #8838596 - Flags: review?(jhford)
Comment on attachment 8838596 [details] [review]
pr 1 - ip checking

Merged.

The only thing left here is the IP checking, should we choose to do that, and some unit tests of the API itself.  I'll work on that this week, and then coordinate with Rob to make sure it's working correctly.
Attachment #8838596 - Flags: review?(jhford) → review+
Attached file PR 2
Here's another PR to add a few features to the service
Attachment #8839230 - Flags: review?(dustin)
Attachment #8839230 - Flags: feedback?(rthijssen)
Attachment #8839230 - Flags: review?(dustin) → review+
We had an irc conversation a few hours ago regarding how to prevent a task running on a host from getting access to that host's credentials.

The risk from such an attack is that the credentials give access to a number of secrets that are shared with lots of other things, notably the *.taskcluster-worker.net TLS key, relengApi proxy token, etc.  Sort of "medium" risk.

Some fixes we considered:

 * Share a single token among all hosts, and pass that along with the credentials request
   * Protect that token with filesystem permissions
   * Install that token with Puppet/OCC on startup, and delete it as soon as it is used
   * Generate a JWT with Puppet/OCC that carries an expiration time 10 minutes or so in the future
     (it's not clear this is possible with OCC; a rough sketch of the check follows this list)
 * Configure the host firewall to prevent outbound access to the credentials service after credentials are fetched
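
For the JWT option, the flow might look roughly like this (a sketch assuming the jsonwebtoken npm package and a shared secret distributed by Puppet/OCC; names and paths are illustrative):

  const jwt = require('jsonwebtoken');

  // illustrative only; the shared secret would come from Puppet/OCC
  const SHARED_SECRET = process.env.HOST_SECRETS_SHARED_TOKEN;

  // on the host, at startup (Puppet/OCC would do the equivalent):
  let token = jwt.sign({host: 'b-3-win2012-0010.build.releng.scl3.mozilla.com'},
                       SHARED_SECRET, {expiresIn: '10m'});

  // on the service, before issuing credentials; jwt.verify() throws if the
  // token is expired or tampered with, so a task that digs the token out
  // of the filesystem later can't replay it.
  function checkToken(token) {
    return jwt.verify(token, SHARED_SECRET);
  }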

We also noted that in cases where taskcluster-worker is running as the same user as the task (which is the case on OS X and probably all platforms), all the protections in the world can't stop a task from reading /proc/$PPID/mem and looking for the credentials there, at least not without preventing crashreporter from working or tests from running.

So, maybe we just have to live with this risk?  I feel like we should pull in some more opinions on the topic..
Depends on: 1341654
Comment on attachment 8839230 [details] [review]
PR 2

working to deploy on puppet servers - bug 1341654
Attachment #8839230 - Flags: feedback?(rthijssen) → feedback+
I think we're agreed we need to live with the risk.

We also noted that we should be using HTTPS, bug 1342112.

At any rate, this is implemented now, but seems to be turning into a tracker.
Summary: create an auth service to issue temporary scopes to hardware taskcluster workers → [tracker] create and deploy host-secrets service
Depends on: 1342112
Depends on: 1342256
Depends on: 1342257
Depends on: 1342263
Depends on: 1351241
Depends on: 1355188
Depends on: 1355838
Depends on: 1355859
Assignee: rthijssen → dustin
No longer depends on: 1355859
This is deployed.  We're not shipping taskcluster-worker in the DC yet, so it isn't used yet, but it's working.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED