Closed Bug 1168534 Opened 9 years ago Closed 9 years ago

secrets: Implement secrets.taskcluster.net as secret key/value store behind scopes (proposal)

Categories

(Taskcluster :: Services, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jonasfj, Assigned: mrrrgn)

I'm not sure this solves any problems. But it is an option that fits somewhat
well with our scopes, etc. However, someone still has to own the canonical
secrets and share them appropriately.

API Outline:
  GET    /v1/secret/<key>   scope: "secret:get:<key>", gets JSON payload
  PUT    /v1/secret/<key>   scope: "secret:set:<key>", sets JSON payload 
  DELETE /v1/secret/<key>   scope: "secret:delete:<key>"

JSON payload:
{
  value:       {<user defined object>},
  expires:     "<date-time>"
}

When `expires < now()` the secret can no longer be retrieved, and expired
secrets are eventually deleted from the data store.
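
A minimal in-memory sketch of the semantics outlined above, in Python. This is not the service itself: the class name `SecretStore`, the helper `_require`, and the exact-match scope check are assumptions made for illustration. Only the three operations, the scope strings, and the expiry rule come from the outline.

```python
from datetime import datetime, timedelta, timezone


def _require(scopes, needed):
    # Hypothetical helper: exact-match scope check only.
    if needed not in scopes:
        raise PermissionError(f"missing scope {needed}")


class SecretStore:
    """In-memory stand-in for the proposed key/value store (hypothetical)."""

    def __init__(self):
        self._secrets = {}

    def set(self, key, value, expires, scopes):
        _require(scopes, f"secret:set:{key}")
        self._secrets[key] = {"value": value, "expires": expires}

    def get(self, key, scopes, now=None):
        _require(scopes, f"secret:get:{key}")
        now = now or datetime.now(timezone.utc)
        entry = self._secrets.get(key)
        # An expired secret behaves as if it did not exist.
        if entry is None or entry["expires"] < now:
            raise KeyError(key)
        return entry

    def delete(self, key, scopes):
        _require(scopes, f"secret:delete:{key}")
        self._secrets.pop(key, None)

    def purge_expired(self, now=None):
        # "Eventually deleted": a periodic job could call this.
        now = now or datetime.now(timezone.utc)
        expired = [k for k, e in self._secrets.items() if e["expires"] < now]
        for k in expired:
            del self._secrets[k]
        return expired
```

Note that `get` refuses an expired secret even before `purge_expired` has run, so retrieval and cleanup can happen on different schedules.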

### Use case 1, one-time secret injection
I want to inject <secret> into a task. I choose a <key> for which only I and
other trusted parties have the scope "secret:get:<key>".
I do:
  secrets.set(<key>, {expires: now() + 24 hours, value: secret})
Then:
  queue.createTask(taskId, {
     scopes: ['secret:get:<key>'],
     ...
  });
Now, inside my task I can get my secret with taskcluster auth proxy:
  $ curl taskcluster/secrets/v1/secret/<key> | jq...

### Use case 2, Convert secret to scope
I have a <secret> that I want to delegate to people or to in-tree tasks.
Insert it with:
  secrets.set(<key>, {expires: year 3000, value: secret})
Then ensure that only trusted parties have the scope:
  "secret:get:<key>"
...

This basically transforms a <secret> into a piece of information that
is available with a given scope.

### Integration w. docker-worker
For use case 1, docker-worker integration might be nice so that:
  task.scopes = ['secret:get:<key>']
  task.payload.secretEnv = <key>
Then, docker-worker fetches the secret stored under <key> and sets the values
it contains as environment variables.
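
A rough sketch of what that worker-side step could look like, in Python. Assumptions: the function name `build_secret_env` and the `fetch_secret` callback are hypothetical, the scope spelling follows the API outline, and the secret's `value` is taken to be a flat object mapping variable names to values (the proposal does not say how nested values would be handled).

```python
def build_secret_env(task, fetch_secret):
    """Return the env vars to add for a task (hypothetical helper)."""
    key = task.get("payload", {}).get("secretEnv")
    if key is None:
        return {}
    # The task itself must carry the scope for the secret it asks for.
    needed = f"secret:get:{key}"
    if needed not in task.get("scopes", []):
        raise PermissionError(f"task lacks scope {needed}")
    secret = fetch_secret(key)  # expected shape: {"value": {...}, "expires": ...}
    # Assumption: "value" is a flat mapping of env var name to value.
    return {str(name): str(val) for name, val in secret["value"].items()}
```

Usage would be something like `env.update(build_secret_env(task, fetch_secret))` before starting the container, where `fetch_secret` wraps the GET call from the API outline.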
See Also: → 1168314
This is mostly just open for discussion. Please voice any opinions :)
I mentioned this in bug 1145695 as well, but you should look at:
https://www.vaultproject.io/
I did try the interactive vault tutorial, and sure, there are some interesting uses.

One of the things that is different with the design outlined here is that secrets would be protected by
taskcluster credentials and associated scopes. That might be neat, because tasks can already be assigned
scopes and we have a UI for managing the scopes associated with credentials.

Note: I'm not sure this proposal is great. Ideally, I want to access restricted services through proxies
(typically on docker-worker) that grant access based on taskcluster scopes. But such an approach will
always be service-specific and not generic enough to handle one-off cases.
(We'll likely do authenticating proxies for all the common cases.)
The tricky bit with all of this is that in many cases we need to utilize secrets to perform jobs for people without allowing those people to learn the secrets.  Current examples: a try push may require access to files from tooltool, or to upload symbols.  The decision task can attach the proper scopes to the generated tasks, but if those tasks just run ['bash', '-c', 'echo $SEKRIT'] then we've leaked the secret itself.  That's just as likely to happen with an accidental logging of `os.environ` or something like that -- environment variables aren't the greatest place for secrets.  Keep in mind that tooltool doesn't just have Google's Android NDK/SDKs, but also partner builds which may be a target of corporate espionage.  Handing out the keys to that castle may do worse than violating Google's click-through license.

It happens that relengapi and symbols both support temporary secrets, so the decision task could generate new, temporary secrets (with a TTL of 1h or so) for these tools, then store those secrets with a key based on the taskid, and add the appropriate scopes and `task.payload.secretEnv` sauce to the generated tasks.

This limits the potential damage of an accidentally logged credential, but still allows 'echo $SEKRIT' try jobs.  It also doesn't work for services that don't support temporary credentials.  I don't see a great advantage here over the existing encrypted environment variables approach.
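
For reference, the temporary-secret flow described above could look roughly like this in Python. Everything named here is an assumption for illustration: the function `plan_temporary_secret`, the `task/<taskId>` key layout, and the idea that the decision task merges the returned patch into the generated task. Minting the temporary credential itself (via relengapi or the symbols service) is out of scope for the sketch.

```python
from datetime import datetime, timedelta, timezone


def plan_temporary_secret(task_id, credential, now=None, ttl=timedelta(hours=1)):
    """Build the secret payload and the task additions (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    key = f"task/{task_id}"  # assumed key layout, based on the task id
    secret = {
        "value": credential,
        "expires": (now + ttl).isoformat(),
    }
    # Scope and secretEnv the decision task would add to the generated task.
    task_patch = {
        "scopes": [f"secret:get:{key}"],
        "payload": {"secretEnv": key},
    }
    return key, secret, task_patch
```

The decision task would store `secret` under `key` and merge `task_patch` into the task it creates, so the credential is never part of the task definition.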

This brings me back to the idea of proxies.  With proxies, the task container itself has no secrets -- its identity is based on the linking of the containers, which is not something that can be forged by a committer.  The proxy itself can interpret the task's scopes as necessary and create temporary credentials, limit available API calls, etc.  In the rare cases where we have no choice but to provide credentials to the task, they can be provided via an HTTP API call, where the caller is unlikely to log the result and will only hang onto it as long as it is required.

I'd prefer to see the time used to implement secrets.tc.net put to implementing some more generic support for proxies in the docker-worker.  Briefly, I think that would mean that a task specifies task.payload.proxies = {"balrog": "gregarndt/taskcluster-balrog:0.0.1", "taskcluster": ".."} and task.scopes = ["proxies:balrog:foo", ..].  The docker-worker takes care of starting each specified docker image and binding it to the given name in the task container.  Then I can develop a new proxy for my pet service via try, without any modifications to docker-worker.
I added more detail about the generic proxies support in bug 1170784.

Some additional use-cases / requests:

This may be useful enough as a configuration management point for infrastructure details that aren't strictly "secret" -- the hostname at which to reach balrog, for example, or the influxdb hostname to submit statistics to.  The kind of configuration that we would want to change out-of-band with in-tree changes, and which should be effective retroactively for re-run tasks.

Access should be controllable using a hierarchy of secrets, so that we can grant access to more than one secret at a time (for example, 'secret:get:testing/*').
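
A small Python sketch of how such a hierarchical grant could be checked, assuming a granted scope ending in `*` covers every required scope that starts with the part before the `*`. The function names are hypothetical, and this is only an illustration of the prefix idea, not a statement of how the auth service evaluates scopes.

```python
def scope_satisfies(granted, required):
    """True if one granted scope covers the required scope."""
    if granted.endswith("*"):
        # Trailing "*" acts as a prefix wildcard.
        return required.startswith(granted[:-1])
    return granted == required


def has_scope(scopes, required):
    """True if any scope in the set covers the required scope."""
    return any(scope_satisfies(granted, required) for granted in scopes)
```

With this, a client holding `secret:get:testing/*` could read `testing/db` and `testing/api-key`, but not anything outside the `testing/` prefix.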
### Note, regarding secret tools (things we can't distribute)
This includes: Google's Android NDK/SDKs, Binary blobs from partners, etc.

We should use trusted images for this (also called self-validating images). This is an image that boots,
then uses the env var TASK_ID to validate its own task definition, for example to prohibit artifact
extraction of secret/non-distributable files/tools. This can be enforced by the ENTRYPOINT command in
the docker image, which docker-worker won't allow you to overwrite.

Ideally, we would in the future be able to do things like lock down network and rely on declarative
download. So we can run untrusted code w. access to non-distributables, but still validate that all
the artifacts extracted are uploaded to a private folder (artifact name: "private/..."). This way
we might eventually be running builds and tests for untrusted code.
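
As an illustration, the artifact check an entrypoint might run could be sketched in Python as below. Assumptions: the task definition has already been fetched using TASK_ID (not shown), `payload.artifacts` is taken to be an object keyed by artifact name, and the function name is hypothetical.

```python
def non_private_artifacts(task_definition):
    """List artifact names that fall outside the "private/" folder."""
    # Assumption: payload.artifacts maps artifact name -> artifact details.
    artifacts = task_definition.get("payload", {}).get("artifacts", {})
    return sorted(name for name in artifacts if not name.startswith("private/"))
```

The entrypoint would refuse to run the task's command if this list is non-empty, since any such artifact could carry non-distributable files out of the container.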
I don't think that's necessary.  Those secret tools are already adequately protected by relengapi permissions, which we can translate from taskcluster scopes.  Maybe I don't understand the problem you're trying to solve?
This was in relation to comment 4 on tooltool, which I suppose is downloaded through a proxy. So it can't be isolated from semi-trusted code. Anyways, it's a side track, so nvm :)
Component: TaskCluster → General
Product: Testing → Taskcluster
Assignee: nobody → winter2718
This fits well with what we were discussing at the bottom of: https://etherpad.mozilla.org/jonasfj-encrypted-env-vars-v2

Taking this on unless anyone has an objection.
Starting on a skeleton of this service based on the design here: https://github.com/mrrrgn/taskcluster-secrets-proposal
https://github.com/taskcluster/taskcluster-secrets
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Component: General → Secrets
QA Contact: dustin
Component: Secrets → Services