Closed Bug 1246620 Opened 8 years ago Closed 4 years ago

worker type secrets for storing secret keys instead of secret blobs

Categories

(Taskcluster :: Services, defect)

Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: pmoore, Unassigned)

References

Details

When secrets were added to AWS provisioner worker types, we didn't have the taskcluster secrets service.

Now that we have a dedicated secrets service, we could migrate the secrets out of the worker types, into the secrets service. As a side-effect, this has the very nice property of making worker types non-sensitive, which would help with e.g. bug 1246191.

I would propose that the secrets section remain, but instead of directly embedding the secret values, the values would be keys into the secrets store. The advantage is that a secret can be stored once in the secrets store, and multiple worker types that require it can all point to it.

This is very useful for worker start-up, where we want a generic mechanism for loading secrets onto the worker. Furthermore, we already have code for converting secrets into binary file-system secrets (for example in the win2012r2 worker type). We could reuse this in taskcluster-worker to enable filesystem secrets, e.g. for releng secrets.

For example, the secret could be referred to in the aws worker type config with:

      {
        "description": "SSL key for livelog",
        "path": "C:\\generic-worker\\livelog.key",
        "secret": "taskcluster-worker/livelog/livelog.key",
        "encoding": "base64",
        "format": "file"
      }

and then the secret "taskcluster-worker/livelog/livelog.key" would be retrieved from the secrets store at start-up.

On a linux/mac worker type, the `path` would be different, but the same secret containing the encoded file could be used.
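A worker could materialize such a file secret roughly as follows. This is a minimal Python sketch under stated assumptions: the config entry shape matches the example above, and `fetch_secret` is a hypothetical callable standing in for a call to the taskcluster secrets service.

```python
import base64
import os


def materialize_file_secret(entry, fetch_secret):
    """Fetch a secret by key and write the decoded bytes to the
    configured path.

    `entry` is a file-secret config entry like the JSON example above;
    `fetch_secret` is a hypothetical callable mapping a secret key
    (e.g. "taskcluster-worker/livelog/livelog.key") to its stored
    string value.
    """
    value = fetch_secret(entry["secret"])
    if entry.get("encoding") == "base64":
        data = base64.b64decode(value)
    else:
        data = value.encode("utf-8")
    # Ensure the target directory exists before writing the file.
    os.makedirs(os.path.dirname(entry["path"]), exist_ok=True)
    with open(entry["path"], "wb") as f:
        f.write(data)
    return entry["path"]
```

Because the `path` is part of the (non-secret) entry, the same secret key can be reused across worker types that write it to platform-specific locations.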

I'm marking this as blocking the taskcluster-worker design, since if we go with this, it will affect how we implement secret loading in taskcluster-worker.

Note, this doesn't relate only to file secrets: arbitrary secrets, such as taskcluster worker config, could also be stored via taskcluster secrets. The key point is that the worker type refers only to the secret key, not the secret value...
Blocks: 1244137
No longer blocks: 1244127
> but instead of directly embedding the secret, instead the values would be keys to the secrets store
This is no longer secret, so it should just be part of the public configuration options that
aws-provisioner hands out. See "userData" in current aws-provisioner.


> Note, this doesn't relate only to file secrets,
Good, because (unrelated to this) we shouldn't store secrets in files.
Memory is harder to steal, and in memory we can have a finalizer that zeros the bytes.
(just a random unrelated note)
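In Python terms, that finalizer idea could be sketched as holding the secret in a mutable buffer and zeroing it when done. This is illustrative only: CPython gives no guarantees about copies the interpreter may make, and the immutable input bytes cannot themselves be zeroed.

```python
from contextlib import contextmanager


@contextmanager
def secret_buffer(data: bytes):
    """Hold secret bytes in a mutable bytearray and zero them on exit.

    Best-effort analogue of a finalizer that zeros secret memory
    instead of writing secrets to disk.
    """
    buf = bytearray(data)
    try:
        yield buf
    finally:
        # Overwrite the secret in place before releasing the buffer.
        for i in range(len(buf)):
            buf[i] = 0
```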
Basically:
if we stop using the "data" property from provisioner.getSecret(token), then
we're only using the secret token to issue temporary credentials.
See: http://docs.taskcluster.net/aws-provisioner/api-docs/#getSecret
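The point can be illustrated with a stub: if workers ignore the `data` property, the one-time secret token is only ever exchanged for temporary credentials. The response shape here is an assumption based on the getSecret docs linked above.

```python
def credentials_from_get_secret(response: dict) -> dict:
    """Extract only the temporary credentials from a provisioner
    getSecret response, deliberately ignoring the `data` property
    (the legacy secret blob)."""
    creds = response["credentials"]
    return {
        "clientId": creds["clientId"],
        "accessToken": creds["accessToken"],
        # Present for temporary credentials; may be absent otherwise.
        "certificate": creds.get("certificate"),
    }
```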
Assignee: nobody → dustin
So the plan here is to move everything in the `data` property into the secrets API, and fetch the secrets from there.  This will require some modifications to the worker implementation.  Somewhat longer-term, we'll want to get many of these secrets from the auth API, so that they are not fixed but generated dynamically for each worker instance.

How does the worker know what secret name to use? is this configurable? in the data property?  We need to balance the complexity of setting up a new workerType against code complexity in docker-worker and against difficulty in rotating secrets.
(In reply to Dustin J. Mitchell [:dustin] from comment #4)

> How does the worker know what secret name to use? is this configurable? in
> the data property?  We need to balance the complexity of setting up a new
> workerType against code complexity in docker-worker and against difficulty
> in rotating secrets.

I would envisage a worker-type definition like this:

{
  ....
  ....
  "userData": {
    "workerConfig": {
      "plugins": {
        "livelog": {
          "sharedSecret": "@secret{project/taskcluster/livelog/sharedSecret}",
          "subdomain": "taskcluster-worker.net",
          "certificate": "@secret{project/taskcluster/livelog/certificate}",
          "key": "@secret{project/taskcluster/livelog/key}"
        },
        ....
      },
      "engine": {
        ....
      },
      "global": {
        "engine": "win2012r2",
        "enabledPlugins": [
          "livelog",
          "success",
          ....
        ]
      }
    },
    "additionalEngineFiles": [
      {
        "description": "RelEng secrets",
        "path": "C:\\builds",
        "content": "@secret{project/releng/gecko-secrets}",
        "encoding": "base64",
        "format": "zip"
      }
    ]
  },
  ....
  ....
}

We'd need some pre-parser for translating the @secret{...} references, but I think it is worth doing. This way you get

  * visibility of config settings, even if you don't know what values are
  * public/private properties shown together (e.g. livelog.subdomain is not private, so does not need to be inside a secret)
  * no change on AWS provisioner side - userData is still arbitrary json object to be interpreted by worker
  * worker type definitions can be publicly viewed/audited

Of course, I'm open to a different format for the secret templating.
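Such a pre-parser might look like the following sketch: the secret lookup is stubbed (`fetch_secret` stands in for the taskcluster secrets API), and only strings that are entirely a single `@secret{...}` reference are substituted.

```python
import re

_SECRET_RE = re.compile(r"^@secret\{([^}]+)\}$")


def resolve_secrets(node, fetch_secret):
    """Recursively walk a userData-style JSON structure, replacing any
    string of the form "@secret{name}" with fetch_secret(name)."""
    if isinstance(node, str):
        m = _SECRET_RE.match(node)
        return fetch_secret(m.group(1)) if m else node
    if isinstance(node, dict):
        return {k: resolve_secrets(v, fetch_secret) for k, v in node.items()}
    if isinstance(node, list):
        return [resolve_secrets(v, fetch_secret) for v in node]
    return node
```

Non-secret strings like `"taskcluster-worker.net"` pass through untouched, so public and private properties can sit side by side as in the worker-type definition above.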
Note also that this way you can reuse the same AMI, while different worker types can have different additional files, with either public or secret content. In this example I've added RelEng secrets, but this could also be used for files that do not contain secrets - although ideally any public files would be downloaded/checked out as part of the task (not burned into the worker).

It is more geared toward secret files, such as RelEng secrets, that need to be available to a task without it being too easy to see what is in them (i.e. not checked in) or to modify them. I used such a scheme in the generic worker for getting the RelEng secrets onto the Windows environment, so that I didn't have to burn them into the AMI, but rather burned them into the worker type (which, with this bug, we would now be able to put into taskcluster-secrets).
Note, the provisioner would return the userData uninterpreted, i.e. with the secret template references still in it - it would be up to the worker to resolve the userData it receives by retrieving the secrets it refers to.

Regarding the taskcluster credentials - I think these still need to be stored in the provisioner, because of the chicken-and-egg problem: if the credentials are stored as a secret, we don't have credentials with which to fetch them from the secret store.
That's awesome, thanks!  I think the only change I'd make is to format the secret as {"secret": "project/releng/gecko-secrets"} instead (that is, as a JSON object rather than a string), thereby avoiding ambiguity and the need to parse strings.

The taskcluster credentials given to the worker will be temporary credentials, so dynamically generated by the provisioner and not embedded in the workerType (and, in fact, not in the userData -- they are fetched using the one-time secret token).
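The object form suggested here avoids string parsing entirely; resolution becomes a check for a single-key marker object. Again a sketch with a stubbed secrets lookup:

```python
def resolve_secret_objects(node, fetch_secret):
    """Like the @secret{...} template resolver, but for the JSON-object
    form: any {"secret": "name"} object is replaced by the looked-up
    value. No string parsing is needed, and plain strings can never be
    mistaken for secret references."""
    if isinstance(node, dict):
        # Exactly one key, named "secret": treat as a reference marker.
        if set(node) == {"secret"}:
            return fetch_secret(node["secret"])
        return {k: resolve_secret_objects(v, fetch_secret) for k, v in node.items()}
    if isinstance(node, list):
        return [resolve_secret_objects(v, fetch_secret) for v in node]
    return node
```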
Summary: Discussion: worker type secrets for storing secret keys instead of secret blobs → worker type secrets for storing secret keys instead of secret blobs
Assignee: dustin → nobody
I agree that we should remove provisioner secrets.  This is mostly blocked on workers migrating away from provisioner secrets.  Once that happens, we can remove the secrets storage in the entities and stop returning them anywhere.  This will also allow worker type configurations to hopefully become publicly viewable.
Depends on: 1437464, 1437465
(In reply to John Ford [:jhford] CET/CEST Berlin Time from comment #9)
> I agree that we should remove provisioner secrets.  This is mostly blocked
> on workers migrating away from provisioner secrets.  Once that happens, we
> can remove the secrets storage in the entities and stop returning them
> anywhere.  This will also allow worker type configurations to hopefully
> become publicly viewable.

We think we have the information we need from Amazon to proceed here now. John was going to do some exploratory work with OpenSSL to verify.
We are still blocked on the workers removing support for provisioner secrets.
Component: AWS-Provisioner → Services
See Also: → 1375156
See Also: 1375156

No longer blocks bug 1375155 since publicly viewable definitions are made possible via bug 1375195.

No longer blocks: 1375155
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WONTFIX