Closed
Bug 1094497
Opened 11 years ago
Closed 10 years ago
auth, base: Use auth.taskcluster.net from semi-trusted services
Categories
(Taskcluster :: Services, enhancement)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: jonasfj, Assigned: jonasfj)
References
Details
(Whiteboard: [relsec])
Attachments
(9 files)
55 bytes, text/x-github-pull-request | mrrrgn: review+
54 bytes, text/x-github-pull-request | mrrrgn: review+
55 bytes, text/x-github-pull-request | mrrrgn: review+
55 bytes, text/x-github-pull-request | jhford: review+
56 bytes, text/x-github-pull-request | jhford: review+
59 bytes, text/x-github-pull-request | jhford: review+
61 bytes, text/x-github-pull-request | jhford: review+
54 bytes, text/x-github-pull-request | jhford: review+
57 bytes, text/x-github-pull-request | jhford: review+
We have a lot of internal services... Maybe we shouldn't trust all of them, with everything.
For services that:
- we only trust somewhat
- don't handle many requests (no high performance requirements)
- don't have issues with slightly higher latency
We could send the hawk signature to auth.taskcluster.net, so that the semi-trusted
service doesn't need the scope `auth:get-credentials`.
Note, `auth:get-credentials` allows you to get credentials for any clientId.
Normal trusted services use this to fetch the accessToken, so they can validate
signatures. Then they cache the accessToken, so that it's a lot faster next time.
Round-tripping all authentication requests to auth.taskcluster.net will introduce slightly higher latency and put load on auth.taskcluster.net. So this should not be
used for high-traffic services.
| Assignee | ||
Comment 1•11 years ago
|
||
Okay, seriously finding everything needed for remote hawk signature validation is non-trivial.
Maybe we should just consider asymmetric keys, something like ECDSA is pretty efficient.
Check out:
http://kjur.github.io/jsrsasign/sample-ecdsa.html (ECDSA demo in javascript)
256 bit private key should be enough for ECDSA.
A base64 encoded private key then looks like this: "Ju18bI1GjH5S7VELV8FqwozZDV8ry7rAEPkWUsyIdVI="
That is pretty neat! We can totally tell people it's an "accessToken" and not a private key :)
(they don't need to know)
This would also make our temporary credentials scheme much safer.
The current temporary credentials scheme is, to say the least, a bit home-cooked.
There also seem to be both pure python and pure javascript libraries:
http://kjur.github.io/jsrsasign/
https://pypi.python.org/pypi/ecdsa/
Both libraries test against OpenSSL, which makes me happy.
There are patent issues, but most of them seem to have expired.
Note, I do not want to use GPG/openpgp.js because neither supports ECDSA, and getting the same security level as an ECDSA key with 256 bits requires between 2048 - 4096 bits in RSA. This then suddenly becomes a large accessToken. Also RSA is slow, and it's painful to generate keys.
ECDSA is much faster...
I suspect both libraries above are susceptible to timing attacks and other edge cases.
But network latency protects against good timing attacks, if not I'll happily add a random timeout :)
@jlal,
Any thoughts?
Flags: needinfo?(jlal)
| Assignee | ||
Comment 2•11 years ago
|
||
On-topic:
It seems ECDSA is used in bitcoin, which explains the sudden availability of libraries in python and node.
| Assignee | ||
Comment 3•10 years ago
|
||
So I started looking at ECDSA again, and interesting things have happened since I last played with it (half a decade ago).
Most notably: http://ed25519.cr.yp.to/
The scheme is implemented in libsodium which has both python and node bindings.
It is also implemented in http://tweetnacl.cr.yp.to/ which mainly serves to show that it can be audited.
There are also pure JS implementations and a very slow pure python implementation.
I benchmarked the implementations here:
https://gist.github.com/jonasfj/af453cc2c569312ac59f#file-results-txt
Note: I have PRs hanging for node 0.11 support for libsodium bindings, + some extra methods.
Summary:
ed25519 is a viable elliptic-curve signature scheme (strictly EdDSA rather than ECDSA) with:
32 bytes private key (44 bytes base64)
32 bytes public key (44 bytes base64)
64 bytes signature
Security level similar to a 3000 bit RSA key.
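The base64 sizes quoted above follow directly from the encoding (4 output characters per 3 input bytes, padded). A small check, using random bytes as stand-ins for real keys and signatures; `b64_len` is a helper name made up for this illustration:

```python
import base64
import os


def b64_len(n_bytes: int) -> int:
    """Length of the padded base64 encoding of n_bytes random bytes."""
    return len(base64.b64encode(os.urandom(n_bytes)))


# Random bytes stand in for an ed25519 key (32 bytes) and signature (64 bytes).
key_len = b64_len(32)
sig_len = b64_len(64)
print("key:", key_len, "chars; signature:", sig_len, "chars")
```

So a 32-byte key stays short enough to be handed out as an "accessToken", and a signature fits comfortably in an HTTP header.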
Having looked at hawk in more detail, there are things such as reply signatures that we don't use, nor really want to support. I suspect we'll be better off inventing a new scheme; jhford suggested we call it eagle. I love that. I drafted something that is very much inspired by AWS signatures in terms of features included, see:
https://etherpad.mozilla.org/jonasfj-eagle-auth-design
--------------------------- Alternative HAWK Scheme ---------------------------
I also figured out how AWS avoids distributing their credentials to all data centers. They use HMAC too, ie. symmetric keys, but they are doing something smart/insane.
Basically they do as follows:
> kSecret = Your AWS Secret Access Key
> kDate = HMAC("AWS4" + kSecret, Date)
> kRegion = HMAC(kDate, Region)
> kService = HMAC(kRegion, Service)
> kSigning = HMAC(kService, "aws4_request")
Then they sign the message with kSigning. This works great because their end-points don't need to have the
kSecret then. So they reduce risk of leaks.
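The derivation chain above can be sketched in a few lines of Python. This assumes HMAC-SHA256 as the HMAC function; the secret, date, region and service values below are made up for illustration:

```python
import hashlib
import hmac


def hmac_sha256(key: bytes, msg: str) -> bytes:
    """One HMAC-SHA256 step; the key comes first, as in the notation above."""
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret: str, date: str, region: str, service: str) -> bytes:
    """Chain the HMACs: secret -> date -> region -> service -> signing key."""
    k_date = hmac_sha256(("AWS4" + secret).encode("utf-8"), date)
    k_region = hmac_sha256(k_date, region)
    k_service = hmac_sha256(k_region, service)
    return hmac_sha256(k_service, "aws4_request")


# Made-up inputs; an end-point only ever needs to hold the derived key.
k_signing = derive_signing_key("example-secret", "20150901", "us-west-2", "s3")
print(k_signing.hex())
```

Each step narrows what the resulting key is good for, so a leaked kSigning is limited to one date, region and service.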
In our world the equivalent would be to not use accessToken for signing but signKey, as follows:
signKey = HMAC(HMAC(accessToken, <date-only>), <service>);
, where
<date-only> is the date and not the time.
<service> is the service, such as "queue", "scheduler", etc.
If messages are signed with signKey, then auth.taskcluster.net doesn't need to expose the accessToken to the
queue. Instead it can expose signKey, which it computes specifically for the "queue".
And if the signKeys stored in the queue's cache (or the credentials the queue uses to fetch these signKeys) are
exposed, they will only be usable on the queue, and nowhere else.
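A minimal sketch of this signKey idea, assuming HMAC-SHA256 and a hex-encoded signature. The function names, the token and the message are hypothetical, and real Hawk signs a normalized request string rather than a raw message:

```python
import hashlib
import hmac


def hmac_sha256(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()


def derive_sign_key(access_token: bytes, date_only: str, service: str) -> bytes:
    """signKey = HMAC(HMAC(accessToken, <date-only>), <service>)"""
    return hmac_sha256(hmac_sha256(access_token, date_only.encode()), service.encode())


def sign(sign_key: bytes, message: bytes) -> str:
    return hmac_sha256(sign_key, message).hex()


def verify(sign_key: bytes, message: bytes, signature: str) -> bool:
    # Constant-time comparison of the expected and the presented signature.
    return hmac.compare_digest(sign(sign_key, message), signature)


access_token = b"made-up-access-token"    # known to the client and to auth only
queue_key = derive_sign_key(access_token, "2015-09-01", "queue")
scheduler_key = derive_sign_key(access_token, "2015-09-01", "scheduler")

message = b"POST /v1/task/abc/claim"
signature = sign(queue_key, message)      # client signs a request for the queue

accepted_by_queue = verify(queue_key, message, signature)
accepted_by_scheduler = verify(scheduler_key, message, signature)
print(accepted_by_queue, accepted_by_scheduler)
```

The queue can verify with `queue_key` alone, and a signature made for the queue does not verify against the scheduler's key.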
Note, I strongly suspect that the entropy of a hash degrades when HMACs are applied repeatedly. And each HMAC
does at least two hashes. I would not be surprised if there is still plenty of entropy, but at 10 hashes as AWS
uses it starts to set off my alarm bells. It should at least encourage review by a cryptanalyst.
---
In short, asymmetric keys are nicer, server security will be less of a concern. But if we want to we can
extend the hawk scheme with more obscurities. This would not require modifications to hawk, only how we use it.
It does solve the immediate problem, but doesn't solve the underlying issues.
I think that long term, something like what I proposed in the eagle design document will serve us better.
Hawk already includes PORT/HTTP scheme under the HMAC, which is overkill and becomes very painful when you're
deployed behind a reverse proxy.
Summary: auth, base: Remote validation of signatures for semi-trusted services → auth, base: Use auth.taskcluster.net from semi-trusted services
| Assignee | ||
Comment 4•10 years ago
|
||
clearing NI, as we've since talked about libsodium, not sure if or when that'll happen, but it's certainly more fun than ECDSA.
Flags: needinfo?(jlal)
Updated•10 years ago
|
Component: TaskCluster → General
Product: Testing → Taskcluster
| Assignee | ||
Updated•10 years ago
|
Component: General → Authentication
| Assignee | ||
Comment 5•10 years ago
|
||
@dustin,
Let me know if you don't feel like reviewing this.
It ended up being somewhat complicated, and I figure you're the only one who has
really read the existing code.
Goal here is to not have accessToken leave taskcluster-auth. But instead send
signatures to auth.taskcluster.net for verification.
Use cases include allowing untrusted servers to use TC creds for authentication,
for example workers that add a public API can validate signatures.
It'll also facilitate implementation of roles, as scope information
doesn't have to be cached everywhere.
Additionally, we let the credentials cache timeout much sooner,
so propagation doesn't take an hour.
| Assignee | ||
Comment 6•10 years ago
|
||
This creates the API end-point on tc-auth using utilities from tc-base.
Once this is deployed we can remove caching support from tc-base, and move
all components away from caching API credentials.
Attachment #8652080 -
Flags: review?(dustin)
| Assignee | ||
Comment 7•10 years ago
|
||
@dustin, let me know if you can comment on the github PRs.
We recently reduced the number of owners, because push-access can deploy some things and we
wanted to minimize that. But that strategy doesn't work if you can't comment on the PRs.
Comment 8•10 years ago
|
||
I made some detailed comments inline, but I'll write my overall thoughts here since they span the two PRs.
My understanding of what you've done here is that it is *not* Eagle or any of the other crypto options outlined earlier in this bug. Rather, you've added a `/authenticate-hawk` endpoint to tc-auth, which simply wraps the configured `signatureValidator`. You've then written two `signatureValidator` implementations, one of which calls into `hawk` using credential data from Azure (loaded through a cache); the other (default) calls `https://auth.tc.net/authenticate-hawk`. The upshot is that we can flexibly configure each service to either verify every request against auth.tc.net or to use `getCredentials` as is done currently.
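The two `signatureValidator` flavours described here can be pictured roughly as follows. This is only a sketch of the shape, not the tc-base code: plain HMAC-SHA256 stands in for Hawk, a function call stands in for the HTTP request to `/authenticate-hawk`, and every name below is hypothetical:

```python
import hashlib
import hmac
from typing import Callable, Dict, List, NamedTuple, Optional


class Client(NamedTuple):
    access_token: bytes
    scopes: List[str]


class Request(NamedTuple):
    client_id: str
    payload: bytes
    mac: str


# A validator returns the client's scopes on success, or None on failure.
Validator = Callable[[Request], Optional[List[str]]]


def compute_mac(access_token: bytes, payload: bytes) -> str:
    return hmac.new(access_token, payload, hashlib.sha256).hexdigest()


def local_validator(load_client: Callable[[str], Optional[Client]]) -> Validator:
    """Validates locally; needs (and caches) the client's accessToken."""
    cache: Dict[str, Client] = {}

    def validate(req: Request) -> Optional[List[str]]:
        client = cache.get(req.client_id) or load_client(req.client_id)
        if client is None:
            return None
        cache[req.client_id] = client
        expected = compute_mac(client.access_token, req.payload)
        return client.scopes if hmac.compare_digest(expected, req.mac) else None

    return validate


def remote_validator(call_auth: Validator) -> Validator:
    """Forwards the signed request to auth; never sees an accessToken."""
    def validate(req: Request) -> Optional[List[str]]:
        return call_auth(req)

    return validate


# Only the auth side holds credentials; the service side just delegates.
clients = {"worker-1": Client(b"made-up-token", ["queue:claim-task"])}
auth_side = local_validator(clients.get)
service_side = remote_validator(auth_side)

good = Request("worker-1", b"GET /ping", compute_mac(b"made-up-token", b"GET /ping"))
bad = Request("worker-1", b"GET /ping", "0" * 64)
print(service_side(good), service_side(bad))
```

Because both flavours share one call signature, a service can switch between them through configuration alone.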
My biggest concern is, this doesn't really solve the underlying issue of cleartext credentials. There are weekly examples of why this is a bad idea, e.g., [1]. Cleartext credential storage is sort of a requirement for symmetric encryption, and certainly for hawk. Even if we get to a point where only the auth endpoint and Azure ever see the key material, that's still two services too many with access to the secrets. A compromise of either service will immediately allow impersonation of any taskcluster client -- basically game-over. I'd really like to see the number of services with access to the accessToken go to zero, even if it does require some additional complexity to support PKI.
Setting that aside, my remaining concerns are fairly minor: This introduces quite a bit of code duplication (e.g., all of `limitClientWithExt`). Is that just temporary to support deployment of this patch, or would it continue to exist as long as any service remains that still uses `getCredentials`? Similarly, this doesn't remove the code that exposes the accessKey via API calls, as that code is still required to support services using `getCredentials`. Ideally we'd get to a point where there are no such services. What are your feelings on the feasibility of that goal in terms of reliability and efficiency? If we never get there, then the usefulness of this patch is substantially reduced.
I'm vaguely uncomfortable with the `/authenticate-hawk` endpoint requiring no authentication itself, although it exists to process unauthenticated data (essentially the contents of a client request), so I don't think the discomfort is justified. I'm curious what your thoughts are here, particularly around the risk of brute-force attacks.
[1] http://blog.moertel.com/posts/2006-12-15-never-store-passwords-in-a-database.html
Whiteboard: [relsec]
Updated•10 years ago
|
Attachment #8652080 -
Flags: review?(dustin)
Updated•10 years ago
|
Attachment #8652079 -
Flags: review?(dustin)
| Assignee | ||
Comment 9•10 years ago
|
||
> My biggest concern is, this doesn't really solve the underlying issue of cleartext credentials.
I should probably include something about the rationale, so here we go:
1) We will (in future) encrypt and sign secrets stored in Azure, so auth.taskcluster.net is the only
central attack vector.
(Yes, the plan is to migrate away from getCredentials and drop it completely!)
2) Downside of PKI is that asymmetric algorithms are slow and hard to roll out client-side
- fastest and possibly best option is Ed25519
- libsodium is a binary library, so it's hard to deploy client side (we have many client envs)
(tweetnacl.js does work in the browser, and is fast enough for browser clients, but still slow)
- HMAC is efficiently (and correctly) implemented in just about all environments
(and used by AWS, azure etc, so generally trusted)
3) With remote signature validation we can always introduce a new authentication scheme.
Ie. if we later decide to do an Ed25519 based scheme we can do so. We forward the raw header,
so we only need to update taskcluster-auth to support a new scheme.
4) Following (1) auth.taskcluster.net is the only central attack vector, with asymmetric keys this is
still the case unless the list of scopes is attached to each request and signed by a 3rd private
key stored offline. (ie. attacker can change lists of scopes to include "*").
> I'm curious what your thoughts are here, particularly around the risk of brute-force attacks.
Brute-force attacks on sha256 are generally considered infeasible even on your local machine.
I would not be worried about that when the validation step goes over the network. It would:
A) overload our servers, and
B) at 100 dynos processing 500 req/s, it would take approximately 1.79e59 years (about 1.79e35 yottayears) to explore just
1 percent of the 244 bit key space. This is roughly 49 orders of magnitude longer than the
roughly 10-billion-year burning time of our local sun.
- I love doing this kind of math :)
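The arithmetic can be checked with a few lines of Python. The inputs are the ones stated above (244-bit key space, 100 dynos, 500 req/s each, 1 percent of the space); the 365.25-day year is an assumption of this sketch. Under these inputs the estimate comes out on the order of 10^59 years:

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # assumed year length for this estimate

key_space = 2 ** 244                    # number of possible keys
rate = 100 * 500                        # guesses per second across all dynos
guesses = key_space // 100              # explore just 1 percent of the space

seconds = guesses / rate
years = seconds / SECONDS_PER_YEAR
yottayears = years / 1e24               # yotta = 10^24

print(f"{years:.3g} years, i.e. {yottayears:.3g} yottayears")
```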
Actually, a timing attack might be feasible, but only on certificates for temporary credentials,
since hawk already uses cryptiles.fixedTimeComparison. This will only give you the certificate
not the key for the certificate. But we should probably use cryptiles.fixedTimeComparison just
to be on the safe side, as this could be used to fake a certificate and elevate scopes.
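For readers unfamiliar with why a fixed-time comparison matters: a naive comparison returns at the first mismatching byte, so its running time hints at how many leading bytes were correct. The sketch below contrasts the two in Python, with `hmac.compare_digest` playing the role that `cryptiles.fixedTimeComparison` plays in the Node code; the function names are made up for the illustration:

```python
import hmac


def naive_equal(a: bytes, b: bytes) -> bool:
    """Early-exit comparison: timing depends on the position of the first mismatch."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False
    return True


def fixed_time_equal(a: bytes, b: bytes) -> bool:
    """Comparison whose timing does not depend on where the inputs differ."""
    return hmac.compare_digest(a, b)


expected = b"signature-of-certificate"
print(fixed_time_equal(expected, b"signature-of-certificate"))
print(fixed_time_equal(expected, b"signature-of-certificatX"))
```

Both functions return the same answers; only the second avoids leaking the mismatch position through timing.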
Comment 10•10 years ago
|
||
OK, that rationale helps - thanks!
I do want to make a counterpoint to #2, though: it's possible to use PKI to share temporary secrets which are then used with a faster algorithm at the per-request level. In fact, this is what Hawk is designed for: the expectation is that the shared secret is ephemeral and agreed through a secure channel such as HTTPS or via some other cryptographic exchange. This is also the approach that JWT uses: a client uses some trusted but slow mechanism to get a token, but the token can be quickly validated and interpreted on a per-request basis. This would make things somewhat more complex for clients, and wouldn't fit with the model introduced in these patches (since clients would need to call some `getToken` method to get a new token first).
Here's roughly how this might work:
https://auth.tc.net/get-token takes a clientId and accessKey. It verifies the accessKey against the stored (salted and hashed!) credentials, and if those match, returns a JWT with an assertion that the bearer has the specified clientId, and a reasonably short expiration period (an hour, say). The JWT then becomes a bearer token for authenticating to other services. This loses replay protection, request signing, and a few other benefits of Hawk, but those are all rather unimportant on a secure channel. On the plus side, the services can validate the JWT directly without consulting the auth endpoint. This also requires absolutely no crypto on the client side.
Another alternative is for get-token to return {tokenId, token, expiration} with a freshly-generated token, and store that token in Azure or some other k/v storage. Clients would then use the token for Hawk transactions until it expires, passing tokenId in place of the clientId. The service would consult the auth endpoint to get the token for the given tokenId (caching, of course), then use that to complete the Hawk authentication. This is more in line with how Hawk is intended to be used: the secret is ephemeral and communicated out-of-band. It has the advantage of getting all of the benefits of Hawk, but the disadvantage of requiring (limited) communication between services and the auth endpoint.
In either case, the client needs to be smart enough to use its clientId/accessKey to request the token, then use the token in the actual request, renewing as necessary. Given that we have written client libraries in lots of languages and these seem to be universally used, I don't think this is a substantial barrier.
| Assignee | ||
Comment 11•10 years ago
|
||
@dustin,
So both of these require that I trust the server we're authenticating against.
If the server is untrusted, say a random machine or a worker that offers a protected API, then I want that
server to be able to authenticate requests without ever knowing any secret that can be reused.
Options for that are:
- per-request asymmetric signatures
- remote signature validation
- double HMAC application, where sign = HMAC(msg, HMAC(hostname, key)), so the untrusted server
only needs to know HMAC(hostname, key) which can't be used for requests to other services.
(This option makes temporary credentials require a cache entry per temp-cred used for server)
> Another alternative is for get-token to return {tokenId, token, expiration}
This is actually what temporary credentials are. Except that the token isn't stored in k/v store, it's
generated from certificate.seed in a way only someone with accessToken can do.
I'm not a big fan of storing temp creds in k/v stores. Mostly because I want to have many of them! :)
Currently aws-provisioner issues temp creds to workers, and in the future the queue will issue temp creds
per task when a task is claimed.
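The point about not needing a k/v store can be illustrated with a stateless derivation. The exact construction Taskcluster uses is not spelled out in this thread, so the sketch below simply assumes the temporary token is an HMAC-SHA256 of the certificate seed keyed by the permanent accessToken; the function name and values are hypothetical:

```python
import base64
import hashlib
import hmac


def temp_access_token(perm_access_token: bytes, seed: str) -> str:
    """Derive a temporary token from a seed; only holders of the permanent token can."""
    digest = hmac.new(perm_access_token, seed.encode("utf-8"), hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


perm = b"made-up-permanent-access-token"

# The issuer picks a fresh seed per temporary credential and puts it in the certificate.
token_a = temp_access_token(perm, "seed-for-worker-a")
token_b = temp_access_token(perm, "seed-for-worker-b")

# Auth later recomputes the token from the seed in the certificate; nothing is stored.
recomputed = temp_access_token(perm, "seed-for-worker-a")
print(token_a == recomputed, token_a != token_b)
```

Since nothing is stored per credential, issuing very many temporary credentials costs the auth service no storage.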
I don't see perma creds used for anything but privileged devs like you and me and servers, like queue,
provisioner, scheduler, index, and these would be as tightly scoped as possible.
---
Generally, I'm open to exploring other options long term, hawk isn't pretty but it does well.
If remote signature validation isn't considered crazy, then I say we roll it out. It'll certainly
give us the flexibility to move to anything else, without updating all services.
And even if we do, we might want low-traffic servers to still use remote signature validation, just so
they don't have to cache any secrets (this would also mitigate issues for untrusted services).
Comment 12•10 years ago
|
||
OK, I don't think this is a step backward at any rate.
| Assignee | ||
Comment 13•10 years ago
|
||
Comment on attachment 8652079 [details] [review]
Github PR for tc-base
Hi, :mrrrgn,
A little bird (selena) told me that you were interested in auth.
So I'm wondering if you would mind giving these two PRs a review.
----
Plan is to roll out remote signature validation first, and
then refactor auth so that:
- accessToken is only exposed at creation,
- accessToken is encrypted in the azure table storage,
- roles are supported and only handled by taskcluster-auth
(so other components need not be concerned about the complexity of these)
- no secrets need be shared between clients and servers
(only secrets shared are between auth and clients)
Additionally, we'll have the ability to quickly roll out fixes and new
auth schemes without patching all our services, as signatures are forwarded
for validation.
Attachment #8652079 -
Flags: review?(winter2718)
| Assignee | ||
Updated•10 years ago
|
Attachment #8652080 -
Flags: review?(winter2718)
Comment 14•10 years ago
|
||
Comment on attachment 8652079 [details] [review]
Github PR for tc-base
Thanks for flagging me on this so I can get up to speed. :] Dustin's doing a pretty bang-up job on review; but I was able to go over it pretty well. I'm figuring we won't merge this until the tests are fixed up.
Attachment #8652079 -
Flags: review?(winter2718)
| Assignee | ||
Comment 15•10 years ago
|
||
Comment on attachment 8652079 [details] [review]
Github PR for tc-base
Sorry, travis tests are horribly broken because of intermittent network issues.
So ignore those failures.
I plan to move it to circle-ci, when the next-gen configuration is merged:
https://bugzilla.mozilla.org/show_bug.cgi?id=1084661#c10
- feel free to comment on that too :)
I'll merge the config when I get some buy-in from others who like my crazy idea...
---
Re: this PR I want to land it pretty soon, so I can start rolling it out.
Updating all components takes a long time, especially with all sorts of small breakages in tc-base.
Attachment #8652079 -
Flags: review?(winter2718)
Comment 16•10 years ago
|
||
Comment on attachment 8652079 [details] [review]
Github PR for tc-base
Looks nice ! \o/ :)
Attachment #8652079 -
Flags: review?(winter2718) → review+
Comment 17•10 years ago
|
||
Comment on attachment 8652080 [details] [review]
Github PR for tc-auth
I left a few nits; but nothing that prevents a + for sure imo. :) Thanks !
Attachment #8652080 -
Flags: review?(winter2718) → review+
| Assignee | ||
Comment 18•10 years ago
|
||
Thanks for review, merged... will fix the rename nit and roll out. Thanks!
| Assignee | ||
Comment 19•10 years ago
|
||
use taskcluster-client to make: createRemoteSignatureValidator
Attachment #8656963 -
Flags: review?(winter2718)
| Assignee | ||
Comment 20•10 years ago
|
||
@jhford,
Figured you could review this one... I want this rolled out for all components.
Including aws-provisioner, so you'll get to review some of this eventually :)
Attachment #8656965 -
Flags: review?(jhford)
| Assignee | ||
Comment 21•10 years ago
|
||
Same thing for queue...
Attachment #8656966 -
Flags: review?(jhford)
| Assignee | ||
Comment 22•10 years ago
|
||
Note to self: scale up number of auth dynos before deploying the PR for the queue :)
Updated•10 years ago
|
Attachment #8656963 -
Flags: review?(winter2718) → review+
| Assignee | ||
Comment 23•10 years ago
|
||
Similar to the other two... Should be quick...
Attachment #8657242 -
Flags: review?(jhford)
| Assignee | ||
Comment 24•10 years ago
|
||
Same as the others...
Attachment #8657248 -
Flags: review?(jhford)
| Assignee | ||
Comment 25•10 years ago
|
||
Same as others... enjoy :)
Attachment #8657253 -
Flags: review?(jhford)
| Assignee | ||
Comment 26•10 years ago
|
||
Similar to the others, just even simpler... Just hit the merge button...
Attachment #8657256 -
Flags: review?(jhford)
Updated•10 years ago
|
Attachment #8656965 -
Flags: review?(jhford) → review+
Updated•10 years ago
|
Attachment #8657242 -
Flags: review?(jhford) → review+
Updated•10 years ago
|
Attachment #8657248 -
Flags: review?(jhford) → review+
Updated•10 years ago
|
Attachment #8657253 -
Flags: review?(jhford) → review+
Updated•10 years ago
|
Attachment #8657256 -
Flags: review?(jhford) → review+
Updated•10 years ago
|
Attachment #8656966 -
Flags: review?(jhford) → review+
Comment 27•10 years ago
|
||
Is this transition complete?
| Assignee | ||
Comment 28•10 years ago
|
||
yes... Fixed a while ago... I hope. Otherwise we'll find out what crashes when we introduce the new API.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Updated•6 years ago
|
Component: Authentication → Services