560588 - Private API for for registration actions that bind with LDAP root DN (create / delete user)

Assignee

Description

•

15 years ago

Using an async queue to create and delete users could help in two areas: 1) Performance, in case user creation is too slow to keep up with requests. 2) Security, preventing the use of the root DN or other over-powered DN on webheads to create / delete users. Beyond creation / deletion of users, a user DN with the proper permissions could bind with the LDAP server to perform self-modifications (eg. email, password, etc) without affecting other users.

Les Orchard [:lorchard]

Assignee

Comment 1

•

15 years ago

#1 is not expected to be a problem from initial load tests. Need to get the specific capacity numbers, though. #2 can be a concern, though is it enough of a concern to warrant adding a job queue to the infrastructure mix? For what it's worth, all of our web sites with authentication based on MySQL tables have the same potential issue - that is, the credentials to affect all users are present on webheads.

Les Orchard [:lorchard]

Assignee

Comment 2

•

15 years ago

Some notes on what's in progress, slow going because my LDAP is rusty: * Revise LDAP driver for auth to use downgraded privileges as appropriate; eg. all user manipulation done by binding as that user, simple lookups (ie. cluster node location) done as anonymous. May need some LDAP ACL tweakery. * Gearman for the work queue (http://gearman.org/). Job server can live on a secured machine with a single port opened to webheads. Workers that bind as root DN can live on same machine as job server, or any machine protected from webheads. A break-in on one of the webheads would allow access to the job server to create / delete users, but no modification or read of existing user records.

Chris Lyon [:clyon]

Updated

•

15 years ago

Flags: blocking-weave1.3?

Ed Lee :Mardak

Updated

•

15 years ago

Flags: blocking-weave1.3? → blocking-weave1.3+

Les Orchard [:lorchard]

Assignee

Comment 3

•

15 years ago

Had a quick call with clyon and mcoates about the security of this thing. Notes from the meeting: * The Gearman job server is a persistent daemon that will live on a machine with limited access from webheads, ie. just port 4370. * The connection from webheads to the job server should be secure to deter packet sniffing of jobs (which can contain user credentials) from elsewhere on the network. Since Gearman doesn't support SSL connections, can we establish secure port tunnels as part of our infrastructure (eg. using stunnel)? * A job worker is a persistent daemon that connects to the job server, and can live on the same protected machine as the job server. Webheads will have no access to job workers, which speak only to the job server. * Only job workers will have access to LDAP root DN credentials. * Account deletion jobs will require the user's password, which job workers will validate before executing account deletion. This should limit the arbitrary deletion of accounts via a compromised webhead, but requires a secure connection between webheads and job server to protect the credentials. * Account creation jobs from a compromised webhead seem no worse than creation requests at the HTTP API level. * All jobs will be logged per https://intranet.mozilla.org/Security/Users_and_Logs#Logging_Recommendations Let me know if there's anything I missed!

Status: NEW → ASSIGNED

Michael Coates [:mcoates] (acct no longer active)

Comment 4

•

15 years ago

Looks right to me.

matthew zeier [:mrz]

Comment 5

•

15 years ago

> * The connection from webheads to the job server should be secure to deter > packet sniffing of jobs (which can contain user credentials) from elsewhere on > the network. Since Gearman doesn't support SSL connections, can we establish > secure port tunnels as part of our infrastructure (eg. using stunnel)? Is that really necessary? That'd imply that we need to safe guard against physical access to the switch. If you can root the web server I'd bet you could insert yourself before the stunnel. Is there enough value in doing crypto across a trusted network (in a locked cabinet with security cameras)?

Les Orchard [:lorchard]

Assignee

Comment 6

•

15 years ago

(In reply to comment #5) > Is that really necessary? That'd imply that we need to safe guard against > physical access to the switch. > > If you can root the web server I'd bet you could insert yourself before the > stunnel. Is there enough value in doing crypto across a trusted network (in a > locked cabinet with security cameras)? I think the concern was if someone got access to another machine (webhead or not) on that network and might then sniff all traffic between all webheads and the job server. Is that even possible with a switch between machines? You might want to arm-wrestle with clyon and mcoates about that.

matthew zeier [:mrz]

Comment 7

•

15 years ago

Not possible or not easily possible on a switched network. Easy if you have physical access to the switch of course.

Chris Lyon [:clyon]

Comment 8

•

15 years ago

(In reply to comment #5) > Is that really necessary? That'd imply that we need to safe guard against > physical access to the switch. > The issue would be any host on this network could see the password, access to the switch isn't necessary, if they are on the network, they can see it. > If you can root the web server I'd bet you could insert yourself before the > stunnel. Is there enough value in doing crypto across a trusted network (in a > locked cabinet with security cameras)? It isn't a matter if you can root the web server, it is a matter of root anything which access to that network stream.

matthew zeier [:mrz]

Comment 9

•

15 years ago

Switches are more point-to-point that broadcast mediums so you'd really have to get the switch to forward unicast packets out your root'd host's switch port before you could really sniff it.

Les Orchard [:lorchard]

Assignee

Comment 10

•

15 years ago

Okay, looks like I've got a working patch that uses the registered user's credentials for self-modifying actions and passes off creation/deletion to a Gearman worker: http://hg.mozilla.org/users/lorchard_mozilla.com/weaveserver-registration-patches/file/2fc85d618697/ldap-queued It turned out simple than I expected, so I expect there to be something horribly wrong with it. One thing I noticed (maybe a question for telliott) is that cluster node assignment didn't happen at account creation, instead happening at the first attempt to get a node location. Since the HTTP request to get the cluster node location is unauthenticated, and I needed credentials to assign a node, I moved that into account creation. Does anyone know if that will break anything?

Toby Elliott [:telliott]

Comment 11

•

15 years ago

(In reply to comment #10) > One thing I noticed (maybe a question for telliott) is that cluster node > assignment didn't happen at account creation, instead happening at the first > attempt to get a node location. > > Since the HTTP request to get the cluster node location is unauthenticated, and > I needed credentials to assign a node, I moved that into account creation. > Does anyone know if that will break anything? That could be an issue, yes. We use the node assignment to throttle the maximum number of users who can register for a node at any particular time (since they're the ones uploading a ton of data the first time). Returning "no node" is a valid response that tells the client to try again in a while. I'm a little concerned with the KISS violation here. That's not a comment on Les' code, which looks like a good implementation from my first glance, but more a concern that we're creating complexity to solve problems we may not have, or may be easier to solve. As best I can tell, this implementation is being driven by two issues: (1) Concern over the ability of the LDAP server to handle the load under heavy usage. This concern seems to have been mitigated by Aravind's testing and tweaking, and his identification of a couple inefficiencies that were likely causing bottlenecks. (2) Concern over use of the LDAP master password for the account operations and the dangers of a compromise here. My question is - assuming that (1) is no longer an issue, is this the best approach to solving (2)? It may be, but I see a lot of moving parts in this flow, and I feel like I want to be reassured of this before we roll all the pieces into place.

Mike Connor [:mconnor]

Comment 12

•

15 years ago

(In reply to comment #10) > One thing I noticed (maybe a question for telliott) is that cluster node > assignment didn't happen at account creation, instead happening at the first > attempt to get a node location. To add to Toby's reply here, this is something we really need/want from the client/user experience side (and we explicitly asked for this around nine months ago). This separation means that, even if available storage nodes are melting/over capacity, we're still okay, and the client will understand what to do then. If we assign a node, we'll just hammer the storage nodes until they force backoff, and that's not a great user experience.

Les Orchard [:lorchard]

Assignee

Comment 13

•

15 years ago

(In reply to comment #12) > (In reply to comment #10) > > One thing I noticed (maybe a question for telliott) is that cluster node > > assignment didn't happen at account creation, instead happening at the first > > attempt to get a node location. > > To add to Toby's reply here, this is something we really need/want from the > client/user experience side (and we explicitly asked for this around nine > months ago). This separation means that, even if available storage nodes are > melting/over capacity, we're still okay, and the client will understand what to > do then. If we assign a node, we'll just hammer the storage nodes until they > force backoff, and that's not a great user experience. Modifying an attribute for a user (ie. node location) requires some credentials - either the user's own or the root DN. But, the request to fetch the node location is unauthenticated, so I'm left with the root DN. So, what I can probably do is hit Gearman with a synchronous job to assign a node location. I'll poke at that.

Les Orchard [:lorchard]

Assignee

Comment 14

•

15 years ago

(In reply to comment #11) > (In reply to comment #10) > > My question is - assuming that (1) is no longer an issue, is this the best > approach to solving (2)? It may be, but I see a lot of moving parts in this > flow, and I feel like I want to be reassured of this before we roll all the > pieces into place. As far as I understand, #1 doesn't sound like a huge issue at this point. As for #2, this does seem like a lot of moving parts. But, I can't think of an alternative to using the root DN to create / delete accounts. LDAP ACLs don't seem to offer granting of record creation and deletion without also modification and read rights over those same records. So, if the over-powered root DN needs protecting from a potential webhead compromise, then isolating it on a protected box that exposes a limited API is the best I can think of. Gearman seems a pretty simple way to do that, and it could come in handy in the future. There are other ways, but none that aren't about equivalent or worse in complexity (eg. a private internal HTTP service) What it boils down to is: * How likely is a webhead compromise versus the effort to maintain this infrastructure, and is it a good bargain? (A security vs IT question, I think.) * Can anyone think of a better approach? I'm fine with tossing this out and trying something else.

Toby Elliott [:telliott]

Comment 15

•

15 years ago

This appears to save us from one scenario: where attacker wants to modify an item in the account without the user knowing. As it currently stands, there's no particular use for this - there's no data worth modifying. If they want to create, delete, or gain control of accounts, it's still trivially easy. With root on the box, they have access to the password reset key db after which it's a quick hack to gain control of the account and do whatever you wanted with it. It means you've changed the password (since you don't know the original, you can't switch back) and people may complain, but until then, it's undetectable. Getting the passwords into an encrypted key store would save us from most stuff, but if the box gets rooted, we're pretty doomed regardless of our approach.

Les Orchard [:lorchard]

Assignee

Comment 16

•

15 years ago

(In reply to comment #15) > If they want to create, delete, or gain control of accounts, it's still > trivially easy. With root on the box, they have access to the password reset > key db after which it's a quick hack to gain control of the account and do > whatever you wanted with it. It means you've changed the password (since you > don't know the original, you can't switch back) and people may complain, but > until then, it's undetectable. > > Getting the passwords into an encrypted key store would save us from most > stuff, but if the box gets rooted, we're pretty doomed regardless of our > approach. We batted this around a bit on IRC, and it sounds like this could be improved by: 1) putting password reset codes and expiration times into LDAP, readable only by root DN and the user's own credentials. (requires an LDAP schema change) 2) pulling password reset code generation, emailing, and verification into Gearman. But... this is starting to feel like we're backing into the reinvention of a self-service LDAP wheel someone's got lying around somewhere. It's been years since I played this much with LDAP, so I feel like I'm missing something.

Toby Elliott [:telliott]

Comment 17

•

15 years ago

(In reply to comment #16) > But... this is starting to feel like we're backing into the reinvention of a > self-service LDAP wheel someone's got lying around somewhere. It's been years > since I played this much with LDAP, so I feel like I'm missing something. At this point, why wouldn't we just proxy all commands that require master ldap access and support that subset? It's slightly (slightly) less complicated, and probably almost as secure. The asynchronous approach was for performance reasons, and nobody seems to worried about that.

Ragavan S [:rags]

Comment 18

•

15 years ago

Not trying to pile on here, but this does seem like a *lot* of changes to a pretty key part of the service, one that is likely to be exercised a lot given we are turning on marketing to get more users. If the key issue here is protecting the credentials used to create/delete accounts, do we really need async + gearman for that?

Toby Elliott [:telliott]

Comment 19

•

15 years ago

Attached file Registration security model — Details

Attaching my proposed architecture diagram. Some of you have seen this already. It keeps things relatively simple while still allowing us to lock away the master credentials.

Les Orchard [:lorchard]

Assignee

Updated

•

15 years ago

Summary: Async queue for registration actions that bind with LDAP root DN (create / delete user) → Private API for for registration actions that bind with LDAP root DN (create / delete user)

Les Orchard [:lorchard]

Assignee

Comment 20

•

15 years ago

Okay, so I've got an initial stab as a new HTTP-based API for a private admin server. Haven't requested a labs HG repo yet, so I've just checked my progress so far into my own repos: http://hg.mozilla.org/users/lorchard_mozilla.com/weaveserver-registration-admin/ http://hg.mozilla.org/users/lorchard_mozilla.com/weaveserver-registration-admin/file/fe21e4fe9288/1.0/index.php http://hg.mozilla.org/users/lorchard_mozilla.com/weaveserver-registration-patches/file/b4f025756e60/admin-server This may all be moot pending a review of new LDAP ACLs, but I wanted to get this stuff out there.

Ed Lee :Mardak

Updated

•

15 years ago

Target Milestone: --- → 1.3

Les Orchard [:lorchard]

Assignee

Comment 21

•

15 years ago

Attached patch Integration of reg component with private API — Details — Splinter Review

Here's a patch that ties the reg API to the private admin account API from: http://hg.mozilla.org/labs/weaveserver-registration-secure/ It should work without the API (eg. using the mysql auth driver), if a base URL for it isn't configured.

Attachment #442431 - Flags: review?

Toby Elliott [:telliott]

Comment 22

•

15 years ago

Comment on attachment 442431 [details] [diff] [review] Integration of reg component with private API Looks good. Let's get this onto stage and test it.

Attachment #442431 - Flags: review? → review+

Les Orchard [:lorchard]

Assignee

Comment 23

•

15 years ago

Pushed to hg: https://hg.mozilla.org/labs/weaveserver-registration/rev/af5b9141346f

Les Orchard [:lorchard]

Assignee

Comment 24

•

15 years ago

Going to say this is done, since pushed to hg. Fire off more bugs if problems found

Status: ASSIGNED → RESOLVED

Closed: 15 years ago

Resolution: --- → FIXED

BMO Automation

Updated

•

2 years ago

Product: Cloud Services → Cloud Services Graveyard

Registration security model 15 years ago Toby Elliott [:telliott] 52.31 KB, application/pdf		Details
Integration of reg component with private API 15 years ago Les Orchard [:lorchard] 16.03 KB, patch	telliott : review+	Details \| Diff \| Splinter Review