Closed Bug 1429555 Opened 6 years ago Closed 6 years ago

[ops infra socorro] set up roles in new databases

Categories

(Socorro :: Infra, task, P1)

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: willkg, Assigned: miles)

References

Details

We have a script for setting up most db things including populating lookup tables with environment-appropriate data.

What we don't currently have is a script that sets up roles.

This bug covers figuring out what roles we want and writing a script to generate those. Plus we might need to change the create and data scripts accordingly.
I'm going to grab this and do it soon.
Assignee: nobody → willkg
Status: NEW → ASSIGNED
Priority: -- → P1
Interestingly, 6 days ago, I deleted a sql/roles.sql script that was sitting around in the repo. That script dates back to 2013. You can see it here:

https://github.com/mozilla-services/socorro/blob/4f332b6dbbbfb4347511a107c05563dcca2cd952/sql/roles.sql

That script defines a handful of roles: "analyst", "breakpad", "breakpad_ro", "breakpad_rw", "breakpad_metrics", "processor", "monitor", "monitoring", "nagiosdaemon", "ganglia", "replicator", "django", "test", and "socorro".

The -prod database has the following roles defined: "analyst" and "breakpad_rw".

The setupdb_app sets up "analyst", "breakpad", "breakpad_ro", "breakpad_rw", and "monitor" roles of which only breakpad_ro and breakpd_rw can log in.

In the past, we used to use roles to restrict read/write permissions to data for a bunch of systems/people that interacted/looked-at the data via the db. As far as I understand it, we no longer support that. All data access is done through the webapp now.

Given all that, I think we just need to create the "breakpad_rw" and "analyst" roles. We can probably write up a bug for dropping the "analyst" role altogether, too. I'll write a script for this.

We should probably fix setupdb_app, but we can do that in a different bug.
I talked with Lonnen and Miles. Lonnen says we don't use the analyst role anymore, so we can nix that.

I think what we're going to do is Miles is going to create the breakpad_rw role for -stage-new and -prod-new, then run the scripts. When the breakpad_create.sql script runs, it'll output ERROR lines for anything to do with the analyst role--that's ok.

This way we end up with a role for all the services to use to access the database that doesn't have superuser access to the database.

Given that, I'm going to pass this to Miles.
Assignee: willkg → miles
I've created breakpad_rw roles in both -new-stage and -new-prod and have switched all instance types to use them. It seems like things are going fine in -new-stage based on the error rate, but there are still a lot of database permissions errors in -new-prod Sentry.

Given that this bug covers -new-stage explicitly, and those roles are configured and appear to be working, I'm marking this fixed.
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Reopening this because we're having problems with roles.

Per conversation yesterday, Miles is going to look into whether the superuser role that AWS RDS gives us is watered down in ways that are breaking our create scripts.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Tagging Miles with a needsinfo for figuring out superuser roles in rds per comment #5.
Flags: needinfo?(miles)
Yes, the rds_superuser role is watered down. Unlike the normal postgres superuser, it can't modify things it doesn't own. So, when we were running breakpad_create.sql as the socorro role (which is an rds_superuser), we created tables and relations and such and then changed their ownership to breakpad_rw - and later attempts to modify those failed with permissions errors.

So, a quick summary based on some hacking that osmose and I did today in -new-prod:

We skirted this problem by moving all transactions that set the ownership of something to breakpad_rw to the very end of the script.

We were then able to run breakpad_data.sql as breakpad_rw without issues.

We then manually ran create_weekly_tables.py [0] to create the relations that the processor had been complaining about missing [1].

End result: the crontabber appears to be happy in -new-prod.

[0] https://github.com/mozilla-services/socorro/blob/master/docker/create_weekly_tables.py
[1] https://sentry.prod.mozaws.net/operations/socorro-new-prod/issues/1327525/
Flags: needinfo?(miles)
On IRC, Miles said the following:

1. For projects, ops uses the superuser account to create a r/w account and then uses that to generate tables and other database structures.

2. For Socorro, we'll mirror that by using the superuser account to create a r/w account and then use that to run the breakpad_create.sql and breakpad_data.sql scripts.


Given that, I'll adjust the pg_dump scripts accordingly and generate a new set of scripts. That should be pretty easy. Pretty sure it involves adding a couple more flags to the pg_dump calls.

Parallel to that conversation, Lonnen and I had a conversation about how much of the db is deprecated and we're removing a lot of it in the near term and we might want to just clone it instead of trying to build it from scratch which has taken us many iterations and a lot of time. We're probably better off spending the dev time on pruning the db so there's less to migrate and not on creating and debugging db setup but instead pruning what's in the db.

We've talked about that before, but right now I think that's how I want to proceed. There are a couple of *BIG* tables that we could get rid of and reduce the db size significantly. I posit we could do a little work and be at the point where cloning the db is a much better option.
I fixed up the scripts and tweaked them a bit to make it clearer what happens. The updated scripts for the -new-stage environment are on the -stage admin node in the ~/willkg/ directory.

Miles: You were thinking you'd drop the RDS instance altogether for -new-stage and start over. I like that idea. Then you'd do something like this:

1. Edit the 0_breakpad_roles.sql file fixing the password.
2. Use the RDS super user account to run:

   2.1. 0_breakpad_roles.sql -- creates the breakpad_rw role
   2.2. 1_breakpad_db.sql -- creates the breakpad db

3. Use the breakpad_rw account to run:

   3.2. 2_breakpad_create.sql -- creates db things
   3.3. 3_breakpad_data.sql -- inserts all the data

I'm pretty sure that should work. Let me know how it goes. If you want, we can vidyo and do it together (this seems like a really good idea).

Once we've got -new-stage working, I'll update the scripts on the -prod admin node.
Flags: needinfo?(miles)
We re-created the database in -new-stage from scratch today.

After some iteration, Will and I decided on the following procedure:

1. Edit the 0_breakpad_roles.sql file fixing the password.
2. Use the RDS super user account to run:

   2.1. 0_breakpad_roles.sql -- creates the breakpad_rw role
   2.2. 1_breakpad_db.sql -- creates the breakpad db
   2.3. 2_breakpad_create.sql -- creates db things
   2.4. 3_breakpad_data.sql -- inserts all the data
   2.5. grant access on the database, tables, sequences, and functions to breakpad_rw

We ran into permissions issues, because even with all those grants, breakpad_rw needs to be the _owner_ of relations to do things with them.

I ran pg_dump again, the same as done to generate 2_breakpad_create.sql except for:
  - I removed the `--no-owner` option, so we would get change of ownership ALTERs
  - piped to `grep 'OWNER TO "breakpad_rw"'`
  - redirected output to 4_alter_ownership.sql

This (messy approach) generated a file only containing ownership changes, which we can run as step 2.6.

   2.6. run 4_alter_ownership.sql to change ownership of types/functions/tables in breakpad to breakpad_rw

After this, I ran /app/docker/create_weekly_tables.py in an app container with breakpad_rw credentials to generate relevant tables for the processor. Hence:

3. Use the breakpad_rw account to run:

   3.1. create_weekly_tables.py to create required weekly tables

So, my amended procedure:

1. Edit the 0_breakpad_roles.sql file fixing the password.
2. Use the RDS super user account to run:

   2.1. 0_breakpad_roles.sql -- creates the breakpad_rw role
   2.2. 1_breakpad_db.sql -- creates the breakpad db
   2.3. 2_breakpad_create.sql -- creates db things
   2.4. 3_breakpad_data.sql -- inserts all the data
   2.5. grant access on the database, tables, sequences, and functions to breakpad_rw
   2.6. run 4_alter_ownership.sql to change ownership from socorro to breakpad_rw

3. Use the breakpad_rw account to run:

   3.1. create_weekly_tables.py to create required weekly tables
Flags: needinfo?(miles)
Oh, and my modified version of pgdumpme.sh and related files is in /home/centos/miles on the stage admin node.
I'm pretty sure we're abandoning this approach to creating the -new-stage and -new-prod databases and we're going to go with dump-and-restore.

Given that, I'm going to close this out as WONTFIX.
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.