Closed Bug 442140 Opened 16 years ago Closed 16 years ago

Read-only access to production db

Categories

(mozilla.org Graveyard :: Server Operations, task)

All
Other
task
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: fligtar, Assigned: xb95)

Details

(Whiteboard: needs VM)

Can we get a user with read-only access to production AMO only accessible from khan?

It would help greatly in debugging all of the site issues we're seeing.
What will you find out from the production database that you can't find out from another database running a production dump?
Basil is trying to debug issues changing a nickname that isn't saving. We are trying to determine whether the problem has to do with cache. That's the current example, but there have been many times when we've needed to debug something or help an add-on developer debug something that we don't want to wait 2 days before we can query the database.
Production no, but probably to the slave database that we use to do backups.

Mark - is that possible?  Any implications to be aware of?  Allow khan to only access mrdb-backup01:amo ?
Assignee: server-ops → mark
mrdb-backup01 does not listen on any ports for security.  It only accepts connections from localhost, so people have to SSH in.  I'm not very comfortable giving someone full access to the backup system.

That also won't help them with the issues they've been asking oremj and I to look at, anyway.  They've been having us look at the master and all of the slaves to account for replication issues, and just looking at the backup server won't allow them to get that particular viewpoint.

Also, the backup server is usually behind by an hour or two depending on the time of day.  It's not really suitable to this kind of debugging.
Let's delineate the requirements here:

* be able to perform read only queries against the master and all slaves
* be able to see the replication delay of the slaves
* realtime access to the above, the nightly backup is not good enough

Is this accurate?
Status: NEW → ASSIGNED
The only one I'm really asking for is the third requirement. The first two aren't necessary.
The third one says 'to the above', so it implies the first two.  I'm so confused now.  What do you need?  Access to a single slave?  To the master?  And what good is that if you can't see the replication lag?
I just want real-time access to the data for debugging site issues. Last night we were trying to see if the slaves were out of sync, but that's a rare case. Usually we just want to be able to look up something about an add-on that was just submitted that is having problems or get stats more quickly.

The real-time access can be to a slave - the tiny delay between master and slave isn't significant for most issues we come across.
this isn't something we can do - dont want to give devs access to production dbs.  please work with mark for a better way to either debug together or get non-production access.
Devs write the scripts and commands that run on production DBs. How is this going to hurt anything if it's read-only access?
For this problem, I don't know why it's blocked on getting access to the prod DB.  If we can't reproduce it in dev when we have a memcache install running, I don't know what prod debugging is going to change.  For stuff like this if we had tests to verify the save/edit action we'd have caught this before it was deployed.  So it seems to me that we're tackling it backwards and should work harder on it in a dev environment and then write tests for it.

If this save bug is a priority item, we can put more effort towards debugging it on khan?  Have you asked anybody else to take a look at it?
Where's the bug for the actual issue?  Can you cc: me?
I will add, though, that if it were possible to just set up a central server that had read-only access to replicated prod databases, it'd save us the time of re-importing everything from daily dumps and also let us debug more-or-less real-time.

So maybe fligtar is asking for that, more or less.  There's a difference between that and logging into the _actual_ production master database.
Not really.  Production requires MySQL to have credentials on the master.  If I setup a replicated slave and gave people access, you'd have de facto access to the master.  (cat /var/lib/mysql/master.info, mysql -hmrdb03 -uWHAT -pWHAT addons_remora, done.)

We'd also have to setup the iptables to allow the TCP between these two.  Which then is a separate security issue, too.

There is no way to do replication without this sort of issue cropping up.  Sure, we can make the account used read only, but still.  You'd have access to the master if we setup a replication slave.
To be clear, I'm all for solving this problem.  I just don't see a way that satisfies the requirements of "no access to production" and "real-time data" within the confines of our existing setup.  Open to ideas!
We could integrate phpMyAdmin into AMO.
phpmyadmin with a read-only account behind ldap or even vpn+ldap on AMO would be great.
I think phpMyAdmin is somewhat overkill, but I don't see why I can't make some sort of page behind VPN+LDAP that would let people run queries.  It solves the access problems.

Is that really the best solution?  It still rubs me somewhat the wrong way...
No one has replied to my comment #10 - what specific issues is IT concerned about with read-only queries?
Mostly, it's a policy issue - production is IT only.

As to what issues I'd be concerned with... I'm sure we've all written queries that were so bad the database took 20 minutes to finish them.  If you do that on a production database, you might have just taken down the site.  That'd be the main issue I'd worry about in this particular situation.

(Yes, I'm aware that with the size of the tables in question, and the kinds of queries you've talked about, this is improbable.  Just mentioning it.)
Another example where we really need this capability is in the editor's queue. The remora refresh is a day old and doing day-old queries creates misleading data.

Ping'ing to see if we can get this moving please.
Seems like you're missing key info on the editor's queue that should be a page in the control panel.  If there is data that is important, why not make a page for it?
so again, no to production access.  we'll need to figure out another way to work together to get you the data you need or come up with a slave/dump solution.  mark or fligtar - can you guys get together and come up with another solution?  
If we had a dedicated slave replicating from the production database(s), and only have MySQL read only user access (avoiding Mark's problem in 14, don't grant shell access), does that solve the problem?

QA Contact: justin → mrz
That solves most of the issues to me.  Caveat to that idea is that we'd have to allocate hardware and that MySQL doesn't support multiple masters.  So, if you want to run a setup like this, you can only replicate from one of the clusters.  (Unless we spend a while making a really convoluted setup with multiple MySQL instances running on one box...)

Do you need access to more than one cluster or AMO only?
(In reply to comment #25)
> Do you need access to more than one cluster or AMO only?
Why stop at third base? 

Laura's suggestion was what I had intended all along; I guess I didn't articulate it well enough.

I'm only requesting access to AMO in this bug but if webdev wants others (what I took comment #26 to mean) then sure. Although it would be nice to have AMO setup first and then work on adding the others later rather than delay everything if it's going to be a big project.
Whatever works for fligtar.
Actually - Mark - we don't need this for anything but AMO.  So don't worry about setting it up for other apps.  In all other cases the db dumps are fine.  We'll let you know if that changes, though.  Thanks for your help.
Whiteboard: needs VM
mrz: any objections to me building one blade to replicate AMO for webdev?

This might be doable on a VM, too, if I configure MySQL to not fsync every transaction.  If you think it'll work, I'll try that first.
Short on hardware and I don't think we care about performance on this, do we?  Start with a VM and if it's unworkable we'll look at a physical machine.
Okay.  tm-amo01-webdev01 is now setup and replicating, access has been granted to people logging in from khan, using the username/password.  You can access the account like this:

$ mysql -htm-amo01-webdev01 -uwebdev -p addons_remora

Then type the password.  (Which has been separately emailed to morgamic and fligtar, they can distribute it as appropriate.)

A few notes:

* This machine is a VM, it may not have the performance necessary to keep up with peak load times.

* We are monitoring it as if it were a production server, so we should know if that is the case.  (It will page if replication falls behind or stops.)

* The 'webdev' account is read-only, it can only do SELECT queries against the addons_remora database.  You shouldn't need to do anything else.

* There isn't a lot of RAM on the VM.  512M.  This means that MySQL is not going to be able to fully cache all of the indexes, so please expect any crazy queries to be a little rough.

* Given the access setup, there is no way for webdev to presently see how far behind replication is.  If you absolutely need this functionality, let me know and I'll set something up.  (But hopefully you'll notice if it's really behind because it will start paging in #sysadmins...)

* InnoDB is tuned to be easy on the disks.  This means that if the VM crashes then it's likely the database will become corrupted and have to be re-initialized.  This is just mostly an FYI, I guess. 

I'm going to resolve this bug.  If there are problems with this setup, if it doesn't work, or something else, please reopen and let me know.  If it ends up that the VM is unable to cope with the load, we will look into getting a hardware solution, or something else.  :)
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
I'm a little late due to being OOTO, but thanks for setting this up!
Status: RESOLVED → VERIFIED
Product: mozilla.org → mozilla.org Graveyard
You need to log in before you can comment on or make changes to this bug.