Bug 617144 (Closed) · Opened 15 years ago · Closed 15 years ago

Need a new Socorro devdb in PHX

Categories

(mozilla.org Graveyard :: Server Operations, task)

Platform: All
OS: Other
Type: task
Priority: Not set
Severity: minor

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: laura, Assigned: jabba)

Details

When PostgreSQL moves to PHX, we won't easily be able to sync our devdb to prod anymore. We'll need a devdb in PHX, which will free up the one in MPT. If we need to order another blade for this, do it now. Getting this in place around the time of the migration, or not long after, should be fine.
I have no idea what the hardware requirements would be for this. Does it make sense to ship the current devdb down there? Would it make sense to repurpose one of the 10 processor boxes as a devdb? Can the devdb live on the same boxes as the prod db?
HW requirements are light. It would depend on what level of performance you're willing to tolerate on devdb. The real limitation is that (at least in the short term) you'd need access to 300GB of storage. DevDB cannot share boxes with ProdDB. However, that does bring up one point. The new setup will involve a read-only replication slave. For uses of devdb which are just running a read-only query against the breakpad data, you could use the read-only slave instead. So, what else is DevDB used for?
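To make the read-only-slave option concrete, here is a minimal sketch of a reporting query pointed at the slave instead of devdb. This is illustrative only: the hostname, credentials, and table/column names are hypothetical, and psycopg2 is assumed to be available.

    # Sketch: run a read-only breakpad query against the replication
    # slave rather than devdb. Host, user, and schema names below are
    # hypothetical, not taken from the actual Socorro setup.
    import psycopg2

    conn = psycopg2.connect(host="breakpad-ro-slave.phx.example.com",
                            dbname="breakpad", user="readonly")
    cur = conn.cursor()
    cur.execute("SELECT count(*) FROM reports "
                "WHERE date_processed > now() - interval '1 day'")
    print(cur.fetchone()[0])
    # Any INSERT/UPDATE/DDL here would fail -- the slave is read-only,
    # which is exactly why devdb and stagedb need read-write copies.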
Why does it need to be in Phoenix?
It needs to be in Phoenix so that we developers can have access to a full read/write snapshot of the production database. If our devdb was not co-located with the production db, it would take hours to get a snapshot.
(In reply to comment #4)
> It needs to be in Phoenix so that we developers can have access to a full
> read/write snapshot of the production database. If our devdb was not
> co-located with the production db, it would take hours to get a snapshot.

The latter is a function of transfer speed, and hours seems too long. What's the process for doing a full snapshot of the production Phoenix postgres db?

In San Jose, we snapshot the iSCSI block device, promote it to a volume, and mount it on the dev server. In Phoenix, it's local disk; I don't believe there is a process yet to snapshot the production db.

But assuming it's something like LVM snapshots, you then have to transfer the snapshot off to your devdb. Transferring 300GB would take about an hour, and it's an hour whether the destination is local or remote (and I know we can transfer remotely at fast speeds).
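As a sanity check on that hour estimate, here is the back-of-the-envelope arithmetic; the gigabit link speed is an assumption, since the bug only gives the 300GB figure:

    # Rough transfer-time estimate for the 300GB snapshot. The 1 Gb/s
    # link speed is an assumption, not a figure from this bug.
    size_bytes = 300e9
    link_bytes_per_sec = 1e9 / 8          # 1 Gb/s ~= 125 MB/s theoretical
    hours = size_bytes / link_bytes_per_sec / 3600
    print(f"{hours:.2f} hours at wire speed")  # ~0.67h; with protocol and
                                               # disk overhead, ~1 hour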
I will defer to jberkus on the details of how the transfer should be done. He found fault with our current method, which, I believe, involved replaying logs.
Lars,

Actually, it wasn't the general method which was problematic, it was the specific script used; effectively, we were replaying the logs in a risky way.

Mr.Z, all: I'm assuming that we would like to automatically and regularly update devdb and stagedb without manual ops intervention. I've given some thought to how to do this, and come up with a new method where it doesn't matter where the servers are located. Especially since, as MrZ points out, it's not that much faster to sync servers at PHX given network speeds (around 1 hour locally compared to 2.5 hours over the VPN internet link).

If we can assume that we still have space available on the EqualLogic, then the answer is to set up a second replication slave ... log-based only ... on a blade attached to the EqualLogic. This replication slave can then be SAN-cloned at any time to supply database images for stagedb and devdb. An extra machine ... or virtual machine ... *is* required for the replication, simply because devdb and stagedb need read-write databases and a replication slave is read-only. There are other alternatives, but they're more complicated.
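A minimal sketch of what the log-based slave's setup could look like, assuming PostgreSQL 8.x-style WAL shipping with pg_standby; the paths, trigger file, and choice of pg_standby are assumptions, not details confirmed in this bug:

    # Sketch: write a recovery.conf that makes this node a warm-standby
    # (log-shipping) slave, replaying WAL segments shipped from the PHX
    # master. All paths here are hypothetical.
    from pathlib import Path

    PGDATA = Path("/var/lib/pgsql/data")        # assumed data directory
    WAL_ARCHIVE = "/var/lib/pgsql/wal_archive"  # where shipped WAL lands
    TRIGGER = "/tmp/pgsql.trigger"              # touch to end recovery

    recovery_conf = ("# Warm standby: pg_standby waits for each shipped "
                     "WAL segment and replays it.\n"
                     f"restore_command = 'pg_standby -t {TRIGGER} "
                     f"{WAL_ARCHIVE} %f %p %r'\n")

    (PGDATA / "recovery.conf").write_text(recovery_conf)
    print("wrote", PGDATA / "recovery.conf")

Because the slave only replays logs, it stays read-only; that is why the plan still needs separate read-write copies cloned from it for devdb and stagedb.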
So, after discussing this during the weekly meeting, we have decided that after the migration to PHX is complete, we will set up the current devdb to be a read-only replication slave to the masters in PHX. The current prod DB will become the new devdb, and the existing method of snapshotting the SAN volume will remain intact, except reversed; we'll need Josh to help fix up the actual clone script to make it safer. A rough sketch of that reversed flow follows below.

I'm closing this bug out. This should be added to the end of the migration plan, but I don't seem to have write access to the spreadsheet.
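For illustration only, the reversed clone flow might look roughly like this. Every command, device name, and path is hypothetical, and the missing error handling and safety checks are exactly what Josh is being asked to add to the real clone script:

    # Sketch of the reversed clone flow: a SAN snapshot of the slave's
    # volume is promoted and mounted on devdb, then Postgres is started
    # read-write. Device names, paths, and service names are invented.
    import subprocess

    def run(cmd):
        print("+", " ".join(cmd))
        subprocess.check_call(cmd)

    run(["umount", "/pgdata"])  # drop the previous clone (a real script
                                # must handle the "not mounted" case)
    # EqualLogic snapshot/promote happens on the SAN side, outside this host.
    run(["iscsiadm", "-m", "node", "-T",
         "iqn.2001-05.com.equallogic:devdb-clone", "--login"])
    run(["mount", "/dev/mapper/devdb-clone", "/pgdata"])
    run(["rm", "-f", "/pgdata/data/recovery.conf"])  # clone comes up read-write
    run(["service", "postgresql", "start"])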
Assignee: server-ops → jdow
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard