The strange failure of the drive in master01 brought to light a failure mode we'd like to avoid:
* Secondary database goes offline or is unavailable
* Our master database fails and we need to fail over, or restore from a backup

We would like a third system that is the same as master01/master02 for failover. Because we've made RO access to the secondary system available, people are building applications that connect only to it for data. We need a third system running as a replica online at all times supporting the RO zeus node.
6 years ago
Summary: Additional master server for Socorro databases → Additional master server for Socorro Postgres databases
6 years ago
Is the third replica system you're describing the one in https://bugzilla.mozilla.org/show_bug.cgi?id=813317? Does that count as n+1, or do we need another one?
(In reply to Sheeri Cabral [:sheeri] from comment #1)
> Is this third system running as a replica this?
> https://bugzilla.mozilla.org/show_bug.cgi?id=813317
>
> Does that count as n+1 or do we need another one?

We need another one. The Reporting replica is going to be used for a different purpose (long-running experimental queries) and will have a different configuration.
Awesome! Wasn't sure if it could also be used as read-only. Sounds like no. I'm all for n+1 configuration! cc'ing Corey for hardware ordering.
(In reply to Sheeri Cabral [:sheeri] from comment #3)
> Awesome! Wasn't sure if it could also be used as read-only. Sounds like no.

It can be used as an RO, but the query timeout will be set to 5 min. The query timeout on the reporting replica is going to be much longer - possibly an hour or more.

> I'm all for n+1 configuration!
>
> cc'ing Corey for hardware ordering.

Woot.
* Blade server shipped via FedEx 294466306093837, ETA 2/18
* Storage blades shipped via FedEx 812085414085796, ETA 2/15
* Server hard drives (2) shipped via FedEx 812085414085864, ETA 2/15
* Storage blade hard drives (12) shipped via FedEx 294466306090096, ETA 2/19
Removing Rich - the server is now racked, so it's ready for SRE kickstarting and then puppetizing. Matt - I'm leaving this in your hands.
This is up and running as socorro3.db.phx1 and is in Puppet.
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Product: mozilla.org → Data & BI Services Team