Additional master server for Socorro Postgres databases

RESOLVED FIXED

Status

Data & BI Services Team
DB: MySQL
5 years ago
3 years ago

People

(Reporter: selenamarie, Assigned: mpressman)

Details

The strange failure of the drive in master01 brought to light a failure scenario we'd like to avoid:

* The secondary database goes offline or is unavailable
* The master database then fails, and we need to fail over or restore from a backup

We would like a third system that is the same as master01/master02 for failover. 

Because we've made RO access to the secondary system available, people are building applications that connect only to it for data. We need a third system online at all times, running as a replica and supporting the RO zeus node.
Summary: Additional master server for Socorro databases → Additional master server for Socorro Postgres databases
Blocks: 823507
Is this third system running as a replica this? https://bugzilla.mozilla.org/show_bug.cgi?id=813317

Does that count as n+1 or do we need another one?
(In reply to Sheeri Cabral [:sheeri] from comment #1)
> Is this third system running as a replica this?
> https://bugzilla.mozilla.org/show_bug.cgi?id=813317
> 
> Does that count as n+1 or do we need another one?

We need another one. The Reporting replica is going to be used for a different purpose, long-running experimental queries, and will have a different configuration.
Awesome! Wasn't sure if it could also be used as read-only. Sounds like no.

I'm all for n+1 configuration!

cc'ing Corey for hardware ordering.
(In reply to Sheeri Cabral [:sheeri] from comment #3)
> Awesome! Wasn't sure if it could also be used as read-only. Sounds like no.

It can be used as an RO, but the query timeout will be set to 5 minutes. The query timeout on the reporting replica is going to be much longer, possibly an hour or more.

> I'm all for n+1 configuration!
> 
> cc'ing Corey for hardware ordering.

Woot.

Comment 5

5 years ago
Blade server shipped via FedEx 294466306093837 ETA 2/18
Storage blades shipped via FedEx 812085414085796 ETA 2/15
Server hard drives (2) shipped via FedEx 812085414085864 ETA 2/15
Storage blade hard drives (12) shipped via FedEx 294466306090096 ETA 2/19
Depends on: 844861
Removing Rich; the server is now racked, so it's ready for SRE kickstarting and then puppetizing. Matt, I'm leaving this in your hands.
(Assignee)

Updated

5 years ago
Depends on: 845600
Assignee: server-ops-database → mpressman
(Assignee)

Comment 7

5 years ago
This is up and running as socorro3.db.phx1 and is in Puppet.
(Assignee)

Updated

5 years ago
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Product: mozilla.org → Data & BI Services Team