How is recovery time impacted by compression of Socorro base backups?

RESOLVED FIXED

Status

Data & BI Services Team
DB: MySQL
5 years ago
4 years ago

People

(Reporter: selenamarie, Assigned: sheeri)

Details

Our current estimate for recovering from a complete database loss using a base backup is about 2.5 hours.

Do you have an estimate of how much longer a recovery will take from a compressed base backup?
(Assignee)

Comment 1

5 years ago
Can you break down that timing? Specifically, is the starting point when the backup is already on the machine, or does the estimate include the time to copy the backup to the machine where it's being restored? Copying <250 GB takes a lot less time than copying 1.2 TB, so there's a time savings there.
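To put rough numbers on the copy-time point, here is a minimal sketch that converts a backup size into transfer hours. The 100 MB/s sustained throughput is purely an assumption for illustration and was not measured on the Socorro systems; the function name `transfer_hours` is likewise hypothetical. Only the two sizes come from this comment.

```python
def transfer_hours(size_gb: float, throughput_mb_per_s: float) -> float:
    """Hours needed to copy size_gb gigabytes at a sustained throughput."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = (size_gb * 1000) / throughput_mb_per_s  # 1 GB = 1000 MB
    return seconds / 3600


# ASSUMPTION: sustained network/disk throughput; replace with a measured value.
ASSUMED_MB_PER_S = 100.0

sizes_gb = {
    "compressed (<250 GB, upper bound)": 250.0,
    "uncompressed (1.2 TB)": 1200.0,
}

for label, size_gb in sizes_gb.items():
    hours = transfer_hours(size_gb, ASSUMED_MB_PER_S)
    print(f"{label}: {hours:.2f} h to copy")
```

Whatever the real throughput is, the ratio between the two copy times stays the same (roughly 1 to 4.8), so the copy savings should be weighed against the extra decompression time.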

Will you want the estimates for a complete recovery for a compressed pg_dump too? (pg_dumps were always compressed)

What's your timeframe on getting this information? Since we clarified what was going on yesterday, I wasn't planning on testing another restore so soon.

Also, we plan to test in Q4 how well Data Domain stores the backups (base and pg_dump), because it can store uncompressed data very efficiently. So I'm not sure how much time you want us to spend on this right now, since in a few months compressed vs. uncompressed very likely won't be an issue.

Comment 2

5 years ago
The question is about the incremental time to decompress - it's a pretty large dataset, so I'm assuming it might add several hours. It doesn't seem like it would take long in human time to find out - just have to kick it off and see how long that takes, right?

It's really good to know this in the event of a failure, as it helps us plan.
(Assignee)

Comment 3

5 years ago
Decompressing + unarchiving (done with tar -zxvf, to get it to where the uncompressed backups are) took 3.2 hours on the backup machine (but the backup machine is not as powerful as the production machines).
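For anyone who wants to repeat the measurement on another host, the following is a rough sketch of timing the same decompress-and-unarchive step with Python's standard `tarfile` module instead of `tar -zxvf`. The function names are hypothetical, and the total assumes the decompression runs before the existing 2.5-hour restore with no overlap, which this bug does not confirm.

```python
import tarfile
import time
from pathlib import Path


def timed_extract(archive: Path, dest: Path) -> float:
    """Extract a gzip-compressed tar archive into dest; return elapsed seconds.

    Only use this on archives you trust (such as your own backups).
    """
    dest.mkdir(parents=True, exist_ok=True)
    start = time.monotonic()
    with tarfile.open(archive, mode="r:gz") as tar:
        tar.extractall(path=dest)
    return time.monotonic() - start


def total_recovery_hours(restore_hours: float, decompress_hours: float) -> float:
    """ASSUMPTION: the two phases run one after the other, with no overlap."""
    return restore_hours + decompress_hours


# Figures from this bug: 2.5 h restore estimate, 3.2 h measured decompression
# (measured on the slower backup machine, so likely an upper bound).
print(f"estimated total: {total_recovery_hours(2.5, 3.2):.1f} h")
```

Calling `timed_extract(Path("/path/to/base_backup.tar.gz"), Path("/path/to/restore_dir"))` (both paths are placeholders) on a production-class machine would show how much of the 3.2 hours is due to the backup machine's hardware.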

Also, please remember that we're now n+1 with the socorro systems, so in case we need to rebuild one machine, we have a spare one at the ready, which should cover many (but not all) disaster scenarios.
(Assignee)

Updated

5 years ago
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED

Comment 4

5 years ago
Thanks Sheeri!
(Assignee)

Updated

5 years ago
Assignee: server-ops-database → scabral
Product: mozilla.org → Data & BI Services Team