Closed Bug 822685 Opened 11 years ago Closed 10 years ago

WAL is being written to the root disk on master02 - needs to be on a separate partition

Categories

(Data & BI Services Team :: DB: MySQL, task, P3)

x86_64
Linux

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: selenamarie, Assigned: mpressman)

Details

(Whiteboard: [2013q4] November)

We should not be writing WAL to the root partition. 

It appears that in the current worst case we generate about 4.5 GB of WAL per hour, all of which we need to store in order to keep up with the write load on the master. So, for planning purposes, let's say we had a 24-hour maintenance window: we'd need about 108 GB to store all the WAL.

There is a /wal partition, but it is only 50 GB, meaning we could probably last only about 11 hours at that rate.
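
The sizing arithmetic above can be sketched as follows. This is only a back-of-the-envelope helper: the 4.5 GB/hour rate and the 50 GB partition size come from this bug, while the function and constant names are made up for illustration.

```python
# Back-of-the-envelope WAL sizing using the figures quoted in this bug.
# Names below are hypothetical; only the two numbers come from the bug.
WAL_RATE_GB_PER_HOUR = 4.5   # worst-case WAL generation on the master
WAL_PARTITION_GB = 50        # current size of the /wal partition


def wal_needed_gb(window_hours, rate_gb_per_hour=WAL_RATE_GB_PER_HOUR):
    """Space needed to keep all WAL generated during a window of this length."""
    return window_hours * rate_gb_per_hour


def hours_until_full(partition_gb, rate_gb_per_hour=WAL_RATE_GB_PER_HOUR):
    """How long an initially empty partition lasts at the given WAL rate."""
    return partition_gb / rate_gb_per_hour


if __name__ == "__main__":
    print(f"24 h window needs {wal_needed_gb(24):.0f} GB")
    print(f"{WAL_PARTITION_GB} GB lasts {hours_until_full(WAL_PARTITION_GB):.1f} h")
```

With these inputs the helper gives 108 GB for a 24-hour window and a little over 11 hours for a 50 GB partition, assuming the partition starts empty and the worst-case rate holds throughout.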

What are the options for increasing the size of that partition?
Blocks: 823507
We have 1.8T available on /pgdata. What about creating /pgdata/wal and pointing to it?
(In reply to Matt Pressman [:mpressman] from comment #1)
> We have 1.8T available on /pgdata. What about creating /pgdata/wal and
> pointing to it?

My hope is that we can create a separate partition and that it not be on the pgdata volume.

The reasoning is: 

WAL is a copy of data that's also written to $PGDATA/base. So essentially, we have two copies of the database data floating around. We improve our resiliency if we can write one copy to one partition, and the other copy to another. If the partitions are on separate devices, we also improve the throughput capacity on our $PGDATA/base volume, because we've moved those writes to a separate volume. 

Keeping the partition distinct from the root filesystem means that if the WAL fills its disk, we (potentially) don't take the operating system down with it.
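
A minimal sketch of how one might check the layout described above, i.e. that the WAL directory shares a filesystem with neither the root filesystem nor the data directory. The function names are hypothetical, and the paths in the example call are assumptions based on this bug. Note that comparing `st_dev` only distinguishes mounted filesystems; two partitions on the same physical disk would still pass, so this says nothing about the throughput argument.

```python
import os


def same_filesystem(path_a, path_b):
    """True if both paths live on the same mounted filesystem (same st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def check_wal_layout(wal_dir, data_dir, root_dir="/"):
    """Return a list of problems with where the WAL directory lives."""
    problems = []
    if same_filesystem(wal_dir, root_dir):
        problems.append(f"{wal_dir} shares a filesystem with {root_dir}")
    if same_filesystem(wal_dir, data_dir):
        problems.append(f"{wal_dir} shares a filesystem with {data_dir}")
    return problems


# Example call (paths are assumptions based on this bug, not verified):
#   check_wal_layout("/wal", "/pgdata")
```

An empty list means the WAL directory is on its own filesystem; anything else names the filesystem it is sharing.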
Assignee: server-ops-database → scabral
Whiteboard: [2013q3]
This is a lot of work, so I'm bumping it to remind the folks involved that this should happen this quarter.
Priority: -- → P3
Whiteboard: [2013q3] → [2013q3] September
This can't happen until we fail over master02 so that it's out of the load balancer, so I'm moving it to next month.
Whiteboard: [2013q3] September → [2013q4] October
Whiteboard: [2013q4] October → [2013q4] November
Blocks: 925033
Assignee: scabral → mpressman
Changing puppet to use /wal rather than /var/lib/pgsql/wal_archive.
I'm trying to get a sysadmin to increase the size of the /wal partition from its current 50 GB.
I was able to get help from rbryce to schedule time to resize the partition.
We will expand the /wal partition on Wednesday, December 18th.
Expansion to commence.
The host needs to be taken down to single-user mode in order to resize the fs.
Blocks: 952083
The host is rebuilt and back online. The final thing to do is to re-re-point to /wal.
Set wal_dir to /wal/9.2 in puppet. Committed revision 79975.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Blocks: 941948
Product: mozilla.org → Data & BI Services Team