Tune Linux virtual memory settings

REOPENED
Unassigned

Status

Taskcluster
Worker
REOPENED
2 years ago
2 months ago

People

(Reporter: gps, Unassigned)

Tracking

(Blocks: 1 bug)

Details

(Reporter)

Description

2 years ago
I'm pretty sure TC workers are using the default Linux virtual memory (VM) subsystem settings.

The default Linux virtual memory settings aim to strike a balance between performance and preventing data loss.

Since TC workers are volatile, data loss isn't really a concern. So I argue we can aggressively tune the VM settings to achieve better performance.

The documentation for the vm subsystem settings for Linux 3.13 (currently used on the Ubuntu 14.04 workers) is at https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/Documentation/sysctl/vm.txt?h=v3.13.11.

I think the setting we should start to aggressively tune is vm.dirty_ratio. Its current value on the workers is 20.
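
Before changing anything, it helps to record what a worker is actually running with. The following is a minimal sketch, assuming a Linux host that exposes these settings under /proc/sys/vm. The helper name `read_vm_settings` and the choice of which settings to list are illustrative, not part of any Taskcluster tooling.

```python
from pathlib import Path

# Writeback-related settings described in the kernel's vm.txt.
DIRTY_SETTINGS = (
    "dirty_ratio",
    "dirty_background_ratio",
    "dirty_expire_centisecs",
    "dirty_writeback_centisecs",
)


def read_vm_settings(names=DIRTY_SETTINGS, root="/proc/sys/vm"):
    """Return {name: int or None} for each sysctl file under root.

    A value of None means the file was missing or not an integer
    (for example, when run on a non-Linux host).
    """
    values = {}
    for name in names:
        try:
            values[name] = int((Path(root) / name).read_text().strip())
        except (OSError, ValueError):
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_vm_settings().items():
        print(f"vm.{name} = {value}")
```

Reading these files does not require root, so this could run as part of a task to log the worker's configuration.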

Typically, I/O writes are buffered in memory and then flushed asynchronously by flusher threads managed by the kernel. So if you write 1 MB in a process, that goes to memory (read: fast) and doesn't hit disk until a flusher thread wakes up (either after a configured time interval or once the amount of pending writes exceeds a configured threshold).

vm.dirty_ratio defines the percentage of system memory that may hold pending writes before process writes start completing synchronously instead of asynchronously. For example, if you are performing lots of write I/O and the background flusher threads can't keep up, your application starts slowing down because new writes wait synchronously for the flusher threads to catch up.

If we raise dirty_ratio, we'll minimize the chances that new writes have to wait on flushing. The risk of raising dirty_ratio is that during periods of sustained, heavy writes we will evict useful data from memory (such as the page cache and inode cache) and have to re-read it from disk.
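
For experimenting, the setting can be changed at runtime by writing to its file under /proc/sys/vm. This is a rough sketch, assuming root privileges on a Linux host. The helper name `set_dirty_ratio` is hypothetical, and the value 80 in the usage note is only a placeholder, not a tested recommendation.

```python
from pathlib import Path


def set_dirty_ratio(percent, path="/proc/sys/vm/dirty_ratio"):
    """Write a new dirty_ratio and return the previous value.

    Needs root when pointed at the real /proc file. The change is not
    persistent: it is lost on reboot unless also set via sysctl config.
    """
    if not 0 <= percent <= 100:
        raise ValueError("dirty_ratio is a percentage between 0 and 100")
    target = Path(path)
    previous = int(target.read_text().strip())
    target.write_text(f"{percent}\n")
    return previous
```

Calling `set_dirty_ratio(80)` on a test worker and keeping the returned value makes it easy to restore the old setting after a measurement run.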

I imagine a low dirty_ratio is impacting us during new worker/cache initialization. We have to download Docker images. We have to perform a clone and checkout of a Firefox repository. The Firefox clone alone needs, say, 2 GB of write I/O. So with a dirty_ratio of 20%, unless we have 10+ GB of RAM, there's a chance we might be waiting for flushing.
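
The 10 GB figure is just the write burst divided by the ratio. A small sketch of that arithmetic follows, with the simplifying assumption that the ratio applies to total RAM. The kernel documentation describes it as a percentage of available (free plus reclaimable) memory, so the real limit is somewhat lower. The function names are illustrative.

```python
import math

GIB = 1024 ** 3


def dirty_limit_bytes(memory_bytes, dirty_ratio_percent):
    """Approximate bytes of dirty pages allowed before writers block."""
    return memory_bytes * dirty_ratio_percent // 100


def min_memory_for_burst(burst_bytes, dirty_ratio_percent):
    """Smallest memory size whose dirty limit can absorb the whole burst."""
    return math.ceil(burst_bytes * 100 / dirty_ratio_percent)


if __name__ == "__main__":
    # A 2 GiB clone at dirty_ratio=20 needs about 10 GiB of memory.
    print(min_memory_for_burst(2 * GIB, 20) / GIB)  # 10.0
```

This ignores background flushing, which starts earlier and drains dirty pages while the burst is still being written, so it is a worst-case estimate.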

There are other concerns with increasing dirty_ratio. Parts of Firefox use SQLite. And we have SQLite configured to fsync(). Unfortunately, fsync() on Linux is notoriously poorly implemented in terms of performance: it flushes *all* pending writes to disk. So if you have 4 GB of dirty pages pending and an fsync() occurs, you have to wait for those 4 GB of dirty pages to flush. Ouch. That being said, hopefully background flushing has already started. So unless you are doing fsync() along with gigabytes of writes, things shouldn't be too bad.
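
One way to see how much an fsync() costs on a given worker is to time it directly. The sketch below is an assumption-laden measurement helper, not an existing tool: `time_write_and_fsync` is a made-up name, and it only times fsync() on one file. Checking whether other files' dirty pages are also flushed would require running it while a second process writes heavily to the same filesystem.

```python
import os
import time


def time_write_and_fsync(path, size_bytes, chunk_bytes=1 << 20):
    """Write size_bytes of zeros to path, then fsync it.

    Returns (write_seconds, fsync_seconds). The write time covers
    buffered writes into the page cache; the fsync time covers waiting
    for the kernel to report the data as durable.
    """
    chunk = b"\0" * chunk_bytes
    remaining = size_bytes
    with open(path, "wb") as f:
        start = time.perf_counter()
        while remaining > 0:
            n = min(remaining, chunk_bytes)
            f.write(chunk[:n])
            remaining -= n
        f.flush()  # move Python's userspace buffer into the kernel
        write_s = time.perf_counter() - start

        start = time.perf_counter()
        os.fsync(f.fileno())  # block until the kernel has flushed the file
        fsync_s = time.perf_counter() - start
    return write_s, fsync_s
```

Comparing the two timings at different dirty_ratio values, with and without concurrent writers, would show whether fsync() latency actually grows with the amount of unrelated dirty data.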
(In reply to Gregory Szorc [:gps] from comment #0)
> And we have SQLite configured to fsync(). Unfortunately, fsync() on
> Linux is notoriously poorly implemented in terms of performance: it flushes
> *all* pending writes to disk.

IIRC, none of that is true anymore.
I suspect this has been superseded by other worker-efficiency bugs.
Status: NEW → RESOLVED
Last Resolved: 2 months ago
Resolution: --- → INCOMPLETE
(Reporter)

Comment 3

2 months ago
Actually, this is still an open issue. I started flirting with the filesystem mounting side of this in https://github.com/taskcluster/docker-worker/pull/346. But there is still a ton of tuning of the vm subsystem settings to be done.
Blocks: 1271162
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---