Closed Bug 1292306 Opened 8 years ago Closed 6 years ago

Tune Linux virtual memory settings

Tracking

(Not tracked)

Status:

RESOLVED WONTFIX

People

(Reporter: gps, Unassigned)

References

(Blocks 1 open bug)

Details

Gregory Szorc [:gps]

Reporter

Description

8 years ago

I'm pretty sure TC workers are using the default Linux virtual memory (VM) subsystem settings. The default settings aim to strike a balance between performance and preventing data loss. Since TC workers are volatile, data loss isn't really a concern. So I argue we can aggressively tune the VM settings to achieve better performance.

The documentation for the vm subsystem settings for Linux 3.13 (currently used on the Ubuntu 14.04 workers) is at https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/Documentation/sysctl/vm.txt?h=v3.13.11.

I think the setting we should start to aggressively tune is vm.dirty_ratio. Its current value on the workers is 20.

Typically, I/O writes are buffered into memory and then flushed asynchronously by flusher threads managed by the kernel. So if you write 1 MB in a process, that goes to memory (read: fast) and doesn't hit disk until a flusher thread wakes up (either after a configured time interval or once the total amount of pending writes rises above a configured threshold). What the vm.dirty_ratio value defines is the percentage of system memory containing pending writes after which process writes start completing synchronously instead of asynchronously. So e.g. if you are performing lots of write I/O and the background flushing threads can't keep up, your application starts slowing down because new writes will synchronously wait for the flushing threads to catch up.

If we raise dirty_ratio, we'll minimize the chances that new writes have to wait on flushing. The risk of raising dirty_ratio is that under periods of sustained, heavy writes we will evict useful data from memory (such as the page cache and inode cache) and have to re-read it from disk.

I imagine a low dirty_ratio is impacting us during new worker/cache initialization. We have to download Docker images. We have to perform a clone and checkout of a Firefox repository. The Firefox clone alone needs, say, 2 GB of write I/O. So with a dirty_ratio of 20%, unless we have 10+ GB RAM, there's a chance we might be waiting for flushing.

There are other concerns with increasing dirty_ratio. Parts of Firefox use SQLite. And we have SQLite configured to fsync(). Unfortunately, fsync() on Linux is notoriously poorly implemented in terms of performance: it flushes *all* pending writes to disk. So if you have 4 GB of dirty pages pending and an fsync() occurs, you have to wait for those 4 GB of dirty pages to flush. Ouch. That being said, hopefully background flushing has already started. So unless you are doing fsync() along with gigabytes of writes, things shouldn't be too bad.
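The arithmetic behind the "10+ GB RAM" figure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`read_vm_setting`, `dirty_threshold_bytes`, `min_memory_for_burst`) are hypothetical, and the threshold is computed against total memory as a simplification, whereas the kernel documentation describes the ratio as applying to available (free plus reclaimable) memory, so the real limit is likely somewhat lower.

```python
from pathlib import Path

GIB = 1024 ** 3


def read_vm_setting(name, sysctl_root="/proc/sys/vm"):
    """Return the integer value of a vm sysctl file such as 'dirty_ratio'."""
    return int(Path(sysctl_root, name).read_text().strip())


def dirty_threshold_bytes(memory_bytes, dirty_ratio_percent):
    """Approximate amount of dirty data allowed before writers start blocking.

    Simplification: uses the memory size passed in, not the kernel's own
    notion of available memory.
    """
    return memory_bytes * dirty_ratio_percent // 100


def min_memory_for_burst(write_bytes, dirty_ratio_percent):
    """Smallest memory size whose threshold covers a write burst (rounded up)."""
    # Ceiling division using integer arithmetic only.
    return -(-write_bytes * 100 // dirty_ratio_percent)


if __name__ == "__main__":
    clone_bytes = 2 * GIB  # the "say 2 GB" Firefox clone estimate from the description
    needed = min_memory_for_burst(clone_bytes, 20)
    print(f"memory needed to buffer the clone at 20%: {needed / GIB:.1f} GiB")

    # Only attempt to read the live setting where the sysctl file exists (Linux).
    if Path("/proc/sys/vm/dirty_ratio").exists():
        print("current vm.dirty_ratio:", read_vm_setting("dirty_ratio"))
```

Passing a different `sysctl_root` makes it possible to try the reader against a copy of the settings rather than the live system.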

Mike Hommey [:glandium]

Comment 1

8 years ago

(In reply to Gregory Szorc [:gps] from comment #0)
> And we have SQLite configured to fsync(). Unfortunately, fsync() on
> Linux is notoriously poorly implemented in terms of performance: it flushes
> *all* pending writes to disk.

IIRC, none of that is true anymore.
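Whether fsync() on one file has to wait for unrelated dirty data could be checked empirically on a worker. The following is a rough sketch, not a rigorous benchmark: the function name `time_small_fsync` and the file names are hypothetical, and results will depend heavily on the filesystem, mount options, and kernel version.

```python
import os
import time


def time_small_fsync(directory, background_bytes=0, chunk=1 << 20):
    """Write background_bytes to one file without syncing it, then time
    os.fsync() on a separate one-byte file in the same directory.

    Returns the elapsed fsync time in seconds.
    """
    big_path = os.path.join(directory, "background.bin")
    small_path = os.path.join(directory, "small.bin")
    block = b"\0" * chunk

    with open(big_path, "wb") as big, open(small_path, "wb") as small:
        remaining = background_bytes
        while remaining > 0:
            n = min(chunk, remaining)
            big.write(block[:n])
            remaining -= n
        big.flush()  # hand the data to the kernel page cache; no fsync here

        small.write(b"x")
        small.flush()

        start = time.perf_counter()
        os.fsync(small.fileno())  # the call being measured
        return time.perf_counter() - start
```

Comparing the result for `background_bytes=0` against a run with several gigabytes of background data, on the same filesystem the workers use, would give a first indication of whether the small fsync() is being held up by the other file's dirty pages.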

Dustin J. Mitchell [:dustin] (he/him)

Comment 2

7 years ago

I suspect this has been superseded by other worker-efficiency bugs.

Status: NEW → RESOLVED

Closed: 7 years ago

Resolution: --- → INCOMPLETE

Gregory Szorc [:gps]

Reporter

Comment 3

7 years ago

Actually, this is still an open issue. I started flirting with the filesystem mounting side of this in https://github.com/taskcluster/docker-worker/pull/346. But there is still a ton of tuning of the vm subsystem settings to be done.

Blocks: fastci

Status: RESOLVED → REOPENED

Resolution: INCOMPLETE → ---

Pete Moore [:pmoore][:pete]

Updated

7 years ago

Component: Worker → Docker-Worker

QA Contact: pmoore

Chris Cooper [:coop] (he/him)

Comment 4

6 years ago

No point in doing this on AWS, but we should look at this from the outset in GCP.

Status: REOPENED → RESOLVED

Closed: 7 years ago → 6 years ago

Resolution: --- → WONTFIX

Nobody; OK to take it and work on it

Assignee

Updated

6 years ago

Component: Docker-Worker → Workers

Categories

(Taskcluster :: Workers, defect)

Security

(public)
