Now that we've brought the Windows machines online, we're handling many more machines. We need to modify the allocated resources and the Nagios check to accommodate this.

1) The papertrail config file should set MaxSessions high enough that a server can handle the full load of machines (~1500) if we lose one of the log aggregators:

    module(load="imtcp" MaxSessions="2000" KeepAlive="on")

2) The number of open files the process can handle needs to be the number of ports + 1000, which means modifying /etc/security/limits.conf:

    root soft nofile 4096
    root hard nofile 8192

3) The Nagios check thresholds should be raised to 1000 sessions for a warning (with round robin, this is significantly more than half the load) and 1600 for critical (at that point something has clearly gone wrong, since that's more than the total number of clients).
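For illustration, the proposed thresholds could be expressed as below. This is only a sketch, not the actual check from the patch: the function name `classify_sessions` is hypothetical, and treating the thresholds as inclusive (>=) is an assumption. The return values follow the standard Nagios plugin exit codes.

```python
# Sketch of the proposed thresholds; not the real check from the patch.
# Nagios plugin exit codes: 0 = OK, 1 = WARNING, 2 = CRITICAL.
OK, WARNING, CRITICAL = 0, 1, 2

WARN_SESSIONS = 1000  # well above half of the ~1500 clients
CRIT_SESSIONS = 1600  # more than the total number of clients


def classify_sessions(open_sessions, warn=WARN_SESSIONS, crit=CRIT_SESSIONS):
    """Map an open-session count to a Nagios status code.

    Assumption: a count equal to a threshold already triggers that status.
    """
    if open_sessions >= crit:
        return CRITICAL
    if open_sessions >= warn:
        return WARNING
    return OK
```

With these numbers, a server carrying roughly half of the clients stays OK, and a server that has absorbed the whole load after a failover reports WARNING rather than CRITICAL.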
Created attachment 8613536: log-aggregator-resources.diff

This bumps up the limits in AWS, too, but that shouldn't be an issue.
Attachment #8613536 - Flags: review?(dustin)
Attachment #8613536 - Flags: review?(dustin) → review+
For whatever reason, the modifications to /etc/security/limits.conf don't seem to be taking effect.
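One way to check what the running daemon actually got is to read its "Max open files" row from /proc/<pid>/limits on Linux. The sketch below parses that file; the helper names are hypothetical. Note that /etc/security/limits.conf is applied by pam_limits to PAM sessions, so a daemon started by the init system would typically not pick it up. Whether that is the cause here is an assumption, not something confirmed in this bug.

```python
def _to_int(value):
    """Convert a limit field to int; 'unlimited' becomes None."""
    return None if value == "unlimited" else int(value)


def parse_max_open_files(limits_text):
    """Return (soft, hard) for the 'Max open files' row of /proc/<pid>/limits.

    Returns None if the row is not present in the given text.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Row layout: Max open files <soft> <hard> files
            fields = line.split()
            return _to_int(fields[3]), _to_int(fields[4])
    return None


def read_max_open_files(pid):
    """Read the effective nofile limits for a process (Linux only)."""
    with open(f"/proc/{pid}/limits") as handle:
        return parse_max_open_files(handle.read())
```

Calling `read_max_open_files` with the rsyslogd PID and comparing the result to the 4096/8192 values from limits.conf would show whether the new limits reached the process.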
Unfortunately we don't have the cycles to work on this. Added another rsyslog server to compensate.
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → WONTFIX