Closed Bug 1201458 Opened 10 years ago Closed 10 years ago

mailman2.mail.scl3.mozilla.com alerted for swap

Categories

(Infrastructure & Operations :: Infrastructure: Mail, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Usul, Unassigned)

Details

ClamAV was eating all the swap. I restarted ClamAV, which resulted in a load spike. I then restarted amavisd; that didn't help. I tried looking through the logs for what was causing the spike but was unable to find anything useful (looked at messages and maillog, and peeked in /var/log/clamav and mailman).
Alerts recovered. But it would still be nice to have a quick "where to look next time" comment.
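One possible "where to look" starting point for a swap alert is a per-process swap listing. The following is a minimal sketch, not something taken from this host: it assumes a Linux kernel that exposes a VmSwap line in /proc/<pid>/status, and the function names parse_vmswap_kb and top_swap_users are made up for illustration.

```python
from pathlib import Path


def parse_vmswap_kb(status_text):
    """Return (process name, swap in kB) from the text of a /proc/<pid>/status file.

    Processes without a VmSwap line (e.g. kernel threads) report 0 kB.
    """
    name = None
    swap_kb = 0
    for line in status_text.splitlines():
        if line.startswith("Name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("VmSwap:"):
            # Line looks like "VmSwap:      2048 kB"
            swap_kb = int(line.split()[1])
    return name, swap_kb


def top_swap_users(proc_root="/proc", limit=10):
    """List the processes using the most swap as (swap_kb, pid, name) tuples."""
    results = []
    for status in Path(proc_root).glob("[0-9]*/status"):
        try:
            text = status.read_text()
        except OSError:
            continue  # process exited or is not readable; skip it
        name, swap_kb = parse_vmswap_kb(text)
        if swap_kb > 0:
            results.append((swap_kb, int(status.parent.name), name))
    return sorted(results, reverse=True)[:limit]


if __name__ == "__main__":
    for swap_kb, pid, name in top_swap_users():
        print(f"{swap_kb:>10} kB  pid {pid:<7} {name}")
```

Run as root to see every process; a non-root user only gets the processes it is allowed to read.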
Got:
Thu 03:43:44 PDT [5041] mailman2.mail.scl3.mozilla.com:mailman is WARNING: in has 32 tasks, oldest 16.1 mins (http://m.mozilla.org/mailman)
Restarted mailman.
nagios-scl3, recheck 5041
<nagios-scl3> Usul: rechecking all services on host mailman2.mail.scl3.mozilla.com
<nagios-scl3> Thu 03:48:15 PDT [5044] mailman2.mail.scl3.mozilla.com:qrunner - procs is CRITICAL: PROCS CRITICAL: 1 process with regex args qrunner (http://m.mozilla.org/qrunner+-+procs)
<nagios-scl3> Thu 03:48:16 PDT [5045] mailman2.mail.scl3.mozilla.com:mailman is CRITICAL: in has 40 tasks, oldest 20.5 mins (http://m.mozilla.org/mailman)

(In reply to Ludovic Hirlimann [:Usul] from comment #1)
> Alerts recovred. But still would be nice to have a quick where to look next
> time comment.

And my bad, the docs already exist: https://mana.mozilla.org/wiki/display/SYSADMIN/Check+The+Logs#CheckTheLogs-Email
Restarted postfix. Created https://mana.mozilla.org/wiki/display/NAGIOS/qrunner+-+procs so that the next time we get the alert we'll probably have some docs.
/var/log/mailman/error:
Sep 03 10:45:55 2015 mailmanctl(23681): The master qrunner lock could not be acquired because it appears as if another master qrunner is already running.
Sep 03 10:45:55 2015 mailmanctl(23681):
Sep 03 10:49:08 2015 (28454) subscribe: No such list "community-b2g":
Sep 03 10:49:31 2015 (28885) subscribe: No such list "community-games":
Sep 03 10:51:12 2015 (30391) subscribe: No such list "community-web-standards":
Sep 03 10:53:32 2015 (32746) listinfo: No such list "dev-tech-css":
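To separate the qrunner lock failure from the "No such list" noise in a log like the one above, a quick tally can help. This is only a sketch: the category names and the summarize_mailman_errors helper are invented for illustration, and the match strings are simply the message fragments quoted in this bug.

```python
from collections import Counter

# Message fragments copied from the log excerpt in this bug.
LOCK_MARKER = "master qrunner lock could not be acquired"
MISSING_LIST_MARKER = "No such list"


def summarize_mailman_errors(lines):
    """Count lock failures and missing-list errors in mailman error log lines."""
    counts = Counter()
    for line in lines:
        if LOCK_MARKER in line:
            counts["qrunner_lock"] += 1
        elif MISSING_LIST_MARKER in line:
            counts["no_such_list"] += 1
    return counts


if __name__ == "__main__":
    # Path as referenced in this bug; adjust if the log lives elsewhere.
    with open("/var/log/mailman/error", errors="replace") as log:
        for kind, count in summarize_mailman_errors(log).items():
            print(f"{kind}: {count}")
```

A non-zero qrunner_lock count right after a restart suggests the previous master qrunner was still holding its lock when mailmanctl started.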
Limed came online and fixed a bunch of stuff. One hour later I got the following:
Thu 05:05:36 PDT [5063] mailman2.mail.scl3.mozilla.com:mailman is CRITICAL: in has 342 tasks, oldest 45.5 mins (http://m.mozilla.org/mailman)
Should all be fixed now.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED