Closed
Bug 769408
Opened 13 years ago
Closed 13 years ago
release-drivers email monitoring
Categories
(mozilla.org Graveyard :: Server Operations, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: mozilla, Assigned: rbryce)
Per bug 767559 comment 4, this bug tracks putting the monitoring in place. Filing since there was no response to bug 767559 comment 5.
Pasting rbryce's comment here:
tl;dr: mailman's archive process was blocked on I/O, causing further delays in mail handling.
The parent mailman qrunner process was hung by 4 child archive processes that were apparently blocked on I/O. As subsequent qrunner threads spawned to handle the mailman posts, they also processed mails that the parent qrunner thread had not removed from the queue. The main qrunner process is to blame for the delayed mail, since it could not remove messages from the queue in a timely manner; the children are to blame for the dupe emails.
This also caused *deferred* emails to queue longer than normal, further delaying some email. These are almost entirely bounce messages from spam. I manually purged the deferred mail queue after I restarted mailman and postfix (manually verified before purging). This cleared the hung qrunner thread and all seems to be back to normal.
The nagios check I have discussed with others will measure the age of mailman processes, to hopefully detect this problem going forward. Also, I am not sure what the disk usage was when this started, but I suspect insufficient disk space may have caused this problem or at least exacerbated the issue.
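For illustration only, a process-age check of the kind described above might look roughly like the sketch below. It is not the check that was deployed. It assumes a Linux host whose `ps` supports the `etimes` column (elapsed seconds, procps-ng), and the match pattern and thresholds are hypothetical placeholders to be tuned per host.

```python
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def parse_ps(output, pattern):
    """Return (pid, age_seconds) for each process whose command line contains pattern.

    Expects the output of `ps -eo pid,etimes,args`.
    """
    procs = []
    for line in output.splitlines():
        parts = line.split(None, 2)
        # Skip the header row and any malformed lines.
        if len(parts) < 3 or not (parts[0].isdigit() and parts[1].isdigit()):
            continue
        if pattern in parts[2]:
            procs.append((int(parts[0]), int(parts[1])))
    return procs


def evaluate(procs, warn_s, crit_s):
    """Map the age of the oldest matching process to a Nagios (code, message) pair."""
    if not procs:
        return OK, "OK: no matching processes running"
    pid, age = max(procs, key=lambda p: p[1])
    if age >= crit_s:
        return CRITICAL, "CRITICAL: pid %d has been running for %ds" % (pid, age)
    if age >= warn_s:
        return WARNING, "WARNING: pid %d has been running for %ds" % (pid, age)
    return OK, "OK: oldest matching process is %ds old" % age


def run_check(pattern, warn_s=600, crit_s=3600):
    """Run ps and evaluate the result; pattern and thresholds are assumptions."""
    try:
        out = subprocess.check_output(["ps", "-eo", "pid,etimes,args"], text=True)
    except (OSError, subprocess.CalledProcessError) as exc:
        return UNKNOWN, "UNKNOWN: could not run ps: %s" % exc
    return evaluate(parse_ps(out, pattern), warn_s, crit_s)
```

A wrapper script would call `run_check` with whatever pattern identifies the short-lived archive processes, print the message, and exit with the returned code. The check only makes sense for processes that normally finish quickly; long-running daemons would always look "old".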
Updated•13 years ago
Assignee: server-ops-infra → server-ops
Component: Server Operations: Infrastructure → Server Operations
QA Contact: jdow → phong
Comment 1•13 years ago
These are already set up at https://ganglia-scl3.mozilla.org/ganglia/?c=zlb.ops.scl3
Assignee: server-ops → ashish
Comment 2•13 years ago
argh, wrong bug. please ignore #c1.
Updated•13 years ago
Assignee: ashish → server-ops
Updated•13 years ago
Assignee: server-ops → rbryce
Comment 3•13 years ago
rbryce: where does this fit in your work queue?
Comment 4•13 years ago
This is implemented. We are checking the postfix queue as well as the mailman process.
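The comment does not say how the postfix queue is checked. One plausible shape, sketched here under the assumption that the check counts queued messages from the summary line of `postqueue -p`, is shown below; the thresholds are hypothetical.

```python
import re
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Summary line printed at the end of `postqueue -p`,
# e.g. "-- 12 Kbytes in 3 Requests."
SUMMARY_RE = re.compile(r"^-- \d+ Kbytes in (\d+) Requests?\.", re.MULTILINE)


def queue_length(postqueue_output):
    """Return the number of queued messages, or None if the output is unrecognized."""
    if "Mail queue is empty" in postqueue_output:
        return 0
    match = SUMMARY_RE.search(postqueue_output)
    return int(match.group(1)) if match else None


def classify_queue(count, warn, crit):
    """Map a queue length to a Nagios (code, message) pair."""
    if count is None:
        return UNKNOWN, "UNKNOWN: could not parse postqueue output"
    if count >= crit:
        return CRITICAL, "CRITICAL: %d messages queued" % count
    if count >= warn:
        return WARNING, "WARNING: %d messages queued" % count
    return OK, "OK: %d messages queued" % count


def run_queue_check(warn=200, crit=1000):
    """Run postqueue and classify the result; thresholds are illustrative only."""
    try:
        out = subprocess.check_output(["postqueue", "-p"], text=True)
    except (OSError, subprocess.CalledProcessError) as exc:
        return UNKNOWN, "UNKNOWN: could not run postqueue: %s" % exc
    return classify_queue(queue_length(out), warn, crit)
```

This counts every queued message rather than only deferred ones, so a backlog of spam bounces like the one described in this bug would trip the same threshold as delayed list mail.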
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Updated•10 years ago
Product: mozilla.org → mozilla.org Graveyard