Closed Bug 711252 Opened 14 years ago Closed 14 years ago

setup monitoring of release-drivers email list

Categories

(mozilla.org Graveyard :: Server Operations, task)

x86
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: joduinn, Unassigned)

Follow-on from bug 711242. Today, during the FF9.0b6 release, emails to release-drivers were blocked and not being sent; we noticed this only by accident. Emails are now being sent again, but the concern is that this could happen again without warning. What monitoring is in place (or should be put in place) to make sure IT, release coordinators, and RelEng are notified if this mailing list stops working or gets backlogged?
Monitoring mail is not a relops issue, moving to the correct queue.
Assignee: server-ops-releng → server-ops
Component: Server Operations: RelEng → Server Operations
QA Contact: zandr → cshields
We don't have mailing-list-level monitoring in place (afaik), but it is possible to extend Nagios to do it. See http://exchange.nagios.org/directory/Plugins/Email-and-Groupware/Mailman/check_mailman_qfiles/details for an example. Looping in Justdave for his feedback.
mburns: we're actually already using that. The thresholds are currently set to --warning=20 --critical=40 (minutes). This is monitoring the Mailman job queues. That this didn't alert before the bug was opened suggests that things were operating as expected. It just happened that overall mailing-list traffic was *extremely* high during the time bug 711242 (the first part) occurred, but postfix and mailman were chugging through that as expected -- mail is an asynchronous technology, after all. In fact, things were back to normal soon after the mail volume on the other list was reduced. So, this is monitored to the extent necessary already.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → WORKSFORME
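For readers unfamiliar with this kind of check, here is a minimal sketch of a Nagios-style queue-age check using the 20/40 minute thresholds quoted above. It is not the check_mailman_qfiles plugin itself. It assumes the thresholds apply to the age of the oldest file in the Mailman queue directory, and the function names and the queue path mentioned in the comments are hypothetical.

```python
import os
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def oldest_age_minutes(queue_dir, now=None):
    """Return the age in minutes of the oldest file under queue_dir (0.0 if empty)."""
    now = time.time() if now is None else now
    oldest = 0.0
    for root, _dirs, files in os.walk(queue_dir):
        for name in files:
            try:
                mtime = os.path.getmtime(os.path.join(root, name))
            except OSError:
                continue  # file was processed and removed while we were scanning
            oldest = max(oldest, (now - mtime) / 60.0)
    return oldest


def classify(age_minutes, warning=20, critical=40):
    """Map a queue age to a Nagios status, using the thresholds from this bug."""
    if age_minutes >= critical:
        return CRITICAL, "CRITICAL"
    if age_minutes >= warning:
        return WARNING, "WARNING"
    return OK, "OK"


def main(argv):
    """Check the queue directory given as argv[1]; return the Nagios exit code."""
    # Assumption: the caller passes the real queue directory,
    # e.g. /var/lib/mailman/qfiles (hypothetical path, varies by install).
    age = oldest_age_minutes(argv[1])
    code, label = classify(age)
    print(f"MAILMAN QUEUE {label} - oldest queue file is {age:.1f} minutes old")
    return code
```

To run this as a plugin, a wrapper script would end with `sys.exit(main(sys.argv))` so that Nagios sees the exit code. A check like this only fires when files sit in the queue longer than the threshold, which is consistent with the comment above: a busy but steadily draining queue would not alert.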
In the RelEng/IT meeting this morning, dustin agreed to also send Nagios alerts for this to #buildduty. Per IRC, this is now done. (Disclaimer: there are many other possible causes of mail delays for users on that mailing list, and this monitors only one of them. However, next time, if Mailman is the problem, RelEng will know about it quickly, which matters given the tight timing of a release. If a mailing-list delay happens again in an area that we are not monitoring, we'll revisit in a new, separate bug.)
Product: mozilla.org → mozilla.org Graveyard