Closed Bug 372009 Opened 18 years ago Closed 18 years ago

lists.mozilla.org lists are sluggish (delays of 1-2 hours)

Categories

(mozilla.org Graveyard :: Server Operations, task)

All
Other
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dbaron, Assigned: justdave)

References

Details

(Whiteboard: hoped this would be fixed by the mailman upgrade - it wasn't. still investigating - have an upstream patch that I haven't tested yet.)

For the past few weeks, lists.mozilla.org lists have been quite sluggish. The email delay is long enough that, because I use the email end of the email/newsgroup lists, I feel partly left out of the conversation: posts frequently take a few hours to get through.

The message headers seem consistent with the slowdown being in mailman, although there are other possible explanations. For example, the message I posted to governance this morning had:

    Received: from notorious.mozilla.org (localhost.localdomain [127.0.0.1])
        by lists.mozilla.org (Postfix) with ESMTP id AAB6958218;
        Tue, 27 Feb 2007 12:24:48 -0800 (PST)
    X-Original-To: governance@lists.mozilla.org
    Delivered-To: governance@lists.mozilla.org
    Received: from a.mail.sonic.net (a.mail.sonic.net [64.142.16.245])
        by lists.mozilla.org (Postfix) with ESMTP id 839A1580B5
        for <governance@lists.mozilla.org>;
        Tue, 27 Feb 2007 11:03:30 -0800 (PST)

which accounted for all but 5 seconds of the delay between when I sent the message (11:03:28 -0800) and when the list copy arrived in my mailbox (12:24:51 -0800). Some messages have gotten through promptly, but more often than not they seem to be delayed.
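For anyone who wants to repeat this measurement on other messages, here is a minimal Python sketch that computes the gap between the two Received timestamps quoted above. The helper names (`received_time`, `hop_delay_seconds`) are made up for illustration, and the sketch assumes each Received header ends with `; <date>` as in the headers quoted in this report.

```python
import re
from email.utils import parsedate_to_datetime


def received_time(header_value):
    """Return the timestamp that follows the last ';' in a Received header."""
    date_part = header_value.rsplit(";", 1)[1]
    # Drop trailing comments such as "(PST)" before parsing the date.
    date_part = re.sub(r"\([^)]*\)", "", date_part).strip()
    return parsedate_to_datetime(date_part)


def hop_delay_seconds(later_header, earlier_header):
    """Seconds elapsed between two Received headers of the same message."""
    delta = received_time(later_header) - received_time(earlier_header)
    return delta.total_seconds()


# Header values copied from the governance message quoted in this report.
outgoing = (
    "from notorious.mozilla.org (localhost.localdomain [127.0.0.1]) "
    "by lists.mozilla.org (Postfix) with ESMTP id AAB6958218; "
    "Tue, 27 Feb 2007 12:24:48 -0800 (PST)"
)
incoming = (
    "from a.mail.sonic.net (a.mail.sonic.net [64.142.16.245]) "
    "by lists.mozilla.org (Postfix) with ESMTP id 839A1580B5 "
    "for <governance@lists.mozilla.org>; "
    "Tue, 27 Feb 2007 11:03:30 -0800 (PST)"
)

delay = hop_delay_seconds(outgoing, incoming)
print(delay)  # 4878.0 seconds, i.e. 1 hour 21 minutes 18 seconds
```

The result matches the figures above: the end-to-end delay was 4883 seconds, so this one hop inside lists.mozilla.org accounts for all but 5 seconds of it.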
Assignee: server-ops → aravind
Thanks to dave for the help; this should now be resolved. clamd wasn't running on the server, which caused postfix to spawn a new clamscan process for every mail, and that was slowing things down.
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
I posted again today, and it's even worse:

    Received: from notorious.mozilla.org (localhost.localdomain [127.0.0.1])
        by lists.mozilla.org (Postfix) with ESMTP id A69EA588D8;
        Fri, 2 Mar 2007 14:52:15 -0800 (PST)
    X-Original-To: dev-tech-layout@lists.mozilla.org
    Delivered-To: dev-tech-layout@lists.mozilla.org
    Received: from b.mail.sonic.net (b.mail.sonic.net [64.142.19.5])
        by lists.mozilla.org (Postfix) with ESMTP id E158358092
        for <dev-tech-layout@lists.mozilla.org>;
        Fri, 2 Mar 2007 09:58:36 -0800 (PST)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
mailman was backed up again; dave restarted it this afternoon. That seems to have cleared the queue.
Yeah, I discovered it on my own before you filed this. It should be completely caught up again as of about 45 minutes ago. The oldest queue item is only a minute old right now. I'll continue to leave this open for the time being though... something's fishy that it's done this twice in three days.
Assignee: aravind → justdave
Status: REOPENED → NEW
s/before you filed this/before you reopened this/
Depends on: 372888
Whiteboard: needs mailman 2.1.7
The symptoms here on the back end are that the outgoing queue runner keeps crashing with an error about being unable to unlink a backup file. After 10 restarts, the main mailman daemon refuses to keep restarting the queue runner, so it just stops. I worked around this by having a cron job restart the primary mailman daemon once an hour. This isn't a good permanent fix. There is a known upstream problem with similar symptoms that affects the *incoming* queue rather than the outgoing one, and the patch for it (which is in their tracking system but hasn't been committed yet) touches code that looks like it might help this, too. I have not yet tested it but will soon.
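To illustrate why the queue silently stops rather than recovering, here is a small Python sketch of a supervisor that gives up after a fixed number of restarts. This is not Mailman's code: `supervise`, `crashing_runner`, and `MAX_RESTARTS` are hypothetical names, and the only detail taken from this bug is the limit of 10 restarts and the unlink error.

```python
MAX_RESTARTS = 10  # limit described in this bug; the constant name is hypothetical


def supervise(run_once, max_restarts=MAX_RESTARTS):
    """Run run_once(), restarting it after each crash until the cap is hit.

    Returns (restarts_performed, gave_up).
    """
    restarts = 0
    while True:
        try:
            run_once()
            return restarts, False  # the runner exited cleanly
        except Exception:
            if restarts >= max_restarts:
                return restarts, True  # cap reached: stop restarting
            restarts += 1


def crashing_runner():
    """Stand-in for a queue runner that fails the same way every time."""
    raise OSError("unable to unlink backup file")


result = supervise(crashing_runner)
print(result)  # (10, True): ten restarts, then the supervisor gives up
```

A persistent fault burns through the restart budget quickly, after which nothing drains the queue until the supervising process itself is restarted. That is why the hourly cron restart only masks the problem.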
Whiteboard: needs mailman 2.1.7 → hoped this would be fixed by the mailman upgrade - it wasn't. still investigating - have an upstream patch that I haven't tested yet.
The patch I mentioned appears to have worked. I've had the cron job removed for 36 hours or so, and mailman has remained caught up since.
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard