Closed
Bug 891906
Opened 12 years ago
Closed 12 years ago
mailing-list / newsgroup mirroring is broken
Categories
(Infrastructure & Operations :: Infrastructure: Mail, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: dbaron, Assigned: justdave)
Details
(Whiteboard: affected lists listed in comment 12)
https://groups.google.com/forum/#!topic/mozilla.dev.platform/UCio5fB4VJo was posted to dev.platform nearly 24 hours ago but has not appeared to subscribers who read dev-platform as a mailing list.
This is a blocker for communication within the project -- it prevents people from sending and receiving information and knowing who has received information, and needs to be fixed immediately (or at the very least announced as an unexpected outage to all@, etc.)
Updated•12 years ago
Assignee: server-ops → infra
Severity: blocker → normal
Component: Server Operations → Infrastructure: Mail
Product: mozilla.org → Infrastructure & Operations
QA Contact: shyam → limed
Comment 1•12 years ago
David,
Thanks for bringing this to our notice. We'll take a look and let you know what we find out.
Comment 3•12 years ago
This appears to be specific to dev-platform: I continue to receive email from dev-planning and a few other lists correctly. As moderator of dev-platform I have checked the mailman subscription info for myself and several others who are not receiving mail, and everyone is subscribed correctly.
Assignee
Comment 4•12 years ago
That link goes to an entire thread. Was there a specific message that didn't go through, or was it all messages in that thread?
Reporter
Comment 5•12 years ago
All messages in that thread.
Comment 6•12 years ago
I think it's all messages to dev-platform since 9-July.
Assignee
Comment 7•12 years ago
OK, I've determined that the news gateway process is skipping this mailing list when checking for new newsgroup messages for some reason. I haven't yet found any configuration differences between it and any of the other lists to determine why, and it's not logging any errors (it's just not doing it to begin with).
I'm continuing to play with it...
Assignee: infra → justdave
Assignee
Comment 8•12 years ago
fwiw, the fact that it's outright skipping the group when checking for messages gives me high hopes that the missing messages will all go through at once when we get it fixed...
Assignee
Comment 9•12 years ago
OK, this appears to be fixed now. The root cause sickens me. :(
The mozilla.community.hungary group has corrupted pointers on Giganews' servers and, as best as I can tell, has had them since May 4th, 2013 (because that's when this appears to have broken). Giganews is claiming there are 2.15 billion new messages in that newsgroup, and mailman was running out of memory trying to create a data structure to grab the headers for that many messages. That caused it to crash, so it failed to sync any newsgroups that came after it in the run order of the news gateway script.
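To illustrate the failure mode described above, here is a minimal Python sketch. It is not mailman's actual gateway code: the names `plan_fetch`, `sync_all`, `BacklogTooLarge`, and the `MAX_BACKLOG` limit are all hypothetical. It shows two defensive measures that would have contained this kind of problem: refusing to process an implausibly large backlog, and isolating each group so that one failure does not stop the groups after it in the run order.

```python
# Hypothetical sketch; not mailman's real gateway code.
MAX_BACKLOG = 100_000  # assumed sanity limit on pending articles per run


class BacklogTooLarge(Exception):
    """Raised when a group reports an implausible number of new articles."""


def plan_fetch(watermark, server_last, max_backlog=MAX_BACKLOG):
    """Return the article numbers to fetch, without building a list."""
    pending = server_last - watermark
    if pending > max_backlog:
        raise BacklogTooLarge(f"{pending} pending articles")
    # range() is lazy, so the span costs no memory up front.
    return range(watermark + 1, server_last + 1)


def sync_all(groups, watermarks, fetch):
    """Sync every group; a failure in one group must not skip the rest.

    groups:     ordered mapping of group name -> last article number on server
    watermarks: mapping of group name -> last article number already gated
    fetch:      callable(group, article_number) that gates one article
    Returns a list of (group, error) pairs for the groups that failed.
    """
    failed = []
    for group, server_last in groups.items():
        try:
            for number in plan_fetch(watermarks[group], server_last):
                fetch(group, number)
                watermarks[group] = number
        except Exception as exc:  # isolate the failure to this group
            failed.append((group, exc))
    return failed
```

With a group that reports billions of pending articles placed first in the run order, `sync_all` records that group as failed and still processes every group that follows it.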
There are a *LOT* of incoming messages from the news side getting pulled in and re-sent to the mailing lists right now. The script is still running. I'll post back here with a complete list of the affected mailing lists as soon as it's done (there were more than just this one).
The only way we would have caught this is by monitoring mailman's crash logs. This has typically been an unpalatable thing to monitor, because it crashes a lot, and 99% of the crashes are completely innocuous things that we wouldn't actually care about. Alerting on them would only cause us to start ignoring the alerts anyway.
Assignee
Comment 10•12 years ago
FWIW, this was fixed by telling mailman to perform a one-time mass catchup on community-hungary, telling it to ignore those 2.15 billion pending new messages in that group.
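Conceptually, a catchup moves the gateway's per-group bookmark to the newest article number the server reports, so everything older is treated as already seen. A minimal sketch, assuming the gateway keeps one such bookmark per group; `catch_up` and the `watermarks` dict are hypothetical names, not mailman's API:

```python
def catch_up(watermarks, group, server_last):
    """Mark every article up to server_last as already seen for this group.

    Nothing is downloaded or re-sent; the backlog is simply skipped.
    Returns the number of articles skipped.
    """
    current = watermarks.get(group, 0)
    skipped = max(0, server_last - current)
    # Never move the bookmark backwards.
    watermarks[group] = max(current, server_last)
    return skipped
```

The trade-off is that any genuine messages in the skipped range are never delivered to the list, which is acceptable only when the backlog is known to be bogus.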
Assignee
Comment 11•12 years ago
It's still running (a lot of catching up to do).
In the meantime we are brainstorming on IRC about ways to detect if this starts failing again in the future.
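One possible detection approach, sketched here purely as an illustration and not as what was discussed on IRC, is to record when each list last completed a gateway sync and flag any list whose last success is too old. The function name `stale_lists` and the six-hour threshold are assumptions.

```python
from datetime import datetime, timedelta


def stale_lists(last_success, now, max_age=timedelta(hours=6)):
    """Return the sorted names of lists whose last successful sync is too old.

    last_success: mapping of list name -> datetime of the last completed sync,
                  or None if the list has never synced.
    max_age:      assumed alert threshold; tune to the gateway's run interval.
    """
    stale = []
    for name, when in last_success.items():
        if when is None or now - when > max_age:
            stale.append(name)
    return sorted(stale)
```

Because this tracks completed sync runs rather than delivered messages or crash logs, a quiet list does not trigger an alert, while a list that is silently skipped on every run does.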
Assignee
Comment 12•12 years ago
OK, it's done. Of the 214 total mailing lists we have gatewayed to news.mozilla.org, the following 74 lists were affected by this issue:
bugmasters
community-games
community-india
community-ireland
community-mexico
community-switzerland
community-tunisia
community-turkey
dev-apps-bugzilla
dev-apps-calendar
dev-apps-chatzilla
dev-apps-firefox
dev-apps-seamonkey
dev-apps-thunderbird
dev-b2g
dev-builds
dev-gaia
dev-identity
dev-js-sourcemap
dev-l10n
dev-l10n-de
dev-l10n-fa
dev-l10n-in
dev-l10n-new-locales
dev-l10n-pt-br
dev-l10n-sr
dev-l10n-ta
dev-l10n-vi
dev-l10n-web
dev-mdc
dev-mdc-es
dev-mdn
dev-mozilla-org
dev-pdf-js
dev-platform
dev-popcorn
dev-ports-os2
dev-privacy
dev-security-policy
dev-shumway
dev-static-analysis
dev-tech-crypto
dev-tech-dom
dev-tech-js-engine
dev-tech-js-engine-internals
dev-tech-layout
dev-tech-plugins
dev-tech-svg
dev-tech-xml
dev-tech-xpcom
dev-tech-xul
dev-tree-management
dev-webapi
dev-webapps
dev-webdev
general
governance
mozillians
privacy
reps-general
reps-mentors
reps-webdev
support-bugzilla
support-other
support-seamonkey
support-thunderbird
support-webtools
test
tools
tools-l10n
webapps
webmaker
webmaker-canada-bc
wishlist
The missed messages have all now been downloaded from the news server, and are spooling out to the mailing lists now.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Assignee
Updated•12 years ago
Whiteboard: affected lists listed in comment 12
Assignee
Comment 13•12 years ago
bug 892051 has been filed to track our progress on coming up with a way to monitor for this in the future.
Reporter
Comment 14•12 years ago
Could you send an unexpected downtime notice explaining this, so that people understand what happened? It's important both for the folks on the mailing list side receiving a flood of messages, and for the folks on the newsgroup side who need to understand that everything they've posted to these lists for the past few months has only been read by a part of the expected audience.
Assignee
Comment 16•12 years ago
I filed a ticket with Giganews this afternoon about the mozilla.community.hungary newsgroup pointers. They replied back that they were unable to resolve the issue without deleting and re-creating the newsgroup from scratch. As best as I could tell prior to them doing so (from using an NNTP reader client) there was only one real message on that newsgroup anyway (and the pointer issue may have been preventing people from using it).
Comment 18•12 years ago
shyam/justdave: this issue was reported in bug 877134 on 29th May. dbaron reported it again 20 hours ago, and it is now fixed. For future reference, what special magic did he apply to get such prompt and excellent service, that all the people CCed on bug 877134 could use next time they have a discussion forum problem? :-)
Gerv
Reporter
Comment 19•12 years ago
While I'm not them, I'd note two factors:
(1) using a bug summary that was a reasonably accurate description of the actual problem. Also see http://dbaron.org/log/20100426-bug-summary .
(2) mentioning that the problem was an issue related to newsgroup -> mailing list mirroring rather than a purely mailing list issue (which was never explicitly mentioned in bug 877134, although I'd think it should have been considered)