Closed Bug 814207 Opened 12 years ago Closed 3 years ago

mailing list / newsgroup mirroring should be more precise about list name substitution to avoid message duplication and thread splitting

Categories

(Infrastructure & Operations :: Infrastructure: Mail, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: dbaron, Assigned: limed)

References

Details

(Whiteboard: [2013Q1])

[ I thought I wrote this up before, but maybe it wasn't in Bugzilla, or maybe I'm just not searching Bugzilla correctly. ]

There are a bunch of problems with our mailing list + newsgroup pairs that could be fixed by better rewriting of message headers during the process of mirroring messages between mailing lists and newsgroups.  (Some of the same problems occur when people use Google Groups mailing lists to access the newsgroups, though, and that may be harder to fix on our end.)

My understanding is that the current setup is:
 * When a message is sent to the newsgroup of a pair and mirrored to the mailing list, it is sent to the mailing list with the Cc: header untouched and the To: header replaced by the mailing list address
 * (I'm less sure about this part) when a message is sent to the mailing list of a pair (mailing list + newsgroup), the message sent to the newsgroup gets a Newsgroups: header added, but To: and CC: headers are untouched

This setup leads to the two common problems:

 (1) When somebody uses a client that understands both email and newsgroups (such as gmail, I think, and probably some configurations of Thunderbird), it's relatively easy to post the same message to both ends (particularly if the mailing list address ends up in the CC: header rather than the To: header, and thus doesn't get replaced), resulting in two copies being received.

 (2) When somebody responds to a cross-posted message on the opposite medium from the one on which it was originally posted, using a traditional client that doesn't understand both email and news, they end up inadvertently responding to only one of the lists rather than all of them, leading to fragmenting of discussions.

These problems could be fixed by better substitution of addresses in the message headers.  In particular:

 (a) any message mirrored should have the address substitution performed for all Mozilla mailing-list/newsgroup pairs involved, not just the individual list being processed.  This will address problem (2).

 (b) when mirroring messages between the newsgroup and mailing list (in either direction), the mirroring should replace the headers describing the destination.  In other words, when going from newsgroup to mailing list, all Mozilla list/newsgroup pair entries in the Newsgroups: header should be *removed* (not retained as they are now) and the corresponding mailing list addresses should be *appended* to the To: header (without replacing any existing contents as they are now).  Likewise, when mirroring from mailing list to newsgroup, the address of the mailing list should be removed from To: or CC: headers (leaving items other than mailing-list/newsgroup-pairs untouched) and the appropriate Newsgroups: header should be added.  This should address problem (1).
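
A rough sketch of the rewriting proposed in (a) and (b), using only the Python standard library. The PAIRS table, the addresses and the function names are made up for illustration; this is not Mailman's gateway code, just the header logic described above under those assumptions.

```python
from email.message import EmailMessage
from email.utils import getaddresses, formataddr

# Hypothetical list/newsgroup pairs; the real pairing table is not given in this bug.
PAIRS = {
    "dev-example@lists.example.org": "example.dev.example",
    "dev-other@lists.example.org": "example.dev.other",
}
GROUP_TO_LIST = {group: addr for addr, group in PAIRS.items()}


def _split_groups(msg):
    """Return the entries of the Newsgroups: header as a list (empty if absent)."""
    raw = str(msg.get("Newsgroups", ""))
    return [g.strip() for g in raw.split(",") if g.strip()]


def _set_header(msg, name, value):
    """Replace a header, or drop it entirely when the new value is empty."""
    del msg[name]  # no error if the header is absent
    if value:
        msg[name] = value


def news_to_mail(msg):
    """Remove paired groups from Newsgroups: and append their lists to To:."""
    groups = _split_groups(msg)
    remaining = [g for g in groups if g not in GROUP_TO_LIST]
    to_addrs = getaddresses(msg.get_all("To", []))
    cc_addrs = getaddresses(msg.get_all("Cc", []))
    present = {addr.lower() for _, addr in to_addrs + cc_addrs}
    for group in groups:
        addr = GROUP_TO_LIST.get(group)
        if addr and addr not in present:
            to_addrs.append(("", addr))  # append, do not clobber existing To:
            present.add(addr)
    _set_header(msg, "Newsgroups", ",".join(remaining))
    _set_header(msg, "To", ", ".join(formataddr(p) for p in to_addrs))
    return msg


def mail_to_news(msg):
    """Remove paired list addresses from To:/Cc: and add their groups to Newsgroups:."""
    groups = _split_groups(msg)
    for header in ("To", "Cc"):
        kept = []
        for name, addr in getaddresses(msg.get_all(header, [])):
            group = PAIRS.get(addr.lower())
            if group is None:
                kept.append((name, addr))  # non-pair recipients stay untouched
            elif group not in groups:
                groups.append(group)
        _set_header(msg, header, ", ".join(formataddr(p) for p in kept))
    _set_header(msg, "Newsgroups", ",".join(groups))
    return msg
```

Both functions consult every pair in the table rather than only the list currently being processed, which is point (a); the remove-and-append behaviour is point (b).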

It's possible there are also things we could do to improve the handling of messages mirrored via Google Groups, though it might be that Google Groups is already doing things correctly and once we fix our end, things will be better; I haven't tested things to know exactly what it's doing.
cc'd justdave to get his input.
Assignee: server-ops → mburns
Severity: normal → enhancement
I just checked, and the new Google Groups is indeed doing this.  By default, replies are shown as going to Art Kocsi...@sbcglobal.net and Moz-Dev-Apps....  No other recipients are listed.  I posted one today from the old Google Groups, which was NOT doubled.

And by the way, I sure wouldn't classify this bug as an enhancement.
Whiteboard: [2013Q1]
Any updates on this work?
These patch changes can be rolled out once the new-mailman-rpm workflow is nailed down this week, which is being tracked in bug 765289. Only preliminary progress has been made on these particular changes so far.
I've rolled a small patch into our Mailman rpm to address this problem. It "skips the rewriting of Headers and keeps threading for Mailinglists as well as Newsgroups intact." This is not a comprehensive solution, but it should be a big improvement.

A related bug 651527 proposes a not-dissimilar solution to comment 0 (that is, showing both munged & unmunged headers, not a mix like we do now) that I believe is less invasive, but no working code (yet) exists. This patch has been tested by other Mailman admins, so it is a safe(r) first step.

This has been a standing issue for ~10 years in Mailman. There are several proposed solutions that have come and gone on the Mailman-users list and Launchpad, but most are bit-rotted or introduce more problems than they solve (or just plain don't work). That this hasn't officially been rolled into Mailman trunk makes me nervous about what corner cases this might expose, as RFC-compliance has been the limiting factor in getting a fix for this bug pushed upstream.
Depends on: 860626
Assignee: mburns → limed
Component: Server Operations → Server Operations: Infrastructure
QA Contact: shyam → jdow
Component: Server Operations: Infrastructure → Infrastructure: Other
Product: mozilla.org → Infrastructure & Operations
Component: Infrastructure: Other → Infrastructure: Mail
QA Contact: jdow → limed
Mailman has since been upgraded to 2.1.15 and we have included the NNTP-MsgID patch in our rpm, so closing this out.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
The NNTP-MsgID fixes the issues with threading caused by gatewaying in the general case. However, that isn't what this bug is asking for; this bug is asking for:
(In reply to David Baron [:dbaron] (don't cc:, use needinfo? instead) from comment #0)
>  * When a message is sent to the newsgroup of a pair and mirror to the
> mailing list, it is sent to the mailing list with the Cc: header untouched
> and the To: header replaced by the mailing list address
>  * (I'm less sure about this part) when a message is sent to the mailing
> list of a pair (mailing list + newsgroup), the message sent to the newsgroup
> gets a Newsgroups: header added, but To: and CC: headers are untouched

When a message is gatewayed to the NNTP side of things, it needs to scan the To/Cc headers to find all of the newsgroups that need to be added. Right now, it's only picking one of the newsgroups to send to, and refusing to do the other ones because it's not munging the ID (really? the patch doesn't fall back to munging the ID if posting failed? the one I wrote for MM3 did that....).
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
To be clear:  this bug is *not at all* about message IDs or threading.  It's about the To: and Cc: fields of mirrored messages leading to duplicated or missing messages as described in comment 0.  In order to avoid the problems in comment 0, headers *need* to be rewritten.
For extra clarity:

The standard installation of mailman does this when mirroring messages to the newsgroup:
1. Receive a message to a mailing list.
2. Synth a new message ID.
3. Look up the newsgroup.
4. Post the message with the changed message ID to the newsgroup.

When a message is sent to two mailing lists, this happens separately for both of them. This means that cross-posted mailing list messages are treated as multi-posted from the newsgroup side. It also broke threading from newsgroups pretty badly.

The patch that was installed does this [I think]:
1. Receive a message to a mailing list.
2. Look up the newsgroup.
3. Post the message with the original message ID to the newsgroup.

This preserves threading. However, when a message is cross-posted, the second mailing list that receives it tries to post and fails due to a duplicate message ID. As a result, it is only posted to one newsgroup.
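
A toy model of the two behaviours above, assuming only that the news server rejects any article whose message ID it already holds. ToyNewsServer, the list names and the group names are all invented for the illustration; none of this is Mailman or NNTP client code.

```python
import uuid

# Hypothetical mapping from mailing list to its paired newsgroup.
LIST_TO_GROUP = {"list-a": "group.a", "list-b": "group.b"}


class ToyNewsServer:
    """Stores articles by message ID and rejects duplicate IDs."""

    def __init__(self):
        self.articles = {}  # message ID -> set of newsgroups

    def post(self, message_id, newsgroups):
        if message_id in self.articles:
            return False  # duplicate message ID: article refused
        self.articles[message_id] = set(newsgroups)
        return True


def gateway_stock(server, message_id, lists):
    """Stock behaviour: each list synthesizes a fresh ID and posts separately."""
    return [
        server.post(f"<{uuid.uuid4()}@gateway.invalid>", [LIST_TO_GROUP[name]])
        for name in lists
    ]


def gateway_patched(server, message_id, lists):
    """Patched behaviour: each list posts separately with the original ID."""
    return [server.post(message_id, [LIST_TO_GROUP[name]]) for name in lists]
```

With a message sent to both lists, gateway_stock produces two unrelated articles (a multi-post), while gateway_patched gets the first post accepted and the second refused, so only one newsgroup receives the message.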

The desired functionality of this bug from comment 0 is this:
1. Receive a message to a mailing list.
2. Look up the newsgroups for all mailing lists that the message was sent to.
3. Post the message [ideally with the original message ID] to all of those newsgroups as a single cross-post.
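
The three steps above could look roughly like this. It is only a sketch: LIST_TO_GROUP, crosspost_groups, gateway_once and the post callback are hypothetical names, and it only sees lists that are named in To: or Cc: (a list reached solely through the SMTP envelope would be missed).

```python
from email.message import EmailMessage
from email.utils import getaddresses

# Hypothetical mapping from list address to paired newsgroup.
LIST_TO_GROUP = {
    "dev-example@lists.example.org": "example.dev.example",
    "dev-other@lists.example.org": "example.dev.other",
}


def crosspost_groups(msg):
    """Step 2: newsgroups for every paired list named in To: or Cc:."""
    recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
    groups = []
    for _, addr in recipients:
        group = LIST_TO_GROUP.get(addr.lower())
        if group and group not in groups:
            groups.append(group)
    return groups


def gateway_once(msg, posted_ids, post):
    """Step 3: post one cross-post per message ID, whichever list copy arrives first.

    `posted_ids` is a set of message IDs already gatewayed; `post` is a
    caller-supplied function taking (message_id, groups).
    """
    message_id = str(msg["Message-ID"])
    if message_id in posted_ids:
        return False  # another list's copy already produced the cross-post
    groups = crosspost_groups(msg)
    if not groups:
        return False
    post(message_id, groups)
    posted_ids.add(message_id)
    return True
```

Each list still receives its own copy (step 1), but only the first copy results in a post, and that post already names all of the paired newsgroups.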
Depends on: 901188
(In reply to David Baron [:dbaron] (don't cc:, use needinfo? instead) from comment #8)
> To be clear:  this bug is *not at all* about message IDs or threading.  It's
> about the To: and Cc: fields of mirrored messages leading to duplicated or
> missing messages as described in comment 0.  In order to avoid the problems
> in comment 0, headers *need* to be rewritten.

As jcranmer explained to me on IRC, they're not entirely unrelated for the mail->news direction of the mirroring since NNTP doesn't have an envelope-recipient concept like SMTP does, but instead directly uses the Newsgroups: header to determine where messages go.



That said, as far as I can tell, step (a) from comment 0 is being performed, at least for the mail->news direction, but I don't see any evidence that (b) from comment 0 is being done.  In particular, I have verified that removal of substituted addresses/newsgroups is not happening in *either* direction, and I haven't found any evidence that the To: field is no longer being clobbered in news->mail mirroring (though I'm not sure I've found any messages that had a To: field prior to mirroring).
Whiteboard: [2013Q1] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2770] [2013Q1]
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2770] [2013Q1] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2778] [2013Q1]
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2778] [2013Q1] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2780] [2013Q1]
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2780] [2013Q1] → [2013Q1]

Mailman and the NNTP gateway have been decommissioned, so closing.

Status: REOPENED → RESOLVED
Closed: 11 years ago → 3 years ago
Resolution: --- → INVALID