Open
Bug 305764
Opened 20 years ago
Updated 3 years ago
The junk filters should be sensitive to missing Message-IDs
Categories
(MailNews Core :: Filters, enhancement)
Tracking
(Not tracked)
UNCONFIRMED
People
(Reporter: usenet, Unassigned)
Details
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.10) Gecko/20050716 Firefox/1.0.6
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.10) Gecko/20050716 Firefox/1.0.6
A missing Message-ID in a mail appears to be a strong sign of the mail being
spam. The mail filters should use this information as part of their Bayesian
filters.
Reproducible: Always
Steps to Reproduce:
1. Get mail
2.
3.
Actual Results:
Lots of spams with missing message-IDs are received, and the mail filter does
not catch them.
Expected Results:
The filter should have spotted the lack of a Message-ID header, combined this
with other evidence to confirm that the E-mails were likely to be spam, and
marked them as junk.
After getting a load of duplicate E-mails because of another bug, I had to use a
script to remove duplicates from the mboxes, based on using the Message-ID as a
key. What I found whilst debugging this was that mails without Message-IDs were
all, or almost all, spam.
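The reporter's script is not attached to this bug. As a rough illustration of the idea only, a sketch using Python's standard mailbox module might look like the following. The function names and the sample messages are hypothetical, and the sketch simply sets aside messages with no Message-ID rather than judging them to be spam.

```python
import mailbox
from email import message_from_string


def split_by_message_id(messages):
    """Drop duplicates keyed on Message-ID; collect messages that have none.

    Returns (unique, missing): the first message seen for each Message-ID,
    and every message whose Message-ID header is absent or empty.
    """
    seen = set()
    unique, missing = [], []
    for msg in messages:
        mid = msg.get("Message-ID")  # header lookup is case-insensitive
        if mid is None or not str(mid).strip():
            missing.append(msg)
            continue
        key = str(mid).strip()
        if key not in seen:
            seen.add(key)
            unique.append(msg)
    return unique, missing


def split_mbox(path):
    """Apply split_by_message_id to an existing mbox file (hypothetical helper)."""
    box = mailbox.mbox(path, create=False)
    try:
        return split_by_message_id(list(box))
    finally:
        box.close()


# Made-up sample data: one duplicate pair, one distinct message, one without an ID.
sample = [message_from_string(text) for text in (
    "Message-ID: <a@example.com>\nSubject: one\n\nbody\n",
    "Message-ID: <a@example.com>\nSubject: one again\n\nbody\n",
    "Message-ID: <b@example.com>\nSubject: two\n\nbody\n",
    "Subject: no id here\n\nbody\n",
)]
unique, missing = split_by_message_id(sample)
```

The messages that end up in the second list are the ones the reporter found to be all, or almost all, spam.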
Comment 1•20 years ago
From looking at training.dat, I see it contains strings like
"message-id:<d9c4a72405071603544664a05c@mail.gmail.com>", but never
"message-id:" on its own, as far as I can tell. That would seem to indicate that
only the presence of a particular message id will affect junk score (not really
likely to get the same id twice), but the mere presence of a message id won't.
It looks like headers tokenize to "<headername>:<first-word>"; perhaps they
should tokenize to "<headername>:" as well, so the mere presence of a header
could be reflected in the training data, too?
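To make the suggestion concrete, here is a minimal sketch of the two token shapes. It is not the actual tokenizer code; the function name, the include_presence flag, and the lowercasing of the header name are assumptions based only on the training.dat strings quoted above.

```python
def tokenize_headers(headers, include_presence=True):
    """Turn (name, value) header pairs into tokens.

    Emits "<headername>:<first-word>" for each header with a non-empty value
    and, if include_presence is set, a bare "<headername>:" token as well.
    """
    tokens = []
    for name, value in headers:
        name = name.lower()
        words = value.split()
        if words:
            tokens.append(f"{name}:{words[0]}")
        if include_presence:
            # The proposed extra token: records that the header exists at all.
            tokens.append(f"{name}:")
    return tokens


example_tokens = tokenize_headers([
    ("Message-ID", "<d9c4a72405071603544664a05c@mail.gmail.com>"),
])
```

With the bare token included, every message that carries a Message-ID contributes the same "message-id:" token to the training data, whatever the id itself is.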
Assignee: mscott → nobody
Component: General → MailNews: Filters
Product: Thunderbird → Core
Version: unspecified → Trunk
Updated•19 years ago
OS: Linux → All
QA Contact: filters
Hardware: PC → All
Comment 2•19 years ago
Comment 3•19 years ago
(In reply to comment #0)
> A missing Message-ID in a mail appears to be a strong sign of the mail being
> spam. The mail filters should use this information as part of their Bayesian
> filters.
Bayesian filters only work on text that's actually there, by definition.
Wayne's comment 2 is pertinent, but even if that bug is fixed, it won't make automatic junk detection any more reliable. The sort of testing you're looking for is appropriate for something like SpamAssassin; xref bug 235114.
Recommend WONTFIX.
Comment 4•17 years ago
One approach to requests like this would be to add an interface to the Bayesian filter store so that arbitrary tokens could be added or deleted. Then people who wanted to play with different types of tokenization could use an extension to do that.
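As a purely hypothetical sketch of what such an interface could look like (this is not an existing MailNews API; the class, its method names, and the "message-id:<missing>" token are invented for illustration), a store that keeps per-token good and junk counts might expose something like:

```python
from collections import Counter


class TokenStore:
    """Hypothetical in-memory stand-in for the Bayesian filter's token store."""

    def __init__(self):
        self.good = Counter()  # token -> count seen in good mail
        self.junk = Counter()  # token -> count seen in junk mail

    def add_token(self, token, is_junk, count=1):
        """Record `count` extra occurrences of an arbitrary token."""
        target = self.junk if is_junk else self.good
        target[token] += count

    def delete_token(self, token):
        """Remove a token from both tables; return True if it was present."""
        present = token in self.good or token in self.junk
        self.good.pop(token, None)
        self.junk.pop(token, None)
        return present

    def counts(self, token):
        """Return (good_count, junk_count) for a token."""
        return self.good[token], self.junk[token]


store = TokenStore()
# An extension experimenting with its own tokenization could inject a
# synthetic token for messages that lack a Message-ID header.
store.add_token("message-id:<missing>", is_junk=True, count=3)
```

An extension could then try out alternative tokenizations by adding and removing tokens of its own, without changes to the built-in tokenizer.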
Comment 5•17 years ago
Kent, there may also be a junk bug filed for messages that are missing addresses. In any case, see also bug 391717 (filter with from criteria doesn't work if message's From: address is null or missing).
Updated•17 years ago
Product: Core → MailNews Core
Updated•3 years ago
Severity: normal → S3