Open Bug 223716 Opened 21 years ago Updated 2 years ago

Email addresses are not valid tokens for junk (spam) mail filtering

Categories

(MailNews Core :: Filters, defect)

Tracking

(Not tracked)

People

(Reporter: mozilla, Unassigned)

References

(Blocks 1 open bug)

Details

User-Agent:       Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705)
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.5) Gecko/20031007

Email addresses, usually in headers, can be very meaningful tokens indicating 
whether an email is junk.  This is particularly the case for users who give 
different email addresses out at different times (see "Junk Address" in Paul 
Graham's "Stopping Spam" http://www.paulgraham.com/stopspam.html).

According to the documentation, the current implementation cannot use email 
addresses as tokens because the at sign (@), full stop (period) and percent 
sign (%) are treated as delimiters.  I suggest that these symbols should not 
be treated as delimiters, just as the dash already is not.
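
To illustrate the difference, here is a minimal Python sketch. The two delimiter sets are assumptions made up for this example and are not copied from the Mozilla tokenizer; only the treatment of @, . and % follows the description above.

```python
import re

# Hypothetical delimiter sets, for illustration only.
# CURRENT_DELIMS treats @, . and % as delimiters; PROPOSED_DELIMS does not.
# The dash is in neither set, so it stays inside tokens in both cases.
CURRENT_DELIMS = r'[\s@.%,;:<>()"]+'
PROPOSED_DELIMS = r'[\s,;:<>()"]+'


def tokenize(text, delims):
    """Split text on runs of delimiter characters and lowercase the tokens."""
    return [t.lower() for t in re.split(delims, text) if t]


header = "To: shop-acme@example.org"
print(tokenize(header, CURRENT_DELIMS))   # address is broken into pieces
print(tokenize(header, PROPOSED_DELIMS))  # address survives as one token
```

With the first set, the address is split into "shop-acme", "example" and "org", none of which identifies the address on its own. With the second set, "shop-acme@example.org" is kept as a single token.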

(But would this allow senders to get around spam filtering by replacing all 
spaces with full-stops in their emails?  Can they do this already by using 
dashes, which are already treated as part of tokens?)
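
The concern can be shown with a small Python sketch. The delimiter sets are again hypothetical and reduced to whitespace and the full stop, purely to isolate the effect being asked about.

```python
import re

# Hypothetical delimiter sets, reduced to the characters under discussion.
DOT_IS_DELIMITER = r"[\s.]+"   # full stop splits tokens
DOT_IS_TOKEN_CHAR = r"\s+"     # full stop is kept inside tokens


def tokenize(text, delims):
    """Split text on runs of delimiter characters and lowercase the tokens."""
    return [t.lower() for t in re.split(delims, text) if t]


dotted = "cheap.pills.online"
dashed = "cheap-pills-online"

print(tokenize(dotted, DOT_IS_DELIMITER))   # three separate words
print(tokenize(dotted, DOT_IS_TOKEN_CHAR))  # one joined token
print(tokenize(dashed, DOT_IS_DELIMITER))   # one joined token either way
```

In this sketch, dash-joined text is already a single token under both sets, and dot-joined text becomes a single token only once the full stop stops being a delimiter.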

Reproducible: Always

Steps to Reproduce:
1. Give out a different email address to each organisation you deal with.
2. Receive email.

Actual Results:  
Email addresses in the headers (including those sold to junk mail senders) did 
not become tokens.

Expected Results:  
Email addresses that were sold to junk mail senders should have become tokens, 
resulting in improved spam detection rates.  Even when the sender does not put 
the address in the To: or Cc: fields, it still appears in headers added by 
ISPs.
The bug is always...
Bogofilter has some logic in its tokenizing code to special-case addresses and
other header information into single tokens without affecting the parsing of
the actual message body. Mozilla should adopt this technique.
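
A rough Python sketch of that general idea follows. It is not Bogofilter's code or Mozilla's; the header list, the "header:address" token format and the body delimiter set are all assumptions chosen for illustration. Addresses from selected headers become single tokens, while the body is still split on the usual delimiters. Addresses buried in other headers, such as those added by ISPs, are not handled here.

```python
import re
from email import message_from_string
from email.utils import getaddresses

# Assumed list of headers whose addresses are kept whole.
ADDRESS_HEADERS = ("from", "to", "cc", "reply-to")
# Assumed body delimiters, including @, . and % as described in this bug.
BODY_DELIMS = r'[\s@.%,;:<>()"]+'


def tokenize_message(raw):
    """Return header-address tokens followed by ordinary body tokens."""
    msg = message_from_string(raw)
    tokens = []
    for name in ADDRESS_HEADERS:
        # getaddresses() parses display names and addresses out of header values.
        for _display, addr in getaddresses(msg.get_all(name, [])):
            if addr:
                # Prefix with the header name so header tokens stay distinct.
                tokens.append(f"{name}:{addr.lower()}")
    body = msg.get_payload()  # a plain string for a non-multipart message
    tokens.extend(t.lower() for t in re.split(BODY_DELIMS, body) if t)
    return tokens


sample = (
    "From: Seller <offers@spam.example>\n"
    "To: shop-acme@example.org\n"
    "Subject: Deal\n"
    "\n"
    "Visit www.spam.example today\n"
)
print(tokenize_message(sample))
```

Prefixing each address with its header name keeps header tokens separate from any identical text in the body, so the body parsing rules do not need to change.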
Status: UNCONFIRMED → NEW
Ever confirmed: true
Product: MailNews → Core
Sorry for the spam.  Making Bugzilla reflect reality, as I'm not working on these bugs.  Filter on FOOBARCHEESE to remove these in bulk.
Assignee: sspitzer → nobody
per Scott, "the implementation could be changed to tokenize mail addresses from mail headers, so i think it's a valid bug / improvement"
QA Contact: laurel → filters
Product: Core → MailNews Core
Severity: enhancement → normal
Blocks: 11035
See Also: → 71413
Severity: normal → S3