Spam filtering should use source IP, reverse DNS info, as data for Bayesian filter



12 years ago
9 years ago


(Reporter: Neil Harris, Unassigned)


(Blocks: 1 bug)

Firefox Tracking Flags

(Not tracked)




12 years ago
Currently, some countries are disproportionate sources of spam, yet most users do not receive any legitimate E-mails from those countries. The most significant octet of an IPv4 address correlates well with the registry that issued it (ARIN, RIPE, APNIC, etc.) and, to a lesser degree, with large ISPs. Thus, the most significant byte (MSB) can be used as a _very approximate_ geocoding hint.
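
As a rough illustration of the idea above, the first octet of a source address could be turned into a token for the filter. This is only a sketch: the function name `first_octet_token` and the `src-octet:` token prefix are made up for this example and are not part of any existing Thunderbird code.

```python
import ipaddress

def first_octet_token(ip_text):
    """Return a token such as 'src-octet:8' for a public IPv4 address.

    Returns None for unparsable input, IPv6 addresses, and private or
    loopback addresses, since those carry no geographic hint.
    The token format is hypothetical.
    """
    try:
        addr = ipaddress.ip_address(ip_text)
    except ValueError:
        return None
    if addr.version != 4 or addr.is_private or addr.is_loopback:
        return None
    # addr.packed is the 4-byte big-endian form; byte 0 is the top octet.
    return "src-octet:%d" % addr.packed[0]
```

The token would then be fed to the Bayesian filter like any word from the message body, so the filter itself learns which octets matter for a given user.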

Similarly, reverse DNS lookup data for source IPs could help discriminate between source ISPs, and even the fact that a reverse lookup was missing or bogus could serve as a signal.
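
A possible shape for the reverse-lookup features, again only as a sketch. Here "bogus" is assumed to mean that the PTR name does not resolve back to the original address; the function names and the `rdns:` token prefix are invented for illustration.

```python
import socket

def rdns_tokens(ip, ptr_name, forward_ips):
    """Turn reverse-DNS results into filter tokens (token names are hypothetical)."""
    if not ptr_name:
        return ["rdns:missing"]
    if ip not in forward_ips:
        # The PTR name does not resolve back to the source IP.
        return ["rdns:bogus"]
    labels = ptr_name.rstrip(".").lower().split(".")
    # The last two and last three labels act as rough ISP/domain hints.
    return ["rdns:" + ".".join(labels[-n:]) for n in (2, 3) if len(labels) >= n]

def lookup(ip):
    """Do the actual DNS queries; returns arguments suitable for rdns_tokens()."""
    try:
        ptr_name = socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return ip, None, []
    try:
        forward_ips = socket.gethostbyname_ex(ptr_name)[2]
    except OSError:
        forward_ips = []
    return ip, ptr_name, forward_ips
```

Usage would be along the lines of `rdns_tokens(*lookup(source_ip))`. Keeping the token logic separate from the network calls means it can be exercised without touching DNS.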

These sorts of imprecise hints are exactly the kind of information that Bayesian filtering is designed to exploit, in the (common) case that certain countries or ISPs are more, or less, likely to be sources of spam or ham, from the viewpoint of a particular user.

(For any of this to work, header analysis will first need to be robust against most common spoofing attacks.)

Since these would be only two features among many, and would in any case depend on Bayesian learning, using this source IP information will not cause whole countries' E-mail to be marked as spam or not spam. Rather, it will only tip the balance in edge cases where the mail is already questionable, and only where geographical / ISP information has proven useful for that user.
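
To make the "tip the balance" point concrete, here is a small sketch that combines per-token spam probabilities naive-Bayes style. It is not Thunderbird's actual scoring code, and all the probabilities used with it are invented for illustration.

```python
import math

def spam_probability(token_probs, prior=0.5):
    """Combine per-token P(spam | token) values by summing log-odds.

    Assumes tokens are independent (the usual naive Bayes simplification)
    and that every probability lies strictly between 0 and 1.
    """
    log_odds = math.log(prior / (1 - prior))
    for p in token_probs:
        log_odds += math.log(p / (1 - p))
    return 1 / (1 + math.exp(-log_odds))

# Made-up numbers: a borderline message and a clearly legitimate one,
# each scored with and without a "spammy source" token learned at 0.8.
borderline = [0.6, 0.45, 0.55]
clearly_ham = [0.05, 0.1, 0.02]
source_token = 0.8

print(spam_probability(borderline), spam_probability(borderline + [source_token]))
print(spam_probability(clearly_ham), spam_probability(clearly_ham + [source_token]))
```

With these invented values the borderline message moves noticeably toward spam once the source token is added, while the clearly legitimate message stays far below any sensible spam threshold.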

Comment 1

12 years ago
Isn't this a task for your ISP's SMTP mail server, not for the mail client? The last few headers will be the ones from your ISP anyway.

Comment 2

11 years ago
Maybe it is, but there might be something Thunderbird can do about it.


10 years ago
Assignee: mscott → nobody


10 years ago
Duplicate of this bug: 419589