Closed
Bug 263397
Opened 20 years ago
Closed 19 years ago
Use number of misspelled words as a criterion in Bayesian filtering.
Categories
(Thunderbird :: General, enhancement)
Tracking
(Not tracked)
RESOLVED
DUPLICATE
of bug 294077
People
(Reporter: dsimcha, Assigned: mscott)
Details
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.3) Gecko/20041007 Firefox/0.10.1
Build Identifier:

I have devised an idea to make Bayesian filtering of junk mail work better: in addition to the existing criteria used to determine the probability of a message being spam, every incoming message should be spell-checked using the existing spell-checker. As messages are marked as junk, the percentage of misspelled words should be recorded, and this should be used as one of the criteria for determining the probability of a message being junk. This would complement the current Bayesian filtering method very well, as one of the most common methods of getting around this filtering is to purposely misspell words, for example 'prof1t' instead of 'profit' or 'remmm oveeee' instead of 'remove'.

Reproducible: Always
Steps to Reproduce:
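The proposal above could be sketched roughly as follows. This is only an illustration, not Thunderbird's code: the function names, the bucket scheme, and the tiny word set standing in for the spell-checker dictionary are all assumptions made for the example. The idea is to compute the fraction of unrecognized words and turn it into a coarse pseudo-token that a Bayesian filter could train on like any other token.

```python
import re


def misspelled_fraction(text, dictionary):
    """Return the fraction of word-like tokens not found in `dictionary`.

    `dictionary` is a plain set of known lowercase words; it stands in for
    a real spell-checker (an assumption for this sketch).
    """
    words = re.findall(r"[A-Za-z0-9']+", text.lower())
    if not words:
        return 0.0
    unknown = sum(1 for word in words if word not in dictionary)
    return unknown / len(words)


def misspelling_token(text, dictionary, bucket_size=0.25):
    """Map the misspelled fraction to a coarse pseudo-token.

    Bucketing (a hypothetical design choice) lets a token-based filter
    learn how often each range of misspelling rates appears in junk mail.
    """
    fraction = misspelled_fraction(text, dictionary)
    last_bucket = int(1 / bucket_size) - 1
    bucket = min(int(fraction / bucket_size), last_bucket)
    return f"misspelled-bucket:{bucket}"


known_words = {"click", "here", "to", "remove", "profit"}
message = "Click here to remmm oveeee"
print(misspelled_fraction(message, known_words))
print(misspelling_token(message, known_words))
```

In this form the misspelling rate is just one more piece of evidence for the filter to weigh, rather than a separate rule with a hand-picked threshold.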
Comment 1•20 years ago
Whilst this is arguably a dup of one or more of the spam filtering bugs that are open, it is more relevant to note that the Bayes in Bayesian filtering already does this. In your example, 'prof1t' identifies spam far more damningly than even 'profit', which I doubt occurs in many of your valid e-mails.
Comment 2•20 years ago (Reporter)
However, in the remove vs. ree mooove example, there are so many ways to misspell the same word that the fact that spammers use multiple misspellings will throw off the Bayesian filters until they have seen almost every feasible combination.
Comment 3•20 years ago
I agree that it looks as though what you propose should work and should be effective, but in practice, a totally statistical approach is provably superior. In your example, the tokens 'ree' and 'mooove', which I agree are likely to be strong markers of spam, are only going to be evaluated if they are within the 15 most interesting tokens in a message, in other words after spammers have started to use just these mis-spellings in preference to others.

The Bayes (training) approach is like using a snow plough to keep a path clear in inclement weather; anticipating mis-spellings is like trying to catch the snow on the way down.

There are links to the Plan for Spam, and much other information, from the Mozilla Bayesian spam page: http://www.mozilla.org/mailnews/spam.html
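To make the "15 most interesting tokens" point concrete, here is a minimal sketch in the style of the Plan for Spam essay. It is not Thunderbird's implementation: the per-token probabilities are made up for the example, the function names are hypothetical, and the 0.4 default for unseen tokens follows the essay's suggestion. It assumes probabilities are already clamped away from 0 and 1 (for instance to the range 0.01 to 0.99).

```python
from math import prod

# Assumed default for tokens the filter has never seen (value suggested
# in the Plan for Spam essay; treat it as a tunable assumption).
UNKNOWN_TOKEN_PROBABILITY = 0.4
MAX_INTERESTING = 15


def most_interesting(tokens, spam_probability, limit=MAX_INTERESTING):
    """Pick the tokens whose spam probability is furthest from neutral 0.5."""
    scored = [
        (abs(spam_probability.get(token, UNKNOWN_TOKEN_PROBABILITY) - 0.5), token)
        for token in set(tokens)
    ]
    scored.sort(reverse=True)
    return [token for _, token in scored[:limit]]


def combined_spam_probability(tokens, spam_probability):
    """Combine the chosen token probabilities with the naive Bayes formula."""
    chosen = most_interesting(tokens, spam_probability)
    probs = [spam_probability.get(t, UNKNOWN_TOKEN_PROBABILITY) for t in chosen]
    spam = prod(probs)
    ham = prod(1 - p for p in probs)
    return spam / (spam + ham)


# Made-up training results, for illustration only.
trained = {"prof1t": 0.99, "meeting": 0.05, "the": 0.5}
print(most_interesting(["prof1t", "the", "meeting", "mooove"], trained, limit=2))
print(combined_spam_probability(["prof1t", "the"], trained))
```

An unseen misspelling such as 'mooove' starts at the neutral-ish default, so it only enters the top 15 once training has pushed its probability towards an extreme, which matches the point made in the comment above.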
Comment 4•19 years ago
This is an automated message, with ID "auto-resolve01".

This bug has had no comments for a long time. Statistically, we have found that bug reports that have not been confirmed by a second user after three months are highly unlikely to be the source of a fix to the code.

While your input is very important to us, our resources are limited and so we are asking for your help in focussing our efforts. If you can still reproduce this problem in the latest version of the product (see below for how to obtain a copy) or, for feature requests, if it's not present in the latest version and you still believe we should implement it, please visit the URL of this bug (given at the top of this mail) and add a comment to that effect, giving more reproduction information if you have it.

If it is not a problem any longer, you need take no action. If this bug is not changed in any way in the next two weeks, it will be automatically resolved. Thank you for your help in this matter.

The latest beta releases can be obtained from:
Firefox: http://www.mozilla.org/projects/firefox/
Thunderbird: http://www.mozilla.org/products/thunderbird/releases/1.5beta1.html
Seamonkey: http://www.mozilla.org/projects/seamonkey/
Comment 5•19 years ago
This bug has been automatically resolved after a period of inactivity (see above comment). If anyone thinks this is incorrect, they should feel free to reopen it.
Status: UNCONFIRMED → RESOLVED
Closed: 19 years ago
Resolution: --- → EXPIRED
Updated•15 years ago
Resolution: EXPIRED → DUPLICATE