Open Bug 326542 Opened 18 years ago Updated 2 months ago

Junk filter should have a displayable "unsure" category

Categories

(Thunderbird :: Folder and Message Lists, enhancement)

enhancement

Tracking

(Not tracked)

People

(Reporter: chitu, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: helpwanted)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1
Build Identifier: Thunderbird 1.5 20051201

What makes SpamBayes the best anti-spam filter in my opinion (and one of the main reasons I continue to use Microsoft Outlook at work) is that, instead of making a black-and-white judgment of spam/not spam, it has a grey category: "unsure". For example, a spam likelihood above 90% is considered "spam", below 15% is considered "ham" (not spam), and anything in between is "unsure". There is a special "unsure" box where such mail is sent, and the user is notified of its arrival. The interface then offers two buttons, "Classify as Spam" and "Classify as Not Spam", which make it easy to train the Bayesian filter to classify such mail more accurately in the future.
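The review workflow described above can be sketched as a toy model. This is not SpamBayes code: the `Mailbox` class, its folder names, and the `classify_as_spam` / `classify_as_not_spam` methods are hypothetical stand-ins for the two buttons. Retraining itself is left to an unspecified trainer that would consume the recorded verdicts.

```python
from dataclasses import dataclass, field


@dataclass
class Mailbox:
    """Toy model of the Inbox / Unsure / Junk boxes (hypothetical names)."""
    folders: dict[str, list[str]] = field(
        default_factory=lambda: {"Inbox": [], "Unsure": [], "Junk": []})
    # User verdicts as (message id, is_spam) pairs, for a hypothetical trainer.
    training: list[tuple[str, bool]] = field(default_factory=list)

    def _resolve(self, msg_id: str, is_spam: bool) -> None:
        # Raises ValueError if the message is not waiting in the Unsure box.
        self.folders["Unsure"].remove(msg_id)
        self.folders["Junk" if is_spam else "Inbox"].append(msg_id)
        # Record the user's decision so the filter can learn from it later.
        self.training.append((msg_id, is_spam))

    def classify_as_spam(self, msg_id: str) -> None:
        """The "Classify as Spam" button."""
        self._resolve(msg_id, True)

    def classify_as_not_spam(self, msg_id: str) -> None:
        """The "Classify as Not Spam" button."""
        self._resolve(msg_id, False)

    def unsure_count(self) -> int:
        """How many messages still need the user's attention."""
        return len(self.folders["Unsure"])
```

Only the Unsure box ever needs the user's attention, and every click there doubles as a labelled training example.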

The result is that with SpamBayes on Outlook, I get zero false positives (non-spam that the filter sends to my spam box). I have never, ever used any other spam filter that gives me zero false positives after just a few days of training. Although SpamBayes' excellent logic is largely responsible, the very simple "unsure" box has a lot to do with it.

An added, very significant benefit of the "unsure" box is psychological: I am not bothered when my spam filter asks me to confirm e-mail it is unsure about. I am slightly annoyed when an occasional spam slips into my inbox (a false negative: the spam filter did not catch the spam), and rather upset when the spam filter calls a legitimate e-mail "spam", risking my missing important e-mails, or only seeing them late, since I don't check my spam box every day. With SpamBayes on Outlook, I only check my "unsure" box when there's mail there for me to confirm (just a few each day), and I only check my confirmed spam box around once a week. Granted, because my company's spam filtering is excellent, I only get two to five spam messages a day, but all the same I have never had a false positive using SpamBayes on Outlook.

My proposal: Modify the Thunderbird spam filter to include an "Unsure" box for user-customizable middle ranges of spam probability, in addition to the existing "Junk" box for very high (user-customizable level) spam probability.

Reproducible: Sometimes

Steps to Reproduce:
1. Wait one day
2. Check your Junk folder
3. Look for false positives (legitimate e-mail classified as spam).

Actual Results:  
There are often false positives, which require you to go through your Junk folder regularly, even daily.

Expected Results:  
Zero false positives. You can check your Junk folder once a week before deleting junk, and yet find no legitimate mail in it.
I know Scott is interested in looking at spam again in time for 2.0
Flags: blocking-thunderbird2?
Related to Suite bug 209898?
(In reply to comment #2)
> Related to Suite bug 209898?
> 

Definitely related. In fact, bug 209898 might be a precursor. However, I wouldn't call it a dependency, since what I'm proposing here could be implemented without a full implementation of bug 209898 first.

The main difference, as I see it, is that bug 209898 proposes a detailed labelling scheme, which I think would be great. Here, however, I am proposing concrete, useful actions that should be taken: definite behaviours of the mail client in response to the $AutoMaybeJunk flag (what I called "unsure" above). Thus, these are two distinct bugs.
I had some more thoughts on possible implementation. A relatively easy implementation would be for the Bayesian filter to simply add two entries to the header of each filtered e-mail message:

X-ThunderBayesianScore: [The score from 0% (certainly not spam) to 100% (definite spam)]
X-ThunderBayesianDecision: AutoNotJunk | AutoMaybeJunk | AutoJunk

The boundaries between the three possible decisions would be user-definable in the Junk management dialog box. For example, by default, AutoNotJunk would apply at <= 15% and AutoJunk at >= 90%; anything > 15% and < 90% would be AutoMaybeJunk.
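A minimal sketch of how a score could be mapped to the two proposed headers, assuming the score is a percentage from 0 to 100 and using the default boundaries above. `JunkThresholds`, `decide`, and `bayesian_headers` are hypothetical names for illustration; this is not Thunderbird code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JunkThresholds:
    """Hypothetical user-configurable cutoffs, in percent."""
    not_junk_max: float = 15.0  # score <= this -> AutoNotJunk
    junk_min: float = 90.0      # score >= this -> AutoJunk

    def __post_init__(self) -> None:
        # Keep the three bands ordered and non-overlapping.
        if not 0.0 <= self.not_junk_max < self.junk_min <= 100.0:
            raise ValueError("need 0 <= not_junk_max < junk_min <= 100")


def decide(score: float, t: JunkThresholds = JunkThresholds()) -> str:
    """Map a 0-100 junk score to one of the three proposed decisions."""
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"score out of range: {score}")
    if score <= t.not_junk_max:
        return "AutoNotJunk"
    if score >= t.junk_min:
        return "AutoJunk"
    return "AutoMaybeJunk"  # strictly between the two cutoffs


def bayesian_headers(score: float,
                     t: JunkThresholds = JunkThresholds()) -> dict[str, str]:
    """Build the two proposed header fields for one message."""
    return {
        "X-ThunderBayesianScore": f"{score:g}%",
        "X-ThunderBayesianDecision": decide(score, t),
    }
```

With the defaults, `bayesian_headers(42.0)` returns `{'X-ThunderBayesianScore': '42%', 'X-ThunderBayesianDecision': 'AutoMaybeJunk'}`. Rejecting a lower cutoff that is not below the upper one keeps every score in exactly one band.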

Although it would be nice for the rest of the functionality to be automatic, what I've just described would be sufficient for the user to manually manage SpamBayes-style spam handling:

With the headers, the user could themselves create a filter that automatically sends mail into either a Junk folder or a MaybeJunk folder depending on the value of the X-ThunderBayesianDecision. Within the MaybeJunk folder, the user could then confirm junk mail with the Junk button. 
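A user-side filter keyed on the proposed header might look like the following sketch. It uses Python's standard `email` parser; the header name comes from the proposal above (no Thunderbird build is assumed to emit it), and the folder names in `FOLDER_FOR_DECISION` are hypothetical.

```python
from email import message_from_string
from email.message import Message

# Hypothetical folder names chosen by the user; not Thunderbird defaults.
FOLDER_FOR_DECISION = {
    "AutoJunk": "Junk",
    "AutoMaybeJunk": "MaybeJunk",
    "AutoNotJunk": "Inbox",
}


def target_folder(msg: Message, default: str = "Inbox") -> str:
    """Pick a folder from the proposed X-ThunderBayesianDecision header."""
    # Header lookup on email.message.Message is case-insensitive.
    decision = msg.get("X-ThunderBayesianDecision")
    if decision is None:
        return default  # unscored mail stays in the default folder
    return FOLDER_FOR_DECISION.get(decision.strip(), default)


raw = "From: a@example.com\nX-ThunderBayesianDecision: AutoMaybeJunk\n\nbody\n"
print(target_folder(message_from_string(raw)))
```

Unknown or missing decisions fall back to the default folder, so a typo in the header never silently sends mail to Junk.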

(At this point, it becomes necessary for Thunderbird to have an "unsorted" category distinct from non-junk: see Bug 209898. But this is at least a start.)

Hopefully, this partial implementation could be done relatively soon, even if the full functionality I described in the original comment might take longer to implement.
I don't think we'll get to this for 2.0, but I am interested in getting hackers to work on the spam filters so this work could happen on the trunk.
Flags: blocking-thunderbird2? → blocking-thunderbird2-
Keywords: helpwanted
Assignee: mscott → nobody
Doesn't this bug belong in the "MailNews Core" product? But under which Component?

I've read somewhere on the mozilla.org NGs that the junk filters already include a Maybe status in addition to Yes and No, but that currently that Maybe status can be neither displayed nor set.

And, contrary to the reporter's experience described in comment #0, I sometimes get false negatives (spam ending up in the Inbox), but it's been years since my last false positive (legitimate mail ending up in Junk). Maybe I'm lucky.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: Bayesian spam filter should have an "unsure" category, like SpamBayes → Junk filter should have a displayable "unsure" category
(In reply to comment #6)
> 
> I've read somewhere on the mozilla.org NGs that the junk filters already
> include a Maybe status in addition to Yes and No, but that currently that Maybe
> status can be neither displayed nor set.

The Bayes filter now internally returns a junk percent, which could be displayed, used for filtering, or used to change an icon. One issue, though, is that this value is not reliably preserved in various scenarios, such as reindexing, moving to IMAP, detaching attachments, etc. I'm trying to fix that. But even after those backend fixes, there does not seem to be much enthusiasm among mailnews leads for improvements of this nature. Still, I hope to provide this functionality in an extension that will be available for TB 3.0.
Component: General → Folder and Message Lists
QA Contact: general → folders-message-lists
Blocks: junktracker
Have a look at the JunQuilla extension. It adds exactly these wanted features.
Severity: normal → S3
See Also: → 562050
See Also: → 180004