Closed Bug 230269 Opened 21 years ago Closed 9 years ago

for a new account messages are all marked as junk (due to untrained bayes junk database/no training.dat)

Categories

(MailNews Core :: Filters, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 250470

People

(Reporter: asa, Unassigned)

Details

(Keywords: regression)

When I set up a new profile and a new POP account, all messages downloaded are
marked as junk. I have reproduced this on the 1.6 branch and the recent trunk on
all of Windows, Linux, and Macintosh.

Steps to reproduce (some or all of this may not be relevant)

1. Create a new profile.
2. Use the mail account wizard to create a new POP account (I specified a
different incoming and SMTP account).
3. Uncheck "download messages" at the final panel of the account wizard.
4. Go to account settings, server settings and specify "leave messages on server".
5. Log in and download inbox messages.

Results: 260 of 260 messages marked as junk (though no training.dat even exists
yet). 
Expected: no marking as junk until I specify some junk.
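
The expected behavior above can be sketched as a guard in front of the classifier. This is a minimal illustration only, not Thunderbird's actual implementation: the `JunkFilter` class, its method names, and the whitespace tokenizer are all hypothetical, and the guard assumes "untrained" means that at least one of the two classes has no examples yet. The point is that an untrained filter returns "unknown" rather than defaulting to junk.

```python
import math
from collections import Counter


class JunkFilter:
    """Toy naive Bayes junk filter (hypothetical; not Thunderbird's code)."""

    def __init__(self):
        self.junk_tokens = Counter()
        self.good_tokens = Counter()
        self.junk_count = 0
        self.good_count = 0

    def train(self, text, is_junk):
        """Record one user-classified message."""
        tokens = text.lower().split()
        if is_junk:
            self.junk_tokens.update(tokens)
            self.junk_count += 1
        else:
            self.good_tokens.update(tokens)
            self.good_count += 1

    def is_trained(self):
        # Assumption: we need at least one example of each class
        # before any score is meaningful.
        return self.junk_count > 0 and self.good_count > 0

    def classify(self, text):
        """Return 'junk', 'good', or 'unknown' if there is no training data."""
        if not self.is_trained():
            return "unknown"
        vocab = len(set(self.junk_tokens) | set(self.good_tokens))
        junk_total = sum(self.junk_tokens.values())
        good_total = sum(self.good_tokens.values())
        total = self.junk_count + self.good_count
        log_junk = math.log(self.junk_count / total)
        log_good = math.log(self.good_count / total)
        for tok in text.lower().split():
            # Laplace (add-one) smoothing so unseen tokens do not zero out a class.
            log_junk += math.log((self.junk_tokens[tok] + 1) / (junk_total + vocab))
            log_good += math.log((self.good_tokens[tok] + 1) / (good_total + vocab))
        return "junk" if log_junk > log_good else "good"
```

With this guard, a freshly created profile would leave all 260 messages unmarked until the user has trained at least one junk and one non-junk message.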
I can't find a bug comment for reference, but I thought this was intended
behavior for visibility of the feature, encouraging users to train the filter...
Product: MailNews → Core
sorry for the spam.  making bugzilla reflect reality as I'm not working on these bugs.  filter on FOOBARCHEESE to remove these in bulk.
Assignee: sspitzer → nobody
Filter on "Nobody_NScomTLD_20080620"
QA Contact: laurel → filters
Product: Core → MailNews Core
prob already on your radar, but in case not
This behaviour is by design, to "encourage" the user to train some messages.
While encouraging users to train messages is clearly a desirable goal, I'm unsure that achieving it this way is really worth the disadvantages:

* it almost forces them to start engaging with the Junk features (and taking at least a bit of time to learn how to use them) rather than allowing them to explore at their leisure

* it makes Thunderbird look dumb to users who don't know why this behavior is there, since chances are pretty high that much of the mail that they're looking at is, in fact, not junk.

I'd be interested in knowing if other clients do this sort of thing.  I'd also be interested in Bryan's thoughts here; adding him to the CC.
(In reply to comment #6)
> While encouraging users to train messages is clearly a desirable goal, I'm
> unsure that achieving it this way is really worth the disadvantages

I tend to agree with you. These days, when most servers have some sort of junk mail controls, the client-level controls are optional. I think we would be better off disabling junk mail controls by default, but also increasing the visibility of what the junk mail controls do when they are enabled. The problem with disabling the controls and then not enabling this always-junk behavior is that users will see absolutely nothing happen when they enable junk mail controls, and will probably assume they do nothing. SOMETHING should be different. At the bare minimum, we need a slightly different icon in the junk column (I use the same two icons used in the READ and UNREAD columns) to indicate a successful classification as NOTJUNK. And we need some way to flag to users that they have not trained enough messages.

Yeah, I think dmose hit it really well.  People want their junk mail to work, but they don't want to work for it.  And like Kent says, the prevalence of server side junk filtering has made client side optional.  We should be working in conjunction with the server side filters.  

One aspect of this would be to help people clear junk mail that gets through into their Inbox.  While we might not be able to train the server side filters, we could at least act like a block filter and move mail to Junk folders.  This requires that the junk controls remain available; however, I think it changes the role of the learning system.

At the same time it might be really interesting to take our white list of addresses and turn it around to start retrieving mail wrongly marked as junk.
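
A rough sketch of that whitelist idea, under stated assumptions: messages are represented as plain dicts with a "from" key, and `rescue_whitelisted` is a hypothetical helper, not an existing Thunderbird function. It simply separates junk-marked messages whose sender appears in the whitelist from the rest.

```python
def rescue_whitelisted(junk_messages, whitelist):
    """Split junk-marked messages into (rescued, still_junk) by sender address.

    Hypothetical helper: each message is assumed to be a dict with a
    "from" key holding a bare email address.
    """
    # Normalize addresses so comparison ignores case and stray whitespace.
    allowed = {addr.strip().lower() for addr in whitelist}
    rescued, still_junk = [], []
    for msg in junk_messages:
        sender = msg["from"].strip().lower()
        if sender in allowed:
            rescued.append(msg)
        else:
            still_junk.append(msg)
    return rescued, still_junk
```

A real implementation would also have to parse display names out of the From header and decide whether rescued messages should be fed back to the filter as non-junk training examples.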
(In reply to comment #8)
> Yeah, I think dmose hit it really well.  People want their junk mail to work,
> but they don't want to work for it.
>
Client-side Bayesian filters will require the user to "work for it". The easy decisions have been made by the server filter, now if the user wants the client side filter to do anything, they are going to need to continually train it. We should not force them to use this feature - but if they are going to use it, they are going to have to "work for it". We can make it as easy as possible, but a Bayes filter is not going to work if it is invisible or hidden.

I still think that the client-side filter has value. For example, I get old email from previous careers which are not considered junk mail traditionally - but they are junk to me. The client-side filter does a good job of learning and ultimately rejecting that kind of stuff.

xref bug 179999...
Summary: POP account messages all marked as junk → for a new POP account messages are all marked as junk (due to untrained junk filter)
Depends on: 179999
Summary: for a new POP account messages are all marked as junk (due to untrained junk filter) → for a new account messages are all marked as junk (due to untrained bayes junk database/no training.dat)
(I meant to dup this to bug 250470)
Status: NEW → RESOLVED
Closed: 9 years ago
No longer depends on: 179999
Resolution: --- → DUPLICATE