"Mark as Junk" using message filters do not train junk-filter



MailNews Core
11 years ago
8 years ago


(Reporter: Ingo Fischer, Unassigned)


11 years ago
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv: Gecko/20070713 Firefox/
Build Identifier: All TB versions

It seems (and I have seen this mentioned in a comment on another bug too) that the junk filter is not trained when you use "Mark as Junk"/"Mark as Not Junk" with message filters.

Why not?

Reproducible: Always

Steps to Reproduce:

Expected Results:  
When I mark a message as "Junk" or "Not Junk" using message filters (I filter from IMAP to local), the junk filter should use this for training so that it becomes better tuned to my emails.

Comment 1

11 years ago
It seems to me that the present behavior is correct.  The adaptive junk filter is a vague set of rules for what is junk, which TB guesses based on which messages you manually mark as junk.  If you have a specific (not vague) criterion that determines certain messages to be junk, that criterion belongs in the filter list, not in the adaptive filter.

Certainly if your request were granted I would want the ability to disable the new behavior.  TB is slow enough as it is without having the junk filter second-guess my explicit filtering rules.

Comment 2

11 years ago
The main reason I want the filter trained is not for messages that are junk. I want it trained for messages that are _not_ junk ...
Do you understand what I mean?

Comment 3

10 years ago
No, I don't. You can already implement a whitelist by creating a filter rule that messages with certain headers (e.g. From: myfriend@example.com) are to be marked as NOT spam; but once you've decided on a filter rule of that sort, why would you want to use it to train the adaptive filter? The filter rule itself does the job.

If I've missed the point, can you provide an example of what you want to see happen?

Comment 4

10 years ago
I understand the value of this. One of the weaknesses of TB spam processing is that the natural tendency is to primarily train false-negative spam messages as junk. The spam processing relies on the ratio between words marked as junk and words marked as non-junk, so in the common scenario of only marking junk, the filter accumulates lots of spam samples, but very few ham samples. Using a filter to record ham could add some good ham samples to the filter.

But really I think the correct solution in the long run is to have an uncertain status, and ask users to mark them as spam or ham. Still, the proposed filter in the current bug has some merit.
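
To illustrate the imbalance described above, here is a minimal Python sketch of a toy token-ratio filter. It is not Thunderbird's actual junk code: the `TokenStats` class, the add-one smoothing, and the sample messages are all assumptions made for illustration. It shows how an ordinary word that appears in both spam and legitimate mail looks "junky" while the filter has seen almost no ham, and how ham samples recorded by a message filter pull its score back down.

```python
from collections import Counter


class TokenStats:
    """Toy per-token junk statistics (hypothetical, not Thunderbird code)."""

    def __init__(self):
        self.junk_tokens = Counter()
        self.good_tokens = Counter()
        self.junk_messages = 0
        self.good_messages = 0

    def train(self, text, is_junk):
        # Count each distinct token once per message.
        tokens = set(text.lower().split())
        if is_junk:
            self.junk_tokens.update(tokens)
            self.junk_messages += 1
        else:
            self.good_tokens.update(tokens)
            self.good_messages += 1

    def junk_probability(self, token):
        # Ratio of smoothed per-class token rates (add-one smoothing is an
        # assumption here, used to avoid division by zero).
        junk_rate = (self.junk_tokens[token] + 1) / (self.junk_messages + 2)
        good_rate = (self.good_tokens[token] + 1) / (self.good_messages + 2)
        return junk_rate / (junk_rate + good_rate)


stats = TokenStats()

# Typical manual training: mostly false-negative spam, hardly any ham.
for spam in ["cheap pills invoice attached",
             "invoice overdue click here",
             "win cheap prize now"]:
    stats.train(spam, is_junk=True)
stats.train("lunch tomorrow", is_junk=False)
before = stats.junk_probability("invoice")

# Ham samples recorded automatically by a "Mark as Not Junk" filter rule.
for ham in ["invoice for project attached",
            "meeting notes and invoice",
            "your invoice from supplier"]:
    stats.train(ham, is_junk=False)
after = stats.junk_probability("invoice")

print(f"junk probability of 'invoice': before={before:.3f}, after={after:.3f}")
```

Whether training on filter-selected ham actually helps a real filter depends on how representative those messages are; the sketch only shows the direction of the effect.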

Comment 5

10 years ago
Kent, I meant it exactly as you understood it :-) With automatically filtered messages that are known to be "good", the spam filter would be trained on the user's particular message content and words.

Thunderbird already has such an "unknown junk status" internally, but it is not normally shown; messages with unknown junk status are displayed as "non-junk" messages. If you install the extension Mnenhy (I think that was the one), mails with unknown junk status show a question mark in the junk column, so you can identify them and manually mark them as junk or not junk.

Comment 6

10 years ago
The "unknown junk status" internally is not some indication of uncertainty, but a flag that the junk filter has not processed the message. I routinely run TB with a small CSS patch that shows this status. However, it is not particularly useful, as it really is just a way to show when the junk processing occasionally skips processing a message for some reason.

Unfortunately the uncertainty value is more difficult to get at, for a couple of reasons. First, the interface for the junk processing does not pass that information. Second, at many places in the code, the "junkscore" variable, which is defined as a value between 0 and 100 to represent that uncertainty, is actually treated as a binary with a value of either 0 or 100. There's quite a bit of coding required to fix that. I made a pass at that issue in my bug 366491, but it was really a massive change that generated no interest and a little resistance, so I abandoned that for now.
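
The difference between the binary and the graded treatment of a 0 to 100 score can be sketched as follows. This is a hedged Python illustration, not the MailNews code: the function names and the thresholds `good_below` and `junk_above` are hypothetical.

```python
def is_junk_binary(junk_score):
    """Binary treatment: only a score of exactly 100 counts as junk."""
    return junk_score == 100


def classify_graded(junk_score, good_below=10, junk_above=90):
    """Graded treatment with an explicit 'uncertain' band.

    The thresholds are illustrative assumptions, not Thunderbird defaults.
    """
    if not 0 <= junk_score <= 100:
        raise ValueError("junk_score must be between 0 and 100")
    if junk_score >= junk_above:
        return "junk"
    if junk_score <= good_below:
        return "good"
    return "uncertain"


for score in (0, 55, 100):
    print(score, is_junk_binary(score), classify_graded(score))
```

Under the binary treatment an intermediate score such as 55 is indistinguishable from 0, so the information needed to ask the user about uncertain messages is lost.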


Comment 7

10 years ago
Duplicate of this bug: 429923

Comment 8

9 years ago
Kent, can you update this with the current state of the art?
I can imagine that some people would prefer this stay as is, i.e. there is value in it having been (presumably) designed to work this way.
Severity: minor → enhancement
Component: General → Filters
Product: Thunderbird → MailNews Core
QA Contact: general → filters

Comment 9

9 years ago
(In reply to comment #8)
> Kent, can you update this with the current state of the art?
> I can imagine that some people would prefer this stay as is, i.e. there is
> value in it having been (presumably) designed to work this way.

Train as junk/good has been added as a custom filter action in JunQuilla, so the request of this bug can now be met in an extension, at least in TB3 which has the necessary backend changes to support custom filter actions.

That being said, I'm not ready to recommend a strategy of training certain well-known message classes automatically. I would need to do some more testing to really get at the best strategy.

Comment 10

8 years ago
From my point of view, the use of an extension like JunQuilla in TB3 that allows this training in filters is great. I would resolve this bug as "WONTFIX" ;-) I hope this is correct ...
Last Resolved: 8 years ago
Resolution: --- → WONTFIX