Closed Bug 198100 Opened 21 years ago Closed 15 years ago

Option to change precedences (order) of junk control and normal filters

Categories

(MailNews Core :: Filters, enhancement)

enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 3.0b4

People

(Reporter: straxus, Assigned: rkent)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

Attachments

(1 file, 6 obsolete files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.3) Gecko/20030312
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.3) Gecko/20030312

I've got my Junk mail filter trained enough that it is starting to recognize
that certain mail coming in is junk, however I've discovered a problem with the
way the filters interact. I have filters on each of my email accounts that
forward all emails to a folder in Local Folders called "Composite Inbox". I have
also set up filters on each of my email accounts that send mail marked as junk
to the Junk folder in Local Folders.

Now the problem begins. It seems that the filter to send all email to the
Composite Inbox runs before the junk mail filter, and that the junk mail filter
is not intelligent enough to move emails from my Composite Inbox to the Junk
folder. I also discovered that it seems impossible to create filters for
anything under Local Folders, otherwise I'd try to manually move these emails
based on their junk status. There also does not appear to be a way to defer the
running of my inbox filters until after the junk mail filter has been run on
incoming emails, or to add Local Folders to the filtering process.

My suggested fixes for this issue are:

1. Implement mail filters on local folders.
2. Allow ordering of normal filters versus junk mail filters so that an end user
can decide to filter junk mail first, and then run the rest through a standard
user-defined filter.

Reproducible: Always

Steps to Reproduce:
1. Create filter on an account inbox that forwards all email to a given folder
in Local Folders (my filter is "all emails after Jan 1st, 1969").
2. Train junk controls to recognize emails and set option to automatically move
mail to trash.
3. Send a known junk email to the account. 
Actual Results:  
The email is flagged properly as junk, but was not be deleted and appears in the
given Local Folders folder.


Expected Results:  
Sent the email to the selected trash folder.
Are there any reasons that Junk Mail handling could not be integrated into the
Message Filtering system?
*** Bug 198961 has been marked as a duplicate of this bug. ***
*** Bug 198511 has been marked as a duplicate of this bug. ***
related: bug 196964, bug 198100, bug 196964, bug 197149, bug 198961
Severity: normal → enhancement
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: Normal filters, junk filters, and local folders do not interoperate properly → Option to change precedences (order) of junk control and normal filters
mass re-assign.
Assignee: naving → sspitzer
only one issue per bug please, this bug can be:
2. Allow ordering of normal filters versus junk mail filters so that an end user
can decide to filter junk mail first, and then run the rest through a standard
user-defined filter.

and file a new bug for:
1. Implement mail filters on local folders.
*** Bug 219024 has been marked as a duplicate of this bug. ***
in fact there's already a bug file for issue #1: 
Bug 177093: Ability to create filters for Local Folders
Junk controls should only operate on the 'Inbox' -- *AFTER* incoming messages
have been filtered.  If mail is placed in a folder by a filter I created, why
would I want Mozilla to determine whether it is junk?  This is where the most
false positives occur.  I hate when I open a folder and messages vanish before
my eyes.  If this will not be the default behavior, it should certainly be an
option that can be set by the user.
re comment 9: I get a lot of spam on mailing lists - just to point out that junk
processing is sometimes needed after filters.
Steve Brown makes a valid comment <a href="#c9">#9</a>. Another benefit of
limiting the junk filtering in the way he suggests is that there is no need to
download the message bodies of the messages that have been moved into other
folders. This mitigates the apparent sluggishness that the junk filtering
introduces (even on my broadband line!)

I see three alternative ways this might be implemented, in order of complexity: 
1. junk filtering is only applied to the Inbox, after all other rules.
2. junk filtering is applied to any one nominated folder, after all other rules.
3. each folder has a boolean property to include or exclude it in the junk
filtering, which happens after all other rules.
*** Bug 221836 has been marked as a duplicate of this bug. ***
I have a large number of filters for my INBOX.  Some of them should be run
before the junk mail filters and others after it.  I would like to be able to
specify this using the Message Filter UI.

Something like:

This would run all the filters before spam scanning, then only scan the messages
left in the INBOX for spam.

Filter A
Filter B
Filter C
Filter D
Junk Mail

This would scan all incoming messages for spam and any that were not classified
as spam would have their filters applied:

Junk Mail
Filter A
Filter B
Filter C
Filter D

This would run filter A and filter B, followed by the junk mail filter on any
messages that weren't run by A or B, followed by C and D.

Filter A
Filter B
Junk Mail
Filter C
Filter D
An alternative would be to specify separately the step to *evaluate* junk status
from the step to *filter* junk mail. that way you could for example have
messages that are on a mailing list but junk, still arrive in the mailing list
folder, but be marked as junk.
Maybe the simplest would be to run the junk mail evaluation on every message as
soon as it is received, and to allow the user to specify the junk mail filtering
as in comment 13
I have an account that is a container for n forwarders. The emails arrives with
different "To:" fields.  I've created n filter to separate the emails looking at
the "To:" field.
unfortunately the junk filter is evaluated after the my filters and there are no
more emails to be evaluated by the junk filter because all the email have
already been filtered in different folders. (so comment 9 is invalid for my case)
with the ability to configure the order in wich junk filter operates, I will be
able to set it before my filters so I will not be flooded with spam in the
sub-folder splitted by my filters. (as suggested in comment 13)

I think that we definitely need the possiblity to choose the order in wich steps
occur.

bye
*** Bug 210252 has been marked as a duplicate of this bug. ***
OS: Windows 2000 → All
Hardware: PC → All
*** Bug 229531 has been marked as a duplicate of this bug. ***
Much of this bug may be a result of bug 219975 -- if the folder (other than a 
regular Inbox) is being viewed as the messages arrive, the junk mail is not 
moved.  Normally, opening any folder to which junk has been filtered will cause 
that mail to be moved to the Junk folder (assuming JMC has been configured to do 
so).  So, if the Composite Inbox of the original report is being viewed as the 
mail arrives (quite likely), nothing happens.

My understanding, which may be wrong, is that Mozilla actually does operate as 
postulated in comment 14: the mail is classified as junk as it arrives, before 
filtering but not moved until after all filters are executed (or until the 
destination folder is opened, for targeted mail).  It might be useful to specify 
the order of that automatic-move operation (on the Inbox only, I'm afraid) 
relative to filter execution; but I think what would be much more useful is 
being able to set/clear junk status on a message in a filter (bug 181631), and 
being able to use the junk status as a criterion in the filter.

See bug 202605.

Steve Brown's comment 9 is bug 198961.
Blocks: 202605
Depends on: 223591
(In reply to comment #18)
> My understanding, which may be wrong,

It is wrong. Speaking with the <exaggeration>supreme experience and
authority</exaggeration> of someone who has fixed 193625 (sr pending), I'll have
you know that messages are not currently processed by the bayesian filter until
you open the folder into which they have been filtered.

Anyway, here's my idea for how the filtering should work. To each message, Upon
arrival, will be applied a 'binary tree' of filters. Each tree node has a check,
a 'passed the check' action and a 'not passed the check':

The check is a boolean combination of basic checks (i.e. an OR of ANDS or an AND
of ORs); basic checks are the ones we have in the present filter dialog (e.g.
"does the sender field contain the word shazzaam?", as well as the bayesian
filter conclusion about whether the message is junk or not.

Each one of the two actions are one of the list of actions we have today in the
filter dialog at present, but besides the action itself, the message will be
subjected to the subtree whose root is the 'check passed' (left) or 'check not
passed' (right) child of the current node.

After passing through the tree and being subjected to different checks and
actions, a message will eventually reach a leaf node, and be left alone
('eventually' may be after just 1 or 2 checks, of course). Nothing more will be
done with it, specifically, nothing will be done with it at the moment the user
opens the folder in which it ends up.

Such a scheme has 4 benefits I can think of

1. The user will not need to wait until he/she opens a folder to know whether
indeed there are any new messages there (but this advantage is not exclusive to
my proposal)
2. User has full control over order of all checks (again, not exclusive to this
scheme)
3. Messages which do not reach tree nodes with a check of their
bayesian-filter-conculded junk status will not be passed to it for
classification at all
4. The number of checks which a message which a message will have to undergo
will be brought closer to a minimum (no more need to apply checks which are only
relevant to some kinds of messages to all unknown messages, and no more need to
repeat parts of checks in subsequent checks, like we have now with the series of
filter rules).

Now the question is whether I should post this to the other related bugs...
(In reply to comment #19)
> (In reply to comment #18)
> > My understanding, which may be wrong,
> 
> It is wrong. 

Heh.  I had figured that out, actually.  That's why I made bug 223591 a blocker 
for this bug.
Product: MailNews → Core
*** Bug 276355 has been marked as a duplicate of this bug. ***
*** Bug 322352 has been marked as a duplicate of this bug. ***
sorry for the spam.  making bugzilla reflect reality as I'm not working on these bugs.  filter on FOOBARCHEESE to remove these in bulk.
Assignee: sspitzer → nobody
I'd love to see this issue resolved!  I'm using TB 2.0.0.9 and use filtering extensively. It gets really tiresome having to run junk mail controls manually on all my folders.
Excuse the double-post. I upgraded to 2.0.0.12 today and the issue is still
present.
Filter on "Nobody_NScomTLD_20080620"
QA Contact: laurel → filters
Product: Core → MailNews Core
I'm going to put my name on this for now just to increase its visibility on my radar screen. With the consolidations that have been done to the bayes filter recently, it should not be a big leap to add a hook into the end-of-batch processing of the bayes filter that calls the after-the-fact filter code when bayes processing is complete.
Assignee: nobody → kent
Boris, you asked me to add you to this bug's CC list.  It's 6+ years old.
Attached patch WIP, though mostly done (obsolete) — Splinter Review
The patch that I submitted, and an upcoming completed one, are really about adding a new context PostBayes "After junk analysis" to the existing contexts of Incoming and Manual in email filters. But there are some sticky issues with the user interface for the filter editor that I am not completely sure of the correct answer for.

The main issue is this. When you take an existing filter and change its context, the allowed search attributes may also change. For example, Junk Status is not allowed in Incoming, but will be allowed PostBayes (and should be allowed in Manual but currently is not). So what do you do if the user changes the filter context while editing a filter (say from "Junk Analysis Complete" to "Checking Mail") and now has a disallowed search attribute in the filter being edited?

I faced a similar issue in adding custom filter actions, because in principle it was now possible for the change in a filter context to affect the permitted actions in a message filter. The solution that I adopted there was to leave the invalid filter action showing in the filter list (even though it would not in fact work) but remove it from the popup that allows you to add actions. This is not wholly desireable, but was a case that only affected extension writers, in rare cases where a filter action was valid in one context but not another. (In my FiltaQuilla extension, the only filter action this affected was Copy As Unread, which is only valid in the manual context.)

There is s similar issue with offline and online IMAP folder status. If you switch to offline, then go to search, you will see that new options are available for IMAP folder search criteria, such as Subject Beginswith. You can save that as a search folder, select "Search online" and it will happily save it, even though that search criteria is not valid in the online context.

I mention these cases to make the point that, in the past, we have largely punted the question of changes in valid search criteria when conditions change, allowing invalid options to be silently used. Or in the case of the Manual filter context, we don't allow certain options that would otherwise be valid.

But the issue with search attributes and the PostBayes context is much more severe, since the junk-related search criteria are all invalid in Incoming contexts. The solution that I have implemented, which I am not too happy about, is this: When you change filter contexts (for example, from Manual "Manually Applied" to Incoming "Checking Mail") and the filter has a now-invalid search criteria like JunkStatus, the UI will throw up a warning dialog that says something like "Search criteria have been removed that are not valid in the new context", and the criteria will be automatically removed from the filter. If you Cancel the save of the filter, then the context will not be changed, and the old search attributes will not be removed. The user will be annoyed because his filter was modified during editing, but at least he can cancel further action (though only after the warning dialog).

I want to face the issue here, precisely because I want to allow important additional filter search criteria that are only valid in certain contexts. I want to check for invalid contexts and automatically remove them, just showing a warning dialog. It's ugly, but it is both safe, plus allows me to enable certian important filter actions for the first time.
I was never enthusiastic about the approach of automatically deleting invalid terms, so instead I changed things so that invalid terms still show in the UI, but the label is grayed out. It does not allow you to save filters with invalid terms. I think this is the best choice given the fact that not all search terms are valid in all contexts.

This patch (and any search/filter widget patch) is a bear to test. Some issues to look at:

1) Custom headers create their own complications
2) Test three cases: new filters, new filters that are modified before being saved, and edited filters that are modified.
3) Make sure that cancelling the save of an invalid filter leaves the old filter.
4) Most of the UI issues are associated with making sure that invalid terms truly appear as invalid, and do not cause other issues (such as blank search term fields).

Part of the work of this was to permit search of IMAP bodies in the PostPlugin filter context. This is not the ideal place to do this, and forcing tokenization to get the message locally is a little tacky. Bienvenu and I discussed allowing body filters in the normal Incoming context, by scheduling a message download when the filter was enabled. But for now, body filters seem to work in the PostPlugin context, and so I think we should allow them, even though it is ugly.

It would really be good to get some feedback on this in beta3, as this is adding a new functionality that might have unforeseen consequences.
Attachment #378090 - Attachment is obsolete: true
Attachment #380509 - Flags: ui-review?(clarkbw)
Attachment #380509 - Flags: superreview?(bienvenu)
Attachment #380509 - Flags: review?(neil)
Flags: wanted-thunderbird3?
Target Milestone: --- → Thunderbird 3.0b3
I noticed today that for POP3 messages that are being sent to the global "Local Folders" inbox, the PostPlugin filter must be defined on the "Local Folders" server, and not on the incoming server. For POP3, we are applying filter lists for both the "Local Folders" server, as well as for the Incoming POP3 server. I suppose that we need to do the same thing here as well - which will require adding a parameter to CallFilterPlugins that can pass the deferred server in POP3. I propose to do that though in a followup bug.

Also, I tested the Status Is New search term, and this now works in the PostPlugin context, even though it does not work in the Incoming context (bug 272963).
Whiteboard: [waiting review neil]
I'm having some second thoughts about the concept of adding a user-visible "postPlugin" filter context in the UI. Perhaps it would be better to just let the code examine all of the search terms in a filter (or filterList), and then automatically determine if that filter (or filterList) can be applied early or late (that is, before or after the filter plugin). I've also talked about allowing filters to be applied after gloda indexing, which would add yet another context, and that would really be too many decisions to put before the poor user.
Status: NEW → ASSIGNED
Blocks: 389022
Comment on attachment 380509 [details] [diff] [review]
Disable label for invalid search terms

I'm removing the review request from this bug, because I intend to modify it according to comment 33. Also, I want review resources to concentrate on bug 495519, which has a lot of overlap in the js.
Attachment #380509 - Flags: ui-review?(clarkbw)
Attachment #380509 - Flags: superreview?(bienvenu)
Attachment #380509 - Flags: review?(neil)
Whiteboard: [waiting review neil] → [has patch, need to redo]
Having re-read the comments on the related bug 196036 and bug 223591, and continuing this conversation I seem to be having with myself (where are the 28 voters and 34 cc'ers?) I'm having second second thoughts about my comment 33. If your email downloads take some time, either due to a slow connection or lots of emails, then you may want to do an initial quick filtering (say to move a message to a maillist folder) and still be able to apply a junk-related filter to it after that move. My proposal in comment 33 would not allow that. So I am leaning back toward having a visible post-plugin filter context (which is after all the "option" requested originally in this bug.)

Still, I have now modified bug 495519 and submitted it for review, which will bitrot this bug when that bug lands. The critical changes to the filter UI that I originally wrote for this bug are in 495519 as well, so I will make a final decision about the postplugin context when that bug lands.
Whiteboard: [has patch, need to redo] → [has patch] [waiting for 495519]
Target Milestone: Thunderbird 3.0b3 → Thunderbird 3.0b4
Whiteboard: [has patch] [waiting for 495519] → [patch needs updating]
Given that I have made 7 posts without receiving any comments on this bug, I don't have high expectations, but let me offer opportunity for comment on an issue anyway.

How do I specify to use the filter after junk analysis?

Currently, the filter editor allows the specification of three contexts or combinations: "Checking Mail", "Manually Run", or "Checking Mail or Manually Run". The logical place to add this new post-junk-analysis option is on that menu.

Issue#1: what do I call it? With the new extensions to the bayes filter, junk is only one possible message characteristic that can be determined by bayes analysis. So I am tempted to use "after analysis" to qualify the context, thus I would add an option "Checking Mail (after analysis)". But while that is technically correct, my misgiving is that would confuse users in the most common use case of analysis for junk mail.

Issue#2: do I also add the combination of checking mail and manually? I think that I need to, but then the message gets quite long: "Checking Mail (after analysis) or Manually Run". That is getting quite long. I seems like some sort of multi value list would be a better choice here, but it was decided originally not to do that, and I am reluctant to reopen that discussion.

Anyway, in spite of my misgivings, those are my current intentions.
Patch and unit tests, both a new one and existing, work with no known issues. I just need a final check for issues prior to review. I'd really like to get this landed several weeks before the TB3.0 beta4 code freeze.

Two minor points:

1) In adding changes to CallFilterPlugins, I noticed that parallel code for traits analysis did not work right (it repeatedly called the trait analysis for the same message), so I fixed it.

2) This patch has a subtle dependence on bug 127250, which I also hope to land prior to beta 4, in that I assume that IMAP message servers that download messages for offline use can also use body filters.
Attachment #380509 - Attachment is obsolete: true
In my experience (FWIW) it's never wrong to explicitly describe options for users, and even then some will misunderstand. But would Checking Mail (after classification) be enough to include junk?
I think your "after classification" would be better understood than my "after analysis" - though it is more letters. But I found long text in the dropdown menu less annoying than I had first expected, so I'll seriously consider switching to your wording.
"classification" sounds good because the message has been analyzed and a decision has already been taken ("junk" or "no junk") which is a classification ...
Attached patch tweaked, now ready (obsolete) — Splinter Review
Let's get this reviewed. I would really like to get this checked in some weeks before TB 3.0 beta4 if possible to avoid last-minute regressions.

Some issues:

1) Bayes processing sees the deferredTo folder (the local folder). That means that the postplugin filters will use the filterlist from the local server, but not the original deferred server. This is different from the behavior of normal filters, which use the filterlist from both servers. I have disabled the option for postplugin filters in the UI for deferred servers to show this to the user. This issue is in principle fixable, and I think it should be fixed, but it requires passing around the deferred server to lots of places, which would increase the number of files in this patch. I think it better to resolve this in a followup bug.

2. I have introduced three new processing flags in this bug. I also do not clear these always, which means that the keyset for them will persist for the life of the application. This is just an array of integers, one per new message, so I don't think this is a big memory problem.
Attachment #392886 - Attachment is obsolete: true
Attachment #393372 - Flags: superreview?(bienvenu)
Attachment #393372 - Flags: review?(neil)
Whiteboard: [patch needs updating] → [needs r neil, sr bienvenu]
Comment on attachment 393372 [details] [diff] [review]
tweaked, now ready

>+      let postPluginDisabled = false;
>+      if (server.rootFolder !== server.rootMsgFolder)
>+        postPluginDisabled = true;
You can write this as
let postPluginDisabled = server.rootFolder != server.rootMsgFolder;
[only one = necessary, no worries about type conversion]

>+   * contextIndex = 0: checking mail
>+   *              = 1: manually run
>+   *              = 2: checking mail or manually run
>+   *              = 3: post analysis
>+   *              = 4: post analysis or manually run
Bah, if only we didn't have separate inbox and news constants we could set the appropriate filter type as the value attribute on the menuitem e.g.
value="1" label="Checking Mail"
value="16" label="Manually Run"
value="17" label="Checking Mail or Manually Run"
value="32" label="Checking Mail (after classification)"
value="33" label="Checking Mail (after classification) or Manually Run"

>+  let Cif = Components.interfaces.nsMsgFilterType;
Nit: Prefer naming this nsMsgFilterType

>-  let filterType = Components.interfaces.nsMsgFilterType.None; // that is, 0
>+  let filterType = 0;
Why this change?

>   if (aFilterType & Ci.nsMsgFilterType.Incoming)
>     return aServerFilterScope;
> 
>-  if (aFilterType & Ci.nsMsgFilterType.Manual)
>+  else // Manual or PostPlugin
Don't use else after a return...

>+  const long PostPlugin       = 0x20; // After bayes filtering
>   const long All              = Incoming | Manual;
Hmm, I wonder where (and why) we use All...

>+      // want to propogate that value.)
propagate
(In reply to comment #42)
> 
> >-  let filterType = Components.interfaces.nsMsgFilterType.None; // that is, 0
> >+  let filterType = 0;
> Why this change?
> 

We want to start with all bits cleared so that we can accumulate all of the enabled message types, which has little to do really with some concept of "No filter type". If this was *really* some sort of default type or a magic number, then in theory we could change it to, say, -1 and the code would still work, but that is not true. It needs to be zero and not some default that means "no filter type" so I decided to make it clear by just setting it to zero. To check the code, you have to convince yourself that
Components.interfaces.nsMsgFilterType.None == 0, but why add that extra burden?

Your other issues I will fix when I redo the patch after initial review.
Comment on attachment 393372 [details] [diff] [review]
tweaked, now ready

git-apply complained of trailing whitespace on 6 lines. r=me with that and my other nits fixed.
Attachment #393372 - Flags: review?(neil) → review+
(In reply to comment #42)
> (From update of attachment 393372 [details] [diff] [review])
> >   const long All              = Incoming | Manual;
> Hmm, I wonder where (and why) we use All...
Actually I don't think we do ;-)
Attached patch Fixed Neil's nits (obsolete) — Splinter Review
Carrying over Neil's r
Attachment #393372 - Attachment is obsolete: true
Attachment #394990 - Flags: superreview?(bienvenu)
Attachment #394990 - Flags: review+
Attachment #393372 - Flags: superreview?(bienvenu)
Attachment #394990 - Attachment is obsolete: true
Attachment #394991 - Flags: superreview?(bienvenu)
Attachment #394991 - Flags: review+
Attachment #394990 - Flags: superreview?(bienvenu)
Attachment #394991 - Flags: superreview?(bienvenu) → superreview+
Comment on attachment 394991 [details] [diff] [review]
bah, left a note to myself and associated error (fixed now).

looks good - if I'm reading the diff correctly, we don't really need braces here, right?

+  if (contextIndex == 1 || contextIndex == 2 || contextIndex == 4) // manual
   {
-    filterType = Components.interfaces.nsMsgFilterType.Manual;
+    filterType |= Components.interfaces.nsMsgFilterType.Manual;
   }
-  else // 0 or 2, checking mail is enabled
+  if (contextIndex == 3 || contextIndex == 4) // post analysis
+  {
+    filterType |= Components.interfaces.nsMsgFilterType.PostPlugin;
+  }
Carrying over reviews.
Attachment #394991 - Attachment is obsolete: true
Attachment #395143 - Flags: superreview+
Attachment #395143 - Flags: review+
Keywords: checkin-needed
Whiteboard: [needs r neil, sr bienvenu] → [needs checkin]
Checked in: http://hg.mozilla.org/comm-central/rev/fc0bce19c346
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Flags: wanted-thunderbird3?
Flags: wanted-thunderbird3+
Flags: in-testsuite+
Keywords: checkin-needed
Resolution: --- → FIXED
Whiteboard: [needs checkin]
You need to log in before you can comment on or make changes to this bug.