Junk classification should execute on each message as it's received, not after all have been fetched.

Status: NEW (Unassigned)
Product/Component: MailNews Core / Filters
Type: enhancement
Opened: 14 years ago; Updated: 8 years ago
People: Reporter: lev; Assignee: Unassigned
Tracking: Blocks: 1 bug
Attachments: 1 attachment, 2 obsolete attachments

(Reporter)

Description

14 years ago
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.5a) Gecko/20030728 Mozilla Firebird/0.6.1
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.5a) Gecko/20030728 Mozilla Firebird/0.6.1

When downloading a large number of messages in Thunderbird (assuming it's the
same for MailNews), one has to wait for all the messages to be downloaded before
they are filtered. If there are a few dozen messages on the server, the user may
want to start reading the first few before the last have finished being fetched.
However, these messages have not yet been filtered (especially by Junk filters),
so the user will not know which messages are Junk. Also, remote images will show
up in spam messages. (The fact that valid messages have not been filtered to
their destination folders is also an inconvenience, though a lesser one)

With the current setup, the user has to wait for all mail to be downloaded and
filtered before they can effectively work with it - this can take a long time on
a dialup connection.

Additionally, the current behavior causes messages to "jump" around and change.
What initially looks like a regular message in the Inbox may end up as a junk
message in the Junk folder, and depending on how much mail has to be fetched,
the delay before that happens may be significant.

I believe each message should only become "visible" after having all the filters
applied to it.

Reproducible: Always

Steps to Reproduce:
1. Check an account with a lot of (junk and non-junk) new messages
Actual Results:  
All messages appear in Inbox first, and are not spam-filtered. Once downloading
of mail is finished, filters run, marking some messages as spam, and moving
others to different folders based on filter rules.

Expected Results:  
As each message is retrieved, MailNews should apply all filters (including Junk
filters) to it, and process it accordingly. Only then should the message appear
in the message list. It should not show up before having filters applied to it.
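The expected behaviour amounts to a per-message pipeline: classify, run the filters, and only then make the message visible in a folder. The sketch below is a hypothetical Python model, not MailNews code; `Message`, `Filter`, `classify` (a keyword test standing in for the Bayesian classifier) and the folder dictionary are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Message:
    subject: str
    body: str
    junk: Optional[bool] = None  # None means "not classified yet"


@dataclass
class Filter:
    # Hypothetical filter: a condition plus a destination folder.
    matches: Callable[[Message], bool]
    move_to: str


def classify(msg: Message) -> bool:
    # Placeholder for the Bayesian classifier (assumption: a keyword test).
    return "free money" in msg.body.lower()


def deliver(msg: Message, filters: List[Filter],
            folders: Dict[str, List[Message]]) -> str:
    """Classify and filter one message, and only then make it visible."""
    msg.junk = classify(msg)
    dest = "Inbox"
    for f in filters:
        if f.matches(msg):
            dest = f.move_to
            break  # simplification: the first matching move wins
    folders.setdefault(dest, []).append(msg)  # the message appears here, once
    return dest


# Messages are handled one at a time, as each finishes downloading.
folders: Dict[str, List[Message]] = {}
filters = [Filter(lambda m: bool(m.junk), "Junk")]
for m in [Message("Lunch", "Are we still on?"), Message("Offer", "FREE MONEY inside")]:
    deliver(m, filters, folders)
```

Because `deliver` appends to a folder only at the very end, a message never shows up in the Inbox first and then jumps to Junk.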

Comment 1

14 years ago
In fact, filters DO run on each message as they arrive.  I have a filter moving 
bugzilla mail to its own folder, and I have often watched as a bunch of messages 
arrive and the Unread count on that folder increments while the mail is 
downloading.

Junk-analysis is, I believe, also determined as the mail arrives.  The one thing 
that is not executed as the mail arrives is the automatic Move to Junk action; 
that is delayed until the mail is completely received (for Inbox mail) or, for 
mail that's been filtered into another folder, until that folder is opened.  
(But see bug 219975.)

See bug 198100.
(Reporter)

Comment 2

14 years ago
Hmmm... This is not the behavior I am seeing in Thunderbird 0.4 on XP... 

The client first downloads all the messages to my Inbox where they appear one by
one without Junk flags, and after the last message is done downloading, "Junk"
icons appear one by one next to some messages, starting from the oldest.

Comment 3

14 years ago
Deferring to comment 2; Lev has more info than I do. (I don't get much spam 
making it as far as Mozilla.)

Here is my suggestion: turn this bug into a request to perform junk 
classification (but not automatic movement) during download, before filters are 
executed; and it should block both:
 bug 198100 -- in order to sequence the automatic-movement part of JMC among
      the filters, messages need to be classified first; and
 bug 196036 -- you can't filter your messages on junk status if the junk status
      hasn't been determined yet.
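The ordering argument can be shown with a toy model: a filter whose condition tests junk status can only fire if classification has already happened when the filter runs. In this hypothetical sketch, `process`, the `classify_first` switch and the keyword classifier are assumptions used purely to contrast the current order with the proposed one.

```python
from typing import Callable, List, Optional, Tuple

# A filter condition sees the junk status, which may still be unknown (None).
Condition = Callable[[Optional[bool]], bool]


def process(body: str, filters: List[Tuple[str, Condition]],
            classify: Callable[[str], bool], classify_first: bool) -> List[str]:
    """Return the names of the filters that fired for one incoming message."""
    junk: Optional[bool] = None
    if classify_first:
        junk = classify(body)          # proposed order: classify during download
    fired = [name for name, cond in filters if cond(junk)]
    if not classify_first:
        junk = classify(body)          # current order: result arrives after filtering
    return fired


def spammy(body: str) -> bool:
    # Stand-in for the Bayesian classifier (assumption: a keyword test).
    return "prize" in body


filters = [("Mark junk as read", lambda junk: junk is True)]
old = process("claim your prize", filters, spammy, classify_first=False)
new = process("claim your prize", filters, spammy, classify_first=True)
```

With the current order the junk-based filter sees an unknown status and never fires; with classification first it does.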

If that's OK with you, Lev, I will confirm this bug.
(Reporter)

Comment 4

14 years ago
Sounds good to me.

Updated

14 years ago
Blocks: 196036, 198100
Severity: normal → enhancement
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Windows XP → All
Hardware: PC → All
Summary: Filters (including Junk) should execute on each message as it's received, not after all have been fetched. → Junk classification should execute on each message as it's received, not after all have been fetched.

Comment 5

14 years ago
*** Bug 230127 has been marked as a duplicate of this bug. ***

Comment 6

14 years ago
*** Bug 236646 has been marked as a duplicate of this bug. ***

Comment 7

14 years ago
Additional note on why this feature is important, since I know developers don't
tend to use dialup: I tend to get about 300-400 messages per day (about 99% of
which are junk), which can take up to half an hour to download. Having to wait
for the download to complete before reading mail is quite a nuisance.

Comment 8

14 years ago
*** Bug 240830 has been marked as a duplicate of this bug. ***
Product: MailNews → Core

Comment 9

13 years ago
There are quite a few other junk-related bugs that can be discarded when this and
bug 196036 are fixed. If you can set Junk status as soon as a message arrives,
and you can have user filters trigger on the Junk status, then you can pretty
much remove all of the Actions of the Junk Mail Controls, because they are all
possible using regular user filter Actions. I think that would be a wise move.

Also you then don't have to do anything to fix bug 198100 - the user can just
insert a filter triggering on Junk status anywhere they want in their filter
list.  Likewise, bug 198961 disappears - they just have to put their other Move
filter in front of the Junk move filter, and the problem is gone.

And bug 200788 disappears, because the Junk move filter will just be a regular
filter, it won't have some delayed effect like it does now.

I think it would be extremely worthwhile to fix this and Bug 196036 and cut the
fat out of the Junk Mail controls.
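If the junk move is just an ordinary filter, precedence between it and other move filters is decided by list order, which is how the bug 198961 case goes away. This is a minimal sketch under a stated simplification: it assumes the first matching move ends filtering for that message. `Msg`, `MoveFilter` and `destination` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Msg:
    sender: str
    junk: bool


@dataclass
class MoveFilter:
    name: str
    matches: Callable[[Msg], bool]
    folder: str


def destination(msg: Msg, filters: List[MoveFilter], default: str = "Inbox") -> str:
    # Filters run in list order; the first matching move decides the folder,
    # so the user controls precedence simply by reordering the list.
    for f in filters:
        if f.matches(msg):
            return f.folder
    return default


junk_move = MoveFilter("Junk move", lambda m: m.junk, "Junk")
list_move = MoveFilter("Lists", lambda m: m.sender.endswith("@lists.example.org"), "Lists")

# A mailing-list message that the classifier happens to flag as junk.
msg = Msg("announce@lists.example.org", junk=True)
```

Putting the list filter in front of the junk move keeps the message in its list folder; swapping them sends it to Junk.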

Comment 10

13 years ago
(In reply to comment #9)

Applying the bayesian filter check to messages as they arrive may be a waste of
resources - if I get 100 messages, of which 95 are from mailing list I know as
non-spam and am subscribed to, and 5 are from people I don't know (these are
realistic figures) I would certainly not want to have 95 messages checked for no
reason. See my comment in bug 198100 about a more dynamic alternative of when to
classify as junk/not junk.

Comment 11

13 years ago
(In reply to comment #10)
> Applying the bayesian filter check to messages as they arrive may be a waste of
> resources - if I get 100 messages, of which 95 are from mailing list I know as
> non-spam and am subscribed to, and 5 are from people I don't know (these are
> realistic figures) I would certainly not want to have 95 messages checked for no
> reason. See my comment in bug 198100 about a more dynamic alternative of when to
> classify as junk/not junk.

Firstly, I would argue (based on my experience with Tbird on a 1.1GHz system)
that the performance hit from checking a message is trivial enough for it to not
be a significant consideration. I'd be happy for messages to be redundantly
checked if it helped fix the underlying problem.

Secondly, the filter tree idea, although elegant sounding, comes across as a
user interface (and thus usability) nightmare. It also doesn't sound like it
would work intuitively with filters that don't actually filter - eg. ones that
flag a message, copy it, mark it as read, etc.

Comment 12

13 years ago
(In reply to comment #10)

> Applying the bayesian filter check to messages as they arrive may be a waste of
> resources - if I get 100 messages, of which 95 are from mailing list I know as
> non-spam and am subscribed to, and 5 are from people I don't know (these are

As far as I know, that's not a waste of resources that you have any control over
at the moment. The waste just happens at an inopportune time where instant
responsiveness is expected (at folder open) rather than an opportune time with
existing network lag (at mail fetch). Perhaps finer-grained control would be
nice (I tend to disagree -- better to waste cpu cycles than brain cycles), but,
barring the appearance of such a feature, do you actually disagree with what is
suggested by this enhancement request, and why would you not? After all, this
issue can cause grievous waste of BY FAR more important resources (comment 7).

Comment 13

13 years ago
Actually, I'm basically in agreement with what Howard said, I was just hoping to
'squeeze' a little bit more flexibility...

I would just emphasize that it is only a flexible solution, such as the one
Howard suggested in comment 9,  or the one I suggested in my comment in bug
198100 - unifying the code for all aspects of filtering and controlling it all
together - only such a solution could indeed take care of all these bugs. If we
just leave the junk controls as they are but simply move their application to an
earlier time we're solving one problem and creating another.

Comment 14

13 years ago
I am with comment 9.

I think this could be easily accomplished with a filter condition like "IsJunk",
which would either be true if message was marked as junk, or would trigger the
actual bayesian check if it was unknown.

Controls like "do not mark messages as junk mail if sender is in my address
book" would be easily replicated with an "IF From (in my address book) THEN
SetJunkStatus (IsNotJunk)" filter placed before the junk-discarding one.

For those who have lots of mailing list messages, junk control could be skipped
altogether when not needed at all.
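A lazy "IsJunk" condition, as proposed here, runs the Bayesian check only when the status is still unknown, so a whitelist filter placed earlier means the classifier never runs for trusted senders. The sketch below is hypothetical: `LazyJunk`, `run` and the dictionary-based message are illustrative, and the keyword classifier stands in for the real one.

```python
from typing import Callable, Set


class LazyJunk:
    """Hypothetical 'IsJunk' condition: classifies only if the status is unknown."""

    def __init__(self, classifier: Callable[[str], bool]) -> None:
        self.classifier = classifier
        self.calls = 0  # how many times the expensive check actually ran

    def is_junk(self, msg: dict) -> bool:
        if msg.get("junk") is None:  # unknown: run the Bayesian check now
            self.calls += 1
            msg["junk"] = self.classifier(msg["body"])
        return msg["junk"]


def run(msg: dict, address_book: Set[str], lazy: LazyJunk) -> str:
    # Filter 1: "IF From (in my address book) THEN SetJunkStatus (IsNotJunk)"
    if msg["from"] in address_book:
        msg["junk"] = False
    # Filter 2: the junk-discarding filter, placed after the whitelist filter.
    return "Junk" if lazy.is_junk(msg) else "Inbox"


lazy = LazyJunk(lambda body: "prize" in body)
book = {"friend@example.com"}
r1 = run({"from": "friend@example.com", "body": "you won a prize", "junk": None}, book, lazy)
r2 = run({"from": "stranger@example.net", "body": "claim your prize", "junk": None}, book, lazy)
```

Only the second message reaches the classifier; if no filter ever called `is_junk`, it would never run at all.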

Actually... why was all this done the way it is in the first place? because of
the news folders, where many messages are downloaded but never looked at?

Comment 15

13 years ago
(In reply to comment #14)
> I am with comment 9.

I am testing a patch for this now.

> I think this could be easily accomplished with a filter condition like "IsJunk",
> which would either be true if message was marked as junk, or would trigger the
> actual bayesian check if it was unknown.

I won't know if my patch is actually any good until I have the filter condition
implemented, and I haven't looked at that yet.
 
> Controls like "do not mark messages as junk mail if sender is in my address
> book" would be easily replicated with an "IF From (in my address book) THEN
> SetJunkStatus (IsNotJunk)" filter placed before the junk-discarding one.

Yes. The one concern I have right now is that the JunkMail Control dialog makes
it very easy to specify this functionality, and we probably want that to remain.
I'm thinking we can have the JunkControl create the filters if the current
filter list is empty. If the filter list is already populated, I don't have a
good idea for preventing redundant filters from getting inserted, or even how to
remove filters that the JunkControl inserted (when you toggle off a feature).
> 
> For those who have lots of mailing list messages, junk control could be skipped
> altogether when not needed at all.

Well... On one of my accounts I have a reasonably effective blocker based on
sender info running on my SMTP server (see
http://www.highlandsun.com/hyc/badDNS.html). The sad thing is, I still get spam
on this account, but the only spam that gets thru is on legitimate mailing lists
(cyrus-sasl, all the GNU lists) because they have a weak- or no-filtering policy
on their lists. So for me, even though I get a ton of legit mailing list
messages, I still need spam classification to happen early, for all of my
incoming mail.

> Actually... why was all this done the way it is in the first place? because of
> the news folders, where many messages are downloaded but never looked at?

I don't think the news folders even run the spam filter, ever.

Comment 16

13 years ago
> > Controls like "do not mark messages as junk mail if sender is in my address
> > book" would be easily replicated with an "IF From (in my address book) THEN
> > SetJunkStatus (IsNotJunk)" filter placed before the junk-discarding one.
> 
> Yes. The one concern I have right now is that the JunkMail Control dialog 
> makes it very easy to specify this functionality, and we probably want
> that to remain.

I don't think we should want this to remain. If the filter creation dialog gets
a "sender is in address book" check, it would be quite simple to define such a
filter. My hope is that eventually every single item in the junk mail controls
would disappear, and you would simply have customizable filters & actions run on
different occasions (message arrives, message is marked, message is X time old,
etc.). This uber-customizability could be accompanied by a nice user-friendly
'wizard' which would suggest some common useful filters for less experienced
users - such as the ones in the junk mail controls now with a UI similar to the
Outlook filter creation wizard - a hold-my-hand kind of thing with explanatory
text, perhaps some illustrations, etc.

The point of what I'm saying is this: First we need to make the filter system
more robust and featureful (more possible checks, more events in which filters
may apply, ability to trigger bayesian filter, ability to look up in address
book or whatever capabilities we wish to add), and only once these capabilities
are in place and working properly start using them instead of what's in use now.


> I don't have a
> good idea for preventing redundant filters from getting inserted

A crude solution, but one which would work, is to distinguish (e.g. by a flag)
filters created by mozilla itself from filters created by the user. Thus any
redundancy is the user's fault, not ours, and we can remove whatever filters
we've created.
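The flag idea can be sketched as follows: every filter the Junk Controls insert carries an "app-created" marker, and re-syncing removes all marked filters before re-adding the ones currently wanted, so repeated syncs never duplicate them and user filters are never touched. `FilterEntry`, `app_created` and `sync_app_filters` are hypothetical names; placing the app filters at the head of the list is an arbitrary choice for this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FilterEntry:
    name: str
    app_created: bool = False  # hypothetical flag: True if Mozilla inserted it


def sync_app_filters(filters: List[FilterEntry], wanted: List[str]) -> List[FilterEntry]:
    """Drop every app-created filter, then re-add the ones the Junk Controls want.

    User-created filters keep their relative order and are never removed, so
    toggling a Junk Controls option off removes only what the application added.
    """
    kept = [f for f in filters if not f.app_created]
    return [FilterEntry(n, app_created=True) for n in wanted] + kept


filters = [FilterEntry("Whitelist address book", app_created=True),
           FilterEntry("Lists to folder")]
off = sync_app_filters(filters, [])                        # option toggled off
on = sync_app_filters(off, ["Whitelist address book"])     # toggled back on
again = sync_app_filters(on, ["Whitelist address book"])   # re-sync is idempotent
```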


> > Actually... why was all this done the way it is in the first place?

Maybe this had something to do with IMAP... just a guess.

Comment 17

13 years ago
Created attachment 184083 [details] [diff] [review]
Move Junk classification to normal filter processing

This is a preliminary patch to try out the functionality. Note that the issues
in bug 185937 may still be relevant here. I'm not getting consistent results
yet. We need the Junk classifier to complete synchronously but it appears that
it uses an asynchronous call to read the message. My attempts to use
locks/monitors/condvars to control this all result in deadlock, so I don't have
any clue how it really works.

As you can see, this patch deletes a lot of code. Hopefully we can get the
synchronization details resolved, then we can get rid of the remaining loose
ends regarding the JunkMail Controls, and the overall result will be a lot
leaner than what's in there now.

Comment 18

13 years ago
(In reply to comment #14)
> I am with comment 9.
> 
> I think this could be easily accomplished with a filter condition like "IsJunk",
> which would either be true if message was marked as junk, or would trigger the
> actual bayesian check if it was unknown.

The current patch does this, almost. It triggers the check when the status is
unknown, but it appears that the result of the check comes back late, so the
junk status isn't actually seen in the current filter. I haven't been able to
figure out how to synchronize this correctly; I hope someone else more clueful
can step in here.

> Controls like "do not mark messages as junk mail if sender is in my address
> book" would be easily replicated with an "IF From (in my address book) THEN
> SetJunkStatus (IsNotJunk)" filter placed before the junk-discarding one.

Right.

> For those who have lots of mailing list messages, junk control could be skipped
> altogether when not needed at all.

Right. If none of your filters test the Junk Status, then junk filtering will
never happen.
> 
> Actually... why was all this done the way it is in the first place? because of
> the news folders, where many messages are downloaded but never looked at?

I've only tested this for POP3, and I don't have any IMAP accounts. While it's
obvious that the patch doesn't work quite right, it would still be helpful for
some people to try it anyway and post here to describe exactly what is wrong
with it.

Comment 19

13 years ago
Created attachment 184093 [details] [diff] [review]
Fix synchronization in previous patch

Well, this may not be The Right Way to do things, but it seems to work: push a
new event queue, and let things run until the classifier finishes. My Junk
filters are all working as expected now on POP3.
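The "push a new event queue and let things run" approach makes an asynchronous classifier look synchronous by spinning a private queue until the completion callback fires. This toy single-threaded model only illustrates the technique; `EventQueue`, `classify_async` and `classify_sync` are hypothetical stand-ins, not the XPCOM event queue API.

```python
from collections import deque
from typing import Callable, Deque, Dict


class EventQueue:
    """Toy event queue (assumption: stands in for a pushed XPCOM event queue)."""

    def __init__(self) -> None:
        self.events: Deque[Callable[[], None]] = deque()

    def post(self, ev: Callable[[], None]) -> None:
        self.events.append(ev)

    def process_one(self) -> None:
        self.events.popleft()()


def classify_async(q: EventQueue, body: str, on_done: Callable[[bool], None]) -> None:
    # The classifier reads the message in steps, each delivered as an event.
    def finish() -> None:
        on_done(any(t.lower() == "prize" for t in body.split()))
    q.post(lambda: q.post(finish))  # two hops before the result arrives


def classify_sync(body: str) -> bool:
    """Push a private queue and spin it until the async classifier reports back."""
    nested = EventQueue()
    result: Dict[str, bool] = {}

    def on_done(junk: bool) -> None:
        result["junk"] = junk

    classify_async(nested, body, on_done)
    while "junk" not in result:
        nested.process_one()  # only events on the pushed queue run here
    return result["junk"]
```

Since only the pushed queue is serviced while the loop spins, nothing else on the calling thread can run until classification completes, which is the UI-blocking cost raised later in the thread.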

Updated

13 years ago
Attachment #184083 - Attachment is obsolete: true

Comment 20

13 years ago
Yes, junk mail analysis is async, and trying to force it to be synchronous is
dangerous, especially for imap, imo. In essence, you're blocking the UI thread
to run the junk mail classification, and that can be a bad thing if the message
is big, or the server is slow. We try to never block the UI thread like that.

Comment 21

13 years ago
This discussion is setting off all kinds of warning flags for me.  I admit that 
I can't understand Howard's patch in sufficient depth to comment on that 
directly, but I'm really worried about this:

>> The one concern I have right now is that the JunkMail Control dialog 
>> makes it very easy to specify this functionality, and we probably want
>> that to remain.
>
> I don't think we should want this to remain.

Eyal, while I'm sympathetic to the desire to make this feature work, I think you 
are wrong on this score.  Junk Controls is one of the premier TB features with 
lots of newbie appeal; throwing out the current dialog in favor of making them 
configure filters is a bad idea.  Furthermore, the JMC dialog has a number of 
settings that would still be useful, but be completely out of place in filter 
configuration.
(Granted, the current JMC dialog is not particularly great, viz bug 257990.)


I don't think having a specific item in the filters dialog to run the 
classification is the way to go; it adds too much complexity and error-proneness 
to the configuration of filters.  I also don't think it should be necessary to 
define a filter that says "if sender in my address book, mark the message Not 
Junk" -- that's a basic and obvious enough criterion that it can be handled as a 
decision point in the junk filter itself.  So, the Junk Controls items for 
enabling junk detection and whitelisting would remain.

Altho I hate to make this suggestion because it adds complexity that I'm not 
sure is desirable, my UI for this would be a special item in each account's 
filter list, called "Dispose of junk" [DJ].  This item would not be deleteable, 
but could be disabled, and it could be moved up and down in the list.  It would 
be hardcoded with the uneditable single criterion [if Junk Status is Junk] and 
the single action Move to [folder].  The folder would be selectable -- and would 
be reflected in the Junk Controls dialog.  Whether it's enabled or not would 
also be reflected in the Junk Controls dialog; and changing either of those 
settings in the JMC dialog would update the filter.

Other more advanced actions, such as Mark as Read or Delete from POP server, 
would be handled in a separate, user-defined filter, which would have to be 
placed prior to the DJ filter.  (Or, I suppose, those two actions *could* be 
enabled in the DJ filter.)
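The "Dispose of junk" entry could be modelled as a filter whose criterion is fixed and whose enabled state and target folder are read from the same settings object the Junk Controls dialog edits, so a change in either place shows up in both. This is a hypothetical sketch: `JunkSettings`, `DisposeOfJunk`, `delete_filter` and `move_filter` are illustrative names, not existing MailNews interfaces.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class JunkSettings:
    """Stand-in for the Junk Controls dialog state (hypothetical)."""
    move_enabled: bool = True
    junk_folder: str = "Junk"


@dataclass
class UserFilter:
    name: str


@dataclass
class DisposeOfJunk:
    """The special [DJ] entry: fixed criterion, folder taken from JunkSettings."""
    settings: JunkSettings
    name: str = "Dispose of junk"
    deletable: bool = False
    criterion: str = "Junk Status is Junk"  # not editable in the filter editor

    @property
    def enabled(self) -> bool:
        return self.settings.move_enabled

    @property
    def action(self) -> str:
        return f"Move to {self.settings.junk_folder}"


Entry = Union[UserFilter, DisposeOfJunk]


def delete_filter(filters: List[Entry], index: int) -> None:
    if not getattr(filters[index], "deletable", True):
        raise ValueError("the Dispose of junk filter can be disabled or moved, not deleted")
    del filters[index]


def move_filter(filters: List[Entry], src: int, dst: int) -> None:
    filters.insert(dst, filters.pop(src))


settings = JunkSettings()
filters: List[Entry] = [UserFilter("Mark list junk as read"), DisposeOfJunk(settings)]
settings.junk_folder = "Spam"   # changed in the Junk Controls dialog...
move_filter(filters, 1, 0)      # ...and the DJ entry moved to the top of the list
```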

Another question: If JMC is more fully integrated into filters: Are we going to 
lose the separate Junk Mail Log?  Is that OK?  (It would be fine if the log 
viewer was smarter than a vanilla text dump, so that you could select to display 
only the actions of a given filter, etc, but that's a whole other feature.)


Another reason not to include this in the Filters dialog right now is the 
confusion about filters-in-Local Folders vs. filters-for-account, in combination 
with deferred accounts.


Howard, I'm curious if/how this change affects manually marking messages as 
junk.

Comment 22

13 years ago
I agree with Mike that we don't want to make users set up filters to deal with
junk mail, but there's nothing that says we can't make the junk mail front end
use filters as the backend implementation of the junk mail settings (nothing
except a bit of work to do that implementation :-) ) On a related note, the new
general retention settings I've implemented could replace the junk mail purging
ui and backend.

Instead of blocking the UI to run junk mail classification, could we just
classify messages in the background, after we've run the normal filters on a
given hdr? That would probably have all sorts of fun synchronization/contention
issues between pop3 writing to the mail folders and the junk mail classifier
reading from the mail folders, and for imap, it wouldn't help at all since you
can't run two connections to the same imap folder (the one currently downloading
headers, and the second to fetch messages) because of the UW server. And maybe
it doesn't give the code simplification if the classification is async?

Comment 23

13 years ago
Created attachment 184104 [details] [diff] [review]
Remove extraneous parts of previous patch

Sorry, a couple pieces of my bug 195605 patch leaked into the previous diff.
This one should be clean.
Attachment #184093 - Attachment is obsolete: true

Comment 24

13 years ago
(In reply to comment #21)
 
> Eyal, while I'm sympathetic to the desire to make this feature work, I think you 
> are wrong on this score.  Junk Controls is one of the premier TB features with 
> lots of newbie appeal; throwing out the current dialog in favor of making them 
> configure filters is a bad idea.  Furthermore, the JMC dialog has a number of 
> settings that would still be useful, but be completely out of place in filter 
> configuration.
> (Granted, the current JMC dialog is not particularly great, viz bug 257990.)

Yes, I'm pretty much in agreement here.

> I don't think having a specific item in the filters dialog to run the 
> classification is the way to go; it adds too much complexity and error-proneness 
> to the configuration of filters.  I also don't think it should be necessary to 
> define a filter that says "if sender in my address book, mark the message Not 
> Junk" -- that's a basic and obvious enough criterion that it can be handled as a 
> decision point in the junk filter itself.  So, the Junk Controls items for 
> enabling junk detection and whitelisting would remain.

Well, the *function* of whitelisting already exists as a filter criterion, so it
seems unnecessary to implement it again in the junkmail code.

> Altho I hate to make this suggestion because it adds complexity that I'm not 
> sure is desirable, my UI for this would be a special item in each account's 
> filter list, called "Dispose of junk" [DJ].  This item would not be deleteable, 
> but could be disabled, and it could be moved up and down in the list.  It would 
> be hardcoded with the uneditable single criterion [if Junk Status is Junk] and 
> the single action Move to [folder].  The folder would be selectable -- and would 
> be reflected in the Junk Controls dialog.  Whether it's enabled or not would 
> also be reflected in the Junk Controls dialog; and changing either of those 
> settings in the JMC dialog would update the filter.
> 
> Other more advanced actions, such as Mark as Read or Delete from POP server, 
> would be handled in a separate, user-defined filter, which would have to be 
> placed prior to the DJ filter.  (Or, I suppose, those two actions *could* be 
> enabled in the DJ filter.)

Again, since the Junk actions are just a subset of regular Filter actions, I
think it makes more sense for the actions of the DJ filter to be fully editable.
I think it's fine that we specially mark it so that it cannot be deleted, and
its criterion cannot be changed in the Filter editor. But I would tie the JMC
whitelist feature to inserting "sender" / "is in my address book" into the DJ
filter criterion.

> Howard, I'm curious if/how this change affects manually marking messages as 
> junk.

In the current patch, manually marking Junk should have the identical results as
the original code; I left the bulk of that code intact. If we decide to
implement a special DJ filter, then I would change the manual mark code to
explicitly invoke that DJ filter on the marked messages. We can either walk the
filter list looking for the DJ filter, or we can keep the DJ filter as a
separate entity (e.g. the SpamSettings) and put a magic stub in the filter list
that knows to go get the DJ filter at that point in the list.
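The second option, a "magic stub" in the filter list, can be sketched as a placeholder that is swapped for the live DJ filter held in the spam settings at the point where it sits, while manual marking calls that same filter directly. `DJStub`, `resolve`, `run_filter_list` and `mark_as_junk` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union


@dataclass
class Filter:
    name: str
    apply: Callable[[dict], None]


class DJStub:
    """Hypothetical placeholder kept in the filter list; the real DJ filter
    is owned by the account's spam settings."""


Entry = Union[Filter, DJStub]


def resolve(entry: Entry, spam_settings: Dict[str, Filter]) -> Filter:
    # The stub is swapped for the live DJ filter at the point where it sits.
    return spam_settings["dj_filter"] if isinstance(entry, DJStub) else entry


def run_filter_list(msg: dict, filters: List[Entry],
                    spam_settings: Dict[str, Filter]) -> None:
    for entry in filters:
        resolve(entry, spam_settings).apply(msg)


def mark_as_junk(msg: dict, spam_settings: Dict[str, Filter]) -> None:
    """Manual 'Mark as Junk': set the status, then invoke the DJ filter directly."""
    msg["junk"] = True
    spam_settings["dj_filter"].apply(msg)


def dispose_of_junk(msg: dict) -> None:
    if msg.get("junk"):
        msg["folder"] = "Junk"


def tag_lists(msg: dict) -> None:
    if msg["from"].endswith("@lists.example.org"):
        msg.setdefault("tags", []).append("list")


spam_settings = {"dj_filter": Filter("Dispose of junk", dispose_of_junk)}
filters: List[Entry] = [Filter("Tag lists", tag_lists), DJStub()]

incoming = {"from": "x@lists.example.org", "junk": True, "folder": "Inbox"}
run_filter_list(incoming, filters, spam_settings)

manual = {"from": "y@example.com", "junk": False, "folder": "Inbox"}
mark_as_junk(manual, spam_settings)
```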

Comment 25

13 years ago
(In reply to comment #22)
> I agree with Mike that we don't want to make users set up filters to deal with
> junk mail, but there's nothing that says we can't make the junk mail front end
> use filters as the backend implementation of the junk mail settings (nothing
> except a bit of work to do that implementation :-) )

Right, that's what I was getting at.

> Instead of blocking the UI to run junk mail classification, could we just
> classify  messages in the background, after we've run the normal filters on a
> given hdr? That would probably have all sorts of fun synchronization/contention
> issues between pop3 writing to the mail folders and the junk mail classifier
> reading from the mail folders, and for imap, it wouldn't help at all since you
> can't run two connections to the same imap folder (the one currently downloading
> headers, and the second to fetch messages) because of the UW server. And maybe
> it doesn't give the code simplification if the classification is async?

And it doesn't let the user create custom actions for Junk messages, because the
Junk status won't be known in time. Pretty much nothing gained for the effort.

Can we juggle the eventqueue such that the UI won't be blocked while we wait for
classification to complete?

Currently the Junk classifier tries to read the entire message before it does
its thing. For IMAP, why don't we add an option for the classifier to not
download, and just use the headers that are already available? (This is pretty
much what I get now with POP3, since I only download headers.) And if the folder
has been set to automatically download the full message body, then the filter
classifier should use the local/offline message. So in either case, it should
never attempt to download the message body for itself.
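The proposal above reduces to a small selection policy for what the classifier reads: the stored offline body if the folder keeps one, otherwise only the headers already fetched, and never a fresh body download. This is a hypothetical sketch of that policy only; `classifier_input` is an illustrative name, and the trade-off it makes (losing body text for most users) is debated in the thread.

```python
from typing import Optional


def classifier_input(headers: str, offline_body: Optional[str]) -> str:
    """Pick the text the junk classifier sees for an IMAP message.

    Hypothetical policy: use the stored offline copy when the folder keeps one,
    otherwise fall back to the headers that were already fetched. No code path
    here downloads a message body.
    """
    if offline_body is not None:
        return headers + "\n\n" + offline_body
    return headers


hdrs = "From: stranger@example.net\nSubject: You won"
headers_only = classifier_input(hdrs, None)
with_body = classifier_input(hdrs, "Claim your prize today.")
```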

Comment 26

13 years ago
>Can we juggle the eventqueue such that the UI won't be blocked while we wait for
>classification to complete?

I don't think so. The filter code runs on the UI thread. You could pump events
on the ui thread event queue like you do on the pushed event queue, but then you
run the risk of re-entering the ui thread code, much of which is not guaranteed
to be safely re-entrant. Maybe I'm wrong about that not being safe - I know we
do that at shutdown to run urls to completion to cleanup the trash and inbox,
but at shutdown, the UI is gone...I guess in particular you run the risk of
re-entering the mail download and filtering code - perhaps that can be protected
against. Cc'ing Darin, for his event queue expertise.
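One way to protect the download and filtering code against re-entry while UI events are being pumped is a simple busy flag, similar in spirit to the POP3 "download already in progress" behaviour described in the thread. `DownloadGuard` and `get_messages` are hypothetical names, and `pump_events` stands in for whatever nested event processing happens during classification.

```python
from typing import Callable, List


class DownloadGuard:
    """Hypothetical guard against re-entering mail download/filtering while a
    nested event loop is pumping UI events."""

    def __init__(self) -> None:
        self.busy = False

    def get_messages(self, pump_events: Callable[[], None]) -> str:
        if self.busy:
            return "A download is already in progress"
        self.busy = True
        try:
            pump_events()  # UI events may call get_messages() again from here
            return "done"
        finally:
            self.busy = False  # released even if pumping raises


guard = DownloadGuard()
nested_results: List[str] = []
# While the first download pumps events, the user clicks "Get Messages" again.
outer = guard.get_messages(
    lambda: nested_results.append(guard.get_messages(lambda: None)))
```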

Re IMAP not downloading messages to do junk mail analysis, I don't like that for
several reasons - the message bodies are useful for the junk mail analysis, and
only downloading message bodies if the user has the inbox configured for offline
use would mean that most users would not be able to use the message bodies in
junk mail analysis. So we could have an option to not download message bodies,
but that wouldn't produce the code simplification because we'd still need the
option to download the message bodies, and that *has* to happen after we've
fetched the headers. Put another way, we fetch the message bodies for offline
use after the filters have run, so it wouldn't matter if the inbox was
configured for offline use.


Comment 27

13 years ago
(In reply to comment #21)

As David hinted, what is important in my view is to unify the junk mail controls
and the filters in the backend - a lot of the junk mail code is basically
implementation of filters which are run after the classification instead of upon
message arrival. The UI for this is not the important issue (to me anyway), and I
wouldn't really mind whether it stays the way it is, changes the way bug 257990
proposes, or goes some other way.

As for the IMAP/event queue/UI thread freezing issue - this is something which
needs to be given more serious thought, and can in fact be itself considered as
sufficient reason why the bayesian classification of a message has not hitherto
been made part of the filter criteria. I can think of three alternative ways to
approach this:

1. UI thread is blocked until classification completes and the filter chain can
be applied

Pro: you won't be seeing unfiltered messages in folders you don't want to see
them in, and/or marked as unread when you want them marked read
Con: you may have to wait quite a while after a 'get messages' before you can do
anything else

2. classification happens in the background, messages undergoing classification
are visible and manipulable via the UI

Pro: no UI delays
Con: you're seeing messages you may not want to see, marked as unread when you
may not want them marked as unread - which is the very reason why this bug was
opened in the first place...

3. classification happens in the background, but messages undergoing
classification are hidden / not yet added to the inbox

Pro: all messages you get to see are filtered into place and marked read/unread
as per your filtering wishes. No UI delays.
Con: It seems currently all messages must exist in folders, and modifying the
classification/filtering code so as to work on messages not necessarily in
folders might not be a good idea for reasons unknown to me, plus it's a lot more
work than options 1 or 2 above (this can perhaps be avoided if we do add the
messages to folders, but hide them from view until their filtering has been
completed). More messages will pop into view as they are classified and filtered
- this may be somewhat annoying visually. There will need to be some statusbar
notification of when message filtering has been completed as well as when
message download has been completed.
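Option 3 can be sketched as a folder that stores messages as they download but lists only those that background classification and filtering have finished with, and that reports "download finished" and "filtering finished" as separate states. `StagedFolder` and its methods are hypothetical; the sketch ignores where filters move messages and models one message per background step.

```python
from collections import deque
from typing import Deque, List, Set


class StagedFolder:
    """Option 3 sketch: messages are stored as they download but stay hidden
    until background classification and filtering have finished with them."""

    def __init__(self) -> None:
        self.stored: List[str] = []          # everything written to the folder
        self.pending: Deque[str] = deque()   # waiting for classification/filters
        self.filtered: Set[str] = set()
        self.download_complete = False

    def on_downloaded(self, msg_id: str) -> None:
        self.stored.append(msg_id)
        self.pending.append(msg_id)

    def filter_next(self) -> None:
        # One background step (assumption: one message per idle tick).
        self.filtered.add(self.pending.popleft())

    def visible(self) -> List[str]:
        return [m for m in self.stored if m in self.filtered]

    def status(self) -> str:
        # Two separate notifications: download finished vs. filtering finished.
        if not self.download_complete:
            return "Downloading messages..."
        return "Filtering messages..." if self.pending else "Done"


folder = StagedFolder()
for msg_id in ["a", "b", "c"]:
    folder.on_downloaded(msg_id)
folder.download_complete = True
folder.filter_next()
partial, mid_status = folder.visible(), folder.status()
while folder.pending:
    folder.filter_next()
```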


Comment 28

13 years ago
(In reply to comment #26)
> >Can we juggle the eventqueue such that the UI won't be blocked while we wait for
> >classification to complete?
> 
> I don't think so. The filter code runs on the UI thread. You could pump events
> on the ui thread event queue like you do on the pushed event queue, but then you
> run the risk of re-entering the ui thread code, much of which is not guaranteed
> to be safely re-entrant. Maybe I'm wrong about that not being safe - I know we
> do that at shutdown to run urls to completion to cleanup the trash and inbox,
> but at shutdown, the UI is gone...I guess in particular you run the risk of
> re-entering the mail download and filtering code - perhaps that can be protected
> against. Cc'ing Darin, for his event queue expertise.

I would think there is already protection against re-entering the download code.
If a long download is in progress and you click "Get Messages" a second time it
will say a download is already in progress (for POP3 anyway). If IMAP isn't
similarly protected, that's a separate bug. Can we change things so that the
download/filtering step runs on a thread separate from the UI?

> Re IMAP not downloading messages to do junk mail analysis, I don't like that for
> several reasons - the message bodies are useful for the junk mail analysis, and
> only downloading message bodies if the user has the inbox configured for offline
> use would mean that most users would not be able to use the message bodies in
> junk mail analysis. So we could have an option to not download message bodies,
> but that wouldn't produce the code simplification because we'd still need the
> option to download the message bodies, and that *has* to happen after we've
> fetched the headers. Put another way, we fetch the message bodies for offline
> use after the filters have run, so it wouldn't matter if the inbox was
> configured for offline use.

OK. I would think the current code somewhat defeats one of the advantages of
using IMAP - that you can work with your mailboxes without needing to fully
download the mail in advance. Once you enable the Junk controls, you're stuck
downloading everything, and the downloaded message is discarded by the Junk
processor, so any message that you actually use is always fully downloaded at
least twice, and that's gotta hurt on a dialup. It would be nicer if the
autodownload and junk download were consolidated.

Comment 29

13 years ago
>I would think there is already protection against re-entering the download code.

There might be, but we might need to add some as well.  

> Can we change things so that the
> download/filtering step runs on a thread separate from the UI?

Not easily, no. We'd have to change a lot of code to proxy over to the UI thread
anytime we wanted to call code that runs on the UI thread (you can't assume that
any code that runs on the UI thread is thread-safe, i.e., can be called from any
other thread). You can look at the IMAP code to see how fun that is :-)

>OK. I would think the current code somewhat defeats one of the advantages of
>using IMAP - that you can work with your mailboxes without needing to fully
>download the mail in advance.

Not really, no - we have to download the headers before you read messages, but
you can read your mail while the junk mail classification is going on, since we
download those messages one at a time, and queue them, and if you ask to read a
message, that request gets serviced as soon as the current message download is
finished.
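A minimal sketch of the queuing behaviour described above, assuming a single connection that serves requests strictly one at a time. `ConnectionQueue`, `queue_junk_fetch`, `request_read` and `run` are hypothetical names, not the actual IMAP code. Background junk-classification fetches wait behind any read request the user makes, and neither interrupts the download already in progress.

```python
from collections import deque


class ConnectionQueue:
    """Hypothetical single-connection request queue (illustrative only)."""

    def __init__(self):
        self.background = deque()   # pending junk-classification body fetches
        self.interactive = deque()  # pending user "read this message" requests

    def queue_junk_fetch(self, uid):
        self.background.append(("junk", uid))

    def request_read(self, uid):
        self.interactive.append(("read", uid))

    def next_request(self):
        # Called only when the connection is idle, i.e. after the current
        # message download has finished; a download is never interrupted.
        if self.interactive:
            return self.interactive.popleft()
        if self.background:
            return self.background.popleft()
        return None


def run(queue, user_clicks):
    """Simulate the connection; user_clicks maps a step number to a uid the user opens."""
    served = []
    step = 0
    while True:
        if step in user_clicks:
            queue.request_read(user_clicks[step])
        req = queue.next_request()
        if req is None:
            break
        served.append(req)
        step += 1
    return served
```

With four messages queued for classification and the user opening message 42 while the first one is downloading, the read request is serviced right after that download and before the remaining junk fetches.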

> Once you enable the Junk controls, you're stuck
>downloading everything, and the downloaded message is discarded by the Junk
>processor, so any message that you actually use is always fully downloaded at
>least twice, and that's gotta hurt on a dialup. It would be nicer if the
>autodownload and junk download were consolidated.

This would be true, except that we store imap downloaded messages in the memory
cache, so that in most cases, if the junk mail classification has already
downloaded a message, reading it won't require a connection to the server; it'll
get the message directly from the memory cache. And vice versa. And if the user
has configured the folder for offline use, we'll use the offline store instead
of the memory cache.
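The lookup order described above can be sketched as follows, assuming a per-folder "offline use" flag and a simple dictionary standing in for the memory cache. `MessageSource` and `get_body` are hypothetical names; a real memory cache can also evict entries (hence "in most cases"), which this sketch does not model.

```python
class MessageSource:
    """Hypothetical body lookup shared by the junk classifier and the reader."""

    def __init__(self, server, offline_folder=False):
        self.server = server                  # callable: uid -> bytes (network fetch)
        self.offline_folder = offline_folder  # folder configured for offline use?
        self.offline_store = {}               # used when offline_folder is True
        self.memory_cache = {}                # used otherwise
        self.network_fetches = 0

    def _store(self):
        return self.offline_store if self.offline_folder else self.memory_cache

    def get_body(self, uid):
        store = self._store()
        if uid not in store:
            # Only the first consumer (classifier or reader) hits the server.
            self.network_fetches += 1
            store[uid] = self.server(uid)
        return store[uid]
```

Whichever side asks first (the junk classifier or the user opening the message) pays for the download; the other is served from the cache or offline store.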

Comment 30

13 years ago
(In reply to comment #27)

> As for the IMAP/event queue/UI thread freezing issue - this is something which
> needs to be given more serious thought, and can in fact be itself considered as
> sufficient reason why the bayesian classification of a message has not hitherto
> been made part of the filter criteria. I can think of three alternative ways to
> approach this:
> 
> 1. UI thread is blocked until classification completes and the filter chain can
> be applied
> 
> Pro: you won't be seeing unfiltered messages in folders you don't want to see
> them in, and/or marked as unread when you want them marked read
> Con: you may have to wait quite a while after a 'get messages' before you can do
> anything else

I don't believe this is a serious Con since classification occurs one message at
a time. That means the UI gets control back between each message; any delay is
barely perceptible.

> 2. classification happens in the background, messages undergoing classification
> are visible and manipulable via the UI
> 
> Pro: no UI delays
> Con: you're seeing messages you may not want to see, marked as unread when you
> may not want them marked as unread - which is the very reason why this bug was
> opened in the first place...

Right. This seems like not much of a win. I was thinking we might be able to
have a compromise here though - integrate spam classification with filter
processing, but run the filters twice. Once per header download, as is being
done in the existing code, without the classifier, and once when the downloads
complete, like the existing code does when it invokes the spam plugins. The
added twist is this - filters that Move messages to other folders are ignored
during the download. So you can still do marking and other things right away,
but all the headers remain in the Inbox until the classifier gets a chance to
work. Then all the filters are re-run, and any Moves are allowed to occur as
expected.
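A minimal sketch of this two-pass compromise, assuming a filter has a match test, an optional "mark read" action and an optional Move target. `Message`, `Filter`, `apply_filters`, `on_header_downloaded` and `on_download_complete` are hypothetical names, and "the first Move ends filtering" is a simplification of this sketch. During the per-header pass Moves are suppressed; once all downloads finish, each message is classified and every filter is re-run with Moves allowed.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Message:
    subject: str
    folder: str = "Inbox"
    read: bool = False
    junk: Optional[bool] = None  # None until the classifier has run


@dataclass
class Filter:
    matches: Callable[[Message], bool]
    mark_read: bool = False
    move_to: Optional[str] = None


def apply_filters(msg: Message, filters: List[Filter], allow_moves: bool) -> None:
    for f in filters:
        if not f.matches(msg):
            continue
        if f.mark_read:
            msg.read = True
        if f.move_to is not None and allow_moves:
            msg.folder = f.move_to
            return  # simplification: the first Move ends filtering


def on_header_downloaded(msg: Message, filters: List[Filter]) -> None:
    # Pass 1, per header: no classifier yet, Move actions are suppressed.
    apply_filters(msg, filters, allow_moves=False)


def on_download_complete(msgs: List[Message], filters: List[Filter],
                         classify: Callable[[Message], bool]) -> None:
    # Pass 2, after all downloads: classify, then re-run every filter with Moves.
    for msg in msgs:
        msg.junk = classify(msg)
        apply_filters(msg, filters, allow_moves=True)
```

Marking happens immediately, but every header stays in the Inbox until pass 2, which is exactly the visible delay objected to in the next comment.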

> 3. classification happens in the background, but messages undergoing
> classification are hidden / not yet added to the inbox
> 
> Pro: all messages you get to see are filtered into place and marked read/unread
> as per your filtering wishes. No UI delays.
> Con: It seems currently all messages must exist in folders, and modifying the
> classification/filtering code so as to work on messages not necessarily in
> folders might not be a good idea for reasons unknown to me, plus it's a lot more
> work than options 1 or 2 above (this can perhaps be avoided if we do add the
> messages to folders, but hide them from view until their filtering has been
> completed). More messages will pop into view as they are classified and filtered
> - this may be somewhat annoying visually). There will need to be some statusbar
> notification of when message filtering has been completed as well as when
> message download has been completed.

Is the point here to avoid interfering with the UI? Isn't that only part of the
problem? (e.g. David's concern about re-entering the IMAP code). Otherwise, if
we can make it work message-by-message, this sounds OK. We should be able to add
a message to a folder without notifying any listeners that it has appeared, right?

Comment 31

13 years ago
(In reply to comment #30)
> I don't believe this is a serious Con since classification occurs one message 
> at a time. That means the UI gets control back between each message;
> any delay is barely perceptible.

Hmm, I guess you're right, I don't remember the code that well. But then you
lose the 'Pro' I mentioned, since if you've put unclassified messages in your
lose the 'Pro' i mentioned, since if you've put unclassified messages in your
inbox and you're letting the UI thread run then the user is interacting with
messages which may yet be moved or marked read.

> > 2. classification happens in the background, messages undergoing
> > classification
> Right. This seems like not much of a win. I was thinking we might be able to
> have a compromise here though - integrate spam classification with filter
> processing, but run the filters twice. Once per header download, as is being
> done in the existing code, without the classifier, and once when the downloads
> complete, like the existing code does when it invokes the spam plugins. The
> added twist is this - filters that Move messages to other folders are ignored
> during the download.

If you ignore these, running header-only filters becomes almost
useless, because the main thing they do is move messages to other folders. Or
perhaps you have something else in mind?

> So you can still do marking and other things right away,
> but all the headers remain in the Inbox until the classifier gets a chance to
> work. Then all the filters are re-run, and any Moves are allowed to occur as
> expected.

That's not much of a compromise, it really doesn't avoid the 'Con'.

> > 3. classification happens in the background, but messages undergoing
> > classification are hidden / not yet added to the inbox
> 
> Is the point here to avoid interfering with the UI?

No, the point is this: This way you will never see a message until all filters
have been applied to it and it has been moved to the folder in which it is
supposed to be. No more seeing headers in the inbox and having to wait for them
to move. You'll just know there are some more messages being processed, and
you're not seeing them anywhere. In option 1 I mentioned you could achieve this
effect by first blocking the UI thread, then classifying and filtering all
messages, then unblocking the UI thread - but this is not what we do now and
it's not a good idea. This option is about hiding messages whose headers have
been downloaded but are still being processed.
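Option 3 can be sketched as a folder that stores a header immediately but only notifies its listeners (the UI) once classification and filtering have decided where the message belongs. `Folder`, `add_pending`, `release` and `deliver` are hypothetical names, and the processing here is synchronous for brevity; in practice the pending state would last until the classifier finished.

```python
class Folder:
    """Hypothetical folder that can hold messages the UI has not been told about."""

    def __init__(self, name):
        self.name = name
        self.messages = []   # everything stored in the folder
        self.pending = set() # stored, but still being classified/filtered
        self.listeners = []  # UI callbacks: fn(folder_name, msg)

    def add_pending(self, msg):
        # Store the header without notifying any listener that it has appeared.
        self.messages.append(msg)
        self.pending.add(msg)

    def visible(self):
        return [m for m in self.messages if m not in self.pending]

    def release(self, msg):
        self.pending.discard(msg)
        for notify in self.listeners:
            notify(self.name, msg)

    def remove(self, msg):
        self.messages.remove(msg)
        self.pending.discard(msg)


def deliver(msg, inbox, folders, destination_for):
    inbox.add_pending(msg)        # header downloaded, still hidden
    dest = destination_for(msg)   # stands in for classification + filters
    if dest == inbox.name:
        inbox.release(msg)
    else:
        inbox.remove(msg)
        target = folders[dest]
        target.messages.append(msg)
        target.release(msg)       # first time the UI hears about it
```

The UI only ever sees a message in its final folder; junk never flashes up in the Inbox.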

Comment 32

13 years ago
It seems that the nature of this patch in attachment 184104 isn't very clear, so
let me try to give a bit more clarification.

In the original code, filters are run on individual messages as soon as they are
downloaded, and junk classification is done on a batch of messages when
downloading is finished. For IMAP, usually filters are only run on the message
headers.

In the patch I've posted here, junk classification is just a side-effect of
using the Junk Status as a filter criterion. That means it happens for an
individual message, as soon as it is downloaded. Specifically, it only happens
for Filters (as opposed to Searches), and it only happens on a message if the
message's current JunkScore is Unknown.

I have deleted all of the functions for classifying a batch of messages; it only
operates one-at-a-time in this patch. I've also deleted all of the special Junk
Classification Listener code, since the regular filter actions will take effect.

This patch appears to work perfectly for me with POP3. But as David pointed out,
it will probably cause problems for IMAP. If we're OK with this approach, then
we just need to make a few fixes to make sure that the Junk body download
request uses the existing IMAP session, and that the state machine only requests
one header at a time so that Body requests can be interleaved as needed.
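The on-demand classification described in this comment can be sketched as a criterion evaluator that classifies a single message only when it is evaluated as part of a filter and its score is still Unknown. `Msg`, `junk_status_matches` and the 0-100 threshold of 50 are hypothetical, as is the choice that an unscored message simply fails to match in a search.

```python
from typing import Callable, Optional

JUNK_THRESHOLD = 50  # hypothetical cut-off on a 0-100 score


class Msg:
    def __init__(self, body: str):
        self.body = body
        self.junk_score: Optional[int] = None  # None means "Unknown"


def junk_status_matches(msg: Msg, want_junk: bool,
                        classify: Callable[[Msg], int],
                        in_filter: bool) -> bool:
    """Evaluate a 'Junk Status is Junk / is not Junk' term."""
    if msg.junk_score is None:
        if not in_filter:
            # Searches never trigger classification; an unscored message
            # does not match (an assumption of this sketch).
            return False
        # Filters classify this one message on demand and keep the score.
        msg.junk_score = classify(msg)
    return (msg.junk_score >= JUNK_THRESHOLD) == want_junk
```

Because the score is stored on the message, later filter terms reuse it instead of invoking the classifier again.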

Comment 33

13 years ago
OK, I set up an IMAP account for myself to test with. Needless to say, the
current patch doesn't work at all. In the first place, the patch doesn't enable
Junk Status as a filter criteria. Secondly, the Online Search adapter doesn't
handle Junk Status as a search criteria either. So, none of the code in this
patch ever executes for IMAP, and junk classification just plain doesn't happen.

I quickly added the 4 lines to enable Junk as a filter criterion, and tried again
with a Junk filter set up. This promptly locked up the UI after the first IMAP
header got downloaded. Even though there's a call to run the event queue, there
are no events in the queue because the IMAP handler refused to run the Body
download while Header downloads were taking place.

Enabling the Filter criteria is no problem. Enabling the Online Search criteria
should be easy too, just translate Junk queries into searches for an
arbitrary header (X-Keywords) with the Junk or NotJunk value. I wonder why that
wasn't already done?

The IMAP protocol lets a client request multiple items in one command, and the
client issues a single command to retrieve all of the new headers in a folder.
This is efficient, but it prevents this Junk scheme from working. So I'm going
to look into changing it such that for the Inbox, new headers are retrieved one
at a time. This will allow the Body downloads to occur if needed.

Updated

13 years ago
Depends on: 295088

Comment 34

13 years ago
(In reply to comment #33)
> OK, I set up an IMAP account for myself to test with. Needless to say, the
> current patch doesn't work at all. In the first place, the patch doesn't enable
> Junk Status as a filter criteria. Secondly, the Online Search adapter doesn't
> handle Junk Status as a search criteria either. So, none of the code in this
> patch ever executes for IMAP, and junk classification just plain doesn't happen.

I've moved the patches to enable these filter criteria over to bug #196036.

> The IMAP protocol lets a client request multiple items in one command, and the
> client issues a single command to retrieve all of the new headers in a folder.
> This is efficient, but it prevents this Junk scheme from working. So I'm going
> to look into changing it such that for the Inbox, new headers are retrieved one
> at a time. This will allow the Body downloads to occur if needed.

I'm still pretty much stuck here; there are two opposing needs that seem to me
to be totally incompatible with each other. We want headers to download as fast
as possible, and we want messages to be classified/filtered immediately. I think
the only way to allow this is to make the spam classifier run using only the
available message headers, so that it can execute quickly during the header
download phase. Initially this approach will not catch as much spam as it would
using the full body, but it also will cause very few false positives. I think
the point here is that getting *some* useful information quickly is better than
getting *all* the information after being forced to wait. For messages where the
junkscore is borderline, we could simply leave the headers unmarked, for
handling in a second pass.

So the first pass will handle all of the user filters, with weak (header-only)
junk detection. I think that will still allow the majority of junk to be
disposed of quickly by the filters. After all the headers have finished
downloading, and after all the filters have run, we can do another pass to do
full evaluation in the background of any messages that weren't already
classified and disposed of. This isn't very different from the flow of events
that exists in the current code. The real difference is that (spam+filters) will
be a single unified function that is called in the two places where (filters
alone) and (spam alone) are currently called. Also, from my personal experience
running the spam filter only on headers in my POP3 account, I believe it will
drastically reduce network traffic because most messages can be accurately
identified as spam using only the headers.
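The two-pass flow proposed here can be sketched as follows, assuming the classifier returns a junk probability in [0, 1]. `header_verdict`, `classify_new_mail`, the 0.1/0.9 confidence bounds and the 0.5 cut-off are all hypothetical. The first pass commits only to decisive header-only verdicts; the second pass downloads bodies only for the borderline messages it left unmarked.

```python
from typing import Callable, Dict, List, Optional, Tuple

LOW, HIGH = 0.1, 0.9  # hypothetical confidence bounds for the header-only pass


def header_verdict(p_junk: float) -> Optional[str]:
    """Commit to a verdict only when the header-only score is decisive."""
    if p_junk >= HIGH:
        return "junk"
    if p_junk <= LOW:
        return "good"
    return None  # borderline: leave unmarked for the second pass


def classify_new_mail(
    uids: List[int],
    header_score: Callable[[int], float],
    full_score: Callable[[int, bytes], float],
    fetch_body: Callable[[int], bytes],
) -> Tuple[Dict[int, Optional[str]], List[int]]:
    verdicts: Dict[int, Optional[str]] = {}
    # First pass, during header download: fast, header-only.
    # (User filters would run here and could act on the verdict.)
    for uid in uids:
        verdicts[uid] = header_verdict(header_score(uid))
    # Second pass, after headers and filters are done: fetch bodies
    # only for the messages the first pass left unmarked.
    fetched = []
    for uid in uids:
        if verdicts[uid] is None:
            body = fetch_body(uid)
            fetched.append(uid)
            verdicts[uid] = "junk" if full_score(uid, body) >= 0.5 else "good"
    return verdicts, fetched
```

Only message 3 in a batch with header scores 0.97, 0.02 and 0.5 would need its body downloaded, which is where the claimed reduction in network traffic comes from.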

Comment 35

13 years ago
(In reply to comment #34)

I understand your feeling of a need to be able to do some filtering with just
the headers, but I think it wouldn't be right to consider such a major change to
how the classifier works as a mere side-effect of the resolution of this bug.
2-stage bayesian classification (headers-only vs. headers-and-body) may not be
such a bad idea in itself, but in my opinion the classification should for now
remain the way that it is; and if an IMAP user uses the junk status in his/her
filters then s/he'll have to wait for the body to download.

Updated

13 years ago
Blocks: 271930

Updated

13 years ago
Blocks: 229847

Comment 36

13 years ago
(In reply to comment #35)
> (In reply to comment #34)
> 
> I understand your feeling of a need to be able to do some filtering with just
> the headers, but I think it wouldn't be right to consider such a major change to
> how the classifier works as a mere side-effect of the resolution of this bug.

OK, you're right, making such a change as a mere side-effect here is a bad idea.
Anyway, I see that bug 215941 already addresses the issue of making IMAP do
header-only junk classification. I suppose I could attack that issue first, and
then make this bug depend on that one.

> 2-stage bayesian classification (headers-only vs. headers-and-body) may not be
> such a bad idea in itself, but in my opinion the classification should for now
> remain the way that it is; and if an IMAP user uses the junk status in his/her
> filters then s/he'll have to wait for the body to download.

I understand the desire not to change deeply entrenched pre-existing behaviors.
Documenting those changes, and making users aware of them, is never any fun. But
I firmly believe that the new proposed behavior is superior and worth any pain
of transition.

Comment 37

12 years ago
I'm just a user... But I use SpamBayes called via "pipe through" in KMail, and
it really works well. Would it be possible to add the 'pipe through' action to
TB's filters?

Comment 38

12 years ago
Using a 'pipe' filter is a different topic (although yes I suppose it would be a
workaround for junk detection problems). 

I think that's bug 80439.

Comment 39

12 years ago
*** Bug 310542 has been marked as a duplicate of this bug. ***

Comment 40

12 years ago
*** Bug 323705 has been marked as a duplicate of this bug. ***

Comment 41

12 years ago
Will this bug ever be fixed? It was first filed almost 3 years ago, and I think this is a major bug, as it makes the junk filter useless for me. The junk gets moved to "junk" only after I have opened the folder, by which time I have already discovered for myself that it is junk. It's quite frustrating that this still isn't fixed...

Comment 42

12 years ago
(In reply to comment #41)

From my experience, most annoying bugs only get fixed if you fix them yourself.

Comment 43

12 years ago
I'd need to learn C/C++ or whatever TB was written in, so if no one else fixes this, it will probably never happen. Don't count on me... :)

The patch that was attached, will it fix this problem? Sorry, I'm no programmer, so I can't tell. If it's a fix, it probably needs to be applied on the source? Has anyone released a w32-binary with this patch?

Comment 44

12 years ago
(In reply to comment #43)
> I'd need to learn C/C++ or whatever TB was written in, so probably if noone
> else will fix this, it will never happen. Don't count on me... :)
> 
> The patch that was attached, will it fix this problem? Sorry, I'm no
> programmer, so I can't tell. If it's a fix, it probably needs to be applied on
> the source? Has anyone released a w32-binary with this patch?
> 
That patch worked for me with a POP3 account, but it was no good for IMAP. I haven't had time to work on it much since then. Also it wasn't very encouraging, because other folks here didn't like the approach that I thought was best - Filtering junk using headers only. The headers-only approach is fast, it means you never waste time downloading the full body of a message that you're just going to delete anyway, and it's more secure - with header-only junk filtering, I've never downloaded an email virus. Since it was unlikely that my subsequent patches would be accepted, I gave up. And since I only use POP3 myself, my builds work fine for me.

Comment 45

12 years ago
Instead of making the junk mail classification synchronous, with all the potential problems that would cause for imap, and forcing junk mail classification before filters, perhaps we could give the user the option of specifying if a filter was supposed to be applied before or after junk mail classification. So, for example, the user would check a box in the filter editor ui that said apply after junk mail classification. When we get new mail, we'd run the filters that were supposed to run before junk mail classification, and then when junk mail classification was finished, we'd run the filters that were supposed to be run after. And/or, the user could say to run all filters before or after junk mail classification. We know when junk mail classification is finished, so this wouldn't be too hard.

That doesn't address Howard's desire to do junk mail filtering on only headers, but it seems to me that's an orthogonal issue, and could be a separate setting for the junk mail code to specify what it should fetch.
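The before/after split suggested here can be sketched as two filter groups, with the end of junk mail classification acting as the trigger for the second group. `Filter`, `on_new_mail` and the `after_junk` flag (standing in for the proposed checkbox) are hypothetical, and classification is shown as a synchronous loop although the real classifier runs asynchronously.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Filter:
    name: str
    action: Callable[[Dict], None]
    after_junk: bool = False  # hypothetical "apply after junk mail classification" box


def on_new_mail(msgs: List[Dict], filters: List[Filter],
                classify: Callable[[Dict], bool]) -> None:
    before = [f for f in filters if not f.after_junk]
    after = [f for f in filters if f.after_junk]
    # Stands in for running the "before" filters as each message arrives.
    for m in msgs:
        for f in before:
            f.action(m)
    # Junk mail classification of the new batch.
    for m in msgs:
        m["junk"] = classify(m)
    # Classification has finished: run the "after" filters.
    for m in msgs:
        for f in after:
            f.action(m)
```

Setting `after_junk` on every filter gives the "run all filters after junk mail classification" variant.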

Comment 46

12 years ago
(In reply to comment #45)
> Instead of making the junk mail classification synchronous, with all the
> potential problems that would cause for imap, and forcing junk mail
> classification before filters, perhaps we could give the user the option of
> specifying if a filter was supposed to be applied before or after junk mail
> classification. So, for example, the user would check a box in the filter
> editor ui that said apply after junk mail classification. When we get new mail,
> we'd run the filters that were supposed to run before junk mail classification,
> and then when junk mail classification was finished, we'd run the filters that
> were supposed to be run after. And/or, the user could say to run all filters
> before or after junk mail classification. We know when junk mail classification
> is finished, so this wouldn't be too hard.

This will still be ugly to use, because any filters that run before junk classification, that move messages to other folders, will potentially move junk to other folders. (E.g., I filter messages for various mailing lists to individual folders.) As a user, the only behavior I'd want is to have junk classification completed before all other filters, to make sure junk never gets filed with real email.
 
> That doesn't address Howard's desire to do junk mail filtering on only headers,
> but it seems to me that's an orthogonal issue, and could be a separate setting
> for the junk mail code to specify what it should fetch.

A separate setting would probably be OK, as in bug 215941. This option seems to only be needed for IMAP. Right now with POP3, it just filters whatever was downloaded, and that works fine. With Headers-only downloads, messages that actually get selected for download wind up getting classified twice - once with just the header, and again after the body download, but that's not a problem.

Comment 47

12 years ago
(In reply to comments #45, #46)

This discussion has been going on in way too many bugs. See, for example, bug 198100, comment 19, by yours truly, which suggests what David has just suggested, among other things. Maybe it would be a good idea to open a meta-bug about the mail filtering system and move the discussion there.

(In reply to comment #43)
> The patch that was attached, will it fix this problem?

Maybe so, maybe no, but the filter system needs a more fundamental overhaul than fixing just this bug. And then there's the question of multi-threading all of this...

Comment 48

12 years ago
My problem isn't that junk won't be detected until it's downloaded, but that it isn't detected if another filter was applied to it.

TB indicates in the tray that new mail has arrived; I check it only to find it was a junk mail that wasn't detected until after I opened the folder. So what I need is simply for the junk filter to move mails to the junk folder even if another filter was applied first.

I can only repeat myself: I can't understand how such a bug can go unfixed for years. This more or less renders the junk filter useless to people like me who sort 90% of their mails. I'm almost never spared from spam, because every junk message makes TB announce "new messages arrived"...

Has there ever been any comment by the devs about this bug and if they ever intend to do something about it?

Comment 49

12 years ago
While this bug /is/ annoying... as the messages are eventually filtered to junk mail when you browse to the folder, I would hardly call it the critical bug you're making it out to be. Mozilla Mail and Thunderbird are quite usable. The odds of the developers helping you will only go down if you insist on implying that they haven't been "doing their job" for the past few years.

Comment 50

12 years ago
I don't say it's critical, but it's still true that under not too uncommon circumstances the junk filter doesn't really help. Besides that, I doubt the developers care whether I ask nicely or am busily ranting here... ;) They'll fix this thing when they think it's important enough to do so. Still, this is very annoying for me, as I have to check my mail manually each time a spam arrives, since at that point it still goes through as non-spam. And if a new spam arrives every 30 minutes or so, this can be quite unnerving. That's why, at least for me, the junk filter doesn't do much filtering, if any. I promise I'll stop nagging now though. :)

Comment 51

12 years ago
(In reply to comment #49)
> While this bug /is/ annoying... as the messages are eventually filtered to junk
> mail when you browse to the folder, I would hardly make it out to be the
> critical bug you're making it out to be.

I'm afraid I agree with comment #48. A junk filter which notifies you of new junk, and forces you to take an action to remove it (browsing to the folder), is barely better than no filter at all. The behaviour is inferior to most of the other alternatives out there. My personal opinion is that a leading feature failing to do half of what you'd expect it to is fairly critical for the users it affects, i.e. anyone who gets spam and uses filtering rules.

Comment 52

12 years ago
(In reply to comment #47)
> (In reply to comments #45, #46)
> This discussion has been going on in way too many bugs. See, for example, bug
> 198100, comment 19 , by yours truly, which suggests what David has just
> suggested, among other things. Maybe it would be a good idea to open a meta-bug
> about the mail filtering system and move the discussion there.

Good idea. I don't remember reading bug 198100 before, it would be nice to consolidate all of the topics together. There are so many to keep track of...

Comment 53

12 years ago
*** Bug 285471 has been marked as a duplicate of this bug. ***

Comment 54

12 years ago
*** Bug 324222 has been marked as a duplicate of this bug. ***

Comment 55

12 years ago
Can someone clarify how feature requests that relate to SeaMonkey and Thunderbird are reconciled with the codebase?
Now that I have found this bug reference, and see that it does relate to my increasing problem with 'routed spam' from a number of unmoderated eMail lists, I am wondering why it has not been addressed in 1.7.*, which I am currently using, and how to press for some movement to improve it in SeaMonkey?
Everybody has a different mix of eMail activity, so negative comments here are of little use. For many people there is not a problem, but for many more the simple step of applying 'routing' to junk first will save us a hell of a lot of time, so we can get on with more productive work. Perhaps even getting into the code base ourselves :)

Updated

12 years ago
Blocks: 66425
*** Bug 339219 has been marked as a duplicate of this bug. ***

Comment 57

12 years ago
I see this bug is now listed only as an "enhancement". I think it needs at least "normal" severity, as this virtually renders the junk filter useless for me, since I sort all my incoming mail into folders. So I get notified of every spam I receive as new mail, and only when I open the folder and have already seen for myself that it is junk does the junk filter kick in and remove the mail - which is not much help anymore.

And resolving a bug that makes the junk filter useless in certain not too uncommon circumstances is more than a mere enhancement to me. I think it should be reclassified and hopefully solved soon.

Comment 58

12 years ago
I'm with you 100% on that Bur
This has been going on too long, but of course the question is:
fix it in SeaMonkey AND Thunderbird?
I don't see an answer to that since my comment in January (06), only more duplicate bug reports cleared.
Surely that fact that so many duplicate reports have been lodged indicates that this is a very common problem?

Comment 59

12 years ago
in trunk and 2.0 nightly alpha builds, we analyze each folder that's the destination of a filter for junk status of messages after retrieving all new mail.
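A rough sketch of the behaviour described above, assuming filters are simple (predicate, destination folder) rules and that unmatched mail stays in the Inbox. `file_new_mail` and `classify_destinations` are hypothetical names, not the actual trunk code. Filing happens first; then, once all new mail has been retrieved, every folder that received new messages is scanned and only not-yet-classified messages are scored.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def file_new_mail(msgs: List[dict], rules) -> Dict[str, List[dict]]:
    """rules: (predicate, folder) pairs; the first match wins, else Inbox."""
    folders: Dict[str, List[dict]] = defaultdict(list)
    for m in msgs:
        dest = next((name for pred, name in rules if pred(m)), "Inbox")
        folders[dest].append(m)
    return folders


def classify_destinations(folders, classify) -> List[Tuple[str, dict]]:
    """Run once, after all new mail has been retrieved and filtered."""
    junk = []
    for name, msgs in folders.items():
        for m in msgs:
            if m.get("junk") is None:  # only messages not yet classified
                m["junk"] = classify(m)
            if m["junk"]:
                junk.append((name, m))
    return junk
```

The returned (folder, message) pairs are the candidates a junk action would then move, regardless of which filter filed them.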

Comment 60

11 years ago
(In reply to comment #59)
> in trunk and 2.0 nightly alpha builds, we analyze each folder that's the
> destination of a filter for junk status of messages after retrieving all new
> mail.

I'm using 2.0.0.0 final and I have filters that move all IMAP mails (matches on size > 0) to a composite inbox.

Unfortunately the junk mails aren't identified. They are identified correctly when I manually run "Tools -> Apply Junk-Filter on this folder".

So the detection works, but it isn't executed on moved mails.

sorry for the spam.  making bugzilla reflect reality as I'm not working on these bugs.  filter on FOOBARCHEESE to remove these in bulk.
Assignee: sspitzer → nobody
Filter on "Nobody_NScomTLD_20080620"
QA Contact: laurel → filters
(Assignee)

Updated

9 years ago
Product: Core → MailNews Core