User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
Bayesian learning is great for spam. Thanks! Now I want to use it for everything. What if every message I manually moved into a folder was added to the Bayesian corpus for that folder, and messages could be automatically filtered into folders based on Bayesian matches? This would be far easier to work with than the rule-based systems all other email clients have... and in some cases, it would be more useful (some mailing lists seem to actively TRY to make it impossible to filter based on headers). It would be even nicer if you could do more than just move messages from folder to folder, but that's a good start -- and it's a great user interface. IMO. :-)
Reproducible: Didn't try
Yes, this would be great. POPFile at SourceForge tries to do this as a local POP3 server. Then you'd only have to make one filter per folder and let the Bayesian stuff do the rest. http://sourceforge.net/projects/popfile/ Apparently their code works on Windows, but it is probably quite portable.
I also would like the Bayesian filters for news. Seems like a great way to filter trolls.
Ifile did this, back in 1996 (or maybe earlier). See: http://www.ai.mit.edu/~jrennie/ifile
*** Bug 180160 has been marked as a duplicate of this bug. ***
*** Bug 192927 has been marked as a duplicate of this bug. ***
Might I propose adding the keywords "Junk Mail Controls" somewhere? That might reduce the number of duplicates of this RFE :)
Some key reasons, IMO, why the spam classifier works nicely:
1. The Bayesian classifier. But that's just because...
2. Near-zero false positive rate. "False positives are bugs."
3. Acceptably fast learning rate.
4. Few false negatives.
5. "False negatives are about performance, not bugs."
Presumably, many clear-cut 'readme' / 'ignoreme' binary classifiers will retain these desirable traits. Other binary classifiers might be problematic. For example, in a 'readme' / 'readmetoo' binary classifier like girls/boys, point 5 no longer applies: false negatives suddenly become "bugs" just like false positives, so the number of "bugs" will naturally jump, say, by an order of magnitude. Other types of binary classifier raise other problems. They might still be as successful as the spam classifier for some users, but they probably won't be in the general case. Things are generally worse for ternary classifiers and beyond. Classifiers can of course be chained. That is, one could plausibly run a 'check these out first' / 'check later' classifier over one's non-spam messages to ensure that more urgent emails get dealt with first. Much more interestingly, one could run such a classifier over one's NNTP messages and have them marked as read-worthy (or not). Note that I have no idea whether these particular binary classifiers would work out in practice; they're just ideas. Going further, if the Bayesian classifier could be applied outside of mail/news, one could conceivably mark web sites with various classifications, or at least 'rulez'/'sucks', and build up one's own web content corpus. That corpus could then augment the browsing experience, say, by guiding link prefetch to stop prefetching sucky pages. Paraphrasing Paul Graham. -- ralph
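To make the chaining idea concrete, here is a minimal Python sketch. It assumes each stage is a function that either returns a label or passes the message on. The keyword tests in junk_stage and urgency_stage are hypothetical stand-ins for trained Bayesian classifiers, not anything Thunderbird ships.

```python
from typing import Callable, Optional

# A stage inspects a message and returns a label, or None to pass it on.
Stage = Callable[[str], Optional[str]]


def chain(stages: list[Stage], default: str) -> Callable[[str], str]:
    """Run stages in order; the first stage that returns a label wins."""
    def classify(message: str) -> str:
        for stage in stages:
            label = stage(message)
            if label is not None:
                return label
        return default
    return classify


# Hypothetical stand-ins for trained classifiers.
def junk_stage(message: str) -> Optional[str]:
    return "junk" if "viagra" in message.lower() else None


def urgency_stage(message: str) -> Optional[str]:
    return "check-first" if "deadline" in message.lower() else None


classify = chain([junk_stage, urgency_stage], default="check-later")
```

Because the junk stage runs first, only non-spam ever reaches the urgency stage, which matches the ordering described in the comment.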
> Other binary classifiers might be problematic. ... number of "bugs" is going to naturally jump
What am I thinking? One simply makes something like a boy/girl classifier a 3-way classifier, with XXYs left in the inbox. I think the rest of what I said stands. -- ralph
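Here is a sketch of that three-way rule in Python. It assumes the classifier produces a probability for the first class. The 0.1/0.9 cutoffs are illustrative assumptions, not values anyone in this bug proposed.

```python
def three_way(p_first: float, first: str, second: str,
              low: float = 0.1, high: float = 0.9) -> str:
    """Turn P(first | message) into first, second, or 'inbox' when unsure."""
    if not 0.0 <= p_first <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if p_first >= high:
        return first
    if p_first <= low:
        return second
    # The ambiguous middle band (the "XXYs") is left in the inbox for a human.
    return "inbox"
```

Widening the middle band trades fewer misfiled messages for more manual sorting.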
*** Bug 195709 has been marked as a duplicate of this bug. ***
*** Bug 196044 has been marked as a duplicate of this bug. ***
*** Bug 197779 has been marked as a duplicate of this bug. ***
Please modify the bug summary to contain the word "junk". This will prevent dupes. It would at least have prevented the dupe I entered (I used the word "junk", as that is the word Mozilla uses frequently). I tried setting it to 'Bayesian filters for more than spam (junk)' but only the owner can do that.
*** Bug 199022 has been marked as a duplicate of this bug. ***
*** Bug 207609 has been marked as a duplicate of this bug. ***
Altering summary - I thought it was about false spam hits.
Created attachment 130265: preliminary version of my implementation of this feature
Attached is my first ever Mozilla patch :). (Actually, it's for Thunderbird; I don't think it would work on Mozilla Mail.) This more or less implements this feature. Whenever you move a message to another folder, it will train the filter. When you receive new messages or choose Run Junk Mail Controls, it will try to classify the messages into the appropriate folders. If it is confident, it will move them; if it is not so confident, it will just give a suggestion in the "junkbar" area. The confidence threshold is kind of low at the moment (50%, though the percentages are probably not quite accurate), so it will probably move more than it should - I plan to eventually make this a variable the user can set. Be warned that this is far from a final release - there are plenty of issues... First and most importantly, IMAP does not yet work properly; Bienvenu told me he would work on a patch to help make IMAP more feasible for this feature. Along similar lines, I haven't tried copying across accounts, so I'm not sure how well that'll work. For now, this actually replaces the junk mail controls. It would be possible to do it separately, but to me it makes a lot more sense this way (since you can use it to filter out junk anyway). To get it to work, you must enable adaptive junk mail filters and disable whitelists (that's another thing I'll have to implement later). You should set up a junk folder as usual, and you might have to allow it to move messages; I'm not sure if it would work properly without that at the moment. When you try to train on a new message that just came in, it sometimes doesn't train the first time you move it - just keep moving it until you see the "observing message" message in the terminal. There are still a bunch of improvements to be made that I haven't gotten around to implementing yet (notice all the ARITODO comments).
It's possible that there are also some memory leaks, and it will print a bunch of debugging messages onto your terminal. I wouldn't suggest renaming/deleting any folders you're using this with. If you test this, please give me feedback. Be nice :). Enjoy! Disclaimer: if there is anything I forgot to mention or if I messed up the diff or if this completely fails to work, I'm blaming it on the fact that it's after 4 am.
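For readers unfamiliar with the approach, here is a minimal multinomial naive Bayes sketch in Python of the behaviour described in the patch:

- Every manual move trains the destination folder.
- Moves to Trash are ignored, as Ari explains further down.
- A message is moved only when the top folder probability clears the threshold. Otherwise it is just suggested.

The class name, the Laplace smoothing, the whitespace tokenizer and the 0.5 default are assumptions for illustration. The actual patch builds on Thunderbird's existing junk-filter code.

```python
import math
from collections import Counter, defaultdict


class FolderClassifier:
    """Multinomial naive Bayes over folders, trained by observing moves."""

    def __init__(self, threshold: float = 0.5, ignored: tuple = ("Trash",)):
        self.threshold = threshold
        self.ignored = set(ignored)
        self.token_counts = defaultdict(Counter)  # folder -> token -> count
        self.doc_counts = Counter()               # folder -> messages trained
        self.vocab = set()

    @staticmethod
    def tokenize(text: str) -> list[str]:
        return text.lower().split()

    def observe_move(self, text: str, dest: str) -> None:
        """Train on a manual move; moves into ignored folders teach nothing."""
        if dest in self.ignored:
            return
        tokens = self.tokenize(text)
        self.token_counts[dest].update(tokens)
        self.doc_counts[dest] += 1
        self.vocab.update(tokens)

    def posteriors(self, text: str) -> dict[str, float]:
        """P(folder | message) for every trained folder."""
        if not self.doc_counts:
            raise ValueError("nothing has been trained yet")
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        log_scores = {}
        for folder, counts in self.token_counts.items():
            n_tokens = sum(counts.values())
            score = math.log(self.doc_counts[folder] / total_docs)
            for tok in self.tokenize(text):
                # Laplace smoothing so an unseen token cannot zero a folder out.
                score += math.log((counts[tok] + 1) / (n_tokens + vocab_size))
            log_scores[folder] = score
        # Subtract the max before exponentiating to avoid underflow.
        top = max(log_scores.values())
        weights = {f: math.exp(s - top) for f, s in log_scores.items()}
        z = sum(weights.values())
        return {f: w / z for f, w in weights.items()}

    def decide(self, text: str) -> tuple[str, str, float]:
        """Return ('move' or 'suggest', best folder, confidence)."""
        post = self.posteriors(text)
        best = max(post, key=post.get)
        action = "move" if post[best] > self.threshold else "suggest"
        return action, best, post[best]
```

Raising the threshold turns more automatic moves into junkbar suggestions. Ari planned to make the threshold user-settable.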
Hey, that sounds like a great patch. (Forgive me for not testing it -- my email doesn't like to be experimented with :)). Two small questions: 1) How does this interact with the user's existing rules for moving (e.g. mailing list) mail into subfolders? 2) Does it make a special exception for the learning of mail dropped into (for example) the Trash folder? Some other folders' contents are just unclassifiable too, which I wouldn't want stuff automatically filed in. For example, I keep 'ANSWER-ME', 'TO-DO', 'RECIPES-I-HAVE-TRIED' and 'FUNNY-CLASSICS' folders.
Given that this patch is for Thunderbird, it would be nicer to package it as an extension instead. It's a great feature, but it seems specialized enough that only a few advanced users are likely to need it, and it's better to avoid bloat for the regular user.
It should interact with existing filters the same way the current spam filter does - I'm pretty sure existing filters should get precedence and my filter will only try to classify a message that isn't touched by the existing ones, but I haven't really tried this so I can't confirm that. It ignores all moves to the trash - clearly you don't want to train on those. I'm thinking about eventually making a preferences dialog that would let you choose which folders you want to allow it to move messages to (so instead of the current system where you just choose to allow it to move messages or not, you can do it on a per-folder basis). I'm not sure I really see the harm in it training on some of those other folders, as long as it never actually tries to move a message there against the user's will. Who knows, though, maybe it could actually learn some of those? I could imagine it picking up a pattern in some of the messages you consider high priority, though obviously this would be a lot harder than a more subject-oriented category. As for the comment about making it an extension, I would definitely like to, but for now I am focusing on getting it working. I think it shouldn't be too hard to move most of my code into my own file once everything works - if anyone wants to try to convert it now that'd be great though. On the other hand, if kept as-is, it's actually not quite as much bloat as you'd expect, since it actually removes a lot of code from the current spam filter's implementation:
ari@ari:~/mozilla/mozilla$ grep -r ^[+] mydiffs -c
954
ari@ari:~/mozilla/mozilla$ grep -r ^[-] mydiffs -c
466
About 250 of the added lines are used to implement my own listener class in the msgCopyService to find out when the copy is complete - it's got all these empty functions that don't do anything. It's possible that once Bienvenu adds the notification that will help make IMAP work I can cut down on some of these.
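Those grep counts also include the '+++' and '---' file-header lines that open each file's section of a unified diff, so they slightly overstate the real totals. Below is a small Python sketch, assuming a unified diff, with a hypothetical count_changes helper. It counts only lines inside hunks, using the line counts in each '@@' header to know where a hunk ends.

```python
import re

HUNK_RE = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


def count_changes(diff_text: str) -> tuple[int, int]:
    """Count added and removed lines, using hunk headers to skip file headers."""
    added = removed = 0
    old_left = new_left = 0
    for line in diff_text.splitlines():
        if old_left <= 0 and new_left <= 0:
            m = HUNK_RE.match(line)
            if m:
                # An omitted count means 1 in the unified diff format.
                old_left = int(m.group(1) or 1)
                new_left = int(m.group(2) or 1)
            continue  # file headers such as '--- a' / '+++ b' land here
        if line.startswith("\\"):
            continue  # '\ No newline at end of file'
        if line.startswith("+"):
            added += 1
            new_left -= 1
        elif line.startswith("-"):
            removed += 1
            old_left -= 1
        else:  # context line belongs to both files
            old_left -= 1
            new_left -= 1
    return added, removed
```

Tracking the hunk counts also means a removed line whose own text starts with '--' is still counted correctly.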
Damn - just discovered that I did botch the patch a little: line 309 should read @@ -42,6 +42,9 @@ (it is +42,10 in the file), and line 319 should read @@ -385,6 +388,257 @@ (it was +389,257). Guess that's what happens when you try to clean up an extra newline AFTER doing the diff...sorry. While I'm posting, if anyone wants to try to test this out, here are all the steps for getting this working in Linux without worrying about losing any e-mail (I just tried it on another machine, so I'm pretty sure these should work). It's actually not too hard to do, even if you've never built Mozilla before:
1) Follow the CVS build instructions from http://www.mozilla.org/projects/thunderbird/build.html
2) Before running the build_all line, get my patch, put it inside your mozilla directory, and make the two changes I just mentioned above. Then type "patch -p0 <mypatch" (replacing mypatch with whatever you saved my patch as).
3) Do the build_all as in the instruction page - this will take a while.
4) (Optional, but highly recommended) If you already have a ~/.thunderbird directory, move it to ~/.thunderbird-backup - this way we don't mess with any e-mails you already have saved there.
5) Go into the newly created dist/bin directory and run thunderbird. Create a new POP account (you are using POP, right?). Make sure you don't turn off the default option of "leave messages on server" and don't turn on "delete from server when I move messages", so that you don't have to worry about losing anything (both options are in Tools->Account Settings->Server Settings).
6) Go to Tools->Junk Mail Controls, click the adaptive filters tab, and enable it. Go back to the settings tab, disable the white lists, and (optionally) enable the other 3 check boxes.
7) Make some new folders in your account, get some e-mail and move it around.
8) If something with my filters gets really out of whack, you can start over by deleting the ~/.thunderbird directory and repeating steps 5-7.
If you were using Thunderbird before, and want to go back to using your normal version, make sure you move your ~/.thunderbird-backup directory back to ~/.thunderbird before switching back.
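patch relies on the line counts in each '@@ -start,count +start,count @@' header to know where a hunk ends, so hand-editing a hunk means adjusting its header too. Here is a minimal Python sketch, with a hypothetical fix_hunk_header helper, that recomputes the two counts from the hunk body. It leaves the start lines unchanged, so a shift like the +389 to +388 correction above still has to be made by hand.

```python
import re

HEADER_RE = re.compile(r"^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@(.*)$")


def fix_hunk_header(header: str, body: list[str]) -> str:
    """Rewrite a hunk header so its line counts match the hunk body.

    Start lines are kept as-is; if an earlier hunk in the same file grew or
    shrank, later new-file start lines must be shifted separately.
    """
    m = HEADER_RE.match(header)
    if not m:
        raise ValueError(f"not a hunk header: {header!r}")
    old_start, new_start, trailer = m.group(1), m.group(2), m.group(3)
    old_count = new_count = 0
    for line in body:
        if line.startswith("\\"):
            continue  # '\ No newline at end of file' is not a content line
        if line.startswith("-"):
            old_count += 1
        elif line.startswith("+"):
            new_count += 1
        else:  # context line appears in both files
            old_count += 1
            new_count += 1
    return f"@@ -{old_start},{old_count} +{new_start},{new_count} @@{trailer}"
```

For example, a hunk with six context lines and three added lines gets the header @@ -42,6 +42,9 @@, whatever counts it claimed before.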
Created attachment 130393: 2nd release of my patch
I've attached a much improved version of my patch; sorry to all those who downloaded the first version (it was awful). This is still far from bug-free, and still won't work with IMAP (hopefully David will implement bug 216612, which should make IMAP more feasible), but it is a lot more usable now. The improvements in this one are a ton of bug fixes, a much improved (and working) interface, and some classification improvements. There are a couple of known bugs besides the IMAP issue: sometimes messages do not get moved after being classified by the "run junk mail controls on this folder" tool. Also, sometimes moving from one message to another does not seem to update. Folder renaming/deleting will probably not be handled particularly gracefully. The preferences interface is still unchanged; apply the same settings I suggested before. I have a precompiled binary made with Red Hat Linux at http://www.stanford.edu/~ari05/ If you download this, please send me an e-mail telling me what you think. Remember never to delete messages from the server, and to back up your old ~/.thunderbird directory so that you don't risk losing anything.
Status update - work has not stopped on this, but I was on dial-up for a couple of weeks and now I am back at school with classes coming up, so I won't be updating as frequently. I fixed a couple of bugs, added a panel in the preferences dialog to set per-folder thresholds (the panel is a bit rough, my apologies), and added a Porter stemmer, which I've found tends to improve classification performance. I'm thinking about playing some more with the tokenizer to get it to recognize a few special features (keep URLs and e-mail addresses intact, or simplify URLs to just their domains; throw out HTML comments that are used to break spammy words in half; etc.), but I don't plan to do much more with tokenizing than that (improving the tokenizer is worthy of an entirely separate project). IMAP still stands where it did before - waiting for progress on bug 216612. I think I'll hold off on posting a new version of this until IMAP works.
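Two of the tokenizer ideas above can be sketched as a preprocessing pass in Python. This assumes plain regular expressions are good enough, which they are not for malformed or nested markup. The preprocess name is hypothetical. urllib.parse.urlsplit is the standard-library parser used here to extract the host name.

```python
import re
from urllib.parse import urlsplit

COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def preprocess(text: str) -> str:
    """Remove HTML comments and reduce URLs to their host name."""
    # 'V<!-- x -->iagra' becomes 'Viagra', so the spammy word survives intact.
    text = COMMENT_RE.sub("", text)

    def to_domain(match: re.Match) -> str:
        host = urlsplit(match.group(0)).hostname  # lowercased by urlsplit
        return host if host else match.group(0)

    return URL_RE.sub(to_domain, text)
```

Stripping comments before tokenizing is what lets a word split by a comment reach the classifier as a single token.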
I have a question: how are we ever going to know which messages were moved and where they were moved? Is there some indicator that messages were moved, so that we can check whether the move was correct and train the system?
Created attachment 132050: new version, includes imap support
Woohoo! David came through on bug 216612, so I fixed up all the IMAP stuff and got it working! I would say that at this point the patch is totally usable - in fact I'm using it for all my mail now. No guarantees, of course. Remaining issues, in no particular order:
1. Junk Mail Controls panel - there are a bunch of legacy options, and my added tab is not especially user friendly. In particular, pressing Cancel will not discard the changes made on my tab. If someone good with XUL wants to help out here, that'd be great.
2. There will be problems using this with multiple accounts, at least I assume there will be - I haven't extensively tested it. I'm thinking about adding some options to make this work better - any ideas?
3. Visual feedback - as Andrea pointed out above, this will be important. Right now I try to print a status bar message in some of my JS code, but it often doesn't stay up long enough to see, and I don't do it everywhere. Some sort of highlighting in the folder tree would be great; again, I'd welcome help from anyone good with this sort of stuff.
4. Tokenization is still not great. I added the Porter stemmer (which reduces words to their root form) to the tokenizer, which should help training happen more quickly. I also modified it not to split words at @ symbols or periods, though it still ignores periods and @ symbols at the beginning and end of a word; this way e-mail addresses stay intact. A major boost would be to strip HTML comments as in bug 213614, but this seems pretty tricky (any volunteers?).
5. I'd like to add a "disable notifications" option so that messages marked, for example, as spam do not trigger the normal notification and distract you from work.
6. Random edge cases - better handling of renamed/deleted folders, and how to handle situations where a message is marked or not marked as something but the training data doesn't have this information saved (if, for example, the program crashed or an IMAP message was moved on the server).
I'd really appreciate it if anyone could help me with any of the above, particularly the UI-related issues. If you have any feedback, please e-mail me or post it here.
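The '@' and '.' rule from item 4 can be sketched like this in Python, assuming a simple regex split. Stemming is left out, and tokenize is a hypothetical stand-in for the real tokenizer in the patch.

```python
import re

SPLIT_RE = re.compile(r"[^\w@.]+")


def tokenize(text: str) -> list[str]:
    """Split into tokens, keeping interior '@' and '.' (e-mail addresses, hosts)."""
    tokens = []
    for raw in SPLIT_RE.split(text.lower()):
        tok = raw.strip("@.")  # ignore '.' and '@' at the start or end
        if tok:
            tokens.append(tok)
    return tokens
```

A sentence-ending period is trimmed, while the periods inside an address are kept, so "today." and "ari@example.org" come out as "today" and "ari@example.org".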
Sorry to post again; I just wanted to elaborate on two points above. First, with respect to the visual feedback, there is definitely a fair amount already in place. The bar where the normal spam classifier says "Thunderbird thinks this message is junk" now shows one of three messages:
- "This message appears to be ______ (confidence ___)", with a button for "move it there" or, if it's already there, "confirm classification".
- "You labeled this message as _______", with a button for "undo label".
- "This message is unlabeled", with a button for "guess" and one for "label as current folder".
The other visual feedback already provided is that if incoming mail is moved, whatever folder it was moved into will of course display the usual unread messages indication. The only problem with the feedback is when you, for example, say "run junk controls on current folder" or press the "guess" button. Some messages will get moved around according to your settings, but at the moment it's hard to tell where they were moved, so it would be nice to have some sort of highlighting indicating which folder the messages were moved to. The other thing I wanted to elaborate on is the Porter stemmer...if you're curious, it came from http://www.tartarus.org/~martin/PorterStemmer/.
Ari, the visual feedback is an important issue. One possibility is to replace the junk status icon in the messages window with folder icons (perhaps with different colors / shapes) and have the same icons next to the folder name in the folder tree. This would take you a long time, so a first shot at it would be to just display the target folder name in the junk status column. I also suggest remapping "run junk controls" to the junk mail filter only, if you can. Then create a new item called "run adaptive filters" that runs all of the filters that are not junk. Anyway, I am looking forward to your stuff. I have been waiting a long time for someone to fix bug 183929 (the only reason I miss Eudora: I like to read mail and THEN move it to folders), and your stuff seems to be a good enough substitute.
Hi Ari, This is great! I think that automatically setting up a message filter when you drag messages is a good idea. But why not build upon (i.e. incorporate) the existing filter functionality somehow? Surely this can only improve the accuracy of your filtering. If a user has an explicit filter saying all messages from X to Y go to folder Z, then shouldn't you be making use of that information? Also, this would allow users to partially "correct" errors in your Bayesian filtering by introducing new filters. Just an idea. A second little idea would be to make an "undo move" command reverse the effects on your filters (everyone drags to the wrong folder sometimes). Similarly, when you drag M from X to Y, neither of which is the Inbox, this says "new messages in the Inbox with this signature go into Y AND ALSO don't go into X" ... not sure if that functionality is already there. Okay, there you go, just some thoughts for you. I presume you're already following the POPFile project . . . Cheers, Chris
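The precedence Ari describes earlier can be sketched in Python: explicit filters run first, and the classifier only handles messages no filter touched. The Rule tuple, which matches a substring in one header, and the route function are hypothetical simplifications of real message filters.

```python
from typing import Callable, NamedTuple, Optional


class Rule(NamedTuple):
    header: str    # e.g. "List-Id"
    contains: str  # case-insensitive substring to look for
    folder: str    # destination when the rule matches


def route(headers: dict[str, str], body: str, rules: list[Rule],
          bayes: Callable[[str], Optional[str]]) -> tuple[str, Optional[str]]:
    """Return (source, folder): an explicit rule wins, else the classifier's guess."""
    for rule in rules:
        if rule.contains.lower() in headers.get(rule.header, "").lower():
            return "rule", rule.folder
    return "bayes", bayes(body)
```

Chris's further idea of learning from filter hits would amount to also training the classifier inside the rule branch.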
Hey, if anyone here wants to test this out without having to deal with compiling stuff, I just put a Linux build (compiled in Debian unstable) up at http://ari.stanford.edu/mythunderbird1.tar.bz2 Let me know how (if?) it works for you.
Ari, I'd really like to see your patch distributed as an extension, given I'm running Thunderbird under Windows, and don't have the tools to rebuild it. I appreciate that you're possibly still testing things out, but I also think you'd see a much greater interest in your work if you got it on the texturizer page as an extension. Cheers, Matt
Does anybody have this working under Windows? I'd love to give it a try if it works as advertised. Is development still ongoing?
*** Bug 225965 has been marked as a duplicate of this bug. ***
There's a bounty to be collected for this feature, see http://www.markshuttleworth.com/bounty.html, the second bounty in the 'mozilla work' chapter (at least if it'll work for thunderbird).
Yeah, can we package this up as a Thunderbird extension? I too would love to try this.
Alright, I've been kind of MIA for the past month or so (quick plug: I've been working on this: http://www.stanford.edu/~mcslee/ultimate/), but now I'm hoping to get back on track working on this. I did see the thing about the bounty and contacted the person, but have not yet heard back from him - getting that would definitely be a nice incentive for me to push forward a bit more. Either way, though, making this into an extension is a high priority for me (in fact, I'd say it's pretty much all there is left to do), but sadly it's also a nontrivial task. While I designed the patch to reuse a lot of the existing spam filter's infrastructure, unfortunately I needed to modify that infrastructure to get it to work in a more generalized way (for example, rather than simply passing around an enum specifying JUNK/NOTJUNK, I need to actually pass around folder URIs as strings). The plan now is to get my modified interfaces working with the normal junk filter so that it is easier to swap back and forth between the two. So basically, I will need to get some code integrated into the main Thunderbird code, but I will do my best to minimize the changes I make there so that they are easier for the developers to merge. In addition to the more generalized interfaces, I will also need to get a hook added for notification that a message has been copied. Anyway, that's my status; I'm hoping to have something working by January, but no promises.
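Here is a minimal Python sketch of that generalization. It assumes the classifier reports a folder URI string and recovers the old two-valued junk result by comparing against the junk folder's URI. JunkStatus and to_junk_status are hypothetical, not the actual Thunderbird interfaces.

```python
from enum import Enum


class JunkStatus(Enum):
    JUNK = "junk"
    NOT_JUNK = "not-junk"


def to_junk_status(folder_uri: str, junk_folder_uri: str) -> JunkStatus:
    """Collapse a folder-URI classification into the old two-valued result."""
    if folder_uri == junk_folder_uri:
        return JunkStatus.JUNK
    return JunkStatus.NOT_JUNK
```

In this shape the junk filter becomes one consumer of the general classifier, which is what makes swapping back and forth between the two possible.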
Here's another take on this concept, which could turn out to be a real killer feature: Bayesian prioritizing of the inbox. Monitor my Inbox and see how I respond to mail (1) from specific senders, (2) in specific threads, (3) that responds to something I sent vs. mail that arrives out of the blue, and (4) that contains specific keywords (maybe let the user manually assign the keywords). Then use that to prioritize the mailbox so that more important stuff floats to the top. Note that simply starting with (1) would be a giant step forward. There are some people to whom I usually respond right away. Other people's mail tends to just get read. So float the people to whom I usually respond promptly to the top of my inbox. This would be an outstanding way to differentiate Thunderbird. Bart
Interesting idea that I've thought a little about but haven't really experimented with. The way you describe it - prioritized by sender - sounds more like a glorified way of searching/sorting your box; you could almost accomplish something similar just using the existing search or filtering functionality. However, you posted this idea under this Bayesian filter bug, and I think you may be on to something with that connection. I'm not sure how good a job it would do, but my classifier could easily be modified to use the categories "high priority"/"low priority" rather than the user's folders. The only difference between how it currently works and how you propose is that rather than learning by observing message moves, it would learn by observing the user's behavior - did they reply to the message or not (maybe also extend it to "did they save it or delete it" as well). Also, ideally the result of classification would be to mark a message as "important" rather than as belonging to a certain category, possibly with a sliding scale (color-coded highlighting?) depending on how important it appears to be. Of course, this learning approach would only work under the assumption that there is a pattern in the content of the emails that you tend to reply to, which may or may not be true for each user. I'm fairly positive it's nowhere near as easy as identifying spam, so don't get your hopes up too high that this would be effective. Maybe I'll give it a shot once the "categorizer" is fully working.
Ari, Thanks for the feedback. The potential complexity of the feature was what made me think that maybe it's best to first strip it down to something as simple as possible and 'market' it as such. That's what got me thinking maybe it's easiest to focus at first on the sender. In a very simple form, the learning agent could distinguish between mail from a sender whose mail generally gets (a) deleted without a response, (b) filed without a response, (c) responded to. And sort my Inbox accordingly. While this is definitely a much less precise art than junk filtering (a tough enough task as it is), the good news is that, unlike for junk mail filtering, there's no huge price to pay for false positives. So the worst that can happen is that something ends up lower down in your inbox than it might (and of course any of the current views of the Inbox suffer from that as well). But let me get out of your way since, frankly, I have no idea what I'm talking about :)
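Bart's simple sender-based version can be sketched in Python with plain per-sender counts and no Bayesian math. It assumes the client reports whether each message was deleted, filed or replied to. SenderPriority and the add-one smoothing are illustrative choices, not part of any patch here.

```python
from collections import Counter, defaultdict

OUTCOMES = ("deleted", "filed", "replied")


class SenderPriority:
    """Rank senders by how often their past mail got a reply."""

    def __init__(self):
        self.history = defaultdict(Counter)  # sender -> outcome -> count

    def record(self, sender: str, outcome: str) -> None:
        if outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {outcome}")
        self.history[sender.lower()][outcome] += 1

    def reply_rate(self, sender: str) -> float:
        counts = self.history.get(sender.lower(), Counter())
        total = sum(counts.values())
        # Add-one smoothing: unknown senders land at 0.5, not at an extreme.
        return (counts["replied"] + 1) / (total + 2)

    def sort_inbox(self, senders: list[str]) -> list[str]:
        """Most-likely-to-be-answered senders first; ties keep arrival order."""
        return sorted(senders, key=self.reply_rate, reverse=True)
```

Unknown senders score 0.5, so they sit between people you usually answer and people whose mail you usually delete. A wrong guess only changes the ordering, which is the low cost of error Bart points out.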
Ari contacted me in connection with the bounty I've put up for this functionality. I think he's well on track to claiming the bounty. I wanted to put some more flesh on the usage scenario I had in mind for claiming the bounty, in case anyone else wants to comment. Here's an extract from an email I sent Ari:
Let's use the following usage "story". I don't expect mail to be automatically filed to folders when it is received, as with the current Junk Mail Bayesian system. Instead, I envisaged the Bayesian tool being used to help quickly select the folder to file the message in WHEN the message is filed. So basically this would look like an advanced "File Message" dialog box. When the message comes in and has been read, the user presses a hotkey to bring up the "Advanced Message Filing" dialog. We now have to select the correct folder in which to file this message. If you read the bounty descriptions, I want to be able to do two things. First, by simply starting to type the name of the folder that I want, a drop-down listbox of possible folders (with their full folder paths) would be populated. This is a little like the current system for email addresses in message composition windows: when you start typing, it shows a list of suggested email recipients based on what you have typed. Second, that list would be bolstered by the output from the Bayesian system. For example, before I even type anything, it might start out with the best-guess folder name already selected based on the Bayesian logic. The list might include several Bayesian guesses until I actually start typing something else. So here would be the process:
1. Open Thunderbird, read message in Inbox as normal.
2. Press hotkey for Advanced Message Filing.
3. Best-guess folder is already preselected. Hitting Enter will file the message there immediately.
4. If I start typing the letters "me", the listbox immediately shows:
   account/Inbox/mail/people/m/megan elliott
   account/Inbox/mail/people/m/meenal gallal
   account/Inbox/mail/people/m/mendhip muran
   account/Inbox/mail/companies/m/medical magic
   account/Inbox/mail/companies/m/mexican mojo
5. I can quickly select a folder and hit Enter to have it filed there.
6. Filing a message automatically trains the system a little more, so next time it will start off suggesting a better folder.
So the extension I'm looking for is not so much a pre-classification system or filter as intelligent support for the manual message filing process. Good luck to Ari, and I hope he'll claim the bounty soon!
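The filing box Mark describes can be sketched in Python as a function with two steps. First it preselects the classifier's best guesses. Then it narrows the list to folders whose last path segment starts with the typed text. The scores mapping (folder path to classifier probability) and folder_choices are hypothetical. How the probabilities are produced is left to the Bayesian side.

```python
def folder_choices(folders: list[str], scores: dict[str, float],
                   typed: str = "", limit: int = 5) -> list[str]:
    """Candidate folders for an 'Advanced Message Filing' box.

    With nothing typed, return the classifier's best guesses; once the user
    types, keep only folders whose leaf name starts with the typed text.
    """
    prefix = typed.strip().lower()
    if prefix:
        folders = [f for f in folders
                   if f.rsplit("/", 1)[-1].lower().startswith(prefix)]
    # Higher score first; unscored folders count as 0, ties sort by full path.
    ranked = sorted(folders, key=lambda f: (-scores.get(f, 0.0), f))
    return ranked[:limit]
```

Keeping the classifier's order after filtering means the Bayesian guess still floats to the top once the user starts typing, as in step 4 of Mark's process.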
This is great. I love the progress Ari has made, and I think Mark's bounty will push this along quite nicely. I'm trying to help strip out some annoying HTML tags in bug 213614, as Ari mentioned in comment 25. Any chance someone could point me to the file that does the tokenizing for the Bayesian filter? I code C/C++ fairly well and have a passing understanding of Bayesian filters from my MIT AI class. Let me know if there's anything else I can do to help. (I'm not trying to edge in on the bounty ;), I just want this feature pushed out the door.) Also, from what Mark describes, it sounds like you could put a drop-down box in the bar Ari describes in Comment 26. Just change ["This message appears to be _____(confidence ___)" (with a button for "move it there")] to ["Message should be " (drop-down box defaulted to what the filter thinks it should be) (Move button)]. Just implement type-ahead find in the drop-down box and have the "hot key" move focus to the drop-down. This of course is just a suggestion. Since it doesn't move the message anyway, I would just remove ["This message is unlabeled" (with a button for "guess" and one for "label as current folder")] and use the above message regardless of confidence. I would also like the option to have messages filed automatically based on confidence to remain, just turned off by default. Ideally you could adjust the confidence per folder. I think that's about it. I've tested Ari's build somewhat and find it works well. I would like to see everything already in a folder classified as that folder (you have to do it manually now). But thanks again for great work. Again, let me know if there's anything I can do, Miller
*** Bug 232061 has been marked as a duplicate of this bug. ***
I just thought I would mention, even though perhaps off-topic, that it would be cool to use the same code to filter bookmarks (I hate having to file nested bookmarks) based on page contents, title, etc. Maybe too ambitious, but it could be neat!
*** Bug 252989 has been marked as a duplicate of this bug. ***
*** Bug 260430 has been marked as a duplicate of this bug. ***
I find it depressing that this bug remains outstanding more than a year after a patch was submitted. Is there a problem with applying it? What's the future of this bug?
(In reply to comment #45)
> I find it depressing that this bug remains outstanding more than a year after a
> patch was submitted. Is there a problem with applying it? What's the future of
> this bug?
I also find it depressing, but this has happened to me before... I guess this time it just disappeared among all the other things to fix? Or was it a design decision not to allow general Bayesian filters? I don't know; I've been watching, and I'm still interested in this. Where are you guys making these decisions? :) Aren't you interested in this? Now that labels seem to be a fixed enumeration, this is even more doable.
*** Bug 297108 has been marked as a duplicate of this bug. ***
Comment on attachment 132050 (new version, includes imap support): I'm sure this patch has bitrotted since it was posted. Last I heard, Gandalf was looking into generalizing the Bayesian code for use in toolkit/.
Comment on attachment 132050 (new version, includes imap support): Unfortunately, this patch has probably bitrotted fairly significantly since it was contributed. Last I heard, Gandalf was looking into generalizing the Bayesian code for use in toolkit/, so I'm going to reassign to him for a recommendation about what to do with it.
*** Bug 324276 has been marked as a duplicate of this bug. ***
I just requested a very similar enhancement in bug 336112 ("Advanced Automatic Email Filtering"), but there are some important differences that might be of interest here.
1) I recommended CRM114 because of its better classification when dealing with more than two choices. The question is no longer "is this spam" but rather "where does this belong". The "hyperspace" classifier sounds most appropriate.
2) In my case, the problem was some users simply getting too much legitimate email to handle. They needed completely automatic, unattended sorting. Having a title bar ask "move this to somebox?" would be of no help, because they'd still have to open every message before it could be "automatically" sorted.
3) I recommended being able to select "automatically filter to this folder" every time a new folder was created. If selected, filtering would happen automatically, without any user intervention, every time mail arrived. If it was wrong, the user would just move the offending message to the correct place and the system would learn from the mistake. As long as users were even semi-consistent about correcting mistakes (and not just deleting them), the system should learn very quickly (CRM114 learns faster than traditional Bayes).
*** Bug 339442 has been marked as a duplicate of this bug. ***
I'm not dead! Still plan to work on this, but if someone has more time, feel free to take this bug.
I'm looking at implementing a stripped-down version of this in the TB3 timeframe. Let me take the bug for now, if only to get anyone else working on this to talk to me. The main implementation will focus on automatic tagging rather than moves. I'll generalize the existing bayesian classifier to be able to classify N generic features, where the first feature will be the existing junk classification. At least by TB3 I should be able to get enough hooks into the bayesian code so that an extension could easily implement soft tags based on the bayes results. Perhaps we could get some UI hooked up by TB3 as well - though time is tight for that.
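One way to read "classify N generic features" is as one independent binary (pro/anti) classifier per feature, with junk as just the first one. The sketch below assumes that reading. `TraitClassifier`, the trait names and the 0.9 threshold are hypothetical, and the scoring is a simplified naive Bayes that only scores tokens present in the message; it is not necessarily the combining rule Thunderbird's bayesian filter actually uses.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Each distinct lowercased word counts once per message.
    return set(re.findall(r"[a-z0-9']+", text.lower()))


class TraitClassifier:
    """Hypothetical per-trait binary classifier; "junk" is just one trait."""

    def __init__(self):
        # trait -> {True: pro token counts, False: anti token counts}
        self.counts = defaultdict(lambda: {True: Counter(), False: Counter()})
        # trait -> {True: pro messages trained, False: anti messages trained}
        self.messages = defaultdict(lambda: {True: 0, False: 0})

    def train(self, text, trait, is_pro):
        self.counts[trait][is_pro].update(tokenize(text))
        self.messages[trait][is_pro] += 1

    def probability(self, text, trait):
        """Estimate P(pro | message) for one trait; 0.5 means no opinion."""
        msgs = self.messages.get(trait)
        if not msgs or msgs[True] == 0 or msgs[False] == 0:
            return 0.5
        counts = self.counts[trait]
        total = msgs[True] + msgs[False]
        log_pro = math.log(msgs[True] / total)
        log_anti = math.log(msgs[False] / total)
        for tok in tokenize(text):
            # Smoothed fraction of pro/anti messages that contained the token.
            # Simplification: tokens absent from the message are not scored.
            log_pro += math.log((counts[True][tok] + 1) / (msgs[True] + 2))
            log_anti += math.log((counts[False][tok] + 1) / (msgs[False] + 2))
        # Clamp the log-odds so math.exp cannot overflow on long messages.
        diff = max(min(log_anti - log_pro, 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))

    def tags(self, text, threshold=0.9):
        # Every trait is scored independently, so a message can get many tags.
        return sorted(t for t in self.messages
                      if self.probability(text, t) >= threshold)


if __name__ == "__main__":
    clf = TraitClassifier()
    clf.train("cheap pills buy now", "junk", True)
    clf.train("meeting notes for tuesday", "junk", False)
    clf.train("invoice attached payment due", "invoice", True)
    clf.train("meeting notes for tuesday", "invoice", False)
    print(clf.tags("buy cheap pills now"))            # ['junk']
    print(clf.tags("invoice attached, payment due"))  # ['invoice']
```

Keeping each trait as its own pro/anti pair means adding a new tag never disturbs the existing junk training, which fits the "soft tags" idea better than a single N-way classifier would.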
I'm removing myself as the assignee for this bug, because the work is progressing, but is directed at an extension rather than the core. I will interpret this bug as requesting functionality in the core, and I am not committed to doing that. But let me give an update to interested readers. I've done a number of patches that allow access to the bayesian filter for features other than junk, though I use the term "traits" rather than "features". There is one critical patch that remains, for bug 471071, which is needed to get access to db listeners from js. Once that is implemented (and I hope to see it for TB 3.0 beta 2), it will be possible for extensions to access the bayesian filter results for traits and implement functionality that uses them. I will be doing this for "soft tags" in an extension, TaQuilla, which will be tracked at http://mesquilla.com, but other extensions could use the functionality as well. For TB 3 beta 2, trait classification still follows the same rules as junk filtering, which means that by default it does not run on certain special folders (Sent and Drafts, for example), nor on things like newsgroups or RSS feeds. But that is a limitation of the normal message processing, not of the bayesian filter. I hope to add additional control over this in the TB 3 time frame, but in any case you can run the bayes filter manually on RSS and news in an extension if you want to.
TaQuilla is now released to experimental status at https://addons.mozilla.org/en-US/thunderbird/addon/10905 This allows the bayes filters to automatically tag emails, which matches the "other features than spam" request of the summary. With the backend work now largely complete, I am tempted to close this bug and ask that new bugs be filed for any specific features that use the bayesian filters. Right now, it is difficult to control how the bayes filters are applied, but I will file specific follow-up bugs (such as bug 471833) to deal with specific issues. The original request by the reporter: "What if every message I manually moved into a folder was added to the Baysian corpus for that folder, and messages could be automatically filtered into folders based on Bayesian matches?" is a fairly specific request, and certainly doable. It's really the folder-paradigm equivalent of the soft tagging in TaQuilla; I just prefer the tag/virtual folder paradigm myself. Still, this bug as currently defined is pretty general. If someone would like to keep it open, then I would appreciate a comment on what, precisely, is being requested in this bug beyond the current backend work. So plan A is to close this bug. Plan B is to leave it open, but change the summary to specifically mention automatic moves. Comments or further suggestions?
Re comment 58: Well, that original request was 9 years ago, so other things have happened since. I'm not sure I even want folder-based sorting any more; I'd rather have attribute-based views into the mail collection: date/family/websites/subscriptions/invoices/chatter etc. IF messages could be given tags for selecting in views, THEN the bayesian filters would be a grand way to slap tags on incoming messages.