Closed Bug 219949 Opened 19 years ago Closed 19 years ago
.dat grows to "insane" size
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030711
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030711

When junk filtering is used more heavily, training.dat grows without bound. Though this is by design for a Bayesian filter, there are bad side effects. Have a look at the size of my current training.dat:

-rw-r--r-- 1 ast users 45246637 Sep 21 18:56 training.dat

This seems to cause lots of memory to be 'eaten':

17366 ast 9 0 289M 180M 159M S 0.0 64.5 18:34 0 mozilla-bin

Note that the laptop Mozilla is running on has "just" 288MB of physical memory, so using MailNews causes heavy swapping. Furthermore, it takes a long time for mail to get processed: mail filtering can easily take 5 to 15 minutes.

OTOH, I did some tests with Bayesian filters for another purpose, and they clearly show that filtering quality is reduced by too many keywords in the filter database (keywords with a low usage count). Thus there needs to be some kind of "purge" option to remove irrelevant keywords from training.dat, i.e. keywords that have a (very) low appearance count and thus don't really affect the spamicity calculation.

Reproducible: Always

Steps to Reproduce:
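To illustrate the kind of "purge" option requested above, here is a minimal sketch of dropping low-count keywords from a token table. It assumes a hypothetical in-memory layout mapping each token to a (ham count, spam count) pair; this is not the actual on-disk format of training.dat, and the function name `purge_rare_tokens` and the `min_total` threshold are made up for the example.

```python
from typing import Dict, Tuple

# Hypothetical in-memory layout: token -> (ham_count, spam_count).
# This is an assumption for illustration, NOT the real training.dat format.
TokenCounts = Dict[str, Tuple[int, int]]


def purge_rare_tokens(tokens: TokenCounts, min_total: int = 2) -> TokenCounts:
    """Return a new table keeping only tokens seen at least min_total times.

    The total is the sum of the ham and spam counts, so a token that is
    rare in both classes is dropped regardless of which class it came from.
    The input table is left unmodified.
    """
    return {
        tok: (ham, spam)
        for tok, (ham, spam) in tokens.items()
        if ham + spam >= min_total
    }


if __name__ == "__main__":
    db = {
        "viagra": (0, 40),
        "meeting": (25, 1),
        "xq7z": (0, 1),   # seen once, in spam only
        "zzyx": (1, 0),   # seen once, in ham only
    }
    pruned = purge_rare_tokens(db, min_total=2)
    print(sorted(pruned))
```

A total-count threshold is only one possible criterion; a real purge might also weigh how recently a token was seen, which this sketch does not model.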
A workaround seems to be manually deleting your spam mail, but that's bug 198830 :-)
You seem to forget that a Bayesian filter database consists not only of SPAM entries but also of HAM (aka good mail) entries. You aren't suggesting that I will have to delete my mail archives to purge the database?!?
Currently this is still by design, see http://www.entrian.com/sbwiki/TrainingIdeas and the interpretation thereof in bug 181534 comment 44. However, this may be dealt with in the future, see bug 228675. Duping... *** This bug has been marked as a duplicate of 228675 ***
Status: UNCONFIRMED → RESOLVED
Closed: 19 years ago
Resolution: --- → DUPLICATE