Closed Bug 219949 Opened 21 years ago Closed 21 years ago

training.dat grows to "insane" size

Categories

(MailNews Core :: Filters, enhancement)

x86
Linux
enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 228675

People

(Reporter: ast, Assigned: sspitzer)

Details

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030711
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030711

When junk filtering is used more heavily, training.dat grows without bound.
Though this is by design for a Bayesian filter, there are bad side effects.
Have a look at the size of my current training.dat:

-rw-r--r--    1 ast      users    45246637 Sep 21 18:56 training.dat

This seems to cause lots of memory to be 'eaten':

17366 ast        9   0  289M 180M  159M S     0.0 64.5  18:34   0 mozilla-bin

Note that the laptop Mozilla is running on has "just" 288MB physical
memory so using MailNews causes heavy swapping. Furthermore it takes
lots of time for mail to get processed. It can easily happen that mail
filtering takes 5 to 15 minutes.

OTOH I did some tests with Bayesian filters for another purpose that
clearly show that filtering quality is reduced by too many keywords
in the filter database (keywords with a low usage count).

Thus there needs to be some kind of "purge" option to remove irrelevant
keywords from training.dat, i.e. keywords that have a (very) low
appearance count and thus don't really affect the spamicity calculation.
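
To make the requested "purge" concrete, here is a minimal sketch in Python. It assumes a hypothetical in-memory token table mapping each keyword to a (spam count, ham count) pair; this is not the actual on-disk format of training.dat, and the function name `purge_rare_tokens` and the threshold `min_total` are made up for illustration only.

```python
from typing import Dict, Tuple

# Hypothetical in-memory view of the token table: token -> (spam_count, ham_count).
# This is NOT the real layout of training.dat; it only illustrates the idea.
TokenCounts = Dict[str, Tuple[int, int]]


def purge_rare_tokens(tokens: TokenCounts, min_total: int = 3) -> TokenCounts:
    """Return a copy of `tokens` without entries seen fewer than `min_total` times."""
    return {
        token: (spam, ham)
        for token, (spam, ham) in tokens.items()
        if spam + ham >= min_total  # keep only tokens with enough evidence
    }


sample = {
    "viagra": (120, 0),
    "meeting": (2, 85),
    "xq7z9k": (1, 0),  # random string seen once in a single spam message
    "zebra": (0, 2),
}
purged = purge_rare_tokens(sample, min_total=3)
print(sorted(purged))  # ['meeting', 'viagra']
```

The threshold is a judgment call: a higher `min_total` shrinks the table more aggressively but may drop tokens that would have become useful after further training.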


Reproducible: Always

Steps to Reproduce:

A workaround seems to be manually deleting your spam mail, but that's bug 198830 :-)

You seem to forget that a Bayesian filter database consists not only
of SPAM entries but of HAM (aka good mail) entries, too. You don't suggest
that I will have to delete my mail archives to purge the database?!?

Currently this is still by design, see
http://www.entrian.com/sbwiki/TrainingIdeas
and the interpretation thereof in bug 181534 comment 44.

However, this may be dealt with in the future, see bug 228675. Duping...

*** This bug has been marked as a duplicate of 228675 ***
Status: UNCONFIRMED → RESOLVED
Closed: 21 years ago
Resolution: --- → DUPLICATE
Product: MailNews → Core
Product: Core → MailNews Core