Closed
Bug 309620
Opened 17 years ago
Closed 15 years ago
Discard old junk filter data from training.dat
Categories
(SeaMonkey :: MailNews: Account Configuration, enhancement)
Tracking
(Not tracked)
RESOLVED DUPLICATE of bug 228675
People
(Reporter: allltaken, Unassigned)
Details
User-Agent: Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.9a1) Gecko/20050919 SeaMonkey/1.1a
Build Identifier: Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.9a1) Gecko/20050919 SeaMonkey/1.1a

I just noticed that my training.dat file is almost 6 MB, and I suspect that its earlier entries only waste space and processing time. How feasible would it be to purge old entries once a week or so? For example: "If the date is after ???, purge training.dat of entries from before ??? minus one year, and advance the next purge date by a week." This would help clear old, useless entries out of training.dat.

Reproducible: Always

Steps to Reproduce:
1. Set up junk mail controls.
2. Watch the training.dat file grow for a year or two.
3. Watch the time and memory needed to process the file grow.

Actual Results:
After some time, the file becomes large and contains a lower percentage of useful junk filter information.

Expected Results:
The software should purge old junk data, perhaps data older than 6 months or a year, using criteria the user specifies (file size? date? something else?).
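The weekly purge rule suggested above could be sketched roughly as follows. This is only an illustration under a loud assumption: each entry is modeled as carrying the date it was last trained, a field training.dat does not actually have. The names RETENTION, PURGE_INTERVAL and purge_if_due are hypothetical, not SeaMonkey code.

```python
from datetime import date, timedelta

# Hypothetical policy values taken from the suggestion above:
# keep about one year of data and run the purge once a week.
RETENTION = timedelta(days=365)
PURGE_INTERVAL = timedelta(days=7)


def purge_if_due(entries, today, next_purge):
    """Apply the proposed purge rule to an in-memory model of the entries.

    entries    -- dict mapping token -> date it was last trained
                  (an ASSUMED field; training.dat stores no dates)
    today      -- the current date
    next_purge -- the scheduled date of the next purge

    Returns (entries, next_purge). Nothing changes until the scheduled
    date arrives; then entries older than (next_purge - RETENTION) are
    dropped and the schedule moves forward by one week.
    """
    if today < next_purge:
        return entries, next_purge
    cutoff = next_purge - RETENTION
    kept = {tok: seen for tok, seen in entries.items() if seen >= cutoff}
    return kept, next_purge + PURGE_INTERVAL
```

Following the rule literally, each call advances the schedule by only one week, so if the program has not run for several weeks, purges simply happen on consecutive calls until the schedule catches up with the current date.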
Comment 1•17 years ago
FWIW, my training.dat is 856 KB. As far as I can recall, it's the same file I've been training since junk filtering was first introduced (Moz 1.4?). When I switched to TB in August '04, I copied training.dat over to the new profile.
Comment 2•17 years ago
We don't store any dates in training.dat, so we can't age its entries. You may want to have a look at <http://bayesjunktool.mozdev.org>, though, and use it to remove all entries in your training.dat with fewer than ~20 occurrences.
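The count-based pruning suggested above does not need dates at all. A minimal sketch, assuming a hypothetical in-memory model where each token maps to a (good_count, junk_count) pair; this is not the real training.dat layout and not how bayesjunktool itself is implemented:

```python
from typing import Dict, Tuple

# Threshold from the suggestion above ("fewer than ~20 occurrences").
MIN_OCCURRENCES = 20


def prune_rare_tokens(
    tokens: Dict[str, Tuple[int, int]], min_occurrences: int = MIN_OCCURRENCES
) -> Dict[str, Tuple[int, int]]:
    """Return a copy of tokens without entries seen fewer than
    min_occurrences times in total (good + junk)."""
    return {
        tok: (good, junk)
        for tok, (good, junk) in tokens.items()
        if good + junk >= min_occurrences
    }
```

Rarely seen tokens contribute little evidence to a Bayesian classifier, which is why dropping them shrinks the file without much effect on filtering; message totals are left untouched in this sketch.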
Status: UNCONFIRMED → RESOLVED
Closed: 17 years ago
Resolution: --- → INVALID
Marked invalid? This is an enhancement request. Adding a last-modified date to each entry, alongside its counts, would make it possible to select and discard outdated entries in training.dat. I haven't looked at the junk log, but if its entries are dated and old enough, they might show which entries were last modified on or before a specified cutoff date.
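The proposal above, storing a last-modified date next to each entry's counts, might look roughly like this. TokenRecord, train and discard_outdated are hypothetical names, and the last_modified field is the proposed addition, not something training.dat currently contains:

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable


@dataclass
class TokenRecord:
    """One token's counts plus the PROPOSED last-modified date."""
    good: int
    junk: int
    last_modified: date


def train(table: Dict[str, TokenRecord], tokens: Iterable[str],
          is_junk: bool, today: date) -> None:
    """Increment counts for each token and stamp it with today's date."""
    for tok in tokens:
        rec = table.get(tok)
        if rec is None:
            rec = table[tok] = TokenRecord(0, 0, today)
        if is_junk:
            rec.junk += 1
        else:
            rec.good += 1
        rec.last_modified = today


def discard_outdated(table: Dict[str, TokenRecord], cutoff: date) -> int:
    """Remove records last modified before cutoff; return how many were removed."""
    stale = [tok for tok, rec in table.items() if rec.last_modified < cutoff]
    for tok in stale:
        del table[tok]
    return len(stale)
```

Because every training pass refreshes the date, tokens that keep showing up in new mail survive, and only tokens nobody has seen since the cutoff are discarded.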
Updated•17 years ago
Status: RESOLVED → UNCONFIRMED
Resolution: INVALID → ---
Comment 4•17 years ago
> This is an enhancement request.
True. Sorry.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Here's a suggestion for an approach that would have about the same effect as aging, provided new records are consistently either appended or prepended to training.dat: limit the size of training.dat. A preference, and perhaps a dialog, to specify its maximum size, plus some code to remove records from the oldest part of the file, would do the job.
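The size-limit idea above can be sketched with an insertion-ordered table that is trimmed from its oldest end. Several things here are assumptions: the per-record size (RECORD_OVERHEAD, a guessed 12 bytes plus the token text), the class name BoundedTokenStore, and the choice to move a retrained token to the newest end, which turns "oldest" into "least recently trained":

```python
from collections import OrderedDict
from typing import Tuple

# HYPOTHETICAL per-record overhead in bytes (e.g. two counts plus a length
# prefix); the real on-disk layout of training.dat is not modeled here.
RECORD_OVERHEAD = 12


def record_size(token: str) -> int:
    """Estimated serialized size of one token record."""
    return RECORD_OVERHEAD + len(token.encode("utf-8"))


class BoundedTokenStore:
    """Token counts kept in training order and trimmed to fit max_bytes."""

    def __init__(self, max_bytes: int):
        self.max_bytes = max_bytes
        # token -> (good_count, junk_count); oldest entries come first
        self.records: "OrderedDict[str, Tuple[int, int]]" = OrderedDict()
        self.size = 0

    def train(self, token: str, is_junk: bool) -> None:
        if token in self.records:
            # Design choice: retraining counts as "new", so move to the end.
            self.records.move_to_end(token)
        else:
            self.records[token] = (0, 0)
            self.size += record_size(token)
        good, junk = self.records[token]
        self.records[token] = (good, junk + 1) if is_junk else (good + 1, junk)
        self._trim()

    def _trim(self) -> None:
        """Drop records from the oldest end until the size limit is met."""
        while self.size > self.max_bytes and self.records:
            oldest, _ = self.records.popitem(last=False)
            self.size -= record_size(oldest)
```

With a plain append-only file, as described above, the oldest part would simply be the earliest-written records; the move-to-end step is an optional refinement that keeps frequently seen tokens from being trimmed.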
This has proved to be a problem with at least one other junk filter, SpamAssassin: my email started bouncing because its junk file used up my disk quota at my ISP. The lack of a tool for removing old, probably invalid blacklist data is a failure of logic and foresight in junk filtering. Removing whitelist data is more problematic, and I wouldn't suggest removing any, even if it hasn't been used for a long time.
In Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9a1) Gecko/20060921 SeaMonkey/1.5a, I noticed a new button in preferences for resetting training.dat. Does this button reset the whole file, or just the blacklist data? It seems to me that the whitelist data should probably be left as is.
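A reset that clears only the junk ("blacklist") side while keeping the good ("whitelist") side, as suggested above, could be sketched like this. The mapping of blacklist/whitelist onto separate junk and good token tables, and the TrainingData and reset_junk_only names, are assumptions made for illustration; this does not describe what the actual reset button does:

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TrainingData:
    """ASSUMED model: message totals plus separate good and junk token tables."""
    good_messages: int = 0
    junk_messages: int = 0
    good_tokens: Dict[str, int] = field(default_factory=dict)
    junk_tokens: Dict[str, int] = field(default_factory=dict)


def reset_junk_only(data: TrainingData) -> TrainingData:
    """Return new training data with the junk side cleared and the good side kept."""
    return TrainingData(
        good_messages=data.good_messages,
        junk_messages=0,
        good_tokens=dict(data.good_tokens),
        junk_tokens={},
    )
```

Returning a new object rather than mutating the old one makes it easy to keep a backup of the previous state before the reset takes effect.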
Status: NEW → RESOLVED
Closed: 17 years ago → 15 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 228675