Open Bug 298041 Opened 20 years ago Updated 2 years ago

Stand-alone Bayesian filter binary using training.dat

Categories

(MailNews Core :: Filters, enhancement)

Tracking

(Not tracked)

UNCONFIRMED

People

(Reporter: mozilla3eran, Unassigned)

Details

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050513 Fedora/1.0.4-1.3.1 Firefox/1.0.4
Build Identifier: 

In some cases it would be very useful to have a small stand-alone binary capable
of classifying messages as junk according to a given Bayesian database, using
the same algorithm as Thunderbird and Mail&News. That is, something like:
  cat message | mozilla-bayes-check --db="$HOME/training.dat" && echo HAM || echo SPAM
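
To make the request concrete, here is a minimal Python sketch of what such a checker might do internally. It is not Thunderbird's actual algorithm: it uses plain naive Bayes with add-one smoothing, and the token table, message counts, and function names are hypothetical stand-ins for data that a real tool would read from training.dat.

```python
import math
import re

# Hypothetical training data: token -> (ham_count, spam_count).
# A real tool would read these counts from training.dat instead.
N_HAM, N_SPAM = 10, 10
TOKENS = {"viagra": (0, 8), "meeting": (7, 0), "free": (2, 6)}


def tokenize(text):
    """Lower-case the text and return its distinct word-like tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def spam_probability(text, tokens=TOKENS, n_ham=N_HAM, n_spam=N_SPAM):
    """Naive Bayes estimate of P(spam | text), accumulated in log space."""
    log_odds = math.log(n_spam / n_ham)  # prior odds from message counts
    for tok in tokenize(text):
        if tok in tokens:
            ham, spam = tokens[tok]
            # Add-one smoothing keeps zero counts from producing log(0).
            p_spam = (spam + 1) / (n_spam + 2)
            p_ham = (ham + 1) / (n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
    log_odds = max(-700.0, min(700.0, log_odds))  # avoid overflow in exp()
    return 1.0 / (1.0 + math.exp(-log_odds))


def exit_status(text, threshold=0.5):
    """Return 0 for ham and 1 for spam, matching the shell usage above."""
    return 1 if spam_probability(text) > threshold else 0
```

A command-line wrapper would only need to read the message from standard input and exit with `exit_status(...)`, so that the `&& echo HAM || echo SPAM` idiom works as written.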

The problem being addressed is that Mozilla's Bayesian filter runs only on the
client, so messages must be downloaded before they can be classified. In some
cases, this is prohibitive (e.g., on my last trip abroad I had to painfully
download 3000 messages via a dial-up connection just so Mozilla could find the
500 non-spam messages among them).

A stand-alone binary would let people run a server-side filter which sets aside
junk into (say) a separate IMAP folder, so users can skip downloading junk
mail. Compared to current server-side solutions, it has the benefit of using
Mozilla's excellent local feedback UI.

Extra points for a mechanism for easily updating the server-side database, but
in many cases just occasionally copying a well-trained local database to the
server would be good enough.

This is related to (but different from) bug 181471, which deals with sharing the
Bayesian database between multiple Mozillas.

An alternative solution would be to export Mozilla's training.dat into a format
that is used by one of the existing server-side solutions and hope that its
algorithm is sufficiently similar to Mozilla's.

Reproducible: Always

Steps to Reproduce:
Another use would be to reject spam already at the incoming SMTP stage (which,
besides saving bandwidth, might cause the spammer to remove the victim's address
from his list).

This would require a higher confidence threshold, so it would be useful if the
filter output a score (which the Bayesian algorithm computes internally) instead
of just a boolean decision.
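
As an illustration of why a score is more useful than a boolean, the sketch below maps one score to three outcomes. The function name, the outcome labels, and both threshold values are hypothetical; the score is assumed to be a spam probability between 0 and 1.

```python
def route(score, junk_threshold=0.90, reject_threshold=0.99):
    """Map a spam score in [0, 1] to a delivery decision.

    Both thresholds are illustrative assumptions, not Mozilla defaults.
    Rejecting at SMTP time is irreversible, so it demands a higher
    confidence than filing a message into a junk folder.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if junk_threshold > reject_threshold:
        raise ValueError("junk_threshold must not exceed reject_threshold")
    if score >= reject_threshold:
        return "reject"       # refuse the message during the SMTP dialogue
    if score >= junk_threshold:
        return "junk-folder"  # accept, but set aside on the server
    return "deliver"          # accept into the inbox
```

With a boolean-only filter, the SMTP stage and the folder stage would be forced to share a single cut-off.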

This is an automated message, with ID "auto-resolve01".

This bug has had no comments for a long time. Statistically, we have found that
bug reports that have not been confirmed by a second user after three months are
highly unlikely to be the source of a fix to the code.

While your input is very important to us, our resources are limited and so we
are asking for your help in focussing our efforts. If you can still reproduce
this problem in the latest version of the product (see below for how to obtain a
copy) or, for feature requests, if it's not present in the latest version and
you still believe we should implement it, please visit the URL of this bug
(given at the top of this mail) and add a comment to that effect, giving more
reproduction information if you have it.

If it is not a problem any longer, you need take no action. If this bug is not
changed in any way in the next two weeks, it will be automatically resolved.
Thank you for your help in this matter.

The latest beta releases can be obtained from:
Firefox:     http://www.mozilla.org/projects/firefox/
Thunderbird: http://www.mozilla.org/products/thunderbird/releases/1.5beta1.html
Seamonkey:   http://www.mozilla.org/projects/seamonkey/

This feature request is still pertinent.

QA Contact: filters
Product: Core → MailNews Core
Severity: normal → S3