Open Bug 181471 Opened 22 years ago Updated 2 years ago

Enable sharing of junk Bayesian database

Categories

(MailNews Core :: Filters, enhancement)

x86
Windows 2000
enhancement

Tracking

(Not tracked)

People

(Reporter: john.peacock, Unassigned)

References

Details

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.3a) Gecko/20021122
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.3a) Gecko/20021122

In a corporate environment, the same spam messages often come in to many
users.  It would be very helpful if the already trained junk filter databases
could be combined and shared back out.  Alternatively, a client/server model
could be created, in which all clients train a server-based database that
can then be accessed by all.

Reproducible: Didn't try

Steps to Reproduce:
QA Contact: olgam → laurel
It would also be beneficial if there was a way to have IMAP users (such as
myself) have the junk filter settings follow them.

Project Cyrus has an ACAP system ( http://asg.web.cmu.edu/acap/ ) which might be
applicable to this situation.
Sharing would be rather difficult, since what is spam for me isn't always spam
for someone else.

see also bug 153522
As I said, in a _corporate_ environment, the needs are very different from an
ISP.  I am also more interested in tagging than blocking, so I would foresee a
combination model:

1) shared spam database is rated at a lower priority
2) user spam database is rated higher
3) user whitelist outweighs shared spam database

This way, spam that many users have already reported gets enough hits from the
shared database to be marked.  If the user's own rating concurs, the spam can
be deleted unread, provided the user has chosen that option.  A user can
manually whitelist any address to prevent tagging or deletion.
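
A minimal sketch of how the three rules above might combine, in Python. The
weights, thresholds and the classify() name are all hypothetical and chosen
only for illustration; nothing here reflects how the Mozilla junk filter is
actually implemented. Scores are assumed to be junk probabilities between 0
and 1.

```python
# Hypothetical weights: the user's own database counts more than the shared one.
USER_WEIGHT = 0.7
SHARED_WEIGHT = 0.3

# Hypothetical thresholds for tagging and for deleting unread.
TAG_THRESHOLD = 0.3
DELETE_THRESHOLD = 0.9


def classify(sender, user_score, shared_score, whitelist, delete_enabled=False):
    """Return "keep", "tag" or "delete" for one message.

    user_score and shared_score are junk probabilities in [0, 1] from the
    user's own database and from the shared database.
    whitelist is a set of lower-case sender addresses.
    """
    # Rule 3: the user's whitelist outweighs everything else.
    if sender.lower() in whitelist:
        return "keep"

    # Rules 1 and 2: weighted combination, shared data at lower priority.
    combined = USER_WEIGHT * user_score + SHARED_WEIGHT * shared_score

    # Delete only if the user opted in and both databases broadly agree.
    if delete_enabled and combined >= DELETE_THRESHOLD:
        return "delete"
    if combined >= TAG_THRESHOLD:
        return "tag"
    return "keep"
```

With these numbers, a strong hit from the shared database alone is enough to
tag a message, but deletion needs the user's own database to agree as well.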
*** Bug 181569 has been marked as a duplicate of this bug. ***
dylang: What you're suggesting is bug #78858. It's unrelated to this bug.
Please note bug #182131.
*** Bug 187363 has been marked as a duplicate of this bug. ***
to dmose
Assignee: sspitzer → dmose
Status: UNCONFIRMED → NEW
Ever confirmed: true
If the basic population of training mails is big enough and the contributing
users give lots of different "junk" and "not-junk" criteria, then no false
positive junk marking should occur. Of course, the global spam-filter is not as
sharp as the individual one, trained with familiar junkmail. 
But a wave of newly composed MMF or EYPN messages may be blocked faster, after
only a couple of user reports to the global filter.

I'd appreciate such a feature highly.

#9: What you are suggesting is very similar to the software called "SpamNet" by
Cloudmark, which only works for Outlook (and I think Outlook Express support is
coming soon). When I used to use Outlook I used this add-on extensively and it
worked quite well. It's essentially collaborative centralized spam filtering.
The only downside I can see to such a project for Mozilla Mail is that it may
require a beefy server or a list of mirrors for such a database of caught spam,
and that the traffic for updating it may be significant. Otherwise, I think it
is a very good goal because it essentially creates a peer-reviewed blacklist of
e-mails.

Anyway, it's worth contemplating. This gets my vote.
This would also help greatly for people who read mail on multiple machines. 
None of my machines is very good at catching spam now, but I bet they'd be a lot
better if they could share data.  I've tried copying training.dat from one
machine to another but it doesn't seem to work.
So in the meantime, what about some tool for merging several training.dat files?
Users could help each other by exchanging this info. Now, everybody must train
on his own...
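
A rough sketch of what such a merge could look like, assuming the training
data boils down to per-token counts plus message totals for junk and good
mail. The TrainingData class and merge() function are hypothetical, and
reading or writing the real training.dat file format is deliberately not
shown here.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class TrainingData:
    """Hypothetical in-memory view of one user's training data."""
    good_messages: int = 0
    junk_messages: int = 0
    good_tokens: Counter = field(default_factory=Counter)
    junk_tokens: Counter = field(default_factory=Counter)


def merge(datasets):
    """Combine several training sets by adding up all of their counts."""
    merged = TrainingData()
    for data in datasets:
        merged.good_messages += data.good_messages
        merged.junk_messages += data.junk_messages
        # Counter.update() adds counts instead of overwriting them.
        merged.good_tokens.update(data.good_tokens)
        merged.junk_tokens.update(data.junk_tokens)
    return merged
```

Note that a plain sum like this treats every contributor's idea of junk as
equally valid, which only makes sense when the users really do agree on what
spam is.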
Rather difficult to do - different people have different ideas of what spam is.
For me, every Chinese mail is spam, but not for my colleague in Shanghai. When he
tried to use my mozilla-mailer (different account, but with my training data),
he had to search for his messages in my junk-folder.
*** Bug 207579 has been marked as a duplicate of this bug. ***
Many seem to consider this request "difficult to do", not technically, but from
the standpoint that one man's spam is another's gold.

I think that issue is irrelevant. The feature would be implemented as an option,
for those of us who have concrete use cases: I guarantee that what is spam to me
is definitely spam to my 9yr old daughter, and I would like an easy way for her
inbox to take advantage of my spam training.
(In reply to comment #6)
> Please note bug #182131.

I'd vote for this.

A corporate environment can use a shared, writable IMAP folder to share the
Bayesian data.

Making this depend on bug 182131.
Depends on: 182131
In what way does general sharing (which does not require IMAP) depend on an
IMAP-specific bug?

I think you may have that dependency backward.
*** Bug 239772 has been marked as a duplicate of this bug. ***
Product: Browser → Seamonkey
*** Bug 291136 has been marked as a duplicate of this bug. ***
Assigning bugs that I'm not actively working on back to nobody; use
SearchForThis as a search term if you want to delete all related bugmail at
once.
Assignee: dmose → nobody
Assignee: nobody → mail
QA Contact: laurel
Depends on: 506397
Assignee: mail → nobody
Component: MailNews: Message Display → Filters
Product: SeaMonkey → MailNews Core
QA Contact: filters
For followers of this bug, I'd like to point out the work that I am doing in bug 506397, as it is the backend work needed for this bug. Bug 506397 allows message corpus training data from an external source to be added to the Bayes training database for a local user. If wanted (and usually you would want this), the identity of the external data is kept separate from the training data that the users themselves have prepared. That way, you could update or add to the external data without messing with the local user's data, though both would be used in any classifications done by the filter.

The intention of bug 506397 is to provide backend support, but the final implementation of the concept would be done in an extension. I will probably provide such an extension myself in the future.
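
To illustrate the idea of keeping external data separate from the user's own, here is a small Python sketch. It is not the interface from bug 506397; the TrainingStore class, its method names and the source ids are all made up for illustration. Each source keeps its own token counts, so an external source can be replaced or removed without touching the local data, while lookups add up the counts from every source.

```python
from collections import Counter


class TrainingStore:
    """Hypothetical store that keeps token counts separately per source."""

    def __init__(self):
        # source id -> {"good": Counter, "junk": Counter}
        self._sources = {}

    def set_source(self, source_id, good, junk):
        """Add a source, or replace its data wholesale if it already exists."""
        self._sources[source_id] = {"good": Counter(good), "junk": Counter(junk)}

    def remove_source(self, source_id):
        """Drop one source; all other sources are left untouched."""
        self._sources.pop(source_id, None)

    def counts(self, token):
        """Return (good, junk) counts for a token, summed over all sources."""
        good = sum(s["good"][token] for s in self._sources.values())
        junk = sum(s["junk"][token] for s in self._sources.values())
        return good, junk
```

For example, a store could hold a "local" source trained by the user and a "corporate" source supplied from outside; calling set_source("corporate", ...) again would swap in updated shared data while the "local" counts stay exactly as they were.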
Severity: normal → S3