181471 - Enable sharing of junk Bayesian database

Reporter

Description

•

22 years ago

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.3a) Gecko/20021122
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.3a) Gecko/20021122

In a corporate environment, the same spam messages often arrive for many
users.  It would be very helpful if the already-trained junk filter databases
could be combined and shared back out.  Alternatively, a client/server model
could be created in which all clients train a server-based database that
can then be accessed by all.

Reproducible: Didn't try

Steps to Reproduce:

laurel

Updated

•

22 years ago

QA Contact: olgam → laurel

dylang

Comment 1

•

22 years ago

It would also be beneficial if there were a way for the junk filter settings
of IMAP users (such as myself) to follow them.

Project Cyrus has an ACAP system ( http://asg.web.cmu.edu/acap/ ) which might be
applicable to this situation.

Jo Hermans

Comment 2

•

22 years ago

Sharing would be rather difficult, since what is spam for me isn't always spam
for someone else.

See also bug 153522.

John Peacock

Reporter

Comment 3

•

22 years ago

As I said, in a _corporate_ environment, the needs are very different from
those of an ISP.  I am also more interested in tagging than blocking, so I
would foresee a combination model:

1) shared spam database is rated at a lower priority
2) user spam database is rated higher
3) user whitelist outweighs shared spam database

This way, spam that many users have already reported gets enough hits to be
marked.  If the user's own rating concurs, the spam can be deleted unread, if
the user has chosen that.  A user can manually whitelist any address to
prevent tagging or deletion.
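
A minimal sketch of the combination model above, assuming each database yields a junk probability between 0.0 and 1.0 for a message. The function name `decide`, the weights, and the threshold are hypothetical and chosen only for illustration; nothing here reflects how Mozilla Mail actually scores messages.

```python
# Hypothetical weights: the proposal only says the shared database should
# count for less than the user's own database. The numbers are illustrative.
SHARED_WEIGHT = 1.0
USER_WEIGHT = 2.0
JUNK_THRESHOLD = 0.5


def decide(sender, shared_score, user_score, whitelist, delete_confirmed=False):
    """Return "keep", "tag", or "delete" for one message.

    shared_score and user_score are junk probabilities (0.0 to 1.0) from the
    shared and per-user databases. whitelist is a set of lowercase addresses.
    """
    # Rule 3: the user's whitelist outweighs everything else.
    if sender.lower() in whitelist:
        return "keep"

    # Rules 1 and 2: weighted average, with the user's database counting more.
    combined = (SHARED_WEIGHT * shared_score + USER_WEIGHT * user_score) / (
        SHARED_WEIGHT + USER_WEIGHT
    )
    if combined < JUNK_THRESHOLD:
        return "keep"

    # Delete only when both databases agree and the user opted in to deletion.
    both_agree = shared_score >= JUNK_THRESHOLD and user_score >= JUNK_THRESHOLD
    if delete_confirmed and both_agree:
        return "delete"
    return "tag"
```

With these assumed weights, a message the shared database rates 0.9 and the user's database rates a neutral 0.5 is tagged, while one the user's database rates 0.1 is kept.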

Matthias Versen [:Matti]

Comment 4

•

22 years ago

*** Bug 181569 has been marked as a duplicate of this bug. ***

Garth Wallace

Comment 5

•

22 years ago

dylang: What you're suggesting is bug #78858. It's unrelated to this bug.

Jan Kuemmel

Comment 6

•

22 years ago

Please note bug #182131.

Matthias Versen [:Matti]

Comment 7

•

22 years ago

*** Bug 187363 has been marked as a duplicate of this bug. ***

Boris Zbarsky [:bzbarsky]

Comment 8

•

22 years ago

to dmose

Assignee: sspitzer → dmose

Status: UNCONFIRMED → NEW

Ever confirmed: true

Gunnar Kaestle

Comment 9

•

22 years ago

If the basic population of training mails is big enough and the contributing
users give lots of different "junk" and "not-junk" criteria, then no false
positive junk marking should occur. Of course, the global spam-filter is not as
sharp as the individual one, trained with familiar junkmail. 
But a wave of newly composed MMF or EYPN messages may be blocked faster, after
only a couple of user reports to the global filter.

I'd appreciate such a feature highly.

Matthew Kerr

Comment 10

•

22 years ago

#9: What you are suggesting is very similar to the software called "SpamNet" by
Cloudmark, which only works for Outlook (and I think Outlook Express support is
coming soon). When I used to use Outlook I used this add-on extensively and it
worked quite well. It's essentially collaborative centralized spam filtering.
The only downside I can see to such a project for Mozilla Mail is that it may
require a beefy server or a list of mirrors for such a database of caught spam,
and that the traffic for updating it may be significant. Otherwise, I think it
is a very good goal because it essentially creates a peer-reviewed blacklist of
e-mails.

Anyway, it's worth contemplating. This gets my vote.

Akkana Peck

Comment 11

•

21 years ago

This would also help greatly for people who read mail on multiple machines. 
None of my machines is very good at catching spam now, but I bet they'd be a lot
better if they could share data.  I've tried copying training.dat from one
machine to another but it doesn't seem to work.

:aceman

Comment 12

•

21 years ago

So in the meantime, what about some tool for merging several training.dat files?
Users could help each other by exchanging this info. Now, everybody must train
on his own...
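
A rough sketch of what such a merge could look like, assuming the training data boils down to message counts plus per-token counts for good and junk mail. The `TrainingData` class and `merge` function are hypothetical, and the sketch deliberately does not read or write the real training.dat file format; parsing that file is left out.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class TrainingData:
    """Hypothetical in-memory form of one user's trained filter state."""
    good_messages: int = 0
    junk_messages: int = 0
    good_tokens: Counter = field(default_factory=Counter)
    junk_tokens: Counter = field(default_factory=Counter)


def merge(*datasets):
    """Sum message and token counts from several datasets into a new one."""
    merged = TrainingData()
    for data in datasets:
        merged.good_messages += data.good_messages
        merged.junk_messages += data.junk_messages
        # Counter.update adds counts instead of overwriting them.
        merged.good_tokens.update(data.good_tokens)
        merged.junk_tokens.update(data.junk_tokens)
    return merged
```

Summing counts treats every contributor's judgement as equally valid, so a merge like this only makes sense among users who broadly agree on what counts as junk.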

Jo Hermans

Comment 13

•

21 years ago

Rather difficult to do - different people have different ideas of what spam is.
For me, every Chinese mail is spam, but not for my colleague in Shanghai. When
he tried to use my mozilla-mailer (different account, but with my training
data), he had to search for his messages in my junk-folder.

Jo Hermans

Comment 14

•

21 years ago

*** Bug 207579 has been marked as a duplicate of this bug. ***

rjf

Comment 15

•

21 years ago

Many seem to consider this request "difficult to do", not technically, but from
the standpoint that one man's spam is another's gold.

I think that issue is irrelevant. The feature would be implemented as an option,
for those of us who have concrete use cases: I guarantee that what is spam to me
is definitely spam to my 9yr old daughter, and I would like an easy way for her
inbox to take advantage of my spam training.

Mike Fedyk

Comment 16

•

20 years ago

(In reply to comment #6)
> Please note bug #182131.

I'd vote for this.

A corporate environment can use a shared, writable IMAP folder to share the
Bayesian data.

Making this depend on bug 182131.

Depends on: 182131

Akkana Peck

Comment 17

•

20 years ago

In what way does general sharing (which does not require IMAP) depend on an
IMAP-specific bug?

I think you may have that dependency backward.

Alfonso Martinez

Comment 18

•

20 years ago

*** Bug 239772 has been marked as a duplicate of this bug. ***

Myk Melez [:myk] [@mykmelez]

Updated

•

20 years ago

Product: Browser → Seamonkey

Jo Hermans

Comment 19

•

19 years ago

*** Bug 291136 has been marked as a duplicate of this bug. ***

Dan Mosedale (:dmosedale, :dmose)

Comment 20

•

17 years ago

Assigning bugs that I'm not actively working on back to nobody; use
SearchForThis as a search term if you want to delete all related bugmail at
once.

Assignee: dmose → nobody

Serge Gautherie (:sgautherie)

Updated

•

16 years ago

Assignee: nobody → mail

QA Contact: laurel

Kent James (:rkent)

Updated

•

15 years ago

Depends on: 506397

Kent James (:rkent)

Updated

•

15 years ago

Assignee: mail → nobody

Component: MailNews: Message Display → Filters

Product: SeaMonkey → MailNews Core

QA Contact: filters

Kent James (:rkent)

Comment 21

•

15 years ago

For followers of this bug, I'd like to point out the work that I am doing in bug 506397, as it is the backend work needed for this bug. Bug 506397 allows message corpus training data from an external source to be added to the Bayes training database for a local user. If wanted (and usually you would want this), the identity of the external data is kept separate from the training data that the user has prepared themselves. That way, you could update or add to the external data without messing with the local user's data, though both would be used in any classifications done by the filter.
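
To illustrate the idea of keeping external data separate while still using it for classification, here is a toy sketch. It is not the code from bug 506397: the class `SegregatedFilter`, its method names, and the origin labels are hypothetical, and the scoring is a plain naive Bayes with add-one smoothing and equal priors rather than the algorithm the mail client actually uses.

```python
import math
from collections import Counter


class SegregatedFilter:
    """Toy junk filter whose training data is stored per origin."""

    def __init__(self):
        # origin name -> {"junk": Counter, "good": Counter}
        self.corpora = {}

    def train(self, origin, tokens, is_junk):
        corpus = self.corpora.setdefault(
            origin, {"junk": Counter(), "good": Counter()}
        )
        corpus["junk" if is_junk else "good"].update(tokens)

    def drop(self, origin):
        """Remove one origin's data without touching any other origin."""
        self.corpora.pop(origin, None)

    def _totals(self, label):
        total = Counter()
        for corpus in self.corpora.values():
            total.update(corpus[label])
        return total

    def junk_probability(self, tokens):
        """Classify using the summed counts of every origin."""
        junk, good = self._totals("junk"), self._totals("good")
        junk_n, good_n = sum(junk.values()), sum(good.values())
        vocab = len(set(junk) | set(good)) or 1
        log_odds = 0.0
        for token in tokens:
            p_junk = (junk[token] + 1) / (junk_n + vocab)
            p_good = (good[token] + 1) / (good_n + vocab)
            log_odds += math.log(p_junk / p_good)
        # Clamp so math.exp cannot overflow on long messages.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))
```

Because each origin has its own counters, an updated external corpus can be swapped in by calling `drop("shared")` and retraining under the same label, leaving the `"local"` data as the user trained it.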

The intention of bug 506397 is to provide backend support, but the final implementation of the concept would be implemented in an extension. I will probably provide such an extension myself in the future.

BMO Automation

Updated

•

2 years ago

Severity: normal → S3