Bugzilla

Comment 2

•

24 years ago

The languages don't have to be installed. I think the header just has to be
searched to see what language it's in.

Updated

•

24 years ago

Depends on: 59368

Matthew Tuck [:CodeMachine]

Updated

•

24 years ago

Blocks: 66425

Katsuhiko Momoi

Comment 3

•

24 years ago

I wonder how practical this is.
In any case, if we want this kind of feature,
maybe we should allow for a general 'header' key in which
the user can manually specify a header attribute.
For example,

Key       Relationship   Value

Headers   include        charset=iso-2022-jp
Headers   include        Content-type: text/html
etc.
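A minimal sketch of how such a generic "Headers include <value>" rule could be evaluated, using Python's standard `email` module. The function name `headers_include` and the sample message are made up for illustration; this is not how Mozilla's filter backend works.

```python
from email import message_from_string


def headers_include(raw_message: str, needle: str) -> bool:
    """Return True if any "Name: value" header line contains `needle`.

    The comparison ignores case, since header names and charset labels
    are case-insensitive.
    """
    msg = message_from_string(raw_message)
    wanted = needle.lower()
    return any(wanted in f"{name}: {value}".lower() for name, value in msg.items())


# Hypothetical sample message, for illustration only.
SAMPLE = (
    "From: someone@example.com\n"
    "Subject: hello\n"
    "Content-Type: text/plain; charset=iso-2022-jp\n"
    "\n"
    "body\n"
)

print(headers_include(SAMPLE, "charset=iso-2022-jp"))      # rule 1 from the example
print(headers_include(SAMPLE, "Content-type: text/html"))  # rule 2 from the example
```

A plain substring test like this would miss a quoted parameter such as charset="iso-2022-jp", so a real rule would probably need to parse the header rather than search it.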

Comment 4

•

24 years ago

A specific language filter rule would be more discoverable by novice users.

Katsuhiko Momoi

Comment 5

•

24 years ago

In that case, there needs to be a backend way to
map a 'language' name to a corresponding set of 'charsets'.
This is because one language may use more than one charset.
Coming up with such a list is non-trivial as there are
quite a few languages. For an encoding such as
ISO-8859-1, there is no way to distinguish the languages
that it supports without additional lang info buried
in the messages.

That is why I asked above how practical this is.
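To make the mapping problem concrete, here is a rough sketch of a language-to-charset table and a reverse lookup. The table is hypothetical and deliberately incomplete; the names `LANGUAGE_CHARSETS` and `languages_for_charset` are invented for this example.

```python
# Illustrative, incomplete table: one language maps to several charsets.
LANGUAGE_CHARSETS = {
    "Japanese": {"iso-2022-jp", "shift_jis", "euc-jp"},
    "Korean": {"euc-kr", "ks_c_5601-1987", "iso-2022-kr"},
    "Chinese": {"big5", "gb2312"},
    "Russian": {"koi8-r", "windows-1251", "iso-8859-5"},
}


def languages_for_charset(charset: str) -> list[str]:
    """Return the languages in the table that list `charset`, sorted by name."""
    name = charset.strip().lower()
    return sorted(lang for lang, charsets in LANGUAGE_CHARSETS.items() if name in charsets)


print(languages_for_charset("Shift_JIS"))
print(languages_for_charset("ISO-8859-1"))
```

The second lookup comes back empty on purpose: ISO-8859-1 covers many Western European languages, so the charset alone cannot pick one of them.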

Matthew Tuck [:CodeMachine]

Comment 6

•

24 years ago

Yep.  Maybe you could do something like:

Encoding is <blah> (Language, Language, ...)

Matthew Tuck [:CodeMachine]

Comment 7

•

24 years ago

"Language Encoding" or similar might be easier to understand.

Dan Rosen

Reporter

Comment 8

•

24 years ago

Mm, "language encoding" sounds good to me. There are descriptions for each
encoding ("Central European" for example) which serve to better describe each
encoding than enumerating each language using it (since there are often
political issues attached to the names of languages -- Serbo-Croat vs. Croatian
vs. Bosnian, etc.)

scottputterman

Comment 9

•

23 years ago

reassigning to naving

Assignee: gayatrib → naving

Comment 10

•

23 years ago

This is a bad way to filter spam.  Many Japanese users send all mail (including 
mail written in English) in the Japanese charset, etc.

Alfonso Martinez

Comment 11

•

23 years ago

*** Bug 129263 has been marked as a duplicate of this bug. ***

Roland Mainz

Comment 12

•

23 years ago

AFAIK such a filter is useless - the spammers can always switch to UTF-8
encoding... what do you do in that case?

Håkan Waara

Comment 13

•

23 years ago

Also, as comment 10 points out, many non-spammers use Asian encodings even when
sending messages in English. This would probably cause a lot of trouble for such
users sending mail to a mozilla-mailnews user.   I hate spam, but I believe
there must be better ways to filter out junk -- this would be an ugly hack.
Suggesting wontfix.

Håkan Waara

Comment 14

•

23 years ago

I would encourage discussion regarding clever ways / algorithms to detect spam,
so we could build those features instead. netscape.public.mail-news, anyone?

Roland Mainz

Comment 15

•

23 years ago

Håkan Waara wrote:
> Suggesting wontfix.
It is not a way to defeat spammers - but it may be useful in other ways.

Just implement it - it won't hurt... :)

Jean-Marc Desperrier

Comment 16

•

22 years ago

This kind of filtering can already be done with the current Mozilla by users
who really want it, even if it's not very easy.

The steps could just be documented somewhere.

- Create a new filter.
- Choose the Customize header option.
- Create a new customized header named "Content-Type".
- When it contains "ks_c_5601" (Korean spam), "iso-2022-jp" (Japanese spam) or
"big5" (Chinese spam), set the rule to destroy the message.
- Add a rule to also destroy the mail when one of the above strings is found in
the subject. (I haven't tested whether this really works; I don't know if the
filter is applied before or after the subject's encoding is decoded. If it's
after, and thinking about it, it should be after, this won't work.)
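The Content-Type part of the steps above can be sketched outside the mail client as follows. This is only an illustration using Python's standard `email` module; `BLOCKED_CHARSET_FRAGMENTS` and `should_destroy` are hypothetical names, and it says nothing about how Mozilla evaluates its own filters.

```python
from email import message_from_string

# Charset fragments taken from the steps above.
BLOCKED_CHARSET_FRAGMENTS = ("ks_c_5601", "iso-2022-jp", "big5")


def should_destroy(raw_message: str) -> bool:
    """Return True if the top-level Content-Type declares a blocked charset."""
    msg = message_from_string(raw_message)
    # get_content_charset() returns the charset parameter in lower case,
    # or None when the header or the parameter is missing.
    charset = msg.get_content_charset() or ""
    return any(fragment in charset for fragment in BLOCKED_CHARSET_FRAGMENTS)


korean = 'Content-Type: text/plain; charset="ks_c_5601-1987"\n\nbody\n'
plain = "Content-Type: text/plain; charset=us-ascii\n\nbody\n"
print(should_destroy(korean), should_destroy(plain))
```

A multipart message usually carries its charset on the individual parts rather than on the top-level header, so this check (like the manual filter) would let those through.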

Comment 17

•

22 years ago

Wontfix.  Filtering by language would be an ineffective and dangerous way to
block spam, so we should not encourage Mozilla users to use language to filter spam.

Status: NEW → RESOLVED

Closed: 22 years ago

Resolution: --- → WONTFIX

Roland Mainz

Comment 18

•

22 years ago

Jesse Ruderman wrote:
> Wontfix.  Filtering by language would be an ineffective and dangerous way to
> block spam, so we should not encourage Mozilla users to use language to filter
> spam.

Please read comment #15 - and consider reopening this bug. "Implement
filter-by-language" may not be effective to filter SPAM but it may have other
(useful) purposes...

Comment 19

•

22 years ago

There are many headers that people might want to filter on, but they can't all
be listed in the filter dialog.  Language encoding would be one of the less
reliable and more confusing headers to filter on.

Comment 20

•

22 years ago

I don't know Japanese, Chinese, or any other oriental language for that matter,
and somehow I manage to get all these emails in Outlook that look like absolute
gibberish because I'm on some stupid mailing list, and it's absolutely annoying!
If Mailnews is able to show Chinese, etc. characters for an email, then doesn't
it KNOW it's in Chinese?

R.K.Aa.

Comment 21

•

22 years ago

*** Bug 154811 has been marked as a duplicate of this bug. ***

R.K.Aa.

Comment 22

•

22 years ago

*** Bug 159150 has been marked as a duplicate of this bug. ***

Andre-John Mas

Comment 23

•

22 years ago

The filter would also have to take advantage of Unicode character ranges. Maybe
one solution would be to provide the ability to add mail filter plug-ins. This
way Mozilla need not provide all these filters, but would at least offer the
possibility for someone to provide their own filter, independent of the Mozilla
distribution. The filter dialog would then be coupled with an 'advanced' filter
section where you would select the filter by name and then hit a configure
button which would bring up the settings panel of the filter. Below is a quick
rendition of what the code interface could look like:

   Filter 
     - getName() : String
     - getConfigPanel( SettingsRef ) : Panel
     - matches ( e-mail ) : boolean
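One possible reading of that interface, written as a Python abstract base class with a toy charset filter plugged in. Everything here is hypothetical: the class names, the dict standing in for a real settings panel, and the use of the standard `email` module for the message type.

```python
from abc import ABC, abstractmethod
from email import message_from_string
from email.message import Message


class MailFilter(ABC):
    """Hypothetical plug-in interface mirroring the three methods above."""

    @abstractmethod
    def get_name(self) -> str: ...

    @abstractmethod
    def get_config_panel(self, settings: dict) -> dict: ...

    @abstractmethod
    def matches(self, message: Message) -> bool: ...


class CharsetFilter(MailFilter):
    """Toy plug-in: matches mail whose declared charset is in a fixed set."""

    def __init__(self, charsets):
        self.charsets = {c.lower() for c in charsets}

    def get_name(self) -> str:
        return "Charset filter"

    def get_config_panel(self, settings: dict) -> dict:
        # A real plug-in would build UI here; a dict stands in for the panel.
        return {"title": self.get_name(), "charsets": sorted(self.charsets), **settings}

    def matches(self, message: Message) -> bool:
        return (message.get_content_charset() or "") in self.charsets


mail = message_from_string("Content-Type: text/plain; charset=Big5\n\nhi\n")
print(CharsetFilter(["big5"]).matches(mail))
```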

BTW I certainly feel that this needs to be reopened

Dan Rosen

Reporter

Comment 24

•

22 years ago

Ok, you have good points. I agree that it wouldn't have the intended effect as 
much as I'd hope. Your argument of Japanese users sending English mail in Shift-
JIS encoding is particularly compelling. But I wouldn't consider it 
categorically useless, or harmful.

I don't ever expect to get any email from Russia or China, for example. Even 
English email. I don't have any friends or family there, and I'm certainly not 
interested in any "business opportunities" there. So if I see KOI8 or Big5 
emails, I don't care what the content is, I want that mail in the trash.

As for UTF-8, it'd be pretty easy to deal with character ranges... Actually,
come to think of it, do we convert all encodings to Unicode internally? (That'd
help with the other encodings.) Regardless, it'd be plenty good to take a
random sampling of characters in the email, determine their Unicode range, and
filter based on that. I'd be very happy to say "only send me email in Basic
Latin and Latin-1 Supplement."
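The sampling idea could look roughly like this. It is a sketch under assumptions: the text is already decoded to Unicode, only a handful of Unicode blocks are listed, and the names `BLOCKS`, `block_of`, `mostly_allowed` plus the 90% threshold are invented for the example.

```python
import random

# A few Unicode blocks as (first code point, last code point, name).
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x3040, 0x309F, "Hiragana"),
    (0x30A0, 0x30FF, "Katakana"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0xAC00, 0xD7AF, "Hangul Syllables"),
]
ALLOWED = {"Basic Latin", "Latin-1 Supplement"}


def block_of(ch: str) -> str:
    """Return the name of the listed block containing `ch`, or "Other"."""
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return "Other"


def mostly_allowed(text: str, sample_size: int = 200, threshold: float = 0.9, seed: int = 0) -> bool:
    """Sample non-space characters and check that most fall in ALLOWED blocks."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return True  # nothing to judge
    sample = random.Random(seed).sample(chars, min(sample_size, len(chars)))
    allowed = sum(block_of(c) in ALLOWED for c in sample)
    return allowed / len(sample) >= threshold


print(mostly_allowed("Hello, caf\u00e9 au lait"))
print(mostly_allowed("\u041f\u0440\u0438\u0432\u0435\u0442, \u043a\u0430\u043a \u0434\u0435\u043b\u0430?"))
```

Because the check works on decoded text, it does not care whether the message arrived as UTF-8, KOI8 or Big5.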

I think behavior like that, specified by the mail recipient's expectations of 
where -- in a very broad-brush approximation sense -- they expect to be 
receiving mail from, would be plenty good.

So I apologize if this is a nuisance but I'd like to reopen this request for 
enhancement. It shouldn't clutter your radar as such, and I think it would be a 
rather useful feature for many users, even if it's not perfect for everybody.

Status: RESOLVED → REOPENED

Resolution: WONTFIX → ---

(not reading, please use seth@sspitzer.org instead)

Comment 25

•

22 years ago

I don't get much legitimate mail from China or Japan, but I do get legitimate
mail from friends who live in the US and send all of their mail in strange
character sets.

Summary: Implement filter-by-language → Filter by language

Comment 26

•

21 years ago

mass re-assign.

Assignee: naving → sspitzer

Status: REOPENED → NEW

Jo Hermans

Comment 27

•

21 years ago

*** Bug 225784 has been marked as a duplicate of this bug. ***

ivo welch

Comment 28

•

20 years ago

Has anything ever happened here?  I have become a favorite of Korean and
Japanese spammers---and since I have not spoken Korean EVER, I would love to
turn these off.

Stephen Walker

Comment 29

•

20 years ago

*** Bug 268646 has been marked as a duplicate of this bug. ***

Myk Melez [:myk] [@mykmelez]

Updated

•

20 years ago

Product: MailNews → Core

Comment 30

•

19 years ago

I disagree with Jesse's reasoning.  That people post to lists in other than their native (or default) language and forget to change the language type represents incorrect behaviour on their part...

This is not an invalidation of the basic idea and its merit of filtering based on language.

The people doing the above will eventually learn to do the right thing.  In any case, TB and the rest of the Mozilla clan allow one to easily set up multiple user profiles.  One could be a profile used exclusively for posting to mailing lists, with default options such as:

* charset 8859-1
* top-posting
* text only encoding
* etc.

Mike Cowperthwaite

Comment 31

•

18 years ago

It's been possible for a long time now to add Content-Type to the list of searchable headers, and then filter on a charset name (similar to comment 3).

Comment 4 and 5 are still valid, if it's actually worth anyone's effort to set up a mapping between, say, Japanese and several encodings on the behalf of those novice users who want an easily discoverable way to ignore such messages.

But if the encoding is UTF-8, you can't tell what language it's in.  Either you'll end up filtering out, say, French or German messages, or you'll allow some Japanese messages.  Either way, these novice users are going to be confused by the situation.  And I agree with Jesse Ruderman's basic premise: this is a dangerous feature; therefore, I'd argue it shouldn't be discoverable.

Given that, combined with the pretty high quality of the Junk Controls feature, this bug really should be WontFix'd, for good.

Comment 32

•

18 years ago

(In reply to comment #31)

> But if the encoding is UTF-8, you can't tell what language it's in.  Either
> you'll end up filtering out, say, French or German messages, or you'll allow
> some Japanese messages.  Either way, these novice users are going to be
> confused by the situation.  And I agree with Jesse Ruderman's basic premise:
> this is a dangerous feature; therefore, I'd argue it shouldn't be discoverable.

True, but there are a lot of spammers out there that try to imitate what Outlook does, and Outlook likes the Windows-125[0-8] charset encodings... God knows why.

There isn't much that can't be encoded in US-ASCII, ISO-8859-1, or UTF-8 (tried in that order).

In fact, there's an RFC out there (forget which) that says that these are the recommended encodings, and that nothing else should be used.

This applies to comment #10 as well: messages should be encoded in the smallest encoding that they will fit ("Be conservative in what you end...", to quote Jon Postel), since this has the highest probability of being supported. English would then be encoded in US-ASCII or at worst ISO-8859-1, not in any native Japanese charset.
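The "smallest encoding that fits" rule for a sending client can be sketched like this, assuming the three candidates named above are tried in order. The function name `smallest_charset` is made up, and this is an illustration rather than how any particular mail client chooses its charset.

```python
def smallest_charset(text: str, candidates=("us-ascii", "iso-8859-1", "utf-8")) -> str:
    """Return the first candidate charset that can encode all of `text`."""
    for charset in candidates:
        try:
            text.encode(charset)
        except UnicodeEncodeError:
            continue  # this charset cannot represent the text, try the next
        return charset
    raise ValueError("none of the candidate charsets can encode the text")


print(smallest_charset("Hello"))
print(smallest_charset("caf\u00e9"))
print(smallest_charset("\u3053\u3093\u306b\u3061\u306f"))
```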

As for comment #12: when that happens, we'll evolve, as they have.

Comment 33

•

18 years ago

(In reply to comment #32)

Umm... "Conservative in what you send..."  Fat fingers.

Mike Cowperthwaite

Comment 34

•

18 years ago

(In reply to comment #33)
> (In reply to comment #32)
> 
> Umm... "Conservative in what you send..."  Fat fingers.

This bug isn't about sending, it's about receiving.  And the rest of that aphorism, "be liberal in what you accept," exactly countermands your so-called "argument" for keeping this bug open.

Andre-John Mas

Comment 35

•

18 years ago

Just a question, are there any hooks available to be able to write a plugin to do this? In a worst case scenario I can imagine being able to add custom filter plugins until there is a large enough demand for such a feature.