Closed Bug 235135 Opened 21 years ago Closed 16 years ago

treat 'UNKNOWN' as if it's the default character encoding

Categories

(MailNews Core :: Backend, enhancement)

enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: mathieu, Assigned: smontagu)

Details

Attachments

(1 file)

User-Agent:       
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6) Gecko/20040113

I can only reproduce this bug using my mail.yahoo.com account. If I send a
message with subject 'é è à ç & @ ^ $ ù ? ! €', Mozilla is not able to display it
properly. But if I send the same mail from Mozilla, the result is perfect.
Using a webmail interface I am able to read the mail sent from Yahoo properly,
so it should be doable.

Reproducible: Always
Steps to Reproduce:
1. Go to your yahoo mail account
2. write a message with subject 'é è à ç & @ ^ $ ù ? ! €'
3. send it
4. receive it in your mozilla news reader

Actual Results:  
Mozilla News displays: 
Subject: ?

Expected Results:  
Mozilla News displays: 
Subject: é è à ç & @ ^ $ ù ? ! €

I am copying the subject from the eml file:

From: =?iso-8859-1?q?Malaterre=20Mathieu?= <...>
Subject: =?UNKNOWN?Q?=E9_=E8_=E0_=E7_&_=40_^_$_=F9_=3F_!?=
To: <...>


But if I send it from Mozilla news the same lines are:

From: Mathieu Malaterre <...>
Subject: =?iso-8859-1?B?6Q==?= =?iso-8859-1?B?IOg=?= =?iso-8859-1?B?IOA=?=
	=?iso-8859-1?B?IOc=?= & @ ^ $ =?iso-8859-1?B?+Q==?= ? !
To: Malaterre Mathieu <...>
I am attaching this eml file so you can have access to the whole mail sent via
Yahoo.
> Subject: =?UNKNOWN?Q?=E9_=E8_=E0_=E7_&_=40_^_$_=F9_=3F_!?=

It's not mozilla's bug but Yahoo's bug. With the above line (MIME charset :
'UNKNOWN'), there's not much Mozilla can do. 

If virtually all of your emails are in ISO-8859-1, you may set 'Apply default to
all messages' in Options | Fonts (and languages) with the default character
encoding set to ISO-8859-1.  You can also turn on/off that property per folder.

It may be nice to treat 'UNKNOWN' as synonymous with the default character
encoding even if 'charset override' is turned off.  That is, when 'charset
override' is off, honor other character encodings but map 'UNKNOWN' to the
default character encoding. 
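
A minimal sketch of that mapping, written in Python rather than in Mozilla's own code, so it only illustrates the idea. The names `DEFAULT_CHARSET` and `decode_subject` are hypothetical, and the default of ISO-8859-1 is an assumption taken from the reporter's setup. Every charset label other than 'UNKNOWN' is honored as-is.

```python
from email.header import decode_header

# Assumption: the user's default character encoding is ISO-8859-1.
DEFAULT_CHARSET = "iso-8859-1"


def decode_subject(raw, default_charset=DEFAULT_CHARSET):
    """Decode an RFC 2047 header value, mapping 'UNKNOWN' to the default."""
    parts = []
    for chunk, charset in decode_header(raw):
        if isinstance(chunk, str):
            # No encoded-words at all: decode_header returns the text as-is.
            parts.append(chunk)
            continue
        # Unlabeled chunks and chunks labeled 'UNKNOWN' use the default;
        # any other label is honored unchanged.
        if charset is None or charset.lower() == "unknown":
            charset = default_charset
        parts.append(chunk.decode(charset, errors="replace"))
    return "".join(parts)


subject = decode_subject("=?UNKNOWN?Q?=E9_=E8_=E0_=E7_&_=40_^_$_=F9_=3F_!?=")
print(subject)  # é è à ç & @ ^ $ ù ? !
```

Note that the Euro sign from the original subject is not in the encoded header at all, so no choice of default can bring it back.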

I'm changing the summary line accordingly. 
Assignee: sspitzer → jshin
Severity: normal → enhancement
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Windows XP → All
Hardware: PC → All
Summary: Wrong interpretation of non-ASCII characters in subject → treat 'UNKNOWN' as if it's the default character encoding
jshin:
Wouldn't it be better to let the charset detector code pick the matching
encoding ?
I don't get what you meant. The encoding detector can't work reliably on such a
short run of text as mail headers.
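
To illustrate why detection is unreliable here, a small Python sketch: the accented bytes from the Subject above decode without error in several single-byte encodings, each giving different text, so a detector has very little evidence to choose between them. The candidate list is an arbitrary selection for illustration.

```python
# The accented bytes from the reported Subject, separated by spaces.
raw = bytes([0xE9, 0x20, 0xE8, 0x20, 0xE0, 0x20, 0xE7, 0x20, 0xF9])

# Arbitrary selection of single-byte encodings (an assumption for this sketch).
candidates = ["iso-8859-1", "iso-8859-2", "koi8-r"]

decodings = {}
for name in candidates:
    try:
        decodings[name] = raw.decode(name)
    except UnicodeDecodeError:
        # This encoding cannot represent the byte sequence; skip it.
        pass

for name, text in decodings.items():
    print(f"{name:12} {text}")
```

All three candidates accept the bytes, and each yields a different string; only the first matches what the sender typed.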
I'm wondering if Yahoo really writes "UNKNOWN" into the 
mail header. I have Yahoo accounts, and when I send 
the same message containing Latin-1 accented characters
in the Subject header, I see raw 8-bit characters both in
the headers and body, and there is no "UNKNOWN" charset.
I use both netscape.net and another ISP to receive messages, and 
neither server adds "UNKNOWN".

It's possible that some mail server after the Yahoo one is inserting
it. If it's Mathieu's local one, he should get that fixed by
asking the admin. 
(In reply to comment #5)
> I'm wondering if Yahoo really write "UNKNOWN" into the 
> mail header. I have Yahoo accounts and when I send 
....
> It's possible that some mail server after Yahoo one is inserting
> this one. 

  Aha, that's it. 

> If it's Mathieu's local one, he should get that fixed by
> asking the admin person. 

 Actually, it's not so clear whether we can blame it. As you know well, 8-bit
octets are forbidden in RFC (2)822 message headers. So, whoever wrote the SMTP
server in question must have thought that (s)he was doing the right thing, in
that her/his program preserves the content while being compliant with the
standard. So, we may have to adjust ourselves to such a server....

 BTW, the Cyrus IMAP server (made at CMU) turns 8-bit octets into question marks. That
is, it throws away the information completely. It might be argued that we can
enforce the standard only with such 'harsh/relentless' treatment of
non-compliant messages. I'd not go that far (yet), but I'm afraid that might be
the only way to clean up all this mess.
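
To make the difference between the two server behaviors concrete, here is a rough Python sketch of the question-mark treatment described above. It is not Cyrus's actual code, and `squash_8bit` is a hypothetical name. Once the octets are replaced, no charset guess on the client side can recover them, whereas the 'UNKNOWN' encoded-word still carries the original bytes.

```python
def squash_8bit(raw: bytes) -> bytes:
    """Replace every octet with the high bit set by an ASCII question mark."""
    return bytes(b if b < 0x80 else ord("?") for b in raw)


header = "Subject: é è à ç".encode("iso-8859-1")
squashed = squash_8bit(header)
print(squashed)  # b'Subject: ? ? ? ?'
```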

(In reply to comment #6)
> (In reply to comment #5)
> > I'm wondering if Yahoo really write "UNKNOWN" into the 
> > mail header. I have Yahoo accounts and when I send 
> ....
> > It's possible that some mail server after Yahoo one is inserting
> > this one. 
> 
> 
>  Actually, it's not so clear whether we can blame it. As you know well, 8bit
> octets are forbidden in RFC (2)822 message headers. So, whoever writes the SMTP
> server in question must have thought that (s)he was doing the right thing in
> that her/his program preserves the content while being compliant to the
> standard. So, we may have to adjust ourselves to such a server....

To follow this line of reasoning, I would think we have to conclude that Yahoo
is at fault for sending out raw 8-bit headers. You seem to be of the opinion that
drastic measures are needed to get them to correct this type of practice. This
then raises the question of whether an end-user client should cover for this type
of error passed on to it via the "UNKNOWN" charset name. If we provide a workaround
for Yahoo, it would not be good for them, and further we would be allowing them to
go on without feeling any pressure. 
I admit that I was not very coherent in comment #6 :-). Anyway, Yahoo is
certainly at fault for such a gross violation of the standard. See my comments
in bug 166521. Yahoo and other web mail service providers could have
implemented what I wrote there years ago. 
> It's possible that some mail server after Yahoo one is inserting
> this one. If it's Mathieu's local one, he should get that fixed by
> asking the admin person. 

You are perfectly right. I tried sending this very same mail to another mailbox,
and as you said the 8-bit characters are there:

From: =?iso-8859-1?q?Malaterre=20Mathieu?= <...@yahoo.com>
Subject: é è à ç & @ ^ $ ù ? ! €
To: <...@nycap.rr.com>

Thus Mozilla displays it properly. I'll send a mail to my server admin: rr.com

Thanks a bunch guys you were very prompt to answer !
(In reply to comment #9)
 
> From: =?iso-8859-1?q?Malaterre=20Mathieu?= <...@yahoo.com>
> Subject: é è à ç & @ ^ $ ù ? ! €
> To: <...@nycap.rr.com>

So, Yahoo is not totally ignorant of RFC 2047. It does encode 8-bit characters
per RFC 2047 when all characters are covered by ISO-8859-1 (the default for
Mathieu). It's the Euro sign at the end of the Subject that made them emit raw 8-bit
characters. Instead of just sending out raw 8-bit characters, Yahoo should warn
senders that there are characters not covered by the default character encoding
(ISO-8859-1), or it should 'go up' to either Windows-1252, ISO-8859-15 or UTF-8
as Mozilla-mail does. 
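
The 'go up' step could look roughly like the Python sketch below. It is not how Mozilla-mail or Yahoo implement it; `pick_charset` and the candidate order are assumptions made for illustration. The sender tries charsets from narrowest to widest and labels the header with the first one that covers every character.

```python
# Assumed preference order, narrowest first; UTF-8 always succeeds.
CANDIDATES = ["us-ascii", "iso-8859-1", "iso-8859-15", "windows-1252", "utf-8"]


def pick_charset(text, candidates=CANDIDATES):
    """Return the first candidate charset that can encode all of `text`."""
    for name in candidates:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue  # some character is not covered; try the next charset
        return name
    raise ValueError("no candidate charset covers the text")


without_euro = pick_charset("é è à ç & @ ^ $ ù ? !")
with_euro = pick_charset("é è à ç & @ ^ $ ù ? ! €")
print(without_euro)  # iso-8859-1
print(with_euro)     # iso-8859-15
```

The chosen name would then be used as the charset label of the RFC 2047 encoded-word, so the receiving client never sees raw 8-bit octets or an 'UNKNOWN' label.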

 
> Thus mozilla display it properly. I'll send a mail to my server 
> admin: rr.com

You may also write to Yahoo. They can do much better than they currently do. I
should have founded a web mail service company in 1996 :-) See bug 166521
comment #21.

Product: MailNews → Core
Simon - is this invalid based on last comment?
Assignee: jshin1987 → smontagu
QA Contact: esther → backend
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → INVALID
Product: Core → MailNews Core