129443 - Incorrect encoding (charset) for mail and news/nntp URIs in browser

Reporter

Description

•

23 years ago

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Macintosh; U; PPC; en-US; rv:0.9.9+) Gecko/20020306
BuildID:    2002030608

When opening in the browser a Hebrew file with the MIME type of message/rfc822,
Mozilla incorrectly displays it as windows-1255 and shows junk. The user has to
manually change the encoding to Unicode.

Reproducible: Always
Steps to Reproduce:
1. go to http://www.typo.co.il/~sbforum/MESSAGES/86/1586.eml
2. see how the Hebrew is (not) displayed
3. change the encoding to Unicode in order to view correctly

Actual Results:  Mozilla displays the message with the wrong encoding

Expected Results:  Mozilla should pick the correct encoding

I have heard that this happens in Linux and Windows as well, but haven't tested
on them.

Rainer Bielefeld

Updated

•

23 years ago

Summary: When opening a URL with a Hebrew file with the mime type of message/rfc822, mozilla incorrectly detects it as being windows-1255 → When opening a URL with a Hebrew file with the mime type of message/rfc822, mozilla incorrectly detects it as being windows-1255

Matthew Blanchard

Comment 1

•

23 years ago

This problem is not specific to Mac. It happens in Windows and Linux as well.

Some more examples:

http://www.typo.co.il/~sbforum/MESSAGES/80/1580.eml
http://www.typo.co.il/~sbforum/MESSAGES/73/1573.eml

These messages are encoded in 'windows-1255'. 

Mozilla displays them as either 'iso-8859-1' (latin) or 'windows-1255' 
(actually I'm not sure about the latter -- I'm using Windows now and the 
messages are shown using 'iso-8859-1', not 'windows-1255') and you see junk 
instead of Hebrew.

You have to switch to UTF-8 in order to read them.

It seems that the user's default encoding doesn't matter. Nor does it matter if 
the message is 'multipart/alternative' or single.

The following messages are encoded in UTF-8, so the problem is not specific to 
Hebrew:

http://www.typo.co.il/~sbforum/MESSAGES/75/975.eml
http://www.typo.co.il/~sbforum/MESSAGES/15/715.eml

Mozilla displays them as 'iso-8859-1' (latin) and you see junk instead of 
Hebrew.

Shoshannah Forbes

Reporter

Updated

•

23 years ago

OS: Mac System 9.x → All

Hardware: Macintosh → All

Simon Montagu :smontagu

Comment 2

•

23 years ago

This should probably be in the intl component.

Assignee: mkaply → yokoyama

Component: BiDi Hebrew & Arabic → Internationalization

QA Contact: zach → ruixu

Simon Montagu :smontagu

Comment 3

•

23 years ago

There is a summary of what's going on here at
http://bugzilla.mozilla.org/show_bug.cgi?id=33049#c17

Bug 33049 was resolved as WORKSFORME, but this seems to be a real problem.

Katsuhiko Momoi

Comment 4

•

23 years ago

.eml files are files saved by Mozilla/Netscape 6 Mail. If they are saved by 
Mozilla/Netscape 6, they are saved in UTF-8. So what you're seeing
is not a bug but behavior that follows the current spec. If you want to see them in the
encoding of the system you are using, you should save them as ".txt" files.
Better yet, use HTML format when saving.

So in essence this is not a browser bug. These people are exposing
saved mail msgs without pointers and they should be told to 
include an instruction. My suggestions to eliminate the problem:

1. When you save mail msgs, use HTML format. This should get you the
   document encoding tag.
2. Turn on View | Character Coding | Auto-Detect | All.
   Auto-detectors normally check for UTF-8 sequences.

It is possible that we can build in an automatic UTF-8 check on any 
encoding menu item. I wonder if that is a good idea or a bad idea. 

During the Communicator 4.x days, we used to check for UCS-2 on any incoming
data and that turned out to cause some problems, so we restricted
the UCS-2 check to just when one of the Unicode encodings is 
chosen.
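
For illustration only, the kind of UTF-8 check mentioned above could look roughly like the following Python sketch. This is not Mozilla's auto-detector code; the function name `looks_like_utf8` and the sample string are assumptions made for the example.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if data has non-ASCII bytes that form valid UTF-8."""
    if data.isascii():
        # Pure ASCII is valid in UTF-8 and in most legacy charsets,
        # so it says nothing about which encoding was intended.
        return False
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


hebrew = "שלום"  # hypothetical sample text
print(looks_like_utf8(hebrew.encode("utf-8")))         # UTF-8 bytes
print(looks_like_utf8(hebrew.encode("windows-1255")))  # legacy Hebrew bytes
print(looks_like_utf8(b"plain ASCII"))
```

A strict decode is a cheap heuristic: legacy single-byte Hebrew text very rarely happens to form valid UTF-8 multi-byte sequences, which is why such a check can be reasonably reliable.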

Katsuhiko Momoi

Comment 5

•

23 years ago

> .eml files are files saved by Mozilla/Netscape 6 Mail. If they are saved by 
> Mozilla/Netscape 6, they are saved in UTF-8.

I correct myself. I explained this much better in 

http://bugzilla.mozilla.org/show_bug.cgi?id=33049#c17

The .eml data is saved as the original RFC 822 data. 

I should add one more workaround.

   Eliminate the .eml extension. You will be able to see it
   as a Windows-1255 file.

> Bug 33049 was resolved as WORKSFORME, but this seems to be a real problem.

Before you do anything, please check with the mail team to see
what consequences there are for changing the current behavior
as summarized in the above quoted comment for parsing .eml files.

Rui Xu

Updated

•

23 years ago

Keywords: intl

QA Contact: ruixu → ylong

Simon Montagu :smontagu

Comment 6

•

23 years ago

If I understand correctly, the problem is that we construct internally a DOM
representation of the message, with the text in UTF-8, but without setting any
charset attribute. I haven't located the code where this happens, but if my
assumptions are right, the fix ought to be trivial (famous last words)

Roy Yokoyama

Comment 7

•

23 years ago

re-assign to smontagu

Assignee: yokoyama → smontagu

Yuying Long

Comment 8

•

23 years ago

cc Xianglan and marina.

Alec Flett

Comment 9

•

23 years ago

this is totally a mail/charset issue. cc'ing nhotta.

ji

Comment 10

•

23 years ago

As Kat explained, saving the original RFC822 data in UTF-8 for .eml file
extension is by design. If we add any charset attribute to the file, it won't be
the original RFC822 data anymore. Should we resolve this as WFM then?
QA contact to myself.

Product: Browser → MailNews

QA Contact: ylong → ji

Katsuhiko Momoi

Comment 11

•

23 years ago

With regard to comment #6 by smontagu, we may be
re-using the mail code for this because of the
.eml extension. CC'ing bienvenu@netscape.com also.

> As Kat explained, saving the original RFC822 data in UTF-8 
> for .eml file extension is by design.

My comment in this bug is incorrect. I think I was more
accurate in the original bug smontagu cited above. The data
are saved as the original data. But we use UTF-8 in internal
representation.
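
The behavior described in this thread (original data in windows-1255, an internal UTF-8 representation, and no charset label on what the browser renders) can be reproduced in a small Python sketch. The sample text and variable names are assumptions for illustration, and the steps only mimic the suspected pipeline; they are not taken from the Mozilla source.

```python
original = "שלום"  # hypothetical message text

# The message as saved: raw RFC 822 body bytes in windows-1255.
on_disk = original.encode("windows-1255")

# Assumed internal step: decode with the declared charset,
# then keep the text as UTF-8 bytes.
internal = on_disk.decode("windows-1255").encode("utf-8")

# With no charset label, the browser falls back to a default
# such as iso-8859-1 and renders each UTF-8 byte as one character.
shown = internal.decode("iso-8859-1")

# Manually switching the encoding to UTF-8 recovers the text.
fixed = internal.decode("utf-8")

print(repr(shown))
print(fixed)
```

This matches the reported symptom: neither the message's own charset nor the user's default gives readable output, while forcing UTF-8 does.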

David :Bienvenu

Comment 12

•

23 years ago

Kat, you're probably right, but I'm not the right person to ask - you might try
e-mailing mscott directly for the definitive answer.

Simon Montagu :smontagu

Updated

•

23 years ago

Status: NEW → ASSIGNED

Simon Montagu :smontagu

Comment 13

•

21 years ago

*** Bug 223225 has been marked as a duplicate of this bug. ***

Simon Montagu :smontagu

Comment 14

•

21 years ago

From dupe: the same bug with news:// and nntp:// URIs
 
nntp://news.mozilla.org:119/tnhhsv6arys1.dlg@borumat.de
news:news.mozilla.org:119/tnhhsv6arys1.dlg@borumat.de

Summary: When opening a URL with a Hebrew file with the mime type of message/rfc822, mozilla incorrectly detects it as being windows-1255 → Incorrect encoding for mail and news URIs in browser

Christian :Biesinger (don't email me, ping me on IRC)

Updated

•

21 years ago

Summary: Incorrect encoding for mail and news URIs in browser → Incorrect encoding (charset) for mail and news/nntp URIs in browser

phil

Comment 15

•

21 years ago

Yes, I'm the one who submitted the duplicated bug 223225.
In that case it shows that the problem is not the *.EML file in itself.
Apparently the same UTF-8 conversion mentioned in comment #4 is also performed
on external links to news articles. Probably the conversion is performed on all
non-webpages displayed in the browser, and comment #6 and comment #11 are
therefore perfectly right.

Simon Montagu :smontagu

Comment 16

•

21 years ago

*** Bug 231524 has been marked as a duplicate of this bug. ***

Mike Cowperthwaite

Comment 17

•

20 years ago

xref bug 116399

Christian :Biesinger (don't email me, ping me on IRC)

Comment 18

•

20 years ago

*** Bug 244945 has been marked as a duplicate of this bug. ***

Prognathous

Updated

•

20 years ago

Blocks: 254868

Mike Cowperthwaite

Comment 19

•

20 years ago

None of the URLs provided in this bug as samples are valid any longer.
Could someone *attach* an actual .eml file that exhibits this problem to the 
bug?  Remember to give it type: message/rfc822

The file at attachment 11787 (from bug 33049) is pretty peculiar.  Loading it in 
the browser:
 - Autodetect:Universal identifies the charset as Greek (ISO-8859-7).
 - Autodetect:Japanese identifies the charset as Shift_JIS, which shows a bunch 
of Kanji (or Chinese) mixed with centered-dot characters -- including within the 
vCard.  
 - Forcing an encoding of ISO-2022-JP (the charset specified within the file 
itself), the display is all '?'.  
 - Forcing an encoding of UTF-8, the subject and body appear to be some form of 
kana, except in the vCard where the characters appear as '?'.

Simon Montagu :smontagu

Comment 20

•

20 years ago

(In reply to comment #19)
>  - Forcing an encoding of UTF-8, the subject and body appear to be some form of 
> kana, except in the vCard where the characters appear as '?'.

This needs to be retested, but I believe that that is bug 221631, which has been
fixed since the date of the attachment.

Mike Cowperthwaite

Comment 21

•

20 years ago

(In reply to comment #20)
> (In reply to comment #19)
> >  - Forcing an encoding of UTF-8, the subject and body appear to be some form 
> >  of kana, except in the vCard where the characters appear as '?'.
> 
> This needs to be retested, but I believe that that is bug 221631, which has
> been fixed since the date of the attachment.

The fix there seems to be forcing a default of utf-8 on (some?) vCards -- which 
is how Mozilla sends vCards now.  The vCard in that attachment has an explicit 
2022-JP encoding.  Even when displayed in Mail/News, those characters are not 
shown correctly, so that problem is unrelated to this bug.


I forgot that attachment 139450, from the bug I filed that was duped to this 
one, shows the basic problem.  One symptom from that attachment which is not 
mentioned here: the 8bit characters which (illegally) are in the Subject header 
of that mail display correctly when the browser's encoding is 8859-1 (whereas 
the body shows the 8859-1 bytes corresponding to the UTF-8 encoding of the 
original 8859-1 characters).  Forcing the encoding to UTF-8, the body displays 
correctly but the headers are wrong.
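
A short Python sketch of the mixed-encoding symptom described above, assuming a header that carries raw 8859-1 bytes and a body that has been converted to UTF-8. The sample subject text is made up for the example.

```python
# Raw 8-bit header bytes, left as 8859-1 (assumed not to be converted).
subject_raw = "Subject: Café".encode("iso-8859-1")

# Body text after the assumed internal conversion to UTF-8.
body_utf8 = "Café".encode("utf-8")

page = subject_raw + b"\n\n" + body_utf8

# Browser encoding set to 8859-1: header reads fine, body is junk.
as_latin1 = page.decode("iso-8859-1")

# Browser encoding forced to UTF-8: body reads fine, and the header
# byte 0xE9 is invalid UTF-8, so it becomes U+FFFD.
as_utf8 = page.decode("utf-8", errors="replace")

print(as_latin1)
print(as_utf8)
```

Because the two parts of the page are in different encodings, no single choice in the encoding menu can display both correctly.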

Mike Cowperthwaite

Comment 22

•

20 years ago

*** Bug 38109 has been marked as a duplicate of this bug. ***

Myk Melez [:myk] [@mykmelez]

Updated

•

20 years ago

Product: MailNews → Core

Tony Mechelynck [:tonymec]

Comment 23

•

16 years ago

(In reply to comment #20)
> (In reply to comment #19)
> >  - Forcing an encoding of UTF-8, the subject and body appear to be some form of 
> > kana, except in the vCard where the characters appear as '?'.
> 
> This needs to be retested, but I believe that that is bug 221631, which has been
> fixed since the date of the attachment.

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9b5pre) Gecko/2008031507 SeaMonkey/2.0a1pre

I see Character Encoding: Autodetect -> Universal and UTF-8. The one line of text is identical to the Subject; they look Japanese (including both hiragana and kanji). The vcard includes only ASCII plus a number of black diamonds with white question marks on them.

Tony Mechelynck [:tonymec]

Comment 24

•

16 years ago

(In reply to comment #21)
[...]
> I forgot that attachment 139450, from the bug I filed that was duped to this 
> one, shows the basic problem.  One symptom from that attachment which is not 
> mentioned here: the 8bit characters which (illegally) are in the Subject header 
> of that mail display correctly when the browser's encoding is 8859-1 (whereas 
> the body shows the 8859-1 bytes corresponding to the UTF-8 encoding of the 
> original 8859-1 characters).  Forcing the encoding to UTF-8, the body displays 
> correctly but the headers are wrong.

It is still so using "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9b5pre) Gecko/2008031507 SeaMonkey/2.0a1pre":

Autodetect -> Universal and Windows-1252 shows accented characters OK in Subject header and replaced by gibberish in the body. Forcing UTF-8 shows accented characters replaced by black diamonds with white question marks on them in the Subject header and OK in the body.

Nobody; OK to take it and work on it

Assignee

Updated

•

16 years ago

Product: Core → MailNews Core

Phil Ringnalda (:philor)

Updated

•

15 years ago

QA Contact: ji → i18n

Wayne Mery (:wsmwk)

Updated

•

3 years ago

Assignee: smontagu → nobody

Status: ASSIGNED → NEW

Wayne Mery (:wsmwk)

Comment 25

•

2 years ago

Is this expected to still be a problem?

Flags: needinfo?(mkmelin+mozilla)

Magnus Melin [:mkmelin]

Comment 26

•

2 years ago

Probably not. Test cases are no longer available.

Status: NEW → RESOLVED

Closed: 2 years ago

Flags: needinfo?(mkmelin+mozilla)

Resolution: --- → INCOMPLETE