Closed Bug 351541 Opened 18 years ago Closed 18 years ago

Message List displays subject using wrong encoding even though correct in header view

Categories

(Thunderbird :: Mail Window Front End, defect)

Platform: x86, Windows XP
Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 90584

People

(Reporter: davidf, Assigned: mscott)

Details

Attachments

(2 files)

I have a message whose subject contains some non-ASCII characters.
The message / thread list has decoded these characters incorrectly, even though the header view in the message pane has decoded them correctly.
Will attach image.
Can you still reproduce with a recent nightly? http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-mozilla1.8/

If you can, also attach the message source.
Also: when the message is displayed, what encoding is shown in the View | Encoding menu? And what is the setting under View | Encoding | Autodetect?

And, right click on the folder, select Properties: what character set is displayed there?
Answers to comment 3 using Thunderbird 1.5.0.6:
When the message is displayed, View | Character Encoding shows UTF-8
   View | Encoding | Autodetect shows Off
Folder properties shows Default Character Encoding: Western (ISO-8859-1), with "Apply default to all messages..." unchecked
I tried switching the Folder properties to UTF-8, leaving the folder and re-entering, but that had no effect on the display.
As suggested in comment 2, tested using a recent nightly:
http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-mozilla1.8/thunderbird-2.0b1pre.en-US.win32.zip
This is dated 2006-09-14 09:10.
The problem still occurs.

I repeated the steps suggested in comment 3 on the 2.0 beta build. Results:
When the message is displayed, View | Character Encoding shows UTF-8
   View | Encoding | Autodetect shows Off
Folder properties shows Default Character Encoding: Western (ISO-8859-1), with "Apply default to all messages..." unchecked
I tried switching the Folder properties to UTF-8, leaving the folder and re-entering, but that had no effect on the display.

(In other words, exactly the same as for 1.5.0.6)
I will attach the message source. I saved the message via View Source, then Save Page As. I also tried saving it by right-clicking the message in the view pane and choosing Save As...; interestingly, the filename suggested there contains the same mis-encoding of the subject line. The two methods produce the same file except that the Save As version omits the first three lines (From header, etc.) that appear in the View Source version, so I am including the View Source version.

Note that if I open the message from the IMAP folder under Thunderbird 2.0b, the window title contains the mis-encoding, whereas if I open it from the .eml file, Thunderbird shows the window title correctly.

It looks to me like the Subject line contains the characters encoded in UTF-8.
This is strange.  That message's subject is technically illegal -- it's supposed to be MIME encoded, but it's raw UTF-8 instead.  But:

I'm testing with TB 1.5.0.7 and 2a1-0915. Regardless of whether I set the folder properties to ISO-8859-1, UTF-8, or Big5, the single UTF-8 character is always displayed correctly in the thread pane -- which is contrary not only to the symptom described by the reporter, but to common sense: if the folder setting is ISO-8859-1, I should be seeing the two bytes that make up the UTF-8 character.

I'm running with Autodetect=OFF.  My default display charset in prefs is set to 8859-1.
(In reply to comment #8)
> That message's subject is technically illegal
> -- it's supposed to be MIME encoded, but it's raw UTF-8 instead.

The character is U+00E9 (LATIN SMALL LETTER E WITH ACUTE).
 http://www.fileformat.info/info/unicode/char/00e9/index.htm
Its UTF-8 encoding is 0xC3 0xA9, and Tb appears to treat these bytes as ISO Latin 1 when displaying them in the message list pane.
 http://www.idautomation.com/ascii-table.html
I don't know whether the US-ASCII referred to in the mail RFCs includes 0x80 to 0xFF, but I think displaying those bytes as ISO Latin 1 is appropriate even if the RFC refers only to 7-bit US-ASCII.
In the message display, these 2 bytes appear to be treated as UTF-8 (probably based on the charset in the Content-Type: header) and decoded as U+00E9, so they are displayed as "e acute".
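To make the byte-level mix-up concrete, here is a small Python sketch (an illustration, not Thunderbird code) that decodes the two UTF-8 bytes for U+00E9 both ways. Which charset each pane actually uses is taken from the observations above, not from Thunderbird's source.

```python
# The UTF-8 encoding of U+00E9 (LATIN SMALL LETTER E WITH ACUTE).
raw = "\u00e9".encode("utf-8")
assert raw == b"\xc3\xa9"

# Read as ISO-8859-1 (as the message list pane appears to do), each byte
# becomes a separate character: U+00C3 and U+00A9.
as_latin1 = raw.decode("iso-8859-1")

# Read as UTF-8 (as the message display pane appears to do, going by the
# body's Content-Type charset), the two bytes form a single character.
as_utf8 = raw.decode("utf-8")

print(repr(as_latin1))  # 'Ã©'
print(repr(as_utf8))    # 'é'
```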

Anyway, the main cause is the sender's fault. The sender has to encode the subject when it is not ASCII.
(I hope the sender will not encode the subject using a charset other than UTF-8 when the mail body is UTF-8... ;-) )
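In practice, "encoding the subject" means writing it as an RFC 2047 encoded-word. The sketch below uses Python's standard `email.header` module; the subject "Café" is a made-up stand-in, since the real subject is not reproduced in this bug.

```python
from email.header import Header, decode_header, make_header

subject = "Caf\u00e9"  # hypothetical non-ASCII subject

# RFC 2047 encoded-word: pure ASCII, with the charset named inside it,
# of the form =?utf-8?b?...?= (base64) or =?utf-8?q?...?= (quoted-printable).
encoded = Header(subject, "utf-8").encode()
print(encoded)

# A conforming reader decodes it using the charset embedded in the word,
# so the folder's default charset never comes into play.
decoded = str(make_header(decode_header(encoded)))
assert decoded == subject
```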
In the message display pane, the 'charset' declared for the message body (UTF-8) is used to interpret 'raw 8bit' (i.e. not MIME/RFC 2047-encoded, and therefore illegal) bytes in the message header (Subject, From, etc.). In the message list/thread pane, 'raw 8bit' bytes are interpreted using the character encoding configured for the folder. Therefore, what David and WADA observed is expected behavior and is a dupe of bug 90584, but what Mike observed is (as he wrote) strange and unexpected.
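The rule described in this comment can be modelled roughly as follows. This is a hypothetical Python sketch, not Thunderbird's implementation; the function `render_subject` and its `fallback_charset` parameter are invented for illustration. Encoded-words are decoded with their own charset, while raw 8-bit bytes use whatever fallback the pane picks: the body charset in the message pane, the folder charset in the thread pane.

```python
from email.header import decode_header, make_header

def render_subject(raw: bytes, fallback_charset: str) -> str:
    """Hypothetical model of how one pane turns a raw Subject header into text."""
    if raw.isascii():
        # Pure ASCII: may contain RFC 2047 encoded-words that carry their own charset.
        return str(make_header(decode_header(raw.decode("ascii"))))
    # Raw 8-bit bytes (illegal in a header): interpret them with the pane's fallback.
    return raw.decode(fallback_charset, errors="replace")

raw_subject = b"Caf\xc3\xa9"  # raw UTF-8, as in the reporter's message

# Message pane: falls back to the body's Content-Type charset (UTF-8 here).
print(render_subject(raw_subject, "utf-8"))       # Café
# Thread pane: falls back to the folder charset (ISO-8859-1 here).
print(render_subject(raw_subject, "iso-8859-1"))  # CafÃ©
```

Under this model, a properly encoded subject such as `=?utf-8?b?Q2Fmw6k=?=` renders the same in both panes, which is why the fix belongs on the sender's side.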
 
(In reply to comment #10)
> what Mike observed is (as he wrote) strange and
> unexpected.

I tried copying the message from one folder to another and it still displays 'correctly' in the Thread pane.  The new folder also has a default charset of "ISO-8859-1".  I double-checked the message source to be sure my editor hadn't somehow written things out in an alternate encoding, but it's definitely UTF-8 encoded, both subject line and body.

Besides the settings above, I have the following pref set:
  user_pref("intl.fallbackCharsetList.ISO-8859-1", "UTF-8");
Other than the prefs specifying the items on the encoding menu, that's the only occurrence of "utf-8" in my prefs.js file.
Hmm... I'm observing exactly the same thing as Mike. My fallback for iso-8859-1 is windows-1252 (it shouldn't matter anyway). 
This is definitely a dupe of bug 90584.
How is this bug different from bug 346446? They both seem to talk about the same issue. Anyway, I'm getting emails with both UTF-8 and ISO-8859-1 encoding, and my default encoding is UTF-8, meaning that all emails using ISO-8859-1 which do not encode their subject correctly are incorrectly displayed. This is *very* irritating as it's not my fault but the sender's. Any volunteer to fix bug 90584? ;)
The only reason I haven't duped this up to now (and I suspect the same of Jungshik) is the symptom described in comments 8, 11 & 12, which is not what this bug was about, but is unexpected behavior that shouldn't be lost.

I've now opened bug 360684 about that, so this is getting duped.

*** This bug has been marked as a duplicate of 90584 ***
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → DUPLICATE
(In reply to comment #14)
> How is this bug different from bug 346446? They both seem to talk about the
> same issue. Anyway, I'm getting emails with both UTF-8 and ISO-8859-1 encoding,
> and my default encoding is UTF-8, meaning that all emails using ISO-8859-1
> which do not encode their subject correctly are incorrectly displayed. This is
> *very* irritating 

Why don't you set your default encoding to ISO-8859-1? [1] Most, if not all, emails sent in UTF-8 have correctly encoded header fields (and, because of bug 360684, raw UTF-8 headers display correctly even when they are not RFC 2047-encoded), so your chance of being irritated by garbled headers will be significantly reduced. The default encoding setting is there to help deal with those standard-violating emails.


[1] I send all my outgoing emails (unless I'm sending them to some broken web mails) in UTF-8, but my default encoding for *incoming* emails is set to EUC-KR (Korean).
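Why ISO-8859-1 is the safer default can be seen with a hypothetical "try UTF-8 first" rule. This Python sketch assumes, for illustration only, that raw header bytes are checked for valid UTF-8 before the configured default is applied; it is not a description of Thunderbird's code or of the bug 360684 behavior, and `decode_raw_header` is a made-up name.

```python
def decode_raw_header(raw: bytes, default_charset: str) -> str:
    """Hypothetical: use UTF-8 if the bytes are valid UTF-8, else the default."""
    try:
        # Non-ASCII ISO-8859-1 text is rarely valid UTF-8, so this check is fairly safe.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: fall back to the default, replacing undecodable bytes.
        return raw.decode(default_charset, errors="replace")

utf8_subject = "Caf\u00e9".encode("utf-8")         # b'Caf\xc3\xa9'
latin1_subject = "Caf\u00e9".encode("iso-8859-1")  # b'Caf\xe9'

# Default ISO-8859-1: both raw subjects come out right.
print(decode_raw_header(utf8_subject, "iso-8859-1"))    # Café
print(decode_raw_header(latin1_subject, "iso-8859-1"))  # Café

# Default UTF-8: the raw ISO-8859-1 subject cannot be recovered.
print(decode_raw_header(latin1_subject, "utf-8"))  # "Caf" followed by U+FFFD
```

With UTF-8 as the default, the fallback is the same charset that has already failed, which matches the complaint in comment 14.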
The comments are too long; I did not read them. But I have the same problem. I have set the default character set to UTF-8 in two places in the options, yet I still see wrong characters in the From and Subject fields of emails. If I were an expert in TB internals, I would fix it myself. But I'm not, so I'm stuck. Thanks, developers, for not implementing automatic character set detection. Again, I'd do it myself if I knew TB internals.

There is no excuse at this time not to support UTF-8 properly, in my opinion. The USA-oriented ISO-8859 sets are just about obsolete by now. Again, just my opinion.
If you have concrete examples, file a new bug and attach the mails as .eml. (This bug is closed.)
