Why, after all these years, can't Thunderbird auto-detect character encoding



4 years ago
3 years ago


(Reporter: Randy, Unassigned)


24 Branch
Windows XP




(2 attachments)



4 years ago
User Agent: Mozilla/5.0 (Windows NT 5.1; rv:29.0) Gecko/20100101 Firefox/29.0 (Beta/Release)
Build ID: 20140506152807

Steps to reproduce:

1) have tried ALL variations of View->Character Encoding, including switching the AUTO DETECT to Universal, or OFF
2) Have set the individual properties on every email folder (e.g., right-click Inbox->Properties->General Info tab, UNCHECK "apply default to all messages" under Default Character Encoding)
3) Have looked in vain for the now missing option similar to #2 above, that once was in 'Tools' > 'Options' > 'Display' -> Formatting tab click on 'Advanced' button. checkbox for "apply the default character encoding to all incoming messages" 

Actual results:

Nothing useful. Any time a message comes whose properly declared character encoding (in the message source) does not match the default encoding for received messages, Thunderbird will continue to display garbage wherever special characters (quotes, etc) appear.

Expected results:

I would expect Thunderbird to switch automatically to the character encoding appropriate for each individual message, based on the properly stated character set properties in that email.

Unfortunately, judging by all the existing messages and complaints I've seen about this over many many years, not to mention erroneous posts that say the problem is solved when it isn't, I have to conclude Mozilla either doesn't believe this is a problem or doesn't care to fix it. The bottom line is that there is no way to tell Thunderbird to automatically display emails in the character coding format they were written in. At least none that work, unless you have to add some secret variables to the config file. 

I could understand cases where the headers are not properly filled in, but I see tons of emails in which the encoding is plainly there in the headers within the message source. You can force it by selecting the proper encoding for each message manually (assuming you can guess at it), but if you do so via the menu VIEW->Character Encoding->UTF8 (for example) it won't "stick" if you view another message and come back to the first. But who would want it to "stick" permanently anyway? What the average user really wants is to be able to toggle VIEW->Character Encoding->Auto Detect from its default "off" to simply "on", and not have to bother with it anymore. I know that's not a "toggle-able" field, but maybe you should make it so, AND make sure it works?

Forgive my frustration, but this is a problem that seems to have gone on forever, and it NEVER happens with other email clients. If there is some backdoor way to actually make autodetect work, I'd appreciate knowing about it. But more important, I think ALL users would appreciate it if it were not some secret "backdoor" setting, but a simple global menu choice for all accounts. Can Mozilla please fix this problem once and for all?

Comment 1

4 years ago
How do you know the charset is properly set in the message source? There are a ton of applications that produce badly encoded emails. Can you attach a sample message? (Export as .eml and attach here.)

Comment 2

4 years ago
Created attachment 8442815 [details]
an exported EML file from a FACEBOOK notification

I'm posting this at the request of someone who requested an example EML file, and I have added comments to the thread.

Comment 3

4 years ago
Your question is fair, but let me say that over the course of many years I've received things like newsletters from many companies, as well as ordinary email from friends, on multiple accounts on both Thunderbird and Outlook. And the bottom line is, I NEVER saw character encoding fail to display properly in Outlook, but it happens like clockwork in Thunderbird. My point is that even if the encoding is NOT properly set in the message source, the fact that other clients can figure it out and Mozilla can't is still noteworthy. Especially if you're talking about email from big companies like FACEBOOK, that are not likely to change because of one user's complaint. And second, about the checkbox to "Apply default to all messages in the folder" (in the right click->properties->General tab from any folder): assuming that unchecking this box actually worked, don't you think it's reasonable for this box to be unchecked by default?

That said, I just attached an EML in the above response. It is a simple email note from FACEBOOK, and there are only a couple of instances of strings like "=C2=A0=20" that would actually require UTF-8 interpretation. Also, the message source contains declarations like the ones below, which look legit enough to interpret...

Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

In this case both my inbox folder and my FACEBOOK folder (where the message is filtered to) have the checkbox I mentioned UNCHECKED. Granted, my default character encoding is Western (ISO-8859-1), but it's the ability to automatically detect and switch that I'm not getting to work.
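For anyone who wants to check what a message actually declares, here is a minimal sketch using Python's standard `email` module. The sample message, its addresses, and the `decode_body` helper are made up for illustration (this is not the attached EML), and it says nothing about how Thunderbird itself chooses a charset; it only shows that the two header lines above are enough to decode the body.

```python
from email import message_from_string

# Hypothetical sample message; only the two MIME headers quoted above
# are taken from the report, everything else is illustrative.
RAW = (
    "From: sender@example.com\n"
    "Subject: test\n"
    "MIME-Version: 1.0\n"
    'Content-Type: text/plain; charset="UTF-8"\n'
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "Hello,=C2=A0=20world\n"
)


def decode_body(raw):
    """Return (declared charset, body text decoded with that charset)."""
    msg = message_from_string(raw)
    # get_content_charset() reads the charset parameter of Content-Type
    # and returns it lowercased, or None if it is absent.
    charset = msg.get_content_charset() or "us-ascii"
    # decode=True undoes the quoted-printable transfer encoding,
    # giving the raw bytes of the body.
    payload = msg.get_payload(decode=True)
    return charset, payload.decode(charset, errors="replace")


charset, text = decode_body(RAW)
print(charset)
print(repr(text))
```

With the declared charset honored, "=C2=A0" comes out as a single non-breaking space rather than two separate characters.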


4 years ago
Attachment #8442815 - Attachment mime type: message/rfc822 → text/plain

Comment 4

4 years ago
OK, in the sample message, how do we know TB determined an incorrect charset and is displaying it wrongly? I do not see any accented chars displayed wrongly; it is English text.

So what is displaying wrong in this particular message? Maybe it displays fine for me. Can you attach a screenshot of bad display of this message?

Comment 5

4 years ago
Created attachment 8443478 [details]
JPG with added red circles, showing misinterpreted character

Aceman: Thanks for checking. I had purposely picked what looked like a short message with only a few "offenses". I'm attaching a screenshot of what I HOPE is the same one (they are all very similar). I've edited the screenshot to show two instances of suspicious characters. These *DO* correspond to instances of the string "=C2=A0=20" (minus the quotes) within the messages. If I manually switch the encoding via Menu: VIEW->Character Encoding->Unicode (UTF-8) the strange characters are replaced. But Thunderbird won't do it automatically, unless I specifically set my default to UTF-8. If you want I'll submit a better example with more apparent offenses.
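To make the symptom concrete, here is a small Python sketch of what happens to "=C2=A0=20" under each interpretation. The surrounding words "Hi" and "there" are made up for illustration; only the encoded sequence comes from the message. This only demonstrates the byte-level effect and is not a claim about Thunderbird's internals.

```python
import quopri

# "=C2=A0=20" is the quoted-printable sequence from the message;
# "Hi" and "there" are illustrative filler.
encoded = b"Hi=C2=A0=20there"

# Undo the quoted-printable transfer encoding to get the raw bytes.
raw = quopri.decodestring(encoded)

# Interpreted as UTF-8, bytes C2 A0 form one non-breaking space.
as_utf8 = raw.decode("utf-8")

# Interpreted as ISO-8859-1, each byte is its own character:
# C2 becomes the letter A with a circumflex, A0 a non-breaking space.
as_latin1 = raw.decode("iso-8859-1")

print(repr(as_utf8))
print(repr(as_latin1))
```

The stray accented character in the ISO-8859-1 result is the kind of extra character circled in the screenshot.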

I will say that over about the past year or so, many more companies seem to be sending straight 7-bit encoding with pure HTML markup, and seem to be avoiding turning every single or double quote into an encoded block. The messages that do encode every quote are the most annoying. Again, the EML I submitted was selected simply because it was short and only contained one or two "odd" character displays.

Comment 6

4 years ago
I do not see those characters in the display. When I go to View -> Character encoding, then Autodetect is Off and Unicode is selected.

Comment 7

4 years ago

you said...
(In reply to :aceman from comment #6)
> I do not see those characters in the display. When I go to View -> Character
> encoding, then Autodetect is Off and Unicode is selected.

Neither do I. Thunderbird handles manually selected character encoding fine. But if I set it to Unicode (UTF-8) and a new mail comes in with Western encoding, then I get weird characters again. I KNOW I can manually set Unicode or Western. My complaint is that Thunderbird won't auto-detect and switch to whatever the email source specifies.

Comment 8

4 years ago
I have not set anything manually. I just created a folder from your message (via filesystem operations) and started TB to display it.

Yes, I have not tried to receive it.

Comment 9

4 years ago
I'm not sure how this happened without an entry being made here, but I just got a message from someone identified as "Toad-Hall" from a no-reply address at no-reply@support.mozilla.org. His suggestion seems to have worked, so I'm going to copy it here since I can't respond to the email directly (except to click on a box indicating whether it solved the problem). Toad's suggestion is below, and indeed after toggling this config variable, every message I've examined, at least with either Western or UTF-8 encoding, seems to display properly. Moreover, when I go to VIEW->Character Encoding, I can see that the mark reliably moves to the proper encoding for the message. I have no clue as to why this variable was set to 'true', or whether some direct setting in Thunderbird can cause it to be set to 'true' but never cleared, because indeed its default is false. In any case, the issue seems solved now. Hopefully if someone else has the problem, they too will find this solution.

Could you say what you have here: Tools > Options > Advanced > General tab Click on Config Editor

In top search type: mailnews.force_charset_override; Value = 'false'

What do you have as the Value..true or false? If Value = 'True', double click on line to toggle to 'false'

close window - top right X click on OK to save changes to Options. Close and reopen Thunderbird.
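For anyone who would rather check the value without opening the Config Editor, here is a rough Python sketch that scans the text of a profile's prefs.js for this preference. It assumes the usual `user_pref("name", value);` line format; the `read_bool_pref` helper, the sample text, and the `example.other.pref` entry are hypothetical. A preference left at its default is normally not written to prefs.js at all, so a result of None here would mean the default (false, per the comment above) is in effect.

```python
import re

# Matches one prefs.js line of the form: user_pref("name", value);
PREF_LINE = re.compile(r'^user_pref\("(?P<name>[^"]+)",\s*(?P<value>.+?)\);\s*$')


def read_bool_pref(prefs_text, name):
    """Return True/False if the pref is set in prefs_text, else None."""
    for line in prefs_text.splitlines():
        match = PREF_LINE.match(line.strip())
        if match and match.group("name") == name:
            return match.group("value") == "true"
    return None  # not present, so the built-in default applies


# Hypothetical prefs.js contents, for illustration only.
SAMPLE = '''user_pref("example.other.pref", 1);
user_pref("mailnews.force_charset_override", true);
'''

override = read_bool_pref(SAMPLE, "mailnews.force_charset_override")
print(override)
```

If this reports True for a real profile, that matches the situation described above, and toggling the value to false in the Config Editor is the fix that worked here.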