Mojibake with Chinese characters in email body (support cp936 encoding as alias of gbk)

RESOLVED FIXED in Thunderbird 47.0

Status

defect
RESOLVED FIXED
4 years ago
3 years ago

People

(Reporter: rachel_kronick, Assigned: mkmelin)

Tracking

Thunderbird 47.0
x86_64
Linux

Thunderbird Tracking Flags

(thunderbird45 fixed, thunderbird46 fixed, thunderbird47 fixed)

Details

Attachments

(2 attachments)

Reporter

Description

4 years ago
User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:41.0) Gecko/20100101 Firefox/41.0
Build ID: 20151006000732

Steps to reproduce:

When receiving email with Chinese text in Thunderbird, it sometimes shows up as garbage/mojibake. 

I'm uncertain of the exact cause, but it seems related to messages encoded in cp936.


Actual results:

On several separate occasions, receiving email that contains Chinese text, the results have been garbage/mojibake when viewed in Thunderbird. 

This is true both for messages received via an IMAP (Yahoo) server and via a POP server. 

With these mojibake messages, the View > Character Encoding menu item is grayed out and unavailable. With other messages, even received from the same senders, the Character Encoding menu item is not grayed out and is easily accessible.

Checking the full headers, all the mojibake messages appear to have been encoded in cp936, which is (according to Wikipedia) "Microsoft's character encoding for simplified Chinese". One message contained Simplified Chinese characters, another contained Traditional Chinese characters, and a third I am uncertain of. All are listed in their headers as cp936. I don't have mojibake problems with messages listed as other encodings such as Big5, GB2312 or UTF-8.
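For anyone trying to reproduce the header check described above, here is a minimal sketch using Python's standard `email` package. The sample message is hypothetical (the actual messages from this report are not reproduced here); it only illustrates reading the declared charset from the Content-Type header and how GBK bytes turn into mojibake when decoded with an unrelated charset.

```python
import base64
from email import message_from_bytes

# Hypothetical sample message, built only for illustration.
SAMPLE_TEXT = "中文测试"
raw = (
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=cp936\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    + base64.b64encode(SAMPLE_TEXT.encode("gbk"))
    + b"\r\n"
)

msg = message_from_bytes(raw)

# The charset parameter as declared in the header, lowercased by the library.
declared = msg.get_content_charset()

# The body bytes after undoing the base64 transfer encoding.
payload = msg.get_payload(decode=True)

# Decoding as GBK gives the original text; decoding the same bytes
# with an unrelated single-byte charset gives garbage (mojibake).
readable = payload.decode("gbk")
garbled = payload.decode("latin-1")

print("declared charset:", declared)
print("decoded as gbk:  ", readable)
print("decoded as latin-1:", repr(garbled))
```

If the declared charset printed here is cp936 and the text is only readable when decoded as GBK, the message matches the pattern in this report.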

Also, the message I received in Traditional Characters, which is via my IMAP (Yahoo) account, shows correctly in both my webmail (via Firefox) and in my cellphone's email app. It only appears as mojibake in Thunderbird.

I'm using Thunderbird 38.3.0 under Linux Mint 17.1. 

Perhaps this is an error in how Thunderbird handles cp936? And perhaps it's related to this bug https://bugzilla.mozilla.org/show_bug.cgi?id=1176662 ?


Expected results:

Messages should contain readable, non-garbled Chinese.
Reporter

Updated

4 years ago
OS: Unspecified → Linux
Hardware: Unspecified → x86_64
The Encoding Standard does not define "cp936" as an alias of gbk.
https://encoding.spec.whatwg.org/#concept-encoding-get

Thunderbird should handle this alias before using the Core encoding converter.
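A rough sketch of the idea, in Python rather than Thunderbird's own code: map mail-specific aliases to a canonical charset name first, and only then hand the name to the decoder. The table and function names (`MAIL_CHARSET_ALIASES`, `resolve_mail_charset`, `decode_body`) are hypothetical and are not Thunderbird's actual alias list or API.

```python
# Hypothetical alias table: mail-specific labels that the underlying
# converter is assumed not to know, mapped to labels it does know.
MAIL_CHARSET_ALIASES = {
    "cp936": "gbk",
}


def resolve_mail_charset(label: str) -> str:
    """Normalize a charset label and apply mail-specific aliases."""
    key = label.strip().lower()
    return MAIL_CHARSET_ALIASES.get(key, key)


def decode_body(data: bytes, label: str) -> str:
    """Resolve the alias first, then decode with the canonical charset."""
    charset = resolve_mail_charset(label)
    # errors="replace" keeps the sketch from raising on malformed input.
    return data.decode(charset, errors="replace")


if __name__ == "__main__":
    body = "简体中文".encode("gbk")
    print(resolve_mail_charset("CP936"))
    print(decode_body(body, "cp936"))
```

Python's own codec registry already accepts cp936 as a name for gbk, so the explicit table here only stands in for a converter that follows the Encoding Standard's label list and would otherwise reject the label.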
Status: UNCONFIRMED → NEW
Component: Untriaged → General
Ever confirmed: true
Assignee

Updated

4 years ago
Component: General → Internationalization
Product: Thunderbird → MailNews Core
Summary: Mojibake with Chinese characters in email body → Mojibake with Chinese characters in email body (support cp936 encoding)
Reporter

Comment 2

4 years ago
I haven't seen any movement on this in the past month or so. Should I be supplying additional information, or...?
Assignee

Comment 3

4 years ago
Looks like it might be commonly used (at least in general, dunno about email), so we could add the alias.
Assignee: nobody → mkmelin+mozilla
Status: NEW → ASSIGNED
Attachment #8702994 - Flags: review?(Pidgeot18)
Assignee

Updated

4 years ago
Summary: Mojibake with Chinese characters in email body (support cp936 encoding) → Mojibake with Chinese characters in email body (support cp936 encoding as alias of gbk)

Comment 4

3 years ago
Any chance of getting this one-line addition reviewed so we can get it into TB 45? It doesn't look like a very contentious change.
Flags: needinfo?(rkent)
Flags: needinfo?(Pidgeot18)
Comment on attachment 8702994 [details] [diff] [review]
bug1217161_cp936_support.patch

Review of attachment 8702994 [details] [diff] [review]:
-----------------------------------------------------------------

I don't claim to be an expert here, but a little searching convinced me this is probably fine. Let's just try it.
Attachment #8702994 - Flags: review?(Pidgeot18) → review+

Updated

3 years ago
Flags: needinfo?(rkent)
Flags: needinfo?(Pidgeot18)

Updated

3 years ago
Keywords: checkin-needed

Comment 6

3 years ago
Comment on attachment 8702994 [details] [diff] [review]
bug1217161_cp936_support.patch

Magnus, I assume that you want to request uplift here, right?
We sort of promised the feature to the reporter of bug 1235294 (see bug 1235294 comment #4).

[Approval Request Comment]
Regression caused by (bug #): Not a regression.
User impact if declined: Unhappy Asian users, cp936 seems to be out in the wild.
Testing completed (on c-c, etc.): I assume Magnus did manual testing. I'll try it once it's landed.
Risk to taking this patch (and alternatives if risky):
No risk. One line change to the charset aliases.
Attachment #8702994 - Flags: approval-comm-beta?
Attachment #8702994 - Flags: approval-comm-aurora+

Comment 7

3 years ago
https://hg.mozilla.org/comm-central/rev/19694424a48639d4f9ca458e3e891292e0c2ae1e
Bug 1217161 - Mojibake with Chinese characters in email body (support cp936 encoding as an alias for gbk). r=rkent

Updated

3 years ago
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 47.0

Comment 8

3 years ago
I tested this with the message from bug 1235294, attachment 8702172 [details]. Works nicely as expected.
Comment on attachment 8702994 [details] [diff] [review]
bug1217161_cp936_support.patch

http://hg.mozilla.org/releases/comm-beta/rev/5b79eb3f9dd6
Attachment #8702994 - Flags: approval-comm-beta? → approval-comm-beta+