Doesn't display GB2312 encoded texts correctly for Chinese Characters

RESOLVED FIXED in Thunderbird 41.0

Status

MailNews Core
Internationalization
--
major
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: wenbins, Assigned: Magnus Melin)

Tracking

({regression})

Thunderbird 41.0
regression

Thunderbird Tracking Flags

(thunderbird39 fixed, thunderbird40 fixed, thunderbird41 fixed, thunderbird_esr38 39+ fixed)

Details

Attachments

(5 attachments, 4 obsolete attachments)

(Reporter)

Description

2 years ago
Created attachment 8622136 [details]
Image 2.png

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0
Build ID: 20150525141253

Steps to reproduce:

My Thunderbird is on the release channel. Just upgraded to 38.0.1 this morning. Now some emails are not displayed correctly.


Actual results:

Thunderbird is set to use UTF-8 as default encoding. Windows 7 system.

When a received email is encoded using GB2312, Thunderbird detects that the email uses GB2312 encoding and selects Chinese Simplified to decode it. However, the wrong characters are displayed.

The previous version 31.7 has no such issue.


Expected results:

Correct characters should be used.

Comment 1

2 years ago
I can confirm this bug both on Linux and Windows.
Changing Preferences -> Fonts & Encodings -> Character Encodings -> Incoming Mail to Chinese Simplified (GBK) did not help.
But if "Apply encoding to all messages in the folder ..." is checked under Folder Properties, all the gb2312-encoded mails display correctly, though of course mails in other encodings (utf-8 etc.) then do not.

Updated

2 years ago
Duplicate of this bug: 1174634

Comment 3

2 years ago
Two reports now.

Updated

2 years ago
Severity: normal → major
Status: UNCONFIRMED → NEW
status-thunderbird_esr38: --- → affected
tracking-thunderbird_esr38: --- → +
Component: Untriaged → Internationalization
Ever confirmed: true
Product: Thunderbird → MailNews Core

Comment 4

2 years ago
Did an experiment: in the eml file, find the part below:

Content-Type: text/html; charset="gb2312"
Content-Transfer-Encoding: quoted-printable

Replace "gb2312" with "gb18030", and the display is correct. So it seems related to gb2312 only.
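
The relabeling experiment can be approximated outside Thunderbird with Python's standard email package. This is only a sketch: the sample message is made up (its body is the quoted-printable GB2312 bytes for "中文"), and Python still ships a separate gb2312 codec, so it does not reproduce the Thunderbird failure. It only shows that changing the charset label to gb18030 yields the same text.

```python
from email import message_from_bytes, policy

# Hypothetical minimal message; the headers mirror the ones quoted above.
RAW = (
    b'Content-Type: text/html; charset="gb2312"\r\n'
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"<p>=D6=D0=CE=C4</p>\r\n"
)


def body_text(raw: bytes) -> str:
    """Parse a raw message and decode its body using the declared charset."""
    msg = message_from_bytes(raw, policy=policy.default)
    return msg.get_content()


# Same bytes, but with the charset label swapped as in the experiment.
RELABELLED = RAW.replace(b'charset="gb2312"', b'charset="gb18030"', 1)

if __name__ == "__main__":
    print(body_text(RAW).strip())
    print(body_text(RELABELLED).strip())
```

Both calls decode the body to the same string because every GB2312 byte sequence is also valid gb18030.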

P.S. IIRC there used to be "Chinese" in the character encoding auto-detect menu, but now there are only Japanese, Russian, and Ukrainian.
Depends on: 964225

Comment 5

2 years ago
Depends on a "fixed" bug? So bug 964225 should be reopened?
(In reply to Lu Wei from comment #5)
> Depends on a "fixed" bug? So bug 964225 should be reopened?

You don't reopen it. Although this may be a regression from bug 964225, it is Gecko's bug.
If TB still needs this encoding, we should add a fallback for it to c-c.
This is not a Gecko bug. Gecko on m-c is internally consistent: GB2312 is no longer a Gecko-canonical name but a label for gbk. gbk and gb18030 are Gecko-canonical names.

c-c has its own list of label overrides in https://mxr.mozilla.org/comm-central/source/mailnews/intl/charsetalias.properties . This list still contains the mapping gb2312=GB2312, which overrides the right mapping from https://mxr.mozilla.org/comm-central/source/mozilla/dom/encoding/labelsencodings.properties#183 . But now GB2312 is no longer a Gecko-canonical name, so treating it as one leads to failure.
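
A rough model of the lookup described above, as a sketch rather than the actual mailnews code. The only mappings taken from this bug are gb2312 -> gbk (Gecko) and gb2312=GB2312 (the c-c override); the function name, the other table entries, and the error handling are assumptions made for illustration.

```python
# Label -> Gecko-canonical name, modeled on labelsencodings.properties.
# Only the gb2312 entry comes from this bug; the rest is illustrative.
GECKO_LABELS = {"gb2312": "gbk", "gbk": "gbk", "gb18030": "gb18030"}

# Names for which a decoder is assumed to exist.
CANONICAL_NAMES = {"gbk", "gb18030"}

# c-c override table before and after the proposed fix.
STALE_OVERRIDES = {"gb2312": "GB2312"}
FIXED_OVERRIDES = {}


def resolve(label: str, overrides: dict) -> str:
    """Return the canonical encoding name for a label; overrides win."""
    key = label.strip().lower()
    name = overrides.get(key, GECKO_LABELS.get(key))
    if name not in CANONICAL_NAMES:
        raise LookupError(f"no decoder for {label!r} (resolved to {name!r})")
    return name


if __name__ == "__main__":
    print(resolve("GB2312", FIXED_OVERRIDES))
    try:
        resolve("GB2312", STALE_OVERRIDES)
    except LookupError as exc:
        print("stale override fails:", exc)
```

With the stale override in place the label resolves to a name that has no decoder, which matches the failure mode described in this comment.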

The least invasive fix is making sure that charsetalias.properties has no mappings with GB2312 on the right-hand side of the equals sign. (While at it, it's a good idea to review the file for other mappings that no longer work.)
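
The suggested review of the file could be scripted along these lines. This is a sketch under stated assumptions: the sample text is a hypothetical excerpt (only gb2312=GB2312 is confirmed in this bug), the set of canonical names is deliberately tiny, and the parser handles only simple key=value lines.

```python
def stale_mappings(properties_text: str, canonical_names: set) -> list:
    """List (label, target) pairs whose target is not a canonical name."""
    stale = []
    for line in properties_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        label, _, target = line.partition("=")
        if target.strip() not in canonical_names:
            stale.append((label.strip(), target.strip()))
    return stale


# Hypothetical excerpt; x-gbk=gbk is made up for contrast.
SAMPLE = """\
# charsetalias.properties (excerpt, illustrative)
gb2312=GB2312
x-gbk=gbk
"""

# Assumed subset of Gecko-canonical names.
CANONICAL = {"gbk", "gb18030", "UTF-8"}

if __name__ == "__main__":
    for label, target in stale_mappings(SAMPLE, CANONICAL):
        print(f"{label}={target} points at a name with no decoder")
```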

Compared to old Thunderbird, the main resulting change is that replies will say charset=gbk instead of saying charset=GB2312.

Note that the old GB2312 decoder had actually the same behavior as the old gbk decoder. The gb18030 decoder is a subset of the old gbk decoder, which is why (when charsetalias.properties doesn't interfere) Gecko *decodes* content labeled as GB2312, gbk and gb18030 *exactly* the same way (as gb18030).

I expect jcranmer and mkmelin to have opinions.
Flags: needinfo?(mkmelin+mozilla)
Flags: needinfo?(Pidgeot18)
> The gb18030 decoder is a subset of the old gbk decoder

Oops. s/subset/superset/
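
The relationship between the three encodings can be illustrated with Python's codecs. Note the assumption: these are Python's own gb2312, gbk and gb18030 codecs, not Gecko's decoders, so this only demonstrates the repertoire relationship, not Gecko's behavior.

```python
# "中文" is within the GB2312 repertoire.
GB2312_BYTES = "中文".encode("gb2312")

# Bytes produced by the narrowest codec decode identically under all three.
DECODED = {
    name: GB2312_BYTES.decode(name) for name in ("gb2312", "gbk", "gb18030")
}


def can_encode(text: str, codec: str) -> bool:
    """Report whether text is representable in the given codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    print(GB2312_BYTES, DECODED)
    # The traditional character "龍" lies outside GB2312 but inside the others.
    for codec in ("gb2312", "gbk", "gb18030"):
        print(codec, can_encode("龍", codec))
```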

Updated

2 years ago
Duplicate of this bug: 1175112

Comment 10

2 years ago
It would be useful if someone could post here a valid gb2312 .eml file that displays correctly in TB 31 but not in TB 38, along with a screenshot of what the correct display looks like. That would make it easier for a non-Chinese developer to try to fix this.

Of course it would be great if a Chinese reader wants to try to fix it.
(Assignee)

Updated

2 years ago
Keywords: regression
petercpg is checking for someone to fix
I find no support requests about this problem except http://forums.mozillazine.org/viewtopic.php?f=31&t=2941613
(Assignee)

Comment 13

2 years ago
Looks like bug 1174634 has .eml test case + wrong / right pictures.
attachment 8622327 [details]: https://bugzilla.mozilla.org/attachment.cgi?id=8622327 
attachment 8622328 [details]: https://bugzilla.mozilla.org/attachment.cgi?id=8622328
attachment 8622329 [details]: https://bugzilla.mozilla.org/attachment.cgi?id=8622329
Assignee: nobody → mkmelin+mozilla
Flags: needinfo?(mkmelin+mozilla)

Comment 15

2 years ago
(In reply to Kent James (:rkent) from comment #10)
> It would be useful if someone could post here a valid gb2312 .eml file that
> display in correctly in TB 31 but not in TB 38, along with a screenshot of
> what the correct display looks like. That would make it easier for a
> non-Chinese developer to try to fix this.
> 
I uploaded an eml file and 2 corresponding screenshots when reporting Bug 1174634. Please check.

P.S. From this incident I think TB lacks Simplified Chinese testers. How can I help? Switch to the beta channel and report bugs in beta versions?

Comment 16

2 years ago
(In reply to Magnus Melin from comment #13)
> Looks like bug 1174634 has .eml test case + wrong / right pictures.
> attachment 8622327 [details]:
> https://bugzilla.mozilla.org/attachment.cgi?id=8622327 
> attachment 8622328 [details]:
> https://bugzilla.mozilla.org/attachment.cgi?id=8622328
> attachment 8622329 [details]:
> https://bugzilla.mozilla.org/attachment.cgi?id=8622329

Oh, you have pasted the link. Sorry for my quick comment. And to display Chinese characters correctly, maybe you need a Chinese font file too. Do I need to upload one?

Comment 17

2 years ago
Created attachment 8623450 [details] [diff] [review]
gbk.patch


magnus, if it helps, this patch wfm.

(long ago it was required to learn both fantizi and jiantizi, so it's not so greek ;)
(In reply to Lu Wei from comment #15)
> (In reply to Kent James (:rkent) from comment #10)
> > It would be useful if someone could post here a valid gb2312 .eml file that
> > display in correctly in TB 31 but not in TB 38, along with a screenshot of
> > what the correct display looks like. That would make it easier for a
> > non-Chinese developer to try to fix this.
> > 
> I uploaded an eml file and corresponding 2 sreenshots when reporting Bug
> 1174634. Please check.
> 
> P.S. From this accident I think TB is lack of simplified chinese testers.
> How can I help? Switch to beta channel and report bugs of beta version?

Yes, we need both automated tests, and more manual testers.  If you know other people who can test please email me.

As for version 38, you can run either the beta http://download.cdn.mozilla.net/pub/mozilla.org/thunderbird/releases/38.0b6/ or the released 38.0.1 https://www.mozilla.org/en-US/thunderbird/

Comment 19

2 years ago
(In reply to Wayne Mery (:wsmwk, use Needinfo for questions) from comment #18)
> > 
> > P.S. From this accident I think TB is lack of simplified chinese testers.
> > How can I help? Switch to beta channel and report bugs of beta version?
> 
> Yes, we need both automated tests, and more manual testers.  If you know
> other people who can test please email me.
> 
> As for version 38, you can run either the beta
> http://download.cdn.mozilla.net/pub/mozilla.org/thunderbird/releases/38.0b6/
> or the released 38.0.1 https://www.mozilla.org/en-US/thunderbird/

All right, I'll use beta version as long as my extensions work fine.
https://support.mozilla.org/en-US/questions/1067327
Summary: Not display GB2312 encoded texts correctly → Not display GB2312 encoded texts correctly for Chinese Characters
Duplicate of this bug: 1175539

Comment 22

2 years ago
Created attachment 8623746 [details] [diff] [review]
gbk.patch
Attachment #8623450 - Attachment is obsolete: true
(Assignee)

Comment 23

2 years ago
Thx alta88!
Summary: Not display GB2312 encoded texts correctly for Chinese Characters → Doesn't display GB2312 encoded texts correctly for Chinese Characters
(Assignee)

Comment 24

2 years ago
Created attachment 8623846 [details] [diff] [review]
bug1174580_gb3212.patch

I came to the same result as alta88, with the addition that hz-gb-2312 is also dead (mapped hz-gb-2312=replacement in labelsencodings.properties) + removal of related cruft.

For 38 we'd only land the charsetalias.properties changes.
Attachment #8623746 - Attachment is obsolete: true
Attachment #8623846 - Flags: review?(Pidgeot18)
(Assignee)

Comment 25

2 years ago
And I guess it can be discussed if zh_cn.euc is really needed for anything...

Comment 26

2 years ago
here's a reference:
http://www.yale.edu/chinesemac/pages/character_sets.html

zh_cn.euc is solaris; since it maps to a valid decoder (gbk), may as well keep it.
hz-gb-2312 is 7bit (now) gbk.  if there isn't a decoder remaining, well nothing to do.  i put it back to err on the side of caution assuming this would need to go to 38 and that one could be addressed better later. ie, as henri suggests, auditing charsetalias to only contain valid overrides and not dupe existing entries in labelsencodings.
The decoder for hz-gb-2312 appears not to exist anymore, but the 7-bit charsets in general are icky and should be obsoleted if possible. Preferably, the alias code should use the Gecko encoding lookup and only fallback to a small hard-coded list, but that requires some more auditing that we don't have right now.

Looking around for zh_CN.euc, the evidence appears to be that it was added for Solaris, but a quick google search tends to suggest that this is used primarily as a locale/internal format switch that was more or less accidentally(?) exposed to the real world. I don't think it's necessary to keep these days.
Flags: needinfo?(Pidgeot18)
(In reply to Joshua Cranmer [:jcranmer] from comment #27)
> Looking around for zh_CN.euc, the evidence appears to be that it was added
> for Solaris, but a quick google search tends to suggest that this is used
> primarily as a locale/internal format switch that was more or less
> accidentally(?) exposed to the real world. I don't think it's necessary to
> keep these days.

This indeed seems like the more likely explanation than "zh_CN.euc" being a necessary label to support for email compat.

Comment 29

2 years ago
So when can we get a patch? I cannot read email sent from Outlook (GB2312-encoded email) with this buggy TB and need a patch to resolve the issue. I have to switch the folder encoding to read emails, which is a terrible experience.
(Assignee)

Comment 30

2 years ago
Created attachment 8624696 [details] [diff] [review]
bug1174580_gb3212.patch

Remove zh_cn.euc too
Attachment #8623846 - Attachment is obsolete: true
Attachment #8623846 - Flags: review?(Pidgeot18)
Attachment #8624696 - Flags: review?(Pidgeot18)
(Assignee)

Comment 31

2 years ago
(In reply to Tony Yan from comment #29)
> So when can we get a patch? 

We expect to include this fix in the next point release and nightlies as soon as it's been reviewed.
(Assignee)

Comment 32

2 years ago
Created attachment 8624697 [details] [diff] [review]
bug1174580_gb3212.patch

Forgot to qrefresh.
Attachment #8624696 - Attachment is obsolete: true
Attachment #8624696 - Flags: review?(Pidgeot18)
Attachment #8624697 - Flags: review?(Pidgeot18)
Comment on attachment 8624697 [details] [diff] [review]
bug1174580_gb3212.patch

Review of attachment 8624697 [details] [diff] [review]:
-----------------------------------------------------------------

An automated test would have been nice, but it's not strictly necessary for this change. I'll probably consider it later when we work on redoing the mailnews/intl stuff (I rather think we could get rid of both the properties files with some cleanup).
Attachment #8624697 - Flags: review?(Pidgeot18) → review+
(Assignee)

Comment 34

2 years ago
https://hg.mozilla.org/comm-central/rev/7170634c1998 -> FIXED
Status: NEW → RESOLVED
Last Resolved: 2 years ago
status-thunderbird38: --- → affected
status-thunderbird39: --- → affected
status-thunderbird40: --- → affected
status-thunderbird41: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 41.0
(Assignee)

Comment 35

2 years ago
Comment on attachment 8624697 [details] [diff] [review]
bug1174580_gb3212.patch

[Approval Request Comment]
Regression caused by (bug #): bug 964225
User impact if declined: some chinese mails garbled

Should uplift this after some days trunk baking
Attachment #8624697 - Flags: approval-comm-esr38?
Attachment #8624697 - Flags: approval-comm-beta?
Attachment #8624697 - Flags: approval-comm-aurora?

Comment 36

2 years ago
So when can I upgrade to get the fix?

Tony
(Assignee)

Comment 37

2 years ago
Comment on attachment 8624697 [details] [diff] [review]
bug1174580_gb3212.patch

Ah, should do a branch patch for this, with only the mailnews/intl/charsetalias.properties changes
Attachment #8624697 - Flags: approval-comm-esr38?
(Assignee)

Comment 38

2 years ago
(In reply to Tony Yan from comment #36)
> So when can I upgrade to get the fix?
> 
> Tony

Nightly builds should be available in ~6-7h. Upgrade of 38 would be the next point release - unclear yet exactly when that will happen.

Comment 39

2 years ago
Magnus, can you do the esr38 branch patch?

Comment 40

2 years ago
In the future, I would appreciate it if you separated out the suite pieces from these patches. suite has a different approval process and, even worse, its tree is perma-closed, so I have to add CLOSED TREE to any checkins.

I do not understand the point of the SM perma-closed, but that seems to be what they want. But because of that, I cannot rely on the tree being closed for real reasons, like infrastructure problems, so I am forced to go through all sorts of extra steps to checkin patches that have suite pieces.

I continue to implore the SM people to get rid of this permanent CLOSED TREE status. If you are going to allow checkins in any case, what is the point? It certainly complicates life for patches like this that need changes in mailnews, mail, and suite to be coordinated.
(Assignee)

Comment 41

2 years ago
Created attachment 8625322 [details] [diff] [review]
bug1174580_GB2312_branch.patch
Attachment #8625322 - Flags: approval-comm-esr38?
(Assignee)

Comment 42

2 years ago
And yes, keeping the seamonkey tree closed for well over a year is completely unreasonable. All it does is force people to spend time fixing the commit messages.

Comment 43

2 years ago
Created attachment 8625325 [details] [diff] [review]
without suite

https://hg.mozilla.org/releases/comm-aurora/rev/81c0321a3bf5
https://hg.mozilla.org/releases/comm-beta/rev/e167a3321887
Attachment #8625325 - Flags: approval-comm-beta+
Attachment #8625325 - Flags: approval-comm-aurora+

Comment 44

2 years ago
Although the suite piece was not checked in, all it does is remove a line from the localization file. I don't believe it is necessary to uplift that deletion.
status-thunderbird38: affected → ---
status-thunderbird39: affected → fixed
status-thunderbird40: affected → fixed

Comment 45

2 years ago
Comment on attachment 8624697 [details] [diff] [review]
bug1174580_gb3212.patch

modified patch was used sans suite
Attachment #8624697 - Flags: approval-comm-beta?
Attachment #8624697 - Flags: approval-comm-beta-
Attachment #8624697 - Flags: approval-comm-aurora?
Attachment #8624697 - Flags: approval-comm-aurora-

Comment 46

2 years ago
I tried the latest nightly build from https://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-earlybird-l10n/ , but the problem still exists. May I know which nightly build fixes the issue? I am running Windows 8.1.

Thanks and Regards,
Tony
(Assignee)

Comment 47

2 years ago
Should be in earlybird nightly builds from 2015-06-23
(Assignee)

Updated

2 years ago
Duplicate of this bug: 1176796

Comment 49

2 years ago
Could you please teach me how to fix this issue ?

Comment 50

2 years ago
(In reply to Henry Fung from comment #49)
> Could you please teach me how to fix this issue ?

Hold down Alt key; press key v, c, s one by one; release Alt key.

Comment 51

2 years ago
Can anyone send me a link for a fixed temp build? It is just crazy to manually do this every time to open a mail in SC.
(Assignee)

Comment 52

2 years ago
Nightlies are around here: https://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-comm-central/ (earlybird in a sibling dir if you want that)

Comment 53

2 years ago
me too! So when can i get the patch ?
(Assignee)

Comment 54

2 years ago
"The patch" is attached to this bug. If you want a running build, see previous comment, comment 52.

BTW, could people confirm it fixes the issue for them?

Comment 55

2 years ago
To Magnus, I'm using the daily build as in comment 52. It works.

Comment 56

2 years ago
Created attachment 8626507 [details]
search panel can't display the correct character in gb2312 while main message pane can

The search pane can't display the correct Chinese characters in gb2312 while the main message pane can, and also the message body can't be indexed correctly.

Comment 57

2 years ago
I'm using the Daily 41.0a1 (2015-06-25) version; the bug is still not fixed, as I mentioned before.
(Reporter)

Comment 58

2 years ago
With the 38ESR, in the reading panel, the characters in the message body cannot be displayed; title is OK. But if I search using keywords from the title, in the search result pages, there is a short preview, which displays the characters correctly.

With 41.0a1 (2015-06-25) Daily, at my end it seems all right. Win 7 EN, Thunderbird EN, Unicode as default.
(Assignee)

Comment 59

2 years ago
I guess it's possible gloda (the search index) has stored incorrect data for those mails during the time you used the version not supporting gb2312 properly. If so, and you need it fixed, you have to rebuild the database: https://support.mozilla.org/kb/rebuilding-global-database

Comment 60

2 years ago
Yes!
I indexed the mails with version 38.
After rebuilding the index, everything seems right in Daily build version 41.0a1 (2015-06-26).

Comment 61

2 years ago
The latest build wfm.  And setting GB18030 as the outgoing default in Display-Formatting-Advanced makes replies also encode properly and roundtrip fine.  However, using the GB2312 menuitem causes encoding in UTF8; is there any reason that option is still kept, given 1) it doesn't do what it advertises, 2) it's a subset of 18030 and officially superseded by it anyway. And for incoming, the label is GBK, which means 18030 but doesn't say it and isn't consistent with outgoing, thus confusing.

To be sure, the advice of the mozilla zh-cn localizer should be sought.
(Assignee)

Comment 62

2 years ago
I split that out to bug 1177830. 
Note that we silently use UTF-8 if what was written didn't fit in the selected charset.
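
The silent fallback can be pictured roughly as follows. This is a sketch of the idea only, not Thunderbird's compose code: the function name is hypothetical and Python's codecs stand in for the real encoders.

```python
def choose_outgoing_charset(text: str, preferred: str) -> str:
    """Keep the preferred charset if the text fits in it, else use UTF-8."""
    try:
        text.encode(preferred)
    except UnicodeEncodeError:
        return "utf-8"
    return preferred


if __name__ == "__main__":
    # Fits in GB2312, so the selected charset is kept.
    print(choose_outgoing_charset("中文", "gb2312"))
    # "龍" is outside GB2312, so the message would go out as UTF-8.
    print(choose_outgoing_charset("龍", "gb2312"))
    # gb18030 covers it, so no fallback is needed.
    print(choose_outgoing_charset("龍", "gb18030"))
```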

Comment 63

2 years ago
Comment on attachment 8625322 [details] [diff] [review]
bug1174580_GB2312_branch.patch

http://hg.mozilla.org/releases/comm-esr38/rev/788fc052d220
Attachment #8625322 - Flags: approval-comm-esr38? → approval-comm-esr38+

Updated

2 years ago
status-thunderbird_esr38: affected → fixed
tracking-thunderbird_esr38: + → 39+
(Assignee)

Updated

2 years ago
Duplicate of this bug: 1178670

Comment 65

2 years ago
I have upgraded my TB from 38.0.1 to 38.1.0, but Bug 1174634 doesn't appear to be fixed.

User Agent:
    Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.1.0
Steps to reproduce:
    After upgrading to TB 38.1.0, all mails of one account display wrong.
Actual results:
   The Chinese characters using GB2312 display wrong, as if the encoding is not recognized. If I select the character encoding manually from menu > View > Character Encoding > Chinese Simplified (the fallback character encoding in folder properties is set to Unicode), it displays OK, but only temporarily; switch to another message and back, and it displays wrong again.

   Remarks:
     1) If the fallback character encoding in folder properties is set to Chinese Simplified, mail using GB2312 is displayed correctly, but mail using UTF-8 is displayed incorrectly.
     2) In menu > View > Character Encoding > Auto-Detect, there are only Japanese, Russian and Ukrainian; I can't find Chinese Simplified, although I am using a Chinese Simplified Win7 OS.

Expected results:
  It should display correctly as before.

Comment 66

2 years ago
Sorry my mistake.
In TB 38.1.0 the reason for the wrong display was that a checkbox in folder properties was selected: "Apply encoding to all messages in the folder (individual message...)".

(In reply to WeiXianguo from comment #65)
> I have upgraded my TB form 38.0.1 to 38.1.0.But the Bug 1174634 doesn't
> appear to be fixed.
> 
> User Agent:
>     Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101
> Thunderbird/38.1.0
> Steps to reproduce:
>     After upgraded to TB 38.1.0, all mails of one account displays wrong.
> Actual results:
>    The chinese characters using GB2312 displays wrong, as if the encoding is
> not recognizedas. If I select character encoding manually from
> menu>view>character encoding>chinese simplified(The fallback character
> encoding in folder properties is set to Unicode), it will display OK, but
> only temporary; switch to another message and back, it displays wrong again. 
> 
>    Remarks:
>      1)If the fallback character encoding in folder properties is set to
> chinese simplified,the mail using GB2312 will be displayed correctly,but the
> mail using UTF-8 will be displayed  incorrectly.
>      2)In menu>view>character encoding>Auto-Detect,there are only
> Japanese,Russian and Ukrainian,i can't find chinese simplified,although I am
> using a chinese simplified win7 OS.
> 
> Expected results:
>   It should display correctly as before.