cp932 encoded messages display as garbage again

RESOLVED FIXED in Thunderbird 65.0

Status

defect
RESOLVED FIXED
9 months ago
8 months ago

People

(Reporter: ku, Assigned: jorgk)

Tracking

({regression})

Thunderbird 65.0
Dependency tree / graph

Thunderbird Tracking Flags

(thunderbird_esr52 unaffected, thunderbird_esr6064+ fixed, thunderbird64 wontfix, thunderbird65 fixed)

Details

Attachments

(4 attachments, 3 obsolete attachments)

Posted file cp932.eml (obsolete) —
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36

Steps to reproduce:

Read a Japanese text mail which is created on older iOS 10.3.2 or such.

This is similar problem that is previously reported:
https://bugzilla.mozilla.org/show_bug.cgi?id=542823



Actual results:

Display is messed up like:

---
�����ɖ{�������܂��B

iPhone���瑗�M
---


Expected results:

This is good display:

---
ここに本文がきます。

iPhoneから送信
---
Attachment #9029394 - Attachment is obsolete: true
Regression window:
https://hg.mozilla.org/comm-central/pushloghtml?fromchange=36e37076b1b36bb009b6f64a36e77a74e2ee1387&tochange=8af60715cdafb3bff5ee7ef2522e73ea88a2489f
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=91134c95d68cbcfe984211fa3cbd28d610361ef1&tochange=b266a8d8fd595b84a7d6218d7b8c6b7af0b5027c

Suspect: 	f558febc1ead	Jorg K — Bug 1363281 - Port bug 1261841 |Replace uconv with encoding_rs| to mailnews. rs=mkmelin, f=hsivonen
Component: Untriaged → Internationalization
Product: Thunderbird → MailNews Core
Blocks: 1363281
@Jorg K,
The problem seems to be old regression since 56. 
Can you please look into this?
Flags: needinfo?(jorgk)
Attachment #9029395 - Attachment mime type: message/rfc822 → text/plain
Flags: needinfo?(jorgk)
Henri, what's the story with cp932? Sounds like the "code page 932", so the Windows native encoding in Japan, a variation on Shift JIS. When I edit the message from
Content-Type: text/plain; charset=cp932
to
Content-Type: text/plain; charset=shift_jis
the message displays fine. Is cp932 no longer supported?
Flags: needinfo?(hsivonen)
cp932 is not an Encoding Standard-supported label. The encoding itself is supported.
https://encoding.spec.whatwg.org/#ref-for-shift_jis%E2%91%A0
Flags: needinfo?(hsivonen) → needinfo?(annevk)
annevk, considering that this is a regression, how did the cp932 label not get included in the standard based on Gecko previously knowing about the label?
Fwiw,
"View" > "Text Encoding" > "●Japanese(Shift_JIS)" temporarily fixes. However, the text will be corrupted again when re-viewing the message.

In this case, I think that "Auto-detect" is not working correctly.
Hi I heard about a workaround using "charsetalias.properties" file upon my googling about this problem. e.g. http://akisame.jp/2009/06/07/cp932-thunderbird

Is it still functional?

(However the topics about charsetalias.properties are 10 years old (2009 or such). It is not recent topic.)

It says we should have this file:
- C:/Program Files/Mozilla Thunderbird/res/charsetalias.properties

Contents:

---
#
# Aliases for Shift_JIS
#
x-sjis=Shift_JIS
shift-jis=Shift_JIS
ms_kanji=Shift_JIS
csshiftjis=Shift_JIS
windows-31j=Shift_JIS
---

Append the following line should solve the problem.

---
cp932=Shift_JIS
---

I'm not sure about this. just fyi.
(In reply to KU from comment #8)
> Hi I heard about a workaround using "charsetalias.properties" file upon my
> googling about this problem. e.g.
> http://akisame.jp/2009/06/07/cp932-thunderbird
> 
> Is it still functional?

No, charsetalias.properties is in the tree and, indeed, contains

# Aliases for Shift_JIS
cp932=Shift_JIS

...but apparently is no longer used for message decoding purpose if it's not taking effect here.

That line came from bug 542823.
Indeed, it's here:
https://searchfox.org/comm-central/rev/e29272bfae98b4d91ba368f7e1d393ab71722cee/mailnews/intl/charsetalias.properties#53

More recently, we also added cp936=gbk here:
https://hg.mozilla.org/comm-central/rev/19694424a486#l1.12(In reply to Henri Sivonen (:hsivonen) from comment #9)

(In reply to Henri Sivonen (:hsivonen) from comment #9)
> ...but apparently is no longer used for message decoding purpose if it's not
> taking effect here.
Right. We still have some additional alias code in nsCharsetAlias.cpp
https://searchfox.org/comm-central/source/mailnews/intl/nsCharsetAlias.cpp
but the calls to nsCharsetAlias::GetPreferred() got removed here :-(
https://hg.mozilla.org/comm-central/rev/f558febc1ead57c76ba5eb5af4443a67680f7fa4#l12.97
in bug 1363281.

I'll see how to put this back.
Assignee: nobody → jorgk
Status: NEW → ASSIGNED
I don't recall. Per https://wiki.whatwg.org/wiki/Web_Encodings Firefox did not always support it at least and Internet Explorer didn't either. Unfortunately the Chrome/Safari/ICU situation at the time wasn't documented.

I think the rationale used to whether to include labels was mostly whether the implementations with majority market share had them, since if they didn't, the expectation was that sites using the label were doing so erroneously and relied on a default kicking in.
Flags: needinfo?(annevk)
This should do it. The message displays fine now. Try:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=dd9ac88e7f8c307abb1035571af0635f18d6750e

We should add a test for cp932 and perhaps cp936=gbk.
Attachment #9029557 - Flags: review?(mkmelin+mozilla)
Comment on attachment 9029557 [details] [diff] [review]
1511950-GetCharsetAlias.patch (v1)

Review of attachment 9029557 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/base/util/nsMsgI18N.cpp
@@ +88,5 @@
> +    do_GetService(NS_CHARSETCONVERTERMANAGER_CONTRACTID, &rv);
> +  NS_ENSURE_SUCCESS(rv, rv);
> +
> +  nsCString newCharset;
> +  rv = ccm->GetCharsetAlias(PromiseFlatCString(aCharset).get(), newCharset);

should check rv

::: mailnews/mime/src/mimemoz2.cpp
@@ +816,5 @@
> +    do_GetService(NS_CHARSETCONVERTERMANAGER_CONTRACTID, &rv);
> +  NS_ENSURE_SUCCESS(rv, -1);
> +
> +  nsCString newCharset;
> +  rv = ccm->GetCharsetAlias(input_charset, newCharset);

and here
Attachment #9029557 - Flags: review?(mkmelin+mozilla) → review+
Added check of return value. I'm still thinking about where to best add a test.
Attachment #9029557 - Attachment is obsolete: true
Attachment #9029575 - Flags: review+
Keywords: leave-open
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/7fd582467790
Add missing calls to GetCharsetAlias() to take Thunderbird's aliases into account. r=mkmelin
I'm noticed that these subjects work:
Subject: =?shift_jis?B?grGCsYLJlnuVtoKqgquC3IK3gUI=?=
Subject: =?shift_jis?Q?=82=b1=82=b1=82=c9=96=7b=95=b6=82=aa=82=ab=82=dc=82=b7=81=42?=
giving ここに本文がきます。 (Here comes the text.) but using cp932 it doesn't work :-(

So there is more code missing.
With this patch, these tests pass:
mach xpcshell-test comm/mailnews/mime/test/unit/
mach xpcshell-test comm/mailnews/mime/jsmime/test/
mach xpcshell-test comm/mailnews/intl/test/unit/test_decode_utf-7_internal.js

It also undoes the damage done here:
https://hg.mozilla.org/comm-central/rev/f622e6a7a742#l6.12
https://hg.mozilla.org/comm-central/rev/f622e6a7a742#l7.12

I still need to add a test for decoding cp932 in bodies which isn't covered anywhere yet. Still thinking where to best add it.

With all the tests passing locally this try should also be successful, however, after the last merge, the state of the tree is still unknown, might be busted.
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=7bd72a1a4538276d04680d3da3c2e758785eee23
Attachment #9029592 - Flags: review?(mkmelin+mozilla)
Attachment #9029584 - Attachment is obsolete: true
Try run is good, so here an additional test. Copied from test-base64-display.js.
Attachment #9029619 - Flags: review?(mkmelin+mozilla)
Attachment #9029575 - Attachment description: 1511950-GetCharsetAlias.patch (v1b) → 1511950-GetCharsetAlias.patch (v1b) [landed in comment #15]
Attachment #9029592 - Flags: review?(mkmelin+mozilla) → review+
Comment on attachment 9029619 [details] [diff] [review]
1511950-test.patch (v1)

Review of attachment 9029619 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM, r=mkmelin
Attachment #9029619 - Flags: review?(mkmelin+mozilla) → review+
Keywords: leave-open
Attachment #9029575 - Flags: approval-comm-esr60+
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/060cb49e2b68
Add GetCharsetAlias() call to JS Mime. r=mkmelin
https://hg.mozilla.org/comm-central/rev/f4b9e739d180
Add MozMill test for message display in cp932. r=mkmelin
Status: ASSIGNED → RESOLVED
Closed: 9 months ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 65.0
The patch says:
> +  // The following will throw if the charset is unknown.
> +  let newCharset = manager.getCharsetAlias(charset);

It seems that charsetalias.properties does not cover even nearly all the labels from the Encoding Standard. What mechanism is there to ensure that this change didn't remove support for most encoding labels from Thunderbird? That is, where's the code to delegate to mozilla::Encoding::ForLabel() for stuff that isn't covered in charsetalias.properties?
(In reply to Jorg K (GMT+1) from comment #22)
> https://searchfox.org/comm-central/rev/
> 16003487aabb9202fe21e4fc43777478d0dd916e/mailnews/intl/nsCharsetAlias.cpp#26

Thanks. Somehow, I managed to miss that on searchfox.
BTW, this fixed another bug where unsupported encodings where shipped off to the UTF-7 decoder:
Error: UTF7TextDecoder: This code should never be reached for x-mac-croatian

Now we get instead:
RangeError: The given encoding 'x-mac-croatian' is not supported.
https://hg.mozilla.org/releases/comm-esr60/rev/37dd76e3e5007e399ed6daca1f2408d129dff15a
Follow-up: Adapt test test-cp932-display.js to ESR 60. r+a=jorgk
Thank you!
I have confirmed that Thunderbird 60.4.0 can display cp932 mail!
And we've added tests so it won't break again :-) - Sorry 'bout the inconvenience.
You need to log in before you can comment on or make changes to this bug.