Reply to UTF-16LE encoded message doesn't work (reply to message with UTF-16LE encoded attachment covered in bug 1026989)

RESOLVED FIXED in Thunderbird 53.0

Status

Thunderbird
Message Compose Window
RESOLVED FIXED
3 years ago
4 months ago

People

(Reporter: elacunza, Assigned: Jorg K (GMT+2))

Tracking

({reproducible, testcase})

45 Branch
Thunderbird 53.0
x86
Linux
reproducible, testcase

Thunderbird Tracking Flags

(thunderbird_esr45? affected, thunderbird51 fixed, thunderbird52 fixed, thunderbird53 fixed)

Details

Attachments

(3 attachments, 6 obsolete attachments)

(Reporter)

Description

3 years ago
Created attachment 8362846 [details]
Test mail to reproduce the problem. It is an original mail just with References: and In-reply-to: headers removed.

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:26.0) Gecko/20100101 Firefox/26.0 (Beta/Release)
Build ID: 20131206152524

Steps to reproduce:

Reply to the attached message. I just write on top of the body "test reply".


Actual results:

The recipient receives a blank message.


Expected results:

Recipient should have received my "test reply" text.
(Reporter)

Comment 1

3 years ago
This also happens on Windows, as we originally detected the problem in an Ubuntu<->Windows mail interchange, both sides having Thunderbird.

Comment 2

6 months ago
This happens on Thunderbird 45.4.0 on Ubuntu also.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Version: 24 Branch → 45 Branch

Comment 3

6 months ago
This problem is produced by multiple reasons.

 A. The message body is lost after it is sent (or saved as a draft),
    if "UTF-16LE" is chosen for the text encoding.
 B. A reply message inherits the text encoding of the one of the
    original message (or its text attachment) automatically.

The real problem is the "A". We can confirm that by these steps:

 1. Open the Error Console via Ctrl-Shift-J.
 2. File => New => Message
 3. Fill the message body, like:
    ~~~
    Hello.
    ~~~
 4. Copy this code and paste to the "Code" field in the error console,
    and hit the Enter key.
    Note, you must remove all linebreaks before pasting, because
    Thunderbird automatically replaces all linebreaks to "," and it may
    produce a syntax error.
    ~~~
    var {Services}=Components.utils.import('resource://gre/modules/Services.jsm',{});
    Services.wm.getMostRecentWindow('msgcompose').SetDocumentCharacterSet('UTF-16LE');
    ~~~
 5. " - Unicode (UTF-16LE)" appears in the window title of the message editor.
 6. Ctrl-S to save the message as a draft.
 7. Go to the "Draft" folder and show the saved draft.

Expected result:
"Hello." appears as its body.

Actual result:
The message has a blank body. Moreover, if you edit the draft and save it again, then the updated message body will be lost again.

The only one solution to solve this problem, you need to choose any available message encoding like the "Unicode" (means "UTF-8") from the "Options" menu of the message editor.
Summary: Replying a message with an UTF-16LE content-encoding sends corrupt messages → Message body encoded in UTF-16LE are lost when it is sent or saved

Comment 4

6 months ago
Created attachment 8811675 [details]
UTF-16LE-attachment-file.txt

Another testcase. Steps to reproduce:

 1. Downlaod this attachment file "UTF-16LE-attachment-file.txt" as a file.
 2. Create new message.
 3. Attach the file "UTF-16LE-attachment-file.txt" to the message.
 4. Send it to yourself.
 5. After receive the mail, create new reply for it.
 6. Fill the message body and save it as a draft.
 7. Go to the "Draft" folder and show the message.

Expected result: the draft has its message body.

Actual result: the draft has a blank message body.
Component: Untriaged → Message Compose Window
Keywords: reproducible, testcase

Comment 5

6 months ago
Workaround addon.
https://addons.mozilla.org/thunderbird/addon/edit-message-encoding-fallback/
License: MPL 2.0

This simply inserts following codes into Recipients2CompFields:

~~~
var charsetData = CharsetMenu.getData();
var allSupportedEncodings = charsetData.pinnedCharsets.map(function(aData) { return aData.value })
                              .concat(charsetData.otherCharsets.map(function(aData) { return aData.value }));
if (gMsgCompose.compFields.characterSet &&
    allSupportedEncodings.indexOf(gMsgCompose.compFields.characterSet) < 0) {
  SetDocumentCharacterSet('UTF-8');
}
~~~
(Assignee)

Comment 6

6 months ago
The real problem is that the reply picks up the charset from an attachment. And if the attachment is UTF-16LE encoded, the message will get this too, and then it's downhill form there. This is bug 1026989.

No doubt that your workaround will work, but the correct fix is not to pick up the charset from the attachment.

The reporter originally reported that replying to a message that is UTF-16LE encoded also doesn't work, see attachment 8362846 [details]:
Date: Tue, 21 Jan 2014 09:22:45 +0100
From: Eneko Lacunza <elacunza@binovo.es>
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:24.0) Gecko/20100101 Thunderbird/24.2.0
MIME-Version: 1.0
To: Eneko Lacunza <elacunza@binovo.es>
Subject: Re: IMAGEN EXCHANGE --> PROXMOX
Content-Type: text/plain; charset=UTF-16LE; format=flowed
Content-Transfer-Encoding: 7bit

Firstly, UTF-16LE is not a valid encoding charset, and secondly, 16 bit sure as hell don't fit into 7 but ;-)

However, we could implement something like you suggested, but maybe somewhere else where the charset is first read from the message so that the invalid charset doesn't even get into the comp fields. See also: bug 1297118.
Summary: Message body encoded in UTF-16LE are lost when it is sent or saved → Reply to UTF-16LE encoded message doesn't work (reply to message with UTF-16LE encoded attachment covered in bug 1026989)
(Assignee)

Comment 7

6 months ago
Created attachment 8816968 [details] [diff] [review]
961983-suppress-utf-16.patch (v1)

Please review whoever comes first ;-)

This is really a non-issue. If someone ever receives an UTF-16 encoded message, which I doubt, it will look garbled (see test message). So they will set an appropriate charset before replying.

If they don't or reply to an empty message, we will now stop short UFT-16 from getting anywhere.

Note that the real problem here is in bug 1026989, where the charset is picked up from an attachment. I'm superseding attachment 8811675 [details] here since it belongs in the other bug.
Assignee: nobody → jorgk
Status: NEW → ASSIGNED
Attachment #8816968 - Flags: review?(mkmelin+mozilla)
Attachment #8816968 - Flags: review?(acelists)
(Assignee)

Comment 8

6 months ago
Created attachment 8816969 [details]
Supposedly UTF-16 encoded non-empty test e-mail

Reviewers, you can use this to test ;-)
Attachment #8362846 - Attachment is obsolete: true
Attachment #8811675 - Attachment is obsolete: true
(Reporter)

Comment 9

6 months ago
The original test message wasn't build up, it was a real message between me and a customer. I can go back to the root of the reply-chain to see where all started if that cane be of any use.
(Assignee)

Comment 10

6 months ago
The original message you had attached in attachment 8362846 [details] had an empty body. I've just added a line of content. We don't need any more input from you, but thank you for the offer. Messages in UTF-16 will be forced to UTF-8 in a reply, regardless of whether the body of the original message was empty.

Comment 11

6 months ago
Can you please enlighten me why is utf-16 bad? Can we display such messages? Why can't we send them out? OK, I can understand we do not support the encoding internally for editing/composing. Is there a better way to check if a charset is supported/invalid than just harcoding the name?
(Assignee)

Updated

6 months ago
Duplicate of this bug: 1028531
(Assignee)

Updated

6 months ago
Duplicate of this bug: 879753
(Assignee)

Comment 14

6 months ago
UTF-16 is bad. We can't display such messages, we can't send them out. As you know, UTF-16 is used internally and apparently for attachment, but it's not a valid e-mail encoding. See bug 1028531 comment #1 for an authoritative answer. Also see: https://mzl.la/2h7osOh

Also: https://en.wikipedia.org/wiki/Unicode_and_email#Unicode_support_in_message_bodies

This is a first step if stopping UTF-16 before it goes any further. As I said in the patch, this is a crude fix to stop UTF-16. The special role it has warrants hard-coding this.

Comment 15

6 months ago
I see no mention about avoiding UTF-16 on the wiki page.

Anyway, can't we do the filtering of invalid charsets inside nsMsgCompFields::SetCharacterSet() ? Or if the caller wants to know whether SetCharacterSet accepted or changed the charset, he can call SetCharacterSet() to see what charset was really stored.
(Assignee)

Comment 16

6 months ago
(In reply to :aceman from comment #15)
> Anyway, can't we do the filtering of invalid charsets inside
> nsMsgCompFields::SetCharacterSet() ?
I don't think so. The function and his brothers are purely for storing, not checking:

NS_IMETHODIMP nsMsgCompFields::SetCharacterSet(const char *value)
{
  return SetAsciiHeader(MSG_CHARACTER_SET_HEADER_ID, value);
}

I think I picked the right spot since then we call
  m_editor->SetDocumentCharacterSet(msgCharSet);
a few lines later
  rv = childCV->SetForceCharacterSet(msgCharSet);
Comment on attachment 8816968 [details] [diff] [review]
961983-suppress-utf-16.patch (v1)

Review of attachment 8816968 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/compose/src/nsMsgCompose.cpp
@@ +1621,5 @@
>    // Set the charset
> +  nsAutoCString msgCharSet(m_compFields->GetCharacterSet());
> +
> +  // Don't accept UTF-16 ever.
> +  if (Substring(msgCharSet, 0, 6).LowerCaseEqualsLiteral("utf-16")) {

The Substring operation seems memory-unsafe if msgCharSet is shorter than 6.

It's sad that c-c code can't count on encoding names being canonical. If it could, you could just check for the string starting with "UTF-16" (upper case).
(Assignee)

Comment 18

6 months ago
Thanks for the comment!

(In reply to Henri Sivonen (:hsivonen) from comment #17)
> The Substring operation seems memory-unsafe if msgCharSet is shorter than 6.
Really? That function doesn't check the length of it's input value? Well, in this case I can fix it.

> It's sad that c-c code can't count on encoding names being canonical. If it
> could, you could just check for the string starting with "UTF-16" (upper
> case).
One day we'll look into it in bug 1297118,
(Assignee)

Comment 19

6 months ago
Comment on attachment 8816968 [details] [diff] [review]
961983-suppress-utf-16.patch (v1)

I'll try a different solution inspired by Henri, that is, switching to canonical names at that stage (implementing a bit of bug 1297118). Stand by.
Attachment #8816968 - Flags: review?(mkmelin+mozilla)
Attachment #8816968 - Flags: review?(acelists)
(Assignee)

Comment 20

6 months ago
Created attachment 8817379 [details] [diff] [review]
961983-suppress-utf-16.patch (v2)

OK, how about this then?

I noticed that UTF-7 and even x-mac-croatian seem to be valid canonical names, however, forcing the charset to those fails.

There doesn't seem to be string function to test "starts with", so I'm using good old strncmp() which should be safe even if the input is shorter than six characters.
Attachment #8816968 - Attachment is obsolete: true
Attachment #8817379 - Flags: review?(mkmelin+mozilla)
Attachment #8817379 - Flags: feedback?(hsivonen)
(Assignee)

Comment 21

6 months ago
Comment on attachment 8817379 [details] [diff] [review]
961983-suppress-utf-16.patch (v2)

Review of attachment 8817379 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/compose/src/nsMsgCompose.cpp
@@ +1627,5 @@
> +  nsAutoCString msgCharSet;
> +  rv = ccm->GetCharsetAlias(m_compFields->GetCharacterSet(), msgCharSet);
> +
> +  // Don't accept UTF-16 ever.
> +  if (NS_FAILED(rv) || strncmp(msgCharSet.get(), "UTF-16", 6) == 0) {

I got a tip from Aceman: I could change this to:
StringBeginsWith(msgCharSet, NS_LITERAL_CSTRING("UTF-16"))

Any preference?
(Assignee)

Comment 22

6 months ago
Created attachment 8817390 [details] [diff] [review]
961983-suppress-utf-16.patch (v2b).

OK, let's go with the fancy string function ;-)
Attachment #8817379 - Attachment is obsolete: true
Attachment #8817379 - Flags: review?(mkmelin+mozilla)
Attachment #8817379 - Flags: feedback?(hsivonen)
Attachment #8817390 - Flags: review?(mkmelin+mozilla)
Attachment #8817390 - Flags: feedback?(hsivonen)

Comment 23

6 months ago
Comment on attachment 8817390 [details] [diff] [review]
961983-suppress-utf-16.patch (v2b).

Review of attachment 8817390 [details] [diff] [review]:
-----------------------------------------------------------------

While the code looks ok to me, I don't see a change in behavior compared to trunk, testing with the attached test case. Both display the content as some chinese(?) characters, and that's also included in the reply for both cases.
(Assignee)

Comment 24

6 months ago
That's not the point. The point is that the title shows UTF-16 without the patch. So if you compose based on that, you'll mess up when you save or send, see comment #0.

The test message is of course wrong, since you can't squeeze 16 bit into CTE 7bit. UTF-16 only makes sense for base64-encoded attachments which are the original cause of this bug. And that won't happen any more after bug 1026989 is fixed (only awaiting review on the test).

BTW, the original test case here had an empty body. Let me see whether I can create a better test case.
(Assignee)

Comment 25

6 months ago
I think I can't come up with a better test case, the Chinese characters are unavoidable. Not only does composing in UTF-16 not work, which I'm stopping here, also UTF-16 detection doesn't work in general, take your pick from this list: https://mzl.la/2h7osOh, maybe bug 244829.

After bug 1026989 fixes picking up UTF-16 from the attachment, this bug here could be a WONTFIX, but as the reporter assures in comment #9, this was seen in the wild.

I started looking at this bug here as a fix for bug 1026989, but it turned out that now I'll be fixing both.
(Assignee)

Comment 26

6 months ago
Created attachment 8817681 [details]
Test e-mail in UTF-16

OK, here's a decent test e-mail. Use current trunk to reply and send (at least to outbox), then try with the patch. 100% difference ;-)
Attachment #8816969 - Attachment is obsolete: true
(Assignee)

Updated

6 months ago
Attachment #8817681 - Attachment mime type: message/rfc822 → text/plain

Comment 27

6 months ago
Would it be possible to add a simple test into the test file added in bug 1026989 ?
(Assignee)

Comment 28

6 months ago
Sure. Reply to this test message here and get UTF-8. I can do that.
(Assignee)

Comment 29

6 months ago
Created attachment 8817713 [details] [diff] [review]
961983-utf-16-test.patch (v1)

Assuming that bug 1026989 will land first, here's a patch that adds a test to the new tests created over there.
Attachment #8817713 - Flags: review?(mkmelin+mozilla)
Attachment #8817713 - Flags: feedback?(acelists)
(Assignee)

Comment 30

6 months ago
Created attachment 8817790 [details] [diff] [review]
961983-utf-16-test.patch (v1b).

Refreshed for changes in bug 1026989. Try for both bugs here:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=82ea022b7dc80f4c91c7c2f654083a52a2108a26

Let's get the charsets right again!
Attachment #8817713 - Attachment is obsolete: true
Attachment #8817713 - Flags: review?(mkmelin+mozilla)
Attachment #8817713 - Flags: feedback?(acelists)
Attachment #8817790 - Flags: review?(mkmelin+mozilla)
Attachment #8817790 - Flags: feedback?(acelists)

Comment 31

6 months ago
Comment on attachment 8817681 [details]
Test e-mail in UTF-16

Sorry but even with this I don't see any change from trunk. Both seem to work, and there's no change in title. I even sent it an received it (as utf-8) with trunk. I suppose that should happen too as we normally auto-upgrade to utf-8 if conversion fails. 

This was all tested on linux. Maybe a platform difference?

Comment 32

6 months ago
Comment on attachment 8817790 [details] [diff] [review]
961983-utf-16-test.patch (v1b).

Review of attachment 8817790 [details] [diff] [review]:
-----------------------------------------------------------------

::: mail/test/mozmill/composition/body-utf16.eml
@@ +1,1 @@
> +From: test <test@test.com>

test.com is a real domain, not for testing. You can use example.com (which is real but for testing!)
(Assignee)

Comment 33

6 months ago
(In reply to Magnus Melin from comment #31)
> This was all tested on linux. Maybe a platform difference?

It seems that way. I adjusted the test and now it works everywhere.
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=82ea022b7dc80f4c91c7c2f654083a52a2108a26
Check the parts for bug 1026989.

Looks like bug 1026989 is ready to go, just received r+. I'll s/test.com/example.com/
(Assignee)

Comment 34

6 months ago
Try storing the message in a folder before replying.

Comment 35

6 months ago
Comment on attachment 8817390 [details] [diff] [review]
961983-suppress-utf-16.patch (v2b).

Review of attachment 8817390 [details] [diff] [review]:
-----------------------------------------------------------------

Yep, replying from folder reproduced it. r=mkmelin

::: mailnews/compose/src/nsMsgCompose.cpp
@@ +1626,5 @@
> +
> +  nsAutoCString msgCharSet;
> +  rv = ccm->GetCharsetAlias(m_compFields->GetCharacterSet(), msgCharSet);
> +
> +  // Don't accept UTF-16 ever.

Maybe also add the comment from https://bugzilla.mozilla.org/show_bug.cgi?id=1028531#c1 inline here as explanation.
Attachment #8817390 - Flags: review?(mkmelin+mozilla) → review+

Comment 36

6 months ago
Comment on attachment 8817790 [details] [diff] [review]
961983-utf-16-test.patch (v1b).

Review of attachment 8817790 [details] [diff] [review]:
-----------------------------------------------------------------

Not 100% proof as replying from file didn't reproduce on linux. But I suppose other platforms would catch a regression. You could add a comment to that affect, so it's easier to understand if we do get a windows only failure in the future. r=mkmelin
Attachment #8817790 - Flags: review?(mkmelin+mozilla) → review+
(Assignee)

Comment 37

6 months ago
https://hg.mozilla.org/comm-central/rev/61ad407448cdc69faf2adb348e9559ef048b19fe
https://hg.mozilla.org/comm-central/rev/b26d08edfa2ac520277d576cbe9fd5c3c0bdf53d

Code landed with extra comment as requested by Magnus in comment #35.
Test landed with extra comment as requested by Aceman in bug 1026989 where this test was first created. Test also landed with s/test.com/example.com/.
Status: ASSIGNED → RESOLVED
Last Resolved: 6 months ago
status-thunderbird51: --- → affected
status-thunderbird52: --- → affected
status-thunderbird53: --- → fixed
status-thunderbird_esr45: --- → affected
tracking-thunderbird_esr45: --- → ?
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 53.0
(Assignee)

Comment 38

6 months ago
Comment on attachment 8817390 [details] [diff] [review]
961983-suppress-utf-16.patch (v2b).

[Approval Request Comment]
Regression caused by (bug #): That seems to have been wrong since day one.
User impact if declined: Certain messages get encoded as UTF-16 and since that doesn't work, the content is lost.
Testing completed (on c-c, etc.): Manual and via test.
Risk to taking this patch (and alternatives if risky):
Not risky, just forcing the charset from UTF-16 to UTF-8.
Attachment #8817390 - Flags: feedback?(hsivonen)
Attachment #8817390 - Flags: approval-comm-esr45?
Attachment #8817390 - Flags: approval-comm-beta?
Attachment #8817390 - Flags: approval-comm-aurora+
(Assignee)

Comment 39

6 months ago
Comment on attachment 8817790 [details] [diff] [review]
961983-utf-16-test.patch (v1b).

[Approval Request Comment]
See preceding comment. This is the test.
Attachment #8817790 - Flags: feedback?(acelists)
Attachment #8817790 - Flags: approval-comm-esr45?
Attachment #8817790 - Flags: approval-comm-beta?
Attachment #8817790 - Flags: approval-comm-aurora+
(Assignee)

Comment 40

6 months ago
(In reply to Magnus Melin from comment #36)
> Not 100% proof as replying from file didn't reproduce on linux.
Is there a misunderstanding here? The test is 100% proof since we reply from a folder which fails on all platforms:
https://hg.mozilla.org/comm-central/rev/7a69461597e94a4a25157f0a8e17d1ec53df90b9#l2.44
(Assignee)

Comment 41

6 months ago
(In reply to Magnus Melin from comment #32)
> test.com is a real domain, not for testing. You can use example.com (which
> is real but for testing!)
We had test.com in 11 files, I took the liberty to fix this:
https://hg.mozilla.org/comm-central/rev/c6776de4cd739ada20d6cbf52e70ec2ffda1e5a2

Comment 42

6 months ago
Ah right, I see that now.
(Assignee)

Comment 43

6 months ago
Aurora (TB 52):
https://hg.mozilla.org/releases/comm-aurora/rev/21abbc5bf3d5e4d974351d78a15702ca1f8d6483
https://hg.mozilla.org/releases/comm-aurora/rev/ef14191fceffb4924575ecf9d2204d4cb2526158
https://hg.mozilla.org/releases/comm-aurora/rev/d83138350ebc309f3e98968484fccd2c5621c1c8
status-thunderbird52: affected → fixed
(Assignee)

Comment 44

6 months ago
Beta (TB 51):
https://hg.mozilla.org/releases/comm-beta/rev/cfba3a995fdc65700adcf76e5f427568bf01efe0
https://hg.mozilla.org/releases/comm-beta/rev/87e52188377bf2fd7256d052d011aabd5ba26cce
https://hg.mozilla.org/releases/comm-beta/rev/733ddfcd05ee691109f7b28e40747546ca5d1f57
status-thunderbird51: affected → fixed
(Assignee)

Updated

6 months ago
Attachment #8817390 - Flags: approval-comm-beta? → approval-comm-beta+
(Assignee)

Updated

6 months ago
Attachment #8817790 - Flags: approval-comm-beta? → approval-comm-beta+
Blocks: 1026989
Blocks: 1323377
No longer blocks: 1323377

Comment 45

4 months ago
There are too many changes in nsMsgCompose::InitEditor to make this an easy uplift, so I am going to skip this for 45.7.0 If you want consideration for 45.8.0 please submit a patch with adaptation for esr45

Comment 46

4 months ago
Comment on attachment 8817390 [details] [diff] [review]
961983-suppress-utf-16.patch (v2b).

Just to be clear, what I man by "too many changes" is not this patch, but other patches that have caused drift of the code.
Attachment #8817390 - Flags: approval-comm-esr45? → approval-comm-esr45-

Updated

4 months ago
Attachment #8817790 - Flags: approval-comm-esr45? → approval-comm-esr45-
(Assignee)

Comment 47

4 months ago
You got the apply order wrong. You need to apply bug 1026989 first.
You need to log in before you can comment on or make changes to this bug.