Closed Bug 275023 Opened 20 years ago Closed 8 years ago

Unicode UTF-16 encoded messages time out when sent

Categories

(MailNews Core :: Networking: SMTP, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1026989

People

(Reporter: bugzilla, Unassigned)

Details

(Keywords: intl, Whiteboard: [needs retest v3])

Attachments

(1 file)

User-Agent:       Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.7.3) Gecko/20040910
Build Identifier: 

Thunderbird version 1.0 (20041206); Mac OS X 10.3.5

 Messages composed in Unicode UTF-16 character encoding cannot be sent due to an
SMTP server timeout. Reproduced on two separate servers running different software.

 Possibly related to, or shares the same cause as Bug 275021.

Reproducible: Always

Steps to Reproduce:
1. Compose a new message (File -> New -> Message)
2. If it isn't already there, add Unicode UTF-16 encoding to the Character
Encoding menu (Options -> Character Encoding -> Customize...)
3. Select Unicode encoding (Options -> Character Encoding -> Unicode (UTF-16))
4. Enter a recipient (e.g. your own email address)
5. Enter a subject (e.g. "UTF-16")
6. [mandatory] Enter text containing accented characters into the message body
(e.g. option-u a, option-s, option-d, option-f)
7. Send the message (File -> Send Now)

Actual Results:  
Thunderbird connects and displays the sending dialog "Status: Delivering Mail"
but then times out with message 
"Sending of message failed.
The message could not be sent because connecting to SMTP server my.smtp.server
failed. The server may be unavailable or is refusing SMTP connections. Please
verify that your SMTP server setting is correct and try again, or else contact
your network administrator."


Expected Results:  
The message should be sent successfully.

Note that UTF-16 encoded messages containing only normal ASCII characters will
send correctly and (apparently) be received correctly, but will display Chinese
characters when opened (see Bug 275021).

 This problem does not affect UTF-8 encoding: the message can be changed to
UTF-8, sent, received and displayed without problems.
UTF-16 is not MIME text/*-friendly. It should probably not be allowed on send
(UTF-8 is the way to go).
Base64-encoded UTF-16 is MIME-friendly enough - see RFC 1641
But what's the point of sending base64-encoded UTF-16 when the size of straight
 UTF-8 is in the same ballpark for CJK and less if there are more ASCII
characters? Why support the proliferation of encodings instead of convergence to
UTF-8?
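The size argument above can be checked with a rough sketch. It assumes BMP-only CJK text and ignores MIME headers and the line breaks that a base64 body needs every 76 characters; the helper names and sample strings are made up for illustration.

```python
import base64


def utf8_size(text: str) -> int:
    """Bytes needed to send the text as raw UTF-8."""
    return len(text.encode("utf-8"))


def base64_utf16_size(text: str) -> int:
    """Bytes needed to send the text as base64-encoded UTF-16LE (no BOM)."""
    return len(base64.b64encode(text.encode("utf-16-le")))


# Illustrative samples only, not taken from the bug report.
cjk = "日本語のテキスト" * 30         # 240 BMP characters, 3 bytes each in UTF-8
ascii_text = "plain ASCII text" * 30  # 480 ASCII characters

for label, sample in (("CJK", cjk), ("ASCII", ascii_text)):
    print(label, utf8_size(sample), base64_utf16_size(sample))
```

For BMP CJK text the two sizes land close together (3 bytes per character against roughly 2.67), while for ASCII text the base64 UTF-16 form is about 2.67 times larger than UTF-8.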
RFC 1641 is experimental and obviously out of date. It talks about UTF-7 and
UNICODE-major-minor-variant--not about UTF-8 vs. UTF-16.
UTF-16 MUST be blocked. I agree with Henri that using UTF-16 for SMTP/MIME is
really a bad idea (if not downright illegal), which is why I filed bug 236882.
As such, this should be made invalid, IMO.



Keywords: intl
OS: MacOS X → All
Hardware: Macintosh → All
It's not illegal, RFC2781 http://www.ietf.org/rfc/rfc2781.txt?number=2781.

Using v1.0.6 (20050716) gives several different problems:

Using Format->Autodetect
Set char encoding = UTF16
Write a message with just 7 bit data

Message is sent with significant data as follows:
-----------------------
Content-Type: text/plain; charset=UTF-16; format=flowed
Content-Transfer-Encoding: 8bit

ÿ_!<ODTCPY EthlmP BUIL C-"//3W/CD/DTH MT L.410T arsntioian/lE/"N
><
thlm
><
ehda
> .
-----------------------

However, the session times out because the final . is not on its own line, so
the receiving SMTP server waits until timeout.
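The end-of-data condition described above can be sketched as a simple check. SMTP ends the DATA phase with a dot alone on its own line (CRLF, dot, CRLF); the function name and the sample byte strings below are hypothetical and only approximate the capture.

```python
SMTP_END_OF_DATA = b"\r\n.\r\n"  # <CRLF>.<CRLF>, the SMTP end-of-data marker


def ends_data_correctly(wire_bytes: bytes) -> bool:
    """True if the bytes sent after DATA finish with a dot alone on its line."""
    return wire_bytes.endswith(SMTP_END_OF_DATA)


# Hypothetical tail resembling the capture: the dot shares its line with other bytes.
broken_tail = b"ehda\r\n> .\r\n"
# Hypothetical well-formed tail: the server can now answer.
good_tail = b"</html>\r\n.\r\n"

print(ends_data_correctly(broken_tail), ends_data_correctly(good_tail))
```

A server that never sees the marker keeps waiting for more message data, which matches the timeout seen here.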

There are several bugs here:

1. It is sending html only, not multipart/alternative
2. The data is not UTF16, it's just ASCII with the bytes of each 16-bit word reversed
3. It does not terminate the SMTP session correctly
4. The message text is completely missing
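Point 2 can be reproduced with a small sketch: swapping every pair of adjacent bytes in the plain ASCII DOCTYPE line yields the same text as the first line of the capture (after its two leading bytes). The function name is made up for illustration, and the sketch says nothing about where in Thunderbird the swap actually happens.

```python
def swap_byte_pairs(data: bytes) -> bytes:
    """Swap each pair of adjacent bytes, as a 16-bit endianness flip would."""
    swapped = bytearray(data)
    # Only full pairs are swapped; a trailing odd byte is left alone.
    for i in range(0, len(swapped) - 1, 2):
        swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
    return bytes(swapped)


original = b'<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"'
print(swap_byte_pairs(original).decode("ascii"))
```

Applying the swap a second time restores the original line, which is consistent with the data being ASCII that was treated as 16-bit units of the wrong byte order rather than real UTF-16.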

If the message is marked as UTF16 LE and format plain text only, then it sends
the following

------
Content-Type: text/plain; charset=UTF-16LE; format=flowed
Content-Transfer-Encoding: 7bit

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
  .
------
again, note the final . is wrong.  

(In reply to comment #7)
> It's not illegal, RFC2781 http://www.ietf.org/rfc/rfc2781.txt?number=2781.

 That RFC is merely about UTF-16(LE/BE) and doesn't say anything about the
suitability (and legality) of  using UTF-16 in RFC (2)822 messages.  
This bug is still present in Thunderbird version 1.5.0.5 (20060719).
Also present in the latest trunk, version 3 alpha 1 20060912; Mac OS 10.3.9.
QA Contact: general
Reporter, does the issue still occur with the latest supported 2.0.0.x / Shredder trunk nightlies?

(1.5.0.x is now end-of-life and the latest supported Thunderbird version 2 is 2.0.0.16)
Whiteboard: closeme 2008-08-28
 Still present in 2.0.0.16; unable to check with Shredder due to the change to the certificate domain name mismatch handling around 9 Nov. 2007 which removed the option to continue when a mismatch is encountered.

 Gary, perhaps you could check this for yourself (the steps are rather simple) and confirm the bug if you can reproduce it?
Whiteboard: closeme 2008-08-28
Assignee: mscott → nobody
(In reply to comment #12)
>  Gary, perhaps you could check this for yourself (the steps are rather simple)
> and confirm the bug if you can reproduce it?

Gary, do you have someone who can test this?
Component: General → Networking: SMTP
Product: Thunderbird → MailNews Core
QA Contact: general → networking.smtp
Whiteboard: [needs retest v3]
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.1.9) Gecko/20100505 Shredder/3.0.5pre

I just confirmed that this bug is reproducible as described, selecting "Unicode (UTF-16)" as the Character Encoding.  Looking at strace of my smtpd process, TB connects, sends EHLO and MAIL FROM and RCPT TO and DATA, then hangs.
Status: UNCONFIRMED → NEW
Ever confirmed: true
While Henri and Jungshik might be wise to suggest that sending raw UTF-16/32 will encounter interoperability issues with MUAs and/or MTAs that can't handle it, and that such messages should therefore probably be sent as MIME parts, either base64 or quoted-printable encoded down to ASCII, or as UTF-8 ... RFC 2822, Section 1.1, clearly states:

"Note: This standard is not intended to dictate the internal formats used by sites, the specific message system features that they are expected to support, or any of the characteristics of user interface programs that create or read messages. In addition, this standard does not specify an encoding of the characters for either transport or storage; that is, it does not specify the number of bits used or how those bits are specifically transferred over the wire or stored on disk."

Regardless, the bug here is that TB is not performing character conversion correctly to UTF-16.  I suspect our specific test case above is resulting in an embedded NULL when converting to UTF-16, and the output of the code that's writing the "DATA\r\n<encoded message>\r\n.\r\n" out to the SMTP session is getting truncated at the first NULL, resulting in the hang.  In other words: TB writes out the buffer, and expects the "250 OK" or error response.  Since it never sent the full buffer with the trailing "\r\n.\r\n" to end the DATA section, it is waiting for a response from the SMTP server that will never be sent ... until the SMTP server times out the connection and disconnects and/or TB times out and disconnects, if at all.

I suspect this is a problem really in the networking code and/or the code that is sending the SMTP commands to it, expecting/relying on NULL-terminated strings.
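The suspected failure mode can be illustrated with a minimal sketch. It only models the hypothesis above (a buffer handled as a NUL-terminated string); it is not Thunderbird's code, and the helper name and sample text are assumptions.

```python
def c_string_view(buffer: bytes) -> bytes:
    """Return what NUL-terminated string handling would see: bytes before the first NUL."""
    return buffer.split(b"\x00", 1)[0]


# Hypothetical body: in UTF-16LE every Latin-1 character carries a 0x00 high byte.
body = "Grüße".encode("utf-16-le")
payload = body + b"\r\n.\r\n"  # body followed by the SMTP end-of-data marker

sent = c_string_view(payload)
print(len(payload), len(sent), sent.endswith(b"\r\n.\r\n"))
```

Under this assumption the write stops at the first NUL, so the end-of-data marker never reaches the server and both sides wait until one of them times out.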
I've got an in-house bug report about this, so I would like to ask: Is there a fix in sight?
If the attached message is replied to, then thunderbird automatically sets the encoding of the reply to utf-16.  This appears to be caused by the UTF-16 text file attachment; the actual body of the mail is encoded in ISO-8859-1.

This then triggers the behaviour (timeout or mangled message) described in the bug above.  (This behaviour is 100% reproducible using Thunderbird 17.0.3).
Isn't it about time for Thunderbird to send all email in UTF-8 and avoid all the hassle arising from trying to vary the outgoing encoding?
This behaviour is 100% reproducible using Thunderbird 31.1.2

(In reply to Mark Weaver from comment #17)
> Created attachment 722914 [details]
> A message which when replied to sets the encoding of the reply to utf-16
> 
> If the attached message is replied to, then thunderbird automatically sets
> the encoding of the reply to utf-16.  This appears to be caused by the
> UTF-16 text file attachment; the actual body of the mail is encoded in
> ISO-8859-1.
> 
> This then triggers the behaviour (timeout or mangled message) described in
> the bug above.  (This behaviour is 100% reproducible using Thunderbird
> 17.0.3).
Same here with User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0

Replying to an email with an attachment like follows:

Content-Type: text/plain; charset=UTF-16LE;
 name="xxxx"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
[...]

If I try to store the reply as a draft message, the draft is empty or just garbage is displayed (source code contains readable text) regardless of storing it locally or on an IMAP server. When sending this mail, Thunderbird either sits there doing nothing (except showing the sending mail dialog) or it sends an email with charset=utf-16le, but uses plain ascii in the body.
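The mismatch between the declared charset and the actual bytes can be sketched as follows. The sample text is hypothetical; the point is only that ASCII bytes read as UTF-16LE pair up into code units that mostly fall in the CJK range, which is consistent with the "Chinese characters" mentioned in the original report.

```python
# Hypothetical body: 16 plain ASCII bytes, as actually written to the message.
ascii_body = b"HelloThunderbird"

# A reader that trusts "charset=utf-16le" pairs the bytes into 16-bit code units.
misread = ascii_body.decode("utf-16-le")

print(len(misread), [f"U+{ord(ch):04X}" for ch in misread])
```

Each pair of letters becomes one code unit whose high byte is the second letter, so ordinary English text ends up as half as many unrelated characters.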
Confirming this behaviour when receiving an email with a UTF-16LE-encoded attachment, on Thunderbird 31.7.0 (Ubuntu 14.04). In my case the email reply content is blank as per comment 7, and the entire response is indeed encoded as UTF-16LE as per comment 17.
Second time I've encountered this bug, with the same behaviour described in comment 21. I'll add to this that the behaviour is particularly aggravating in that it effectively results in lost mail. The entire body of the message is wiped, and there is no copy saved as a draft either. Not fun hammering out the same text three consecutive times, with a recipient having received two blank emails – until remembering this bug.
(In reply to dave.koelmeyer from comment #21)
> Confirming this behaviour when receiving an email with a UTF-16LE-encoded
> attachment, on Thunderbird 31.7.0 (Ubuntu 14.04).
The attachment problem is now treated in bug 1026989.
Attachment #722914 - Attachment mime type: application/octet-stream → text/plain
OK, the original problem described in comment #0 does not exist any more. You simply can no longer choose UTF-16 as an encoding for a new message.

From comment #17 onwards the discussion has turned to creating a UTF-16 encoded message by replying to a message with a UTF-16 encoded attachment. This is covered by bug 1026989.

So I'm making this a forward duplicate. The other option would be to close it as "worksforme", "wontfix" or "invalid" since the original problem just doesn't exist any more.

Note that bug 275021 "Unicode UTF-16 encoded messages are incorrectly displayed" was a "wontfix".
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → DUPLICATE
Bug 1026989 now covers only the issue "charset is picked up from last attachment".
Jorg K, why dupe to that bug?
Is it because the problem in this bug can now occur only when "last attachment=UTF16-LE" && the problem of bug 1026989 also occurs?
Wouldn't duping to bug 961983 be better?
This bug is for "New Msg" and bug 961983 is for "Reply".
Is there no problem with "Forward Inline", "Forward Attachment", "Edit As New", "Edit Draft", etc.?
As we learned from bug 1026989, when a problem is reported for one case, we have to check all the other cases too.
As I said in comment #24:
From comment #17 onwards the discussion has turned to creating a UTF-16 encoded message by replying to a message with a UTF-16 encoded attachment. This is covered by bug 1026989.
The test case in attachment 722914 [details] has a UTF-16 attachment(!).

Hence duplicate of 1026989 which is about the attachments providing the charset.

In the end it doesn't matter. Triaging makes sense to structure bugs. Once the bugs are fixed, it doesn't make sense to shuffle them around, so can we please stop reshuffling bugs now.
(Off-topic)

Because this bug is already closed, I am writing an off-topic note here.

A problem, its solution, a regression caused by some change, and the follow-up to that regression occur in the following order.
  X -> P -> ○ -> R -> XX
  Where X = bug report for the problem, P = patch for the problem, ○ = no bug is reported because the problem was resolved,
   R = regression, XX = new bug report due to the regression
If 4 components are relevant, each event occurs at a different, unpredictable time in each component.
Bug reports are not filed for every release, so X or XX may not exist for every problem/regression.
Even when a patch is created, not all of the bugs it should resolve are closed correctly.
○ is never reported. P is unknown in many cases unless the relevant bug is the one where the patch landed.
No one knows about R when XX is reported. No one knows all the events that happened in the component relevant to XX.

Example of 4 components in imap.
Compact
 CompactOfOfflineStore X -> P -> ○     -> R -> XX -> ...
 ExpungeOfFolder       X    ->    P -> ○      -> R -> XX -> ...
CompactFolders                                      
 CompactOfOfflineStore   X  -> P -> ○    -> R      -> XX -> ...
 ExpungeOfFolder             X ->   P -> ○ -> R -> XX

When all of the X/XX reports above are duped to one bug, problem analysis is almost impossible.

In the above case, all of the relevant bug reports were actually consolidated into "Compact doesn't work as expected" by duping.
Some bugs were closed as FIXED or WORKSFORME during a period when the problem didn't occur, even though the bug report described an issue from a time when the problem existed.
It's chaos. I was a victim of such duping.

When a problem is resolved by a patch, if the bug reports that the patch should close are not all closed correctly,
and if everything is duped merely because the external symptom looks the same to someone, problem analysis becomes impossible.
In the above case, I had to re-check each duped bug report and the relevant bug reports, whether open or closed, and I had to do a regression window check for each issue in each component because of bad duping.
Because I didn't know which patch was relevant to the problem, I had to check "why the problem doesn't occur in a range".
Please note that this was an actual case.

The work required for triage, for patch creation, and for follow-up after a problem is solved is different in each case.
Follow-up work is preparation for a future regression or a future new problem, which will almost surely occur sooner or later.
If the history of a closed bug is incorrect, problem analysis is difficult.
If the history of a closed bug is wrong, wrong duping will be repeated.
I experienced similar chaos in another component too. I don't want to experience such chaos again.

I feel that adding "Edit As New" to the bug summary of bug 715823 was a bad action, because it caused your confusion.
I wanted to tell the people who were working on the bug that the "problem is not in the Forward Inline case only".
But I think I shouldn't have added to the bug summary something that was not part of the original report.
The comment on the "Edit As New" case was added by a kind of "me too" comment poster.
For ease of correct understanding of a bug, whether open or closed, a bug is better processed based on the "original report/analysis result of it" instead of the "report morphed by many added comments".

Anyway, I thought:
  If this old bug is for "New Msg" && UTF16-LE, then WORKSFORME, WONTFIX or something similar is appropriate,
  because the original report of this bug is not relevant to your bug 1026989,
  even though both bugs are or were relevant to UTF16-LE.
  The problems "charset is picked up from last part" and "problem due to UTF16-LE being used or requested"
  are usually considered "absolutely different and independent problems".
  Whether a bug is open or closed, bad is bad.
But I won't touch this bug any more.
Thanks.