Bug 653342 - CJK(Chinese, Japanese, Korean): extra space is inserted within text in mail due to wrap produced by mailnews.wraplength and line length limitation of 1000bytes of SMTP
Status: RESOLVED FIXED
[tb-papercut]
: intl
Product: MailNews Core
Classification: Components
Component: Composition
: Trunk
: All All
: -- major with 41 votes
: Thunderbird 45.0
Assigned To: Jorg K (GMT+2, PTO during summer)
: 448139 553526 611411 699891 704441 Bogus_EOL_>_char_998
Depends on: 1230974 650206 1225864 1225904 1230968 1230970 1230971
Blocks: 26734 355209
Reported: 2011-04-27 21:08 PDT by Zhong Qiyao
Modified: 2016-04-15 02:18 PDT


Attachments
extra unwanted spaces (163.05 KB, image/jpeg)
2011-05-22 01:41 PDT, Zhong Qiyao
long line 01-A, text_html, not format=flowed(iso-2022-jp).eml (14.15 KB, text/plain)
2011-10-09 20:42 PDT, WADA
long line 01-B, text_html, format=flowed(utf-8).eml (22.90 KB, text/plain)
2011-10-09 20:43 PDT, WADA
long line 02-A, text_plain, not format=flowed(iso-2022-jp).eml (6.71 KB, text/plain)
2011-10-09 20:44 PDT, WADA
long line 02-B, text_plain, format=flowed(utf-8).eml (9.72 KB, text/plain)
2011-10-09 20:45 PDT, WADA
line break -> 1000 hacker patch (4.88 KB, patch)
2012-05-22 09:42 PDT, xunxun
Partially fix (1.22 KB, patch)
2012-07-25 20:51 PDT, Hiroyuki Ikezoe (:hiro)
m_kato: review-
Force to use base64 for plain text message in multi byte (4.04 KB, patch)
2012-07-25 22:45 PDT, Hiroyuki Ikezoe (:hiro)
Force to use base64 for plain text message in multi byte and html mail (11.13 KB, patch)
2012-07-26 00:46 PDT, Hiroyuki Ikezoe (:hiro)
m_kato: feedback-
Force to use base64 for html message (11.10 KB, patch)
2012-07-26 02:48 PDT, Hiroyuki Ikezoe (:hiro)
An xpcshell test for savin long line CJK text as draft (5.00 KB, patch)
2012-07-28 18:05 PDT, Hiroyuki Ikezoe (:hiro)
An xpcshell test for savin long line CJK text as draft (4.95 KB, patch)
2012-07-28 19:17 PDT, Hiroyuki Ikezoe (:hiro)
xpcshell tests for saving as draft and send message (5.61 KB, patch)
2012-07-28 22:43 PDT, Hiroyuki Ikezoe (:hiro)
Possible fix (11.52 KB, patch)
2012-08-02 02:05 PDT, Hiroyuki Ikezoe (:hiro)
xpcshell tests (6.30 KB, patch)
2012-08-02 02:06 PDT, Hiroyuki Ikezoe (:hiro)
Encode HTML message with base64 to avoid extra spaces in CJK text (11.12 KB, patch)
2013-06-24 01:46 PDT, Hiroyuki Ikezoe (:hiro)
Adapt to createAndSendMessage change (6.26 KB, patch)
2013-06-24 01:49 PDT, Hiroyuki Ikezoe (:hiro)
Test ISO-2022-JP.eml (5.61 KB, text/plain)
2015-11-19 12:04 PST, Jorg K (GMT+2, PTO during summer)
Proposed change to always allow format=flowed (2.04 KB, patch)
2015-11-22 08:09 PST, Jorg K (GMT+2, PTO during summer)
Proposed change to always allow format=flowed and use the new serialiser flag OutputDisallowLineBreaking (8.54 KB, patch)
2015-11-22 20:59 PST, Jorg K (GMT+2, PTO during summer)
Proposed change (v3) (18.94 KB, patch)
2015-11-27 13:53 PST, Jorg K (GMT+2, PTO during summer)
Proposed final solution (v4) (18.01 KB, patch)
2015-11-27 22:37 PST, Jorg K (GMT+2, PTO during summer)
Proposed final solution (v4) (17.92 KB, patch)
2015-11-27 22:55 PST, Jorg K (GMT+2, PTO during summer)
Proposed final solution (v5), includes delsp support. (17.84 KB, patch)
2015-11-28 09:53 PST, Jorg K (GMT+2, PTO during summer)
Proposed final solution (v5b), includes delsp support. (17.84 KB, patch)
2015-11-28 10:12 PST, Jorg K (GMT+2, PTO during summer)
mkmelin+mozilla: review+
Proposed test (v1) (6.53 KB, patch)
2015-11-29 02:31 PST, Jorg K (GMT+2, PTO during summer)
Proposed test (v1b) (6.53 KB, patch)
2015-11-29 13:36 PST, Jorg K (GMT+2, PTO during summer)
Proposed test (v1b) (6.53 KB, patch)
2015-11-29 13:37 PST, Jorg K (GMT+2, PTO during summer)
mkmelin+mozilla: review+
Proposed final solution (v5c), includes delsp support. (17.94 KB, patch)
2015-12-03 14:01 PST, Jorg K (GMT+2, PTO during summer)
jorgk: review+
Proposed test (v1c) (6.53 KB, patch)
2015-12-03 14:23 PST, Jorg K (GMT+2, PTO during summer)
jorgk: review+
Correction of the landed test. (4.55 KB, patch)
2015-12-04 16:59 PST, Jorg K (GMT+2, PTO during summer)
Correction of the landed test. (take 2) (4.42 KB, patch)
2015-12-04 23:53 PST, Jorg K (GMT+2, PTO during summer)
Correction of the landed test. (take 3) (3.65 KB, patch)
2015-12-05 01:23 PST, Jorg K (GMT+2, PTO during summer)
acelists: feedback+
Correction of the landed test. (take 4) (3.62 KB, patch)
2015-12-05 03:01 PST, Jorg K (GMT+2, PTO during summer)
Correction of the landed test. (take 3a), same as take 3 with added comment. (3.71 KB, patch)
2015-12-05 05:41 PST, Jorg K (GMT+2, PTO during summer)
Correction of the landed test. (take 3b), same as take 3 with added comment. (3.79 KB, patch)
2015-12-05 05:44 PST, Jorg K (GMT+2, PTO during summer)
acelists: review+

Description Zhong Qiyao 2011-04-27 21:08:58 PDT
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-TW; rv:1.9.2.15) Gecko/20110303 BTRS26718 Firefox/3.6.15 GTB7.1 ( .NET CLR 3.5.30729)
Build Identifier: 3.1.9

http://forums.mozillazine.org/viewtopic.php?f=39&t=2173931

Version: 3.1.9
Language: Chinese (traditional character version)

Operating System: Microsoft Windows XP
Language: Chinese (traditional character version)

1. Use HTML to compose messages.

2. Send the following text to yourself. Check the "Sent" box and also your received copy.
http://tw.myblog.yahoo.com/lotus7174@ki ... -1&next=-1

3. PROBLEM: I have two versions in different dialects. In one of those, there is an extra
space after both 而 on the leftmost of two lines; in the other, no. In both versions, there is an extra space
between 向 and 上 near the end of the message.

4. PROBLEM: I have another message with 商場. There is an extra space inserted in between.

5. EXPECTATION: No extra space, whether Latin-character text, or in Chinese (trad. or simp.)
versions.

NOTE: This has been reported since 3.1.5 but is still not resolved:
http://getsatisfaction.com/mozilla_mess ... bird_3_1_5

Thanks.

Qiyao

Reproducible: Always
Comment 1 Zhong Qiyao 2011-05-22 01:41:30 PDT
Created attachment 534284 [details]
extra unwanted spaces

This is what appears in the "Sent" box, and also what the receiver sees.
The body of the mail is set in Courier (not <TT>).
Note that a space is sometimes inserted within 歲月 and within 一點.
Those are hyperlinks, but plain text sometimes has this problem too.

Thanks.
Comment 2 Zhong Qiyao 2011-05-22 01:43:45 PDT
Sometimes, between 東 and 加, it happens.
Comment 3 Zhong Qiyao 2011-08-18 21:28:14 PDT
Do View > Mail Source.
You will find unwanted linebreaks in the original Mail Source
of the e-mail as it was sent.
In the original mail source, linebreaks in the HTML code fall between
Chinese characters, where they should not be inserted.
They cause the rendered e-mail to have a space between those Chinese
characters.

Is this a problem of sending (the linebreak should not be there)
or of displaying (the linebreak should disappear instead of becoming
a space)?

Or perhaps a standard is needed that says whether Chinese text
encoded as HTML may have a linebreak inserted
between characters in the HTML source,
and what should happen when it is rendered as formatted text
and the two lines are joined on the same line for display:
whether the linebreak should become a space, or should disappear because
Chinese text can be joined without one.

Thanks.
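
The effect described in this comment can be sketched in a few lines of Python. This is only an illustration, not Thunderbird code: `collapse_whitespace` is a hypothetical helper that mimics default HTML white-space handling, where a line break in the source counts as ordinary white space and a run of white space is rendered as one space.

```python
import re

def collapse_whitespace(html_text: str) -> str:
    """Mimic default HTML rendering: any run of spaces, tabs, or
    line breaks in the source is displayed as a single space."""
    return re.sub(r"[ \t\r\n]+", " ", html_text)

# Source as a composer might serialise it: a line break (plus
# indentation) lands between two Chinese characters.
wrapped_source = "向\r\n  上"

rendered = collapse_whitespace(wrapped_source)
print(rendered)  # the two characters are now separated by a space
```

For Latin text the same collapse is harmless because the break replaces a space that was already there; between CJK characters there was no space to begin with.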
Comment 4 asmwarrior 2011-09-26 19:18:48 PDT
I have reported a similar bug in the forum (it also includes a screenshot):
http://forums.mozillazine.org/viewtopic.php?f=39&t=2315453

If you enter a long sentence when sending an email, TB automatically adds many
extra spaces to the text, see:
http://i683.photobucket.com/albums/vv194/ollydbg_cb/2011-09-25192351.png

This makes TB useless for all CJK language users.

see also: http://getsatisfaction.com/mozilla_messaging/topics/unwanted_extra_spaces_inserted_within_the_text_not_at_the_beginning_of_the_message_thunderbird_3_1_5

I just tested TB 6.02 and TB 7.0beta2, and the bug still exists.
Can someone fix this?

thanks.
Comment 5 Zhong Qiyao 2011-09-26 20:04:16 PDT
This happens if you are using HTML e-mail composition.
See my comment dated "Zhong Qiyao 2011-08-18 21:28:14 PDT"
for a cause of the problem.

The solution will need a clarification as to
whether those linebreaks are allowed in the internal HTML
representation of the e-mail,
and whether they should be displayed as spaces when those lines
are joined, when the internal HTML representation
is rendered for display to the user.

Try sending an e-mail between Thunderbird and other mailers
(Outlook Express or Postbox as someone mentioned Postbox),
and see what they do with long lines of CJK (Chinese-Japanese-Korean)
text.

The same problem would also happen for a normal HTML Web page in CJK,
concerning the HTML representation of optional linebreaks (i.e. when
a line is too long to be displayed) between CJK characters in CJK text.

Thanks.
Comment 6 asmwarrior 2011-09-26 20:15:41 PDT
Well, I have checked other email senders; both of them use base64 encoding.

If I send an HTML-styled email (with some bold or italic CJK characters) from "Gmail web", I get the following content:

--------------------------------------------------------------------------

Content-Type: multipart/alternative; boundary=0016e6dede458e824504ade3981f

--0016e6dede458e824504ade3981f
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: base64

5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI
5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI
5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI
5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI
5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOICirlk4jlk4jlk4jl
k4jlk4jlk4jlk4jlk4jlk4jlk4jlk4jlk4jlk4jlk4jlk4jlk4jlk4jlk4jlk4jlk4jlk4jlk4jl
k4jlk4jlk4jlk4jlk4jlk4jlk4jlk4jlk4jlk4gqCuWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWT
iOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWT
iOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWT
iOWTiOWTiOWTiCrlk4jlk4jlk4jlk4jlk4jlk4jlk4jlk4jlk4jlk4jlk4jlk4jlk4jlk4gqCuWT
iOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWT
iOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWT
iOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWT
iOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWT
iOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWT
iOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWT
iOWTiOWTiOWTiOWTiOWTiOWTiAo=
--0016e6dede458e824504ade3981f
...
...
---------------------------------------------------------------------------

I also checked the email client Foxmail (very popular in mainland China); it also sends email with content like:
-----------------------------------------------------
Content-Type: multipart/mixed;
	boundary="=====001_Dragon381175773018_====="

This is a multi-part message in MIME format.

--=====001_Dragon381175773018_=====
Content-Type: text/plain;
	charset="gb2312"
Content-Transfer-Encoding: base64
.....
-----------------------------------------------------

So, I'm wondering whether TB can send base64 based encoding for text.
It seems the answer is NO.
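
For comparison, here is a minimal sketch of how a sender can produce a base64 text part like the Gmail sample above. It uses Python's standard `email` package, which is unrelated to Thunderbird's composition code; the body text is made up for the example.

```python
from email.mime.text import MIMEText

body = "哈" * 200  # one long CJK line with no spaces

# With charset utf-8, Python's email package selects base64
# as the Content-Transfer-Encoding for the body.
msg = MIMEText(body, "plain", "utf-8")

print(msg["Content-Transfer-Encoding"])

# Every line of the serialised message stays short, because the
# base64 text is wrapped, not the CJK text itself.
longest = max(len(line) for line in msg.as_string().splitlines())
print(longest)

# Decoding gives back the original line, with no extra spaces.
decoded = msg.get_payload(decode=True).decode("utf-8")
```

The line breaks fall inside the base64 text, so the receiver reassembles the original unbroken line.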
Comment 7 Zhong Qiyao 2011-09-27 01:22:37 PDT
Concerning:
http://forums.mozillazine.org/viewtopic.php?f=39&t=2315453

It may be because of the definition of optional line breaks between two Chinese (CJK) characters
when line wrapping, breaking, and joining happen, when using plain text HTML.

Base64 may be used if the transmitting medium requires a line break after every so many ASCII characters.
But if Base64 is not used, Thunderbird may consider *not* inserting line breaks between Chinese
characters at all, just as when a sender wants to transmit a line of 100 Latin letters
with no "forced linebreaks" upon transmission.  The receiver will then receive a line of 100 Latin
letters, and decide whether to "force linebreaks" when displaying.  Some mail
displayers don't.

But where is the voting page?
Comment 8 asmwarrior 2011-09-27 01:41:23 PDT
Search the word "vote" in this page, you will find it.
Comment 9 WADA 2011-10-09 20:42:29 PDT
Created attachment 565855 [details]
long line 01-A, text_html, not format=flowed(iso-2022-jp).eml
Comment 10 WADA 2011-10-09 20:43:37 PDT
Created attachment 565856 [details]
long line 01-B, text_html, format=flowed(utf-8).eml
Comment 11 WADA 2011-10-09 20:44:24 PDT
Created attachment 565857 [details]
long line 02-A, text_plain, not format=flowed(iso-2022-jp).eml
Comment 12 WADA 2011-10-09 20:45:57 PDT
Created attachment 565858 [details]
long line 02-B, text_plain, format=flowed(utf-8).eml
Comment 13 WADA 2011-10-09 21:01:16 PDT
The attached mails were created by Send Later in Tb 7.
  Three lines written in Japanese characters,
  1024 Japanese Unicode characters per line.
(1) HTML, Options/Format/Plain and Rich (HTML) text
(1-A) long line 01-A, text_html, not format=flowed(iso-2022-jp).eml
(1-B) long line 01-B, text_html, format=flowed(utf-8).eml
(2) Plain text mail, composed in text mode
(2-A) long line 02-A, text_plain, not format=flowed(iso-2022-jp).eml
(2-B) long line 02-B, text_plain, format=flowed(utf-8).eml
Note: With a Japanese charset like iso-2022-jp, format=flowed is currently disabled internally, even when mailnews.send_plaintext_flowed=true. With utf-8, format=flowed is used when mailnews.send_plaintext_flowed=true.
Comment 14 WADA 2011-10-09 21:50:28 PDT
Phenomena observed around very long lines during mail composition.

(i) If composed in HTML mode with text only, bug 414299 occurs and only the text/plain part (converted from the text/html part) is sent, unless Options/Format/Plain and Rich (HTML) text is explicitly requested.

(ii) If HTML is sent with Options/Format/Plain and Rich (HTML), both the text/plain part (converted from the text/html part) and the text/html part are sent.
If this mail is shown with View/Message Body As/Plain Text, bug 253830 happens, and a converted text version is shown instead of the text/plain part in multipart/alternative.

(iii) Tb 7 appears to send the text/plain part base64-encoded if a long mail line or a long word without spaces exists, in order to avoid insertion or removal of spaces due to reformatting for format=flowed.

(iv) In the text/html part, Tb appears to wrap at "76 unicode characters", and adds extra spaces at the left of a line for readability of the message source.
Although I intentionally tested with editor.htmlWrapColumn=8888, "always wrap html at 76 unicode characters" still remains.
I don't know whether a bug for this phenomenon is already open at B.M.O, although I saw reports of the phenomenon in some bugs.

(v) In the text/plain part (converted from the text/html part), the following is observed.
  - Text data length in a line is around 990 bytes.
    This is probably "wrap around 1000 bytes if long mail line".
  - Spaces inserted at the left of a line in the text/html part are seen as a single space.
    This is probably because "multiple spaces are equivalent to one space
    in HTML" is applied upon conversion from html to text.
    Bug 262475 is a relevant bug to this phenomenon.

(vi) Bug 355209 is seen in mails composed in text mode.
     A problem like Bug 355209 doesn't seem to occur in html2text conversion.

The phenomenon you saw is already reported in bug 611411 for Korean text. IIRC, a similar phenomenon is reported for Japanese text too.

Confirming.
Comment 15 WADA 2011-10-09 22:27:35 PDT
Note:
I intentionally used mailnews.wraplength=0 in testing (no wrap by wrap length; wrap only by the mail line length limitation in SMTP, which is around 1000 bytes). If a small mailnews.wraplength value is used, the wrap position in the text/plain part or in text mode composition may differ from my test mail.
Comment 16 WADA 2011-10-09 22:57:29 PDT
Gmail appears to avoid "problems due to wrap" by "send in base64" if the line length is longer than the mail line length limitation (around 1000 bytes).
Because Tb already sends the text/plain part in base64 in some circumstances, as seen in "long line 01-B, text_html, format=flowed(utf-8).eml" which I attached, sending in base64 is a universal/practical solution for Tb to the problems around "mail line length limitation" and around "new line between CJK and CJK/non-CJK during mail composition", as you say.
  - Wrap in mail source only at an ASCII space, with any language/charset.
  - Send in base64 if the line length exceeds the line length limitation defined by RFC.
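
The second rule can be sketched as a simple check. This is a hypothetical helper written in Python for illustration, not Thunderbird's logic; it assumes the limit of 998 octets per line (excluding CRLF) from RFC 5322, which is where the "around 1000 bytes" figure comes from.

```python
MAX_LINE_OCTETS = 998  # RFC 5322 limit per line, excluding the CRLF

def needs_base64(body: str, charset: str = "utf-8") -> bool:
    """Return True if any line of the encoded body is longer than the
    limit, i.e. the body cannot be sent as-is without re-wrapping."""
    encoded = body.encode(charset)
    return any(len(line) > MAX_LINE_OCTETS for line in encoded.splitlines())

# 100 Japanese characters are 300 bytes in UTF-8: fits on one line.
print(needs_base64("あ" * 100))

# 1024 characters, as in the attached test mails, are 3072 bytes.
print(needs_base64("あ" * 1024))
```

The check counts bytes after charset encoding, since the same character count gives very different line lengths in utf-8 and iso-2022-jp.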
Comment 17 asmwarrior 2011-10-14 02:04:08 PDT
I totally agree with WADA. Thanks for supplying so many sample emails. I strongly suggest that the TB developers consider this.
Comment 18 WADA 2011-10-29 19:53:30 PDT
HTMLWrapColumn seems to be gone already, and "wrap at 72 char if HTML" seems to be hard coded.
Setting dependency to bug 650206
Comment 19 WADA 2011-10-29 20:05:08 PDT
*** Bug 611411 has been marked as a duplicate of this bug. ***
Comment 20 dreamon 2011-11-05 00:59:09 PDT
*** Bug 699891 has been marked as a duplicate of this bug. ***
Comment 21 WADA 2011-11-06 01:23:12 PDT
Changing to Mail&News Core/Composition, according to component change of dependent bug 650206.
Comment 22 Ludovic Hirlimann [:Usul] 2012-04-12 07:21:07 PDT
*** Bug 448139 has been marked as a duplicate of this bug. ***
Comment 23 asmwarrior 2012-04-26 18:07:09 PDT
Ping. Any good news about this? Thunderbird is now at version 12, but this bug still exists, and it is very annoying. When email receivers see such an email, they think the sender is not serious and believe the sender hit the space key many times while writing it. That is too bad.
Comment 24 Zhong Qiyao 2012-05-03 00:44:43 PDT
See my comments asking whether those line-breaks are allowed to be inserted
between CJK characters arbitrarily.
https://bugzilla.mozilla.org/show_bug.cgi?id=653342#c5
https://bugzilla.mozilla.org/show_bug.cgi?id=653342#c7

Or maybe it is a flaw in the definition of HTML itself regarding a long
unbroken string of CJK characters: whether it is allowed to line-break or
space-break on the screen or in the HTML coding without a line-break
or space-break in the user-representation.  It is unlikely to
happen for Latin text, for Latin text is allowed to line-break only
at spaces.

Maybe someone in the current thread can download the Thunderbird source and
try to patch it away?

Thanks.
Comment 25 Zhong Qiyao 2012-05-15 03:39:35 PDT
Postbox Express also has this problem even if you configure it to use "quoted printable" (base64).

Source:
長長長長長長長長長長二二二二二二二二二二三三三三三三三三三三四四四四四四四四四四五五五五五五五五五五

Sent Box and Received Result:
長長長長長長長長長長二二二二二二二二二二三三三三三三三三三三四四四四四四 四四四四五五五五五五五五五五

Message Source:
6ZW36ZW36ZW36ZW36ZW36ZW36ZW36ZW36ZW36ZW35LqM5LqM5LqM5LqM5LqM5LqM5LqM5LqM
5LqM5LiJ5LiJ5LiJ5LiJ5LiJ5LiJ5LiJ5LiJ5LiJ5LiJ5Zub5Zub5Zub5Zub5Zub5Zub5Zub
IA0K5Zub5Zub5Zub5LqU5LqU5LqU5LqU5LqU5LqU5LqU5LqU5LqU5LqUDQo=

Thanks.
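
Decoding the message source quoted in this comment shows where the space comes from. The sketch below uses only Python's standard `base64` module and the three base64 lines exactly as pasted above.

```python
import base64

# The three base64 lines from the "Message Source" above, joined.
message_source = (
    "6ZW36ZW36ZW36ZW36ZW36ZW36ZW36ZW36ZW36ZW35LqM5LqM5LqM5LqM5LqM5LqM5LqM5LqM"
    "5LqM5LiJ5LiJ5LiJ5LiJ5LiJ5LiJ5LiJ5LiJ5LiJ5LiJ5Zub5Zub5Zub5Zub5Zub5Zub5Zub"
    "IA0K5Zub5Zub5Zub5LqU5LqU5LqU5LqU5LqU5LqU5LqU5LqU5LqU5LqUDQo="
)

raw = base64.b64decode(message_source)
text = raw.decode("utf-8")

# repr() makes the space and the CR LF visible.
print(repr(text))
```

The group `IA0K` decodes to a space followed by CR LF, so the break was already in the body before it was base64-encoded. A space directly before the line end is also what a format=flowed soft line break looks like, which would explain why choosing a different transfer encoding alone does not remove it.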
Comment 26 Zhong Qiyao 2012-05-15 03:51:05 PDT
Foxmail does not exhibit the above problem with the above test text.

Thanks.
Comment 27 Zhong Qiyao 2012-05-15 18:36:22 PDT
The mailer which comes with Opera also does not have this problem when sending as HTML.

Thanks.
Comment 28 mr.kang.chen 2012-05-15 18:58:27 PDT
Any mail client works well as long as it is not thunderbird or thunderbird-based ones.

This is such a deadly bug for CJK users and it's been there for a long long time.

No more expectations for mozilla.

I am migrating from thunderbird to browsers on some devices. Just like I migrated from firefox to chrome for good.
Comment 29 Zhong Qiyao 2012-05-19 05:28:01 PDT
If it is not solved, then Mozilla Thunderbird won't be suitable for CJK users in HTML mode.  In fact, Chrome has problems with Yahoo! blog, and I am in the process of migration to simplicity: Opera instead of Thunderbird and Firefox.  Neither Firefox nor Opera is all-powerful; for the missing Web sites, use the inevitable Microsoft Internet Explorer.

Bye.
Comment 30 xunxun 2012-05-19 06:52:31 PDT
I think we should modify the hardcode:

nsPlaintextEditor::nsPlaintextEditor()
: nsEditor()
, mRules(nsnull)
, mWrapToWindow(false)
, mWrapColumn(0)


Change mWrapColumn(0) to mWrapColumn(1000)?
Comment 31 xunxun 2012-05-19 06:55:17 PDT
(In reply to xunxun from comment #30)
> I think we should modify the hardcode:
> 
> nsPlaintextEditor::nsPlaintextEditor()
> : nsEditor()
> , mRules(nsnull)
> , mWrapToWindow(false)
> , mWrapColumn(0)
> 
> 
> Change mWrapColumn(0) to mWrapColumn(1000)?

If it is correct, I will try to build tb13b2 myself and debug it.
Comment 32 xunxun 2012-05-19 11:16:05 PDT
I changed:

nsDocumentEncoder.cpp
nsPlaintextEditor.cpp
nsContentUtils.cpp
nsTextEditorState.cpp
nsWebBrowserPersist.cpp
nsSelection.cpp


SetWrapColumn() => SetWrapColumn(1000)

mWrapColumn => mWrapColumn=1000


After the change, the situation is improved but not perfect, because there is still a break after 1000 characters.

So we need a config option for the wrap column.


The test edition : http://pcxfirefox.googlecode.com/files/tb13b2_sse2_nopgo_test_20120520.7z
Comment 33 Ludovic Hirlimann [:Usul] 2012-05-22 08:40:13 PDT
xunxun could you make a patch and post the patch to this bug ?
Comment 34 xunxun 2012-05-22 09:42:59 PDT
Created attachment 626060 [details] [diff] [review]
line break -> 1000 hacker patch

(In reply to Ludovic Hirlimann [:Usul] from comment #33)
> xunxun could you make a patch and post the patch to this bug ?

Sure. But I am not Mozilla developer, this is only a hacker patch. Use for enlarge the line break.

The best choice should be that tb use base64 to send and receive mails.
Comment 35 Ludovic Hirlimann [:Usul] 2012-05-22 10:12:34 PDT
(In reply to xunxun from comment #34)
> Created attachment 626060 [details] [diff] [review]
> line break -> 1000 hacker patch
> 
> (In reply to Ludovic Hirlimann [:Usul] from comment #33)
> > xunxun could you make a patch and post the patch to this bug ?
> 
> Sure. But I am not Mozilla developer, this is only a hacker patch. Use for
> enlarge the line break.

You are welcome to become one :-)

The other solution could also be implemented (I'm not sure we want it , let's ask david)
Comment 36 David :Bienvenu 2012-05-22 17:20:11 PDT
the patch is in all non-mailnews code, so you'd need a gecko module owner to look at it...but mailnews code does have control over when we use base 64, and we do have several conditions which trigger base64 encoding. Perhaps it would be easy to add this case to that.
Comment 37 Hiroyuki Ikezoe (:hiro) 2012-05-22 17:35:41 PDT
I've seen a similar issue in another bug, maybe bug 553526.
Comment 38 asmwarrior 2012-05-22 17:38:35 PDT
I suggest that someone create a patch that triggers base64 by default for CJK text.
Comment 39 xunxun 2012-05-23 03:51:15 PDT
(In reply to David :Bienvenu from comment #36)
> but mailnews code does have control over when we use base 64,
> and we do have several conditions which trigger base64 encoding. Perhaps it
> would be easy to add this case to that.

Hope the feature is implemented soon.
Comment 40 Seungbeom Kim 2012-06-18 21:35:58 PDT
(In reply to xunxun from comment #34)
> The best choice should be that tb use base64 to send and receive mails.

I don't want Tb to use base64 for text indiscriminately.

At least without any feature to decode base64 in the source within Tb. My experience may not reflect that of the most typical users, but I often view the source of a message I'm curious about, only to be frustrated by the unintelligible base64-encoded text (often from Gmail).

Furthermore, base64 is meant for binary, and its use increases the body size by 1/3, and should be limited to cases where most of the bytes are not printable characters, or others where it's absolutely necessary. If soft line breaks are necessary, isn't QP generally a better choice? (I understand it may not always be better.)
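
The size trade-off between the two encodings can be measured directly. This is a rough sketch with Python's standard `base64` and `quopri` modules and a made-up sample string; it assumes the body is UTF-8, where every byte of a CJK character is non-ASCII.

```python
import base64
import quopri

text = "長" * 100
utf8 = text.encode("utf-8")  # 3 bytes per character

b64 = base64.encodebytes(utf8)   # base64, wrapped into short lines
qp = quopri.encodestring(utf8)   # quoted-printable with soft line breaks

# Raw size, base64 size, quoted-printable size in bytes.
print(len(utf8), len(b64), len(qp))
```

Under that assumption quoted-printable writes each byte as `=XX`, so it comes out roughly three times the raw size, while base64 stays near four thirds. For mostly-ASCII text, or a 7-bit charset such as iso-2022-jp, the comparison can go the other way, and QP keeps the source readable.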
Comment 41 mwu4 2012-07-11 09:30:14 PDT
After reading " https://wiki.mozilla.org/Thunderbird/Proposal:_New_Release_and_Governance_Model ", I don't know whether this bug will be fixed. Sighed !!
Comment 42 Kent James (:rkent) 2012-07-23 19:44:55 PDT
Do you need an Asian Thunderbird and/or OS version to see this, or is it also reproducible on American versions of Windows and Thunderbird?
Comment 43 xunxun 2012-07-23 20:10:53 PDT
(In reply to Kent James (:rkent) from comment #42)
> Do you need an Asian Thunderbird and/or OS version to see this, or is it
> also reproducible on American versions of Windows and Thunderbird?

Chinese OS (Win7) + EN-US Thunderbird can reproduce it.

I don't know whether English OS + EN-US TB has the issue.
Comment 44 mr.kang.chen 2012-07-23 20:19:49 PDT
(In reply to Kent James (:rkent) from comment #42)
> Do you need an Asian Thunderbird and/or OS version to see this, or is it
> also reproducible on American versions of Windows and Thunderbird?

I used EN-GB Windows (and Linux) & Thunderbird, and I could see it.

I think you can see it on all platforms and versions.
Comment 45 Kent James (:rkent) 2012-07-24 15:59:35 PDT
:hiro, have you considered taking on this bug? It has received a lot of votes in a short period of time.
Comment 46 Hiroyuki Ikezoe (:hiro) 2012-07-24 16:11:35 PDT
I thought mkato would be the right person, but I will try.
Comment 47 Kent James (:rkent) 2012-07-24 16:16:53 PDT
:hiro that would be great! I really hate to see us ignoring this when it seems to be important to Asian users.
Comment 48 Makoto Kato [:m_kato] 2012-07-24 23:59:17 PDT
-
Comment 49 Kent James (:rkent) 2012-07-25 09:01:31 PDT
Could the approach from comment 36 be taken with this bug to avoid the dependence on bug 26734?
Comment 50 Hiroyuki Ikezoe (:hiro) 2012-07-25 15:50:10 PDT
(In reply to Kent James (:rkent) from comment #49)
> Could the approach from comment 36 be taken with this bug to avoid the
> dependence on bug 26734?

I guess so. It will be a workaround for now but I think implementing CJKTextSerializer is the right thing to fix this issue.
Comment 51 Makoto Kato [:m_kato] 2012-07-25 19:09:07 PDT
(In reply to Kent James (:rkent) from comment #49)
> Could the approach from comment 36 be taken with this bug to avoid the
> dependence on bug 26734?

For this issue, we need to support delsp=yes for plain text mail.  (I have already landed this support for nsIDocumentEncoder)

And GetBodyFromEditor sets wrapped HTML because we use nsIDocumentEncoder::OutputFormatted.  We should use OutputRaw instead.
Comment 52 Hiroyuki Ikezoe (:hiro) 2012-07-25 20:51:48 PDT
Created attachment 646005 [details] [diff] [review]
Partially fix

I'd suggest using nsIDocumentEncoder::OutputRaw in any case.

This patch fixes the issue only if the mail is a multipart/alternative HTML mail, but I think it is necessary for both approaches.
Comment 53 asmwarrior 2012-07-25 21:06:43 PDT
I'm happy to test if someone can build a new Windows TB with the patch above. Thank you.
Comment 54 Hiroyuki Ikezoe (:hiro) 2012-07-25 22:45:25 PDT
Created attachment 646034 [details] [diff] [review]
Force to use base64 for plain text message in multi byte

This patch is the approach suggested by David in comment 36.

It applies if the message is a multibyte plain text message composed in the HTML compose window.

I hope the case of plain text composing window will be solved in bug 553526 or others.
Comment 55 Hiroyuki Ikezoe (:hiro) 2012-07-25 22:46:54 PDT
Comment on attachment 646034 [details] [diff] [review]
Force to use base64 for plain text message in multi byte

Ooops! sorry the logic in NeedsConvetionToPlainText seems wrong..
Comment 56 Makoto Kato [:m_kato] 2012-07-25 22:48:26 PDT
Comment on attachment 646005 [details] [diff] [review]
Partially fix

Review of attachment 646005 [details] [diff] [review]:
-----------------------------------------------------------------

SnarfAndCopyBody() will set wrap per LINE_BREAK_MAX when saving mail to Draft.  So we should not call EnsureLineBreaks() on SnarfAndCopyBody().

According to commnet of EnsureLineBreaks(), we have to set wrap per 1000 bytes (for NNTP?).  So we may use BASE64 for HTML.

Also, Outlook uses quoted-printable for HTML to avoid this.
Comment 57 Hiroyuki Ikezoe (:hiro) 2012-07-25 22:56:48 PDT
(In reply to Makoto Kato from comment #56)
> Comment on attachment 646005 [details] [diff] [review]
> Partially fix
> 
> Review of attachment 646005 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> SnarfAndCopyBody() will set wrap per LINE_BREAK_MAX when saving mail to
> Draft.  So we should not call EnsureLineBreaks() on SnarfAndCopyBody().

Well, I am sorry I can not understand what you are saying...
I do not think the patch fixes the draft saving case. 

You mean the fix (Using nsIDocumentEncoder::OutputRaw) is not needed at all? Or OutputRaw will cause another issue in the case of draft?
Comment 58 Makoto Kato [:m_kato] 2012-07-25 23:04:27 PDT
(In reply to Hiroyuki Ikezoe (:hiro) from comment #57)
> (In reply to Makoto Kato from comment #56)
> > Comment on attachment 646005 [details] [diff] [review]
> > Partially fix
> > 
> > Review of attachment 646005 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > SnarfAndCopyBody() will set wrap per LINE_BREAK_MAX when saving mail to
> > Draft.  So we should not call EnsureLineBreaks() on SnarfAndCopyBody().
> 
> Well, I am sorry I can not understand what you are saying...
> I do not think the patch fixes the draft saving case. 

When saving to Draft by [Save this message], SnarfAndCopyBody is called.  But this function sets wrap per LINE_BREAK_MAX.

- Step
1. Open Compose Window
2. set body あx2000 characters
3. Save to Draft by [Save]
4. Reopen this mail on Draft

- Result
character on body is corrupted due to SnarfAndCopyBody().


> You mean the fix (Using nsIDocumentEncoder::OutputRaw) is not needed at all?
> Or OutputRaw will cause another issue in the case of draft?

No.  draft issue is SnarfAndCopyBody().

OutputRaw doesn't set wrap.  So a line of HTML body may be over to 1000 bytes.  To avoid this for old compatibility?, we should use BASE64 for HTML.
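
The comment does not say how the corruption arises. One way a byte-based wrap can corrupt multibyte text is by cutting in the middle of a character, which the Python sketch below illustrates. The helpers `wrap_bytes` and `wrap_chars` and the value of `LIMIT` are hypothetical; they are not SnarfAndCopyBody() or LINE_BREAK_MAX.

```python
from typing import List

def wrap_bytes(data: bytes, limit: int) -> List[bytes]:
    """Naive wrap: cut every `limit` bytes, ignoring character boundaries."""
    return [data[i:i + limit] for i in range(0, len(data), limit)]

def wrap_chars(text: str, limit: int, charset: str = "utf-8") -> List[bytes]:
    """Character-aware wrap: start a new line before a character
    would push the current line past `limit` bytes."""
    lines, current = [], b""
    for ch in text:
        encoded = ch.encode(charset)
        if current and len(current) + len(encoded) > limit:
            lines.append(current)
            current = b""
        current += encoded
    if current:
        lines.append(current)
    return lines

def decodes_cleanly(line: bytes) -> bool:
    """True if the line is valid UTF-8 on its own."""
    try:
        line.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

body = "あ" * 2000  # the body from the steps above
LIMIT = 1000        # hypothetical byte limit, for illustration only

naive = wrap_bytes(body.encode("utf-8"), LIMIT)
safe = wrap_chars(body, LIMIT)

print(all(decodes_cleanly(line) for line in naive))  # some lines are broken
print(all(decodes_cleanly(line) for line in safe))   # every line is intact
```

Each あ is three bytes in UTF-8 and 1000 is not a multiple of three, so the naive cut lands inside a character on the first line.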
Comment 59 Hiroyuki Ikezoe (:hiro) 2012-07-26 00:33:52 PDT
(In reply to Makoto Kato from comment #58)
> (In reply to Hiroyuki Ikezoe (:hiro) from comment #57)
> > (In reply to Makoto Kato from comment #56)
> > > Comment on attachment 646005 [details] [diff] [review]
> > > Partially fix
> > > 
> > > Review of attachment 646005 [details] [diff] [review]:
> > > -----------------------------------------------------------------
> > > 
> > > SnarfAndCopyBody() will set wrap per LINE_BREAK_MAX when saving mail to
> > > Draft.  So we should not call EnsureLineBreaks() on SnarfAndCopyBody().
> > 
> > Well, I am sorry I can not understand what you are saying...
> > I do not think the patch fixes the draft saving case. 
> 
> When saving to Draft by [Save this message], SnarfAndCopyBody is called. 
> But this function sets wrap per LINE_BREAK_MAX.
> 
> - Step
> 1. Open Compose Window
> 2. set body あx2000 characters
> 3. Save to Draft by [Save]
> 4. Reopen this mail on Draft
> 
> - Result
> character on body is corrupted due to SnarfAndCopyBody().

Thanks, that is what I wanted to know, I mean the regression.

> > You mean the fix (Using nsIDocumentEncoder::OutputRaw) is not needed at all?
> > Or OutputRaw will cause another issue in the case of draft?
> 
> No.  draft issue is SnarfAndCopyBody().
> 
> OutputRaw doesn't set wrap.  So a line of HTML body may be over to 1000
> bytes.  To avoid this for old compatibility?, we should use BASE64 for HTML.

To resolve bug 553526, should we also use base64 for multibyte plain text?
Comment 60 Hiroyuki Ikezoe (:hiro) 2012-07-26 00:46:29 PDT
Created attachment 646059 [details] [diff] [review]
Force to use base64 for plain text message in multi byte and html mail

Need feedback from experts.
Comment 61 Makoto Kato [:m_kato] 2012-07-26 00:50:00 PDT
(In reply to Hiroyuki Ikezoe (:hiro) from comment #59)

> To resolve bug 553526 we should use base64 for plain text in multibyte
> either?

We should not use base64 for plain text mail unless it is an attachment file / multipart.  For text mail we can fix this with format=flowed and delsp=yes (bug 26734).
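
A rough sketch of why delsp=yes matters for CJK text. It follows the general idea of format=flowed (a line ending in a space continues on the next line) but is heavily simplified: the functions `flow_encode` and `flow_decode` are hypothetical helpers, and space-stuffing, quoting and signature separators are ignored.

```python
def flow_encode(text, width):
    """Split one paragraph into soft-broken lines.

    Every line except the last gets a trailing space, which marks it
    as continuing on the next line.
    """
    chunks = [text[i:i + width] for i in range(0, len(text), width)]
    return [c + " " for c in chunks[:-1]] + chunks[-1:]


def flow_decode(lines, delsp):
    """Re-join flowed lines; with delsp the marker space is deleted."""
    out = []
    for line in lines:
        if line.endswith(" ") and delsp:
            out.append(line[:-1])   # DelSp=yes: drop the soft-break space
        else:
            out.append(line)        # DelSp=no: the space stays in the text
    return "".join(out)


text = "測試" * 50                   # 100 characters, no spaces
lines = flow_encode(text, 36)

print(flow_decode(lines, delsp=True) == text)        # True
print(flow_decode(lines, delsp=False).count(" "))    # 2
```

Without delsp the soft-break spaces survive re-flowing, which is harmless between English words but shows up as extra spaces inside CJK text.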
Comment 62 Makoto Kato [:m_kato] 2012-07-26 02:05:26 PDT
Comment on attachment 646059 [details] [diff] [review]
Force to use base64 for plain text message in multi byte and html mail

Review of attachment 646059 [details] [diff] [review]:
-----------------------------------------------------------------

Even if the charset is us-ascii, DocumentEncoder may output a character entity (&#x1234;).  EnsureLineBreaks() can break a character entity if it straddles positions 998 and 999, so you should not use EnsureLineBreaks() for text/html and should remove this.

Also, for plain text that is not multipart, use format=flowed as in comment #61.
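
The entity problem can be illustrated with a purely position-based break. The function `naive_break` below is a hypothetical stand-in for any splitter that cuts at a fixed byte offset; it is not the actual EnsureLineBreaks() implementation.

```python
LIMIT = 998  # assumed maximum line length in bytes


def naive_break(data, limit=LIMIT):
    """Cut a byte string every `limit` bytes, ignoring its content."""
    return [data[i:i + limit] for i in range(0, len(data), limit)]


# A pure us-ascii HTML line whose character entity straddles the limit.
line = b"a" * 995 + b"&#x1234;" + b"b" * 10
first, second = naive_break(line)

print(first[-3:], second[:5])   # b'&#x' b'1234;'
```

Once a line break is inserted between the two halves, the receiver no longer sees one entity but the literal text "&#x" followed by "1234;".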
Comment 63 Hiroyuki Ikezoe (:hiro) 2012-07-26 02:14:17 PDT
(In reply to Makoto Kato from comment #62)
> Comment on attachment 646059 [details] [diff] [review]
> Force to use base64 for plain text message in multi byte and html mail
> 
> Review of attachment 646059 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Even if charset is us-ascii, DocumentEncoder may output character entity
> (&#x1234;).  EnsureLineBreaks() can break character entity if it is between
> 998 and 999, so You should not use EnsureLineBreaks() for text/html and
> should remove this.

Yes, I've noticed it. Do you know where in the code this is done?
Comment 64 Hiroyuki Ikezoe (:hiro) 2012-07-26 02:17:19 PDT
Oops! Wait! I just noticed that mkato has restarted work on fixing bug 26734.

That code overlaps with mine, so I will stop writing the fix for this issue for now.
Comment 65 Hiroyuki Ikezoe (:hiro) 2012-07-26 02:48:29 PDT
Created attachment 646075 [details] [diff] [review]
Force to use base64 for html message

For the record, I attach the current WIP patch.

I think this patch works fine in most HTML mail cases.
This patch also solves us-ascii HTML issue mentioned in comment 62.

I will rework this after the patch for bug 26734 lands, if there is still work to do for this issue.
Comment 66 Makoto Kato [:m_kato] 2012-07-26 20:00:02 PDT
I will discuss this with Ikezoe-san tomorrow.
Comment 67 Hiroyuki Ikezoe (:hiro) 2012-07-28 18:05:26 PDT
Created attachment 646928 [details] [diff] [review]
An xpcshell test for savin long line CJK text as draft
Comment 68 Hiroyuki Ikezoe (:hiro) 2012-07-28 19:17:28 PDT
Created attachment 646934 [details] [diff] [review]
An xpcshell test for savin long line CJK text as draft

The last test has a garbage...
Comment 69 Hiroyuki Ikezoe (:hiro) 2012-07-28 22:43:06 PDT
Created attachment 646943 [details] [diff] [review]
xpcshell tests for saving as draft and send message
Comment 70 WADA 2012-07-29 22:52:40 PDT
As for the text/html part and "inserted CRLF for folding + a space for folding + spaces for HTML source indentation", the "additional space between two CJK chars in HTML rendering" was already resolved by bug 135323 (fixed on 2002-07-08).
However, the problem of bug 156369 still exists in text/html (cases like CJK<span>[CRLF]CJK).
Needless to say, the problem of "wrap inside an attribute value of an HTML tag" also exists in the text/html part.

These are not directly related to the text/plain part, but they may cause this bug in the text/plain part if it is generated by html2textconverter.
So, if an HTML mail is sent as text/plain only, due to the automatic downgrade to text mail by Option/Format/Auto Detect, this bug always occurs on the mail recipient's side.

Ikezoe-san, Kato-san, will this worst case be resolved by the fix for this bug?
Comment 71 Hiroyuki Ikezoe (:hiro) 2012-07-30 02:28:42 PDT
I suppose the worst case is a delsp=yes case. So the case will be solved by the fix for that bug.
Comment 72 Makoto Kato [:m_kato] 2012-08-01 00:15:53 PDT
(In reply to WADA from comment #70)
> As for text/html part and "inserted CRLF for folding + a space for folding +
> spaces for HTML source indention", "additional space betwenn two CJK chars
> in HTML rendering" is already resolved by 135323(fixed on 2002-07-08).
> However, problem of bug 156369 stil exists(CJK<span>[CRLF]CJK like one) in
> text/html.
> Needless to say, problem of "wrap in attribute value of a HTML tag" exists
> in text/html part.

If we don't use the formatted flag for the document encoder, the HTML isn't wrapped.  And if we use base64 for text/html, there is no need to take care of wrapping for the HTML part.

The text/plain part is then generated from this HTML, and the flowed and delsp flags handle its wrapping well.

So we can ignore bug 156369 when using the raw flag for the document encoder together with base64.

mailnews should not handle wrapping, because the document encoder in Gecko can handle it.  Even the current code in mailnews breaks the HTML structure.


> These are not directly related to text/plain part, but it may cause this bug
> in text/plain part, if text/plain part is generated by html2textconverter.
> So, if HTML mail is sent in text/plain only due to automatic downgrade to
> text mail by Option/Format/Auto Detect, this bug always occurs at mail
> recipient side.

Although I tested some cases, I cannot reproduce this issue.  It should be filed against Gecko if possible.  I am not aware of this issue in the plain text serializer.
Comment 73 WADA 2012-08-01 01:15:33 PDT
(In reply to Makoto Kato from comment #72)
> > So, if HTML mail is sent in text/plain only due to automatic downgrade to
> > text mail by Option/Format/Auto Detect, this bug always occurs at mail
> > recipient side.
> Although I test some cases, I cannot reproduce this issue.  It should be
> filed to Gecko if possible.  I don't know this issue for plain text
> serializer.

The text/plain part of the mail data I attached was generated as follows.
(1) Compose a mail in HTML mode.
(2) Send Later. The text/plain part and/or text/html part is sent according to one of the following;
    Option/Format : (2-1) Rich Text(HTML) and Plain Text
                    (2-2) Rich Text(HTML) Only
                    (2-3) Auto-Detect
(2-1) Rich Text(HTML) and Plain Text :
The text/plain part and text/html part are embedded in multipart/alternative.
The data of the text/plain part is generated from the HTML data by Tb.
(2-2) Rich Text(HTML) Only
text/html is sent.
(2-3) Auto-Detect :
If not all mail recipients have a preference of HTML mail in Address Book
(i.e. one of the recipients has a preference of "Text mail" or "Unknown", or is not defined in Address Book),
and if the HTML doesn't have sufficient formatting which requires HTML (e.g. a text-only HTML mail),
Auto-Detect automatically/silently downgrades to text/plain mail, and sends text/plain mail even though the mail was composed in HTML mode.
In this case, the mail data is the same as the text/plain part of (2-1).
This is a known issue, but it is currently by design/implementation of Auto-Detect.

If (2-1) and the recipient uses View/Message Body As/Plain Text, Tb shows the text/plain part under multipart/alternative, so the problem of this bug is exposed to the mail recipient.
If (2-2) and the recipient uses View/Message Body As/Plain Text, Tb shows text data converted from the data in text/html, so the problem of this bug is exposed to the mail recipient.
If (2-3), there is no text/html part, only text/plain data. So, this bug is always exposed to the mail recipient.

"Sending the text/plain part with delsp=yes in (2-1)" can't resolve this bug's problem in (2-2), unless "HTML to Text conversion of (2-1)/(2-3) upon composition" and "HTML to Text conversion of (2-2) upon mail display" are absolutely the same.

My questions were;
- Will this bug in the text/plain part of (2-1) and (2-3) be resolved?
- Will this bug in (2-2) be resolved?

Didn't you see this bug in (2-1) and/or (2-3) in your test?
Didn't you see this bug in (2-2) in your test?

Because neither wrapping nor "indentation of the HTML source by spaces for readability of the HTML source" will happen in the text/html part data after the fix for this bug, will the problem no longer occur in (2-1)/(2-3) and (2-2)?
Comment 74 Hiroyuki Ikezoe (:hiro) 2012-08-02 02:05:46 PDT
Created attachment 648251 [details] [diff] [review]
Possible fix

While I was writing the test code, I was also able to write a fix for this issue.

This patch encodes HTML messages with base64.
Comment 75 Hiroyuki Ikezoe (:hiro) 2012-08-02 02:06:55 PDT
Created attachment 648252 [details] [diff] [review]
xpcshell tests
Comment 77 Hiroyuki Ikezoe (:hiro) 2012-08-02 02:19:19 PDT
(In reply to WADA from comment #73)
> text/plain part of attached mail data by me is generated by following.
> (1) Compose a mail in HTML mode.
> (2) Send Later. text/plain part and/or text/plain part is sent by one of
> next;
>     Option/Format : (2-1) Rich Text(HTML) and Plain Text
>                     (2-2) Rich Text(HTML) Only
>                     (2-3) Auto-Detect
> (2-1) Rich Text(HTML) and Plain Text :
> text/plain part and text/html part are embed in mulripart/alternative.
> data of text/plain part is generated from HTML data by Tb.
> (2-2) Rich Text(HTML) Only
> text/html is sent.
> (2-3) Auto-Detect :

attachment 648251 [details] [diff] [review] fixes (2-2) and the HTML part of (2-1). I mean that every HTML message is encoded with base64 (without extra spaces). The attachment also fixes (2-3) if the message has an HTML part.
Comment 78 Hiroyuki Ikezoe (:hiro) 2012-08-02 15:40:57 PDT
try server result:
https://tbpl.mozilla.org/?tree=Thunderbird-Try&rev=7e69fe1984c9
Comment 79 xunxun 2012-08-07 21:36:06 PDT
(In reply to Hiroyuki Ikezoe (:hiro) from comment #78)
> try server result:
> https://tbpl.mozilla.org/?tree=Thunderbird-Try&rev=7e69fe1984c9

Does it mean that attachment 648251 [details] [diff] [review] fixes the issue on Windows?
Comment 80 Kao Shiang-Yuan 2012-08-07 22:24:54 PDT
Hello. I tried the binary in comment 78 by Hiroyuki Ikezoe on Windows 7 and Ubuntu Linux

thunderbird-17.0a1.en-US.win32.installer.exe
SHA1: 4a4ef9703bf974c7950384dfbe46a0a4ebd6a86e
OS: Windows 7 32-bit, Traditional Chinese (Taiwan) edition

thunderbird-17.0a1.en-US.linux-x86_64.tar.bz2
SHA1: 53a04899d209a74519a029efdfe9260d724f143a
OS: Ubuntu 12.04 64-bit, environment variable LANG=en_US.UTF-8

In my test, the results were the same on Windows and Linux (test_result_on_windows == test_result_on_linux).

1. Install (Windows) or extract (Linux) Thunderbird
2. Run Thunderbird - Windows: double-click the desktop shortcut, Linux: run ./thunderbird in a terminal/console
3. Set up an email account
4. Compose a message with 200, 60, 60, 60 and 200 traditional Chinese characters on successive lines, filled with the word '測試' (this word contains 2 Chinese characters), and send it

Result: There is a space every 36 Chinese characters; every line was affected.
Additional information: I can see Chinese characters in the mail source; there is no base64-encoded data.

5. In the config editor, set mail.wrap_long_lines = false (default is true)
6. Compose a message with 200, 60, 60, 60 and 200 traditional Chinese characters on successive lines

Result: There is a space every 36 Chinese characters; every line was affected.
Additional information: I can see Chinese characters in the mail source; there is no base64-encoded data.

7. In the config editor, set mailnews.wraplength = 1000 (default is 72)
8. Compose a message with 200, 60, 60, 60 and 200 traditional Chinese characters on successive lines

Result: No extra spaces.
Additional information: I can see Chinese characters in the mail source; there is no base64-encoded data.

9. Compose a message consisting of one line with 1002 Chinese characters

Result: One space every 495 Chinese characters.
Additional information: No Chinese characters in the mail source; it is base64 encoded.

10. Compose a message with 200, 400, 600, 800 and 1000 Chinese characters on successive lines

Result: The first and second lines are still intact (no extra space), but in the 3rd, 4th and 5th lines there is a space every 495 Chinese characters.
Additional information: No Chinese characters in the mail source; it is base64 encoded.


Thanks
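
The figures 36 and 495 reported above are at least consistent with a wrap that counts display columns and treats each CJK character as two columns wide. That is only a hypothesis, not something confirmed in this bug; the helper names and the 990-column budget below are assumptions chosen to illustrate the arithmetic.

```python
import unicodedata


def display_columns(text):
    """Count columns, treating East Asian wide/fullwidth chars as 2."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


def chars_per_line(ch, columns):
    """How many copies of `ch` fit into `columns` display columns."""
    return columns // display_columns(ch)


print(chars_per_line("測", 72))    # 36
print(chars_per_line("測", 990))   # 495
```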
Comment 81 Hiroyuki Ikezoe (:hiro) 2012-08-07 22:33:04 PDT
Kao, thanks for testing.

The binary only affects HTML messages. Are you sure you composed those messages in the HTML message editor?
Comment 82 Hiroyuki Ikezoe (:hiro) 2012-08-07 22:36:16 PDT
(In reply to xunxun from comment #79)
> (In reply to Hiroyuki Ikezoe (:hiro) from comment #78)
> > try server result:
> > https://tbpl.mozilla.org/?tree=Thunderbird-Try&rev=7e69fe1984c9
> 
> It means attachment 648251 [details] [diff] [review] fix the issue on
> Windows?

It means that the tests for attachment 645281 [details] [diff] [review] (i.e. attachment 648252 [details] [diff] [review]) passed on all platforms.
Comment 83 Kao Shiang-Yuan 2012-08-07 22:52:57 PDT
Sorry I didn't change the Options/Format from Auto-Detect to 'Rich Text (HTML) Only' when composing mail.
Testing in progress.
Comment 84 Kao Shiang-Yuan 2012-08-08 00:12:01 PDT
Hello. I tried the binary in comment 78 by Hiroyuki Ikezoe on Windows 7 and Ubuntu Linux, with the format set to HTML.
I did NOT reuse the test environment; I reverted the virtual machine image, so it does not contain the config editor changes from the previous tests.

thunderbird-17.0a1.en-US.win32.installer.exe
SHA1: 4a4ef9703bf974c7950384dfbe46a0a4ebd6a86e
OS: Windows 7 32-bit, Traditional Chinese (Taiwan) edition

thunderbird-17.0a1.en-US.linux-x86_64.tar.bz2
SHA1: 53a04899d209a74519a029efdfe9260d724f143a
OS: Ubuntu 12.04 64-bit, environment variable LANG=en_US.UTF-8

In my test, the test result has no difference on Windows and Linux (test_result_on_windows == test_result_on_linux)

1. Install (Windows) or extract (Linux) Thunderbird
2. Run Thunderbird - Windows: double-click the desktop shortcut, Linux: run ./thunderbird in a terminal/console
3. Set up an email account
4. Compose a message and set Options/Format from 'Auto-Detect' to 'Rich Text (HTML) Only'. Use 200, 60, 60, 60 and 200 traditional Chinese characters on successive lines, filled with the word '測試' (this word contains 2 Chinese characters), and send it

Result: No extra spaces
Additional information: The mail source contains "Content-Type: text/html; charset=UTF-8" and "Content-Transfer-Encoding: base64"

5. Compose a message and set Options/Format from 'Auto-Detect' to 'Rich Text (HTML) Only'. Use one line with 1002 Chinese characters

Result: No extra spaces
Additional information: The mail source contains "Content-Type: text/html; charset=UTF-8" and "Content-Transfer-Encoding: base64"

6. Compose a message and set Options/Format from 'Auto-Detect' to 'Rich Text (HTML) Only'. Use 200, 400, 600, 800 and 1000 Chinese characters on successive lines

Result: No extra spaces
Additional information: The mail source contains "Content-Type: text/html; charset=UTF-8" and "Content-Transfer-Encoding: base64"

7. Generate a string with following shell script
#!/bin/bash
for (( t1=0;t1<50;t1++ )); do
    echo -n "測試測試測試測試測試測試測試測試測試測試"
    echo -n " " # one space
    echo -n "測試測試測試測試測試測試測試測試測試測試"
    echo -n "  " # two spaces
    echo -n "測試測試測試測試測試測試測試測試測試測試"
    echo -n "          " # 10 spaces
    echo -n "測試測試測試測試測試測試測試測試測試測試"
    echo -n "                    " # 20 spaces
    echo -n "測試測試測試測試測試測試測試測試測試測試"
done
echo ""

8. Copy the string to Windows and Linux machines
9. Compose a message, set Options/Format from 'Auto-Detect' to 'Rich Text (HTML) Only'. Paste the generated string

Result: No extra spaces, and no missing spaces.
Additional information: The mail source contains "Content-Type: text/html; charset=UTF-8" and "Content-Transfer-Encoding: base64"

Thanks.
Comment 85 Hiroyuki Ikezoe (:hiro) 2012-08-08 01:12:30 PDT
Kao, thanks for the report. That is the behavior I expected.

Can you also check the behavior of the 'Plain and Rich Text' option? If it works correctly, extra spaces will be inserted in the plain text part and not in the HTML part.
Comment 86 asmwarrior 2012-08-08 01:37:00 PDT
QUOTE:Can you also check the behavior 'Plain and Rich Text' option? If it works correctly, extra spaces will be inserted in plain text part and not in HTML part.

Hi, I can confirm this. When I send such an email to myself, "View -> Message Body As -> Original/Simple HTML" shows the message correctly, but when I view the message body as plain text, some extra spaces are shown.

I see the message source like below:
This is a multi-part message in MIME format.
--------------090004030605080301040504
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: base64
........
........


--------------090004030605080301040504
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: base64
.......
.......


I'm testing under Windows XP.

Thanks.

asmwarrior
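
For anyone who wants to inspect such a message outside Thunderbird, here is a small sketch using Python's standard `email` package. It builds a comparable multipart/alternative message (both parts forced to base64, as in the source shown above) and prints each part's type and transfer encoding. The sample text is made up for illustration.

```python
from email.message import EmailMessage

text = "測試" * 100 + "\n"
html = "<html><body>" + "測試" * 100 + "</body></html>\n"

msg = EmailMessage()
msg.set_content(text, subtype="plain", charset="utf-8", cte="base64")
msg.add_alternative(html, subtype="html", charset="utf-8", cte="base64")

parts = list(msg.iter_parts())
for part in parts:
    # get_content() undoes the transfer encoding and the charset.
    decoded = part.get_content()
    print(part.get_content_type(),
          str(part["Content-Transfer-Encoding"]),
          " " in decoded.strip())
```

The last column shows whether the decoded part contains any space. For a message saved from Thunderbird, `email.message_from_bytes(data, policy=email.policy.default)` would provide the same `iter_parts()` interface.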
Comment 87 Hiroyuki Ikezoe (:hiro) 2012-08-08 01:53:01 PDT
Thanks for the additional test.

(In reply to asmwarrior from comment #86)
> QUOTE:Can you also check the behavior 'Plain and Rich Text' option? If it
> works correctly, extra spaces will be inserted in plain text part and not in
> HTML part.
> 
> Hi, I can confirm this, when I send such email to myself, I see that
> "view->message body as->  original/simple html" show the message correctly,
> but when view the message body as plain text, there are some extra spaces
> added shown.
> 
> I see the message source like below:
> This is a multi-part message in MIME format.
> --------------090004030605080301040504
> Content-Type: text/plain; charset=UTF-8; format=flowed
> Content-Transfer-Encoding: base64

Unfortunately that is not what I expected. The Content-Transfer-Encoding should be 8-bit in this case.
I will investigate it.
Comment 88 WADA 2012-08-08 16:29:11 PDT
Quick check results with the try server build: HTML composition, Options/Format=HTML & Text, Options/Character Encoding=iso-2022-jp/utf-8/iso-8859-1(default), and a single line of 4000*(a Japanese character).

(1) text/plain part, checked with mail.wrap_long_lines=true
(1-1) iso-2022-jp, mailnews.wraplength=72
  Because of iso-2022-jp, format=flowed is prohibited internally,
  so format=flowed was not used.
  Sent with Content-Transfer-Encoding: 7bit
  With mailnews.wraplength=72, the text was wrapped at 72 characters (not at 72 bytes).
  It looks like "wrap at wraplength chars" rather than "wrap at wraplength bytes".
  Excess space was not observed.
(1-2) charset=utf-8/iso-2022-jp, mailnews.wraplength=0(==no limit)
  utf-8 : format=flowed, iso-2022-jp : no format=flowed
  Content-Transfer-Encoding: base64
  With mailnews.wraplength=0 and the SMTP limit, "wrap around 990 bytes" was observed.
  For a long run of {3 bytes utf-8 code for a Japanese char}, the following was seen.
    N * {3 bytes utf-8 code for a Japanese char} + 0x20 + [CRLF]
  This excess space was not observed in the iso-2022-jp case.
  It looks like "wrap at character boundary" rather than simple "wrap at 990 bytes".
  A wrap in the middle of 3-byte data (utf-8, escape sequence of iso-2022-jp) was not observed.
  So, corruption of text data in text/plain was not observed.
(1-3) charset=iso-8859-1(default)
  Subject: = ascii only, body = Japanese chars only.
  Even though Japanese char is pasted and used, sent in iso-8859-1.
  Content-Type: text/plain; charset=ISO-8859-1; format=flowed
  Content-Transfer-Encoding: quoted-printable
  All text was ascii "?".
  Automatic UTF-8 use is killed?
  Affected by System charset? (as Japanese Win-XP, it's Shift_JIS)

By the way, mailnews.display.show_all_body_parts_menu=true & View/Message Body As/All Body Parts is useful when testing this bug. You can save the decoded form of the "message body of a text mail" or of "each sub part under multipart/alternative", because the message body or sub part is shown as if it were an attachment by "All Body Parts".
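
The "wrap at character boundary" behavior described in (1-2) can be sketched as follows. This is an illustration of the idea only, not the mailnews implementation; `wrap_bytes_at_char_boundary` is a hypothetical helper, and the approach is valid only for stateless encodings such as UTF-8 (it does not handle the escape sequences of iso-2022-jp).

```python
def wrap_bytes_at_char_boundary(text, limit, encoding="utf-8"):
    """Split text so each piece encodes to at most `limit` bytes,
    never cutting inside a multi-byte character."""
    lines, current, used = [], [], 0
    for ch in text:
        size = len(ch.encode(encoding))
        if current and used + size > limit:
            lines.append("".join(current))
            current, used = [], 0
        current.append(ch)
        used += size
    if current:
        lines.append("".join(current))
    return lines


lines = wrap_bytes_at_char_boundary("あ" * 4000, 990)
print(len(lines), len(lines[0]), len(lines[0].encode("utf-8")))  # 13 330 990
```

With 3-byte characters, 330 characters fill exactly 990 bytes, so every break falls between characters and joining the pieces restores the original text.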
Comment 89 Hiroyuki Ikezoe (:hiro) 2012-08-08 16:36:10 PDT
(In reply to WADA from comment #88)
> (1-3) charset=iso-8859-1(default)
>   Subject: = ascii only, body = Japanese chars only.
>   Even though Japanese char is pasted and used, sent in iso-8859-1.
>   Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>   Content-Transfer-Encoding: quoted-printable
>   All text was ascii "?".
>   Automatic UTF-8 use is killed?
>   Affected by System charset? (as Japanese Win-XP, it's Shift_JIS)

That is because mailnews.send_default_charset is ISO-8859-1. The value is usually set in localized builds of Thunderbird, and that try build is not localized.
Comment 90 WADA 2012-08-08 19:48:54 PDT
(In reply to Hiroyuki Ikezoe (:hiro) from comment #89)
> (In reply to WADA from comment #88)
> > (1-3) charset=iso-8859-1(default)
> >   Subject: = ascii only, body = Japanese chars only.
> >   Even though Japanese char is pasted and used, sent in iso-8859-1.
> >   Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> >   Content-Transfer-Encoding: quoted-printable
> >   All text was ascii "?".
> >   Automatic UTF-8 use is killed?
> >   Affected by System charset? (as Japanese Win-XP, it's Shift_JIS)
> Because mailnews.send_default_charset is ISO-8859-1. The value is usually
> set in localized Thunderbird. That built binary is not localized.

The text/html part was also sent in ISO-8859-1, but because it is HTML, character entities were used and no problem occurred in the text/html part (&#12354; == あ).
> <html><head><meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type"></head><body text="#000000" bgcolor="#FFFFFF">&#12354;&#12354;...
If non-ascii is used in Subject, Tb silently sent utf-8 encoded Subject: and text/html & text/plain with charset=utf-8, even when mailnews.send_default_charset is ISO-8859-1 (silently==without asking for utf-8 use).

Whose problem?
- Conversion of character entity in HTML to Text upon composition.
- Automatic change to utf-8 when non ascii character is used in mail.
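
The two outcomes described above ("?" in text/plain, a character entity in text/html) match what any encoder does when a character has no representation in ISO-8859-1. A minimal Python illustration using the standard codec error handlers (this shows the general effect, not Thunderbird's code path):

```python
ch = "あ"

# No fallback available: the character is replaced by "?".
plain = ch.encode("iso-8859-1", errors="replace")

# HTML can fall back to a numeric character reference instead.
html = ch.encode("iso-8859-1", errors="xmlcharrefreplace")

print(plain)   # b'?'
print(html)    # b'&#12354;'
```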
Comment 91 Hiroyuki Ikezoe (:hiro) 2012-08-08 20:34:19 PDT
(In reply to WADA from comment #90)
> If non-ascii is used in Subject, Tb silently sent utf-8 encoded Subject: and
> text/html & text/plain with charset=utf-8, even when
> mailnews.send_default_charset is ISO-8859-1 (silently==without asking for
> utf-8 use).
> 
> Whose problem?
> - Conversion of character entity in HTML to Text upon composition.
> - Automatic change to utf-8 when non ascii character is used in mail.

I do not know exactly, but it is not related to attachment 648251 [details] [diff] [review].
Comment 92 WADA 2012-08-08 20:43:31 PDT
Problem in text/html part.

When following was entered at HTML mail composition window,
  Note: [Enter] in this context == press Enter key to force line break 
        ... == consecutive same character
> ああ...ああ[Enter][Enter]
> いい...いい[Enter][Enter]
> うう...うう[Enter][Enter]
> -- (and signature text from predefined signature file follows)
generated HTML was following(no New Line until signature indicator of "-- "),
  Note: [LF] = 0x0A, [EOF] = End of file
> <html><head><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"></head><body bgcolor="#FFFFFF" text="#000000">ああ...ああ<br><br>いい...いい<br><br>うう...うう<br><br><pre class="moz-signature" cols="800">-- [LF]
> xxxx xxxx xxxx</pre></body></html>[EOF]
and, because of long HTML source line, it was sent in base64.
> Content-Type: text/html; charset=UTF-8
> Content-Transfer-Encoding: base64

Is "no line break, or only a few [LF], in the HTML source" intentional or by design?
Because this is HTML, I think it is better to insert "[LF] (is [CRLF] needed?) before an HTML tag start" or "[LF] (is [CRLF] needed?) after an HTML tag end" in the text/html part, except "after <pre>" and "before </pre>".
Is it possible and easy?
Comment 93 Hiroyuki Ikezoe (:hiro) 2012-08-08 21:13:20 PDT
(In reply to WADA from comment #92)
> Problem in text/html part.
> 
> When following was entered at HTML mail composition window,
>   Note: [Enter] in this context == press Enter key to force line break 
>         ... == consecutive same character
> > ああ...ああ[Enter][Enter]
> > いい...いい[Enter][Enter]
> > うう...うう[Enter][Enter]
> > -- (and signature text from predefined signature file follows)
> generated HTML was following(no New Line until signature indicator of "-- "),
>   Note: [LF] = 0x0A, [EOF] = End of file
> > <html><head><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"></head><body bgcolor="#FFFFFF" text="#000000">ああ...ああ<br><br>いい...いい<br><br>うう...うう<br><br><pre class="moz-signature" cols="800">-- [LF]
> > xxxx xxxx xxxx</pre></body></html>[EOF]
> and, because of long HTML source line, it was sent in base64.
> > Content-Type: text/html; charset=UTF-8
> > Content-Transfer-Encoding: base64
> 
> Is "no line break or a few [LF] in HTML source" intentional or design?

'no line break' is intentional, but LF in signature is not intentional.

> Because HTML, I think "[LF]([CRLF] is needed?) before HTML tag start" or
> "[LF]([CRLF] is needed?) after HTML tag end" is better inserted in text/html
> part, except "after <pre>" and "before </pre>".
> Is it possible and easy?

It's possible, but it's not so easy.
If we take the approach of base64-encoding the HTML message, the HTML message will have no line feeds.
I suppose you will have to wait for bug 26734 if you need line feeds in the HTML message.
Comment 94 WADA 2012-08-08 23:09:26 PDT
(In reply to Hiroyuki Ikezoe (:hiro) from comment #91)
> (In reply to WADA from comment #90)
> > If non-ascii is used in Subject, Tb silently sent utf-8 encoded Subject: and
> > text/html & text/plain with charset=utf-8, even when
> > mailnews.send_default_charset is ISO-8859-1 (silently==without asking for
> > utf-8 use).
> > Whose problem?
> > - Conversion of character entity in HTML to Text upon composition.
> > - Automatic change to utf-8 when non ascii character is used in mail.
> Though I do not know exactly, it's not related to attachment 648251 [details] [diff] [review]

(1) This problem was observed in Tb 14.0 & the Tb trunk 2012/7/18 build with Options/Format/HTML and Text and mailnews.wraplength=0.
=> Existing regression.
(2) The problem of "?" in text/plain was not observed with Options/Format/Auto-Detect and recipient's preference=Plain Text (Auto-Detect downgrades to text/plain).
=> The text/plain (mail) produced by Auto-Detect differed from the text/plain (subpart of multipart/alternative) produced by Options/Format/HTML and Text. It is perhaps similar to text mode composition.

Sorry for my confusion.
Comment 95 Hiroyuki Ikezoe (:hiro) 2012-08-09 00:16:33 PDT
(In reply to WADA from comment #88)

> (1-3) charset=iso-8859-1(default)
>   Subject: = ascii only, body = Japanese chars only.
>   Even though Japanese char is pasted and used, sent in iso-8859-1.
>   Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>   Content-Transfer-Encoding: quoted-printable
>   All text was ascii "?".

I cannot reproduce this issue on my local Linux machine.

Please attach the problematic message body text here.
Comment 96 Hiroyuki Ikezoe (:hiro) 2012-08-09 00:24:13 PDT
(In reply to Hiroyuki Ikezoe (:hiro) from comment #93)
> (In reply to WADA from comment #92)
> > Problem in text/html part.
> > 
> > When following was entered at HTML mail composition window,
> >   Note: [Enter] in this context == press Enter key to force line break 
> >         ... == consecutive same character
> > > ああ...ああ[Enter][Enter]
> > > いい...いい[Enter][Enter]
> > > うう...うう[Enter][Enter]
> > > -- (and signature text from predefined signature file follows)
> > generated HTML was following(no New Line until signature indicator of "-- "),
> >   Note: [LF] = 0x0A, [EOF] = End of file
> > > <html><head><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"></head><body bgcolor="#FFFFFF" text="#000000">ああ...ああ<br><br>いい...いい<br><br>うう...うう<br><br><pre class="moz-signature" cols="800">-- [LF]
> > > xxxx xxxx xxxx</pre></body></html>[EOF]
> > and, because of long HTML source line, it was sent in base64.
> > > Content-Type: text/html; charset=UTF-8
> > > Content-Transfer-Encoding: base64
> > 
> > Is "no line break or a few [LF] in HTML source" intentional or design?
> 
> 'no line break' is intentional, but LF in signature is not intentional.

I was wrong. The LF is intentional (but not my doing): the signature is enclosed in 'pre', so an LF is needed after '--'.

Anyway, I will think about the LF in the signature after this bug is closed.
Comment 97 WADA 2012-08-09 08:13:08 PDT
(In reply to Hiroyuki Ikezoe (:hiro) from comment #95)
> (In reply to WADA from comment #88)
> > (1-3) charset=iso-8859-1(default)
> >   Subject: = ascii only, body = Japanese chars only.
> >   Even though Japanese char is pasted and used, sent in iso-8859-1.
> >   Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> >   Content-Transfer-Encoding: quoted-printable
> >   All text was ascii "?".
> This issue can not be reproduced on my local linux.
> Please attach the problematic message body text here.

Conditions are;
  mail.wrap_long_lines=true
  mailnews.wraplength=0
  Composing charset=iso-8859-1(default, mailnews.send_default_charset=ISO-8859-1)
  Options/Format = HTML and Text
  Subject: ascii-subject
  message body text = 2000 * あ (Shift_JIS=0x82A0, utf-8=0xE3 0x81 0x82, U+3042)
  Japanese MS Windows(system charset=Shift_JIS). This may be relevant.
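
The code values listed for あ can be double-checked with Python's built-in codecs; this is just a verification aid for the conditions above.

```python
ch = "あ"

print(hex(ord(ch)))                  # 0x3042
print(ch.encode("shift_jis").hex())  # 82a0
print(ch.encode("utf-8").hex())      # e38182
```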
Comment 98 Hiroyuki Ikezoe (:hiro) 2012-08-10 03:17:03 PDT
(In reply to WADA from comment #97)
> (In reply to Hiroyuki Ikezoe (:hiro) from comment #95)
> > (In reply to WADA from comment #88)
> > > (1-3) charset=iso-8859-1(default)
> > >   Subject: = ascii only, body = Japanese chars only.
> > >   Even though Japanese char is pasted and used, sent in iso-8859-1.
> > >   Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> > >   Content-Transfer-Encoding: quoted-printable
> > >   All text was ascii "?".
> > This issue can not be reproduced on my local linux.
> > Please attach the problematic message body text here.
> 
> Conditions are;
>   mail.wrap_long_lines=true
>   mailnews.wraplength=0
>   Composing charset=iso-8859-1(default,
> mailnews.send_default_charset=ISO-8859-1)
>   Options/Format = HTML and Text
>   Subject: ascii-subject
>   message body text = 2000 * あ (Shift_JIS=0x82A0, utf-8=0xE3 0x81 0x82,
> U+3042)
>   Japanese MS Windows(system charset=Shift_JIS). This may be relevant.

Thanks, I can see the issue locally, but it can also be seen without attachment 648251 [details] [diff] [review].
Comment 99 WADA 2012-08-10 16:36:34 PDT
Additional quick check results.
(A) Behavior of the text/plain part of multipart/alternative (HTML mode composition, Options/Format=HTML and Text) depended on the type of Japanese character.
  wraplength=160, text/plain part, utf-8
  4000 * あ : Sent in Content-Transfer-Encoding: 8bit (not base64, i.e. wrapped)
  4000 * １ : Sent in Content-Transfer-Encoding: base64.
This is probably due to the different Unicode categories of the characters.
  あ : Hiragana
  １ : Full-width roman characters and half-width katakana
For full-width roman characters, the treatment looks similar to that of English characters, perhaps because wrapping in the middle of １２３４５６７８９０ is better avoided.
(B) In text mode composition, Bug 355209 was still observed.
    - wrap in the middle of a 3-byte utf-8 sequence
    - wrap without regard for the escape sequences of iso-2022-jp
This kind of corruption is not observed in the text/plain part produced by HTML mode composition. The difference may be that:
  - wrap in text mode composition is at wraplength bytes
  - wrap in the text/plain part is at wraplength unicode characters
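To illustrate the bytes-versus-characters distinction in (B), here is a minimal Python sketch. It is not Thunderbird's code: the helper names wrap_bytes, wrap_chars and is_valid are made up for illustration, and the width of 4 is chosen only to keep the output small.

```python
def wrap_bytes(text: str, width: int, encoding: str = "utf-8") -> list:
    """Naive wrap: cut the encoded byte string every `width` bytes."""
    raw = text.encode(encoding)
    return [raw[i:i + width] for i in range(0, len(raw), width)]


def wrap_chars(text: str, width: int, encoding: str = "utf-8") -> list:
    """Wrap every `width` Unicode characters, then encode each line."""
    return [text[i:i + width].encode(encoding)
            for i in range(0, len(text), width)]


def is_valid(line: bytes, encoding: str = "utf-8") -> bool:
    """True if the line decodes on its own, i.e. no character was split."""
    try:
        line.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


body = "あ" * 10  # each あ is 3 bytes in UTF-8 (0xE3 0x81 0x82)
byte_lines = wrap_bytes(body, 4)
char_lines = wrap_chars(body, 4)

# A width of 4 bytes never lines up with 3-byte characters,
# so every byte-wrapped line starts or ends mid-character.
print([is_valid(line) for line in byte_lines])
print([is_valid(line) for line in char_lines])  # → [True, True, True]
```

The same reasoning applies to iso-2022-jp, where a cut by bytes can additionally separate text from the escape sequence that switches character sets.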
Comment 100 xunxun 2012-08-25 07:24:53 PDT
Has the attachment 648251 [details] [diff] [review] completed the prerequisite test?

Do we also need the wraplength option?
Comment 101 Hiroyuki Ikezoe (:hiro) 2012-08-25 16:36:43 PDT
FYI, the wraplength option is meaningless in html composition mode with attachment 648251 [details] [diff] [review] because the attachment always encodes the html message with base64.
Comment 102 Hiroyuki Ikezoe (:hiro) 2012-08-26 20:27:21 PDT
(In reply to WADA from comment #99)
> Additional quick check results.
> (A) Behavior on text/plain part of multipart/alternative(HTML mode
> composition, Options/Format=HTML and Text) depended on Japanese character
> type.
>   wraplength=160, text/plain part, utf-8
>   4000 * あ : Sent in Content-Transfer-Encoding: 8bit (not base64, i.e.
> wrapped)
>   4000 * １ : Sent in Content-Transfer-Encoding: base64.

Thanks. I've finally reproduced this issue on my local machine, but I can also see this issue without attachment 648251 [details] [diff] [review].

> (B) If text mode composition, Bug 355209 was still observed.
>     - wrap at mid of 3 bytes utf-8 code
>     - wrap without care for escape sequence of iso-2022-jp
> This kind of corruption is not observed in text/plain part by HTML mode
> composition. It may be difference between;
>   - wrap in text mode composition is wrap at wraplength bytes
>   - wrap in text/plain part is wrap at wraplength unicode characters

The wrap length issue on text mode is for bug 26734.
Comment 103 Hiroyuki Ikezoe (:hiro) 2012-08-26 20:29:41 PDT
(In reply to xunxun from comment #100)
> Has the attachment 648251 [details] [diff] [review] completed the
> prerequisite test?

Yes. Now I suppose attachment 648251 [details] [diff] [review] has no regression.
Comment 104 Jim Porter (:squib) 2012-09-18 10:39:36 PDT
Comment on attachment 648251 [details] [diff] [review]
Possible fix

Stealing review; hopefully I'll get to it by this weekend.
Comment 105 Jim Porter (:squib) 2012-09-18 10:47:41 PDT
Comment on attachment 648251 [details] [diff] [review]
Possible fix

Review of attachment 648251 [details] [diff] [review]:
-----------------------------------------------------------------

Just a quick pass before I review this properly...

::: mailnews/compose/src/nsMsgSend.cpp
@@ -1751,5 @@
> -  //
> -  // XXX TODO
> -  // march backwards and determine the "best" place for the linebreak
> -  // for example, we don't want <a hrLINEBREAKref=""> or <bLINEBREAKr>
> -  // or "MississLINEBREAKippi"

Why did you remove these comments? Do they no longer apply?

@@ -1791,4 @@
>    }
> -  else {
> -     // body did not require any additional linebreaks, so just use it
> -     // body will not have any null bytes, so we can use PL_strdup

As above, I think we should keep this comment (with appropriate modifications).
Comment 106 Jim Porter (:squib) 2012-10-01 19:08:53 PDT
With the above changes fixed, and a try server run with the attached tests passing, I think this looks ok.
Comment 107 persona.org 2012-10-18 19:11:44 PDT
This is a serious problem for CJK users. 
Please kindly patch this to Thunderbird as soon as possible.

Thanks!
Comment 108 Jim Porter (:squib) 2012-10-18 19:17:45 PDT
Comment on attachment 648251 [details] [diff] [review]
Possible fix

Clearing out review on this until I see a passing try server run (mostly so Bugzilla stops mailing me).
Comment 109 Kent James (:rkent) 2012-11-01 08:59:22 PDT
I'm confused about the status of this, and what is needed to move it forward. Are we waiting for someone to push a try server run of https://bugzilla.mozilla.org/attachment.cgi?id=648251?

The existing patch has bit-rotted.
Comment 110 Yukinoroh 2012-11-19 08:18:55 PST
*** Bug 704441 has been marked as a duplicate of this bug. ***
Comment 111 Yukinoroh 2012-11-19 08:29:33 PST
Firefox has the same problem, see bug 535485.
Comment 112 Yukinoroh 2012-11-19 08:45:43 PST
I might have a different view here, but I've been using Thunderbird with mailnews.wraplength set to 0 for a year now, with no problems. What about simply setting the default value of mailnews.wraplength to 0 for CJK locales (and any other language that doesn't use spaces)?

I sometimes send email in French or English with no problems either under that setting. What is the use of wrapping text anyway?
Comment 113 huangjs 2012-11-22 21:20:12 PST
I just changed this attribute but the wrapping spaces are still being generated. 
BTW, why is it so hard to pass regression testing and apply this patch?
It's a notorious bug, known to CJK users across many versions of Thunderbird.
Please kindly apply it.
I still don't see it in the 20.0 nightly build.

Thanks! 

(In reply to Yukinoroh from comment #112)
> I might have a different thinking here, but I've been using Thunderbird with
> mailnews.wraplength set to 0 for a year now, with no problems. What about
> simply setting the default value of mailnews.wraplength to 0 for CJK
> locales? (and any other language that don't use spaces)
> 
> I sometimes send email in French or English with no problems either under
> that setting. What is the use of wrapping text anyway?
Comment 114 Yukinoroh 2012-11-22 21:48:14 PST
(In reply to huangjs from comment #113)
> I just changed this attribute but the wrapping spaces are still being
> generated. 

Really? What version are you using? Did you try to restart the program? Using 16.0.0 here.
Comment 115 Yukinoroh 2012-11-22 21:49:21 PST
(In reply to Yukinoroh from comment #114)

Oops, I meant 16.0.1
Comment 116 mr.kang.chen 2012-11-22 22:49:01 PST
When will this be patched? The bug is so annoying for every CJK user.
Comment 117 Mike Conley (:mconley) - (Needinfo me!) 2012-12-21 06:49:39 PST
Hiro - why haven't you asked for review on these patches?
Comment 118 WADA 2013-04-12 22:43:07 PDT
*** Bug 785706 has been marked as a duplicate of this bug. ***
Comment 119 kewang 2013-04-23 01:00:20 PDT
I have the same issue, really annoying......
Comment 120 asmwarrior 2013-06-22 06:06:40 PDT
Ping. This bug still exists in TB 17.06. Can any developer review the patches and apply them to trunk? Thanks.
Comment 121 Jim Porter (:squib) 2013-06-22 09:53:20 PDT
(In reply to asmwarrior from comment #120)
> Ping, this bug still exists in TB 17.06, any developers can review the
> patches, and apply to trunk? Thanks.

I've reviewed the patches and asked for some changes (and for a try server run so I can see the tests - though I could do this if need be). However, the author of the patch hasn't replied.
Comment 122 asmwarrior 2013-06-22 19:17:10 PDT
Hi Jim Porter, thanks for the reply. It looks like the author of the patch (Hiroyuki Ikezoe) was last active nearly a year ago, on 2012-08-26. I suspect he has left his original company, or his email address has changed, or for some other reason he does not receive bug notifications. So, should we just wait for his response? I would suggest that anyone who has the ability (surely you are one of them) go ahead. I can test the nightly build once such a patch (or a modified/improved patch) is in trunk. Thanks.
Comment 123 WADA 2013-06-24 00:40:05 PDT
FYI.

Even if this bug occurs in HTML mail composition, Tb sends a text/html part as long as the "HTML mail is sent as text/plain" problem caused by the following bugs doesn't occur:
  bug 136502, bug 414299, bug 584363.
If mail is sent as text/html or multipart/alternative{text/plain+text/html}, the text/html part is usually the one used for viewing. And in Tb, a quirk like "new line between CJK chars in HTML mail == null" perhaps applies in HTML mail display.

i.e.
This bug is exposed to many users only when the "HTML mail is sent as text/plain" problem occurs at the same time.
How to avoid bug 136502, bug 414299, bug 584363:
 - <b></b> in HTML signature file.
 - In address book, set format preference=HTML for any contact,
   except contact to whom TEXT mail should be sent always.
 - Upon each HTML mail send, select format option=HTML or "HTML + Text"
Comment 124 Hiroyuki Ikezoe (:hiro) 2013-06-24 01:43:24 PDT
I am sorry for the absence.

(In reply to Jim Porter (:squib) (back Jul 1) from comment #105)
> Comment on attachment 648251 [details] [diff] [review]
> Possible fix
> 
> Review of attachment 648251 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Just a quick pass before I review this properly...
> 
> ::: mailnews/compose/src/nsMsgSend.cpp
> @@ -1751,5 @@
> > -  //
> > -  // XXX TODO
> > -  // march backwards and determine the "best" place for the linebreak
> > -  // for example, we don't want <a hrLINEBREAKref=""> or <bLINEBREAKr>
> > -  // or "MississLINEBREAKippi"
> 
> Why did you remove these comments? Do they no longer apply?

Those HTML cases will never happen because EnsureLineBreaks will never be invoked for HTML message.
The 'Mississippi' case is supposed to be handled already in nsIEditor, but I am not 100% sure, so I will leave the 'Mississippi' case in.


> @@ -1791,4 @@
> >    }
> > -  else {
> > -     // body did not require any additional linebreaks, so just use it
> > -     // body will not have any null bytes, so we can use PL_strdup
> 
> As above, I think we should keep this comment (with appropriate
> modifications).

I don't think the comment is useful in the patch. In the patch, PL_strdup is used in both cases.
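For readers unfamiliar with the TODO being discussed, the idea of "marching backwards" to find a better place for a forced line break can be sketched in Python as follows. This is an illustration only, not the logic of EnsureLineBreaks or of the patch: the function name break_line is hypothetical, the body is assumed to be UTF-8, and only spaces are considered as preferred break points.

```python
def break_line(raw: bytes, limit: int = 998):
    """Split `raw` into (head, rest) with len(head) <= limit.

    Prefer the last space at or before `limit` (the space is dropped);
    otherwise back up so that a UTF-8 character is never cut in half.
    """
    if len(raw) <= limit:
        return raw, b""

    # March backwards to the last space within the allowed length.
    cut = raw.rfind(b" ", 1, limit + 1)
    if cut != -1:
        return raw[:cut], raw[cut + 1:]

    # No space found: back up over UTF-8 continuation bytes (10xxxxxx)
    # until the cut lands on the first byte of a character.
    cut = limit
    while cut > 0 and (raw[cut] & 0xC0) == 0x80:
        cut -= 1
    return raw[:cut], raw[cut:]


print(break_line(b"hello Mississippi", 10))  # → (b'hello', b'Mississippi')

head, rest = break_line(("あ" * 5).encode("utf-8"), 4)
print(head.decode("utf-8"), rest.decode("utf-8"))  # → あ ああああ
```

For CJK text the second branch is the common one, which is why a forced break there still shows up as an unwanted space once the line break is rendered.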
Comment 125 Hiroyuki Ikezoe (:hiro) 2013-06-24 01:46:30 PDT
Created attachment 766580 [details] [diff] [review]
Encode HTML message with base64 to  avoid extra spaces in CJK text
Comment 126 Hiroyuki Ikezoe (:hiro) 2013-06-24 01:49:20 PDT
Created attachment 766581 [details] [diff] [review]
Adapt to createAndSendMessage change

This is basically the same as the previous test except for the arguments to createAndSendMessage.
Comment 127 asmwarrior 2013-06-24 03:33:22 PDT
Hi, thanks for the patches. Is a test build needed, or shall we wait for a reviewer?
Comment 128 WADA 2013-06-30 17:15:51 PDT
Comment on attachment 766580 [details] [diff] [review]
Encode HTML message with base64 to  avoid extra spaces in CJK text

Review of attachment 766580 [details] [diff] [review]:
-----------------------------------------------------------------

> +nsMsgAttachmentHandler::NeedsConvertionToPlainText()
I think NeedsConversionToPlainText() or Needs(To)ConvertToPlainText() is better.
Comment 129 WADA 2013-06-30 18:08:02 PDT
FYI.
Bug 355209 occurs even in HTML composition, even in the text/html part, if <pre> is used and "lo---ng text without line break" is typed or pasted.

Because of <pre>, "Wrap at 80 unicode characters by HTML editor" doesn't occur. So a split at the SMTP line length limit (== split at LINE_BREAK_MAX) occurs even in the text/html part of multipart/alternative or text/html mail.
If <pre> is used, it's the same as "mailnews.wraplength=0 in Text mode composition".
Comment 130 WADA 2013-06-30 18:11:49 PDT
FYI.
LINE_BREAK_MAX was changed from "#define LINE_BREAK_MAX 990" to following by bug 684508(landed on Tb 10).
  #define LINE_BREAK_MAX (1000 - MSG_LINEBREAK_LEN)
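As a rough check of what this limit means for the test bodies used earlier in this bug, here is a small Python sketch. It assumes the line break is a two-byte CRLF (so the limit works out to 998 bytes); the constant and function names are chosen for illustration and are not taken from the Thunderbird source.

```python
SMTP_LINE_LIMIT = 1000   # bytes per line, including the line break
LINEBREAK_LEN = 2        # assumption: the line break is CRLF
LINE_BREAK_MAX = SMTP_LINE_LIMIT - LINEBREAK_LEN


def longest_line(body: bytes) -> int:
    """Length in bytes of the longest CRLF-delimited line."""
    return max(len(line) for line in body.split(b"\r\n"))


def needs_forced_break(body: bytes) -> bool:
    """True if some line is longer than the limit and must be split."""
    return longest_line(body) > LINE_BREAK_MAX


body = ("あ" * 2000).encode("utf-8")   # one unbroken line of 3-byte characters
print(len(body))                  # → 6000
print(needs_forced_break(body))   # → True
print(LINE_BREAK_MAX % 3)         # → 2
```

Since 998 is not a multiple of 3, a cut at exactly LINE_BREAK_MAX bytes lands inside a 3-byte UTF-8 character, which matches the "wrap at mid of 3 bytes utf-8 code" symptom reported above.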
Comment 131 WADA 2013-07-12 00:38:21 PDT
Putting referred bugs in the "See Also:" field.
Comment 132 xunxun 2013-09-20 18:10:56 PDT
(In reply to asmwarrior from comment #127)
> Hi, thanks for the patches, so any testing build is necessary? Or shall we
> waiting for a reviewer?

I used the patch to build Thunderbird 24.0:

https://sourceforge.net/projects/pcxfirefox/files/Release/Thunderbird/24.x/x86/
Comment 133 xunxun 2013-09-20 21:41:22 PDT
(In reply to xunxun from comment #132)
> (In reply to asmwarrior from comment #127)
> > Hi, thanks for the patches, so any testing build is necessary? Or shall we
> > waiting for a reviewer?
> 
> I use the patch to build thunderbird 24.0
> 
> https://sourceforge.net/projects/pcxfirefox/files/Release/Thunderbird/24.x/
> x86/

At present I can't access SourceForge, so I uploaded a backup to http://pan.baidu.com/share/link?shareid=2253681088&uk=2365780601#dir/path=%2F%E6%88%91%E7%9A%84%E8%BD%AF%E4%BB%B6%2FpcxFirefox%E5%A4%87%E4%BB%BD
Comment 134 asmwarrior 2013-09-20 22:56:46 PDT
FYI: when you open the above link, xunxun's release is under the subfolder Release/thunderbird/24.x. This release fixes a tiny bug in the previous one at https://sourceforge.net/projects/pcxfirefox/files/Release/Thunderbird/24.x/x86/. Once SourceForge can be accessed from mainland China, the binary on SF will be updated.
Comment 135 Jim Porter (:squib) 2013-11-13 13:01:45 PST
Comment on attachment 766580 [details] [diff] [review]
Encode HTML message with base64 to  avoid extra spaces in CJK text

Review of attachment 766580 [details] [diff] [review]:
-----------------------------------------------------------------

I've taken a look at this, and I'm not sure this is actually how we want to do things. Surely there's a way to have really long lines with no spaces without resorting to base64-encoding everything. I'm not sure what that way is though, since I'm not very knowledgeable about MIME message bodies (most of my knowledge is in MIME headers).

I'm clearing out review for now, but I definitely agree that we need to do *something* here. Jcranmer might be a good person to look at this, since he knows a lot more about MIME than I do.

Sorry about taking so long on this! I've been awfully busy, and didn't really know what to do with this review. :(
Comment 136 Wayne Mery (:wsmwk, NI for questions) 2013-11-13 15:04:31 PST
joshua, comment 135
Comment 137 Joshua Cranmer [:jcranmer] 2013-11-14 07:49:30 PST
(In reply to Jim Porter (:squib) from comment #135)
> I've taken a look at this, and I'm not sure this is actually how we want to
> do things. Surely there's a way to have really long lines with no spaces
> without resorting to base64-encoding everything. I'm not sure what that way
> is though, since I'm not very knowledgeable about MIME message bodies (most
> of my knowledge is in MIME headers).

Hahahahahahahahaha. The options for message bodies are 8bit (no NUL, bare CR/LF, max-998 octet lines), 7bit (above + no characters above 0x7F), QP, base64, or binary. And the mail system doesn't support binary.

The options are:
1. Violate the standards, send 8bit with arbitrarily long lines, and hope the mail system tolerates it.
2. Violate the standards, send quoted-printable, but don't escape characters above 0x7F, and hope the mail system tolerates it.
3. Send QP/base64, according to the shorter encoding format.

[For a standard developed in part to allow internationalization of email, MIME sure blew it]
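Option 3 can be sketched with Python's standard library, purely as an illustration of "pick the shorter encoding" and not as what Thunderbird does. The function name pick_transfer_encoding and the sample strings are made up for this example.

```python
import base64
import quopri


def pick_transfer_encoding(raw: bytes):
    """Encode `raw` both ways and return (name, encoded) for the shorter one."""
    qp = quopri.encodestring(raw)    # quoted-printable, lines of at most 76 chars
    b64 = base64.encodebytes(raw)    # base64, a newline after every 76 chars
    if len(qp) <= len(b64):
        return "quoted-printable", qp
    return "base64", b64


cjk = ("あ" * 200).encode("utf-8")
latin = ("What is the use of wrapping text anyway? " * 5).encode("utf-8")

# Every byte of the CJK text is above 0x7F, so QP triples its size,
# while base64 only adds about a third.
print(pick_transfer_encoding(cjk)[0])    # → base64
print(pick_transfer_encoding(latin)[0])  # → quoted-printable
```

Either encoding keeps every transmitted line far below the 998-octet limit regardless of how long the original line was, so no break has to be inserted into the text itself.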
Comment 138 Wayne Mery (:wsmwk, NI for questions) 2014-08-14 22:48:05 PDT
(I suspect this is not a duplicate, since no one has suggested what the duplicate might be)

So who (or what group) to decide what our best shot is, relative to comment 137?

And will we depend on bug 169395?
Comment 139 Joshua Cranmer [:jcranmer] 2014-08-23 16:11:37 PDT
(In reply to Wayne Mery (:wsmwk) from comment #138)
> (I suspect this is not a duplicate, since no one has suggested what the
> duplicate might be)
> 
> So who (or what group) to decide what our best shot is, relative to comment
> 137?
> 
> And will we depend on bug 169395?

I suppose, in lieu of anyone else, that the decision would fall on me. My recollection of reading too many raw MIME messages is that the de facto answer is to send in base64.

Honestly, though, the current compose logic for this sort of stuff is so messed up and fragile that it's not worth even attempting this bug until I get the low-level MIME assembly sanified, since that will at least guarantee we can muck around with transfer encodings sanely.
Comment 140 Frederick888 2014-11-17 03:11:03 PST
I can't believe that such a serious bug has not been solved for years!

I happened to check an email I sent today and found the problem. I can't imagine how many mails with a RIDICULOUS format like this I have sent!

I do think that I need a new mail client now.
Comment 141 homoludens1000@gmail.com 2014-11-17 03:24:51 PST
Yes, it is rather annoying. I love Thunderbird and am using it on a daily basis, including loads of emails in Japanese in a professional environment. Fortunately I don't have to look at the emails AFTER I send them ...

I'm sure there are some Japanese users of Thunderbird; can I ask: How do you cope with this issue? Does this problem not occur when you work under a Japanese language environment?
Comment 142 Frederick888 2014-11-17 03:44:11 PST
(In reply to homoludens1000@gmail.com from comment #141)
> Yes, it is rather annoying. I love Thunderbird and am using it on a daily
> basis, including loads of emails in Japanese in a professional environment.
> Fortunately I don't have to look at the emails AFTER I send them ...
> 
> I'm sure there are some Japanese users of Thunderbird; can I ask: How do you
> cope with this issue? Does this problem not occur when you work under a
> Japanese language environment?

I couldn't find a way to solve the problem. I've tried to edit the wrapping settings in about:config but it didn't help. So it seems that the only way is to change your email client.

The problem does occur under a full Japanese environment - Japanese Windows, Japanese system encoding, Japanese Thunderbird. But I'm sending emails with UTF-8 because I need Japanese, Chinese and English support.
Comment 143 homoludens1000@gmail.com 2014-11-17 17:31:44 PST
I'm in a bit of a hurry so don't have time to check the thread to see whether this has been tried already, but a Japanese website I found had the following "solution" (worked apparently, although it's more of a workaround than a solution I think; see here http://soudan1.biglobe.ne.jp/qa7177921.html):

1. Open about:config
2. Create "editor.htmlWrapColumn" and set value to "0" (zero)
3. Create a new email, and set text format to "preformat" and "fixed width".

Let me know if this works, I'll give it a try later tonight.
Comment 144 Hiroyuki Ikezoe (:hiro) 2014-11-17 20:20:23 PST
htmlWrapColumn is not in Thunderbird anymore.

As per comment #139, I will work on this after all of Joshua's mime works have finished.
Comment 145 homoludens1000@gmail.com 2014-11-17 20:39:47 PST
I don't have enough in-depth knowledge of the technological fundamentals of this issue, but would it be a problem to create htmlWrapColumn? I just did and followed the instructions of comment #143, and it seems to be working fine (?)

I assume there is a fair number of Japanese, and possibly Chinese and Korean users of Thunderbird, how is everyone dealing with this issue?
Comment 146 WADA 2014-11-17 21:56:48 PST
Reason why this bug occurs is;
(A)  HTML Editor of Tb generates HTML source like next, 
       when pretty long CJK text is typed/pasted without space/new-line at appropriate position as "text in E-mail".
       Note: "Wrap at 72 unicode char in HTML mode composition" is currently hard coded.
                   (As Ikezoe-san says, htmlWrapColumn is already removed)
                  "Wrap at 72 unicode char in HTML mode composition" is done on HTML SOURCE.
                  mailnews.wraplength=nnn is "wrap at nnn BYTES", and is used in Text mode composition only.
                  "Deliver format=Plain Text in HTML mode composition" !== Text mode composition.
                  "Deliver format=Plain Text in HTML mode composition" == Send "text converted from generated HTML" as text/plain.
> <some 0x20 for indention of HTML source><72 unicode chars #1 in lo---ng text without space>[CRLF] <= inserted by Tb
> <some 0x20 for indention of HTML source><72 unicode chars #2 in lo---ng text without space>[CRLF] <= inserted by Tb
>                                                                                                       |
> <some 0x20 for indention of HTML source><72 unicode chars #N in lo---ng text without space>[CRLF] <= inserted by Tb
>                                                                                                       |
> <some 0x20 for indention of HTML source><last less than 72 unicode chars of the lo---ng text><br>[CRLF] <= inserted by Tb
(B)  In HTML specification, there is no concrete definition about "New line" in text.
       HTML spec's request is;
            Because SBCS world uses space as word delimiter but DBCS world doesn't use such space,
            interpret "New line in text" adequately, please.
       => "Wrap at a length in HTML composition mode in Tb" == "Wrap at 72 Unicode chars in HTML SOURCE"
             == "[CRLF] + some 0x20 + 72 Unicode chars" in HTML SOURCE
             is interpreted as "a 0x20" in almost all situations.

A reason why many complaints are posted.
   Because Tb has great "Automatic Downgrade to Text" feature,
   when simple HTML(for which Text mode composition is usually sufficient) is created by user,
   Tb sends the "mail composed in HTML mode" as text/plain mail with content of "Text converted from HTML source".
Because it is "Text converted from the above HTML source", the "space from the inserted CRLF" is laid out and shown pretty beautifully :-)
If sent as text/html or multipart/alternative{text/plain+text/html}, and if the mail is viewed as HTML (in Tb, View/Message Body As/Original HTML), the "space from the inserted CRLF" is not laid out so beautifully, and because a proportional font is usually used in HTML mail display, the width of the "space from the inserted CRLF" is smaller than in the "sent as text/plain by Tb" case.

A workaround of "ugly inserted space by inserted New Line" is "Accept Wrap at 72 in HTML mail display too".
(1) If HTML mode composition is not mandatory for you,
      and if you want to type/paste pretty long CJK text without space/new-line at appropriate position as "text in E-mail".,
      Use "Text mode composition" with mailnews.wraplength=72(default).
      Because Wrapped at 72 bytes in plain text, "ugly space" won't be inserted.
(2) If HTML composition is needed for you.
  - Skip Tb's "Automatic Downgrade to Text"
     (a) bgColor != "#FFFFFF", eg."#FFFFFE", color != "#000000", eg."#000001", <B>&nbsp;</B> in HTML signature, etc.
     (b) Or, Install "Always HTML" addon.
  -  Send "mail composed in HTML mode" as "HTML mail" always.
      - No "prefers Plain Text" contact in any your Address Book, No Plain Text Domain setting.
      - Send Options/Text Format : Other than "send in Plain Text"
             i.e. Ask me, or Send in HTML, or send Plain Text and HTML.
  - In HTML mode composition, avoid HTML source like above,
     when you want to type/paste pretty long CJK text without space/new-line at appropriate position as "text in E-mail".
     1. Open Text mode composition window of Tb(Shift+Write),
     2. Paste or type pretty long CJK text without space/new-line at appropriate position as "text in E-mail".
     3. Edit/Rewrap(Ctrl+R), Ctrl+A, Ctrl+C
     4. Ctrl+V(paste ) at HTML mode composition window.
         Generated HTML by this action is as follows:
> <some 0x20 for indention of HTML source><72 BYTES in lo---ng text without space><BR>[CRLF] <= inserted by Tb
         Wrap at 72 BYTES is done by Text mode composition, and it's represented as <BR> in HTML source.

As far as you don't want same display as "mailnews.wraplength=0 in Text mode composition" or "mailnews.wraplength=999999 in Text mode composition" in HTML mail composition too when you want to type/paste pretty long CJK text without space/new-line at appropriate position as "text in E-mail",
I believe that displayed result by "wrap at 72 bytes in HTML composition of above workaround" is acceptable.
I believe it's far better than "ugly inserted space by inserted [CRLF]".
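The mechanism described in (A) and (B) can be simulated in a few lines of Python. This is a simplified model rather than the editor's or the renderer's real code: wrap_html_source and render_whitespace are hypothetical names, the indentation of four spaces is arbitrary, and real HTML rendering of line breaks between CJK characters may differ between clients.

```python
import re


def wrap_html_source(text: str, width: int = 72, indent: str = "    ") -> str:
    """Model of (A): break the source every `width` characters,
    indenting each source line and joining with CRLF."""
    lines = [indent + text[i:i + width] for i in range(0, len(text), width)]
    return "\r\n".join(lines)


def render_whitespace(html_text: str) -> str:
    """Model of (B): each run of spaces and line breaks collapses to one space."""
    return re.sub(r"[ \t\r\n]+", " ", html_text).strip()


original = "あ" * 150                      # typed without any space
rendered = render_whitespace(wrap_html_source(original))

print(rendered.count(" "))                 # → 2
print(rendered.replace(" ", "") == original)  # → True
```

150 characters give three source lines, so two spaces appear in the rendered text even though the user typed none, one per inserted CRLF.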
Comment 147 homoludens1000@gmail.com 2014-11-17 22:14:30 PST
Wada, thanks so much for the summary and the workaround, that is EXTREMELY helpful and useful! :-)
Comment 148 WADA 2014-11-17 23:24:53 PST
I believe fault is in HTML Spec.
HTML Spec should have had a way to force "Newline in text==Null" for CJK world and for free HTML source layouting.
  <html format=Flowed,DelCRLF=Yes>, <div ForceCRLFisNull=true>
   <WWBR> : <WBR> who eats up following NewLine and White Spaces
               AAA<WBR>...<WWBR>some-spaces[CRLF]
               some-spaces QQQ<WBR>...
   In CSS, NewLine : CRLFOnly, LFOnly, CROnly, Any, CRLF_and_LF, CRLF_and_CR, LF_and_CR
                                 IsSpace, IsNull, IsNewLine(space or null is determined after, based on context)
                                 DelSP=Yes/No : Consecutive spaces after newline is eaten up
               <p style="NewLine: CRLFOnly, IsNull, DelSP=yes;">[CRLF]
                     any number of 0x20 for indention + line1[CRLF]
                     any number of 0x20 for indention + lineN[CRLF]
               </p>[CRLF]
   HTML Editor of Tb can freely layout HTML source, with keeping lo---ng text in a <p> without space, newline,
   without breaking line length limit in mail.

format=flowed,DelSp=Yes for text/plain is also possible for exchange of HTML data by E-mail:
    When text/html; format=flowed,DelSP=yes,
       Insert [CRLF] + 1 to N spaces at any place in HTML source.
    Upon interpreting data in text/html; format=flowed,DelSP=yes,
       If sequence of "[CRLF] and some spaces", remove it.
It can be achieved by new Content-Transfer-Encoding : HTML_with_DelSP_Yes, in addition to quoted-printable.
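For reference, the existing text/plain mechanism mentioned here (format=flowed with DelSp=yes) works roughly as in the Python sketch below. It is deliberately simplified: space-stuffing, quoting with ">" and the "-- " signature separator are ignored, a paragraph ending in a space is not handled, and the function names are invented for this example. Applying the same idea to text/html is the proposal in this comment, not an existing standard.

```python
def encode_flowed_delsp(paragraph: str, width: int = 72) -> str:
    """Break a paragraph into lines; a trailing space marks a soft break."""
    chunks = [paragraph[i:i + width] for i in range(0, len(paragraph), width)]
    return " \r\n".join(chunks) + "\r\n"


def decode_flowed_delsp(body: str) -> list:
    """Rejoin soft-broken lines, deleting the marker space (DelSp=yes)."""
    lines = body.split("\r\n")
    if lines and lines[-1] == "":
        lines.pop()                    # drop the piece after the final CRLF
    paragraphs, current = [], ""
    for line in lines:
        if line.endswith(" "):
            current += line[:-1]       # soft break: remove the space and continue
        else:
            paragraphs.append(current + line)
            current = ""
    if current:
        paragraphs.append(current)
    return paragraphs


original = "あ" * 200
wire = encode_flowed_delsp(original)
print(max(len(line) for line in wire.split("\r\n")))  # → 73
print(decode_flowed_delsp(wire) == [original])        # → True
```

The sender gets short lines on the wire, and the receiver reconstructs the text without any extra space between CJK characters.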
Comment 149 Eli4ph 2014-11-18 08:06:30 PST
(In reply to homoludens1000@gmail.com from comment #141)
> Yes, it is rather annoying. I love Thunderbird and am using it on a daily
> basis, including loads of emails in Japanese in a professional environment.
> Fortunately I don't have to look at the emails AFTER I send them ...
> 
> I'm sure there are some Japanese users of Thunderbird; can I ask: How do you
> cope with this issue? Does this problem not occur when you work under a
> Japanese language environment?

There is a third-party custom build of Thunderbird by xunxun1982, which fixes the bug and adds some features such as being portable. "xunxun1982" is the author of a popular third-party custom build of Firefox called pcxFirefox. The build of Thunderbird can be downloaded from http://sourceforge.net/projects/pcxfirefox/files/Release/Thunderbird/24.x/x86/24.4.0/. Regarding security and privacy issues, FYI, I have worked with this build for a year or more and nothing wrong has happened.

By the way, the build has four versions, zh-TW, zh-CN, ja and en-US.

Hope it helps. :)
Comment 150 KSak 2015-08-18 12:35:19 PDT
Hi all.

I just started using the latest version of TB (38.2) and noticed that this still hasn't been fixed. That is the HTML code (of the source) gets cut at 72 characters which causes a "blank space" to get inserted in the actual message of my emails.

Take note I use TB in writing emails in Japanese or English, and so far I've noticed this issue when writing in Japanese only.

I realize it's been nearly 10 months since the last comments on this bug, but after googling around I still can't seem to find a permanent workaround or good solution. I mainly use HTML composed emails and would like to write my messages without having all these random "blank spaces" (aka "hankaku" spaces in Japanese) all over the place, as it looks bad, especially when writing business emails.

The only temporary workaround I've found is to manually add "editor.htmlWrapColumn" with value=0 into the config editor and whenever I write a HTML composed email I first select "Preformat" from the paragraph list (or use the shortcut Alt+O P F) and then continue to write my email message from there. By doing this I am able to send lengthy sentences/phrases in Japanese without the random "blank spaces" showing up.

Is this bug/issue still being addressed and looked at? I presume that many people who use Japanese environments also encounter this issue so I believe a fix would help a lot of people including myself.

Any update on this would be appreciated. Thank you.
Comment 151 Frederick888 2015-08-18 18:36:25 PDT
(In reply to KSak from comment #150)
> Hi all.
> 
> I just started using the latest version of TB (38.2) and noticed that this
> still hasn't been fixed. That is the HTML code (of the source) gets cut at
> 72 characters which causes a "blank space" to get inserted in the actual
> message of my emails.
> 
> Take note I use TB in writing emails in Japanese or English, and so far I've
> noticed this issue when writing in Japanese only.
> 
> I realized it's been nearly a 10 months since the last comments on this bug,
> but after googling around I still can't seem to find a permanent workaround
> or good solution. I mainly use HTML composed emails and would like to write
> my messages without having all these random "blank spaces (aka "hankaku"
> spaces in Japanese) all over the place as it looks bad, especially when
> writing business emails.
> 
> The only temporary workaround I've found is to manually add
> "editor.htmlWrapColumn" with value=0 into the config editor and whenever I
> write a HTML composed email I first select "Preformat" from the paragraph
> list (or use the shortcut Alt+O P F) and then continue to write my email
> message from there. By doing this I am able to send lengthy
> sentences/phrases in Japanese without the random "blank spaces" showing up.
> 
> Is this bug/issue still being addressed and looked at? I presume that many
> people who use Japanese environments also encounter this issue so I believe
> a fix would help a lot of people including myself.
> 
> Any update on this would be appreciated. Thank you.

This is a problem which has been existed for years. Mozilla just cares nothing about CJK users. So put up with it, solve it yourself or just switch to another client.

Frankly, Outlook 2013 features perfect CJK compatibility, and although I haven't tried, the new Outlook 2016 should be better (I love open source softwares but I don't stick to them). The settings page of Outlook is a little difficult to use but once you get those settings done, it'll work like a charm.

However, if you're using Linux, you may try Evolution, Geary or something.
Comment 152 Frederick888 2015-08-18 18:39:32 PDT
"has existed for years"... Wrong typing...
Comment 153 herbs 2015-08-19 04:15:05 PDT
Still have not fixed the bug till the version of 38.2.0, dated 2015.8.19. Really doubt there is someone who have the ability to fix the bug.
Comment 154 Frederick888 2015-08-19 05:56:02 PDT
(In reply to Hiroyuki Ikezoe (:hiro) from comment #125)
> Created attachment 766580 [details] [diff] [review]
> Encode HTML message with base64 to  avoid extra spaces in CJK text

(In reply to Hiroyuki Ikezoe (:hiro) from comment #126)
> Created attachment 766581 [details] [diff] [review]
> Adapt to createAndSendMessage change
> 
> This is basically same as the previous test except the argument of
> createAndSendMessage.

I downloaded the code just now and didn't find the patches in the default branch.

And according to the last reply by Hiroyuki Ikezoe (:hiro):

(In reply to Hiroyuki Ikezoe (:hiro) from comment #144)
> htmlWrapColumn is not in Thunderbird anymore.
> 
> As per comment #139, I will work on this after all of Joshua's mime works
> have finished.

However, it's been more than half a year since the reply.
Comment 155 Wayne Mery (:wsmwk, NI for questions) 2015-08-19 06:41:47 PDT
(In reply to Frederick888 from comment #151)
> ...
> This is a problem which has been existed for years. Mozilla just cares
> nothing about CJK users. So put up with it, solve it yourself or just switch
> to another client.
(In reply to herbs from comment #153)
> Still have not fixed the bug till the version of 38.2.0, dated 2015.8.19.
> Really doubt there is someone who have the ability to fix the bug.

To bring some clarification to these concerns:

* Thunderbird is a community project, developed by volunteers on their own time. Mozilla isn't involved.
* CJK is a very important area, both in terms of users (Japan ranks second in terms of number of users, ahead of the USA and second only to Germany[1]) and in terms of development resources ...
** a major CJK issue got tremendous attention last fall while developing version 31
** this bug is getting attention, but you're not seeing it because the work is happening in other bugs [2]
* unfortunately early this year we lost a key Japanese volunteer developer

The current volunteers are making progress, but it is over a long period of time, affected by...

* limited time (for example the person with expertise to work on comment 139 is in grad school) 
* the very sad truth that despite Japan's very high number of users there are (to the best of my knowledge) only two very active volunteer developers from Japan - whose focus is not in the area needed by this bug (one is in NSPR, backend and build issues, and the other is focused on IO issues)

So Japan is very POORLY represented, with just two developers, which is a major reason why CJK issues generally make slow or no progress. Actually we could use more coders and volunteers of any type.  Perhaps this can be corrected by a call to action driven through an organized effort of publicity and other means, by users within the country.


[1] https://blog.mozilla.org/thunderbird/2015/02/thunderbird-usage-continues-to-grow/
[2] see the whiteboard and comment 139 for the current status
Comment 156 Frederick888 2015-08-19 07:30:21 PDT
(In reply to Wayne Mery (:wsmwk, use Needinfo for questions) from comment #155)

Thanks for your explanation. It seems that I misunderstood the relationship between Thunderbird and Mozilla to some extent.

Anyway, I'm looking forward to the day when the problem can be solved. Thanks in advance.
Comment 157 Ludovic Hirlimann [:Usul] 2015-09-25 05:45:40 PDT
Removing myself on all the bugs I'm CCed on. Please NI me if you need something on MailNews Core bugs from me.
Comment 158 Jorg K (GMT+2, PTO during summer) 2015-11-17 03:16:00 PST
OK, this bug needs fixing. There were a few patches proposed and I will try to get one of them landed.

Let's ALL read Joshua's comment #137 ...
The options are:
1. Violate the standards, send 8bit with arbitrarily long lines, and hope the mail system tolerates it.
2. Violate the standards, send quoted-printable, but don't escape characters above 0x7F, and hope the mail system tolerates it.
3. Send QP/base64, according to the shorter encoding format.

... and comment #139:
the de facto answer is to send in base64.

I did the following simple experiment:
I created a long line of 'á' characters. I sent the message as UTF-8. Result:

Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: base64

Then I created a long line of 'こ' characters. I sent the message as UTF-8. Result:

Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

A broken message with injected spaces.

This is absolutely crazy!! Western European characters get base64 encoded, but Asian characters get shipped broken.

It is so sad that this bug stalled two years ago in comment #135 because someone decided that base64 is not the way to go.

As far as I can see, it IS ABSOLUTELY the way to go and I will do my best to make it so. This should then also fix bug 26734 and bug 553526.
Comment 159 Jorg K (GMT+2, PTO during summer) 2015-11-17 03:30:51 PST
Here is another observation:

If I copy Japanese text from bug 26734 comment #2
これは長い日本語のテキストですので、行が折れると思います。
into an e-mail and send it, I already get base64 encoding.

Only if I insert the text
ここここここここここここここここここここここここここここここここここここ
from my clipboard manager (Ditto) into an e-mail, I get broken 8bit encoding.

So it seems to be most reasonable to treat both cases the same and always send base64.
Comment 160 Jorg K (GMT+2, PTO during summer) 2015-11-17 11:50:18 PST
OK, looking at nsMsgAttachmentHandler.cpp.
There is a lot of decision making going on.

But before we make decisions, we analyse the attachment/body:

I do some printing right after
https://dxr.mozilla.org/comm-central/source/mailnews/compose/src/nsMsgAttachmentHandler.cpp?from=nsMsgAttachmentHandler.cpp#281
AnalyzeSnarfedFile();

Here are the surprising results.

On a plain text message with 200 a's I get:
m_size=205 m_lines=2 m_max_column=201
Two lines, one 200 bytes long plus a newline.

On a plain text message with 200 á's I get:
m_size=405 m_lines=2 m_max_column=401
Two lines, one 400 bytes long, that's 200 characters á, which is c3a1 in UTF-8, and a newline.

So far so good.

On a plain text message with 200 Korean characters 안 I get:
m_size=620 m_lines=6 m_max_column=109

Houston, we have a problem!

What has happened? Well, before we can pick the correct encoding in nsMsgAttachmentHandler::PickEncoding, the data has already been destroyed. Instead of two lines, we get six, and they are all 109 bytes long. One is for the newline, and the 108 bytes represent 36 characters, each ec9588 in UTF-8. And if we check the sent e-mail, a space was inserted after 36 characters.

Conclusion: Whatever encoding we pick in nsMsgAttachmentHandler::PickEncoding, even if we force base64, the result will always have the line broken where it shouldn't have been broken.

The investigation continues.
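
The byte counts quoted above can be double-checked with a few lines of Python. This is only a sketch to verify the arithmetic, not code that Thunderbird runs; the helper name utf8_line_bytes is made up for illustration.

```python
def utf8_line_bytes(char: str, count: int) -> int:
    """Return the UTF-8 byte length of `count` repetitions of `char`."""
    return len((char * count).encode("utf-8"))

A_ACUTE = "\u00e1"   # the accented test character (precomposed form)
KOREAN = "\uc548"    # the Korean test character

# Hex form of each character in UTF-8, as quoted in the comment.
print(A_ACUTE.encode("utf-8").hex())   # c3a1
print(KOREAN.encode("utf-8").hex())    # ec9588

print(utf8_line_bytes("a", 200))       # 200: one byte per ASCII character
print(utf8_line_bytes(A_ACUTE, 200))   # 400: two bytes per character
print(utf8_line_bytes(KOREAN, 36))     # 108: three bytes per character
```

So 36 Korean characters account for exactly the 108 bytes seen per line, which leaves one byte in m_max_column=109 for a trailing separator.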
Comment 161 Jorg K (GMT+2, PTO during summer) 2015-11-17 14:03:07 PST
This stuff is truly terrible. To get a plain text message, the HTML message is written to a temporary file nsemail.html. In there we find:

<html>
  <head>

    <meta http-equiv="content-type" content="text/html; charset=utf-8">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <font face="Aharoni">안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안
      안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안
      안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안
      안안안안안안안안안안안안안안안안안안안안안</font>
  </body>
</html>

Yes, for every line break, we will get an extra space later. Great!

Note the first line, it has 47 characters. So due to the first problem mentioned above, in the sent e-mail we get an extra space after 36 characters, then another space after 11 characters.
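
To illustrate why each serialiser line break turns into a space: HTML treats a run of whitespace, including a newline plus indentation, as a single space when the text is rendered or converted. A rough Python sketch follows. The helper collapse_html_whitespace is made up, and the regular expression is only an approximation of real HTML whitespace handling.

```python
import re

def collapse_html_whitespace(text: str) -> str:
    """Collapse each run of spaces, tabs and newlines into one space,
    roughly as HTML does for normal inline text."""
    return re.sub(r"[ \t\r\n]+", " ", text).strip()

KOREAN = "\uc548"  # the Korean test character

# Serialiser output: 47 characters on the first line, then the text
# continues on an indented new line (shortened here to 5 characters).
wrapped = KOREAN * 47 + "\n      " + KOREAN * 5

flat = collapse_html_whitespace(wrapped)
print(flat.count(" "))   # spaces in the result that the user never typed
print(flat.index(" "))   # position of the injected space
```

The user typed no space at all, yet one appears at position 47, exactly where the serialiser wrapped the line.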
Comment 162 Jorg K (GMT+2, PTO during summer) 2015-11-17 14:36:52 PST
I had the terrible feeling that this comes down to a problem in the unowned and unmaintained serialisers, and sadly, I was right:

Here:
https://dxr.mozilla.org/comm-central/source/mailnews/compose/src/nsMsgSend.cpp#1516
we get the body of the message from the M-C editor, and we ask for it formatted.
Well, that means, that some wrapping is taking place! And sure enough it wraps right through long unicode strings as we've seen in the previous comment.
Comment 163 Jorg K (GMT+2, PTO during summer) 2015-11-17 15:10:51 PST
OK, to eliminate the source of spaces from comment #161, we can change
https://dxr.mozilla.org/comm-central/source/mailnews/compose/src/nsMsgSend.cpp#1507
to get the HTML "raw" instead of formatted.

Of course we could also fix the M-C serialiser ;-)

Which this change, we now get the spaces consistently after 36 characters (but no longer after 47 characters), as described in comment #160.

One more to go ;-)
Comment 164 Jorg K (GMT+2, PTO during summer) 2015-11-17 15:11:47 PST
s/Which/With/
Comment 165 Jorg K (GMT+2, PTO during summer) 2015-11-17 15:22:14 PST
The downside of the raw HTML is of course that we'll have pretty ugly HTML, all in one long line, I tried it:

Here we have some HTML. Let's see how it turns out.<br><br>Let's insert a picture:<br><img src="cid:part2.08020202.08070206@jorgk.com" alt=""><br><br><ul><li>list item 1</li><li>list item 2</li></ul><br>

Having long lines may also change the encoding, see
https://dxr.mozilla.org/comm-central/source/mailnews/compose/src/nsMsgAttachmentHandler.cpp?from=nsMsgAttachmentHandler.cpp#318

Do we have any opinion on shipping "raw" HTML instead of prettified?

Joshua, Kent, Magnus?
Comment 166 Masatoshi Kimura [:emk] 2015-11-17 18:30:12 PST
Using OutputRaw is already proposed around comment #51.

It will be appropriate for text/html messages and text/plain;charset=utf-8 messages. But we should use format=flowed;delsp=yes (bug 26734) for text/plain;charset=iso-2022-jp message body unless it is an attachment. (Or drop the support for non-UTF-8 message composing entirely.)
Comment 167 Jorg K (GMT+2, PTO during summer) 2015-11-17 22:14:49 PST
Thanks for the hint. It's always difficult to pick up an abandoned bug and get up to speed. I can see that attachment 646005 [details] [diff] [review] uses raw HTML output from the editor. This, or fixing the M-C serialiser, is an absolute "must do", since once a long string is incorrectly chopped up, there is no chance to remove the spaces later. There is also the other issue of inserting a space after 36 unicode characters, in my test case 108 bytes (see comment #160).

With ISO-2022-JP encoding, the temporary file mentioned in comment #160 contains this

<html>
  <head>
    <meta content="text/html; charset=ISO-2022-JP"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    $B$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3(B
$B$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3(B

if asking for formatted HTML. This leads to extra spaces. If asking for raw HTML, we get:

<html><head><meta content="text/html; charset=ISO-2022-JP" http-equiv="Content-Type"></head><body bgcolor="#FFFFFF" text="#000000">$B$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$ [stuff deleted] 3$3$3$3$3(B</body></html>

and avoid extra spaces. So raw HTML (or, again, fixing the M-C serialiser) is an essential component of the solution *regardless* of which encoding is used.

Looking at ISO-2022-JP, even when using raw HTML, the message gets chopped up and is sent as:
Content-Type: text/plain; charset=ISO-2022-JP
Content-Transfer-Encoding: 7bit

$B$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3(B
$B$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3(B

This leads to extra line breaks, but NOT extra spaces.

I've also looked at bug 26734 (delsp=yes) but haven't found a reproducible case that would profit from delsp=yes. Maybe using ISO-2022-JP (or Shift_JIS) is such a case.

My first aim is to remove the spuriously inserted spaces, for which I found two sources, one is the formatted HTML, the other still needs to be investigated, so that Asian users can at least send e-mail correctly using UTF-8. As it stands, the product is completely useless.
Comment 168 Jorg K (GMT+2, PTO during summer) 2015-11-18 01:55:51 PST
OK, coming back to the spaces inserted after 36 Unicode characters from comment #160.

After the HTML is retrieved (badly) from the editor, it gets written to nsemail.html. From there, during a call to SnarfAttachment() here
https://dxr.mozilla.org/comm-central/source/mailnews/compose/src/nsMsgSend.cpp#546
the body is written out to yet another file nsmail.tmp here:
https://dxr.mozilla.org/comm-central/source/mailnews/compose/src/nsMsgAttachmentHandler.cpp#624

And peeking into the file we see lines of (arrows added to show trailing space):
  ===>ここここここここここここここここここここここここここここここここここここ <===

Each line consists of 36 unicode characters, one space and a CR/LF.

Summarising the investigation so far:

There is one source of additional spaces for HTML mail. That comes from the incorrect wrapping of CJK strings when retrieved from the M-C editor as "formatted". This can be fixed by getting raw HTML.

The second source for spaces for plain text mail is when the first file containing the bad wrapping is processed into the second file. I still have to see why and where the trailing space is added.

In light of this, my comment #160 wasn't accurate, in the 
m_max_column of 200+1=201, 400+1=401 or 3*36+1=109,
the +1 is a space, not a newline.
Comment 169 Jorg K (GMT+2, PTO during summer) 2015-11-18 02:31:19 PST
Coming back to the example of a long line of á's:
The temp file nsmail.tmp contains one long line of á's followed by a CR/LF. No breaking of the line, no trailing spaces. I just wonder which part of the system thinks that it's a good idea to wrap CJK characters and insert spaces. That's just crazy. It's all UTF-8 data, at that stage no one should interpret that data.
Comment 170 Jorg K (GMT+2, PTO during summer) 2015-11-18 04:18:53 PST
OK, analysis done.

The conversion to plain text is done here:
https://dxr.mozilla.org/comm-central/source/mailnews/compose/src/nsMsgAttachmentHandler.cpp#1122
and here:
https://dxr.mozilla.org/comm-central/source/mailnews/base/util/nsMsgUtils.cpp#2479
This calls the M-C core serialiser and this shreds right through a CJK string since once again we asked for formatting in the conversion. Here we have to ask for formatting since we need to get pretty-printed plain text output (see further details below in this comment).

While digging through the code, I found some doubtful stuff here, which might need looking at:
https://dxr.mozilla.org/comm-central/source/mailnews/compose/src/nsMsgCompUtils.cpp#1747

This determines whether the message should be 'flowed'. This is later passed as a flag to M-C's ConvertToPlainText(). Just for the record, if you pass 'flowed', the M-C serialiser shreds right through the CJK and appends a space which creates the extra spaces we're seeing. Not passing 'flowed' (I tried this in the debugger) leads to the same shredding/chopping of the CJK string into 36 character pieces (in my test case) without added spaces, which leads to multiple lines in the resulting e-mail.
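
For readers unfamiliar with format=flowed (RFC 3676): a line ending in a space is a "soft" break, and the receiver joins it with the next line. Without DelSp the trailing space stays in the text, which is how the appended space ends up visible between CJK characters. The following Python sketch shows a simplified receiver. The function unflow is hypothetical, and it ignores quoting, space-stuffing and the signature separator.

```python
def unflow(lines, delsp=False):
    """Rejoin format=flowed lines into paragraphs (simplified receiver)."""
    paragraphs, current = [], ""
    for line in lines:
        if line.endswith(" "):
            # Soft break: the paragraph continues on the next line.
            # With DelSp=yes the trailing space is only a marker and is dropped.
            current += line[:-1] if delsp else line
        else:
            # Hard break: the paragraph ends here.
            paragraphs.append(current + line)
            current = ""
    if current:
        paragraphs.append(current)
    return paragraphs

KO = "\u3053"  # the Japanese test character
sent = [KO * 36 + " ", KO * 36 + " ", KO * 8]   # lines as the serialiser emits them

print(unflow(sent)[0].count(" "))               # spaces left in the rejoined text
print(unflow(sent, delsp=True)[0].count(" "))   # spaces left with DelSp=yes
```

With the default handling the 80 characters come back with two spaces inside; with delsp=True they come back as one unbroken run. Not breaking the CJK run in the first place avoids the problem on the sending side.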

Final conclusion:
=================
The aim is NOT to chop up CJK strings *at all* in the M-C serialiser just as Western European strings of many á's are not chopped.

This will fix both sources of added spaces. I will need to pursue this in an M-C Core::Serialiser bug.

There is nothing we can do in C-C code to fix this bug.

Note: I tried to call the plaintext conversion asking for 'raw' instead of 'formatted'. We get the result below which is not really good.

====
Hi this is HTML.

   this was a list
   jkjkjk
   jkjkjk

Here we have bold. And red. 
====

With formatting we get something better:

====
*bold* red

 * jkjkj
 * jkjkj

text

1. jkjkjk
2. jkjkjk
====

I believe we really need to fix the M-C core piece to get proper formatting for both HTML and plaintext e-mail.

Clearing all NIs for now.
Comment 171 Jorg K (GMT+2, PTO during summer) 2015-11-18 08:57:50 PST
The action will move to bug 1225864 and bug 1225904.
Comment 172 Jorg K (GMT+2, PTO during summer) 2015-11-19 07:18:59 PST
*** Bug 553526 has been marked as a duplicate of this bug. ***
Comment 173 Jorg K (GMT+2, PTO during summer) 2015-11-19 12:04:19 PST
Created attachment 8689687 [details]
Test ISO-2022-JP.eml

(In reply to Masatoshi Kimura [:emk] from comment #166)
> But we should use format=flowed;delsp=yes (bug 26734) for
> text/plain;charset=iso-2022-jp message body unless it is an attachment. (Or
> drop the support for non-UTF-8 message composing entirely.)

With the fixes from bug 1225864 and bug 1225904 I can perfectly well use ISO-2022-JP but encode as base64 if a line that has no spaces gets too long. Look at the attached message to see it. I think we don't need "delsp=yes". That's why I closed bug 26734 as "wontfix".
Comment 174 Jorg K (GMT+2, PTO during summer) 2015-11-20 10:28:31 PST
(In reply to Jorg K (GMT+1) [currently frustrated by waiting for reviews/feedback] from comment #170)
> While digging through the code, I found some doubtful stuff here, which
> might need looking at:
> https://dxr.mozilla.org/comm-central/source/mailnews/compose/src/
> nsMsgCompUtils.cpp#1747

This is indeed doubtful code, once again a hack to avoid additional spaces caused by the M-C serialiser. Once bug 1225864 is fixed, we can allow "format=flowed" for all character encodings.
Comment 175 Jorg K (GMT+2, PTO during summer) 2015-11-22 08:09:36 PST
Created attachment 8690534 [details] [diff] [review]
Proposed change to always allow format=flowed

As per comment #174 we will be able to always send format=flowed, regardless of the character set used.
Comment 176 Jorg K (GMT+2, PTO during summer) 2015-11-22 20:59:50 PST
Created attachment 8690638 [details] [diff] [review]
Proposed change to always allow format=flowed and use the new serialiser flag OutputDisallowLineBreaking

This solves all "extra space" problems in both HTML and plaintext mail while at the same time allowing format=flowed for all character encodings.

This depends on landing the patches in bug 1225864 and bug 1225904. These two bugs already have patches ready for review.

I will supply a try build for users to test with.
Comment 177 Jorg K (GMT+2, PTO during summer) 2015-11-23 04:08:25 PST
Interested users can find a developer build here:
Windows:
https://archive.mozilla.org/pub/thunderbird/try-builds/mozilla@jorgk.com-9dfaecc45e1cd9ded100cfcb5831a1a0cb9e2fc3/try-comm-central-win32/thunderbird-45.0a1.en-US.win32.installer.exe
Other platforms on request.

Disclaimer:
This is a developer version, it is therefore pre-Aurora (Alpha). Use at your own risk on a NEW PROFILE.

I've tested:
- Long strings of Japanese, Korean and European accented characters,
  both separated with spaces and without spaces. No extra spaces got added.
- Korean characters encoded UTF-8.
- Japanese characters encoded UTF-8 and ISO-2022-JP.
- HTML and plaintext, plaintext is flowed, also for ISO-2022-JP.
All works.
Comment 178 asmwarrior 2015-11-23 07:00:49 PST
I tested this TB build under Windows XP. I entered many Chinese characters such as "中文中文中文中文..." and sent the message (the delivery format was the default, "auto detect"). The resulting email is:

Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit

中文中文中文中文中文中文中文中文中文中文中文中文中文中文中文

I don't see the extra space added. Good work!
Comment 179 asmwarrior 2015-11-23 07:08:41 PST
I also tested the other three delivery formats, and all work fine. I see that only the last format, "plain and rich html text", sends the email with base64 encoding, as shown below:

User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:45.0) Gecko/20100101
 Thunderbird/45.0a1
MIME-Version: 1.0
Content-Type: multipart/alternative;
 boundary="------------050200080102050308020002"

This is a multi-part message in MIME format.
--------------050200080102050308020002
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: base64

5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH
5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH
5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH
5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH
5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH
5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH
5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH
5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH
5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH
5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH
5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH
5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH
5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH
5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH
5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH
5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH
5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH
5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH
5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH
5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH
5Lit5paHIA0KDQo=
--------------050200080102050308020002
Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: base64

PGh0bWw+DQogIDxoZWFkPg0KDQogICAgPG1ldGEgaHR0cC1lcXVpdj0iY29udGVudC10eXBl
IiBjb250ZW50PSJ0ZXh0L2h0bWw7IGNoYXJzZXQ9dXRmLTgiPg0KICA8L2hlYWQ+DQogIDxi
b2R5IGJnY29sb3I9IiNGRkZGRkYiIHRleHQ9IiMwMDAwMDAiPg0K5Lit5paH5Lit5paH5Lit
5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit
5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit
5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit
5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit
5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit
5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit
5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit
5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit
5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit
5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit
5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit
5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit
5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit
5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit
5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit
5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit
5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit
5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit
5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit
5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paHDQogIDwvYm9k
eT4NCjwvaHRtbD4NCg==
--------------050200080102050308020002--
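
As a quick sanity check of the base64 body above, the repeating group and the final line of the text/plain part can be decoded with Python's standard library. This is just a verification sketch; the variable names are arbitrary.

```python
import base64

group = "5Lit5paH"             # the repeating 8-character base64 group
last_line = "5Lit5paHIA0KDQo=" # final base64 line of the text/plain part

# Each group decodes to six UTF-8 bytes, i.e. two Chinese characters.
pair = base64.b64decode(group).decode("utf-8")
print(pair)

# The last line carries one more pair plus the trailing whitespace.
tail = base64.b64decode(last_line).decode("utf-8")
print(repr(tail))
```

Each group decodes to "中文", and the last line decodes to "中文" followed by one space and two CR/LF pairs, so no spaces were injected between the characters.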
Comment 180 Jorg K (GMT+2, PTO during summer) 2015-11-23 07:16:53 PST
Thanks for testing.

Base64 will be used when the input lines get too long. It's either that or extra spaces ;-)
Base64 comes from bug 1225904. Base64 was already used for plaintext in certain cases. Now we also use it for the HTML part if necessary.
Comment 181 Jorg K (GMT+2, PTO during summer) 2015-11-24 09:01:01 PST
Would you be so kind as to test the version I supplied in comment #177 with some real Japanese text and ISO-2022-JP encoding. format=flowed should work. Since "delsp=yes" is not implemented, we revert to base64 encoding instead of 7bit if a line gets longer than 900 bytes, so about 450 characters (approx. 2 bytes per character in this encoding, right?).
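
The "about 450 characters" estimate can be checked with Python's iso2022_jp codec. The sketch below is an illustration only: pick_cte is a hypothetical helper that mirrors the 900-byte rule described above, not the actual Thunderbird code.

```python
def pick_cte(line: str, charset: str = "iso2022_jp", limit: int = 900) -> str:
    """Pick a Content-Transfer-Encoding for one line, by encoded length."""
    encoded = line.encode(charset)
    return "base64" if len(encoded) > limit else "7bit"

KO = "\u3053"  # the Japanese test character

encoded = (KO * 450).encode("iso2022_jp")
# Two bytes per character, plus a 3-byte escape sequence to switch into
# JIS X 0208 and a 3-byte escape sequence to switch back to ASCII.
print(len(encoded))
print(max(encoded) < 0x80)   # every byte is 7-bit clean

print(pick_cte(KO * 450))    # long unbroken line
print(pick_cte(KO * 100))    # short line
```

450 characters encode to 906 bytes, all below 0x80, so such a line would be sent as base64 under this rule while a 100-character line stays 7bit.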
Comment 183 Jorg K (GMT+2, PTO during summer) 2015-11-26 04:28:38 PST
Looks like a wrong link. Do you mean this?
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=ea5a8e06169f

I took the liberty to cancel it, since it won't compile.
You need the patches from bug 1225864 and bug 1225904.
You need to submit a M-C patch with the C-C push, that's quite tricky.

Why don't you simply use the binary from comment #177?
Comment 184 Masayuki Nakano [:masayuki] (Mozilla Japan) 2015-11-26 19:33:45 PST
I might not fully understand what you are doing here. If you are trying to add "format=flowed" for all encodings (i.e., including ISO-2022-JP), I think that ISO-2022-JP shouldn't be changed that way, because some MUAs, and server applications which touch received emails before MUAs access them, may not expect the contents to be encoded, due to RFC 1468. I think that if users want to send flowed format email written in Japanese, they should use UTF-8.

Anyway, I strongly recommend making "format=flowed" for ISO-2022-JP an optional behavior controlled by a pref.
Comment 185 Jorg K (GMT+2, PTO during summer) 2015-11-27 00:16:36 PST
Most of the patch is busily removing the charset argument from UseFormatFlowed(). I could of course leave it in.

Before UseFormatFlowed() did:
  return !(PL_strcasecmp(charset, "UTF-8") && nsMsgI18Nmultibyte_charset(charset));
This is hard to read, so let's look at three cases:
charset is not multi-byte: returns 'true', so flowed is used.
charset is UTF-8 and therefore multi-byte: returns 'true', so flowed is used.
charset is not UTF-8 but multi-byte, like ISO-2022-JP: returns 'false'.
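
The same three cases, written out as a small Python truth table. This is a sketch of the logic only: is_multibyte and its charset list are illustrative stand-ins for nsMsgI18Nmultibyte_charset(), whose real list may differ.

```python
# Illustrative, incomplete stand-in for nsMsgI18Nmultibyte_charset().
MULTIBYTE_CHARSETS = {"utf-8", "iso-2022-jp", "shift_jis", "euc-jp",
                      "euc-kr", "big5", "gb2312"}

def is_multibyte(charset: str) -> bool:
    return charset.lower() in MULTIBYTE_CHARSETS

def use_format_flowed(charset: str) -> bool:
    # PL_strcasecmp() returns non-zero when the strings differ, so the
    # first operand of the original && means "charset is not UTF-8".
    not_utf8 = charset.lower() != "utf-8"
    return not (not_utf8 and is_multibyte(charset))

for cs in ("ISO-8859-1", "UTF-8", "ISO-2022-JP"):
    print(cs, use_format_flowed(cs))
```

Flowed is allowed for ISO-8859-1 and UTF-8 and refused for ISO-2022-JP, matching the three cases listed above.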

We already have a preference mailnews.send_plaintext_flowed (default 'true').
We could introduce a preference mailnews.send_plaintext_flowed_for_stateful_charset (default 'false') which users would have to set to enable sending flowed for ISO-2022-JP.
Or the other way around: mailnews.disable_plaintext_flowed_for_stateful_charset
Or more specific: mailnews.disable_plaintext_flowed_for_iso-2022-jp or mailnews.send_plaintext_flowed_for_iso-2022-jp.

Note: The existing preference mailnews.disable_format_flowed_for_cjk is useless and undocumented, that's why I am removing it.

If flowed sending is turned off, we would obviously not send flowed and also call the plaintext serialiser without the "don't break lines" flag, so for plaintext the old behaviour would be maintained.

The question is whether the new or the old behaviour becomes the default. Making the old behaviour the default will lead to people never seeing the improvement. Making the new behaviour the default would lead to people potentially complaining until they discover the new preference.

We still don't know whether the new behaviour would cause any problems. The new behaviour is:
- Not break long lines.
- Transmit long lines with base64 instead of 7bit or 8bit.
  For ISO-2022-JP that means using base64 instead of 7bit.

Magnus, how do you feel about this?
Comment 186 Jorg K (GMT+2, PTO during summer) 2015-11-27 06:50:15 PST
Quoting from 1225864 comment #42:
(In reply to Masatoshi Kimura [:emk] from comment #40)
> I'm arguing about the encoding here because RFC 1468 is a big reason why we
> are using ISO-2022-JP for outgoing Japanese mail messages. If we use
> ISO-2022-JP, we should follow the entire RFC 1468 instead of picking some
> convenient parts of the RFC.

If you want to cover the full RFC 1468, that is
  - ISO-2022-JP and
  - CTE 7bit and
  - short lines,
then we need to implement the flag proposed in the previous comment. No problem. Here are a few variations for the name:

mailnews.send_plaintext_flowed_for_stateful_charset
mailnews.disable_plaintext_flowed_for_stateful_charset
mailnews.send_plaintext_flowed_for_iso-2022-jp
mailnews.disable_plaintext_flowed_for_iso-2022-jp
or
mailnews.send_plaintext_rfc1486_strict

I'm open. We can treat ISO-2022-JP in a special way and make sure it fully complies with RFC 1468, even with the "should"s (https://www.ietf.org/rfc/rfc1468.txt):
===
The ISO-2022-JP encoding is already in 7-bit form, so it is not
necessary to use a Content-Transfer-Encoding header. It *should* be
noted that applying the Base64 or Quoted-Printable encoding will
render the message unreadable in current JUNET software.
===
The human user (not implementor) *should* try to keep lines within 80
display columns, or, preferably, within 75 (or so) columns, ...
===

Magnus, any preference for the name?

In light of the discussion I prefer mailnews.send_plaintext_rfc1486_strict.
Comment 187 Jorg K (GMT+2, PTO during summer) 2015-11-27 10:44:12 PST
Comment on attachment 8690638 [details] [diff] [review]
Proposed change to always allow format=flowed and use the new serialiser flag OutputDisallowLineBreaking

Clearing the review request for now.
New patch with preference mailnews.send_plaintext_rcf1468_strict is coming up shortly.
Comment 188 Jorg K (GMT+2, PTO during summer) 2015-11-27 13:53:48 PST
Created attachment 8693104 [details] [diff] [review]
Proposed change (v3)

OK, here we have a more comprehensive approach:

- I refactored the code so UseFormatFlowed() is no longer used,
  instead we have a new function GetSerialiserFlags().
  This is already prepared for "delsp=yes" from bug 26734.
- New preference mailnews.send_plaintext_rfc1486_strict.
- GetSerialiserFlags() returns the flags according to the charset,
  there is special treatment for ISO-2022-JP if the preference is set.
- Flowed is enabled for all charsets, the new preference disables it for
  ISO-2022-JP.
- New serialiser flag OutputDisallowLineBreaking is always used, unless
  new preference disables it for ISO-2022-JP.

The default for mailnews.send_plaintext_rfc1486_strict is 'true', so for ISO-2022-JP the current behaviour doesn't change: flowed is not used and long lines are broken.
Comment 189 Jorg K (GMT+2, PTO during summer) 2015-11-27 14:24:49 PST
OK, with the mailnews.send_plaintext_rfc1486_strict set to true, which is the default, I get this:

I copied これは長い日本語のテキストですので、行が折れると思います。 a few times. The resulting e-mail is this:

これは長い日本語のテキストですので、行が折れると思います。これは長い日本
語のテキストですので、行が折れると思います。これは長い日本語のテキストで
すので、行が折れると思います。これは長い日本語のテキストですので、行が折
れると思います。これは長い日本語のテキストですので、行が折れると思いま
す。これは長い日本語のテキストですので、行が折れると思います。これは長い
日本語のテキストですので、行が折れると思います。これは長い日本語のテキス
トですので、行が折れると思います。これは長い日本語のテキストですので、行
が折れると思います。これは長い日本語のテキストですので、行が折れると思い
ます。これは長い日本語のテキストですので、行が折れると思います。これは長
い日本語のテキストですので、行が折れると思います。これは長い日本語のテキ
ストですので、行が折れると思います。これは長い日本語のテキストですので、
行が折れると思います。これは長い日本語のテキストですので、行が折れると思
います。これは長い日本語のテキストですので、行が折れると思います。

All neatly broken, but no extra spaces inserted. Source:

Content-Type: text/plain; charset=ISO-2022-JP
Content-Transfer-Encoding: 7bit

$B$3$l$OD9$$F|K\8l$N%F%-%9%H$G$9$N$G!"9T$,@^$l$k$H;W$$$^$9!#$3$l$OD9$$F|K\(B
$B8l$N%F%-%9%H$G$9$N$G!"9T$,@^$l$k$H;W$$$^$9!#$3$l$OD9$$F|K\8l$N%F%-%9%H$G(B
$B$9$N$G!"9T$,@^$l$k$H;W$$$^$9!#$3$l$OD9$$F|K\8l$N%F%-%9%H$G$9$N$G!"9T$,@^(B
$B$l$k$H;W$$$^$9!#$3$l$OD9$$F|K\8l$N%F%-%9%H$G$9$N$G!"9T$,@^$l$k$H;W$$$^(B
$B$9!#$3$l$OD9$$F|K\8l$N%F%-%9%H$G$9$N$G!"9T$,@^$l$k$H;W$$$^$9!#$3$l$OD9$$(B
$BF|K\8l$N%F%-%9%H$G$9$N$G!"9T$,@^$l$k$H;W$$$^$9!#$3$l$OD9$$F|K\8l$N%F%-%9(B
$B%H$G$9$N$G!"9T$,@^$l$k$H;W$$$^$9!#$3$l$OD9$$F|K\8l$N%F%-%9%H$G$9$N$G!"9T(B
$B$,@^$l$k$H;W$$$^$9!#$3$l$OD9$$F|K\8l$N%F%-%9%H$G$9$N$G!"9T$,@^$l$k$H;W$$(B
$B$^$9!#$3$l$OD9$$F|K\8l$N%F%-%9%H$G$9$N$G!"9T$,@^$l$k$H;W$$$^$9!#$3$l$OD9(B
$B$$F|K\8l$N%F%-%9%H$G$9$N$G!"9T$,@^$l$k$H;W$$$^$9!#$3$l$OD9$$F|K\8l$N%F%-(B
$B%9%H$G$9$N$G!"9T$,@^$l$k$H;W$$$^$9!#$3$l$OD9$$F|K\8l$N%F%-%9%H$G$9$N$G!"(B
$B9T$,@^$l$k$H;W$$$^$9!#$3$l$OD9$$F|K\8l$N%F%-%9%H$G$9$N$G!"9T$,@^$l$k$H;W(B
$B$$$^$9!#$3$l$OD9$$F|K\8l$N%F%-%9%H$G$9$N$G!"9T$,@^$l$k$H;W$$$^$9!#(B

This should also satisfy the most conservative Japanese user since RFC 1468 is *fully* honoured:
ISO-2022-JP, 7bit, short lines ... and no extra spaces!

If they don't want the broken lines, they need to switch mailnews.send_plaintext_rfc1486_strict to 'false'. Then they will get base64 and no broken lines.

All other character sets are not affected, so all the Chinese, Korean or Japanese UTF-8 users get the benefit of having transmitted exactly what they entered.

I hope this will keep everyone happy.

If still required, we can fix bug 26734 as well, so format=flowed; delsp=yes would become available, but then we're not adhering 100% to RFC 1468, which also calls for short lines in a very vague way:

===
The human user (not implementor) *should* try to keep lines within 80
display columns, or, preferably, within 75 (or so) columns, ...
===
Comment 190 Jorg K (GMT+2, PTO during summer) 2015-11-27 15:19:16 PST
There's another option *without* a new preference. We just do it like this for ISO-2022-JP:

If mailnews.send_plaintext_flowed is set to 'true' (default), we do the new behaviour:
No line breaking, using CTE base64 as required for long lines.
When bug 26734 is done, we use CTE 7bit and delsp=yes.

If mailnews.send_plaintext_flowed is set to 'false', we do the old behaviour:
Line breaking, but always using CTE 7bit as shown in comment #189.

Can we get a consensus among Japanese users? I can do what you decide, even implement bug 26734 after all.
Comment 191 Masatoshi Kimura [:emk] 2015-11-27 15:54:31 PST
(In reply to Jorg K (GMT+1) from comment #189)
Did you test a text containing both ASCII characters and CJK characters? That is, does the patch implement this part of the RFC?
>   Each JIS
>   X 0208 character takes up two columns, and the escape sequences do
>   not take up any columns. The implementor is reminded that JIS X 0208
>   characters take up two bytes and should not be split in the middle to
>   break lines for displaying, etc.

(In reply to Jorg K (GMT+1) from comment #190)
> There's another option *without* a new preference. We just do it like this
> for ISO-2022-JP:
> 
> If mailnews.send_plaintext_flowed is set to 'true' (default), we do the new
> behaviour:
> No line breaking, using CTE base64 as required for long lines.
> When bug 26734 is done, we use CTE 7bit and delsp=yes.
> 
> If mailnews.send_plaintext_flowed is set to 'false', we do the old behaviour:
> Line breaking, but always using CTE 7bit as shown in comment #189.
> 
> Can we get a consensus among Japanese users? I can do what you decide, even
> implement bug 26734 after all.

The old behavior should be used by default for iso-2022-jp plaintext mails (IIRC Thunderbird currently disables format=flowed for iso-2022-jp). Otherwise nobody will use it and legacy incompatible mails will be sent.
Comment 192 Jorg K (GMT+2, PTO during summer) 2015-11-27 22:20:46 PST
(In reply to Masatoshi Kimura [:emk] from comment #191)
> Did you test a text containing both ASCII characters and CJK characters?
No.
> That is, does the patch implement this part of the RFC?
I don't know.
 
> The old behavior should be used by default for iso-2022-jp plaintext mails
> (IIRC Thunderbird currently disables format=flowed for iso-2022-jp).
> Otherwise nobody will use it and legacy incompatible mails will be sent.
OK. In the patch coming up I will maintain the *exact* old behaviour:
- format=flowed is disabled regardless of mailnews.send_plaintext_flowed
- lines are broken as can be seen in comment #189.
- no extra new preference.
- delsp will be left to bug 26734.
- the behaviour for the mix of ASCII and Japanese characters has not changed
  (I don't know whether it's right or not.)
Comment 193 Jorg K (GMT+2, PTO during summer) 2015-11-27 22:37:22 PST
Created attachment 8693174 [details] [diff] [review]
Proposed final solution (v4)

This patch fixes all problems for CJK languages but maintains the previous behaviour for ISO-2022-JP so that Thunderbird complies with RFC 1468.
That should keep everyone happy.
Comment 194 Jorg K (GMT+2, PTO during summer) 2015-11-27 22:55:22 PST
Created attachment 8693177 [details] [diff] [review]
Proposed final solution (v4)

Oops, forgot to "hg qref" before attaching the patch. Now we're good.
Comment 195 Jorg K (GMT+2, PTO during summer) 2015-11-28 01:04:49 PST
New binaries here, this time for all platforms:
https://archive.mozilla.org/pub/thunderbird/try-builds/mozilla@jorgk.com-b4966113eaa14a806a88aa558efdd2c6a4f9c89d/

Same behaviour as the binaries from comment #177 but previous behaviour for ISO-2022-JP plaintext messages, that is lines are broken and no format=flowed.
Comment 197 Jorg K (GMT+2, PTO during summer) 2015-11-28 09:53:33 PST
Created attachment 8693229 [details] [diff] [review]
Proposed final solution (v5), includes delsp support.

Adding delsp was another two line change, so I added it.
Comment 198 Jorg K (GMT+2, PTO during summer) 2015-11-28 10:12:50 PST
Created attachment 8693231 [details] [diff] [review]
Proposed final solution (v5b), includes delsp support.

Sigh. Fixed cut and paste error.
Comment 199 Jorg K (GMT+2, PTO during summer) 2015-11-28 12:34:52 PST
Makoto-san and Masatoshi-san: You both cared about RFC 1468. I've now completed the work. ISO-2022-JP will always be sent with CTE 7bit. If format=flowed is used, this is achieved with delsp=yes.

Can you please test that everything is to your liking? Binaries are here; sadly, the Mac compile failed due to Mercurial problems:
https://archive.mozilla.org/pub/thunderbird/try-builds/mozilla@jorgk.com-09c359459df26c2000cd4cb30971a140a661c0ff/

Here is the try run:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=09c359459df2

And here is a flowed/delsp message. Note the spaces at the end of each line.

Content-Type: text/plain; charset=ISO-2022-JP; format=flowed; delsp=yes
Content-Transfer-Encoding: 7bit

$B$3$l$OD9$$F|K\8l$N%F%-%9%H$G$9$N$G!"9T$,@^$l$k$H;W$$$^$9!#$3$l$OD9$$F|K\(B <=== space here
$B8l$N%F%-%9%H$G$9$N$G!"9T$,@^$l$k$H;W$$$^$9!#$3$l$OD9$$F|K\8l$N%F%-%9%H$G(B 
$B$9$N$G!"9T$,@^$l$k$H;W$$$^$9!#$3$l$OD9$$F|K\8l$N%F%-%9%H$G$9$N$G!"9T$,@^(B 
$B$l$k$H;W$$$^$9!#$3$l$OD9$$F|K\8l$N%F%-%9%H$G$9$N$G!"9T$,@^$l$k$H;W$$$^(B 
$B$9!#$3$l$OD9$$F|K\8l$N%F%-%9%H$G$9$N$G!"9T$,@^$l$k$H;W$$$^$9!#(B
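
To make the role of those trailing spaces concrete, here is a rough sketch of what a receiving client does with format=flowed and delsp (RFC 3676). It is a simplification and not the Thunderbird implementation: the function name is made up, and quoting (">") and space-stuffing are ignored.

```javascript
// Hypothetical helper: rejoin soft-broken lines of a format=flowed body.
// Simplified: quote depth and space-stuffing from RFC 3676 are not handled.
function unflow(body, delsp) {
  const paragraphs = [];
  let current = "";
  for (const line of body.split(/\r?\n/)) {
    // A line ending in a space is a soft break, except the signature
    // separator "-- ", which is never flowed.
    const soft = line.endsWith(" ") && line !== "-- ";
    if (soft) {
      // With delsp=yes the trailing space only marks the soft break and is
      // dropped; with delsp=no it stays in the text as a real space.
      current += delsp ? line.slice(0, -1) : line;
    } else {
      paragraphs.push(current + line);
      current = "";
    }
  }
  if (current !== "") {
    paragraphs.push(current);
  }
  return paragraphs.join("\n");
}
```

Called with delsp set to false, the same input keeps a space at every join, which is exactly the "extra space" this bug is about for CJK text.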
Comment 200 Jorg K (GMT+2, PTO during summer) 2015-11-29 02:31:36 PST
Created attachment 8693333 [details] [diff] [review]
Proposed test (v1)

This test extends the test from bug 1225904.

Needless to say, this will only work with the patches from bug 1225864 and bug 1225904 applied first.

(Note: I'm obsoleting the ISO-2022-JP sample message since the system will no longer produce ISO-2022-JP encoded plaintext messages with CTE base64.)
Comment 201 Jorg K (GMT+2, PTO during summer) 2015-11-29 02:54:35 PST
htmlWrapColumn does not exist anywhere in the system any more, so I'm removing it from summary to reduce the confusion: https://dxr.mozilla.org/comm-central/search?q=htmlWrapColumn&redirect=false&case=false

mailnews.wraplength is obeyed when wrapping non-flowed plaintext e-mail.

Japanese text encoded in ISO-2022-JP is "traditionally" wrapped at the number of bytes specified in mailnews.wraplength (default: 72). Since each Japanese character takes two bytes, that corresponds to half as many characters per line (36 with the default).
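
A minimal sketch of that arithmetic, assuming one column for ASCII and two for everything else (the JIS X 0208 rule from RFC 1468). The helper names are invented for the example, and the sketch breaks purely on width; it ignores word boundaries and Japanese line-breaking rules, so it is not what the serializer actually does.

```javascript
// Hypothetical helper: ASCII counts as one column, anything else as two.
function columnsOf(ch) {
  return ch.codePointAt(0) < 0x80 ? 1 : 2;
}

// Hypothetical helper: split text into lines of at most wrapLength columns.
function wrapByColumns(text, wrapLength = 72) {
  const lines = [];
  let line = "";
  let width = 0;
  for (const ch of text) {
    const w = columnsOf(ch);
    if (width + w > wrapLength && line !== "") {
      // The next character would not fit, so close the current line.
      lines.push(line);
      line = "";
      width = 0;
    }
    line += ch;
    width += w;
  }
  if (line !== "") {
    lines.push(line);
  }
  return lines;
}
```

With the default of 72, a run of 40 Japanese characters comes out as one line of 36 characters and one of 4.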

Some parts of the system rely on this "magic", see for example here:
https://dxr.mozilla.org/mozilla-central/source/dom/base/test/TestPlainTextSerializer.cpp#78
Comment 202 Jorg K (GMT+2, PTO during summer) 2015-11-29 13:36:07 PST
Created attachment 8693362 [details] [diff] [review]
Proposed test (v1b)

Fixed typo.
Comment 203 Jorg K (GMT+2, PTO during summer) 2015-11-29 13:37:34 PST
Created attachment 8693364 [details] [diff] [review]
Proposed test (v1b)

Fixed typo, this time for real.
Comment 204 Magnus Melin 2015-12-03 13:16:58 PST
Comment on attachment 8693231 [details] [diff] [review]
Proposed final solution (v5b), includes delsp support.

Review of attachment 8693231 [details] [diff] [review]:
-----------------------------------------------------------------

Code looks ok to me. r=mkmelin

::: mailnews/compose/src/nsMsgCompose.h
@@ +159,5 @@
>      NS_DECL_NSISTREAMLISTENER
>      NS_DECL_NSIMSGQUOTINGOUTPUTSTREAMLISTENER
>  
>      NS_IMETHOD  SetComposeObj(nsIMsgCompose *obj);
> +    NS_IMETHOD  ConvertToPlainText(bool formatflowed,

odd double spacing here, please fix it and the row above

::: mailnews/compose/src/nsMsgSend.cpp
@@ +1527,5 @@
>  
>    //
>    // Query the editor, get the body of HTML!
>    //
> +  uint32_t  flags = nsIDocumentEncoder::OutputFormatted |

one space only please
Comment 205 Magnus Melin 2015-12-03 13:17:16 PST
Comment on attachment 8693364 [details] [diff] [review]
Proposed test (v1b)

Review of attachment 8693364 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/compose/test/unit/test_longLines.js
@@ +12,5 @@
>  
> +// Copied from jsmime.js.
> +function stringToTypedArray(buffer) {
> +  var typedarray = new Uint8Array(buffer.length);
> +  for (var i = 0; i < buffer.length; i++)

please always have braces for loops, even one-line loops

@@ +22,3 @@
>    let msgData = mailTestUtils
>      .loadMessageToString(gDraftFolder, mailTestUtils.firstMsgHdr(gDraftFolder));
> +  checkMessageHeaders(msgData, expectedHeaders, "");

just leave out the last param, instead of having ""
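
For reference, the helper with the braces requested above might look like the sketch below. The loop body is an assumption, since the quoted hunk stops before it; it treats the input as a binary string with one character per byte.

```javascript
// Convert a binary string (one character per byte) to a Uint8Array.
// The loop body is assumed; the review hunk is cut off before it.
function stringToTypedArray(buffer) {
  const typedarray = new Uint8Array(buffer.length);
  for (let i = 0; i < buffer.length; i++) {
    typedarray[i] = buffer.charCodeAt(i);
  }
  return typedarray;
}
```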
Comment 206 Jorg K (GMT+2, PTO during summer) 2015-12-03 14:01:51 PST
Created attachment 8695528 [details] [diff] [review]
Proposed final solution (v5c), includes delsp support.

Carrying over Magnus' r+.
Fixed nits.
Comment 207 Jorg K (GMT+2, PTO during summer) 2015-12-03 14:23:44 PST
Created attachment 8695543 [details] [diff] [review]
Proposed test (v1c)

Carrying over Magnus' r+.
Fixed nits.
Comment 208 Jorg K (GMT+2, PTO during summer) 2015-12-03 14:29:18 PST
This depends on bug 1225904, so please land it first.
Please apply the "proposed solution", then the test.
Comment 209 aleth [:aleth] 2015-12-04 13:05:16 PST
https://hg.mozilla.org/comm-central/rev/c6c9a8e486b7e67f43707ead63a4f796fa4e6952
Bug 653342 - Properly set serializer flags. Support delsp. Use OutputDisallowLineBreaking. r=mkmelin.

https://hg.mozilla.org/comm-central/rev/725ae1aad7d0900e34dbd06359a1a41279e25873
Bug 653342 - Properly set serializer flags. Support delsp. Use OutputDisallowLineBreaking. Test. r=mkmelin.
Comment 210 Jorg K (GMT+2, PTO during summer) 2015-12-04 15:44:38 PST
Too bad: the new long-line test fails on Mac and Linux but works on Windows:
https://treeherder.mozilla.org/#/jobs?repo=comm-central&revision=725ae1aad7d0

I'll look into it tomorrow.
Comment 211 Jorg K (GMT+2, PTO during summer) 2015-12-04 16:59:00 PST
Created attachment 8696134 [details] [diff] [review]
Correction of the landed test.

I suspect a newline problem. The first test which compares HTML works, the second test compares using an appended "\r\n" and fails. I suspect that's it. I'm now doing the newlines based on the platform the test runs on, so "\r\n" for Windows and "\n" for the rest. See how we go: Try here:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=16293598e883
Comment 212 Jorg K (GMT+2, PTO during summer) 2015-12-04 17:11:26 PST
Mistyped try string for macosx64: Here's another one:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=90da358fdcd7
Comment 213 Jorg K (GMT+2, PTO during summer) 2015-12-04 23:53:07 PST
Created attachment 8696199 [details] [diff] [review]
Correction of the landed test. (take 2)

I'm on the right track: all tests now pass except the last one. This will hopefully fix the last one as well. New try:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=5c54437b0007
Comment 214 Jorg K (GMT+2, PTO during summer) 2015-12-05 01:23:57 PST
Created attachment 8696211 [details] [diff] [review]
Correction of the landed test. (take 3)

Another go:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=c5e8bd24affe
Comment 215 :aceman 2015-12-05 02:52:56 PST
Comment on attachment 8696211 [details] [diff] [review]
Correction of the landed test. (take 3)

Review of attachment 8696211 [details] [diff] [review]:
-----------------------------------------------------------------

The test seems to pass now and the change looks reasonable.
Could you use Services.appinfo.OS for the OS detection? The value for Windows is "WINNT". ("Linux" and "Darwin" are the other ones.)
Comment 216 Jorg K (GMT+2, PTO during summer) 2015-12-05 02:54:58 PST
Comment on attachment 8696211 [details] [diff] [review]
Correction of the landed test. (take 3)

OK, the third attempt fixes the test failures on Mac and Linux.
I also corrected a typo: s/htmt/html/.

Aleth, you might just want to rs and land this quickly.
Comment 217 Jorg K (GMT+2, PTO during summer) 2015-12-05 02:57:07 PST
(In reply to :aceman from comment #215)
> Could you use Services.appinfo.OS for the OS detection? The value for
> Windows is "WINNT". ("Linux" and "Darwin" are the other ones.)

Well, https://developer.mozilla.org/en-US/docs/Mozilla/QA/Writing_xpcshell-based_unit_tests#Platform-specific_tests says to use what I used. I will try your solution now.
Comment 218 Jorg K (GMT+2, PTO during summer) 2015-12-05 03:01:35 PST
Created attachment 8696220 [details] [diff] [review]
Correction of the landed test. (take 4)

OK, here is the OS detection according to Aceman. Works on Windows.

Please review if you feel fit ;-)
Comment 219 :aceman 2015-12-05 03:12:40 PST
(In reply to Jorg K (GMT+1) from comment #217)
> (In reply to :aceman from comment #215)
> > Could you use Services.appinfo.OS for the OS detection? The value for
> > Windows is "WINNT". ("Linux" and "Darwin" are the other ones.)
> 
> Well,
> https://developer.mozilla.org/en-US/docs/Mozilla/QA/Writing_xpcshell-
> based_unit_tests#Platform-specific_tests says to use what I used. I will try
> your solution now.

Yeah, maybe it was written before Services were available :) Anyway, my method seems less hacky and you can grep to see that we use it in the TB code.
Comment 220 Jorg K (GMT+2, PTO during summer) 2015-12-05 03:35:17 PST
Aceman reports failures on his local Linux debug build:
###!!! ASSERTION: Not a UTF-8 string. This code should only be used for converting from known UTF-8 strings.: 'Error', file /mozilla/xpcom/string/nsUTF8Utils.h, line 430

So here goes another try:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=ea78d2cefb3a

So let's see the results before reviewing and landing this.
Comment 221 Jorg K (GMT+2, PTO during summer) 2015-12-05 05:32:01 PST
Comment on attachment 8696220 [details] [diff] [review]
Correction of the landed test. (take 4)

Aceman lied to me, Services.appinfo.OS returns XPCShell.
Grrrr. Going back to the previous version with another try run.
Comment 222 Jorg K (GMT+2, PTO during summer) 2015-12-05 05:41:12 PST
Created attachment 8696229 [details] [diff] [review]
Correction of the landed test. (take 3a), same as take 3 with added comment.

OK, since Services.appinfo.OS doesn't return the correct result I'm going back to version 3 with an added comment.

Another try, this time including the debug builds:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=f7ddd280284e
Comment 223 Jorg K (GMT+2, PTO during summer) 2015-12-05 05:44:22 PST
Created attachment 8696230 [details] [diff] [review]
Correction of the landed test. (take 3b), same as take 3 with added comment.

OK, one more time, with an even better comment upon Aceman's request.
Comment 224 :aceman 2015-12-05 08:07:20 PST
Comment on attachment 8696230 [details] [diff] [review]
Correction of the landed test. (take 3b), same as take 3 with added comment.

Review of attachment 8696230 [details] [diff] [review]:
-----------------------------------------------------------------

It seems my xpcshell breakage is not related to this patch.
The patch fixes the tests on try server so let's land it.
Comment 226 Jorg K (GMT+2, PTO during summer) 2015-12-05 09:44:55 PST
All done, thanks Aceman!
Comment 227 Philip Chee 2015-12-05 15:15:48 PST
I tested xpcshell:

js> Components.utils.import("resource://gre/modules/AppConstants.jsm");
[object BackstagePass]
js> AppConstants.platform;
win
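
A sketch of how such a platform string could drive the expected line ending in the test ("\r\n" on Windows, "\n" elsewhere, as described in comment 211). The function name is made up, only the value "win" is taken from the session above, and the landed test uses a different detection.

```javascript
// Hypothetical helper: expected newline for a platform string such as the
// one printed by AppConstants.platform. Only "win" is known from the thread;
// every other value is treated as a "\n" platform.
function newlineFor(platform) {
  return platform === "win" ? "\r\n" : "\n";
}
```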
Comment 228 asmwarrior 2015-12-06 18:22:28 PST
Is it possible that these fix patches could be applied to the current release branch? We are on release 38.x, but the target milestone is 45.0, and we will have to wait a long time to see this bug fixed in the official release. Thanks.
Comment 229 asmwarrior 2015-12-06 18:24:09 PST
Is it possible that these patches could be applied to the current release branch? We are on release 38.x, but the target milestone is 45.0, and we will have to wait a long time to see this bug fixed in the official release. Thanks.
Comment 230 Kent James (:rkent) 2015-12-06 18:40:28 PST
(In reply to asmwarrior from comment #229)
> Is it possible that this patches could be applied in the current release
> branch? We are in release 38.x version, but the target milestone is 45.0,
> and we will wait long time to see this bug is fixed in the official release.
> Thanks.

The required change was complex, spanning multiple patches over multiple bugs, so I don't think this is a good candidate for uplift to release.
Comment 231 Jorg K (GMT+2, PTO during summer) 2015-12-06 22:26:03 PST
I agree with Kent. However, after running with these patches on Daily and Aurora from 14th Dec., we could see whether they can be made to apply on TB 38.6 due at the end of January. Then it's just six weeks away from TB 45 due in early March, so perhaps not worth the effort. Kent is doing the uplifts, so it's up to him.
Comment 232 KSak 2016-01-19 09:26:37 PST
(In reply to Jorg K (GMT+1) from comment #231)
> I agree with Kent. However, after running with these patches on Daily and
> Aurora from 14th Dec., we could see whether they can be made to apply on TB
> 38.6 due at the end of January. Then it's just six weeks away from TB 45 due
> in early March, so perhaps not worth the effort. Kent is doing the uplifts,
> so it's up to him.

I'm just a general TB end-user and have no clue about the technicalities of this bug, but I can see that this bug has finally been resolved and is due for release into production soon - THANK YOU and all the other developers/programmers who worked so hard and diligently to fix this issue.

With that said, when exactly is this fix scheduled to be released in TB? End of this month (Jan)? Or do we need to wait until early March?

I am currently using the latest production build (v38.5.1) and would LOVE to have this fix implemented so that I can finally write my emails in Japanese without seeing all those random blank spaces everywhere! It certainly doesn't look pretty, almost unprofessional even, especially when writing business emails to clients and partners and it would be life saving to have this resolved over the next 1-2 weeks once and for all :-D
Comment 233 Jorg K (GMT+2, PTO during summer) 2016-01-19 09:56:30 PST
(In reply to KSak from comment #232)
> With that said, when exactly is this fix scheduled to be released in TB? End
> of this month (Jan)? Or do we need to wait until early March?

This is coming out in TB 45 in early to mid-March 2016:
https://wiki.mozilla.org/RapidRelease/Calendar

TB 45 will go to beta at the end of January.

You can use this functionality now in the US English version of Earlybird:
http://ftp.mozilla.org/pub/thunderbird/nightly/latest-comm-aurora/ for example
http://ftp.mozilla.org/pub/thunderbird/nightly/latest-comm-aurora/thunderbird-45.0a2.en-US.win32.installer.exe

Warning: This is Alpha software, but hey, I've been using it since Christmas and as far as I can tell, it works ;-)
Comment 234 KSak 2016-01-20 04:51:30 PST
(In reply to Jorg K (GMT+1) from comment #233)
> (In reply to KSak from comment #232)
> > With that said, when exactly is this fix scheduled to be released in TB? End
> > of this month (Jan)? Or do we need to wait until early March?
> 
> This is coming out in TB 45 in early to mid-March 2016:
> https://wiki.mozilla.org/RapidRelease/Calendar
> 
> TB 45 will go to beta at the end of January.
> 
> You can use this functionality now in the US English version of Earlybird:
> http://ftp.mozilla.org/pub/thunderbird/nightly/latest-comm-aurora/ for
> example
> http://ftp.mozilla.org/pub/thunderbird/nightly/latest-comm-aurora/
> thunderbird-45.0a2.en-US.win32.installer.exe
> 
> Warning: This is Alpha software, but hey, I've been using it since Christmas
> and as far as I can tell, it works ;-)

Thanks a ton for the reply. I gave Earlybird a try and confirmed the blanks are gone! Such great stuff!!! Can't wait for the official March release :-)

I did run into one minor problem though when sending emails with Earlybird. For some reason it seems like URL links (e.g. https://www.google.co.jp) that I type in the email message do not "activate" or automatically get converted to a hyperlink when I receive the message.

I sent the exact same email with the URL using Thunderbird and confirmed the URL was properly "activated" as a hyperlink when I received the email (on either TB or EB), so it appears to be an issue with sending emails through Earlybird. Not sure if it's a bug, limitation or perhaps one of the settings get changed when I use Earlybird, but just wanted to raise this to your attention.
Comment 235 KSak 2016-01-20 04:58:05 PST
Forgot to mention that in Earlybird the URL will properly convert to a link if I manually select it and use the "Link" command button. But it doesn't automatically convert to a link like it does when using Thunderbird.
Comment 236 :aceman 2016-01-20 05:01:20 PST
The link problem should be bug 1240903.
Comment 237 Jorg K (GMT+2, PTO during summer) 2016-01-20 06:44:28 PST
(In reply to KSak from comment #234)
> I did run into one minor problem though when sending emails with Earlybird.
> For some reason it seems like URL links (e.g. https://www.google.co.jp) that
> I type in the email message does not "activate" or automatically get
> converted to a hyperlink when I receive the message.
As Aceman said, this is already fixed in bug 1240903 and will be landed any day now.
Comment 238 KSak 2016-01-20 10:48:17 PST
(In reply to Jorg K (GMT+1) from comment #237)
> (In reply to KSak from comment #234)
> > I did run into one minor problem though when sending emails with Earlybird.
> > For some reason it seems like URL links (e.g. https://www.google.co.jp) that
> > I type in the email message does not "activate" or automatically get
> > converted to a hyperlink when I receive the message.
> As Aceman said, this is already fixed in bug 1240903 and will be landed any
> day now.

Gotcha! Really appreciate the quick point outs. Looking forward to the upcoming release and fixes!
Comment 239 KSak 2016-03-29 22:36:03 PDT
I just noticed that TB 38.7.1 was just released, but I thought TB 45 which includes a fix for this bug was due for release in early to mid-March? Has 45 not been released yet or has it been postponed?

I checked the schedule (https://wiki.mozilla.org/RapidRelease/Calendar) and could not find reference to Thunderbird anywhere (is it the "ESR" column?) but it appears as though Firefox 45 and ESR 45.0 were already released on March 7th. Please correct me if I'm looking at the wrong place.
Comment 240 Kent James (:rkent) 2016-03-29 23:01:03 PDT
45 is late, but what we hope is the final beta was built this week. We hope that 45 will ship next week.
Comment 241 KSak 2016-03-31 07:02:01 PDT
(In reply to Kent James (:rkent) from comment #240)
> 45 is late, but what we hope is the final beta was built this week. We hope
> that 45 will ship next week.

That's good to hear. Is this release info updated or already included somewhere in the schedule link (https://wiki.mozilla.org/RapidRelease/Calendar)?

I wasn't sure where to look, but the next release date appears to be set on April 19th and it mentions 45.1 under ESR - is that referring to Thunderbird 45 or am I looking at the schedule incorrectly?
Comment 242 Jorg K (GMT+2, PTO during summer) 2016-03-31 09:00:14 PDT
TB follows the Release Calendar. TB 45.0 ESR should have been released on March 8, 2016, but usually we're running two to four weeks late, as per comment #240.
Comment 243 KSak 2016-04-08 00:46:38 PDT
(In reply to Jorg K (GMT+2) from comment #242)
> TB follows the Release Calendar. TB 45.0 ESR should have been released on
> March 8, 2016, but usually we're running two to four weeks late, as per
> comment #240.

I'm a little confused as to what you mean by TB 45.0 already being released on March 8, 2016 even though the actual fix for this bug has not been released yet. Wasn't this fix included in 45.0?

Also, my TB recently got updated to 38.7.2, not 45.0 based on the Release Calendar as you mentioned. 38.7 looks like a version for Firefox, while 45.0 is for TB. Can you kindly clarify how these updates work?
Comment 244 Jorg K (GMT+2, PTO during summer) 2016-04-08 01:00:12 PDT
1) The fix is in TB 45.
2) TB 45 ESR has not been released yet. It will be released next week.
3) TB 45 beta is available for testing now: https://www.mozilla.org/en-US/thunderbird/channel/
4) Once TB 45 ESR is released, TB 38.x will eventually automatically update.
5) FF and TB follow the same numbering scheme and release calendar, however, TB is usually running
   a few weeks late since it is staffed only by unpaid volunteers.
   That's why TB 45 ESR should have been released on March 8, 2016, but wasn't, see 2).
Comment 245 KSak 2016-04-09 20:30:58 PDT
(In reply to Jorg K (GMT+2) from comment #244)
> 1) The fix is in TB 45.
> 2) TB 45 ESR has not been released yet. It will be released next week.
> 3) TB 45 beta is available for testing now:
> https://www.mozilla.org/en-US/thunderbird/channel/
> 4) Once TB 45 ESR is released, TB 38.x will eventually automatically update.
> 5) FF and TB follow the same numbering scheme and release calendar, however,
> TB is usually running
>    a few weeks late since it is staffed only by unpaid volunteers.
>    That's why TB 45 ESR should have been released on March 8, 2016, but
> wasn't, see 2).

Thanks JorgK - that explanation really helps a lot to understand how the release system works between FF and TB. By the way what does "ESR" stand for? Is there any difference between "TB 45" and "TB 45 ESR"?

Mozilla really need to do some good and hire you guys - i.e. the unpaid volunteers. It's really unfortunate that they've stopped officially supporting TB after all this time the community has continued on with it.
Comment 246 Jorg K (GMT+2, PTO during summer) 2016-04-09 23:42:21 PDT
(In reply to KSak from comment #245)
> By the way what does "ESR" stand for?
> Is there any difference between "TB 45" and "TB 45 ESR"?
This information is readily available elsewhere.
ESR = Extended Support Release.
In Firefox every seventh release is an ESR, 17, 24, 31, 38, 45, etc.
Since TB doesn't have the manpower, we only do every seventh release, so you get all those release versions. TB 45 and TB 45 ESR are the same thing. We do the other versions as beta releases only, at times skipping some. I'd say the next beta will be TB 47 skipping TB 46. It's not decided yet.

> Mozilla really need to do some good and hire you guys - i.e. the unpaid
> volunteers. It's really unfortunate that they've stopped officially
> supporting TB after all this time the community has continued on with it.
I agree, but Mozilla see it differently. There will be official announcements made soon. In the meantime you can donate directly to Thunderbird: https://donate.mozilla.org/en-US/thunderbird/about/
Comment 247 KSak 2016-04-10 03:44:04 PDT
(In reply to Jorg K (GMT+2) from comment #246)
> (In reply to KSak from comment #245)
> > By the way what does "ESR" stand for?
> > Is there any difference between "TB 45" and "TB 45 ESR"?
> This information is readily available elsewhere.
> ESR = Extended Support Release.
> In Firefox every seventh release is an ESR, 17, 24, 31, 38, 45, etc.
> Since TB doesn't have the manpower, we only do every seventh release, so you
> get all those release versions. TB 45 and TB 45 ESR are the same thing. We
> do the other versions as beta releases only, at times skipping some. I'd say
> the next beta will be TB 47 skipping TB 46. It's not decided yet.
> 
> > Mozilla really need to do some good and hire you guys - i.e. the unpaid
> > volunteers. It's really unfortunate that they've stopped officially
> > supporting TB after all this time the community has continued on with it.
> I agree, but Mozilla see it differently. There will be official
> announcements made soon. In the meantime you can donate directly to
> Thunderbird: https://donate.mozilla.org/en-US/thunderbird/about/

Just made a donation. Thanks so much for taking the time to clarify these items, really appreciate it!
Comment 248 Jorg K (GMT+2, PTO during summer) 2016-04-13 23:43:49 PDT
In case you haven't noticed: TB 45 has now been released:
https://www.mozilla.org/en-US/thunderbird/all/
Comment 249 KSak 2016-04-15 00:57:33 PDT
My TB hasn't yet auto-updated to 45 (it's still sitting at 38.7.2).

Is there any way to force an update on the application? Or do I need to download from that link you provided and reinstall it again? I wouldn't want to lose any settings/add-ons if possible when updating to 45.
Comment 250 Jorg K (GMT+2, PTO during summer) 2016-04-15 01:01:57 PDT
Download and install it from https://www.mozilla.org/en-US/thunderbird/all/.
Settings won't be lost, but we can't guarantee that all add-ons will continue to work.
Comment 251 KSak 2016-04-15 02:18:41 PDT
(In reply to Jorg K (GMT+2) from comment #250)
> Download and install it from https://www.mozilla.org/en-US/thunderbird/all/.
> Settings won't be lost, but we can't guarantee that all add-ons will
> continue to work.

Thanks! Just installed 45, though as you warned, it wasn't compatible with the Noia Fox theme. Will give it a whirl!
