Closed Bug 653342 Opened 13 years ago Closed 9 years ago

CJK(Chinese, Japanese, Korean): extra space is inserted within text in mail due to wrap produced by mailnews.wraplength and line length limitation of 1000bytes of SMTP

Categories

(MailNews Core :: Composition, defect)

defect
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 45.0

People

(Reporter: lotus7174, Assigned: jorgk-bmo)

References

(Depends on 1 open bug)

Details

(Keywords: intl, Whiteboard: [tb-papercut])

Attachments

(8 files, 28 obsolete files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-TW; rv:1.9.2.15) Gecko/20110303 BTRS26718 Firefox/3.6.15 GTB7.1 ( .NET CLR 3.5.30729)
Build Identifier: 3.1.9

http://forums.mozillazine.org/viewtopic.php?f=39&t=2173931

Version: 3.1.9
Language: Chinese (traditional character version)

Operating System: Microsoft Windows XP
Language: Chinese (traditional character version)

1. Use HTML to compose messages.

2. Send the following text to yourself. Check the "Sent" box and also your received copy.
http://tw.myblog.yahoo.com/lotus7174@ki ... -1&next=-1

3. PROBLEM: I have two versions in different dialects. In one of those, there is an extra
space after both 而 on the leftmost of two lines; in the other, no. In both versions, there is an extra space
between 向 and 上 near the end of the message.

4. PROBLEM: I have another message with 商場. There is an extra space inserted in between.

5. EXPECTATION: No extra space, whether Latin-character text, or in Chinese (trad. or simp.)
versions.

NOTE: This was reported since 3.1.5, but still not resolved:
http://getsatisfaction.com/mozilla_mess ... bird_3_1_5

Thanks.

Qiyao

Reproducible: Always
Whiteboard: dupeme
Attached image extra unwanted spaces
This is what appears in the "Sent" box, and also what the receiver sees.
The body of the mail is set in Courier (not <TT>).
Note that sometimes 歲月 and 一點 end up with an intervening space.
Those are hyperlinks, but sometimes plain text also has this problem.

Thanks.
Sometimes it happens between 東 and 加.
Do View > Mail Source.
You will find unwanted line breaks in the original Mail Source,
inserted when the e-mail is sent.
In the original mail source, linebreaks in the HTML code happen between
Chinese characters, which should not be inserted.
They cause the rendered e-mail to have a space between those Chinese
characters.

Is this a problem of sending (the line break should not be there)
or of displaying (the line break should disappear instead of becoming
a space)?

Or perhaps a standard is needed on whether a line break may be
inserted between characters in the HTML when Chinese text is
encoded as HTML,
and on what should happen when this is rendered as formatted text
and those two lines are joined on the same line for display:
whether the line break should become a space or should disappear, because
Chinese text can be joined without one.
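
A minimal sketch of the effect described above, assuming a simplified model of HTML white-space handling in which every run of spaces, tabs and line breaks in the source renders as one space. The function name `render_inline_whitespace` is hypothetical, and real layout engines may apply additional rules for CJK text.

```python
import re


def render_inline_whitespace(html_text):
    """Collapse each run of spaces, tabs and line breaks into one space.

    Simplified model of how white space in HTML source is displayed;
    it is an assumption for illustration, not Thunderbird's code.
    """
    return re.sub(r"[ \t\r\n]+", " ", html_text)


# A line break plus source indentation between two Chinese characters...
source = "商\r\n    場"
# ...shows up as a single visible space once the lines are joined.
print(render_inline_whitespace(source))
```

For Latin text the same rule is harmless, because a wrap normally happens where a space already exists.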

Thanks.
I have reported a similar bug in the forum (a screenshot is included):
http://forums.mozillazine.org/viewtopic.php?f=39&t=2315453

If you enter a long sentence when sending an email, TB will automatically add many  
extra spaces in my text, see:
http://i683.photobucket.com/albums/vv194/ollydbg_cb/2011-09-25192351.png

This made TB useless for all the CJK language users.

see also: http://getsatisfaction.com/mozilla_messaging/topics/unwanted_extra_spaces_inserted_within_the_text_not_at_the_beginning_of_the_message_thunderbird_3_1_5

I just tested TB 6.02 and TB 7.0beta2, and the bug still exists.
Can someone fix this?

thanks.
This happens if you are using HTML e-mail composition.
See my comment dated "Zhong Qiyao 2011-08-18 21:28:14 PDT"
for a cause of the problem.

The solution of the problem will need a clarification as to
whether those linebreaks are allowed in the internal HTML
representation of the e-mail,
and whether they should be displayed as spaces when those lines
are joined, when the internal HTML representation
is rendered to be displayed for the user.

Try sending an e-mail between Thunderbird and other mailers
(Outlook Express or Postbox as someone mentioned Postbox),
and see what they do with long lines of CJK (Chinese-Japanese-Korean)
text.

The same problem would also happen for a normal HTML Web page in CJK,
about the HTML representation of optional linebreaks (i.e. when
a line is too long to be displayed) between CJK characters in CJK text.

Thanks.
Well, I have checked two other email senders; both of them use base64 encoding.

If I send an HTML-styled email (with some bold or italic CJK characters) from "Gmail web", I get the following content:

--------------------------------------------------------------------------

Content-Type: multipart/alternative; boundary=0016e6dede458e824504ade3981f

--0016e6dede458e824504ade3981f
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: base64

5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI
5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI
5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI
5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI
5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOI5ZOICirlk4jlk4jlk4jl
k4jlk4jlk4jlk4jlk4jlk4jlk4jlk4jlk4jlk4jlk4jlk4jlk4jlk4jlk4jlk4jlk4jlk4jlk4jl
k4jlk4jlk4jlk4jlk4jlk4jlk4jlk4jlk4jlk4gqCuWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWT
iOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWT
iOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWT
iOWTiOWTiOWTiCrlk4jlk4jlk4jlk4jlk4jlk4jlk4jlk4jlk4jlk4jlk4jlk4jlk4jlk4gqCuWT
iOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWT
iOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWT
iOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWT
iOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWT
iOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWT
iOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWTiOWT
iOWTiOWTiOWTiOWTiOWTiOWTiAo=
--0016e6dede458e824504ade3981f
...
...
---------------------------------------------------------------------------

I also checked the email client Foxmail (very popular in mainland China); it also sends the email with content like:
-----------------------------------------------------
Content-Type: multipart/mixed;
	boundary="=====001_Dragon381175773018_====="

This is a multi-part message in MIME format.

--=====001_Dragon381175773018_=====
Content-Type: text/plain;
	charset="gb2312"
Content-Transfer-Encoding: base64
.....
-----------------------------------------------------

So, I'm wondering whether TB can send base64 based encoding for text.
It seems the answer is NO.
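
For comparison, a minimal sketch of what sending text as base64 looks like, using Python's standard `email` package rather than any Thunderbird code. The test string is an assumption chosen to resemble the Gmail sample above. It shows that the encoded lines stay short while the decoded text comes back unchanged, with no space inserted.

```python
from email.mime.text import MIMEText

# One long line of CJK text with no ASCII space to wrap at.
text = "哈" * 2000

# For charset utf-8 the standard library encodes the body as base64.
msg = MIMEText(text, "plain", "utf-8")

encoded_lines = msg.get_payload().splitlines()
longest = max(len(line) for line in encoded_lines)
round_trip = msg.get_payload(decode=True).decode("utf-8")

print(msg["Content-Transfer-Encoding"])
print("longest encoded line:", longest)
print("text unchanged after decoding:", round_trip == text)
```

Because the wrapping happens in the base64 layer, the line breaks never touch the text itself.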
Concerning:
http://forums.mozillazine.org/viewtopic.php?f=39&t=2315453

It may be because of the definition of optional line breaks between two Chinese (CJK) characters
when line wrapping, breaking, and joining happen, when using plain text HTML.

Base64 may be used if the transmitting medium requires a line break after every so many ASCII characters.
But if Base64 is not used, Thunderbird may consider *not* inserting line breaks between Chinese
characters at all, just as when a sender wants to transmit a line of 100 Latin letters
with no "force linebreaks" upon transmission.  The receiver will then receive a line with 100 Latin
letters, and determine, when displaying, whether to "force linebreaks".  Some mail
displayers don't.

But where is the voting page?
Search the word "vote" in this page, you will find it.
Attached mails are mail created by Send Later of Tb 7.
  Three lines written in Japanese character,
  1024 Japanese unicode characters per line.
(1) HTML, Options/Format/Plain and Rich(HTML) text
(1-A) long line 01-A, text_html, not format=flowed(iso-2022-jp)
(1-B) long line 01-B, text_html, format=flowed(utf-8).eml
(2) Plain text mail, composed in text mode
(2-A) long line 02-A, text_plain, not format=flowed(iso-2022-jp)
(2-B) long line 02-B, text_plain, format=flowed(utf-8).eml
Note: If a Japanese charset like iso-2022-jp is used, format=flowed is currently disabled internally, even when mailnews.send_plaintext_flowed=true. If utf-8 is used, format=flowed is applied when mailnews.send_plaintext_flowed=true.
Summary: Chinese: extra spaces inserted "within" the text → CJK(Chinese, Japanese, Korean): extra spaces inserted "within" the text
Observed phenomena around very long line during mail composition.

(i) If composed in HTML mode with text only, bug 414299 occurs and only the text/plain part (converted from the text/html part) is sent, unless Options/Format/Plain and Rich(HTML) text is explicitly requested.

(ii) If HTML is sent with Options/Format/Plain and Rich(HTML), both text/plain part(converted from text/html part) and text/html part are sent.
If this mail is shown with View/Message Body As/Plain Text, bug 253830 happens, and converted text version of text/plain part in multipart/alternative is shown, instead of text/plain part in multipart/alternative.

(iii) Tb 7 appears to send the text/plain part base64-encoded if a long mail line or a long word without spaces exists, in order to avoid insertion or removal of spaces due to reformatting for format=flowed.

(iv) In the text/html part, Tb appears to wrap at "76 unicode characters", and adds extra spaces at the left of a line for readability of the message source.
Even though I intentionally tested with editor.htmlWrapColumn=8888, "always wrap html at 76 unicode characters" still applies.
I don't know whether a bug at B.M.O is already open for this phenomenon, although I have seen it reported in some bugs.

(v) In the text/plain part (converted from the text/html part), the following is observed.
  - Text data length in a line is around 990 bytes.
    This is probably "wrap at around 1000 bytes if the mail line is long".
  - Spaces inserted at the left of a line in the text/html part appear as a single space.
    This is probably because "multiple spaces are equivalent to one space
    in HTML" is applied upon conversion from html to text.
    Bug 262475 is relevant to this phenomenon.

(vi) Bug 355209 is seen in mails composed in text mode.
     Problem like Bug 355209 doesn't look to occur in html2text conversion.

Phenomenon you saw is already reported to bug 611411 for Korean text. IIRC, similar phenomenon is reported for Japanese text too.

Confirming.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Note:
I intentionally used mailnews.wraplength=0 in testing (no wrap by wrap length; wrap only by the SMTP mail line length limitation, which is around 1000 bytes). If a small mailnews.wraplength value is used, the wrap position in the text/plain part or in text mode composition may differ from my test mail.
Gmail appears to avoid "problems due to wrap" by sending in base64 if the line length exceeds the mail line length limitation (around 1000 bytes).
As seen in "long line 01-B, text_html, format=flowed(utf-8).eml", which I attached, Tb already sends the text/plain part in base64 in some circumstances. Sending in base64 is therefore a universal and practical solution for Tb to the problems around "mail line length limitation" and around "new line between CJK and CJK/non-CJK during mail composition", as you say.
  - Wrap in mail source only at an ASCII space, with any language/charset.
  - Send in base64 if the line length exceeds the line length limitation defined by RFC.
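
The second rule can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the limit of 998 bytes per line (excluding CRLF) is the RFC 5322 figure behind the "around 1000 bytes" mentioned above, and returning "8bit" for everything else is a simplification.

```python
# Line length limit from RFC 5322, excluding the trailing CRLF.
MAX_LINE_BYTES = 998


def exceeds_line_limit(body, charset="utf-8", limit=MAX_LINE_BYTES):
    """Return True if any line of `body` is too long once encoded."""
    lines = body.replace("\r\n", "\n").split("\n")
    return any(len(line.encode(charset)) > limit for line in lines)


def pick_transfer_encoding(body, charset="utf-8"):
    """Hypothetical policy: fall back to base64 only for over-long lines."""
    return "base64" if exceeds_line_limit(body, charset) else "8bit"


long_cjk_line = "哈" * 400            # 1200 bytes in UTF-8, no space to wrap at
wrapped_body = "哈" * 300 + "\nshort"  # longest line is 900 bytes

print(pick_transfer_encoding(long_cjk_line))
print(pick_transfer_encoding(wrapped_body))
```

The check counts bytes after encoding, not characters, since one CJK character takes several bytes in UTF-8.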
Summary: CJK(Chinese, Japanese, Korean): extra spaces inserted "within" the text → CJK(Chinese, Japanese, Korean): extra space is inserted within text in mail, due to wrap produced by editor.htmlWrapColumn(which looks 76 or 78 unicode chars reardless of the setting), mailnews.wraplength, and line length limitation of 1000bytes of SMTP
I totally agree with WADA. Thanks for supplying so many sample emails. I strongly suggest the TB developers can consider this.
HTMLWrapColumn seems to have gone away already, and "wrap at 72 char if HTML" seems to be hard-coded.
Setting dependency to bug 650206
Depends on: 650206
Component: General → Editor
Product: Thunderbird → Core
QA Contact: general → editor
Version: unspecified → Trunk
Severity: normal → major
Keywords: intl
Changing to Mail&News Core/Composition, according to component change of dependent bug 650206.
Component: Editor → Composition
OS: Windows XP → All
Product: Core → MailNews Core
QA Contact: editor → composition
Hardware: x86 → All
Ping. Any good news about this? Thunderbird is now at version 12, but this bug still exists, and it is very annoying. When the recipient sees such an email, he/she thinks the sender is not serious and believes the sender hit the space key many times while writing it. That is too bad.
See my comments asking whether those line-breaks are allowed to be inserted
between CJK characters arbitrarily.
https://bugzilla.mozilla.org/show_bug.cgi?id=653342#c5
https://bugzilla.mozilla.org/show_bug.cgi?id=653342#c7

Or maybe it is a flaw in the definition of HTML itself regarding a long
unbroken string of CJK characters: whether a line break or space break
is allowed on the screen or in the HTML coding without a corresponding
line break or space break in what the user sees.  It is unlikely to
happen for Latin text, because Latin text is allowed to line-break only
at spaces.

Maybe someone in the current thread can download the Thunderbird source and
try to patch it away?

Thanks.
Postbox Express also has this problem even if you configure it to use "quoted printable" (the message source below is base64-encoded).

Source:
長長長長長長長長長長二二二二二二二二二二三三三三三三三三三三四四四四四四四四四四五五五五五五五五五五

Sent Box and Received Result:
長長長長長長長長長長二二二二二二二二二二三三三三三三三三三三四四四四四四 四四四四五五五五五五五五五五

Message Source:
6ZW36ZW36ZW36ZW36ZW36ZW36ZW36ZW36ZW36ZW35LqM5LqM5LqM5LqM5LqM5LqM5LqM5LqM
5LqM5LiJ5LiJ5LiJ5LiJ5LiJ5LiJ5LiJ5LiJ5LiJ5LiJ5Zub5Zub5Zub5Zub5Zub5Zub5Zub
IA0K5Zub5Zub5Zub5LqU5LqU5LqU5LqU5LqU5LqU5LqU5LqU5LqU5LqUDQo=
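
Decoding the message source above shows where the space comes from. This sketch uses only the three base64 lines quoted in this comment and Python's standard `base64` module; it assumes the body is UTF-8, which the decoded result is consistent with.

```python
import base64

# The three base64 lines from the message source above, concatenated.
encoded = (
    "6ZW36ZW36ZW36ZW36ZW36ZW36ZW36ZW36ZW36ZW35LqM5LqM5LqM5LqM5LqM5LqM5LqM5LqM"
    "5LqM5LiJ5LiJ5LiJ5LiJ5LiJ5LiJ5LiJ5LiJ5LiJ5LiJ5Zub5Zub5Zub5Zub5Zub5Zub5Zub"
    "IA0K5Zub5Zub5Zub5LqU5LqU5LqU5LqU5LqU5LqU5LqU5LqU5LqU5LqUDQo="
)

decoded = base64.b64decode(encoded).decode("utf-8")

# repr() makes the embedded space and CRLF visible.
print(repr(decoded))
```

The group `IA0K` at the start of the third line decodes to a space followed by CRLF, so the space and line break were already in the text before it was base64-encoded. In this sample the transfer encoding did not protect the text, because the wrap happened earlier.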

Thanks.
Foxmail does not exhibit the above problem with the above test text.

Thanks.
The mailer which comes with Opera also does not have this problem when sending as HTML.

Thanks.
Any mail client works well as long as it is not thunderbird or thunderbird-based ones.

This is such a deadly bug for CJK users and it's been there for a long long time.

No more expectations for mozilla.

I am migrating from thunderbird to browsers on some devices. Just like I migrated from firefox to chrome for good.
If it is not solved, then Mozilla Thunderbird won't be suitable for CJK users in HTML mode.  In fact, Chrome has problems with Yahoo! blog, and I am in the process of migration to simplicity: Opera instead of Thunderbird and Firefox.  Neither Firefox nor Opera is all-powerful; for the missing Web sites, use the inevitable Microsoft Internet Explorer.

Bye.
I think we should modify the hardcode:

nsPlaintextEditor::nsPlaintextEditor()
: nsEditor()
, mRules(nsnull)
, mWrapToWindow(false)
, mWrapColumn(0)


Change mWrapColumn(0) to mWrapColumn(1000)?
(In reply to xunxun from comment #30)
> I think we should modify the hardcode:
> 
> nsPlaintextEditor::nsPlaintextEditor()
> : nsEditor()
> , mRules(nsnull)
> , mWrapToWindow(false)
> , mWrapColumn(0)
> 
> 
> Change mWrapColumn(0) to mWrapColumn(1000)?

If it is correct, I will try to build tb13b2 myself and debug it.
I changed:

nsDocumentEncoder.cpp
nsPlaintextEditor.cpp
nsContentUtils.cpp
nsTextEditorState.cpp
nsWebBrowserPersist.cpp
nsSelection.cpp


SetWrapColumn() => SetWrapColumn(1000)

mWrapColumn => mWrapColumn=1000


After the change, the situation is improved but not perfect, because there is still a break after 1000 characters.

So we need a config option for the wrap column.


The test build: http://pcxfirefox.googlecode.com/files/tb13b2_sse2_nopgo_test_20120520.7z
xunxun could you make a patch and post the patch to this bug ?
Attached patch: line break -> 1000 hacker patch (obsolete)
(In reply to Ludovic Hirlimann [:Usul] from comment #33)
> xunxun could you make a patch and post the patch to this bug ?

Sure. But I am not a Mozilla developer; this is only a hack patch, used to enlarge the line-break limit.

The best choice would be for tb to use base64 to send and receive mails.
(In reply to xunxun from comment #34)
> Created attachment 626060 [details] [diff] [review]
> line break -> 1000 hacker patch
> 
> (In reply to Ludovic Hirlimann [:Usul] from comment #33)
> > xunxun could you make a patch and post the patch to this bug ?
> 
> Sure. But I am not Mozilla developer, this is only a hacker patch. Use for
> enlarge the line break.

You are welcome to become one :-)

The other solution could also be implemented (I'm not sure we want it, let's ask David).
The patch is all in non-mailnews code, so you'd need a Gecko module owner to look at it... but mailnews code does have control over when we use base 64, and we do have several conditions which trigger base64 encoding. Perhaps it would be easy to add this case to that.
I've seen a similar issue in another bug, maybe bug 553526.
I suggest that someone create a patch that triggers base64 by default on CJK text.
(In reply to David :Bienvenu from comment #36)
> but mailnews code does have control over when we use base 64,
> and we do have several conditions which trigger base64 encoding. Perhaps it
> would be easy to add this case to that.

Hope the feature is implemented soon.
(In reply to xunxun from comment #34)
> The best choice should be that tb use base64 to send and receive mails.

I don't want Tb to use base64 for text indiscriminately.

At least without any feature to decode base64 in the source within Tb. My experience may not reflect that of the most typical users, but I often view the source of a message I'm curious about, only to be frustrated by the unintelligible base64-encoded text (often from Gmail).

Furthermore, base64 is meant for binary, and its use increases the body size by 1/3, and should be limited to cases where most of the bytes are not printable characters, or others where it's absolutely necessary. If soft line breaks are necessary, isn't QP generally a better choice? (I understand it may not always be better.)
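
A rough size comparison of the two encodings for CJK text, using Python's standard `base64` and `quopri` modules. The sample string is an assumption; the numbers only illustrate the trade-off raised above and say nothing about what Thunderbird emits.

```python
import base64
import quopri

text = "哈" * 100            # 100 CJK characters
raw = text.encode("utf-8")   # 3 bytes per character in UTF-8

b64 = base64.b64encode(raw)     # 4 output characters per 3 input bytes
qp = quopri.encodestring(raw)   # "=XX" per non-ASCII byte, plus soft breaks

print("raw bytes:       ", len(raw))
print("base64:          ", len(b64))
print("quoted-printable:", len(qp))
```

For text made up almost entirely of non-ASCII bytes, quoted-printable comes out roughly three times the raw size, so it is larger than base64 here. QP keeps its advantage mainly when most of the text is ASCII, where it also leaves the source readable.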
After reading " https://wiki.mozilla.org/Thunderbird/Proposal:_New_Release_and_Governance_Model ", I don't know whether this bug will be fixed. Sigh!
Do you need an Asian Thunderbird and/or OS version to see this, or is it also reproducible on American versions of Windows and Thunderbird?
(In reply to Kent James (:rkent) from comment #42)
> Do you need an Asian Thunderbird and/or OS version to see this, or is it
> also reproducible on American versions of Windows and Thunderbird?

Chinese OS (Win7) + EN-US Thunderbird can reproduce it.

I don't know whether English OS + EN-US TB has the issue.
(In reply to Kent James (:rkent) from comment #42)
> Do you need an Asian Thunderbird and/or OS version to see this, or is it
> also reproducible on American versions of Windows and Thunderbird?

I used EN-GB Windows (and Linux) & Thunderbird, and I could see it.

I think you can see it on all platforms and versions.
:hiro, have you considered taking on this bug? It has received a lot of votes in a short period of time.
I thought mkato is a proper person, but I will try.
:hiro that would be great! I really hate to see us ignoring this when it seems to be important to asian users.
Depends on: 26767
Depends on: 26734
No longer depends on: 26767
Could the approach from comment 36 be taken with this bug to avoid the dependence on bug 26734?
(In reply to Kent James (:rkent) from comment #49)
> Could the approach from comment 36 be taken with this bug to avoid the
> dependence on bug 26734?

I guess so. It will be a workaround for now but I think implementing CJKTextSerializer is the right thing to fix this issue.
(In reply to Kent James (:rkent) from comment #49)
> Could the approach from comment 36 be taken with this bug to avoid the
> dependence on bug 26734?

For this issue, we need to support delsp=yes for plain text mail.  (I have already landed this support for nsIDocumentEncoder)

And GetBodyFromEditor produces wrapped HTML because we use nsIDocumentEncoder::OutputFormatted.  We should use OutputRaw instead.
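
To illustrate why delsp=yes matters for CJK text, here is a simplified sketch of how a receiver joins format=flowed lines as described in RFC 3676. The function `unflow` is hypothetical; it ignores quoting and space-stuffing, and it is not the nsIDocumentEncoder implementation.

```python
def unflow(lines, delsp=False):
    """Join format=flowed lines into paragraphs (simplified RFC 3676).

    A line ending in a space is a soft break and continues on the next
    line. With delsp=True that trailing space is removed before joining.
    """
    paragraphs = []
    current = ""
    for line in lines:
        if line.endswith(" ") and line != "-- ":  # "-- " is the signature separator
            current += line[:-1] if delsp else line
        else:
            paragraphs.append(current + line)
            current = ""
    if current:
        paragraphs.append(current)
    return paragraphs


wrapped = ["長長長長 ", "長長長長"]
print(unflow(wrapped, delsp=False))  # the soft-break space stays in the text
print(unflow(wrapped, delsp=True))   # the soft-break space is deleted
```

Without DelSp, the space that marks a soft break is kept when lines are rejoined, which suits space-separated languages but leaves a stray space inside CJK text.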
Attached patch: Partially fix (obsolete)
I'd suggest using nsIDocumentEncoder::OutputRaw in any case.

This patch fixes only if the mail is a multipart/alternative HTML mail but is necessary for both of the approaches, I think.
Attachment #646005 - Flags: review?(mozilla)
Attachment #646005 - Flags: review?(m_kato)
I'm happy to test if someone can build a new Windows TB with the patch above. Thank you.
This patch is the approach suggested by David in comment 36.

It applies when the message is a plain text message in a multibyte charset composed in the HTML compose window.

I hope the case of the plain text compose window will be solved in bug 553526 or elsewhere.
Attachment #646034 - Flags: review?(mozilla)
Comment on attachment 646034 [details] [diff] [review]
Force to use base64 for plain text message in multi byte

Ooops! sorry the logic in NeedsConvetionToPlainText seems wrong..
Attachment #646034 - Attachment is obsolete: true
Attachment #646034 - Flags: review?(mozilla)
Comment on attachment 646005 [details] [diff] [review]
Partially fix

Review of attachment 646005 [details] [diff] [review]:
-----------------------------------------------------------------

SnarfAndCopyBody() will set wrap per LINE_BREAK_MAX when saving mail to Draft.  So we should not call EnsureLineBreaks() on SnarfAndCopyBody().

According to the comment in EnsureLineBreaks(), we have to wrap at 1000 bytes (for NNTP?).  So we may use BASE64 for HTML.

Also, Outlook uses quoted-printable for HTML to avoid this.
Attachment #646005 - Flags: review?(m_kato) → review-
(In reply to Makoto Kato from comment #56)
> Comment on attachment 646005 [details] [diff] [review]
> Partially fix
> 
> Review of attachment 646005 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> SnarfAndCopyBody() will set wrap per LINE_BREAK_MAX when saving mail to
> Draft.  So we should not call EnsureLineBreaks() on SnarfAndCopyBody().

Well, I am sorry I can not understand what you are saying...
I do not think the patch fixes the draft saving case. 

You mean the fix (Using nsIDocumentEncoder::OutputRaw) is not needed at all? Or OutputRaw will cause another issue in the case of draft?
(In reply to Hiroyuki Ikezoe (:hiro) from comment #57)
> (In reply to Makoto Kato from comment #56)
> > Comment on attachment 646005 [details] [diff] [review]
> > Partially fix
> > 
> > Review of attachment 646005 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > SnarfAndCopyBody() will set wrap per LINE_BREAK_MAX when saving mail to
> > Draft.  So we should not call EnsureLineBreaks() on SnarfAndCopyBody().
> 
> Well, I am sorry I can not understand what you are saying...
> I do not think the patch fixes the draft saving case. 

When saving to Draft by [Save this message], SnarfAndCopyBody is called.  But this function sets wrap per LINE_BREAK_MAX.

- Step
1. Open Compose Window
2. set body あx2000 characters
3. Save to Draft by [Save]
4. Reopen this mail on Draft

- Result
Characters in the body are corrupted due to SnarfAndCopyBody().
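
One plausible way a fixed byte-length wrap can corrupt multibyte text is by cutting in the middle of a character. The sketch below assumes UTF-8 and a hypothetical limit of 1000 bytes; the value of LINE_BREAK_MAX and the exact mechanism inside SnarfAndCopyBody() are not given in this thread, and `split_utf8` is an illustrative helper only.

```python
LIMIT = 1000  # hypothetical byte limit, chosen for illustration

body = "あ" * 2000
data = body.encode("utf-8")  # 3 bytes per character


def decodes_cleanly(chunk):
    """Return True if `chunk` is valid UTF-8 on its own."""
    try:
        chunk.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def split_utf8(data, limit):
    """Split bytes into chunks of at most `limit` bytes on character boundaries."""
    if limit < 4:
        raise ValueError("limit must fit the longest UTF-8 sequence")
    chunks = []
    start = 0
    while start < len(data):
        end = min(start + limit, len(data))
        # Back up while `end` points at a continuation byte (10xxxxxx).
        while end < len(data) and (data[end] & 0xC0) == 0x80:
            end -= 1
        chunks.append(data[start:end])
        start = end
    return chunks


naive = [data[i:i + LIMIT] for i in range(0, len(data), LIMIT)]
safe = split_utf8(data, LIMIT)

print("naive first chunk valid:", decodes_cleanly(naive[0]))
print("safe chunks all valid:  ", all(decodes_cleanly(c) for c in safe))
```

Since 1000 is not a multiple of 3, the naive cut lands inside a character, while the boundary-aware split keeps every chunk decodable.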


> You mean the fix (Using nsIDocumentEncoder::OutputRaw) is not needed at all?
> Or OutputRaw will cause another issue in the case of draft?

No.  draft issue is SnarfAndCopyBody().

OutputRaw doesn't set wrap, so a line of the HTML body may exceed 1000 bytes.  To avoid this (for compatibility with old systems?), we should use BASE64 for HTML.
(In reply to Makoto Kato from comment #58)
> (In reply to Hiroyuki Ikezoe (:hiro) from comment #57)
> > (In reply to Makoto Kato from comment #56)
> > > Comment on attachment 646005 [details] [diff] [review]
> > > Partially fix
> > > 
> > > Review of attachment 646005 [details] [diff] [review]:
> > > -----------------------------------------------------------------
> > > 
> > > SnarfAndCopyBody() will set wrap per LINE_BREAK_MAX when saving mail to
> > > Draft.  So we should not call EnsureLineBreaks() on SnarfAndCopyBody().
> > 
> > Well, I am sorry I can not understand what you are saying...
> > I do not think the patch fixes the draft saving case. 
> 
> When saving to Draft by [Save this message], SnarfAndCopyBody is called. 
> But this function sets wrap per LINE_BREAK_MAX.
> 
> - Step
> 1. Open Compose Window
> 2. set body あx2000 characters
> 3. Save to Draft by [Save]
> 4. Reopen this mail on Draft
> 
> - Result
> character on body is corrupted due to SnarfAndCopyBody().

Thanks, that is what I wanted to know; I mean a regression.

> > You mean the fix (Using nsIDocumentEncoder::OutputRaw) is not needed at all?
> > Or OutputRaw will cause another issue in the case of draft?
> 
> No.  draft issue is SnarfAndCopyBody().
> 
> OutputRaw doesn't set wrap.  So a line of HTML body may be over to 1000
> bytes.  To avoid this for old compatibility?, we should use BASE64 for HTML.

To resolve bug 553526, should we use base64 for plain text in multibyte as well?
Need feedback from experts.
Attachment #646005 - Attachment is obsolete: true
Attachment #646005 - Flags: review?(mozilla)
Attachment #646059 - Flags: feedback?(mozilla)
Attachment #646059 - Flags: feedback?(m_kato)
(In reply to Hiroyuki Ikezoe (:hiro) from comment #59)

> To resolve bug 553526 we should use base64 for plain text in multibyte
> either?

We should not use base64 for plain text mail if it isn't attachment file / multi-part.  We can fix this for text mail by format=flowed and delsp=yes (bug 26734).
Comment on attachment 646059 [details] [diff] [review]
Force to use base64 for plain text message in multi byte and html mail

Review of attachment 646059 [details] [diff] [review]:
-----------------------------------------------------------------

Even if the charset is us-ascii, DocumentEncoder may output a character entity (&#x1234;).  EnsureLineBreaks() can break a character entity if it falls between 998 and 999, so you should not use EnsureLineBreaks() for text/html and should remove this.

Also, when the mail is plain text and not multi-part, use format=flowed as in comment #61.
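
The entity problem from this review can be reproduced with a toy wrapper. Both functions below are hypothetical illustrations and do not reflect how EnsureLineBreaks() is written; the width of 998 is taken from the numbers mentioned above.

```python
import re

# Matches numeric and named character references such as &#x1234; or &amp;
ENTITY = re.compile(r"&#?[0-9A-Za-z]+;")


def naive_wrap(text, width):
    """Cut every `width` characters, regardless of content."""
    return [text[i:i + width] for i in range(0, len(text), width)]


def entity_safe_wrap(text, width):
    """Like naive_wrap, but never cut inside a character reference."""
    lines = []
    start = 0
    while start < len(text):
        end = min(start + width, len(text))
        for match in ENTITY.finditer(text):
            if match.start() < end < match.end():
                # Move the cut before the entity, or after it if the
                # entity alone is longer than the width.
                end = match.start() if match.start() > start else match.end()
                break
        lines.append(text[start:end])
        start = end
    return lines


html = "a" * 996 + "&#x1234;" + "b" * 10

print(repr(naive_wrap(html, 998)[0][-4:]))        # ends in a broken "&#"
print(repr(entity_safe_wrap(html, 998)[1][:10]))  # entity kept whole
```

Avoiding the wrap altogether, as suggested in the review, sidesteps the problem without needing any such special-casing.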
Attachment #646059 - Flags: feedback?(m_kato) → feedback-
(In reply to Makoto Kato from comment #62)
> Comment on attachment 646059 [details] [diff] [review]
> Force to use base64 for plain text message in multi byte and html mail
> 
> Review of attachment 646059 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Even if charset is us-ascii, DocumentEncoder may output character entity
> (&#x1234;).  EnsureLineBreaks() can break character entity if it is between
> 998 and 999, so You should not use EnsureLineBreaks() for text/html and
> should remove this.

Yes, I've noticed it. Do you know where in the code this is done?
Ooops! Wait! I just noticed mkato has restarted work on fixing bug 26734.

That code overlaps with mine. I will stop writing the fix for this issue for now.
For the record, I attach the current WIP patch.

I think this patch works fine in most HTML mail cases.
This patch also solves the us-ascii HTML issue mentioned in comment 62.

I will rework it after the patch for bug 26734 lands, if there is still work to do for this issue.
Attachment #646059 - Attachment is obsolete: true
Attachment #646059 - Flags: feedback?(mozilla)
I will discuss this with Ikezoe-san tomorrow.
The last test contained some garbage...
Attachment #646928 - Attachment is obsolete: true
Attachment #646934 - Attachment is obsolete: true
Summary: CJK(Chinese, Japanese, Korean): extra space is inserted within text in mail, due to wrap produced by editor.htmlWrapColumn(which looks 76 or 78 unicode chars reardless of the setting), mailnews.wraplength, and line length limitation of 1000bytes of SMTP → CJK(Chinese, Japanese, Korean): extra space is inserted within text in mail, due to wrap produced by editor.htmlWrapColumn(which looks 76 or 78 unicode chars regardless of the setting), mailnews.wraplength, and line length limitation of 1000bytes of SMTP
As for the text/html part and "inserted CRLF for folding + a space for folding + spaces for HTML source indentation", "additional space between two CJK chars in HTML rendering" is already resolved by bug 135323 (fixed on 2002-07-08).
However, the problem of bug 156369 still exists (CJK<span>[CRLF]CJK and the like) in text/html.
Needless to say, the problem of "wrap in attribute value of an HTML tag" exists in the text/html part.

These are not directly related to the text/plain part, but they may cause this bug in the text/plain part if the text/plain part is generated by the html2text converter.
So, if HTML mail is sent as text/plain only due to automatic downgrade to text mail by Option/Format/Auto Detect, this bug always occurs on the mail recipient side.

Ikezoe san, Kato san, will this worst case be resolved by the fix for this bug?
I suppose the worst case is a delsp=yes case. So the case will be solved by the fix for that bug.
(In reply to WADA from comment #70)
> As for text/html part and "inserted CRLF for folding + a space for folding +
> spaces for HTML source indention", "additional space betwenn two CJK chars
> in HTML rendering" is already resolved by 135323(fixed on 2002-07-08).
> However, problem of bug 156369 stil exists(CJK<span>[CRLF]CJK like one) in
> text/html.
> Needless to say, problem of "wrap in attribute value of a HTML tag" exists
> in text/html part.

If we don't use the formatted flag for the document encoder, HTML isn't wrapped.  And if we use base64 for text/html, there is no need to take care of wrapping for the HTML part.

Then text/plain is generated from this HTML.  The flowed and delsp flags can handle wrapping well.

So we can ignore bug 156369 when using the raw flag for the document encoder and base64.

mailnews should not handle wrapping, because the document encoder in Gecko can handle it.  mailnews will break the HTML structure even with the current code.


> These are not directly related to text/plain part, but it may cause this bug
> in text/plain part, if text/plain part is generated by html2textconverter.
> So, if HTML mail is sent in text/plain only due to automatic downgrade to
> text mail by Option/Format/Auto Detect, this bug always occurs at mail
> recipient side.

Although I tested some cases, I cannot reproduce this issue.  It should be filed against Gecko if possible.  I am not aware of this issue in the plain text serializer.
(In reply to Makoto Kato from comment #72)
> > So, if HTML mail is sent in text/plain only due to automatic downgrade to
> > text mail by Option/Format/Auto Detect, this bug always occurs at mail
> > recipient side.
> Although I test some cases, I cannot reproduce this issue.  It should be
> filed to Gecko if possible.  I don't know this issue for plain text
> serializer.

The text/plain part of the mail data I attached was generated as follows.
(1) Compose a mail in HTML mode.
(2) Send Later. The text/plain part and/or text/html part is sent according to one of the following;
    Option/Format : (2-1) Rich Text(HTML) and Plain Text
                    (2-2) Rich Text(HTML) Only
                    (2-3) Auto-Detect
(2-1) Rich Text(HTML) and Plain Text :
The text/plain part and text/html part are embedded in multipart/alternative.
The data of the text/plain part is generated from the HTML data by Tb.
(2-2) Rich Text(HTML) Only
text/html is sent.
(2-3) Auto-Detect :
If not all mail recipients have a preference for HTML mail in Address Book
(i.e. one of the recipients has a preference of "Text mail" or "Unknown", or is not defined in Address Book),
and if the HTML doesn't have sufficient formatting which requires HTML (e.g. text-only HTML mail),
Auto-Detect automatically and silently downgrades to text/plain mail, and sends text/plain mail even though the mail was composed in HTML mode.
In this case, the mail data is the same as the text/plain part of (2-1).
This is a known issue, but is currently by design/implementation of Auto-Detect.

If (2-1) and the recipient uses View/Message Body As/Plain Text, Tb shows the text/plain part under multipart/alternative, so the problem of this bug is exposed to the mail recipient.
If (2-2) and the recipient uses View/Message Body As/Plain Text, Tb shows text data converted from the data in text/html, so the problem of this bug is exposed to the mail recipient.
If (2-3), there is no text/html part, only text/plain data. So this bug is always exposed to the mail recipient.

"Sending text/plain part with delsp=yes of (2-1)" can't resolve this bug's problem in (2-2), unless "HTML to Text conversion of (2-1)/(2-3) upon composition" and "HTML to Text conversion of (2-2) upon mail display" are exactly the same.

My questions were;
- Will this bug in the text/plain part of (2-1) and (2-3) be resolved?
- Will this bug in (2-2) be resolved?

Didn't you see this bug in (2-1) and/or in (2-3) in your test?
Didn't you see this bug in (2-2) in your test?

After the fix of this bug, neither wrapping nor "indentation of HTML source by spaces for readability of the HTML source" will happen in the text/html part data, so will the problem no longer occur in any of (2-1)/(2-3) and (2-2)?
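For readers unfamiliar with the delsp=yes parameter mentioned above, here is a minimal sketch of how a receiver joins format=flowed lines, following the soft-break rule of RFC 3676. It is deliberately simplified (no quoting and no space-stuffing), and the function name `unflow` is made up for illustration; it is not Thunderbird's code.

```python
def unflow(lines, delsp=False):
    """Join format=flowed lines (simplified: no quoting, no space-stuffing)."""
    paragraphs = []
    current = ""
    for line in lines:
        if line == "-- ":
            # The signature separator is never treated as a soft break.
            if current:
                paragraphs.append(current)
                current = ""
            paragraphs.append(line)
        elif line.endswith(" "):
            # Soft line break: the paragraph continues on the next line.
            # With DelSp=yes the trailing space is dropped before joining.
            current += line[:-1] if delsp else line
        else:
            # Hard line break: the paragraph ends here.
            paragraphs.append(current + line)
            current = ""
    if current:
        paragraphs.append(current)
    return paragraphs


wrapped = ["測試測試 ", "測試測試"]
print(unflow(wrapped, delsp=False))  # ['測試測試 測試測試']
print(unflow(wrapped, delsp=True))   # ['測試測試測試測試']
```

Without DelSp the space that marks the soft break stays in the text, which is the extra space CJK readers see. With DelSp=yes it disappears, but only in clients that implement the parameter.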
Attached patch Possible fix (obsolete) — Splinter Review
While writing the test code, I was also able to write a fix for this issue.

This patch encodes HTML messages with base64.
Assignee: nobody → hiikezoe
Attachment #646075 - Attachment is obsolete: true
Attached patch xpcshell tests (obsolete) — Splinter Review
Attachment #646943 - Attachment is obsolete: true
(In reply to WADA from comment #73)
> text/plain part of attached mail data by me is generated by following.
> (1) Compose a mail in HTML mode.
> (2) Send Later. text/plain part and/or text/plain part is sent by one of
> next;
>     Option/Format : (2-1) Rich Text(HTML) and Plain Text
>                     (2-2) Rich Text(HTML) Only
>                     (2-3) Auto-Detect
> (2-1) Rich Text(HTML) and Plain Text :
> text/plain part and text/html part are embed in mulripart/alternative.
> data of text/plain part is generated from HTML data by Tb.
> (2-2) Rich Text(HTML) Only
> text/html is sent.
> (2-3) Auto-Detect :

attachment 648251 [details] [diff] [review] fixes (2-2) and the HTML part of (2-1). That is, every HTML message is encoded with base64 (without extra spaces). The attachment also fixes (2-3) if the message has an HTML part.
(In reply to Hiroyuki Ikezoe (:hiro) from comment #78)
> try server result:
> https://tbpl.mozilla.org/?tree=Thunderbird-Try&rev=7e69fe1984c9

Does that mean attachment 648251 [details] [diff] [review] fixes the issue on Windows?
Hello. I tried the binary in comment 78 by Hiroyuki Ikezoe on Windows 7 and Ubuntu Linux

thunderbird-17.0a1.en-US.win32.installer.exe
SHA1: 4a4ef9703bf974c7950384dfbe46a0a4ebd6a86e
OS: Windows 7 32-bit, Traditional Chinese (Taiwan) edition

thunderbird-17.0a1.en-US.linux-x86_64.tar.bz2
SHA1: 53a04899d209a74519a029efdfe9260d724f143a
OS: Ubuntu 12.04 64-bit, environment variable LANG=en_US.UTF-8

In my test, the results on Windows and Linux were identical (test_result_on_windows == test_result_on_linux).

1. Install (Windows) or extract (Linux) Thunderbird
2. Run Thunderbird - Windows: double-click the desktop shortcut, Linux: run ./thunderbird in a terminal/console
3. Set up an email account
4. Compose a message with 200,60,60,60,200 traditional Chinese characters on each line, filled with the word '測試' (this word contains 2 Chinese characters), and send it

Result: There is a space after every 36 Chinese characters; every line was affected.
Additional information: I can see Chinese characters in the mail source, no base64-encoded data.

5. In the config editor, set mail.wrap_long_lines = false (default is true)
6. Compose a message with 200,60,60,60,200 traditional Chinese characters on each line

Result: There is a space after every 36 Chinese characters; every line was affected.
Additional information: I can see Chinese characters in the mail source, no base64-encoded data.

7. In the config editor, set mailnews.wraplength = 1000 (default is 72)
8. Compose a message with 200,60,60,60,200 traditional Chinese characters on each line

Result: No extra spaces.
Additional information: I can see Chinese characters in the mail source, no base64-encoded data.

9. Compose a message consisting of one line with 1002 Chinese characters

Result: One space after every 495 Chinese characters.
Additional information: No Chinese characters in the mail source; it is base64-encoded.

10. Compose a message with 200,400,600,800,1000 Chinese characters on each line

Result: The first and second lines are still intact (no extra space), but in the 3rd, 4th and 5th lines there is a space after every 495 Chinese characters.
Additional information: No Chinese characters in the mail source; it is base64-encoded.


Thanks
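The two intervals reported above (36 and 495 characters) are both consistent with each Chinese character being counted as two units against a limit of 72 and 990 respectively. That is only one possible reading of the numbers, not something confirmed from the source code; the factor of two and the helper name below are assumptions made for illustration.

```python
UNITS_PER_CJK_CHAR = 2  # assumption: one CJK character counts as two units


def chars_before_break(limit, units_per_char=UNITS_PER_CJK_CHAR):
    """Number of whole characters that fit before a break at `limit` units."""
    return limit // units_per_char


print(chars_before_break(72))   # 36  (default mailnews.wraplength)
print(chars_before_break(990))  # 495 (line length limit discussed in this bug)
```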
Kao, thanks for the testing.

The binary has an effect only on HTML messages. Are you sure you composed those messages in the HTML message editor?
(In reply to xunxun from comment #79)
> (In reply to Hiroyuki Ikezoe (:hiro) from comment #78)
> > try server result:
> > https://tbpl.mozilla.org/?tree=Thunderbird-Try&rev=7e69fe1984c9
> 
> It means attachment 648251 [details] [diff] [review] fix the issue on
> Windows?

It means that the tests for attachment 645281 [details] [diff] [review] (i.e. attachment 648252 [details] [diff] [review]) passed on all platforms.
Sorry I didn't change the Options/Format from Auto-Detect to 'Rich Text (HTML) Only' when composing mail.
Testing in progress.
Hello. I tried the binary in comment 78 by Hiroyuki Ikezoe on Windows 7 and Ubuntu Linux, and set format to HTML
I did NOT reuse the test environment, I reverted the virtual machine image, so this does not contain changes in config editor in previous tests.

thunderbird-17.0a1.en-US.win32.installer.exe
SHA1: 4a4ef9703bf974c7950384dfbe46a0a4ebd6a86e
OS: Windows 7 32-bit, Traditional Chinese (Taiwan) edition

thunderbird-17.0a1.en-US.linux-x86_64.tar.bz2
SHA1: 53a04899d209a74519a029efdfe9260d724f143a
OS: Ubuntu 12.04 64-bit, environment variable LANG=en_US.UTF-8

In my test, the test result has no difference on Windows and Linux (test_result_on_windows == test_result_on_linux)

1. Install (Windows) or extract (Linux) Thunderbird
2. Run Thunderbird - Windows: double-click the desktop shortcut, Linux: run ./thunderbird in a terminal/console
3. Set up an email account
4. Compose a message, set Options/Format from 'Auto-Detect' to 'Rich Text (HTML) Only'. 200,60,60,60,200 traditional Chinese characters on each line, filled with the word '測試' (this word contains 2 Chinese characters), and send it

Result: No extra spaces
Additional information: In the mail source, there are "Content-Type: text/html; charset=UTF-8" and "Content-Transfer-Encoding: base64"

5. Compose a message, set Options/Format from 'Auto-Detect' to 'Rich Text (HTML) Only'. One line with 1002 Chinese characters

Result: No extra spaces
Additional information: In the mail source, there are "Content-Type: text/html; charset=UTF-8" and "Content-Transfer-Encoding: base64"

6. Compose a message, set Options/Format from 'Auto-Detect' to 'Rich Text (HTML) Only'. 200,400,600,800,1000 Chinese characters on each line

Result: No extra spaces
Additional information: In the mail source, there are "Content-Type: text/html; charset=UTF-8" and "Content-Transfer-Encoding: base64"

7. Generate a string with the following shell script
#!/bin/bash
for (( t1=0;t1<50;t1++ )); do
    echo -n "測試測試測試測試測試測試測試測試測試測試"
    echo -n " " # one space
    echo -n "測試測試測試測試測試測試測試測試測試測試"
    echo -n "  " # two spaces
    echo -n "測試測試測試測試測試測試測試測試測試測試"
    echo -n "          " # 10 spaces
    echo -n "測試測試測試測試測試測試測試測試測試測試"
    echo -n "                    " # 20 spaces
    echo -n "測試測試測試測試測試測試測試測試測試測試"
done
echo ""

8. Copy the string to Windows and Linux machines
9. Compose a message, set Options/Format from 'Auto-Detect' to 'Rich Text (HTML) Only'. Paste the generated string

Result: No extra spaces, and no missing spaces.
Additional information: In the mail source, there are "Content-Type: text/html; charset=UTF-8" and "Content-Transfer-Encoding: base64"

Thanks.
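For machines without bash (step 8 copies the string to Windows), the same test string can be produced with a short Python sketch. It mirrors the shell script above and adds a helper for comparing runs of spaces between the sent and the received text. The function names are made up for illustration.

```python
import re

WORD = "測試" * 10                       # 20 CJK characters, as in the shell script
GAPS = [" ", " " * 2, " " * 10, " " * 20]


def build_test_string(repeats=50):
    # One iteration: WORD, 1 space, WORD, 2 spaces, WORD, 10 spaces,
    # WORD, 20 spaces, WORD (no separator between iterations).
    unit = "".join(WORD + gap for gap in GAPS) + WORD
    return unit * repeats


def space_runs(text):
    """Lengths of every run of spaces, in order of appearance."""
    return [len(run) for run in re.findall(r" +", text)]


original = build_test_string()
print(len(original))             # 6650
print(space_runs(original)[:4])  # [1, 2, 10, 20]
```

Comparing `space_runs(original)` with `space_runs(received_text)` shows both extra and missing spaces at a glance.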
Kao, thanks for the report. That is the behavior I expected.

Can you also check the behavior of the 'Plain and Rich Text' option? If it works correctly, extra spaces will be inserted in the plain text part and not in the HTML part.
QUOTE:Can you also check the behavior 'Plain and Rich Text' option? If it works correctly, extra spaces will be inserted in plain text part and not in HTML part.

Hi, I can confirm this. When I send such an email to myself, "view->message body as->  original/simple html" shows the message correctly, but when I view the message body as plain text, some extra spaces are shown.

I see the message source like below:
This is a multi-part message in MIME format.
--------------090004030605080301040504
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: base64
........
........


--------------090004030605080301040504
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: base64
.......
.......


I'm testing under Windows XP.

Thanks.

asmwarrior
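A message source like the one above can also be checked mechanically instead of by eye. The following is a minimal sketch using Python's standard `email` package: it builds a sample multipart/alternative message with both parts in base64 (a stand-in for a saved message, not Thunderbird output), parses it back, and reports for each part the transfer encoding and whether the decoded text contains a space.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

body = "測試" * 100  # 200 CJK characters, no spaces

# Build a sample multipart/alternative message with both parts in base64.
msg = EmailMessage()
msg["Subject"] = "wrap test"
msg.set_content(body + "\n", subtype="plain", charset="utf-8", cte="base64")
msg.add_alternative(
    "<html><body>" + body + "</body></html>\n",
    subtype="html", charset="utf-8", cte="base64",
)

# Parse it back, as one would parse a saved .eml file, and inspect each part.
parsed = message_from_bytes(msg.as_bytes(), policy=policy.default)
results = []
for part in parsed.iter_parts():
    text = part.get_content()  # decoded to str
    results.append((part.get_content_type(),
                    str(part["Content-Transfer-Encoding"]),
                    " " in text))

for row in results:
    print(row)
```

To check a real message, replace `msg.as_bytes()` with the bytes read from the saved file.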
Thanks for the additional test.

(In reply to asmwarrior from comment #86)
> QUOTE:Can you also check the behavior 'Plain and Rich Text' option? If it
> works correctly, extra spaces will be inserted in plain text part and not in
> HTML part.
> 
> Hi, I can confirm this, when I send such email to myself, I see that
> "view->message body as->  original/simple html" show the message correctly,
> but when view the message body as plain text, there are some extra spaces
> added shown.
> 
> I see the message source like below:
> This is a multi-part message in MIME format.
> --------------090004030605080301040504
> Content-Type: text/plain; charset=UTF-8; format=flowed
> Content-Transfer-Encoding: base64

Unfortunately that is not what I expected. The Content-Transfer-Encoding should be 8bit in this case.
I will investigate it.
Quick check results with the try server build, using HTML composition with Options/Format=HTML & Text, Options/Character Encoding=iso-2022-jp/utf-8/iso-8859-1(default), and a line of 4000*(a Japanese character).

(1) text/plain part. Checked with mail.wrap_long_lines=true
(1-1) iso-2022-jp, mailnews.wraplength=72
  Because of iso-2022-jp, format=flowed is prohibited internally,
  so format=flowed was not used.
  Sent in Content-Transfer-Encoding: 7bit
  With mailnews.wraplength=72, wrapped at 72 characters (not at 72 bytes).
  It looks like "wrap at wraplength chars" instead of "wrap at wraplength bytes".
  Excess space was not observed.
(1-2) charset=utf-8/iso-2022-jp, mailnews.wraplength=0(==no limit)
  utf-8 : format=flowed, iso-2022-jp : no format=flowed
  Content-Transfer-Encoding: Base64
  With mailnews.wraplength=0 and the SMTP limit, "wrap around 990 bytes" was observed.
  For many {3 bytes utf-8 code for a Japanese char}, the following was seen.
    N * {3 bytes utf-8 code for a Japanese char} + 0x20 + [CRLF]
  This excess space was not observed in the iso-2022-jp case.
  It looks like "wrap at character boundary" instead of a simple "wrap at 990 bytes".
  A wrap in the middle of 3-byte data (utf-8, escape sequence of iso-2022-jp) was not observed.
  So, corruption of text data in text/plain was not observed.
(1-3) charset=iso-8859-1(default)
  Subject: = ascii only, body = Japanese chars only.
  Even though Japanese char is pasted and used, sent in iso-8859-1.
  Content-Type: text/plain; charset=ISO-8859-1; format=flowed
  Content-Transfer-Encoding: quoted-printable
  All text was ascii "?".
  Automatic UTF-8 use is killed?
  Affected by System charset? (as Japanese Win-XP, it's Shift_JIS)

By the way, mailnews.display.show_all_body_parts_menu=true & View/Message Body As/All Body Parts is useful when testing this bug. You can save the decoded image of the "message body of text mail" or "each sub part under multipart/alternative", because "All Body Parts" shows the message body or sub part as if it were an attachment.
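The "wrap at character boundary" behavior described in (1-2) can be illustrated with a small sketch. It splits text so that each chunk stays within a byte limit in UTF-8 without cutting a multi-byte character. This is an illustration of the idea only, not the mailnews implementation; the function name and the 990 default are taken from the discussion here, not from the source.

```python
def wrap_utf8(text, max_bytes=990):
    """Split text into chunks whose UTF-8 encoding is at most max_bytes,
    never cutting inside a multi-byte character."""
    chunks, current, size = [], [], 0
    for ch in text:
        n = len(ch.encode("utf-8"))
        if size + n > max_bytes and current:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(ch)
        size += n
    if current:
        chunks.append("".join(current))
    return chunks


chunks = wrap_utf8("あ" * 4000)
print(len(chunks), len(chunks[0]))  # 13 330
```

Note that the sketch inserts nothing at the break. The excess 0x20 observed above is a separate matter from where the break falls.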
(In reply to WADA from comment #88)
> (1-3) charset=iso-8859-1(default)
>   Subject: = ascii only, body = Japanese chars only.
>   Even though Japanese char is pasted and used, sent in iso-8859-1.
>   Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>   Content-Transfer-Encoding: quoted-printable
>   All text was ascii "?".
>   Automatic UTF-8 use is killed?
>   Affected by System charset? (as Japanese Win-XP, it's Shift_JIS)

That is because mailnews.send_default_charset is ISO-8859-1. The value is usually set in localized Thunderbird builds, and that try server binary is not localized.
(In reply to Hiroyuki Ikezoe (:hiro) from comment #89)
> (In reply to WADA from comment #88)
> > (1-3) charset=iso-8859-1(default)
> >   Subject: = ascii only, body = Japanese chars only.
> >   Even though Japanese char is pasted and used, sent in iso-8859-1.
> >   Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> >   Content-Transfer-Encoding: quoted-printable
> >   All text was ascii "?".
> >   Automatic UTF-8 use is killed?
> >   Affected by System charset? (as Japanese Win-XP, it's Shift_JIS)
> Because mailnews.send_default_charset is ISO-8859-1. The value is usually
> set in localized Thunderbird. That built binary is not localized.

The text/html part was also sent in ISO-8859-1, but as it's HTML, character entities were used, and no problem occurred in the text/html part (&#12354; == あ).
> <html><head><meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type"></head><body text="#000000" bgcolor="#FFFFFF">&#12354;&#12354;...
If non-ascii is used in Subject, Tb silently sends a utf-8 encoded Subject: and text/html & text/plain with charset=utf-8, even when mailnews.send_default_charset is ISO-8859-1 (silently==without asking for utf-8 use).

Whose problem?
- Conversion of character entity in HTML to Text upon composition.
- Automatic change to utf-8 when non ascii character is used in mail.
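The difference between the two parts can be reproduced outside Thunderbird. A character that ISO-8859-1 cannot represent either degrades to "?" or survives as a numeric character reference, depending on the fallback chosen. The sketch below uses Python's codec error handlers as a stand-in for the two behaviors; it does not show how Thunderbird performs the conversion.

```python
text = "あ"

# Plain text has no escape mechanism, so the character is lost.
as_plain = text.encode("iso-8859-1", errors="replace")

# HTML can fall back to a numeric character reference.
as_html = text.encode("iso-8859-1", errors="xmlcharrefreplace")

print(as_plain)  # b'?'
print(as_html)   # b'&#12354;'
```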
(In reply to WADA from comment #90)
> If non-ascii is used in Subject, Tb silently sent utf-8 encoded Subject: and
> text/html & text/plain with charset=utf-8, even when
> mailnews.send_default_charset is ISO-8859-1 (silently==without asking for
> utf-8 use).
> 
> Whose problem?
> - Conversion of character entity in HTML to Text upon composition.
> - Automatic change to utf-8 when non ascii character is used in mail.

I don't know exactly, but it's not related to attachment 648251 [details] [diff] [review].
Problem in text/html part.

When the following was entered in the HTML mail composition window,
  Note: [Enter] in this context == press Enter key to force line break 
        ... == consecutive same character
> ああ...ああ[Enter][Enter]
> いい...いい[Enter][Enter]
> うう...うう[Enter][Enter]
> -- (and signature text from predefined signature file follows)
the generated HTML was as follows (no New Line until the signature indicator "-- "),
  Note: [LF] = 0x0A, [EOF] = End of file
> <html><head><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"></head><body bgcolor="#FFFFFF" text="#000000">ああ...ああ<br><br>いい...いい<br><br>うう...うう<br><br><pre class="moz-signature" cols="800">-- [LF]
> xxxx xxxx xxxx</pre></body></html>[EOF]
and, because of long HTML source line, it was sent in base64.
> Content-Type: text/html; charset=UTF-8
> Content-Transfer-Encoding: base64

Is "no line break or a few [LF] in HTML source" intentional or by design?
Because it is HTML, I think it is better to insert "[LF]([CRLF] is needed?) before HTML tag start" or "[LF]([CRLF] is needed?) after HTML tag end" in the text/html part, except "after <pre>" and "before </pre>".
Is it possible and easy?
(In reply to WADA from comment #92)
> Problem in text/html part.
> 
> When following was entered at HTML mail composition window,
>   Note: [Enter] in this context == press Enter key to force line break 
>         ... == consecutive same character
> > ああ...ああ[Enter][Enter]
> > いい...いい[Enter][Enter]
> > うう...うう[Enter][Enter]
> > -- (and signature text from predefined signature file follows)
> generated HTML was following(no New Line until signature indicator of "-- "),
>   Note: [LF] = 0x0A, [EOF] = End of file
> > <html><head><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"></head><body bgcolor="#FFFFFF" text="#000000">ああ...ああ<br><br>いい...いい<br><br>うう...うう<br><br><pre class="moz-signature" cols="800">-- [LF]
> > xxxx xxxx xxxx</pre></body></html>[EOF]
> and, because of long HTML source line, it was sent in base64.
> > Content-Type: text/html; charset=UTF-8
> > Content-Transfer-Encoding: base64
> 
> Is "no line break or a few [LF] in HTML source" intentional or design?

'no line break' is intentional, but LF in signature is not intentional.

> Because HTML, I think "[LF]([CRLF] is needed?) before HTML tag start" or
> "[LF]([CRLF] is needed?) after HTML tag end" is better inserted in text/html
> part, except "after <pre>" and "before </pre>".
> Is it possible and easy?

It's possible but not so easy.
If we take the approach of base64-encoding the HTML message, the HTML message will have no line feeds.
I suppose you will have to wait for bug 26734 if you need line feeds in the HTML message.
(In reply to Hiroyuki Ikezoe (:hiro) from comment #91)
> (In reply to WADA from comment #90)
> > If non-ascii is used in Subject, Tb silently sent utf-8 encoded Subject: and
> > text/html & text/plain with charset=utf-8, even when
> > mailnews.send_default_charset is ISO-8859-1 (silently==without asking for
> > utf-8 use).
> > Whose problem?
> > - Conversion of character entity in HTML to Text upon composition.
> > - Automatic change to utf-8 when non ascii character is used in mail.
> Though I do not know exactly, it's not related to attachment 648251 [details] [diff] [review]

(1) This problem was observed in Tb 14.0 & Tb trunk 2012/7/18 build with Options/Format/HTML and Text and mailnews.wraplength=0.
=> Existing regression.
(2) The problem of "?" in text/plain was not observed with Options/Format/Auto-Detect and recipient's preference=Plain Text (Auto-Detect downgrades to text/plain).
=> text/plain (mail) by Auto-Detect was different from text/plain (subpart of multipart/alternative) by Options/Format/HTML and Text. It's perhaps similar to text mode composition.

Sorry for my confusion.
(In reply to WADA from comment #88)

> (1-3) charset=iso-8859-1(default)
>   Subject: = ascii only, body = Japanese chars only.
>   Even though Japanese char is pasted and used, sent in iso-8859-1.
>   Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>   Content-Transfer-Encoding: quoted-printable
>   All text was ascii "?".

This issue cannot be reproduced on my local Linux machine.

Please attach the problematic message body text here.
(In reply to Hiroyuki Ikezoe (:hiro) from comment #93)
> (In reply to WADA from comment #92)
> > Problem in text/html part.
> > 
> > When following was entered at HTML mail composition window,
> >   Note: [Enter] in this context == press Enter key to force line break 
> >         ... == consecutive same character
> > > ああ...ああ[Enter][Enter]
> > > いい...いい[Enter][Enter]
> > > うう...うう[Enter][Enter]
> > > -- (and signature text from predefined signature file follows)
> > generated HTML was following(no New Line until signature indicator of "-- "),
> >   Note: [LF] = 0x0A, [EOF] = End of file
> > > <html><head><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"></head><body bgcolor="#FFFFFF" text="#000000">ああ...ああ<br><br>いい...いい<br><br>うう...うう<br><br><pre class="moz-signature" cols="800">-- [LF]
> > > xxxx xxxx xxxx</pre></body></html>[EOF]
> > and, because of long HTML source line, it was sent in base64.
> > > Content-Type: text/html; charset=UTF-8
> > > Content-Transfer-Encoding: base64
> > 
> > Is "no line break or a few [LF] in HTML source" intentional or design?
> 
> 'no line break' is intentional, but LF in signature is not intentional.

I was wrong. The LF is intentional (though not my doing) because the signature is enclosed in 'pre', so an LF is needed after '--'.

Anyway, I'll consider the LF in the signature after this bug is closed.
(In reply to Hiroyuki Ikezoe (:hiro) from comment #95)
> (In reply to WADA from comment #88)
> > (1-3) charset=iso-8859-1(default)
> >   Subject: = ascii only, body = Japanese chars only.
> >   Even though Japanese char is pasted and used, sent in iso-8859-1.
> >   Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> >   Content-Transfer-Encoding: quoted-printable
> >   All text was ascii "?".
> This issue can not be reproduced on my local linux.
> Please attach the problematic message body text here.

Conditions are;
  mail.wrap_long_lines=true
  mailnews.wraplength=0
  Composing charset=iso-8859-1(default, mailnews.send_default_charset=ISO-8859-1)
  Options/Format = HTML and Text
  Subject: ascii-subject
  message body text = 2000 * あ (Shift_JIS=0x82A0, utf-8=0xE3 0x81 0x82, U+3042)
  Japanese MS Windows(system charset=Shift_JIS). This may be relevant.
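The byte values quoted in the conditions can be double-checked with Python's built-in codecs. The sizes at the end show that a 2000-character line exceeds the 990-byte line limit discussed in this bug in either encoding.

```python
ch = "あ"

print(f"U+{ord(ch):04X}")            # U+3042
print(ch.encode("shift_jis").hex())  # 82a0
print(ch.encode("utf-8").hex(" "))   # e3 81 82

line = ch * 2000
print(len(line.encode("shift_jis")))  # 4000
print(len(line.encode("utf-8")))      # 6000
```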
(In reply to WADA from comment #97)
> (In reply to Hiroyuki Ikezoe (:hiro) from comment #95)
> > (In reply to WADA from comment #88)
> > > (1-3) charset=iso-8859-1(default)
> > >   Subject: = ascii only, body = Japanese chars only.
> > >   Even though Japanese char is pasted and used, sent in iso-8859-1.
> > >   Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> > >   Content-Transfer-Encoding: quoted-printable
> > >   All text was ascii "?".
> > This issue can not be reproduced on my local linux.
> > Please attach the problematic message body text here.
> 
> Conditions are;
>   mail.wrap_long_lines=true
>   mailnews.wraplength=0
>   Composing charset=iso-8859-1(default,
> mailnews.send_default_charset=ISO-8859-1)
>   Options/Format = HTML and Text
>   Subject: ascii-subject
>   message body text = 2000 * あ (Shift_JIS=0x82A0, utf-8=0xE3 0x81 0x82,
> U+3042)
>   Japanese MS Windows(system charset=Shift_JIS). This may be relevant.

Thanks, I can see the issue on my local machine, but the issue can also be seen without attachment 648251 [details] [diff] [review].
Additional quick check results.
(A) Behavior on the text/plain part of multipart/alternative (HTML mode composition, Options/Format=HTML and Text) depended on the Japanese character type.
  wraplength=160, text/plain part, utf-8
  4000 * あ : Sent in Content-Transfer-Encoding: 8bit (non base64, i.e. wrapped)
  4000 * 1 : Sent in Content-Transfer-Encoding: base64.
This is probably due to the different categories of the characters in Unicode.
  あ : Hiragana
  1 : Full-width roman characters and half-width katakana
For full-width roman characters, the treatment looks similar to that of English characters. It's perhaps because a wrap in the middle of 1234567890 is better avoided.
(B) With text mode composition, Bug 355209 was still observed.
    - wrap in the middle of a 3-byte utf-8 code
    - wrap without care for escape sequences of iso-2022-jp
This kind of corruption is not observed in the text/plain part produced by HTML mode composition. The difference may be that;
  - wrap in text mode composition is wrap at wraplength bytes
  - wrap in text/plain part is wrap at wraplength unicode characters
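The character-category difference suggested in (A) can be inspected with Python's `unicodedata` module. The sketch assumes the second test character was U+FF11 FULLWIDTH DIGIT ONE, which is a guess based on the "Full-width roman characters" description. Whether the line breaker actually keys on these properties is not confirmed here.

```python
import unicodedata

# "１" (U+FF11) is an assumed stand-in for the full-width test character.
for ch in ("あ", "１"):
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),          # general category
        unicodedata.east_asian_width(ch),  # W = wide, F = fullwidth
    )
```

The hiragana letter is a letter (Lo) with width W, while the full-width digit is a decimal number (Nd) with width F, so a breaker that treats digits like Latin text would handle the two lines differently.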
Has attachment 648251 [details] [diff] [review] completed the prerequisite tests?

Do we still need the wraplength option?
FYI, the wraplength option is meaningless in HTML composition mode with attachment 648251 [details] [diff] [review] because the attachment always encodes the HTML message with base64.
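Why base64 makes the wrap length irrelevant can be seen in a few lines: the encoded body is always split into short ASCII lines, however long the source line is, and decoding restores the original text exactly. The sketch uses Python's `base64` module, not the encoder in the patch.

```python
import base64

html = "<html><body>" + "測試" * 2000 + "</body></html>"  # one very long line

encoded = base64.encodebytes(html.encode("utf-8"))
longest = max(len(line) for line in encoded.splitlines())
print(longest)  # 76

decoded = base64.decodebytes(encoded).decode("utf-8")
print(decoded == html)  # True
```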
(In reply to WADA from comment #99)
> Additional quick check results.
> (A) Behavior on text/plain part of multipart/alternative(HTML mode
> composition, Options/Format=HTML and Text) depended on Japanese character
> type.
>   wraplength=160, text/plain part, utf-8
>   4000 * あ : Sent in Content-Transfer-Encodin: 8bits(non base64, i.e.
> wrapped)
>   4000 * 1 : Sent in Content-Transfer-Encodin: base64.

Thanks. I've finally reproduced this issue on my local machine, but I can also see it without attachment 648251 [details] [diff] [review].

> (B) If text mode composition, Bug 355209 was still observed.
>     - wrap at mid of 3 bytes utf-8 code
>     - wrap without care for escape sequence of iso-2022-jp
> This kind of corrupton is not observed in text/plain part by HTML mode
> compoition. It may be difference between;
>   - wrap in text mode composition is wrap at wraplength bytes
>   - wrap in text/plain part is wrap at wraplength unicode characters

The wrap length issue on text mode is for bug 26734.
(In reply to xunxun from comment #100)
> Has the attachment 648251 [details] [diff] [review] completed the
> prerequisite test?

Yes. I now believe attachment 648251 [details] [diff] [review] causes no regressions.
Attachment #648251 - Flags: review?(mozilla)
Comment on attachment 648251 [details] [diff] [review]
Possible fix

Stealing review; hopefully I'll get to it by this weekend.
Attachment #648251 - Flags: review?(mozilla) → review?(squibblyflabbetydoo)
Comment on attachment 648251 [details] [diff] [review]
Possible fix

Review of attachment 648251 [details] [diff] [review]:
-----------------------------------------------------------------

Just a quick pass before I review this properly...

::: mailnews/compose/src/nsMsgSend.cpp
@@ -1751,5 @@
> -  //
> -  // XXX TODO
> -  // march backwards and determine the "best" place for the linebreak
> -  // for example, we don't want <a hrLINEBREAKref=""> or <bLINEBREAKr>
> -  // or "MississLINEBREAKippi"

Why did you remove these comments? Do they no longer apply?

@@ -1791,4 @@
>    }
> -  else {
> -     // body did not require any additional linebreaks, so just use it
> -     // body will not have any null bytes, so we can use PL_strdup

As above, I think we should keep this comment (with appropriate modifications).
With the above changes fixed, and a try server run with the attached tests passing, I think this looks ok.
This is a serious problem for CJK users. 
Please kindly patch this to Thunderbird as soon as possible.

Thanks!
Comment on attachment 648251 [details] [diff] [review]
Possible fix

Clearing out review on this until I see a passing try server run (mostly so Bugzilla stops mailing me).
Attachment #648251 - Flags: review?(squibblyflabbetydoo)
I'm confused about the status of this, and what is needed to move it forward. Are we waiting for someone to push a try server run of https://bugzilla.mozilla.org/attachment.cgi?id=648251?

The existing patch has bit-rotted.
Firefox has the same problem, see bug 535485.
I may be looking at this differently, but I've been using Thunderbird with mailnews.wraplength set to 0 for a year now, with no problems. What about simply setting the default value of mailnews.wraplength to 0 for CJK locales? (and any other language that doesn't use spaces)

I sometimes send email in French or English with no problems either under that setting. What is the use of wrapping text anyway?
I just changed this attribute but the wrapping spaces are still being generated.
BTW, why is it so hard to pass the regression tests and apply this patch?
It's a notorious bug known to CJK users across versions of Thunderbird.
Please kindly apply it.
I still don't see it in the 20.0 nightly build.

Thanks! 

(In reply to Yukinoroh from comment #112)
> I might have a different thinking here, but I've been using Thunderbird with
> mailnews.wraplength set to 0 for a year now, with no problems. What about
> simply setting the default value of mailnews.wraplength to 0 for CJK
> locales? (and any other language that don't use spaces)
> 
> I sometimes send email in French or English with no problems either under
> that setting. What is the use of wrapping text anyway?
(In reply to huangjs from comment #113)
> I just changed this attribute but the wrapping spaces are still being
> generated. 

Really? What version are you using? Did you try to restart the program? Using 16.0.0 here.
(In reply to Yukinoroh from comment #114)

Oops, I meant 16.0.1
When will this be patched? The bug is so annoying for every CJK user.
See Also: → 156369
Hiro - why haven't you asked for review on these patches?
Blocks: 355209
I have the same issue, really annoying......
Ping, this bug still exists in TB 17.06. Can any developer review the patches and apply them to trunk? Thanks.
(In reply to asmwarrior from comment #120)
> Ping, this bug still exists in TB 17.06, any developers can review the
> patches, and apply to trunk? Thanks.

I've reviewed the patches and asked for some changes (and for a try server run so I can see the tests - though I could do this if need be). However, the author of the patch hasn't replied.
Hi, Jim Porter, thanks for the reply. It looks like the author of the patch (Hiroyuki Ikezoe) was last active nearly a year ago, on 2012-08-26. I suspect he has left his original company, changed his email address, or for some other reason does not receive bug notifications. So, do we just wait for his response? I would suggest that anyone who has the ability (surely you are one of them) go ahead. I can test the nightly build once such a patch (or a modified/improved patch) is in trunk. Thanks.
FYI.

Even if this bug occurs in HTML mail composition, the text/html part is sent by Tb as long as the problem of "HTML mail is sent in text/plain" due to the following bugs doesn't occur:
  bug 136502, bug 414299, bug 584363.
If mail is sent as text/html or multipart/alternative{text/plain+text/html}, the text/html part is usually used when viewing the mail. And, in Tb, a quirk like "new line between CJK chars in HTML mail == null" perhaps works in HTML mail display.

i.e.
This bug is exposed to many users only when the problem of "HTML mail is sent in text/plain" occurs at the same time.
How to avoid bug 136502, bug 414299, bug 584363:
 - <b></b> in HTML signature file.
 - In the address book, set format preference=HTML for every contact,
   except contacts to whom TEXT mail should always be sent.
 - Upon each HTML mail send, select format option=HTML or "HTML + Text"
I am sorry for the absence.

(In reply to Jim Porter (:squib) (back Jul 1) from comment #105)
> Comment on attachment 648251 [details] [diff] [review]
> Possible fix
> 
> Review of attachment 648251 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Just a quick pass before I review this properly...
> 
> ::: mailnews/compose/src/nsMsgSend.cpp
> @@ -1751,5 @@
> > -  //
> > -  // XXX TODO
> > -  // march backwards and determine the "best" place for the linebreak
> > -  // for example, we don't want <a hrLINEBREAKref=""> or <bLINEBREAKr>
> > -  // or "MississLINEBREAKippi"
> 
> Why did you remove these comments? Do they no longer apply?

Those HTML cases will never happen because EnsureLineBreaks will never be invoked for HTML messages.
The 'Mississippi' case is supposed to be handled already in nsIEditor, but I am not 100% sure. So I will leave the 'Mississippi' case in.


> @@ -1791,4 @@
> >    }
> > -  else {
> > -     // body did not require any additional linebreaks, so just use it
> > -     // body will not have any null bytes, so we can use PL_strdup
> 
> As above, I think we should keep this comment (with appropriate
> modifications).

I don't think the comment is useful in the patch. In the patch, PL_strdup is used in both cases.
Attachment #648251 - Attachment is obsolete: true
Attachment #766580 - Flags: review?(squibblyflabbetydoo)
This is basically same as the previous test except the argument of createAndSendMessage.
Attachment #648252 - Attachment is obsolete: true
Attachment #766581 - Flags: review?(squibblyflabbetydoo)
Hi, thanks for the patches. Is a test build necessary? Or shall we wait for a reviewer?
Comment on attachment 766580 [details] [diff] [review]
Encode HTML message with base64 to  avoid extra spaces in CJK text

Review of attachment 766580 [details] [diff] [review]:
-----------------------------------------------------------------

> +nsMsgAttachmentHandler::NeedsConvertionToPlainText()
I think NeedsConversionToPlainText() or Needs(To)ConvertToPlainText() is better.
FYI.
Bug 355209 occurs even with HTML composition, even in the text/html part, if <pre> is used and "lo---ng text without line break" is typed or pasted.
Because of <pre>, "Wrap at 80 unicode characters by HTML editor" doesn't occur. So, a split by the SMTP line length limit (==split by LINE_BREAK_MAX) occurs even in the text/html part of multipart/alternative or text/html mail.
If <pre> is used, it's the same as "mailnews.wraplength=0 in Text mode composition".
FYI.
LINE_BREAK_MAX was changed from "#define LINE_BREAK_MAX 990" to the following by bug 684508 (landed in Tb 10).
  #define LINE_BREAK_MAX (1000 - MSG_LINEBREAK_LEN)
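
To give a rough idea of what that limit means for CJK text, here is a small sketch. It assumes MSG_LINEBREAK_LEN is 2 (a CRLF line ending; the real value depends on the platform), and the helper name chars_per_line is made up for this illustration only.

```python
# Sketch only: MSG_LINEBREAK_LEN = 2 is an assumption (CRLF line endings).
MSG_LINEBREAK_LEN = 2
LINE_BREAK_MAX = 1000 - MSG_LINEBREAK_LEN  # mirrors the #define above


def chars_per_line(sample_char: str, limit: int = LINE_BREAK_MAX) -> int:
    """How many copies of sample_char fit in one line of `limit` UTF-8 bytes."""
    return limit // len(sample_char.encode("utf-8"))


print(LINE_BREAK_MAX)        # byte budget for one line, excluding the line break
print(chars_per_line("a"))   # 1-byte characters
print(chars_per_line("こ"))  # 3-byte characters
```

Under these assumptions a line of 3-byte characters hits the limit after roughly a third as many characters as a line of ASCII.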
Putting the referenced bugs in the "See Also:" field.
See Also: → 26734, 253830, 262475, 355209, 553526
See Also: 26734
See Also: 355209
See Also: → 584363
(In reply to asmwarrior from comment #127)
> Hi, thanks for the patches, so any testing build is necessary? Or shall we
> waiting for a reviewer?

I used the patch to build Thunderbird 24.0:

https://sourceforge.net/projects/pcxfirefox/files/Release/Thunderbird/24.x/x86/
(In reply to xunxun from comment #132)
> (In reply to asmwarrior from comment #127)
> > Hi, thanks for the patches, so any testing build is necessary? Or shall we
> > waiting for a reviewer?
> 
> I use the patch to build thunderbird 24.0
> 
> https://sourceforge.net/projects/pcxfirefox/files/Release/Thunderbird/24.x/
> x86/

At present I can't access SourceForge, so I uploaded a backup to http://pan.baidu.com/share/link?shareid=2253681088&uk=2365780601#dir/path=%2F%E6%88%91%E7%9A%84%E8%BD%AF%E4%BB%B6%2FpcxFirefox%E5%A4%87%E4%BB%BD
FYI: when you open the above link, xunxun's release is under the subfolder Release/thunderbird/24.x. This release fixes a tiny bug in the previous one at https://sourceforge.net/projects/pcxfirefox/files/Release/Thunderbird/24.x/x86/. Once SourceForge can be accessed from mainland China, the binary on SF will be updated.
Comment on attachment 766580 [details] [diff] [review]
Encode HTML message with base64 to  avoid extra spaces in CJK text

Review of attachment 766580 [details] [diff] [review]:
-----------------------------------------------------------------

I've taken a look at this, and I'm not sure this is actually how we want to do things. Surely there's a way to have really long lines with no spaces without resorting to base64-encoding everything. I'm not sure what that way is though, since I'm not very knowledgeable about MIME message bodies (most of my knowledge is in MIME headers).

I'm clearing out review for now, but I definitely agree that we need to do *something* here. Jcranmer might be a good person to look at this, since he knows a lot more about MIME than I do.

Sorry about taking so long on this! I've been awfully busy, and didn't really know what to do with this review. :(
Attachment #766580 - Flags: review?(squibblyflabbetydoo)
Attachment #766581 - Flags: review?(squibblyflabbetydoo)
joshua, comment 135
Flags: needinfo?(Pidgeot18)
(In reply to Jim Porter (:squib) from comment #135)
> I've taken a look at this, and I'm not sure this is actually how we want to
> do things. Surely there's a way to have really long lines with no spaces
> without resorting to base64-encoding everything. I'm not sure what that way
> is though, since I'm not very knowledgeable about MIME message bodies (most
> of my knowledge is in MIME headers).

Hahahahahahahahaha. The options for message bodies are 8bit (no NUL, no bare CR/LF, lines of at most 998 octets), 7bit (the above plus no characters above 0x7F), QP, base64, or binary. And the mail system doesn't support binary.

The options are:
1. Violate the standards, send 8bit with arbitrarily long lines, and hope the mail system tolerates it.
2. Violate the standards, send quoted-printable, but don't escape characters above 0x7F, and hope the mail system tolerates it.
3. Send QP/base64, according to the shorter encoding format.

[For a standard developed in part to allow internationalization of email, MIME sure blew it]
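
A minimal sketch of option 3, assuming "shorter" simply means the smaller encoded size. It uses Python's standard quopri and base64 modules; the function name shorter_encoding is hypothetical and this is not how the compose code decides today.

```python
import base64
import quopri


def shorter_encoding(text: str, charset: str = "utf-8") -> str:
    """Return whichever of QP or base64 yields the shorter encoded body."""
    raw = text.encode(charset)
    qp_len = len(quopri.encodestring(raw))
    b64_len = len(base64.encodebytes(raw))
    return "quoted-printable" if qp_len <= b64_len else "base64"


# Mostly ASCII: QP only has to escape the single accented character.
print(shorter_encoding("Mostly ASCII text with one accent: caf\u00e9"))
# All CJK: QP turns every byte into three characters, so base64 wins.
print(shorter_encoding("こ" * 100))
```

Both encoders keep their output lines short, so the 998-octet limit is respected without inserting any break into the decoded text.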
Flags: needinfo?(Pidgeot18)
(I suspect this is not a duplicate, since no one has suggested what the duplicate might be)

So who (or what group) decides what our best shot is, relative to comment 137?

And will we depend on bug 169395?
Flags: needinfo?(Pidgeot18)
Whiteboard: dupeme
(In reply to Wayne Mery (:wsmwk) from comment #138)
> (I suspect this is not a duplicate, since no one has suggested what the
> duplicate might be)
> 
> So who (or what group) to decide what our best shot is, relative to comment
> 137?
> 
> And will we depend on bug 169395?

I suppose, in lieu of anyone else, that the decision would fall on me. My recollection of reading too many raw MIME messages is that the de facto answer is to send in base64.

Honestly, though, the current compose logic for this sort of stuff is so messed up and fragile that it's not worth even attempting this bug until I get the low-level MIME assembly sanified, since that will at least guarantee we can muck around with transfer encodings sanely.
Flags: needinfo?(Pidgeot18)
I can't believe that such a serious bug has not been solved for years!

I happened to check an email I sent today and found the problem. I can't imagine how many mails with a RIDICULOUS format like this I have sent!

I do think that I need a new mail client now.
Yes, it is rather annoying. I love Thunderbird and am using it on a daily basis, including loads of emails in Japanese in a professional environment. Fortunately I don't have to look at the emails AFTER I send them ...

I'm sure there are some Japanese users of Thunderbird; can I ask: How do you cope with this issue? Does this problem not occur when you work under a Japanese language environment?
(In reply to homoludens1000@gmail.com from comment #141)
> Yes, it is rather annoying. I love Thunderbird and am using it on a daily
> basis, including loads of emails in Japanese in a professional environment.
> Fortunately I don't have to look at the emails AFTER I send them ...
> 
> I'm sure there are some Japanese users of Thunderbird; can I ask: How do you
> cope with this issue? Does this problem not occur when you work under a
> Japanese language environment?

I couldn't find a way to solve the problem. I've tried to edit the wrapping settings in about:config but it didn't help. So it seems that the only way is to change your email client.

The problem does occur under a full Japanese environment - Japanese Windows, Japanese system encoding, Japanese Thunderbird. But I'm sending emails with UTF-8 because I need Japanese, Chinese and English support.
I'm in a bit of a hurry so don't have time to check the thread to see whether this has been tried already, but a Japanese website I found had the following "solution" (worked apparently, although it's more of a workaround than a solution I think; see here http://soudan1.biglobe.ne.jp/qa7177921.html):

1. Open about:config
2. Create "editor.htmlWrapColumn" and set value to "0" (zero)
3. Create a new email, and set text format to "preformat" and "fixed width".

Let me know if this works, I'll give it a try later tonight.
htmlWrapColumn is not in Thunderbird anymore.

As per comment #139, I will work on this after all of Joshua's mime works have finished.
I don't have enough in-depth knowledge of the technological fundamentals of this issue, but would it be a problem to create htmlWrapColumn? I just did and followed the instructions of comment #143, and it seems to be working fine (?)

I assume there is a fair number of Japanese, and possibly Chinese and Korean users of Thunderbird, how is everyone dealing with this issue?
The reason why this bug occurs is:
(A)  The HTML editor of Tb generates HTML source like the following
       when a pretty long CJK text is typed/pasted without a space/newline at an appropriate position as "text in E-mail".
       Note: "Wrap at 72 unicode chars in HTML mode composition" is currently hard-coded.
                   (As Ikezoe-san says, htmlWrapColumn has already been removed.)
                  "Wrap at 72 unicode chars in HTML mode composition" is done on the HTML SOURCE.
                  mailnews.wraplength=nnn is "wrap at nnn BYTES", and is used in Text mode composition only.
                  "Deliver format=Plain Text in HTML mode composition" !== Text mode composition.
                  "Deliver format=Plain Text in HTML mode composition" == Send "text converted from generated HTML" as text/plain.
> <some 0x20 for indention of HTML source><72 unicode chars #1 in lo---ng text without space>[CRLF] <= inserted by Tb
> <some 0x20 for indention of HTML source><72 unicode chars #2 in lo---ng text without space>[CRLF] <= inserted by Tb
>                                                                                                       |
> <some 0x20 for indention of HTML source><72 unicode chars #N in lo---ng text without space>[CRLF] <= inserted by Tb
>                                                                                                       |
> <some 0x20 for indention of HTML source><last less than 72 unicode chars of the lo---ng text><br>[CRLF] <= inserted by Tb
(B)  In the HTML specification, there is no concrete definition of "New line" in text.
       The HTML spec's request is:
            Because the SBCS world uses a space as word delimiter but the DBCS world doesn't use such a space,
            please interpret "New line in text" adequately.
       => "Wrap at a length in HTML composition mode in Tb" == "Wrap at 72 Unicode chars in HTML SOURCE"
             == "[CRLF] + some 0x20 + 72 Unicode chars" in HTML SOURCE
             is interpreted as "a 0x20" in almost all situations.
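
The effect of (A) plus (B) can be sketched as follows. This is a toy model, not the editor's code: the 4-space indentation and the function names are assumptions, and "rendering" is reduced to collapsing each whitespace run into one space.

```python
import re

WRAP_COLUMN = 72   # wrap width described in (A)
INDENT = " " * 4   # hypothetical indentation of the HTML source


def wrap_html_source(text: str, width: int = WRAP_COLUMN) -> str:
    """Break a long run of text into indented source lines, as in (A)."""
    chunks = [text[i:i + width] for i in range(0, len(text), width)]
    return "\r\n".join(INDENT + chunk for chunk in chunks)


def render(source: str) -> str:
    """Collapse each whitespace run to a single space, as in (B)."""
    return re.sub(r"\s+", " ", source).strip()


original = "こ" * 150
shown = render(wrap_html_source(original))
print(shown.count(" "))                    # spaces the author never typed
print(shown.replace(" ", "") == original)  # the text itself is otherwise intact
```

Each "[CRLF] + some 0x20" in the source comes back as exactly one visible space in the rendered text.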

The reason why many complaints are posted:
   Because Tb has the great "Automatic Downgrade to Text" feature,
   when simple HTML (for which Text mode composition is usually sufficient) is created by the user,
   Tb sends the "mail composed in HTML mode" as a text/plain mail whose content is "Text converted from HTML source".
Because of the "Text converted from above HTML source", the "space from the inserted CRLF" is laid out and shown pretty beautifully :-)
If the mail is sent as text/html or multipart/alternative{text/plain+text/html} and viewed as HTML (in Tb, View/Message Body As/Original HTML), the "space from the inserted CRLF" is not laid out so beautifully, and because a proportional font is usually used in HTML mail display, the "space from the inserted CRLF" is narrower than in the "sent as text/plain by Tb" case.

A workaround for the "ugly space from the inserted New Line" is "Accept Wrap at 72 in HTML mail display too".
(1) If HTML mode composition is not mandatory for you,
      and if you want to type/paste a pretty long CJK text without a space/newline at an appropriate position as "text in E-mail",
      use "Text mode composition" with mailnews.wraplength=72 (default).
      Because the plain text is wrapped at 72 bytes, the "ugly space" won't be inserted.
(2) If you need HTML composition:
  - Skip Tb's "Automatic Downgrade to Text"
     (a) bgColor != "#FFFFFF", eg."#FFFFFE", color != "#000000", eg."#000001", <B>&nbsp;</B> in HTML signature, etc.
     (b) Or, Install "Always HTML" addon.
  -  Send "mail composed in HTML mode" as "HTML mail" always.
      - No "prefers Plain Text" contact in any your Address Book, No Plain Text Domain setting.
      - Send Options/Text Format : Other than "send in Plain Text"
             i.e. Ask me, or Send in HTML, or send Plain Text and HTML.
  - In HTML mode composition, avoid HTML source like above,
     when you want to type/paste pretty long CJK text without space/new-line at appropriate position as "text in E-mail".
     1. Open Text mode composition window of Tb(Shift+Write),
     2. Paste or type pretty long CJK text without space/new-line at appropriate position as "text in E-mail".
     3. Edit/Rewrap(Ctrl+R), Ctrl+A, Ctrl+C
     4. Ctrl+V(paste ) at HTML mode composition window.
         Generated HTML by this action is as follows:
> <some 0x20 for indention of HTML source><72 BYTES in lo---ng text without space><BR>[CRLF] <= inserted by Tb
         Wrap at 72 BYTES is done by Text mode composition, and it's represented as <BR> in HTML source.

As long as you don't want the same display as "mailnews.wraplength=0 in Text mode composition" or "mailnews.wraplength=999999 in Text mode composition" in HTML mail composition too when you type/paste a pretty long CJK text without a space/newline at an appropriate position as "text in E-mail",
I believe that the result displayed by the "wrap at 72 bytes in HTML composition" workaround above is acceptable.
I believe it's far better than the "ugly inserted space from the inserted [CRLF]".
Wada, thanks so much for the summary and the workaround, that is EXTREMELY helpful and useful! :-)
I believe fault is in HTML Spec.
HTML Spec should have had a way to force "Newline in text==Null" for CJK world and for free HTML source layouting.
  <html format=Flowed,DelCRLF=Yes>, <div ForceCRLFisNull=true>
   <WWBR> : <WBR> who eats up following NewLine and White Spaces
               AAA<WBR>...<WWBR>some-spaces[CRLF]
               some-spaces QQQ<WBR>...
   In CSS, NewLine : CRLFOnly, LFOnly, CROnly, Any, CRLF_and_LF, CRLF_and_CR, LF_and_CR
                                 IsSpace, IsNull, IsNewLine(space or null is determined after, based on context)
                                 DelSP=Yes/No : Consecutive spaces after newline is eaten up
               <p style="NewLine: CRLFOnly, IsNull, DelSP=yes;">[CRLF]
                     any number of 0x20 for indention + line1[CRLF]
                     any number of 0x20 for indention + lineN[CRLF]
               </p>[CRLF]
   HTML Editor of Tb can freely layout HTML source, with keeping lo---ng text in a <p> without space, newline,
   without breaking line length limit in mail.

format=flowed,DelSp=Yes for text/plain is also possible for exchange of HTML data by E-mail:
    When text/html; format=flowed,DelSP=yes,
       Insert [CRLF] + 1 to N spaces at any place in HTML source.
    Upon interpreting data in text/html; format=flowed,DelSP=yes,
       If sequence of "[CRLF] and some spaces", remove it.
It can be achieved by new Content-Transfer-Encoding : HTML_with_DelSP_Yes, in addition to quoted-printable.
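
For reference, this is roughly how a reader treats format=flowed with and without DelSp=yes for text/plain. It is a simplified sketch: quoting, space-stuffing and the "-- " signature separator are ignored, and the function name unflow is made up.

```python
def unflow(body: str, delsp: bool) -> str:
    """Join format=flowed soft-wrapped lines (simplified reader).

    A line ending in a space is a soft break and continues the paragraph.
    With DelSp=yes that trailing space is dropped before joining.
    """
    paragraphs = []
    current = ""
    for line in body.split("\r\n"):
        if line.endswith(" "):   # soft break: paragraph continues
            current += line[:-1] if delsp else line
        else:                    # hard break: paragraph ends here
            paragraphs.append(current + line)
            current = ""
    if current:
        paragraphs.append(current)
    return "\n".join(paragraphs)


wire = "ここここ \r\nここここ \r\nここ"
print(unflow(wire, delsp=False))  # soft breaks survive as spaces
print(unflow(wire, delsp=True))   # soft breaks vanish completely
```

With DelSp=yes the sender can break a CJK run anywhere and the receiver rebuilds it without any space, which is the behaviour proposed above for HTML.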
(In reply to homoludens1000@gmail.com from comment #141)
> Yes, it is rather annoying. I love Thunderbird and am using it on a daily
> basis, including loads of emails in Japanese in a professional environment.
> Fortunately I don't have to look at the emails AFTER I send them ...
> 
> I'm sure there are some Japanese users of Thunderbird; can I ask: How do you
> cope with this issue? Does this problem not occur when you work under a
> Japanese language environment?

There is a third-party custom build of Thunderbird by xunxun1982, which fixes the bug and adds some features such as being portable. "xunxun1982" is the author of a popular third-party custom build of Firefox called pcxFirefox. The build of Thunderbird can be downloaded from http://sourceforge.net/projects/pcxfirefox/files/Release/Thunderbird/24.x/x86/24.4.0/. Regarding security and privacy, FYI, I have worked with this build for a year or more and nothing wrong has happened.

By the way, the build has four versions, zh-TW, zh-CN, ja and en-US.

Hope it helps. :)
Hi all.

I just started using the latest version of TB (38.2) and noticed that this still hasn't been fixed. That is the HTML code (of the source) gets cut at 72 characters which causes a "blank space" to get inserted in the actual message of my emails.

Take note I use TB in writing emails in Japanese or English, and so far I've noticed this issue when writing in Japanese only.

I realize it's been nearly 10 months since the last comments on this bug, but after googling around I still can't seem to find a permanent workaround or good solution. I mainly use HTML composed emails and would like to write my messages without having all these random "blank spaces" (aka "hankaku" spaces in Japanese) all over the place, as it looks bad, especially when writing business emails.

The only temporary workaround I've found is to manually add "editor.htmlWrapColumn" with value=0 into the config editor and whenever I write a HTML composed email I first select "Preformat" from the paragraph list (or use the shortcut Alt+O P F) and then continue to write my email message from there. By doing this I am able to send lengthy sentences/phrases in Japanese without the random "blank spaces" showing up.

Is this bug/issue still being addressed and looked at? I presume that many people who use Japanese environments also encounter this issue so I believe a fix would help a lot of people including myself.

Any update on this would be appreciated. Thank you.
(In reply to KSak from comment #150)
> Hi all.
> 
> I just started using the latest version of TB (38.2) and noticed that this
> still hasn't been fixed. That is the HTML code (of the source) gets cut at
> 72 characters which causes a "blank space" to get inserted in the actual
> message of my emails.
> 
> Take note I use TB in writing emails in Japanese or English, and so far I've
> noticed this issue when writing in Japanese only.
> 
> I realized it's been nearly a 10 months since the last comments on this bug,
> but after googling around I still can't seem to find a permanent workaround
> or good solution. I mainly use HTML composed emails and would like to write
> my messages without having all these random "blank spaces (aka "hankaku"
> spaces in Japanese) all over the place as it looks bad, especially when
> writing business emails.
> 
> The only temporary workaround I've found is to manually add
> "editor.htmlWrapColumn" with value=0 into the config editor and whenever I
> write a HTML composed email I first select "Preformat" from the paragraph
> list (or use the shortcut Alt+O P F) and then continue to write my email
> message from there. By doing this I am able to send lengthy
> sentences/phrases in Japanese without the random "blank spaces" showing up.
> 
> Is this bug/issue still being addressed and looked at? I presume that many
> people who use Japanese environments also encounter this issue so I believe
> a fix would help a lot of people including myself.
> 
> Any update on this would be appreciated. Thank you.

This is a problem which has been existed for years. Mozilla just cares nothing about CJK users. So put up with it, solve it yourself or just switch to another client.

Frankly, Outlook 2013 features perfect CJK compatibility, and although I haven't tried, the new Outlook 2016 should be better (I love open source softwares but I don't stick to them). The settings page of Outlook is a little difficult to use but once you get those settings done, it'll work like a charm.

However, if you're using Linux, you may try Evolution, Geary or something.
"has existed for years"... Wrong typing...
Still have not fixed the bug till the version of 38.2.0, dated 2015.8.19. Really doubt there is someone who have the ability to fix the bug.
(In reply to Hiroyuki Ikezoe (:hiro) from comment #125)
> Created attachment 766580 [details] [diff] [review]
> Encode HTML message with base64 to  avoid extra spaces in CJK text

(In reply to Hiroyuki Ikezoe (:hiro) from comment #126)
> Created attachment 766581 [details] [diff] [review]
> Adapt to createAndSendMessage change
> 
> This is basically same as the previous test except the argument of
> createAndSendMessage.

I downloaded the code just now and didn't find the patches in the default branch.

And according to the last reply by Hiroyuki Ikezoe (:hiro):

(In reply to Hiroyuki Ikezoe (:hiro) from comment #144)
> htmlWrapColumn is not in Thunderbird anymore.
> 
> As per comment #139, I will work on this after all of Joshua's mime works
> have finished.

However, it's been more than half a year since the reply.
(In reply to Frederick888 from comment #151)
> ...
> This is a problem which has been existed for years. Mozilla just cares
> nothing about CJK users. So put up with it, solve it yourself or just switch
> to another client.
(In reply to herbs from comment #153)
> Still have not fixed the bug till the version of 38.2.0, dated 2015.8.19.
> Really doubt there is someone who have the ability to fix the bug.

To bring some clarification to these concerns:

* Thunderbird is a community project, developed by volunteers on their own time. Mozilla isn't involved.
* CJK is a very important area, both in terms of users (Japan ranks second in terms of number of users, ahead of the USA and second only to Germany[1]) and in terms of development resources ...
** a major CJK issue got tremendous attention last fall while developing version 31
** this bug is getting attention, but you're not seeing it because the work is happening in other bugs [2]
* unfortunately early this year we lost a key Japanese volunteer developer

The current volunteers are making progress, but it is over a long period of time, affected by...

* limited time (for example the person with expertise to work on comment 139 is in grad school) 
* the very sad truth that despite Japan's very high number of users there are (to the best of my knowledge) only two very active volunteer developers from Japan, whose focus is not in the area needed by this bug (one is in NSPR, backend and build issues, and the other is focused on IO issues)

So Japan is very POORLY represented, just two, and is a major reason why CJK issues generally make slow or no progress. Actually we could use more coders and volunteers of any type.  Perhaps this can be corrected by a call to action driven through an organized effort of publicity and other means, by users within the country.


[1] https://blog.mozilla.org/thunderbird/2015/02/thunderbird-usage-continues-to-grow/
[2] see the whiteboard and comment 139 for the current status
Assignee: hiikezoe → nobody
Whiteboard: [status: waiting on results of blocking bugs and jsmime per comment 139]
(In reply to Wayne Mery (:wsmwk, use Needinfo for questions) from comment #155)

Thanks for your explanation. It seems that I misunderstood the relationship between Thunderbird and Mozilla to some extent.

Anyway, I'm looking forward to the day when the problem can be solved. Thanks in advance.
Removing myself from all the bugs I'm cc'ed on. Please NI me if you need something from me on MailNews Core bugs.
OK, this bug needs fixing. There were a few patches proposed and I will try to get one of them landed.

Let's ALL read Joshua's comment #137 ...
The options are:
1. Violate the standards, send 8bit with arbitrarily long lines, and hope the mail system tolerates it.
2. Violate the standards, send quoted-printable, but don't escape characters above 0x7F, and hope the mail system tolerates it.
3. Send QP/base64, according to the shorter encoding format.

... and comment #139:
the de facto answer is to send in base64.

I did the following simple experiment:
I created a long line of 'á' characters. I sent the message as UTF-8. Result:

Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: base64

Then I created a long line of 'こ' characters. I sent the message as UTF-8. Result:

Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

A broken message with injected spaces.

This is absolutely crazy!! Western European characters get base64 encoded, but Asian characters get shipped broken.

It is so sad that this bug stalled two years ago in comment #135 because someone decided that base64 is not the way to go.

As far as I can see, it IS ABSOLUTELY the way to go and I will do my best to make it so. This should then also fix bug 26734 and bug 553526.
Here is another observation:

If I copy Japanese text from bug 26734 comment #2
これは長い日本語のテキストですので、行が折れると思います。
into an e-mail and send it, I already get base64 encoding.

Only if I insert the text
ここここここここここここここここここここここここここここここここここここ
from my clipboard manager (Ditto) into an e-mail, I get broken 8bit encoding.

So it seems to be most reasonable to treat both cases the same and always send base64.
OK, looking at nsMsgAttachmentHandler.cpp.
There is a lot of decision making going on.

But before we make decisions, we analyse the attachment/body:

I do some printing right after
https://dxr.mozilla.org/comm-central/source/mailnews/compose/src/nsMsgAttachmentHandler.cpp?from=nsMsgAttachmentHandler.cpp#281
AnalyzeSnarfedFile();

Here are the surprising results.

On a plain text message with 200 a's I get:
m_size=205 m_lines=2 m_max_column=201
Two lines, one 200 bytes long plus a newline.

On a plain text message with 200 á's I get:
m_size=405 m_lines=2 m_max_column=401
Two lines, one 400 bytes long (that's 200 characters á, each c3a1 in UTF-8) plus a newline.

So far so good.

On a plain text message with 200 Korean characters 안 I get:
m_size=620 m_lines=6 m_max_column=109

Houston, we have a problem!

What has happened? Well, before we can pick the correct encoding in nsMsgAttachmentHandler::PickEncoding, the data has already been destroyed. Instead of two lines, we get six, and they are all 109 bytes long. One is for the newline, and the 108 bytes represent 36 characters, each ec9588 in UTF-8. And if we check the sent e-mail, a space was inserted after 36 characters.

Conclusion: Whatever encoding we pick in nsMsgAttachmentHandler::PickEncoding, even if we force base64, the result will always have the line broken where it shouldn't have been broken.
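
The byte arithmetic behind those m_max_column values can be checked with a few lines. This is only a sanity check of the numbers, not a model of AnalyzeSnarfedFile; the helper names are made up, and the "+ 1" is simply the one extra byte observed at the end of each line.

```python
def utf8_width(ch: str) -> int:
    """Number of bytes one character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def max_column(chars_on_line: int, ch: str) -> int:
    """Bytes in a line of identical characters plus one trailing byte."""
    return chars_on_line * utf8_width(ch) + 1


for ch, count in (("a", 200), ("á", 200), ("안", 36)):
    # Prints the UTF-8 bytes of the character and the resulting line width.
    print(ch.encode("utf-8").hex(), max_column(count, ch))
```

So 109 corresponds to 36 three-byte characters plus one byte, which is where the line was cut.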

The investigation continues.
This stuff is truly terrible. To get a plain text message, the HTML message is written to a temporary file nsemail.html. In there we find:

<html>
  <head>

    <meta http-equiv="content-type" content="text/html; charset=utf-8">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <font face="Aharoni">안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안
      안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안
      안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안안
      안안안안안안안안안안안안안안안안안안안안안</font>
  </body>
</html>

Yes, for every line break, we will get an extra space later. Great!

Note the first line: it has 47 characters. So due to the first problem mentioned above, in the sent e-mail we get an extra space after 36 characters, then another space after 11 characters.
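
A toy model of how the two effects combine. Assumptions: every line break in the source becomes one space, every run is then chopped into 36-character pieces, and the source lines hold 47, 66, 66 and 21 characters (only the 47 is counted above; the rest are picked so the total is 200). The function name is hypothetical.

```python
CHOP = 36  # characters per piece, as observed in the sent message


def sent_pieces(source_lines):
    """Model: source line breaks become spaces, then long runs are chopped."""
    runs = " ".join(line.strip() for line in source_lines).split(" ")
    pieces = []
    for run in runs:
        pieces.extend(run[i:i + CHOP] for i in range(0, len(run), CHOP))
    return pieces


# Hypothetical line lengths of the wrapped text in nsemail.html.
source = ["안" * n for n in (47, 66, 66, 21)]
print([len(piece) for piece in sent_pieces(source)])
```

In this model the first two pieces are 36 and 11 characters long, matching the spaces seen in the sent e-mail.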
I had the terrible feeling that this comes down to a problem in the unowned and unmaintained serialisers, and sadly, I was right:

Here:
https://dxr.mozilla.org/comm-central/source/mailnews/compose/src/nsMsgSend.cpp#1516
we get the body of the message from the M-C editor, and we ask for it formatted.
Well, that means, that some wrapping is taking place! And sure enough it wraps right through long unicode strings as we've seen in the previous comment.
OK, to eliminate the source of spaces from comment #161, we can change
https://dxr.mozilla.org/comm-central/source/mailnews/compose/src/nsMsgSend.cpp#1507
to get the HTML "raw" instead of formatted.

Of course we could also fix the M-C serialiser ;-)

Which this change, we now get the spaces consistently after 36 characters (but no longer after 47 characters), as described in comment #160.

One more to go ;-)
s/Which/With/
The downside of the raw HTML is of course that we'll have pretty ugly HTML, all in one long line, I tried it:

Here we have some HTML. Let's see how it turns out.<br><br>Let's insert a picture:<br><img src="cid:part2.08020202.08070206@jorgk.com" alt=""><br><br><ul><li>list item 1</li><li>list item 2</li></ul><br>

Having long lines may also change the encoding, see
https://dxr.mozilla.org/comm-central/source/mailnews/compose/src/nsMsgAttachmentHandler.cpp?from=nsMsgAttachmentHandler.cpp#318

Do we have any opinion on shipping "raw" HTML instead of prettified?

Joshua, Kent, Magnus?
Flags: needinfo?(rkent)
Flags: needinfo?(mkmelin+mozilla)
Flags: needinfo?(Pidgeot18)
Using OutputRaw is already proposed around comment #51.

It will be appropriate for text/html messages and text/plain;charset=utf-8 messages. But we should use format=flowed;delsp=yes (bug 26734) for text/plain;charset=iso-2022-jp message body unless it is an attachment. (Or drop the support for non-UTF-8 message composing entirely.)
Thanks for the hint. It's always difficult to pick up an abandoned bug and get up to speed. I can see that attachment 646005 [details] [diff] [review] uses raw HTML output from the editor. This, or fixing the M-C serialiser, is an absolute "must do", since once a long string is incorrectly chopped up, there is no chance to remove the spaces later. There is also the other issue of inserting a space after 36 unicode characters, in my test case 108 bytes (see comment #160).

With ISO-2022-JP encoding, the temporary file mentioned in comment #160 contains this

<html>
  <head>
    <meta content="text/html; charset=ISO-2022-JP"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    $B$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3(B
$B$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3(B

if asking for formatted HTML. This leads to extra spaces. If asking for raw HTML, we get:

<html><head><meta content="text/html; charset=ISO-2022-JP" http-equiv="Content-Type"></head><body bgcolor="#FFFFFF" text="#000000">$B$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$ [stuff deleted] 3$3$3$3$3(B</body></html>

and avoid extra spaces. So raw HTML (or, again, fixing the M-C serialiser) is an essential component of the solution *regardless* of which encoding is used.

Looking at ISO-2022-JP, even when using raw HTML, the message gets chopped up and is sent as:
Content-Type: text/plain; charset=ISO-2022-JP
Content-Transfer-Encoding: 7bit

$B$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3(B
$B$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3$3(B

This leads to extra line breaks, but NOT extra spaces.
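
To see why each of those lines starts with $B and ends with (B, here is a small sketch using Python's iso2022_jp codec. The 36-character chunk size is taken from the output above; the function name is made up. The ESC byte in front of each sequence is a control character and is not visible in the dumps.

```python
def encode_lines(text, chars_per_line=36):
    """Encode each chunk separately so every line carries its own escapes."""
    chunks = [text[i:i + chars_per_line]
              for i in range(0, len(text), chars_per_line)]
    return [chunk.encode("iso2022_jp") for chunk in chunks]


lines = encode_lines("こ" * 72)
for line in lines:
    # Show the ESC control byte as "<ESC>" for readability.
    print(line.decode("ascii").replace("\x1b", "<ESC>"))
# Every byte is below 0x80, which is why the part can be labelled 7bit.
print(all(byte < 0x80 for line in lines for byte in line))
```

Each chunk switches into JIS X 0208 and back to ASCII on its own, so cutting between chunks adds a line break but no stray bytes.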

I've also looked at bug 26734 (delsp=yes) but haven't found a reproducible case that would profit from delsp=yes. Maybe using ISO-2022-JP (or Shift_JIS) is such a case.

My first aim is to remove the spuriously inserted spaces, for which I found two sources, one is the formatted HTML, the other still needs to be investigated, so that Asian users can at least send e-mail correctly using UTF-8. As it stands, the product is completely useless.
OK, coming back to the spaces inserted after 36 Unicode characters from comment #160.

After the HTML is retrieved (badly) from the editor, it gets written to nsemail.html. From there, during a call to SnarfAttachment() here
https://dxr.mozilla.org/comm-central/source/mailnews/compose/src/nsMsgSend.cpp#546
the body is written out to yet another file nsmail.tmp here:
https://dxr.mozilla.org/comm-central/source/mailnews/compose/src/nsMsgAttachmentHandler.cpp#624

And peeking into the file we see lines of (arrows added to show trailing space):
  ===>ここここここここここここここここここここここここここここここここここここ <===

Each line consists of 36 unicode characters, one space and a CR/LF.

Summarising the investigation so far:

There is one source of additional spaces for HTML mail. That comes from the incorrect wrapping of CJK strings when retrieved from the M-C editor as "formatted". This can be fixed by getting raw HTML.

The second source for spaces for plain text mail is when the first file containing the bad wrapping is processed into the second file. I still have to see why and where the trailing space is added.

In light of this, my comment #160 wasn't accurate, in the 
m_max_column of 200+1=201, 400+1=401 or 3*36+1=109,
the +1 is a space, not a newline.
Coming back to the example of a long line of á's:
The temp file nsmail.tmp contains one long line of á's followed by a CR/LF. No breaking of the line, no trailing spaces. I just wonder which part of the system thinks that it's a good idea to wrap CJK characters and insert spaces. That's just crazy. It's all UTF-8 data, at that stage no one should interpret that data.
OK, analysis done.

The conversion to plain text is done here:
https://dxr.mozilla.org/comm-central/source/mailnews/compose/src/nsMsgAttachmentHandler.cpp#1122
and here:
https://dxr.mozilla.org/comm-central/source/mailnews/base/util/nsMsgUtils.cpp#2479
This calls the M-C core serialiser and this shreds right through a CJK string since once again we asked for formatting in the conversion. Here we have to ask for formatting since we need to get pretty-printed plain text output (see further details below in this comment).

While digging through the code, I found some doubtful stuff here, which might need looking at:
https://dxr.mozilla.org/comm-central/source/mailnews/compose/src/nsMsgCompUtils.cpp#1747

This determines whether the message should be 'flowed'. This is later passed as a flag to M-C's ConvertToPlainText(). Just for the record, if you pass 'flowed', the M-C serialiser shreds right through the CJK and appends a space which creates the extra spaces we're seeing. Not passing 'flowed' (I did in the debugger) leads to the same shredding/chopping of the CJK string to 36 character pieces (in my test case) without added spaces which leads to multiple lines in the resulting e-mail.
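The two outcomes described above can be imitated with a small sketch. It only simulates the observed behaviour and is not the M-C serialiser; `chopLikeObserved` and its parameters are hypothetical, and the assumption that the final piece gets no trailing space is mine, not something confirmed in this bug.

```javascript
// Simulation of the observed chopping, NOT the real serialiser code.
// Assumption: every chopped piece except the last one gets a trailing
// space when 'flowed' is requested.
function chopLikeObserved(text, charsPerLine, flowed) {
  const chars = Array.from(text); // split by code point, not UTF-16 unit
  const lines = [];
  for (let i = 0; i < chars.length; i += charsPerLine) {
    const piece = chars.slice(i, i + charsPerLine).join("");
    const isLast = i + charsPerLine >= chars.length;
    lines.push(flowed && !isLast ? piece + " " : piece);
  }
  return lines;
}

const longRun = "こ".repeat(80);
// With 'flowed': 36 characters plus a space per chopped line.
console.log(chopLikeObserved(longRun, 36, true).map(l => l.length));
// Without 'flowed': same chopping, no spaces, but still several lines.
console.log(chopLikeObserved(longRun, 36, false).map(l => l.length));
```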

Final conclusion:
=================
The aim is NOT to chop up CJK strings *at all* in the M-C serialiser just as Western European strings of many á's are not chopped.

This will fix both sources of added spaces. I will need to pursue this in an M-C Core::Serialiser bug.

There is nothing we can do in C-C code to fix this bug.

Note: I tried to call the plaintext conversion asking for 'raw' instead of 'formatted'. We get the result below which is not really good.

====
Hi this is HTML.

   this was a list
   jkjkjk
   jkjkjk

Here we have bold. And red. 
====

With formatting we get something better:

====
*bold* red

 * jkjkj
 * jkjkj

text

1. jkjkjk
2. jkjkjk
====

I believe we really need to fix the M-C core piece to get proper formatting for both HTML and plaintext e-mail.

Clearing all NIs for now.
Flags: needinfo?(rkent)
Flags: needinfo?(mkmelin+mozilla)
Flags: needinfo?(Pidgeot18)
The action will move to bug 1225864 and bug 1225904.
Depends on: 1225864, 1225904
See Also: 553526
No longer depends on: 26734
Attached file Test ISO-2022-JP.eml (obsolete) —
(In reply to Masatoshi Kimura [:emk] from comment #166)
> But we should use format=flowed;delsp=yes (bug 26734) for
> text/plain;charset=iso-2022-jp message body unless it is an attachment. (Or
> drop the support for non-UTF-8 message composing entirely.)

With the fixes from bug 1225864 and bug 1225904 I can perfectly well use ISO-2022-JP, encoded as base64 if a line that has no spaces gets too long. Look at the attached message to see it. I think we don't need "delsp=yes". That's why I closed bug 26734 as "wontfix".
(In reply to Jorg K (GMT+1) [currently frustrated by waiting for reviews/feedback] from comment #170)
> While digging through the code, I found some doubtful stuff here, which
> might need looking at:
> https://dxr.mozilla.org/comm-central/source/mailnews/compose/src/
> nsMsgCompUtils.cpp#1747

This is indeed doubtful code, once again a hack to avoid additional spaces caused by the M-C serialiser. Once bug 1225864 is fixed, we can allow "format=flowed" for all character encodings.
Attachment #626060 - Attachment is obsolete: true
Assignee: nobody → mozilla
Status: NEW → ASSIGNED
Attachment #766580 - Attachment is obsolete: true
Attachment #766581 - Attachment is obsolete: true
As per comment #174 we will be able to always send format=flowed, regardless of the character set used.
This solves all "extra space" problems in both HTML and plaintext mail while at the same time allowing format=flowed for all character encodings.

This depends on landing the patches in bug 1225864 and bug 1225904. These two bugs already have patches ready for review.

I will supply a try build for users to test with.
Attachment #8690534 - Attachment is obsolete: true
Attachment #8690638 - Flags: review?(mkmelin+mozilla)
Interested users can find a developer build here:
Windows:
https://archive.mozilla.org/pub/thunderbird/try-builds/mozilla@jorgk.com-9dfaecc45e1cd9ded100cfcb5831a1a0cb9e2fc3/try-comm-central-win32/thunderbird-45.0a1.en-US.win32.installer.exe
Other platforms on request.

Disclaimer:
This is a developer version, it is therefore pre-Aurora (Alpha). Use at your own risk on a NEW PROFILE.

I've tested:
- Long strings of Japanese, Korean and European accented characters,
  both separated with spaces and without spaces. No extra spaces got added.
- Korean characters encoded UTF-8.
- Japanese characters encoded UTF-8 and ISO-2022-JP.
- HTML and plaintext, plaintext is flowed, also for ISO-2022-JP.
All works.
I tested this TB build under Windows XP. I entered many Chinese characters such as "中文中文中文中文...", and when I sent the message (with the default delivery format, "auto detect"), the resulting e-mail was:

Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit

中文中文中文中文中文中文中文中文中文中文中文中文中文中文中文

I don't see the extra space added. Good work!
I also tested the other three delivery formats, and all work fine. Only the last format, "plain and rich html text", sends the e-mail base64-encoded, as shown below:

User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:45.0) Gecko/20100101
 Thunderbird/45.0a1
MIME-Version: 1.0
Content-Type: multipart/alternative;
 boundary="------------050200080102050308020002"

This is a multi-part message in MIME format.
--------------050200080102050308020002
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: base64

5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH
5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH
5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH
5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH
5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH
5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH
5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH
5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH
5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH
5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH
5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH
5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH
5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH
5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH
5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH
5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH
5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH
5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH
5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH
5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH
5Lit5paHIA0KDQo=
--------------050200080102050308020002
Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: base64

PGh0bWw+DQogIDxoZWFkPg0KDQogICAgPG1ldGEgaHR0cC1lcXVpdj0iY29udGVudC10eXBl
IiBjb250ZW50PSJ0ZXh0L2h0bWw7IGNoYXJzZXQ9dXRmLTgiPg0KICA8L2hlYWQ+DQogIDxi
b2R5IGJnY29sb3I9IiNGRkZGRkYiIHRleHQ9IiMwMDAwMDAiPg0K5Lit5paH5Lit5paH5Lit
5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit
5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit
5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit
5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit
5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit
5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit
5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit
5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit
5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit
5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit
5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit
5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit
5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit
5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit
5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit
5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit
5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit
5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit
5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit
5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paH5Lit5paHDQogIDwvYm9k
eT4NCjwvaHRtbD4NCg==
--------------050200080102050308020002--
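The base64 payload of the text/plain part can be decoded to confirm that no space sits between the characters. The sketch below assumes Node.js (where `Buffer` is a global); `decodeBase64Utf8` is a hypothetical helper, and the chunks are copied from the message above.

```javascript
// Decode a base64 chunk and interpret the bytes as UTF-8.
// Assumes Node.js, where Buffer is available globally.
function decodeBase64Utf8(chunk) {
  return Buffer.from(chunk, "base64").toString("utf8");
}

// One full 72-character base64 line is nine repetitions of "5Lit5paH".
const fullLine = decodeBase64Utf8("5Lit5paH".repeat(9));
// The final, shorter line of the text/plain part.
const lastLine = decodeBase64Utf8("5Lit5paHIA0KDQo=");

console.log(JSON.stringify(fullLine));
console.log(JSON.stringify(lastLine));
```

The full line decodes to an unbroken run of 中文 pairs; the only space in the last line is the one directly before the closing CR/LF pairs.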
Thanks for testing.

Base64 will be used when the input lines get too long. It's either that or extra spaces ;-)
Base64 comes from bug 1225904. Base64 was already used for plaintext in certain cases. Now we also use it for the HTML part if necessary.
Would you be so kind as to test the version I supplied in comment #177 with some real Japanese text and ISO-2022-JP encoding? format=flowed should work. Since "delsp=yes" is not implemented, we revert to base64 encoding instead of 7bit if a line gets longer than 900 bytes, so about 450 characters (approx. 2 bytes per character in this encoding, right?).
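As a rough illustration of the 900-byte rule, the choice of Content-Transfer-Encoding could be sketched like this. It is not the actual C-C code: `chooseTransferEncoding` and `MAX_LINE_BYTES` are hypothetical names, and the sketch measures UTF-8 bytes rather than ISO-2022-JP bytes.

```javascript
// Threshold taken from the comment above; the name is made up.
const MAX_LINE_BYTES = 900;

// Hypothetical helper: pick a CTE from the longest line and the byte values.
function chooseTransferEncoding(body) {
  const encoder = new TextEncoder(); // UTF-8 only, an assumption of this sketch
  let hasNonAscii = false;
  for (const line of body.split(/\r?\n/)) {
    const bytes = encoder.encode(line);
    if (bytes.length > MAX_LINE_BYTES) {
      return "base64"; // line too long to send as-is
    }
    if (bytes.some(b => b > 0x7f)) {
      hasNonAscii = true;
    }
  }
  return hasNonAscii ? "8bit" : "7bit";
}

console.log(chooseTransferEncoding("short ASCII line"));
console.log(chooseTransferEncoding("中".repeat(300))); // exactly 900 bytes
console.log(chooseTransferEncoding("中".repeat(301))); // 903 bytes
```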
Flags: needinfo?(VYV03354)
Looks like a wrong link. Do you mean this?
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=ea5a8e06169f

I took the liberty to cancel it, since it won't compile.
You need the patches from bug 1225864 and bug 1225904.
You need to submit a M-C patch with the C-C push, that's quite tricky.

Why don't you simply use the binary from comment #177?
Although I might not fully understand what you are doing here: if you are trying to add "format=flowed" for all encodings (i.e., including ISO-2022-JP), I think ISO-2022-JP shouldn't be changed that way, because some MUAs, and server applications that touch received e-mails before MUAs access them, may not expect the content to be transfer-encoded, given RFC 1468. I think that if users want to send flowed-format e-mail written in Japanese, they should use UTF-8.

Anyway, I strongly recommend making "format=flowed" for ISO-2022-JP an optional behavior controlled by a pref.
Most of the patch is busily removing the charset argument from UseFormatFlowed(). I could of course leave it in.

Before, UseFormatFlowed() did:
  return !(PL_strcasecmp(charset, "UTF-8") && nsMsgI18Nmultibyte_charset(charset));
This is hard to read, so let's look at three cases:
charset is not multi-byte: returns 'true', so flowed is used.
charset is UTF-8 and therefore multi-byte: returns 'true', so flowed is used.
charset is not UTF-8 but multi-byte, like ISO-2022-JP: returns 'false'.
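The three cases can be replayed with a small sketch of the old expression. `isMultibyte` is only a stand-in for nsMsgI18Nmultibyte_charset(); its charset list is an assumption made for the demo and does not reflect the real implementation.

```javascript
// Stand-in for nsMsgI18Nmultibyte_charset(). The list is an assumption
// for this demo only.
const ASSUMED_MULTIBYTE = new Set(["utf-8", "iso-2022-jp", "shift_jis"]);

function isMultibyte(charset) {
  return ASSUMED_MULTIBYTE.has(charset.toLowerCase());
}

// Mirrors: !(PL_strcasecmp(charset, "UTF-8") && nsMsgI18Nmultibyte_charset(charset))
// PL_strcasecmp() is non-zero (truthy) when the strings differ.
function useFormatFlowedOld(charset) {
  const differsFromUtf8 = charset.toLowerCase() !== "utf-8";
  return !(differsFromUtf8 && isMultibyte(charset));
}

for (const charset of ["ISO-8859-1", "UTF-8", "ISO-2022-JP"]) {
  console.log(charset, useFormatFlowedOld(charset));
}
```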

We already have a preference mailnews.send_plaintext_flowed (default 'true').
We could introduce a preference mailnews.send_plaintext_flowed_for_stateful_charset (default 'false') which users would have to set to enable sending flowed for ISO-2022-JP.
Or the other way around: mailnews.disable_plaintext_flowed_for_stateful_charset
Or more specific: mailnews.disable_plaintext_flowed_for_iso-2022-jp or mailnews.send_plaintext_flowed_for_iso-2022-jp.

Note: The existing preference mailnews.disable_format_flowed_for_cjk is useless and undocumented, that's why I am removing it.

If flowed sending is turned off, we would obviously not send flowed and also call the plaintext serialiser without the "don't break lines" flag, so for plaintext the old behaviour would be maintained.

The question is whether the new or the old behaviour becomes the default. Making the old behaviour the default will lead to people never seeing the improvement. Making the new behaviour the default would lead to people potentially complaining until they discover the new preference.

We still don't know whether the new behaviour would cause any problems. The new behaviour is:
- Not break long lines.
- Transmit long lines with base64 instead of 7bit or 8bit.
  For ISO-2022-JP that means using base64 instead of 7bit.

Magnus, how do you feel about this?
Flags: needinfo?(VYV03354) → needinfo?(mkmelin+mozilla)
Quoting from 1225864 comment #42:
(In reply to Masatoshi Kimura [:emk] from comment #40)
> I'm arguing about the encoding here because RFC 1468 is a big reason why we
> are using ISO-2022-JP for outgoing Japanese mail messages. If we use
> ISO-2022-JP, we should follow the entire RFC 1468 instead of picking some
> convenient parts of the RFC.

If you want to cover the full RFC 1468, that is
  - ISO-2022-JP and
  - CTE 7bit and
  - short lines,
then we need to implement the flag proposed in the previous comment. No problem. Here are a few variations for the name:

mailnews.send_plaintext_flowed_for_stateful_charset
mailnews.disable_plaintext_flowed_for_stateful_charset
mailnews.send_plaintext_flowed_for_iso-2022-jp
mailnews.disable_plaintext_flowed_for_iso-2022-jp
or
mailnews.send_plaintext_rfc1486_strict

I'm open. We can treat ISO-2022-JP in a special way and make sure it fully complies with RFC 1468, even with the "should"s (https://www.ietf.org/rfc/rfc1468.txt):
===
The ISO-2022-JP encoding is already in 7-bit form, so it is not
necessary to use a Content-Transfer-Encoding header. It *should* be
noted that applying the Base64 or Quoted-Printable encoding will
render the message unreadable in current JUNET software.
===
The human user (not implementor) *should* try to keep lines within 80
display columns, or, preferably, within 75 (or so) columns, ...
===

Magnus, any preference for the name?

In light of the discussion I prefer mailnews.send_plaintext_rfc1486_strict.
Comment on attachment 8690638 [details] [diff] [review]
Proposed change to always allow format=flowed and use the new serialiser flag OutputDisallowLineBreaking

Clearing the review request for now.
New patch with preference mailnews.send_plaintext_rcf1468_strict is coming up shortly.
Attachment #8690638 - Flags: review?(mkmelin+mozilla)
Attached patch Proposed change (v3) (obsolete) — Splinter Review
OK, here we have a more comprehensive approach:

- I refactored the code so UseFormatFlowed() is no longer used,
  instead we have a new function GetSerialiserFlags().
  This is already prepared for "delsp=yes" from bug 26734.
- New preference mailnews.send_plaintext_rfc1486_strict.
- GetSerialiserFlags() returns the flags according to the charset,
  there is special treatment for ISO-2022-JP if the preference is set.
- Flowed is enabled for all charsets, the new preference disables it for
  ISO-2022-JP.
- New serialiser flag OutputDisallowLineBreaking is always used, unless
  new preference disables it for ISO-2022-JP.

The default for mailnews.send_plaintext_rfc1486_strict is 'true', so for ISO-2022-JP the current behaviour doesn't change: flowed is not used and long lines are broken.
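The decision logic listed above could look roughly like the following sketch. It is not the patch itself: the bit values are placeholders (the real constants live in nsIDocumentEncoder and are not reproduced here), `getSerialiserFlagsSketch` and its parameters are hypothetical, and always including OutputFormatted is my assumption.

```javascript
// Placeholder bit values, NOT the real nsIDocumentEncoder constants.
const Flags = {
  OutputFormatted: 1 << 0,
  OutputFormatFlowed: 1 << 1,
  OutputDisallowLineBreaking: 1 << 2,
};

// sendFlowedPref: value of mailnews.send_plaintext_flowed.
// strictPref: value of the proposed "strict" preference.
function getSerialiserFlagsSketch(charset, sendFlowedPref, strictPref) {
  const flags = Flags.OutputFormatted;
  const isIso2022Jp = charset.toLowerCase() === "iso-2022-jp";
  if (strictPref && isIso2022Jp) {
    // Current behaviour kept: no format=flowed, long lines are broken.
    return flags;
  }
  let result = flags | Flags.OutputDisallowLineBreaking;
  if (sendFlowedPref) {
    result |= Flags.OutputFormatFlowed;
  }
  return result;
}

console.log(getSerialiserFlagsSketch("ISO-2022-JP", true, true));
console.log(getSerialiserFlagsSketch("UTF-8", true, true));
```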
Attachment #8690638 - Attachment is obsolete: true
Attachment #8693104 - Flags: feedback?(mkmelin+mozilla)
OK, with the mailnews.send_plaintext_rfc1486_strict set to true, which is the default, I get this:

I copied これは長い日本語のテキストですので、行が折れると思います。 a few times. The resulting e-mail is this:

これは長い日本語のテキストですので、行が折れると思います。これは長い日本
語のテキストですので、行が折れると思います。これは長い日本語のテキストで
すので、行が折れると思います。これは長い日本語のテキストですので、行が折
れると思います。これは長い日本語のテキストですので、行が折れると思いま
す。これは長い日本語のテキストですので、行が折れると思います。これは長い
日本語のテキストですので、行が折れると思います。これは長い日本語のテキス
トですので、行が折れると思います。これは長い日本語のテキストですので、行
が折れると思います。これは長い日本語のテキストですので、行が折れると思い
ます。これは長い日本語のテキストですので、行が折れると思います。これは長
い日本語のテキストですので、行が折れると思います。これは長い日本語のテキ
ストですので、行が折れると思います。これは長い日本語のテキストですので、
行が折れると思います。これは長い日本語のテキストですので、行が折れると思
います。これは長い日本語のテキストですので、行が折れると思います。

All neatly broken, but no extra spaces inserted. Source:

Content-Type: text/plain; charset=ISO-2022-JP
Content-Transfer-Encoding: 7bit

$B$3$l$OD9$$F|K\8l$N%F%-%9%H$G$9$N$G!"9T$,@^$l$k$H;W$$$^$9!#$3$l$OD9$$F|K\(B
$B8l$N%F%-%9%H$G$9$N$G!"9T$,@^$l$k$H;W$$$^$9!#$3$l$OD9$$F|K\8l$N%F%-%9%H$G(B
$B$9$N$G!"9T$,@^$l$k$H;W$$$^$9!#$3$l$OD9$$F|K\8l$N%F%-%9%H$G$9$N$G!"9T$,@^(B
$B$l$k$H;W$$$^$9!#$3$l$OD9$$F|K\8l$N%F%-%9%H$G$9$N$G!"9T$,@^$l$k$H;W$$$^(B
$B$9!#$3$l$OD9$$F|K\8l$N%F%-%9%H$G$9$N$G!"9T$,@^$l$k$H;W$$$^$9!#$3$l$OD9$$(B
$BF|K\8l$N%F%-%9%H$G$9$N$G!"9T$,@^$l$k$H;W$$$^$9!#$3$l$OD9$$F|K\8l$N%F%-%9(B
$B%H$G$9$N$G!"9T$,@^$l$k$H;W$$$^$9!#$3$l$OD9$$F|K\8l$N%F%-%9%H$G$9$N$G!"9T(B
$B$,@^$l$k$H;W$$$^$9!#$3$l$OD9$$F|K\8l$N%F%-%9%H$G$9$N$G!"9T$,@^$l$k$H;W$$(B
$B$^$9!#$3$l$OD9$$F|K\8l$N%F%-%9%H$G$9$N$G!"9T$,@^$l$k$H;W$$$^$9!#$3$l$OD9(B
$B$$F|K\8l$N%F%-%9%H$G$9$N$G!"9T$,@^$l$k$H;W$$$^$9!#$3$l$OD9$$F|K\8l$N%F%-(B
$B%9%H$G$9$N$G!"9T$,@^$l$k$H;W$$$^$9!#$3$l$OD9$$F|K\8l$N%F%-%9%H$G$9$N$G!"(B
$B9T$,@^$l$k$H;W$$$^$9!#$3$l$OD9$$F|K\8l$N%F%-%9%H$G$9$N$G!"9T$,@^$l$k$H;W(B
$B$$$^$9!#$3$l$OD9$$F|K\8l$N%F%-%9%H$G$9$N$G!"9T$,@^$l$k$H;W$$$^$9!#(B

This should also satisfy the most conservative Japanese user since RFC 1468 is *fully* honoured:
ISO-2022-JP, 7bit, short lines ... and no extra spaces!

If they don't want the broken lines, they need to switch mailnews.send_plaintext_rfc1486_strict to 'false'. Then they will get base64 and no broken lines.

All other character sets are not affected, so all the Chinese, Korean or Japanese UTF-8 users get the benefit of having transmitted exactly what they entered.

I hope this will keep everyone happy.

If still required, we can fix bug 26734 as well, so format=flowed; delsp=yes would become available, but then we're not adhering 100% to RFC 1468 which also calls for short lines in a very vague way:

===
The human user (not implementor) *should* try to keep lines within 80
display columns, or, preferably, within 75 (or so) columns, ...
===
There's another option *without* a new preference. We just do it like this for ISO-2022-JP:

If mailnews.send_plaintext_flowed is set to 'true' (default), we do the new behaviour:
No line breaking, using CTE base64 as required for long lines.
When bug 26734 is done, we use CTE 7bit and delsp=yes.

If mailnews.send_plaintext_flowed is set to 'false', we do the old behaviour:
Line breaking, but always using CTE 7bit as shown in comment #189.

Can we get a consensus among Japanese users? I can do what you decide, even implement bug 26734 after all.
(In reply to Jorg K (GMT+1) from comment #189)
Did you test a text containing both ASCII characters and CJK characters? That is, does the patch implement this part of the RFC?
>   Each JIS
>   X 0208 character takes up two columns, and the escape sequences do
>   not take up any columns. The implementor is reminded that JIS X 0208
>   characters take up two bytes and should not be split in the middle to
>   break lines for displaying, etc.

(In reply to Jorg K (GMT+1) from comment #190)
> There's another option *without* a new preference. We just do it like this
> for ISO-2022-JP:
> 
> If mailnews.send_plaintext_flowed is set to 'true' (default), we do the new
> behaviour:
> No line breaking, using CTE base64 as required for long lines.
> When bug 26734 is done, we use CTE 7bit and delsp=yes.
> 
> If mailnews.send_plaintext_flowed is set to 'false', we do the old behaviour:
> Line breaking, but always using CTE 7bit as shown in comment #189.
> 
> Can we get a consensus among Japanese users? I can do what you decide, even
> implement bug 26734 after all.

The old behavior should be used by default for iso-2022-jp plaintext mails (IIRC Thunderbird currently disables format=flowed for iso-2022-jp). Otherwise nobody will use it and legacy incompatible mails will be sent.
(In reply to Masatoshi Kimura [:emk] from comment #191)
> Did you test a text containing both ASCII characters and CJK characters?
No.
> That is, does the patch implement this part of the RFC?
I don't know.
 
> The old behavior should be used by default for iso-2022-jp plaintext mails
> (IIRC Thunderbird currently disables format=flowed for iso-2022-jp).
> Otherwise nobody will use it and legacy incompatible mails will be sent.
OK. In the patch coming up I will maintain the *exact* old behaviour:
- format=flowed is disabled regardless of mailnews.send_plaintext_flowed
- lines are broken as can be seen in comment #189.
- no extra new preference.
- delsp will be left to bug 26734.
- the behaviour for the mix of ASCII and Japanese characters has not changed
  (I don't know whether it's right or not.)
Attached patch Proposed final solution (v4) (obsolete) — Splinter Review
This patch fixes all problems for CJK languages but maintains the previous behaviour for ISO-2022-JP so that Thunderbird complies with RFC 1468.
That should keep everyone happy.
Attachment #8693104 - Attachment is obsolete: true
Attachment #8693104 - Flags: feedback?(mkmelin+mozilla)
Flags: needinfo?(mkmelin+mozilla)
Attachment #8693174 - Flags: review?(mkmelin+mozilla)
Whiteboard: [status: waiting on results of blocking bugs and jsmime per comment 139]
Attached patch Proposed final solution (v4) (obsolete) — Splinter Review
Oops, forgot to "hg qref" before attaching the patch. Now we're good.
Attachment #8693174 - Attachment is obsolete: true
Attachment #8693174 - Flags: review?(mkmelin+mozilla)
Attachment #8693177 - Flags: review?(mkmelin+mozilla)
New binaries here, this time for all platforms:
https://archive.mozilla.org/pub/thunderbird/try-builds/mozilla@jorgk.com-b4966113eaa14a806a88aa558efdd2c6a4f9c89d/

Same behaviour as the binaries from comment #177 but previous behaviour for ISO-2022-JP plaintext messages, that is lines are broken and no format=flowed.
Adding delsp was another two line change, so I added it.
Attachment #8693177 - Attachment is obsolete: true
Attachment #8693177 - Flags: review?(mkmelin+mozilla)
Attachment #8693229 - Flags: review?(mkmelin+mozilla)
Blocks: 26734
Sigh. Fixed cut and paste error.
Attachment #8693229 - Attachment is obsolete: true
Attachment #8693229 - Flags: review?(mkmelin+mozilla)
Attachment #8693231 - Flags: review?(mkmelin+mozilla)
Makoto-san and Masatoshi-san: You both cared about RFC 1468. I've now completed the work. ISO-2022-JP will always be sent with CTE 7bit. If format=flowed is used, this is achieved with delsp=yes.

Can you please test that everything is to your liking? Binaries here; sadly the Mac compile failed due to Mercurial problems:
https://archive.mozilla.org/pub/thunderbird/try-builds/mozilla@jorgk.com-09c359459df26c2000cd4cb30971a140a661c0ff/

Here is the try run:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=09c359459df2

And here is a flowed/delsp message. Note the spaces at the end of each line.

Content-Type: text/plain; charset=ISO-2022-JP; format=flowed; delsp=yes
Content-Transfer-Encoding: 7bit

$B$3$l$OD9$$F|K\8l$N%F%-%9%H$G$9$N$G!"9T$,@^$l$k$H;W$$$^$9!#$3$l$OD9$$F|K\(B <=== space here
$B8l$N%F%-%9%H$G$9$N$G!"9T$,@^$l$k$H;W$$$^$9!#$3$l$OD9$$F|K\8l$N%F%-%9%H$G(B 
$B$9$N$G!"9T$,@^$l$k$H;W$$$^$9!#$3$l$OD9$$F|K\8l$N%F%-%9%H$G$9$N$G!"9T$,@^(B 
$B$l$k$H;W$$$^$9!#$3$l$OD9$$F|K\8l$N%F%-%9%H$G$9$N$G!"9T$,@^$l$k$H;W$$$^(B 
$B$9!#$3$l$OD9$$F|K\8l$N%F%-%9%H$G$9$N$G!"9T$,@^$l$k$H;W$$$^$9!#(B
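To see why the trailing spaces in the message above are harmless with delsp=yes, here is a rough sketch of how a receiver might unfold format=flowed text in the spirit of RFC 3676. `unfoldFlowed` is a hypothetical helper; quoting, space-stuffing and the signature separator are deliberately ignored.

```javascript
// Minimal unfolding of format=flowed lines (simplified, see lead-in).
// A line ending in a space is a soft break and continues on the next line.
function unfoldFlowed(lines, delsp) {
  const paragraphs = [];
  let current = "";
  for (const line of lines) {
    if (line.endsWith(" ")) {
      // With delsp=yes the trailing space is dropped before joining.
      current += delsp ? line.slice(0, -1) : line;
    } else {
      // Hard break: the paragraph ends with this line.
      paragraphs.push(current + line);
      current = "";
    }
  }
  if (current !== "") {
    paragraphs.push(current);
  }
  return paragraphs;
}

const received = ["これは長い ", "日本語の ", "テキスト"];
console.log(unfoldFlowed(received, false)); // spaces stay inside the text
console.log(unfoldFlowed(received, true));  // spaces are removed
```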
Flags: needinfo?(m_kato)
Flags: needinfo?(VYV03354)
Attached patch Proposed test (v1) (obsolete) — Splinter Review
This test extends the test from bug 1225904.

Needless to say that this will only work with the patches from bug 1225864 and bug 1225904 applied first.

(Note: I'm obsoleting the ISO-2022-JP sample message since the system will no longer produce ISO-2022-JP encoded plaintext messages with CTE base64.)
Attachment #8689687 - Attachment is obsolete: true
Attachment #8693333 - Flags: review?(mkmelin+mozilla)
htmlWrapColumn does not exist anywhere in the system any more, so I'm removing it from summary to reduce the confusion: https://dxr.mozilla.org/comm-central/search?q=htmlWrapColumn&redirect=false&case=false

mailnews.wraplength is obeyed when wrapping non-flowed plaintext e-mail.

Japanese text encoded in ISO-2022-JP is "traditionally" wrapped at the number of bytes specified in mailnews.wraplength (default: 72), which corresponds to half that number in characters (36 assuming the default).
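The 72-to-36 relationship is simple arithmetic, sketched below. It assumes every non-ASCII character is a two-byte JIS X 0208 character and ignores escape sequences; `iso2022jpPayloadBytes` is a hypothetical helper, not an encoder.

```javascript
// Default value of mailnews.wraplength, as stated above.
const wrapLength = 72;
// Two bytes per Japanese character in ISO-2022-JP (escape sequences ignored).
const charsPerLine = Math.floor(wrapLength / 2);

// Hypothetical estimate: ASCII counts 1 byte, everything else 2 bytes.
function iso2022jpPayloadBytes(text) {
  let bytes = 0;
  for (const ch of text) {
    bytes += ch.codePointAt(0) < 0x80 ? 1 : 2;
  }
  return bytes;
}

console.log(charsPerLine);
console.log(iso2022jpPayloadBytes("こ".repeat(charsPerLine)));
```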

Some parts of the system rely on this "magic", see for example here:
https://dxr.mozilla.org/mozilla-central/source/dom/base/test/TestPlainTextSerializer.cpp#78
Summary: CJK(Chinese, Japanese, Korean): extra space is inserted within text in mail, due to wrap produced by editor.htmlWrapColumn(which looks 76 or 78 unicode chars regardless of the setting), mailnews.wraplength, and line length limitation of 1000bytes of SMTP → CJK(Chinese, Japanese, Korean): extra space is inserted within text in mail due to wrap produced by mailnews.wraplength and line length limitation of 1000bytes of SMTP
Attached patch Proposed test (v1b) (obsolete) — Splinter Review
Fixed typo.
Attachment #8693333 - Attachment is obsolete: true
Attachment #8693333 - Flags: review?(mkmelin+mozilla)
Attachment #8693362 - Flags: review?(mkmelin+mozilla)
Attached patch Proposed test (v1b) (obsolete) — Splinter Review
Fixed typo, this time for real.
Attachment #8693362 - Attachment is obsolete: true
Attachment #8693362 - Flags: review?(mkmelin+mozilla)
Attachment #8693364 - Flags: review?(mkmelin+mozilla)
Comment on attachment 8693231 [details] [diff] [review]
Proposed final solution (v5b), includes delsp support.

Review of attachment 8693231 [details] [diff] [review]:
-----------------------------------------------------------------

Code looks ok to me. r=mkmelin

::: mailnews/compose/src/nsMsgCompose.h
@@ +159,5 @@
>      NS_DECL_NSISTREAMLISTENER
>      NS_DECL_NSIMSGQUOTINGOUTPUTSTREAMLISTENER
>  
>      NS_IMETHOD  SetComposeObj(nsIMsgCompose *obj);
> +    NS_IMETHOD  ConvertToPlainText(bool formatflowed,

odd double spacing here, please fix it and the row above

::: mailnews/compose/src/nsMsgSend.cpp
@@ +1527,5 @@
>  
>    //
>    // Query the editor, get the body of HTML!
>    //
> +  uint32_t  flags = nsIDocumentEncoder::OutputFormatted |

one space only please
Attachment #8693231 - Flags: review?(mkmelin+mozilla) → review+
Comment on attachment 8693364 [details] [diff] [review]
Proposed test (v1b)

Review of attachment 8693364 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/compose/test/unit/test_longLines.js
@@ +12,5 @@
>  
> +// Copied from jsmime.js.
> +function stringToTypedArray(buffer) {
> +  var typedarray = new Uint8Array(buffer.length);
> +  for (var i = 0; i < buffer.length; i++)

please always have braces for loops, even one-line loops

@@ +22,3 @@
>    let msgData = mailTestUtils
>      .loadMessageToString(gDraftFolder, mailTestUtils.firstMsgHdr(gDraftFolder));
> +  checkMessageHeaders(msgData, expectedHeaders, "");

just leave out the last param, instead of having ""
Attachment #8693364 - Flags: review?(mkmelin+mozilla) → review+
Carrying over Magnus' r+.
Fixed nits.
Attachment #8693231 - Attachment is obsolete: true
Flags: needinfo?(m_kato)
Flags: needinfo?(VYV03354)
Attachment #8695528 - Flags: review+
Carrying over Magnus' r+.
Fixed nits.
Attachment #8693364 - Attachment is obsolete: true
Attachment #8695543 - Flags: review+
This depends on bug 1225904, so please land it first.
Please apply the "proposed solution", then the test.
Keywords: checkin-needed
https://hg.mozilla.org/comm-central/rev/c6c9a8e486b7e67f43707ead63a4f796fa4e6952
Bug 653342 - Properly set serializer flags. Support delsp. Use OutputDisallowLineBreaking. r=mkmelin.

https://hg.mozilla.org/comm-central/rev/725ae1aad7d0900e34dbd06359a1a41279e25873
Bug 653342 - Properly set serializer flags. Support delsp. Use OutputDisallowLineBreaking. Test. r=mkmelin.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 45.0
How bad, the new long line test fails on Mac and Linux, works on Windows:
https://treeherder.mozilla.org/#/jobs?repo=comm-central&revision=725ae1aad7d0

I'll look into it tomorrow.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attached patch Correction of the landed test. (obsolete) — Splinter Review
I suspect a newline problem. The first test which compares HTML works, the second test compares using an appended "\r\n" and fails. I suspect that's it. I'm now doing the newlines based on the platform the test runs on, so "\r\n" for Windows and "\n" for the rest. See how we go: Try here:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=16293598e883
Mistyped try string for macosx64: Here's another one:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=90da358fdcd7
I'm on the right track, now all tests pass except the last one. This will hopefully fix the last one as well. New try:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=5c54437b0007
Attachment #8696134 - Attachment is obsolete: true
Comment on attachment 8696211 [details] [diff] [review]
Correction of the landed test. (take 3)

Review of attachment 8696211 [details] [diff] [review]:
-----------------------------------------------------------------

The test seems to pass now and the change looks reasonable.
Could you use Services.appinfo.OS for the OS detection? The value for Windows is "WINNT". ("Linux" and "Darwin" are the other ones.)
Attachment #8696211 - Flags: feedback+
Comment on attachment 8696211 [details] [diff] [review]
Correction of the landed test. (take 3)

OK, the third attempt fixes the test failures on Mac and Linux.
I also corrected a typo: s/htmt/html/.

Aleth, you might just want to rs and land this quickly.
Flags: needinfo?(aleth)
Attachment #8696211 - Flags: review?(mkmelin+mozilla)
(In reply to :aceman from comment #215)
> Could you use Services.appinfo.OS for the OS detection? The value for
> Windows is "WINNT". ("Linux" and "Darwin" are the other ones.)

Well, https://developer.mozilla.org/en-US/docs/Mozilla/QA/Writing_xpcshell-based_unit_tests#Platform-specific_tests says to use what I used. I will try your solution now.
OK, here is the OS detection according to Aceman. Works on Windows.

Please review if you feel fit ;-)
Attachment #8696211 - Attachment is obsolete: true
Attachment #8696211 - Flags: review?(mkmelin+mozilla)
Attachment #8696220 - Flags: review?(mkmelin+mozilla)
Attachment #8696220 - Flags: review?(acelists)
(In reply to Jorg K (GMT+1) from comment #217)
> (In reply to :aceman from comment #215)
> > Could you use Services.appinfo.OS for the OS detection? The value for
> > Windows is "WINNT". ("Linux" and "Darwin" are the other ones.)
> 
> Well,
> https://developer.mozilla.org/en-US/docs/Mozilla/QA/Writing_xpcshell-
> based_unit_tests#Platform-specific_tests says to use what I used. I will try
> your solution now.

Yeah, maybe it was written before Services were available :) Anyway, my method seems less hacky and you can grep to see that we use it in the TB code.
Aceman reports failures on his local Linux debug build:
###!!! ASSERTION: Not a UTF-8 string. This code should only be used for converting from known UTF-8 strings.: 'Error', file /mozilla/xpcom/string/nsUTF8Utils.h, line 430"

So here goes another try:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=ea78d2cefb3a

So let's see the results before reviewing and landing this.
Comment on attachment 8696220 [details] [diff] [review]
Correction of the landed test. (take 4)

Aceman lied to me, Services.appinfo.OS returns XPCShell.
Grrrr. Going back to the previous version with another try run.
Attachment #8696220 - Attachment is obsolete: true
Attachment #8696220 - Flags: review?(mkmelin+mozilla)
Attachment #8696220 - Flags: review?(acelists)
OK, since Services.appinfo.OS doesn't return the correct result I'm going back to version 3 with an added comment.

Another try, this time including the debug builds:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=f7ddd280284e
Attachment #8696229 - Flags: review?(acelists)
OK, one more time, with an even better comment upon Aceman's request.
Attachment #8696229 - Attachment is obsolete: true
Attachment #8696229 - Flags: review?(acelists)
Attachment #8696230 - Flags: review?(acelists)
Comment on attachment 8696230 [details] [diff] [review]
Correction of the landed test. (take 3b), same as take 3 with added comment.

Review of attachment 8696230 [details] [diff] [review]:
-----------------------------------------------------------------

It seems my xpcshell breakage is not related to this patch.
The patch fixes the tests on try server so let's land it.
Attachment #8696230 - Flags: review?(acelists) → review+
https://hg.mozilla.org/comm-central/rev/9c7115099094
Status: REOPENED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
All done, thanks Aceman!
Flags: needinfo?(aleth)
I tested xpcshell:

js> Components.utils.import("resource://gre/modules/AppConstants.jsm");
[object BackstagePass]
js> AppConstants.platform;
win
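The two identifiers discussed in this thread can be compared in a small sketch. It assumes only what the comments state: `AppConstants.platform` reported "win" in the xpcshell session above, comment #215 gives "WINNT" as the `Services.appinfo.OS` value for Windows, and the try run showed `Services.appinfo.OS` returning "XPCShell" inside xpcshell. The helper name `isWindowsPlatform` is hypothetical and is not taken from the landed patch; the identifier is passed in as an argument so the sketch also runs outside xpcshell.

```javascript
// Hypothetical helper for a platform-specific test; not from the landed patch.
// In xpcshell the argument would come from AppConstants.platform.
function isWindowsPlatform(platform) {
  // "win" is what AppConstants.platform reported in the session above;
  // "WINNT" is the Windows value comment #215 gives for Services.appinfo.OS.
  return platform === "win" || platform === "WINNT";
}

// "XPCShell" is what Services.appinfo.OS returned under xpcshell in this bug,
// so a check based on that value alone cannot detect Windows there.
const looksLikeWindowsUnderXpcshell = isWindowsPlatform("XPCShell"); // false
```

This may be why the test went back to the earlier detection method: a string that never matches a Windows value makes the platform-specific branch unreachable.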
Is it possible that these fix patches could be applied to the current release branch? We are on the 38.x release, but the target milestone is 45.0, so we would wait a long time to see this bug fixed in an official release. Thanks.
Is it possible that these patches could be applied to the current release branch? We are on the 38.x release, but the target milestone is 45.0, so we would wait a long time to see this bug fixed in an official release. Thanks.
(In reply to asmwarrior from comment #229)
> Is it possible that this patches could be applied in the current release
> branch? We are in release 38.x version, but the target milestone is 45.0,
> and we will wait long time to see this bug is fixed in the official release.
> Thanks.

The required change was complex, spanning multiple patches over multiple bugs, so I don't think this is a good candidate for uplift to release.
I agree with Kent. However, after running with these patches on Daily and Aurora from 14th Dec., we could see whether they can be made to apply on TB 38.6 due at the end of January. Then it's just six weeks away from TB 45 due in early March, so perhaps not worth the effort. Kent is doing the uplifts, so it's up to him.
Whiteboard: [tb-papercut]
(In reply to Jorg K (GMT+1) from comment #231)
> I agree with Kent. However, after running with these patches on Daily and
> Aurora from 14th Dec., we could see whether they can be made to apply on TB
> 38.6 due at the end of January. Then it's just six weeks away from TB 45 due
> in early March, so perhaps not worth the effort. Kent is doing the uplifts,
> so it's up to him.

I'm just a general TB end-user and have no clue about the technicalities of this bug, but I can see that this bug has finally been resolved and is due for release into production soon - THANK YOU and all the other developers/programmers who worked so hard and diligently to fix this issue.

With that said, when exactly is this fix scheduled to be released in TB? End of this month (Jan)? Or do we need to wait until early March?

I am currently using the latest production build (v38.5.1) and would LOVE to have this fix implemented so that I can finally write my emails in Japanese without seeing all those random blank spaces everywhere! It certainly doesn't look pretty, almost unprofessional even, especially when writing business emails to clients and partners and it would be life saving to have this resolved over the next 1-2 weeks once and for all :-D
(In reply to KSak from comment #232)
> With that said, when exactly is this fix scheduled to be released in TB? End
> of this month (Jan)? Or do we need to wait until early March?

This is coming out in TB 45 in early to mid-March 2016:
https://wiki.mozilla.org/RapidRelease/Calendar

TB 45 will go to beta at the end of January.

You can use this functionality now in the US English version of Earlybird:
http://ftp.mozilla.org/pub/thunderbird/nightly/latest-comm-aurora/ for example
http://ftp.mozilla.org/pub/thunderbird/nightly/latest-comm-aurora/thunderbird-45.0a2.en-US.win32.installer.exe

Warning: This is Alpha software, but hey, I've been using it since Christmas and as far as I can tell, it works ;-)
(In reply to Jorg K (GMT+1) from comment #233)
> (In reply to KSak from comment #232)
> > With that said, when exactly is this fix scheduled to be released in TB? End
> > of this month (Jan)? Or do we need to wait until early March?
> 
> This is coming out in TB 45 in early to mid-March 2016:
> https://wiki.mozilla.org/RapidRelease/Calendar
> 
> TB 45 will go to beta at the end of January.
> 
> You can use this functionality now in the US English version of Earlybird:
> http://ftp.mozilla.org/pub/thunderbird/nightly/latest-comm-aurora/ for
> example
> http://ftp.mozilla.org/pub/thunderbird/nightly/latest-comm-aurora/
> thunderbird-45.0a2.en-US.win32.installer.exe
> 
> Warning: This is Alpha software, but hey, I've been using it since Christmas
> and as far as I can tell, it works ;-)

Thanks a ton for the reply. I gave Earlybird a try and confirmed the blanks are gone! Such great stuff!!! Can't wait for the official March release :-)

I did run into one minor problem though when sending emails with Earlybird. For some reason, URL links (e.g. https://www.google.co.jp) that I type in the email message do not "activate" or automatically get converted to hyperlinks when I receive the message.

I sent the exact same email with the URL using Thunderbird and confirmed the URL was properly "activated" as a hyperlink when I received the email (on either TB or EB), so it appears to be an issue with sending emails through Earlybird. Not sure if it's a bug, a limitation, or perhaps a setting that gets changed when I use Earlybird, but I just wanted to bring this to your attention.
Forgot to mention that in Earlybird the URL will properly convert to a link if I manually select it and use the "Link" command button. But it doesn't automatically convert to a link like it does when using Thunderbird.
The link problem should be bug 1240903.
(In reply to KSak from comment #234)
> I did run into one minor problem though when sending emails with Earlybird.
> For some reason it seems like URL links (e.g. https://www.google.co.jp) that
> I type in the email message does not "activate" or automatically get
> converted to a hyperlink when I receive the message.
As Aceman said, this is already fixed in bug 1240903 and will be landed any day now.
(In reply to Jorg K (GMT+1) from comment #237)
> (In reply to KSak from comment #234)
> > I did run into one minor problem though when sending emails with Earlybird.
> > For some reason it seems like URL links (e.g. https://www.google.co.jp) that
> > I type in the email message does not "activate" or automatically get
> > converted to a hyperlink when I receive the message.
> As Aceman said, this is already fixed in bug 1240903 and will be landed any
> day now.

Gotcha! Really appreciate the quick pointers. Looking forward to the upcoming release and fixes!
I just noticed that TB 38.7.1 was just released, but I thought TB 45 which includes a fix for this bug was due for release in early to mid-March? Has 45 not been released yet or has it been postponed?

I checked the schedule (https://wiki.mozilla.org/RapidRelease/Calendar) and could not find reference to Thunderbird anywhere (is it the "ESR" column?) but it appears as though Firefox 45 and ESR 45.0 were already released on March 7th. Please correct me if I'm looking at the wrong place.
45 is late, but what we hope is the final beta was built this week. We hope that 45 will ship next week.
(In reply to Kent James (:rkent) from comment #240)
> 45 is late, but what we hope is the final beta was built this week. We hope
> that 45 will ship next week.

That's good to hear. Is this release info updated or already included somewhere in the schedule link (https://wiki.mozilla.org/RapidRelease/Calendar)?

I wasn't sure where to look, but the next release date appears to be set on April 19th and it mentions 45.1 under ESR - is that referring to Thunderbird 45 or am I looking at the schedule incorrectly?
TB follows the Release Calendar. TB 45.0 ESR should have been released on March 8, 2016, but usually we're running two to four weeks late, as per comment #240.
(In reply to Jorg K (GMT+2) from comment #242)
> TB follows the Release Calendar. TB 45.0 ESR should have been released on
> March 8, 2016, but usually we're running two to four weeks late, as per
> comment #240.

I'm a little confused as to what you mean by TB 45.0 already being released on March 8, 2016 even though the actual fix for this bug has not been released yet. Wasn't this fix included in 45.0?

Also, my TB recently got updated to 38.7.2, not 45.0 based on the Release Calendar as you mentioned. 38.7 looks like a version for Firefox, while 45.0 is for TB. Can you kindly clarify how these updates work?
1) The fix is in TB 45.
2) TB 45 ESR has not been released yet. It will be released next week.
3) TB 45 beta is available for testing now: https://www.mozilla.org/en-US/thunderbird/channel/
4) Once TB 45 ESR is released, TB 38.x will eventually automatically update.
5) FF and TB follow the same numbering scheme and release calendar, however, TB is usually running
   a few weeks late since it is staffed only by unpaid volunteers.
   That's why TB 45 ESR should have been released on March 8, 2016, but wasn't, see 2).
(In reply to Jorg K (GMT+2) from comment #244)
> 1) The fix is in TB 45.
> 2) TB 45 ESR has not been released yet. I will be released next week.
> 3) TB 45 beta is available for testing now:
> https://www.mozilla.org/en-US/thunderbird/channel/
> 4) Once TB 45 ESR is released, TB 38.x will eventually automatically update.
> 5) FF and TB follow the same numbering scheme and release calendar, however,
> TB is usually running
>    a few weeks late since it is staffed only by unpaid volunteers.
>    That's why TB 45 ESR should have been released on March 8, 2016, but
> wasn't, see 2).

Thanks JorgK - that explanation really helps a lot to understand how the release system works between FF and TB. By the way, what does "ESR" stand for? Is there any difference between "TB 45" and "TB 45 ESR"?

Mozilla really need to do some good and hire you guys - i.e. the unpaid volunteers. It's really unfortunate that they've stopped officially supporting TB after all this time the community has continued on with it.
(In reply to KSak from comment #245)
> By the way what does "ESR" stand for?
> Is there any difference between "TB 45" and "TB 45 ESR"?
This information is readily available elsewhere.
ESR = Extended Support Release.
In Firefox every seventh release is an ESR: 17, 24, 31, 38, 45, etc.
Since TB doesn't have the manpower, we only release every seventh version, so those ESR versions are the only releases you get. TB 45 and TB 45 ESR are the same thing. We do the other versions as beta releases only, at times skipping some. I'd say the next beta will be TB 47, skipping TB 46. It's not decided yet.

> Mozilla really need to do some good and hire you guys - i.e. the unpaid
> volunteers. It's really unfortunate that they've stopped officially
> supporting TB after all this time the community has continued on with it.
I agree, but Mozilla see it differently. There will be official announcements made soon. In the meantime you can donate directly to Thunderbird: https://donate.mozilla.org/en-US/thunderbird/about/
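The cadence described in the comment above (every seventh Firefox release is an ESR: 17, 24, 31, 38, 45) amounts to simple arithmetic. A minimal sketch follows, assuming the pattern applies only to the versions named in that comment; `isEsrVersion` is a hypothetical name and this is not an official release rule.

```javascript
// Sketch only: encodes "every seventh release starting from 17",
// as listed in the comment (17, 24, 31, 38, 45). Not an official rule.
function isEsrVersion(major) {
  return Number.isInteger(major) && major >= 17 && (major - 17) % 7 === 0;
}

// The versions named in the comment all satisfy the pattern.
const listedVersionsMatch = [17, 24, 31, 38, 45].every(isEsrVersion);

// TB 46 and TB 47, mentioned as possible betas, fall outside it.
const betasAreEsr = [46, 47].some(isEsrVersion);
```

Under this sketch, TB 45 counts as an ESR version while 46 and 47 do not, which matches the statement that those would ship as betas only.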
(In reply to Jorg K (GMT+2) from comment #246)
> (In reply to KSak from comment #245)
> > By the way what does "ESR" stand for?
> > Is there any difference between "TB 45" and "TB 45 ESR"?
> This information is readily available elsewhere.
> ESR = Extended Service Release.
> In Firefox every seventh release is an ESR, 17, 24, 31, 38, 45, etc.
> Since TB doesn't have the manpower, we only do every seventh release, so you
> get all those release versions. TB 45 and TB 45 ESR are the same thing. We
> do the other versions as beta releases only, at times skipping some. I'd say
> the next beta will be TB 47 skipping TB 46. It's not decided yet.
> 
> > Mozilla really need to do some good and hire you guys - i.e. the unpaid
> > volunteers. It's really unfortunate that they've stopped officially
> > supporting TB after all this time the community has continued on with it.
> I agree, but Mozilla see it differently. There will be official
> announcements made soon. In the meantime you can donate directly to
> Thunderbird: https://donate.mozilla.org/en-US/thunderbird/about/

Just made a donation. Thanks so much for taking the time to clarify these items, really appreciate it!
In case you haven't noticed: TB 45 has now been released:
https://www.mozilla.org/en-US/thunderbird/all/
My TB hasn't yet auto-updated to 45 (it's still sitting at 38.7.2).

Is there any way to force an update on the application? Or do I need to download from that link you provided and reinstall it again? I wouldn't want to lose any settings/add-ons if possible when updating to 45.
Download and install it from https://www.mozilla.org/en-US/thunderbird/all/.
Settings won't be lost, but we can't guarantee that all add-ons will continue to work.
(In reply to Jorg K (GMT+2) from comment #250)
> Download and install it from https://www.mozilla.org/en-US/thunderbird/all/.
> Settings won't be lost, but we can't guarantee that all add-ons will
> continue to work.

Thanks! Just installed 45, though as you warned, it wasn't compatible with the Noia Fox theme. Will give it a whirl!