355209 - Long Japanese or Unicode sentences are broken by Tb/Sm when mail is sent or saved in Outbox/Drafts (when a long line is split at the SMTP line length limit == LINE_BREAK_MAX, Tb splits in the middle of a 3-byte UTF-8 code or a 3-byte iso-2022-jp escape sequence)

Reporter

Description

19 years ago

When long Japanese sentences are entered, the message is not saved correctly to Drafts. Up to 492 Japanese characters are saved correctly.

Reproducible: Always

Steps to Reproduce:
Thunderbird account settings:
1) Open Tools --> Account Settings.
2) Select "Composition & Addressing" from the list on the left side.
3) Uncheck "Compose messages in HTML format".
4) Push the OK button.
Thunderbird option settings:
5) Open Tools --> Options.
6) Select the Composition tab.
7) Enter 0 in "Wrap plain text message at xxx characters".
8) Select the Display tab.
9) Select "Japanese(ISO-2022-JP)" for both Outgoing Mail and Incoming Mail.
10) Push the OK button.
11) Create a new message.
12) Turn the IME on.
13) Enter 493 Japanese characters, e.g. type only "あ" 493 times.
14) Push the Save button and close the Compose window.
15) Open the message saved in Drafts.

Windows XP SP1, version 3 alpha 1 (20061002)

Hiro

Reporter

Comment 1

19 years ago

Attached image: Screen shot when problem occurs

Hiro

Reporter

Comment 2

19 years ago

Attached file: Draft when problem occurs

The message saved in Drafts was created by entering "あ" 493 times.

Hiro

Reporter

Comment 3

19 years ago

(In reply to comment #2)
> Created an attachment (id=241033)
> Draft when problem occurs
>
> The message saved in Draft was input continuing "あ" in Japanese by 493
> characters.

Oops... the last several characters entered were "い".

Atsushi Sakai

Comment 4

19 years ago

I think it occurs here: http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/mailnews/compose/src/nsMsgSend.cpp&rev=1.387#1876
|charsSinceLineBreak| actually counts bytes, not characters, so a multi-byte character may be split by the inserted line break.
Shouldn't the product be Core/MailNews rather than Thunderbird?
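
To make comment 4 concrete, here is a minimal Python sketch of what a byte-counting splitter does to UTF-8 text. The helpers `naive_split` and `decodes` are hypothetical illustrations, not code from nsMsgSend.cpp; only the 990-byte value comes from this thread.

```python
LINE_BREAK_MAX = 990  # bytes; value quoted from nsMsgSend.cpp in comment 5


def naive_split(data: bytes, limit: int = LINE_BREAK_MAX) -> list[bytes]:
    """Cut data every `limit` bytes, ignoring character boundaries (hypothetical helper)."""
    return [data[i:i + limit] for i in range(0, len(data), limit)]


def decodes(line: bytes, encoding: str = "utf-8") -> bool:
    """Return True if the line is valid on its own in the given encoding."""
    try:
        line.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# One leading ASCII byte shifts every 3-byte UTF-8 "あ" (E3 81 82) by one,
# so the cut at byte 990 lands inside a character.
text = "a" + "あ" * 400
lines = naive_split(text.encode("utf-8"))
for n, line in enumerate(lines, 1):
    print(n, len(line), "ok" if decodes(line) else "broken")
```

Both lines fail to decode on their own: the first ends with the first two bytes of a character and the second starts with its last byte. With ISO-2022-JP the same kind of cut can also land inside a 3-byte escape sequence.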

Hiro

Reporter

Comment 5

19 years ago

(In reply to comment #4)
> I think it occurs here:
> http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/mailnews/compose/src/nsMsgSend.cpp&rev=1.387#1876
> |charsSinceLineBreak| is actually bytes, not character.
> So, multi-byte character may be split with linebreak.

The number of characters at which saving produces illegal characters is consistent with the following:
http://landfill.mozilla.org/mxr-test/mozilla/source/mailnews/compose/src/nsMsgSend.cpp#1845
line 1845: #define LINE_BREAK_MAX 990

> Product is not Thunderbird, but Core/MailNews?

Yes. And the Mac version reproduces it, too.
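
The 492/493 threshold in the description fits this limit exactly, assuming the limit counts bytes of the encoded line including both escape sequences: ESC $ B (3 bytes) + 2 bytes per character + ESC ( B (3 bytes) gives 3 + 2*492 + 3 = 990 bytes, while 493 characters need 992. How nsMsgSend.cpp counts exactly is not shown in this thread. A quick check with Python's built-in `iso2022_jp` codec (the helper name is illustrative):

```python
LINE_BREAK_MAX = 990  # bytes, as quoted in comment 5


def jis_line_length(n_chars: int) -> int:
    """Encoded size in bytes of a line of n_chars copies of "あ" in ISO-2022-JP."""
    # Python's codec emits ESC $ B, two bytes per JIS X 0208 character,
    # and a closing ESC ( B that returns to ASCII mode.
    return len(("あ" * n_chars).encode("iso2022_jp"))


for n in (492, 493):
    size = jis_line_length(n)
    verdict = "fits" if size <= LINE_BREAK_MAX else "exceeds the limit"
    print(f"{n} chars -> {size} bytes: {verdict}")
```

This prints 990 bytes for 492 characters and 992 bytes for 493, so the 493rd character is the first one that forces a break inside the line.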

Status: UNCONFIRMED → NEW

Ever confirmed: true

OS: Windows XP → All

Hardware: PC → All

Dan Mosedale (:dmosedale, :dmose)

Updated

17 years ago

Assignee: mscott → nobody

Wayne Mery (:wsmwk)

Updated

16 years ago

Severity: normal → major

Component: General → Composition

Product: Thunderbird → MailNews Core

QA Contact: general → composition

Version: Trunk → unspecified

Wayne Mery (:wsmwk)

Comment 6

16 years ago

Perhaps kozawa can test this with beta 3: http://www.mozillamessaging.com/en-US/thunderbird/early_releases/

Whiteboard: [needs trunk test]

Version: unspecified → 1.8 Branch

Hidehiro Kozawa

Comment 7

16 years ago

Reproduced with Tb beta 3 on WinXP SP3. As far as I can see from the CVS history, the problem has not been fixed.

Wayne Mery (:wsmwk)

Updated

15 years ago

Keywords: intl

Whiteboard: [needs trunk test]

Version: 1.8 Branch → Trunk

Axel Hecht [:Pike]

Comment 8

15 years ago

I guess something can be done with the charset encoders/decoders? I'm not sure what exactly the steps to reproduce (STR) are on 3.1 RC: I can't get it to fail locally, and I don't seem to have found the setting that triggers this. Also, is that maximum line length something we do for the RFCs or for our own sanity? That would determine how the encoding actually affects what we're doing, namely whether we count glyphs or bytes.

Makoto Kato [:m_kato]

Comment 9

15 years ago

I think the "Wrap plain text message" option doesn't work now. The plain text formatter always inserts a line break every 72 characters, and the mailnews code forces a line break every 990 bytes. I will consider a fix in bug 553526 and bug 26734. To support delsp=yes, I need to refactor the line-break code in mailnews.

Wayne Mery (:wsmwk)

Updated

15 years ago

Attachment #241033 - Attachment mime type: text/plain → application/octet-stream

Wayne Mery (:wsmwk)

Comment 10

15 years ago

(In reply to comment #9)
> I will consider the fix by bug 553526 and Bug 26734. To support delsp=yes, I
> need refactor linebreak code in mailnews.

If you mean you will fix them here, then please adjust the dependencies I have just set.

Depends on: 553526, 26734

Makoto Kato [:m_kato]

Comment 11

15 years ago

(In reply to comment #10)
> (In reply to comment #9)
> > I will consider the fix by bug 553526 and Bug 26734. To support delsp=yes, I
> > need refactor linebreak code in mailnews.
>
> if you mean you will fix them here then please adjust the dependencies I have
> just test

To support DelSp=yes (bug 26734), I must fix this. The fix plan is:
- Don't break lines when saving mail to Drafts.
- Break lines when sending it.
But this is just an idea. I am investigating the fix.
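
For background on the DelSp=yes plan (bug 26734): in RFC 3676 format=flowed, a line ending in a space is a soft line break, and with DelSp=yes the receiver deletes that space when it rejoins the lines. That lets a sender break text written without spaces, such as Japanese, between any two characters. A minimal sketch under loudly simplified assumptions (no quoting, no space-stuffing, no trailing spaces in the input, CRLF omitted); `flow_encode`, `flow_decode` and the 72-column width are illustrative, not mailnews code:

```python
WIDTH = 72  # illustrative; comment 9 says the formatter wraps at 72 characters


def flow_encode(paragraph: str, width: int = WIDTH) -> list[str]:
    """Wrap one paragraph as format=flowed; DelSp=yes lines (simplified sketch).

    Assumes the paragraph has no leading/trailing spaces and that no piece
    needs space-stuffing (no piece starting with " ", ">" or "From ").
    """
    chunk = width - 1  # keep one column free for the soft-break space
    pieces = [paragraph[i:i + chunk] for i in range(0, len(paragraph), chunk)] or [""]
    # A trailing SP marks a soft line break; the last piece ends with a hard break.
    return [p + " " for p in pieces[:-1]] + [pieces[-1]]


def flow_decode(lines: list[str]) -> str:
    """Rejoin flowed lines, deleting the soft-break SP as DelSp=yes requires."""
    out = []
    for line in lines:
        if line.endswith(" "):
            out.append(line[:-1])  # soft break: drop the SP and keep joining
        else:
            out.append(line + "\n")  # hard break ends the paragraph
    return "".join(out)


japanese = "あ" * 200  # no spaces anywhere to break at
wrapped = flow_encode(japanese)
print([len(line) for line in wrapped])
print(flow_decode(wrapped) == japanese + "\n")
```

The wrapped lines are 72, 72 and 58 characters long and decode back to the original text. With DelSp=no, the soft-break spaces would remain in the rejoined Japanese text.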

WADA:World Anti-bad-Duping Agency

Comment 12

15 years ago

(In reply to comment #1)
> Created attachment 241032 [details]
> Screen shot when problem occurs

With Tb 3.1.7, I couldn't see this kind of corruption of iso-2022-jp data (loss of the escape sequence due to an inserted CRLF) with HTML mode composition and "Send Later". Tb 3.1 seems to take care of the charset and escape sequence when splitting at LINE_BREAK_MAX 990 (990 bytes). Kato-san, do you still see the same corruption with Tb 3.1?

(In reply to comment #11)
> To support DelSp=yes (Bug 26734), I must fix this.
> Fix plan is
> - It doesn't break line when saving mail to draft.

For a local Drafts folder this will help, because Tb can use any line length. But for IMAP Drafts, the line length still needs to be taken care of.

> - when sending it, it breaks lines

The "mail data stream generated for sending" is the same data as the mail saved in Outbox (== Unsent Messages) by "Send Later". Long lines in the generated stream are split by any of the following:

(A) text/html part.
(A-1) By "#define LINE_BREAK_MAX 990". This applies to any charset.
(A-2) By editor.htmlWrapColumn (default=72; 72 characters, not 72 bytes). With an SBCS such as ASCII, a word longer than this length is not split; splitting a run of continuous characters seems to be a DBCS-only phenomenon. Because Tb 3.1 formats the HTML source (indenting text with leading spaces), the data inserted by a split becomes "CRLF + some spaces".

(B) text/plain part.
(B-0) The text/plain part is generated by the text converter. Because a newline in HTML is equivalent to a space, a CRLF inserted by (A) in the text/html part is converted to a space, so an extra space appears in the text/plain part. After text conversion, the following are applied:
(B-1) By mailnews.wraplength (default=72; 72 bytes, not 72 characters). Because it counts 72 bytes instead of 72 characters, additional line splitting occurs with a multi-byte charset.
(B-2) By format=flowed (max 80 bytes, or 78 bytes including CRLF). With ASCII, lines are split at a space. I don't know the behaviour with DBCS characters well.

Note: With text mode composition, hard wrapping is done during composition, so a split by "LINE_BREAK_MAX 990" occurs only when the user intentionally sets mailnews.wraplength=0 or a value larger than 990. See bug 611411 comment #3 for a procedure to observe the above.

During HTML mode composition, Tb 3.1 doesn't display a very long line as if it were hard-wrapped. And even if a long line is split in the text/html part by Save As or Send Later, Tb 3.1 seems to show it as "continuous characters" (i.e. it ignores or removes the CRLF inserted into the HTML source by line splitting), if a text/html part exists and View/Message Body As/HTML is chosen. I don't know the behaviour for <pre> parts. I guess bug 611411 is about the extra space from (B-0) in the text/plain part.

To support line splitting of multi-byte charsets by wrap length or line length limit properly, DelSp=Yes support or something similar is required for both the text/html part and the text/plain part. Further, splitting "in the middle of a multi-byte character or an escape sequence by LINE_BREAK_MAX 990" should be taken care of. "Draft or sent mail data corruption when many very long lines are pasted into the compose window" was once reported to Forum Japan. It looked like a split of a "three-byte code" or a "three-byte iso-2022-jp escape sequence" under a special condition (e.g. when the three bytes are placed at a buffer boundary).
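
For UTF-8, the split point that comment 12 asks to take care of is easy to repair, because every continuation byte matches the bit pattern 10xxxxxx. A minimal sketch of a splitter that backs up to the start of a character; `utf8_safe_split` is a hypothetical helper and not the fix that was eventually applied:

```python
LINE_BREAK_MAX = 990  # bytes


def utf8_safe_split(data: bytes, limit: int = LINE_BREAK_MAX) -> list[bytes]:
    """Split UTF-8 bytes into lines of at most `limit` bytes without cutting a character."""
    lines = []
    start = 0
    while len(data) - start > limit:
        cut = start + limit
        # Back up while the byte at the cut is a continuation byte (0b10xxxxxx),
        # so the cut falls on the first byte of a character.
        while cut > start and (data[cut] & 0xC0) == 0x80:
            cut -= 1
        if cut == start:  # no boundary found (invalid input): fall back to a hard cut
            cut = start + limit
        lines.append(data[start:cut])
        start = cut
    lines.append(data[start:])
    return lines


text = "a" + "あ" * 400
lines = utf8_safe_split(text.encode("utf-8"))
print([len(line) for line in lines])
print("".join(line.decode("utf-8") for line in lines) == text)
```

The first line is cut at 988 bytes instead of 990, so both lines decode cleanly. This does not help ISO-2022-JP, where the encoder state (the escape sequences) also has to be closed and reopened around each break.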

WADA:World Anti-bad-Duping Agency

Comment 13

15 years ago

(In reply to comment #1)
> Created attachment 241032 [details]
> Screen shot when problem occurs

This problem still occurs in Tb 3.1 and Tb 3.3a3pre (2011/01/15 build). At first I wrongly thought the first line was also shown as corrupted data by Tb, because the iso-2022-jp spec says "a line should end with an escape back to ASCII mode". Sorry for my misunderstanding.

(1) Text mode composition, mailnews.wraplength=0, iso-2022-jp.
A CRLF is inserted at LINE_BREAK_MAX 990 regardless of iso-2022-jp escape sequences. Tb 3.1 shows the first line in Japanese characters, with U+FFFD at the line end, and all of the second and later lines as garbled text. View/Message Source in Tb 3.1/trunk shows the first line garbled as well. A text editor shows all lines garbled too.

(2) HTML mode composition, mailnews.wraplength=0, iso-2022-jp, editor.htmlWrapColumn=0.
In the text/html part, a CRLF is inserted regardless of the editor.htmlWrapColumn setting (it looks like always 72 characters). In the text/plain part, the line length looks like "LINE_BREAK_MAX 990", but no data corruption is observed. The last character's column is 498, 499, or 500, depending on the extra spaces. It seems that splitting in the text/plain part is done at a character boundary, respecting escape sequences, when the split is done on an iso-2022-jp binary line.
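
Observation (2) suggests that the text/plain path splits ISO-2022-JP at character boundaries and respects escape sequences. A minimal sketch of that kind of split, assuming the text is still available as Unicode before encoding; `iso2022jp_split` is a hypothetical helper, not the Thunderbird code path, and characters outside ISO-2022-JP would raise UnicodeEncodeError:

```python
LINE_BREAK_MAX = 990  # bytes


def iso2022jp_split(text: str, limit: int = LINE_BREAK_MAX) -> list[bytes]:
    """Split text into ISO-2022-JP encoded lines of at most `limit` bytes.

    The split is done on Unicode characters and each line is encoded on its
    own, so every line opens with its own escape sequence and ends in ASCII mode.
    """
    lines = []
    current = ""
    for ch in text:
        candidate = current + ch
        # Re-encoding the candidate each time is quadratic but keeps the sketch simple.
        if current and len(candidate.encode("iso2022_jp")) > limit:
            lines.append(current.encode("iso2022_jp"))
            current = ch
        else:
            current = candidate
    if current:
        lines.append(current.encode("iso2022_jp"))
    return lines


lines = iso2022jp_split("あ" * 493)
print([len(line) for line in lines])
print(all(line.endswith(b"\x1b(B") for line in lines))
```

For 493 copies of "あ" this yields a 990-byte line and an 8-byte line, each starting with ESC $ B and ending with ESC ( B, so every line decodes on its own.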

Wayne Mery (:wsmwk)

Updated

13 years ago

Depends on: 653342

WADA:World Anti-bad-Duping Agency

Updated

12 years ago

Severity: major → critical

Keywords: dataloss

Summary: Long Japanese sentences are not normally saved in Draft → Long Japanese sentences are broken by Tb/Sm when mail is sent or saved in Outbox/Drafts (when a long line is split at the SMTP line length limit == LINE_BREAK_MAX, Tb splits in the middle of a 3-byte UTF-8 code or a 3-byte iso-2022-jp escape sequence)

WADA:World Anti-bad-Duping Agency

Comment 14

12 years ago

(Correction of comment #12)
> With Tb 3.1.7, I couldn't see this kind of corruption of iso-2022-jp
> data(loss of escape sequence due to inserted CRLF) with HTML mode
> composition and "Send Later".
> Tb 3.1 looks to care for charset and escape sequence upon split by
> LINE_BREAK_MAX 990(990 bytes).

My observation and guess were wrong. The reason why a "split of the 3-byte iso-2022-jp escape sequence" doesn't occur with HTML mode composition is that the HTML editor always inserts a line break roughly every 80 Unicode characters (characters, not bytes; perhaps at "80 minus line break length" characters). This "line break every ~80 Unicode characters" doesn't happen inside <pre>. So if <pre> is used, this bug occurs in the text/html part, even with HTML mode composition.

WADA:World Anti-bad-Duping Agency

Updated

12 years ago

Summary: Long Japanese sentences are broken by Tb/Sm when mail is sent or saved in Outbox/Drafts (when a long line is split at the SMTP line length limit == LINE_BREAK_MAX, Tb splits in the middle of a 3-byte UTF-8 code or a 3-byte iso-2022-jp escape sequence) → Long Japanese or Unicode sentences are broken by Tb/Sm when mail is sent or saved in Outbox/Drafts (when a long line is split at the SMTP line length limit == LINE_BREAK_MAX, Tb splits in the middle of a 3-byte UTF-8 code or a 3-byte iso-2022-jp escape sequence)

Jorg K (CEST = GMT+2)

Updated

10 years ago

Status: NEW → RESOLVED

Closed: 10 years ago

Resolution: --- → DUPLICATE

Screen shot when problem occurs 19 years ago Hiro 21.19 KB, image/png
Draft when problem occurs 19 years ago Hiro 1.53 KB, application/octet-stream