Closed Bug 27062 Opened 25 years ago Closed 25 years ago

JPN plain text mail Line Breaking is broken

Categories

(MailNews Core :: Backend, defect, P3)

x86
Windows NT
defect

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: momoi, Assigned: nhottanscp)

References

Details

(Whiteboard: [PDT-])

Attachments

(2 files)

** Observed with 2/8/2000 Win32 build (2000020817) ** Now that Bug 26802 is fixed, I tried format=fixed with the following pref setting: user_pref("mailnews.send_plaintext_flowed", false); It works and the Content-Type header omits format=flowed. But when we input a long Japanese line without a hard break, we send out iso-2022-jp mail without any line wrapping. This is not what we did in Navigator 3 or Communicator: we always wrapped long plain text mail when the format was fixed. It seems to me that this has helped interoperability, and we should not stop doing that in Mozilla. We should fix this in beta 1.
Now that finally we can see the format=fixed plain text mail, we also see a problem that needs to be resolved for beta 1. Marking it as such.
Keywords: beta1
QA Contact: lchiang → momoi
Is it possible for me, with a western Windows system and no knowledge of Japanese, to reproduce this? My guess is that the Japanese strings don't have any characters that nsString::IsSpace() returns true on, and therefore the line breaking algorithm doesn't find anywhere to put a line break. If that is the case, should nsString::IsSpace() be fixed, or is it the search for a place to break that should be fixed? By the way, see bug 27055, which also has to do with line breaking. I guess this bug should have target milestone M14, but I will not touch that.
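A minimal sketch of the guess above, written in Python rather than the actual Mozilla C++. The helper `wrap_at_spaces` is hypothetical and only stands in for a wrapper that looks for whitespace; it is not the real code. With no space in the input, such a wrapper has nowhere to break, so a Japanese line goes out intact however long it is.

```python
def wrap_at_spaces(text, width):
    """Greedy wrap that only breaks at ASCII spaces (hypothetical helper)."""
    lines, current = [], ""
    for word in text.split(" "):
        candidate = word if not current else current + " " + word
        # Keep growing the line while it fits; a single overlong "word"
        # is kept whole because there is no place to break it.
        if len(candidate) <= width or not current:
            current = candidate
        else:
            lines.append(current)
            current = word
    lines.append(current)
    return lines


english = "the quick brown fox jumps over the lazy dog"
japanese = "日本語のテキスト" * 10  # 80 characters, no spaces

print(wrap_at_spaces(english, 20))        # several short lines
print(len(wrap_at_spaces(japanese, 20)))  # one unbroken line
```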
I'm touching that. Just forgot to do that. Daniel, I thought you had a thesis defense to do. Seriously, CJK line-wrapping should be done by nhotta. If you're curious, you can copy and paste from a Japanese web page: engage File | Edit Page on such a page, copy a line, and then paste it into the Plain Text editor many times without pressing Enter. Then choose View | Character Set Multibyte | Japanese (ISO-2022-JP). Then Send.
Target Milestone: M14
You're so right. :-) I wrote the current code that does the wrapping so I'm interested in how it evolves and I'm also trying to cut down on my spare time, that's why I continue to write strange comments. :-)
Is this specific to ISO-2022-JP? Or does it happen with other charsets if we set the pref to send format=fixed?
Rather than checking for whitespace, shouldn't we use the linebreaking methods?
http://lxr.mozilla.org/seamonkey/source/intl/lwbrk/public/nsILineBreaker.h
http://lxr.mozilla.org/seamonkey/source/intl/lwbrk/tests/TestLineBreak.cpp
CC'd ftang for linebreaking expertise.
Note comment from http://bugzilla.mozilla.org/show_bug.cgi?id=26734:
----- Additional Comments From jgmyers@netscape.com 2000-02-07 16:43 -----
My take: A client that has better definition of "word" than RFC 2646 should use that better definition. If Mozilla knows that text is Japanese, it should use the Japanese line-breaking rules for deciding when to insert soft line breaks for both generation and layout.
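To illustrate the idea of asking a line breaker for break opportunities instead of searching for whitespace, here is a rough Python sketch. It does not use or model the nsILineBreaker interface; `is_wide`, `can_break_before`, and `wrap` are hypothetical names, and the rule "a break is allowed next to any East Asian wide character" is a deliberate simplification. Real Japanese line-breaking rules also forbid breaks before certain punctuation, which this sketch ignores.

```python
import unicodedata


def is_wide(ch):
    # East Asian Wide/Fullwidth characters: a crude stand-in for "CJK".
    return unicodedata.east_asian_width(ch) in ("W", "F")


def can_break_before(text, i):
    """True if a soft break may go before text[i] (simplified rule)."""
    return text[i - 1] == " " or is_wide(text[i - 1]) or is_wide(text[i])


def wrap(text, width):
    lines = []
    while len(text) > width:
        cut = width
        # Walk back from the width limit to the nearest break opportunity.
        while cut > 0 and not can_break_before(text, cut):
            cut -= 1
        if cut == 0:
            cut = width  # no opportunity at all: force a break
        lines.append(text[:cut].rstrip(" "))
        text = text[cut:]
    lines.append(text)
    return lines


for line in wrap("日本語のテキスト" * 10, 20):
    print(len(line), line)
```

Unlike a whitespace-only search, this wraps the space-free Japanese input while still breaking Latin text only at spaces.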
With the last testable build, 2/8/2000, with the pref option set to format=fixed, I see the following:
1. ASCII plain text -- wraps
2. Latin accents with QP -- wraps, but with slightly uneven line breaks
3. Japanese (ISO-2022-JP) -- no wrapping
This problem may be limited to CJK. I'll try C and K next.
Does anyone know where in the code the line breaks are generated when format=flowed is off (i.e. format=fixed)? I think that part is currently looking only for spaces as line break positions. We need to put a CJK-aware line breaker there, as bobj mentioned.
I must have written that in another bug. Both format=flowed and format=fixed use the same code. There is no real difference between the two. Look at nsHTMLToTXTSinkStream::AddToLine(...).
Whiteboard: Help Wanted - need to identify the code which generates line breaks
Look at nsHTMLToTXTSinkStream::EndLine. If the line breaks are coming from somewhere other than that, it might be part of the problem.
Thanks Daniel, Akkana. I am glad that the operation is done on Unicode instead of on converted text (e.g., ISO-2022-JP). Let me study the code and the line break interface, then estimate how much work we need.
Whiteboard: Help Wanted - need to identify the code which generates line breaks
Putting on PDT- radar for beta1. If not part of previous release, not needed for beta1.
Whiteboard: [PDT-]
Status: NEW → ASSIGNED
Whiteboard: [PDT-] → [PDT-] Fixed is reviewed but need more testing before check in
I don't think this bug has anything to do with format=fixed or line-wrapping from a display format perspective. The issue is that each line in outgoing mail should not be longer than allowed by the RFC. Previous releases do break lines in outgoing mail. Long lines could potentially break some mail agents that do not gracefully handle lines longer than allowed by the mail RFC. Is the above correct? If so, we should update the summary and have this reconsidered for beta1.
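The comment above does not name the RFC or the limit. As an assumption for illustration only, the sketch below uses the limits stated in RFC 5322 section 2.1.1 (at most 998 characters per line, 78 recommended, both excluding CRLF) and measures each line after encoding. The function name `overlong_lines` and its parameters are hypothetical.

```python
MAX_LINE = 998   # hard limit, assumed from RFC 5322 (excluding CRLF)
SOFT_LINE = 78   # recommended limit from the same section


def overlong_lines(body, charset="iso2022_jp", limit=SOFT_LINE):
    """Return 1-based numbers of lines whose encoded length exceeds limit."""
    bad = []
    for number, line in enumerate(body.split("\r\n"), start=1):
        if len(line.encode(charset)) > limit:
            bad.append(number)
    return bad


body = "short line\r\n" + "日本語のテキスト" * 10 + "\r\nok"
print(overlong_lines(body))                  # lines over the soft limit
print(overlong_lines(body, limit=MAX_LINE))  # lines over the hard limit
```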
You are right that the summary is somewhat misleading. It should be something like "Line breaking broken for certain charsets". Broken as in, doesn't break at all and broken as in puts (CR)LF where (CR)LF are not allowed.
For some reason, the summary line was misleading since the point of this bug is that line breaking is broken in Japanese even when the format=flowed is turned off. So corrected.
Summary: Format=fixed JPN mail does not line-wrap long lines at all → JPN mail Line Breaking is broken
Note that plain text is preferred by many users in Japan.
Summary: JPN mail Line Breaking is broken → JPN plain text mail Line Breaking is broken
Whiteboard: [PDT-] Fixed is reviewed but need more testing before check in → [PDT-] Fix was reviewed but need more testing before check in
With the patch, the automatic test fails. It generates incorrect line breaks. I need to investigate whether the problem is the line breaker's or the caller's. Here are the lines that are failing.
WRONG: , introduced by a fairly long line to see how
CORRECT: introduced by a fairly long line to see how
WRONG: Here is a line ending with a space followed by a line break. Plaintext output should contain only one space (and no line breaks)
CORRECT: Here is a line ending with a space followed by a line break. Plaintext output should contain only one space (and no line breaks) between
Is this something we care to renominate for Beta1? How likely will users create email with very long lines and how likely would that email break other mail readers?
I can no longer reproduce the problem in the automated test (Linux) with my latest build. I tried that with my local build twice (pulled Friday morning and Monday morning). On Windows, I sent a plain text mail with the same data as the automated test, and it was sent out with correct line breaks.
Under either format=fixed or format=flowed, the current plain text breaking is terrible as seen on the attached JPG image. This is worse than it has ever been. We really should fix this.
The same message shows OK under Mozilla. So what are we doing differently from before?
There is a small hack in the line wrapping algorithm that causes lines just _a little_ too long to not line break. Could that make that improvement? It seems unlikely though since the short lines are quite long.
Sorry, everyone. There were 3 spaces in my Japanese data. The font size is very big and it's hard to see that there is an ASCII space. That was what was causing the problem. So as of today's build, there is no change in the status of this bug. Lines are still going out "unbroken".
Whiteboard: [PDT-] Fix was reviewed but need more testing before check in → [PDT-] Fix was reviewed, passed automated test.
Target Milestone: M14 → M16
Changing beta1 to beta2. The patch is old; I need to merge and re-test.
Keywords: beta1 → beta2
Added a line breaker bug as a dependency.
Depends on: 35025
Whiteboard: [PDT-] Fix was reviewed, passed automated test. → [PDT-]
Checked in. Japanese text now gets line breaks, but note that it breaks by number of characters, not bytes (the code handles Unicode text and has no information about the final outgoing charset). If we want to break by number of bytes, we need to change a different place, and we need a separate bug for that.
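A small Python sketch of why counting characters and counting bytes differ here. It assumes Python's built-in `iso2022_jp` codec as a stand-in for the outgoing charset conversion; `char_and_byte_len` is a hypothetical helper, not part of the checked-in fix. A line that fits a 72-character wrap column can be more than twice as long once encoded, because each Japanese character takes two bytes plus escape sequences.

```python
def char_and_byte_len(line, charset="iso2022_jp"):
    """Return (number of characters, number of bytes after encoding)."""
    return len(line), len(line.encode(charset))


line = "日本語のテキスト" * 9  # 72 characters, no spaces
chars, nbytes = char_and_byte_len(line)
print(chars, nbytes)
```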
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
Keywords: nsbeta2
** Checked with 6/2/2000 Win32 build ** Lines are now breaking at least at the character count specified in the Preferences. For Japanese, however, it would be better to count bytes as we have done in the past. This avoids breaking some mailers which may not have ways to handle such long lines. Marking it verified as fixed. Will file a new bug on the remaining problem.
Status: RESOLVED → VERIFIED
Product: MailNews → Core
Product: Core → MailNews Core