JPN plain text mail Line Breaking is broken

VERIFIED FIXED in M16

Status

defect
P3
critical
VERIFIED FIXED
20 years ago
11 years ago

People

(Reporter: momoi, Assigned: nhottanscp)

Tracking

Trunk
x86
Windows NT

Details

(Whiteboard: [PDT-])

Attachments

(2 attachments)

** Observed with 2/8/2000 Win32 build (2000020817) **

Now that Bug 26802 is fixed, I tried format=fixed with the 
following pref setting:

user_pref("mailnews.send_plaintext_flowed", false);

It works and the Content-Type header omits format=flowed.
But when we input a long Japanese line without a hard break,
we send out iso-2022-jp mail without any line wrapping.

This is not something we have been doing in Navigator 3 or 
Communicator. We always wrapped long plain text mail when format 
is fixed. It seems to me that this has helped in inter-operability
area and we should not stop doing that in Mozilla.

We should fix this in beta 1.
Now that finally we can see the format=fixed plain text mail,
we also see a problem that needs to be resolved for beta 1.
Marking it as such.
Keywords: beta1
QA Contact: lchiang → momoi
Is it possible for me, with a western Windows system and no knowledge of 
Japanese, to reproduce this? My guess is that the Japanese strings don't have 
any characters that nsString::IsSpace() returns true on and therefore the line 
breaking algorithm doesn't find anywhere to put a line break.

If that is the case, should nsString::IsSpace() be fixed or is it the search for 
a place to break that should be fixed? 
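
To illustrate the guess above, here is a minimal sketch in Python (not the actual Mozilla C++ code). The helper name wrap_on_spaces is hypothetical, and it assumes the wrapper only ever breaks at ASCII spaces, which is the behaviour being guessed at here, not something confirmed from the source.

```python
def wrap_on_spaces(text: str, width: int) -> list[str]:
    """Greedy wrap that only breaks at ASCII spaces (hypothetical helper)."""
    lines: list[str] = []
    current = ""
    for word in text.split(" "):
        candidate = word if not current else current + " " + word
        # Accept the word if it fits, or if the line is still empty
        # (an unbreakable run has to go somewhere).
        if len(candidate) <= width or not current:
            current = candidate
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


english = "the quick brown fox jumps over the lazy dog"
japanese = "日本語" * 30  # 90 characters, no spaces at all

english_lines = wrap_on_spaces(english, 16)
japanese_lines = wrap_on_spaces(japanese, 16)
```

With a width of 16 the English sample wraps into three lines, while the Japanese sample comes back as one 90-character line because the loop never sees a space to break at.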

By the way, see bug 27055 which also has to do with line breaking.

I guess this bug should have target milestone M14, but I will not touch 
that.
I'm touching that. Just forgot to do that.
Daniel, I thought you had a thesis defense to do.
Seriously, CJK line-wrapping should be done by nhotta.

If you're curious, you can copy paste from a Japanese
web page, engage File | Edit Page on such a page, copy
a line, and then paste into Plain Text editor many times
without pressing Enter. Then choose View | Character Set Multibyte
 | Japanese (ISO-2022-JP). Then Send. 
Target Milestone: M14
You're so right. :-)

I wrote the current code that does the wrapping so I'm interested in how it 
evolves and I'm also trying to cut down on my spare time, that's why I continue 
to write strange comments. :-)
Is this specific to ISO-2022-JP? Or does it happen with other charsets too if we 
set the pref to send format=fixed?
Rather than checking for whitespace, shouldn't we use the linebreaking methods?
    http://lxr.mozilla.org/seamonkey/source/intl/lwbrk/public/nsILineBreaker.h
    http://lxr.mozilla.org/seamonkey/source/intl/lwbrk/tests/TestLineBreak.cpp

CC'd ftang for linebreaking expertise.

Note comment from http://bugzilla.mozilla.org/show_bug.cgi?id=26734:
  ----- Additional Comments From jgmyers@netscape.com  2000-02-07 16:43 -----
  My take:

  A client that has better definition of "word" than RFC 2646 should use that 
  better definition.  If Mozilla knows that text is Japanese, it should use the 
  Japanese line-breaking rules for deciding when to insert soft line breaks for 
  both generation and layout.
With the last testable build, 2/8/2000, with the pref option set to
format=fixed, I see the following:

1. ASCII plain text: wraps
2. Latin accents with QP -- wraps but slightly uneven line breakings
3. Japanese (ISO-2022-JP) -- no wrapping

This problem may be limited to CJK. I'll try next C and K.
Does anyone know where in the code the line breaks are generated when 
format=flowed is off (i.e. format=fixed)?
I think that part currently looks only for spaces as line break positions. We 
need to put a CJK-aware line breaker there, as bobj mentioned.
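As a rough illustration of what a CJK-aware break search could look like, here is a Python sketch. It is not the nsILineBreaker API, and all function names are hypothetical. It assumes a break is allowed after an ASCII space or next to any East Asian wide character, and it deliberately ignores the kinsoku rules (for example, no line start on a closing bracket or a small kana) that a real Japanese line breaker has to honour.

```python
import unicodedata


def is_wide(ch: str) -> bool:
    """True for East Asian wide/fullwidth characters (kanji, kana, ...)."""
    return unicodedata.east_asian_width(ch) in ("W", "F")


def can_break_before(text: str, i: int) -> bool:
    """Simplified rule: break after a space or next to a wide character."""
    prev, cur = text[i - 1], text[i]
    return prev == " " or is_wide(prev) or is_wide(cur)


def wrap_cjk_aware(text: str, width: int) -> list[str]:
    """Greedy wrap counting characters, using can_break_before()."""
    lines: list[str] = []
    start = 0
    while len(text) - start > width:
        # Look backwards from the width limit for the last allowed break.
        cut = next(
            (i for i in range(start + width, start, -1)
             if can_break_before(text, i)),
            None,
        )
        if cut is None:
            # Unbreakable run: take the first break past the limit, if any.
            cut = next(
                (i for i in range(start + width + 1, len(text))
                 if can_break_before(text, i)),
                len(text),
            )
        lines.append(text[start:cut].rstrip(" "))
        start = cut
    if start < len(text):
        lines.append(text[start:])
    return lines


japanese = "日本語のテキスト" * 10  # 80 characters, no spaces
japanese_lines = wrap_cjk_aware(japanese, 20)
english_lines = wrap_cjk_aware("the quick brown fox jumps over the lazy dog", 16)
```

The point of the sketch is only that the break decision moves out of a whitespace test and into a separate predicate, which is where a real line breaker would be plugged in.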
I must have written that in another bug. Both format=flowed and format=fixed 
use the same code. There is no real difference between the two. Look at 
nsHTMLToTXTSinkStream::AddToLine(...).
Whiteboard: Help Wanted - need to identify the code which generates line breaks
Look at nsHTMLToTXTSinkStream::EndLine.  If the line breaks are coming from
somewhere other than that, it might be part of the problem.
Thanks Daniel, Akkana. I am glad that the operation is done on Unicode text 
rather than on converted text (e.g., ISO-2022-JP).
Let me study the code and the line break interface, then estimate how much work 
we need.
Putting on PDT- radar for beta1.  If not part of previous release, not needed 
for beta1.
Whiteboard: [PDT-]
Status: NEW → ASSIGNED
Whiteboard: [PDT-] → [PDT-] Fixed is reviewed but need more testing before check in
I don't think this bug has anything to do with format=fixed or line-wrapping
from a display format perspective.  The issue is that each line in outgoing
mail should not be longer than allowed by the RFC.

Previous releases do break lines in outgoing mail.  Long lines could
potentially break some mail agents that do not gracefully handle lines
longer than allowed by the mail RFC.

Is the above correct?  If so, we should update the summary and have this
reconsidered for beta1.
You are right that the summary is somewhat misleading. It should be something 
like "Line breaking broken for certain charsets". Broken as in, doesn't break at 
all and broken as in puts (CR)LF where (CR)LF are not allowed.
For some reason, the summary line was misleading
since the point of this bug is that line breaking is
broken in Japanese even when the format=flowed is turned
off. So corrected.
Summary: Format=fixed JPN mail does not line-wrap long lines at all → JPN mail Line Breaking is broken
Note that plain text is preferred by many users in
Japan.
Summary: JPN mail Line Breaking is broken → JPN plain text mail Line Breaking is broken
Whiteboard: [PDT-] Fixed is reviewed but need more testing before check in → [PDT-] Fix was reviewed but need more testing before check in
With the patch, the automated test fails. It generates incorrect line breaks. I 
need to investigate whether the problem is in the line breaker or in the caller.
Here are the lines that are failing.

WRONG:
, introduced by a fairly long line to see how
CORRECT:
introduced by a fairly long line to see how

WRONG:
Here is a line ending with a space followed by a line break.
Plaintext output should contain only one space (and no line breaks)

CORRECT:
Here is a line ending with a space followed by a line break. Plaintext
output should contain only one space (and no line breaks) between

Is this something we care to renominate for Beta1?
How likely will users create email with very long lines and how likely would
that email break other mail readers?
I can no longer reproduce the problem in the automated test (Linux) with my 
latest build. I tried that with my local build twice (pulled Friday morning and 
Monday morning).
On windows, I sent a plain text mail with the same data as auto test but it was 
sent out with a correct line break.
Under either format=fixed or format=flowed, the current
plain text breaking is terrible as seen on the attached JPG image.
This is worse than it has ever been. We really should fix this.
The same message shows OK under Mozilla.
So what are we doing differently from before?
There is a small hack in the line wrapping algorithm that causes lines just _a 
little_ too long to not line break. Could that make that improvement? It seems 
unlikely though since the short lines are quite long.
Sorry, everyone. There were 3 spaces in my Japanese data. The font
size is very big and it's hard to see that there is an ASCII space.
That was what was causing the problem. So as of today's build, 
there is no change in the status of this bug. Lines are still 
going out "unbroken".
Whiteboard: [PDT-] Fix was reviewed but need more testing before check in → [PDT-] Fix was reviewed, passed automated test.
Target Milestone: M14 → M16
Changing beta1 to beta2. The patch is old; I need to merge and re-test.
Keywords: beta1 → beta2
Added a line breaker bug as a depend.
Depends on: 35025
Whiteboard: [PDT-] Fix was reviewed, passed automated test. → [PDT-]
Checked in.
Japanese text now gets line breaks, but note that it breaks by number of 
characters, not bytes (the code handles Unicode text and has no information 
about the final outgoing charset).
If we want to break by number of bytes, we need to change a different place, and 
we need a separate bug for that.
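To make the character-versus-byte distinction concrete, here is a small Python sketch. It uses Python's built-in iso2022_jp codec; the helper names and the 72-byte budget are hypothetical, chosen only for illustration. It ignores line-breaking rules entirely and just shows that a limit counted in characters and a limit counted in encoded bytes cut the same text at different places.

```python
def encoded_length(line: str, charset: str = "iso2022_jp") -> int:
    """Byte length of one line once converted to the outgoing charset."""
    return len(line.encode(charset))


def split_by_bytes(text: str, max_bytes: int,
                   charset: str = "iso2022_jp") -> list[str]:
    """Cut text so that each encoded line stays within max_bytes.

    Each line is encoded on its own, so the escape sequences that
    ISO-2022-JP needs around Japanese text are counted per line.
    """
    lines: list[str] = []
    current = ""
    for ch in text:
        if current and encoded_length(current + ch, charset) > max_bytes:
            lines.append(current)
            current = ch
        else:
            current += ch
    if current:
        lines.append(current)
    return lines


text = "日本語" * 24  # 72 characters
byte_lines = split_by_bytes(text, 72)
```

Here the 72-character string fits a 72-character limit as a single line, but it needs more than 72 bytes in ISO-2022-JP, so the byte-based split produces several lines.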
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Keywords: nsbeta2
** Checked with 6/2/2000 Win32 build **

Lines are now breaking at least at the character
count specified in the Preferences.
For Japanese, however, it would be better to count bytes
as we have done in the past. This avoids breaking
some mailers which may not have ways to handle such long lines.
Marking it verified as fixed.  Will file a new bug on the remaining
 problem.
Status: RESOLVED → VERIFIED
Product: MailNews → Core
Product: Core → MailNews Core