Implement RFC 3676 delsp=yes for format=flowed ISO-2022-JP CTE 7bit plaintext messages for sending (comment #148). Already implemented for reading in bug 231701.

VERIFIED FIXED in Thunderbird 45.0

Status

MailNews Core
MIME
--
critical
VERIFIED FIXED
18 years ago
6 months ago

People

(Reporter: Katsuhiko Momoi, Assigned: Jorg K (GMT+2))

Tracking

(Blocks: 1 bug, {intl})

Trunk
Thunderbird 45.0
x86
All
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [DelSP=yes is currently for iso-2022-jp only, so please don't open excess bug for other charset at B.M.O], URL)

Attachments

(1 attachment, 15 obsolete attachments)

7.71 KB, text/plain
Details
(Reporter)

Description

18 years ago
** Observed with 2/5/2000 Win32 build **

In Bug 16398, an enhancement feature for format=flowed for plain/text
mail is implemented. This seems to be a good thing and in principle
I agree with its desires. But the RFC 2646 which defines it, 

http://www.cis.ohio-state.edu/htbin/rfc/rfc2646.html

does not seem to provide for languages which do not contain
spaces as word-breaks. For example, there is no space between words in
pure Japanese, Chinese, and Korean text generally. 

Thus it is possible to have a "word" as defined in RFC 2646
which could be 15 more lines long. In such languages, there really
are often no differences between a word break and a paragraph break.

Bug 16398 seems to have some problem left still, but I'm filing this
bug to make sure that for Beta 1, either format=flowed is compliant with
CJK line breaking conventions, or should not be the default setting
for these languages.

With the above M14 build, I see 2 problems in Japanese plain/text mail.

1. When we create a Japanese paragraph consiting of many characters, e.g.
700-1200 characters without a space break, it goes out without any 
CJK line breaking, which should occur every 70 bytes or so whether or
not a space exists. Not being able to follow the CJK line-breaking
rules leads to a 1-line message containing several hundred characters 
for these paragraphs, which then cannot be displayed well at all in 
mailers which do not have line-wrapping functions. This in effect violates 
the spirit of RFC 2646.

2. When we create these long paragraphs in Japanese (equivalent of
word in the current RFC 2616 implementation unfortuantely),
we seem to have a problem in converting to ISO-2022-JP
message in send. Thus this type of paragraph ends in data
corruption for Japanese mail send. I attach a JPG view of such 
a problem.

To reproduce this problem of data corrution, view the following
Japanese line under Japanese (Shift_JIS) encoding setting
and copy/paste into Mozilla Plain/Text Composer. Paste it
many times without any space break so it will be become
a fairly long (10 or more lines) but not unusual Japanese paragraph.

Make a few more paragraphs like that and vary the length of each 
paragraph and if you have enough characters in it, the data corruption a
s you see in the attached image will occur.

In general, we need to review other issues of RFC 2616 for CJK and
other languages which may not fit well within the definition. 
These issues should be somehow dealt in Beta1.
(Reporter)

Comment 1

18 years ago
The Japanese data line to copy/paste:

これは長い日本語のテキストですので、行が折れると思います。
(Reporter)

Comment 2

18 years ago
Created attachment 5000 [details]
NOT REPRODUCIBLE in 2015 and most likely due to a different problem: JPG image of a corrupted Japanese message with a long long word/paragraph when sent with format=flowed
(Reporter)

Comment 3

18 years ago
Assigning myself as QA contact.
Keywords: beta1
QA Contact: lchiang → momoi

Comment 4

18 years ago
Assigning to Daniel since he wrote this support.

- rhp
Assignee: rhp → bratell

Comment 5

18 years ago
This is interesting. Is there a simple way to distinguish western languages from 
the asian (or other similar) when looking at a mail header?

I can't really "fix" this until I have gotten bug 16398 back on track and I 
think that a couple of the problems mentioned depends on bug 16398 being broken. 

I don't know when I can do since I will present my master thesis in only a 
couple a weeks. I will look at both bugs today but no progress is promised.
Depends on: 16398
(Reporter)

Comment 6

18 years ago
nhotta should be able to help you with the mail charset and
what line beraking rules to apply.  I would think that we already 
have CJK line-breaking rules implemented in format=fixed plain/text. 

We need to have CJK line-breakign rules to be working in M14,
be it format=fixed or format=flowed. If not, I suggest that 
we set the default to be the one which can do CJK line-breaking 
rules, i.e. format=fixed. 

Comment 7

18 years ago
My initial idea was to only use format=flowed when knowing that the text was of 
a western, alternatively set format=fixed when seeing a message that needed 
special line breaking. That would be a start until someone (maybe me) figured 
out how format=flowed and those line breaks could work together. To set the 
default to format=fixed would be a pity since very few would know how to turn it 
on, but if it's necessary to unbreak mail, sure it should be done. When is the 
M14 deadline?
(Reporter)

Comment 8

18 years ago
Daniel, I think you should work with nhotta. He should be able to
look at this on Monday. Then we should be able to arrive at
what would be the best way to preserve CJK line breaking in plain
text mail. Please discuss the schedule with him.

I would be also curious to see what would happen to these problems
when Bug 16398 is fixed.
(Reporter)

Comment 9

18 years ago
I tried a pref line to turn off format=flowed and make plain text
into format=fixed, i.e.

user_pref("mailnews.send_plaintext_flowed", false);

But this did not work. It did not change the plain text
mail to regular plain text mail or perform line-wrapping
on CJK mail as mail was generated. It was still marked as
format=flowed.

Comment 10

18 years ago
Could you file a seperate bug about the pref not turning format=flowed off and 
assign it to me? 

What the format=flowed does when composing mails is to space stuff some lines 
and add a space to the end of some others as well as stripping trailing spaces 
from a line. If that could interfere, it's certainly a problem with the 
standard. If it's just the line wrapping that's wrong it's in the same file but 
not actually a part of the format=flowed code.

Comment 11

18 years ago
>My initial idea was to only use format=flowed when knowing that the text was of 
>a western
I agree, until we figure out how to do this for other languages.
You can check a charset then generate format=flowed only if it's "us-ascii" or 
"ISO-8859-1".

Comment 12

18 years ago
I found an additional issue. 
The following is from RFC2546 page 5,

   [Quoted-Printable] encoding SHOULD NOT be used with Format=Flowed
   unless absolutely necessary (for example, non-US-ASCII (8-bit)
   characters over a strictly 7-bit transport such as unextended SMTP).
   In particular, a message SHOULD NOT be encoded in Quoted-Printable
   for the sole purpose of protecting the trailing space on flowed lines
   unless the body part is cryptographically signed or encrypted (see
   Section 4.6).

   The intent of Format=Flowed is to allow user agents to generate
   flowed text which is non-obnoxious when viewed as pure, raw
   Text/Plain (without any decoding); use of Quoted-Printable hinders
   this and may cause Format=Flowed to be rejected by end users.

We are putting Format=Flowed with bodies containing quoted-printable.
That needs to be fixed.

Comment 13

18 years ago
Because of the QP issue, we want to use format=flowed only for "us-ascii" for 
beta (sending 7bit with QP is our default).
Daniel, could you write a patch to do that?

Comment 14

18 years ago
I've seen that and noone would be happier than me if quoted printable was 
disabled by default for 8 bit mail. I think the world is ready for that now. 

Note that the RFC says that you shouldn't QP-encode a mail when it has been 
format=flowed, not the other way around. Furthermore, it leaves it open to use 
qp anyway with the phrase "unless absolutely necessary (for example, 
non-US-ASCII (8-bit) characters over a strictly 7-bit transport such as 
unextended SMTP)". If we use qp because we thinks that phrase apply, we could 
also use format=flowed. If we don't think that phrase apply, we shouldn't 
qp-encode 8 bit mail in the first place. 

What i want to say is that I'm reluctant to disable format=flowed for 8 bit 
e-mails for that reason. For one thing I could never more use format=flowed in 
mozilla. It's very difficult to write a mail in swedish without using an 8 bit 
character.

I would rather disable the qp thing in that case but I'm not sure how to do 
that. When looking at the source ot looks like it has something to do with some 
strict_mime-prefs. 

So what do you think? In any case, I think it's harmless for 8 bit e-mails.

Comment 15

18 years ago
There is a correction in my last comment. Sending 7bit with QP is NOT our 
default (it is for headers but not for bodies).
The strict_mime pref has UI, so the user can enable/disable. I think what we can 
do for now is to check the strict_mime pref value in addition to check charset 
names ("us-ascii" and "ISO-8859-1") to decide whether to generate format=flowed.
Does that sound good?
(Reporter)

Comment 16

18 years ago
Added John M. to CC.

Comment 17

18 years ago
Is this bug about CJK line breaking or QP?  If the former, could the QP issue be 
moved to another bug?

Comment 18

18 years ago
Okay, I'll file a separate bug for QP.

Comment 19

18 years ago
Filed a separate bug for QP issue 26904. Please add comments to that bug for QP 
issue.

Comment 20

18 years ago
My take:

A client that has better definition of "word" than RFC 2646 should use that 
better definition.  If Mozilla knows that text is Japanese, it should use the 
Japanese line-breaking rules for deciding when to insert soft line breaks for 
both generation and layout.

Technically, format=fixed does not allow for *any* line breaking during display.

Comment 21

18 years ago
Ok, the problem is that RFC 2646 provides no way to generate a soft line break 
without also generating a space.

Comment 22

18 years ago
So for CJK, lines are all fixed. As long as the lines are wrapped properly, it 
should work okay for CJK. But do we want to put format=flowed when the lines are 
actually not flowed for CJK (and languages which line breaks without a space)?
Also we need to check if layouts (both ours and other mailers) work with CJK 
when format=flowed is used (e.g., check if kinsoku works properly).

Comment 23

18 years ago
I have made a patch to nsMsgCompUtils.cpp and nsMsgCompoese.cpp that only allows 
format=flowed for us-ascii and iso-8859-?. If someone wants a charset added to 
the list of charsets that surely can cope with format=flowed, it's just to say. 
I will add the patch as an attachment to this bug so if someone (rhp?)can review 
and check it in, this bug should be "fixed". 
Status: NEW → ASSIGNED
Whiteboard: Fix waiting for someone to check it in.

Comment 24

18 years ago
Created attachment 5068 [details] [diff] [review]
The patch that fixes this bug.

Comment 25

18 years ago
Restricting format=flowed to a set of inclusive charsets is not the correct fix.
(Reporter)

Comment 26

18 years ago
Daniel, jgmyers, nhotta and I had a chat yesterday about format=flowed
and CJK after the  MozillaZine i18n session  
I'm going to attach a log of that discussion to this bug report.
Please take a look.

Comment 27

18 years ago
Should we provide a set of exclusive charsets instead?
Or can we provide non space soft break for CJK?
(Reporter)

Comment 28

18 years ago
Created attachment 5069 [details]
Log of 2/7/2000 chat on format=flowed: jgmyers, nhotta, & momoi

Comment 29

18 years ago
Naoki, 
Could you check this in today. I'm swamped with some other tree changes at the 
moment.

- rhp

Comment 30

18 years ago
I see, but I don't quite understand what the bug is and even less how to fix it. 

A comment to the log is that when format=flowed messages are recieved, they are 
not wrapped in a <PRE>...</PRE> but instead <blockquote> and <br> are used. And 
<tt> if the mail should be displayed with a fixed width font. 

If format=flowed is the thing breaking CJK mail, then the patch I attached could 
at least be seen as temporary workaround but that would make continued debugging 
hard. So what do you think? I'm sorry but since I don't know how to test this 
myself I have to rely on what you say.

Comment 31

18 years ago
I think the key points from the chat which relate to this bug are:

1) For layout, Mozilla can and should insert soft line breaks using Japanese 
line breaking rules.

2) For generation, format=flowed provides no method of inserting soft line 
breaks which does not also insert a space, so Japanese cannot use format=flowed 
soft line breaks.

3) For format=fixed, generation wraps lines using hard line breaks.  Having 
format=flowed generation wrap lines using hard line breaks is no worse.

4) The image attached to this bug looks like the result of a CRLF inserted while 
ISO 2022 was shifted into a non-ASCII mode.  Per the iso-2022-jp charset, the 
generator must shift back to us-ascii before inserting the CRLF.

Comment 32

18 years ago
It would help to have an attachment of the message corresponding to the attached 
image.  That way, we can determine whether it is the generation of format=flowed 
that is broken.

Comment 33

18 years ago
So if I understand things correctly:
1) has nothing to do with mail nor format=flowed but is a layout issue.
2) says that we have to disable format=flowed for Japanese.
3) and 4) the line wrapping may damage mail contents. This has nothing to do 
with format=flowed really.

Disabling format=flowed for Japanese is simple. The crude fix I attached could 
easily be changed to that.

Line wrapping must be changed to change mode(?) before inserting line breaks and 
than back again before letting text continue. Does the same apply to quote signs 
(>), ordered lists, (1., 2., etc) and unordered lists (*)? Is this only with a 
certain charset or with several? How do one know which mode(?) the text is in?

Comment 34

18 years ago
(2) does not say we have to disable format=flowed for Japanese.

(4) is an issue for any stateful charset where CRLF is only allowed in a strict 
subset of the possible states.

Comment 35

18 years ago
Well, (2) could be implemented by never using soft line breaks in format=flowed, 
but without soft line breaks there are no reason at all to use format=flowed so 
I think it's honest to disable the whole thing until the RFC hava come up with 
an alternative solution.

For (4), how is that solved in for instance Netscape 4? It sounds like it must 
be a really hard problem and way out of anything I can do in a week on my spare 
time.

Comment 36

18 years ago
For (2), the points mentioned in my last comments are,
Do we want to put format/flowed even all lines are fixed for Japanese?
Are we sure that putting format/flowed for Japanese do not break layouts 
(mozilla and others)?
I am not arguing about applying format/flowed to CJK should work or not.
For (4), we can investigate and estimate how much work needed for fixing it.

Comment 37

18 years ago
Without format=flowed, layout is technically not allowed to wrap lines.

There's no reason not to use format=flowed; the reason to use it is to remove 
the need for a format=fixed generator.

(4) is an issue for both format=flowed and format=fixed.  Whatever solution 
exists for format=fixed can be directly applied to format=flowed.

Comment 38

18 years ago
> Whatever solution exists for format=fixed can be directly applied to 
> format=flowed.
That's a theory. It depends on the implementation (e.g., when is the generation 
applied before/after convert to ISO-2022-JP, if the generation code does not 
understand ISO-2022-JP then that also need changes, etc...).

Comment 39

18 years ago
I thought of a reason why we might not want to use format=flowed for Japanese 
text.  With format=fixed and the "wrap incoming text" preference enabled, layout 
treats all lines as if they are flowed.  With format=flowed, all lines not 
ending with a space should be treated as fixed, regardless of the preference 
setting.  So emitting a hard line break in format=flowed could end up being 
worse than emitting it in format=fixed.

I still would like to see the message corresponding to the attached image.
(Reporter)

Comment 40

18 years ago
John, I'll send it you later this afternoon -- I'm involved in something else currently.
(Reporter)

Comment 41

18 years ago
I sent you the msgs using 2/7 build but the problem did not occur. 
My original problem was reporte dwith 2/4 build. Someone migh fixed the corruption problem.
I'll try out the new build again and report back.

Comment 42

18 years ago
The message you sent me had a hard line break (CRLF without preceeding SP) in 
the middle of a JIS X 0208-1983 sequence.  This is not legal ISO-2022-JP.
(Reporter)

Comment 43

18 years ago
John, I created 3 paragraphs in that message. It's not all one paragraph.

Comment 44

18 years ago
Yes, the third paragraph had the error.

Comment 45

18 years ago
Oops, never mind, that CRLF was added by the 4.7x client in the process of 
saving the message to a file.  The message itself appears to have no errors, 
other than having lines longer than 996 characters.

Comment 46

18 years ago
...plus each paragraph should end with a single SP, in order to mark the line as 
flowed.

Comment 47

18 years ago
Subject: Bugzilla bug 26734
   Date: Mon, 07 Feb 2000 17:18:28 -0800
   From: leger@netscape.com (Jan Leger)
     To: bobj@netscape.com

Bob-

Can you PDT+  or  PDT- this one, we're not sure.  -Jan
======================================================
With the powers vested in me by the PDT, I declare this PDT-.

This is a great feature, but not a Beta1 stopper.
If it does not work for CJK, then disable it by default for now and
resolve it post-Beta1.
Whiteboard: Fix waiting for someone to check it in. → [PDT-] Fix waiting for someone to check it in.

Comment 48

18 years ago
So the primary problem is line breaking/wrapping. There is no difference when 
using format=flowed and normal mails in the algorithm that finds the place to 
put a CRLF. Right now, when a line gets a little longer than the wrap length, 
the algoritm searches for a space using nsString::IsSpace(...), and finding a 
space it is replaced by CRLF.
Whiteboard: [PDT-] Fix waiting for someone to check it in. → [PDT-] Patch to disable format=flowed for certain charsets waiting for someone to check it in.

Comment 49

18 years ago
I've read through the history and comments and also the IRC log, and I agree 
with John's comments.

When generating Format=Flowed, soft breaks are only inserted after an existing 
space.  The generating agent SHOULD NOT insert a space, because that changes the 
text.  (However, if you are generating very long runs of octets with no spaces, 
you get into trouble, for example, with lines longer than 998 octets.)  If you 
already have guidelines for inserting CRLFs into non-Western charsets, you 
should use them.

It is the intent of Format=Flowed that non-flowed lines shouldn't be wrapped 
on display (they might be ASCII art or source code or a spreadsheet or 
something).  One of the advantages of F=F is that you can now tell flowable 
lines from non-flowable.

Comment 50

18 years ago
In discussing this with client engineers, the suggestion is that when dealing 
with non-Western charsets:

1.  When generating, insert soft breaks between characters, not between spaces.
2.  When receiving, delete the entire soft break (the SP CRLF).

Comment 51

18 years ago
Randy, is "utf-8" a "Western" or "non-Western" charset?

Comment 52

18 years ago
How about "when receiving, if the character before the SP of a soft break is 
CJK, delete the SP as well".

We'd need a good definition of "CJK"  I'm thinking something like U3000-UD7FF, 
UF900-UFAFF.  I don't know what todo with UFF00-UFFEF.

Comment 53

18 years ago
I think Korean uses spaces for word break.

Comment 54

18 years ago
Man, we really need a ZERO WIDTH SOFT HYPHEN character.

For utf-8 and other unicode charsets, format=flowed could be updated to say that 
a CRLF after ZERO WIDTH SPACE is a soft break, delete both the CRLF and ZERO 
WIDTH SPACE.

iso-2022-jp doesn't have an encoding of ZERO WIDTH SPACE, so we would seem to be 
stuck with having the receiver be charset dependent, deleting the space only for  
a specific list of charsets.

Another possibility is to revise format=flowed to say that TAB CRLF is a soft 
break, delete the TAB as well as the CRLF.

Comment 55

18 years ago
Don't forget that you cannot make any assumptions regarding what the receiving 
UA will do. The mail must be fine even if the receiving sida doesn't remove any 
inserted characters.

Updated

18 years ago
Target Milestone: M17

Comment 56

18 years ago
Katsuhiko Momoi wrote
k>oes not seem to provide for languages which do not contain
k>spaces as word-breaks. For example, there is no space between words in
k>pure Japanese, Chinese, and Korean text generally.

   This is simply not true for Korean.
  Korean text IS different from Chinese and Japanese in that
we DO use space as word delimeter. Of course, that does NOT
mean that lines can be broken ONLY at spaces as is currently
implemented in Mozilla 5.x(as of 20000321) and Netscape 3.x/4.x. Lines can be
broken at ANY syllable boundaries including space. 
(Reporter)

Comment 57

18 years ago
Thanks, jung-shik for the correction. I did realize my mistake but have forgotten to
correct it. Now you have. So the problem I describe here applies mainly to Japanese
and Chinese.

Comment 58

18 years ago
Are there many non-MIME UAs in Japan and China?  How do they deal with line 
wrapping?

Updated

18 years ago
Blocks: 14744

Comment 59

17 years ago
Is this completely fixed by nhotta's switch to the use of the linebreaker 
module? If noone objects, I'll mark this bug fixed in a couple of days.

Comment 60

17 years ago
I don't think this is completely fixed; the existing linebreaker module 
generates hard line breaks
causing paragraphs to occasionally format with short lines.

Does anyone have a feel for the prevalance of non-MIME UAs in this space?  Would 
it be reasonable to instead break lines with quoted-printable soft breaks?

Comment 61

17 years ago
We're discussing this within Eudora.  Comments that have come up include: using 

TAB CRLF will break a lot of stuff.  Using Q-P soft line breaks will cause 

problems because lots of clients mess that stuff up.  Having a rule to delete the 

SP CRLF if the preceding character is CJK means that when sending to older 

clients, a space will be left between characters, which looks bad.  Another 

suggestion is to update the F=F spec to say that the SP CRLF sequence should 

always be deleted, which means that SP SP CRLF needs to be used for soft breaks 

in Latin charsets.  This of course causes an extra space to be left between words 

in current F=F clients, and in non-F=F clients.

Comment 62

17 years ago
Just to clarify somewhat my comment above: the suggestion to add a rule to
delete the SP CRLF if the preceding character is CJK means that an older client
will leave the space, causing a space to be left between CJ characters, which
looks bad.  The suggestion to update the F=F spec to say that SP SP CRLF should
always be deleted would require an additional parameter, e.g.,
    Content-Type=text/plain; format=flowed; flowed-version=2
Or something equally ugly; otherwise old text sent to new clients would have no
spaces between words.  Even with the extra parameter, new text sent to old
clients will have an extra space between words. 

Comment 63

17 years ago
Randy, what do you recommend we do now?  I like the idea of a spec revision, but 
we're running up against a 5/16 deadline.

You could revise the F=F spec to specify that TAB CRLF always get deleted.  That 
way you could get away without a flowed-version parameter.

Comment 64

17 years ago
(Actually, the two spaces idea doesn't help -- when sending to old clients, an extra space will always be there.)

In addition to the problems that some existing clients would have with tab, it would still leave a tab between CJK characters.  Would that be better than a space?

It seems that an extra, unwanted character will be present between CJK characters when receivied by older clients.

Comment 65

17 years ago
Our Japanese team says tab is worse than space.

Not using F=F causes just as bad problems with poor line wrap and quotes in Japanese as in English, I'm told.

The proposed spec update is:

1.  On generation, add SP CRLF where soft line break is desired following or preceding a CJK character.
2.  If a space is desired at the point where a soft line break is being added (i.e., the user typed a space) add an
    extra space: SP SP CRLF.
3.  On receiving, if you see 1*SP CRLF, and the character before the space(s) is CJK, or the character
    following the CRLF is CJK, delete the SP as well as the CRLF.

Comment 66

17 years ago
Old F=F clients and non-F=F clients will treat TAB CRLF as whitespace at the end 
of the line followed by a hard line break.  This is no worse than the current 
alternative, which is to insert a hard line break.  Since TAB is whitespace, it 
will be invisible at the end of the line.

Comment 67

17 years ago
Sorry for the lateness of this comment: Ending lines with TAB isn't going to fly. I know too many cases where
TABs will do weird things, even if they are at the ends of lines (like ending up with double spacing because
the TAB wraps).

Randy's statement of the way to proceed seems to be the best; i.e., if you've got a CJK character on either side of the CRLF, remove one SP and the CRLF.

Comment 68

17 years ago
How do you define "a CJK character"?

Updated

17 years ago
Target Milestone: M17 → M20
(Reporter)

Comment 69

17 years ago
I think John asked an important question. 
How do we define CJK characters? For native encodings, 
for sure, we know. But CJK characters are also written 
in Uncode. If we have mixed script of ASCII & Japanese,
it could be hard to draw a clear distinction.

Some of the people on this bug report met at IMC MailConnect 6 in
San Jose recently and talked briefly about this issue.
Randy re-iterated his 5/11 proposal and John proposed 
the "tab" solution as seen above. 

So there is discussion going on.

In the meantime, we have a practical problem to solve for
Beta 2. If I'm not mistaken, we are inserting space+CR 
into Japanese plain text mail. So far, I have seen only 
Mozilla being able to understand this. Presumably, a new
version of Eudora will or is able to uderstand format-flowed.

Given that we probably will inconvenience many other mailers, 
I have a proposal for Beta 2. 

A. Let's consider a workaround -- Daniel, do you have a patch to
   make it format=fixed rather than format=flowed when the send 
   charset is one of Japanese or Chinese? (We'll put aside 
    Unicode issue unles we know a reasonable approach to John's
    question.) All other charsets can remain format=flowed as the
    default.
B. We can put in this workaround until the dust settles and then
   we can evaluate post Beta 2. 
C. One way to go in future is to have an option to turn on/off
   format=flowd in UI. For Japan, we might have to ship
   Beta 2 with the pref set to OFF without the UI. 

Daniel, do you have a safe workaround fix? If so, please tell us. I want 
this bug to move forward to a Beta 2 workaround solution. We can keep 
it open and see how we progress on revision of RFC 2646. 

My poitn again is that we should not be sending out Japanese or
Plain text message which does not conform to an established 
standard. 

Nominatinf ro nsbeta2 and removing [-PDT]. 
Keywords: beta1 → nsbeta2
Whiteboard: [PDT-] Patch to disable format=flowed for certain charsets waiting for someone to check it in. → Patch to disable format=flowed for certain charsets waiting for someone to check it in.

Comment 70

17 years ago
The attachment from 02/08/00 11:32 was a safe workaround but since then things 
have changed. The function where the decision about format=flowed or 
format=fixed is made (ConvertBufToPlainText in nsMsgCompUtils) no longer knows 
the target charset which it used to know. 

Rich removed it in version 1.67 of the file (it wasn't used then) but maybe we 
can put it back. Rich?

If we do it, it's easy to check the charset against a list of "good" (or "bad" 
depending on our approach) charsets when turning format=flowed on or off.

Comment 71

17 years ago
If you want to make the changes, thats fine. Just let me know what you want me 
to review. 

- rhp

Comment 72

17 years ago
Putting on [NEED INFO] radar.  Sending to intl folks to get help on PDT call.
Whiteboard: Patch to disable format=flowed for certain charsets waiting for someone to check it in. → [NEED INFO]Patch to disable format=flowed for certain charsets waiting for someone to check it in.

Comment 73

17 years ago
What I need to know is for which charsets we want to disable format=flowed and 
their exact name (for instance if they uses underscores or hyphens).
Shift_JIS?
ISO-2022-JP?
EUC_JP?
Big5?
GBK?
HZ?
EUC_TW?
EUC_KR?
GB2312?

Comment 74

17 years ago
I think you can use nsICharsetConverterManager2 interface.
A sample code is here.
http://lxr.mozilla.org/seamonkey/source/mailnews/base/util/nsMsgI18N.cpp#440

Comment 75

17 years ago
Putting on [nsbeta2+] radar.
Whiteboard: [NEED INFO]Patch to disable format=flowed for certain charsets waiting for someone to check it in. → [nsbeta2+]Patch to disable format=flowed for certain charsets waiting for someone to check it in.
(Reporter)

Comment 76

17 years ago
Daniel, please let us have an option to turn on
format-flowed to these language via prefs.js item
so that in the near future when the RFC issue is settled,
we can begin to use format=flowed for testing purposes.

Comment 77

17 years ago
So. I've reimplemented the ability to switch off format=flowed depending on 
charset. I had to change a little more than I would have wanted, but I think 
it's a safe fix. A say "think" because this is unchartered land for me. 

For those daring ones who think format=fixed are for cowards, I added a pref
mailnews.disable_format_flowed_for_cjk. Set it to false and there will be no 
check at charsets when deciding if format=flowed is good or not.

The check right now is to disable format=flowed if 
nsMsgI18Nmultibyte_charset(charset) is true.

I will attach the patch for review.

Comment 78

17 years ago
Created attachment 9947 [details] [diff] [review]
New patch that disables format=flowed for multibyte charsets

Comment 79

17 years ago
I'll put this on M17, not because the issue will be resolved then, but as a 
reminder to check in the bandage.

Target Milestone: M20 → M17

Comment 80

17 years ago
Created attachment 9982 [details] [diff] [review]
Patch after review by nhotta

Comment 81

17 years ago
I rewrote some parts of the patch after some suggestions and good advises by 
nhotta. This new patch is much smaller too. Rich, could you review the last 
patch I attached?

Comment 82

17 years ago
Hi Naoki,
The patch looks good to me.

- rhp

Comment 83

17 years ago
Feel free to check it in. I'm still an unresponsible person without checkin 
privileges. :-)

Comment 84

17 years ago
Let me test the last patch and I will check in.

Comment 85

17 years ago
I checked in the patch yesterday.
Status: ASSIGNED → RESOLVED
Last Resolved: 17 years ago
Resolution: --- → FIXED

Comment 86

17 years ago
I will reopen this. We have a workaround, but the fix (to be able to use 
format=flowed with all charsets) doesn't exist yet. This has no longer any 
target milestone but waits for an updated RFC. Hopefully.
Clearing nsbeta2 keyword and status.
Status: RESOLVED → REOPENED
Keywords: nsbeta2
Resolution: FIXED → ---
Whiteboard: [nsbeta2+]Patch to disable format=flowed for certain charsets waiting for someone to check it in.
Target Milestone: M17 → Future

Updated

17 years ago
Status: REOPENED → ASSIGNED
(Reporter)

Comment 87

17 years ago
** Checked with 6/12/2000 late night build for Win32 **

The above build includes the patch authorde by Daniel and
checked in by nhotta.

Sfter looking at Plain text mail send for JPN (ISO-2022-JP),
Chinese (Big5 & GB2312), Korean (Unicode), and Unicode (UTF-8),
I am satisfied that these on the defaut menu are the only
ones which actually send out format=fixed msgs for plain text.
For other items on the menu like ISO-8859-x, KOI8-R, etc.
format=flowed msgs go out.

This confirms the workaround fix on an M17 build.

We will levae thi bug open for when the RFC has an 
agreed upon method to do format=flowed for these languages.
We could have left out EUC-KR but hopefully that would not
inconvenience a lot of people at this stage. Other unresolved
issue is Thai which uses dictionary-based line breaking and
has no discernible line breaks either. Don't know what we should do 
about it. Perhaps, we can leave it out of the list later also.
(Reporter)

Updated

17 years ago
Keywords: intl

Updated

14 years ago
Blocks: 168420

Comment 88

14 years ago
Maybe Bug 231701 is helpful to solve your problems: A _suggestion_ for a new 
standard defines "format=flowed delsp=yes" which means that a space before the 
CRLF linebreak is logically removed before the message is shown. So, Mozilla can 
add a space and CRLF (format=flowed) where the new linebreak should be without 
having a space in the final message that is shown.

The only open question would be: Where can/should such linebreaks be added?
Product: MailNews → Core
Is RFC 3676 already changed to "Standard" from "Draft"?
http://www.ietf.org/rfc/rfc3676.txt
Blocks: 231701

Updated

13 years ago
No longer blocks: 231701
Depends on: 231701
WADA in comment #89:
> Is RFC 3676 already changed to "Standard" from "Draft"?
> http://www.ietf.org/rfc/rfc3676.txt

dunno, but daniel writes:
"It's probably still a problem since it was something the specification authors had forgotten about, though for such messages format=flowed has been disabled to avoid destroying people's mail.

I won't look at it more since I've upgraded myself to Opera after my years with Mozilla."

reassigning to default
Assignee: bratell → nobody
Status: ASSIGNED → NEW
QA Contact: momoi → mime

Updated

11 years ago
OS: Windows NT → All

Comment 91

9 years ago
FYI, RFC 3676 is IETF standards track.  You can check the status for an RFC with
a URL like this:

  http://tools.ietf.org/html/rfc3676

You will see RFC 3676 is "PROPOSED STANDARD" (the same standard status as IMAP).
Product: Core → MailNews Core
Priority: P3 → --
Target Milestone: Future → ---

Updated

7 years ago
Depends on: 553540
depend on 553526.  Before passing body text to serializer, unexpected space is inserted to by wrap setting.
Depends on: 553526

Updated

7 years ago

Updated

7 years ago
Assignee: nobody → m_kato
Created attachment 448471 [details] [diff] [review]
WIP v0.1
Blocks: 355209

Comment 94

6 years ago
The problem still exists, see the screen shot and detailed report on:
http://forums.mozillazine.org/viewtopic.php?f=39&t=2315453

Also, see the bug report:
https://bugzilla.mozilla.org/show_bug.cgi?id=653342

Thanks.

Updated

5 years ago
Blocks: 653342
Created attachment 646002 [details] [diff] [review]
V1
Attachment #448471 - Attachment is obsolete: true
Because rfc3676 is already changed to official RFC, "format=flowed,delsp=yes in text/plain mail composition or text/plain part generation" is sufficient as solution of this bug, isn't it?
Created attachment 648253 [details] [diff] [review]
xpcshell tests

This tests depends on the test for bug 653342.

I hope this test will work as expected but I am not 100% sure because there is no fix for this issue yet. ;-p
See Also: → bug 653342
See Also: bug 653342
Created attachment 8531547 [details] [diff] [review]
Part 1: Move getMsgHeaders into test_message-helpers and rename it to get_message_contents

This utility method can be also used for unit test for this issue.
Attachment #648253 - Attachment is obsolete: true
Attachment #8531547 - Flags: review?(Pidgeot18)
Created attachment 8531548 [details] [diff] [review]
Fix with a mozmill test

https://treeherder.mozilla.org/ui/#/jobs?repo=try-comm-central&revision=34c80ed7d9b3

I forgot to add https://bugzilla.mozilla.org/attachment.cgi?id=8530843 in bug 553526 so some mozmill tests failed there.
I will post a new patch which will pass all mozmill tests without the fix for bug 553526.
Attachment #646002 - Attachment is obsolete: true
Created attachment 8531573 [details] [diff] [review]
Part 2: Use delsp=yes for text/plain message

https://treeherder.mozilla.org/ui/#/jobs?repo=try-comm-central&revision=f7d8285a17cf

This try seems good.
Attachment #8531548 - Attachment is obsolete: true
Attachment #8531573 - Flags: review?(Pidgeot18)
Note about mozmill tests for this issue.

After bug 1069518, we can not implement nsIEditor in JS so we can not write any xpcshell tests for this issue.

Writing mozmill test that 'delsp=yes' message can send correctly needs mailnews/test/fakeserver/smtpd.js. And smtpd.js imports auth.js from resource://testing-common/mailnews. But testing-common path can not be used in mozmill tests now. I've open a new bug for it. Bug 1107045. I will write more tests for this issue after bug 1107045 is ficed.
Comment on attachment 8531547 [details] [diff] [review]
Part 1: Move getMsgHeaders into test_message-helpers and rename it to get_message_contents

Review of attachment 8531547 [details] [diff] [review]:
-----------------------------------------------------------------

::: mail/test/mozmill/shared-modules/test-message-helpers.js
@@ +49,5 @@
> +/**
> + * Helper to get the full message content.
> + *
> + * @param aMsgHdr: nsIMsgDBHdr object whose text body will be read
> + * @return {Map(partnum -> message headers), Map(partnum -> message text)}

This documentation is wrong--you're not mentioning the object property names.
Attachment #8531547 - Flags: review?(Pidgeot18) → feedback+
(In reply to Hiroyuki Ikezoe (:hiro) from comment #101)
> Writing mozmill test that 'delsp=yes' message can send correctly needs
> mailnews/test/fakeserver/smtpd.js. And smtpd.js imports auth.js from
> resource://testing-common/mailnews. But testing-common path can not be used
> in mozmill tests now. I've open a new bug for it. Bug 1107045. I will write
> more tests for this issue after bug 1107045 is ficed.

If you just need a body for the message, you can steal the draft-based approach I used in the test-multipart-related.js approach (which was split off from my large xpcshell test testing all of these conditions precisely because it needed an editor). There is very little done in SMTP- or NNTP-specific code that is not done when saving via draft, so I would be surprised if a draft-based approach were insufficient to test the code.
I've noticed that between this bug and bug 553526, as well as a few others in the tangled web of related bugs, you seem to be attempting to work on the problems surrounding the mess that is line wrapping in MIME messages. However, the patch in this bug and the patch in bug 553526 do not appear to fit together into one coherent story. This patch implies that delSp=true will be in effect whenever format=flowed is in effect, while bug 553526 implies they are two separate, independently configurable options. Could you please explain your overarching approach to these issues, and how all of these patches fit together? This is unfortunately necessary before I can try to fully review the remaining patches.
(In reply to Joshua Cranmer [:jcranmer] from comment #103)
> (In reply to Hiroyuki Ikezoe (:hiro) from comment #101)
> > Writing mozmill test that 'delsp=yes' message can send correctly needs
> > mailnews/test/fakeserver/smtpd.js. And smtpd.js imports auth.js from
> > resource://testing-common/mailnews. But testing-common path can not be used
> > in mozmill tests now. I've open a new bug for it. Bug 1107045. I will write
> > more tests for this issue after bug 1107045 is ficed.
> 
> If you just need a body for the message, you can steal the draft-based
> approach I used in the test-multipart-related.js approach (which was split
> off from my large xpcshell test testing all of these conditions precisely
> because it needed an editor). There is very little done in SMTP- or
> NNTP-specific code that is not done when saving via draft, so I would be
> surprised if a draft-based approach were insufficient to test the code.

Actually I do not know it's sufficient or not because code paths of draft and sending message were hard to read for me. So I just thought we should write sending case for the safety.
(In reply to Joshua Cranmer [:jcranmer] from comment #104)
> I've noticed that between this bug and bug 553526, as well as a few others
> in the tangled web of related bugs, you seem to be attempting to work on the
> problems surrounding the mess that is line wrapping in MIME messages.
> However, the patch in this bug and the patch in bug 553526 do not appear to
> fit together into one coherent story. This patch implies that delSp=true
> will be in effect whenever format=flowed is in effect, while bug 553526
> implies they are two separate, independently configurable options. Could you
> please explain your overarching approach to these issues, and how all of
> these patches fit together? This is unfortunately necessary before I can try
> to fully review the remaining patches.

In short:

* As a message sender Thunderbird should always use "delsp=yes". (this bug)
* As a message receiver Thunderbird should care both of "delsp=yes" and no "delsp". (bug 553526)
Comment on attachment 8531573 [details] [diff] [review]
Part 2: Use delsp=yes for text/plain message

Review of attachment 8531573 [details] [diff] [review]:
-----------------------------------------------------------------

I have not fully played with this patch yet (there's one change in nsMsgSend.cpp that scares me, noted below). At the very least, I think this patch needs some tests in HTML scenarios as well, since some of the changes you are poking at affect HTML logic.

::: mail/test/mozmill/composition/test-delsp.js
@@ +48,5 @@
> +}
> +
> +const kMultibyteWord = "いろはにほへと";
> +const kRepeatCount = 1000;
> +let longMultibyteLine = create_long_multibyte_line(kRepeatCount);

const kLongMultibyteLine = kMultibyteWord.repeat(kRepeatCount);
(It's awfully nice of JS to give us a very easy way to make very long strings. ☺)

::: mailnews/compose/src/nsMsgSend.cpp
@@ +1525,5 @@
>  
>    //
>    // Query the editor, get the body of HTML!
>    //
> +  uint32_t  flags = nsIDocumentEncoder::OutputRaw | nsIDocumentEncoder::OutputNoFormattingInPre;

This change is especially scary to me. As I understand it, this means we are now relying entirely on EnsureLineBreaks to keep the content tight enough, and that method has a pretty lousy line break mechanism. I will not accept a change here without some tests that specifically test overly-long lines in HTML.

@@ +3152,5 @@
>    body.Trim(" ", false, true);
>  
>    if (body.Length() > 0)
>    {
> +    if (!PL_strcmp(attachment1 [details] [diff] [review]_type, TEXT_HTML)) {

This condition should probably be inverted--if the body is text/plain, don't run the EnsureLineBreaks, otherwise run EnsureLineBreaks.

I don't think we get this far with bodies that aren't text/html or text/plain, but it never hurts to be safe.
Attachment #8531573 - Flags: review?(Pidgeot18) → feedback+
Created attachment 8537112 [details] [diff] [review]
Part 0 set_signature_prefs_in_each_test.patch

This patch is the same as attachment #8530843 [details] [diff] [review] in bug 553526.
Attachment #8537112 - Flags: review+
Created attachment 8537113 [details] [diff] [review]
Part 1 move_getMsgHeaders_into_helpers.patch

Fixed the comment.
Attachment #8531547 - Attachment is obsolete: true
Attachment #8537113 - Flags: review?(Pidgeot18)
Created attachment 8537115 [details] [diff] [review]
Part 2 use_delsp.patch

(In reply to Joshua Cranmer [:jcranmer] from comment #107)
> ::: mail/test/mozmill/composition/test-delsp.js
> @@ +48,5 @@
> > +}
> > +
> > +const kMultibyteWord = "いろはにほへと";
> > +const kRepeatCount = 1000;
> > +let longMultibyteLine = create_long_multibyte_line(kRepeatCount);
> 
> const kLongMultibyteLine = kMultibyteWord.repeat(kRepeatCount);

Done.

> ::: mailnews/compose/src/nsMsgSend.cpp
> @@ +1525,5 @@
> >  
> >    //
> >    // Query the editor, get the body of HTML!
> >    //
> > +  uint32_t  flags = nsIDocumentEncoder::OutputRaw | nsIDocumentEncoder::OutputNoFormattingInPre;
> 
> This change is especially scary to me. As I understand it, this means we are
> now relying entirely on EnsureLineBreaks to keep the content tight enough,
> and that method has a pretty lousy line break mechanism. I will not accept a
> change here without some tests that specifically test overly-long lines in
> HTML.

This was a mistake. Removed the change in this patch.

> @@ +3152,5 @@
> >    body.Trim(" ", false, true);
> >  
> >    if (body.Length() > 0)
> >    {
> > +    if (!PL_strcmp(attachment1 [details] [diff] [review] [diff] [review]_type, TEXT_HTML)) {
> 
> This condition should probably be inverted--if the body is text/plain, don't
> run the EnsureLineBreaks, otherwise run EnsureLineBreaks.

Done.
Attachment #8537115 - Flags: review?(Pidgeot18)
Created attachment 8537118 [details] [diff] [review]
Part 3 parse_flowed_draft_message.patch

This patch aims to parse format=flowed message on editing the message 'as new'.
Attachment #8537118 - Flags: review?(Pidgeot18)
I will post those patch series to try server once bug 1111304 is fixed.
Attachment #8531573 - Attachment is obsolete: true
No longer depends on: 553526
https://treeherder.mozilla.org/ui/#/jobs?repo=try-comm-central&revision=dab45275104a
Unfortunately there are some failures but all failures are not related to this issue.
Attachment #8537113 - Flags: review?(Pidgeot18) → review+
Comment on attachment 8537115 [details] [diff] [review]
Part 2 use_delsp.patch

Review of attachment 8537115 [details] [diff] [review]:
-----------------------------------------------------------------

To convince myself that these changes were okay, I started doing some deep dives of nsMsgSend-related code. It's all ugly done there. :-(

::: mail/test/mozmill/composition/test-delsp.js
@@ +48,5 @@
> +
> +function create_delsp_message(aMessage) {
> +  let msgCompose = Cc["@mozilla.org/messengercompose/compose;1"]
> +                     .createInstance(Ci.nsIMsgCompose);
> +  // 2: character width of each characaters in kMultibyteWord

Nit: characters
Attachment #8537115 - Flags: review?(Pidgeot18) → review+
Comment on attachment 8537118 [details] [diff] [review]
Part 3 parse_flowed_draft_message.patch

Review of attachment 8537118 [details] [diff] [review]:
-----------------------------------------------------------------

This is some code where I'd really like to know the history since it seems so brain-damaged, but that history has alas gone to the great bit bucket in the sky.

Excepting the unnecessary parts of the condition in mimetpfl.cpp and the nits, this patch looks good and shouldn't cause problems.

::: mail/test/mozmill/composition/test-delsp.js
@@ +105,5 @@
> +  let delspBody = create_delsp_message(longMultibyteLine);
> +  let converter = Cc["@mozilla.org/intl/scriptableunicodeconverter"].
> +    createInstance(Ci.nsIScriptableUnicodeConverter);
> +  converter.charset = "UTF-8";
> +  let convertedBody = converter.ConvertFromUnicode(delspBody);

I'm wary of using nsIScriptableUnicodeConverter (I'm not convinced it won't get deleted on us before long), but the message generator code only works on binary strings and not typed arrays, so there's not really a good alternative.

::: mailnews/mime/src/mimemsg.cpp
@@ +180,3 @@
>      {
> +      // if we are processing a flowed plain text line, we need to parse the
> +      // line in mimeInlineTextPlainFlowedClass

Nit: capitalize the If and add punctuation.

::: mailnews/mime/src/mimetpfl.cpp
@@ +325,5 @@
>    }
>  
> +  if (obj->options &&
> +      obj->options->decompose_file_p &&
> +      !obj->options->is_multipart_msg &&

This part of the condition is not necessary.

@@ +327,5 @@
> +  if (obj->options &&
> +      obj->options->decompose_file_p &&
> +      !obj->options->is_multipart_msg &&
> +      obj->options->decompose_file_output_fn &&
> +      !obj->options->decrypt_p)

Nor is this one. (I think).

@@ +330,5 @@
> +      obj->options->decompose_file_output_fn &&
> +      !obj->options->decrypt_p)
> +  {
> +    return obj->options->decompose_file_output_fn(line,
> +                                                   length,

Nit: a tad too many spaces.
Attachment #8537118 - Flags: review?(Pidgeot18) → review+
Removing myslef on all the bugs I'm cced on. Please NI me if you need something on MailNews Core bugs from me.
(Assignee)

Comment 117

2 years ago
Just curious:
A bug which is 15 years old, is classified 'critical', has 11 votes, has 4 patches with r+.
Is this going to land one day?

Kent, do you think we can find someone to finish this off?

BTW, how does this relate to bug 553526?

I believe we have a strong user base in Japan, so perhaps it's about time we fix this.
Flags: needinfo?(rkent)
Flags: needinfo?(Pidgeot18)
(In reply to Jorg K (GMT+1) from comment #117)
> Just curious:
> A bug which is 15 years old, is classified 'critical', has 11 votes, has 4
> patches with r+.
> Is this going to land one day?

Good catch Jorg.

It won't land from hiro - he had to leave. He was very prolific developer for us, and so there also many other patches of his that need attention.  I've just marked them up so that other developers can pick from them http://mzl.la/1HRuaJs


> Kent, do you think we can find someone to finish this off?
> 
> BTW, how does this relate to bug 553526?
> 
> I believe we have a strong user base in Japan, so perhaps it's about time we
> fix this.

Yes we do. Unfortunately have almost no developer representation from there.
Flags: needinfo?(Pidgeot18)
(Assignee)

Comment 119

2 years ago
As far as I can see, this bug is assigned to Makoto Kato, not Hiro.

Wayne, please leave the NI as it is, since I'd like to draw attention to this bug. Since we have r+ on all four patches, I could just fix the nits of the last review and then have it all (crash-)landed.

But I prefer not to do that. Perhaps Joshua has a few spare cycles to take another look at this and get it landed. Or even Magnus, who has worked in international stuff before.
Flags: needinfo?(mkmelin+mozilla)
Flags: needinfo?(Pidgeot18)
(In reply to Jorg K (GMT+1) from comment #119)
> As far as I can see, this bug is assigned to Makoto Kato, not Hiro.
> 
> Wayne, please leave the NI as it is, since I'd like to draw attention to
> this bug. Since we have r+ on all four patches, I could just fix the nits of
> the last review and then have it all (crash-)landed.

I don't see any problem in doing that, do you?

> But I prefer not to do that. Perhaps Joshua has a few spare cycles to take
> another look at this and get it landed. Or even Magnus, who has worked in
> international stuff before.

frankly IMO joshua's time is better spent on other issues. he barely has time to do his own reviews, which is one reason I took him off.
(Assignee)

Comment 121

2 years ago
(In reply to Wayne Mery (:wsmwk, use Needinfo for questions) from comment #120)
> frankly IMO joshua's time is better spent on other issues. he barely has
> time to do his own reviews, which is one reason I took him off.
Frankly, Joshua has already poured hours into this (see for example comment #114).
So I see two options: We let everyone's work on this go to waste, or we confirm that the overall solution will hold, and then I can do the leg-work of fixing nits and un-rotting it where required. Can we please take the "resource" discussion elsewhere.
(Assignee)

Comment 122

2 years ago
(In reply to Wayne Mery (:wsmwk, use Needinfo for questions) from comment #120)
> > I could just fix the nits of
> > the last review and then have it all (crash-)landed.
> I don't see any problem in doing that, do you?

I do. Having things landed means that you need to have at least a basic understanding of what's going on.

I think the first step is always to reproduce the problem. So here is what I've done:
1) Checked that mailnews.send_plaintext_flowed is set to 'true'
2) Created a plain-text message and pasted the text from comment #1 a good lot of times.
   Yes, annoyingly, I get one long line which doesn't wrap.
3) <CTRL><SHIFT><ENTER> so place the message into the outbox. This is what I get:

MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: base64

44GT44KM44Gv6ZW344GE5pel5pys6Kqe44Gu44OG44Kt44K544OI44Gn44GZ44Gu44Gn44CB
6KGM44GM5oqY44KM44KL44Go5oCd44GE44G+44GZ44CC44GT44KM44Gv6ZW344GE5pel5pys
...

Since the message is base64 encoded, there is no problem. I tried another encoding and get:

MIME-Version: 1.0
Content-Type: text/plain; charset=Shift_JIS
Content-Transfer-Encoding: base64

grGC6oLNkreCopP6lnuM6oLMg2WDTINYg2eCxYK3gsyCxYFBjXOCqpDcguqC6YLGjnaCooLc
greBQoKxguqCzZK3gqKT+pZ7jOqCzINlg0yDWINngsWCt4LMgsWBQY1zgqqQ3ILqgumCxo52

So whilst the long line is still a problem, I can't reproduce the data corruption issue.

Next I pasted the text from comment #1 into Notepad++ first, then copied it from there and pasted it into a TB message.

Result: Wrapping happened as desired, so apparently another clipboard "flavour" is pasted.

Sending this I got:

MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

or

MIME-Version: 1.0
Content-Type: text/plain; charset=Shift_JIS
Content-Transfer-Encoding: 8bit

or

MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-2022-JP
Content-Transfer-Encoding: 7bit

Shift_JIS and ISO-2022-JP show the message wrapped, but without the artefacts seen in attachment 5000 [details].

In other words:
Nothing will land here if there is not a problem that can be reproduced and proven fixed with the patches.

Maybe another Japanese developer can enlighten us here.
Flags: needinfo?(rkent)
Flags: needinfo?(mkmelin+mozilla)
Flags: needinfo?(ishikawa)
My sense is that we should land the bugs after making sure they apply cleanly, and any nits are fixed. There was sufficient work done earlier to decide they were worth landing, so any new cleanup work does not have to reproduce that.

Comment 124

2 years ago
Hi, 
I noticed I was "needinfo"ed for this bug.
Let me digest what I can glean for this.
(I won't clear the needinfo request until I get to repeat the error thingy.)
However, I do notice that there seem to be strange white space character inserted
in the latest TB when I type in longish line (or pasting longish lines from another buffer) in HTML mail (!). Maybe unrelated bug.

Stay tuned. I am right now trying to fix OS X build issue (which seems to be due to duplicate entries in jar.mn for theme/osx and it seems to be finally solved by a patch in M-C tree. I created the similar patch to remove the duplication and FINALLY os x build proceeds to compile a binary for me (after six months or so).

Thank you for your patience.
Flags: needinfo?(ishikawa)

Updated

2 years ago
Flags: needinfo?(ishikawa)
(Assignee)

Comment 125

2 years ago
(In reply to Kent James (:rkent) from comment #123)
> My sense is that we should land the bugs after making sure they apply
> cleanly, and any nits are fixed. There was sufficient work done earlier to
> decide they were worth landing, so any new cleanup work does not have to
> reproduce that.

Boy, I need to contradict Kent for a third time within twenty minutes:
Facts are: There are four patches here, which may or may not apply cleanly. One has some issues and nits.
The patches are about one year old. A lot can have happened in one year.

The people who knew what they are doing (Hiro, perhaps Makoto) have disappeared. Perhaps Joshua can enlighten us.

The person who will carry this to landing needs to
a) understand partly what this is about and why it was abandoned
b) do the most basic test: Doesn't work without the patch, works with the patch.

I tried doing b) but couldn't even reproduce the (alleged) problem.

If this gets landed, there needs to be someone who can at least be partly responsible.
(Assignee)

Comment 126

2 years ago
Well, there seem to be many bugs around the area of CJK encoding, for example bug 653342. I also found http://forums.mozillazine.org/viewtopic.php?f=39&t=2315453.

That delivers a reproducible case:
Copy this into a plaintext e-mail a few times. Send the e-mail.
哈哈哈哈哈哈哈哈哈哈哈哈哈哈哈哈哈哈哈哈哈哈哈哈哈哈哈哈哈哈哈哈哈哈哈哈

I get this result:
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

The content is pretty much broken. I'll see whether the patches fix anything.
(Assignee)

Comment 127

2 years ago
OK, my mistake, even with the original text
これは長い日本語のテキストですので、行が折れると思います。 
I can reproduce the problem now. Extra spaces are very had to spot for the European eye in a long string of Japanese characters.

I also looked at the four patches. There are 16 hunks, 10 that fix tests, 6 to change code. Those six still apply without a problem. I will do some testing.
(Sadly, right now on a Daily I get a crash upon sending a message:
Assertion failure: IsOuterWindow(), at nsGlobalWindow.cpp:7678)
(Assignee)

Comment 128

2 years ago
Now that I fixed the assertion failure, here the test result. Summary: No good, I won't land that.

Firstly, I never ever reproduced the problem that the patches here are supposed to fix: Some corrupted CJK characters, see attachment 5000 [details].

What I can reproduce it the problem reported in bug 653342, that is, that spurious spaces are inserted.

Applying the code part of the patches (6 hunks) makes no difference. This is what I get:

MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed; delsp=yes
Content-Transfer-Encoding: 8bit

Note the delsp=yes, which is new.

Maybe I'm missing something in viewing the message, but it's not included in the bug. The sent message is sprinkled with additional spaces.

Unless someone has some enlightenment, this is not going anywhere.

Note that bug 653342 is quite severe and Asian users are complaining heavily.
Bug 653342 is always in my list of top 5 bugs that need attention, but no active developer seems to understand it. If you have any ideas, feel free to pursue them.
(Assignee)

Comment 130

2 years ago
Created attachment 8688319 [details] [diff] [review]
Here are the 6 hunks of code changes extracted from the four patches.

I tested with this patch and don't see any difference other than "delsp=yes" being added.
Attachment #5068 - Attachment is obsolete: true
Attachment #9947 - Attachment is obsolete: true
Attachment #9982 - Attachment is obsolete: true
(Assignee)

Comment 131

2 years ago
I would like to stop working on this and instead focus on bug 653342 comment #158.
Flags: needinfo?(ishikawa)
Flags: needinfo?(Pidgeot18)

Comment 132

2 years ago
(In reply to Jorg K (GMT+1) from comment #131)
> I would like to stop working on this and instead focus on bug 653342 comment
> #158.

(In reply to ISHIKAWA, Chiaki from comment #124)

> However, I do notice that there seem to be strange white space character
> inserted
> in the latest TB when I type in longish line (or pasting longish lines from
> another buffer) in HTML mail (!). Maybe unrelated bug.

This behavior I think I have noticed is bug 653342 (!) 
It *IS* annoying.
The only reason I have noticed it
 - about 1/3 of the e-mail I compose is in English, and
 - most of the time my Japanese sentences are short for intra-division communication
   which are composed in TB and/or Emacs and copy&pasted into TB mail creation text buffer., and usually, longer stuff is put in the ATTACHMENT.

So I didn't realize this bug:  However, of late, I have exchanged longish e-mails with my colleagues and noticed that when I cc'ed myself I get the strange blank as mentioned in bug 653342. I was wondering what caused it, but now I know TB does it.
I believe bug 653342 has much higher priority as of now.

TIA
(Assignee)

Updated

2 years ago
Summary: format=flowed message generation must follow CJK line-breaking conventions → Implement RFC 3676 delsp=yes for format=flowed plaintext messages.
(Assignee)

Comment 133

2 years ago
After looking at this for a few days now I have come to the conclusion that implementing "delsp=yes" is the desperate attempt to get rid of spaces which were added to *incorrect* wrapping of CJK strings.

"delsp=yes" is only a plaintext option, so it wouldn't fix the full problem anyway since we also get erroneous spaces in HTML e-mail. The problem needs to be treated at the root in bug 1225864. Refer to bug 1225864 comment #4 for details.

If at a later stage we decide that "delsp=yes" is a useful option, we can do this in a new bug.

Also note: The corruption shown in attachment 5000 [details] cannot be reproduced. It also has nothing to do with this bug which is about adding a flag to a message and deleting a space. I have no idea how this corruption was caused back in 2000. Since then, "formatting" of messages has changed in bug 83381 and bug 84261. It is about to change again in bug 1225904 which will guarantee that we don't destroy multi-byte characters by injecting a space in the middle.
Status: NEW → RESOLVED
Last Resolved: 17 years ago2 years ago
Resolution: --- → WONTFIX
(Assignee)

Comment 134

2 years ago
Note that if we want to go ahead and implement delsp=yes one day, here is another snippet:
Attachment 8530844 [details] [diff] of bug 553526.

Also see bug 168420 for more on format=flowed.
No longer blocks: 653342
(Assignee)

Updated

2 years ago
Summary: Implement RFC 3676 delsp=yes for format=flowed plaintext messages. → Implement RFC 3676 delsp=yes for format=flowed plaintext messages for sending. Already implemented for reading in bug 231701.
(Assignee)

Comment 135

2 years ago
Note that in bug 1225904 comment #15 we discuss the connection between "delsp=yes" and ISO-2022-JP.

I quote parts:

For ISO-2022-JP (https://www.ietf.org/rfc/rfc1468.txt) a Content-Transfer-Encoding of 7bit is suggested:
===
The ISO-2022-JP encoding is already in 7-bit form, so it is not
necessary to use a Content-Transfer-Encoding header. It should be
noted that applying the Base64 or Quoted-Printable encoding will
render the message unreadable in current JUNET software.
===

As it stands, ISO-2022-JP can be used for HTML encoding although it's only designed for plain text. In HTML we don't have "format=flowed; delsp=yes;" so we must encode base64 if there are long lines.

To fully adhere to the *suggestion* of the RFC 1468 (which is from 1993!), we'd have to do the following:
1) implement the "delsp=yes;" option and cut long CJK strings for plain text only.
2) not ever use ISO-2022-JP for HTML parts.
(In reply to Jorg K (GMT+1) [currently frustrated by waiting for reviews/feedback] from comment #133)
> If at a later stage we decide that "delsp=yes" is a useful option, we can do
> this in a new bug.

It is, in principle, a useful option to include still.
(Assignee)

Comment 137

2 years ago
Yes.

But if we implement the "delsp=yes" option, we will also have to define the cases in which to use it, for example for ISO-2022-JP plaintext.

However, in the condensed patch I created from Hiro's four patches (attachment 8688319 [details] [diff] [review]) I don't see any decision taking. "delsp=yes" is *always* appended to "format=flowed", and that ain't desirable.

Plus the patch still maneuvers around EnsureLineBreaks() which is about to be eliminated in bug 1225904.

Apparently the whole sending logic is currently being rewritten, so let's leave this bug for later.
I don't understand why WONTFIX is the appropriate status for "If at a later stage we decide that "delsp=yes" is a useful option, we can do this in a new bug." Why do it in a new bug instead of this one? Usually WONTFIX represents an explicit decision to not implement something, ever.
(Assignee)

Comment 139

2 years ago
I prefer not to start implementing something at comment #140. The bug has become way too confusing. The patches are not correct and would have to change once I finish the CJK cleanup elsewhere.

Joshua and I have this option on the radar, but I prefer not to add this option to the system until we get his rewrite. So well and truly, this bug here won't be fixed in its current form, which doesn't mean we will never implement the feature.

Please note that I changed the summary from:
  format=flowed message generation must follow CJK line-breaking conventions
to
  Implement RFC 3676 delsp=yes for format=flowed plaintext messages for sending.
  Already implemented for reading in bug 231701
to make at least some sense of it.

If you want, I can carry it over to a new bug. If you insist I can also reopen it, but it won't receive any action until we get Joshua's rewrite. Joshua was already against meddling with the send logic in nsMsgSend.cpp and friends, but minor changes there are necessary to fix the CJK mess. Please refer to Joshua's statement in bug 1225904 comment #17. Forcing delsp=yes into the system now would just add more "magic EOL conversions".

Comment 140

2 years ago
I agree this should stay open if it's still a future want. (Though maybe the patches should be marked obsolete?)
This bug still has all the context, so no need to move it off to another bug.
Assignee: m_kato → nobody
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---

Updated

2 years ago
Status: REOPENED → NEW
(In reply to Jorg K (GMT+1) [currently frustrated by waiting for reviews/feedback] from comment #139)
> I prefer not to start implementing something at comment #140. The bug has
> become way too confusing. The patches are not correct and would have to
> change once I finish the CJK cleanup elsewhere.
> 
> Joshua and I have this option on the radar, but I prefer not to add this
> option to the system until we get his rewrite. So well and truly, this bug
> here won't be fixed in its current form, which doesn't mean we will never
> implement the feature.

There are lots of bugs and RFEs that are open that are unlikely to get fixed until post-compose-rewrite, but that doesn't entail closing a bug as WONTFIX.
(Assignee)

Comment 142

2 years ago
Looks like I got overruled. OK, we hang on to the bug from the year 2000 when neither Firefox nor Thunderbird existed ;-)
(Assignee)

Comment 143

2 years ago
Comment on attachment 8688319 [details] [diff] [review]
Here are the 6 hunks of code changes extracted from the four patches.

This is just not right. Without the patch I get this e-mail:

Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit

huhu huhu huhu huhu huhu huhu huhu huhu huhu huhu huhu huhu huhu huhu 
huhu huhu huhu huhu huhu huhu huhu huhu huhu huhu huhu huhu huhu huhu 

There is a space at the end of every line, since this is flowed.

With the patch I get:

Content-Type: text/plain; charset=windows-1252; format=flowed; delsp=yes
Content-Transfer-Encoding: 7bit

huhu huhu huhu huhu huhu huhu huhu huhu huhu huhu huhu huhu huhu huhu  
huhu huhu huhu huhu huhu huhu huhu huhu huhu huhu huhu huhu huhu huhu  

There are *two* spaces at the end of every line, one to delete later and one to make it flowed.

This confirms one of the two reasons stated in comment #137 for rejecting this solution.
Attachment #8688319 - Flags: review-
(Assignee)

Comment 144

2 years ago
Comment on attachment 8537115 [details] [diff] [review]
Part 2 use_delsp.patch

This patch contains most of the code to which I object. See comment #137.
Attachment #8537115 - Flags: review-
(Assignee)

Comment 145

2 years ago
Sorry to repeat myself from comment #135:
The only use for "delsp=yes" I can see is for ISO-2022-JP encoded plaintext messages that are sent with CTE 7bit to comply with the "should" from RFC 1468, *if* they contain otherwise unbreakable long lines.
Frankly, I don't think that is a good enough reason to implement this, since these messages can also use CTE base64. Base64 might have upset someone in 1993 when the RFC was created, but not twenty years later.

It most certainly was a really bad idea to suggest to use "delsp=yes" for all flowed plaintext messages.
Summary: Implement RFC 3676 delsp=yes for format=flowed plaintext messages for sending. Already implemented for reading in bug 231701. → Implement RFC 3676 delsp=yes for format=flowed ISO-2022-JP CTE 7bit plaintext messages for sending (comment #145). Already implemented for reading in bug 231701.
(Assignee)

Updated

2 years ago
Attachment #5000 - Attachment description: JPG image of a corrupted Japanese message with a long long word/paragraph when sent with format=flowed → NOT REPRODUCIBLE in 2015 and most likely due to a different problem: JPG image of a corrupted Japanese message with a long long word/paragraph when sent with format=flowed
I'm not really sure why you are putting so much energy into a bug that in your opinion is either don't implement at all, or implement later. My earlier complaint about WONTFIX was a purely administrative comment, that it was an inappropriate status for "If at a later stage we decide that "delsp=yes" is a useful option, we can do this in a new bug." without any examination of the technical merits.

Are you really trying to get some of us to examine this issue more closely to decide if WONTFIX or "fix much later" is the best status? That does not seem to be important enough to be worthy of much attention.
(Assignee)

Comment 147

2 years ago
We have a mess of CJK related problems and we need to define very clearly what will help to cut though this mess and what won't. Fixing this here will help in one very specific (unimportant) case which is now even mentioned in the summary. I am not pleased that we got close to landing "delsp=yes" for all flowed e-mail, even in plain English. Anyway, I agree, let's stop commenting on this further.
(Assignee)

Comment 148

2 years ago
OK, the Japanese users have convinced me that delsp=yes is the right way to flow ISO-2022-JP text.

In bug 553540 the nsIDocumentEncoder::OutputFormatDelSp was already introduced and it works.
Makoto Kato implemented this, so after five years we finally come to use this option.

The reshuffling in bug 653342 made it an extra two lines to do delsp=yes.
So I decided to do it. Here you do: Attachment 8693229 [details] [diff].

So once bug 653342 lands, this will be fixed, too, and we can finally close a 15 year old bug.
Assignee: nobody → mozilla
Status: NEW → ASSIGNED
Depends on: 653342
(Assignee)

Updated

2 years ago
Attachment #5000 - Attachment is obsolete: true
(Assignee)

Updated

2 years ago
Attachment #8537112 - Attachment is obsolete: true
(Assignee)

Updated

2 years ago
Attachment #8537113 - Attachment is obsolete: true
(Assignee)

Updated

2 years ago
Attachment #8537115 - Attachment is obsolete: true
(Assignee)

Updated

2 years ago
Attachment #8537118 - Attachment is obsolete: true
(Assignee)

Comment 149

2 years ago
Comment on attachment 8688319 [details] [diff] [review]
Here are the 6 hunks of code changes extracted from the four patches.

Everything is obsolete here.
This will be implemented by bug 653342.
Attachment #8688319 - Attachment is obsolete: true
(Assignee)

Updated

2 years ago
Summary: Implement RFC 3676 delsp=yes for format=flowed ISO-2022-JP CTE 7bit plaintext messages for sending (comment #145). Already implemented for reading in bug 231701. → Implement RFC 3676 delsp=yes for format=flowed ISO-2022-JP CTE 7bit plaintext messages for sending (comment #148). Already implemented for reading in bug 231701.
(Assignee)

Comment 150

2 years ago
Joshua: I've implemented delsp=yes over in bug 653342 and it seems to work when testing it manually and with the test I wrote.

However, Hiro in his patches had changes to mimetpfl.cpp and and mimemsg.cpp which I don't understand and haven't carried forward. You can best see them here:
https://bugzilla.mozilla.org/page.cgi?id=splinter.html&bug=26734&attachment=8688319

Since you reviewed and approved this, can you please tell me what they are about and whether they are required.
Flags: needinfo?(Pidgeot18)
(Assignee)

Comment 151

2 years ago
Or maybe it's better viewed in the original patch which also shows review comments:
https://bugzilla.mozilla.org/page.cgi?id=splinter.html&bug=26734&attachment=8537118

Was this stuff added for testing??
(Assignee)

Comment 152

2 years ago
Fixed by bug 653342.
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago2 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 45.0
(Assignee)

Comment 153

2 years ago
Looks like the changes to mimetpfl.cpp and mimemsg.cpp were necessary.
At least they fix some of bug 1230968. Let's move the discussion over there.
Flags: needinfo?(Pidgeot18)
Checked with Daily 53.0a1 (2016-12-18) (32-bit) on Win10.

1-1. mailnews.send_plaintext_flowed=true
1-2. compose text mail in iso-2022-jp, type lo------ng Japanese Kanji text without entering space or linebreak.
1-3. Send Later -> charset=iso-2022-jp; format=flowed; delsp=yes
2-1. mailnews.send_plaintext_flowed=false
2-2. compose text mail in iso-2022-jp, type lo------ng Japanese Kanji text without space or linebreak.
     compose text mail in utf-8, type lo---ng SBCS chars including euro-sign without space or linebreak. 
2-3. Send Later -> charset=iso-2022-jp or utf-8 without format=flowed.
     Content-Transfer-Encoding: base64

Content-Language: en-US was seen in source. Why en-US even though I wrote Japanese Kanji characters?
Because I like en-US build?
Content-Languge shouldn't be language what Thunderbird uses. 
Content-Languge should be language what I read, write, speak :-)

By sending in base64, "broken data when line break is inserted due to 1000bytes line length limitation" problem was resolved.
By this bug, pretty lo-----ng Japanese text without space/linebreak is sent in format=flowed;delsp=yes.
So, we'll not see bug report from "pretty lo-----ng Japanese text without space/linebreak" lovers in Japan.

Verified.

By the way, does how many mailers already support delsp=yes?
IIRC, Apple and/or Opera generated delsp=yes mail despite that many mailers didn't support delsp=yes in the past.
When did MS support delsp=yes? Can we already use format=flowed;delsp=yes freely?
Status: RESOLVED → VERIFIED
An "Asian crisis" has disappeared :-)
By the way, why you know so well/deeply about Character set/code, JavaScript, Thunderbird code and so on?
(Assignee)

Comment 157

6 months ago
(In reply to WADA:World Anti-bad-Duping Agency from comment #154)
Thunderbird has supported displaying e-mail with delsp=yes for a log time, I believe since bug 231701.

TB has never produced delsp=yes e-mail until the Asian crisis was fixed and this bug is part of this fix. Normally we transmit long lines base64-encoded, bug RFC 1468 mandates CTE 7bit when ISO-2022-JP is used (https://www.ietf.org/rfc/rfc1468.txt). So only for ISO-2022-JP we use delsp=yes:
https://dxr.mozilla.org/comm-central/rev/b020cffa188fac078bff74c9e19c98b044ea4173/mailnews/compose/src/nsMsgCompUtils.cpp#1794

Content-Language: en-US comes from the dictionary you were using. Use a Japanese dictionary and you'll get something else.

(In reply to WADA:World Anti-bad-Duping Agency from comment #156)
> By the way, why you know so well/deeply about Character set/code,
> JavaScript, Thunderbird code and so on?
I am one of the core developers here, Thunderbird and Compose peer (https://wiki.mozilla.org/Modules/Thunderbird) and Mailnews MIME and Editor peer (https://wiki.mozilla.org/Modules/MailNews_Core).

More questions?
More questions.

Why you can reply to my useless questions in this bug despite that you are pretty busy to resolve important bugs? :-)

> core developers here

Hired? If so, by whom? (I can't imagine big profit from Thunderbird...)
Or professor of University or people of Government, Army, Research Center, Laboratory and so on?
Or people of corporate who supports free software, co-operates with free software(in order to get big profits finally :-) )
Are you Netscape people? Do you know David Bienvenu and MScott? (but you seem younger than them...)
(Assignee)

Comment 159

6 months ago
As I said, every question/answer takes away time from more important problems. So don't ask ;-)

We are all volunteers, however, the Thunderbird Council is considering to hire a paid maintainer:
https://mail.mozilla.org/pipermail/tb-planning/attachments/20161119/7b31f41f/attachment.html
You should follow tb-planning.

None of the current volunteers is from the Netscape days. I don't know the people you mentioned, but I've had a short exchange with David once.
Whiteboard: [DelSP=yes is currently for iso-2022-jp only, so please don't open excess bug for other charset at B.M.O]
You need to log in before you can comment on or make changes to this bug.