Closed Bug 129122 Opened 22 years ago Closed 22 years ago

Reply mail to a plain text ISO-2022-JP mail in HTML format is partially garbled.

Categories

(MailNews Core :: Internationalization, defect, P2)

x86
Windows 2000
defect

Tracking

(Not tracked)

VERIFIED FIXED
mozilla1.0

People

(Reporter: ji, Assigned: akkzilla)

References

Details

(Keywords: dataloss, intl, Whiteboard: [adt1] has-review, needs adt approval)

Attachments

(4 files, 7 obsolete files)

14.44 KB, text/plain
Details
688 bytes, patch
Details | Diff | Splinter Review
672 bytes, patch
Details | Diff | Splinter Review
2.08 KB, patch
akkzilla
: review+
sspitzer
: superreview+
jesup
: approval+
Details | Diff | Splinter Review
With the current trunk build, reply to a plain text ISO-2022-JP mail with
certian size in HTML format freezes the browser. (bug 128908).
With NS 6.2.1, the browser doesn't hang, but reply mail is partially garbled
after received. We also need to fix this problem after the block bug 128908 is
fixed.

Steps to reproduce:
1. With 6.2.1, download the zipped testing mail folder posted on bug 128908.
Unzip it and put it under local mail folder.
2. Select the testing mail and reply it in HTML format with quoting.
3. After reply mail is received, observe the garbled part in the middle of the
quoted mail body.
Keywords: intl, nsbeta1
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.0
nsbeta1+ per triage meeting
Keywords: nsbeta1nsbeta1+
Priority: -- → P2
Attached file replied mail (broken)
I found that the data is broken by line breaks. I think the line break happens
after the text is converted to ISO-2022-JP.
The line breaks differently in HTML mail if no quoting.
It look like the line break is done to workaround bug 67334 (now a dup of bug
125732).
http://lxr.mozilla.org/seamonkey/source/mailnews/compose/src/nsMsgSend.cpp#1536

Seth, is the mail code not needed after bug 125732 is fixed? When the text is in
ISO-2022-JP, we cannot break the lines without converting the text again.
I applied the patch of bug 125732 but the line breaking of the quoted HTML is
still the same. So the line breaking code in the mail is still used.
For this bug, we can do a special handling for ISO-2022-JP. I can duplicate the
line breaking code and change it to process Unicode string. If ISO-2022-JP then
ensure line breaks for the Unicode string before converting to ISO-2022-JP. Then
later in the code, skip the original line break code if ISO-2022-JP.
Attached patch cosmetic change (obsolete) — Splinter Review
Attachment #74217 - Attachment is obsolete: true
I have to put the line break earlier than the current patch. The converted
ISO-2022-JP string may be larger particularly for HTML because of the mixture of
the ASCII tags and JP text.
But line breaks actually break lines in <pre>, it breaks the formatting if I
insert too many. I experimented replacing <BR> by CRLF and that works fine
without loosing the format.

Q: A CRLF seems to behave the same as a <BR> within <pre>. So you 
   are replacing multiple instances of <BR> with a single CRLF?
Each <BR> is replaced by CRLF. If 4 <BR> then 4 CRLF.
editor bug
bug 92921 - composer should not replace CRs by <br> inside a <pre>
I have 3 comments about this patch:

1) Why refusing to work with severl <pre> block?
2) You should be able to avoid duplicating the buffer (once the whole one and
once the pre block)
2) ReplaceSubstring is a very expensive call as it will move the buffer from the
the position to replace up to the end every time chars get replaced. It's
exponential! We used to use it to convert linefeed during a reply and by writing
our own function we were able to save a lot of processing time especially with
large message (see TranslateLineEnding in nsMsgCompose.cpp)

By writting your own function, you should be able to do the job without having
to copy the buffer and with a minimum of block move.
I did not write my own but used nsString, unlike reply case, this is executed
after the user hit send and also only executed when quoting a plain text as html. 
I am going to investigate if I can improve performance also try to support
multiple <PRE> in the body.
Note that the code is not needed once the editor returns CRLF.
Why don't you apply BASE64 encoding to whole text/html part?
It makes your life much easier.
That's a good idea. I wil try it.
Local search does not work for base64 encoded body, filed as bug 132340.
Attachment #75040 - Attachment is obsolete: true
Comment on attachment 75246 [details] [diff] [review]
Specialize the code in order to avoid heap allocation.

Looks good. But please remove the NS_ASSERTION in matchUnichar, could be very
painfull if the buffer cointains a lot 8bits chars. R=ducarroz
Attachment #75246 - Flags: review+
The assertion is for the char argument which is compared against the unichar
(the actual data from the body). 
my bad, therefore it's fine.
I think you want jfrancis or mjudge to comment.

I think there is code in editor that converts text in <pre>
and replaces CRLF with <BR>.  The reason it does that is to work around some
sort of frame / caret issue.

see http://bugzilla.mozilla.org/show_bug.cgi?id=92921
("composer should not replace CRs by <br> inside a <pre>")

and see http://bugzilla.mozilla.org/show_bug.cgi?id=85505
("Frame code and Caret code need work to allow caret in more places")

I'm guessing they are going to say:  "Your patch wouldn't be necessary if #92921
was fixed, and that can't be fixed until #85505 is fixed.   But those aren't
going to fixed any time soon, so we probably have do what you are doing."

But I'd like to hear what they think.

Also, see bug http://bugzilla.mozilla.org/show_bug.cgi?id=84261

That's when I added EnsureLineBreaks(const char *body, PRUint32 body_len);

Does your code get called before (or after) my EnsureLineBreaks()?

Can we avoid parsing the message twice by combining the two?

>Does your code get called before (or after) my EnsureLineBreaks()?
It is called before yours because it has to be done while the data is in Unicode.
>Can we avoid parsing the message twice by combining the two?
My patch addresses the long <pre> line problem only.


Attachment #75246 - Attachment is obsolete: true
Seth is correct.  For now, the editor has to put in <br>s.

One thing that comes to mind is that we are walking this data twice.  Once to 
serialze from the dom to an html stream.  Then mail is walking that again to 
replace <br>s.  Wouldn't it be better to put a mode on the serializer itself to 
do the replacemetn for you?  It has to walk the data and build the string anyway, 
why not leverage that?

cc'ing Tanu and peterv.
Can mail compose/send access the serializer and set the option? 
The mail send code getting the body text using nsIEditorShell::GetContentsAs.
The editor already has flags telling it we are doing mail.  The serializer has 
flags out the wazoo.  My guess is that passing the needed information from mail 
to the serializer through the editor is not hard.  
From nsIEditorShell.idl:
  wstring GetContentsAs(in wstring format, in PRUint32 flags);
The flags are right there.

If all you want is for <br>s to be output as CRLF regardless of platform, the
plaintext serializer has flags to handle that (set both OutputCRLineBreak and
OutputLFLineBreak; if neither are set it will use the appropriate value for the
current platform).
I set both OutputCRLineBreak and OutputLFLineBreak but it did not affect the
lines inside <pre>, the lines still contain <br> only no CRLF.

148 // CR, LF, or CRLF.  If neither of these flags is set, then we
149 // will use platform line breaks.
150 OutputCRLineBreak = 512,
151 OutputLFLineBreak = 1024,

Joe, could you write a new bug for the new serialier option so this bug can
depend on? I don't know what serializer method needs to change.
If we want to take jfrancis's approach, we need the following
1. we need clear description what kind of changes are needed for the serializer,
jfrancis- can you describe it ?
2. tmutreja agree to add such thing to the serializer and compelete it soon.
tmutreja- how is your load right now ? do you think you can make it ?
3. nhotta change the code to use the new option

should we land the current patch untill that happen ? Currently is very very bad
for Japanese users. It is almost a unusable mail product.

this is regression from 6.1
Keywords: regression
jfrancis suggest we change the serializer to do this, it seems the same issue as
bug 92921

http://bugzilla.mozilla.org/show_bug.cgi?id=92921
currently, bug 92921 is mark as nsbeta1- as the following reason.

------- Additional Comment #19 From Kevin McCluskey 2002-03-20 08:43 -------
nsbeta1-, topembed-. This will require significant layout changes. Reassigning
to Marc for further investigation. Clearing + on EDITORBASE
------- Additional Comment #20 From syd@netscape.com 2002-03-21 13:45 -------

Job is too big for moz 1.0, minusing

Therefore, I think it is too sensitive to address this bug (bug 129122) by
depending on that bug (92921). 
Assuming bug 92921 won't got fixed by nsbeta1, should we go with nhotta's
current patch ?

It's awful to think that our model is so broken that the only workaround is to
implement part of the parser inside mail compose code.  That seems so wrong.

If we can't get the underlying layout bug fixed (sigh!), easier and more
efficient, I would think, would be to make the html serializer do this (perhaps
on a flag, but really, do we need to preserve <br> tags inside <pre>?)

The html serializer could easily remember when it's inside a <pre> tag, and
output mLineBreak whenever it sees a br inside a pre.  No need to parse the
serial html to keep track of the pre tags opening and closing, and it would
happen during a pass that we were doing anyway so it wouldn't slow things down.

I guess the question is whether anyone has time to do it before 1.0.
Keywords: regression
Depends on: 132938
Here is an example of how the fix might look in the html serializer.  I have
not tested this, so I don't know if this fix is complete or whether it needs
more work.  I don't know for sure that these br tags are marked as dirty (but
they *should* be, and if they aren't it would probably be an easy editor fix;
that way, in the unlikely event that we ever serialize a document that really
should have br tags inside a pre that weren't put there by our editor, we won't
rewrite them).	

Of course, if you decide you want to go with this, Tanu should review it and
make sure it's okay for the serializer.
Blech, that should be "return NS_OK" -- forgot to update the patch before I
attached it.
I applied Akkana's patch and now I am getting the body with CRLF instead of <BR>.
Thank you very much! How can we proceed from here?
Akkana, can I reassign the bug to you since your patch fixes the problem?
As long as you've tested it and it works, I don't mind checking it in -- go
ahead and reassign to me if you'd like.  Tanu, could you please review the
patch?  Naoki, is there an sr-er in intl who might be willing to sr this? 
Otherwise, we can try Kin, but he doesn't have much time for sr'ing.
Thanks, reassign to Akkana. 
No super-reviewer in intl group I think any super reivewer can review it, the
code is not intl specific anyway.
Assignee: nhotta → akkana
Status: ASSIGNED → NEW
Here's the correct patch, attached for review.
Attachment #75642 - Attachment is obsolete: true
Attachment #75702 - Attachment is obsolete: true
First off, sorry for not replying to all these questions sooner.  I guess I kinda 
dropped a bomb on this bug and then ran away :-) 

But I'm glad folks are considering taking the serializer approach.  Ordinarily I 
hate to add yet another knob to the serializer, but mail performance is important 
enough to bite the bullet here.

And I suspected that the fix would be very simple, a la Akkana's patch.

Having examined the patch, my only concern is the checking of the dirty 
attribute.  This is really an issue for the mail folks to decide:  do you want 
the serializer to replace all breaks inside pre's with linebreaks?  Or do you 
only want to replace those breaks which were not "originally" in the pre?

There are gotchas with both approach.  If you do what the patch does now, if 
someone writes a document with composer, saves it out, and later loads it again 
and sends it as mail (say, by copying it and pasting it into mail compose), you 
will not get the break replacement you desire.  This is because once the breaks 
"round trip" through save and reload, the dirty attribute will be gone.

The gotcha if you DO replace all breaks in pre's is that you will be potentially 
altering a document in some way the author didn't desire, if they put the breaks 
there intentionally.

All in all, given the goals of mail and typical usage patterns (as well as I can 
guess them), I would suggest that we simply replace all breaks, and not just the 
ones with the special "dirty" attribute.
Akkana your patch works fine for me too without breaking any build testcase. I 
understand "JFrancis" comments above but in general I don't know the exact 
usage of dirty attributes.
I feel only for mails we should replace the <br>'s because in that case it's 
not visible to the end user and as long as mail is rendered in the correct 
format it does not matter what we use behind but in other cases like copy & 
paste, we should preserve the <br> tags.

> This is really an issue for the mail folks to decide:  do you want 
> the serializer to replace all breaks inside pre's with linebreaks?  Or do you 
> only want to replace those breaks which were not "originally" in the pre?
> 
So which behavior do we have with the current patch?
I agree with Tanu but not sure if we need the mail specific behavior. In mail,
do we want to preserve the original <BR>.?
Need input from mail group, Jean-Francois, Seth?
In my understanding, we use <pre> only around plain text data like original
message body in a reply. Therefore the line break model we use should not matter
for the user. It's unlikely that the user will edit inside a <pre> tag except
maybe to remove some lines.

I personnaly vote to replace all the <br> by CRLF.

This patch follows Joe's suggestion which J-F also voted for: replace all br
tags in pre, not just dirty ones.  I'm fine with that, personally, since my
understanding is that br tags inside a pre are technically incorrect anyway. 
My previous patch was just erring on the side of not modifying existing
documents any more than necessary.

This does not address Tanu's suggestion of doing this only in mail.  The
serializer currently has no way of knowing whether it's being called from mail,
so  that would require adding a new flag to nsIDocumentEncoder, checking the
flag in the serializer, and passing it anywhere in mail that calls the html
serializer.  That's a much more invasive patch, and in any case I'm not
convinced that this should be mail only; if it causes I18n problems in mail,
won't it also potentially cause problems in composer?  I think we should do it
everywhere, not just in mail.

If someone wants to review this, we can work on getting it in.	Alternately, if
anyone wants to make a case for adding the new flag and doing this only in
mail, speak now and I'll make a patch that does that.  (Might want J-F's help
for the mail part of it.)
>if it causes I18n problems in mail,
>won't it also potentially cause problems in composer
That is mail specific, the i18n problem is caused by the mail code which forces
line breaks (bug 132938).
Keywords: dataloss
I'm hoping we will only be passing this flag through from mail usage.  My 
understanding is that is the case.  If not please let me know.

The bad thing about doing this for all composer usage is that you will severely 
affect load times of certain documents in composer.  If you have a big 
preformatted document with many lines of text, Composer will use breaks to 
seperate the lines for reasons we are all sick of hearing about by now. :-)
But if the serializer keeps replacing those with newlines everytime you save, 
then composer will have to convert those back to breaks every time you load the 
doc.  This will casue a delay when loading the doc.

So let's only instruct the serializer to do the translation when we know we want 
it (ie, for mail).
Sounds like Joe and Tanu both would rather see this on a flag.	So I've made
one -- here's the patch.

This is only the content part of the patch.  Mail will have to use this flag
when it calls the serializer wanting this behavior.  I'm not sure where that
happens: J-F, can you please do that part of the patch?
Ducarroz: if you want this in, it will be a lot easier if we try to get it in
Friday 3/28, since after that we'll have to go through ADT.  Do we want to try
for Friday?  Can you attach a patch of the places mail will have to change? 
Tanu, does this approach look all right, and can you review this part of it?
I am working on it...
I have tested id with the test case and it works well.
Attachment #76365 - Attachment is obsolete: true
Looks good to me.  Can someone review this?  Naoki?  Tanu?
The change for nsMsgSend.cpp looks fine to me.
Comment on attachment 76782 [details] [diff] [review]
same as before (add a new flag) but with mailnews change for using new flag

Change to serializer looks fine to me. r=tmutreja for serializer.
Okay, we have review of both parts now.  Seth, is your offer to super-review
still open?
Status: NEW → ASSIGNED
Whiteboard: has-review, needs sr, a
Comment on attachment 76782 [details] [diff] [review]
same as before (add a new flag) but with mailnews change for using new flag

Setting has-review bit, since Naoki reviewed the mail part and Tanu reviewed
the serializer part.
Attachment #76782 - Flags: review+
Nominate adt1.0.0 because the dataloss for multibyte cases (e.g. Japanese) when
reply quoting plain text body to html.
Keywords: adt1.0.0
Removing adt1.0.0. Pls get a sr= before resubmitting as adt1.0.0 nomination.
Keywords: adt1.0.0
Comment on attachment 76782 [details] [diff] [review]
same as before (add a new flag) but with mailnews change for using new flag

sr=sspitzer
Attachment #76782 - Flags: superreview+
Please update this bug with an [adt1] - [adt3] impact rating (or take it off the
list if it doesn't even rate adt3.)  Thanks!
Guessing adt2, but Naoki knows the impact better than I do -- Naoki, is that
reasonable?
Whiteboard: has-review, needs sr, a → [adt2] has-review, needs sr, a
I really think this is a adt1 bug , based on the following

Impact Platform: ALL 
Impact language users: ALL mutibyte charset in mail, which mean CJK users 132.8
M 23.7% of totaly current internet users. plus european users who want to use
UTF-8 mail (not sure about the number and %)
Probability of hitting the problem: HIGH, major workflow item
Severity if hit the problem in the worst case: the mail user send out will be
damage (data lost), only the receiver will notice that data lost, the data lost
won't show to the sender in the composer before the mail is send out. 
Way of recover after hit the problem: 
1. User can send out the mail again by using other mailer. 
2. send out mail as plain text mail won't have this problem. 
Risk of the fix:  (TBD)
Potential benefit of fix this problem: 
This is a Regression from 6.1


akkan, I cannot decide the Risk. Please state the risk of the patch, what could
be wrong in the worst case. If every thing go wrong (except crashing), what
could happen with your patch in the worst case ? (so we know what other thing
have no possibilty to be damage with your patch)



adt1.0.0+ (on ADT's behalf) for approval for checkin.
Keywords: adt1.0.0adt1.0.0+
The risk is fairly low.  Possible side effects:
- I believe this conversion will happen on saving as well, so as Joe pointed
out, there's a small performance penalty (which might become noticable on very
long documents) if we save a document to Drafts, then read it in again (saving
will be no slower, but loading it back into the compose window has to convert
line breaks back to br so that could be slower on a long page).  It should still
be no slower than loading the same document into the compose window was the
first time.
- If the user edits a page that already has br tags inside pre (quite unlikely
unless it was coming from an older mozilla mailer or editor), those tags will be
converted when the message is sent (which most users would consider desirable,
but it's possible that someone somewhere might not want it).

So the risks should be small; I can't think of any side effects worse than that.
Is this waiting for 'a=' now?
Yes, waiting for a=.  I've sent mail to drivers requesting it.
Whiteboard: [adt2] has-review, needs sr, a → [adt2] has-review, needs a
Comment on attachment 76782 [details] [diff] [review]
same as before (add a new flag) but with mailnews change for using new flag

a=rjesup@wgate.com for branch checkin.	Please make sure it's checked into
trunk as well.
Attachment #76782 - Flags: approval+
Now it just needs ADT approval.  Frank Tang thought it deserved to be ADT1, so
I've changed the ADT ranking accordingly.
Whiteboard: [adt2] has-review, needs a → [adt1] has-review, needs adt approval
This is already adt1.0.0+ (comment #62).
Fix checked in to both trunk and branch.
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
With 04/10 1.0.0 brach build, the attached mail (id 73080) is partially garbled.
But reply mail to the testing mail for bug 128908 is okay. Naoki, is attachment
(id 73080) posted for problem illustration or for testing?
The mail attached to this bug id=73080 is already broken.
Please test using the attachment of bug 128908 id=72558.
Please also try the attachment of bug 132938 id=77353.

Verified as fixed with 4/10 1.0.0 branch build.
Status: RESOLVED → VERIFIED
adding verified1.0.0 keyword (branch verification) based on bonsai checkin and
QA comments.
Keywords: verified1.0.0
Product: MailNews → Core
Product: Core → MailNews Core
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: