Closed
Bug 129122
Opened 23 years ago
Closed 23 years ago
Reply mail to a plain text ISO-2022-JP mail in HTML format is partially garbled.
Categories
(MailNews Core :: Internationalization, defect, P2)
Tracking
(Not tracked)
VERIFIED
FIXED
mozilla1.0
People
(Reporter: ji, Assigned: akkzilla)
References
Details
(Keywords: dataloss, intl, Whiteboard: [adt1] has-review, needs adt approval)
Attachments
(4 files, 7 obsolete files)
14.44 KB,
text/plain
|
Details | |
688 bytes,
patch
|
Details | Diff | Splinter Review | |
672 bytes,
patch
|
Details | Diff | Splinter Review | |
2.08 KB,
patch
|
akkzilla
:
review+
sspitzer
:
superreview+
jesup
:
approval+
|
Details | Diff | Splinter Review |
With the current trunk build, reply to a plain text ISO-2022-JP mail with
certian size in HTML format freezes the browser. (bug 128908).
With NS 6.2.1, the browser doesn't hang, but reply mail is partially garbled
after received. We also need to fix this problem after the block bug 128908 is
fixed.
Steps to reproduce:
1. With 6.2.1, download the zipped testing mail folder posted on bug 128908.
Unzip it and put it under local mail folder.
2. Select the testing mail and reply it in HTML format with quoting.
3. After reply mail is received, observe the garbled part in the middle of the
quoted mail body.
Updated•23 years ago
|
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.0
Updated•23 years ago
|
Priority: -- → P2
Comment 2•23 years ago
|
||
Comment 3•23 years ago
|
||
I found that the data is broken by line breaks. I think the line break happens
after the text is converted to ISO-2022-JP.
The line breaks differently in HTML mail if no quoting.
Comment 4•23 years ago
|
||
It look like the line break is done to workaround bug 67334 (now a dup of bug
125732).
http://lxr.mozilla.org/seamonkey/source/mailnews/compose/src/nsMsgSend.cpp#1536
Seth, is the mail code not needed after bug 125732 is fixed? When the text is in
ISO-2022-JP, we cannot break the lines without converting the text again.
Comment 5•23 years ago
|
||
I applied the patch of bug 125732 but the line breaking of the quoted HTML is
still the same. So the line breaking code in the mail is still used.
For this bug, we can do a special handling for ISO-2022-JP. I can duplicate the
line breaking code and change it to process Unicode string. If ISO-2022-JP then
ensure line breaks for the Unicode string before converting to ISO-2022-JP. Then
later in the code, skip the original line break code if ISO-2022-JP.
Comment 6•23 years ago
|
||
Comment 7•23 years ago
|
||
Attachment #74217 -
Attachment is obsolete: true
Comment 8•23 years ago
|
||
I have to put the line break earlier than the current patch. The converted
ISO-2022-JP string may be larger particularly for HTML because of the mixture of
the ASCII tags and JP text.
But line breaks actually break lines in <pre>, it breaks the formatting if I
insert too many. I experimented replacing <BR> by CRLF and that works fine
without loosing the format.
Comment 9•23 years ago
|
||
Q: A CRLF seems to behave the same as a <BR> within <pre>. So you
are replacing multiple instances of <BR> with a single CRLF?
Comment 10•23 years ago
|
||
Each <BR> is replaced by CRLF. If 4 <BR> then 4 CRLF.
Comment 11•23 years ago
|
||
editor bug
bug 92921 - composer should not replace CRs by <br> inside a <pre>
Comment 12•23 years ago
|
||
Attachment #74223 -
Attachment is obsolete: true
Comment 13•23 years ago
|
||
I have 3 comments about this patch:
1) Why refusing to work with severl <pre> block?
2) You should be able to avoid duplicating the buffer (once the whole one and
once the pre block)
2) ReplaceSubstring is a very expensive call as it will move the buffer from the
the position to replace up to the end every time chars get replaced. It's
exponential! We used to use it to convert linefeed during a reply and by writing
our own function we were able to save a lot of processing time especially with
large message (see TranslateLineEnding in nsMsgCompose.cpp)
By writting your own function, you should be able to do the job without having
to copy the buffer and with a minimum of block move.
Comment 14•23 years ago
|
||
I did not write my own but used nsString, unlike reply case, this is executed
after the user hit send and also only executed when quoting a plain text as html.
I am going to investigate if I can improve performance also try to support
multiple <PRE> in the body.
Note that the code is not needed once the editor returns CRLF.
Comment 15•23 years ago
|
||
Why don't you apply BASE64 encoding to whole text/html part?
It makes your life much easier.
Comment 16•23 years ago
|
||
That's a good idea. I wil try it.
Comment 17•23 years ago
|
||
Local search does not work for base64 encoded body, filed as bug 132340.
Comment 18•23 years ago
|
||
Attachment #75040 -
Attachment is obsolete: true
Comment 19•23 years ago
|
||
Comment on attachment 75246 [details] [diff] [review]
Specialize the code in order to avoid heap allocation.
Looks good. But please remove the NS_ASSERTION in matchUnichar, could be very
painfull if the buffer cointains a lot 8bits chars. R=ducarroz
Attachment #75246 -
Flags: review+
Comment 20•23 years ago
|
||
The assertion is for the char argument which is compared against the unichar
(the actual data from the body).
Comment 21•23 years ago
|
||
my bad, therefore it's fine.
Comment 22•23 years ago
|
||
I think you want jfrancis or mjudge to comment.
I think there is code in editor that converts text in <pre>
and replaces CRLF with <BR>. The reason it does that is to work around some
sort of frame / caret issue.
see http://bugzilla.mozilla.org/show_bug.cgi?id=92921
("composer should not replace CRs by <br> inside a <pre>")
and see http://bugzilla.mozilla.org/show_bug.cgi?id=85505
("Frame code and Caret code need work to allow caret in more places")
I'm guessing they are going to say: "Your patch wouldn't be necessary if #92921
was fixed, and that can't be fixed until #85505 is fixed. But those aren't
going to fixed any time soon, so we probably have do what you are doing."
But I'd like to hear what they think.
Also, see bug http://bugzilla.mozilla.org/show_bug.cgi?id=84261
That's when I added EnsureLineBreaks(const char *body, PRUint32 body_len);
Does your code get called before (or after) my EnsureLineBreaks()?
Can we avoid parsing the message twice by combining the two?
Comment 23•23 years ago
|
||
>Does your code get called before (or after) my EnsureLineBreaks()?
It is called before yours because it has to be done while the data is in Unicode.
>Can we avoid parsing the message twice by combining the two?
My patch addresses the long <pre> line problem only.
Comment 24•23 years ago
|
||
Attachment #75246 -
Attachment is obsolete: true
Comment 25•23 years ago
|
||
Seth is correct. For now, the editor has to put in <br>s.
One thing that comes to mind is that we are walking this data twice. Once to
serialze from the dom to an html stream. Then mail is walking that again to
replace <br>s. Wouldn't it be better to put a mode on the serializer itself to
do the replacemetn for you? It has to walk the data and build the string anyway,
why not leverage that?
cc'ing Tanu and peterv.
Comment 26•23 years ago
|
||
Can mail compose/send access the serializer and set the option?
The mail send code getting the body text using nsIEditorShell::GetContentsAs.
Comment 27•23 years ago
|
||
The editor already has flags telling it we are doing mail. The serializer has
flags out the wazoo. My guess is that passing the needed information from mail
to the serializer through the editor is not hard.
Assignee | ||
Comment 28•23 years ago
|
||
From nsIEditorShell.idl:
wstring GetContentsAs(in wstring format, in PRUint32 flags);
The flags are right there.
If all you want is for <br>s to be output as CRLF regardless of platform, the
plaintext serializer has flags to handle that (set both OutputCRLineBreak and
OutputLFLineBreak; if neither are set it will use the appropriate value for the
current platform).
Comment 29•23 years ago
|
||
I set both OutputCRLineBreak and OutputLFLineBreak but it did not affect the
lines inside <pre>, the lines still contain <br> only no CRLF.
148 // CR, LF, or CRLF. If neither of these flags is set, then we
149 // will use platform line breaks.
150 OutputCRLineBreak = 512,
151 OutputLFLineBreak = 1024,
Comment 30•23 years ago
|
||
Joe, could you write a new bug for the new serialier option so this bug can
depend on? I don't know what serializer method needs to change.
Comment 31•23 years ago
|
||
If we want to take jfrancis's approach, we need the following
1. we need clear description what kind of changes are needed for the serializer,
jfrancis- can you describe it ?
2. tmutreja agree to add such thing to the serializer and compelete it soon.
tmutreja- how is your load right now ? do you think you can make it ?
3. nhotta change the code to use the new option
should we land the current patch untill that happen ? Currently is very very bad
for Japanese users. It is almost a unusable mail product.
this is regression from 6.1
Keywords: regression
Comment 32•23 years ago
|
||
jfrancis suggest we change the serializer to do this, it seems the same issue as
bug 92921
http://bugzilla.mozilla.org/show_bug.cgi?id=92921
currently, bug 92921 is mark as nsbeta1- as the following reason.
------- Additional Comment #19 From Kevin McCluskey 2002-03-20 08:43 -------
nsbeta1-, topembed-. This will require significant layout changes. Reassigning
to Marc for further investigation. Clearing + on EDITORBASE
------- Additional Comment #20 From syd@netscape.com 2002-03-21 13:45 -------
Job is too big for moz 1.0, minusing
Therefore, I think it is too sensitive to address this bug (bug 129122) by
depending on that bug (92921).
Assuming bug 92921 won't got fixed by nsbeta1, should we go with nhotta's
current patch ?
Assignee | ||
Comment 33•23 years ago
|
||
It's awful to think that our model is so broken that the only workaround is to
implement part of the parser inside mail compose code. That seems so wrong.
If we can't get the underlying layout bug fixed (sigh!), easier and more
efficient, I would think, would be to make the html serializer do this (perhaps
on a flag, but really, do we need to preserve <br> tags inside <pre>?)
The html serializer could easily remember when it's inside a <pre> tag, and
output mLineBreak whenever it sees a br inside a pre. No need to parse the
serial html to keep track of the pre tags opening and closing, and it would
happen during a pass that we were doing anyway so it wouldn't slow things down.
I guess the question is whether anyone has time to do it before 1.0.
Keywords: regression
Assignee | ||
Comment 34•23 years ago
|
||
Here is an example of how the fix might look in the html serializer. I have
not tested this, so I don't know if this fix is complete or whether it needs
more work. I don't know for sure that these br tags are marked as dirty (but
they *should* be, and if they aren't it would probably be an easy editor fix;
that way, in the unlikely event that we ever serialize a document that really
should have br tags inside a pre that weren't put there by our editor, we won't
rewrite them).
Of course, if you decide you want to go with this, Tanu should review it and
make sure it's okay for the serializer.
Assignee | ||
Comment 35•23 years ago
|
||
Blech, that should be "return NS_OK" -- forgot to update the patch before I
attached it.
Comment 36•23 years ago
|
||
I applied Akkana's patch and now I am getting the body with CRLF instead of <BR>.
Thank you very much! How can we proceed from here?
Akkana, can I reassign the bug to you since your patch fixes the problem?
Assignee | ||
Comment 37•23 years ago
|
||
As long as you've tested it and it works, I don't mind checking it in -- go
ahead and reassign to me if you'd like. Tanu, could you please review the
patch? Naoki, is there an sr-er in intl who might be willing to sr this?
Otherwise, we can try Kin, but he doesn't have much time for sr'ing.
Comment 38•23 years ago
|
||
Thanks, reassign to Akkana.
No super-reviewer in intl group I think any super reivewer can review it, the
code is not intl specific anyway.
Assignee: nhotta → akkana
Status: ASSIGNED → NEW
Assignee | ||
Comment 39•23 years ago
|
||
Here's the correct patch, attached for review.
Attachment #75642 -
Attachment is obsolete: true
Attachment #75702 -
Attachment is obsolete: true
Comment 40•23 years ago
|
||
First off, sorry for not replying to all these questions sooner. I guess I kinda
dropped a bomb on this bug and then ran away :-)
But I'm glad folks are considering taking the serializer approach. Ordinarily I
hate to add yet another knob to the serializer, but mail performance is important
enough to bite the bullet here.
And I suspected that the fix would be very simple, a la Akkana's patch.
Having examined the patch, my only concern is the checking of the dirty
attribute. This is really an issue for the mail folks to decide: do you want
the serializer to replace all breaks inside pre's with linebreaks? Or do you
only want to replace those breaks which were not "originally" in the pre?
There are gotchas with both approach. If you do what the patch does now, if
someone writes a document with composer, saves it out, and later loads it again
and sends it as mail (say, by copying it and pasting it into mail compose), you
will not get the break replacement you desire. This is because once the breaks
"round trip" through save and reload, the dirty attribute will be gone.
The gotcha if you DO replace all breaks in pre's is that you will be potentially
altering a document in some way the author didn't desire, if they put the breaks
there intentionally.
All in all, given the goals of mail and typical usage patterns (as well as I can
guess them), I would suggest that we simply replace all breaks, and not just the
ones with the special "dirty" attribute.
Comment 41•23 years ago
|
||
Akkana your patch works fine for me too without breaking any build testcase. I
understand "JFrancis" comments above but in general I don't know the exact
usage of dirty attributes.
I feel only for mails we should replace the <br>'s because in that case it's
not visible to the end user and as long as mail is rendered in the correct
format it does not matter what we use behind but in other cases like copy &
paste, we should preserve the <br> tags.
Comment 42•23 years ago
|
||
> This is really an issue for the mail folks to decide: do you want
> the serializer to replace all breaks inside pre's with linebreaks? Or do you
> only want to replace those breaks which were not "originally" in the pre?
>
So which behavior do we have with the current patch?
I agree with Tanu but not sure if we need the mail specific behavior. In mail,
do we want to preserve the original <BR>.?
Need input from mail group, Jean-Francois, Seth?
Comment 43•23 years ago
|
||
In my understanding, we use <pre> only around plain text data like original
message body in a reply. Therefore the line break model we use should not matter
for the user. It's unlikely that the user will edit inside a <pre> tag except
maybe to remove some lines.
I personnaly vote to replace all the <br> by CRLF.
Assignee | ||
Comment 44•23 years ago
|
||
This patch follows Joe's suggestion which J-F also voted for: replace all br
tags in pre, not just dirty ones. I'm fine with that, personally, since my
understanding is that br tags inside a pre are technically incorrect anyway.
My previous patch was just erring on the side of not modifying existing
documents any more than necessary.
This does not address Tanu's suggestion of doing this only in mail. The
serializer currently has no way of knowing whether it's being called from mail,
so that would require adding a new flag to nsIDocumentEncoder, checking the
flag in the serializer, and passing it anywhere in mail that calls the html
serializer. That's a much more invasive patch, and in any case I'm not
convinced that this should be mail only; if it causes I18n problems in mail,
won't it also potentially cause problems in composer? I think we should do it
everywhere, not just in mail.
If someone wants to review this, we can work on getting it in. Alternately, if
anyone wants to make a case for adding the new flag and doing this only in
mail, speak now and I'll make a patch that does that. (Might want J-F's help
for the mail part of it.)
Comment 45•23 years ago
|
||
>if it causes I18n problems in mail,
>won't it also potentially cause problems in composer
That is mail specific, the i18n problem is caused by the mail code which forces
line breaks (bug 132938).
Keywords: dataloss
Comment 46•23 years ago
|
||
I'm hoping we will only be passing this flag through from mail usage. My
understanding is that is the case. If not please let me know.
The bad thing about doing this for all composer usage is that you will severely
affect load times of certain documents in composer. If you have a big
preformatted document with many lines of text, Composer will use breaks to
seperate the lines for reasons we are all sick of hearing about by now. :-)
But if the serializer keeps replacing those with newlines everytime you save,
then composer will have to convert those back to breaks every time you load the
doc. This will casue a delay when loading the doc.
So let's only instruct the serializer to do the translation when we know we want
it (ie, for mail).
Assignee | ||
Comment 47•23 years ago
|
||
Sounds like Joe and Tanu both would rather see this on a flag. So I've made
one -- here's the patch.
This is only the content part of the patch. Mail will have to use this flag
when it calls the serializer wanting this behavior. I'm not sure where that
happens: J-F, can you please do that part of the patch?
Assignee | ||
Comment 48•23 years ago
|
||
Ducarroz: if you want this in, it will be a lot easier if we try to get it in
Friday 3/28, since after that we'll have to go through ADT. Do we want to try
for Friday? Can you attach a patch of the places mail will have to change?
Tanu, does this approach look all right, and can you review this part of it?
Comment 49•23 years ago
|
||
I am working on it...
Comment 50•23 years ago
|
||
I have tested id with the test case and it works well.
Attachment #76365 -
Attachment is obsolete: true
Assignee | ||
Comment 51•23 years ago
|
||
Looks good to me. Can someone review this? Naoki? Tanu?
Comment 52•23 years ago
|
||
The change for nsMsgSend.cpp looks fine to me.
Comment 53•23 years ago
|
||
Comment on attachment 76782 [details] [diff] [review]
same as before (add a new flag) but with mailnews change for using new flag
Change to serializer looks fine to me. r=tmutreja for serializer.
Assignee | ||
Comment 54•23 years ago
|
||
Okay, we have review of both parts now. Seth, is your offer to super-review
still open?
Status: NEW → ASSIGNED
Whiteboard: has-review, needs sr, a
Assignee | ||
Comment 55•23 years ago
|
||
Comment on attachment 76782 [details] [diff] [review]
same as before (add a new flag) but with mailnews change for using new flag
Setting has-review bit, since Naoki reviewed the mail part and Tanu reviewed
the serializer part.
Attachment #76782 -
Flags: review+
Comment 56•23 years ago
|
||
Nominate adt1.0.0 because the dataloss for multibyte cases (e.g. Japanese) when
reply quoting plain text body to html.
Keywords: adt1.0.0
Comment 57•23 years ago
|
||
Removing adt1.0.0. Pls get a sr= before resubmitting as adt1.0.0 nomination.
Keywords: adt1.0.0
Comment 58•23 years ago
|
||
Comment on attachment 76782 [details] [diff] [review]
same as before (add a new flag) but with mailnews change for using new flag
sr=sspitzer
Attachment #76782 -
Flags: superreview+
Comment 59•23 years ago
|
||
Please update this bug with an [adt1] - [adt3] impact rating (or take it off the
list if it doesn't even rate adt3.) Thanks!
Assignee | ||
Comment 60•23 years ago
|
||
Guessing adt2, but Naoki knows the impact better than I do -- Naoki, is that
reasonable?
Whiteboard: has-review, needs sr, a → [adt2] has-review, needs sr, a
Comment 61•23 years ago
|
||
I really think this is a adt1 bug , based on the following
Impact Platform: ALL
Impact language users: ALL mutibyte charset in mail, which mean CJK users 132.8
M 23.7% of totaly current internet users. plus european users who want to use
UTF-8 mail (not sure about the number and %)
Probability of hitting the problem: HIGH, major workflow item
Severity if hit the problem in the worst case: the mail user send out will be
damage (data lost), only the receiver will notice that data lost, the data lost
won't show to the sender in the composer before the mail is send out.
Way of recover after hit the problem:
1. User can send out the mail again by using other mailer.
2. send out mail as plain text mail won't have this problem.
Risk of the fix: (TBD)
Potential benefit of fix this problem:
This is a Regression from 6.1
akkan, I cannot decide the Risk. Please state the risk of the patch, what could
be wrong in the worst case. If every thing go wrong (except crashing), what
could happen with your patch in the worst case ? (so we know what other thing
have no possibilty to be damage with your patch)
Comment 62•23 years ago
|
||
adt1.0.0+ (on ADT's behalf) for approval for checkin.
Assignee | ||
Comment 63•23 years ago
|
||
The risk is fairly low. Possible side effects:
- I believe this conversion will happen on saving as well, so as Joe pointed
out, there's a small performance penalty (which might become noticable on very
long documents) if we save a document to Drafts, then read it in again (saving
will be no slower, but loading it back into the compose window has to convert
line breaks back to br so that could be slower on a long page). It should still
be no slower than loading the same document into the compose window was the
first time.
- If the user edits a page that already has br tags inside pre (quite unlikely
unless it was coming from an older mozilla mailer or editor), those tags will be
converted when the message is sent (which most users would consider desirable,
but it's possible that someone somewhere might not want it).
So the risks should be small; I can't think of any side effects worse than that.
Comment 64•23 years ago
|
||
Is this waiting for 'a=' now?
Assignee | ||
Comment 65•23 years ago
|
||
Yes, waiting for a=. I've sent mail to drivers requesting it.
Whiteboard: [adt2] has-review, needs sr, a → [adt2] has-review, needs a
Comment 66•23 years ago
|
||
Comment on attachment 76782 [details] [diff] [review]
same as before (add a new flag) but with mailnews change for using new flag
a=rjesup@wgate.com for branch checkin. Please make sure it's checked into
trunk as well.
Attachment #76782 -
Flags: approval+
Assignee | ||
Comment 67•23 years ago
|
||
Now it just needs ADT approval. Frank Tang thought it deserved to be ADT1, so
I've changed the ADT ranking accordingly.
Whiteboard: [adt2] has-review, needs a → [adt1] has-review, needs adt approval
Comment 68•23 years ago
|
||
This is already adt1.0.0+ (comment #62).
Assignee | ||
Comment 69•23 years ago
|
||
Fix checked in to both trunk and branch.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Reporter | ||
Comment 70•23 years ago
|
||
With 04/10 1.0.0 brach build, the attached mail (id 73080) is partially garbled.
But reply mail to the testing mail for bug 128908 is okay. Naoki, is attachment
(id 73080) posted for problem illustration or for testing?
Comment 71•23 years ago
|
||
The mail attached to this bug id=73080 is already broken.
Please test using the attachment of bug 128908 id=72558.
Please also try the attachment of bug 132938 id=77353.
Reporter | ||
Comment 72•23 years ago
|
||
Verified as fixed with 4/10 1.0.0 branch build.
Status: RESOLVED → VERIFIED
Comment 73•23 years ago
|
||
adding verified1.0.0 keyword (branch verification) based on bonsai checkin and
QA comments.
Keywords: verified1.0.0
Updated•20 years ago
|
Product: MailNews → Core
Updated•16 years ago
|
Product: Core → MailNews Core
You need to log in
before you can comment on or make changes to this bug.
Description
•