Closed Bug 32336 Opened 24 years ago Closed 24 years ago

Double formatting

Categories

(Core :: DOM: Serializers, defect, P3)

VERIFIED FIXED

People

(Reporter: BenB, Assigned: BenB)

Attachments

(2 files)

<quote src="my mail to akk">
As you know, ScanHTML looks for URLs and structured phrases and *adds*
HTML tags when it thinks it has found one, leaving the original in. I
worried about it, because it is basically double formatting, but in
all cases I could think of, it was no problem in practice. In fact, it
revealed several scary data flows, like quoting of plain text in plain
text based on the displayed HTML, i.e. TXT->HTML->TXT.
But there's one case for which I don't have a solution:

1. The user types (or quotes) "*bla*"
2. It is sent out as HTML (only), so we run ScanHTML over the msg before
sending. The result is "<strong class=txt_star>*bla*</strong>".
3. The recipient also uses Mozilla.
4a. The recipient decides to respond via plain text. At send, the msg,
including the (HTML) quote, runs through nsHTMLToTXTSinkStream, which
values "<strong>" as important information. The result is "**bla**".
or
4b. The recipient decides to respond via HTML. At send, the msg,
including the (HTML) quote, runs through ScanHTML, which values "*" as
important information. The result is "<strong class=txt_star><strong
class=txt_star>*bla*</strong></strong>".
Similar for URLs.

Now, imagine 2 Mozilla users discussing via email/news... I don't know how
to solve this without special-casing (i.e. making nsHTMLToTXTSinkStream
and ScanHTML recognize our inserted HTML tags, which might not even be
possible for a-tags). Do you?
</quote>
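
The round trip described in the quoted mail can be reproduced with a small sketch. This is not the Mozilla code: `scan_html` and `html_to_txt` are hypothetical stand-ins for ScanHTML and nsHTMLToTXTSinkStream, reduced to the single `*bla*` rule, and the regular expressions are assumptions made for illustration.

```python
import re

# Hypothetical stand-in for ScanHTML's structured-phrase pass: wraps *word*
# in <strong class=txt_star> and leaves the asterisks in place.
STAR = re.compile(r"\*(\w+)\*")

def scan_html(text: str) -> str:
    return STAR.sub(r"<strong class=txt_star>*\1*</strong>", text)

# Hypothetical stand-in for the HTML->TXT sink: renders every <strong>
# element as *...*, regardless of where the element came from.
STRONG = re.compile(r"<strong[^>]*>(.*?)</strong>")

def html_to_txt(html: str) -> str:
    return STRONG.sub(r"*\1*", html)

sent = scan_html("*bla*")        # step 2: sender's message as HTML
reply_plain = html_to_txt(sent)  # step 4a: recipient replies in plain text
reply_html = scan_html(sent)     # step 4b: recipient replies in HTML

print(sent)
print(reply_plain)
print(reply_html)
```

Each pass treats the other pass's output as fresh input, so every reply in the thread would add one more layer of markup.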

This should be fixed in the HTML->TXT converter, since the problem also appears if
mozTXTToHTMLConv is not called. E.g. we have the same problem if a 4.x user
writes a URL and sends it as HTML only, and I quote it in plain text.

Proposed fix:
1. Add a class=txt_url to <a> elements generated by ScanTXT.
2. Do nothing in nsHTMLToTXTSinkStream for <em>, <strong>, <code> and <a> if
class=txt_*.
3. Test in nsHTMLToTXTSinkStream::AddLeaf whether mURL.IsEmpty(). If it is not,
check whether mURL is equal to the content. If it is, call mURL.Truncate().

1. and 2. fix the double formatting we cause ourselves, and 3. fixes the double
formatting caused by other mailers like 4.x.
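
A minimal sketch of steps 2 and 3, assuming the class=txt-* spelling that was adopted later in this bug. It is not the nsHTMLToTXTSinkStream implementation: `PlainTextSink`, the `MARKERS` table and the " <url>" suffix format are assumptions made for illustration, and `self.url` only plays the role of mURL.

```python
from html.parser import HTMLParser

# Assumed plain-text markers for inline elements (illustrative only).
MARKERS = {"strong": "*", "em": "/", "code": "|"}

class PlainTextSink(HTMLParser):
    def __init__(self):
        super().__init__()
        self.out = []
        self.closers = []  # closing marker per open element; None marks an <a>
        self.url = ""      # plays the role of mURL

    def handle_starttag(self, tag, attrs):
        if tag not in MARKERS and tag != "a":
            return
        attr = dict(attrs)
        generated = (attr.get("class") or "").startswith("txt-")
        if tag == "a":
            # Step 2: a link we generated ourselves carries no extra information.
            self.url = "" if generated else (attr.get("href") or "")
            self.closers.append(None)
        else:
            # Step 2: no marker for elements we generated ourselves.
            marker = "" if generated else MARKERS[tag]
            self.out.append(marker)
            self.closers.append(marker)

    def handle_data(self, data):
        # Step 3: if the link text already is the URL, forget the URL.
        if self.url and data.strip() == self.url:
            self.url = ""
        self.out.append(data)

    def handle_endtag(self, tag):
        if (tag not in MARKERS and tag != "a") or not self.closers:
            return
        closer = self.closers.pop()
        if closer is None:
            if self.url:
                self.out.append(" <" + self.url + ">")
            self.url = ""
        else:
            self.out.append(closer)

def html_to_txt(html: str) -> str:
    sink = PlainTextSink()
    sink.feed(html)
    sink.close()  # flush any trailing text
    return "".join(sink.out)

print(html_to_txt("<strong class=txt-star>*bla*</strong>"))
print(html_to_txt('<a href="http://example.org/">http://example.org/</a>'))
print(html_to_txt('<a href="http://example.org/">home</a>'))
```

The check in `handle_data` runs for every text leaf, which is the cost the following comment asks about; it is a single emptiness test unless a link is open.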

But 3. has the problem that it will do an "if (mURL.IsEmpty())" for *every* leaf
node. Fortunately, nsString caches the length in a PRUint32, but still... Akk,
do you think that is a problem?
Status: NEW → ASSIGNED
Target Milestone: --- → M16
Target Milestone: M16 → M18
> it will do an "if (mURL.IsEmpty())" for *every* leaf node.

Intelligent Send does the same, so I guess, that's OK.

> 1. Add a class=txt_url to <a> elements generated by ScanTXT.

Done in bug 32420 with "class=txt-link"

> if class=txt_*

class=txt-*

> Proposed fix:

4. In mozTXTToHTMLConv::ScanHTML, skip content of tags with class=txt-*.
Whiteboard: Fixed except 4.
Target Milestone: M18 → M17
Well, it turns out that I'm lucky and don't need to special-case in ScanHTML.
- With my latest changes, I insert e.g. "<strong class=txt-star><span
class=txt-tag>*</span>bla<span class=txt-tag>*</span></strong>". There's no leaf
with *bla* anymore, so I don't "enhance" it.
- I skip <a> elements incl. content anyway.
- Glyph conversion removes the original content.
So there's "by accident" no case where we enhance twice.
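
Why the txt-tag spans make a second pass harmless can be shown with a sketch, assuming the scanner looks at each text leaf separately. `scan_html` and `enhance` are hypothetical stand-ins for the ScanHTML pass, and the tag-splitting regular expression is a simplification, not a real HTML parser.

```python
import re

STAR = re.compile(r"\*(\w+)\*")
TAG = re.compile(r"(<[^>]*>)")

def enhance(leaf: str) -> str:
    # Move each asterisk into its own txt-tag span, so that no single text
    # leaf of the output contains the full *word* pattern.
    return STAR.sub(
        r"<strong class=txt-star><span class=txt-tag>*</span>\1"
        r"<span class=txt-tag>*</span></strong>",
        leaf,
    )

def scan_html(html: str) -> str:
    # re.split with a capturing group alternates text and tags:
    # even indices are text leaves, odd indices are tags.
    parts = TAG.split(html)
    return "".join(p if i % 2 else enhance(p) for i, p in enumerate(parts))

once = scan_html("*bla*")
twice = scan_html(once)
print(once)
print(twice == once)
```

On the second pass the text leaves are "*", "bla" and "*", none of which matches the pattern on its own, so the markup is left alone.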

I also added support for sub: "H<sub>2</sub>O" -> "H_2 O" (That's the convention
according to Richard Zach).
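
As a sketch of the sub convention: the rule below is inferred from the single example above ("_" where the element opens, a space where it closes) and is an assumption, not the checked-in code. `sub_to_txt` is a hypothetical name.

```python
import re

def sub_to_txt(html: str) -> str:
    # Assumed rule, inferred from the one example in this comment:
    # "_" replaces <sub>, a single space replaces </sub>.
    return re.sub(r"<sub>(.*?)</sub>", r"_\1 ", html)

print(sub_to_txt("H<sub>2</sub>O"))  # H_2 O
```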

Akk, can you review and checkin, please? Patch follows.
Whiteboard: Fixed except 4. → Fixed. Waiting for review, approval and checkin.
Keywords: patch
Attached patch: Fix, version 5
Fix finally checked in.

My first checkin to Mozilla. WHOOOHOOO!
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Whiteboard: Fixed. Waiting for review, approval and checkin.
verified in 7/25 build.
Status: RESOLVED → VERIFIED