16507 - Improve Plain text -> HTML

I don't know whether we will directly support those formatting commands during composition, but we should be able to display them correctly. As for the URLs, if I am not wrong, we already support them when displaying a message.

Ben Bucksch (:BenB)

Assignee

Comment 2

•

26 years ago

Yes, URLs are clickable (although the <URL: and > still appear). Why not support this during composition? (Assuming composition is done via the HTML editor and then converted to plain text.) We could stop data loss. At the moment, I'm just "evaluating" (browsing through the code).

Phil Peterson

Comment 3

•

26 years ago

Ben, are you talking about these improvements in the context of reading a plain text message which has *foobar* and then generating <b>foobar</b> for display purposes? If so, rhp@netscape.com is the right owner for the bug. Or are you talking about doing something when composing a message? If so, I'm not clear what you're suggesting. In any case, I don't think we need to do any more with URLs than we do. We recognize them just fine without the extra syntax.

Ben Bucksch (:BenB)

Assignee

Comment 4

•

26 years ago

Both. The first case, together with your mention of URLs, covers the plain text -> HTML part. The conversion after composition from HTML -> plain text, for example, loses URLs and other formatting (URLs being the worst). Let's say I composed this <a href="http://www.mozilla.org">message</a> and decide to send m/a. The plain text part looks like this:

    Let's say I composed this message

but it should look like this:

    Let's _say_ I *composed* this message <URL:http://www.mozilla.org>

I assigned it to me because I wanted to see what I can do. If you want it to be implemented soon, assign it to rhp.
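A minimal sketch of the HTML -> plain text behaviour asked for in this comment, written in Python rather than the C++ of the Mozilla tree. The class name, the function name and the tag-to-marker table are assumptions made for illustration; nothing here is taken from the actual output code.

```python
from html.parser import HTMLParser


class LinkPreservingText(HTMLParser):
    """Flatten HTML to text, keeping each link target as <URL:...>."""

    # Hypothetical mapping; the thread had not settled on these markers.
    MARKERS = {"em": "_", "i": "_", "strong": "*", "b": "*"}

    def __init__(self):
        super().__init__()
        self.out = []
        self.hrefs = []  # stack of link targets for open <a> elements

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.append(dict(attrs).get("href"))
        elif tag in self.MARKERS:
            self.out.append(self.MARKERS[tag])

    def handle_endtag(self, tag):
        if tag == "a" and self.hrefs:
            href = self.hrefs.pop()
            if href:
                # Append the target after the link text instead of dropping it.
                self.out.append(f" <URL:{href}>")
        elif tag in self.MARKERS:
            self.out.append(self.MARKERS[tag])

    def handle_data(self, data):
        self.out.append(data)


def html_to_text(html):
    parser = LinkPreservingText()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Fed the composed message with emphasis and a link, this yields the "should look like this" line quoted above rather than the lossy one.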

Karl Ove Hufthammer

Comment 5

•

26 years ago

Perhaps foo^h^h^hbar in plain text could also be displayed as <strike>foo</strike>bar?

Ben Bucksch (:BenB)

Assignee

Comment 6

•

26 years ago

Attached patch: conversion *bold* -> strong, _italic_ -> em

Ben Bucksch (:BenB)

Assignee

Comment 7

•

26 years ago

rhp, could you please review my patch and check it in? Please review very very carefully. I'm sure, there're still all kinds of bugs. At least, it's egcs proven and seems to work on Linux. (I have no licence for VC :-(.) Deleteme are tmp. comments for you.

Phil Peterson

Comment 8

•

26 years ago

This all sounds like good stuff. I completely agree with Ben's 10/15 comments that sending <a> links through plain text should include the URL. I'm cc'ing rhp since he can probably suggest places where reading plain text and generating HTML could be improved (as with the smiley face for :-) in mozilla) and akkana since she can probably suggest places to improve outputting the editor content model as plain text. Ben, if I were you, I might split this bug up into several smaller ones, but it's your call.

Mike Shaver (:shaver emeritus)

Comment 9

•

26 years ago

I always thought it would be _underline_, *bold*, /italic/. Isn't that the traditional way? (I'd love to see |code| work, too.)

rhp (gone)

Comment 10

•

26 years ago

Yes, the ability to do this type of formatting/text recognition is in the code today and was enhanced from 4.x to 5.0. In 5.0, plain text URL's that are not prefaced with the protocol (i.e. www.netscape.com) will be recognized as URL's. Also, email address that are just typed as rhp@netscape.com will also be link-a-fied. Something I should point out is that even plain text mail display is being done with an HTML capable rendering engine, so you can tweak the text however you want with HTML tags and the display will do the right thing. While I was doing that, I played with having emoticons display as the image they are supposed to represent. So :-) got replaced with a little smiley face. Of course, "purists" told me I was ruining the Internet so that is why I put it on a preference setting. The code to do this is somewhat isolated, but when I get some spare time (ha, ha, ha, ha...ok, I'm done) I wanted to make this thing truly extensible. I would love to be able to have an interface that would let you do whatever you wanted to do to the output before display. The code of interest here is in the file: http://lxr.mozilla.org/mozilla/source/mailnews/mime/src/nsMimeURLUtils.cpp Look at the function: nsresult nsMimeURLUtils::ScanForURLs() and you can see what is going on. Enjoy! - rhp
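The recognition rhp describes (bare `www.` hosts and bare e-mail addresses becoming links) can be sketched roughly as follows. This is an illustration in Python under simplified, assumed patterns; it does not reproduce the logic of nsMimeURLUtils::ScanForURLs(), and `linkify` is a hypothetical name.

```python
import html
import re

# Assumed, simplified patterns: explicit scheme, bare "www." host, bare address.
TOKEN = re.compile(
    r"(?P<url>\b(?:https?|ftp)://[^\s<>\"]+)"
    r"|(?P<www>\bwww\.[^\s<>\"]+)"
    r"|(?P<mail>\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+)"
)


def linkify(text):
    """Escape plain text for HTML display and wrap recognised tokens in <a>."""
    out = []
    pos = 0
    for m in TOKEN.finditer(text):
        out.append(html.escape(text[pos:m.start()]))
        token = m.group(0)
        # Keep sentence punctuation outside the link.
        trimmed = token.rstrip(".,;:!?)")
        tail = token[len(trimmed):]
        if m.lastgroup == "www":
            href = "http://" + trimmed   # guess the protocol
        elif m.lastgroup == "mail":
            href = "mailto:" + trimmed
        else:
            href = trimmed
        out.append('<a href="%s">%s</a>' % (html.escape(href), html.escape(trimmed)))
        out.append(html.escape(tail))
        pos = m.end()
    out.append(html.escape(text[pos:]))
    return "".join(out)
```

Note that a host such as cnn.com is left alone by this sketch, because only the `www.` prefix triggers the protocol guess.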

Akkana Peck

Comment 11

•

26 years ago

Adding myself to cc list. I'm puzzled why this is in libmime -- shouldn't it live in the normal output methods, in nsHTMLToTXTSinkStream.? Does libmime do its own output conversion? It would be pretty easy to add these conversions to nsHTMLToTXTSinkStream.

Phil Peterson

Comment 12

•

26 years ago

Akkana, I think the reading side is in libmime, but the writing side uses your output stuff.

rhp (gone)

Comment 13

•

26 years ago

The rendering side of this lives in libmime. We also do some basic link-a-fying in the compose back end for when you type http://www.netscape.com into an HTML compose window, but don't actually create the link. - rhp

Ben Bucksch (:BenB)

Assignee

Comment 14

•

26 years ago

Does anybody know an RFC that can second/deny Mike's comment? |code| would be very easy to implement. But I need more info about usage; I've never seen that. Is this used for vars or code fragments? Are they aligned in blocks? I built in many safeguards, so none of the following would be converted: |code;| |<code>| |code code| But I could change this if someone can tell me more details or, even better, a spec.

Ben Bucksch (:BenB)

Assignee

Updated

•

26 years ago

Component: Composition → MIME

Summary: Support *bold*, _italic_ and URLs in plain text → Plain text -> HTML: *bold*, _italic_ and URLs

Ben Bucksch (:BenB)

Assignee

Comment 15

•

26 years ago

Obviously, we need to split the HTML -> plain text conversion off. Created bug #16800. Summary changed. Not sure what Component "Networking-Mail" means, so I chose MIME.

Ben Bucksch (:BenB)

Assignee

Comment 16

•

26 years ago

It's just a cosmetic change, but it would be nice, if <URL:...> would be converted to just <a href="...">...</a>, not <URL:<a href="...">...</a>>.
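A small sketch of the cosmetic change requested here: treat the whole `<URL:...>` wrapper as one token, so the wrapper disappears and only the anchor remains. Python, with an assumed pattern and a hypothetical function name; the real scanner works differently.

```python
import html
import re

# Assumed pattern for the "<URL:...>" convention used throughout this thread.
URL_WRAPPER = re.compile(r"<URL:\s*([^<>\s]+)\s*>")


def convert_wrapped_urls(text):
    """Turn '<URL:x>' into a single anchor, dropping the '<URL:' and '>'."""
    out = []
    pos = 0
    for m in URL_WRAPPER.finditer(text):
        out.append(html.escape(text[pos:m.start()]))
        url = html.escape(m.group(1))
        out.append(f'<a href="{url}">{url}</a>')
        pos = m.end()
    out.append(html.escape(text[pos:]))
    return "".join(out)
```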

lchiang

Comment 17

•

26 years ago

Eventually, this will have to be tested. Is there a spec or something that we can follow and write testcases for so that we get good coverage on this feature?

Ben Bucksch (:BenB)

Assignee

Comment 18

•

26 years ago

lchiang, the "spec" is in the source as a comment :-):

*Bold* -> <strong>
DELIMITER: not alphanumeric and not "*"
We're searching for the following pattern:
DELIMITER - "*" - ALPHA - [ some text (maybe more "*"-pairs) - ALPHA ] "*" - DELIMITER.
<strong> is only inserted if the existence of a pair could be verified.
Same for _italic_ -> <em>

This is generally used to *stress* some word or *some phrase*; both cases should be covered. Many others, like * bold *, are excluded by intent, so as not to be triggered by "5 * 3 * 4 = 60" (safety first). Providing test cases would make QA somewhat useless. (My own test cases work, of course.)
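The pattern in this comment can be read as a small character-walking matcher. The sketch below is one interpretation in Python, not the patch itself: treating the start and end of the text as a DELIMITER is an assumption, nested "*"-pairs are not handled, and the stars are dropped from the output, which matches the behaviour described at this point in the thread.

```python
import html


def is_delimiter(ch, tag_char):
    # Assumption: start/end of text (ch == "") also counts as a delimiter.
    return ch == "" or (not ch.isalnum() and ch != tag_char)


def find_phrase(text, start, tag_char):
    """Return the index of the closing tag_char for a phrase opening at
    `start`, or -1 if the DELIMITER/ALPHA conditions are not met."""
    before = text[start - 1] if start > 0 else ""
    if not is_delimiter(before, tag_char):
        return -1
    if start + 1 >= len(text) or not text[start + 1].isalpha():
        return -1
    end = text.find(tag_char, start + 2)
    while end != -1:
        after = text[end + 1] if end + 1 < len(text) else ""
        if text[end - 1].isalpha() and is_delimiter(after, tag_char):
            return end
        end = text.find(tag_char, end + 1)
    return -1


def mark_bold(text):
    out = []
    i = 0
    while i < len(text):
        if text[i] == "*":
            end = find_phrase(text, i, "*")
            if end != -1:  # only insert markup once the pair is verified
                out.append("<strong>" + html.escape(text[i + 1:end]) + "</strong>")
                i = end + 1
                continue
        out.append(html.escape(text[i]))
        i += 1
    return "".join(out)
```

With these rules "5 * 3 * 4 = 60" and "* bold *" pass through unchanged, while "*some phrase*" is marked up.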

Phil Peterson

Comment 19

•

26 years ago

Not sure what HTML you'd generate for |code|. Since we're reading a text/plain message, it would already be rendered in a monospace font. Not much point in wrapping it in a <pre>. shaver, did you have something in mind?

Lisa, I think this is testable in pretty much the same way as colorizing quoted material in plain text messages:
1. Send yourself a plain text message with *foo* /bar/ _baz_
2. Read the message, and note that foo is bold, bar is italicized, and baz is underlined.
Further capabilities (like what we do with |code| or <URL:xxx>) are TBD, I think.

Ben Bucksch (:BenB)

Assignee

Comment 20

•

26 years ago

lchiang, I just saw, *you* wanted to write testcases. Sorry, misunderstood you. BTW: "[ something ]" means "something" is optional. Is the Spec clear enough?

Ben Bucksch (:BenB)

Assignee

Comment 21

•

26 years ago

Phil, <code> comes to my mind :-): <URL:http://www.w3.org/TR/REC-html40/struct/text.html#h-9.2.1>. Plain text is not necessarily rendered as monospaced (at least in 4.x). I'm reading in a proportional font (screwing up tables and ASCII-art :-( ). But even if the display is monospaced, <code> should be rendered differently to distinguish it from (prose) text.

lchiang

Comment 22

•

26 years ago

(Thanks - I will review all this next week)

Ben Bucksch (:BenB)

Assignee

Comment 23

•

26 years ago

See <URL:news://news.mozilla.org/380D04D9.4D86E941@bucksch.org> ("ASCII-art detection" under "Assuming "plain text" or "html mail".." at n.p.m.mail-news from 19 Oct 99 23:55:05 GMT) for ASCII-art proposal.

John Moreno

Comment 24

•

26 years ago

I agree with Shaver, /italic/ and _underline_ are frequently done this way. I know of an amiga newsreader that does this, should I find out what all it does?

Ben Bucksch (:BenB)

Assignee

Comment 25

•

26 years ago

Planb, see shaver's comment on bug 16800. But an RFC or at least an Internet-Draft would be really helpful; I couldn't find any mention.

Ben Bucksch (:BenB)

Assignee

Comment 26

•

26 years ago

Attached patch: Rewritten using nsString, bugs fixed

Ben Bucksch (:BenB)

Assignee

Comment 27

•

26 years ago

Usual warnings apply, this time especially regarding the passing of nsString between functions (leaks etc.). Again: I'm unable to take any responsibility for the code :-(. /italic/ works now. _underline_ is transformed to <em>; since <u> is deprecated, I would have to use stylesheets. Any ideas? |code| is commented out, because it is invisible in monospaced viewers and I remove the "|". It also works the same as *bold*; I need more info (see my notes above).

Phil Peterson

Comment 28

•

26 years ago

RichP, would you code review this please?

Ben Bucksch (:BenB)

Assignee

Comment 29

•

26 years ago

Attached patch: Fixes a leak in last patch.

rhp (gone)

Comment 30

•

26 years ago

Sorry for the delay in this review. Looks good to me! - rhp

Ben Bucksch (:BenB)

Assignee

Updated

•

26 years ago

Target Milestone: M15 → M11

Ben Bucksch (:BenB)

Assignee

Comment 31

•

26 years ago

Cool. Marked M11. Need suggestions for ascii-art detection, see news://news.mozilla.org/380D04D9.4D86E941@bucksch.org or http://www.deja.com/msgid.xp?MID=<380D04D9.4D86E941@bucksch.org> and its reply. BTW: I just noticed, dejanews uses "<" and ">" in URLs. Nice. |StructPhraseHit(nsCAutoString text, PRBool col0, ...| should better be |StructPhraseHit(const nsCAutoString text, PRBool col0, ...|

Ben Bucksch (:BenB)

Assignee

Updated

•

26 years ago

Assignee: mozilla → rhp

Status: ASSIGNED → NEW

Ben Bucksch (:BenB)

Assignee

Comment 32

•

26 years ago

Assigning to rhp, so he can check it in.

rhp (gone)

Updated

•

26 years ago

Assignee: rhp → mozilla

rhp (gone)

Comment 33

•

26 years ago

Hi Ben, I am really suspicious of the Right() call. I don't understand why you don't get garbage on return. If you do get a valid string returned, then the nsCAutoString is returning an allocated string, which means we are leaking. The problem is there are tons of string classes, so I am unsure of the exact behavior. I would probably return a newly allocated string and free it on the calling side. This may be what nsCAutoString is doing (without the free), but I'm not sure. I know that nsString.ToNewCString() will do this, and then you have to free the memory. - rhp

Ben Bucksch (:BenB)

Assignee

Comment 34

•

26 years ago

Rich, I did some research and everything is like I hoped it would be :-). I love C++.

- return copies
  Objects are returned by invoking the copy constructor (1997 C++ Public Review Document, Section [class.copy], <URL:http://www.maths.warwick.ac.uk/cpp/pub/wp/html/cd2/special.html#class.copy>). This is the reason why I "don't get garbage". [stmt.return] <URL:http://www.maths.warwick.ac.uk/cpp/pub/wp/html/cd2/stmt.html#stmt.return>
- Destruction on out of scope
  If an (automatic) object falls out of scope, the destructor is called. [class.dtor], paragraph 10, case 2 <URL:http://www.maths.warwick.ac.uk/cpp/pub/wp/html/cd2/special.html#class.dtor>
- Example
  An example of my usage of objects is at [class.temporary], Paragraph 2 <URL:http://www.maths.warwick.ac.uk/cpp/pub/wp/html/cd2/special.html#class.dtor>
- |ns*AutoString|s free the memory at destruction.
  "The point of nsAutoStrings is [...] to auto-destroy the string when it goes out of scope." (<URL:http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsStr.h#132>) If my understanding of |ns*String| is correct, all |ns*String|s free the memory at destruction, if they own it (see <URL:http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsString.cpp#137> and <URL:http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsStr.h#239>).

Reassigning to me, since checkin is done.

Ben Bucksch (:BenB)

Assignee

Updated

•

26 years ago

Status: NEW → ASSIGNED

Ben Bucksch (:BenB)

Assignee

Comment 35

•

26 years ago

Sorry, the link for the example is wrong. The correct one is: <URL:http://www.maths.warwick.ac.uk/cpp/pub/wp/html/cd2/special.html#class.temporary>

Ben Bucksch (:BenB)

Assignee

Comment 36

•

26 years ago

huftis, I didn't forget your question^H^H^H^H^H^H^Hproposal, but it will be hard to implement, because the code walks char by char through the msg. The other plain text tags enclose the phrase like HTML tags do, so I could just substitute. I think, /I/ won't implement that^H^H^H^Hyour proposal.

Karl Ove Hufthammer

Comment 37

•

26 years ago

> huftis,
> I didn't forget your question^H^H^H^H^H^H^Hproposal,
> but it will be hard to implement. I think, /I/ won't
> implement that^H^H^H^Hyour proposal.

OK, but how about character substitution. Example:
=> --> U+21D2
--> or -> --> U+2192
And perhaps even:
^2 --> U+00B2
1/2 --> U+00BD
(C) --> U+00A9

Ben Bucksch (:BenB)

Assignee

Comment 38

•

26 years ago

Attached patch: Some (more) glyph substitution and exponents

Ben Bucksch (:BenB)

Assignee

Comment 39

•

26 years ago

huftis, the following strings are not substituted:

|TXT   |HTML |Reason
+------+-----+------------
 ->     ←     Char not displayed on Linux (not even a placeholder)*
 =>     ⇐     ditto
 <-     →     ditto
 <=     ⇒     ditto
 (tm)   ™     ditto
 1/4    ¼     is triggered by 1/4 Part 1, 2/4 Part 2, ...
 3/4    ¾     ditto
 1/2    ½     similar
 !=     ≠     used in C/C++(-pseudo)-code
 <=     ≤     ditto
 ...    ...   ditto
+------+-----+------------
*I'd like to know why.

I'm substituting "(c)", "(r)" and "+/-" (using rhp's glyph substitution code), but I'm not even sure if the signs for these display correctly on all platforms (tested only on Linux). You might be interested in <URL:http://www.w3.org/TR/REC-html40/sgml/entities.html>.

rhp, could you please review that and check it in? Tnx.

QA, test this (all my patches) with wild and unusual test cases. Every substitution where it shouldn't be is a bug. File it against me.
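For illustration, the three substitutions Ben says he enabled can be sketched as a simple table lookup. This is Python, not rhp's glyph substitution code; the entity names are the standard HTML 4 ones, and matching only the lowercase spellings quoted in the comment is an assumption.

```python
import html

# Only the strings named in the comment; everything else is left untouched.
GLYPHS = {"(c)": "&copy;", "(r)": "&reg;", "+/-": "&plusmn;"}


def substitute_glyphs(text):
    out = []
    i = 0
    while i < len(text):
        for plain, entity in GLYPHS.items():
            if text.startswith(plain, i):
                out.append(entity)
                i += len(plain)
                break
        else:
            # No table entry starts here: copy one escaped character.
            out.append(html.escape(text[i]))
            i += 1
    return "".join(out)
```

Strings from the "not substituted" table, such as 1/2 or <=, simply fall through to the escaped copy.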

Ben Bucksch (:BenB)

Assignee

Updated

•

26 years ago

Summary: Plain text -> HTML: *bold*, _italic_ and URLs → Improve Plain text -> HTML

Ben Bucksch (:BenB)

Assignee

Comment 40

•

26 years ago

Oh, I forgot the best :-): Exponents are <sup>'ed. Changing Summary.

Karl Ove Hufthammer

Comment 41

•

26 years ago

Wouldn't it be better if ^5 was substituted with ⁵ (U+2075)? If the font didn't contain that glyph, it could *then* be converted to <sup>5</sup> (see BUG #12662 <URL:http://bugzilla.mozilla.org/show_bug.cgi?id=12662>).
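A sketch of the proposal in this comment: prefer the Unicode superscript digit and fall back to markup only when the glyph is unavailable. Written in Python; `has_glyph` is a hypothetical stand-in for a font query, and `<sup>` as the fallback markup is an assumption. The input is assumed to be HTML-escaped already.

```python
import re

# Unicode superscript digits: U+2070, U+00B9, U+00B2, U+00B3, U+2074..U+2079.
SUPERSCRIPT = dict(zip(
    "0123456789",
    "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079",
))

# A caret followed by ASCII digits, directly after a word character.
EXPONENT = re.compile(r"(?<=\w)\^([0-9]+)")


def render_exponents(text, has_glyph=lambda ch: True):
    def replace(match):
        digits = match.group(1)
        glyphs = [SUPERSCRIPT[d] for d in digits]
        if all(has_glyph(g) for g in glyphs):
            return "".join(glyphs)
        # Fallback when the font lacks the superscript characters.
        return "<sup>" + digits + "</sup>"

    return EXPONENT.sub(replace, text)
```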

Akkana Peck

Comment 42

•

26 years ago

I've filed a couple of bugs on some of these entities which aren't displayed in Linux (• is another one). Bug 454 seems to be the main bug concerning these (currently marked as TRIVIAL so I wonder if we're going to be stuck with this bug forever); 5383 concerned ™ but was duped to 454; 16872 is another one on &bull specifically (which might be a different issue since it can be done in gfx instead of requiring a font that has those characters).

Ben Bucksch (:BenB)

Assignee

Updated

•

26 years ago

Depends on: 454

Matthew Tuck [:CodeMachine]

Comment 43

•

26 years ago

If _a_ means underline, you should use underline. If you don't want to use <u>, then use <span> and CSS. It's certainly better to have HTML be document markup rather than presentational, but these ARE distinctly presentational, and I think they should be rendered presentationally. The same for italic - making it em seems wrong to me.

Ben Bucksch (:BenB)

Assignee

Comment 44

•

26 years ago

Matty, what is the plain text equivalent to <em>? I usually don't want to underline something, I want to stress something in different levels. I think <u> is deprecated for a good reason.

Akkana Peck

Comment 45

•

26 years ago

I would suggest that *emphasized text* is used more to indicate emphasis (i.e. <em>) than to indicate bold. I would expect *starred text to be in italic* (or whatever is being used for <em>) and ALL_CAPS text to be in BOLD. I agree that _underlined_text_ should map to <u> since that ascii construct is very specific (and awkward to type, so no one would use it unless they really do mean underline).

Ben Bucksch (:BenB)

Assignee

Comment 46

•

26 years ago

*sigh* The only thing on which we all agreed till now was that *stars* mean bold. I use _this_ usually to emphasize something (but not as much as *bold*), having italic in mind. So I _do_ use it and do not mean underline. And I think others do, too. We can't use <u>, because it's deprecated; we would have to add a stylesheet. My personal opinion is: I want presentational layout to die. And underline is an ugly looking leftover from the times of typesetting machines, where there were no other methods to stress something.

Ben Bucksch (:BenB)

Assignee

Comment 47

•

26 years ago

What, if I add a pref "mail.do_struct_phrase_presentational" defaulting to FALSE, that maps *bold* to <b>, /italic/ to <i> and _underline_ to underlining based on stylesheets? (We already have a "mail.do_struct_phrase" (and "mail.do_glyph_substitution" BTW). Maybe, we could compensate them and change the names.)

Matthew Tuck [:CodeMachine]

Comment 48

•

26 years ago

Ben, you shouldn't use <b> any more than <u>. It may not be deprecated, but it's still presentational, and hopefully it will be deprecated in future. I think you can use a span and have a style attribute that will allow you to do all with CSS. Regarding translating em and strong to plain text, I'm not really sure, but it would be really nice to use a stylesheet. That sounds quite complicated though. Maybe just use bold. I guess it seems they aren't used consistently, in which case your original mappings make sense. This way at least the user can edit their plaintext stylesheet to reflect how they want messages displayed.

Ben Bucksch (:BenB)

Assignee

Comment 49

•

26 years ago

Matty, a stylesheet for all formatting sounds like a good idea to me. Unfortunately, I never really worked with stylesheets. May take some time till code follows. HTML -> plain text is offtopic here now, see bug #16800. I don't know how you want to use a stylesheet for HTML -> plain text conversion, but this is too late anyway, I think (unless this is a really good idea), because I already have working code. I just need to make last checks.

Matthew Thomas, usability weenie

Comment 50

•

26 years ago

My two cents, as maintainer of the news:alt.ascii-art FAQ (http://cantua.canterbury.ac.nz/~mpt26/art/ascii/faq/):

(1) ASCII artists (and others who use ASCII art sigs, etc) are going to be *extremely* annoyed if the formatting-detection algorithm guesses wrong. And it *will* guess wrong occasionally, no matter how good it is. (What happens, for example, if I insert a /*C comment*/ ...) ASCII artists like Mozilla (in its current incarnation as Netscape Messenger) because, unlike MS Outlook, it leaves ASCII art alone. And they'd rather it stayed that way. As a compromise, I would suggest doing it the way (IIRC) XEmacs' mail reader does it. That is, apply the formatting, but *leave the special characters there* (perhaps dim them, but leave them there). That way things won't get too mangled if the algorithm guesses wrong.

(2) When I use /slashes/, sometimes I mean <em>, and sometimes I mean <cite>. You can't know which I mean, because text isn't structural markup. So I see no option except to use CSS italics. Similarly for *asterisks*, you can't know whether I mean <strong> or <vector-space> or whatever, so CSS bold is really the only option.

(3) If URLs without protocols are going to be detected as http addresses, as rhp@netscape.com suggests, surely the wrong thing is going to happen for these:
* you can download this by anonymous FTP at foo.bar.net
* to begin, telnet to library.canterbury.ac.nz and log in as "guest"
* you are cordially invited to mozilla.party three.oh
And please don't tell me you're just going to do it for domains starting with `www.', or I'll scream.

-- mpt (http://critique.net.nz/ -- not a www. in sight)

Mike Shaver (:shaver emeritus)

Comment 51

•

26 years ago

What he said. I regularly send stuff like this: ``we should just prune entries matching /(mozilla\.org|netscape\.com)$/'' ``go into your srcdir and rm *TitledButton*, then update your tree'' and I mean neither italics nor emphasis. DWIM is very hard to get right.

Ben Bucksch (:BenB)

Assignee

Comment 52

•

26 years ago

mpt and Mike, /(mozilla\.org|netscape\.com)$/'' would not be changed, as described to lchiang. But rm *TitledButton* and /*This comment*/ would. I'll leave the plaintext tags in the text. mpt, what is bad with em tags (maybe with a type attribute), if you have a stylesheet? I think disabled people wouldn't like CSS bold. (Would they like us to leave plain text tags in? Maybe one more pref? :-( )

Matthew Tuck [:CodeMachine]

Comment 53

•

26 years ago

Agreed that the characters should not disappear. Disability issues should go away then, and there's always the chance to apply a stylesheet to a plaintext message. What I was referring to about stylesheets on plaintext messages applies both to this, and it would also be nice to turn quoting into using <BLOCKQUOTE> (do we do this already?) Sorry about being offtopic, I got confused for a minute.

Ben Bucksch (:BenB)

Assignee

Comment 54

•

26 years ago

matty, sorry, I don't understand anything, what you're saying, but maybe I'm just too tired.

Mike Shaver (:shaver emeritus)

Comment 55

•

26 years ago

I don't see any explanation of why /[abcdef0123456789]/ wouldn't get ``fixed'' to regex above. Can you elaborate?

Ben Bucksch (:BenB)

Assignee

Comment 56

•

26 years ago

Mike, the explanation in question is (to make sure we're speaking about the same thing) for *bold* -> <strong>:

DELIMITER: not alphanumeric and not "*"
We're searching for the following pattern:
DELIMITER - "*" - ALPHA - [ some text (maybe more "*"-pairs) - ALPHA ] "*" - DELIMITER.
<strong> is only inserted if the existence of a pair could be verified.

What do you mean with "/[abcdef0123456789]/"? Should *I* evaluate that or take it as a string? If this or "/(mozilla\.org|netscape\.com)$/" appears exactly that way (or without the quotes) in the msg, my code would leave it. But if you mean whether "/abc[abcdef0123456789]def/" (with or without the quotes) would be changed, the answer is yes. (But neither "/abc[abcdef0123456789]789/" nor "/abc[abcdef0123456789]/" would be changed.)
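The five cases in this comment can be checked against a regular-expression restatement of the rule, applied to slashes. This is a sketch in Python, assuming ALPHA means an ASCII letter and that a phrase stays on one line; it is a reading of the description, not the code from the patch.

```python
import re

ITALIC = re.compile(
    r"(?<![A-Za-z0-9/])"                 # DELIMITER before the opening slash
    r"/([A-Za-z](?:[^\n]*?[A-Za-z])?)/"  # "/" ALPHA [ text ALPHA ] "/"
    r"(?![A-Za-z0-9/])"                  # DELIMITER after the closing slash
)


def italicize(text):
    """Wrap verified /phrases/ in <em>, dropping the slashes."""
    return ITALIC.sub(r"<em>\1</em>", text)
```

Under these assumptions only the phrase that both starts and ends with a letter is converted; the regexes that begin with "[" or "(" or end with a digit or "]" are left as they are.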

Ben Bucksch (:BenB)

Assignee

Comment 57

•

26 years ago

It's not yet clear what "change" means. At the moment, "/" is substituted with "<em>". But as you pointed out, it would do the wrong thing for "rm *diff*". The only solution I see is, as mpt suggested, to leave the plain text tags in and dim them. Mike's idea of content-before in CSS to readd the TXT tags sounds very nice, but I'm not sure what would happen if we reply with HTML. The TXT tags *should* still be there, even if the recipient uses a non-CSS-capable HTML viewer.

Ben Bucksch (:BenB)

Assignee

Comment 58

•

26 years ago

mpt, start to scream: <URL:http://lxr.mozilla.org/seamonkey/source/mailnews/mime/src/nsMimeURLUtils.cpp#323> But I don't know, what so wrong with guessing www.bucksch.org would really be a reference to http://www.bucksch.org.

Ben Bucksch (:BenB)

Assignee

Comment 59

•

26 years ago

Matty, I still don't understand your comment. I think, when we /remove/ the plaintext tags, this will help disabled persons (how will "..we slash remove slash the.." sound?). The interface (whatever this may be) can do what it thinks is best with tags like em and strong, and they're well-known. This is one of the reasons why I vote for <em> or similar and not any "font-style: italic". I never heard of stylesheets for plain text. What is that?

Matthew Tuck [:CodeMachine]

Comment 60

•

26 years ago

A stylesheet for plaintext messages, as in, by the time you do Plain Text -> HTML, it will be styleable. Your comment about disability is true, but for all users, the potential loss of characters is too great a chance no matter what scheme you adopt.

chris hofmann

Updated

•

26 years ago

Target Milestone: M11 → M12

chris hofmann

Comment 61

•

26 years ago

any fix in hand for this? moving to m12. move back if it's ready to resolve in the next day or so.

Ben Bucksch (:BenB)

Assignee

Comment 62

•

26 years ago

I'll try to get rid of the hardcoded quote formatting, too, and use the stylesheet.

Matthew Thomas, usability weenie

Comment 63

•

26 years ago

*** mozilla@bucksh.org said,
> `what is bad with em tags (maybe with a type attribute), if you have a
> stylesheet?'.

What is bad with them is that:
- you'd convert "I saw /Gone with the wind/" to "I saw <em>Gone with the wind</em>", when it should be "I saw <cite>Gone with the wind</cite>"
- you'd convert "some random value of /x/" to "some random value of <em>x</em>", when it should be "some random value of <var>x</var>".

See? /slashing/ is presentational content. You can't tell which semantic thingy I mean by it, so the only honourable thing to do is to go (or whatever) instead. The same with *asterisking* as either <strong> or <vectorspace> (or whatever the MathML tag for a vector space is). You have to do . I'm an ardent defender of the Internet rights of disabled people, but you have to give up the semantic content in this case, simply because you don't know which semantic content was intended. This is plain text we're dealing with, remember, so it's not as if we're making things any worse. (Are you going to try to turn >s into <blockquote class="cite"></blockquote>, for example? That would be similarly confusing for disabled people ...)

*** mozilla@bucksh.org wrote:
> mpt, start to scream:
> <URL:http://lxr.mozilla.org/seamonkey/source/mailnews/mime/src/nsMimeURLUtils.cpp#323>
>
> But I don't know, what so wrong with guessing www.bucksch.org would really be
> a reference to http://www.bucksch.org.

Aaaaaaaaaaaaargh! (Further screaming available on request.) Because it will lead people to assume that if something doesn't start with `www.', it's not a Web address. It's not going to highlight cnn.com, or slashdot.org, or home.netscape.com, for example. That sucks lots.

*** I've been drawing some ASCII art (a mockup of a new Mozilla prefs dialog, actually), and keeping it in my Drafts mail folder. I was not amused to access the message using the latest nightly build, and find the following.
* {anything}@{anything} is assumed to be an e-mail address, even if {anything} has non-e-mail-characters such as `)' in it. What's up with that? What's wrong with "mailto:"?
* The smiley algorithm interfered with my picture (excerpt attached). Make the smiley faces go away. NOW.

Matthew `a smiley killed my father' Thomas

Matthew Thomas, usability weenie

Comment 64

•

26 years ago

Attached file: damage done to ASCII art

Ben Bucksch (:BenB)

Assignee

Comment 65

•

26 years ago

Matthew Thomas, thanks for your notes.

First: I'm not responsible for the URL and smiley things :-). Nevertheless, a discussion on IRC brought up the same problem, and I fixed it. I also readded the plaintext tags to the generated msg (the stars in *bold* are included in the msg content now). I didn't use content-before/after, in part because it might get lost in an HTML reply viewed in other MUAs. The code lies on my machine, because the tree is closed.

At the moment, we (with my version) search for the following patterns: ":-)", ":-(", ";-)", ";-P", " :)" and " :(". (":(" occurs in C++ code.) They are still very wide, because they have to catch e.g. "... bla :-).". Remember, you can always disable it; we even have separate prefs for glyph substitution and structured phrases.

Just look at your own post to answer why we try to find email addresses w/o "mailto:" (BTW: my domain is "bucksch.org", not "bucksh.org"). Can you point me to the RFC and place where the valid characters of email addresses are defined? "(" is at least valid in general URLs.

>Because it will lead people to assume that if something doesn't start with
>`www.', it's not a Web address.

Sorry, but this reasoning is broken, for everything. A <= B need not be true if A => B. There's no problem if it doesn't highlight cnn.com; it's no valid url. All we do is guess that www.cnn.com is one.

Structured phrases:

> - you'd convert "some random value of /x/" to "some random value of
> x", when it should be "some random value of <var>x</var>".

This is a misuse of the convention, because |code| is used for marking code fragments. Nevertheless, I see nothing wrong in emphasizing "x".

> - you'd convert "I saw /Gone with the wind/" to "I saw Gone with the
> wind", when it should be "I saw <cite>Gone with the wind</cite>"

ditto (with you should have used """, but strong is not bad).

> This is plain text we're dealing with, remember, so it's not as if we're
> making things any worse.

I have to give that back. Speaking of structured phrases: I just *add* markup, the content remains now. And I hope reading humans will be able to correct the 1% of wrong markup we may cause, although I try to avoid wrong markup if possible.

> Are you going to try to turn >s into <blockquote class="cite"></blockquote>

I don't understand that.
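The smiley pattern list in this comment, including the leading space that keeps ":(" in C++ code from matching, can be sketched as a plain scan. Python; `find_smileys` is a hypothetical helper that only reports matches and makes no claim about what the real code replaces them with.

```python
# Patterns as listed in the comment; the two short forms need a leading space.
SMILEYS = (":-)", ":-(", ";-)", ";-P", " :)", " :(")


def find_smileys(text):
    """Return (index, pattern) pairs for each smiley pattern found."""
    hits = []
    i = 0
    while i < len(text):
        for smiley in SMILEYS:
            if text.startswith(smiley, i):
                hits.append((i, smiley))
                i += len(smiley)
                break
        else:
            i += 1
    return hits
```

The patterns are deliberately wide enough to catch "bla :-)." with trailing punctuation, while "a?b:(c)" yields no hit because nothing precedes the colon with a space.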

Ben Bucksch (:BenB)

Assignee

Comment 66

•

26 years ago

Corrections:

> All we do is guess, that www.cnn.com is one.
All we do is guess that www.cnn.com can be transformed into one and that it was the intention of the author to do so.

> Nevertheless, I see nothing wrong in emphasizing "x".
Nevertheless, I see nothing wrong in emphasizing "/x/".

Ben Bucksch (:BenB)

Assignee

Comment 67

•

26 years ago

I just noted another problem: "<em>/italic/</em>" is doubled markup, and conversion back to plaintext (done as usual at Mozilla Mailnews) will result in "//italic//". (We will stop here, because I don't convert "//italic//".) Of course, we can avoid that in our own conversion, but not in the ones of other mailers (mail in plaintext, reply via HTML, reply via plaintext). Maybe a /italic/" is better. But this will take the ability from other mailers to pretty-up the text (while it doesn't force to do so).
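The round-trip problem described here can be reproduced with a toy pair of converters. Both functions below are hypothetical Python sketches: the first keeps the slashes inside the markup as comment 65 describes, the second stands in for any mailer that writes emphasis back as slashes.

```python
import re

PHRASE = re.compile(r"(?<![\w/])/([A-Za-z](?:[^/\n]*[A-Za-z])?)/(?![\w/])")


def text_to_html(text):
    # Keep the slashes in the content and add the markup around them.
    return PHRASE.sub(r"<em>/\1/</em>", text)


def naive_html_to_text(html_text):
    # Assumed behaviour of a converter that turns <em> back into slashes.
    return re.sub(r"</?em>", "/", html_text)
```

One pass through both directions doubles the slashes, and the doubled form is then no longer recognised, which is the "we will stop here" behaviour mentioned above.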

Matthew Thomas, usability weenie

Comment 68

•

26 years ago

1. So the style-triggering characters are being left behind when the styles are applied. Good. One less thing for GNUS users to gloat about.

2. In the same vein, why not just colorize smileys instead of replacing them with a graphic? For example in ":-)", make the ":" blue (eyes), the "-" brown (nose), and the ")" dark red (mouth)? That would get the effect across, without corrupting accidental smileys, in the same way as making *this* bold gets the effect across without corrupting *accidental* strings.

2. Here's a test case for you: Senator John Smith (R) said today that he was not amused at Mozilla thinking his name was a registered trademark ...

3. > But this will take the ability from other mailers to pretty-up the text (while
> it doesn't force to do so).
Let's get this straight ... are we applying all this formatting to *outgoing* messages, or just to the display (and not the replying-to or forwarding) of *incoming* messages? I sincerely hope it's just the latter ... people won't be happy if, Eudora-Mail-like, you're misrepresenting the contents of forwarded messages.

4. What I meant by converting >s to <blockquote>s is converting, for example,
> > foo!
> bar!
to
<blockquote class="cite"><blockquote class="cite">foo!</blockquote>bar!</blockquote>.
But it's a bad idea, so don't do it. :-)

5. Section 6.1 of http://www.faqs.org/rfcs/rfc822.html says that the local part of an e-mail address must be word *("." word) and I'd be surprised if `word' included brackets (it's not defined in the RFC). But that might be outside the scope of this bug ...?

Ben Bucksch (:BenB)

Assignee

Comment 69

•

26 years ago

Matthew Thomas, I just noticed that you brought that discussion to alt.ascii. I'm not sure if I like the discussion to be that open.

2. (The first) That's rhp's decision, but I like the graphical smileys. If they start to annoy me, I'll disable Glyph substitution.

2. (The second) Tomorrow, somebody will say ":-)" is a valid word in some language :-). Dunno what to do with that. Anyone else? (Two "2."s: That's the reason why we have HTML mail :-).)

3. When I started implementing this, I assumed I would only change the display. Later, Akk told me that we generate the quote in an HTML reply from the displayed msg. This would include the smiley reference, which would break. Akk?

4. Can you explain why that's a bad idea? Don't tell me it's used in ASCII-art.

5. "word" is defined in Section 3.3. ("(" may occur in the local part of an email address.)

Matthew Thomas, usability weenie

Comment 70

•

26 years ago

0. Because (a) Mozilla is open source; (b) along with Forte (Free/non-free) Agent, Mozilla is a popular choice for ASCII artists because it doesn't munge ASCII art, *yet*; and (c) posting there was the best way I could think of to solicit the opinions of the ASCII art community. I might be clever, but I can't necessarily think of every impact this smoke-and-mirrors stuff will have on ASCII art. The newsgroup as a whole has a better shot at being able to. Open source, y'see.

3. Having the styles inserted only for display, not for replying/forwarding, would solve the //italic// problem you described earlier, wouldn't it?

4. Because various clients use different symbols for quoting -- some use ">", some "> ", others use ": " or "| ", some let the user select the symbol. You're going to have a difficult job working out whether something's quoted or not, and the net result will probably be making it look *more* of a mess.

There's a point, I think, where you've got to accept that most people who use plain-text mail are doing so because they *want* plain-text mail. If they want fancy formatting, they'll use HTML mail. So don't try to force too much fancy formatting on their plain-text messages. Linkifying: fine. Emboldening/italicizing: ok. Colorizing: perhaps. Inserting, deleting, or changing characters: uh uh. Going too far.

Ben Bucksch (:BenB)

Assignee

Comment 71

•

26 years ago

> (a) Mozilla is open source

Ah. Really? :-) (Note: This was just a joke.) Posting to alt.ascii-art leads to a biased result, because groups with other interests are not appropriately represented. We can't get a vote from the whole usenet before each feature. If I had known what discussion this feature would lead to, I would not have started to implement it. And I don't know if that is in the best interest of our users.

> 4. Because various clients use different symbols for quoting -- some use ">", some "> ", others use ": " or "| "

The algorithm in Netscape Messenger 4.x works quite well.

> There's a point, I think, where you've got to accept that most people who use plain-text mail are doing so because they *want* plain-text mail.

I disagree. I'm almost certain most users use plain text mail only for compatibility reasons. If not, the web would be plain text plus links.

Akkana Peck

Comment 72

•

26 years ago

2. I don't much like the graphic smileys, but I'm sure someone will, and maybe I'd get used to them. I do think I would like the ability to have (R) turned into a real trademark symbol (but mozilla doesn't do trademark symbols on Unix, sadly, so until that bug is fixed, if we did that substitution we'd see nothing at all instead of the (R)).

3. If you see it in the mail window, then that's what the message actually contains as far as mozilla is concerned, and if you reply to it, that's what you'll be replying to and you have to trust the output system to convert it back in a reasonable way. The smiley glyph is a good point -- there's no code there now to detect it and turn it back into a smiley. Maybe we need to add that.

4. Yes, plaintext quotes of recognized formats will be turned into blockquote cites. Currently, the only "recognized format" is a leading >. This shouldn't be a problem for people who use other quote characters (e.g. leading |) -- we'll just keep those as plaintext quotes just as in the no-substitution case. 4.x didn't understand quote characters other than "> " either (e.g. it didn't change them to the user-defined quote color and font), but there didn't seem to be many complaints about that.

Re why people use ascii mode: Put me down as someone who uses plaintext for compatibility reasons. If we really do get to the point where we have reliable substitution in both directions, I might embrace the new semi-ascii mode. But it seems fairly clear that we need to keep a "complete ascii" mode for people who prefer that mode, in which no substitution at all is done, and one can rely on ascii art, tables, etc. coming through unchanged. In fact, we should have an easy way of switching modes (something in the View menu, probably), so that if I normally use substitution mode but someone sends me something that obviously has ascii art in it and it's not displaying correctly, I can toggle a switch and see the original message untouched.
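
For point 4, a minimal sketch of what "only a leading > is a recognized format" could look like. The function names are hypothetical and this is not the actual Mozilla code; it assumes each ">" may be followed by one optional space, and it treats everything else (including a leading "|") as ordinary text.

```python
import html
import re

def quote_depth(line: str):
    # Count leading ">" markers; each may be followed by one optional space.
    prefix = re.match(r"(?:> ?)*", line).group(0)
    return prefix.count(">"), line[len(prefix):]

def quotes_to_html(text: str) -> str:
    out, depth = [], 0
    for line in text.splitlines():
        new_depth, body = quote_depth(line)
        while depth < new_depth:  # open deeper quote levels
            out.append('<blockquote class="cite">')
            depth += 1
        while depth > new_depth:  # close finished quote levels
            out.append("</blockquote>")
            depth -= 1
        out.append(html.escape(body))
    out.extend("</blockquote>" for _ in range(depth))
    return "\n".join(out)
```

Note that the quote markers themselves are dropped here, which is exactly the kind of character removal debated elsewhere in this bug; a ">" in the middle of a line is left alone (only escaped).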

Akkana Peck

Comment 73

•

26 years ago

Ugh. We were just talking about smiley substitution on IRC, and I realized that this will totally break an idiom I use a lot:

Some text (with a little joke :-) and some more text

In other words, I use the ) in a smiley to close a parenthetical expression as well as to be the mouth of the smiley, because I don't like having two close-parens next to each other. Now people using mozilla will see all my parens as being unbalanced. :-( Parsing for paren balancing to see if a smiley is being used this way sounds nontrivial, though.

Matthew Thomas, usability weenie

Comment 74

•

26 years ago

Yes, I use the bracket-smiley combo too (like this:-). Which is one of the reasons I suggest a smiley be colorized, rather than converted to a graphic. The bottom line is that I have little problem with various styles being applied to certain strings, but I draw the line at actually changing the text. And I would apply the same principle to blockquote citing, because it'll do the wrong thing to this (for example):

> > > IMPORTANT ANNOUNCEMENT!!! < < <

Anyway, having a toggle item in the View menu for `Smart Styling' sounds like a good idea, as long as its value is persistent (it stays the same between messages and between sessions).

Ben Bucksch (:BenB)

Assignee

Comment 75

•

26 years ago

Akk,

3. I don't think I like the way we create HTML replies. Preparing display and creating content are different tasks. We *have* to convert the smiley substitution back: I don't think Outlook Express will be able to use the "chrome://" URLs - data loss bug. (R) is a similar problem. If we currently don't display ®, how can we trust that all recipients do? Structured phrases are not that bad, since I don't remove content anymore, but if we misstyle quotes, others could smile about us and our users; confusion and flames are possible reactions, too.

4. I have no problem with "> > > foo < < <" being interpreted as a quote: this is because the sender ignored widely used internet rules and no real content is lost.

> Now people using mozilla will see all my parens as being unbalanced. :-(

They *are* unbalanced.

Matthew Thomas, I think what you want is a pref. The idea behind the toggle-menuitem/-icon is exactly the per-message basis.

Matthew Thomas, usability weenie

Comment 76

•

26 years ago

When replying, Mozilla shouldn't be converting the converted text->HTML back to some semblance of the original text; it should be using the exact text from the message source. Wouldn't that avoid a whole lot of hassle with data loss or corruption? The text->HTML conversion should be used only for display purposes, IMO.

4. I don't think there's such a thing as a `widely used Internet rule' for quoting. Sure, make >ed text smaller, italic, green, or whatever. But 3 + 2 > 4 ... which is why you should leave the > symbol there, because it *might* be being used for something other than quoting, as I just showed in that equation. Just like the *asterisks* or the /slashes/ *might* be being used for something other than emphasis.

And yes, Ben, I do want a pref. And I do want it in the View menu, not hidden away in the prefs dialog, for the same reason rot13 belongs in the View menu and not in the prefs dialog -- because it's something you generally need instant access to.
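
The "style it, but leave the characters there" principle can be sketched like this. It is an illustration only: the mapping of "*" to `<strong>` and "/" to `<em>`, and both function names, are assumptions made for the example. The point is the invariant that removing the tags gives back the exact source text.

```python
import html
import re

STYLE_TAGS = {"*": "strong", "/": "em"}  # assumed mapping, for illustration
PHRASE = re.compile(r"(?<!\w)([*/])(\w+)\1(?!\w)")

def style_for_display(text: str) -> str:
    # Escape first, then wrap each phrase WITH its delimiters in a tag,
    # so no character of the original message is inserted or deleted.
    escaped = html.escape(text, quote=False)

    def wrap(match):
        tag = STYLE_TAGS[match.group(1)]
        return f"<{tag}>{match.group(0)}</{tag}>"

    return PHRASE.sub(wrap, escaped)

def strip_tags(markup: str) -> str:
    # Inverse for this sketch: drop the styling tags and undo the escaping.
    return html.unescape(re.sub(r"</?(?:strong|em)>", "", markup))
```

Because the delimiters survive, a reply could in principle be built from `strip_tags(...)` or, simpler still, from the untouched message source.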

Ben Bucksch (:BenB)

Assignee

Comment 77

•

26 years ago

Akk, "(bla :))"-problem: there was a discussion on alt.ascii-art, thread "Smiley-face query" <URL:http://www.deja.com/viewthread.xp?search=thread&recnum=%3c68j7dn$ccq@tron.sci.fi%3e%231/1> about this. One quote: "In any case, it'd say we can all agree on the following points: [...] Emoticons cannot close a parenthesis." (<URL:http://x21.deja.com/getdoc.xp?AN=312573064>).

Ben Bucksch (:BenB)

Assignee

Comment 78

•

26 years ago

I've changed the smiley detection code (in my tree) to avoid problems with ":-))" etc. It searches for the following pattern: SPACE - Smiley [- [.|,|;]] - WHITESPACE. "WHITESPACE" means nsString::IsSpace returns true; "[- [.|,|;]]" means that optionally either ".", "," or ";" may appear after the smiley (and stay in the msg). Everything else is ignored. Any objections?
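
As a rough regular-expression reading of that pattern (not the code in the tree): the set of smileys below is an assumption, `\s` stands in for nsString::IsSpace, and a smiley at the very end of the text is not matched because the pattern as stated requires trailing whitespace.

```python
import re

SMILEYS = [":-)", ":-(", ";-)"]  # assumed set, for illustration only

SMILEY_RE = re.compile(
    r"(?<= )("                                   # SPACE before the smiley
    + "|".join(re.escape(s) for s in SMILEYS)    # the smiley itself
    + r")(?=[.,;]?\s)"                           # optional . , ; then WHITESPACE
)

def find_smileys(text: str):
    # Return the smileys that would be substituted; punctuation is untouched.
    return [m.group(1) for m in SMILEY_RE.finditer(text)]
```

With this reading, " :-)) " is ignored, while "Some text (with a little joke :-) and some more text" still matches, so the parenthesis-closing idiom is not handled.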

Ben Bucksch (:BenB)

Assignee

Updated

•

26 years ago

Depends on: 18718

Ben Bucksch (:BenB)

Assignee

Comment 79

•

26 years ago

That last comment was a bit ambiguous. My most recent changes avoid problems with *smileys* like " :-)) ", " :-(( " etc., not the "Bla (bla :-) bla." (instead of "Bla (bla :-)) bla.") problem. I created bug #18718 and a dependency for the "graphical smiley etc. in reply" problem.

Ben Bucksch (:BenB)

Assignee

Comment 80

•

26 years ago

Attached patch [Preliminary] GlyphSubstitution, code, quote, class attributes — Details — Splinter Review

Ben Bucksch (:BenB)

Assignee

Comment 81

•

26 years ago

Thanks to Daniel Bratell for pointing me to the (original) post.

>> Warren Harris wrote:
>>> We probably need an extensible way (based on protocols modules) to recognize strings as URLs. I don't know whether that needs to be a special method, or whether your code can just look for "<alphanumeric>*:" up to the next whitespace character, and then just try to construct a URL from it. If it succeeds, then highlight it, if not, don't.
> Ben Bucksch wrote:

Warren Harris wrote:
> Ben Bucksch wrote:
> > "<alphanumeric>*:<non-whitespace>*" triggers far too often.
> > Users already complain, that "file://" urls are turned into links.
> >
> > Do you have an idea, how to make this dynamic?
>
> Yes, I think you/they should look for the pattern suggested above, and then try calling NS_NewURI. This should only succeed if the protocol exists, and the string is a syntactically valid URL.
>
> I guess we should special-case file: since that's almost always a link to the sender's machine, and not the receiver's. Alternatively, we could call the nsIFileChannel::Exists method and only highlight the URL if it's there.

I like this idea. Since the main purpose of nsMimeURLUtils is this recognition, I started rewriting the class.
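
The "match a loose pattern, then let the URL parser decide" approach might be sketched as below. This uses Python's `urllib.parse` instead of NS_NewURI, and `KNOWN_SCHEMES` is a hypothetical stand-in for "the protocol exists"; `file:` is simply left out of it as the special case. Trailing punctuation is not handled.

```python
import re
from urllib.parse import urlsplit

# Hypothetical stand-in for the set of registered protocol handlers.
KNOWN_SCHEMES = {"http", "https", "ftp", "mailto", "news"}

# Loose candidate pattern: "<alphanumeric>*:<non-whitespace>*".
CANDIDATE = re.compile(r"[A-Za-z0-9]+:\S+")

def is_linkable(candidate: str) -> bool:
    try:
        parts = urlsplit(candidate)  # raises ValueError on malformed input
    except ValueError:
        return False
    if parts.scheme not in KNOWN_SCHEMES:  # unknown protocol, or file:
        return False
    return bool(parts.netloc or parts.path)

def find_links(text: str):
    # Keep only the candidates the parser and the scheme check accept.
    return [m.group(0) for m in CANDIDATE.finditer(text) if is_linkable(m.group(0))]
```

The loose pattern alone would also pick up strings like "Note:this"; the second step is what filters them out.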

Ben Bucksch (:BenB)

Assignee

Comment 82

•

26 years ago

WOW: txt2html.pl <http://www.thehouse.org/txt2html/>

chris hofmann

Updated

•

26 years ago

Target Milestone: M12 → M14

chris hofmann

Comment 83

•

26 years ago

m14. let me know if there are more changes ready for this in the next couple of days and we can see about getting them into m12. maybe this is even post beta1?

Ben Bucksch (:BenB)

Assignee

Updated

•

26 years ago

Status: ASSIGNED → RESOLVED

Closed: 26 years ago

Resolution: --- → FIXED

Target Milestone: M14 → M12

Ben Bucksch (:BenB)

Assignee

Comment 84

•

26 years ago

chofmann, it has been checked in together with bug #19251 recently. M12 FIXED.

Ben Bucksch (:BenB)

Assignee

Comment 85

•

25 years ago

Docs at <http://www.bucksch.org/1/projects/mozilla/>

URL: http://www.bucksch.org/1/projects/moz...

lchiang

Comment 86

•

25 years ago

I think it's safe to mark this verified. The code is there. Any specific bugs we find will be filed separately. asj@ipa.net has graciously offered to test this feature. He has started writing tests at: http://www.mozilla.org/quality/mailnews/tests/mn-html-to-txt.txt

Status: RESOLVED → VERIFIED

QA Contact: lchiang → asj

Myk Melez [:myk] [@mykmelez]

Updated

•

21 years ago

Product: MailNews → Core

Nobody; OK to take it and work on it

Updated

•

17 years ago

Product: Core → MailNews Core

Doug Hockin

Comment 88

•

16 years ago

My Bug 522893 was marked as a duplicate. In Bug 522893 it was recommended that I try out: http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-comm-1.9.1/ Which I just did. Upon install it "imported" my existing mail folders. The emails that had the problem in TBird 2 still have it in 3. I'll attach a screen shot from one of them. Doesn't appear fixed to me. Unless it somehow has to do with message storage format on disk and I need to test with freshly received messages?

Doug Hockin

Comment 89

•

16 years ago

Attached image Still truncates URLs in old saved messages — Details

Screen shot of old saved (TBird 2) mail message that still has truncated URL when viewed in Tbird 3.

Ben Bucksch (:BenB)

Assignee

Comment 90

•

16 years ago

This bug is FIXED. Bug 522893 is not a duplicate, I'll reopen the latter.

Attachments:
conversion *bold* -> strong, _italic_ -> em (patch, 10.12 KB), Ben Bucksch (:BenB), 26 years ago
Rewritten using nsString, bugs fixed (patch, 12.27 KB), Ben Bucksch (:BenB), 26 years ago
Fixes a leak in last patch. (patch, 3.87 KB), Ben Bucksch (:BenB), 26 years ago
Some (more) glyph substitution and exponents (patch, 5.45 KB), Ben Bucksch (:BenB), 26 years ago
damage done to ASCII art (text/plain, 1.93 KB), Matthew Thomas, usability weenie, 26 years ago
[Preliminary] GlyphSubstitution, code, quote, class attributes (patch, 23.07 KB), Ben Bucksch (:BenB), 26 years ago
Still truncates URLs in old saved messages (image/png, 9.88 KB), Doug Hockin, 16 years ago