Open Bug 780365 Opened 12 years ago Updated 1 year ago

Mailto link in HTML email becomes duplicated e-mail address in plain text email

Product/Component: Thunderbird :: Message Compose Window
Type: defect
Version: 14 Branch
Priority: Not set
Severity: major
Tracking: Not tracked
Reporter: theant
Assignee: Unassigned
Keywords: polish, regression, testcase
Attachments: 10 files

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:14.0) Gecko/20100101 Firefox/14.0.1
Build ID: 20120713224749

Steps to reproduce:

Replied to an email.


Actual results:

When composing messages in plain text mode to recipients in other email systems (notably Gmail but many others as well), if the message is a reply, then Thunderbird will automatically duplicate all links and email addresses within the body of the previous messages.  For example, "me@my.com" becomes "me@my.com <mailto:me@my.com>"; and "www.site.com" becomes "www.site.com <http://www.site.com>".

This is nonsensical in the first place because, since we're in plaintext, those <mailto>s and <http>s will never actually be clickable.  But it's worse than that, because Thunderbird continues to duplicate these items again on every subsequent reply, thus causing the size of the message to grow exponentially with every reply: "www.site.com <http://www.site.com>" becomes "www.site.com <http://www.site.com> <http://www.site.__com <http://www.site.com>>" (yes, Thunderbird starts randomly inserting underscores too...).

Below is a typical example of this, showing what happens after just 3 replies (6 messages total).  Note especially the final gigantic chunk at the bottom: this is the original message, which started out as simply: "This is a test from me@gmail.com to you@other.com regarding www.thatnicewebsite.com and other matters."


#------------------------------

On 08/04/2012 03:54 AM, Gmail User wrote:
> Here's the third reply from gmail.
> 
> 
> On Sat, Aug 4, 2012 at 3:54 AM, Anthony DiSante <you@other.com 
> <mailto:you@other.com>> wrote:
> 
>     Here comes the third reply from Thunderbird.
> 
>     --
>     Anthony DiSante
> 
> 
>     On 08/04/2012 03:53 AM, Gmail User wrote:
> 
>         And here's a second reply from gmail.
> 
> 
>         On Sat, Aug 4, 2012 at 3:52 AM, Anthony DiSante <you@other.com
>         <mailto:you@other.com>
>         <mailto:you@other.com <mailto:you@other.com>>__> wrote:
> 
>              This is a second reply from Thunderbird...
> 
>              --
>              Anthony DiSante
> 
> 
>              On 08/04/2012 03:52 AM, Gmail User wrote:
> 
>                  This is a reply from gmail.
> 
> 
>                  On Sat, Aug 4, 2012 at 3:48 AM, Anthony DiSante
>         <you@other.com <mailto:you@other.com>
>                  <mailto:you@other.com <mailto:you@other.com>>
>                  <mailto:you@other.com <mailto:you@other.com>
>         <mailto:you@other.com <mailto:you@other.com>>__>__> wrote:
> 
>                       This is a reply from Thunderbird.
> 
>                       --
>                       Anthony DiSante
> 
> 
>                       On 08/04/2012 03:47 AM, Gmail User wrote:
> 
>                           Hello,
> 
>                           This is a test from me@gmail.com
>         <mailto:me@gmail.com> <mailto:me@gmail.com <mailto:me@gmail.com>>
>                  <mailto:me@gmail.com <mailto:me@gmail.com>
>         <mailto:me@gmail.com <mailto:me@gmail.com>>>
>                           <mailto:me@gmail.com <mailto:me@gmail.com>
>         <mailto:me@gmail.com <mailto:me@gmail.com>>
>                  <mailto:me@gmail.com <mailto:me@gmail.com>
>         <mailto:me@gmail.com <mailto:me@gmail.com>>>> to you@other.com
>         <mailto:you@other.com>
>                  <mailto:you@other.com <mailto:you@other.com>>
>                           <mailto:you@other.com <mailto:you@other.com>
>         <mailto:you@other.com <mailto:you@other.com>>>
>                           <mailto:you@other.com <mailto:you@other.com>
>         <mailto:you@other.com <mailto:you@other.com>>
>                  <mailto:you@other.com <mailto:you@other.com>
>         <mailto:you@other.com <mailto:you@other.com>>>> regarding
>         www.thatnicewebsite.com <http://www.thatnicewebsite.com>
>         <http://www.thatnicewebsite.__com <http://www.thatnicewebsite.com>>
>                  <http://www.thatnicewebsite.____com
>         <http://www.thatnicewebsite.__com <http://www.thatnicewebsite.com>>>
>                           <http://www.thatnicewebsite.______com
>                  <http://www.thatnicewebsite.____com
>         <http://www.thatnicewebsite.__com <http://www.thatnicewebsite.com>>>>
>                           and other matters.
> 
>                           Thanks,
>                           Anthony
> 
> 
> 
> 

#------------------------------



Expected results:

Thunderbird does not modify/duplicate any links nor email addresses.
IIRC, the same phenomenon was reported in another bug.
The mechanism was:
(1) A link in HTML is: <a href="URL">Link-text-for-URL</a>
    For example, <a href="http://www.google.com">www.google.com</a>
    Please note that "www.google.com" is not a URL. It's merely a text string.
(2) In Tb's inline Forward in Text mode, the link is represented as:
       Link-text-for-URL <URL>
    This keeps both the Link-text-for-URL and the URL written in the HTML mail.
    Note: Both Link-text-for-URL and URL are merely strings because of text mode.
    For the above example: www.google.com <http://www.google.com>
(3) Because of Text mode, the mail is sent as text/plain.
(4) If the recipient of this mail views it in Thunderbird,
    Thunderbird shows the TEXT STRING http://www.google.com as a "clickable link"
    even though this is a text/plain mail and the mail is always shown as text.
    This is done as follows:
       Tb's display of text/plain mail uses HTML (XUL) internally.
       Tb internally renders a string in URL format as a link,
          <a href="string of URL format">string of URL format</a>
       for the user's convenience.
    Note: The original TEXT STRING http://www.google.com is still merely a string.
(5) When the text/plain mail is received by a Web mail,
      Content-Type: text/plain
      A TEXT STRING in the text/plain mail :
        www.google.com <http://www.google.com>
    if the Web mail composes reply mail in HTML mode,
    the Web mail converts TEXT STRING of http://www.google.com
    to HTML link in HTML mail.
        www.google.com (kept because it's a text string)
        &lt; (because < is a text string)
        <a href="http://www.google.com">http://www.google.com</a>
        &gt; (because > is a text string)
(6) When this text/html mail generated by the Web mail is received by Tb,
    and the Tb user requests "Forward inline" in Text mode,
    Tb does the same thing as step (2) to the actual HTML link in the HTML mail.
       www.google.com
       < (kept because &lt; is a text string)
       http://www.google.com <http://www.google.com>
       > (kept because &gt; is a text string)
(7) If this mail is received by the Web mail, the same thing as step (5) occurs.
(8) Thus the number of links in the HTML mail increases, and steps (2) and (5)
    are repeated on each added HTML link with every Forward.

Gmail appears to execute step (5) on text/plain mail.
Is there no way to disable Gmail's "auto-linkify of URL format string"?
Does your problem occur even when composing in Text mode in Gmail?

A simple workaround for the Tb user is "Forward in HTML mode".
  An HTML link in the original HTML mail is then kept as an HTML link in the Forward,
  so Gmail won't produce excess <a> tags by "auto-linkify of URL format string".
A possible improvement in Tb to avoid this kind of problem:
  If URL==Link-text-for-URL in <a href="URL">Link-text-for-URL</a>,
  don't generate the <URL> part upon text conversion.
  The link text alone is sufficient for Tb's "linkify of URL format string in Text mode display".
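
The conversion in step (2) and the possible improvement above can be sketched roughly as follows. This is a minimal Python illustration built on the standard html.parser module, not Thunderbird's actual serializer. The names LinkToText, html_to_text and skip_duplicate_url are hypothetical and exist only for this example.

```python
from html.parser import HTMLParser


class LinkToText(HTMLParser):
    """Toy HTML-to-text converter (hypothetical, not Thunderbird code)."""

    def __init__(self, skip_duplicate_url=False):
        super().__init__()
        self.skip_duplicate_url = skip_duplicate_url
        self.out = []        # pieces of the converted text
        self.href = None     # href of the <a> currently open, if any
        self.link_text = []  # text collected inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.href = dict(attrs).get("href")
            self.link_text = []

    def handle_data(self, data):
        # Text inside a link is buffered; other text is copied through.
        if self.href is not None:
            self.link_text.append(data)
        else:
            self.out.append(data)

    def handle_endtag(self, tag):
        if tag != "a" or self.href is None:
            return
        text = "".join(self.link_text)
        if self.skip_duplicate_url and text == self.href:
            # Proposed improvement: URL == link text, so no <URL> part.
            self.out.append(text)
        else:
            # Behavior described in step (2): Link-text-for-URL <URL>
            self.out.append(f"{text} <{self.href}>")
        self.href = None


def html_to_text(html_body, skip_duplicate_url=False):
    parser = LinkToText(skip_duplicate_url)
    parser.feed(html_body)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.out)
```

With skip_duplicate_url left at False, the example link from step (1) comes out as "www.google.com <http://www.google.com>". With it set to True, only links whose text equals the URL exactly lose the <URL> part, so a link like "Jump to Sony Site" still keeps its URL.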
> Bug summary : Thunderbird makes a mess of links and email addresses(snip)

It is never Tb that increases the number of HTML links. How can Tb increase the number of HTML links when it composes mail in Plain Text mode and sends it as text/plain mail?
The one that increased the number of HTML links and produced the mess is definitely Gmail, although Tb's "HTML link to text conversion" is relevant to the "mess".
>> Bug summary : Thunderbird makes a mess of links and email addresses(snip)
>
> It is never Tb that increases the number of HTML links.

Yes, it is Thunderbird.  When I send an email from Gmail to Thunderbird containing the sentence "The website is www.foo.com", then Thunderbird displays that sentence correctly (exactly as it was written), until I hit the Reply button, at which point, Thunderbird changes it to "The website is www.foo.com <http://www.foo.com>".

> How can Tb increase number of HTML link even though Tb composes mail in
> Plain Text mode and sends it as text/plain mail?

I think you're talking about a different problem/bug, because in what I'm describing here, Thunderbird is not creating HTML links -- it's just duplicating the text and then putting angle-brackets around it.
> Yes, it is Thunderbird. (snip)

How can any mailer including Tb increase number of "HTML link" in TEXT/PLAIN mail?
Have you checked message source of each mail sent from Gmail and each mail sent by Tb?

For an HTML mail containing <a href="href-text">link-text-for-URL</a>, Tb shows a clickable link "link-text-for-URL" in HTML display mode, and in Plain Text display mode Tb may show one of the following (Tb's linkify in text mail display):
  Clickable href-text,
  Clickable link-text-for-URL,
  or Clickable link-text-for-URL + href-text + something else
For multipart/alternative with text/plain + text/html parts, Tb shows the text/plain part in Plain Text display mode, so the displayed string may differ from HTML display mode, which uses the text/html part.

Which View/Message Body As setting did you use?
With which kind of mail?

> what I'm describing here, Thunderbird is not creating HTML links

You requested Text mode composition for the Reply or Forward, either explicitly or via Tb's settings, so Tb composed the Reply or Forward mail in text mode and sent it as text/plain mail. Is that wrong?
How can any mailer, including Tb, create an HTML link in text/plain mail?

> it's just duplicating the text and then putting angle-brackets around it.

As I already stated, the quoted text "link-text-for-URL <href-text>" in a TEXT/PLAIN mail by Tb comes from <a href="href-text">link-text-for-URL</a> in the original text/html mail. This keeps both the link-text-for-URL string and the href-text even in text/plain mail because, as you know, link-text-for-URL is not the same thing as href-text (the href attribute value of the <a> tag, i.e. the actual URL of the link).
And this is the current design/implementation in Tb.

Problem in it is "it looks duplicating if link-text-for-URL==href-text" only, isn't it?
Are you claiming that it should be link-text-for-URL part only or href-text part only?

> I think you're talking about a different problem/bug, because in what I'm
> describing here, Thunderbird is not creating HTML links -- it's just
> duplicating the text and then putting angle-brackets around it

Actual problem in this bug is "mess", isn't it?
The "mess" is produced by repeated combination of following, which are current implementation of Tb and Gmail, isn't it?
(i)  Tb generates string of "link-text-for-URL <href-text>" in TEXT/PLAIN mail,
     from each <a href=href-text>link-text-for-URL</a> in text/html mail.
(ii) Gmail generates HTML link of <a href="href-text">href-text</a> in text/html
     mail, from each TEXT STRING of href-text in TEXT/PLAIN mail.
     Note:
     Gmail perhaps generates <a href="http://www.google.com">www.google.com</a>
     also when TEXT STRING of "www.google.com" is typed at composition panel.
     So, Gmail may generate <a href="protocol://part-of-URL">part-of-URL</a>  
     for each TEXT STRING of part-of-URL in TEXT/PLAIN mail.
     (<a href="http://www.google.com">www.google.com</a> for www.google.com)

If this bug is about the phenomenon produced by (i) and (ii) above, I believe the main cause of the mess is Gmail always creating an HTML link from each URL string (or URL-like string), even though it is merely a text string in a text/plain mail. Thunderbird is merely a super-powerful helper of the mess :-)
>> Yes, it is Thunderbird. (snip)
> 
> How can any mailer including Tb increase number of
> "HTML link" in TEXT/PLAIN mail?

I didn't say "HTML link".  I said that Thunderbird turns "www.site.com" into "www.site.com <http://www.site.com>".  That is NOT an HTML link.

> Problem in it is "it looks duplicating if
> link-text-for-URL==href-text" only, isn't it?

That might be correct.

> Are you claiming that it should be link-text-for-URL
> part only or href-text part only?

When the two are identical, it doesn't matter which part Thunderbird chooses.  And from the user's point of view, there are not actually two parts: he typed only "www.site.com", so that is what it should be.  He didn't type any angle-brackets.  He didn't type any HTML.

Now, maybe Gmail *did* convert "www.site.com" into "<a href='http://www.site.com'>www.site.com</a>".  But Thunderbird can't control what Gmail does.  It can only control its own conversions.  And my contention is that converting either "www.site.com" *or* "<a href='http://www.site.com'>www.site.com</a>" into "www.site.com <http://www.site.com>" is the wrong thing to do in a plain text email, and it makes Thunderbird's plain text mode extremely annoying and difficult to use, since messages quickly become huge in size and very difficult to read.
(In reply to Anthony DiSante from comment #5)
> And my contention is that converting either "www.site.com"
> *or* "<a href='http://www.site.com'>www.site.com</a>" into "www.site.com  <http://www.site.com>"
> is the wrong thing to do in a plain text email,

When the original is an HTML link <a href="http://www.sony.com">Jump to Sony Site</a> in an HTML mail,
(A) if the quoted text in a text/plain Reply/Forward mail is "Jump to Sony Site" only, the actual URL "http://www.sony.com" is lost,
and (B) if the quoted text is "http://www.sony.com" only, the original link text "Jump to Sony Site", which the original HTML mail sender wanted to show, is lost.
If both "Jump to Sony Site" and "http://www.sony.com" are contained in the quoted text in the text/plain mail,
(a) "Jump to Sony Site", which the original HTML mail sender wanted to show, is not lost,
and (b) the original URL of the <a>, to which the original mail sender wanted the reader to jump by clicking the link, is not lost either.
And (c) some mailers, including Tb, provide an easy way for the recipient of the text/plain mail to jump to a URL if a text string in the mail is in URL format, such as http://www.google.com.

Reasons why the quoted text is currently "Jump to Sony Site <http://www.sony.com>":
  - Developers don't want either (A) or (B).
  - Developers want both (a) and (b).
  - Because of (c), the string http://www.sony.com in text/plain mail is
    convenient for users who receive the text/plain Forward/Reply mail.
  - Many Tb users (and, needless to say, Netscape, Mozilla and Seamonkey users
    too) have accepted it as reasonable for a long time.
  - "<" and ">" are as commonly used as "(" & ")", "[" & "]", "{" & "}"
    when referring to something.

What is base of your "wrong thing in a plain text email"?
What is "correct thing" in "quoted text by Reply/Forward mail in text/plain mail, when original in HTML mail is <a href="http://www.sony.com">Jump to Sony Site</a>?
If your correct thing exists, what is base of your "correctness"?
What do you want as "quoted text by Reply/Forward mail in text/plain mail, when original in HTML mail is <a href="http://www.sony.com">Jump to Sony Site</a>?
What do you think about most convenient one as "quoted text by Reply/Forward mail in text/plain mail, when original in HTML mail is <a href="http://www.sony.com">Jump to Sony Site</a>" for many mail users?

> and it makes Thunderbird's plain text mode extremely annoying and difficult to use, (snip)

Is it actually "extremely annoying and difficult to use" in many text/plain mails of many Tb users?

> since messages quickly become huge in size and very difficult to read.

Main culprit of "huge size" and "dificulty to read" is Gmail who generated excess HTML link, isn't it?
(1) Tb user receives an HTML mail                 : <a href="http://...">X</a>
(2) Tb user sends reply/forward as text/plain mail: X <http://...>
    There is no problem at this stage.
    If Gmail is not involved, no problem will happen hereafter.
(3) Gmail user sends reply/forward as text/html mail :
>   X &lt;<a href="http://...">http://...</a>&gt;
(4) Tb user sends reply/forward as text/plain mail   :
>   X < http://... <http://...> >
(5) Gmail user sends reply/forward as text/html mail :
>   X &lt; <a href="http://...">http://...</a> &lt;<a href="http://...">http://...</a>&gt; &gt;
(6) If "Tb user sends reply/forward as text/plain" and "Gmail user sends reply/forward as text/html" are repeated, Gmail adds more excess HTML links to the HTML mail each time.
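
The growth in steps (1) to (6) can be reproduced with a toy simulation. The sketch below is only an illustration under stated assumptions: the webmail side is assumed to escape the quoted text and wrap every string starting with http:// in an <a> tag, and the Tb side is assumed to turn every <a href="URL">text</a> into "text <URL>". The function names are hypothetical, and neither function is the real Gmail or Thunderbird code.

```python
import html
import re

ANCHOR = re.compile(r'<a href="([^"]*)">(.*?)</a>')
URL = re.compile(r'http://[^\s<>&]+')


def tb_plain_text_quote(html_body):
    """Assumed Tb behavior: <a href="U">T</a> becomes "T <U>"."""
    text = ANCHOR.sub(lambda m: f"{m.group(2)} <{m.group(1)}>", html_body)
    return html.unescape(text)  # &lt; and &gt; become plain < and >


def webmail_html_quote(text_body):
    """Assumed webmail behavior: escape the text, then linkify each URL."""
    escaped = html.escape(text_body, quote=False)
    return URL.sub(
        lambda m: f'<a href="{m.group(0)}">{m.group(0)}</a>', escaped
    )


def count_urls_per_round(rounds):
    """Count URL strings in each text/plain reply sent by the Tb side."""
    body = '<a href="http://www.site.com">X</a>'  # step (1)
    counts = []
    for _ in range(rounds):
        text = tb_plain_text_quote(body)      # steps (2), (4), ...
        counts.append(text.count("http://"))
        body = webmail_html_quote(text)       # steps (3), (5), ...
    return counts
```

Under these assumptions count_urls_per_round(4) returns [1, 2, 4, 8]: every URL string in a plain text reply comes back as a link, and every link is written out as two URL strings in the next plain text reply.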

There is a known way to stop the above infinite "mess".
  At Tb's step (2), send the reply/forward in text/html.
  Because the quoted text in Tb's HTML reply/forward mail is
    <a href="http://...">X</a>
  Gmail won't generate excess HTML links.
  Because Gmail won't generate excess HTML links, and because Tb never produces
  excess HTML links in text/html mail, the problem doesn't occur.
This is the "workaround by Tb user" I already mentioned in comment #1.
Tb can't control Gmail's behaviour, but a Tb user can avoid the problem caused by Gmail by controlling Tb's behavior.

Even if Gmail keeps producing excess HTML links, if Tb sends "http://www.google.com" only or "<http://www.google.com>" only, instead of the current "http://www.google.com <http://www.google.com>", in text/plain mail for <a href="http://www.google.com">http://www.google.com</a> (link URL==link text) in an HTML mail, the above "infinite increment of HTML links by Gmail" is avoided.
This is the "possible improvement in Tb" from my comment #1.
In this case, "http://www.google.com" only is better than "<http://www.google.com>" only, because the accumulation of "<" and ">" is not avoided by "<http://www.google.com>".

However, if Gmail generates <a href="http://www.google.com">www.google.com</a> in text/html mail for the text string www.google.com in text/plain mail, the "infinite increment of HTML links by Gmail" cannot be avoided by such an improvement in Tb. This is the reason why I said "possible".

Do you know Gmail's behavior on "text string of www.google.com in text/plain mail"?
FYI.

A second way to stop the "infinite increase of HTML links by Gmail" has been found.
  In Gmail, send the reply mail in text/plain only.
    Gmail also can't create HTML links in text/plain mail.
    If the mail is text/plain, Tb can't do "Reply in text/plain to HTML mail"
  with any View/Message Body As setting.
A third way to stop the "infinite increase of HTML links by Gmail" has also been found.
  In Tb, view the Reply mail from Gmail with View/Message Body As/Plain Text.
    Gmail can't create HTML links in the text/plain part under multipart/alternative.
    Because Tb uses the text/plain part under multipart/alternative with View/Message
    Body As/Plain Text, Tb doesn't do "Reply in text/plain to HTML mail".
  Note: This is not applicable if Gmail sends text/html only.
(0) Original HTML mail, with http: link and mailto: link.
> <html>
>   <head>
>     <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
>   </head>
>   <body bgcolor="#FFFFFF" text="#000000">
>     Hello<br>
>     <br>
>     <a href="http://www.google.com">www.google.com</a><br>
>     <a href="mailto:abc@x.y.z">abc@x.y.z</a><br>
>   </body>
> </html>

(1) text/plain Reply mail by Tb to the HTML mail
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> Content-Transfer-Encoding: 7bit
> 
> On 2012/08/06 08:26, x-01@x.x.x wrote:> Hello
>  >
>  > www.google.com <http://www.google.com>
>  > abc@x.y.z <mailto:abc@x.y.z>
> 
> Reply in text/plain by Tb

(2) Reply mail by Gmail to the text/plain Reply from Tb
> --f46d044472b998232c04c68e307f
> Content-Type: text/plain; charset=UTF-8
> 
> On Mon, Aug 6, 2012 at 8:34 AM, <y-01@x.x.x> wrote:
> 
>> On 2012/08/06 08:26, x-01@x.x.x wrote:> Hello
>> >
>> > www.google.com <http://www.google.com>
>> > abc@x.y.z <mailto:abc@x.y.z>
>>
>> Reply in text/plain by Tb
>>
> 
> Reply by Gmail #2
> 
> --f46d044472b998232c04c68e307f
> Content-Type: text/html; charset=UTF-8
> Content-Transfer-Encoding: quoted-printable
> 
> <div class=3D"gmail_quote">On Mon, Aug 6, 2012 at 8:34 AM,  <span dir=3D"lt=
> r">&lt;<a href=3D"mailto:y-01@x.x.x" target=3D"_blank">y-01@x.x.x</a>&gt;</=
> span> wrote:<br><blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8e=
> x;border-left:1px #ccc solid;padding-left:1ex">
> On 2012/08/06 08:26, x-01@x.x.x wrote:&gt; Hello<br>
> &gt;<br>
> &gt; <a href=3D"http://www.google.com" target=3D"_blank">www.google.com</a>=
>  &lt;<a href=3D"http://www.google.com" target=3D"_blank">http://www.google.=
> com</a>&gt;<br>
> &gt; abc@x.y.z &lt;mailto:<a href=3D"mailto:abc@x.y.z" target=3D"_blank">ab=
> c@x.y.z</a>&gt;<br>
> <br>
> Reply in text/plain by Tb<br></blockquote><div><br>Reply by Gmail #2 <br></=
> div></div><br>
> 
> --f46d044472b998232c04c68e307f--
Adding "Gmail's Web mail" to the bug summary to state the phenomenon correctly.
If this bug is only about the "wrong thing" in Tb's plaintext mode behaviour and is a request to correct it, and the changed bug summary is wrong, please correct it.
Summary: Thunderbird makes a mess of links and email addresses when composing replies in plaintext mode → "Gmail 's Web mail + Thunderbird" can easily make a mess of links and email addresses when Tb user composes replies to HTML mail in plaintext mode
This behavior is not Gmail-specific.  It occurs when replying in plain text to any HTML email regardless of sender.  I've corrected the summary to reflect this.
Summary: "Gmail 's Web mail + Thunderbird" can easily make a mess of links and email addresses when Tb user composes replies to HTML mail in plaintext mode → Thunderbird makes a mess of links and email addresses when composing replies to HTML mail in plaintext mode
> What is base of your "wrong thing in a plain text email"?
> What is "correct thing" in "quoted text by Reply/Forward
> mail in text/plain mail, when original in HTML mail is
> <a href="http://www.sony.com">Jump to Sony Site</a>?

That is not what this bug is about.  Preserving both the URL and the text is probably the correct behavior when the URL and the text are actually different.  But this bug is about the situation where the URL and the text are the same, and the situation where the URL is actually a mailto: containing an email address which is also the content of the displayed link text (so, again, the two are the same).

> Main culprit of "huge size" and "dificulty to read" is Gmail
> who generated excess HTML link, isn't it?

It would be nice if Gmail didn't turn plain text URLs into HTML links.  But that's only part of the problem, and this bug report is about the bug in Thunderbird.  Regardless of who generated the HTML email (Gmail or someone else), it is Thunderbird that is needlessly generating excess plain text when it turns "<a href='http://www.site.com'>www.site.com</a>" into "www.site.com <http://www.site.com>".
I see.

(case-1) <a href="href-URL">link-text-for-URL</a>
(case-1-a) link-text-for-URL = simply a text which Gmail doesn't consider link
  "link-text-for-URL <href-URL>" is not acceptable,
   because Gmail generates following from it,
     link-text-for-URL &lt;<a href="href-URL">href-URL</a>&gt;
   and Tb generates following, even if Tb will generate only "href-URL" when
   href-URL==link-text-for-URL.
      link-text-for-URL <href-URL>
   This is applicable to any character other than < and >.
   So, when string of href-URL is used in quoting, heading character, trailing
   character can't be added due to behavior of Gmail and Tb.
(case-1-b) link-text-for-URL = href-URL
           <a href="http://www.abc.com">http://www.abc.com</a>
   If Tb is changed to generate "http://www.abc.com" only,
   Tb's quoted text is "http://www.abc.com", so Gmail generates only
   <a href="http://www.abc.com">http://www.abc.com</a>,
   and the problem with Gmail won't occur.
(case-1-c) link-text-for-URL = subpart of URL
           <a href="http://www.abc.com">www.abc.com</a>
   If Tb is changed to generate "www.abc.com http://www.abc.com" (no additional string),
   Gmail generates both
     <a href="http://www.abc.com">www.abc.com</a>
     and
     <a href="http://www.abc.com">http://www.abc.com</a>
   Even after such a change, Tb generates the following from it.
     www.abc.com http://www.abc.com http://www.abc.com
   Because Gmail generates the following from that,
     www.abc.com <a href="http://www.abc.com">http://www.abc.com</a>
                 <a href="http://www.abc.com">http://www.abc.com</a>
   further increase of HTML links is avoided if Tb is changed.

However, one excess URL is still added by "Tb and Gmail" even after the above changes to Tb. What do you think about it?

Condition may be enhanced to "if link-text-for-URL in <a href="href-URL">link-text-for-URL</a> is substring of href-url, use href-URL only".
But, in this case, string of abc in <a href="http://www.abc.com">abc</a> is lost.
What do you think about it?

If the string generated by Tb for <a href="href-URL">link-text-for-URL</a> is "link-text-for-URL href-URL" (no leading or trailing character around href-URL), readability is worse than with the current format.
What do you think about it?

(case-2) <a href="mailto-link">link-text-for-mailto</a>
(case-2-a) <a href="mailto:abc@x.y.z">link-text-for-mailto</a>
If Tb generates the string "mailto:abc@x.y.z" in text/plain mail,
Gmail generates the following.
  mailto:<a href="mailto:abc@x.y.z">abc@x.y.z</a>
So Tb can't use the mailto-link string in <a href="mailto-link"> when quoting in a text/plain mode Reply.
(case-2-b) <a href="mailto:<abc@x.y.z>">link-text-for-mailto</a>
This is probably worse than (case-2-a).
(case-2-c) <a href="mailto:<abc@x.y.z>,<def@x.y.z>, ...">link-text-for-mailto</a>
This is perhaps the worst of all.

Even for <a href="mailto:abc@x.y.z">Mail me</a>, Tb can't use the href-URL of the HTML link, due to Gmail's behavior.
What do you think about the mailto: link case?
Please note that a string like "mailto:abc@x.y.z" in text/plain mail is required in order to use the "linkify of URL etc. in text/plain mail display" of Tb and some other mailers.

The above is the main reason why I asked you "what is WRONG in Tb", "what is the correct thing", "what is the basis of your correctness", and "what do you want from Tb".
(A) As for <a href="http://www.google.com">http://www.google.com</a>, "http://www.google.com <http://www.google.com>" is very bad, rather WRONG, and it should be "http://www.google.com". This is mandatory to avoid unwanted problems like "infinite increase of HTML links by Thunderbird+Gmail".
(B) For <a href="http://www.google.com">www.google.com</a>(link-text is FQDN in href-URL), "http://www.google.com only" is sufficient and better than current "www.google.com <http://www.google.com>". This is needed to avoid unwanted problem like "HTML link generation for www.google.com by Gmail".
(C) For <a href="mailto:abc@x.y.z">abc@x.y.z</a>(link-text is same as mail address in href-URL), "abc@x.y.z only without mailto: part" is sufficient for users and for Tb's linkify of mail-address in text/plain display. This is needed to avoid unwanted problems like "infinite increase of string of mailto: by Thunderbird+Gmail".

(B)/(C) may be enhancement request, but I think (A) can be called "Tb's bug".
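
Rules (A), (B) and (C) above can be sketched as a single decision function. This is only an illustration in Python using the standard urllib.parse module, not a patch for Tb. The function name quoted_link_text is hypothetical, and keeping the current format for a mailto: link that carries a query part (such as ?subject=...) is an assumption of this sketch.

```python
from urllib.parse import urlsplit


def quoted_link_text(href, text):
    """Text to quote in text/plain for <a href="href">text</a> (sketch)."""
    url = urlsplit(href)
    same = text.strip().lower()
    if url.scheme == "mailto":
        # (C) link text equals the mail address: keep the address only.
        # Assumption: a mailto: link with a query part keeps the old format.
        if not url.query and same == url.path.lower():
            return text.strip()
    elif same == href.lower() or same == url.hostname:
        # (A) link text equals the URL, or
        # (B) link text is the host name (FQDN) of the URL: keep the URL only.
        return href
    # Anything else keeps the current "link-text <href-URL>" format.
    return f"{text} <{href}>"
```

For example, quoted_link_text("http://www.google.com", "www.google.com") gives "http://www.google.com", while quoted_link_text("http://www.sony.com", "Jump to Sony Site") still gives "Jump to Sony Site <http://www.sony.com>".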
Confirming.
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Linux → All
Hardware: x86_64 → All
Summary: Thunderbird makes a mess of links and email addresses when composing replies to HTML mail in plaintext mode → Thunderbird makes a mess of links and email addresses when composing replies to HTML mail in plaintext mode(Duped string of "http://x <http://x>" by Tb for <a href="http://x">http://x</a> is wrong. If <a href="mailto:a@x">a@x</a>, "a@x" only is needed.)
> Condition may be enhanced to "if link-text-for-URL in
> <a href="href-URL">link-text-for-URL</a> is substring of
> href-url, use href-URL only".  But, in this case, string
> of abc in <a href="http://www.abc.com">abc</a> is lost.

I think it can be simpler than that, for example:

# eq is Perl's string comparison; == would compare the values numerically
if ($link_url eq $link_text || $link_url eq "http://$link_text")
{
    # then just use $link_text
}
else
{
    # use $link_text <$link_url> (current behavior)
}

Links that use https:// or ftp:// would fall into the else{} so the behavior for them is unchanged.  But the behavior for http:// links (which is probably the vast majority of them) would be much improved.

And, again, this is not really a Thunderbird+Gmail bug.  It's a Thunderbird+HTML_email bug.  I've seen this behavior many times over the past few years from many email accounts, not all of which were Gmail.  I just tried it with an HTML email generated by MIME::Lite as a test, and verified that Thunderbird does the same thing (i.e. the bug is present).  I think the MUA is irrelevant; all that's necessary is that the message be in HTML rather than plain text.
(In reply to Anthony DiSante from comment #13)

I agree with you on simple logic for http and on "the vast majority of (them) would be much improved."
But I prefer;
> if( $link_url == $link_text || 
>     $link_url ==  "http://$link_text" ||
>     $link_url == "https://$link_text" ||
>     $link_url ==   "ftp://$link_text" || 
>     $link_url ==  "mailto:$link_text"  
>   ) // Note: "==" in above is case insensitive comparison

A problem with your proposal:
If "www.google.com" is generated from <a href="http://www.google.com">www.google.com</a>, the string "www.google.com" is not linkified when the text/plain mail is displayed by Tb.
This change would be an apparent regression from the current behavior.
Can you accept "http://www.google.com" for <a href="http://www.google.com">www.google.com</a>? 

In the mailto: case, from "mailto:a@x.x.x,b@x.x.x,c@x.x.x" generated by Tb, Gmail generates
'mailto:<a href="mailto:a@x.x.x">a@x.x.x</a>,b@x.x.x,c@x.x.x'.
Gmail generates a mailto: link only for (i) the first mail-address-like string (ii) just after "mailto:" (iii) without the name part of the mail address spec (iv) without surrounding "<" & ">" around the mail address.
With the above change, Tb generates 'mailto:a@x.x.x,,b@x.x.x,c@x.x.x' from the mailto: link created by Gmail, so the infinite increase of the string "mailto:" by Gmail+Thunderbird is avoided even if "mailto:a@x.x.x,b@x.x.x?cc=p@x.x.x,?subject=..." was initially generated by Tb from a mailto: link in an HTML mail.
A second reason to remove the string "mailto:":
  - Tb's "linkify of mail-addr in text/plain mail display" works on the text pattern
    a@x.x.x,b@y.y.y,c@z.z.z rather than on the entire string "mailto:...".
  - For "aa <a@x.x.x>, bb <b@y.y.y>, cc <c@z.z.z>", each of "a@x.x.x", "b@y.y.y",
    "c@z.z.z" is linkified.
  - For "mailto:aa <a@x.x.x>", "aa" and "a@x.x.x" are linkified.
  So removing mailto: won't produce a regression in Tb's linkify.
A third reason is that "xxx wrote: ... a@x.y.z ..." from '... <a href="mailto:a@x.y.z">abc@x.y.z</a> ...' in HTML is acceptable.
Randomly inserted underscores (mentioned in comment 0): Bug 416222?
See Also: → 662696
Thomas, inserting underscores doesn't cause the exponential linkification to stop. For example, now I see:

https://bugzilla.mozilla.org/____show_bug.cgi?id=662696
        <https://bugzilla.mozilla.org/__show_bug.cgi?id=662696>
                 <https://bugzilla.mozilla.org/__show_bug.cgi?id=662696
        <https://bugzilla.mozilla.org/show_bug.cgi?id=662696>>

Like Anthony and WADA had been discussing, you probably will have to check if links have already been converted when the conversion from HTML to text takes place and not do the conversion again if so.
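
One way to picture such a check is sketched below in Python. It is an illustration only, and it assumes that an already converted link arrives as an escaped bracket pair around a link whose text equals its URL, i.e. &lt;<a href="URL">URL</a>&gt;. The name quote_as_text and both regular expressions are hypothetical and are not taken from Thunderbird's code.

```python
import html
import re

# A self-link already wrapped in escaped angle brackets:
#   &lt;<a href="U" ...>U</a>&gt;
BRACKETED = re.compile(r'&lt;<a href="([^"]*)"[^>]*>\1</a>&gt;')
# Any other link.
ANCHOR = re.compile(r'<a href="([^"]*)"[^>]*>(.*?)</a>')


def quote_as_text(html_body):
    """HTML-to-text sketch that does not convert a converted link again."""
    # Already converted: keep the single <URL> form, add nothing.
    step1 = BRACKETED.sub(lambda m: f"<{m.group(1)}>", html_body)
    # Remaining links get the usual "link-text <URL>" form.
    step2 = ANCHOR.sub(lambda m: f"{m.group(2)} <{m.group(1)}>", step1)
    return html.unescape(step2)
```

This only recognizes the exact bracketed pattern. A link whose text is a bare host name such as www.google.com is still expanded, so the check would need to be combined with a rule for link text that duplicates the URL.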
Any progress on this?

I searched the internet and found this, finally.
This bug is really annoying. TB31 is out and it is still present.

I created an e-mail in Gmail with these 4 situations; only the last one
is problematic for me.
link with text> test <http://test.com>
link only> http://test.com
mail link with text> mail <mailto:a@b.cz>
mail only> a@b.cz

It duplicates e-mail links as it adds "<mailto:a@b.cz>".

Solution:
if mail and text is the same, insert only the mail address
while converting it from HTML to plain text.
Still REPRODUCIBLE with EN-US Seamonkey 2.33.1 (German Language pack)  Gecko/20100101 Build 20150321194901 (Default Theme) on German WIN7 64bit:

1. Launch e-mail-client
2. From an inbox: in the News-Symbol-Toolbar click 'Compose → HTML-Mail'
3. Insert 2 Lines:
   First Line:  "info@email.de" (simply text)
   Second Line: "info@email.de", select this text and add a mailto hyperlink using 
                 the HTML edit functions below the Subject input line
4. Add your own (POP3) email address to recipient.
5. Add an additional web-mail-recipient (youraccount@gmail.com or similar) 
6. Menu 'File → Save as → Template'
7. Send email

Results:
a) In received Mails second line will no longer contain a correct mailto: link,
   but a duplicated email-address as described. Same in both received emails
b) Template source code still looks ok.
c) receiving HTML-mails with mailto:links works fine
d) I think "Message Compose Window" is  not the optimum component. This seems
   not to be a problem with composing, but with sending.
e) The core, or at least part, of the problem might be that the mail is sent with the wrong MIME content type.
   received mail source should show: Content-Type: text/html; charset=UTF-8
   *DOES* show: Content-Type: text/plain; charset=UTF-8; format=flowed
Keywords: regression
Summary: Thunderbird makes a mess of links and email addresses when composing replies to HTML mail in plaintext mode(Duped string of "http://x <http://x>" by Tb for <a href="http://x">http://x</a> is wrong. If <a href="mailto:a@x">a@x</a>, "a@x" only is needed.) → Mailto Link in Content-Type: text/html; charset=UTF-8 becomes duplicated e-mail-address in Content-Type: text/plain; charset=UTF-8; format=flowed during sending
f) Does anybody know whether that is fixed in SoftMaker Thunderbird version and 
   how to contact those guys?
Severity: normal → major
Summary: Mailto Link in Content-Type: text/html; charset=UTF-8 becomes duplicated e-mail-address in Content-Type: text/plain; charset=UTF-8; format=flowed during sending → during sending: Mailto Link in Content-Type: text/html; charset=UTF-8 becomes duplicated e-mail-address in Content-Type: text/plain; charset=UTF-8; format=flowed
Simplifying the summary line.  I can't see why we need all this technical bloat.
Summary: during sending: Mailto Link in Content-Type: text/html; charset=UTF-8 becomes duplicated e-mail-address in Content-Type: text/plain; charset=UTF-8; format=flowed → Mailto link in HTML email becomes duplicated e-mail address in plain text email
Does anybody know where in the TB source "me@my.com" becomes "me@my.com <mailto:me@my.com>"?
Any updates on this issue? It's quite annoying in long email threads.
What an embarrassing bug. Why hasn't it been fixed in FIVE YEARS? Is it because "WADA:World Anti-bad-Duping Agency" spammed the thread with hopelessly incorrect garbage for long enough that anybody who would have fixed it had no idea what the problem was anymore?

Walt, Does this make sense to you?

Flags: needinfo?(wls220spring)

I haven't read every comment.

Testing replying in plain text to a couple of received HTML-composed emails, I do see the sender's name and email address in <> in the To: field.

John Doe <johndoe@qwdvbm.com>

If there is a mailto: in <> in the body of the message it should be clickable. I would expect it to show the name 'John Doe' and link '<johndoe@qwdvbm.com>' so the recipient could click the link and open a composition window.

Links in <> are also clickable.
Any links will have the name of the link with the link in <> like 'Disappearing Earth by Julia Phillips'
https://cityofasylumpittsburgh.us8.list-manage.com/track/click?u=22aa59e8c9bdb5b7bcc6394d6&id=446ae5c72e&e=5575b2b7cb

When I first started doing newsgroup support I learned to always include links in the <> so they didn't wrap, get broken and were clickable.
The above link would be broken and not clickable.

I'd have to do some more testing to see if things get duplicated.

Flags: needinfo?(wls220spring)

Do more testing to see if links get duplicated in plain text replies to HTML messages with site and mailto: links in the message body.

Flags: needinfo?(wls220spring)

(In reply to WaltS48 [:walts48] from comment #27)

> Do more testing to see if links get duplicated in plain text replies to HTML messages with site and mailto: links in the message body.

Did more testing and I can't reproduce using Thunderbird 68.2.2 on Ubuntu Linux 18.04.3 LTS.

Flags: needinfo?(wls220spring)

Just my two cents:

This bug has been confirmed too often for us to consider it WFM after some undocumented, hence possibly erratic, testing.
The only way forward for this bug is for someone to translate comment 0 and some of the subsequent comments into proper STR, so that we are actually talking about a clearly defined scenario.

(In reply to Thomas D. from comment #29)

> Just my two cents:
>
> This bug has been confirmed too often for us to consider it WFM after some undocumented, hence possibly erratic, testing.
> The only way forward for this bug is for someone to translate comment 0 and some of the subsequent comments into proper STR, so that we are actually talking about a clearly defined scenario.

So true.

Confirmed in version 14 branch.

I tested with the current release by sending an HTML email with a mailto and URL links from my Comcast account to my Gmail account. Replied in plain text from the Gmail account. Sent the email back and forth between accounts several times.

The root cause of this issue is pretty simple.

Most email programs, when configured to use HTML format, automatically turn URLs and email addresses into hyperlinks, with the display text the same as what it links to. This happens both when typed by the user and when replying to a plain-text email. Possibly also when copying and pasting.

When you reply to an HTML email in plain text, Thunderbird turns "<a href="mailto:me@example.com">Email me</a>" into "Email me <mailto:me@example.com>". Unfortunately, the programmer neglected to make it check whether the display text is the same as the href (or the same with 'mailto:' or 'http://' or 'http://www.' in front or '/' behind).
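
The missing check described above could look roughly like the following Python sketch. It is not Thunderbird's code; the function names are hypothetical, and the normalisation rules (strip 'mailto:', 'http://', a leading 'www.', and a trailing '/') follow the list in this comment, with 'https://' and case-insensitive comparison added as further assumptions.

```python
def _normalize(value: str) -> str:
    """Reduce display text or an href to a loosely comparable form."""
    s = value.strip().lower()
    for prefix in ("mailto:", "https://", "http://"):
        if s.startswith(prefix):
            s = s[len(prefix):]
            break
    if s.startswith("www."):
        s = s[len("www."):]
    return s.rstrip("/")


def link_to_plain_text(text: str, href: str) -> str:
    """Render <a href=HREF>TEXT</a> for a plain-text reply."""
    if _normalize(text) == _normalize(href):
        # The href adds no information, so emit the display text only.
        return text
    return f"{text} <{href}>"
```

With this, `link_to_plain_text("Email me", "mailto:me@example.com")` still keeps the address in angle brackets, whereas a link whose text is already the address or URL is emitted once.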

It's pretty simple to fix, so why has it taken 7 years so far for anyone to get round to it?

If it is so simple, supply a patch, get it reviewed and see what happens.

I was asked if it made sense to me, and I replied with my observations.

Then I did more testing for the duplication problem which I can't reproduce.

I used Insert > HTML to add the mailto: link, and Insert > Link to add the site URL in the composition window.

Keywords: polish

I realise now I can't seem to reproduce the issue under TB 60.9.1 (32-bit version; Windows 10).

WaltS48: Any chance you could try the exact same reproduction steps under a current TB version?

(In reply to Stewart Gordon from comment #33)

> I realise now I can't seem to reproduce the issue under TB 60.9.1 (32-bit version; Windows 10).
>
> WaltS48: Any chance you could try the exact same reproduction steps under a current TB version?

So far I have not seen any clear, numbered, reduced steps to reproduce (STR) in this bug...

STR:

  1. In gmail, do exactly this
  2. In TB, do exactly that
  3. Next, do this and that

Actual Result:
....

Expected Result:
...

(In reply to WaltS48 [:walts48] from comment #32)

> I used Insert > HTML to add the mailto: link, and Insert > Link to add the site URL in the composition window.

This looks like a procedure in TB, which is different from this bug's scenario, iiuc.

I appreciate the testing efforts of WaltS48 and I'm not saying that he's wrong in any way, it's just that I'm surprised that we're acting on a bug without having a crystal clear, numbered set of instructions (STR) that everyone can use for testing if this is reproducible or not. Maybe it works now, but without clear STR, who can tell? I suggest adding STR in the user story at the top of this bug. From my triage experience, people understand all sorts of things from untidy descriptions like comment 0 which don't follow protocol, but unfortunately in bug triage, every detail matters.

Even the exact strings of that mailto: address might matter for some obscure reason. As a recent example, have a look at bug 1504455 where reporter claimed that pressing Enter after a BCC recipient did not create a new BCC recipient row, but a TO row instead. Everybody including reporter himself was failing to reproduce reliably (trying with "John Doe <john@asdf.com>"), until reporter revealed the actual recipient string which he used: "Doe, John <john@asdf.com>". That little comma in the display name made all the difference, bug found and fixed. Just saying.

For purposes of illustration, here's why comment 0 isn't clear, especially wrt STR. Please do not reply to the questions in this comment, just write up ordered STR instead.

(In reply to Anthony DiSante from comment #0)

> Steps to reproduce:
> Replied to an email.

No STR here.

> Actual results:
> When composing messages in plain text mode to recipients in other email
> systems (notably Gmail but many others as well), if the message is a reply,

Some sort of messy STR here, wrongly intertwined with actual results. So we have to start from Gmail, isn't it? So what does the gmail message look like, exactly?

> then Thunderbird will automatically duplicate all links and email addresses
> within the body of the previous messages. For example, "me@my.com" becomes

What's really in that Gmail message? The email address as a simple string? Is it a plaintext mail? HTML mail? mailto: link in HTML? What exactly does that mailto: link look like?

> "me@my.com <mailto:me@my.com>"; and "www.site.com" becomes "www.site.com
> <http://www.site.com>".

When exactly does this happen? We need to know the exact order of replies and their results.

(In reply to WaltS48 [:walts48] from comment #30)

> I tested with the current release by sending an HTML email with a mailto and URL links from my Comcast account to my Gmail account. Replied in plain text from the Gmail account. Sent the email back and forth between accounts several times.

Again from experience, you can't trust TB to send exactly what you have composed. TB's delivery format auto-detect is known to mangle things, and links whose link text matches URL are a prime candidate for mangling. Also, so far I understood that the first HTML message with mailto and URL links originates from a Gmail webmail account (not composed with TB), which might make a difference. Then we need to specify exactly what those mailto and URL links look like. Also, let's be aware that TB mail reader linkifies even plaintext link-like strings in HTML which aren't real links with href attribute. So what you're seeing in reader is not necessarily identical with the actual message source. And so on...

Is there such a thing as a Gmail webmail account? I would have thought a Gmail account is a Gmail account, and webmail, POP and IMAP are just different ways of accessing it.

I got the impression that WaltS48 is using TB to access both Gmail and Comcast accounts, and has one configured to compose in HTML format and the other configured to compose in plain text format. But I realise this isn't clear. Walt, could you please clarify?

All this said, I'm not convinced that how the email being replied to was generated is directly relevant. If TB treats a given HTML hyperlink in a certain way when sent from the Gmail web interface, it would surely treat an identical hyperlink (same display text, same href) identically when sent from Hotmail, Mail.com, Outlook or even TB itself.

I think a good plan is if Walt, or someone else experiencing the problem, could attach here an .eml file of a message that one can reply to in order to reproduce the issue. We can then see the exact HTML code of a message that triggers the bug, and load it into TB and try to reproduce the issue by replying to it in plain text and HTML formats. (You could maybe munge the email addresses by editing it in a plain text editor if you're worried about spam, but make sure you munge all instances of the same email address identically. If you do this, please test again with your munged version to make sure the issue is still reproducible.)
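
For anyone preparing such a testcase, here is a minimal Python sketch of consistent munging: every distinct address is mapped to one stable placeholder, so identical addresses stay identical after the rewrite. The regex, the function name and the `userN@example.invalid` placeholder scheme are assumptions for illustration. The sketch works on the raw text only, so addresses hidden in base64 parts or broken across quoted-printable soft line breaks would be missed, and Message-ID values that look like addresses get rewritten too.

```python
import re

# Loose, illustrative address pattern; not a full RFC 5322 parser.
ADDRESS = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def munge_addresses(eml_text: str) -> str:
    """Replace each distinct address with the same placeholder everywhere."""
    mapping = {}

    def replace(match):
        key = match.group(0).lower()  # treat case variants as one address
        if key not in mapping:
            mapping[key] = f"user{len(mapping) + 1}@example.invalid"
        return mapping[key]

    return ADDRESS.sub(replace, eml_text)
```

After munging, the file should be opened and replied to again to confirm the issue is still reproducible, as noted above.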

Additionally, my earlier comment about checking if you can reproduce the bug in a current TB version still applies.

(In reply to Stewart Gordon from comment #38)

> Additionally, my earlier comment about checking if you can reproduce the bug in a current TB version still applies.

Walt already stated that he tested for current release version, after the ambiguous note of "Confirmed in version 14 branch." which in context was trying to emphasise that the last time this bug was confirmed was on TB14 (I don't know if that's factually correct).

(In reply to WaltS48 [:walts48] from comment #30)

> (In reply to Thomas D. from comment #29)
>
> > This bug has been confirmed too often...
>
> So true. Confirmed in version 14 branch.
>
> I tested with the current release by sending an HTML email with a mailto and URL links from my Comcast account to my Gmail account. Replied in plain text from the Gmail account. Sent the email back and forth between accounts several times.

(In reply to Stewart Gordon from comment #37)

> Is there such a thing as a Gmail webmail account? I would have thought a Gmail account is a Gmail account, and webmail, POP and IMAP are just different ways of accessing it.

Generally yes, but could you swear that if you enter foo@bar.com into a composition on www.gmail.com (which I call webmail, using your account from the provider's online web site), they might not convert that into a proper mailto: link with link text and URL behind your back?
Thunderbird definitely changes messages after you press send, and I guess webmailers might as well. So the same original message, if sent with TB-gmail vs. webmail-gmail, might very well be different when it arrives.

> I got the impression that WaltS48 is using TB to access both Gmail and Comcast accounts, and has one configured to compose in HTML format and the other configured to compose in plain text format. But I realise this isn't clear. Walt, could you please clarify?

Exactly. That's where concise and numbered steps help...

> All this said, I'm not convinced that how the email being replied to was generated is directly relevant. If TB treats a given HTML hyperlink in a certain way when sent from the Gmail web interface, it would surely treat an identical hyperlink (same display text, same href) identically when sent from Hotmail, Mail.com, Outlook or even TB itself.

The generation process matters only to know exactly what the format/content of the generated message being replied to really is. I haven't seen any clear description of what the first email really looks like, message format and link format and link text if any. There might be information hidden in dozens of comments, but I think we really want clear, numbered or bulleted STR for everyone to be on the same page.

> I think a good plan is if Walt, or someone else experiencing the problem, could attach here an .eml file of a message that one can reply to in order to reproduce the issue.

Unfortunately, Walt failed to reproduce the problem. I myself haven't tried yet, and would really want to avoid wasting time on finding out what the scenario really is, when others like reporter or those who confirmed in the past could just make that information available in a structured format.

But I can't stop anyone from proceeding with more explanatory comments and without clearly defined STR...

Fwiw, Wada's comment 1 looks pretty correct and plausible (maybe forgot http sometimes) as a description of the bug as originally seen. He's using incremental steps with immediate actual result on every step.

I checked the problem again and I can't reproduce it anymore.
It seems that TB removes all hyperlinks when you reply.

Attached file Testcase

I have a testcase! Reproduced in 68.3.1 (32-bit, Windows 10). This is based on an actual email I received, with the email addresses munged and the message bodies trimmed. Steps:

  1. Have 'Compose messages in HTML format' switched off in the account settings.
  2. Open the attached .eml file.
  3. Press Reply.

Excerpt from the garbage emitted by TB:

>     *To:* Robert Stone <abcdefghijklm@livee.co.uk.invalid
>     <mailto:abcdefghijklm@livee.co.uk.invalid>>; Shabir Okhai
>     <abcdefghijk@gmaill.com.invalid
>     <mailto:abcdefghijk@gmaill.com.invalid>>;
>     abcdefghi@btinternett.com.invalid
>     <mailto:abcdefghi@btinternett.com.invalid>
>     <abcdefghi@btinternett.com.invalid
>     <mailto:abcdefghi@btinternett.com.invalid>>
>     *Cc:* Stewart Gordon <abcd@inamee.com.invalid
>     <mailto:abcd@inamee.com.invalid>>
>     *Subject:* Re: Thursday 12th December chess matches

So clearly the bug still occurs under particular conditions, but I've no idea what those conditions are. Hopefully somebody can diagnose the bug from this testcase.

Keywords: testcase

I have reproduced this after being asked why my mails produce a gunk of links: literally 3 pages of mailto conversions of a single original line, once a thread has gone back and forth enough.

The way to reproduce this is to set View Message Body as Original HTML but write email in text only (no html).

1: in tb: create an email to yourself, include "this is www.cnn.com"
2: in gmail (with default settings): reply to self
3: in tb respond to yourself again

in step 3, in the text editor you will see that:

www.cnn.com

has been converted to

this is www.cnn.com <http://www.cnn.com>

whereas hitting reply in plain text mode should have removed everything in <>, or at least not duplicated it.

You do this often enough and you end up with a plethora of http://www.cnn.comhttp://www.cnn.comhttp://www.cnn.comhttp://www.cnn.com
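
To illustrate why the text balloons, here is a small Python simulation of the round trip in the steps above. It is only a model built on assumptions drawn from this thread: the webmail side is assumed to wrap every URL-like token in an anchor, and the replying side is assumed to render each anchor naively as "text <href>". Neither function is real Gmail or Thunderbird code, all names are hypothetical, and HTML escaping of the literal angle brackets is skipped for brevity.

```python
import re

# URL-like tokens: anything starting with http(s):// or www.
URL = re.compile(r"(?:https?://|www\.)[^\s<>]+")
ANCHOR = re.compile(r'<a href="([^"]*)">(.*?)</a>')


def webmail_linkify(plain: str) -> str:
    """Assumed webmail behaviour: turn each URL-like token into a link."""
    def wrap(match):
        text = match.group(0)
        href = text if text.startswith("http") else "http://" + text
        return f'<a href="{href}">{text}</a>'
    return URL.sub(wrap, plain)


def naive_html_to_plain(html: str) -> str:
    """Assumed reply behaviour: render every anchor as 'text <href>'."""
    return ANCHOR.sub(lambda m: f"{m.group(2)} <{m.group(1)}>", html)


def url_counts(text: str, rounds: int) -> list:
    """Count URL-like tokens before and after each simulated round trip."""
    counts = [len(URL.findall(text))]
    for _ in range(rounds):
        text = naive_html_to_plain(webmail_linkify(text))
        counts.append(len(URL.findall(text)))
    return counts
```

Under these assumptions, one round turns "this is www.cnn.com" into "this is www.cnn.com <http://www.cnn.com>", and every further round doubles the number of URL copies, because each copy is linkified and then expanded again.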

I have attached screenshots for all of the steps and a screenshot what this looks like after 5 times back and forth

Attached image originaldraft.png
Attached image step1linkified.png
Attached image step1source.png
Attached image step1ingmail.png
Attached image step3respondintb.png
Attached image 5xbackandforthintb.png

And this was with TB 68.7.0 on Ubuntu

See Also: → 1692771

A duplicate of 1692771 I think.

Bug 1692771 is about web hyperlinks and is marked as fixed. This is about mailto links and is still present, as shown by the attached testcase under Thunderbird 91.7.0 (latest at this time of writing).
