Last Comment Bug 133016 - [mozTXTToHTMLConv] Freetext url recognition stops at ")" and "'" (apostrophe)
: [mozTXTToHTMLConv] Freetext url recognition stops at ")" and "'" (apostrophe)
Status: VERIFIED FIXED
: verified1.8.1.8
Product: MailNews Core
Classification: Components
Component: Backend (show other bugs)
: Trunk
: All All
: P2 normal with 1 vote (vote)
: mozilla1.9alpha6
Assigned To: Ben Bucksch (:BenB)
:
:
Mentors:
news://news.mozilla.org:119/b5vsip$79...
: 119963 145473 145703 200216 230972 236915 248319 260714 271374 275183 303590 313252 316428 326189 348012 349531 353013 356291 367050 390779 (view as bug list)
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2002-03-23 09:58 PST by Jacob Eisinger
Modified: 2008-07-31 04:30 PDT (History)
42 users (show)
dveditz: blocking1.8.1.8+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
message with bad link (1.02 KB, message/rfc822)
2002-09-08 11:42 PDT, Jacob Eisinger
no flags Details
Possible Patch, v1 (not sure, if good idea or WONTFIX) (1.19 KB, patch)
2005-09-05 12:02 PDT, Ben Bucksch (:BenB)
no flags Details | Diff | Splinter Review
Possible Patch, v2 (not sure, if good idea or WONTFIX) (1.19 KB, patch)
2005-09-05 12:05 PDT, Ben Bucksch (:BenB)
no flags Details | Diff | Splinter Review
proposed fix (2.07 KB, patch)
2007-01-18 11:40 PST, Magnus Melin
ben.bucksch: review-
Details | Diff | Splinter Review
Fix, v4, broken (3.07 KB, patch)
2007-01-18 17:06 PST, Ben Bucksch (:BenB)
no flags Details | Diff | Splinter Review
Fix, v5 (3.07 KB, patch)
2007-01-18 18:24 PST, Ben Bucksch (:BenB)
no flags Details | Diff | Splinter Review
Fix, v6. Alternative to v5: ) only after ( (3.02 KB, patch)
2007-01-19 03:57 PST, Ben Bucksch (:BenB)
no flags Details | Diff | Splinter Review
My old test folders (ZIP with mbox) (189.63 KB, application/octet-stream)
2007-01-20 19:29 PST, Ben Bucksch (:BenB)
no flags Details
Test mail (1.80 KB, message/rfc822)
2007-01-20 20:19 PST, Ben Bucksch (:BenB)
no flags Details
Test mail screenshot (with patch v6) (6.97 KB, image/png)
2007-01-20 20:26 PST, Ben Bucksch (:BenB)
no flags Details
proposed fix (v6 + thighter check for mail addresses) (3.18 KB, patch)
2007-02-03 05:40 PST, Magnus Melin
no flags Details | Diff | Splinter Review
Fix, v8 (3.06 KB, patch)
2007-02-11 06:27 PST, Ben Bucksch (:BenB)
mscott: review+
cbiesinger: superreview+
mscott: approval‑thunderbird2+
Details | Diff | Splinter Review
Test mail (2.83 KB, text/plain)
2007-02-11 06:31 PST, Ben Bucksch (:BenB)
no flags Details
Screenshot with patch v8 on trunk (295.13 KB, image/jpeg)
2007-02-26 14:20 PST, Henrik Skupin (:whimboo) [away 09/30 - 10/06]
no flags Details
Branch fixup, v1 (1.43 KB, patch)
2007-08-04 06:00 PDT, Ben Bucksch (:BenB)
no flags Details | Diff | Splinter Review
Branch fixup screenshot (9.86 KB, image/png)
2007-08-05 04:32 PDT, Ben Bucksch (:BenB)
no flags Details
Branch fixup, v1 (1.66 KB, patch)
2007-08-05 04:58 PDT, Ben Bucksch (:BenB)
dveditz: approval1.8.1.8+
Details | Diff | Splinter Review

Description Jacob Eisinger 2002-03-23 09:58:01 PST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.9+)
Gecko/20020318
BuildID:    2002031803

When reading a e-mail message, Mozilla turns all web addresses to clickable
links.  However, some links are not correctly highlighted; they are either
segmented leaving the hyperlinks useless or the hyper link is not extended enough.

Two examples:
http://www.finaid.vt.edu/flash/Mar02flash.pdf
http://filebox.vt.edu/users/jeisinge/frisbee/09%20(at%2013h07m02s).JPG

Reproducible: Always
Steps to Reproduce:
1. recieve an e-mail with an above mentioned link in the body

Actual Results:  With http://www.finaid.vt.edu/flash/Mar02flash.pdf, the link is
segmented so that http://www.finaid becomes a hyper link, .vt. just becomes
text, and edu/flash/Mar02flash.pdf becomes a hyper link.

With http://filebox.vt.edu/users/jeisinge/frisbee/09%20(at%2013h07m02s).JPG,
http://filebox.vt.edu/users/jeisinge/frisbee/09%20(at%2013h07m02s becomes a
hyper link and ).JPG becomes just text.

Expected Results:  the whole address should become a hyperlink
Comment 1 bugzilla1@edcint.co.nz 2002-05-05 19:06:51 PDT
The following information may or may not be related. It may also be a problem at
the senders end. I received the following mail message from the bugzilla system
with a URL that was not fully recognised as a URL. Only the first line of the
field "URL:" was recognised as a clickable URL - the rest was just text.
http://bugzilla.mozilla.org/show_bug.cgi?id=140851

           Summary: LDAP query in browser loads entries into address book
           Product: Browser
           Version: other
          Platform: PC
               URL: http://bugzilla.mozilla.org/buglist.cgi?bug_status=NEW&b
                    ug_status=ASSIGNED&bug_status=REOPENED&email1=&emailtype
                    1=substring&emailassigned_to1=1&email2=&emailtype2=subst
                    ring&emailreporter2=1&bugidtype=include&bug_id=&changedi
                    n=&votes=&chfieldfrom=&chfieldto=Now&chfieldvalue=&short
                    _desc=ldap&short_desc_type=allwordssubstr&long_desc=&lon
                    g_desc_type=allwordssubstr&bug_file_loc=&bug_file_loc_ty
                    pe=allwordssubstr&status_whiteboard=&status_whiteboard_t
                    ype=allwordssubstr&keywords=&keywords_type=anywords&fiel
                    d0-0-0=noop&type0-0-0=noop&value0-0-
                    0=&cmdtype=doit&order=Reuse+same+sort+as+last+time
        OS/Version: Windows 2000
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: --
         Component: URL Bar
        AssignedTo: hewitt@netscape.com
        ReportedBy: matthew.jurgens@solnet.com.au
         QAContact: claudius@netscape.com


>From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.0.0+)
Gecko/20020428
BuildID:    2002042808


Comment 2 Pádraig Brady 2002-09-05 01:50:04 PDT
The following is truncated at ! also (Version 1.1 release)

http://linux.bkbits.net:8080/linux-2.5/user=akpm/ChangeSet@-4w?nav=!-|index.html|stats|!+|index.html
Comment 3 Boris 'pi' Piwinger 2002-09-05 02:05:45 PDT
Please attach the complete message in question to this bug for further inspection.

Possibly a dupe of bug 90161.

pi
Comment 4 Jacob Eisinger 2002-09-08 11:42:15 PDT
Created attachment 98330 [details]
message with bad link

The first link seems to work now!  However, when parenthesis are used, the link
terminates before the ending parenthesis ( ')' ).
Comment 5 Michael Gabriel 2002-09-14 00:14:35 PDT
*** Bug 145703 has been marked as a duplicate of this bug. ***
Comment 6 Derwood 2002-10-01 22:04:25 PDT
*** Bug 145473 has been marked as a duplicate of this bug. ***
Comment 7 Artiom Morozov 2002-12-26 00:15:28 PST
Besides that, arbitrary text can be hilited when it should not be =( E.g. here

PDF::parse: cannot find pdf parser /usr/local/bin/acroread

PDF::parse: is shown as a link (since Mozilla 1.3a, 1.2 didn't have this
effect). I think protocol should be verified with getservbyname() or similiar call.
Comment 8 Torben 2003-04-02 05:32:23 PST
*** Bug 200216 has been marked as a duplicate of this bug. ***
Comment 9 Torben 2003-04-02 05:38:38 PST
To stop this bug from morphing I am changing the summary to reflect URL 2 in
comment O, which is what the dupes are about too. This bug affects normal ")",
square "]", and sqiugly "}" brackets.

Comment 7 is an other bug, which I belive is fixed.
Comment 10 David G King 2003-04-02 05:43:21 PST
Adding URL from dupped bug that demonstrates problem.

Also, placing a "<" and a ">" around the URL fixes the problem, as long as the
URL is not wrapped to multiple lines.
Comment 11 Andreas Otte 2003-04-02 08:39:25 PST
There was some work done with bug 183111, which may affect this bug as Torben
said. RFC 2396 at one point has this to say about delimiters:


2.4.3. Excluded US-ASCII Characters

   Although they are disallowed within the URI syntax, we include here a
   description of those US-ASCII characters that have been excluded and
   the reasons for their exclusion.

   The control characters in the US-ASCII coded character set are not
   used within a URI, both because they are non-printable and because
   they are likely to be misinterpreted by some control mechanisms.

   control     = <US-ASCII coded characters 00-1F and 7F hexadecimal>

   The space character is excluded because significant spaces may
   disappear and insignificant spaces may be introduced when URI are
   transcribed or typeset or subjected to the treatment of word-
   processing programs.  Whitespace is also used to delimit URI in many
   contexts.

   space       = <US-ASCII coded character 20 hexadecimal>

   The angle-bracket "<" and ">" and double-quote (") characters are
   excluded because they are often used as the delimiters around URI in
   text documents and protocol fields.  The character "#" is excluded
   because it is used to delimit a URI from a fragment identifier in URI
   references (Section 4). The percent character "%" is excluded because
   it is used for the encoding of escaped characters.

   delims      = "<" | ">" | "#" | "%" | <">

   Other characters are excluded because gateways and other transport
   agents are known to sometimes modify such characters, or they are
   used as delimiters.

   unwise      = "{" | "}" | "|" | "\" | "^" | "[" | "]" | "`"

   Data corresponding to excluded characters must be escaped in order to
   be properly represented within a URI.


The code that marks a url in a text does also recognize () and {} as delimiters,
maybe it can be made a little bit smarter.


Also there is the general problem with linebreaks, RFC 2396 is very clear about
that:

E. Recommendations for Delimiting URI in Context

   URI are often transmitted through formats that do not provide a clear
   context for their interpretation.  For example, there are many
   occasions when URI are included in plain text; examples include text
   sent in electronic mail, USENET news messages, and, most importantly,
   printed on paper.  In such cases, it is important to be able to
   delimit the URI from the rest of the text, and in particular from
   punctuation marks that might be mistaken for part of the URI.

   In practice, URI are delimited in a variety of ways, but usually
   within double-quotes "http://test.com/", angle brackets
   <http://test.com/>, or just using whitespace

                             http://test.com/

   These wrappers do not form part of the URI.

   In the case where a fragment identifier is associated with a URI
   reference, the fragment would be placed within the brackets as well
   (separated from the URI with a "#" character).

   In some cases, extra whitespace (spaces, linebreaks, tabs, etc.) may
   need to be added to break long URI across lines. The whitespace
   should be ignored when extracting the URI.

   No whitespace should be introduced after a hyphen ("-") character.
   Because some typesetters and printers may (erroneously) introduce a
   hyphen at the end of line when breaking a line, the interpreter of a
   URI containing a line break immediately after a hyphen should ignore
   all unescaped whitespace around the line break, and should be aware
   that the hyphen may or may not actually be part of the URI.

   Using <> angle brackets around each URI is especially recommended as
   a delimiting style for URI that contain whitespace.

   The prefix "URL:" (with or without a trailing space) was recommended
   as a way to used to help distinguish a URL from other bracketed
   designators, although this is not common in practice.

   For robustness, software that accepts user-typed URI should attempt
   to recognize and strip both delimiters and embedded whitespace.

   For example, the text:

      Yes, Jim, I found it under "http://www.w3.org/Addressing/",
      but you can probably pick it up from <ftp://ds.internic.
      net/rfc/>.  Note the warning in <http://www.ics.uci.edu/pub/
      ietf/uri/historical.html#WARNING>.

   contains the URI references

      http://www.w3.org/Addressing/
      ftp://ds.internic.net/rfc/
      http://www.ics.uci.edu/pub/ietf/uri/historical.html#WARNING


Mozilla does not conform to the linebreak handling described here.
Comment 12 Torben 2003-04-04 05:46:17 PST
> Mozilla does not conform to the linebreak handling described here (comment 11)

For the record, that is bug 5351.
Comment 13 David G King 2003-04-04 05:55:27 PST
OK, so RFC3296 has certain rules governing what can be in a URI. However, may I
suggest that if a user receives an eMail that contains some "unwise" characters
in the URI, that Mozilla should be able to link to the complete URI.

Of course, sending an eMail with these "unwise" characters is a totally
different matter.
Comment 14 Andreas Otte 2003-04-04 06:13:34 PST
As I said, our detection code probably needs to be a little bit smarter, for
example: If the url does not start with a bracket then counting brackets and
stopping when the count reaches zero makes no sense as it seems to happen with
the second example, but that's just a guess.
Comment 15 Ben Bucksch (:BenB) 2003-04-04 09:42:08 PST
Thanks for ccing me Andreas. Most of the comments on this bug are actually wrong
or offtopic, so I am focussing on the problem mentioned in comment 9. Shortening
summary.

For the spec, see <http://www.bucksch.com/1/projects/mozilla/16507/>.

Andreas, what are you suggesting? My opinion is basically: if you're sending
freaky URLs like those with (), then wrap them in <>. If the sender didn't, fix
the sender (if possible) or copy the URL manually (happens rarely enough IMHO).
But, what would you do? IIRC, I didn't count spaces, I just stop at any bracket
when in freetext mode.
Comment 16 Ben Bucksch (:BenB) 2003-04-04 10:02:30 PST
narrowing summary even further, because } and ] are illegal in URLs and should
IMHO not be linked. It's a recognizer, after all, it's not expected to get it
right in all cases, esp. not the abuse ones.
Comment 17 Ben Bucksch (:BenB) 2003-04-04 10:04:58 PST
Code is in mozTXTToHTMLConv::FindURLEnd
<http://lxr.mozilla.org/seamonkey/source/netwerk/streamconv/converters/mozTXTToHTMLConv.cpp#299>
(sorry for the spam).
Comment 18 Ed Grochowski 2003-04-04 10:14:27 PST
Interesting... I have a couple of observations/info.

I am the one who posted the newsgroup message currently listed in the URL of
this bug. I obtained this URL as follows. Using Mozilla, goto microsoft.com
navigate to the search and select search knowledgebase by id. I typed in the id#
815411, the corresponding page is loaded and the URL in Mozilla location bar is
the one in the message. I then simply selected and copied it from my location
bar, and pasted it to my news post, to get:

http://support.microsoft.com/default.aspx?scid=kb;[LN];815411

- I find it interesting that when posting this to bugzilla it handles it 
"correctly" (i.e. it highlights the entire url). I pasted the above URI string
the same way.
- As one would expect, since Microsoft is the originator of above URL Outlook
also highlights the whole URL in the posting.
Comment 19 Derwood 2003-04-04 10:37:56 PST
Perhapse there needs to be a "900lbs-gorilla" tag for bugs which highlight bugs
which are technically compliant with public specs but M$ ignores thus
promulgating bad behavior.

I leave it up to others to make the political/pragmatic decisions if the
prevalence needs to be accommodated or not.
Comment 20 Henrik Skupin (:whimboo) [away 09/30 - 10/06] 2003-10-10 11:22:51 PDT
Take a look at RFC 2396 (2.3):

mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"

There is described that ´)' and ´'' are also allowed within an URI. But both
aren't working within MailNews.

Like the comment [1] tells me, it was forgotten to update and RFC2396E sounds
like a draft for me. 

Within the Browser you see this url [2] completly because Bugzilla creates a
hyperlink and so its shown correctly. Instead MailNews renders the URI for itself.

[1]
http://lxr.mozilla.org/seamonkey/source/netwerk/streamconv/converters/mozTXTToHTMLConv.cpp#308
[2] http://support.microsoft.com/default.aspx?scid=kb;[LN];815411
Comment 21 Mike Cowperthwaite 2004-04-20 11:38:18 PDT
Bug 119963 is about the apostrophe case.
See also bug 218287.
Comment 22 Ben Bucksch (:BenB) 2004-04-20 12:25:19 PDT
*** Bug 119963 has been marked as a duplicate of this bug. ***
Comment 23 Ben Bucksch (:BenB) 2004-04-20 12:37:38 PDT
I marked bug 119963, which is about apostrophes, a dup of this, because they are
so similar.

I am tending towards WONTFIX. This is a recognizer, designed to get most valid
cases right.

Cases outside the spec (e.g. "[" as in comment 18) generally are not supported.
Comment 18 asks to allow "[" in URLs, while bug 218287 asks *not* to support it
and gives a good example. This shows that the recognizer is just guesswork and a
judgement call.

"(" and apostrophe are allowed in URL per spec. But we're not in a situation
where we know for sure that it's supposed a URL, we need to *find* it in a bunch
of human language text (like http://example.com). This is why I excluded
brackets from the URL and probably apostrophe as well. For consistency and for
bug 218287, I should probably actually disallow both opening and closing
brackets everywhere in the URL.
Comment 24 Ed Grochowski 2004-04-20 13:57:39 PDT
A quick point of clarification (since I added comment #18):

I am not advocating support for parsing characters in URI's such as the "[" if
it is outside the spec. 

I was simply adding contextual information on where I came across it in the
field and how from an end user perspective there "seemed to be" differing
behaviours between Mozilla, IE, an Bugzilla. 

If end users dont come across these non-standard cases very often in practice,
its probably not a big deal, and Mozilla should do the right thing.

Comment 25 Ian Wells 2004-04-22 00:26:18 PDT
I know this is polluting the menus, but perhaps on selecting the full URL by 
hand you could add 'open selection as link' on the context menu?  Would be 
less painful than the 'cut and paste and open a browser window and and 
and ...' solution that's required otherwise.
Comment 26 timeless 2004-06-13 14:28:55 PDT
we should not do that. but i can today select text and drag it to a tab or
window if i want it treated as a url. if i'm using x11 i can probably middle
click on my selection to open it somewhere.
Comment 27 Mike Cowperthwaite 2004-09-03 10:47:51 PDT
*** Bug 230972 has been marked as a duplicate of this bug. ***
Comment 28 Mike Cowperthwaite 2004-09-05 10:07:51 PDT
*** Bug 248319 has been marked as a duplicate of this bug. ***
Comment 29 Mike Cowperthwaite 2004-09-21 10:20:11 PDT
*** Bug 260714 has been marked as a duplicate of this bug. ***
Comment 30 Sorin Sbarnea 2004-09-21 23:42:21 PDT
I think that & (ampersant) and | (pipe) are included in standard but it still 
doesn't work with them. 
Comment 31 Mike Cowperthwaite 2004-11-23 07:01:33 PST
*** Bug 271374 has been marked as a duplicate of this bug. ***
Comment 32 Mike Cowperthwaite 2004-12-02 11:05:21 PST
Moving a bunch of mozTXTToHTMLConv bugs into Core / MailNews Backend to live 
with their bretheren.
Comment 33 Mike Cowperthwaite 2005-02-09 22:15:01 PST
*** Bug 275183 has been marked as a duplicate of this bug. ***
Comment 34 Matthias Versen [:Matti] 2005-04-22 12:34:03 PDT
*** Bug 236915 has been marked as a duplicate of this bug. ***
Comment 35 Mike Cowperthwaite 2005-08-05 12:40:46 PDT
*** Bug 303590 has been marked as a duplicate of this bug. ***
Comment 36 Lyman Byrd 2005-08-23 06:15:25 PDT
Domino produces links like:
https://lsd.memach.com/ConditionReports.nsf/(lookup)/CC8CB6274C8EEBE385257062005E54A6?OpenDocument
and the user has to copy the link and paste it into the browser in order to view
the link. It would really be nice if the behavior of thunderbird would use the
entire link vice stopping at the letter before the ")".  I cant change the way
domino produces the links. Thanks for your efforts
Comment 37 Lyman Byrd 2005-09-01 06:08:22 PDT
(In reply to comment #32)
> Moving a bunch of mozTXTToHTMLConv bugs into Core / MailNews Backend to live 
> with their bretheren.

What does this mean? That the bug is going to die for lack of attention or it is
going to be looked at and action taken. I was planning on deploying a bunch of
thunderbird apps but since we use domino for one of our main application and
domino insists on using ( and ) in the links I am stuck unless this is resolved.
I will have to find an alternative email program... How can I find out what is
being done. Do I need to look for an alternative? Thanks for all your efforts...

Lyman
Comment 38 Mike Cowperthwaite 2005-09-01 08:49:46 PDT
(In reply to comment #37)
> (In reply to comment #32)
> > Moving a bunch of mozTXTToHTMLConv bugs into Core / MailNews Backend to live 
> > with their bretheren.
> 
> What does this mean? That the bug is going to die for lack of attention or it
> is going to be looked at and action taken. I was planning on deploying a bunch
> of thunderbird apps but since we use domino for one of our main application
> and domino insists on using ( and ) in the links I am stuck unless this is
> resolved.

Comment 23, written by the person who implemented URL recognition in the first 
place, gives the arguments *against* allowing parentheses in a recognized URL.
The main issue here is that someone might write a message such as
---
  ... and you can visit my homepage (http://homepage.somewhere.tld) to see.
---
In this case, the closing parenthesis should *not* be part of the URL.  Making 
the code smart enough to recognize parentheses in context is harder than it 
sounds -- particularly since there could be a legal URL which contains *only* a 
closing or opening parenthesis, e.g.
  http://paranoia.tld/kso03)cldke.html


You say you can't change how Domino works, but unless you can write some code 
for Mozilla (or convince/pay someone to do so for you), you can't change how TB 
works either.  Why not file a bug with them and see if they can change the URL 
generation to either enclose the URL within angle brackets, or escape the 
parentheses?  That change on their part would be much easier to implement than 
the change required on Mozilla's part.
Comment 39 Derwood 2005-09-01 09:00:11 PDT
Why isn't it simple enough to include a paren if there's no whitespace after it?
Comment 40 Oliver Saier 2005-09-02 00:42:52 PDT
Both (http:example.com) and http://paranoia.tld/kso03)cldke.html are correctly
recognized here. I guess that Bugzilla's recognition is better than Thunderbird's :p
Comment 41 Adam Guthrie 2005-09-04 23:10:43 PDT
*** Bug 301363 has been marked as a duplicate of this bug. ***
Comment 42 Ben Bucksch (:BenB) 2005-09-05 12:02:40 PDT
Created attachment 194943 [details] [diff] [review]
Possible Patch, v1 (not sure, if good idea or WONTFIX)

In response to Lyman Byrd, I created a simple patch which does what I think is
the only thing I can do here: URL recognition won't stop at "(", ")" or "'" in
the middle of the URL, but will exclude ")" and "'", if they appear at the end.


The patch is completely untested (it does compile, though).

I am still not sure, if this bug is a good idea in the first place or should be
WONTFIXed. Please read my comments above before applying the patch.

And please do contact IBM/Lotus to fix their broken software - it's highly
unwise to use things like brackets and apostrophs in URLs, exactly because of
cases like these.
Comment 43 Ben Bucksch (:BenB) 2005-09-05 12:05:42 PDT
Created attachment 194945 [details] [diff] [review]
Possible Patch, v2 (not sure, if good idea or WONTFIX)

Ops, last patch had an obvious error.

(Also: Remove "broken" from last paragraph of last comment. Rest stays.)
Comment 44 Karsten Düsterloh 2005-10-21 04:39:48 PDT
*** Bug 313252 has been marked as a duplicate of this bug. ***
Comment 45 Ian Stokes-Rees 2005-10-23 01:01:53 PDT
I assume this is the catch-all bug for automatic URL detection problems.  This URL is also segmented at a strange place:

http://www.modpython.org/FAQ/faqw.py?query=expat&querytype=simple&casefold=yes&req=search

Specifically, right before the "?".  Now, obviously "?" is a pretty common character to be in a URL, so this really shouldn't happen.  It is hard to imagine a regular sentence which is a question which ends with a URL... e.g.:

Have you looked at: http://www.example.com/?

In fact, if you wanted to catch that sort of thing, the logic should be fairly straight forward -- just see if there is any whitespace after the ?.  I think the same goes for the parenthesis examples.

I'm using Thunderbird 1.5beta1.
Comment 46 Ben Bucksch (:BenB) 2005-10-28 07:46:03 PDT
> I assume this is the catch-all bug for automatic URL detection problems

It is most definitely not and has a specific summary.

> Specifically, right before the "?"

WFM, verbatim in the bugmail from your comment.
Comment 47 Elmar Ludwig 2005-11-14 12:28:08 PST
*** Bug 316428 has been marked as a duplicate of this bug. ***
Comment 48 zug_treno 2006-02-07 10:27:52 PST
*** Bug 326189 has been marked as a duplicate of this bug. ***
Comment 49 moz.champion 2006-02-08 14:56:52 PST
If ) brackets are not to be recognized as part of the URL, then why does this link exist?
http://kb.mozillazine.org/Reducing_memory_usage_(Firefox)

It cannot be copy/pasted or typed reliably with either Thunderbird, or reproduced with Firefox, because the final ) is rendered as text.

If ) closing parenthesis are not to be used in a link, then why has mozillazine.org used them?

The problem in Thunderbird 1.5 is even worse
none of the following show as links, all failing at the ) character in TB 1.5, yet they render in Firefox!

http://test(no.org/test(situation(
http://test(no.org/test(situation)
http://test(no.org/test)situation)
http://test)no.org/test(situation)

As read here in Firefox, items 2 thru 4 render incompletely on the LAST )
but in Thunderbird 1.5, the link breaks at the FIRST ) in each url.

Thunderbird 1.5 and Firefox render the links differently

Comment 50 Ian Stokes-Rees 2006-02-08 15:30:36 PST
It is clear that there are cases when a round bracket might be a legitimate part of a URL.  The fact of the matter is that USUALLY it won't be, so the auto-sensing URL checker should go for the most regular use case.  Basically, it should get it right *most* of the time.  This, in my experience, will mean ignoring closing brackets (of any sort), or punctuation *other than* a slash as the last character in the string.
Comment 51 moz.champion 2006-02-08 16:18:25 PST
(In reply to comment #50)

That is understood as the rational behind not rendering the LAST ) as part of a link. But even mozillazine uses such as links.

And, in Thunderbird 1.5 even a ) within the body of a link breaks it.
while in Firefox ONLY a trailing ) breaks it.

According to RFC http://www.faqs.org/rfcs/rfc1738.html
From section 2.2
   Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
   reserved characters used for their reserved purposes may be used
   unencoded within a URL. 

typing http://test)this.is.a.test
here in Firefox, and the link is NOT broken
try sending that in Thunderbird 1.5 and it comes out as
http://test )this.is.a.test (space added after test for display only)

The program is ignoring a valid character in a URL, simply because some users MIGHT use the character as a delimiter. Would it not be better to include the ) in links, and correct the improper use of such as a delimiter when it arises?
And once again, the problem is more profound in Thunderbird 1.5 which renders ANY ) as an invalid character (in a URL)
Comment 52 intendentedelleacque 2006-04-25 15:33:19 PDT
Just a doubt: Am I wrong or  with the rfc 3986 the chars ( ) and ' are  considered as Reserved Characters (par. 2.2) and so must be escaped?

Comment 53 Ben Bucksch (:BenB) 2006-04-26 06:01:00 PDT
You are right that they are now "reserved" (they weren't in RFC2396). This means that they *must* be escaped when not used as delimiters, but *must not* be escaped when used in their origial purpose as delimiter. I'm not sure if the Lotus Notes use falls into the category of a proper use as delimiter.
Comment 54 Ben Bucksch (:BenB) 2006-04-26 06:06:02 PDT
Re last paragraph in comment 51: No, this is a *recognizer* mainly targetted at text that *humans* write. This is something totally different from an HTTP header parser. "Go to Google (http://www.google.com)" is far more often in emails than the rare URLs which make use of () inside the URL. See my other comments above.
Comment 55 Mike Cowperthwaite 2006-08-09 18:57:13 PDT
*** Bug 348012 has been marked as a duplicate of this bug. ***
Comment 56 Magnus Melin 2006-08-21 10:56:30 PDT
*** Bug 349531 has been marked as a duplicate of this bug. ***
Comment 57 Oliver Saier 2006-08-24 13:18:24 PDT
The workaround to send a good clickable link is to replace ) with %29
Compare:
http://en.wikipedia.org/wiki/O_(Cyrillic)
http://en.wikipedia.org/wiki/O_(Cyrillic%29
http://en.wikipedia.org/wiki/O_%28Cyrillic%29

Does anyone know the % codes for ' and " ?
Comment 58 moz.champion 2006-08-24 16:48:37 PDT
Another workaround is to use <> brackets
observe
http://en.wikipedia.org/wiki/O_(Cyrillic)
<http://en.wikipedia.org/wiki/O_(Cyrillic)>

Firefox doesnt have this problem, what code do they use? Workarounds are good but they detract from the fact that in this case Thunderbirds actions are non-standard. a ) is a legitimate character in urls, and one shouldnt have to use a 'workaround' to get it to work in Thunderbird
Comment 59 Magnus Melin 2006-09-17 07:00:30 PDT
*** Bug 353013 has been marked as a duplicate of this bug. ***
Comment 60 Mike Cowperthwaite 2006-09-20 08:20:01 PDT
xref bug 281321
Comment 61 Magnus Melin 2006-10-11 11:40:24 PDT
*** Bug 356291 has been marked as a duplicate of this bug. ***
Comment 62 Ben Bucksch (:BenB) 2006-10-11 13:26:15 PDT
It's not clear what to do there. traditionally (even netscape 4), we assumed
that if you write "my company (http://www.foobar.com) does ...", you would
not want the () included in the URL. wikipedia unfortunately makes very
liberal use of that, which I think is very unwise. But it poses the question:
which case do we favor?

The only compromise I see is to look whether there's a ( after the URL start,
and if so, include the ) as well. I like that.

That doesn't solve the apostrophe proble, though, but I guess that's less
severe, we could just include it? It (only) breaks people using ' as quote.
Comment 63 Ben Bucksch (:BenB) 2006-10-11 13:29:28 PDT
Comment on attachment 194945 [details] [diff] [review]
Possible Patch, v2 (not sure, if good idea or WONTFIX)

Example:
http://en.wikipedia.org/wiki/Bulb_(disambiguation) (very common in wikipedia)
http://en.wikipedia.org/wiki/Lemarchand's_box
Comment 64 Ben Bucksch (:BenB) 2006-10-11 13:39:01 PDT
OK, I want to get this fixed.
Comment 65 moz.champion 2006-10-11 14:07:10 PDT
Its not only common in Wikipedia, Mozillazine uses it quite often as well i.e.
http://kb.mozillazine.org/Reducing_memory_usage_(Firefox)

The essence imho is; Thunderbird currently DOESNT follow web standards in that () are approved characters in urls.
Instead, it is 'fudging' the standards to meet developers expectations. Funny, when Microsoft does this, its propeity code and wrong, but here is Thunderbird doing the same exact thing.

Currently you have to tell people they CANNOT quote a legimate website
http://kb.mozillazine.org/Reducing_memory_usage_(Firefox)
because Thunderbird wont render it properly - even tho there is NOTHING wrong with the web site at all!

I would MUCH rather have Thunderbird properly render the () as authorized url characters which they in fact are, and then tell someone why HIS rendition of a url  (http://mozilla.com/html) doesnt work (the extra )). Simply tell the person NOT to use () to quote.
Comment 66 Anthony 2006-10-11 15:11:26 PDT
(In reply to comment #65)

When this bug (and so many related to it) was logged I would have agreed with you wholeheartedly, since RFC 2396 [Aug 1998] was in effect and "(" and ")" characters were permissible as data characters in URI's.

Four years later, though, Thunderbird's behaviour is now "accidentally compliant" with 2396's successor, RFC 3986 [Jan 2005], which moves "(" and ")" into sub-delims, meaning that they should be pct-encoded as %28 and %29 when appearing as data characters in URI's. Ref: http://www.ietf.org/rfc/rfc3986.txt

It seems that we should now be chasing the web masters of popular sites like Google Maps, Wikipedia and Mozillazine, to get them to fix their URI encoding. Or we could wait another four years and see what standards are in effect then. :)
Comment 67 Ben Bucksch (:BenB) 2006-10-11 15:23:38 PDT
Guys, this is not about standards. They are the ground rule, but this is a fuzzy recognizer, see my comments above. Unless somebody finds a problem with my idea in comment 62 or has better ideas:

Please stop from making comments.
Comment 68 Ben Bucksch (:BenB) 2006-10-14 05:22:01 PDT
Here's another idea (in addition to comment 62): bug 356657. Only helps, if sender used Thunderbird/Seamonkey.
Comment 69 Glenn 2006-11-01 00:00:48 PST
I frequently receive correspondance from companies which include ( and ) as part of their URL.  I had a survey which I filled out this past week for a company wanting to know how I felt about their customer service, and I was directed to the below URL, and whether this is wise from a standards perspective seems to be moot when compared to the fact that the links do not get properly interpreted as would be reasonably expected by a human.  I should add, though, that there was a bunch of other junk added to allow the browser to interpret it as a possible HTML link but since I display in plain text, I was unable to click without a bunch of cut and paste.  (I just tested this with the message in question, and it was never intended to be an HTML e-mail, so perhaps the sender just tried to get fancy or something and Thunderbird didn't know what to properly do with it?  Should Thunderbird even try to interpret such thing?)  The link I received was as follows:  

http://surveys.benchmarkportal.com/ebay/survey.taf?survey_id=1283&amp;use
r_id=19B0E8D6-9BFE-4FB7-9996-90BF83F5AD10

Within its context, though, it actually looked like this mess:

&lt;br&gt;&lt;br&gt;&lt;a 
href="http://surveys.benchmarkportal.com/ebay/survey.taf?survey_id=1283&amp;use
r_id=19B0E8D6-9BFE-4FB7-9996-90BF83F5AD10"&gt;http://surveys.benchmarkportal.c
om/ebay/survey.taf?survey_id=1283&amp;user_id=19B0E8D6-9BFE-4FB7-9996-90BF83F5A
D10&lt;/a&gt;&lt;br&gt;&lt;br&gt;

The URL was spread across multiple lines and even with the quotes around the URL, it was misinterpreted by thunderbird.  I think it is PROBABLY safe to suggest that anything beginning with a quotation mark followed by the string http:// is a URL which should be interpreted as a URL until the next occurrence of a quotation mark, and one can probably make the same assumption with other network protocols like ftp:// or https:// for example.  (Whether this is easy to code for or not, I don't know.)  What I was able to click was only everything up until a line break occurred in Thunderbird.
Comment 70 Simon Montagu :smontagu 2007-01-15 20:47:54 PST
*** Bug 367050 has been marked as a duplicate of this bug. ***
Comment 71 Magnus Melin 2007-01-18 11:40:44 PST
Created attachment 251951 [details] [diff] [review]
proposed fix

This fixes it for all cases I have found.
Comment 72 Mike Cowperthwaite 2007-01-18 12:05:33 PST
Nit: separate keywords from parentheses with a space:
   while (...) ...;   if (....) ...;
        ^               ^

You could perform the same matching-parenthesis test at the end for matching apostrophe.  I don't think this is common, so maybe it's not worth testing for, but it's possible someone would put in the URL as:
   'http://mozilla.org'
Comment 73 Ben Bucksch (:BenB) 2007-01-18 13:17:49 PST
Comment on attachment 251951 [details] [diff] [review]
proposed fix

Thanks for the patch, but the code in the |if (state[check] == endok)| doesn't fit the design of the code. Any reason not to put it in the code you commented as "back up a bit"?

No reason to change the loop from for to while. And it *is* iterating.

Comments are not entirely correct, but comments would make sense, yes.

Unless you have a reason why the above proposal won't work, I'll attach a patch for that.
Comment 74 Ben Bucksch (:BenB) 2007-01-18 13:31:22 PST
> Any reason not to put it in the code you commented as "back up a bit"?

Nevermind, I realized it. However, your code - apart from not being in FindURL where is belongs - would not recognize (for example http://www.yahoo.com) correctly, it only looks at the char before the URL start. See comment 62.
Comment 75 Ben Bucksch (:BenB) 2007-01-18 17:06:21 PST
Created attachment 251988 [details] [diff] [review]
Fix, v4, broken

Here's what I mean:
- Changes for loop to be more understandable
- Allows ' in middle of URL, but not at end
- Allowes ) in middle of URL, but not at end, unless there was a ( earlier

However, it doesn't work as intended. I don't see why. Magnus, when looking at the code, can you see what's wrong?
Comment 76 Ben Bucksch (:BenB) 2007-01-18 17:35:04 PST
> However, it doesn't work as intended. I don't see why.

2 reasons:
- should be "&& !haveOpeningBracket" (forgot the ! )
- I stupidly built static, so my changes were not picked up. rebuilding.
Comment 77 Ben Bucksch (:BenB) 2007-01-18 18:24:25 PST
Created attachment 251997 [details] [diff] [review]
Fix, v5

Yup, that was it. Fix works as intended. I couldn't find anything it breaks either.
Magnus, please review.
Comment 78 Magnus Melin 2007-01-18 23:12:18 PST
I'll be away for a couple of days, so I wont be able to try it out until monday. Looks like it could work though. Assuming it works, it sure would be nice to get this in for tb2.0...

Do we want to fix the 'http://mozilla.org' case too? Don't know how usual that is, but it would be pretty straight forward fix.
Comment 79 Ben Bucksch (:BenB) 2007-01-19 03:57:31 PST
Created attachment 252040 [details] [diff] [review]
Fix, v6. Alternative to v5: ) only after (

With the patch v5, http://test)foo is recognized. This patch is an alternative, allowing ) only after a (. I.e. http://test_(Comp) and http://test(lookup)bla are recognized, but http://text)foo is not.

> Do we want to fix the 'http://mozilla.org'

Works before and after the patch (both versions).
Comment 80 Ben Bucksch (:BenB) 2007-01-20 19:29:12 PST
Created attachment 252193 [details]
My old test folders (ZIP with mbox)

Here are my old test mails. Quite a lot. For thorough testing, they should be checked with all relevant pref combinations.
Comment 81 Ben Bucksch (:BenB) 2007-01-20 20:19:57 PST
Created attachment 252195 [details]
Test mail

Ops, the last attachment was actually intended for another bug, but may be useful here as well.
Here's a mail with testcases specifically for the brackets and similar. Clicking on it should open it in the browser and render it with the browser's libmime, if available. You can also copy the file to LocalFolders and open it in Thunderbird.
Comment 82 Ben Bucksch (:BenB) 2007-01-20 20:26:28 PST
Created attachment 252196 [details]
Test mail screenshot (with patch v6)
Comment 83 Ben Bucksch (:BenB) 2007-01-20 20:27:17 PST
Comment on attachment 251997 [details] [diff] [review]
Fix, v5

Yes, I like v6 better
Comment 84 Magnus Melin 2007-01-22 11:32:56 PST
Comment on attachment 252040 [details] [diff] [review]
Fix, v6. Alternative to v5: ) only after (

Seems good except for one case I happend to stumble upon. In a bug it said: per foo@bar.com's suggestion. For emails apostrophes should mark the end, no? And hypothetical cases like foo@(bar).com probably shouldn't be links either.

I'm not a reviewer, but with that fixed, seems ok to me.
Comment 85 Magnus Melin 2007-02-03 05:40:09 PST
Created attachment 253854 [details] [diff] [review]
proposed fix (v6 + thighter check for mail addresses)

This is fix v6 + don't allow ' and ( in email address recognition.
Comment 86 Magnus Melin 2007-02-11 05:48:14 PST
Ben: have you had time to look at this?
Comment 87 Ben Bucksch (:BenB) 2007-02-11 06:27:10 PST
Created attachment 254729 [details] [diff] [review]
Fix, v8

Yes. The 2-line check you added was in the wrong place (there's a similar check later, should be added there). I just fixed that.

I'll also disallow [ and {. Currently, ] and } are disallowed (reasons see above), so including the opening ones doesn't make any sense whatsoever.
Comment 88 Ben Bucksch (:BenB) 2007-02-11 06:31:32 PST
Created attachment 254730 [details]
Test mail
Comment 89 Ben Bucksch (:BenB) 2007-02-11 17:31:12 PST
Comment on attachment 254729 [details] [diff] [review]
Fix, v8

ops, this is netwerk, so I guess I need a netwerker sr. trying biesi.
Comment 90 Christian :Biesinger (don't email me, ping me on IRC) 2007-02-12 03:28:03 PST
Comment on attachment 254729 [details] [diff] [review]
Fix, v8

sr=biesi
Comment 91 Ben Bucksch (:BenB) 2007-02-17 12:45:32 PST
Fix v8 checked into trunk.

Will bake on trunk for a few days, then ask for permission for branch.
Comment 92 Ben Bucksch (:BenB) 2007-02-19 09:54:21 PST
Can somebody QA this on trunk, please, so that we have a chance to get it into TB2.0? For expected results, see screenshot in comment 82.
Comment 93 Henrik Skupin (:whimboo) [away 09/30 - 10/06] 2007-02-26 14:20:36 PST
Created attachment 256518 [details]
Screenshot with patch v8 on  trunk

Tested with Thunderbird Version 3 alpha 1 (20070226) on Windows. Ben, have a look at the screenshot if the output is that what we want to have.
Comment 94 Ben Bucksch (:BenB) 2007-02-26 15:24:20 PST
Comment on attachment 254729 [details] [diff] [review]
Fix, v8

Thanks Henrik. Yes, that output is as expected.

Requesting TB2 branch approval.
Comment 95 Ben Bucksch (:BenB) 2007-03-06 06:19:57 PST
checked into branch.
Comment 96 Oliver Saier 2007-03-07 04:42:26 PST
WFM with Tb 2.0pre 2007030604. Good job guys! - especially as this bug was about to get Wontfix'ed 3 years ago :)
Comment 97 Oliver Saier 2007-03-07 04:44:18 PST
^ wrong clipboard content: the build id is 2007030703 (Lin)
Comment 99 Ben Bucksch (:BenB) 2007-08-04 03:36:57 PDT
No, IIRC that was not intentional.
Comment 100 Ben Bucksch (:BenB) 2007-08-04 03:39:25 PDT
Branch apparently got an older version of the fix. Trunk seems OK.
I don't know what the branch rules are. Are correctness fixes allowed or only critical and security fixes?
If not allowed, I'll mark the bug fixed again.
Comment 101 Henrik Skupin (:whimboo) [away 09/30 - 10/06] 2007-08-04 04:15:23 PDT
Ben, if trunk is fine we could leave it as fixed. Due to the wrong patch checked into 1.8.1.3 we should ask for approval 1.8.1.7. Could you attach the patch which corrects the mistake?
Comment 102 Ben Bucksch (:BenB) 2007-08-04 04:53:50 PDT
Bug 390779 comment 8 is the result of the bad branch check-in.
Comment 103 Ben Bucksch (:BenB) 2007-08-04 06:00:24 PDT
Created attachment 275242 [details] [diff] [review]
Branch fixup, v1

This is the diff between branch and trunk on my disk, only the relevant part for this bug.
I have *not tested* it yet, will do later, or somebody else can if you want to.
Comment 104 Ben Bucksch (:BenB) 2007-08-04 06:05:04 PDT
*** Bug 390779 has been marked as a duplicate of this bug. ***
Comment 105 Ben Bucksch (:BenB) 2007-08-05 04:32:33 PDT
Created attachment 275311 [details]
Branch fixup screenshot
Comment 106 Ben Bucksch (:BenB) 2007-08-05 04:35:56 PDT
Tested Branch fixup, v1 with current branch, it indeed brings it on par with trunk, see screenshot.
mscott: please approve. This is to fix bug 390779.
Comment 107 Henrik Skupin (:whimboo) [away 09/30 - 10/06] 2007-08-05 04:46:41 PDT
Comment on attachment 275242 [details] [diff] [review]
Branch fixup, v1

>--- 2.0-branch/mozilla/netwerk/streamconv/converters/mozTXTToHTMLConv.cpp	2007-01-21 04:53:17.806002000 +0100
>+++ trunk/mozilla/netwerk/streamconv/converters/mozTXTToHTMLConv.cpp	2007-06-17 13:27:56.999825000 +0200

Dunno if we need a real diff against CVS 1.8 branch here. You only supplied your local diff between trunk and 1.8 branch.

Adjusting approval flag to 1.8.1.7. No review needed?
Comment 108 Ben Bucksch (:BenB) 2007-08-05 04:58:13 PDT
Created attachment 275313 [details] [diff] [review]
Branch fixup, v1

Same thing, but if you insist on it.
The patch already had review and checkin-approval for branch. I simply checked in the wrong patch.
Comment 109 Daniel Veditz [:dveditz] 2007-08-21 15:50:36 PDT
Comment on attachment 275313 [details] [diff] [review]
Branch fixup, v1

approved for 1.8.1.7, a=dveditz for release-drivers
Comment 110 Reed Loden [:reed] (use needinfo?) 2007-09-05 23:08:16 PDT
Ben, can you check-in attachment 275313 [details] [diff] [review] on MOZILLA_1_8_BRANCH, please?
Comment 111 Ben Bucksch (:BenB) 2007-09-12 04:00:36 PDT
Checked in on branch.
Comment 112 Daniel Veditz [:dveditz] 2007-09-28 12:57:12 PDT
hskupin: Ben checked this into the 1.8 (.1) branch, not the 1.8.0 branch. Also, please do not change the target milestone unless it's your bug, that's for developer's use. But I'm not sure what you were getting at anyway since this was apparently already fixed on the trunk before you set it.
Comment 113 Henrik Skupin (:whimboo) [away 09/30 - 10/06] 2007-09-28 13:54:09 PDT
Daniel, sorry for the fixed1.8.0.7 keyword. Yes, I ment fixed1.8.1.7. Something got wrong. I set the target milestone after the bug was fixed and which should give a hint for which version this issue is resolved. I don't think that this is inconvenient because Ben forgot that, isn't it? At least it will give us better results when doing QA.
Comment 114 Daniel Veditz [:dveditz] 2007-09-29 08:49:19 PDT
Is the target milestone really used that way? Ok, guess that's all right if the bug's actually fixed.

Note You need to log in before you can comment on or make changes to this bug.