[mozTXTToHTMLConv] Freetext url recognition stops at ")" and "'" (apostrophe)

VERIFIED FIXED in mozilla1.9alpha6

Status

MailNews Core
Backend
P2
normal
VERIFIED FIXED
15 years ago
9 years ago

People

(Reporter: Jacob Eisinger, Assigned: BenB)

Tracking

({verified1.8.1.8})

Trunk
mozilla1.9alpha6
verified1.8.1.8
Bug Flags:
blocking1.8.1.8 +

Firefox Tracking Flags

(Not tracked)

Details

(URL)

Attachments

(8 attachments, 9 obsolete attachments)

1.02 KB, message/rfc822
Details
189.63 KB, application/octet-stream
Details
6.97 KB, image/png
Details
3.06 KB, patch
Scott MacGregor
: review+
Biesinger
: superreview+
Scott MacGregor
: approval-thunderbird2+
Details | Diff | Splinter Review
2.83 KB, text/plain
Details
295.13 KB, image/jpeg
Details
9.86 KB, image/png
Details
1.66 KB, patch
Details | Diff | Splinter Review
(Reporter)

Description

15 years ago
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.9+)
Gecko/20020318
BuildID:    2002031803

When reading a e-mail message, Mozilla turns all web addresses to clickable
links.  However, some links are not correctly highlighted; they are either
segmented leaving the hyperlinks useless or the hyper link is not extended enough.

Two examples:
http://www.finaid.vt.edu/flash/Mar02flash.pdf
http://filebox.vt.edu/users/jeisinge/frisbee/09%20(at%2013h07m02s).JPG

Reproducible: Always
Steps to Reproduce:
1. recieve an e-mail with an above mentioned link in the body

Actual Results:  With http://www.finaid.vt.edu/flash/Mar02flash.pdf, the link is
segmented so that http://www.finaid becomes a hyper link, .vt. just becomes
text, and edu/flash/Mar02flash.pdf becomes a hyper link.

With http://filebox.vt.edu/users/jeisinge/frisbee/09%20(at%2013h07m02s).JPG,
http://filebox.vt.edu/users/jeisinge/frisbee/09%20(at%2013h07m02s becomes a
hyper link and ).JPG becomes just text.

Expected Results:  the whole address should become a hyperlink
The following information may or may not be related. It may also be a problem at
the senders end. I received the following mail message from the bugzilla system
with a URL that was not fully recognised as a URL. Only the first line of the
field "URL:" was recognised as a clickable URL - the rest was just text.
http://bugzilla.mozilla.org/show_bug.cgi?id=140851

           Summary: LDAP query in browser loads entries into address book
           Product: Browser
           Version: other
          Platform: PC
               URL: http://bugzilla.mozilla.org/buglist.cgi?bug_status=NEW&b
                    ug_status=ASSIGNED&bug_status=REOPENED&email1=&emailtype
                    1=substring&emailassigned_to1=1&email2=&emailtype2=subst
                    ring&emailreporter2=1&bugidtype=include&bug_id=&changedi
                    n=&votes=&chfieldfrom=&chfieldto=Now&chfieldvalue=&short
                    _desc=ldap&short_desc_type=allwordssubstr&long_desc=&lon
                    g_desc_type=allwordssubstr&bug_file_loc=&bug_file_loc_ty
                    pe=allwordssubstr&status_whiteboard=&status_whiteboard_t
                    ype=allwordssubstr&keywords=&keywords_type=anywords&fiel
                    d0-0-0=noop&type0-0-0=noop&value0-0-
                    0=&cmdtype=doit&order=Reuse+same+sort+as+last+time
        OS/Version: Windows 2000
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: --
         Component: URL Bar
        AssignedTo: hewitt@netscape.com
        ReportedBy: matthew.jurgens@solnet.com.au
         QAContact: claudius@netscape.com


>From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.0.0+)
Gecko/20020428
BuildID:    2002042808


Comment 2

15 years ago
The following is truncated at ! also (Version 1.1 release)

http://linux.bkbits.net:8080/linux-2.5/user=akpm/ChangeSet@-4w?nav=!-|index.html|stats|!+|index.html
Status: UNCONFIRMED → NEW
Ever confirmed: true

Comment 3

15 years ago
Please attach the complete message in question to this bug for further inspection.

Possibly a dupe of bug 90161.

pi
(Reporter)

Comment 4

15 years ago
Created attachment 98330 [details]
message with bad link

The first link seems to work now!  However, when parenthesis are used, the link
terminates before the ending parenthesis ( ')' ).

Comment 5

15 years ago
*** Bug 145703 has been marked as a duplicate of this bug. ***

Comment 6

15 years ago
*** Bug 145473 has been marked as a duplicate of this bug. ***

Comment 7

15 years ago
Besides that, arbitrary text can be hilited when it should not be =( E.g. here

PDF::parse: cannot find pdf parser /usr/local/bin/acroread

PDF::parse: is shown as a link (since Mozilla 1.3a, 1.2 didn't have this
effect). I think protocol should be verified with getservbyname() or similiar call.

Comment 8

14 years ago
*** Bug 200216 has been marked as a duplicate of this bug. ***

Comment 9

14 years ago
To stop this bug from morphing I am changing the summary to reflect URL 2 in
comment O, which is what the dupes are about too. This bug affects normal ")",
square "]", and sqiugly "}" brackets.

Comment 7 is an other bug, which I belive is fixed.
Summary: mail message does not correctly identify URI / URL / web address / hyper link → mail message does not correctly identify URI / URL / web address / hyper link containing right bracket ")", "]", or "}"

Comment 10

14 years ago
Adding URL from dupped bug that demonstrates problem.

Also, placing a "<" and a ">" around the URL fixes the problem, as long as the
URL is not wrapped to multiple lines.

Comment 11

14 years ago
There was some work done with bug 183111, which may affect this bug as Torben
said. RFC 2396 at one point has this to say about delimiters:


2.4.3. Excluded US-ASCII Characters

   Although they are disallowed within the URI syntax, we include here a
   description of those US-ASCII characters that have been excluded and
   the reasons for their exclusion.

   The control characters in the US-ASCII coded character set are not
   used within a URI, both because they are non-printable and because
   they are likely to be misinterpreted by some control mechanisms.

   control     = <US-ASCII coded characters 00-1F and 7F hexadecimal>

   The space character is excluded because significant spaces may
   disappear and insignificant spaces may be introduced when URI are
   transcribed or typeset or subjected to the treatment of word-
   processing programs.  Whitespace is also used to delimit URI in many
   contexts.

   space       = <US-ASCII coded character 20 hexadecimal>

   The angle-bracket "<" and ">" and double-quote (") characters are
   excluded because they are often used as the delimiters around URI in
   text documents and protocol fields.  The character "#" is excluded
   because it is used to delimit a URI from a fragment identifier in URI
   references (Section 4). The percent character "%" is excluded because
   it is used for the encoding of escaped characters.

   delims      = "<" | ">" | "#" | "%" | <">

   Other characters are excluded because gateways and other transport
   agents are known to sometimes modify such characters, or they are
   used as delimiters.

   unwise      = "{" | "}" | "|" | "\" | "^" | "[" | "]" | "`"

   Data corresponding to excluded characters must be escaped in order to
   be properly represented within a URI.


The code that marks a url in a text does also recognize () and {} as delimiters,
maybe it can be made a little bit smarter.


Also there is the general problem with linebreaks, RFC 2396 is very clear about
that:

E. Recommendations for Delimiting URI in Context

   URI are often transmitted through formats that do not provide a clear
   context for their interpretation.  For example, there are many
   occasions when URI are included in plain text; examples include text
   sent in electronic mail, USENET news messages, and, most importantly,
   printed on paper.  In such cases, it is important to be able to
   delimit the URI from the rest of the text, and in particular from
   punctuation marks that might be mistaken for part of the URI.

   In practice, URI are delimited in a variety of ways, but usually
   within double-quotes "http://test.com/", angle brackets
   <http://test.com/>, or just using whitespace

                             http://test.com/

   These wrappers do not form part of the URI.

   In the case where a fragment identifier is associated with a URI
   reference, the fragment would be placed within the brackets as well
   (separated from the URI with a "#" character).

   In some cases, extra whitespace (spaces, linebreaks, tabs, etc.) may
   need to be added to break long URI across lines. The whitespace
   should be ignored when extracting the URI.

   No whitespace should be introduced after a hyphen ("-") character.
   Because some typesetters and printers may (erroneously) introduce a
   hyphen at the end of line when breaking a line, the interpreter of a
   URI containing a line break immediately after a hyphen should ignore
   all unescaped whitespace around the line break, and should be aware
   that the hyphen may or may not actually be part of the URI.

   Using <> angle brackets around each URI is especially recommended as
   a delimiting style for URI that contain whitespace.

   The prefix "URL:" (with or without a trailing space) was recommended
   as a way to used to help distinguish a URL from other bracketed
   designators, although this is not common in practice.

   For robustness, software that accepts user-typed URI should attempt
   to recognize and strip both delimiters and embedded whitespace.

   For example, the text:

      Yes, Jim, I found it under "http://www.w3.org/Addressing/",
      but you can probably pick it up from <ftp://ds.internic.
      net/rfc/>.  Note the warning in <http://www.ics.uci.edu/pub/
      ietf/uri/historical.html#WARNING>.

   contains the URI references

      http://www.w3.org/Addressing/
      ftp://ds.internic.net/rfc/
      http://www.ics.uci.edu/pub/ietf/uri/historical.html#WARNING


Mozilla does not conform to the linebreak handling described here.

Comment 12

14 years ago
> Mozilla does not conform to the linebreak handling described here (comment 11)

For the record, that is bug 5351.

Comment 13

14 years ago
OK, so RFC3296 has certain rules governing what can be in a URI. However, may I
suggest that if a user receives an eMail that contains some "unwise" characters
in the URI, that Mozilla should be able to link to the complete URI.

Of course, sending an eMail with these "unwise" characters is a totally
different matter.

Comment 14

14 years ago
As I said, our detection code probably needs to be a little bit smarter, for
example: If the url does not start with a bracket then counting brackets and
stopping when the count reaches zero makes no sense as it seems to happen with
the second example, but that's just a guess.
(Assignee)

Comment 15

14 years ago
Thanks for ccing me Andreas. Most of the comments on this bug are actually wrong
or offtopic, so I am focussing on the problem mentioned in comment 9. Shortening
summary.

For the spec, see <http://www.bucksch.com/1/projects/mozilla/16507/>.

Andreas, what are you suggesting? My opinion is basically: if you're sending
freaky URLs like those with (), then wrap them in <>. If the sender didn't, fix
the sender (if possible) or copy the URL manually (happens rarely enough IMHO).
But, what would you do? IIRC, I didn't count spaces, I just stop at any bracket
when in freetext mode.
Summary: mail message does not correctly identify URI / URL / web address / hyper link containing right bracket ")", "]", or "}" → Freetext url recognition stops at right bracket ")", "]", or "}"
(Assignee)

Comment 16

14 years ago
narrowing summary even further, because } and ] are illegal in URLs and should
IMHO not be linked. It's a recognizer, after all, it's not expected to get it
right in all cases, esp. not the abuse ones.
Summary: Freetext url recognition stops at right bracket ")", "]", or "}" → Freetext url recognition stops at ")"
(Assignee)

Comment 17

14 years ago
Code is in mozTXTToHTMLConv::FindURLEnd
<http://lxr.mozilla.org/seamonkey/source/netwerk/streamconv/converters/mozTXTToHTMLConv.cpp#299>
(sorry for the spam).

Comment 18

14 years ago
Interesting... I have a couple of observations/info.

I am the one who posted the newsgroup message currently listed in the URL of
this bug. I obtained this URL as follows. Using Mozilla, goto microsoft.com
navigate to the search and select search knowledgebase by id. I typed in the id#
815411, the corresponding page is loaded and the URL in Mozilla location bar is
the one in the message. I then simply selected and copied it from my location
bar, and pasted it to my news post, to get:

http://support.microsoft.com/default.aspx?scid=kb;[LN];815411

- I find it interesting that when posting this to bugzilla it handles it 
"correctly" (i.e. it highlights the entire url). I pasted the above URI string
the same way.
- As one would expect, since Microsoft is the originator of above URL Outlook
also highlights the whole URL in the posting.

Comment 19

14 years ago
Perhapse there needs to be a "900lbs-gorilla" tag for bugs which highlight bugs
which are technically compliant with public specs but M$ ignores thus
promulgating bad behavior.

I leave it up to others to make the political/pragmatic decisions if the
prevalence needs to be accommodated or not.
Take a look at RFC 2396 (2.3):

mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"

There is described that ´)' and ´'' are also allowed within an URI. But both
aren't working within MailNews.

Like the comment [1] tells me, it was forgotten to update and RFC2396E sounds
like a draft for me. 

Within the Browser you see this url [2] completly because Bugzilla creates a
hyperlink and so its shown correctly. Instead MailNews renders the URI for itself.

[1]
http://lxr.mozilla.org/seamonkey/source/netwerk/streamconv/converters/mozTXTToHTMLConv.cpp#308
[2] http://support.microsoft.com/default.aspx?scid=kb;[LN];815411

Comment 21

13 years ago
Bug 119963 is about the apostrophe case.
See also bug 218287.
Summary: Freetext url recognition stops at ")" → [mozTXTToHTMLConv] Freetext url recognition stops at ")"
(Assignee)

Comment 22

13 years ago
*** Bug 119963 has been marked as a duplicate of this bug. ***
(Assignee)

Comment 23

13 years ago
I marked bug 119963, which is about apostrophes, a dup of this, because they are
so similar.

I am tending towards WONTFIX. This is a recognizer, designed to get most valid
cases right.

Cases outside the spec (e.g. "[" as in comment 18) generally are not supported.
Comment 18 asks to allow "[" in URLs, while bug 218287 asks *not* to support it
and gives a good example. This shows that the recognizer is just guesswork and a
judgement call.

"(" and apostrophe are allowed in URL per spec. But we're not in a situation
where we know for sure that it's supposed a URL, we need to *find* it in a bunch
of human language text (like http://example.com). This is why I excluded
brackets from the URL and probably apostrophe as well. For consistency and for
bug 218287, I should probably actually disallow both opening and closing
brackets everywhere in the URL.
Summary: [mozTXTToHTMLConv] Freetext url recognition stops at ")" → [mozTXTToHTMLConv] Freetext url recognition stops at ")" and "'" (apostrophe)

Comment 24

13 years ago
A quick point of clarification (since I added comment #18):

I am not advocating support for parsing characters in URI's such as the "[" if
it is outside the spec. 

I was simply adding contextual information on where I came across it in the
field and how from an end user perspective there "seemed to be" differing
behaviours between Mozilla, IE, an Bugzilla. 

If end users dont come across these non-standard cases very often in practice,
its probably not a big deal, and Mozilla should do the right thing.

Comment 25

13 years ago
I know this is polluting the menus, but perhaps on selecting the full URL by 
hand you could add 'open selection as link' on the context menu?  Would be 
less painful than the 'cut and paste and open a browser window and and 
and ...' solution that's required otherwise.

Comment 26

13 years ago
we should not do that. but i can today select text and drag it to a tab or
window if i want it treated as a url. if i'm using x11 i can probably middle
click on my selection to open it somewhere.

Comment 27

13 years ago
*** Bug 230972 has been marked as a duplicate of this bug. ***

Comment 28

13 years ago
*** Bug 248319 has been marked as a duplicate of this bug. ***

Comment 29

13 years ago
*** Bug 260714 has been marked as a duplicate of this bug. ***

Comment 30

13 years ago
I think that & (ampersant) and | (pipe) are included in standard but it still 
doesn't work with them. 
Product: Browser → Seamonkey

Comment 31

13 years ago
*** Bug 271374 has been marked as a duplicate of this bug. ***

Comment 32

13 years ago
Moving a bunch of mozTXTToHTMLConv bugs into Core / MailNews Backend to live 
with their bretheren.

Updated

13 years ago
Component: MailNews: Main Mail Window → MailNews: Backend
Product: Mozilla Application Suite → Core

Comment 33

13 years ago
*** Bug 275183 has been marked as a duplicate of this bug. ***
*** Bug 236915 has been marked as a duplicate of this bug. ***

Comment 35

12 years ago
*** Bug 303590 has been marked as a duplicate of this bug. ***

Comment 36

12 years ago
Domino produces links like:
https://lsd.memach.com/ConditionReports.nsf/(lookup)/CC8CB6274C8EEBE385257062005E54A6?OpenDocument
and the user has to copy the link and paste it into the browser in order to view
the link. It would really be nice if the behavior of thunderbird would use the
entire link vice stopping at the letter before the ")".  I cant change the way
domino produces the links. Thanks for your efforts

Comment 37

12 years ago
(In reply to comment #32)
> Moving a bunch of mozTXTToHTMLConv bugs into Core / MailNews Backend to live 
> with their bretheren.

What does this mean? That the bug is going to die for lack of attention or it is
going to be looked at and action taken. I was planning on deploying a bunch of
thunderbird apps but since we use domino for one of our main application and
domino insists on using ( and ) in the links I am stuck unless this is resolved.
I will have to find an alternative email program... How can I find out what is
being done. Do I need to look for an alternative? Thanks for all your efforts...

Lyman

Comment 38

12 years ago
(In reply to comment #37)
> (In reply to comment #32)
> > Moving a bunch of mozTXTToHTMLConv bugs into Core / MailNews Backend to live 
> > with their bretheren.
> 
> What does this mean? That the bug is going to die for lack of attention or it
> is going to be looked at and action taken. I was planning on deploying a bunch
> of thunderbird apps but since we use domino for one of our main application
> and domino insists on using ( and ) in the links I am stuck unless this is
> resolved.

Comment 23, written by the person who implemented URL recognition in the first 
place, gives the arguments *against* allowing parentheses in a recognized URL.
The main issue here is that someone might write a message such as
---
  ... and you can visit my homepage (http://homepage.somewhere.tld) to see.
---
In this case, the closing parenthesis should *not* be part of the URL.  Making 
the code smart enough to recognize parentheses in context is harder than it 
sounds -- particularly since there could be a legal URL which contains *only* a 
closing or opening parenthesis, e.g.
  http://paranoia.tld/kso03)cldke.html


You say you can't change how Domino works, but unless you can write some code 
for Mozilla (or convince/pay someone to do so for you), you can't change how TB 
works either.  Why not file a bug with them and see if they can change the URL 
generation to either enclose the URL within angle brackets, or escape the 
parentheses?  That change on their part would be much easier to implement than 
the change required on Mozilla's part.

Comment 39

12 years ago
Why isn't it simple enough to include a paren if there's no whitespace after it?

Comment 40

12 years ago
Both (http:example.com) and http://paranoia.tld/kso03)cldke.html are correctly
recognized here. I guess that Bugzilla's recognition is better than Thunderbird's :p

Comment 41

12 years ago
*** Bug 301363 has been marked as a duplicate of this bug. ***
(Assignee)

Comment 42

12 years ago
Created attachment 194943 [details] [diff] [review]
Possible Patch, v1 (not sure, if good idea or WONTFIX)

In response to Lyman Byrd, I created a simple patch which does what I think is
the only thing I can do here: URL recognition won't stop at "(", ")" or "'" in
the middle of the URL, but will exclude ")" and "'", if they appear at the end.


The patch is completely untested (it does compile, though).

I am still not sure, if this bug is a good idea in the first place or should be
WONTFIXed. Please read my comments above before applying the patch.

And please do contact IBM/Lotus to fix their broken software - it's highly
unwise to use things like brackets and apostrophs in URLs, exactly because of
cases like these.
Assignee: sspitzer → ben.bucksch
Status: NEW → ASSIGNED
(Assignee)

Comment 43

12 years ago
Created attachment 194945 [details] [diff] [review]
Possible Patch, v2 (not sure, if good idea or WONTFIX)

Ops, last patch had an obvious error.

(Also: Remove "broken" from last paragraph of last comment. Rest stays.)
Attachment #194943 - Attachment is obsolete: true

Comment 44

12 years ago
*** Bug 313252 has been marked as a duplicate of this bug. ***

Comment 45

12 years ago
I assume this is the catch-all bug for automatic URL detection problems.  This URL is also segmented at a strange place:

http://www.modpython.org/FAQ/faqw.py?query=expat&querytype=simple&casefold=yes&req=search

Specifically, right before the "?".  Now, obviously "?" is a pretty common character to be in a URL, so this really shouldn't happen.  It is hard to imagine a regular sentence which is a question which ends with a URL... e.g.:

Have you looked at: http://www.example.com/?

In fact, if you wanted to catch that sort of thing, the logic should be fairly straight forward -- just see if there is any whitespace after the ?.  I think the same goes for the parenthesis examples.

I'm using Thunderbird 1.5beta1.
(Assignee)

Comment 46

12 years ago
> I assume this is the catch-all bug for automatic URL detection problems

It is most definitely not and has a specific summary.

> Specifically, right before the "?"

WFM, verbatim in the bugmail from your comment.

Comment 47

12 years ago
*** Bug 316428 has been marked as a duplicate of this bug. ***

Comment 48

12 years ago
*** Bug 326189 has been marked as a duplicate of this bug. ***

Comment 49

11 years ago
If ) brackets are not to be recognized as part of the URL, then why does this link exist?
http://kb.mozillazine.org/Reducing_memory_usage_(Firefox)

It cannot be copy/pasted or typed reliably with either Thunderbird, or reproduced with Firefox, because the final ) is rendered as text.

If ) closing parenthesis are not to be used in a link, then why has mozillazine.org used them?

The problem in Thunderbird 1.5 is even worse
none of the following show as links, all failing at the ) character in TB 1.5, yet they render in Firefox!

http://test(no.org/test(situation(
http://test(no.org/test(situation)
http://test(no.org/test)situation)
http://test)no.org/test(situation)

As read here in Firefox, items 2 thru 4 render incompletely on the LAST )
but in Thunderbird 1.5, the link breaks at the FIRST ) in each url.

Thunderbird 1.5 and Firefox render the links differently

Comment 50

11 years ago
It is clear that there are cases when a round bracket might be a legitimate part of a URL.  The fact of the matter is that USUALLY it won't be, so the auto-sensing URL checker should go for the most regular use case.  Basically, it should get it right *most* of the time.  This, in my experience, will mean ignoring closing brackets (of any sort), or punctuation *other than* a slash as the last character in the string.

Comment 51

11 years ago
(In reply to comment #50)

That is understood as the rational behind not rendering the LAST ) as part of a link. But even mozillazine uses such as links.

And, in Thunderbird 1.5 even a ) within the body of a link breaks it.
while in Firefox ONLY a trailing ) breaks it.

According to RFC http://www.faqs.org/rfcs/rfc1738.html
From section 2.2
   Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
   reserved characters used for their reserved purposes may be used
   unencoded within a URL. 

typing http://test)this.is.a.test
here in Firefox, and the link is NOT broken
try sending that in Thunderbird 1.5 and it comes out as
http://test )this.is.a.test (space added after test for display only)

The program is ignoring a valid character in a URL, simply because some users MIGHT use the character as a delimiter. Would it not be better to include the ) in links, and correct the improper use of such as a delimiter when it arises?
And once again, the problem is more profound in Thunderbird 1.5 which renders ANY ) as an invalid character (in a URL)

Comment 52

11 years ago
Just a doubt: Am I wrong or  with the rfc 3986 the chars ( ) and ' are  considered as Reserved Characters (par. 2.2) and so must be escaped?

(Assignee)

Comment 53

11 years ago
You are right that they are now "reserved" (they weren't in RFC2396). This means that they *must* be escaped when not used as delimiters, but *must not* be escaped when used in their origial purpose as delimiter. I'm not sure if the Lotus Notes use falls into the category of a proper use as delimiter.
(Assignee)

Comment 54

11 years ago
Re last paragraph in comment 51: No, this is a *recognizer* mainly targetted at text that *humans* write. This is something totally different from an HTTP header parser. "Go to Google (http://www.google.com)" is far more often in emails than the rare URLs which make use of () inside the URL. See my other comments above.

Comment 55

11 years ago
*** Bug 348012 has been marked as a duplicate of this bug. ***

Updated

11 years ago
OS: Windows NT → All
Hardware: PC → All

Comment 56

11 years ago
*** Bug 349531 has been marked as a duplicate of this bug. ***

Comment 57

11 years ago
The workaround to send a good clickable link is to replace ) with %29
Compare:
http://en.wikipedia.org/wiki/O_(Cyrillic)
http://en.wikipedia.org/wiki/O_(Cyrillic%29
http://en.wikipedia.org/wiki/O_%28Cyrillic%29

Does anyone know the % codes for ' and " ?

Comment 58

11 years ago
Another workaround is to use <> brackets
observe
http://en.wikipedia.org/wiki/O_(Cyrillic)
<http://en.wikipedia.org/wiki/O_(Cyrillic)>

Firefox doesnt have this problem, what code do they use? Workarounds are good but they detract from the fact that in this case Thunderbirds actions are non-standard. a ) is a legitimate character in urls, and one shouldnt have to use a 'workaround' to get it to work in Thunderbird

Comment 59

11 years ago
*** Bug 353013 has been marked as a duplicate of this bug. ***

Comment 60

11 years ago
xref bug 281321

Comment 61

11 years ago
*** Bug 356291 has been marked as a duplicate of this bug. ***
(Assignee)

Comment 62

11 years ago
It's not clear what to do there. traditionally (even netscape 4), we assumed
that if you write "my company (http://www.foobar.com) does ...", you would
not want the () included in the URL. wikipedia unfortunately makes very
liberal use of that, which I think is very unwise. But it poses the question:
which case do we favor?

The only compromise I see is to look whether there's a ( after the URL start,
and if so, include the ) as well. I like that.

That doesn't solve the apostrophe proble, though, but I guess that's less
severe, we could just include it? It (only) breaks people using ' as quote.
(Assignee)

Comment 63

11 years ago
Comment on attachment 194945 [details] [diff] [review]
Possible Patch, v2 (not sure, if good idea or WONTFIX)

Example:
http://en.wikipedia.org/wiki/Bulb_(disambiguation) (very common in wikipedia)
http://en.wikipedia.org/wiki/Lemarchand's_box
Attachment #194945 - Attachment is obsolete: true
(Assignee)

Comment 64

11 years ago
OK, I want to get this fixed.
Severity: minor → normal
Priority: -- → P2

Comment 65

11 years ago
Its not only common in Wikipedia, Mozillazine uses it quite often as well i.e.
http://kb.mozillazine.org/Reducing_memory_usage_(Firefox)

The essence imho is; Thunderbird currently DOESNT follow web standards in that () are approved characters in urls.
Instead, it is 'fudging' the standards to meet developers expectations. Funny, when Microsoft does this, its propeity code and wrong, but here is Thunderbird doing the same exact thing.

Currently you have to tell people they CANNOT quote a legimate website
http://kb.mozillazine.org/Reducing_memory_usage_(Firefox)
because Thunderbird wont render it properly - even tho there is NOTHING wrong with the web site at all!

I would MUCH rather have Thunderbird properly render the () as authorized url characters which they in fact are, and then tell someone why HIS rendition of a url  (http://mozilla.com/html) doesnt work (the extra )). Simply tell the person NOT to use () to quote.

Comment 66

11 years ago
(In reply to comment #65)

When this bug (and so many related to it) was logged I would have agreed with you wholeheartedly, since RFC 2396 [Aug 1998] was in effect and "(" and ")" characters were permissible as data characters in URI's.

Four years later, though, Thunderbird's behaviour is now "accidentally compliant" with 2396's successor, RFC 3986 [Jan 2005], which moves "(" and ")" into sub-delims, meaning that they should be pct-encoded as %28 and %29 when appearing as data characters in URI's. Ref: http://www.ietf.org/rfc/rfc3986.txt

It seems that we should now be chasing the web masters of popular sites like Google Maps, Wikipedia and Mozillazine, to get them to fix their URI encoding. Or we could wait another four years and see what standards are in effect then. :)
(Assignee)

Comment 67

11 years ago
Guys, this is not about standards. They are the ground rule, but this is a fuzzy recognizer, see my comments above. Unless somebody finds a problem with my idea in comment 62 or has better ideas:

Please stop from making comments.
(Assignee)

Comment 68

11 years ago
Here's another idea (in addition to comment 62): bug 356657. Only helps, if sender used Thunderbird/Seamonkey.

Comment 69

11 years ago
I frequently receive correspondance from companies which include ( and ) as part of their URL.  I had a survey which I filled out this past week for a company wanting to know how I felt about their customer service, and I was directed to the below URL, and whether this is wise from a standards perspective seems to be moot when compared to the fact that the links do not get properly interpreted as would be reasonably expected by a human.  I should add, though, that there was a bunch of other junk added to allow the browser to interpret it as a possible HTML link but since I display in plain text, I was unable to click without a bunch of cut and paste.  (I just tested this with the message in question, and it was never intended to be an HTML e-mail, so perhaps the sender just tried to get fancy or something and Thunderbird didn't know what to properly do with it?  Should Thunderbird even try to interpret such thing?)  The link I received was as follows:  

http://surveys.benchmarkportal.com/ebay/survey.taf?survey_id=1283&amp;use
r_id=19B0E8D6-9BFE-4FB7-9996-90BF83F5AD10

Within its context, though, it actually looked like this mess:

&lt;br&gt;&lt;br&gt;&lt;a 
href="http://surveys.benchmarkportal.com/ebay/survey.taf?survey_id=1283&amp;use
r_id=19B0E8D6-9BFE-4FB7-9996-90BF83F5AD10"&gt;http://surveys.benchmarkportal.c
om/ebay/survey.taf?survey_id=1283&amp;user_id=19B0E8D6-9BFE-4FB7-9996-90BF83F5A
D10&lt;/a&gt;&lt;br&gt;&lt;br&gt;

The URL was spread across multiple lines and even with the quotes around the URL, it was misinterpreted by thunderbird.  I think it is PROBABLY safe to suggest that anything beginning with a quotation mark followed by the string http:// is a URL which should be interpreted as a URL until the next occurrence of a quotation mark, and one can probably make the same assumption with other network protocols like ftp:// or https:// for example.  (Whether this is easy to code for or not, I don't know.)  What I was able to click was only everything up until a line break occurred in Thunderbird.

Updated

11 years ago
Duplicate of this bug: 367050

Comment 71

11 years ago
Created attachment 251951 [details] [diff] [review]
proposed fix

This fixes it for all cases I have found.
Attachment #251951 - Flags: review?(ben.bucksch)

Comment 72

11 years ago
Nit: separate keywords from parentheses with a space:
   while (...) ...;   if (....) ...;
        ^               ^

You could perform the same matching-parenthesis test at the end for matching apostrophe.  I don't think this is common, so maybe it's not worth testing for, but it's possible someone would put in the URL as:
   'http://mozilla.org'
(Assignee)

Comment 73

11 years ago
Comment on attachment 251951 [details] [diff] [review]
proposed fix

Thanks for the patch, but the code in the |if (state[check] == endok)| doesn't fit the design of the code. Any reason not to put it in the code you commented as "back up a bit"?

No reason to change the loop from for to while. And it *is* iterating.

Comments are not entirely correct, but comments would make sense, yes.

Unless you have a reason why the above proposal won't work, I'll attach a patch for that.
Attachment #251951 - Flags: review?(ben.bucksch) → review-
(Assignee)

Comment 74

11 years ago
> Any reason not to put it in the code you commented as "back up a bit"?

Nevermind, I realized it. However, your code - apart from not being in FindURL where is belongs - would not recognize (for example http://www.yahoo.com) correctly, it only looks at the char before the URL start. See comment 62.
(Assignee)

Comment 75

11 years ago
Created attachment 251988 [details] [diff] [review]
Fix, v4, broken

Here's what I mean:
- Changes for loop to be more understandable
- Allows ' in middle of URL, but not at end
- Allowes ) in middle of URL, but not at end, unless there was a ( earlier

However, it doesn't work as intended. I don't see why. Magnus, when looking at the code, can you see what's wrong?
Attachment #251951 - Attachment is obsolete: true
Attachment #251988 - Flags: review?(mkmelin+mozilla)
(Assignee)

Comment 76

11 years ago
> However, it doesn't work as intended. I don't see why.

2 reasons:
- should be "&& !haveOpeningBracket" (forgot the ! )
- I stupidly built static, so my changes were not picked up. rebuilding.
(Assignee)

Updated

11 years ago
Attachment #251988 - Flags: review?(mkmelin+mozilla)
(Assignee)

Comment 77

11 years ago
Created attachment 251997 [details] [diff] [review]
Fix, v5

Yup, that was it. Fix works as intended. I couldn't find anything it breaks either.
Magnus, please review.
Attachment #251988 - Attachment is obsolete: true
Attachment #251997 - Flags: review?(mkmelin+mozilla)
(Assignee)

Updated

11 years ago
Attachment #251997 - Attachment is patch: true
Attachment #251997 - Attachment mime type: text/x-patch → text/plain

Comment 78

11 years ago
I'll be away for a couple of days, so I wont be able to try it out until monday. Looks like it could work though. Assuming it works, it sure would be nice to get this in for tb2.0...

Do we want to fix the 'http://mozilla.org' case too? Don't know how usual that is, but it would be pretty straight forward fix.
(Assignee)

Comment 79

11 years ago
Created attachment 252040 [details] [diff] [review]
Fix, v6. Alternative to v5: ) only after (

With the patch v5, http://test)foo is recognized. This patch is an alternative, allowing ) only after a (. I.e. http://test_(Comp) and http://test(lookup)bla are recognized, but http://text)foo is not.

> Do we want to fix the 'http://mozilla.org'

Works before and after the patch (both versions).
Attachment #252040 - Flags: review?(mkmelin+mozilla)
(Assignee)

Updated

11 years ago
Attachment #252040 - Attachment is patch: true
Attachment #252040 - Attachment mime type: text/x-patch → text/plain
(Assignee)

Updated

11 years ago
Attachment #251997 - Flags: review?(mkmelin+mozilla)
(Assignee)

Comment 80

11 years ago
Created attachment 252193 [details]
My old test folders (ZIP with mbox)

Here are my old test mails. Quite a lot. For thorough testing, they should be checked with all relevant pref combinations.
(Assignee)

Comment 81

11 years ago
Created attachment 252195 [details]
Test mail

Ops, the last attachment was actually intended for another bug, but may be useful here as well.
Here's a mail with testcases specifically for the brackets and similar. Clicking on it should open it in the browser and render it with the browser's libmime, if available. You can also copy the file to LocalFolders and open it in Thunderbird.
(Assignee)

Comment 82

11 years ago
Created attachment 252196 [details]
Test mail screenshot (with patch v6)
(Assignee)

Comment 83

11 years ago
Comment on attachment 251997 [details] [diff] [review]
Fix, v5

Yes, I like v6 better
Attachment #251997 - Attachment is obsolete: true

Comment 84

11 years ago
Comment on attachment 252040 [details] [diff] [review]
Fix, v6. Alternative to v5: ) only after (

Seems good except for one case I happend to stumble upon. In a bug it said: per foo@bar.com's suggestion. For emails apostrophes should mark the end, no? And hypothetical cases like foo@(bar).com probably shouldn't be links either.

I'm not a reviewer, but with that fixed, seems ok to me.

Comment 85

10 years ago
Created attachment 253854 [details] [diff] [review]
proposed fix (v6 + thighter check for mail addresses)

This is fix v6 + don't allow ' and ( in email address recognition.
Attachment #252040 - Attachment is obsolete: true
Attachment #253854 - Flags: review?(ben.bucksch)
Attachment #252040 - Flags: review?(mkmelin+mozilla)

Comment 86

10 years ago
Ben: have you had time to look at this?
(Assignee)

Comment 87

10 years ago
Created attachment 254729 [details] [diff] [review]
Fix, v8

Yes. The 2-line check you added was in the wrong place (there's a similar check later, should be added there). I just fixed that.

I'll also disallow [ and {. Currently, ] and } are disallowed (reasons see above), so including the opening ones doesn't make any sense whatsoever.
Attachment #253854 - Attachment is obsolete: true
Attachment #254729 - Flags: review?(mscott)
Attachment #253854 - Flags: review?(ben.bucksch)
(Assignee)

Comment 88

10 years ago
Created attachment 254730 [details]
Test mail
Attachment #252195 - Attachment is obsolete: true

Updated

10 years ago
Attachment #254729 - Flags: review?(mscott) → review+
(Assignee)

Updated

10 years ago
Attachment #254729 - Flags: superreview?(mscott)
(Assignee)

Updated

10 years ago
Attachment #254729 - Flags: superreview?(mscott) → superreview?(bienvenu)
(Assignee)

Comment 89

10 years ago
Comment on attachment 254729 [details] [diff] [review]
Fix, v8

ops, this is netwerk, so I guess I need a netwerker sr. trying biesi.
Attachment #254729 - Flags: superreview?(bienvenu) → superreview?(cbiesinger)
Comment on attachment 254729 [details] [diff] [review]
Fix, v8

sr=biesi
Attachment #254729 - Flags: superreview?(cbiesinger) → superreview+
(Assignee)

Comment 91

10 years ago
Fix v8 checked into trunk.

Will bake on trunk for a few days, then ask for permission for branch.
Status: ASSIGNED → RESOLVED
Last Resolved: 10 years ago
Resolution: --- → FIXED
(Assignee)

Comment 92

10 years ago
Can somebody QA this on trunk, please, so that we have a chance to get it into TB2.0? For expected results, see screenshot in comment 82.
Attachment #254730 - Attachment mime type: message/rfc822 → text/plain
Created attachment 256518 [details]
Screenshot with patch v8 on  trunk

Tested with Thunderbird Version 3 alpha 1 (20070226) on Windows. Ben, have a look at the screenshot if the output is that what we want to have.
(Assignee)

Comment 94

10 years ago
Comment on attachment 254729 [details] [diff] [review]
Fix, v8

Thanks Henrik. Yes, that output is as expected.

Requesting TB2 branch approval.
Attachment #254729 - Flags: approval-thunderbird2?
Attachment #256518 - Attachment description: Screenshot with patch v6 on fixed trunk → Screenshot with patch v8 on trunk

Updated

10 years ago
Attachment #254729 - Flags: approval-thunderbird2? → approval-thunderbird2+
Whiteboard: [checkin needed]

Updated

10 years ago
QA Contact: olgam → backend
Whiteboard: [checkin needed]
(Assignee)

Comment 95

10 years ago
checked into branch.
Keywords: fixed1.8.1.3

Comment 96

10 years ago
WFM with Tb 2.0pre 2007030604. Good job guys! - especially as this bug was about to get Wontfix'ed 3 years ago :)

Comment 97

10 years ago
^ wrong clipboard content: the build id is 2007030703 (Lin)
the checkin to branch
http://bonsai.mozilla.org/cvsview2.cgi?file=mozTXTToHTMLConv.cpp&branch=1.77.18.2&subdir=/mozilla/netwerk/streamconv/converters&command=DIFF_FRAMESET&rev1=1.77.18.1&rev2=1.77.18.2

looks quite a bit different from the checkin to trunk
http://bonsai.mozilla.org/cvsview2.cgi?file=mozTXTToHTMLConv.cpp&branch=1.85&subdir=/mozilla/netwerk/streamconv/converters&command=DIFF_FRAMESET&rev1=1.83&rev2=1.84

is that intentional?
Depends on: 390779
(Assignee)

Comment 99

10 years ago
No, IIRC that was not intentional.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Assignee)

Comment 100

10 years ago
Branch apparently got an older version of the fix. Trunk seems OK.
I don't know what the branch rules are. Are correctness fixes allowed or only critical and security fixes?
If not allowed, I'll mark the bug fixed again.
Ben, if trunk is fine we could leave it as fixed. Due to the wrong patch checked into 1.8.1.3 we should ask for approval 1.8.1.7. Could you attach the patch which corrects the mistake?
Status: REOPENED → RESOLVED
Last Resolved: 10 years ago10 years ago
Flags: blocking1.8.1.7?
Keywords: fixed1.8.1.3
Resolution: --- → FIXED
(Assignee)

Comment 102

10 years ago
Bug 390779 comment 8 is the result of the bad branch check-in.
(Assignee)

Comment 103

10 years ago
Created attachment 275242 [details] [diff] [review]
Branch fixup, v1

This is the diff between branch and trunk on my disk, only the relevant part for this bug.
I have *not tested* it yet, will do later, or somebody else can if you want to.
(Assignee)

Updated

10 years ago
Duplicate of this bug: 390779
No longer depends on: 390779
(Assignee)

Comment 105

10 years ago
Created attachment 275311 [details]
Branch fixup screenshot
(Assignee)

Updated

10 years ago
Attachment #275242 - Flags: approval-thunderbird2?
(Assignee)

Comment 106

10 years ago
Tested Branch fixup, v1 with current branch, it indeed brings it on par with trunk, see screenshot.
mscott: please approve. This is to fix bug 390779.
Comment on attachment 275242 [details] [diff] [review]
Branch fixup, v1

>--- 2.0-branch/mozilla/netwerk/streamconv/converters/mozTXTToHTMLConv.cpp	2007-01-21 04:53:17.806002000 +0100
>+++ trunk/mozilla/netwerk/streamconv/converters/mozTXTToHTMLConv.cpp	2007-06-17 13:27:56.999825000 +0200

Dunno if we need a real diff against CVS 1.8 branch here. You only supplied your local diff between trunk and 1.8 branch.

Adjusting approval flag to 1.8.1.7. No review needed?
Attachment #275242 - Flags: approval-thunderbird2? → approval1.8.1.7?
(Assignee)

Comment 108

10 years ago
Created attachment 275313 [details] [diff] [review]
Branch fixup, v1

Same thing, but if you insist on it.
The patch already had review and checkin-approval for branch. I simply checked in the wrong patch.
Attachment #275242 - Attachment is obsolete: true
Attachment #275313 - Flags: approval1.8.1.7?
Attachment #275242 - Flags: approval1.8.1.7?
Flags: blocking1.8.1.7? → blocking1.8.1.7+
Comment on attachment 275313 [details] [diff] [review]
Branch fixup, v1

approved for 1.8.1.7, a=dveditz for release-drivers
Attachment #275313 - Flags: approval1.8.1.7? → approval1.8.1.7+
Keywords: checkin-needed
Ben, can you check-in attachment 275313 [details] [diff] [review] on MOZILLA_1_8_BRANCH, please?
(Assignee)

Comment 111

10 years ago
Checked in on branch.
(Assignee)

Updated

10 years ago
Keywords: checkin-needed
Keywords: fixed1.8.0.7
Target Milestone: --- → mozilla1.9alpha6
Keywords: fixed1.8.1.8
hskupin: Ben checked this into the 1.8 (.1) branch, not the 1.8.0 branch. Also, please do not change the target milestone unless it's your bug, that's for developer's use. But I'm not sure what you were getting at anyway since this was apparently already fixed on the trunk before you set it.
Keywords: fixed1.8.0.7
Target Milestone: mozilla1.9alpha6 → ---
Daniel, sorry for the fixed1.8.0.7 keyword. Yes, I ment fixed1.8.1.7. Something got wrong. I set the target milestone after the bug was fixed and which should give a hint for which version this issue is resolved. I don't think that this is inconvenient because Ben forgot that, isn't it? At least it will give us better results when doing QA.
Is the target milestone really used that way? Ok, guess that's all right if the bug's actually fixed.
Target Milestone: --- → mozilla1.9alpha6

Comment 115

10 years ago
Verified in nightly version 2.0.0.7pre (20071010) on Mac OS 10.4.9.  All these cases work great:

http://www.finaid.vt.edu/flash/Mar02flash.pdf
http://filebox.vt.edu/users/jeisinge/frisbee/09%20(at%2013h07m02s).JPG,
http://www.mozilla.org'u
http://www.mozilla.org)u
http://www.mozilla.org()
http://www.mozilla.org()u
http://www.mozilla.org(u)u
http://www.mozilla.org)u(
http://www.mozilla.org[]
http://www.mozilla.org{} 
Status: RESOLVED → VERIFIED
Keywords: fixed1.8.1.8 → verified1.8.1.8
Product: Core → MailNews Core
You need to log in before you can comment on or make changes to this bug.