Open Bug 1379205 Opened 7 years ago Updated 2 years ago

Exclamation points at the end of URLs are not linkified

Categories

(Core :: Networking, defect, P5)

52 Branch
defect

Tracking

()

UNCONFIRMED

People

(Reporter: Pascal, Unassigned)

Details

(Whiteboard: [necko-would-take])

User Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36

Steps to reproduce:

I clicked a URL in a received email that ended in two exclamation points ('!!').


Actual results:

The URL did not work because Thunderbird did not include the exclamation points in the link.


Expected results:

Thunderbird should include in a link all characters that are valid in a URL.

RFC 1738 Section 2.2 states, in pertinent part: alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL.
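To make the reporter's point concrete, here is a small Python sketch that checks a string against the character classes quoted above from RFC 1738 Section 2.2. It is illustrative only: the function name is hypothetical, scheme-specific rules are ignored, and '#' is rejected because RFC 1738 lists it as unsafe (it delimits a fragment, which is not part of the URL proper).

```python
import re
import string

# RFC 1738 Section 2.2: alphanumerics, the special characters below, and the
# scheme-reserved characters may appear unencoded; anything else must be %XX-escaped.
SPECIAL = "$-_.+!*'(),"
RESERVED = ";/?:@=&"
ALLOWED = set(string.ascii_letters + string.digits + SPECIAL + RESERVED)
HEX_PAIR = re.compile(r"[0-9A-Fa-f]{2}")


def is_rfc1738_clean(url: str) -> bool:
    """Return True if every character is allowed unencoded or is part of a %XX escape."""
    i = 0
    while i < len(url):
        ch = url[i]
        if ch == "%":
            # An escape is '%' followed by exactly two hex digits.
            if not HEX_PAIR.fullmatch(url[i + 1:i + 3]):
                return False
            i += 3
        elif ch in ALLOWED:
            i += 1
        else:
            return False
    return True


print(is_rfc1738_clean("http://example.com/offer!!"))  # True: '!' is in the special set
print(is_rfc1738_clean("http://example.com/a b"))      # False: a space must be encoded
```

So a URL ending in "!!" is well-formed by this RFC; the difficulty discussed below is purely about deciding where the URL ends in running text.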

This appears to be a regression.  https://bugzilla.mozilla.org/show_bug.cgi?id=435836 states this problem existed in v2 but was fixed in v3.

Of the above list, in addition to ! the following characters also appear to have this problem: -.'),
That's not easy to fix because it is ambiguous whether the !., still belongs to the URL or to the surrounding text when the URL is not properly surrounded by spaces or enclosed in <…> or something like that.

IMHO it is much more common that the !., belongs to the surrounding text, e.g. look at this cool URL http://www.mozilla.org/firefox, I've found today, it is far better than https://www.microsoft.com/de-de/windows/microsoft-edge!
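The trade-off can be illustrated with a simplified, hypothetical model of the behaviour described in this bug (it is not the actual Thunderbird/Gecko converter code): take a whitespace-delimited run that starts with a scheme, then strip the trailing characters the reporter observed being dropped. The regex and the `TRAILING` set are assumptions based on comment 0.

```python
import re

# Hypothetical model: a whitespace-delimited run starting with http(s)://,
# minus any trailing characters from the set reported in comment 0.
URL_RUN = re.compile(r"https?://\S+")
TRAILING = "!-.'),"


def extract_links(text: str) -> list:
    return [m.group(0).rstrip(TRAILING) for m in URL_RUN.finditer(text)]


sentence = ("look at this cool URL http://www.mozilla.org/firefox, I've found today, "
            "it is far better than https://www.microsoft.com/de-de/windows/microsoft-edge!")
print(extract_links(sentence))
# ['http://www.mozilla.org/firefox', 'https://www.microsoft.com/de-de/windows/microsoft-edge']
print(extract_links("Sign up here: http://example.com/promo!!"))
# ['http://example.com/promo']  (the '!!' the reporter needed is lost)
```

The same rule that correctly trims the example sentence above is what breaks the reporter's URL.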

:-(

Lars R.
In bug 1274242 I "fixed" this problem for "|" (although that was hotly discussed later).

A patch would likely look very similar. It looks like BMO doesn't linkify the ! either, as can be seen in the link in comment #1.
Component: Message Reader UI → Networking
Product: Thunderbird → Core
(In reply to Jorg K (GMT+2) from comment #2)
> A patch would likely look very similar.
Or maybe not, since ! isn't listed as a terminator character that could simply be removed.
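The distinction in comment #3 can be pictured as two separate rules: characters that stop the URL scan, and trailing punctuation that is trimmed afterwards. The Python sketch below is hypothetical throughout (names, both character sets, and the assumption, implied by comment #3, that the "|" fix amounted to removing it from a terminator list); it only shows why dropping a character from the terminator set does not help "!".

```python
# Hypothetical two-stage scanner (names and sets are illustrative, not Mozilla's):
#   1. scan forward until whitespace or a "terminator" character;
#   2. strip trailing punctuation that probably belongs to the sentence.
DEFAULT_TERMINATORS = frozenset('<>"')  # assumed; the real list may differ
TRAILING = "!-.'),"


def scan_url(text: str, start: int, terminators=DEFAULT_TERMINATORS) -> str:
    end = start
    while end < len(text) and not text[end].isspace() and text[end] not in terminators:
        end += 1
    return text[start:end].rstrip(TRAILING)


msg = "see http://x.example/a|b now"
print(scan_url(msg, 4))                               # http://x.example/a|b
print(scan_url(msg, 4, DEFAULT_TERMINATORS | {"|"}))  # http://x.example/a
# '!' is never a terminator here, so only the trailing-punctuation rule drops it:
print(scan_url("http://x.example/wow!!", 0))          # http://x.example/wow
```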
(In reply to Lars Rohwedder from comment #1)
> That's not easy to fix because it is ambiguous whether the !., still belongs
> to the URL or to the surrounding text when the URL is not properly
> surrounded by spaces or enclosed in <…> or something like that.
Perhaps, upon an ambiguous URL being clicked, TB could send an HTTP HEAD for each of the possibilities and send the user to the best reply?  I've never thought about "best" in that context before, but it looks like it could be as simple as the lowest HTTP status code.  If no replies are received within a short interval, default to the current behavior.
(In reply to Pascal from comment #4)
> I've never thought about "best" in that context before, but it looks like it
> could be as simple as the lowest HTTP status code.
Break ties with longest Content-Length, or longest URL if equal or no Content-Length.
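Comments #4 and #5 together describe a concrete ranking rule. The Python sketch below is one way to express it, under stated assumptions: candidates are built by peeling trailing punctuation off one character at a time, the helper names (`candidates`, `probe`, `pick_best`) are hypothetical, and a missing Content-Length is treated as smaller than any reported one. It uses only the standard library (`urllib.request.Request` with `method="HEAD"`); urllib follows redirects, so the status recorded is that of the final response.

```python
import urllib.error
import urllib.request
from typing import Dict, List, Optional, Tuple

TRAILING = "!-.'),"  # punctuation the current linkifier is reported to strip
Probe = Optional[Tuple[int, Optional[int]]]  # (status, Content-Length), or None if no reply


def candidates(raw: str) -> List[str]:
    """The raw match plus each shorter variant with one more trailing character removed."""
    out = [raw]
    while out[-1] and out[-1][-1] in TRAILING:
        out.append(out[-1][:-1])
    return out


def probe(url: str, timeout: float = 2.0) -> Probe:
    """Send an HTTP HEAD request and return (status, Content-Length), or None on failure."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            status, headers = resp.status, resp.headers
    except urllib.error.HTTPError as err:  # 4xx/5xx replies still count as answers
        status, headers = err.code, err.headers
    except OSError:  # timeouts, DNS errors, refused connections (URLError is an OSError)
        return None
    length = headers.get("Content-Length")
    return status, (int(length) if length and length.isdigit() else None)


def pick_best(results: Dict[str, Probe], default: str) -> str:
    """Lowest status wins; ties go to the larger Content-Length, then to the longer URL."""
    answered = {url: r for url, r in results.items() if r is not None}
    if not answered:
        return default  # nothing answered in time: keep the current behaviour

    def rank(url: str) -> Tuple[int, int, int]:
        status, length = answered[url]
        # Assumption: a missing Content-Length ranks below any reported length.
        return (status, -(length if length is not None else -1), -len(url))

    return min(answered, key=rank)


# Usage (performs real network requests, one after another):
# options = candidates("http://example.com/promo!!")
# best = pick_best({u: probe(u) for u in options}, default=options[-1])
```

Probes run sequentially in this sketch; to honour the "short interval" from comment #4, a client would more likely send them in parallel and cap the total wait, treating late answers as `None`.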
I'd say if the URL is ambiguous, TB should offer a list of possible URLs. Maybe this list might be pre-checked with HTTP HEAD and all 404 replies can already be removed from that list.
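The pre-filtering idea from comment #6 might look like the following Python sketch. The function name and input shape are hypothetical: it assumes HEAD probes have already produced a status code per candidate (`None` meaning no reply). As a design choice, unanswered candidates are kept, and if every candidate returned 404 the full list is shown rather than an empty one.

```python
from typing import Dict, List, Optional


def choices_to_offer(statuses: Dict[str, Optional[int]]) -> List[str]:
    """Candidates to show the user, dropping those whose HEAD probe returned 404.

    `statuses` maps each candidate URL (longest first) to its HEAD status code,
    or None if the probe got no reply. Unanswered candidates are kept.
    """
    kept = [url for url, status in statuses.items() if status != 404]
    # Assumption: if every candidate 404s, show them all rather than an empty list.
    return kept or list(statuses)
```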
(In reply to Lars Rohwedder from comment #6)
> Maybe this list might be pre-checked with HTTP HEAD and all 404 replies can
> already be removed from that list.
404 replies may be all that you receive.

The current implementation breaks valid URLs in favor of people who do not understand that you cannot add characters to the end of URLs and still expect them to work.  This irks me on principle.  Unfortunately, I agree that it probably handles the situation correctly the majority of the time.

A table listing all the possible URLs and the results of an HTTP HEAD on each would be nice, but is probably pie in the sky and may even confuse more users than it helped.

I wonder what prompted this regression.  If intentional, there may be a corner case documented somewhere that any further changes need to account for.

I don't think it would be controversial to say that if a URL is not part of a sentence then any punctuation at the end of it should be included.  Unfortunately, email clients are required to insert newlines to limit line length (see RFC 5322 Section 2.1.1) so just being on their own line is not sufficient to prove a URL is not in the middle of a sentence.

The URL I ran into was its own paragraph in the middle of the email.  For the example given in Bug 435836, the URL was the entire message.

Can we agree that a URL that is its own paragraph (preceded by start of message or two newlines and followed by two newlines or end of message) should have attached punctuation included?

It would be nice if the above was also able to accommodate a list of URLs, one per line, but I can see someone creating a comma-separated list of URLs inside of a sentence (newlines being added automatically due to line length).  Perhaps the above could be expanded to one or more URLs that are their own paragraph?  So a list of URLs, either space or newline separated, with no text in the paragraph other than those URLs, are assumed to have no extraneous punctuation.
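The paragraph rule proposed in comment #7 could be sketched in Python as below. The names are hypothetical and the URL regex and `TRAILING` set are simplifying assumptions: a paragraph is text separated by a blank line (or the start/end of the message), and only paragraphs made up entirely of URLs keep their trailing punctuation.

```python
import re
from typing import List

URL = re.compile(r"https?://\S+")
TRAILING = "!-.'),"
PARAGRAPH_BREAK = re.compile(r"\n\s*\n")  # a blank line (possibly with spaces) between paragraphs


def linkify(message: str) -> List[str]:
    links = []
    for para in PARAGRAPH_BREAK.split(message):
        tokens = para.split()
        # The paragraph holds nothing but URLs (space- or newline-separated).
        url_only = bool(tokens) and all(URL.fullmatch(t) for t in tokens)
        for m in URL.finditer(para):
            url = m.group(0)
            # In running text, trailing punctuation probably belongs to the sentence.
            links.append(url if url_only else url.rstrip(TRAILING))
    return links


print(linkify("Hi,\n\nhttp://example.com/deal!!\n\nSee http://example.com/faq, thanks!\n"))
# ['http://example.com/deal!!', 'http://example.com/faq']
```

A URL that is merely alone on a wrapped line inside a paragraph of text is still trimmed, which matches the RFC 5322 line-length concern above. One limitation of this sketch: a paragraph such as "http://a.example/x, http://b.example/y" counts as URL-only, so the comma would be kept.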
Whiteboard: [necko-would-take]
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P5
Severity: normal → S3