Open Bug 878752 Opened 11 years ago Updated 2 years ago

www.xxx can't be a valid URL

Categories

(MailNews Core :: MIME, defect)

x86_64
Windows 7
defect

Tracking

(Not tracked)

REOPENED

People

(Reporter: brunis, Unassigned)

Details

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20130602 Firefox/24.0 (Nightly/Aurora)
Build ID: 20130602031240

Steps to reproduce:

Read a mail with an HTML signature:

Med venlig hilsen,<br>
<b>www.aarhus<span style="color:red;">INDOOR</span>golf.dk</b>



Actual results:

Thunderbird takes text out of context and constructs an invalid URL. It thinks "www.aarhus" is a URL and ruins my signature.


Expected results:

No URL parsing of signatures. If people want links in there, I'm sure they'll write the whole <a href="">somewhere</a> themselves.
Component: Untriaged → Mail Window Front End
Version: 18 → 23
xref bug 892406 requests fixing the parsing algorithm.

I'd agree with Dennis that parsing plaintext URLs and converting them into real links <a href="...">url</a>, while potentially helpful for many users, is still an alteration of the user's content, which should at least have a pref to switch it off (ux-wysiwyg).
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

We just ordered something from our firm and I was cc'ed on my Gmail account.

[edited mail]
Skal sendes til:
Dennis Jakobsen
Hougårdsvej 36, 3.TH,
8220 Brabrand
[/edited mail]

Guess what Thunderbird turned into a URL? so useful!
Version: 23 → 24
(In reply to Dennis Jakobsen from comment #2)

Hi Dennis,

thanks for providing scenarios/testcases which illustrate the problem of this bug.

> [edited mail]
> Skal sendes til:
> Dennis Jakobsen
> Hougårdsvej 36, 3.TH,
> 8220 Brabrand
> [/edited mail]
> Guess what Thunderbird turned into a URL? so useful!

Guessing generally isn't helpful for solving bugs, but I'll take a guess anyway; please confirm if I'm right ;)

STR

User composes plaintext that happens to contain the pattern foo.bar, where foo.bar is NOT a URL (and a human reader would never take it for one, judging from both context and pattern, as in the example above, where it is part of a physical address line, "Hougårdsvej 36, 3.TH").

Actual result

- during composition, everything looks fine as entered by user (plaintext)
- but when sending, without notice or prior consent of user, TB converts plaintext "foo.bar" into linkified <a href="http://foo.bar">foo.bar</a> (probably with some moz-markers which I've skipped here), so e.g. "Hougårdsvej 36, <a href="http://3.TH">3.TH</a>"
- recipient receives a link on foo.bar which is nonsensical and non-functional.

Expected result

- find ways of avoiding false positives in automatic link recognition and/or
- find ways of giving users more control over link recognition
=> UI wanted

So Dennis, what's a bug for you (and I agree) is a feature for others (who are happy when their normal plain links get linkified), and the challenge here is how exactly to improve the UI/behaviour so that it's less error-prone and more controllable, but preferably without removing the helpful intention of the feature altogether.

Then, after developing such UI/behaviour, we'll have the next challenge of finding a volunteer to code this (can you do it?), and the more complexity, the harder it gets.
Your input on this will be welcome.

Some general ideas for improvement of UI/behaviour:
a) skip signatures in post-composition linkification (when user defines the signature, he could/should really make the call about links himself?)
b) implement & expose pref to turn off post-composition linkification
c) ideally (but complex), linkify *during* composition and provide ways of per-instance user control

Just ideas which need more thought and detail.
a) looks like an easy one if we agree on it.
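
For plain-text messages, idea a) could look roughly like the sketch below. It assumes the signature starts at the conventional "-- " separator line (dash, dash, space) and that the actual linkification is done by some callable passed in as `linkify`. Both `linkify_body_only` and that callable are hypothetical names for illustration, not Thunderbird APIs.

```python
from typing import Callable

SIG_SEPARATOR = "-- "  # conventional plain-text signature separator line


def linkify_body_only(text: str, linkify: Callable[[str], str]) -> str:
    """Apply `linkify` to the message body but leave the signature untouched."""
    lines = text.split("\n")
    for i, line in enumerate(lines):
        if line == SIG_SEPARATOR:  # assumption: the first separator starts the signature
            body = "\n".join(lines[:i])
            signature = "\n".join(lines[i:])
            # Rejoin with the newline that preceded the separator line.
            return linkify(body) + "\n" + signature if i > 0 else signature
    return linkify(text)  # no signature found: process everything
```

An HTML signature like the one in comment 0 would need a different marker than the separator line, so this only covers the plain-text case.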
Keywords: uiwanted
(In reply to Thomas D. (away till 23rd Oct) from comment #3)

Oh, Dennis, you're looking at message *reader*, right?
So this is only about pre-parsing of *received* messages?

In that case, much of my comment 3 might not apply because I was looking at *composition* and sending.
I don't see how we could be expected to render
  <b>www.aarhus<span style="color:red;">INDOOR</span>golf.dk</b>
as a valid URL.

ref:
http://stackoverflow.com/questions/1547899/which-characters-make-a-url-invalid
https://tools.ietf.org/html/rfc3986
Status: UNCONFIRMED → RESOLVED
Closed: 9 years ago
Component: Mail Window Front End → Message Reader UI
Resolution: --- → INVALID
@Wayne, that was my entire point. Thunderbird should stop constructing invalid URLs!! I never wanted the reader to screw up URLs!

Basically you just confirmed there is an issue and you close the bug?

How about you make the URL detection a little more strict? You could start by respecting that foo.bar is just a hostname, not necessarily an http URL. If I get a mail that says, hey buddy, see this ftp site for the docs you wanted: foo.bar ..what use is a clickable http:// URL?

You could keep a list of valid top level domains to prevent false links from grammar issues and typos like "hi there.Did you have good time yesterday?"

You could also require a full www.something.com or require http://foo.bar if it's only the domain-name.topleveldomain.

You could also prevent text from being turned into a url when it's immediately followed by an html element, like in my signature.

As you can see, a lot can be done to make sure it doesn't just turn any non-url into a URL.
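
To make two of these suggestions concrete (a list of known top-level domains, and requiring either an explicit scheme or the full www.something.tld form), here is a minimal sketch. It is not Thunderbird's actual algorithm: the name `looks_like_url`, the tiny `KNOWN_TLDS` set and the exact rules are assumptions made for illustration, and the "followed by an html element" rule is not covered.

```python
import re

# Hypothetical, deliberately tiny TLD list; a real one would be much longer
# and would need regular updates.
KNOWN_TLDS = {"com", "org", "net", "dk", "de"}

# A token with an explicit scheme, e.g. http://foo.bar or ftp://foo.bar
SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://\S+$")

# One DNS-style label: letters, digits, hyphens, not starting/ending with "-"
LABEL_RE = re.compile(r"^[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?$")


def looks_like_url(token: str) -> bool:
    """Return True only if the token is plausibly meant as a link."""
    if SCHEME_RE.match(token):
        return True  # the author wrote the scheme, so trust them

    labels = token.split(".")
    # Without a scheme, require the full www.something.tld form.
    if len(labels) < 3 or labels[0].lower() != "www":
        return False
    if not all(LABEL_RE.match(label) for label in labels):
        return False
    # The last label must be a known top-level domain.
    return labels[-1].lower() in KNOWN_TLDS
```

With these rules, "3.TH", "there.Did", "foo.bar" and "www.aarhus" are rejected, while "www.something.com" and "ftp://foo.bar" are accepted.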
FWIW I misread comment 0, but I believe this is still invalid.

Having *some* html in the signature is a red herring, and I know that's part of your point. You make some interesting suggestions - but I fear those are all jury rigs. 

Thunderbird doesn't care (correctly, I suggest) that www.aarhus is "surrounded" by HTML. Where there is "valid text", TB will try to make a link if there is a reasonable expectation that the string is a URL. www.aarhus is such a string. 1. It is a valid URL according to spec - there is no requirement in the spec that any characters, of any specific form, must follow www.aarhus. 2. The fact that no such site exists is of secondary consequence.

Further, these days top level domain can be almost anything - 810 according to http://en.wikipedia.org/wiki/List_of_Internet_top-level_domains - and it will change frequently over time.

Note:
1. gmail also renders www.aarhus as a link
2. www. is (correctly) not rendered as a link

If you want any arbitrary email client to show that "string" the way you want, you'll need to make an image out of it.
Component: Message Reader UI → MIME
Keywords: uiwanted
Product: Thunderbird → MailNews Core
That said, there may be room for improvement for the general case of www.foo, because we know that something starting with www must have 2 additional substrings (as in www.foo.bar) to be a working URL.

Does this make sense, and can it be generalized as a requirement to mozTXTToHTMLConv?  

Is there a match among the bugs in https://bugzilla.mozilla.org/buglist.cgi?f1=short_desc&list_id=12008542&short_desc=url%20www&bug_severity=major&bug_severity=normal&bug_severity=minor&bug_severity=enhancement&o1=nowordssubstr&resolution=---&resolution=INVALID&resolution=WONTFIX&classification=Client%20Software&classification=Components&query_format=advanced&f2=OP&short_desc_type=anywordssubstr&longdesc=www%20url%20&component=Networking&longdesc_type=allwordssubstr&product=Core ?
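
The www.foo rule proposed above can be sketched as follows. This is only an illustration of the idea, not the code in mozTXTToHTMLConv; the function name `www_host_has_enough_labels` is hypothetical.

```python
def www_host_has_enough_labels(host: str) -> bool:
    """Check the proposed rule: a host starting with "www." needs at least
    two further non-empty, dot-separated parts (www.name.tld)."""
    labels = host.rstrip(".").split(".")
    if labels[0].lower() != "www":
        return True  # the rule only constrains www-prefixed hosts
    rest = labels[1:]
    return len(rest) >= 2 and all(rest)
```

Under this rule "www." and "www.aarhus" would not be linkified, "www.aarhus.dk" would, and hosts that do not start with www are left to whatever other checks apply.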
Severity: normal → minor
Status: RESOLVED → REOPENED
Ever confirmed: true
Flags: needinfo?(bugzilla2007)
Flags: needinfo?(Pidgeot18)
Resolution: INVALID → ---
Summary: mail readers url interpreter is bad → www.xxx can't be a valid URL
I think that makes sense. Such a URL can technically exist (on a local network), but that is only theoretical and not useful in general.
Flags: needinfo?(bugzilla2007)
Flags: needinfo?(Pidgeot18)
Severity: minor → S4