878752 - www.xxx can't be a valid URL

Reporter

Description

11 years ago

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20130602 Firefox/24.0 (Nightly/Aurora)
Build ID: 20130602031240

Steps to reproduce:

Read a mail with an HTML signature:

Med venlig hilsen,<br>
<b>www.aarhus<span style="color:red;">INDOOR</span>golf.dk</b>



Actual results:

Thunderbird takes text out of context and constructs an invalid URL. It thinks "www.aarhus" is a URL and ruins my signature.
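
For illustration only, here is a minimal Python sketch of how a scanner that looks at one text node at a time could end up with "www.aarhus". This is not Thunderbird's actual code; the `NAIVE_WWW` pattern and the `TextNodeCollector` class are hypothetical names made up for this example.

```python
from html.parser import HTMLParser
import re

# Hypothetical, deliberately naive pattern: "www." followed by non-space characters.
NAIVE_WWW = re.compile(r"www\.\S+")


class TextNodeCollector(HTMLParser):
    """Collects each run of character data between tags as a separate text node."""

    def __init__(self):
        super().__init__()
        self.nodes = []

    def handle_data(self, data):
        if data.strip():
            self.nodes.append(data)


signature = '<b>www.aarhus<span style="color:red;">INDOOR</span>golf.dk</b>'
collector = TextNodeCollector()
collector.feed(signature)
collector.close()

# Scanning node by node only ever sees the fragment before the <span>.
per_node_matches = [
    m.group(0) for node in collector.nodes for m in NAIVE_WWW.finditer(node)
]

# Scanning the concatenated visible text sees the whole host name.
joined_matches = [m.group(0) for m in NAIVE_WWW.finditer("".join(collector.nodes))]
```

Under these assumptions the per-node scan yields only the fragment, while the joined text yields the full name the signature author intended.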


Expected results:

No URL parsing of signatures. If people want links in there, I'm sure they'll write the whole <a href="">somewhere</a>.

Dennis Jakobsen

Reporter

Updated

11 years ago

Component: Untriaged → Mail Window Front End

Version: 18 → 23

Thomas D. (:thomas8)

Comment 1

11 years ago

xref bug 892406 requests fixing the parsing algorithm.

I'd agree with Dennis that parsing plaintext URLs and converting them into real links <a href="...">url</a>, while potentially helpful for many users, is still an alteration of the user's content, which should at least have a pref to switch it off (ux-wysiwyg).

Dennis Jakobsen

Reporter

Comment 2

11 years ago

Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

We just ordered something from our firm and I was cc'ed on my Gmail account.

[edited mail]
Skal sendes til:
Dennis Jakobsen
Hougårdsvej 36, 3.TH,
8220 Brabrand
[/edited mail]

Guess what Thunderbird turned into a URL? So useful!

Dennis Jakobsen

Reporter

Updated

11 years ago

Version: 23 → 24

Thomas D. (:thomas8)

Comment 3

11 years ago

(In reply to Dennis Jakobsen from comment #2)

Hi Dennis,

thanks for providing scenarios/testcases which illustrate the problem of this bug.

> [edited mail]
> Skal sendes til:
> Dennis Jakobsen
> Hougårdsvej 36, 3.TH,
> 8220 Brabrand
> [/edited mail]
> Guess what Thunderbird turned into a URL? so useful!

Guessing generally isn't helpful for solving bugs, but I'll have a lucky guess anyway; please confirm if I'm right ;)

STR

The user composes plaintext which happens to contain the pattern foo.bar, where foo.bar is NOT a URL (and a human reader would never take it for one, judging from both context and pattern, as in the above example, where it is part of a physical address line, "Hougårdsvej 36, 3.TH").

Actual result

- during composition, everything looks fine as entered by user (plaintext)
- but when sending, without notice or prior consent of user, TB converts plaintext "foo.bar" into linkified <a href="http://foo.bar">foo.bar</a> (probably with some moz-markers which I've skipped here), so e.g. "Hougårdsvej 36, <a href="http://3.TH">3.TH</a>"
- recipient receives a link on foo.bar which is nonsensical and non-functional.
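
To make the false positive concrete, here is a small Python sketch of an over-eager linkifier. It is an assumption-laden illustration, not what TB actually runs: the `NAIVE_HOST` pattern and the `naive_linkify` helper are hypothetical names invented here.

```python
import html
import re

# Hypothetical over-eager pattern: any "label.label" token counts as a host name.
NAIVE_HOST = re.compile(r"\b[0-9A-Za-z-]+\.[A-Za-z]{2,}\b")


def naive_linkify(text: str) -> str:
    """Escape the plaintext, then wrap every host-like token in an http:// link."""
    escaped = html.escape(text, quote=False)
    return NAIVE_HOST.sub(
        lambda m: f'<a href="http://{m.group(0)}">{m.group(0)}</a>', escaped
    )


line = "Hougårdsvej 36, 3.TH,"
result = naive_linkify(line)
print(result)
```

With this pattern the floor/door designation "3.TH" in the address line is wrapped in a link, which matches the behaviour described in the steps above.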

Expected result

- find ways of avoiding false positives in automatic link recognition and/or
- find ways of giving users more control over link recognition
=> UI wanted

So Dennis, what's a bug for you (and I agree) is a feature for others (who are happy when their normal plain links get linkified), and the challenge here is how exactly to improve the UI/behaviour so that it's less error-prone and more controllable, but preferably without removing the helpful intention of the feature altogether.

Then, after developing such UI/behaviour, we'll have the next challenge of finding a volunteer to code this (can you do it?), and the more complexity, the harder it gets.
Your input on this will be welcome.

Some general ideas for improvement of UI/behaviour:
a) skip signatures in post-composition linkification (when user defines the signature, he could/should really make the call about links himself?)
b) implement & expose pref to turn off post-composition linkification
c) ideally (but complex), linkify *during* composition and provide ways of per-instance user control

Just ideas which need more thought and detail.
a) looks like an easy one if we agree on it.
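
A rough Python sketch of idea a) for plaintext, assuming the signature is introduced by the conventional "-- " delimiter line (dash, dash, space). The helpers `split_signature` and `linkify_outside_signature` are hypothetical names, and the `linkify` callback stands in for whatever converter is really used.

```python
from typing import Callable, Tuple


def split_signature(body: str) -> Tuple[str, str]:
    """Split a plaintext body at the last "-- " delimiter line, if any."""
    lines = body.split("\n")
    for i in range(len(lines) - 1, -1, -1):
        if lines[i] == "-- ":
            return "\n".join(lines[:i]), "\n".join(lines[i:])
    return body, ""


def linkify_outside_signature(body: str, linkify: Callable[[str], str]) -> str:
    """Apply `linkify` to the main text only and leave the signature untouched."""
    main, signature = split_signature(body)
    if not signature:
        return linkify(main)
    return linkify(main) + "\n" + signature


body = "See www.example.com\n-- \nwww.aarhusINDOORgolf.dk"
# str.upper is only a visible stand-in for a real linkifier.
processed = linkify_outside_signature(body, str.upper)
```

HTML signatures would need a different marker than the "-- " line, which this sketch does not attempt to cover.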

Keywords: uiwanted

Thomas D. (:thomas8)

Comment 4

11 years ago

(In reply to Thomas D. (away till 23rd Oct) from comment #3)

Oh, Dennis, you're looking at message *reader*, right?
So this is only about pre-parsing of *received* messages?

In that case, much of my comment 3 might not apply because I was looking at *composition* and sending.

Wayne Mery (:wsmwk)

Comment 5

9 years ago

I don't see how we could be expected to render 
  <b>www.aarhus<span style="color:red;">INDOOR</span>golf.dk</b>
as a valid url

ref:
http://stackoverflow.com/questions/1547899/which-characters-make-a-url-invalid
https://tools.ietf.org/html/rfc3986

Status: UNCONFIRMED → RESOLVED

Closed: 9 years ago

Component: Mail Window Front End → Message Reader UI

Resolution: --- → INVALID

Dennis Jakobsen

Reporter

Comment 6

9 years ago

@Wayne, that was my entire point. Thunderbird should stop constructing invalid URLs!! I never wanted the reader to screw up URLs!

Basically you just confirmed there is an issue and you close the bug?

How about you make the URL detection a little more strict? You could start by respecting that foo.bar is just a hostname, not necessarily an HTTP URL. If I get a mail that says, "hey buddy, see this ftp site for the docs you wanted: foo.bar", what use is a clickable http:// URL?

You could keep a list of valid top-level domains so that grammar issues and typos like "hi there.Did you have good time yesterday?" don't get linkified.

You could also require a full www.something.com, or require http://foo.bar if it's only domain-name.topleveldomain.

You could also prevent text from being turned into a URL when it's immediately followed by an HTML element, like in my signature.

As you can see, a lot can be done to make sure it doesn't just turn any non-url into a URL.
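
A minimal Python sketch of the first few suggestions combined, purely as an illustration. `KNOWN_TLDS` is a tiny made-up subset (a real list would be much longer and would change over time), and `CANDIDATE` and `should_linkify` are hypothetical names.

```python
import re

# Illustrative subset only; a real list of top-level domains is far longer.
KNOWN_TLDS = {"com", "org", "net", "dk", "th"}

# Optional scheme, then a host made of at least two dot-separated labels.
CANDIDATE = re.compile(
    r"(?P<scheme>[a-z][a-z0-9+.-]*://)?"
    r"(?P<host>[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)"
)


def should_linkify(token: str) -> bool:
    """Stricter check: known TLD, plus either an explicit scheme or www.x.tld."""
    m = CANDIDATE.fullmatch(token)
    if not m:
        return False
    labels = m.group("host").lower().split(".")
    if labels[-1] not in KNOWN_TLDS:
        return False
    if m.group("scheme"):
        return True
    # Without a scheme, insist on the full "www.something.tld" shape.
    return labels[0] == "www" and len(labels) >= 3


for token in ["www.aarhus", "3.TH", "there.Did", "www.example.com", "http://foo.com"]:
    print(token, should_linkify(token))
```

Note that "th" is deliberately in the sample list: "3.TH" is still rejected, because it has neither a scheme nor a www prefix, so the TLD list alone is not what saves the address line.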

Wayne Mery (:wsmwk)

Comment 7

9 years ago

FWIW I misread comment 0, but I believe this is still invalid.

Having *some* html in the signature is a red herring, and I know that's part of your point. You make some interesting suggestions - but I fear those are all jury rigs. 

Thunderbird doesn't care, correctly, I suggest, that www.aarhus is "surrounded" by HTML. Where there is "valid text", TB will try to make a link if there is a reasonable expectation that the string is a URL. www.aarhus is such a string. 1. It is a valid URL according to spec - there is no requirement in the spec that any characters, of any specific form, must follow www.aarhus. 2. The fact that no such site exists is of secondary consequence.

Further, these days a top-level domain can be almost anything - there are 810 according to http://en.wikipedia.org/wiki/List_of_Internet_top-level_domains - and the list will change frequently over time.

Note:
1. gmail also renders www.aarhus as a link
2. www. is (correctly) not rendered as a link

If you want any arbitrary email client to show that "string" as you want, you'll need to make an image out of it.

Component: Message Reader UI → MIME

Keywords: uiwanted

Product: Thunderbird → MailNews Core

Wayne Mery (:wsmwk)

Comment 8

9 years ago

That said, there may be room for improvement for the general case of www.foo, because we know that something starting with www must have 2 additional substrings to be a working URL.

Does this make sense, and can it be generalized as a requirement to mozTXTToHTMLConv?  

Is there a match amongst https://bugzilla.mozilla.org/buglist.cgi?f1=short_desc&list_id=12008542&short_desc=url%20www&bug_severity=major&bug_severity=normal&bug_severity=minor&bug_severity=enhancement&o1=nowordssubstr&resolution=---&resolution=INVALID&resolution=WONTFIX&classification=Client%20Software&classification=Components&query_format=advanced&f2=OP&short_desc_type=anywordssubstr&longdesc=www%20url%20&component=Networking&longdesc_type=allwordssubstr&product=Core ?
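
As a sketch of what that requirement could look like when scanning text (written in Python for brevity, and not taken from mozTXTToHTMLConv), the hypothetical `WWW_HOST` pattern below only accepts "www" followed by at least two further dot-separated labels. The names `WWW_HOST` and `find_www_hosts` are assumptions for this example.

```python
import re

# "www" followed by at least two more non-empty labels, e.g. www.example.com.
WWW_HOST = re.compile(r"\bwww(?:\.[A-Za-z0-9-]+){2,}")


def find_www_hosts(text: str) -> list:
    """Return www-prefixed hosts that have at least two labels after "www"."""
    return [m.group(0) for m in WWW_HOST.finditer(text)]


sample = "Visit www.aarhus or www.example.com. Plain www. is ignored too."
hosts = find_www_hosts(sample)
print(hosts)
```

Here "www.aarhus" and the bare "www." are skipped, while "www.example.com" is found without its trailing full stop.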

Severity: normal → minor

Status: RESOLVED → REOPENED

Ever confirmed: true

Flags: needinfo?(bugzilla2007)

Flags: needinfo?(Pidgeot18)

Resolution: INVALID → ---

Summary: mail readers url interpreter is bad → www.xxx can't be a valid URL

Magnus Melin [:mkmelin]

Comment 9

9 years ago

I think that makes sense. The URL can technically exist (on a local level), but that is only theoretical and not useful in general.

Flags: needinfo?(bugzilla2007)

Flags: needinfo?(Pidgeot18)

BMO Automation

Updated

2 years ago

Severity: minor → S4

Categories

(MailNews Core :: MIME, defect)

Tracking

(Not tracked)

People

(Reporter: brunis, Unassigned)

Security

(public)
