Closed Bug 1176280 Opened 5 years ago Closed 5 years ago

Links should be clickable in the text entries area

Categories

(Hello (Loop) :: Client, defect, P1)

defect
Points:
5

Tracking

(firefox41 fixed, firefox42 fixed)

RESOLVED FIXED
mozilla42
Iteration:
42.3 - Aug 10
Tracking Status
firefox41 --- fixed
firefox42 --- fixed

People

(Reporter: standard8, Assigned: dmose)

References

Details

(Whiteboard: [chat])

Attachments

(3 files, 5 obsolete files)

Currently, if the user pastes in a link to the text entry area, then it isn't shown as clickable.

It would be nice if we could fix that in some manner.
Flags: firefox-backlog+
Rank: 13
Priority: -- → P1
Assignee: nobody → dmose
This turns out to be a little more interesting than one might think.  The common way to do it is to use a regexp, which is edge-casey and generally asking for trouble.  See <http://stackoverflow.com/questions/37684/how-to-replace-plain-urls-with-links/21925491#21925491> for details on that.

After poking around a bit, Autolinker.js, as suggested by the author of that post, looks pretty good.

It has lots of contributors and forks, compatible license (MIT), no dependencies, doesn't need a CommonJS loader, seems to be regularly maintained, has a big test suite (substantially more comprehensive than the other libraries I looked it, including all the react ones I could find).  There are a few downsides as well:

* The minified version is 15K, which seems like a lot to swallow for the link-clicker, particularly on mobile.

* It's going to need at least one i18n fix: https://github.com/gregjacobs/Autolinker.js/issues/89

Whatever linkification solution we go with, we're also going to need URL sanitization too.  <https://github.com/gregjacobs/Autolinker.js/issues/23> suggests using JsHtmlSanitizer, but that seems like a pretty big hammer.  More on that in a bit.
Given that React already sanitizes the HTML for us:

<cite>
All string values are sanitized before inserted into the DOM (except for CSS styles which is a known wontfix issue for IE8).
</cite>

(from the beginning of https://github.com/facebook/react/issues/3473 written by one of the core devs)

I think all we really need to sanitize here might be the URLs.  

https://github.com/facebook/react/issues/3473#issuecomment-91349525 has some discussion of this, and I'm wondering if there's anything else we could/should do outside of whitelisting the http and https URL schemes.

One wrinkle is that this code will be running in two different contexts:

* with the semi-elevated about:loop* privs on the embedded desktop conversation window
* with normal DOM privs in the link-clicker webapp

abr, I'd love your thoughts on this, as well as any pointers as to other security experts/sources that it would be good to solicit opinions on href sanitization from.
Flags: needinfo?(adam)
I have long ago concluded that Matt knows something about pretty much everything, so I'm asking him to weigh in if he has cycles. I'm also tagging Valentin, as I believe he's been doing some URL parsing work recently. (If either of you don't have time to look into this, feel free to clear your ni flag without comment).

For my own position: I'm concerned that none of these seem to have any hope of operating properly on non-ASCII URLs. On the other hand, as Autolinker has two kitchen sinks and a stove in it and still can't get i18n right, I shudder to think how large a library would have to be in order to get i18n correct. While the "UTF8 in the middle of a path" problem is probably easy to fix, there are some much trickier problems in here.

In particular, I think it would be a bit too provincial to choose an approach that couldn't at least handle URLs like these:

- http://www.投资者关系.中国/en/showcases.php?code=000063.SZ
- http://الوزيرالاول.الجزائر/

I suspect we can probably go out the door with a simple approach that only works with strings starting with "http://" or "https://", and then try to get smarter after we've had more time to investigate approaches. Basically, we would look for either of those two strings to find a start of a URL, scan until we find a character that violates URL ABNF [1] (considering all UTF-8 characters not to be a violation regardless of where they appear), and us that as the string we're linkifying.

____
[1] By which I mean the ABNF defined here: https://tools.ietf.org/html/rfc3986#appendix-A
Flags: needinfo?(valentin.gosu)
Flags: needinfo?(linuxwolf)
Flags: needinfo?(adam)
(In reply to Adam Roach [:abr] from comment #3)

> I suspect we can probably go out the door with a simple approach that only
> works with strings starting with "http://" or "https://", and then try to
> get smarter after we've had more time to investigate approaches. 


By this, I assume you're suggesting go out the door with only that approach, not that approach in addition to (eg) Autolinker?

That seems like a reasonable starting point to me, though it doesn't handle the use case of heuristically linkifying stuff that doesn't start with a scheme, as requested in the design doc for text chat: https://docs.google.com/document/d/1HwYbM4Eru3rHVT3MStO8eeTunJ7pN5PYBingtmwe-bA/edit# 
on page 19.

> we would look for either of those two strings to find a start of a URL, scan
> until we find a character that violates URL ABNF [1] (considering all UTF-8
> characters not to be a violation regardless of where they appear), and us
> that as the string we're linkifying.

As far as validating the URL after we've found it, I wonder if it would make sense to let the browser handle it for us, eg 

http://www.joezimjs.com/javascript/the-lazy-mans-url-parsing/

or

https://developer.mozilla.org/en-US/docs/Web/API/URL
Flags: needinfo?(adam)
(In reply to Dan Mosedale (:dmose) - needinfo? me for response from comment #4)
> By this, I assume you're suggesting go out the door with only that approach,
> not that approach in addition to (eg) Autolinker?

Right, at least initially.

> That seems like a reasonable starting point to me, though it doesn't handle
> the use case of heuristically linkifying stuff that doesn't start with a
> scheme, as requested in the design doc for text chat:
> https://docs.google.com/document/d/1HwYbM4Eru3rHVT3MStO8eeTunJ7pN5PYBingtmwe-
> bA/edit# 
> on page 19.

This would be a PM trade-off evaluation, then. I agree with you that 15kB is a silly amount of code to send for this function (especially when it's not even complete enough to handle the two URLs I indicate above). Sometimes the UX has to be modified based on how what the incremental cost is for a given behavior.

> As far as validating the URL after we've found it, I wonder if it would make
> sense to let the browser handle it for us

I think you misunderstand. The point of scanning against the ABNF isn't to make sure the URL is valid (although that will be a side effect): it's to make an intelligent decision about where the URL ends. I suppose you could do something like scan the string "check out https://www.yahoo.com/featured/ and tell me what you think" by checking incrementally longer strings to see where a URL stops being valid, but this would entail something like checking:

parser.href = 'https://w' - valid
parser.href = 'https://ww' - valid
parser.href = 'https://www' - valid
parser.href = 'https://www.' - valid
parser.href = 'https://www.y' - valid
parser.href = 'https://www.ya' - valid
parser.href = 'https://www.yah' - valid
parser.href = 'https://www.yaho' - valid
parser.href = 'https://www.yahoo' - valid
parser.href = 'https://www.yahoo.' - valid
parser.href = 'https://www.yahoo.c' - valid
parser.href = 'https://www.yahoo.co' - valid
parser.href = 'https://www.yahoo.com' - valid
parser.href = 'https://www.yahoo.com/' - valid
parser.href = 'https://www.yahoo.com/f' - valid
parser.href = 'https://www.yahoo.com/fe' - valid
parser.href = 'https://www.yahoo.com/fea' - valid
parser.href = 'https://www.yahoo.com/feat' - valid
parser.href = 'https://www.yahoo.com/featu' - valid
parser.href = 'https://www.yahoo.com/featur' - valid
parser.href = 'https://www.yahoo.com/feature' - valid
parser.href = 'https://www.yahoo.com/featured' - valid
parser.href = 'https://www.yahoo.com/featured/' - valid
parser.href = 'https://www.yahoo.com/featured/ ' - invalid

And then deciding that the string must be "https://www.yahoo.com/featured/", since incorporating the next character makes it break. Even at that, I think that the HREF parser will percent-escape invalid characters and consider the resulting string to be valid, making it impossible to use for this purpose (unless you were doing something like checking whether something got percent encoded when you passed it through the parser, but I believe that breaks on UTF8).
Flags: needinfo?(adam)
This sounds like a nice idea, but I wouldn't go as far as trying to `linkify` URLs after every keypress. Parsing URLs can be fairly slow, so I suggest doing it only when a message is submitted. If at all.

A few thoughts:
- our URL parsing requires a scheme, so it wouldn't work out of the box for the ones starting with www.
- URL parsing generally does UTF/percent encoding and decoding thus changing the URL
- it might be possible to use nsIURIFixup to validate the URL, but that might be slow (I don't know if slower than a library)
- from what I've seen, a lot of URL linkifiers break with certain unicode characters (or even unexpected ASCII ones), so if we decide to pursue this feature I suggest we go for a simple and fast linker library, and not worry too much about covering all of the corner cases.
Flags: needinfo?(valentin.gosu)
(In reply to Adam Roach [:abr] from comment #3)
> I have long ago concluded that Matt knows something about pretty much
> everything, so I'm asking him to weigh in if he has cycles. I'm also tagging
> Valentin, as I believe he's been doing some URL parsing work recently. (If
> either of you don't have time to look into this, feel free to clear your ni
> flag without comment).
> 

I got strange looks when I first read this, as I literally LOL'd on a bus (-:

> For my own position: I'm concerned that none of these seem to have any hope
> of operating properly on non-ASCII URLs. On the other hand, as Autolinker
> has two kitchen sinks and a stove in it and still can't get i18n right, I
> shudder to think how large a library would have to be in order to get i18n
> correct. While the "UTF8 in the middle of a path" problem is probably easy
> to fix, there are some much trickier problems in here.
> 

As someone that's tried to do the regexp approach multiple times, I can tell you it gets very messy very very quickly.  Autolinker has always felt like it's too heavy to me, so I've never tried to use it.

> In particular, I think it would be a bit too provincial to choose an
> approach that couldn't at least handle URLs like these:
> 
> - http://www.投资者关系.中国/en/showcases.php?code=000063.SZ
> - http://الوزيرالاول.الجزائر/
> 
> I suspect we can probably go out the door with a simple approach that only
> works with strings starting with "http://" or "https://", and then try to
> get smarter after we've had more time to investigate approaches. Basically,
> we would look for either of those two strings to find a start of a URL, scan
> until we find a character that violates URL ABNF [1] (considering all UTF-8
> characters not to be a violation regardless of where they appear), and us
> that as the string we're linkifying.
> 

This would work for the first release targeting web geeks, but most people will consider linkifying broken.  They will expect to get links out of www.example.net, nobbiton.example, and often even nobody@example.com.

What I've seen is that you don't have to be perfect, or really approach perfect; it just needs to get close.  I will admit that I have continued to use regex, at least for a first-pass.

1) Use a regex to find candidates that require closer scrutiny.

2) Then try to rule it out based on a handful of things from RFC3896 that seem important (that list has evolved over time, but I don't have mine handy )-: ).

It's horrible, it's not even close perfect, but it seems to have satisfied enough people enough of the time.  It will also be code that gets tweaked with almost every release, and that can't be helped.

Also, I don't try to do it as the user types.  Chasing the edge cases of editing (e.g., select + insert-replace + backspace + backspace) are spectacularly not fun.


Hope this helps.
Flags: needinfo?(linuxwolf)
(In reply to Valentin Gosu [:valentin] from comment #6)
> This sounds like a nice idea, but I wouldn't go as far as trying to
> `linkify` URLs after every keypress. Parsing URLs can be fairly slow, so I
> suggest doing it only when a message is submitted. If at all.

I'm apparently doing a surprisingly bad job of expressing my thoughts using words today.

I wasn't proposing doing an evaluation on every keystroke; in fact, I wasn't proposing using the existing URL parser to do this job at all. The example I gave above was intended to demonstrate that the URL parser -- which is, as far as I know, only designed to work with URLs rather than URLs embedded in other text -- is a poor fit for the task.

_____

I'll start again from the beginning.

To linkify a URL, there are basically three prerequisite tasks:

1) Detect that a URL is present
2) If a URL is present, find the beginning of the URL
3) If a URL is present, find the end of the URL

In the general case (dealing with URLs that have no schemes), all three steps are hard. If you only deal with URLs that have two known schemes, and require those schemes to actually be present, then those first *two* steps are trivial. However, if we go that path, we still need to solve the somewhat tricky problem of finding the *end* of the URL.

That's the question to answer here: given an arbitrary string, how do you find the end of a URL? I have a proposal above, but you need to internalize the problem -- finding the end of a URL in an arbitrary string -- before it makes any sense.

Maybe there's something about our URL parser that makes it well suited for this task. My experience using it and the developer.m.o page don't give any obvious ways for finding the end of a URL in an arbitrary block of text, but it's completely possible that I've overlooked something.

_______

Now, that's all relevant if we try to come up with something ourselves. That would be driven by a desire to ship something that works with UTF8 in general (I'm comfortable that what I propose above would handle UTF8 rather completely), and to ship something that's not quite as large as Autolinker.

Clearly, we could also bite the bullet and just ship Autolinker. For the bloat, we'd get a far more aggressive link detection algorithm, but we'd also incorporate known weak (and possibly non-existent) UTF8 support. I may be putting a bit too much import on the "global" part of our principle that "The Internet is a global public resource that must remain open and accessible," but the thought of leaving out IDN websites and websites with non-Latin characters in their paths at least seems reason for rumination.

Of course, we could also bring Autolinker in, fix the i18n problems it has, and then submit pull requests upstream -- but I don't know where we'd get the resources for that.

_____

And that takes us to...

(In reply to Matthew Miller [:linuxwolf] from comment #7)
> This would work for the first release targeting web geeks, but most people
> will consider linkifying broken.  They will expect to get links out of
> www.example.net, nobbiton.example, and often even nobody@example.com.

Which, the more I think about it, the more I'm inclined to agree, but it requires us to keep up with the explosion of TLDs. I wonder how hard it would be to update Autolinker to automatically grab http://data.iana.org/TLD/tlds-alpha-by-domain.txt as part of its operation.
(In reply to Adam Roach [:abr] from comment #8)
> I wonder how hard it
> would be to update Autolinker to automatically grab
> http://data.iana.org/TLD/tlds-alpha-by-domain.txt as part of its operation.

Actually, scratch that -- if Hello ever gets successful, that would be an unfair load to impose on IANA. We'd need to mirror it if we intend to use it.
Not sure if it helps, but Diego Perini's regex over at https://mathiasbynens.be/demo/url-regex seems like a good enough fit here to allow for fast, relatively safe and UTF-8 friendly URL detection.
If we want to support more than _just_ the http/ https protocol, or even the ability to omit the protocol, we can adjust the regex to make that work.
(In reply to Mike de Boer [:mikedeboer] from comment #10)
> Not sure if it helps, but Diego Perini's regex over at
> https://mathiasbynens.be/demo/url-regex seems like a good enough fit here to
> allow for fast, relatively safe and UTF-8 friendly URL detection.

That's for URL validation, not URL detection.

The use of regexes for detection has a problem: most of them that I've found in my searches contain productions like "(([0-9a-z_-]+\.)+", which can turn into O(n^2) for certain character sequences. See https://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS and http://stackoverflow.com/questions/12841970/how-can-i-recognize-an-evil-regex

Actual strings that trigger the DOS are discussed in this thread: https://gist.github.com/gruber/8891611#comment-1310172
Whiteboard: [chat]
Points: --- → 5
Iteration: --- → 42.1 - Jul 13
After further discussion with Adam, the current plan is to suck up the Autolinker issues, particularly if we can get at least basic i18n (aka UTF8 in URL) support going.  Everything else seems like it's got nastier problems.
Adam points out that it'll be a good idea to wrap this API in a generic wrapper so that we can swap it out for another library later, should such a thing come along.
This is an impressively deep rathole.

To help clarify my thinking, here are the things I think we absolutely must have these two things:

a) linkifies URLs with schemes http: and https: (and no other schemes)
b) secure against HTML injection (typed HTML on one side can't be injected or XSS other side)
c) don't add a large amount of bloat to the link-clicker (particularly on mobile)

Doing (or re-using) a regexp-based approach inside React should get the above two things pretty cheaply, because React itself does a good job of refusing to stuff unsanitized HTML into the DOM unless you make it really explicit that you're doing it intentionally.

We really want:

d) linkifies things that looks like URLs without schemes (eg linkifies "I usually use yahoo.com for my searches")

This is harder, because it requires an approach that knows about TLDs.  None of the React libraries I've found do this.  

We could use Autolinkify.js or linkifyjs v2.0.  Both are big (~15k minified/5k gzipped minified), and neither one include an HTML sanitizer, which would add even more bloat and stuff to keep up-to-date, since React wouldn't be helping us out here.

e) have a reasonable amount of automated testing

A number of the libraries have a not totally trivial amount of unit tests.  Autolinkify looks like the only one with good coverage.

f) supports URLs with UTF8 (some subset of international domains)

I suspect (but haven't tested, and might well be wrong!) that we could fix up whatever version we get to do this.

g) supports URLs with IDN (most/all international domains)

The one thing that even _tries_ to do this at all is <https://github.com/webmodules/urlregexp>.  It generates a 24k minified RegExp, _and_ requires it's own regexp engine (XRegeX) in addition.  There are no automated tests at all.  This might not to be too hard to fix if/when we get a version of React that uses babel and turns on ES6 features, since ES6 has Unicode Regexp support.

h) Denial-of-Service resistant (not vulnerable to O(n^k) loops in the browser that often crop up in complex regexps).  

Unlike pretty much everything else, linkifyjs2.0 appears to be almost the only thing that has very little RegExp use.  All the rest is pretty much entirely regexp-y.
So what I propose to do for this bug is just try and handle http:// and https:// links, using a React component for security.  I'm currently looking at react-autolink and react-autolink-text, though those both seem to want CommonJS loaders.  Hmmm.
Which is to say, go for a, b, and c now; everything else later.
For more on Unicode in JS, see <https://mathiasbynens.be/notes/es6-unicode-regex>.  Good times!
Again, I think the solution _should_ be as simple as a modified version of https://gist.github.com/dperini/729294:

re_weburl = /(?:(?:https?|ftp):\/\/)?(?:\S+(?::\S*)?@)?(?:(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,})).?)(?::\d{2,5})?(?:[/?#]\S*)?/gi

... which doesn't match from start to end of a string and makes the protocol part optional.

This results in:

"lalala a www.google.com/ loeloeloe adad".match(re_weburl)
  --> [ "www.google.com/" ]
"lalala a www.google.com/ or http://उदाहरण.परीक्षा loeloeloe adad".match(re_weburl)
  --> [ "www.google.com/", "http://उदाहरण.परीक्षा " ]
(In reply to Mike de Boer [:mikedeboer] from comment #18)
> Again, I think the solution _should_ be as simple as a modified version of
> https://gist.github.com/dperini/729294:
> 
> re_weburl =
> /(?:(?:https?|ftp):\/\/)?(?:\S+(?::\S*)?@)?(?:(?!(?:10|127)(?:\.\d{1,
> 3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-
> 1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-
> 4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\u00a1-
> \uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-
> z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,})).?)(?::\d{2,5})?(?:[/
> ?#]\S*)?/gi
> 
> ... which doesn't match from start to end of a string and makes the protocol
> part optional.
> 
> This results in:
> 
> "lalala a www.google.com/ loeloeloe adad".match(re_weburl)
>   --> [ "www.google.com/" ]
> "lalala a www.google.com/ or http://उदाहरण.परीक्षा loeloeloe
> adad".match(re_weburl)
>   --> [ "www.google.com/", "http://उदाहरण.परीक्षा " ]

Off the top of my head, we have a couple of issues:

"The problem is that every time I call widget.frob(), it hangs.".match(re_weburl)
--> [ 'widget.frob(' ] # Oh, that's going to get annoying fast.

"ƪ(‾.‾“)┐".match(re_weburl)  # Note: this is a real emoticon found in the wild.
--> [ '‾.‾“)' ] # Probably violates user expectations


There's also a bug here, but that's fixable:
"I found it at amazon.com last time I went looking".match(re_weburl)
--> [ 'amazon.com ' ] # Note trailing space


The real problem is that this is going to match way, way too many things. There are really only two viable ways to do this: either (a) look for and require a scheme, or (b) have a complete list of domains and incorporate them into the matching. Since we've decided that (a) isn't going to fly, we have to fall back to (b).
I totally agree that this approach is liberal and match plenty of strings that "shouldn't" be matched.

That said, whether my (and others in this bug, I think, have the belief that users are somewhat tolerant here).

abr, you've said "way, way too many things".  The question in my mind is whether that "too many" is higher or lower than what users on the whole will perceive as reasonable. It's hard to imagine how to resolve this without some user testing.   My gut is that being liberal and shipping sooner is better than being conservative and shipping later.  

FWIW, my work-in-progress is at <https://github.com/dmose/gecko-dev/pull/9/files> if you want to see what it feels like so far.
s/whether my (and others in this bug, I think,/I, (and others in this bug, I think)/
To be clear, what I'm asking for isn't all that complicated. Take whatever regex you want to use and drop in something that matches "all valid TLDs" in the part of the expression that is intended to match the TLD portion of a URL. Granted, with the ICANN madness, this is bigger than it should be, but not unreasonably so. If I grab the current list of TLDs, munge the punycode into UTF8, then UTF16, then escape it, and then pass this whole mess through something that generates a minimal regex, you get:

(((?:c(?:[cdgkmnvwxz]|o(?:n(?:s(?:truction|ulting)|(?:tractor|do)s)|m(?:m(?:unity|bank)|p(?:uter|any))?|u(?:(?:pon|rse)s|ntry)|(?:l(?:leg|ogn)|ffe)e|o(?:[lp]|king)|rsica|ach|des)?|a(?:[bl]|r(?:e(?:ers?)?|avan|tier|d?s)|n(?:cerresearch|on)|p(?:etown|ital)|s(?:[ah]|ino)|t(?:ering)?|m(?:era|p)|fe)?|h(?:r(?:istmas|ome)|a(?:nnel|t)|urch|eap|loe)?|l(?:o(?:thing|ud)|i(?:nic|ck)|eaning|aims|ub)?|r(?:edit(?:card)?|(?:uise)?s|icket|own)?|i(?:t(?:ic|y)|sco)?|y(?:(?:mr|o)u)?|e(?:nter|rn|o)|u(?:isinella)?|f[ad]?|b[an])|s(?:[bdgjlmrvxz]|c(?:[ab]|h(?:o(?:larships|ol)|midt|warz|ule)|ience|o[rt])?|a(?:ndvik(?:coromant)?|arland|msung|kura|le|rl|xo|p)?|o(?:l(?:utions|ar)|c(?:cer|ial)|ftware|n?y|hu)?|u(?:pp(?:l(?:ies|y)|ort)|r(?:gery|f)|zuki|cks)?|t(?:a(?:rhub|toil)|ud(?:io|y)|yle)?|h(?:o(?:es|w)|iksha|riram)?|p(?:readbetting|iegel|ace)|e(?:rvices|ner|xy?|at|w)?|k(?:y(?:pe)?|i)?|y(?:stems|dney)?|i(?:ngles|te)?|w(?:atch|iss)|n(?:cf)?)|b(?:[dfgjstvwy]|a(?:r(?:c(?:lay(?:card|s)|elona)|gains)?|n[dk]|uhaus|yern)?|r(?:o(?:th|k)er|idgestone|adesco|ussels)?|u(?:ild(?:ers)?|dapest|siness|zz)|l(?:ack(?:friday)?|oomberg|ue)|i(?:[doz]|(?:bl|k)e|ngo?)?|e(?:ntley|rlin|er|st)?|o(?:utique|ats|nd|o)?|n(?:pparibas|l)?|b(?:va|c)?|h(?:arti)?|mw?|zh?|cn)|a(?:[ow]|c(?:c(?:ountants?|enture)|t(?:ive|or)|ademy)?|i(?:r(?:force|tel)|g)?|b(?:b(?:ott)?|ogado)|u(?:ction|tos?|dio)?|l(?:lfinanz|sace)?|s(?:sociates|ia)?|p(?:artments|p)|r(?:chi|my|pa)?|(?:msterda)?m|q(?:uarelle)?|t(?:torney)?|d(?:ult|s)?|n(?:droid)?|e(?:ro|g)?|g(?:ency)?|z(?:ure)?|fl?|xa?)|m(?:[cdghklnpqrsvwxyz]|o(?:n(?:tblanc|ash|ey)|v(?:i(?:star|e))?|r(?:tgage|mon)|torcycles|scow|bi|da|e)?|a(?:r(?:ket(?:ing|s)?|riott)|n(?:agement|go)|i(?:son|f)|drid)?|e(?:m(?:orial|e)|lbourne|dia|nu?|et)?|i(?:(?:am|n)i|crosoft|l)|t(?:pc|n)?|u(?:seum)?|ma?|ba)|p(?:[efgkmnstwy]|r(?:o(?:d(?:uctions)?|pert(?:ies|y)|f)?|axi|ess)?|h(?:oto(?:graphy|s)?|armacy|ilips|ysio)?|a(?:r(?:t(?:(?:ner)?s|y)|is)|nerai|ge)?|i(?:c(?:t(?:ures|et)|s)|aget|zza|nk)|l(?:u(?:mbing|s)|a(?:ce|y))?|o(?:ker|hl|rn|st)|ub)|g(?:[fhnpqstwy]|o(?:[pv]|l(?:d(?:point)?|f)|o(?:g(?:le)?)?)|r(?:a(?:phic|ti)s|een|ipe)?|a(?:l(?:lery)?|rden|me)?|u(?:i(?:tars|de)|ge|ru)?|l(?:ob(?:al|o)|ass|e)?|e(?:nt(?:ing)?)?|i(?:fts?|ves)?|m(?:[ox]|ail)?|b(?:iz)?|g(?:ee)?|dn?)|t(?:[cdfgjklmntvwz]|e(?:ch(?:nology)?|l(?:efonica)?|masek|nnis|am)|o(?:(?:ol|ur|y)s|[dr]ay|shiba|kyo|wn|p)?|r(?:a(?:d(?:ing|e)|ining|vel)|ust)?|i(?:(?:cket|p)s|r(?:es|ol)|enda)|a(?:t(?:too|ar)|ipei|xi?)|h(?:eater|d)?|ui)|f(?:[jm]|i(?:nanc(?:ial|e)|sh(?:ing)?|t(?:ness)?|rmdale|lm)?|o(?:r(?:sale|ex|um)|o(?:tball)?|undation)?|l(?:o(?:rist|wers)|smidth|ights|y)|a(?:i(?:th|l)|shion|ns?|rm)|u(?:rniture|tbol|nd)|r(?:ogans|l)?|(?:eedbac)?k|yi)|d(?:[jmz]|e(?:nt(?:ist|al)|si(?:gn)?|livery|mocrat|gree|als|v)?|i(?:(?:scoun|e)t|rect(?:ory)?|amonds|gital)|o(?:(?:main|c)s|wnload|osan|ha|g)?|a(?:[dy]|t(?:ing|sun|e)|bur|nce)|(?:cl)?k|urban|rive|vag|np)|e(?:[ceg]|n(?:gineer(?:ing)?|terprises|ergy)|x(?:p(?:osed|ress|ert)|change)|u(?:rovision|s)?|ve(?:rbank|nts)|(?:quipmen)?t|du(?:cation)?|m(?:erck|ail)|s(?:tate|q)?|a(?:rth|t)|r(?:ni)?|pson)|l(?:[bckrvy]|i(?:m(?:ited|o)|ghting|[fv]e|aison|dl|nk)?|a(?:t(?:robe)?|w(?:yer)?|caixa|salle|nd)?|o(?:tt[eo]|ans?|ndon|ve|l)|u(?:x(?:ury|e)|pin)?|e(?:clerc|ase|gal)|t(?:da)?|d?s|gbt)|r(?:e(?:p(?:ublican|air|ort)|n(?:t(?:als)?)?|s(?:tauran)?t|alt(?:or|y)|d(?:stone)?|i(?:sen?|t)|views?|cipes|hab)?|i(?:[op]|co?h)|o(?:cks|deo)?|u(?:hr|n)?|s(?:vp)?|acing|yukyu|w)|h(?:[kmnrtu]|o(?:l(?:dings|iday)|t(?:eles|mail)|me(?:depot|s)|st(?:ing)?|[ru]se|ckey|nda|w)|e(?:r(?:mes|e)|althcare|lp)|a(?:mburg|ngout|us)|i(?:tachi|phop|v)|sbc)|i(?:[deloqs]|n(?:[gk]|(?:vestment|dustrie)s|t(?:ernational)?|s(?:titut|ur)e|f(?:initi|o))?|m(?:mo(?:bilien)?)?|(?:ine)?t|c(?:bc|u)|r(?:ish)?|[bf]m|wc)|v(?:[cgu]|i(?:s(?:ta(?:print)?|ion)|(?:aje|lla)s|deo)?|e(?:(?:nture|ga)s|rsicherung|t)?|o(?:t(?:[eo]|ing)|yage|dka)|(?:laandere)?n|a(?:cations)?)|w(?:[fs]|e(?:b(?:site|cam)|d(?:ding)?|ir)|i(?:n(?:dows)?|lliamhill|en|ki)|a(?:l(?:ter|es)|tch|ng)|or(?:ks?|ld)|hoswho|t[cf]|me)|n(?:[flopuz]|e(?:t(?:(?:ban|wor)k)?|ustar|ws?|xus|c)?|a(?:goya|dex|me|vy)?|i(?:ssan|nja|co)?|r[aw]?|go?|y?c|hk|tt)|o(?:r(?:a(?:cl|ng)e|g(?:anic)?)|(?:(?:tsu|sa)k|kinaw)a|n(?:[eg]|l(?:ine)?)|m(?:ega)?|ffice|oo|vh)|k(?:[eghmpwz]|i(?:tchen|wi|m)?|o(?:matsu|eln)|(?:aufe)?n|r(?:e?d)?|y(?:oto)?|ddi)|j(?:o(?:b(?:urg|s))?|e(?:welry|tzt)?|p(?:rs)?|l[cl]|uegos|ava|cb|m)|y(?:[et]|o(?:(?:koham|g)a|dobashi|utube)|a(?:chts|ndex))|u(?:[agksyz]|n(?:iversity|o)|ol)|x(?:(?:(?:er|b)o|x)x|in|yz)|z(?:[amw]|uerich|one|ip)|q(?:uebec|pon|a)))|\u4F5B\u5C71|\u6148\u5584|\u96C6\u56E2|\u5728\u7EBF|\uD55C\uAD6D|\u09AD\u09BE\u09B0\u09A4|\u516B\u5366|\u0645\u0648\u0642\u0639|\u516C\u76CA|\u516C\u53F8|\u79FB\u52A8|\u6211\u7231\u4F60|\u043C\u043E\u0441\u043A\u0432\u0430|\u049B\u0430\u0437|\u043E\u043D\u043B\u0430\u0439\u043D|\u0441\u0430\u0439\u0442|\u0441\u0440\u0431|\u0431\u0435\u043B|\u65F6\u5C1A|\u6DE1\u9A6C\u9521|\u043E\u0440\u0433|\uC0BC\uC131|\u0B9A\u0BBF\u0B99\u0BCD\u0B95\u0BAA\u0BCD\u0BAA\u0BC2\u0BB0\u0BCD|\u5546\u6807|\u5546\u5E97|\u5546\u57CE|\u0434\u0435\u0442\u0438|\u043C\u043A\u0434|\u5DE5\u884C|\u4E2D\u6587\u7F51|\u4E2D\u4FE1|\u4E2D\u56FD|\u4E2D\u570B|\u5A31\u4E50|\u8C37\u6B4C|\u0C2D\u0C3E\u0C30\u0C24\u0C4D|\u0DBD\u0D82\u0D9A\u0DCF|\u0AAD\u0ABE\u0AB0\u0AA4|\u092D\u093E\u0930\u0924|\u7F51\u5E97|\u0938\u0902\u0917\u0920\u0928|\u9910\u5385|\u7F51\u7EDC|\u0443\u043A\u0440|\u9999\u6E2F|\u98DE\u5229\u6D66|\u53F0\u6E7E|\u53F0\u7063|\u624B\u673A|\u043C\u043E\u043D|\u0627\u0644\u062C\u0632\u0627\u0626\u0631|\u0639\u0645\u0627\u0646|\u0627\u06CC\u0631\u0627\u0646|\u0627\u0645\u0627\u0631\u0627\u062A|\u0628\u0627\u0632\u0627\u0631|\u0627\u0644\u0627\u0631\u062F\u0646|\u0628\u06BE\u0627\u0631\u062A|\u0627\u0644\u0645\u063A\u0631\u0628|\u0627\u0644\u0633\u0639\u0648\u062F\u064A\u0629|\u0633\u0648\u062F\u0627\u0646|\u0645\u0644\u064A\u0633\u064A\u0627|\u653F\u5E9C|\u0634\u0628\u0643\u0629|\u10D2\u10D4|\u673A\u6784|\u7EC4\u7EC7\u673A\u6784|\u5065\u5EB7|\u0E44\u0E17\u0E22|\u0633\u0648\u0631\u064A\u0629|\u0440\u0443\u0441|\u0440\u0444|\u062A\u0648\u0646\u0633|\u307F\u3093\u306A|\u30B0\u30FC\u30B0\u30EB|\u4E16\u754C|\u0A2D\u0A3E\u0A30\u0A24|\u7F51\u5740|\u6E38\u620F|verm\u00F6gensberater|verm\u00F6gensberatung|\u4F01\u4E1A|\u4FE1\u606F|\u0645\u0635\u0631|\u0642\u0637\u0631|\u5E7F\u4E1C|\u0B87\u0BB2\u0B99\u0BCD\u0B95\u0BC8|\u0B87\u0BA8\u0BCD\u0BA4\u0BBF\u0BAF\u0BBE|\u0570\u0561\u0575|\u65B0\u52A0\u5761|\u0641\u0644\u0633\u0637\u064A\u0646|\u653F\u52A1)

For example, you can drop this into Mike's proposed regex like this, although I'll note that it suffers the same bug I point out above that it usually pulls in one more trailing character than it should:

var re_weburl = /(?:(?:https?|ftp):\/\/)?(?:\S+(?::\S*)?@)?(?:(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)*(?:\.(((?:c(?:[cdgkmnvwxz]|o(?:n(?:s(?:truction|ulting)|(?:tractor|do)s)|m(?:m(?:unity|bank)|p(?:uter|any))?|u(?:(?:pon|rse)s|ntry)|(?:l(?:leg|ogn)|ffe)e|o(?:[lp]|king)|rsica|ach|des)?|a(?:[bl]|r(?:e(?:ers?)?|avan|tier|d?s)|n(?:cerresearch|on)|p(?:etown|ital)|s(?:[ah]|ino)|t(?:ering)?|m(?:era|p)|fe)?|h(?:r(?:istmas|ome)|a(?:nnel|t)|urch|eap|loe)?|l(?:o(?:thing|ud)|i(?:nic|ck)|eaning|aims|ub)?|r(?:edit(?:card)?|(?:uise)?s|icket|own)?|i(?:t(?:ic|y)|sco)?|y(?:(?:mr|o)u)?|e(?:nter|rn|o)|u(?:isinella)?|f[ad]?|b[an])|s(?:[bdgjlmrvxz]|c(?:[ab]|h(?:o(?:larships|ol)|midt|warz|ule)|ience|o[rt])?|a(?:ndvik(?:coromant)?|arland|msung|kura|le|rl|xo|p)?|o(?:l(?:utions|ar)|c(?:cer|ial)|ftware|n?y|hu)?|u(?:pp(?:l(?:ies|y)|ort)|r(?:gery|f)|zuki|cks)?|t(?:a(?:rhub|toil)|ud(?:io|y)|yle)?|h(?:o(?:es|w)|iksha|riram)?|p(?:readbetting|iegel|ace)|e(?:rvices|ner|xy?|at|w)?|k(?:y(?:pe)?|i)?|y(?:stems|dney)?|i(?:ngles|te)?|w(?:atch|iss)|n(?:cf)?)|b(?:[dfgjstvwy]|a(?:r(?:c(?:lay(?:card|s)|elona)|gains)?|n[dk]|uhaus|yern)?|r(?:o(?:th|k)er|idgestone|adesco|ussels)?|u(?:ild(?:ers)?|dapest|siness|zz)|l(?:ack(?:friday)?|oomberg|ue)|i(?:[doz]|(?:bl|k)e|ngo?)?|e(?:ntley|rlin|er|st)?|o(?:utique|ats|nd|o)?|n(?:pparibas|l)?|b(?:va|c)?|h(?:arti)?|mw?|zh?|cn)|a(?:[ow]|c(?:c(?:ountants?|enture)|t(?:ive|or)|ademy)?|i(?:r(?:force|tel)|g)?|b(?:b(?:ott)?|ogado)|u(?:ction|tos?|dio)?|l(?:lfinanz|sace)?|s(?:sociates|ia)?|p(?:artments|p)|r(?:chi|my|pa)?|(?:msterda)?m|q(?:uarelle)?|t(?:torney)?|d(?:ult|s)?|n(?:droid)?|e(?:ro|g)?|g(?:ency)?|z(?:ure)?|fl?|xa?)|m(?:[cdghklnpqrsvwxyz]|o(?:n(?:tblanc|ash|ey)|v(?:i(?:star|e))?|r(?:tgage|mon)|torcycles|scow|bi|da|e)?|a(?:r(?:ket(?:ing|s)?|riott)|n(?:agement|go)|i(?:son|f)|drid)?|e(?:m(?:orial|e)|lbourne|dia|nu?|et)?|i(?:(?:am|n)i|crosoft|l)|t(?:pc|n)?|u(?:seum)?|ma?|ba)|p(?:[efgkmnstwy]|r(?:o(?:d(?:uctions)?|pert(?:ies|y)|f)?|axi|ess)?|h(?:oto(?:graphy|s)?|armacy|ilips|ysio)?|a(?:r(?:t(?:(?:ner)?s|y)|is)|nerai|ge)?|i(?:c(?:t(?:ures|et)|s)|aget|zza|nk)|l(?:u(?:mbing|s)|a(?:ce|y))?|o(?:ker|hl|rn|st)|ub)|g(?:[fhnpqstwy]|o(?:[pv]|l(?:d(?:point)?|f)|o(?:g(?:le)?)?)|r(?:a(?:phic|ti)s|een|ipe)?|a(?:l(?:lery)?|rden|me)?|u(?:i(?:tars|de)|ge|ru)?|l(?:ob(?:al|o)|ass|e)?|e(?:nt(?:ing)?)?|i(?:fts?|ves)?|m(?:[ox]|ail)?|b(?:iz)?|g(?:ee)?|dn?)|t(?:[cdfgjklmntvwz]|e(?:ch(?:nology)?|l(?:efonica)?|masek|nnis|am)|o(?:(?:ol|ur|y)s|[dr]ay|shiba|kyo|wn|p)?|r(?:a(?:d(?:ing|e)|ining|vel)|ust)?|i(?:(?:cket|p)s|r(?:es|ol)|enda)|a(?:t(?:too|ar)|ipei|xi?)|h(?:eater|d)?|ui)|f(?:[jm]|i(?:nanc(?:ial|e)|sh(?:ing)?|t(?:ness)?|rmdale|lm)?|o(?:r(?:sale|ex|um)|o(?:tball)?|undation)?|l(?:o(?:rist|wers)|smidth|ights|y)|a(?:i(?:th|l)|shion|ns?|rm)|u(?:rniture|tbol|nd)|r(?:ogans|l)?|(?:eedbac)?k|yi)|d(?:[jmz]|e(?:nt(?:ist|al)|si(?:gn)?|livery|mocrat|gree|als|v)?|i(?:(?:scoun|e)t|rect(?:ory)?|amonds|gital)|o(?:(?:main|c)s|wnload|osan|ha|g)?|a(?:[dy]|t(?:ing|sun|e)|bur|nce)|(?:cl)?k|urban|rive|vag|np)|e(?:[ceg]|n(?:gineer(?:ing)?|terprises|ergy)|x(?:p(?:osed|ress|ert)|change)|u(?:rovision|s)?|ve(?:rbank|nts)|(?:quipmen)?t|du(?:cation)?|m(?:erck|ail)|s(?:tate|q)?|a(?:rth|t)|r(?:ni)?|pson)|l(?:[bckrvy]|i(?:m(?:ited|o)|ghting|[fv]e|aison|dl|nk)?|a(?:t(?:robe)?|w(?:yer)?|caixa|salle|nd)?|o(?:tt[eo]|ans?|ndon|ve|l)|u(?:x(?:ury|e)|pin)?|e(?:clerc|ase|gal)|t(?:da)?|d?s|gbt)|r(?:e(?:p(?:ublican|air|ort)|n(?:t(?:als)?)?|s(?:tauran)?t|alt(?:or|y)|d(?:stone)?|i(?:sen?|t)|views?|cipes|hab)?|i(?:[op]|co?h)|o(?:cks|deo)?|u(?:hr|n)?|s(?:vp)?|acing|yukyu|w)|h(?:[kmnrtu]|o(?:l(?:dings|iday)|t(?:eles|mail)|me(?:depot|s)|st(?:ing)?|[ru]se|ckey|nda|w)|e(?:r(?:mes|e)|althcare|lp)|a(?:mburg|ngout|us)|i(?:tachi|phop|v)|sbc)|i(?:[deloqs]|n(?:[gk]|(?:vestment|dustrie)s|t(?:ernational)?|s(?:titut|ur)e|f(?:initi|o))?|m(?:mo(?:bilien)?)?|(?:ine)?t|c(?:bc|u)|r(?:ish)?|[bf]m|wc)|v(?:[cgu]|i(?:s(?:ta(?:print)?|ion)|(?:aje|lla)s|deo)?|e(?:(?:nture|ga)s|rsicherung|t)?|o(?:t(?:[eo]|ing)|yage|dka)|(?:laandere)?n|a(?:cations)?)|w(?:[fs]|e(?:b(?:site|cam)|d(?:ding)?|ir)|i(?:n(?:dows)?|lliamhill|en|ki)|a(?:l(?:ter|es)|tch|ng)|or(?:ks?|ld)|hoswho|t[cf]|me)|n(?:[flopuz]|e(?:t(?:(?:ban|wor)k)?|ustar|ws?|xus|c)?|a(?:goya|dex|me|vy)?|i(?:ssan|nja|co)?|r[aw]?|go?|y?c|hk|tt)|o(?:r(?:a(?:cl|ng)e|g(?:anic)?)|(?:(?:tsu|sa)k|kinaw)a|n(?:[eg]|l(?:ine)?)|m(?:ega)?|ffice|oo|vh)|k(?:[eghmpwz]|i(?:tchen|wi|m)?|o(?:matsu|eln)|(?:aufe)?n|r(?:e?d)?|y(?:oto)?|ddi)|j(?:o(?:b(?:urg|s))?|e(?:welry|tzt)?|p(?:rs)?|l[cl]|uegos|ava|cb|m)|y(?:[et]|o(?:(?:koham|g)a|dobashi|utube)|a(?:chts|ndex))|u(?:[agksyz]|n(?:iversity|o)|ol)|x(?:(?:(?:er|b)o|x)x|in|yz)|z(?:[amw]|uerich|one|ip)|q(?:uebec|pon|a)))|\u4F5B\u5C71|\u6148\u5584|\u96C6\u56E2|\u5728\u7EBF|\uD55C\uAD6D|\u09AD\u09BE\u09B0\u09A4|\u516B\u5366|\u0645\u0648\u0642\u0639|\u516C\u76CA|\u516C\u53F8|\u79FB\u52A8|\u6211\u7231\u4F60|\u043C\u043E\u0441\u043A\u0432\u0430|\u049B\u0430\u0437|\u043E\u043D\u043B\u0430\u0439\u043D|\u0441\u0430\u0439\u0442|\u0441\u0440\u0431|\u0431\u0435\u043B|\u65F6\u5C1A|\u6DE1\u9A6C\u9521|\u043E\u0440\u0433|\uC0BC\uC131|\u0B9A\u0BBF\u0B99\u0BCD\u0B95\u0BAA\u0BCD\u0BAA\u0BC2\u0BB0\u0BCD|\u5546\u6807|\u5546\u5E97|\u5546\u57CE|\u0434\u0435\u0442\u0438|\u043C\u043A\u0434|\u5DE5\u884C|\u4E2D\u6587\u7F51|\u4E2D\u4FE1|\u4E2D\u56FD|\u4E2D\u570B|\u5A31\u4E50|\u8C37\u6B4C|\u0C2D\u0C3E\u0C30\u0C24\u0C4D|\u0DBD\u0D82\u0D9A\u0DCF|\u0AAD\u0ABE\u0AB0\u0AA4|\u092D\u093E\u0930\u0924|\u7F51\u5E97|\u0938\u0902\u0917\u0920\u0928|\u9910\u5385|\u7F51\u7EDC|\u0443\u043A\u0440|\u9999\u6E2F|\u98DE\u5229\u6D66|\u53F0\u6E7E|\u53F0\u7063|\u624B\u673A|\u043C\u043E\u043D|\u0627\u0644\u062C\u0632\u0627\u0626\u0631|\u0639\u0645\u0627\u0646|\u0627\u06CC\u0631\u0627\u0646|\u0627\u0645\u0627\u0631\u0627\u062A|\u0628\u0627\u0632\u0627\u0631|\u0627\u0644\u0627\u0631\u062F\u0646|\u0628\u06BE\u0627\u0631\u062A|\u0627\u0644\u0645\u063A\u0631\u0628|\u0627\u0644\u0633\u0639\u0648\u062F\u064A\u0629|\u0633\u0648\u062F\u0627\u0646|\u0645\u0644\u064A\u0633\u064A\u0627|\u653F\u5E9C|\u0634\u0628\u0643\u0629|\u10D2\u10D4|\u673A\u6784|\u7EC4\u7EC7\u673A\u6784|\u5065\u5EB7|\u0E44\u0E17\u0E22|\u0633\u0648\u0631\u064A\u0629|\u0440\u0443\u0441|\u0440\u0444|\u062A\u0648\u0646\u0633|\u307F\u3093\u306A|\u30B0\u30FC\u30B0\u30EB|\u4E16\u754C|\u0A2D\u0A3E\u0A30\u0A24|\u7F51\u5740|\u6E38\u620F|verm\u00F6gensberater|verm\u00F6gensberatung|\u4F01\u4E1A|\u4FE1\u606F|\u0645\u0635\u0631|\u0642\u0637\u0631|\u5E7F\u4E1C|\u0B87\u0BB2\u0B99\u0BCD\u0B95\u0BC8|\u0B87\u0BA8\u0BCD\u0BA4\u0BBF\u0BAF\u0BBE|\u0570\u0561\u0575|\u65B0\u52A0\u5761|\u0641\u0644\u0633\u0637\u064A\u0646|\u653F\u52A1)).?)(?::\d{2,5})?(?:[/?#]\S*)?/gi;

That said, I think the regex Mike found can be simplified a bit. The searching for numeric IPs, for example, seems like overkill -- and it accounts for the majority of the regex.
Attached file Script to create the regex (obsolete) —
By request, here's the perl script I used to create the regex in my previous comment.
This time with the script instead of its output. :)
Attachment #8633735 - Attachment is obsolete: true
Iteration: 42.1 - Jul 13 → 42.2 - Jul 27
Duplicate of this bug: 1184977
Depends on: 1186240
Blocks: 1186245
Blocks: 1186254
Attached patch Make links clickable (obsolete) — Splinter Review
Attachment #8636862 - Flags: review?(standard8)
There are a still a few more tests I want to write (particularly around special character handling and attempted HTML injection), but I'd like to get the ball rolling on review so that this hits the tree sooner rather than later.  I'll attach those tests as a separate patch in this bug, and I'll land the two of them together.
Comment on attachment 8636862 [details] [diff] [review]
Make links clickable

Review of attachment 8636862 [details] [diff] [review]:
-----------------------------------------------------------------

I've not yet tested this, no reviewed the test for the linkified view, but I think there's a reasonable amount of cleanup to do in the comments.

::: browser/components/loop/content/shared/js/linkifiedTextView.jsx
@@ +1,4 @@
> +// The monster regexp we're using is derived from Diego Perini's code,
> +// currently available at https://gist.github.com/dperini/729294
> +//
> +// So, we'll put this file under that license (we may later wish to extract

I think we should just split this out now to avoid any possible license confusion.

We've just removed three files from the standalone in the feedback rework patch, so having two files would mean we're at a net of one less which is still an improvement. Granted we want to get as much as we can, but I don't think its worth sacrificing possible license clarity to obtain that.

LinkifiedTextView could always go into views.jsx anyway (with the parser going into utils.js) if we really wanted to drop down to two less.

@@ +89,5 @@
> +     * Find the first URL in a string
> +     *
> +     * @param {String} s - the string to be searched for a URL
> +     * @returns {{matchedLink: string, // the first link found
> +     *            index: number}} // zero-based starting index of matchLink

This is quite strange from a documentation point of view. Traditionally we've done something like:

* @returns {Object} The result which contains
*   - {String} matchedLink The first link found
*   - {Number} index       Zero-based starting index of matchLink

To me that reads better than the example you have - however, I've also just found:

http://usejsdoc.org/tags-param.html#parameters-with-properties

which might be even nicer.

@@ +100,5 @@
> +      }
> +
> +      return {
> +        matchedLink: result[0],
> +        index: result.index     // index into s where matchedLink begins

The document here seems to be a duplicate of the jsdoc at the start of the function. If its really necessary, I think it'd be better on the previous line.

@@ +113,5 @@
> +   *
> +   * Uses LinkifiedTextParser for the actual parsing.
> +   */
> +  var LinkifiedTextView = React.createClass(
> +    {

nit: our normal convention is to put the { on the same line as the createClass.

@@ +115,5 @@
> +   */
> +  var LinkifiedTextView = React.createClass(
> +    {
> +      propTypes: {
> +        // the text to be linkified

Nit: please make this a proper sentence.

@@ +117,5 @@
> +    {
> +      propTypes: {
> +        // the text to be linkified
> +        rawText: React.PropTypes.string.isRequired,
> +        // should the links send a referrer?  defaults to false.

nit: Capital S

@@ +119,5 @@
> +        // the text to be linkified
> +        rawText: React.PropTypes.string.isRequired,
> +        // should the links send a referrer?  defaults to false.
> +        sendReferrer: React.PropTypes.bool,
> +        // should we suppress target="_blank" on the link? defaults to false,

nit: captial S.

@@ +129,5 @@
> +        React.addons.PureRenderMixin
> +      ],
> +
> +      getInitialState: function() {
> +        return { parser: new LinkifiedTextParser() };

Why does this need to be a state? It doesn't change at all. Wouldn't a member variable be better?

@@ +142,5 @@
> +          linkAttributes.target = "_blank";
> +        }
> +
> +        if (!this.props.sendReferrer) {
> +          linkAttributes.rel = "noreferrer";

note to self: check this on desktop.

@@ +157,5 @@
> +       *
> +       * @returns {Array} of strings and React <a> elements in order.
> +       */
> +      parseStringToElements: function (s) {
> +

nit: prefer no blank line at start of function (this file is also inconsistent with itself).

@@ +196,5 @@
> +        }
> +
> +        // no URL was matched; terminate the recursion and return the final text
> +        return [s];
> +

nit: no blank line necessary after return.

@@ +200,5 @@
> +
> +      },
> +
> +      render: function () {
> +

nit: prefer no blank line at start of function

@@ +204,5 @@
> +
> +        var elements = this.parseStringToElements(this.props.rawText);
> +
> +        // React needs all elements of an array to be keyed for efficient
> +        // rendering

nit: missing dot.

::: browser/components/loop/ui/ui-showcase.jsx
@@ +284,5 @@
>    });
>  
> +  // XXX get rid of hardcoded string constants being passed to
> +  // _appendTextChatMessage
> +  textChatStore.receivedTextChatMessage = function (actionData) {

This seems to be a bad duplicate of the receivedTectChatMessage function, I don't understand why it is necessary.

@@ +294,5 @@
> +    this._appendTextChatMessage("recv", actionData);
> +  };
> +
> +  textChatStore.sendTextChatMessage = function (actionData) {
> +    this._appendTextChatMessage("sent", actionData);

This seems to be working around the fact that sendTextChatMessage is sending the message - which gets sent to itself via the mockSDK - which you're modifying above.

Hence, I'm confused as to what this two overrides are really doing, or if they are really the right thing.
Attachment #8636862 - Flags: review?(standard8)
(In reply to Mark Banner (:standard8) from comment #28)
> ::: browser/components/loop/content/shared/js/linkifiedTextView.jsx
> @@ +1,4 @@
> > +// The monster regexp we're using is derived from Diego Perini's code,
> > +// currently available at https://gist.github.com/dperini/729294
> > +//
> > +// So, we'll put this file under that license (we may later wish to extract
> 
> I think we should just split this out now to avoid any possible license
> confusion.

OK, done.

> @@ +89,5 @@
> > +     * Find the first URL in a string
> > +     *
> > +     * @param {String} s - the string to be searched for a URL
> > +     * @returns {{matchedLink: string, // the first link found
> > +     *            index: number}} // zero-based starting index of matchLink
> 
> This is quite strange from a documentation point of view. Traditionally
> we've done something like:
> 
> * @returns {Object} The result which contains
> *   - {String} matchedLink The first link found
> *   - {Number} index       Zero-based starting index of matchLink
> 
> To me that reads better than the example you have 

I picked that syntax out from other places in usejsdoc.org, and picked it because it should be machine-parsable as well as human-readable, once you get used to it (our traditional form seems unlikely to be well-understood by the various IDE and other jsdoc parsers). 

> - however, I've also just
> found:
> 
> http://usejsdoc.org/tags-param.html#parameters-with-properties
> 
> which might be even nicer.

It's not obvious to me how we would use that with a retval, since those are unnamed.

Right now, I've switched to this:

    /**
     * Find the first URL in a string
     *
     * @param {String} s - the string to be searched for a URL
     *
     * @typedef LinkableResult
     * @type Object
     *   @property {string} matchedLink - the first link found
     *   @property {number} index - zero-based starting index of matchedLink in s
     *
     * @returns {LinkableResult|null}
     */

Thoughts?

> ::: browser/components/loop/ui/ui-showcase.jsx
> @@ +284,5 @@
> >    });
> >  
> > +  // XXX get rid of hardcoded string constants being passed to
> > +  // _appendTextChatMessage
> > +  textChatStore.receivedTextChatMessage = function (actionData) {
> 
> This seems to be a bad duplicate of the receivedTectChatMessage function, I
> don't understand why it is necessary.
> 
> @@ +294,5 @@
> > +    this._appendTextChatMessage("recv", actionData);
> > +  };
> > +
> > +  textChatStore.sendTextChatMessage = function (actionData) {
> > +    this._appendTextChatMessage("sent", actionData);
> 
> This seems to be working around the fact that sendTextChatMessage is sending
> the message - which gets sent to itself via the mockSDK - which you're
> modifying above.
> 
> Hence, I'm confused as to what this two overrides are really doing, or if
> they are really the right thing.

Right now, text chat date stamps are busted in the UI showcase (a bunch of things show up as "Invalid date").  IIRC, part of this is related to the fact that it wasn't easy to override the "receive" stuff in the SDK driver, because that's triggered somewhere inconvenient from a data channel.  

However, you're quite right that this is a mess.  I'll try and figure out something better after lunch.  I kinda suspect the right thing to do is just make it easier to override the receive action in the SDK driver.

All other stuff has been fixed/addressed.  I'll reupload the patch after I sort the showcase stuff.
Attached patch make links in chat clickable (obsolete) — Splinter Review
Attachment #8637619 - Flags: review?(standard8)
Attachment #8636862 - Attachment is obsolete: true
Attachment #8637619 - Flags: review?(gerv)
OK, all comments from the previous review have been addressed.  I've also added integration tests to check that pasted HTML is not injected into the real DOM, but is instead rendered as HTML source code.

Gerv, if you could rs the license.html change, that'd be great.  I didn't need to add the Autolinker.js stuff, as that's only tests, and isn't shipping as part of Firefox.
Comment on attachment 8637619 [details] [diff] [review]
make links in chat clickable

Review of attachment 8637619 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/content/license.html
@@ +4213,5 @@
> +
> +Author: Diego Perini
> +Updated: 2010/12/05
> +License: MIT
> +

Cut out these top six lines and just start with the Copyright line below. Then, r=gerv.

Gerv
Attachment #8637619 - Flags: review?(gerv) → review+
Comment on attachment 8637619 [details] [diff] [review]
make links in chat clickable

Review of attachment 8637619 [details] [diff] [review]:
-----------------------------------------------------------------

I don't have time to do justice on reviewing this at the moment, so passing to Mike.
Attachment #8637619 - Flags: review?(standard8) → review?(mdeboer)
Comment on attachment 8637619 [details] [diff] [review]
make links in chat clickable

Review of attachment 8637619 [details] [diff] [review]:
-----------------------------------------------------------------

The approach looks good to me, I think we can simplify it a bit more.

::: browser/components/loop/content/shared/js/linkifiedTextView.jsx
@@ +73,5 @@
> +    },
> +
> +    _generateLinkAttributes: function(href) {
> +      var linkAttributes = {
> +        href: href

Please be aware that on desktop, we use `mozLoop.openURL()` to open links, to prevent certain security vulnerabilities.

@@ +95,5 @@
> +     * @param {String} s - the raw string to be parsed
> +     *
> +     * @returns {Array} of strings and React <a> elements in order.
> +     */
> +    parseStringToElements: function (s) {

Since we're dealing with tail-recursion here, it's not necessary to make this recursive or introduce a separate parser object for this.

```js
var elements = [];
var result;
while (result = loop.shared.urlRegExps.fullUrlMatch.exec(s)) {
  if (result.index) {
    elements.push(s.substr(0, result.index));
    s = s.substr(result.index);
  }

  elements.push(
    <a { ...this._generateLinkAttributes(result[0]) }>
      {result[0]}
    </a>
  );
  s = s.substr(result[0].length);
}

if (s) {
  elements.push(s);
}

return elements;
```

@@ +152,5 @@
> +
> +      return (
> +        <p>
> +          { keyedElements }
> +        </p>

nit: I think this'll fit on a single line ;-)

::: browser/components/loop/content/shared/js/urlRegExps.js
@@ +1,2 @@
> +// This is derived from Diego Perini's code,
> +// currently available at https://gist.github.com/dperini/729294

Please move the regex to utils.js. A single regex is not worth having its own file for.
Plus, but not the main reason to ask for this, until we concatenate our JS files for the standalone, introducing more files to load is not free.

@@ +26,5 @@
> +// NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
> +// HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
> +// WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> +// FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> +// OTHER DEALINGS IN THE SOFTWARE.

If this is really just a 1:1 copy of the MIT license text, then I'm really unsure whether the added value of having it here is valuable. If possible, I'd like to remove it.

@@ +40,5 @@
> +  //
> +  // * https://regex101.com/#javascript
> +  // * https://www.debuggex.com/
> +  // * http://regviz.org/
> +  // * http://scraping.pro/10-best-online-regex-testers/

Is this something you added yourself? I think this stuff is better suited to be placed on a wiki, instead of in-code. URLs can vanish at any point, so I'd rather not have them here...

@@ +43,5 @@
> +  // * http://regviz.org/
> +  // * http://scraping.pro/10-best-online-regex-testers/
> +
> +  var fullUrlMatch = new RegExp(
> +    // protocol identifier

nit: 'Protocol identifier.' Please apply this nit throughout this file.

::: browser/components/loop/test/shared/linkifiedTextView_test.js
@@ +1,1 @@
> +/*

You forgot to add our own CC blob :)

@@ +27,5 @@
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> + * SOFTWARE.
> + */
> +describe("loop.shared.views.LinkifiedTextView", function () {

General comment for these tests: I'd prefer to use fictional URLs, instead of 'yahoo' and/ or 'sencha'.
Attachment #8637619 - Flags: review?(mdeboer) → feedback+
Iteration: 42.2 - Jul 27 → 42.3 - Aug 10
(In reply to Mike de Boer [:mikedeboer] from comment #34)
> ::: browser/components/loop/content/shared/js/linkifiedTextView.jsx
> @@ +73,5 @@
> > +    },
> > +
> > +    _generateLinkAttributes: function(href) {
> > +      var linkAttributes = {
> > +        href: href
> 
> Please be aware that on desktop, we use `mozLoop.openURL()` to open links,
> to prevent certain security vulnerabilities.

Thanks for the heads up; I wasn't aware of that. I've added code to handle this.

> @@ +95,5 @@
> > +     * @param {String} s - the raw string to be parsed
> > +     *
> > +     * @returns {Array} of strings and React <a> elements in order.
> > +     */
> > +    parseStringToElements: function (s) {
> 
> Since we're dealing with tail-recursion here, it's not necessary to make
> this recursive or introduce a separate parser object for this.

My previous version was indeed making premature assumptions here, particularly in light of your idea of doing post-matching fixup rather than breaking up the regexp itself, which I like.  Fixed.

> @@ +152,5 @@
> > +
> > +      return (
> > +        <p>
> > +          { keyedElements }
> > +        </p>
> 
> nit: I think this'll fit on a single line ;-)

Heh; fixed.

> ::: browser/components/loop/content/shared/js/urlRegExps.js
> @@ +1,2 @@
> > +// This is derived from Diego Perini's code,
> > +// currently available at https://gist.github.com/dperini/729294
> 
> Please move the regex to utils.js. A single regex is not worth having its
> own file for.
> Plus, but not the main reason to ask for this, until we concatenate our JS
> files for the standalone, introducing more files to load is not free.

After discussion with Mike, we decided to leave this alone, given the previous review comments from Mark and Gerv.

> @@ +40,5 @@
> > +  //
> > +  // * https://regex101.com/#javascript
> > +  // * https://www.debuggex.com/
> > +  // * http://regviz.org/
> > +  // * http://scraping.pro/10-best-online-regex-testers/
> 
> Is this something you added yourself? I think this stuff is better suited to
> be placed on a wiki, instead of in-code. URLs can vanish at any point, so
> I'd rather not have them here...

OK, put it up in the Loop section of wikimo at https://wiki.mozilla.org/Loop/Development/RegExpDebugging and linked to that from the code for discoverability.

> @@ +43,5 @@
> > +  // * http://regviz.org/
> > +  // * http://scraping.pro/10-best-online-regex-testers/
> > +
> > +  var fullUrlMatch = new RegExp(
> > +    // protocol identifier
> 
> nit: 'Protocol identifier.' Please apply this nit throughout this file.

Fixed.

> ::: browser/components/loop/test/shared/linkifiedTextView_test.js
> @@ +1,1 @@
> > +/*
> 
> You forgot to add our own CC blob :)

It's not obvious to me how anyone would be able to figure out which changes in the file were MIT and which were CC, so it feels like the pragmatic thing to do is just put it all under MIT.  Am I missing something?
 
> @@ +27,5 @@
> > +describe("loop.shared.views.LinkifiedTextView", function () {
> 
> General comment for these tests: I'd prefer to use fictional URLs, instead
> of 'yahoo' and/ or 'sencha'.

Fixed.
Attached patch make links in chat clickable (obsolete) — Splinter Review
Attachment #8637619 - Attachment is obsolete: true
Attachment #8640935 - Flags: review?(mdeboer)
Comment on attachment 8640935 [details] [diff] [review]
make links in chat clickable

Review of attachment 8640935 [details] [diff] [review]:
-----------------------------------------------------------------

Cool! r=me with comments addressed.

As a side-note: I don't quite see _why_ the MIT license text needs to spelled out in its entirety in two places within this patch. It's not like there are multiple.

::: browser/components/loop/content/shared/js/linkifiedTextView.jsx
@@ +5,5 @@
> +var loop = loop || {};
> +loop.shared = loop.shared || {};
> +loop.shared.views = loop.shared.views || {};
> +loop.shared.views.linkifiedText = (function(mozL10n) {
> +

nit: superfluous newline.

@@ +15,5 @@
> +   * links inside a <p> container.
> +   */
> +  var LinkifiedTextView = React.createClass({
> +    propTypes: {
> +      // Call this instead of allowing the default <a> click semantics, if given

nit: missing dot.

@@ +19,5 @@
> +      // Call this instead of allowing the default <a> click semantics, if given
> +      linkClickHandler: React.PropTypes.func,
> +      // The text to be linkified.
> +      rawText: React.PropTypes.string.isRequired,
> +      // Should the links send a referrer?  defaults to false.

nit: 'Defaults'.

@@ +21,5 @@
> +      // The text to be linkified.
> +      rawText: React.PropTypes.string.isRequired,
> +      // Should the links send a referrer?  defaults to false.
> +      sendReferrer: React.PropTypes.bool,
> +      // Should we suppress target="_blank" on the link? defaults to false,

nit: 'Defaults'.

@@ +22,5 @@
> +      rawText: React.PropTypes.string.isRequired,
> +      // Should the links send a referrer?  defaults to false.
> +      sendReferrer: React.PropTypes.bool,
> +      // Should we suppress target="_blank" on the link? defaults to false,
> +      // mostly for testing use.

nit: 'Mostly'.

@@ +60,5 @@
> +    /**
> +     * Parse the given string into an array of strings and React <a> elements
> +     * in the order in which they should be rendered (i.e. FIFO).
> +     *
> +     * @param {String} s - the raw string to be parsed

nit: I'm used to: '@param {String} s The raw string to be parsed.'

@@ +64,5 @@
> +     * @param {String} s - the raw string to be parsed
> +     *
> +     * @returns {Array} of strings and React <a> elements in order.
> +     */
> +    parseStringToElements: function (s) {

nit: please unify the function declarations in this file. I recommend `function(s) {`, as that's most common in loop/.

@@ +68,5 @@
> +    parseStringToElements: function (s) {
> +      var elements = [];
> +      var result;
> +
> +      result = loop.shared.urlRegExps.fullUrlMatch.exec(s);

You might as well make this `var result = ...` and remove the `var` declaration here.

@@ +71,5 @@
> +
> +      result = loop.shared.urlRegExps.fullUrlMatch.exec(s);
> +      while (result) {
> +        // if there's text preceding the first link, push it onto the array
> +        // and update the string pointer

nit: missing dot.

@@ +77,5 @@
> +          elements.push(s.substr(0, result.index));
> +          s = s.substr(result.index);
> +        }
> +
> +        // Push the first link itself, and advance the string pointer again

nit: missing dot.

@@ +85,5 @@
> +          </a>
> +        );
> +        s = s.substr(result[0].length);
> +
> +        // check for another link, and perhaps continue...

nit: 'Check'.

@@ +101,5 @@
> +      var elements = this.parseStringToElements(this.props.rawText);
> +
> +      // React needs all elements of an array to be keyed for efficient
> +      // rendering.
> +      var keyedElements = elements.map(

Why not keep a counter in `parseStringToElements` and add the `key` attribute to the constructed <a> directly? Saves you another loop.

@@ +111,5 @@
> +        });
> +
> +      return ( <p>{ keyedElements }</p> );
> +    }
> +    });

nit: indentation.

@@ +113,5 @@
> +      return ( <p>{ keyedElements }</p> );
> +    }
> +    });
> +
> +  return {

Please just return `LinkifiedTextView`, since that's the only component this file will ever export.

::: browser/components/loop/content/shared/js/textChatView.jsx
@@ +57,5 @@
>        });
>  
> +      var optionalProps = {};
> +      if (navigator.mozLoop) {
> +        optionalProps.linkClickHandler = navigator.mozLoop.openURL;

I'd recommend adding this as an optional property that is passed along down from <TextChatView/>.
However, since we've got the unified, shared mediaView, we need to discuss how we should approach this better - because passing down props through a dozen components is not sustainable.
In the E10S work/ bug I made a sharedUtils.isDesktop() function, which could be an improvement and the RPC mechanism allows for more shallow abstractions.

</long_story> ;-)

@@ +58,5 @@
>  
> +      var optionalProps = {};
> +      if (navigator.mozLoop) {
> +        optionalProps.linkClickHandler = navigator.mozLoop.openURL;
> +      }

This still sets the `target` and `referer` attributes on the links, which are not necessary anymore on desktop.

@@ +64,3 @@
>        return (
>          <div className={classes}>
> +          <sharedViews.linkifiedText.View {...optionalProps}

...so this will become `<sharedViews.LinkifiedText .../>`

::: browser/components/loop/test/shared/textChatView_test.js
@@ +251,5 @@
>        expect(node.querySelector(".text-chat-entry-timestamp")).to.not.eql(null);
>      });
> +
> +    // note that this is really an integration test to be sure that we don't
> +    // inadvertently regress using linkifiedText.View.

Please update this comment when you update linkifiedTextView.jsx...
Attachment #8640935 - Flags: review?(mdeboer) → review+
Mentioned nits have been fixed.

(In reply to Mike de Boer [:mikedeboer] from comment #37)
> As a side-note: I don't quite see _why_ the MIT license text needs to
> spelled out in its entirety in two places within this patch. It's not like
> there are multiple.

You might well be right, but we do seem to treat different variants of the same license as multiple in license.html, though perhaps that's not entirely analogous.  That said, I feel like the pragmatic choice is not to spend more energy on this, so I'm going to leave it as-is.

> @@ +101,5 @@
> > +      var elements = this.parseStringToElements(this.props.rawText);
> > +
> > +      // React needs all elements of an array to be keyed for efficient
> > +      // rendering.
> > +      var keyedElements = elements.map(
> 
> Why not keep a counter in `parseStringToElements` and add the `key`
> attribute to the constructed <a> directly? Saves you another loop.

Good catch; done.  This was a leftover from an earlier iteration.

> @@ +113,5 @@
> > +      return ( <p>{ keyedElements }</p> );
> > +    }
> > +    });
> > +
> > +  return {
> 
> Please just return `LinkifiedTextView`, since that's the only component this
> file will ever export.

Done, along with other related comments.

> ::: browser/components/loop/content/shared/js/textChatView.jsx
> @@ +57,5 @@
> >        });
> >  
> > +      var optionalProps = {};
> > +      if (navigator.mozLoop) {
> > +        optionalProps.linkClickHandler = navigator.mozLoop.openURL;
> 
> I'd recommend adding this as an optional property that is passed along down
> from <TextChatView/>.
> However, since we've got the unified, shared mediaView, we need to discuss
> how we should approach this better - because passing down props through a
> dozen components is not sustainable.
> In the E10S work/ bug I made a sharedUtils.isDesktop() function, which could
> be an improvement and the RPC mechanism allows for more shallow abstractions.
> 
> </long_story> ;-)

Agreed that our current setup is not ideal, and that passing a zillion props down the tree is not ideal either.  Upcoming cleanup of React.context is likely to help us out here.

> @@ +58,5 @@
> >  
> > +      var optionalProps = {};
> > +      if (navigator.mozLoop) {
> > +        optionalProps.linkClickHandler = navigator.mozLoop.openURL;
> > +      }
> 
> This still sets the `target` and `referer` attributes on the links, which
> are not necessary anymore on desktop.

Added a test and code to avoid this.
Updated with review comments addressed; carrying forward r=mikedeboer & r=gerv for license changes
Attachment #8640935 - Attachment is obsolete: true
Attachment #8642614 - Flags: review+
Updated version with additions to code-coverage configuration
Attachment #8642614 - Attachment is obsolete: true
Attachment #8642646 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/ece4d8f79cfc
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla42
Approval Request Comment
[Feature/regressing bug #]:

Bug 1176280 (this bug)

[User impact if declined]:

Text Chat for Hello goes out with Firefox 41.  Auto-linkification of at least http(s) URLs is a table-stakes feature for any sort of web-based chat, and one that plenty of Nightly/Aurora users have complained about missing.  User impact if declined is that the premier of the text chat feature for Hello will feel incomplete.

[Describe test coverage new/current, TreeHerder]:

Test coverage for both JS files added here hits 100% of the statements, according to our code-coverage tool.  Testing includes coverage for ASCII links being properly linkified, as well as several tests that sample HTML strings are not vulnerable to CSRF attacks on the UI, and it only linkifies stuff starting with http:// and https://, to avoid attacks by other protocols.  These tests are all run on treeherder with each checkin.

[Risks and why]: 

This patch has been baking on nightly for most of a week.  No bugs have been found and filed during that time.  

It is a heuristic-based approach, so most of the risk is that the heuristics are sometimes wrong, and things that should be linkified aren't, or things are linkified incorrectly, leading to users having pasted links somtimes not work.  This is very standard in the industry (eg looking at Apple and Facebook auto-linkification shows a various edge case errors).  The known cases of that are listed in bug 1186254, and are believed to be within tolerable limits.

[String/UUID change made/needed]:

None.
Attachment #8645218 - Flags: approval-mozilla-aurora?
https://treeherder.mozilla.org/#/jobs?repo=try&revision=9cdd2fde417f is a try server of this commit on Aurora.  I went through all the oranges; they're all either known intermittents, or in no way plausibly related to this commit.  In other words, things look clear for landing this after getting approval.
Comment on attachment 8645218 [details] [diff] [review]
Make Hello chat links clickable (Aurora 41 port)

[Triage Comment]

Approved for uplift to Beta41. Dev has confirmed that the fix was locally verified and automated test coverage is at 100% pass.
Attachment #8645218 - Flags: approval-mozilla-aurora? → approval-mozilla-beta+
Looks like this has pretty good automated coverage. CC'ing Bogdan so he's aware of this implementation.
Flags: qe-verify-
Flags: in-testsuite?
Flags: in-testsuite? → in-testsuite+
You need to log in before you can comment on or make changes to this bug.