<a class="header-button" href="https://bugzilla.mozilla.org/home" title="Go to home page"> Bugzilla

Reporter

Updated

•

10 years ago

Rank: 13

Priority: -- → P1

Assignee

Updated

•

10 years ago

Assignee: nobody → dmose

Assignee

Comment 1

•

10 years ago

This turns out to be a little more interesting than one might think. The common way to do it is to use a regexp, which is edge-casey and generally asking for trouble. See <http://stackoverflow.com/questions/37684/how-to-replace-plain-urls-with-links/21925491#21925491> for details on that. After poking around a bit, Autolinker.js, as suggested by the author of that post, looks pretty good. It has lots of contributors and forks, compatible license (MIT), no dependencies, doesn't need a CommonJS loader, seems to be regularly maintained, has a big test suite (substantially more comprehensive than the other libraries I looked it, including all the react ones I could find). There are a few downsides as well: * The minified version is 15K, which seems like a lot to swallow for the link-clicker, particularly on mobile. * It's going to need at least one i18n fix: https://github.com/gregjacobs/Autolinker.js/issues/89 Whatever linkification solution we go with, we're also going to need URL sanitization too. <https://github.com/gregjacobs/Autolinker.js/issues/23> suggests using JsHtmlSanitizer, but that seems like a pretty big hammer. More on that in a bit.

Assignee

Comment 2

•

10 years ago

Given that React already sanitizes the HTML for us: <cite> All string values are sanitized before inserted into the DOM (except for CSS styles which is a known wontfix issue for IE8). </cite> (from the beginning of https://github.com/facebook/react/issues/3473 written by one of the core devs) I think all we really need to sanitize here might be the URLs. https://github.com/facebook/react/issues/3473#issuecomment-91349525 has some discussion of this, and I'm wondering if there's anything else we could/should do outside of whitelisting the http and https URL schemes. One wrinkle is that this code will be running in two different contexts: * with the semi-elevated about:loop* privs on the embedded desktop conversation window * with normal DOM privs in the link-clicker webapp abr, I'd love your thoughts on this, as well as any pointers as to other security experts/sources that it would be good to solicit opinions on href sanitization from.

Flags: needinfo?(adam)

Comment 3

•

10 years ago

I have long ago concluded that Matt knows something about pretty much everything, so I'm asking him to weigh in if he has cycles. I'm also tagging Valentin, as I believe he's been doing some URL parsing work recently. (If either of you don't have time to look into this, feel free to clear your ni flag without comment). For my own position: I'm concerned that none of these seem to have any hope of operating properly on non-ASCII URLs. On the other hand, as Autolinker has two kitchen sinks and a stove in it and still can't get i18n right, I shudder to think how large a library would have to be in order to get i18n correct. While the "UTF8 in the middle of a path" problem is probably easy to fix, there are some much trickier problems in here. In particular, I think it would be a bit too provincial to choose an approach that couldn't at least handle URLs like these: - http://www.投资者关系.中国/en/showcases.php?code=000063.SZ - http://الوزيرالاول.الجزائر/ I suspect we can probably go out the door with a simple approach that only works with strings starting with "http://" or "https://", and then try to get smarter after we've had more time to investigate approaches. Basically, we would look for either of those two strings to find a start of a URL, scan until we find a character that violates URL ABNF [1] (considering all UTF-8 characters not to be a violation regardless of where they appear), and us that as the string we're linkifying. ____ [1] By which I mean the ABNF defined here: https://tools.ietf.org/html/rfc3986#appendix-A

Flags: needinfo?(valentin.gosu)

Flags: needinfo?(linuxwolf)

Flags: needinfo?(adam)

Assignee

Comment 4

•

10 years ago

(In reply to Adam Roach [:abr] from comment #3) > I suspect we can probably go out the door with a simple approach that only > works with strings starting with "http://" or "https://", and then try to > get smarter after we've had more time to investigate approaches. By this, I assume you're suggesting go out the door with only that approach, not that approach in addition to (eg) Autolinker? That seems like a reasonable starting point to me, though it doesn't handle the use case of heuristically linkifying stuff that doesn't start with a scheme, as requested in the design doc for text chat: https://docs.google.com/document/d/1HwYbM4Eru3rHVT3MStO8eeTunJ7pN5PYBingtmwe-bA/edit# on page 19. > we would look for either of those two strings to find a start of a URL, scan > until we find a character that violates URL ABNF [1] (considering all UTF-8 > characters not to be a violation regardless of where they appear), and us > that as the string we're linkifying. As far as validating the URL after we've found it, I wonder if it would make sense to let the browser handle it for us, eg http://www.joezimjs.com/javascript/the-lazy-mans-url-parsing/ or https://developer.mozilla.org/en-US/docs/Web/API/URL

Flags: needinfo?(adam)

Valentin Gosu [:valentin] (he/him)

Comment 5

•

10 years ago

(In reply to Dan Mosedale (:dmose) - needinfo? me for response from comment #4) > By this, I assume you're suggesting go out the door with only that approach, > not that approach in addition to (eg) Autolinker? Right, at least initially. > That seems like a reasonable starting point to me, though it doesn't handle > the use case of heuristically linkifying stuff that doesn't start with a > scheme, as requested in the design doc for text chat: > https://docs.google.com/document/d/1HwYbM4Eru3rHVT3MStO8eeTunJ7pN5PYBingtmwe- > bA/edit# > on page 19. This would be a PM trade-off evaluation, then. I agree with you that 15kB is a silly amount of code to send for this function (especially when it's not even complete enough to handle the two URLs I indicate above). Sometimes the UX has to be modified based on how what the incremental cost is for a given behavior. > As far as validating the URL after we've found it, I wonder if it would make > sense to let the browser handle it for us I think you misunderstand. The point of scanning against the ABNF isn't to make sure the URL is valid (although that will be a side effect): it's to make an intelligent decision about where the URL ends. I suppose you could do something like scan the string "check out https://www.yahoo.com/featured/ and tell me what you think" by checking incrementally longer strings to see where a URL stops being valid, but this would entail something like checking: parser.href = 'https://w' - valid parser.href = 'https://ww' - valid parser.href = 'https://www' - valid parser.href = 'https://www.' - valid parser.href = 'https://www.y' - valid parser.href = 'https://www.ya' - valid parser.href = 'https://www.yah' - valid parser.href = 'https://www.yaho' - valid parser.href = 'https://www.yahoo' - valid parser.href = 'https://www.yahoo.' - valid parser.href = 'https://www.yahoo.c' - valid parser.href = 'https://www.yahoo.co' - valid parser.href = 'https://www.yahoo.com' - valid parser.href = 'https://www.yahoo.com/' - valid parser.href = 'https://www.yahoo.com/f' - valid parser.href = 'https://www.yahoo.com/fe' - valid parser.href = 'https://www.yahoo.com/fea' - valid parser.href = 'https://www.yahoo.com/feat' - valid parser.href = 'https://www.yahoo.com/featu' - valid parser.href = 'https://www.yahoo.com/featur' - valid parser.href = 'https://www.yahoo.com/feature' - valid parser.href = 'https://www.yahoo.com/featured' - valid parser.href = 'https://www.yahoo.com/featured/' - valid parser.href = 'https://www.yahoo.com/featured/ ' - invalid And then deciding that the string must be "https://www.yahoo.com/featured/", since incorporating the next character makes it break. Even at that, I think that the HREF parser will percent-escape invalid characters and consider the resulting string to be valid, making it impossible to use for this purpose (unless you were doing something like checking whether something got percent encoded when you passed it through the parser, but I believe that breaks on UTF8).

Flags: needinfo?(adam)

Comment 6

•

10 years ago

This sounds like a nice idea, but I wouldn't go as far as trying to `linkify` URLs after every keypress. Parsing URLs can be fairly slow, so I suggest doing it only when a message is submitted. If at all. A few thoughts: - our URL parsing requires a scheme, so it wouldn't work out of the box for the ones starting with www. - URL parsing generally does UTF/percent encoding and decoding thus changing the URL - it might be possible to use nsIURIFixup to validate the URL, but that might be slow (I don't know if slower than a library) - from what I've seen, a lot of URL linkifiers break with certain unicode characters (or even unexpected ASCII ones), so if we decide to pursue this feature I suggest we go for a simple and fast linker library, and not worry too much about covering all of the corner cases.

Flags: needinfo?(valentin.gosu)

Matthew Miller [:m_and_m]

Comment 7

•

10 years ago

(In reply to Adam Roach [:abr] from comment #3) > I have long ago concluded that Matt knows something about pretty much > everything, so I'm asking him to weigh in if he has cycles. I'm also tagging > Valentin, as I believe he's been doing some URL parsing work recently. (If > either of you don't have time to look into this, feel free to clear your ni > flag without comment). > I got strange looks when I first read this, as I literally LOL'd on a bus (-: > For my own position: I'm concerned that none of these seem to have any hope > of operating properly on non-ASCII URLs. On the other hand, as Autolinker > has two kitchen sinks and a stove in it and still can't get i18n right, I > shudder to think how large a library would have to be in order to get i18n > correct. While the "UTF8 in the middle of a path" problem is probably easy > to fix, there are some much trickier problems in here. > As someone that's tried to do the regexp approach multiple times, I can tell you it gets very messy very very quickly. Autolinker has always felt like it's too heavy to me, so I've never tried to use it. > In particular, I think it would be a bit too provincial to choose an > approach that couldn't at least handle URLs like these: > > - http://www.投资者关系.中国/en/showcases.php?code=000063.SZ > - http://الوزيرالاول.الجزائر/ > > I suspect we can probably go out the door with a simple approach that only > works with strings starting with "http://" or "https://", and then try to > get smarter after we've had more time to investigate approaches. Basically, > we would look for either of those two strings to find a start of a URL, scan > until we find a character that violates URL ABNF [1] (considering all UTF-8 > characters not to be a violation regardless of where they appear), and us > that as the string we're linkifying. > This would work for the first release targeting web geeks, but most people will consider linkifying broken. They will expect to get links out of www.example.net, nobbiton.example, and often even nobody@example.com. What I've seen is that you don't have to be perfect, or really approach perfect; it just needs to get close. I will admit that I have continued to use regex, at least for a first-pass. 1) Use a regex to find candidates that require closer scrutiny. 2) Then try to rule it out based on a handful of things from RFC3896 that seem important (that list has evolved over time, but I don't have mine handy )-: ). It's horrible, it's not even close perfect, but it seems to have satisfied enough people enough of the time. It will also be code that gets tweaked with almost every release, and that can't be helped. Also, I don't try to do it as the user types. Chasing the edge cases of editing (e.g., select + insert-replace + backspace + backspace) are spectacularly not fun. Hope this helps.

Flags: needinfo?(linuxwolf)

Comment 8

•

10 years ago

(In reply to Valentin Gosu [:valentin] from comment #6) > This sounds like a nice idea, but I wouldn't go as far as trying to > `linkify` URLs after every keypress. Parsing URLs can be fairly slow, so I > suggest doing it only when a message is submitted. If at all. I'm apparently doing a surprisingly bad job of expressing my thoughts using words today. I wasn't proposing doing an evaluation on every keystroke; in fact, I wasn't proposing using the existing URL parser to do this job at all. The example I gave above was intended to demonstrate that the URL parser -- which is, as far as I know, only designed to work with URLs rather than URLs embedded in other text -- is a poor fit for the task. _____ I'll start again from the beginning. To linkify a URL, there are basically three prerequisite tasks: 1) Detect that a URL is present 2) If a URL is present, find the beginning of the URL 3) If a URL is present, find the end of the URL In the general case (dealing with URLs that have no schemes), all three steps are hard. If you only deal with URLs that have two known schemes, and require those schemes to actually be present, then those first *two* steps are trivial. However, if we go that path, we still need to solve the somewhat tricky problem of finding the *end* of the URL. That's the question to answer here: given an arbitrary string, how do you find the end of a URL? I have a proposal above, but you need to internalize the problem -- finding the end of a URL in an arbitrary string -- before it makes any sense. Maybe there's something about our URL parser that makes it well suited for this task. My experience using it and the developer.m.o page don't give any obvious ways for finding the end of a URL in an arbitrary block of text, but it's completely possible that I've overlooked something. _______ Now, that's all relevant if we try to come up with something ourselves. That would be driven by a desire to ship something that works with UTF8 in general (I'm comfortable that what I propose above would handle UTF8 rather completely), and to ship something that's not quite as large as Autolinker. Clearly, we could also bite the bullet and just ship Autolinker. For the bloat, we'd get a far more aggressive link detection algorithm, but we'd also incorporate known weak (and possibly non-existent) UTF8 support. I may be putting a bit too much import on the "global" part of our principle that "The Internet is a global public resource that must remain open and accessible," but the thought of leaving out IDN websites and websites with non-Latin characters in their paths at least seems reason for rumination. Of course, we could also bring Autolinker in, fix the i18n problems it has, and then submit pull requests upstream -- but I don't know where we'd get the resources for that. _____ And that takes us to... (In reply to Matthew Miller [:linuxwolf] from comment #7) > This would work for the first release targeting web geeks, but most people > will consider linkifying broken. They will expect to get links out of > www.example.net, nobbiton.example, and often even nobody@example.com. Which, the more I think about it, the more I'm inclined to agree, but it requires us to keep up with the explosion of TLDs. I wonder how hard it would be to update Autolinker to automatically grab http://data.iana.org/TLD/tlds-alpha-by-domain.txt as part of its operation.

Comment 9

•

10 years ago

(In reply to Adam Roach [:abr] from comment #8) > I wonder how hard it > would be to update Autolinker to automatically grab > http://data.iana.org/TLD/tlds-alpha-by-domain.txt as part of its operation. Actually, scratch that -- if Hello ever gets successful, that would be an unfair load to impose on IANA. We'd need to mirror it if we intend to use it.

Comment 10

•

10 years ago

Not sure if it helps, but Diego Perini's regex over at https://mathiasbynens.be/demo/url-regex seems like a good enough fit here to allow for fast, relatively safe and UTF-8 friendly URL detection. If we want to support more than _just_ the http/ https protocol, or even the ability to omit the protocol, we can adjust the regex to make that work.

Comment 11

•

10 years ago

(In reply to Mike de Boer [:mikedeboer] from comment #10) > Not sure if it helps, but Diego Perini's regex over at > https://mathiasbynens.be/demo/url-regex seems like a good enough fit here to > allow for fast, relatively safe and UTF-8 friendly URL detection. That's for URL validation, not URL detection. The use of regexes for detection has a problem: most of them that I've found in my searches contain productions like "(([0-9a-z_-]+\.)+", which can turn into O(n^2) for certain character sequences. See https://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS and http://stackoverflow.com/questions/12841970/how-can-i-recognize-an-evil-regex Actual strings that trigger the DOS are discussed in this thread: https://gist.github.com/gruber/8891611#comment-1310172

Updated

•

10 years ago

Whiteboard: [chat]

Updated

•

10 years ago

Points: --- → 5

Updated

•

10 years ago

Iteration: --- → 42.1 - Jul 13

Assignee

Comment 12

•

10 years ago

After further discussion with Adam, the current plan is to suck up the Autolinker issues, particularly if we can get at least basic i18n (aka UTF8 in URL) support going. Everything else seems like it's got nastier problems.

Assignee

Comment 13

•

10 years ago

Adam points out that it'll be a good idea to wrap this API in a generic wrapper so that we can swap it out for another library later, should such a thing come along.

Assignee

Comment 14

•

10 years ago

This is an impressively deep rathole. To help clarify my thinking, here are the things I think we absolutely must have these two things: a) linkifies URLs with schemes http: and https: (and no other schemes) b) secure against HTML injection (typed HTML on one side can't be injected or XSS other side) c) don't add a large amount of bloat to the link-clicker (particularly on mobile) Doing (or re-using) a regexp-based approach inside React should get the above two things pretty cheaply, because React itself does a good job of refusing to stuff unsanitized HTML into the DOM unless you make it really explicit that you're doing it intentionally. We really want: d) linkifies things that looks like URLs without schemes (eg linkifies "I usually use yahoo.com for my searches") This is harder, because it requires an approach that knows about TLDs. None of the React libraries I've found do this. We could use Autolinkify.js or linkifyjs v2.0. Both are big (~15k minified/5k gzipped minified), and neither one include an HTML sanitizer, which would add even more bloat and stuff to keep up-to-date, since React wouldn't be helping us out here. e) have a reasonable amount of automated testing A number of the libraries have a not totally trivial amount of unit tests. Autolinkify looks like the only one with good coverage. f) supports URLs with UTF8 (some subset of international domains) I suspect (but haven't tested, and might well be wrong!) that we could fix up whatever version we get to do this. g) supports URLs with IDN (most/all international domains) The one thing that even _tries_ to do this at all is <https://github.com/webmodules/urlregexp>. It generates a 24k minified RegExp, _and_ requires it's own regexp engine (XRegeX) in addition. There are no automated tests at all. This might not to be too hard to fix if/when we get a version of React that uses babel and turns on ES6 features, since ES6 has Unicode Regexp support. h) Denial-of-Service resistant (not vulnerable to O(n^k) loops in the browser that often crop up in complex regexps). Unlike pretty much everything else, linkifyjs2.0 appears to be almost the only thing that has very little RegExp use. All the rest is pretty much entirely regexp-y.

Assignee

Comment 15

•

10 years ago

So what I propose to do for this bug is just try and handle http:// and https:// links, using a React component for security. I'm currently looking at react-autolink and react-autolink-text, though those both seem to want CommonJS loaders. Hmmm.

Assignee

Comment 16

•

10 years ago

Which is to say, go for a, b, and c now; everything else later.

Assignee

Comment 17

•

10 years ago

For more on Unicode in JS, see <https://mathiasbynens.be/notes/es6-unicode-regex>. Good times!

Comment 18

•

10 years ago

Again, I think the solution _should_ be as simple as a modified version of https://gist.github.com/dperini/729294: re_weburl = /(?:(?:https?|ftp):\/\/)?(?:\S+(?::\S*)?@)?(?:(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,})).?)(?::\d{2,5})?(?:[/?#]\S*)?/gi ... which doesn't match from start to end of a string and makes the protocol part optional. This results in: "lalala a www.google.com/ loeloeloe adad".match(re_weburl) --> [ "www.google.com/" ] "lalala a www.google.com/ or http://उदाहरण.परीक्षा loeloeloe adad".match(re_weburl) --> [ "www.google.com/", "http://उदाहरण.परीक्षा " ]

Comment 19

•

10 years ago

(In reply to Mike de Boer [:mikedeboer] from comment #18) > Again, I think the solution _should_ be as simple as a modified version of > https://gist.github.com/dperini/729294: > > re_weburl = > /(?:(?:https?|ftp):\/\/)?(?:\S+(?::\S*)?@)?(?:(?!(?:10|127)(?:\.\d{1, > 3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0- > 1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0- > 4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\u00a1- > \uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a- > z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,})).?)(?::\d{2,5})?(?:[/ > ?#]\S*)?/gi > > ... which doesn't match from start to end of a string and makes the protocol > part optional. > > This results in: > > "lalala a www.google.com/ loeloeloe adad".match(re_weburl) > --> [ "www.google.com/" ] > "lalala a www.google.com/ or http://उदाहरण.परीक्षा loeloeloe > adad".match(re_weburl) > --> [ "www.google.com/", "http://उदाहरण.परीक्षा " ] Off the top of my head, we have a couple of issues: "The problem is that every time I call widget.frob(), it hangs.".match(re_weburl) --> [ 'widget.frob(' ] # Oh, that's going to get annoying fast. "ƪ(‾.‾“)┐".match(re_weburl) # Note: this is a real emoticon found in the wild. --> [ '‾.‾“)' ] # Probably violates user expectations There's also a bug here, but that's fixable: "I found it at amazon.com last time I went looking".match(re_weburl) --> [ 'amazon.com ' ] # Note trailing space The real problem is that this is going to match way, way too many things. There are really only two viable ways to do this: either (a) look for and require a scheme, or (b) have a complete list of domains and incorporate them into the matching. Since we've decided that (a) isn't going to fly, we have to fall back to (b).

Assignee

Comment 20

•

10 years ago

I totally agree that this approach is liberal and match plenty of strings that "shouldn't" be matched. That said, whether my (and others in this bug, I think, have the belief that users are somewhat tolerant here). abr, you've said "way, way too many things". The question in my mind is whether that "too many" is higher or lower than what users on the whole will perceive as reasonable. It's hard to imagine how to resolve this without some user testing. My gut is that being liberal and shipping sooner is better than being conservative and shipping later. FWIW, my work-in-progress is at <https://github.com/dmose/gecko-dev/pull/9/files> if you want to see what it feels like so far.

Assignee

Comment 21

•

10 years ago

s/whether my (and others in this bug, I think,/I, (and others in this bug, I think)/

Comment 22

•

10 years ago

To be clear, what I'm asking for isn't all that complicated. Take whatever regex you want to use and drop in something that matches "all valid TLDs" in the part of the expression that is intended to match the TLD portion of a URL. Granted, with the ICANN madness, this is bigger than it should be, but not unreasonably so. If I grab the current list of TLDs, munge the punycode into UTF8, then UTF16, then escape it, and then pass this whole mess through something that generates a minimal regex, you get: (((?:c(?:[cdgkmnvwxz]|o(?:n(?:s(?:truction|ulting)|(?:tractor|do)s)|m(?:m(?:unity|bank)|p(?:uter|any))?|u(?:(?:pon|rse)s|ntry)|(?:l(?:leg|ogn)|ffe)e|o(?:[lp]|king)|rsica|ach|des)?|a(?:[bl]|r(?:e(?:ers?)?|avan|tier|d?s)|n(?:cerresearch|on)|p(?:etown|ital)|s(?:[ah]|ino)|t(?:ering)?|m(?:era|p)|fe)?|h(?:r(?:istmas|ome)|a(?:nnel|t)|urch|eap|loe)?|l(?:o(?:thing|ud)|i(?:nic|ck)|eaning|aims|ub)?|r(?:edit(?:card)?|(?:uise)?s|icket|own)?|i(?:t(?:ic|y)|sco)?|y(?:(?:mr|o)u)?|e(?:nter|rn|o)|u(?:isinella)?|f[ad]?|b[an])|s(?:[bdgjlmrvxz]|c(?:[ab]|h(?:o(?:larships|ol)|midt|warz|ule)|ience|o[rt])?|a(?:ndvik(?:coromant)?|arland|msung|kura|le|rl|xo|p)?|o(?:l(?:utions|ar)|c(?:cer|ial)|ftware|n?y|hu)?|u(?:pp(?:l(?:ies|y)|ort)|r(?:gery|f)|zuki|cks)?|t(?:a(?:rhub|toil)|ud(?:io|y)|yle)?|h(?:o(?:es|w)|iksha|riram)?|p(?:readbetting|iegel|ace)|e(?:rvices|ner|xy?|at|w)?|k(?:y(?:pe)?|i)?|y(?:stems|dney)?|i(?:ngles|te)?|w(?:atch|iss)|n(?:cf)?)|b(?:[dfgjstvwy]|a(?:r(?:c(?:lay(?:card|s)|elona)|gains)?|n[dk]|uhaus|yern)?|r(?:o(?:th|k)er|idgestone|adesco|ussels)?|u(?:ild(?:ers)?|dapest|siness|zz)|l(?:ack(?:friday)?|oomberg|ue)|i(?:[doz]|(?:bl|k)e|ngo?)?|e(?:ntley|rlin|er|st)?|o(?:utique|ats|nd|o)?|n(?:pparibas|l)?|b(?:va|c)?|h(?:arti)?|mw?|zh?|cn)|a(?:[ow]|c(?:c(?:ountants?|enture)|t(?:ive|or)|ademy)?|i(?:r(?:force|tel)|g)?|b(?:b(?:ott)?|ogado)|u(?:ction|tos?|dio)?|l(?:lfinanz|sace)?|s(?:sociates|ia)?|p(?:artments|p)|r(?:chi|my|pa)?|(?:msterda)?m|q(?:uarelle)?|t(?:torney)?|d(?:ult|s)?|n(?:droid)?|e(?:ro|g)?|g(?:ency)?|z(?:ure)?|fl?|xa?)|m(?:[cdghklnpqrsvwxyz]|o(?:n(?:tblanc|ash|ey)|v(?:i(?:star|e))?|r(?:tgage|mon)|torcycles|scow|bi|da|e)?|a(?:r(?:ket(?:ing|s)?|riott)|n(?:agement|go)|i(?:son|f)|drid)?|e(?:m(?:orial|e)|lbourne|dia|nu?|et)?|i(?:(?:am|n)i|crosoft|l)|t(?:pc|n)?|u(?:seum)?|ma?|ba)|p(?:[efgkmnstwy]|r(?:o(?:d(?:uctions)?|pert(?:ies|y)|f)?|axi|ess)?|h(?:oto(?:graphy|s)?|armacy|ilips|ysio)?|a(?:r(?:t(?:(?:ner)?s|y)|is)|nerai|ge)?|i(?:c(?:t(?:ures|et)|s)|aget|zza|nk)|l(?:u(?:mbing|s)|a(?:ce|y))?|o(?:ker|hl|rn|st)|ub)|g(?:[fhnpqstwy]|o(?:[pv]|l(?:d(?:point)?|f)|o(?:g(?:le)?)?)|r(?:a(?:phic|ti)s|een|ipe)?|a(?:l(?:lery)?|rden|me)?|u(?:i(?:tars|de)|ge|ru)?|l(?:ob(?:al|o)|ass|e)?|e(?:nt(?:ing)?)?|i(?:fts?|ves)?|m(?:[ox]|ail)?|b(?:iz)?|g(?:ee)?|dn?)|t(?:[cdfgjklmntvwz]|e(?:ch(?:nology)?|l(?:efonica)?|masek|nnis|am)|o(?:(?:ol|ur|y)s|[dr]ay|shiba|kyo|wn|p)?|r(?:a(?:d(?:ing|e)|ining|vel)|ust)?|i(?:(?:cket|p)s|r(?:es|ol)|enda)|a(?:t(?:too|ar)|ipei|xi?)|h(?:eater|d)?|ui)|f(?:[jm]|i(?:nanc(?:ial|e)|sh(?:ing)?|t(?:ness)?|rmdale|lm)?|o(?:r(?:sale|ex|um)|o(?:tball)?|undation)?|l(?:o(?:rist|wers)|smidth|ights|y)|a(?:i(?:th|l)|shion|ns?|rm)|u(?:rniture|tbol|nd)|r(?:ogans|l)?|(?:eedbac)?k|yi)|d(?:[jmz]|e(?:nt(?:ist|al)|si(?:gn)?|livery|mocrat|gree|als|v)?|i(?:(?:scoun|e)t|rect(?:ory)?|amonds|gital)|o(?:(?:main|c)s|wnload|osan|ha|g)?|a(?:[dy]|t(?:ing|sun|e)|bur|nce)|(?:cl)?k|urban|rive|vag|np)|e(?:[ceg]|n(?:gineer(?:ing)?|terprises|ergy)|x(?:p(?:osed|ress|ert)|change)|u(?:rovision|s)?|ve(?:rbank|nts)|(?:quipmen)?t|du(?:cation)?|m(?:erck|ail)|s(?:tate|q)?|a(?:rth|t)|r(?:ni)?|pson)|l(?:[bckrvy]|i(?:m(?:ited|o)|ghting|[fv]e|aison|dl|nk)?|a(?:t(?:robe)?|w(?:yer)?|caixa|salle|nd)?|o(?:tt[eo]|ans?|ndon|ve|l)|u(?:x(?:ury|e)|pin)?|e(?:clerc|ase|gal)|t(?:da)?|d?s|gbt)|r(?:e(?:p(?:ublican|air|ort)|n(?:t(?:als)?)?|s(?:tauran)?t|alt(?:or|y)|d(?:stone)?|i(?:sen?|t)|views?|cipes|hab)?|i(?:[op]|co?h)|o(?:cks|deo)?|u(?:hr|n)?|s(?:vp)?|acing|yukyu|w)|h(?:[kmnrtu]|o(?:l(?:dings|iday)|t(?:eles|mail)|me(?:depot|s)|st(?:ing)?|[ru]se|ckey|nda|w)|e(?:r(?:mes|e)|althcare|lp)|a(?:mburg|ngout|us)|i(?:tachi|phop|v)|sbc)|i(?:[deloqs]|n(?:[gk]|(?:vestment|dustrie)s|t(?:ernational)?|s(?:titut|ur)e|f(?:initi|o))?|m(?:mo(?:bilien)?)?|(?:ine)?t|c(?:bc|u)|r(?:ish)?|[bf]m|wc)|v(?:[cgu]|i(?:s(?:ta(?:print)?|ion)|(?:aje|lla)s|deo)?|e(?:(?:nture|ga)s|rsicherung|t)?|o(?:t(?:[eo]|ing)|yage|dka)|(?:laandere)?n|a(?:cations)?)|w(?:[fs]|e(?:b(?:site|cam)|d(?:ding)?|ir)|i(?:n(?:dows)?|lliamhill|en|ki)|a(?:l(?:ter|es)|tch|ng)|or(?:ks?|ld)|hoswho|t[cf]|me)|n(?:[flopuz]|e(?:t(?:(?:ban|wor)k)?|ustar|ws?|xus|c)?|a(?:goya|dex|me|vy)?|i(?:ssan|nja|co)?|r[aw]?|go?|y?c|hk|tt)|o(?:r(?:a(?:cl|ng)e|g(?:anic)?)|(?:(?:tsu|sa)k|kinaw)a|n(?:[eg]|l(?:ine)?)|m(?:ega)?|ffice|oo|vh)|k(?:[eghmpwz]|i(?:tchen|wi|m)?|o(?:matsu|eln)|(?:aufe)?n|r(?:e?d)?|y(?:oto)?|ddi)|j(?:o(?:b(?:urg|s))?|e(?:welry|tzt)?|p(?:rs)?|l[cl]|uegos|ava|cb|m)|y(?:[et]|o(?:(?:koham|g)a|dobashi|utube)|a(?:chts|ndex))|u(?:[agksyz]|n(?:iversity|o)|ol)|x(?:(?:(?:er|b)o|x)x|in|yz)|z(?:[amw]|uerich|one|ip)|q(?:uebec|pon|a)))|\u4F5B\u5C71|\u6148\u5584|\u96C6\u56E2|\u5728\u7EBF|\uD55C\uAD6D|\u09AD\u09BE\u09B0\u09A4|\u516B\u5366|\u0645\u0648\u0642\u0639|\u516C\u76CA|\u516C\u53F8|\u79FB\u52A8|\u6211\u7231\u4F60|\u043C\u043E\u0441\u043A\u0432\u0430|\u049B\u0430\u0437|\u043E\u043D\u043B\u0430\u0439\u043D|\u0441\u0430\u0439\u0442|\u0441\u0440\u0431|\u0431\u0435\u043B|\u65F6\u5C1A|\u6DE1\u9A6C\u9521|\u043E\u0440\u0433|\uC0BC\uC131|\u0B9A\u0BBF\u0B99\u0BCD\u0B95\u0BAA\u0BCD\u0BAA\u0BC2\u0BB0\u0BCD|\u5546\u6807|\u5546\u5E97|\u5546\u57CE|\u0434\u0435\u0442\u0438|\u043C\u043A\u0434|\u5DE5\u884C|\u4E2D\u6587\u7F51|\u4E2D\u4FE1|\u4E2D\u56FD|\u4E2D\u570B|\u5A31\u4E50|\u8C37\u6B4C|\u0C2D\u0C3E\u0C30\u0C24\u0C4D|\u0DBD\u0D82\u0D9A\u0DCF|\u0AAD\u0ABE\u0AB0\u0AA4|\u092D\u093E\u0930\u0924|\u7F51\u5E97|\u0938\u0902\u0917\u0920\u0928|\u9910\u5385|\u7F51\u7EDC|\u0443\u043A\u0440|\u9999\u6E2F|\u98DE\u5229\u6D66|\u53F0\u6E7E|\u53F0\u7063|\u624B\u673A|\u043C\u043E\u043D|\u0627\u0644\u062C\u0632\u0627\u0626\u0631|\u0639\u0645\u0627\u0646|\u0627\u06CC\u0631\u0627\u0646|\u0627\u0645\u0627\u0631\u0627\u062A|\u0628\u0627\u0632\u0627\u0631|\u0627\u0644\u0627\u0631\u062F\u0646|\u0628\u06BE\u0627\u0631\u062A|\u0627\u0644\u0645\u063A\u0631\u0628|\u0627\u0644\u0633\u0639\u0648\u062F\u064A\u0629|\u0633\u0648\u062F\u0627\u0646|\u0645\u0644\u064A\u0633\u064A\u0627|\u653F\u5E9C|\u0634\u0628\u0643\u0629|\u10D2\u10D4|\u673A\u6784|\u7EC4\u7EC7\u673A\u6784|\u5065\u5EB7|\u0E44\u0E17\u0E22|\u0633\u0648\u0631\u064A\u0629|\u0440\u0443\u0441|\u0440\u0444|\u062A\u0648\u0646\u0633|\u307F\u3093\u306A|\u30B0\u30FC\u30B0\u30EB|\u4E16\u754C|\u0A2D\u0A3E\u0A30\u0A24|\u7F51\u5740|\u6E38\u620F|verm\u00F6gensberater|verm\u00F6gensberatung|\u4F01\u4E1A|\u4FE1\u606F|\u0645\u0635\u0631|\u0642\u0637\u0631|\u5E7F\u4E1C|\u0B87\u0BB2\u0B99\u0BCD\u0B95\u0BC8|\u0B87\u0BA8\u0BCD\u0BA4\u0BBF\u0BAF\u0BBE|\u0570\u0561\u0575|\u65B0\u52A0\u5761|\u0641\u0644\u0633\u0637\u064A\u0646|\u653F\u52A1) For example, you can drop this into Mike's proposed regex like this, although I'll note that it suffers the same bug I point out above that it usually pulls in one more trailing character than it should: var re_weburl = /(?:(?:https?|ftp):\/\/)?(?:\S+(?::\S*)?@)?(?:(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)*(?:\.(((?:c(?:[cdgkmnvwxz]|o(?:n(?:s(?:truction|ulting)|(?:tractor|do)s)|m(?:m(?:unity|bank)|p(?:uter|any))?|u(?:(?:pon|rse)s|ntry)|(?:l(?:leg|ogn)|ffe)e|o(?:[lp]|king)|rsica|ach|des)?|a(?:[bl]|r(?:e(?:ers?)?|avan|tier|d?s)|n(?:cerresearch|on)|p(?:etown|ital)|s(?:[ah]|ino)|t(?:ering)?|m(?:era|p)|fe)?|h(?:r(?:istmas|ome)|a(?:nnel|t)|urch|eap|loe)?|l(?:o(?:thing|ud)|i(?:nic|ck)|eaning|aims|ub)?|r(?:edit(?:card)?|(?:uise)?s|icket|own)?|i(?:t(?:ic|y)|sco)?|y(?:(?:mr|o)u)?|e(?:nter|rn|o)|u(?:isinella)?|f[ad]?|b[an])|s(?:[bdgjlmrvxz]|c(?:[ab]|h(?:o(?:larships|ol)|midt|warz|ule)|ience|o[rt])?|a(?:ndvik(?:coromant)?|arland|msung|kura|le|rl|xo|p)?|o(?:l(?:utions|ar)|c(?:cer|ial)|ftware|n?y|hu)?|u(?:pp(?:l(?:ies|y)|ort)|r(?:gery|f)|zuki|cks)?|t(?:a(?:rhub|toil)|ud(?:io|y)|yle)?|h(?:o(?:es|w)|iksha|riram)?|p(?:readbetting|iegel|ace)|e(?:rvices|ner|xy?|at|w)?|k(?:y(?:pe)?|i)?|y(?:stems|dney)?|i(?:ngles|te)?|w(?:atch|iss)|n(?:cf)?)|b(?:[dfgjstvwy]|a(?:r(?:c(?:lay(?:card|s)|elona)|gains)?|n[dk]|uhaus|yern)?|r(?:o(?:th|k)er|idgestone|adesco|ussels)?|u(?:ild(?:ers)?|dapest|siness|zz)|l(?:ack(?:friday)?|oomberg|ue)|i(?:[doz]|(?:bl|k)e|ngo?)?|e(?:ntley|rlin|er|st)?|o(?:utique|ats|nd|o)?|n(?:pparibas|l)?|b(?:va|c)?|h(?:arti)?|mw?|zh?|cn)|a(?:[ow]|c(?:c(?:ountants?|enture)|t(?:ive|or)|ademy)?|i(?:r(?:force|tel)|g)?|b(?:b(?:ott)?|ogado)|u(?:ction|tos?|dio)?|l(?:lfinanz|sace)?|s(?:sociates|ia)?|p(?:artments|p)|r(?:chi|my|pa)?|(?:msterda)?m|q(?:uarelle)?|t(?:torney)?|d(?:ult|s)?|n(?:droid)?|e(?:ro|g)?|g(?:ency)?|z(?:ure)?|fl?|xa?)|m(?:[cdghklnpqrsvwxyz]|o(?:n(?:tblanc|ash|ey)|v(?:i(?:star|e))?|r(?:tgage|mon)|torcycles|scow|bi|da|e)?|a(?:r(?:ket(?:ing|s)?|riott)|n(?:agement|go)|i(?:son|f)|drid)?|e(?:m(?:orial|e)|lbourne|dia|nu?|et)?|i(?:(?:am|n)i|crosoft|l)|t(?:pc|n)?|u(?:seum)?|ma?|ba)|p(?:[efgkmnstwy]|r(?:o(?:d(?:uctions)?|pert(?:ies|y)|f)?|axi|ess)?|h(?:oto(?:graphy|s)?|armacy|ilips|ysio)?|a(?:r(?:t(?:(?:ner)?s|y)|is)|nerai|ge)?|i(?:c(?:t(?:ures|et)|s)|aget|zza|nk)|l(?:u(?:mbing|s)|a(?:ce|y))?|o(?:ker|hl|rn|st)|ub)|g(?:[fhnpqstwy]|o(?:[pv]|l(?:d(?:point)?|f)|o(?:g(?:le)?)?)|r(?:a(?:phic|ti)s|een|ipe)?|a(?:l(?:lery)?|rden|me)?|u(?:i(?:tars|de)|ge|ru)?|l(?:ob(?:al|o)|ass|e)?|e(?:nt(?:ing)?)?|i(?:fts?|ves)?|m(?:[ox]|ail)?|b(?:iz)?|g(?:ee)?|dn?)|t(?:[cdfgjklmntvwz]|e(?:ch(?:nology)?|l(?:efonica)?|masek|nnis|am)|o(?:(?:ol|ur|y)s|[dr]ay|shiba|kyo|wn|p)?|r(?:a(?:d(?:ing|e)|ining|vel)|ust)?|i(?:(?:cket|p)s|r(?:es|ol)|enda)|a(?:t(?:too|ar)|ipei|xi?)|h(?:eater|d)?|ui)|f(?:[jm]|i(?:nanc(?:ial|e)|sh(?:ing)?|t(?:ness)?|rmdale|lm)?|o(?:r(?:sale|ex|um)|o(?:tball)?|undation)?|l(?:o(?:rist|wers)|smidth|ights|y)|a(?:i(?:th|l)|shion|ns?|rm)|u(?:rniture|tbol|nd)|r(?:ogans|l)?|(?:eedbac)?k|yi)|d(?:[jmz]|e(?:nt(?:ist|al)|si(?:gn)?|livery|mocrat|gree|als|v)?|i(?:(?:scoun|e)t|rect(?:ory)?|amonds|gital)|o(?:(?:main|c)s|wnload|osan|ha|g)?|a(?:[dy]|t(?:ing|sun|e)|bur|nce)|(?:cl)?k|urban|rive|vag|np)|e(?:[ceg]|n(?:gineer(?:ing)?|terprises|ergy)|x(?:p(?:osed|ress|ert)|change)|u(?:rovision|s)?|ve(?:rbank|nts)|(?:quipmen)?t|du(?:cation)?|m(?:erck|ail)|s(?:tate|q)?|a(?:rth|t)|r(?:ni)?|pson)|l(?:[bckrvy]|i(?:m(?:ited|o)|ghting|[fv]e|aison|dl|nk)?|a(?:t(?:robe)?|w(?:yer)?|caixa|salle|nd)?|o(?:tt[eo]|ans?|ndon|ve|l)|u(?:x(?:ury|e)|pin)?|e(?:clerc|ase|gal)|t(?:da)?|d?s|gbt)|r(?:e(?:p(?:ublican|air|ort)|n(?:t(?:als)?)?|s(?:tauran)?t|alt(?:or|y)|d(?:stone)?|i(?:sen?|t)|views?|cipes|hab)?|i(?:[op]|co?h)|o(?:cks|deo)?|u(?:hr|n)?|s(?:vp)?|acing|yukyu|w)|h(?:[kmnrtu]|o(?:l(?:dings|iday)|t(?:eles|mail)|me(?:depot|s)|st(?:ing)?|[ru]se|ckey|nda|w)|e(?:r(?:mes|e)|althcare|lp)|a(?:mburg|ngout|us)|i(?:tachi|phop|v)|sbc)|i(?:[deloqs]|n(?:[gk]|(?:vestment|dustrie)s|t(?:ernational)?|s(?:titut|ur)e|f(?:initi|o))?|m(?:mo(?:bilien)?)?|(?:ine)?t|c(?:bc|u)|r(?:ish)?|[bf]m|wc)|v(?:[cgu]|i(?:s(?:ta(?:print)?|ion)|(?:aje|lla)s|deo)?|e(?:(?:nture|ga)s|rsicherung|t)?|o(?:t(?:[eo]|ing)|yage|dka)|(?:laandere)?n|a(?:cations)?)|w(?:[fs]|e(?:b(?:site|cam)|d(?:ding)?|ir)|i(?:n(?:dows)?|lliamhill|en|ki)|a(?:l(?:ter|es)|tch|ng)|or(?:ks?|ld)|hoswho|t[cf]|me)|n(?:[flopuz]|e(?:t(?:(?:ban|wor)k)?|ustar|ws?|xus|c)?|a(?:goya|dex|me|vy)?|i(?:ssan|nja|co)?|r[aw]?|go?|y?c|hk|tt)|o(?:r(?:a(?:cl|ng)e|g(?:anic)?)|(?:(?:tsu|sa)k|kinaw)a|n(?:[eg]|l(?:ine)?)|m(?:ega)?|ffice|oo|vh)|k(?:[eghmpwz]|i(?:tchen|wi|m)?|o(?:matsu|eln)|(?:aufe)?n|r(?:e?d)?|y(?:oto)?|ddi)|j(?:o(?:b(?:urg|s))?|e(?:welry|tzt)?|p(?:rs)?|l[cl]|uegos|ava|cb|m)|y(?:[et]|o(?:(?:koham|g)a|dobashi|utube)|a(?:chts|ndex))|u(?:[agksyz]|n(?:iversity|o)|ol)|x(?:(?:(?:er|b)o|x)x|in|yz)|z(?:[amw]|uerich|one|ip)|q(?:uebec|pon|a)))|\u4F5B\u5C71|\u6148\u5584|\u96C6\u56E2|\u5728\u7EBF|\uD55C\uAD6D|\u09AD\u09BE\u09B0\u09A4|\u516B\u5366|\u0645\u0648\u0642\u0639|\u516C\u76CA|\u516C\u53F8|\u79FB\u52A8|\u6211\u7231\u4F60|\u043C\u043E\u0441\u043A\u0432\u0430|\u049B\u0430\u0437|\u043E\u043D\u043B\u0430\u0439\u043D|\u0441\u0430\u0439\u0442|\u0441\u0440\u0431|\u0431\u0435\u043B|\u65F6\u5C1A|\u6DE1\u9A6C\u9521|\u043E\u0440\u0433|\uC0BC\uC131|\u0B9A\u0BBF\u0B99\u0BCD\u0B95\u0BAA\u0BCD\u0BAA\u0BC2\u0BB0\u0BCD|\u5546\u6807|\u5546\u5E97|\u5546\u57CE|\u0434\u0435\u0442\u0438|\u043C\u043A\u0434|\u5DE5\u884C|\u4E2D\u6587\u7F51|\u4E2D\u4FE1|\u4E2D\u56FD|\u4E2D\u570B|\u5A31\u4E50|\u8C37\u6B4C|\u0C2D\u0C3E\u0C30\u0C24\u0C4D|\u0DBD\u0D82\u0D9A\u0DCF|\u0AAD\u0ABE\u0AB0\u0AA4|\u092D\u093E\u0930\u0924|\u7F51\u5E97|\u0938\u0902\u0917\u0920\u0928|\u9910\u5385|\u7F51\u7EDC|\u0443\u043A\u0440|\u9999\u6E2F|\u98DE\u5229\u6D66|\u53F0\u6E7E|\u53F0\u7063|\u624B\u673A|\u043C\u043E\u043D|\u0627\u0644\u062C\u0632\u0627\u0626\u0631|\u0639\u0645\u0627\u0646|\u0627\u06CC\u0631\u0627\u0646|\u0627\u0645\u0627\u0631\u0627\u062A|\u0628\u0627\u0632\u0627\u0631|\u0627\u0644\u0627\u0631\u062F\u0646|\u0628\u06BE\u0627\u0631\u062A|\u0627\u0644\u0645\u063A\u0631\u0628|\u0627\u0644\u0633\u0639\u0648\u062F\u064A\u0629|\u0633\u0648\u062F\u0627\u0646|\u0645\u0644\u064A\u0633\u064A\u0627|\u653F\u5E9C|\u0634\u0628\u0643\u0629|\u10D2\u10D4|\u673A\u6784|\u7EC4\u7EC7\u673A\u6784|\u5065\u5EB7|\u0E44\u0E17\u0E22|\u0633\u0648\u0631\u064A\u0629|\u0440\u0443\u0441|\u0440\u0444|\u062A\u0648\u0646\u0633|\u307F\u3093\u306A|\u30B0\u30FC\u30B0\u30EB|\u4E16\u754C|\u0A2D\u0A3E\u0A30\u0A24|\u7F51\u5740|\u6E38\u620F|verm\u00F6gensberater|verm\u00F6gensberatung|\u4F01\u4E1A|\u4FE1\u606F|\u0645\u0635\u0631|\u0642\u0637\u0631|\u5E7F\u4E1C|\u0B87\u0BB2\u0B99\u0BCD\u0B95\u0BC8|\u0B87\u0BA8\u0BCD\u0BA4\u0BBF\u0BAF\u0BBE|\u0570\u0561\u0575|\u65B0\u52A0\u5761|\u0641\u0644\u0633\u0637\u064A\u0646|\u653F\u52A1)).?)(?::\d{2,5})?(?:[/?#]\S*)?/gi; That said, I think the regex Mike found can be simplified a bit. The searching for numeric IPs, for example, seems like overkill -- and it accounts for the majority of the regex.

Comment 23

•

10 years ago

Attached file Script to create the regex (obsolete) — Details

By request, here's the perl script I used to create the regex in my previous comment.

Comment 24

•

10 years ago

Attached file Script to create the regex — Details

This time with the script instead of its output. :)

Attachment #8633735 - Attachment is obsolete: true

Updated

•

10 years ago

Iteration: 42.1 - Jul 13 → 42.2 - Jul 27

Assignee

Updated

•

10 years ago

Depends on: 1186240

Assignee

Updated

•

10 years ago

Blocks: 1186245

Assignee

Updated

•

10 years ago

Blocks: 1186254

Assignee

Comment 26

•

10 years ago

Attached patch Make links clickable (obsolete) — Details — Splinter Review

Attachment #8636862 - Flags: review?(standard8)

Assignee

Comment 27

•

10 years ago

There are a still a few more tests I want to write (particularly around special character handling and attempted HTML injection), but I'd like to get the ball rolling on review so that this hits the tree sooner rather than later. I'll attach those tests as a separate patch in this bug, and I'll land the two of them together.

Reporter

Comment 28

•

10 years ago

Comment on attachment 8636862 [details] [diff] [review] Make links clickable Review of attachment 8636862 [details] [diff] [review]: ----------------------------------------------------------------- I've not yet tested this, no reviewed the test for the linkified view, but I think there's a reasonable amount of cleanup to do in the comments. ::: browser/components/loop/content/shared/js/linkifiedTextView.jsx @@ +1,4 @@ > +// The monster regexp we're using is derived from Diego Perini's code, > +// currently available at https://gist.github.com/dperini/729294 > +// > +// So, we'll put this file under that license (we may later wish to extract I think we should just split this out now to avoid any possible license confusion. We've just removed three files from the standalone in the feedback rework patch, so having two files would mean we're at a net of one less which is still an improvement. Granted we want to get as much as we can, but I don't think its worth sacrificing possible license clarity to obtain that. LinkifiedTextView could always go into views.jsx anyway (with the parser going into utils.js) if we really wanted to drop down to two less. @@ +89,5 @@ > + * Find the first URL in a string > + * > + * @param {String} s - the string to be searched for a URL > + * @returns {{matchedLink: string, // the first link found > + * index: number}} // zero-based starting index of matchLink This is quite strange from a documentation point of view. Traditionally we've done something like: * @returns {Object} The result which contains * - {String} matchedLink The first link found * - {Number} index Zero-based starting index of matchLink To me that reads better than the example you have - however, I've also just found: http://usejsdoc.org/tags-param.html#parameters-with-properties which might be even nicer. @@ +100,5 @@ > + } > + > + return { > + matchedLink: result[0], > + index: result.index // index into s where matchedLink begins The document here seems to be a duplicate of the jsdoc at the start of the function. If its really necessary, I think it'd be better on the previous line. @@ +113,5 @@ > + * > + * Uses LinkifiedTextParser for the actual parsing. > + */ > + var LinkifiedTextView = React.createClass( > + { nit: our normal convention is to put the { on the same line as the createClass. @@ +115,5 @@ > + */ > + var LinkifiedTextView = React.createClass( > + { > + propTypes: { > + // the text to be linkified Nit: please make this a proper sentence. @@ +117,5 @@ > + { > + propTypes: { > + // the text to be linkified > + rawText: React.PropTypes.string.isRequired, > + // should the links send a referrer? defaults to false. nit: Capital S @@ +119,5 @@ > + // the text to be linkified > + rawText: React.PropTypes.string.isRequired, > + // should the links send a referrer? defaults to false. > + sendReferrer: React.PropTypes.bool, > + // should we suppress target="_blank" on the link? defaults to false, nit: captial S. @@ +129,5 @@ > + React.addons.PureRenderMixin > + ], > + > + getInitialState: function() { > + return { parser: new LinkifiedTextParser() }; Why does this need to be a state? It doesn't change at all. Wouldn't a member variable be better? @@ +142,5 @@ > + linkAttributes.target = "_blank"; > + } > + > + if (!this.props.sendReferrer) { > + linkAttributes.rel = "noreferrer"; note to self: check this on desktop. @@ +157,5 @@ > + * > + * @returns {Array} of strings and React <a> elements in order. > + */ > + parseStringToElements: function (s) { > + nit: prefer no blank line at start of function (this file is also inconsistent with itself). @@ +196,5 @@ > + } > + > + // no URL was matched; terminate the recursion and return the final text > + return [s]; > + nit: no blank line necessary after return. @@ +200,5 @@ > + > + }, > + > + render: function () { > + nit: prefer no blank line at start of function @@ +204,5 @@ > + > + var elements = this.parseStringToElements(this.props.rawText); > + > + // React needs all elements of an array to be keyed for efficient > + // rendering nit: missing dot. ::: browser/components/loop/ui/ui-showcase.jsx @@ +284,5 @@ > }); > > + // XXX get rid of hardcoded string constants being passed to > + // _appendTextChatMessage > + textChatStore.receivedTextChatMessage = function (actionData) { This seems to be a bad duplicate of the receivedTectChatMessage function, I don't understand why it is necessary. @@ +294,5 @@ > + this._appendTextChatMessage("recv", actionData); > + }; > + > + textChatStore.sendTextChatMessage = function (actionData) { > + this._appendTextChatMessage("sent", actionData); This seems to be working around the fact that sendTextChatMessage is sending the message - which gets sent to itself via the mockSDK - which you're modifying above. Hence, I'm confused as to what this two overrides are really doing, or if they are really the right thing.

Attachment #8636862 - Flags: review?(standard8)

Assignee

Comment 29

•

10 years ago

(In reply to Mark Banner (:standard8) from comment #28) > ::: browser/components/loop/content/shared/js/linkifiedTextView.jsx > @@ +1,4 @@ > > +// The monster regexp we're using is derived from Diego Perini's code, > > +// currently available at https://gist.github.com/dperini/729294 > > +// > > +// So, we'll put this file under that license (we may later wish to extract > > I think we should just split this out now to avoid any possible license > confusion. OK, done. > @@ +89,5 @@ > > + * Find the first URL in a string > > + * > > + * @param {String} s - the string to be searched for a URL > > + * @returns {{matchedLink: string, // the first link found > > + * index: number}} // zero-based starting index of matchLink > > This is quite strange from a documentation point of view. Traditionally > we've done something like: > > * @returns {Object} The result which contains > * - {String} matchedLink The first link found > * - {Number} index Zero-based starting index of matchLink > > To me that reads better than the example you have I picked that syntax out from other places in usejsdoc.org, and picked it because it should be machine-parsable as well as human-readable, once you get used to it (our traditional form seems unlikely to be well-understood by the various IDE and other jsdoc parsers). > - however, I've also just > found: > > http://usejsdoc.org/tags-param.html#parameters-with-properties > > which might be even nicer. It's not obvious to me how we would use that with a retval, since those are unnamed. Right now, I've switched to this: /** * Find the first URL in a string * * @param {String} s - the string to be searched for a URL * * @typedef LinkableResult * @type Object * @property {string} matchedLink - the first link found * @property {number} index - zero-based starting index of matchedLink in s * * @returns {LinkableResult|null} */ Thoughts? > ::: browser/components/loop/ui/ui-showcase.jsx > @@ +284,5 @@ > > }); > > > > + // XXX get rid of hardcoded string constants being passed to > > + // _appendTextChatMessage > > + textChatStore.receivedTextChatMessage = function (actionData) { > > This seems to be a bad duplicate of the receivedTectChatMessage function, I > don't understand why it is necessary. > > @@ +294,5 @@ > > + this._appendTextChatMessage("recv", actionData); > > + }; > > + > > + textChatStore.sendTextChatMessage = function (actionData) { > > + this._appendTextChatMessage("sent", actionData); > > This seems to be working around the fact that sendTextChatMessage is sending > the message - which gets sent to itself via the mockSDK - which you're > modifying above. > > Hence, I'm confused as to what this two overrides are really doing, or if > they are really the right thing. Right now, text chat date stamps are busted in the UI showcase (a bunch of things show up as "Invalid date"). IIRC, part of this is related to the fact that it wasn't easy to override the "receive" stuff in the SDK driver, because that's triggered somewhere inconvenient from a data channel. However, you're quite right that this is a mess. I'll try and figure out something better after lunch. I kinda suspect the right thing to do is just make it easier to override the receive action in the SDK driver. All other stuff has been fixed/addressed. I'll reupload the patch after I sort the showcase stuff.

Assignee

Comment 30

•

10 years ago

Attached patch make links in chat clickable (obsolete) — Details — Splinter Review

Attachment #8637619 - Flags: review?(standard8)

Assignee

Updated

•

10 years ago

Attachment #8636862 - Attachment is obsolete: true

Assignee

Updated

•

10 years ago

Attachment #8637619 - Flags: review?(gerv)

Assignee

Comment 31

•

10 years ago

OK, all comments from the previous review have been addressed. I've also added integration tests to check that pasted HTML is not injected into the real DOM, but is instead rendered as HTML source code. Gerv, if you could rs the license.html change, that'd be great. I didn't need to add the Autolinker.js stuff, as that's only tests, and isn't shipping as part of Firefox.

Gervase Markham [:gerv]

Comment 32

•

10 years ago

Comment on attachment 8637619 [details] [diff] [review] make links in chat clickable Review of attachment 8637619 [details] [diff] [review]: ----------------------------------------------------------------- ::: toolkit/content/license.html @@ +4213,5 @@ > + > +Author: Diego Perini > +Updated: 2010/12/05 > +License: MIT > + Cut out these top six lines and just start with the Copyright line below. Then, r=gerv. Gerv

Attachment #8637619 - Flags: review?(gerv) → review+

Reporter

Comment 33

•

10 years ago

Comment on attachment 8637619 [details] [diff] [review] make links in chat clickable Review of attachment 8637619 [details] [diff] [review]: ----------------------------------------------------------------- I don't have time to do justice on reviewing this at the moment, so passing to Mike.

Attachment #8637619 - Flags: review?(standard8) → review?(mdeboer)

Comment 34

•

10 years ago

Comment on attachment 8637619 [details] [diff] [review] make links in chat clickable Review of attachment 8637619 [details] [diff] [review]: ----------------------------------------------------------------- The approach looks good to me, I think we can simplify it a bit more. ::: browser/components/loop/content/shared/js/linkifiedTextView.jsx @@ +73,5 @@ > + }, > + > + _generateLinkAttributes: function(href) { > + var linkAttributes = { > + href: href Please be aware that on desktop, we use `mozLoop.openURL()` to open links, to prevent certain security vulnerabilities. @@ +95,5 @@ > + * @param {String} s - the raw string to be parsed > + * > + * @returns {Array} of strings and React <a> elements in order. > + */ > + parseStringToElements: function (s) { Since we're dealing with tail-recursion here, it's not necessary to make this recursive or introduce a separate parser object for this. ```js var elements = []; var result; while (result = loop.shared.urlRegExps.fullUrlMatch.exec(s)) { if (result.index) { elements.push(s.substr(0, result.index)); s = s.substr(result.index); } elements.push( <a { ...this._generateLinkAttributes(result[0]) }> {result[0]} </a> ); s = s.substr(result[0].length); } if (s) { elements.push(s); } return elements; ``` @@ +152,5 @@ > + > + return ( > + > + { keyedElements } > + nit: I think this'll fit on a single line ;-) ::: browser/components/loop/content/shared/js/urlRegExps.js @@ +1,2 @@ > +// This is derived from Diego Perini's code, > +// currently available at https://gist.github.com/dperini/729294 Please move the regex to utils.js. A single regex is not worth having its own file for. Plus, but not the main reason to ask for this, until we concatenate our JS files for the standalone, introducing more files to load is not free. @@ +26,5 @@ > +// NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT > +// HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, > +// WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING > +// FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR > +// OTHER DEALINGS IN THE SOFTWARE. If this is really just a 1:1 copy of the MIT license text, then I'm really unsure whether the added value of having it here is valuable. If possible, I'd like to remove it. @@ +40,5 @@ > + // > + // * https://regex101.com/#javascript > + // * https://www.debuggex.com/ > + // * http://regviz.org/ > + // * http://scraping.pro/10-best-online-regex-testers/ Is this something you added yourself? I think this stuff is better suited to be placed on a wiki, instead of in-code. URLs can vanish at any point, so I'd rather not have them here... @@ +43,5 @@ > + // * http://regviz.org/ > + // * http://scraping.pro/10-best-online-regex-testers/ > + > + var fullUrlMatch = new RegExp( > + // protocol identifier nit: 'Protocol identifier.' Please apply this nit throughout this file. ::: browser/components/loop/test/shared/linkifiedTextView_test.js @@ +1,1 @@ > +/* You forgot to add our own CC blob :) @@ +27,5 @@ > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, > + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > + * SOFTWARE. > + */ > +describe("loop.shared.views.LinkifiedTextView", function () { General comment for these tests: I'd prefer to use fictional URLs, instead of 'yahoo' and/ or 'sencha'.

Attachment #8637619 - Flags: review?(mdeboer) → feedback+

Updated

•

10 years ago

Iteration: 42.2 - Jul 27 → 42.3 - Aug 10

Assignee

Comment 35

•

10 years ago

(In reply to Mike de Boer [:mikedeboer] from comment #34) > ::: browser/components/loop/content/shared/js/linkifiedTextView.jsx > @@ +73,5 @@ > > + }, > > + > > + _generateLinkAttributes: function(href) { > > + var linkAttributes = { > > + href: href > > Please be aware that on desktop, we use `mozLoop.openURL()` to open links, > to prevent certain security vulnerabilities. Thanks for the heads up; I wasn't aware of that. I've added code to handle this. > @@ +95,5 @@ > > + * @param {String} s - the raw string to be parsed > > + * > > + * @returns {Array} of strings and React <a> elements in order. > > + */ > > + parseStringToElements: function (s) { > > Since we're dealing with tail-recursion here, it's not necessary to make > this recursive or introduce a separate parser object for this. My previous version was indeed making premature assumptions here, particularly in light of your idea of doing post-matching fixup rather than breaking up the regexp itself, which I like. Fixed. > @@ +152,5 @@ > > + > > + return ( > > + > > + { keyedElements } > > + > > nit: I think this'll fit on a single line ;-) Heh; fixed. > ::: browser/components/loop/content/shared/js/urlRegExps.js > @@ +1,2 @@ > > +// This is derived from Diego Perini's code, > > +// currently available at https://gist.github.com/dperini/729294 > > Please move the regex to utils.js. A single regex is not worth having its > own file for. > Plus, but not the main reason to ask for this, until we concatenate our JS > files for the standalone, introducing more files to load is not free. After discussion with Mike, we decided to leave this alone, given the previous review comments from Mark and Gerv. > @@ +40,5 @@ > > + // > > + // * https://regex101.com/#javascript > > + // * https://www.debuggex.com/ > > + // * http://regviz.org/ > > + // * http://scraping.pro/10-best-online-regex-testers/ > > Is this something you added yourself? I think this stuff is better suited to > be placed on a wiki, instead of in-code. URLs can vanish at any point, so > I'd rather not have them here... OK, put it up in the Loop section of wikimo at https://wiki.mozilla.org/Loop/Development/RegExpDebugging and linked to that from the code for discoverability. > @@ +43,5 @@ > > + // * http://regviz.org/ > > + // * http://scraping.pro/10-best-online-regex-testers/ > > + > > + var fullUrlMatch = new RegExp( > > + // protocol identifier > > nit: 'Protocol identifier.' Please apply this nit throughout this file. Fixed. > ::: browser/components/loop/test/shared/linkifiedTextView_test.js > @@ +1,1 @@ > > +/* > > You forgot to add our own CC blob :) It's not obvious to me how anyone would be able to figure out which changes in the file were MIT and which were CC, so it feels like the pragmatic thing to do is just put it all under MIT. Am I missing something? > @@ +27,5 @@ > > +describe("loop.shared.views.LinkifiedTextView", function () { > > General comment for these tests: I'd prefer to use fictional URLs, instead > of 'yahoo' and/ or 'sencha'. Fixed.

Assignee

Comment 36

•

10 years ago

Attached patch make links in chat clickable (obsolete) — Details — Splinter Review

Attachment #8637619 - Attachment is obsolete: true

Attachment #8640935 - Flags: review?(mdeboer)

Comment 37

•

10 years ago

Comment on attachment 8640935 [details] [diff] [review] make links in chat clickable Review of attachment 8640935 [details] [diff] [review]: ----------------------------------------------------------------- Cool! r=me with comments addressed. As a side-note: I don't quite see _why_ the MIT license text needs to spelled out in its entirety in two places within this patch. It's not like there are multiple. ::: browser/components/loop/content/shared/js/linkifiedTextView.jsx @@ +5,5 @@ > +var loop = loop || {}; > +loop.shared = loop.shared || {}; > +loop.shared.views = loop.shared.views || {}; > +loop.shared.views.linkifiedText = (function(mozL10n) { > + nit: superfluous newline. @@ +15,5 @@ > + * links inside a container. > + */ > + var LinkifiedTextView = React.createClass({ > + propTypes: { > + // Call this instead of allowing the default <a> click semantics, if given nit: missing dot. @@ +19,5 @@ > + // Call this instead of allowing the default <a> click semantics, if given > + linkClickHandler: React.PropTypes.func, > + // The text to be linkified. > + rawText: React.PropTypes.string.isRequired, > + // Should the links send a referrer? defaults to false. nit: 'Defaults'. @@ +21,5 @@ > + // The text to be linkified. > + rawText: React.PropTypes.string.isRequired, > + // Should the links send a referrer? defaults to false. > + sendReferrer: React.PropTypes.bool, > + // Should we suppress target="_blank" on the link? defaults to false, nit: 'Defaults'. @@ +22,5 @@ > + rawText: React.PropTypes.string.isRequired, > + // Should the links send a referrer? defaults to false. > + sendReferrer: React.PropTypes.bool, > + // Should we suppress target="_blank" on the link? defaults to false, > + // mostly for testing use. nit: 'Mostly'. @@ +60,5 @@ > + /** > + * Parse the given string into an array of strings and React <a> elements > + * in the order in which they should be rendered (i.e. FIFO). > + * > + * @param {String} s - the raw string to be parsed nit: I'm used to: '@param {String} s The raw string to be parsed.' @@ +64,5 @@ > + * @param {String} s - the raw string to be parsed > + * > + * @returns {Array} of strings and React <a> elements in order. > + */ > + parseStringToElements: function (s) { nit: please unify the function declarations in this file. I recommend `function(s) {`, as that's most common in loop/. @@ +68,5 @@ > + parseStringToElements: function (s) { > + var elements = []; > + var result; > + > + result = loop.shared.urlRegExps.fullUrlMatch.exec(s); You might as well make this `var result = ...` and remove the `var` declaration here. @@ +71,5 @@ > + > + result = loop.shared.urlRegExps.fullUrlMatch.exec(s); > + while (result) { > + // if there's text preceding the first link, push it onto the array > + // and update the string pointer nit: missing dot. @@ +77,5 @@ > + elements.push(s.substr(0, result.index)); > + s = s.substr(result.index); > + } > + > + // Push the first link itself, and advance the string pointer again nit: missing dot. @@ +85,5 @@ > + </a> > + ); > + s = s.substr(result[0].length); > + > + // check for another link, and perhaps continue... nit: 'Check'. @@ +101,5 @@ > + var elements = this.parseStringToElements(this.props.rawText); > + > + // React needs all elements of an array to be keyed for efficient > + // rendering. > + var keyedElements = elements.map( Why not keep a counter in `parseStringToElements` and add the `key` attribute to the constructed <a> directly? Saves you another loop. @@ +111,5 @@ > + }); > + > + return ( { keyedElements } ); > + } > + }); nit: indentation. @@ +113,5 @@ > + return ( { keyedElements } ); > + } > + }); > + > + return { Please just return `LinkifiedTextView`, since that's the only component this file will ever export. ::: browser/components/loop/content/shared/js/textChatView.jsx @@ +57,5 @@ > }); > > + var optionalProps = {}; > + if (navigator.mozLoop) { > + optionalProps.linkClickHandler = navigator.mozLoop.openURL; I'd recommend adding this as an optional property that is passed along down from <TextChatView/>. However, since we've got the unified, shared mediaView, we need to discuss how we should approach this better - because passing down props through a dozen components is not sustainable. In the E10S work/ bug I made a sharedUtils.isDesktop() function, which could be an improvement and the RPC mechanism allows for more shallow abstractions. </long_story> ;-) @@ +58,5 @@ > > + var optionalProps = {}; > + if (navigator.mozLoop) { > + optionalProps.linkClickHandler = navigator.mozLoop.openURL; > + } This still sets the `target` and `referer` attributes on the links, which are not necessary anymore on desktop. @@ +64,3 @@ > return ( > <div className={classes}> > + <sharedViews.linkifiedText.View {...optionalProps} ...so this will become `<sharedViews.LinkifiedText .../>` ::: browser/components/loop/test/shared/textChatView_test.js @@ +251,5 @@ > expect(node.querySelector(".text-chat-entry-timestamp")).to.not.eql(null); > }); > + > + // note that this is really an integration test to be sure that we don't > + // inadvertently regress using linkifiedText.View. Please update this comment when you update linkifiedTextView.jsx...

Attachment #8640935 - Flags: review?(mdeboer) → review+

Assignee

Comment 38

•

10 years ago

Mentioned nits have been fixed. (In reply to Mike de Boer [:mikedeboer] from comment #37) > As a side-note: I don't quite see _why_ the MIT license text needs to > spelled out in its entirety in two places within this patch. It's not like > there are multiple. You might well be right, but we do seem to treat different variants of the same license as multiple in license.html, though perhaps that's not entirely analogous. That said, I feel like the pragmatic choice is not to spend more energy on this, so I'm going to leave it as-is. > @@ +101,5 @@ > > + var elements = this.parseStringToElements(this.props.rawText); > > + > > + // React needs all elements of an array to be keyed for efficient > > + // rendering. > > + var keyedElements = elements.map( > > Why not keep a counter in `parseStringToElements` and add the `key` > attribute to the constructed <a> directly? Saves you another loop. Good catch; done. This was a leftover from an earlier iteration. > @@ +113,5 @@ > > + return ( { keyedElements } ); > > + } > > + }); > > + > > + return { > > Please just return `LinkifiedTextView`, since that's the only component this > file will ever export. Done, along with other related comments. > ::: browser/components/loop/content/shared/js/textChatView.jsx > @@ +57,5 @@ > > }); > > > > + var optionalProps = {}; > > + if (navigator.mozLoop) { > > + optionalProps.linkClickHandler = navigator.mozLoop.openURL; > > I'd recommend adding this as an optional property that is passed along down > from <TextChatView/>. > However, since we've got the unified, shared mediaView, we need to discuss > how we should approach this better - because passing down props through a > dozen components is not sustainable. > In the E10S work/ bug I made a sharedUtils.isDesktop() function, which could > be an improvement and the RPC mechanism allows for more shallow abstractions. > > </long_story> ;-) Agreed that our current setup is not ideal, and that passing a zillion props down the tree is not ideal either. Upcoming cleanup of React.context is likely to help us out here. > @@ +58,5 @@ > > > > + var optionalProps = {}; > > + if (navigator.mozLoop) { > > + optionalProps.linkClickHandler = navigator.mozLoop.openURL; > > + } > > This still sets the `target` and `referer` attributes on the links, which > are not necessary anymore on desktop. Added a test and code to avoid this.

Assignee

Comment 39

•

10 years ago

Attached patch make links in Hello chat clickable, r=mikedeboer, r=gerv for license.html changes (obsolete) — Details — Splinter Review

Updated with review comments addressed; carrying forward r=mikedeboer & r=gerv for license changes

Attachment #8640935 - Attachment is obsolete: true

Attachment #8642614 - Flags: review+

https://hg.mozilla.org/integration/fx-team/rev/ece4d8f79cfc

Assignee

Comment 40

•

10 years ago

Attached patch make links in Hello chat clickable, r=mikedeboer, r=gerv for license.html changes — Details — Splinter Review

Updated version with additions to code-coverage configuration

Attachment #8642614 - Attachment is obsolete: true

Attachment #8642646 - Flags: review+

Pulsebot

Comment 41

•

10 years ago

Carsten Book [:Tomcat]

Comment 42

•

9 years ago

https://hg.mozilla.org/mozilla-central/rev/ece4d8f79cfc

Status: NEW → RESOLVED

Closed: 9 years ago

status-firefox42: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla42

Assignee

Comment 43

•

9 years ago

Attached patch Make Hello chat links clickable (Aurora 41 port) — Details — Splinter Review

Approval Request Comment [Feature/regressing bug #]: Bug 1176280 (this bug) [User impact if declined]: Text Chat for Hello goes out with Firefox 41. Auto-linkification of at least http(s) URLs is a table-stakes feature for any sort of web-based chat, and one that plenty of Nightly/Aurora users have complained about missing. User impact if declined is that the premier of the text chat feature for Hello will feel incomplete. [Describe test coverage new/current, TreeHerder]: Test coverage for both JS files added here hits 100% of the statements, according to our code-coverage tool. Testing includes coverage for ASCII links being properly linkified, as well as several tests that sample HTML strings are not vulnerable to CSRF attacks on the UI, and it only linkifies stuff starting with http:// and https://, to avoid attacks by other protocols. These tests are all run on treeherder with each checkin. [Risks and why]: This patch has been baking on nightly for most of a week. No bugs have been found and filed during that time. It is a heuristic-based approach, so most of the risk is that the heuristics are sometimes wrong, and things that should be linkified aren't, or things are linkified incorrectly, leading to users having pasted links somtimes not work. This is very standard in the industry (eg looking at Apple and Facebook auto-linkification shows a various edge case errors). The known cases of that are listed in bug 1186254, and are believed to be within tolerable limits. [String/UUID change made/needed]: None.

Attachment #8645218 - Flags: approval-mozilla-aurora?

Sylvestre Ledru [:Sylvestre]

Assignee

Comment 44

•

9 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=9cdd2fde417f is a try server of this commit on Aurora. I went through all the oranges; they're all either known intermittents, or in no way plausibly related to this commit. In other words, things look clear for landing this after getting approval.

Updated

•

9 years ago

status-firefox41: --- → affected

Ritu Kothari (:ritu) (Inactive, please n-i to RyanVM, jcristau, or pascal)

Comment 45

•

9 years ago

Comment on attachment 8645218 [details] [diff] [review] Make Hello chat links clickable (Aurora 41 port) [Triage Comment] Approved for uplift to Beta41. Dev has confirmed that the fix was locally verified and automated test coverage is at 100% pass.

Attachment #8645218 - Flags: approval-mozilla-aurora? → approval-mozilla-beta+

[SV Manager] Florin Mezei, QA (:FlorinMezei)

Comment 46

•

9 years ago

Looks like this has pretty good automated coverage. CC'ing Bogdan so he's aware of this implementation.

Flags: qe-verify-

Flags: in-testsuite?

https://hg.mozilla.org/releases/mozilla-beta/rev/54ff2c2fb59c

Reporter

Comment 47

•

9 years ago

status-firefox41: affected → fixed