Last Comment Bug 591953 - "Open link" context menu should be activated for urls with dashes.
: "Open link" context menu should be activated for urls with dashes.
Status: VERIFIED FIXED
:
Product: Firefox
Classification: Client Software
Component: Menus (show other bugs)
: Trunk
: All All
: -- normal with 9 votes (vote)
: Firefox 7
Assigned To: Dão Gottwald [:dao]
:
:
Mentors:
data:text/html,foo-bar.com<br><br>foo...
: 601063 662350 (view as bug list)
Depends on: 694856
Blocks: 515512
  Show dependency treegraph
 
Reported: 2010-08-30 09:22 PDT by Sylvain Pasche
Modified: 2011-10-16 08:22 PDT (History)
8 users (show)
dao+bmo: in‑testsuite+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
accept hyphen and reject trailing non-alpha character (2.86 KB, patch)
2011-06-06 13:50 PDT, Dão Gottwald [:dao]
no flags Details | Diff | Splinter Review
accept hyphens and umlauts/accents/etc., reject trailing non-alpha character (3.30 KB, patch)
2011-06-06 15:37 PDT, Dão Gottwald [:dao]
gavin.sharp: review+
Details | Diff | Splinter Review

Description Sylvain Pasche 2010-08-30 09:22:54 PDT
It's valid to have dashes in urls, and selecting such urls (such as foo-bar.com) should show the "Open link" context menu, like it does for urls without dashes (google.com).
Comment 1 Alexander L. Slovesnik 2010-11-22 04:41:30 PST
*** Bug 601063 has been marked as a duplicate of this bug. ***
Comment 2 Dão Gottwald [:dao] 2011-06-06 13:49:22 PDT
*** Bug 662350 has been marked as a duplicate of this bug. ***
Comment 3 Dão Gottwald [:dao] 2011-06-06 13:50:25 PDT
Created attachment 537631 [details] [diff] [review]
accept hyphen and reject trailing non-alpha character
Comment 4 Alexander L. Slovesnik 2011-06-06 14:04:25 PDT
Does this patch fixes this bug for IDN urls?
Comment 5 Kai Liu 2011-06-06 14:39:06 PDT
(In reply to comment #3)
> Created attachment 537631 [details] [diff] [review] [review]
> accept hyphen and reject trailing non-alpha character

If you want to be really pedantic (er, I mean, accurate), it ought to be:
/^(?:[a-z\d](?:[a-z\d-]*[a-z\d])*)(?:\.[a-z\d](?:[a-z\d-]*[a-z\d])*)+$/i

Underscores are invalid (so so \w), and the hyphen could not be the first or last character of each dot-limited section.
Comment 6 Kai Liu 2011-06-06 14:39:37 PDT
> Underscores are invalid (so so \w), and the hyphen could not be the first or
> last character of each dot-limited section.

s/so so/so no/
Comment 7 Dão Gottwald [:dao] 2011-06-06 14:54:28 PDT
a-z\d is too restrictive. \w may match special alphabetical characters beyond a-z.
Comment 8 Kai Liu 2011-06-06 15:02:50 PDT
(In reply to comment #7)
> a-z\d is too restrictive. \w may match special alphabetical characters
> beyond a-z.

Such as?  In both Perl, \w is equivalent to [a-z0-9_], and that appears to be the case for Gecko's JS engine as well:
  alert(/\w/.test("eee")); -> true
  alert(/\w/.test("ééé")); -> false

(I assume you meant accented characters by "special alphabetical characters"?  That would be an IDN, which would be unwieldy to cover using re.)
Comment 9 Kai Liu 2011-06-06 15:03:49 PDT
Ack, s/both//.  I really should proofread before hitting that button!
Comment 10 Dão Gottwald [:dao] 2011-06-06 15:11:47 PDT
Such as ü:
data:text/html;charset=UTF-8,www.günteramendt.de

é works as well.
Comment 11 Dão Gottwald [:dao] 2011-06-06 15:18:41 PDT
Err, I think günteramendt.de in www.günteramendt.de matches because of \D\S*. I'm not sure under which conditions SpiderMonkey expands \w, or if it stopped doing so. Anyway, I don't think we need to exclude '_', prevent '-' from leading in a segment, etc.
Comment 12 Dão Gottwald [:dao] 2011-06-06 15:37:53 PDT
Created attachment 537664 [details] [diff] [review]
accept hyphens and umlauts/accents/etc., reject trailing non-alpha character

This handles umlauts, accents, etc.. Since this makes the character class more complex anyway, I also replaced \w with a-z\d.
Comment 13 Kai Liu 2011-06-06 16:28:15 PDT
I still don't think that re is the best way to handle IDNs: there's more that we need to handle beyond accented Latin characters.

E.g., президент.рф (bug 612146) or the .中国 TLD that the Chinese recently started using.

I think the best way to do it is something along the lines of:

try {
  linkText = Components.classes["@mozilla.org/network/idn-service;1"].
             getService(Components.interfaces.nsIIDNService).
             convertUTF8toACE(linkText);
} catch(e) {
  linkText = "";
}

// Standard regexp test here

convertUTF8toACE will throw an exception if there are invalid characters or if it can't un-IDN the string and then we can run re on the results as usual to catch the things that convertUTF8toACE doesn't catch (e.g., the presence of whitespace).

This would properly fix bug 612146 and make Alexander L. Slovesnik (comment 4) happy.
Comment 14 Nochum Sossonko [:Natch] 2011-06-06 20:35:58 PDT
Why not implement bug 540787 solution and use the method? I mean, if we're doing this already...
Comment 15 Kai Liu 2011-06-06 20:51:48 PDT
(In reply to comment #14)
> Why not implement bug 540787 solution and use the method? I mean, if we're
> doing this already...

The most that could do is replace the re portion of this thing.  When I skimmed through the implementation of findURLInPlaintext, it did not appear to be IDN-aware.  And I am not convinced that it would do a much better job than a re (the one in comment 5 conforms precisely to the relevant RFCs; you can't really improve on that).  It may be slightly more efficient than a re, but in situation like this, that is just background noise (since this is matching over a short string and only when the context menu is activated).
Comment 16 Dão Gottwald [:dao] 2011-06-06 23:44:32 PDT
(In reply to comment #13)
> I still don't think that re is the best way to handle IDNs: there's more
> that we need to handle beyond accented Latin characters.
> 
> E.g., президент.рф (bug 612146) or the .中国 TLD that the Chinese recently
> started using.
[...]
> This would properly fix bug 612146 and make Alexander L. Slovesnik (comment
> 4) happy.

I didn't set out to fix bug 612146. Bug 612146 can deal with that.
Comment 17 Kai Liu 2011-06-07 00:06:40 PDT
(In reply to comment #16)
> I didn't set out to fix bug 612146. Bug 612146 can deal with that.

Okay, that makes sense organizationally; the reason I suggested that was because you were making accommodations for a subset of IDN domains, and I thought it would be easier to get it all done in one swoop instead of making a re change here and then immediately changing it again for 612146.
Comment 18 :Gavin Sharp [email: gavin@gavinsharp.com] 2011-06-20 13:31:06 PDT
Comment on attachment 537664 [details] [diff] [review]
accept hyphens and umlauts/accents/etc., reject trailing non-alpha character

\xc0-\xff seems kind of arbitrary (why stop at 0xff?). Can we omit that for the moment and leave it to bug 612146?
Comment 19 Dão Gottwald [:dao] 2011-06-21 03:01:37 PDT
http://hg.mozilla.org/mozilla-central/rev/e92f3523a97c
Comment 20 Simona B [:simonab ] 2011-06-28 04:42:41 PDT
Mozilla/5.0 (Windows NT 5.1; rv:7.0a1) Gecko/20110627 Firefox/7.0a1

Verified issue on Ubuntu 11, WinXP, Mac OS X 10.6, Win 7. Selecting URL <foo-bar.com> show "Open link" in the context menu.

Note You need to log in before you can comment on or make changes to this bug.