Closed Bug 591953 Opened 9 years ago Closed 9 years ago

"Open link" context menu should be activated for urls with dashes.

Categories

(Firefox :: Menus, defect)

defect
Not set

Tracking

()

VERIFIED FIXED
Firefox 7

People

(Reporter: sylvain.pasche, Assigned: dao)

References

(Depends on 1 open bug, )

Details

Attachments

(1 file, 1 obsolete file)

It's valid to have dashes in urls, and selecting such urls (such as foo-bar.com) should show the "Open link" context menu, like it does for urls without dashes (google.com).
Duplicate of this bug: 601063
Duplicate of this bug: 662350
Assignee: nobody → dao
Status: NEW → ASSIGNED
Attachment #537631 - Flags: review?(gavin.sharp)
Does this patch fixes this bug for IDN urls?
(In reply to comment #3)
> Created attachment 537631 [details] [diff] [review] [review]
> accept hyphen and reject trailing non-alpha character

If you want to be really pedantic (er, I mean, accurate), it ought to be:
/^(?:[a-z\d](?:[a-z\d-]*[a-z\d])*)(?:\.[a-z\d](?:[a-z\d-]*[a-z\d])*)+$/i

Underscores are invalid (so so \w), and the hyphen could not be the first or last character of each dot-limited section.
> Underscores are invalid (so so \w), and the hyphen could not be the first or
> last character of each dot-limited section.

s/so so/so no/
a-z\d is too restrictive. \w may match special alphabetical characters beyond a-z.
(In reply to comment #7)
> a-z\d is too restrictive. \w may match special alphabetical characters
> beyond a-z.

Such as?  In both Perl, \w is equivalent to [a-z0-9_], and that appears to be the case for Gecko's JS engine as well:
  alert(/\w/.test("eee")); -> true
  alert(/\w/.test("ééé")); -> false

(I assume you meant accented characters by "special alphabetical characters"?  That would be an IDN, which would be unwieldy to cover using re.)
Ack, s/both//.  I really should proofread before hitting that button!
Such as ü:
data:text/html;charset=UTF-8,www.günteramendt.de

é works as well.
Err, I think günteramendt.de in www.günteramendt.de matches because of \D\S*. I'm not sure under which conditions SpiderMonkey expands \w, or if it stopped doing so. Anyway, I don't think we need to exclude '_', prevent '-' from leading in a segment, etc.
This handles umlauts, accents, etc.. Since this makes the character class more complex anyway, I also replaced \w with a-z\d.
Attachment #537631 - Attachment is obsolete: true
Attachment #537664 - Flags: review?(gavin.sharp)
Attachment #537631 - Flags: review?(gavin.sharp)
I still don't think that re is the best way to handle IDNs: there's more that we need to handle beyond accented Latin characters.

E.g., президент.рф (bug 612146) or the .中国 TLD that the Chinese recently started using.

I think the best way to do it is something along the lines of:

try {
  linkText = Components.classes["@mozilla.org/network/idn-service;1"].
             getService(Components.interfaces.nsIIDNService).
             convertUTF8toACE(linkText);
} catch(e) {
  linkText = "";
}

// Standard regexp test here

convertUTF8toACE will throw an exception if there are invalid characters or if it can't un-IDN the string and then we can run re on the results as usual to catch the things that convertUTF8toACE doesn't catch (e.g., the presence of whitespace).

This would properly fix bug 612146 and make Alexander L. Slovesnik (comment 4) happy.
Why not implement bug 540787 solution and use the method? I mean, if we're doing this already...
(In reply to comment #14)
> Why not implement bug 540787 solution and use the method? I mean, if we're
> doing this already...

The most that could do is replace the re portion of this thing.  When I skimmed through the implementation of findURLInPlaintext, it did not appear to be IDN-aware.  And I am not convinced that it would do a much better job than a re (the one in comment 5 conforms precisely to the relevant RFCs; you can't really improve on that).  It may be slightly more efficient than a re, but in situation like this, that is just background noise (since this is matching over a short string and only when the context menu is activated).
(In reply to comment #13)
> I still don't think that re is the best way to handle IDNs: there's more
> that we need to handle beyond accented Latin characters.
> 
> E.g., президент.рф (bug 612146) or the .中国 TLD that the Chinese recently
> started using.
[...]
> This would properly fix bug 612146 and make Alexander L. Slovesnik (comment
> 4) happy.

I didn't set out to fix bug 612146. Bug 612146 can deal with that.
(In reply to comment #16)
> I didn't set out to fix bug 612146. Bug 612146 can deal with that.

Okay, that makes sense organizationally; the reason I suggested that was because you were making accommodations for a subset of IDN domains, and I thought it would be easier to get it all done in one swoop instead of making a re change here and then immediately changing it again for 612146.
Comment on attachment 537664 [details] [diff] [review]
accept hyphens and umlauts/accents/etc., reject trailing non-alpha character

\xc0-\xff seems kind of arbitrary (why stop at 0xff?). Can we omit that for the moment and leave it to bug 612146?
Attachment #537664 - Flags: review?(gavin.sharp) → review+
http://hg.mozilla.org/mozilla-central/rev/e92f3523a97c
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → Firefox 7
Mozilla/5.0 (Windows NT 5.1; rv:7.0a1) Gecko/20110627 Firefox/7.0a1

Verified issue on Ubuntu 11, WinXP, Mac OS X 10.6, Win 7. Selecting URL <foo-bar.com> show "Open link" in the context menu.
Status: RESOLVED → VERIFIED
Depends on: 694856
You need to log in before you can comment on or make changes to this bug.