Closed
Bug 317408
Opened 19 years ago
Closed 18 years ago
URL bar should be able to display 'safe' non-ASCII characters unencoded?
Categories
(Firefox :: General, enhancement)
RESOLVED
DUPLICATE
of bug 105909
People
(Reporter: usenet, Assigned: usenet)
In the current Mozilla implementation, it is possible to enter "URLs" containing non-ASCII Unicode characters into the URL bar, either by typing or by cutting and pasting. Such strings are then converted into valid URLs in which those characters are represented as percent-encoded UTF-8 bytes.
The reverse transformation should also be possible; that is, when a URL containing valid percent-encoded UTF-8 characters is to be displayed in the user interface, it should be possible to present this to the user using the original unencoded Unicode characters, to the extent that it is safe to do so. Only percent-encodings of valid UTF-8 sequences should be translated, and not invalid encodings of other byte sequences. Any sequences of percent-escapes that are not safe to display as native non-ASCII Unicode characters should remain displayed in their percent-encoded form.
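As a rough illustration of the "valid UTF-8 only" part of this proposal, here is a minimal Python sketch. The function names are hypothetical and this is not Mozilla code. It assumes that only multi-byte (non-ASCII) sequences are candidates for decoding, so escaped ASCII bytes such as %2F stay encoded, and it deliberately does not address the question of which characters are "safe".

```python
import re

# A run of one or more consecutive percent-escapes, e.g. "%C5%82%C3%B3".
_ESCAPE_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def _multibyte_char_at(raw: bytes, i: int):
    """Return (char, byte_length) if a valid multi-byte UTF-8 sequence
    starts at raw[i], otherwise None."""
    for length in (2, 3, 4):
        chunk = raw[i:i + length]
        if len(chunk) < length:
            return None
        try:
            ch = chunk.decode("utf-8")  # strict: rejects overlong forms etc.
        except UnicodeDecodeError:
            continue
        if len(ch) == 1:
            return ch, length
    return None


def _decode_run(match: re.Match) -> str:
    run = match.group(0)
    escapes = [run[i:i + 3] for i in range(0, len(run), 3)]
    raw = bytes(int(e[1:], 16) for e in escapes)
    out, i = [], 0
    while i < len(raw):
        found = _multibyte_char_at(raw, i)
        if found is None:
            # Not part of a valid multi-byte sequence: keep the escape as written.
            out.append(escapes[i])
            i += 1
        else:
            ch, length = found
            out.append(ch)
            i += length
    return "".join(out)


def decode_valid_utf8_escapes(url: str) -> str:
    """Show valid non-ASCII UTF-8 escapes as characters; leave the rest encoded."""
    return _ESCAPE_RUN.sub(_decode_run, url)


print(decode_valid_utf8_escapes(
    "http://pl.wikipedia.org/wiki/Strona_g%C5%82%C3%B3wna"))
```

Invalid byte sequences (for example a lone %FF or the overlong form %C0%AF) fall through unchanged, which matches the requirement that anything not decodable as valid UTF-8 stays in percent-encoded form.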
However, defining what constitutes a "safe" character for display in this context may be a hard problem; see the work on the IDN code elsewhere for discussion of this. Nevertheless, if these problems can be resolved, this might be a useful feature.
Potential benefits: display of non-ASCII text in URLs in native format for non-English readers
Potential risks: spoofing of text in URLs, or URL protocol characters, by using visual spoof character sequences, similar to those of concern in IDNs.
Comment 1•19 years ago
A possible approach for determining safe characters for display as Unicode:
1. Any Unicode codepoint that is not transformed into itself by NAMEPREP would be deemed to be "unsafe". This would mean that composing-character-sequences would not be allowed, but precomposed characters would be OK.
2. The path/query part of the URL should be divided into "labels" separated by any of the unencoded URL special characters "/+?&", and then only displayed as Unicode on a "label"-by-"label" basis if every character in the label is "safe" and the entire label also meets the script-mixing restrictions of section 3 of http://www.icann.org/general/idn-guidelines-14nov05.htm
3. In addition, a label will not be displayed in Unicode form unless every character in it can be displayed by the browser's text-rendering engine.
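The label-by-label idea could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: it uses the standard library's `encodings.idna.nameprep` as a stand-in for the NAMEPREP check, applies that check to each decoded non-ASCII character individually, and omits the script-mixing and font-rendering checks entirely because the standard library offers no data for them. The names `display_label` and `display_url` are hypothetical. As a conservative assumption, any label that contains an escaped ASCII byte is left fully encoded.

```python
import re
from encodings.idna import nameprep
from urllib.parse import unquote, urlsplit, urlunsplit

_SEPARATORS = re.compile(r"([/+?&])")          # label boundaries from step 2
_ESCAPED_ASCII = re.compile(r"%[0-7][0-9A-Fa-f]")


def survives_nameprep(ch: str) -> bool:
    """True if NAMEPREP maps this single character to itself."""
    try:
        return nameprep(ch) == ch
    except UnicodeError:  # character prohibited by NAMEPREP
        return False


def display_label(label: str) -> str:
    """Return the Unicode form of a label if every decoded non-ASCII
    character passes the check; otherwise return it unchanged."""
    if _ESCAPED_ASCII.search(label):
        return label  # assumption: never unescape ASCII
    try:
        decoded = unquote(label, encoding="utf-8", errors="strict")
    except UnicodeDecodeError:
        return label  # not valid UTF-8
    if all(survives_nameprep(ch) for ch in decoded if ord(ch) > 0x7F):
        return decoded
    return label


def display_url(url: str) -> str:
    """Apply display_label to each label of the path and query only."""
    parts = urlsplit(url)

    def convert(component: str) -> str:
        pieces = _SEPARATORS.split(component)
        return "".join(p if p in "/+?&" else display_label(p) for p in pieces)

    return urlunsplit((parts.scheme, parts.netloc,
                       convert(parts.path), convert(parts.query),
                       parts.fragment))


print(display_url("http://pl.wikipedia.org/wiki/Strona_g%C5%82%C3%B3wna"))
```

Because the decision is all-or-nothing per label, one rejected character (for example an uppercase letter that NAMEPREP would case-fold) keeps the whole label in percent-encoded form while neighbouring labels can still be shown in Unicode.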
Examples of URL strings that would be eligible to be displayed in human-readable Unicode form using this approach would be
http://pl.wikipedia.org/wiki/Strona_g%C5%82%C3%B3wna
http://ja.wikipedia.org/wiki/%E3%83%A1%E3%82%A4%E3%83%B3%E3%83%9A%E3%83%BC%E3%82%B8
http://ka.wikipedia.org/wiki/%E1%83%9B%E1%83%97%E1%83%90%E1%83%95%E1%83%90%E1%83%A0%E1%83%98_%E1%83%92%E1%83%95%E1%83%94%E1%83%A0%E1%83%93%E1%83%98
Comment 2•19 years ago
...and of course,
* any whitespace or control characters should be viewed as "unsafe" since, alas, Nameprep will let them through, as-is
* as should any one-byte ASCII characters that have been percent-escaped, since this is typically done deliberately to prevent them from being interpreted as syntax metacharacters in URLs
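These two extra rules could be expressed as small predicates, sketched here in Python. The function names are hypothetical, and the choice of Unicode general categories used to approximate "whitespace or control" (Cc, Cf, Zs, Zl, Zp) is an assumption rather than anything specified in this bug.

```python
import re
import unicodedata

# A percent-escape whose byte value is below 0x80, i.e. an escaped ASCII byte.
_ESCAPED_ASCII = re.compile(r"%[0-7][0-9A-Fa-f]")

# Assumed approximation of "whitespace or control" via general categories.
_UNSAFE_CATEGORIES = {"Cc", "Cf", "Zs", "Zl", "Zp"}


def is_whitespace_or_control(ch: str) -> bool:
    """True for characters that should never be shown unencoded."""
    return ch.isspace() or unicodedata.category(ch) in _UNSAFE_CATEGORIES


def has_escaped_ascii(label: str) -> bool:
    """True if the still-encoded label contains an escaped ASCII byte."""
    return _ESCAPED_ASCII.search(label) is not None


print(is_whitespace_or_control("\u00a0"))  # NO-BREAK SPACE
print(has_escaped_ascii("a%2Fb"))
```

The first predicate runs on decoded characters, while the second has to run on the encoded label, since the distinction between "/" and "%2F" is lost once the label has been decoded.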
Comment 3•19 years ago
...as should any character in the IDN Unicode-display blacklist
Comment 4•18 years ago
The name-filtering enhancements mentioned in bug 355416 will greatly help in defining what is a "safe" URL.
Depends on: 355416
Updated•18 years ago
Status: NEW → ASSIGNED
Comment 5•18 years ago
Neil, you marked this bug as "assigned", but it's assigned to "nobody".
Bugs without a real assignee should not be in the "assigned" state.
Perhaps you meant to assign it to yourself?
Updated•18 years ago
Assignee: nobody → usenet
Status: ASSIGNED → NEW
Comment 6•18 years ago
Assigning to myself. Thanks, Nelson, for pointing that out: that was what I had originally intended, but clearly I didn't get it right.
The bug has already been resolved! Look at this: https://addons.mozilla.org/en-US/firefox/addon/4014
Comment 9•18 years ago
Fixed on trunk in bug 105909, using code based on the extension utf16@ linked to.
Status: ASSIGNED → RESOLVED
Closed: 18 years ago
Resolution: --- → DUPLICATE