Closed Bug 317408 Opened 15 years ago Closed 14 years ago

URL bar should be able to display 'safe' non-ASCII characters unencoded?


(Firefox :: General, enhancement)






(Reporter: usenet, Assigned: usenet)



In the current Mozilla implementation, it is possible to enter "URLs" containing non-ASCII Unicode characters into the URL bar, either by typing or cutting-and-pasting. These characters are then converted into valid URLs that represent these characters as percent-encoded UTF-8 bytes.

The reverse transformation should also be possible; that is, when a URL containing valid percent-encoded UTF-8 characters is to be displayed in the user interface, it should be possible to present this to the user using the original unencoded Unicode characters, to the extent that it is safe to do so. Only percent-encodings of valid UTF-8 sequences should be translated, and not invalid encodings of other byte sequences. Any sequences of percent-escapes that are not safe to display as native non-ASCII Unicode characters should remain displayed in their percent-encoded form.
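
A minimal sketch of the decoding step described above, in Python rather than Mozilla's own code. The function name `decode_valid_utf8_escapes` is made up for illustration, and the sketch only covers UTF-8 validity: it does not decide whether the resulting characters are safe to display. It only looks at escapes of bytes 0x80 and above, so escaped ASCII is left alone, and any byte that is not part of a valid UTF-8 sequence is written back as a percent-escape (normalised to uppercase hex).

```python
import re

# Runs of percent-escapes whose byte values are 0x80 or above.
# Escaped ASCII bytes (for example %2F) are deliberately never matched.
_HIGH_ESCAPES = re.compile(r"(?:%[89A-Fa-f][0-9A-Fa-f])+")


def decode_valid_utf8_escapes(url):
    """Show valid percent-encoded UTF-8 as text; leave everything else encoded."""

    def _decode_run(match):
        raw = bytes.fromhex(match.group(0).replace("%", ""))
        # surrogateescape maps each undecodable byte 0xNN to U+DCNN, so the
        # invalid bytes can be picked out afterwards and re-escaped.
        text = raw.decode("utf-8", errors="surrogateescape")
        out = []
        for ch in text:
            code = ord(ch)
            if 0xDC80 <= code <= 0xDCFF:
                out.append("%{:02X}".format(code - 0xDC00))
            else:
                out.append(ch)
        return "".join(out)

    return _HIGH_ESCAPES.sub(_decode_run, url)
```

With this sketch, `caf%C3%A9` would be shown as `café`, while a stray `%FF` or an overlong sequence such as `%C0%AF` stays percent-encoded.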

However, defining what constitutes a "safe" character for display in this context may be a hard problem; see the work on the IDN code elsewhere for discussion of this. Nevertheless, if these problems can be resolved, this might be a useful feature.

Potential benefits: display of non-ASCII text in URLs in native format for non-English readers
Potential risks: spoofing of text in URLs, or URL protocol characters, by using visual spoof character sequences, similar to those of concern in IDNs.
A possible approach for determining safe characters for display as Unicode:

1. Any Unicode codepoint that is not transformed into itself by NAMEPREP would be deemed to be "unsafe". This would mean that composing-character-sequences would not be allowed, but precomposed characters would be OK.

2. The path/query part of the URL should be divided into "labels" separated by any of the unencoded URL special characters "/+?&", and then only displayed as Unicode on a "label"-by-"label" basis if every character in the label is "safe" and the entire label also meets the script-mixing restrictions of section 3 of

3. and in addition, a label will not be displayed in Unicode form unless every character in it can be displayed by the browser's text-rendering engine
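
A rough sketch of steps 1 and 2 in Python, under several assumptions. It uses the `nameprep` function from Python's standard `encodings.idna` module as a stand-in for the NAMEPREP check, applies that check only to non-ASCII codepoints, and adds an NFKC comparison on the whole label so that composing-character sequences are rejected. The names `display_label` and `display_path` are hypothetical. The script-mixing restriction in step 2 and the rendering check in step 3 are not implemented here.

```python
import re
import unicodedata
from encodings.idna import nameprep

_SEPARATORS = re.compile(r"([/+?&])")  # the capture group keeps the separators
_HIGH_ESCAPES = re.compile(r"(?:%[89A-Fa-f][0-9A-Fa-f])+")


def _codepoint_is_stable(ch):
    """Step 1: a codepoint is 'safe' only if Nameprep maps it to itself."""
    try:
        return nameprep(ch) == ch
    except UnicodeError:  # raised for characters Nameprep prohibits
        return False


def display_label(label):
    """Return the Unicode form of one label, or the label unchanged."""
    try:
        decoded = _HIGH_ESCAPES.sub(
            lambda m: bytes.fromhex(m.group(0).replace("%", "")).decode("utf-8"),
            label,
        )
    except UnicodeDecodeError:  # not valid UTF-8: keep the escapes
        return label
    stable = all(ord(c) < 0x80 or _codepoint_is_stable(c) for c in decoded)
    # Assumption: a label that NFKC would change (e.g. "e" + combining
    # acute) counts as a composing sequence and stays encoded.
    composed = unicodedata.normalize("NFKC", decoded) == decoded
    return decoded if stable and composed else label


def display_path(path_and_query):
    """Step 2: split on the unencoded characters / + ? & and decide per label."""
    parts = _SEPARATORS.split(path_and_query)
    # With a capturing split, odd indices hold the separators themselves.
    return "".join(
        part if i % 2 else display_label(part) for i, part in enumerate(parts)
    )
```

Under these assumptions, a label containing an escaped fullwidth solidus (`%EF%BC%8F`) stays encoded, because Nameprep maps that character to an ordinary "/", while the other labels of the same URL can still be shown in Unicode.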

Examples of URL strings that would be eligible to be displayed in human-readable Unicode form using this approach would be

...and of course, 
* any whitespace or control characters should be viewed as "unsafe" since, alas, Nameprep will let them through, as-is
* as should any one-byte-ASCII characters that have been percent-escaped, since this is typically done deliberately to prevent them from being interpreted as syntax metacharacters in URLs
* as should any character in the IDN Unicode-display blacklist
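
These extra checks could be sketched in Python roughly as follows. The function names are hypothetical, and `EXAMPLE_BLACKLIST` is only a two-character placeholder for illustration, not the actual IDN display blacklist, which would have to be taken from the browser's configuration.

```python
import unicodedata

# Placeholder only (fraction slash, division slash); NOT the real
# IDN Unicode-display blacklist.
EXAMPLE_BLACKLIST = frozenset("\u2044\u2215")


def is_extra_unsafe(ch, blacklist=EXAMPLE_BLACKLIST):
    """True for whitespace, control characters and blacklisted characters."""
    category = unicodedata.category(ch)
    return (
        ch.isspace()
        or category.startswith("Z")  # space, line and paragraph separators
        or category == "Cc"          # control characters
        or ch in blacklist
    )


def is_escaped_ascii(escape):
    """True if a single escape such as '%2F' encodes a one-byte ASCII value."""
    return int(escape[1:], 16) < 0x80
```

A label would then stay percent-encoded if any decoded character makes `is_extra_unsafe` return true, and any escape for which `is_escaped_ascii` is true would never be decoded in the first place.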
The name-filtering enhancements mentioned in bug 355416 will greatly help defining what is a "safe" URL.
Depends on: 355416
Neil, you marked this bug as "assigned", but it's assigned to "nobody".
Bugs without a real assignee should not be in "assigned" state.
Perhaps you meant to assign it to yourself?
Assignee: nobody → usenet
Assigning to myself. Thanks, Nelson, for pointing that out: that was what I had originally intended, but clearly I didn't get it right.
Trying again to assign this bug to myself!
The bug has already been resolved! Look at this:
Fixed on trunk in bug 105909, using code based on the extension utf16@ linked to.
Closed: 14 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 105909