317408 - URL bar should be able to display 'safe' non-ASCII characters unencoded?

Assignee

Description

•

19 years ago

In the current Mozilla implementation, it is possible to enter "URLs" containing non-ASCII Unicode characters into the URL bar, either by typing or by cutting and pasting. These characters are then converted into valid URLs that represent them as percent-encoded UTF-8 bytes.

The reverse transformation should also be possible; that is, when a URL containing valid percent-encoded UTF-8 characters is to be displayed in the user interface, it should be possible to present it to the user using the original unencoded Unicode characters, to the extent that it is safe to do so. Only percent-encodings of valid UTF-8 sequences should be translated, not percent-encodings of invalid or other byte sequences. Any sequences of percent-escapes that are not safe to display as native non-ASCII Unicode characters should remain displayed in their percent-encoded form.

However, defining what constitutes "safe" characters for display in this context may be a hard problem; see the work on the IDN code elsewhere for discussion of this. Nevertheless, if these problems can be resolved, this might be a useful feature.

Potential benefits: display of non-ASCII text in URLs in native format for non-English readers.

Potential risks: spoofing of text in URLs, or of URL protocol characters, by using visual spoof character sequences, similar to those of concern in IDNs.
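As a rough illustration of the "only valid UTF-8" rule above, here is a minimal Python sketch. It is not the Mozilla implementation; the function name `decode_valid_utf8_escapes` is hypothetical. It makes two assumptions: only escapes of bytes >= 0x80 are candidates for decoding, and a run of escapes is decoded as a whole or not at all. It performs no "safe character" filtering.

```python
import re

# Runs of consecutive percent-escapes whose byte values are >= 0x80.
# Assumption of this sketch: escaped ASCII bytes are never candidates.
_HIGH_ESCAPES = re.compile(r"(?:%[89A-Fa-f][0-9A-Fa-f])+")


def decode_valid_utf8_escapes(url: str) -> str:
    """Decode escape runs that form valid UTF-8; leave all others as-is."""

    def replace(match: re.Match) -> str:
        run = match.group(0)
        raw = bytes.fromhex(run.replace("%", ""))
        try:
            # Strict decoding rejects truncated, overlong and otherwise
            # invalid sequences.
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            # Not valid UTF-8: keep the whole run percent-encoded.
            return run

    return _HIGH_ESCAPES.sub(replace, url)


if __name__ == "__main__":
    print(decode_valid_utf8_escapes("/wiki/Strona_g%C5%82%C3%B3wna"))
    print(decode_valid_utf8_escapes("/bad/%FF%FE"))
```

Treating each run as all-or-nothing is a conservative choice: a run that mixes valid and invalid bytes stays fully encoded rather than being partially decoded.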

Neil Harris

Assignee

Comment 1

•

19 years ago

A possible approach for determining safe characters for display as Unicode:

1. Any Unicode codepoint that is not transformed into itself by NAMEPREP would be deemed to be "unsafe". This would mean that composing-character-sequences would not be allowed, but precomposed characters would be OK.

2. The path/query part of the URL should be divided into "labels" separated by any of the unencoded URL special characters "/+?&", and then only displayed as Unicode on a "label"-by-"label" basis if every character in the label is "safe" and the entire label also meets the script-mixing restrictions of section 3 of http://www.icann.org/general/idn-guidelines-14nov05.htm

3. In addition, a label will not be displayed in Unicode form unless every character in it can be displayed by the browser's text-rendering engine.

Examples of URL strings that would be eligible to be displayed in human-readable Unicode form using this approach would be:

http://pl.wikipedia.org/wiki/Strona_g%C5%82%C3%B3wna
http://ja.wikipedia.org/wiki/%E3%83%A1%E3%82%A4%E3%83%B3%E3%83%9A%E3%83%BC%E3%82%B8
http://ka.wikipedia.org/wiki/%E1%83%9B%E1%83%97%E1%83%90%E1%83%95%E1%83%90%E1%83%A0%E1%83%98_%E1%83%92%E1%83%95%E1%83%94%E1%83%A0%E1%83%93%E1%83%98
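The label-by-label idea could be sketched in Python roughly as follows. This only approximates steps 1 and 2: it uses the standard `stringprep` tables and NFKC normalization as a stand-in for a full Nameprep pass, applies the stability check to non-ASCII characters only (an assumption, since literal ASCII is shown as-is anyway), and omits both the script-mixing restrictions and the font-coverage check of step 3. The names `char_is_safe`, `decode_label` and `display_form` are hypothetical.

```python
import re
import stringprep
import unicodedata

SEPARATORS = re.compile(r"([/+?&])")  # unencoded label separators
HIGH_ESCAPES = re.compile(r"(?:%[89A-Fa-f][0-9A-Fa-f])+")

# Prohibited-output tables used by Nameprep (RFC 3454, appendix C).
PROHIBITED = (
    stringprep.in_table_c12, stringprep.in_table_c22, stringprep.in_table_c3,
    stringprep.in_table_c4, stringprep.in_table_c5, stringprep.in_table_c6,
    stringprep.in_table_c7, stringprep.in_table_c8, stringprep.in_table_c9,
)


def char_is_safe(ch: str) -> bool:
    """Approximate 'is transformed into itself by Nameprep' for one character."""
    if ord(ch) < 0x80:
        return True  # assumption: literal ASCII is not subject to the check
    if stringprep.in_table_b1(ch) or stringprep.map_table_b2(ch) != ch:
        return False  # mapped to nothing, or case-folded to something else
    return not any(table(ch) for table in PROHIBITED)


def decode_label(label: str):
    """Decode non-ASCII escapes; return None if any run is not valid UTF-8."""
    try:
        return HIGH_ESCAPES.sub(
            lambda m: bytes.fromhex(m.group(0).replace("%", "")).decode("utf-8"),
            label,
        )
    except UnicodeDecodeError:
        return None


def display_form(path_and_query: str) -> str:
    """Show each label decoded only if the whole label passes the checks."""
    out = []
    for part in SEPARATORS.split(path_and_query):
        decoded = decode_label(part)
        safe = (
            decoded is not None
            # Whole-label check: rejects sequences that would be recomposed.
            and unicodedata.normalize("NFKC", decoded) == decoded
            and all(char_is_safe(ch) for ch in decoded)
        )
        out.append(decoded if safe else part)
    return "".join(out)


if __name__ == "__main__":
    print(display_form("/wiki/Strona_g%C5%82%C3%B3wna"))
    print(display_form("/wiki/e%CC%81/g%C5%82"))
```

In the second call the label `e%CC%81` (a base letter followed by a combining accent) stays encoded because it is not stable under normalization, while the neighbouring label is decoded independently.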

Neil Harris

Assignee

Comment 2

•

19 years ago

...and of course,

* any whitespace or control characters should be viewed as "unsafe" since, alas, Nameprep will let them through, as-is
* as should any one-byte ASCII characters that have been percent-escaped, since this is typically done deliberately to prevent them from being interpreted as syntax metacharacters in URLs
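These two extra rules might look like this in Python. This is a sketch only: it assumes Unicode general categories Zs, Zl, Zp and Cc are a reasonable reading of "whitespace or control characters", and the helper names are hypothetical.

```python
import re
import unicodedata

# Percent-escapes of one-byte ASCII values (%00 to %7F); these stay encoded.
ESCAPED_ASCII = re.compile(r"%[0-7][0-9A-Fa-f]")

# Assumption: separators (Zs, Zl, Zp) and controls (Cc) count as unsafe.
UNSAFE_CATEGORIES = {"Zs", "Zl", "Zp", "Cc"}


def has_whitespace_or_control(decoded_label: str) -> bool:
    """True if the decoded label contains any whitespace or control character."""
    return any(
        unicodedata.category(ch) in UNSAFE_CATEGORIES for ch in decoded_label
    )


def escaped_ascii(encoded_label: str) -> list:
    """List the escaped ASCII bytes that must remain percent-encoded."""
    return ESCAPED_ASCII.findall(encoded_label)


if __name__ == "__main__":
    print(has_whitespace_or_control("a\u3000b"))  # ideographic space
    print(escaped_ascii("a%2Fb%C5%82"))
```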

Neil Harris

Assignee

Comment 3

•

19 years ago

...as should any character in the IDN Unicode-display blacklist

Neil Harris

Assignee

Comment 4

•

18 years ago

The name-filtering enhancements mentioned in bug 355416 will greatly help in defining what is a "safe" URL.

Depends on: 355416

Neil Harris

Assignee

Updated

•

18 years ago

Status: NEW → ASSIGNED

Nelson Bolyard (seldom reads bugmail)

Comment 5

•

18 years ago

Neil, you marked this bug as "assigned", but it's assigned to "nobody". Bugs without a real assignee should not be in "assigned" state. Perhaps you meant to assign it to yourself?

Neil Harris

Assignee

Updated

•

18 years ago

Assignee: nobody → usenet

Status: ASSIGNED → NEW

Neil Harris

Assignee

Comment 6

•

18 years ago

Assigning to myself. Thanks, Nelson, for pointing that out; that was what I had originally intended, but clearly I didn't get it right.

Neil Harris

Assignee

Comment 7

•

18 years ago

Trying again to assign this bug to myself!

Status: NEW → ASSIGNED

utf16

Comment 8

•

18 years ago

The bug has already been resolved! Look at this: https://addons.mozilla.org/en-US/firefox/addon/4014

Jesse Ruderman

Comment 9

•

18 years ago

Fixed on trunk in bug 105909, using code based on the extension utf16@ linked to.

Status: ASSIGNED → RESOLVED

Closed: 18 years ago

Resolution: --- → DUPLICATE

Categories

(Firefox :: General, enhancement)

People

(Reporter: usenet, Assigned: usenet)

Security

(public)
