Closed Bug 853231 Opened 11 years ago Closed 11 years ago

IDN: Non-whitelisted TLD displays punycode for some single scripts

Categories

(Core :: Networking, defect)

x86
macOS
defect
Not set
normal

RESOLVED INVALID

People

(Reporter: mwobensmith, Unassigned)

Attachments

(1 file)

Attached file Test case
1. Open attachment.
2. Mouse over links to view URL in status bar.

Result:
Domains under a non-whitelisted TLD are displayed in punycode.

Expected:
Domains should be displayed in Unicode whether or not their TLD is whitelisted.
Affected scripts: Georgian, Hangul, Tibetan, Mongolian
Blocks: 722299
The whitelisted TLDs allow those characters because you could put anything in a domain there (other than the characters in our small blacklist pref) and it would still be shown as IDN.

You should double-check the characters you use in your test. Tibetan characters are allowed, but the swirl thing you used, \u0F04 (TIBETAN MARK INITIAL YIG MGO MDUN MA), is "restricted" according to http://www.unicode.org/Public/security/latest/xidmodifications.txt

In our tree it looks like that data is encoded at https://mxr.mozilla.org/mozilla-central/source/intl/icu/source/data/unidata/UnicodeData.txt#3160
The same goes for the Mongolian Birga (\u1800) character: it's restricted as "not-xid". The algorithm only supports "restricted; limited-use" for Mongolian: 1810-1819 (numbers) and 1820-1877, 1880-18AA (letters).
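
For anyone re-running the test case, the Mongolian ranges quoted above can be written down as a small lookup. This is only a sketch of the ranges as stated in this comment, not the code Gecko actually uses; the names `MONGOLIAN_LIMITED_USE` and `mongolian_limited_use` are made up for illustration.

```python
# Inclusive code point ranges, copied from the comment above.
MONGOLIAN_LIMITED_USE = [
    (0x1810, 0x1819),  # numbers
    (0x1820, 0x1877),  # letters
    (0x1880, 0x18AA),  # letters
]


def mongolian_limited_use(ch: str) -> bool:
    """Return True if ch falls inside one of the quoted ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in MONGOLIAN_LIMITED_USE)


# U+1800 MONGOLIAN BIRGA sits below the first range, so it is not covered.
print(mongolian_limited_use("\u1800"))
# U+1820 is the first code point of the first letter range.
print(mongolian_limited_use("\u1820"))
```

With these ranges, the Birga used in the test case falls outside all three, while an ordinary Mongolian letter falls inside.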
http://Ⴏ.other with U+10AF GEORGIAN CAPITAL LETTER ZHAR: almost all of the GEORGIAN CAPITAL LETTER block is "restricted; historic". FTR there are three sets of Georgian letters in Unicode. Those with GEORGIAN CAPITAL LETTER or GEORGIAN SMALL LETTER in the name are part of a historic script used in the Georgian Orthodox church. Those with GEORGIAN LETTER are used in modern Georgian, e.g. in the Georgian TLD .გე
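
The naming rule described above can be checked against Python's `unicodedata` module. This is a rough sketch that only applies the name prefixes mentioned in this comment; the function name `georgian_set` is hypothetical, and the result is not the identifier-type data from the Unicode security files (some characters named GEORGIAN LETTER may still be classified differently there).

```python
import unicodedata


def georgian_set(ch: str) -> str:
    """Classify a character purely by the name prefixes from the comment."""
    name = unicodedata.name(ch, "")
    if name.startswith(("GEORGIAN CAPITAL LETTER", "GEORGIAN SMALL LETTER")):
        return "historic"
    if name.startswith("GEORGIAN LETTER"):
        return "modern"
    return "other"


# U+10AF is the character from the test case; the other two spell the TLD.
for ch in "\u10AF" + "გე":
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), georgian_set(ch))
```

By this rule, the character used in the test case lands in the historic set, while both letters of the .გე TLD land in the modern set.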

http://ᄀ.other with U+1100 HANGUL CHOSEONG KIYEOK: the Hangul Jamo block from U+1100 to U+11FF is also defined as "restricted; historic", though for different reasons: these characters are used in modern Korean, but they were deprecated for use in IRIs to prevent confusion between Jamos and precomposed Korean syllables.
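
The overlap between Jamo and precomposed syllables is easy to see with Unicode normalization. A minimal Python sketch, using the first syllable of the Hangul Syllables block as the example (the choice of syllable is mine, not from the test case):

```python
import unicodedata

jamo = "\u1100\u1161"  # HANGUL CHOSEONG KIYEOK + HANGUL JUNGSEONG A
syllable = "\uAC00"    # precomposed HANGUL SYLLABLE GA

composed = unicodedata.normalize("NFC", jamo)
decomposed = unicodedata.normalize("NFD", syllable)

# Different code point sequences before normalization ...
print(jamo == syllable)
# ... but canonically equivalent: NFC composes the Jamo pair into the
# syllable, and NFD decomposes the syllable back into the Jamo pair.
print(composed == syllable)
print(decomposed == jamo)
# A lone leading consonant, as in the test case, has nothing to compose with.
print(unicodedata.normalize("NFC", "\u1100") == "\u1100")
```

The two spellings are distinct strings that render identically, and a lone Jamo such as U+1100 survives normalization unchanged.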

http://༄.other with U+0F04 TIBETAN MARK INITIAL YIG MGO MDUN MA and http://᠀.other with U+1800 MONGOLIAN BIRGA: what dveditz said in comment 2 and comment 3, except that the algorithm doesn't support "limited-use" scripts, so Mongolian script will always be displayed as punycode.
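
To double-check the four characters from the test case, something like the following Python sketch can help. It is not what the browser does: `str.isidentifier()` on a single character only tests Python's XID_Start rule, which is a stand-in for the "not-xid" classification, and the `ace` value is raw Punycode with the `xn--` prefix added by hand, without any IDNA mapping. The helper name `describe` is made up.

```python
import unicodedata

# The four single-character labels discussed in this bug.
LABELS = ["\u0F04", "\u1800", "\u10AF", "\u1100"]


def describe(ch: str) -> dict:
    """Collect the code point, name, XID_Start status and a Punycode form."""
    return {
        "codepoint": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        "xid": ch.isidentifier(),
        "ace": "xn--" + ch.encode("punycode").decode("ascii"),
    }


for ch in LABELS:
    print(describe(ch))
```

The Tibetan and Mongolian marks fail the identifier check outright. The Georgian and Hangul characters pass it, which fits the explanation that they are restricted for being historic rather than for being "not-xid".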
Resolving INVALID based on comment 4. But this bug is indicative of really exhaustive testing, so thank you :-)

Gerv
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → INVALID