Closed
Bug 853231
Opened 12 years ago
Closed 12 years ago
IDN: Non-whitelisted TLD displays punycode for some single scripts
Categories: Core :: Networking, defect
Status: RESOLVED INVALID
People: Reporter: mwobensmith, Assignee: Unassigned
Attachments (1 file): 884 bytes, text/html
1. Open attachment.
2. Mouse over links to view URL in status bar.
Result:
Domains are displayed in punycode when their TLD is not whitelisted.
Expected:
Domains should be displayed in Unicode whether or not their TLD is whitelisted.
Comment 1 • 12 years ago (Reporter)
Affected scripts: Georgian, Hangul, Tibetan, Mongolian
Comment 2 • 12 years ago
Whitelisted TLDs allow those characters because anything can go there (other than the characters in our small blacklist pref) and it will still be displayed as IDN.
You should double-check the characters you use in your test. Tibetan characters are allowed, but the swirl you used, \u0F04 (TIBETAN MARK INITIAL YIG MGO MDUN MA), is "restricted" according to http://www.unicode.org/Public/security/latest/xidmodifications.txt
In our tree, that data appears to be encoded at https://mxr.mozilla.org/mozilla-central/source/intl/icu/source/data/unidata/UnicodeData.txt#3160
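A minimal Python sketch of the display rule described above, under a much-simplified model. The whitelist and blacklist contents (`WHITELISTED_TLDS`, `BLACKLISTED_CHARS`), the one-entry restriction table, and the helpers `to_ace`, `char_allowed` and `display_host` are all hypothetical placeholders, not Firefox's actual implementation: in a whitelisted TLD only blacklisted characters force punycode, while elsewhere a restricted character such as U+0F04 does too.

```python
# Hypothetical, simplified model of the IDN display rule from comment 2.
# None of these names come from Firefox; they are illustrative only.

WHITELISTED_TLDS = {"whitelisted"}   # hypothetical placeholder TLD
BLACKLISTED_CHARS = {"\u2044"}       # hypothetical entry (FRACTION SLASH)
RESTRICTED = {0x0F04: "not-xid"}     # tiny excerpt: only the char from comment 2


def to_ace(label: str) -> str:
    """Convert a label to its ASCII-compatible (punycode) form."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def char_allowed(ch: str, tld_whitelisted: bool) -> bool:
    if ch in BLACKLISTED_CHARS:
        return False                  # the blacklist applies everywhere
    if tld_whitelisted:
        return True                   # whitelisted TLD: anything else goes
    return ord(ch) not in RESTRICTED  # otherwise restricted chars are rejected


def display_host(host: str) -> str:
    labels = host.split(".")
    whitelisted = labels[-1].lower() in WHITELISTED_TLDS
    if all(char_allowed(ch, whitelisted) for label in labels for ch in label):
        return host                                     # show Unicode
    return ".".join(to_ace(label) for label in labels)  # fall back to punycode
```

In this model, `display_host("\u0F04.whitelisted")` comes back unchanged, while `display_host("\u0F04.other")` falls back to an `xn--` label, which mirrors the behavior reported in this bug.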
Comment 3 • 12 years ago
The same goes for the Mongolian Birga (\u1800) character: it's restricted as "not-xid". The algorithm only supports "restricted; limited-use" for Mongolian: 1810-1819 (numbers) and 1820-1877, 1880-18AA (letters).
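The Mongolian ranges quoted above can be checked with a small lookup. This is a sketch that encodes only the data given in this comment; the helper name `mongolian_status` and its fallback string are hypothetical, and any code point not listed here should be looked up in xidmodifications.txt rather than assumed.

```python
# Mongolian-block classification using only the data quoted in comment 3.
# The helper name and the fallback string are hypothetical.

MONGOLIAN_BLOCK = (0x1800, 0x18AF)
NOT_XID = {0x1800}  # MONGOLIAN BIRGA
LIMITED_USE = [
    (0x1810, 0x1819),  # numbers
    (0x1820, 0x1877),  # letters
    (0x1880, 0x18AA),  # letters
]


def mongolian_status(cp: int) -> str:
    lo, hi = MONGOLIAN_BLOCK
    if not lo <= cp <= hi:
        raise ValueError(f"U+{cp:04X} is outside the Mongolian block")
    if cp in NOT_XID:
        return "restricted; not-xid"
    for start, end in LIMITED_USE:
        if start <= cp <= end:
            return "restricted; limited-use"
    return "not covered here; check xidmodifications.txt"
```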
Comment 4 • 12 years ago
http://Ⴏ.other with U+10AF GEORGIAN CAPITAL LETTER ZHAR: almost all of the GEORGIAN CAPITAL LETTER block is "restricted; historic". FTR there are three sets of Georgian letters in Unicode. Those with GEORGIAN CAPITAL LETTER or GEORGIAN SMALL LETTER in the name are part of a historic script used in the Georgian Orthodox church. Those with GEORGIAN LETTER are used in modern Georgian, e.g. in the Georgian TLD .გე
http://ᄀ.other with U+1100 HANGUL CHOSEONG KIYEOK: the Hangul Jamo block from U+1100 to U+11FF is also defined as "restricted; historic", though for different reasons: these characters are used in modern Korean, but they were deprecated for use in IRIs to prevent confusion between Jamos and precomposed Korean syllables.
http://༄.other with U+0F04 TIBETAN MARK INITIAL YIG MGO MDUN MA and http://᠀.other with U+1800 MONGOLIAN BIRGA: what dveditz said in comment 2 and comment 3, except that the algorithm doesn't support "limited-use" scripts, so Mongolian script will always be displayed as punycode.
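The per-character analysis above can be summarized in a short table-driven sketch. The statuses come from comments 2 to 4, U+1820 (a "limited-use" Mongolian letter from comment 3) is added for contrast, and the rule that any "restricted" status forces punycode in a non-whitelisted TLD is a simplifying assumption drawn from this comment, not Firefox code. `unicodedata.name` is used only to confirm the character names.

```python
import unicodedata

# Restriction status of each test character, as given in comments 2-4,
# plus U+1820 (a "limited-use" Mongolian letter from comment 3) for contrast.
TEST_CHARS = {
    "\u10AF": "restricted; historic",
    "\u1100": "restricted; historic",
    "\u0F04": "restricted; not-xid",
    "\u1800": "restricted; not-xid",
    "\u1820": "restricted; limited-use",
}


def shown_as_unicode(status: str) -> bool:
    # Simplifying assumption: any "restricted" status, including
    # "limited-use", forces punycode in a non-whitelisted TLD.
    return not status.startswith("restricted")


def classify(tld: str = "other"):
    rows = []
    for ch, status in TEST_CHARS.items():
        if shown_as_unicode(status):
            shown = f"{ch}.{tld}"
        else:
            shown = "xn--" + ch.encode("punycode").decode("ascii") + "." + tld
        rows.append((f"U+{ord(ch):04X}", unicodedata.name(ch), status, shown))
    return rows


for row in classify():
    print(*row, sep=" | ")
```

Every row ends up as an `xn--…​.other` host, which is why all four scripts listed in comment 1 appear as punycode in the attachment.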
Comment 5 • 12 years ago
Resolving INVALID based on comment 4. But this bug is indicative of really exhaustive testing, so thank you :-)
Gerv
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → INVALID