Closed
Bug 315728
Opened 20 years ago
Closed 10 years ago
NAMEPREP / text rendering holes in IDN processing
Categories
(Core :: Networking, defect)
Tracking
()
RESOLVED
FIXED
People
(Reporter: usenet, Unassigned)
References
(Blocks 1 open bug)
Details
(Keywords: meta, sec-other, Whiteboard: [sg:nse meta])
Attachments
(1 file)
IDN processing is defined to only use characters in the Unicode 3.2 repertoire. However, at least one character, U+1160 'HANGUL JUNGSEONG FILLER', first defined in Unicode 4.0, can be shown to pass straight through the IDN name preparation logic, as in this example:
www.pyth<U+1160>on.info
is translated to:
www.xn--python-eyy.info
instead of being blocked as an invalid IDN. Although this character has dodgy space-like rendering in the current Mozilla rendering code, and is therefore a display-spoofing risk, it is neither ignored nor prohibited in the current NAMEPREP rules designed to prevent these problems, since the ignore/exclude lists in NAMEPREP do not address characters not in Unicode 3.2.
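The pass-through can be reproduced outside the browser. The following is a minimal sketch, assuming Python's standard-library `idna` codec (an independent implementation of the Unicode 3.2-based IDNA/NAMEPREP rules, not the Mozilla code discussed in this bug) treats this input the same way:

```python
# Reproduce the report with Python's built-in IDNA codec.
# Assumption: the stdlib codec stands in for the browser's IDN code path.
from encodings.idna import nameprep

name = "www.pyth\u1160on.info"  # U+1160 HANGUL JUNGSEONG FILLER inside "python"

# NAMEPREP neither removes nor rejects the filler, so the label is unchanged.
label = nameprep("pyth\u1160on")
print(label == "pyth\u1160on")

# ToASCII therefore yields an ACE label instead of raising UnicodeError.
ace = name.encode("idna")
print(ace)
```

If the codec rejected the character, `encode` would raise `UnicodeError` rather than return a name.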
Fortunately, this particular character is currently caught by the hand-rolled ad-hoc IDN dodgy-display-character blacklist, and is displayed as Punycode. However, it would be better if the IDN code were strictly RFC-compliant with regard to characters not in Unicode 3.2, so that this name was treated as invalid during IDN processing, removing the need to catch it as a special case at display time.
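The display-time fallback described here can be sketched as follows. This is only an illustration: `DISPLAY_BLACKLIST` and `display_label` are hypothetical names, the three code points are a tiny stand-in for the real blacklist, and Python's stdlib `idna` codec is assumed in place of the browser's own encoder.

```python
# Hypothetical, deliberately tiny stand-in for the display blacklist.
DISPLAY_BLACKLIST = frozenset({"\u115f", "\u1160", "\u3164"})  # Hangul fillers


def display_label(label: str) -> str:
    """Return the label as it would be shown to the user.

    Labels containing a blacklisted character are shown in their ACE
    (Punycode) form; all other labels are shown as Unicode.
    """
    if any(ch in DISPLAY_BLACKLIST for ch in label):
        return label.encode("idna").decode("ascii")
    return label


print(display_label("pyth\u1160on"))
print(display_label("python"))
```

The weakness of this design is the one the report points out: the name is still accepted as a valid IDN, and safety depends on the blacklist happening to contain the character.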
Although this particular case is not a serious security problem _in itself_, it does suggest that the current IDN processing code is still not watertight, and therefore that there may be other unknown IDN spoofing or other security problems lurking which are still not addressed by the current code.
Comment 1 (Reporter) • 20 years ago
This is a small pseudo-HTML file containing test cases. You will need to select the UTF-8 charset to view it correctly, since it has no proper DTD header. The example for the case mentioned in the bug report is the first link on the line starting "U+1160".
Updated (Reporter) • 20 years ago
Version: unspecified → 1.5 Branch
Comment 2 (Reporter) • 20 years ago
Aaargh. On re-inspecting the different versions of the Unicode standards, it looks like U+1160 _is_ in Unicode 3.2.
However, as of Unicode 4.0, it was added to the list of default-ignorables (see http://www.macchiato.com/slides/Unicode4.0.ppt). This is different to the behavior in Unicode 3.2, which means that U+1160 most probably should have been added to the list of ignorable characters in NAMEPREP, had this property been known at the time, since Unicode defines default ignorable characters as characters that
"should be ignored by default in rendering unless explicitly supported. They have no visible glyph or advance width in and of themselves, although they may affect the display, positioning, or adornment of adjacent or surrounding characters."
So, this is likely a problem with either or both of:
a) the NAMEPREP specification itself,
b) the current rendering code, which renders it as a wide space even in a non-Hangul context, whereas the Unicode 4.0 spec appears to explicitly disallow this.
and not the IDN code itself. Perhaps this should be re-filed as a bug in the text-rendering code?
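The table membership behind this correction can be checked directly. A minimal sketch, assuming Python's `stringprep` module (which exposes the RFC 3454 tables built from the Unicode 3.2 database) as a reference; it is not the table data Mozilla ships:

```python
import stringprep
import unicodedata

filler = "\u1160"

# Table A.1 lists code points that are unassigned in Unicode 3.2.
assigned_in_3_2 = not stringprep.in_table_a1(filler)

# Table B.1 lists characters NAMEPREP maps to nothing (its "ignorable" list).
mapped_to_nothing = stringprep.in_table_b1(filler)

# The prohibited-output tables that NAMEPREP selects: C.1.2, C.2.2, C.3 to C.9.
prohibited = any(check(filler) for check in (
    stringprep.in_table_c12, stringprep.in_table_c22, stringprep.in_table_c3,
    stringprep.in_table_c4, stringprep.in_table_c5, stringprep.in_table_c6,
    stringprep.in_table_c7, stringprep.in_table_c8, stringprep.in_table_c9,
))

print(unicodedata.name(filler))
print(assigned_in_3_2, mapped_to_nothing, prohibited)
```

A character that is assigned, not mapped to nothing, and not prohibited is exactly the combination that lets it through NAMEPREP untouched.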
Updated • 20 years ago
Whiteboard: [sg:investigate]
Updated (Reporter) • 20 years ago
Summary: IDN processing still allows characters not in the Unicode3.2 repertoire → NAMEPREP / text rendering holes in IDN processing
Comment 3 • 20 years ago
U+1160 was even in Unicode 2.0 (probably in 1.0 as well). Whether it's rendered like a space or with a 'zero-width glyph', it can be used for 'spoofing'. How do we resolve this? I guess this character is already in our list of black-listed characters, isn't it? If so, we don't have to change our NAMEPREP code, but we may want to do that, too.
Comment 4 (Reporter) • 20 years ago
It is in the current list of blacklisted characters: I put it there. However, the fact that this was caught is a lucky catch, since the list was compiled by a mixture of grepping the Unicode character names, various small programs to scan the confusables lists, ad-hoc inspection of a hard copy of the Unicode 3.0 book, and visual inspection of test cases, of which only the last caught these particular characters.
Ideally, I would like to reduce the Unicode character blacklist to address only visual spoofs, and to deal with other problems, such as protocol-character smuggling, in the DNS name validation code (which, incidentally, does not completely close off this problem, since NAMEPREP can still be used to smuggle full stops into the middle of domain name labels, possibly subverting name-filtering checks at upper layers: I'll file a new bug on that particular issue).
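One way the full-stop smuggling could work is through the NFKC normalization step inside NAMEPREP. The sketch below is an assumption-laden illustration: this comment does not name a specific character, so U+2024 ONE DOT LEADER is chosen here as an example, and Python's stdlib `nameprep` stands in for the Mozilla implementation.

```python
from encodings.idna import nameprep

# Assumption: U+2024 ONE DOT LEADER is the smuggling vehicle; the comment
# above does not say which characters are involved.
raw_label = "evil\u2024com"

# A naive upper-layer filter sees one label with no ASCII full stop in it.
passes_naive_filter = "." not in raw_label

# NFKC normalization inside NAMEPREP folds U+2024 to U+002E FULL STOP.
prepared = nameprep(raw_label)
print(passes_naive_filter, repr(prepared))
```

Any check that runs before NAMEPREP and splits names on `.` would therefore see a different label structure from the one that results afterwards.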
We should have some systematic principle for finding nasties like this.
I've been in touch with the IDN working group about this, and with the NAMEPREP people about officially blocking these characters from names. In the absence of an official policy, perhaps we should adopt a unilateral policy of making all characters in the default ignorable class in the _current_ version of Unicode illegal in domain names (after NAMEPREP canonicalization, of course), since this appears to be in the spirit of the NAMEPREP authors' intentions, if not the letter of the current Unicode 3.2-based version.
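The proposed policy could look roughly like the sketch below. It is not an implementation of any agreed standard: `strict_nameprep` and `DEFAULT_IGNORABLE_RANGES` are hypothetical names, the ranges are an incomplete hand-picked subset of the Default_Ignorable_Code_Point property, and Python's stdlib `nameprep` is assumed in place of the Mozilla code.

```python
from encodings.idna import nameprep

# Illustrative, incomplete subset of Default_Ignorable_Code_Point.
# A real implementation would load the full set from the current
# DerivedCoreProperties.txt instead of hard-coding ranges.
DEFAULT_IGNORABLE_RANGES = (
    (0x00AD, 0x00AD),  # SOFT HYPHEN
    (0x034F, 0x034F),  # COMBINING GRAPHEME JOINER
    (0x115F, 0x1160),  # Hangul choseong and jungseong fillers
    (0x200B, 0x200F),  # zero-width characters and directional marks
    (0x2060, 0x206F),  # word joiner, invisible operators, format characters
    (0x3164, 0x3164),  # HANGUL FILLER
    (0xFE00, 0xFE0F),  # variation selectors
    (0xFEFF, 0xFEFF),  # ZERO WIDTH NO-BREAK SPACE
    (0xFFA0, 0xFFA0),  # HALFWIDTH HANGUL FILLER
)


def is_default_ignorable(ch: str) -> bool:
    """True if ch falls in one of the hard-coded ranges above."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in DEFAULT_IGNORABLE_RANGES)


def strict_nameprep(label: str) -> str:
    """Run NAMEPREP, then reject any default-ignorable that survives it."""
    prepared = nameprep(label)
    for ch in prepared:
        if is_default_ignorable(ch):
            raise ValueError(f"default-ignorable U+{ord(ch):04X} in label")
    return prepared
```

Running the check after NAMEPREP matters: characters that NAMEPREP already maps to nothing, such as U+00AD, are removed before the check, and only those that survive canonicalization cause a rejection.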
There appears to be some work on a NAMEPREP v2, but I don't expect any results for some time to come, so we are on our own on this.
In the longer term, there's work in progress in the IDN working group to add further constraints to IDNs: we should implement these as soon as they're announced (see bug 309435).
Comment 5 (Reporter) • 20 years ago
Bug 316444 now filed for the real problem behind the full-stop-smuggling problem mentioned above.
Comment 6 (Reporter) • 20 years ago
Bug 289588 has now been created for the Hangul filler rendering issue.
Depends on: 289588
Assignee: nobody → smontagu
Component: General → Internationalization
Product: Firefox → Core
QA Contact: general → amyy
Version: 1.5 Branch → 1.8 Branch
Comment 7 (Reporter) • 16 years ago
A more general bug, bug 546013, has now been created to cover a wider variety of text-rendering issues involving unsupported characters, including this.
Although this needs to be fixed at the text rendering layer, it may also be worth adding some of the unsupported characters listed in the reference linked there to the IDN/URI character blacklist, to address issues such as the same IRI rendering differently in two different software packages.
Updated • 14 years ago
Assignee: smontagu → nobody
Group: core-security
Component: Internationalization → Networking
Keywords: meta
QA Contact: amyy → networking
Whiteboard: [sg:investigate] → [sg:nse meta]
Version: 1.8 Branch → unspecified
Comment 8 • 10 years ago
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED