Closed
Bug 304316
Opened 19 years ago
Closed 19 years ago
Expand IDN character blacklist
Categories
(Core :: Networking, defect)
RESOLVED
DUPLICATE
of bug 309311
mozilla1.8beta4
People
(Reporter: gerv, Assigned: gerv)
Details
(Whiteboard: [sg:dupe 309311])
We may need to expand our IDN character blacklist. Currently, it includes two homographs of the "/" character. It also needs to include homographs of other URL punctuation characters, such as ".", "?", "#", "&" and ":".

We have a character blacklist as well as TLD restrictions because registries have no control over the levels above the directly-registered level. The blacklist is to prevent labels at that level pretending to be at a different level; hence the focus on URL punctuation characters.

Gerv
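To illustrate the spoofing concern, here is a minimal Python sketch that flags hostname labels containing a blacklisted character. The two code points in `SLASH_HOMOGRAPHS` (U+2044 and U+2215) are an assumption for illustration, since the bug does not say which two "/" homographs are currently listed. The function name and the hostname are likewise hypothetical.

```python
# Assumed for illustration: two "/" lookalikes. The actual blacklist
# contents are not given in this bug.
SLASH_HOMOGRAPHS = {"\u2044", "\u2215"}


def spoofing_labels(hostname, blacklist=SLASH_HOMOGRAPHS):
    """Return the labels of hostname that contain a blacklisted character."""
    return [
        label
        for label in hostname.split(".")
        if any(ch in blacklist for ch in label)
    ]


# Hypothetical hostname: every label sits under attacker.example, but the
# label "com<U+2215>login" reads like the end of a host followed by a path.
host = "www.bank.com\u2215login.attacker.example"
print(spoofing_labels(host))
```

Only the label carrying the lookalike is reported; ordinary labels pass through untouched.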
Comment 1•19 years ago
I will contact Opera to see what characters they are blacklisting. Gerv
Severity: normal → major
Flags: blocking1.8b4?
Target Milestone: --- → mozilla1.8beta4
Updated•19 years ago
Whiteboard: [sg:investigate]
Comment 2•19 years ago
Presumably it only needs to blacklist those characters that are NOT normalized by the IDN normalization rules, which include NFKC Unicode normalization with some additional rules for homographs of "." (U+3002, U+FF0E, U+FF61). When characters are normalized according to those rules, we always use the normalized form in the status bar, in the URL bar (unless the user is typing and they haven't pressed Enter yet), etc.
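A rough way to check which candidate characters fall into the "not normalized" group is sketched below in Python. It is a simplified stand-in, not the actual IDN code path: `idn_normalize` (a hypothetical name) applies only NFKC plus the three "." homographs named above, whereas the real normalization rules also do case folding and reject prohibited characters. The candidate list is chosen for illustration.

```python
import unicodedata

# The "." homographs named above; mapping them to "." stands in for the
# additional IDN rule applied on top of NFKC.
DOT_HOMOGRAPHS = {"\u3002", "\uff0e", "\uff61"}


def idn_normalize(ch):
    """Simplified stand-in for IDN normalization of a single character."""
    if ch in DOT_HOMOGRAPHS:
        return "."
    return unicodedata.normalize("NFKC", ch)


def survives_normalization(ch):
    """True if ch is left unchanged, so only a blacklist could catch it."""
    return idn_normalize(ch) == ch


# Illustrative candidates: fullwidth "/" and ":", a "." homograph, and two
# slash lookalikes.
candidates = ["\uff0f", "\uff1a", "\u3002", "\u2044", "\u2215"]
for ch in candidates:
    verdict = "survives" if survives_normalization(ch) else "is normalized"
    print(f"U+{ord(ch):04X} {verdict}")
```

Characters reported as surviving are the ones a blacklist would have to cover; the others already turn into their ASCII counterparts before display.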
Updated•19 years ago
Flags: blocking1.8b4? → blocking1.8b4+
Comment 3•19 years ago
dbaron: indeed. If our decoding code transforms a given dodgy domain name into a different one altogether, and then displays that and tries to visit it, it's as if the person linked there originally, and there's no problem. We just need to make sure there's no inconsistency or seeming inconsistency between what's displayed and the place visited. Gerv
Comment 4•19 years ago
I got the following message from Yngve Pettersen of Opera:

The list used in 7.5x is:

0021-0023;
0025-002C;
002F; Forward slash
003B-0040;
005C;
005E;
007B-007D;
007F;

"%" and "/" are the most important of these, since they can impact both visual and machine interpretation of a URI, especially if the servername is converted, and then later parsed again. The others were added because they are not (or should not be) valid in a DNS name. (Hmmm, on second thought, not sure U+002B "+" should be on the list)

In 8.0 beta I added two sets of characters to the strict list.

The first list is

2000-206f; General punctuation
2215; fractional slash

The fractional slash is a homograph of "/" and should therefore be covered by the same rules as "/" in the above list. I suspect there are more of these (IIRC I have seen references that indicate that). Blocking this character impacts U+33C6 which is folded into a sequence that includes U+2215 by stringprep (the only one in the fold list).

The punctuation section was added based on a suggestion in the unicode discussion list that was forwarded to me, and IMO should be excluded, although U+2010 and some others should probably be folded to "-" (U+002D).

The second set was also added based on the same list as general punctuation, but I am not 100% sure these should be excluded; it may be overkill. I'll leave final decision to the experts.

1680-16FF;
2400-243f;
2500-257f;
FB00-FB4F;
FE50-FE6F;
FF00-FFEF;
10000-100FF;
10300-1032F;
10400-1047F;

AFAICT (I am not a Unicode expert), at least the alphanumeric half/fullwidths in FF00-FFEF are normalized to normal ASCII.
The referenced list (which includes a number of ranges already covered by stringprep) was

--------------
* Box Drawing
* Block Elements
* Geometric Shapes
* Miscellaneous Symbols
* Dingbats
* Byzantine Musical Symbols
* Musical Symbols
* Mathematical Alphanumeric Symbols
* Letterlike Symbols
* Number Forms
* Arrows
* Mathematical Operators
* Miscellaneous Technical
* Combining Marks for Symbols
* Control Pictures
* Optical Character Recognition
* Enclosed Alphanumerics
* Miscellaneous Mathematical Symbols-A
* Supplemental Arrows-A
* Supplemental Arrows-B
* Miscellaneous Mathematical Symbols-B
* Supplemental Mathematical Operators
* Miscellaneous Symbols and Arrows
* High Surrogates
* Low Surrogates
* Private Use Area
* Alphabetic Presentation Forms
* Small Form Variants
* Halfwidth and Fullwidth Forms
* Variation Selectors
* Tags
* Specials
* Variation Selectors Supplement
* Supplementary Private Use Area-A
* Supplementary Private Use Area-B
* Linear B Syllabary
* Linear B Ideograms
* Shavian
* Deseret
* Ugaritic
* Old Italic
* Ogham
* Runic
* General Punctuation
--------------
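The range notation in the message can be turned into a membership test fairly directly. The Python sketch below transcribes the 7.5x list and the first 8.0 beta set from the message; the helper names (`parse_ranges`, `is_blocked`, `blocked_after_nfkc`) are hypothetical, and plain NFKC is used as an approximation of stringprep folding, which is an assumption rather than what Opera or Mozilla actually run.

```python
import unicodedata

# Transcribed from the message above (hex code points, inclusive ranges).
OPERA_7_5X = "0021-0023; 0025-002C; 002F; 003B-0040; 005C; 005E; 007B-007D; 007F"
OPERA_8_FIRST_SET = "2000-206F; 2215"


def parse_ranges(spec):
    """Parse 'XXXX-YYYY; ZZZZ' into a list of inclusive (lo, hi) tuples."""
    ranges = []
    for item in spec.split(";"):
        item = item.strip()
        if not item:
            continue
        lo, _, hi = item.partition("-")
        ranges.append((int(lo, 16), int(hi or lo, 16)))
    return ranges


BLOCKED = parse_ranges(OPERA_7_5X) + parse_ranges(OPERA_8_FIRST_SET)


def is_blocked(ch):
    """True if the code point of ch falls inside any blocked range."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in BLOCKED)


def blocked_after_nfkc(ch):
    """True if any character in the NFKC form of ch is blocked."""
    return any(is_blocked(c) for c in unicodedata.normalize("NFKC", ch))


print(is_blocked("/"), is_blocked("-"), is_blocked("\u2215"))
print(is_blocked("\u33c6"), blocked_after_nfkc("\u33c6"))
```

The last line mirrors the U+33C6 remark in the message: the character itself is outside the listed ranges, but its compatibility decomposition contains U+2215, so a check applied after normalization would reject it.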
Comment 5•19 years ago
That list probably isn't right for us to use as-is; we need to answer the following questions:

- Do we already have code that will barf on the non-LDH ASCII characters in the first list?
- Do we want to include "+"? (I believe Opera may have later removed that one due to encountering problems)
- Do we want to include large swathes of punctuation, or only characters which directly spoof URL punctuation?
- If we are going to have a large set of blocked characters, do we need a better storage method than a character list?

My suggested approach is to comb their list looking for homographs of the characters listed in comment 0, and just add those. But perhaps other people have a different view.

Gerv
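On the storage question, one common alternative to a flat character list is a sorted table of merged ranges searched by bisection. The Python sketch below shows the idea under stated assumptions: `build_table` and `contains` are hypothetical names, the sample ranges are illustrative (a few are taken from the Opera message, and 2580-259F is added only to show merging), and nothing here reflects how the Mozilla blacklist is actually stored.

```python
from bisect import bisect_right


def build_table(ranges):
    """Sort inclusive (lo, hi) ranges and merge overlapping or adjacent ones."""
    merged = []
    for lo, hi in sorted(ranges):
        if merged and lo <= merged[-1][1] + 1:
            prev_lo, prev_hi = merged[-1]
            merged[-1] = (prev_lo, max(prev_hi, hi))
        else:
            merged.append((lo, hi))
    starts = [lo for lo, _ in merged]
    return starts, merged


def contains(table, cp):
    """Binary-search the table for code point cp."""
    starts, merged = table
    i = bisect_right(starts, cp) - 1
    return i >= 0 and cp <= merged[i][1]


# Illustrative input, deliberately unsorted; 2500-257F and 2580-259F are
# adjacent and collapse into one entry.
table = build_table([
    (0x2500, 0x257F),
    (0x2000, 0x206F),
    (0x2215, 0x2215),
    (0x2580, 0x259F),
])
print(table[1])
print(contains(table, 0x2215), contains(table, 0x2214))
```

Lookup cost grows with the logarithm of the number of ranges rather than the number of blocked characters, which matters only if the blocked set becomes large; for a handful of homographs a plain character list is likely sufficient.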
Updated•19 years ago
Flags: blocking1.8b5+
Updated•19 years ago
Flags: blocking1.8b5+
Comment 6•19 years ago
Does this bug need to be security sensitive given public bug 301694? Clearly people know we need *a* blacklist, and which particular characters are or are not on it would seem to benefit us from the many eyes theory making sure we don't miss any. In fact, is this not a plain duplicate of bug 301694?
Comment 7•19 years ago
This list needs reconciling with the one in bug 309311, which is the canonical list. Once that's done, we can dupe it. Gerv
Comment 8•19 years ago
Gerv, can you carry the contents of this bug over to the other and dupe this?
Comment 9•19 years ago
We've only got a couple of days to get all these blacklisted chars in for beta. If they're not in by Monday, they might not make it.
Comment 10•19 years ago
After talking with Darin, we're going to push this out to after the beta2 (first RC).
Flags: blocking1.8b5+ → blocking1.8b5-
Comment 11•19 years ago
So what's the story here? Are we done?
Comment 12•19 years ago
I don't think this bug has further intrinsic value.

Gerv

*** This bug has been marked as a duplicate of 309311 ***
Status: NEW → RESOLVED
Closed: 19 years ago
Resolution: --- → DUPLICATE
Updated•18 years ago
Group: security
Whiteboard: [sg:investigate] → [sg:dupe 309311]