Closed Bug 309311 Opened 14 years ago Closed 14 years ago

Add yet more characters to the IDN blacklist

Categories

(Core :: Networking, defect, critical)

RESOLVED FIXED
mozilla1.8rc1

People

(Reporter: usenet, Assigned: darin.moz)

References

(Blocks 1 open bug)

Details

(Keywords: fixed1.8, Whiteboard: [sg:investigate])

Attachments

(4 files, 6 obsolete files)

Yet more characters need to be added to the IDN domain name display blacklist.

Note: this is flagged as a security bug because some of these are potentially
usable for spoofing. 

Note: adding these to the list will not fix the spoofing-via-NAMEPREP problem.

A provisional list is:

Characters with special meaning in URLs or E-mail addresses (should not happen,
but let's list them anyway -- some can be generated by NAMEPREP from other
characters, even if not present in the input string):

/   SOLIDUS
:   COLON
@   COMMERCIAL AT
#   NUMBER SIGN
?   QUESTION MARK
;   SEMICOLON
%   PERCENT SIGN
&   AMPERSAND
=   EQUALS SIGN
<   LESS-THAN SIGN
>   GREATER-THAN SIGN
(   LEFT PARENTHESIS
)   RIGHT PARENTHESIS

Spacing characters:

U+0020 SPACE
U+00A0 NO-BREAK SPACE
U+115F HANGUL CHOSEONG FILLER
U+1160 HANGUL JUNGSEONG FILLER
U+2000 EN QUAD
U+2001 EM QUAD
U+2002 EN SPACE
U+2003 EM SPACE
U+2004 THREE-PER-EM SPACE
U+2005 FOUR-PER-EM SPACE
U+2006 SIX-PER-EM SPACE
U+2007 FIGURE SPACE
U+2008 PUNCTUATION SPACE
U+2009 THIN SPACE
U+200A HAIR SPACE
U+200B ZERO WIDTH SPACE
U+202F NARROW NO-BREAK SPACE
U+205F MEDIUM MATHEMATICAL SPACE
U+3000 IDEOGRAPHIC SPACE
U+3164 HANGUL FILLER
U+FEFF ZERO WIDTH NO-BREAK SPACE
U+FFA0 HALFWIDTH HANGUL FILLER 

Line separators -- these could totally screw with URL display:

U+2028 LINE SEPARATOR
U+2029 PARAGRAPH SEPARATOR

Spoofs of FULL STOP and/or other label separators:

U+2024 ONE DOT LEADER
U+2027 HYPHENATION POINT 

Spoofs of SOLIDUS:

U+0337 COMBINING SHORT SOLIDUS OVERLAY
U+0338 COMBINING LONG SOLIDUS OVERLAY
U+2044 FRACTION SLASH
U+2215 DIVISION SLASH
U+23AE INTEGRAL EXTENSION
U+29F6 SOLIDUS WITH OVERBAR
U+29F8 BIG SOLIDUS
U+2AFB TRIPLE SOLIDUS BINARY RELATION
U+2AFD DOUBLE SOLIDUS OPERATOR
U+FF0F FULLWIDTH SOLIDUS 

Hangul fillers -- there is a suggestion that these are not needed in real-world
text encoding, and they certainly break at least one text renderer:

U+115F HANGUL CHOSEONG FILLER
U+1160 HANGUL JUNGSEONG FILLER
U+3164 HANGUL FILLER
U+FFA0 HALFWIDTH HANGUL FILLER 

We should also check which of the following are valid IDN label separators, make
sure they are implemented, and add any that are not to the blacklist:

U+06D4   ARABIC FULL STOP
U+0702   SYRIAC SUBLINEAR FULL STOP
U+3002   IDEOGRAPHIC FULL STOP
U+FF0E   FULLWIDTH FULL STOP
U+FF61   HALFWIDTH IDEOGRAPHIC FULL STOP 

Also, my reading of RFC 3743 seems to suggest that the ideographic description
characters may not be used in any CJK domain:

U+2FF0	IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT
U+2FF1	IDEOGRAPHIC DESCRIPTION CHARACTER ABOVE TO BELOW
U+2FF2	IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO MIDDLE AND RIGHT
U+2FF3	IDEOGRAPHIC DESCRIPTION CHARACTER ABOVE TO MIDDLE AND BELOW
U+2FF4	IDEOGRAPHIC DESCRIPTION CHARACTER FULL SURROUND
U+2FF5	IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM ABOVE
U+2FF6	IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM BELOW
U+2FF7	IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM LEFT
U+2FF8	IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM UPPER LEFT
U+2FF9	IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM UPPER RIGHT
U+2FFA	IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM LOWER LEFT
U+2FFB	IDEOGRAPHIC DESCRIPTION CHARACTER OVERLAID

For completeness' sake: all the ISO-8859-1 control codes, high and low, and the
DELETE character.

See comment in Bug 307438 -- it will be difficult to fix this without a fix to
307438, as many of these characters are not displayable, and thus difficult to
edit into source files, and even more difficult to review without the aid of a
hex dump of the source.

Attached file: Blocklist string maker in Python (obsolete)
This tiny program may help to make the job of compiling the blocklist slightly
easier.
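
A minimal sketch of what such a blocklist string maker might look like. This is
not the attached program: the function names and the sample code points are
assumptions made for illustration. It sorts the code points, emits \x escapes
below U+0100 and \u escapes otherwise, and prints each character with its
Unicode name where one exists.

```python
import unicodedata


def escape(cp):
    """Return an escape for one BMP code point: \\xNN below U+0100, else \\uNNNN."""
    # Code points above U+FFFF would need a different escape; not handled here.
    return "\\x%02X" % cp if cp < 0x100 else "\\u%04X" % cp


def make_blocklist(code_points):
    """Build the escaped blocklist string from an iterable of code points."""
    return "".join(escape(cp) for cp in sorted(set(code_points)))


def describe(cp):
    """Return 'U+XXXX NAME'; control characters have no name in the database."""
    return "U+%04X %s" % (cp, unicodedata.name(chr(cp), "<unnamed>"))


if __name__ == "__main__":
    # Hypothetical sample input: SPACE plus three SOLIDUS look-alikes.
    sample = [0x2044, 0x20, 0xFF0F, 0x2215]
    print('"%s"' % make_blocklist(sample))
    for cp in sorted(sample):
        print(describe(cp))
```

Keeping the input as a list of code points rather than literal characters
avoids having to edit undisplayable characters into a source file.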
According to RFC 3490:

   1) Whenever dots are used as label separators, the following
      characters MUST be recognized as dots: U+002E (full stop), U+3002
      (ideographic full stop), U+FF0E (fullwidth full stop), U+FF61
      (halfwidth ideographic full stop).

However, the state of the Arabic and Syriac full stop characters is unclear:

See http://www.nic.ps/idns/syria.pdf for some proposals, which seem to imply
that these characters were not supported at the time of writing, and which
recommend that the _hyphen_ or even _space_ be a valid label separator in an
Arabic context. That might well present some severe interoperability
problems.

More research needed...
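
As a rough illustration of the RFC 3490 rule quoted above, here is a minimal
sketch that splits a host name on the four characters that must be recognized
as dots. The names IDNA_DOTS, UNCLEAR and split_labels are assumptions made
for this example; the Arabic and Syriac full stops are deliberately not
treated as separators, since their status is unclear.

```python
import re

# The four characters RFC 3490 says MUST be recognized as dots.
IDNA_DOTS = "\u002E\u3002\uFF0E\uFF61"

# Characters whose status as separators is unclear (not split on here).
UNCLEAR = "\u06D4\u0702"

_DOT_RE = re.compile("[%s]" % re.escape(IDNA_DOTS))


def split_labels(hostname):
    """Split a host name into labels on any of the four RFC 3490 dots."""
    return _DOT_RE.split(hostname)


def has_unclear_separator(hostname):
    """Report whether the host name contains a candidate for the blacklist."""
    return any(ch in UNCLEAR for ch in hostname)


if __name__ == "__main__":
    print(split_labels("www\u3002example\uFF0Ecom"))
    print(has_unclear_separator("a\u06D4b"))
```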



Flags: blocking1.9a1?
Flags: blocking1.8b5?
Whiteboard: [sg:investigate]
Flags: blocking1.8b5+
This is the output of the above blocklist program, with character names given.

The blocklist string itself, in \x- and \u-escaped format, is:

"\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0A\x0B\x0C\x0D\x0E\x0F\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1A\x1B\x1C\x1D\x1E\x1F\x20\x23\x25\x26\x28\x29\x2F\x3A\x3B\x3C\x3D\x3E\x3F\x40\x7F\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8A\x8B\x8C\x8D\x8E\x8F\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9A\x9B\x9C\x9D\x9E\x9F\xA0\u0337\u0338\u06D4\u0702\u115F\u1160\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200A\u200B\u2024\u2027\u2028\u2029\u202F\u2044\u205F\u2215\u23AE\u29F6\u29F8\u2AFB\u2AFD\u2FF0\u2FF1\u2FF2\u2FF3\u2FF4\u2FF5\u2FF6\u2FF7\u2FF8\u2FF9\u2FFA\u2FFB\u3000\u3002\u3164\uFEFF\uFF0E\uFF0F\uFF61\uFFA0"
Looks like progress is being made over in bug 307438. Can we have \u rather than
\x escapes, please? As dveditz points out over there, they are easier to read
and more self-documenting.

Gerv
Another small Python program for generating blocklist candidates. This checks
all characters outside the ASCII range to see if they generate "bad" characters
after being fed through NAMEPREP. "Bad" characters are:
* those that spoof characters used for special purposes in strings in protocols
like SMTP and URLs
* control characters
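
A minimal sketch of this kind of check, using Python's standard stringprep and
unicodedata modules. It is not the attached program. It approximates only the
mapping and NFKC normalization steps of NAMEPREP (no prohibited-character or
bidi checks), and the BAD set below is an assumption, chosen to be roughly
consistent with the output reported in this bug.

```python
import stringprep
import unicodedata

# Assumed "bad" set: SPACE plus some ASCII characters with special meaning in
# URLs or e-mail addresses. The attached program's exact set may differ.
BAD = set(" /:@<>()[]\\")


def nameprep_map(ch):
    """Approximate the NAMEPREP map and normalize steps for one character."""
    if stringprep.in_table_b1(ch):
        return ""  # table B.1: characters that are mapped to nothing
    folded = stringprep.map_table_b2(ch)  # table B.2: case folding
    # stringprep is defined against Unicode 3.2, so use that database.
    return unicodedata.ucd_3_2_0.normalize("NFKC", folded)


def bad_outputs(ch):
    """Return the bad or control characters that ch turns into."""
    return [c for c in nameprep_map(ch)
            if c in BAD or unicodedata.category(c) == "Cc"]


def scan(start=0xA0, stop=0x10000):
    """Yield 'U+XXXX NAME' for each code point in range that yields bad output."""
    for cp in range(start, stop):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # skip surrogates
        ch = chr(cp)
        if bad_outputs(ch):
            yield "U+%04X %s" % (cp, unicodedata.name(ch, "<unnamed>"))


if __name__ == "__main__":
    for line in scan():
        print(line)
```
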
Oops, 2*20 in the attached program should read 2**20.

Given that, the output of the attached program is:

U+00A0 NO-BREAK SPACE
U+00A8 DIAERESIS
U+00AF MACRON
U+00B4 ACUTE ACCENT
U+00B8 CEDILLA
U+02D8 BREVE
U+02D9 DOT ABOVE
U+02DA RING ABOVE
U+02DB OGONEK
U+02DC SMALL TILDE
U+02DD DOUBLE ACUTE ACCENT
U+037A GREEK YPOGEGRAMMENI
U+0384 GREEK TONOS
U+0385 GREEK DIALYTIKA TONOS
U+1FBD GREEK KORONIS
U+1FBF GREEK PSILI
U+1FC0 GREEK PERISPOMENI
U+1FC1 GREEK DIALYTIKA AND PERISPOMENI
U+1FCD GREEK PSILI AND VARIA
U+1FCE GREEK PSILI AND OXIA
U+1FCF GREEK PSILI AND PERISPOMENI
U+1FDD GREEK DASIA AND VARIA
U+1FDE GREEK DASIA AND OXIA
U+1FDF GREEK DASIA AND PERISPOMENI
U+1FED GREEK DIALYTIKA AND VARIA
U+1FEE GREEK DIALYTIKA AND OXIA
U+1FFD GREEK OXIA
U+1FFE GREEK DASIA
U+2000 EN QUAD
U+2001 EM QUAD
U+2002 EN SPACE
U+2003 EM SPACE
U+2004 THREE-PER-EM SPACE
U+2005 FOUR-PER-EM SPACE
U+2006 SIX-PER-EM SPACE
U+2007 FIGURE SPACE
U+2008 PUNCTUATION SPACE
U+2009 THIN SPACE
U+200A HAIR SPACE
U+2017 DOUBLE LOW LINE
U+202F NARROW NO-BREAK SPACE
U+203E OVERLINE
U+205F MEDIUM MATHEMATICAL SPACE
U+207D SUPERSCRIPT LEFT PARENTHESIS
U+207E SUPERSCRIPT RIGHT PARENTHESIS
U+208D SUBSCRIPT LEFT PARENTHESIS
U+208E SUBSCRIPT RIGHT PARENTHESIS
U+2100 ACCOUNT OF
U+2101 ADDRESSED TO THE SUBJECT
U+2105 CARE OF
U+2106 CADA UNA
U+2474 PARENTHESIZED DIGIT ONE
U+2475 PARENTHESIZED DIGIT TWO
U+2476 PARENTHESIZED DIGIT THREE
U+2477 PARENTHESIZED DIGIT FOUR
U+2478 PARENTHESIZED DIGIT FIVE
U+2479 PARENTHESIZED DIGIT SIX
U+247A PARENTHESIZED DIGIT SEVEN
U+247B PARENTHESIZED DIGIT EIGHT
U+247C PARENTHESIZED DIGIT NINE
U+247D PARENTHESIZED NUMBER TEN
U+247E PARENTHESIZED NUMBER ELEVEN
U+247F PARENTHESIZED NUMBER TWELVE
U+2480 PARENTHESIZED NUMBER THIRTEEN
U+2481 PARENTHESIZED NUMBER FOURTEEN
U+2482 PARENTHESIZED NUMBER FIFTEEN
U+2483 PARENTHESIZED NUMBER SIXTEEN
U+2484 PARENTHESIZED NUMBER SEVENTEEN
U+2485 PARENTHESIZED NUMBER EIGHTEEN
U+2486 PARENTHESIZED NUMBER NINETEEN
U+2487 PARENTHESIZED NUMBER TWENTY
U+249C PARENTHESIZED LATIN SMALL LETTER A
U+249D PARENTHESIZED LATIN SMALL LETTER B
U+249E PARENTHESIZED LATIN SMALL LETTER C
U+249F PARENTHESIZED LATIN SMALL LETTER D
U+24A0 PARENTHESIZED LATIN SMALL LETTER E
U+24A1 PARENTHESIZED LATIN SMALL LETTER F
U+24A2 PARENTHESIZED LATIN SMALL LETTER G
U+24A3 PARENTHESIZED LATIN SMALL LETTER H
U+24A4 PARENTHESIZED LATIN SMALL LETTER I
U+24A5 PARENTHESIZED LATIN SMALL LETTER J
U+24A6 PARENTHESIZED LATIN SMALL LETTER K
U+24A7 PARENTHESIZED LATIN SMALL LETTER L
U+24A8 PARENTHESIZED LATIN SMALL LETTER M
U+24A9 PARENTHESIZED LATIN SMALL LETTER N
U+24AA PARENTHESIZED LATIN SMALL LETTER O
U+24AB PARENTHESIZED LATIN SMALL LETTER P
U+24AC PARENTHESIZED LATIN SMALL LETTER Q
U+24AD PARENTHESIZED LATIN SMALL LETTER R
U+24AE PARENTHESIZED LATIN SMALL LETTER S
U+24AF PARENTHESIZED LATIN SMALL LETTER T
U+24B0 PARENTHESIZED LATIN SMALL LETTER U
U+24B1 PARENTHESIZED LATIN SMALL LETTER V
U+24B2 PARENTHESIZED LATIN SMALL LETTER W
U+24B3 PARENTHESIZED LATIN SMALL LETTER X
U+24B4 PARENTHESIZED LATIN SMALL LETTER Y
U+24B5 PARENTHESIZED LATIN SMALL LETTER Z
U+2A74 DOUBLE COLON EQUAL
U+3000 IDEOGRAPHIC SPACE
U+309B KATAKANA-HIRAGANA VOICED SOUND MARK
U+309C KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK
U+3200 PARENTHESIZED HANGUL KIYEOK
U+3201 PARENTHESIZED HANGUL NIEUN
U+3202 PARENTHESIZED HANGUL TIKEUT
U+3203 PARENTHESIZED HANGUL RIEUL
U+3204 PARENTHESIZED HANGUL MIEUM
U+3205 PARENTHESIZED HANGUL PIEUP
U+3206 PARENTHESIZED HANGUL SIOS
U+3207 PARENTHESIZED HANGUL IEUNG
U+3208 PARENTHESIZED HANGUL CIEUC
U+3209 PARENTHESIZED HANGUL CHIEUCH
U+320A PARENTHESIZED HANGUL KHIEUKH
U+320B PARENTHESIZED HANGUL THIEUTH
U+320C PARENTHESIZED HANGUL PHIEUPH
U+320D PARENTHESIZED HANGUL HIEUH
U+320E PARENTHESIZED HANGUL KIYEOK A
U+320F PARENTHESIZED HANGUL NIEUN A
U+3210 PARENTHESIZED HANGUL TIKEUT A
U+3211 PARENTHESIZED HANGUL RIEUL A
U+3212 PARENTHESIZED HANGUL MIEUM A
U+3213 PARENTHESIZED HANGUL PIEUP A
U+3214 PARENTHESIZED HANGUL SIOS A
U+3215 PARENTHESIZED HANGUL IEUNG A
U+3216 PARENTHESIZED HANGUL CIEUC A
U+3217 PARENTHESIZED HANGUL CHIEUCH A
U+3218 PARENTHESIZED HANGUL KHIEUKH A
U+3219 PARENTHESIZED HANGUL THIEUTH A
U+321A PARENTHESIZED HANGUL PHIEUPH A
U+321B PARENTHESIZED HANGUL HIEUH A
U+321C PARENTHESIZED HANGUL CIEUC U
U+3220 PARENTHESIZED IDEOGRAPH ONE
U+3221 PARENTHESIZED IDEOGRAPH TWO
U+3222 PARENTHESIZED IDEOGRAPH THREE
U+3223 PARENTHESIZED IDEOGRAPH FOUR
U+3224 PARENTHESIZED IDEOGRAPH FIVE
U+3225 PARENTHESIZED IDEOGRAPH SIX
U+3226 PARENTHESIZED IDEOGRAPH SEVEN
U+3227 PARENTHESIZED IDEOGRAPH EIGHT
U+3228 PARENTHESIZED IDEOGRAPH NINE
U+3229 PARENTHESIZED IDEOGRAPH TEN
U+322A PARENTHESIZED IDEOGRAPH MOON
U+322B PARENTHESIZED IDEOGRAPH FIRE
U+322C PARENTHESIZED IDEOGRAPH WATER
U+322D PARENTHESIZED IDEOGRAPH WOOD
U+322E PARENTHESIZED IDEOGRAPH METAL
U+322F PARENTHESIZED IDEOGRAPH EARTH
U+3230 PARENTHESIZED IDEOGRAPH SUN
U+3231 PARENTHESIZED IDEOGRAPH STOCK
U+3232 PARENTHESIZED IDEOGRAPH HAVE
U+3233 PARENTHESIZED IDEOGRAPH SOCIETY
U+3234 PARENTHESIZED IDEOGRAPH NAME
U+3235 PARENTHESIZED IDEOGRAPH SPECIAL
U+3236 PARENTHESIZED IDEOGRAPH FINANCIAL
U+3237 PARENTHESIZED IDEOGRAPH CONGRATULATION
U+3238 PARENTHESIZED IDEOGRAPH LABOR
U+3239 PARENTHESIZED IDEOGRAPH REPRESENT
U+323A PARENTHESIZED IDEOGRAPH CALL
U+323B PARENTHESIZED IDEOGRAPH STUDY
U+323C PARENTHESIZED IDEOGRAPH SUPERVISE
U+323D PARENTHESIZED IDEOGRAPH ENTERPRISE
U+323E PARENTHESIZED IDEOGRAPH RESOURCE
U+323F PARENTHESIZED IDEOGRAPH ALLIANCE
U+3240 PARENTHESIZED IDEOGRAPH FESTIVAL
U+3241 PARENTHESIZED IDEOGRAPH REST
U+3242 PARENTHESIZED IDEOGRAPH SELF
U+3243 PARENTHESIZED IDEOGRAPH REACH
U+FC5E ARABIC LIGATURE SHADDA WITH DAMMATAN ISOLATED FORM
U+FC5F ARABIC LIGATURE SHADDA WITH KASRATAN ISOLATED FORM
U+FC60 ARABIC LIGATURE SHADDA WITH FATHA ISOLATED FORM
U+FC61 ARABIC LIGATURE SHADDA WITH DAMMA ISOLATED FORM
U+FC62 ARABIC LIGATURE SHADDA WITH KASRA ISOLATED FORM
U+FC63 ARABIC LIGATURE SHADDA WITH SUPERSCRIPT ALEF ISOLATED FORM
U+FDFA ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM
U+FDFB ARABIC LIGATURE JALLAJALALOUHOU
U+FE35 PRESENTATION FORM FOR VERTICAL LEFT PARENTHESIS
U+FE36 PRESENTATION FORM FOR VERTICAL RIGHT PARENTHESIS
U+FE49 DASHED OVERLINE
U+FE4A CENTRELINE OVERLINE
U+FE4B WAVY OVERLINE
U+FE4C DOUBLE WAVY OVERLINE
U+FE55 SMALL COLON
U+FE59 SMALL LEFT PARENTHESIS
U+FE5A SMALL RIGHT PARENTHESIS
U+FE64 SMALL LESS-THAN SIGN
U+FE65 SMALL GREATER-THAN SIGN
U+FE68 SMALL REVERSE SOLIDUS
U+FE6B SMALL COMMERCIAL AT
U+FE70 ARABIC FATHATAN ISOLATED FORM
U+FE72 ARABIC DAMMATAN ISOLATED FORM
U+FE74 ARABIC KASRATAN ISOLATED FORM
U+FE76 ARABIC FATHA ISOLATED FORM
U+FE78 ARABIC DAMMA ISOLATED FORM
U+FE7A ARABIC KASRA ISOLATED FORM
U+FE7C ARABIC SHADDA ISOLATED FORM
U+FE7E ARABIC SUKUN ISOLATED FORM
U+FF08 FULLWIDTH LEFT PARENTHESIS
U+FF09 FULLWIDTH RIGHT PARENTHESIS
U+FF0F FULLWIDTH SOLIDUS
U+FF1A FULLWIDTH COLON
U+FF1C FULLWIDTH LESS-THAN SIGN
U+FF1E FULLWIDTH GREATER-THAN SIGN
U+FF20 FULLWIDTH COMMERCIAL AT
U+FF3B FULLWIDTH LEFT SQUARE BRACKET
U+FF3C FULLWIDTH REVERSE SOLIDUS
U+FF3D FULLWIDTH RIGHT SQUARE BRACKET
U+FFE3 FULLWIDTH MACRON
Revised version of the attached program, which now specifies which characters
are being generated by NAMEPREP.
Attachment #197757 - Attachment is obsolete: true
Output of the revised version of the program, which is more specific about what
is setting off the spoof detector: results need checking for false positives.

U+00A0 NO-BREAK SPACE spoofs SPACE
U+00A8 DIAERESIS spoofs SPACE, COMBINING DIAERESIS
U+00AF MACRON spoofs SPACE, COMBINING MACRON
U+00B4 ACUTE ACCENT spoofs SPACE, COMBINING ACUTE ACCENT
U+00B8 CEDILLA spoofs SPACE, COMBINING CEDILLA
U+02D8 BREVE spoofs SPACE, COMBINING BREVE
U+02D9 DOT ABOVE spoofs SPACE, COMBINING DOT ABOVE
U+02DA RING ABOVE spoofs SPACE, COMBINING RING ABOVE
U+02DB OGONEK spoofs SPACE, COMBINING OGONEK
U+02DC SMALL TILDE spoofs SPACE, COMBINING TILDE
U+02DD DOUBLE ACUTE ACCENT spoofs SPACE, COMBINING DOUBLE ACUTE ACCENT
U+037A GREEK YPOGEGRAMMENI spoofs SPACE, GREEK SMALL LETTER IOTA
U+0384 GREEK TONOS spoofs SPACE, COMBINING ACUTE ACCENT
U+0385 GREEK DIALYTIKA TONOS spoofs SPACE, COMBINING DIAERESIS, COMBINING ACUTE
ACCENT
U+1FBD GREEK KORONIS spoofs SPACE, COMBINING COMMA ABOVE
U+1FBF GREEK PSILI spoofs SPACE, COMBINING COMMA ABOVE
U+1FC0 GREEK PERISPOMENI spoofs SPACE, COMBINING GREEK PERISPOMENI
U+1FC1 GREEK DIALYTIKA AND PERISPOMENI spoofs SPACE, COMBINING DIAERESIS,
COMBINING GREEK PERISPOMENI
U+1FCD GREEK PSILI AND VARIA spoofs SPACE, COMBINING COMMA ABOVE, COMBINING
GRAVE ACCENT
U+1FCE GREEK PSILI AND OXIA spoofs SPACE, COMBINING COMMA ABOVE, COMBINING ACUTE
ACCENT
U+1FCF GREEK PSILI AND PERISPOMENI spoofs SPACE, COMBINING COMMA ABOVE,
COMBINING GREEK PERISPOMENI
U+1FDD GREEK DASIA AND VARIA spoofs SPACE, COMBINING REVERSED COMMA ABOVE,
COMBINING GRAVE ACCENT
U+1FDE GREEK DASIA AND OXIA spoofs SPACE, COMBINING REVERSED COMMA ABOVE,
COMBINING ACUTE ACCENT
U+1FDF GREEK DASIA AND PERISPOMENI spoofs SPACE, COMBINING REVERSED COMMA ABOVE,
COMBINING GREEK PERISPOMENI
U+1FED GREEK DIALYTIKA AND VARIA spoofs SPACE, COMBINING DIAERESIS, COMBINING
GRAVE ACCENT
U+1FEE GREEK DIALYTIKA AND OXIA spoofs SPACE, COMBINING DIAERESIS, COMBINING
ACUTE ACCENT
U+1FFD GREEK OXIA spoofs SPACE, COMBINING ACUTE ACCENT
U+1FFE GREEK DASIA spoofs SPACE, COMBINING REVERSED COMMA ABOVE
U+2000 EN QUAD spoofs SPACE
U+2001 EM QUAD spoofs SPACE
U+2002 EN SPACE spoofs SPACE
U+2003 EM SPACE spoofs SPACE
U+2004 THREE-PER-EM SPACE spoofs SPACE
U+2005 FOUR-PER-EM SPACE spoofs SPACE
U+2006 SIX-PER-EM SPACE spoofs SPACE
U+2007 FIGURE SPACE spoofs SPACE
U+2008 PUNCTUATION SPACE spoofs SPACE
U+2009 THIN SPACE spoofs SPACE
U+200A HAIR SPACE spoofs SPACE
U+2017 DOUBLE LOW LINE spoofs SPACE, COMBINING DOUBLE LOW LINE
U+202F NARROW NO-BREAK SPACE spoofs SPACE
U+203E OVERLINE spoofs SPACE, COMBINING OVERLINE
U+205F MEDIUM MATHEMATICAL SPACE spoofs SPACE
U+207D SUPERSCRIPT LEFT PARENTHESIS spoofs LEFT PARENTHESIS
U+207E SUPERSCRIPT RIGHT PARENTHESIS spoofs RIGHT PARENTHESIS
U+208D SUBSCRIPT LEFT PARENTHESIS spoofs LEFT PARENTHESIS
U+208E SUBSCRIPT RIGHT PARENTHESIS spoofs RIGHT PARENTHESIS
U+2100 ACCOUNT OF spoofs LATIN SMALL LETTER A, SOLIDUS, LATIN SMALL LETTER C
U+2101 ADDRESSED TO THE SUBJECT spoofs LATIN SMALL LETTER A, SOLIDUS, LATIN
SMALL LETTER S
U+2105 CARE OF spoofs LATIN SMALL LETTER C, SOLIDUS, LATIN SMALL LETTER O
U+2106 CADA UNA spoofs LATIN SMALL LETTER C, SOLIDUS, LATIN SMALL LETTER U
U+2474 PARENTHESIZED DIGIT ONE spoofs LEFT PARENTHESIS, DIGIT ONE, RIGHT PARENTHESIS
U+2475 PARENTHESIZED DIGIT TWO spoofs LEFT PARENTHESIS, DIGIT TWO, RIGHT PARENTHESIS
U+2476 PARENTHESIZED DIGIT THREE spoofs LEFT PARENTHESIS, DIGIT THREE, RIGHT
PARENTHESIS
U+2477 PARENTHESIZED DIGIT FOUR spoofs LEFT PARENTHESIS, DIGIT FOUR, RIGHT
PARENTHESIS
U+2478 PARENTHESIZED DIGIT FIVE spoofs LEFT PARENTHESIS, DIGIT FIVE, RIGHT
PARENTHESIS
U+2479 PARENTHESIZED DIGIT SIX spoofs LEFT PARENTHESIS, DIGIT SIX, RIGHT PARENTHESIS
U+247A PARENTHESIZED DIGIT SEVEN spoofs LEFT PARENTHESIS, DIGIT SEVEN, RIGHT
PARENTHESIS
U+247B PARENTHESIZED DIGIT EIGHT spoofs LEFT PARENTHESIS, DIGIT EIGHT, RIGHT
PARENTHESIS
U+247C PARENTHESIZED DIGIT NINE spoofs LEFT PARENTHESIS, DIGIT NINE, RIGHT
PARENTHESIS
U+247D PARENTHESIZED NUMBER TEN spoofs LEFT PARENTHESIS, DIGIT ONE, DIGIT ZERO,
RIGHT PARENTHESIS
U+247E PARENTHESIZED NUMBER ELEVEN spoofs LEFT PARENTHESIS, DIGIT ONE, DIGIT
ONE, RIGHT PARENTHESIS
U+247F PARENTHESIZED NUMBER TWELVE spoofs LEFT PARENTHESIS, DIGIT ONE, DIGIT
TWO, RIGHT PARENTHESIS
U+2480 PARENTHESIZED NUMBER THIRTEEN spoofs LEFT PARENTHESIS, DIGIT ONE, DIGIT
THREE, RIGHT PARENTHESIS
U+2481 PARENTHESIZED NUMBER FOURTEEN spoofs LEFT PARENTHESIS, DIGIT ONE, DIGIT
FOUR, RIGHT PARENTHESIS
U+2482 PARENTHESIZED NUMBER FIFTEEN spoofs LEFT PARENTHESIS, DIGIT ONE, DIGIT
FIVE, RIGHT PARENTHESIS
U+2483 PARENTHESIZED NUMBER SIXTEEN spoofs LEFT PARENTHESIS, DIGIT ONE, DIGIT
SIX, RIGHT PARENTHESIS
U+2484 PARENTHESIZED NUMBER SEVENTEEN spoofs LEFT PARENTHESIS, DIGIT ONE, DIGIT
SEVEN, RIGHT PARENTHESIS
U+2485 PARENTHESIZED NUMBER EIGHTEEN spoofs LEFT PARENTHESIS, DIGIT ONE, DIGIT
EIGHT, RIGHT PARENTHESIS
U+2486 PARENTHESIZED NUMBER NINETEEN spoofs LEFT PARENTHESIS, DIGIT ONE, DIGIT
NINE, RIGHT PARENTHESIS
U+2487 PARENTHESIZED NUMBER TWENTY spoofs LEFT PARENTHESIS, DIGIT TWO, DIGIT
ZERO, RIGHT PARENTHESIS
U+249C PARENTHESIZED LATIN SMALL LETTER A spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER A, RIGHT PARENTHESIS
U+249D PARENTHESIZED LATIN SMALL LETTER B spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER B, RIGHT PARENTHESIS
U+249E PARENTHESIZED LATIN SMALL LETTER C spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER C, RIGHT PARENTHESIS
U+249F PARENTHESIZED LATIN SMALL LETTER D spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER D, RIGHT PARENTHESIS
U+24A0 PARENTHESIZED LATIN SMALL LETTER E spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER E, RIGHT PARENTHESIS
U+24A1 PARENTHESIZED LATIN SMALL LETTER F spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER F, RIGHT PARENTHESIS
U+24A2 PARENTHESIZED LATIN SMALL LETTER G spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER G, RIGHT PARENTHESIS
U+24A3 PARENTHESIZED LATIN SMALL LETTER H spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER H, RIGHT PARENTHESIS
U+24A4 PARENTHESIZED LATIN SMALL LETTER I spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER I, RIGHT PARENTHESIS
U+24A5 PARENTHESIZED LATIN SMALL LETTER J spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER J, RIGHT PARENTHESIS
U+24A6 PARENTHESIZED LATIN SMALL LETTER K spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER K, RIGHT PARENTHESIS
U+24A7 PARENTHESIZED LATIN SMALL LETTER L spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER L, RIGHT PARENTHESIS
U+24A8 PARENTHESIZED LATIN SMALL LETTER M spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER M, RIGHT PARENTHESIS
U+24A9 PARENTHESIZED LATIN SMALL LETTER N spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER N, RIGHT PARENTHESIS
U+24AA PARENTHESIZED LATIN SMALL LETTER O spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER O, RIGHT PARENTHESIS
U+24AB PARENTHESIZED LATIN SMALL LETTER P spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER P, RIGHT PARENTHESIS
U+24AC PARENTHESIZED LATIN SMALL LETTER Q spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER Q, RIGHT PARENTHESIS
U+24AD PARENTHESIZED LATIN SMALL LETTER R spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER R, RIGHT PARENTHESIS
U+24AE PARENTHESIZED LATIN SMALL LETTER S spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER S, RIGHT PARENTHESIS
U+24AF PARENTHESIZED LATIN SMALL LETTER T spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER T, RIGHT PARENTHESIS
U+24B0 PARENTHESIZED LATIN SMALL LETTER U spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER U, RIGHT PARENTHESIS
U+24B1 PARENTHESIZED LATIN SMALL LETTER V spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER V, RIGHT PARENTHESIS
U+24B2 PARENTHESIZED LATIN SMALL LETTER W spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER W, RIGHT PARENTHESIS
U+24B3 PARENTHESIZED LATIN SMALL LETTER X spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER X, RIGHT PARENTHESIS
U+24B4 PARENTHESIZED LATIN SMALL LETTER Y spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER Y, RIGHT PARENTHESIS
U+24B5 PARENTHESIZED LATIN SMALL LETTER Z spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER Z, RIGHT PARENTHESIS
U+2A74 DOUBLE COLON EQUAL spoofs COLON, COLON, EQUALS SIGN
U+3000 IDEOGRAPHIC SPACE spoofs SPACE
U+309B KATAKANA-HIRAGANA VOICED SOUND MARK spoofs SPACE, COMBINING
KATAKANA-HIRAGANA VOICED SOUND MARK
U+309C KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK spoofs SPACE, COMBINING
KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK
U+3200 PARENTHESIZED HANGUL KIYEOK spoofs LEFT PARENTHESIS, HANGUL CHOSEONG
KIYEOK, RIGHT PARENTHESIS
U+3201 PARENTHESIZED HANGUL NIEUN spoofs LEFT PARENTHESIS, HANGUL CHOSEONG
NIEUN, RIGHT PARENTHESIS
U+3202 PARENTHESIZED HANGUL TIKEUT spoofs LEFT PARENTHESIS, HANGUL CHOSEONG
TIKEUT, RIGHT PARENTHESIS
U+3203 PARENTHESIZED HANGUL RIEUL spoofs LEFT PARENTHESIS, HANGUL CHOSEONG
RIEUL, RIGHT PARENTHESIS
U+3204 PARENTHESIZED HANGUL MIEUM spoofs LEFT PARENTHESIS, HANGUL CHOSEONG
MIEUM, RIGHT PARENTHESIS
U+3205 PARENTHESIZED HANGUL PIEUP spoofs LEFT PARENTHESIS, HANGUL CHOSEONG
PIEUP, RIGHT PARENTHESIS
U+3206 PARENTHESIZED HANGUL SIOS spoofs LEFT PARENTHESIS, HANGUL CHOSEONG SIOS,
RIGHT PARENTHESIS
U+3207 PARENTHESIZED HANGUL IEUNG spoofs LEFT PARENTHESIS, HANGUL CHOSEONG
IEUNG, RIGHT PARENTHESIS
U+3208 PARENTHESIZED HANGUL CIEUC spoofs LEFT PARENTHESIS, HANGUL CHOSEONG
CIEUC, RIGHT PARENTHESIS
U+3209 PARENTHESIZED HANGUL CHIEUCH spoofs LEFT PARENTHESIS, HANGUL CHOSEONG
CHIEUCH, RIGHT PARENTHESIS
U+320A PARENTHESIZED HANGUL KHIEUKH spoofs LEFT PARENTHESIS, HANGUL CHOSEONG
KHIEUKH, RIGHT PARENTHESIS
U+320B PARENTHESIZED HANGUL THIEUTH spoofs LEFT PARENTHESIS, HANGUL CHOSEONG
THIEUTH, RIGHT PARENTHESIS
U+320C PARENTHESIZED HANGUL PHIEUPH spoofs LEFT PARENTHESIS, HANGUL CHOSEONG
PHIEUPH, RIGHT PARENTHESIS
U+320D PARENTHESIZED HANGUL HIEUH spoofs LEFT PARENTHESIS, HANGUL CHOSEONG
HIEUH, RIGHT PARENTHESIS
U+320E PARENTHESIZED HANGUL KIYEOK A spoofs LEFT PARENTHESIS, HANGUL SYLLABLE
GA, RIGHT PARENTHESIS
U+320F PARENTHESIZED HANGUL NIEUN A spoofs LEFT PARENTHESIS, HANGUL SYLLABLE NA,
RIGHT PARENTHESIS
U+3210 PARENTHESIZED HANGUL TIKEUT A spoofs LEFT PARENTHESIS, HANGUL SYLLABLE
DA, RIGHT PARENTHESIS
U+3211 PARENTHESIZED HANGUL RIEUL A spoofs LEFT PARENTHESIS, HANGUL SYLLABLE RA,
RIGHT PARENTHESIS
U+3212 PARENTHESIZED HANGUL MIEUM A spoofs LEFT PARENTHESIS, HANGUL SYLLABLE MA,
RIGHT PARENTHESIS
U+3213 PARENTHESIZED HANGUL PIEUP A spoofs LEFT PARENTHESIS, HANGUL SYLLABLE BA,
RIGHT PARENTHESIS
U+3214 PARENTHESIZED HANGUL SIOS A spoofs LEFT PARENTHESIS, HANGUL SYLLABLE SA,
RIGHT PARENTHESIS
U+3215 PARENTHESIZED HANGUL IEUNG A spoofs LEFT PARENTHESIS, HANGUL SYLLABLE A,
RIGHT PARENTHESIS
U+3216 PARENTHESIZED HANGUL CIEUC A spoofs LEFT PARENTHESIS, HANGUL SYLLABLE JA,
RIGHT PARENTHESIS
U+3217 PARENTHESIZED HANGUL CHIEUCH A spoofs LEFT PARENTHESIS, HANGUL SYLLABLE
CA, RIGHT PARENTHESIS
U+3218 PARENTHESIZED HANGUL KHIEUKH A spoofs LEFT PARENTHESIS, HANGUL SYLLABLE
KA, RIGHT PARENTHESIS
U+3219 PARENTHESIZED HANGUL THIEUTH A spoofs LEFT PARENTHESIS, HANGUL SYLLABLE
TA, RIGHT PARENTHESIS
U+321A PARENTHESIZED HANGUL PHIEUPH A spoofs LEFT PARENTHESIS, HANGUL SYLLABLE
PA, RIGHT PARENTHESIS
U+321B PARENTHESIZED HANGUL HIEUH A spoofs LEFT PARENTHESIS, HANGUL SYLLABLE HA,
RIGHT PARENTHESIS
U+321C PARENTHESIZED HANGUL CIEUC U spoofs LEFT PARENTHESIS, HANGUL SYLLABLE JU,
RIGHT PARENTHESIS
U+3220 PARENTHESIZED IDEOGRAPH ONE spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-4E00, RIGHT PARENTHESIS
U+3221 PARENTHESIZED IDEOGRAPH TWO spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-4E8C, RIGHT PARENTHESIS
U+3222 PARENTHESIZED IDEOGRAPH THREE spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-4E09, RIGHT PARENTHESIS
U+3223 PARENTHESIZED IDEOGRAPH FOUR spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-56DB, RIGHT PARENTHESIS
U+3224 PARENTHESIZED IDEOGRAPH FIVE spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-4E94, RIGHT PARENTHESIS
U+3225 PARENTHESIZED IDEOGRAPH SIX spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-516D, RIGHT PARENTHESIS
U+3226 PARENTHESIZED IDEOGRAPH SEVEN spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-4E03, RIGHT PARENTHESIS
U+3227 PARENTHESIZED IDEOGRAPH EIGHT spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-516B, RIGHT PARENTHESIS
U+3228 PARENTHESIZED IDEOGRAPH NINE spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-4E5D, RIGHT PARENTHESIS
U+3229 PARENTHESIZED IDEOGRAPH TEN spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-5341, RIGHT PARENTHESIS
U+322A PARENTHESIZED IDEOGRAPH MOON spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-6708, RIGHT PARENTHESIS
U+322B PARENTHESIZED IDEOGRAPH FIRE spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-706B, RIGHT PARENTHESIS
U+322C PARENTHESIZED IDEOGRAPH WATER spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-6C34, RIGHT PARENTHESIS
U+322D PARENTHESIZED IDEOGRAPH WOOD spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-6728, RIGHT PARENTHESIS
U+322E PARENTHESIZED IDEOGRAPH METAL spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-91D1, RIGHT PARENTHESIS
U+322F PARENTHESIZED IDEOGRAPH EARTH spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-571F, RIGHT PARENTHESIS
U+3230 PARENTHESIZED IDEOGRAPH SUN spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-65E5, RIGHT PARENTHESIS
U+3231 PARENTHESIZED IDEOGRAPH STOCK spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-682A, RIGHT PARENTHESIS
U+3232 PARENTHESIZED IDEOGRAPH HAVE spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-6709, RIGHT PARENTHESIS
U+3233 PARENTHESIZED IDEOGRAPH SOCIETY spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-793E, RIGHT PARENTHESIS
U+3234 PARENTHESIZED IDEOGRAPH NAME spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-540D, RIGHT PARENTHESIS
U+3235 PARENTHESIZED IDEOGRAPH SPECIAL spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-7279, RIGHT PARENTHESIS
U+3236 PARENTHESIZED IDEOGRAPH FINANCIAL spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-8CA1, RIGHT PARENTHESIS
U+3237 PARENTHESIZED IDEOGRAPH CONGRATULATION spoofs LEFT PARENTHESIS, CJK
UNIFIED IDEOGRAPH-795D, RIGHT PARENTHESIS
U+3238 PARENTHESIZED IDEOGRAPH LABOR spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-52B4, RIGHT PARENTHESIS
U+3239 PARENTHESIZED IDEOGRAPH REPRESENT spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-4EE3, RIGHT PARENTHESIS
U+323A PARENTHESIZED IDEOGRAPH CALL spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-547C, RIGHT PARENTHESIS
U+323B PARENTHESIZED IDEOGRAPH STUDY spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-5B66, RIGHT PARENTHESIS
U+323C PARENTHESIZED IDEOGRAPH SUPERVISE spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-76E3, RIGHT PARENTHESIS
U+323D PARENTHESIZED IDEOGRAPH ENTERPRISE spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-4F01, RIGHT PARENTHESIS
U+323E PARENTHESIZED IDEOGRAPH RESOURCE spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-8CC7, RIGHT PARENTHESIS
U+323F PARENTHESIZED IDEOGRAPH ALLIANCE spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-5354, RIGHT PARENTHESIS
U+3240 PARENTHESIZED IDEOGRAPH FESTIVAL spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-796D, RIGHT PARENTHESIS
U+3241 PARENTHESIZED IDEOGRAPH REST spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-4F11, RIGHT PARENTHESIS
U+3242 PARENTHESIZED IDEOGRAPH SELF spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-81EA, RIGHT PARENTHESIS
U+3243 PARENTHESIZED IDEOGRAPH REACH spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-81F3, RIGHT PARENTHESIS
U+FC5E ARABIC LIGATURE SHADDA WITH DAMMATAN ISOLATED FORM spoofs SPACE, ARABIC
DAMMATAN, ARABIC SHADDA
U+FC5F ARABIC LIGATURE SHADDA WITH KASRATAN ISOLATED FORM spoofs SPACE, ARABIC
KASRATAN, ARABIC SHADDA
U+FC60 ARABIC LIGATURE SHADDA WITH FATHA ISOLATED FORM spoofs SPACE, ARABIC
FATHA, ARABIC SHADDA
U+FC61 ARABIC LIGATURE SHADDA WITH DAMMA ISOLATED FORM spoofs SPACE, ARABIC
DAMMA, ARABIC SHADDA
U+FC62 ARABIC LIGATURE SHADDA WITH KASRA ISOLATED FORM spoofs SPACE, ARABIC
KASRA, ARABIC SHADDA
U+FC63 ARABIC LIGATURE SHADDA WITH SUPERSCRIPT ALEF ISOLATED FORM spoofs SPACE,
ARABIC SHADDA, ARABIC LETTER SUPERSCRIPT ALEF
U+FDFA ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM spoofs ARABIC LETTER SAD,
ARABIC LETTER LAM, ARABIC LETTER ALEF MAKSURA, SPACE, ARABIC LETTER ALEF, ARABIC
LETTER LAM, ARABIC LETTER LAM, ARABIC LETTER HEH, SPACE, ARABIC LETTER AIN,
ARABIC LETTER LAM, ARABIC LETTER YEH, ARABIC LETTER HEH, SPACE, ARABIC LETTER
WAW, ARABIC LETTER SEEN, ARABIC LETTER LAM, ARABIC LETTER MEEM
U+FDFB ARABIC LIGATURE JALLAJALALOUHOU spoofs ARABIC LETTER JEEM, ARABIC LETTER
LAM, SPACE, ARABIC LETTER JEEM, ARABIC LETTER LAM, ARABIC LETTER ALEF, ARABIC
LETTER LAM, ARABIC LETTER HEH
U+FE35 PRESENTATION FORM FOR VERTICAL LEFT PARENTHESIS spoofs LEFT PARENTHESIS
U+FE36 PRESENTATION FORM FOR VERTICAL RIGHT PARENTHESIS spoofs RIGHT PARENTHESIS
U+FE49 DASHED OVERLINE spoofs SPACE, COMBINING OVERLINE
U+FE4A CENTRELINE OVERLINE spoofs SPACE, COMBINING OVERLINE
U+FE4B WAVY OVERLINE spoofs SPACE, COMBINING OVERLINE
U+FE4C DOUBLE WAVY OVERLINE spoofs SPACE, COMBINING OVERLINE
U+FE55 SMALL COLON spoofs COLON
U+FE59 SMALL LEFT PARENTHESIS spoofs LEFT PARENTHESIS
U+FE5A SMALL RIGHT PARENTHESIS spoofs RIGHT PARENTHESIS
U+FE64 SMALL LESS-THAN SIGN spoofs LESS-THAN SIGN
U+FE65 SMALL GREATER-THAN SIGN spoofs GREATER-THAN SIGN
U+FE68 SMALL REVERSE SOLIDUS spoofs REVERSE SOLIDUS
U+FE6B SMALL COMMERCIAL AT spoofs COMMERCIAL AT
U+FE70 ARABIC FATHATAN ISOLATED FORM spoofs SPACE, ARABIC FATHATAN
U+FE72 ARABIC DAMMATAN ISOLATED FORM spoofs SPACE, ARABIC DAMMATAN
U+FE74 ARABIC KASRATAN ISOLATED FORM spoofs SPACE, ARABIC KASRATAN
U+FE76 ARABIC FATHA ISOLATED FORM spoofs SPACE, ARABIC FATHA
U+FE78 ARABIC DAMMA ISOLATED FORM spoofs SPACE, ARABIC DAMMA
U+FE7A ARABIC KASRA ISOLATED FORM spoofs SPACE, ARABIC KASRA
U+FE7C ARABIC SHADDA ISOLATED FORM spoofs SPACE, ARABIC SHADDA
U+FE7E ARABIC SUKUN ISOLATED FORM spoofs SPACE, ARABIC SUKUN
U+FF08 FULLWIDTH LEFT PARENTHESIS spoofs LEFT PARENTHESIS
U+FF09 FULLWIDTH RIGHT PARENTHESIS spoofs RIGHT PARENTHESIS
U+FF0F FULLWIDTH SOLIDUS spoofs SOLIDUS
U+FF1A FULLWIDTH COLON spoofs COLON
U+FF1C FULLWIDTH LESS-THAN SIGN spoofs LESS-THAN SIGN
U+FF1E FULLWIDTH GREATER-THAN SIGN spoofs GREATER-THAN SIGN
U+FF20 FULLWIDTH COMMERCIAL AT spoofs COMMERCIAL AT
U+FF3B FULLWIDTH LEFT SQUARE BRACKET spoofs LEFT SQUARE BRACKET
U+FF3C FULLWIDTH REVERSE SOLIDUS spoofs REVERSE SOLIDUS
U+FF3D FULLWIDTH RIGHT SQUARE BRACKET spoofs RIGHT SQUARE BRACKET
U+FFE3 FULLWIDTH MACRON spoofs SPACE, COMBINING MACRON
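
Many entries in the list above map to SPACE followed by a combining character.
A minimal sketch of screening those out mechanically is given here. It assumes
the NAMEPREP output for a character is already available as a string; the
function name still_spoofs and the BAD set are assumptions made for this
example, and the result would still need checking by hand.

```python
import unicodedata

# Assumed "bad" set: SPACE plus some ASCII characters with special meaning in
# URLs or e-mail addresses.
BAD = set(" /:@<>()[]\\")


def still_spoofs(prepped):
    """Return True if bad characters remain once each SPACE that is
    immediately followed by a combining character has been ignored."""
    remaining = []
    for i, c in enumerate(prepped):
        nxt = prepped[i + 1] if i + 1 < len(prepped) else ""
        # unicodedata.combining() returns a non-zero class for combining marks.
        if c == " " and nxt and unicodedata.combining(nxt):
            continue
        remaining.append(c)
    return any(c in BAD for c in remaining)


if __name__ == "__main__":
    print(still_spoofs(" \u0308"))  # SPACE, COMBINING DIAERESIS
    print(still_spoofs(" "))        # bare SPACE
    print(still_spoofs("a/c"))      # contains SOLIDUS
```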


The same list, hand-edited to remove instances of SPACE immediately followed by
a combining character, which should not result in a visual spoof:

U+00A0 NO-BREAK SPACE spoofs SPACE
U+037A GREEK YPOGEGRAMMENI spoofs SPACE, GREEK SMALL LETTER IOTA
U+2000 EN QUAD spoofs SPACE
U+2001 EM QUAD spoofs SPACE
U+2002 EN SPACE spoofs SPACE
U+2003 EM SPACE spoofs SPACE
U+2004 THREE-PER-EM SPACE spoofs SPACE
U+2005 FOUR-PER-EM SPACE spoofs SPACE
U+2006 SIX-PER-EM SPACE spoofs SPACE
U+2007 FIGURE SPACE spoofs SPACE
U+2008 PUNCTUATION SPACE spoofs SPACE
U+2009 THIN SPACE spoofs SPACE
U+200A HAIR SPACE spoofs SPACE
U+202F NARROW NO-BREAK SPACE spoofs SPACE
U+205F MEDIUM MATHEMATICAL SPACE spoofs SPACE
U+207D SUPERSCRIPT LEFT PARENTHESIS spoofs LEFT PARENTHESIS
U+207E SUPERSCRIPT RIGHT PARENTHESIS spoofs RIGHT PARENTHESIS
U+208D SUBSCRIPT LEFT PARENTHESIS spoofs LEFT PARENTHESIS
U+208E SUBSCRIPT RIGHT PARENTHESIS spoofs RIGHT PARENTHESIS
U+2100 ACCOUNT OF spoofs LATIN SMALL LETTER A, SOLIDUS, LATIN SMALL LETTER C
U+2101 ADDRESSED TO THE SUBJECT spoofs LATIN SMALL LETTER A, SOLIDUS, LATIN
SMALL LETTER S
U+2105 CARE OF spoofs LATIN SMALL LETTER C, SOLIDUS, LATIN SMALL LETTER O
U+2106 CADA UNA spoofs LATIN SMALL LETTER C, SOLIDUS, LATIN SMALL LETTER U
U+2474 PARENTHESIZED DIGIT ONE spoofs LEFT PARENTHESIS, DIGIT ONE, RIGHT PARENTHESIS
U+2475 PARENTHESIZED DIGIT TWO spoofs LEFT PARENTHESIS, DIGIT TWO, RIGHT PARENTHESIS
U+2476 PARENTHESIZED DIGIT THREE spoofs LEFT PARENTHESIS, DIGIT THREE, RIGHT
PARENTHESIS
U+2477 PARENTHESIZED DIGIT FOUR spoofs LEFT PARENTHESIS, DIGIT FOUR, RIGHT
PARENTHESIS
U+2478 PARENTHESIZED DIGIT FIVE spoofs LEFT PARENTHESIS, DIGIT FIVE, RIGHT
PARENTHESIS
U+2479 PARENTHESIZED DIGIT SIX spoofs LEFT PARENTHESIS, DIGIT SIX, RIGHT PARENTHESIS
U+247A PARENTHESIZED DIGIT SEVEN spoofs LEFT PARENTHESIS, DIGIT SEVEN, RIGHT
PARENTHESIS
U+247B PARENTHESIZED DIGIT EIGHT spoofs LEFT PARENTHESIS, DIGIT EIGHT, RIGHT
PARENTHESIS
U+247C PARENTHESIZED DIGIT NINE spoofs LEFT PARENTHESIS, DIGIT NINE, RIGHT
PARENTHESIS
U+247D PARENTHESIZED NUMBER TEN spoofs LEFT PARENTHESIS, DIGIT ONE, DIGIT ZERO,
RIGHT PARENTHESIS
U+247E PARENTHESIZED NUMBER ELEVEN spoofs LEFT PARENTHESIS, DIGIT ONE, DIGIT
ONE, RIGHT PARENTHESIS
U+247F PARENTHESIZED NUMBER TWELVE spoofs LEFT PARENTHESIS, DIGIT ONE, DIGIT
TWO, RIGHT PARENTHESIS
U+2480 PARENTHESIZED NUMBER THIRTEEN spoofs LEFT PARENTHESIS, DIGIT ONE, DIGIT
THREE, RIGHT PARENTHESIS
U+2481 PARENTHESIZED NUMBER FOURTEEN spoofs LEFT PARENTHESIS, DIGIT ONE, DIGIT
FOUR, RIGHT PARENTHESIS
U+2482 PARENTHESIZED NUMBER FIFTEEN spoofs LEFT PARENTHESIS, DIGIT ONE, DIGIT
FIVE, RIGHT PARENTHESIS
U+2483 PARENTHESIZED NUMBER SIXTEEN spoofs LEFT PARENTHESIS, DIGIT ONE, DIGIT
SIX, RIGHT PARENTHESIS
U+2484 PARENTHESIZED NUMBER SEVENTEEN spoofs LEFT PARENTHESIS, DIGIT ONE, DIGIT
SEVEN, RIGHT PARENTHESIS
U+2485 PARENTHESIZED NUMBER EIGHTEEN spoofs LEFT PARENTHESIS, DIGIT ONE, DIGIT
EIGHT, RIGHT PARENTHESIS
U+2486 PARENTHESIZED NUMBER NINETEEN spoofs LEFT PARENTHESIS, DIGIT ONE, DIGIT
NINE, RIGHT PARENTHESIS
U+2487 PARENTHESIZED NUMBER TWENTY spoofs LEFT PARENTHESIS, DIGIT TWO, DIGIT
ZERO, RIGHT PARENTHESIS
U+249C PARENTHESIZED LATIN SMALL LETTER A spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER A, RIGHT PARENTHESIS
U+249D PARENTHESIZED LATIN SMALL LETTER B spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER B, RIGHT PARENTHESIS
U+249E PARENTHESIZED LATIN SMALL LETTER C spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER C, RIGHT PARENTHESIS
U+249F PARENTHESIZED LATIN SMALL LETTER D spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER D, RIGHT PARENTHESIS
U+24A0 PARENTHESIZED LATIN SMALL LETTER E spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER E, RIGHT PARENTHESIS
U+24A1 PARENTHESIZED LATIN SMALL LETTER F spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER F, RIGHT PARENTHESIS
U+24A2 PARENTHESIZED LATIN SMALL LETTER G spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER G, RIGHT PARENTHESIS
U+24A3 PARENTHESIZED LATIN SMALL LETTER H spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER H, RIGHT PARENTHESIS
U+24A4 PARENTHESIZED LATIN SMALL LETTER I spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER I, RIGHT PARENTHESIS
U+24A5 PARENTHESIZED LATIN SMALL LETTER J spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER J, RIGHT PARENTHESIS
U+24A6 PARENTHESIZED LATIN SMALL LETTER K spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER K, RIGHT PARENTHESIS
U+24A7 PARENTHESIZED LATIN SMALL LETTER L spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER L, RIGHT PARENTHESIS
U+24A8 PARENTHESIZED LATIN SMALL LETTER M spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER M, RIGHT PARENTHESIS
U+24A9 PARENTHESIZED LATIN SMALL LETTER N spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER N, RIGHT PARENTHESIS
U+24AA PARENTHESIZED LATIN SMALL LETTER O spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER O, RIGHT PARENTHESIS
U+24AB PARENTHESIZED LATIN SMALL LETTER P spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER P, RIGHT PARENTHESIS
U+24AC PARENTHESIZED LATIN SMALL LETTER Q spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER Q, RIGHT PARENTHESIS
U+24AD PARENTHESIZED LATIN SMALL LETTER R spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER R, RIGHT PARENTHESIS
U+24AE PARENTHESIZED LATIN SMALL LETTER S spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER S, RIGHT PARENTHESIS
U+24AF PARENTHESIZED LATIN SMALL LETTER T spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER T, RIGHT PARENTHESIS
U+24B0 PARENTHESIZED LATIN SMALL LETTER U spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER U, RIGHT PARENTHESIS
U+24B1 PARENTHESIZED LATIN SMALL LETTER V spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER V, RIGHT PARENTHESIS
U+24B2 PARENTHESIZED LATIN SMALL LETTER W spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER W, RIGHT PARENTHESIS
U+24B3 PARENTHESIZED LATIN SMALL LETTER X spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER X, RIGHT PARENTHESIS
U+24B4 PARENTHESIZED LATIN SMALL LETTER Y spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER Y, RIGHT PARENTHESIS
U+24B5 PARENTHESIZED LATIN SMALL LETTER Z spoofs LEFT PARENTHESIS, LATIN SMALL
LETTER Z, RIGHT PARENTHESIS
U+2A74 DOUBLE COLON EQUAL spoofs COLON, COLON, EQUALS SIGN
U+3000 IDEOGRAPHIC SPACE spoofs SPACE
U+3200 PARENTHESIZED HANGUL KIYEOK spoofs LEFT PARENTHESIS, HANGUL CHOSEONG
KIYEOK, RIGHT PARENTHESIS
U+3201 PARENTHESIZED HANGUL NIEUN spoofs LEFT PARENTHESIS, HANGUL CHOSEONG
NIEUN, RIGHT PARENTHESIS
U+3202 PARENTHESIZED HANGUL TIKEUT spoofs LEFT PARENTHESIS, HANGUL CHOSEONG
TIKEUT, RIGHT PARENTHESIS
U+3203 PARENTHESIZED HANGUL RIEUL spoofs LEFT PARENTHESIS, HANGUL CHOSEONG
RIEUL, RIGHT PARENTHESIS
U+3204 PARENTHESIZED HANGUL MIEUM spoofs LEFT PARENTHESIS, HANGUL CHOSEONG
MIEUM, RIGHT PARENTHESIS
U+3205 PARENTHESIZED HANGUL PIEUP spoofs LEFT PARENTHESIS, HANGUL CHOSEONG
PIEUP, RIGHT PARENTHESIS
U+3206 PARENTHESIZED HANGUL SIOS spoofs LEFT PARENTHESIS, HANGUL CHOSEONG SIOS,
RIGHT PARENTHESIS
U+3207 PARENTHESIZED HANGUL IEUNG spoofs LEFT PARENTHESIS, HANGUL CHOSEONG
IEUNG, RIGHT PARENTHESIS
U+3208 PARENTHESIZED HANGUL CIEUC spoofs LEFT PARENTHESIS, HANGUL CHOSEONG
CIEUC, RIGHT PARENTHESIS
U+3209 PARENTHESIZED HANGUL CHIEUCH spoofs LEFT PARENTHESIS, HANGUL CHOSEONG
CHIEUCH, RIGHT PARENTHESIS
U+320A PARENTHESIZED HANGUL KHIEUKH spoofs LEFT PARENTHESIS, HANGUL CHOSEONG
KHIEUKH, RIGHT PARENTHESIS
U+320B PARENTHESIZED HANGUL THIEUTH spoofs LEFT PARENTHESIS, HANGUL CHOSEONG
THIEUTH, RIGHT PARENTHESIS
U+320C PARENTHESIZED HANGUL PHIEUPH spoofs LEFT PARENTHESIS, HANGUL CHOSEONG
PHIEUPH, RIGHT PARENTHESIS
U+320D PARENTHESIZED HANGUL HIEUH spoofs LEFT PARENTHESIS, HANGUL CHOSEONG
HIEUH, RIGHT PARENTHESIS
U+320E PARENTHESIZED HANGUL KIYEOK A spoofs LEFT PARENTHESIS, HANGUL SYLLABLE
GA, RIGHT PARENTHESIS
U+320F PARENTHESIZED HANGUL NIEUN A spoofs LEFT PARENTHESIS, HANGUL SYLLABLE NA,
RIGHT PARENTHESIS
U+3210 PARENTHESIZED HANGUL TIKEUT A spoofs LEFT PARENTHESIS, HANGUL SYLLABLE
DA, RIGHT PARENTHESIS
U+3211 PARENTHESIZED HANGUL RIEUL A spoofs LEFT PARENTHESIS, HANGUL SYLLABLE RA,
RIGHT PARENTHESIS
U+3212 PARENTHESIZED HANGUL MIEUM A spoofs LEFT PARENTHESIS, HANGUL SYLLABLE MA,
RIGHT PARENTHESIS
U+3213 PARENTHESIZED HANGUL PIEUP A spoofs LEFT PARENTHESIS, HANGUL SYLLABLE BA,
RIGHT PARENTHESIS
U+3214 PARENTHESIZED HANGUL SIOS A spoofs LEFT PARENTHESIS, HANGUL SYLLABLE SA,
RIGHT PARENTHESIS
U+3215 PARENTHESIZED HANGUL IEUNG A spoofs LEFT PARENTHESIS, HANGUL SYLLABLE A,
RIGHT PARENTHESIS
U+3216 PARENTHESIZED HANGUL CIEUC A spoofs LEFT PARENTHESIS, HANGUL SYLLABLE JA,
RIGHT PARENTHESIS
U+3217 PARENTHESIZED HANGUL CHIEUCH A spoofs LEFT PARENTHESIS, HANGUL SYLLABLE
CA, RIGHT PARENTHESIS
U+3218 PARENTHESIZED HANGUL KHIEUKH A spoofs LEFT PARENTHESIS, HANGUL SYLLABLE
KA, RIGHT PARENTHESIS
U+3219 PARENTHESIZED HANGUL THIEUTH A spoofs LEFT PARENTHESIS, HANGUL SYLLABLE
TA, RIGHT PARENTHESIS
U+321A PARENTHESIZED HANGUL PHIEUPH A spoofs LEFT PARENTHESIS, HANGUL SYLLABLE
PA, RIGHT PARENTHESIS
U+321B PARENTHESIZED HANGUL HIEUH A spoofs LEFT PARENTHESIS, HANGUL SYLLABLE HA,
RIGHT PARENTHESIS
U+321C PARENTHESIZED HANGUL CIEUC U spoofs LEFT PARENTHESIS, HANGUL SYLLABLE JU,
RIGHT PARENTHESIS
U+3220 PARENTHESIZED IDEOGRAPH ONE spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-4E00, RIGHT PARENTHESIS
U+3221 PARENTHESIZED IDEOGRAPH TWO spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-4E8C, RIGHT PARENTHESIS
U+3222 PARENTHESIZED IDEOGRAPH THREE spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-4E09, RIGHT PARENTHESIS
U+3223 PARENTHESIZED IDEOGRAPH FOUR spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-56DB, RIGHT PARENTHESIS
U+3224 PARENTHESIZED IDEOGRAPH FIVE spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-4E94, RIGHT PARENTHESIS
U+3225 PARENTHESIZED IDEOGRAPH SIX spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-516D, RIGHT PARENTHESIS
U+3226 PARENTHESIZED IDEOGRAPH SEVEN spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-4E03, RIGHT PARENTHESIS
U+3227 PARENTHESIZED IDEOGRAPH EIGHT spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-516B, RIGHT PARENTHESIS
U+3228 PARENTHESIZED IDEOGRAPH NINE spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-4E5D, RIGHT PARENTHESIS
U+3229 PARENTHESIZED IDEOGRAPH TEN spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-5341, RIGHT PARENTHESIS
U+322A PARENTHESIZED IDEOGRAPH MOON spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-6708, RIGHT PARENTHESIS
U+322B PARENTHESIZED IDEOGRAPH FIRE spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-706B, RIGHT PARENTHESIS
U+322C PARENTHESIZED IDEOGRAPH WATER spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-6C34, RIGHT PARENTHESIS
U+322D PARENTHESIZED IDEOGRAPH WOOD spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-6728, RIGHT PARENTHESIS
U+322E PARENTHESIZED IDEOGRAPH METAL spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-91D1, RIGHT PARENTHESIS
U+322F PARENTHESIZED IDEOGRAPH EARTH spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-571F, RIGHT PARENTHESIS
U+3230 PARENTHESIZED IDEOGRAPH SUN spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-65E5, RIGHT PARENTHESIS
U+3231 PARENTHESIZED IDEOGRAPH STOCK spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-682A, RIGHT PARENTHESIS
U+3232 PARENTHESIZED IDEOGRAPH HAVE spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-6709, RIGHT PARENTHESIS
U+3233 PARENTHESIZED IDEOGRAPH SOCIETY spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-793E, RIGHT PARENTHESIS
U+3234 PARENTHESIZED IDEOGRAPH NAME spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-540D, RIGHT PARENTHESIS
U+3235 PARENTHESIZED IDEOGRAPH SPECIAL spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-7279, RIGHT PARENTHESIS
U+3236 PARENTHESIZED IDEOGRAPH FINANCIAL spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-8CA1, RIGHT PARENTHESIS
U+3237 PARENTHESIZED IDEOGRAPH CONGRATULATION spoofs LEFT PARENTHESIS, CJK
UNIFIED IDEOGRAPH-795D, RIGHT PARENTHESIS
U+3238 PARENTHESIZED IDEOGRAPH LABOR spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-52B4, RIGHT PARENTHESIS
U+3239 PARENTHESIZED IDEOGRAPH REPRESENT spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-4EE3, RIGHT PARENTHESIS
U+323A PARENTHESIZED IDEOGRAPH CALL spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-547C, RIGHT PARENTHESIS
U+323B PARENTHESIZED IDEOGRAPH STUDY spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-5B66, RIGHT PARENTHESIS
U+323C PARENTHESIZED IDEOGRAPH SUPERVISE spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-76E3, RIGHT PARENTHESIS
U+323D PARENTHESIZED IDEOGRAPH ENTERPRISE spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-4F01, RIGHT PARENTHESIS
U+323E PARENTHESIZED IDEOGRAPH RESOURCE spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-8CC7, RIGHT PARENTHESIS
U+323F PARENTHESIZED IDEOGRAPH ALLIANCE spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-5354, RIGHT PARENTHESIS
U+3240 PARENTHESIZED IDEOGRAPH FESTIVAL spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-796D, RIGHT PARENTHESIS
U+3241 PARENTHESIZED IDEOGRAPH REST spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-4F11, RIGHT PARENTHESIS
U+3242 PARENTHESIZED IDEOGRAPH SELF spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-81EA, RIGHT PARENTHESIS
U+3243 PARENTHESIZED IDEOGRAPH REACH spoofs LEFT PARENTHESIS, CJK UNIFIED
IDEOGRAPH-81F3, RIGHT PARENTHESIS
U+FC5E ARABIC LIGATURE SHADDA WITH DAMMATAN ISOLATED FORM spoofs SPACE, ARABIC
DAMMATAN, ARABIC SHADDA
U+FC5F ARABIC LIGATURE SHADDA WITH KASRATAN ISOLATED FORM spoofs SPACE, ARABIC
KASRATAN, ARABIC SHADDA
U+FC60 ARABIC LIGATURE SHADDA WITH FATHA ISOLATED FORM spoofs SPACE, ARABIC
FATHA, ARABIC SHADDA
U+FC61 ARABIC LIGATURE SHADDA WITH DAMMA ISOLATED FORM spoofs SPACE, ARABIC
DAMMA, ARABIC SHADDA
U+FC62 ARABIC LIGATURE SHADDA WITH KASRA ISOLATED FORM spoofs SPACE, ARABIC
KASRA, ARABIC SHADDA
U+FC63 ARABIC LIGATURE SHADDA WITH SUPERSCRIPT ALEF ISOLATED FORM spoofs SPACE,
ARABIC SHADDA, ARABIC LETTER SUPERSCRIPT ALEF
U+FDFA ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM spoofs ARABIC LETTER SAD,
ARABIC LETTER LAM, ARABIC LETTER ALEF MAKSURA, SPACE, ARABIC LETTER ALEF, ARABIC
LETTER LAM, ARABIC LETTER LAM, ARABIC LETTER HEH, SPACE, ARABIC LETTER AIN,
ARABIC LETTER LAM, ARABIC LETTER YEH, ARABIC LETTER HEH, SPACE, ARABIC LETTER
WAW, ARABIC LETTER SEEN, ARABIC LETTER LAM, ARABIC LETTER MEEM
U+FDFB ARABIC LIGATURE JALLAJALALOUHOU spoofs ARABIC LETTER JEEM, ARABIC LETTER
LAM, SPACE, ARABIC LETTER JEEM, ARABIC LETTER LAM, ARABIC LETTER ALEF, ARABIC
LETTER LAM, ARABIC LETTER HEH
U+FE35 PRESENTATION FORM FOR VERTICAL LEFT PARENTHESIS spoofs LEFT PARENTHESIS
U+FE36 PRESENTATION FORM FOR VERTICAL RIGHT PARENTHESIS spoofs RIGHT PARENTHESIS
U+FE55 SMALL COLON spoofs COLON
U+FE59 SMALL LEFT PARENTHESIS spoofs LEFT PARENTHESIS
U+FE5A SMALL RIGHT PARENTHESIS spoofs RIGHT PARENTHESIS
U+FE64 SMALL LESS-THAN SIGN spoofs LESS-THAN SIGN
U+FE65 SMALL GREATER-THAN SIGN spoofs GREATER-THAN SIGN
U+FE68 SMALL REVERSE SOLIDUS spoofs REVERSE SOLIDUS
U+FE6B SMALL COMMERCIAL AT spoofs COMMERCIAL AT
U+FE70 ARABIC FATHATAN ISOLATED FORM spoofs SPACE, ARABIC FATHATAN
U+FE72 ARABIC DAMMATAN ISOLATED FORM spoofs SPACE, ARABIC DAMMATAN
U+FE74 ARABIC KASRATAN ISOLATED FORM spoofs SPACE, ARABIC KASRATAN
U+FE76 ARABIC FATHA ISOLATED FORM spoofs SPACE, ARABIC FATHA
U+FE78 ARABIC DAMMA ISOLATED FORM spoofs SPACE, ARABIC DAMMA
U+FE7A ARABIC KASRA ISOLATED FORM spoofs SPACE, ARABIC KASRA
U+FE7C ARABIC SHADDA ISOLATED FORM spoofs SPACE, ARABIC SHADDA
U+FE7E ARABIC SUKUN ISOLATED FORM spoofs SPACE, ARABIC SUKUN
U+FF08 FULLWIDTH LEFT PARENTHESIS spoofs LEFT PARENTHESIS
U+FF09 FULLWIDTH RIGHT PARENTHESIS spoofs RIGHT PARENTHESIS
U+FF0F FULLWIDTH SOLIDUS spoofs SOLIDUS
U+FF1A FULLWIDTH COLON spoofs COLON
U+FF1C FULLWIDTH LESS-THAN SIGN spoofs LESS-THAN SIGN
U+FF1E FULLWIDTH GREATER-THAN SIGN spoofs GREATER-THAN SIGN
U+FF20 FULLWIDTH COMMERCIAL AT spoofs COMMERCIAL AT
U+FF3B FULLWIDTH LEFT SQUARE BRACKET spoofs LEFT SQUARE BRACKET
U+FF3C FULLWIDTH REVERSE SOLIDUS spoofs REVERSE SOLIDUS
U+FF3D FULLWIDTH RIGHT SQUARE BRACKET spoofs RIGHT SQUARE BRACKET
U+FFE3 FULLWIDTH MACRON spoofs SPACE, COMBINING MACRON
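
For anyone wanting to regenerate entries in the format used above, here is a minimal sketch. It assumes Python's unicodedata module and uses NFKD compatibility decomposition as a stand-in for the full NAMEPREP mapping, so it will not reproduce the list exactly (NAMEPREP also case-folds, for example). The set PROTOCOL_CHARS and the function name spoof_entry are made up for this illustration.

```python
import unicodedata

# Assumed set of "protocol" characters for this sketch; adjust as needed.
PROTOCOL_CHARS = set(" /:@#?;%&=<>()[]\\")

def spoof_entry(cp):
    """Return a 'U+XXXX NAME spoofs A, B, ...' line, or None if cp is not a candidate."""
    ch = chr(cp)
    expanded = unicodedata.normalize("NFKD", ch)
    # Only report characters that expand to something containing a protocol character.
    if expanded == ch or not (set(expanded) & PROTOCOL_CHARS):
        return None
    try:
        name = unicodedata.name(ch)
        parts = [unicodedata.name(c) for c in expanded]
    except ValueError:
        # Unnamed code points (controls, unassigned) are skipped in this sketch.
        return None
    return "U+%04X %s spoofs %s" % (cp, name, ", ".join(parts))

if __name__ == "__main__":
    for cp in (0x00A0, 0x2474, 0xFF0F):
        print(spoof_entry(cp))
```

Looping spoof_entry over range(0x80, 0x10000) would give a raw candidate list, which would still need the kind of hand-editing described above.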
Plus, now \u format escapes only, as per request.
Attachment #196789 - Attachment is obsolete: true
Attachment #197552 - Attachment is obsolete: true
can we blacklist characters < U+0080 in another place, like net_IsValidHostName?
that way, they are caught even when IDN is disabled.
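
To illustrate the idea only (this is not the actual net_IsValidHostName code, which I have not reproduced here), a host-level check for characters below U+0080 might look like the sketch below. The allowed set and the function name has_bad_ascii are assumptions made for the example.

```python
import string

# Assumption for this sketch: below U+0080, only letters, digits, hyphen
# and the dot label separator are acceptable in a host name.
ALLOWED_ASCII = set(string.ascii_letters + string.digits + "-.")

def has_bad_ascii(host):
    """True if host contains a character below U+0080 outside the allowed set."""
    return any(ord(c) < 0x80 and c not in ALLOWED_ASCII for c in host)
```

Because the check only looks at characters below U+0080, it is independent of whether IDN processing is enabled.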
Assignee: nobody → darin
Component: Security → Networking
Product: Firefox → Core
QA Contact: firefox → benc
Version: unspecified → Trunk
OS: Linux → All
Hardware: PC → All
*** Bug 309133 has been marked as a duplicate of this bug. ***
*** Bug 301694 has been marked as a duplicate of this bug. ***
Neil (usenet@tonal):

a) Can you please set a name in your Bugzilla preferences? :-)

b) Can we please have one issue per bug, and one bug per issue? 

I've just spent an hour decoding and rationalising eight bugs on different but
overlapping pieces of the IDN character spoofing blacklist problem; I've
designated this one as the master for the blacklist. 

There seems to be a problem that we are checking for bad characters, and then
doing NAMEPREP, whereas we should be doing it the other way round. Is that
right? If so, the fix is to reverse the tests, not to try and blacklist every
character which namepreps to something we think is bad.

Gerv
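
A small sketch of why the order matters, assuming Python and using NFKC normalization (the normalization form NAMEPREP is built on) as a simplified stand-in for the full NAMEPREP step. The function names and PROTOCOL_CHARS are invented for this example.

```python
import unicodedata

# Assumed set of characters that must never appear in a displayed label.
PROTOCOL_CHARS = set(" /:@#?;%&=<>()")

def check_then_normalize(label):
    """The suspected current order: blacklist check first, then normalize."""
    if any(c in PROTOCOL_CHARS for c in label):
        return None  # rejected
    return unicodedata.normalize("NFKC", label)

def normalize_then_check(label):
    """The proposed order: normalize first, then check the result."""
    prepped = unicodedata.normalize("NFKC", label)
    if any(c in PROTOCOL_CHARS for c in prepped):
        return None  # rejected
    return prepped

if __name__ == "__main__":
    label = "example\uff0fpath"  # contains U+FF0F FULLWIDTH SOLIDUS
    print(repr(check_then_normalize(label)))
    print(repr(normalize_then_check(label)))
```

With the first order, the fullwidth solidus passes the check and only afterwards turns into a real SOLIDUS; with the second order the label is rejected.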
I thought the whole point of NAMEPREP was to map similar looking characters to
the same thing, so people can use the glyphs that look right to them but we
don't have to worry about spoofing.

Checking before NAMEPREP seems wrong. In fact, some of the characters we're
blacklisting in other bugs seem like they should be added to NAMEPREP instead
and mapped to the character they resemble. Does that require changing the standard?
We seem to be slowly but surely separating the protocol-spoof aspect from the
visual-spoof aspect of what seemed at first to be a single problem.

Yes, changing NAMEPREP would require a change to the standard, which would be
hard to achieve without incompatibility, see http://nameprep.org/ for some work
on trying to clarify NAMEPREP.

I agree that checking for "protocol characters" in NAMEPREP output would
_greatly_ reduce the size of the Unicode blacklist, which, as you can see, has
become enormous. It's also the Obvious Right Thing to do.

There's one awkward case, though: some characters normalize to character
sequences containing ASCII space characters _followed by combining characters_.
These _are_ visually safe, since they will appear as a non-space glyph when
composed: however, they may still form a protocol risk to software when passed
to other layers of software or external packages.

I can't at the moment think of an easy way to resolve this, other than blocking
all of these cases, and later revisiting the decision if it blocks any
real-world language constructs. Most of them shouldn't, since they only appear
to be used to render in isolation symbols which would otherwise be used in
combination with other characters -- for example, a grave accent or cedilla on
their own. However, I can't claim to know the usage for all possible languages.
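
The awkward case can be told apart from a plain space mechanically. This is a rough sketch, assuming Python's unicodedata module and NFKD decomposition as an approximation of what NAMEPREP produces; the function name space_kind and its return strings are made up for illustration.

```python
import unicodedata

def space_kind(ch):
    """Classify how ASCII space shows up in the compatibility expansion of ch."""
    expanded = unicodedata.normalize("NFKD", ch)
    if " " not in expanded:
        return "no space"
    for i, c in enumerate(expanded):
        if c == " ":
            nxt = expanded[i + 1 : i + 2]
            # A space counts as "bare" unless a combining mark (category M*) follows it.
            if not nxt or not unicodedata.category(nxt).startswith("M"):
                return "bare space"
    return "space + combining"
```

Under the blocking policy suggested above, both "bare space" and "space + combining" results would be rejected; the distinction only matters if the decision is revisited later.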

An even stronger check than the "protocol-characters" rule would be to treat as
invalid _all_ characters in NAMEPREP output with codes less than 160, apart from
the LDH characters (see the ICANN proposal 2.0 for motivation: they are
"punctuation characters without grammatical meaning") -- with the one exception
of the apostrophe, for use in constructions like "O'Brien", which does not seem
unreasonable to me.  

This has the advantage of working by explicit inclusion, rather than by building
a list of exclusion cases, and would also naturally catch both sets of control
characters and DEL as well. This approach seems to me to be a better principle
for building secure systems.
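
As a sketch of the inclusion-based rule (assuming Python, and assuming the check is applied to a single label after NAMEPREP), something like the following would do. ALLOWED_LOW and label_ok are names invented for this example.

```python
import string

# Below code 160, only LDH (letters, digits, hyphen) plus the apostrophe
# exception are accepted; everything else in that range is rejected.
ALLOWED_LOW = set(string.ascii_letters + string.digits + "-" + "'")

def label_ok(prepped_label):
    """True if every character with code < 160 in the label is explicitly allowed."""
    return all(ord(c) >= 160 or c in ALLOWED_LOW for c in prepped_label)
```

Note that U+00A0 has code 160 exactly, so NO-BREAK SPACE is outside the range covered by this rule and would still have to be caught by the blacklist.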
We've only got a couple of days to get all these blacklisted chars in for beta.
Can we get a patch for this ASAP?
OK, I'm working towards a final list. 

NOTE WELL: This list will assume that bad-character checking for "protocol"
characters is carried out _after_ NAMEPREP has been performed on the label,
and not before: this list will ignore all cases of characters that NAMEPREP to
protocol characters, and will consist only of visual spoofs of protocol
characters that won't be caught by the above, spacing, filling, and line control
characters, some characters that break renderers, and other particularly
egregious characters that definitely should not be present in output strings.


This is the latest program to generate a blacklist: it only consists of a
commented list of characters, and a few lines of Python to compile it into a
Javascript string using \u escapes.
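
The attachment itself is not reproduced in this comment, so the following is only a sketch of the general shape such a generator could take, not the attached program. It assumes Python, a "hex code, then # comment" line format, and BMP-only code points; BLACKLIST_SOURCE and compile_blacklist are hypothetical names.

```python
# A tiny commented list in the assumed format: hex code point, then a comment.
BLACKLIST_SOURCE = """
FF0F  # FULLWIDTH SOLIDUS
0020  # SPACE
00A0  # NO-BREAK SPACE
2044  # FRACTION SLASH
"""

def compile_blacklist(source):
    """Compile the commented list into a sorted string of \\uXXXX escapes."""
    codepoints = set()
    for line in source.splitlines():
        entry = line.split("#", 1)[0].strip()  # drop the comment part
        if entry:
            codepoints.add(int(entry, 16))
    # Each BMP code point becomes exactly six ASCII characters.
    return "".join("\\u%04X" % cp for cp in sorted(codepoints))

if __name__ == "__main__":
    print(compile_blacklist(BLACKLIST_SOURCE))
```

Using a set before sorting means duplicate entries in the commented list collapse to a single escape.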
Attachment #197759 - Attachment is obsolete: true
Attachment #197765 - Attachment is obsolete: true
This is the output of the program above, with all the codepoints named, and a
sorted Javascript \u-escaped string that encodes the list.
This is the latest blacklist string itself, cut from the program output above.
It encodes 88 Unicode characters, in a total of 88*6 = 528 ASCII characters. See
the immediately previous attachments for rationale, and the original output.
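
The counts can be checked mechanically against the string given in this comment. A minimal sketch, assuming Python; the string is copied into a raw literal so that the \u escapes stay as six ASCII characters each, and the variable names are made up here.

```python
import re

# The blacklist string from this comment, kept as literal \uXXXX text.
BLACKLIST = (
    r"\u0020\u00A0\u00BC\u00BD\u01C3\u0337\u0338\u05C3\u05F4\u06D4"
    r"\u0702\u115F\u1160\u2000\u2001\u2002\u2003\u2004\u2005\u2006"
    r"\u2007\u2008\u2009\u200A\u200B\u2024\u2027\u2028\u2029\u202F"
    r"\u2039\u203A\u2044\u205F\u2154\u2155\u2156\u2159\u215A\u215B"
    r"\u215F\u2215\u23AE\u29F6\u29F8\u2AFB\u2AFD\u2FF0\u2FF1\u2FF2"
    r"\u2FF3\u2FF4\u2FF5\u2FF6\u2FF7\u2FF8\u2FF9\u2FFA\u2FFB\u3000"
    r"\u3002\u3014\u3015\u3033\u3164\u321D\u321E\u33AE\u33AF\u33C6"
    r"\u33DF\uFE14\uFE15\uFE3F\uFE5D\uFE5E\uFEFF\uFF0E\uFF0F\uFF61"
    r"\uFFA0\uFFF9\uFFFA\uFFFB\uFFFC\uFFFD\uFFFE\uFFFF"
)

# Pull out every four-digit hex escape and convert it to a code point.
codepoints = [int(h, 16) for h in re.findall(r"\\u([0-9A-Fa-f]{4})", BLACKLIST)]

if __name__ == "__main__":
    print(len(codepoints), "code points,", len(BLACKLIST), "ASCII characters")
    print("sorted and unique:", codepoints == sorted(set(codepoints)))
```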

It aims to be reasonably conservative, but errs on the side of caution with
spacing characters and spoofed ASCII punctuation not caught by NAMEPREP, which I
consider particularly dangerous. It should be OK for 1.5b2. However, please note
that I have not yet tested this properly, since I do not yet have a version of
the code that supports \u escapes, so please test before generating a patch.

Again, please note: this assumes that "bad character" detection for
protocol-spoof characters in labels (essentially, most ASCII punctuation, and
spaces) has already been fixed to act _after_, rather than before, NAMEPREP.

The string is:
"\u0020\u00A0\u00BC\u00BD\u01C3\u0337\u0338\u05C3\u05F4\u06D4\u0702\u115F\u1160\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200A\u200B\u2024\u2027\u2028\u2029\u202F\u2039\u203A\u2044\u205F\u2154\u2155\u2156\u2159\u215A\u215B\u215F\u2215\u23AE\u29F6\u29F8\u2AFB\u2AFD\u2FF0\u2FF1\u2FF2\u2FF3\u2FF4\u2FF5\u2FF6\u2FF7\u2FF8\u2FF9\u2FFA\u2FFB\u3000\u3002\u3014\u3015\u3033\u3164\u321D\u321E\u33AE\u33AF\u33C6\u33DF\uFE14\uFE15\uFE3F\uFE5D\uFE5E\uFEFF\uFF0E\uFF0F\uFF61\uFFA0\uFFF9\uFFFA\uFFFB\uFFFC\uFFFD\uFFFE\uFFFF"



Depends on: 310734
Is it better to put the list in a pref. file than to hard-code it (we have a
rather compact way to hard-code a list of unicode characters)? The advantage of
the former would be that it's easy to ship a hot-fix, but that advantage would
vanish with darin's new update method, wouldn't it? 
We're wrapping up 1.5 beta 2 and something needs to happen here today. What do
we need to do to get this bug resolved today?
Neil: bug 307438 is in, so we have \u support. Can you please roll in anything
we need from bug 304316 (if you have time), make a patch containing the current
list and post it here? It's only a beta, so we don't have to get it _quite_
perfect yet, but we need something. The deadline is 11.59pm PST tonight!

Gerv
After talking with Darin, we're going to push this out to after the beta2 (first
RC).
Flags: blocking1.8b5+ → blocking1.8b5-
(In reply to comment #23)
> Is it better to put the list in a pref. file than to hard-code it (we have a
> rather compact way to hard-code a list of unicode characters)? The advantage of
> the former would be that it's easy to ship a hot-fix, but that advantage would
> vanish with darin's new update method, wouldn't it? 

jshin: that's a good point, but users "might" appreciate being able to twiddle
the contents of a pref file to modify the blacklist.
I don't think we need to worry about user modifiability; even if 1 in a million
people changes the list, no-one is ever going to set up a DNS label using these
chars, because everyone else can't use it.

Putting it in prefs.js probably reduces the size of any update; but with the new
tech that's not all that critical either.

Gerv
Flags: blocking1.8rc1?
Target Milestone: --- → mozilla1.8rc1
What are we waiting on here? Time is quickly running out. We need to see a patch
here in the next day or two if this is going to make 1.5.
It sounds like we've taken all we're gonna take for this for 1.5. If I'm
mistaken, please renominate with an explanation of what's left to do. Thanks.
Flags: blocking1.8rc1?
This is a set of testcases for this patch. It also has the side-effect of
testing some of the lower-level DNS bad character blocking logic.
Attached patch Proposed patch for this bug. (obsolete) — Splinter Review
This implements the blacklist. Applied to 1.5b2 "Deer Park Beta 2" source, and
checked against the testcases in the attachment above. It blocks most of them:
the remaining cases appear to be issues for the lower-level DNS bad character
blocker.

Passes the precheckin tests given at
http://www.mozilla.org/quality/precheckin-tests.html

Please review. Sorry about the delay in producing a patch.
Spotted and fixed some errors in the comments with the previous patch. Changes
should be cosmetic only, but I will re-test anyway.
Attachment #199874 - Attachment is obsolete: true
Re-tested. Passes same testcases as before, and precheckin tests.
Attachment #199875 - Flags: superreview?(dveditz)
Attachment #199875 - Flags: review?(gerv)
Comment on attachment 199875 [details] [diff] [review]
Patch for this bug, version 2.

sr=dveditz
Attachment #199875 - Flags: superreview?(dveditz) → superreview+
Attachment #199875 - Flags: approval1.8rc1?
Comment on attachment 199875 [details] [diff] [review]
Patch for this bug, version 2.

r=gerv on the approach, although I don't have the facilities to test it right
now.

Gerv
Attachment #199875 - Flags: review?(gerv) → review+
Flags: blocking1.8rc1?
Attachment #199875 - Flags: approval1.8rc1? → approval1.8rc1+
fixed-on-trunk
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
fixed1.8
Keywords: fixed1.8
blocking flag cleanup.
Flags: blocking1.8rc1? → blocking1.8rc1+
Blocks: 316730
*** Bug 304316 has been marked as a duplicate of this bug. ***
Group: security
No longer blocks: 1317346