The IDN code contains a blacklist of characters that are confusingly similar to others; any hostname containing one of them is automatically displayed as punycode. The list is stored as the pref "network.IDN.blacklist_chars". Apparently we're missing the box-drawing characters, and a recent talk at Black Hat DC appears to have taken advantage of that to spoof an SSL connection to a bank: https://www.blackhat.com/presentations/bh-dc-09/Marlinspike/BlackHat-DC-09-Marlinspike-Defeating-SSL.pdf Since it's just a screenshot I can't be sure, but I can reproduce the results with those characters.
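To make the mechanism concrete, here is a minimal Python sketch of the behaviour described above. It is not the actual IDN code: `display_host` and `to_ace` are hypothetical names, the blacklist is a small illustrative subset built only from characters mentioned in this bug, and the nameprep step that real IDNA processing applies before encoding is skipped.

```python
# Illustrative subset only, taken from characters named in this bug;
# the real pref value is a much longer string.
BLACKLIST_CHARS = frozenset(
    "\u00bc\u00bd"  # VULGAR FRACTION ONE QUARTER / ONE HALF
    "\u05c3"        # HEBREW PUNCTUATION SOF PASUQ
    "\u0702"        # SYRIAC SUBLINEAR FULL STOP
    "\u2571"        # box-drawing diagonal, the proposed addition
)


def to_ace(label):
    """Encode one label as punycode with the xn-- prefix; ASCII labels pass through."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def display_host(host):
    """Hypothetical helper: show Unicode unless a blacklisted character is present."""
    if any(ch in BLACKLIST_CHARS for ch in host):
        return ".".join(to_ace(label) for label in host.split("."))
    return host
```

With U+2571 in the set, a host such as "www.bank.com\u2571login.example.cn" is shown with an xn-- label instead of something that reads like a path under bank.com, while an ordinary IDN without blacklisted characters is left in Unicode.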
Specifically I meant
2571 BOX DRAWINGS LIGHT DIAGONAL UPPER RIGHT TO LOWER LEFT

There are other potential omissions from the list, characters that look similar to other characters we've already included. Should maybe consider
066A ARABIC PERCENT SIGN (and maybe 0609 and 060A depending on font)
2052 COMMERCIAL MINUS SIGN
Both are percent signs with tiny dots, which on my system/font made a pretty good slash in the URL field.
2041 CARET INSERTION POINT (looks like a slash with a smudge on the bottom)

If we've included 05C3 HEBREW PUNCTUATION SOF PASUQ because it looks like a colon, shouldn't we also have
02D0 MODIFIER LETTER TRIANGULAR COLON
0589 ARMENIAN FULL STOP
2236 RATIO
A789 MODIFIER LETTER COLON

We've got 0702 SYRIAC SUBLINEAR FULL STOP. We might want to look into
0701 SYRIAC SUPRALINEAR FULL STOP (I doubt this one)
0703 SYRIAC SUPRALINEAR COLON
0704 SYRIAC SUBLINEAR COLON

We're inconsistent about fractions. We block xBC and xBD but not xBE, and in the range 2153-215F, which are all fractions, we reject seven and allow six.

One that doesn't work in my font but shows up in the Unicode charts as a slash-like char is
1735 PHILIPPINE SINGLE PUNCTUATION

In any case, we need to add 2571.
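A rough way to audit a blacklist string against the candidates above, sketched in Python. `missing_from` is a hypothetical helper, and the `blacklist` argument stands in for the pref value, which is not reproduced here; the code points are the ones listed in this comment.

```python
import unicodedata

# Candidate code points from the comment above.
CANDIDATES = [
    0x2571,                  # slash-like box-drawing diagonal
    0x066A, 0x0609, 0x060A,  # Arabic percent / per mille / per ten thousand
    0x2052, 0x2041,          # commercial minus, caret insertion point
    0x02D0, 0x0589, 0x2236, 0xA789,  # colon lookalikes
    0x0701, 0x0703, 0x0704,  # Syriac punctuation
    0x1735,                  # Philippine single punctuation
]

# xBC-xBE plus the 2153-215F block, for checking fraction consistency.
FRACTIONS = list(range(0x2153, 0x2160))
LATIN1_FRACTIONS = [0xBC, 0xBD, 0xBE]


def missing_from(blacklist, code_points):
    """Return 'U+XXXX NAME' for each code point not present in the blacklist string."""
    return [
        f"U+{cp:04X} {unicodedata.name(chr(cp), '<unnamed>')}"
        for cp in code_points
        if chr(cp) not in blacklist
    ]
```

Calling `missing_from(pref_value, CANDIDATES + LATIN1_FRACTIONS + FRACTIONS)` with the current pref string would list what still needs adding, including whichever fractions are currently allowed.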
Yep, chuck them all in. If we find a letter that has this problem, we may have to stop and ask more questions, but if it's punctuation, I have no problem adding it. Gerv
Many of those punctuation characters probably would have been coalesced under NAMEPREP had they existed in earlier Unicode versions. Looks like the IETF is in the process of standardizing "IDNA2008" to replace the earlier IDNA2003 (RFC 3490, 3491, 3492) we're following. I filed bug 479520 on looking into that for a future version; for now we should just blacklist these characters. I didn't find any characters scarier than what Moxie Marlinspike revealed at BlackHat, so I'm unhiding this bug to reduce duplicate filings.
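For context on "coalesced": NAMEPREP (RFC 3491) includes NFKC normalization against Unicode 3.2, which folds compatibility characters into their plainer equivalents. The Python sketch below shows only that normalization step, not the full NAMEPREP profile with its mapping and prohibition tables, and `nfkc_3_2` is a hypothetical helper name.

```python
import unicodedata


def nfkc_3_2(text):
    """Apply NFKC using Python's frozen Unicode 3.2 database (the version NAMEPREP is based on)."""
    return unicodedata.ucd_3_2_0.normalize("NFKC", text)


# A compatibility character is folded into its decomposition:
# U+00BD VULGAR FRACTION ONE HALF becomes "1", U+2044 FRACTION SLASH, "2".
folded = nfkc_3_2("\u00bd")

# A character with no compatibility decomposition passes through unchanged,
# so normalization alone does nothing about U+2571.
unchanged = nfkc_3_2("\u2571")
```

Characters that normalization leaves untouched are the ones that have to be handled by the blacklist instead.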
Created attachment 363436: patch - v1
Comment on attachment 363436 (patch - v1): r=dveditz. This much looks good.
Comment on attachment 363436 (patch - v1): approved for 126.96.36.199 and 188.8.131.52, a=dveditz for release-drivers
Fix checked into the 1.8 branch
Fix checked into the 1.9.0 branch
Verified that fix is checked in for 1.9.0.