Closed
Bug 301694
Opened 18 years ago
Closed 18 years ago
Create IDN blacklist that include 'DIVISION SLASH'(U+2215) and 'FRACTION SLASH'(U+2044)
Categories
(Core :: Networking, defect, P1)
Core
Networking
Tracking
()
RESOLVED
DUPLICATE
of bug 309311
mozilla1.8beta4
People
(Reporter: masayuki, Assigned: masayuki)
References
Details
Attachments
(2 files, 6 obsolete files)
3.18 KB,
text/html;charset=UTF-8
|
Details | |
1.34 KB,
patch
|
Details | Diff | Splinter Review |
This is remain work of bug 283016.
Assignee | ||
Updated•18 years ago
|
Status: NEW → ASSIGNED
Flags: blocking1.8b4?
Priority: -- → P1
Target Milestone: --- → mozilla1.8beta4
Assignee | ||
Comment 1•18 years ago
|
||
Attachment #190114 -
Flags: superreview?(darin)
Attachment #190114 -
Flags: review+
Attachment #190114 -
Flags: approval1.8b4?
Comment 2•18 years ago
|
||
Suggested comment: // If a domain includes any of the following characters, it may be a spoof // attempt and so we always display the domain name as punycode. This would // override the settings "network.IDN_show_punycode" and // "netword.IDN.whitelist.*". // The list currently contains the characters: // U+2215 DIVISION SLASH // U+2044 FRACTION SLASH We should list the chars as not all editors can display them. Gerv
Comment 3•18 years ago
|
||
Comment on attachment 190114 [details] [diff] [review] Patch rv1.0 Please do not request approval until reviews are completed.
Attachment #190114 -
Flags: approval1.8b4?
Assignee | ||
Comment 4•18 years ago
|
||
Assignee | ||
Updated•18 years ago
|
Attachment #190114 -
Attachment is obsolete: true
Attachment #190137 -
Flags: superreview?(darin)
Attachment #190137 -
Flags: review+
Assignee | ||
Updated•18 years ago
|
Attachment #190114 -
Flags: superreview?(darin)
Attachment #190114 -
Flags: review-
Attachment #190114 -
Flags: review+
Comment 5•18 years ago
|
||
'DIVISION SLASH' (U+2215) and 'FRACTION SLASH' (U+2044) are examples of unicode Pattern_Syntax characters. In addition to these two, you should probably consider all of the other Pattern_Syntax characters as illegal. There are a total of 2006 of them (unfortunately). I've uploaded a complete list, extracted from http://www.unicode.org/reports/tr36/draft/idn-chars.txt, and expressed as a C array.
Comment 6•18 years ago
|
||
Tony: we are marking these two particularly as illegal because they can be used to spoof URL separators. The IAB (Internet Architecture Board), the parent of the IETF, has put together a working group to consider restricting the characters permissible in IDN. We don't want to prejudice that work, so we are keeping our blacklist limited to the more seriously problematic characters. Gerv
Updated•18 years ago
|
Attachment #190137 -
Flags: superreview?(darin) → superreview+
Assignee | ||
Updated•18 years ago
|
Attachment #190137 -
Flags: approval1.8b4?
Updated•18 years ago
|
Attachment #190137 -
Flags: approval1.8b4? → approval1.8b4+
Assignee | ||
Comment 7•18 years ago
|
||
checked-in
Status: ASSIGNED → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
Updated•18 years ago
|
Flags: blocking1.8b4?
Comment 8•18 years ago
|
||
Regarding the full list of 2006 characters: playing around with various transformations and compressions of the list gives an apparent limiting complexity of about 300 to 500 bytes for compressing the list (roughly 1.2 bits per character). I can get it down to about 500 - 750 bytes with some bitmap and range-table compression (15-way switch statement for 8 MSBs, with bitmap lookups for the LSBs, or none if entire MSB range is empty) but just doing a simple C switch statement generates code that's about 1280 bytes on an x86 with gcc and optimization, and this form of the code is much more readable and maintainable and does not need a hand-hacked "pre-compiler" to generate the bitmaps etc. At the moment, it does not look like the super-efficient implementation is needed, unless there's a desperate need to save 500 bytes of code. Challenge: can anyone do better than 750 bytes for the code+data for this? If you need the super-efficient implementation for the full set of characters, drop me an E-mail.
Comment 9•18 years ago
|
||
can nsIUGenCategory be used?
Assignee | ||
Comment 10•18 years ago
|
||
See bug 286535.
Comment 11•18 years ago
|
||
*** Bug 286535 has been marked as a duplicate of this bug. ***
Comment 12•18 years ago
|
||
The following are also possible Unicode homographs for (one or more) slashes, and should be added to this blocklist, in addition to the existing U+2215 and U+2044: Combining long solidus overlay U+0338 ̸ <br> Fullwidth solidus U+FF0F / <br> Big solidus U+29F8 ⧸<br> Solidus with overbar U+29F6 ⧶<br> Triple solidus binary relation U+2AFB ⫻<br> Double solidus operator U+2AFD ⫽<br> Combining short solidus overlay U+0337 ̷<br> Integral extension U+23AE ⎮<br> Note: some of these are from math fonts, and some may be visible depending on combining-character rendering. The "integral extension" character, in particular, may be rendered either as a vertical bar or a slash-like sloping line, depending on font. None of these (in my opinion) are reasonable characters for use in identifiers.
Comment 13•18 years ago
|
||
This Python program generates the expanded blacklist as a standard C hex-escaped UTF-8 string from a human-readable listing of the offending characters.
Comment 14•18 years ago
|
||
This adds the new proposed characters, and uses Javascript \u escape format, which makes for a more self-documenting blacklist string. See http://developer.mozilla.org/en/docs/Core_JavaScript_1.5_Guide:Unicode for documentation of this string escape format.
Comment 15•18 years ago
|
||
Comment on attachment 193309 [details] [diff] [review] Created a patch to add new blacklist characters, with more self-documenting string format Great - I like the new format. r=gerv. Gerv
Attachment #193309 -
Flags: superreview?(darin)
Attachment #193309 -
Flags: review+
Comment 16•18 years ago
|
||
Reopening; but if there are any further patches, let's do them in new bugs. Gerv
Updated•18 years ago
|
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 17•18 years ago
|
||
Another thought: can we turn that pref into a regexp? That way, we could include character ranges if that ever became necessary. Gerv
Comment 18•18 years ago
|
||
gerv: not easily, no... the networking engine is rather far removed from anything that does regexp processing
Comment 19•18 years ago
|
||
Where in the code is this check, that is expanded here, happening?
Comment 20•18 years ago
|
||
Comment on attachment 193309 [details] [diff] [review] Created a patch to add new blacklist characters, with more self-documenting string format sr=darin
Attachment #193309 -
Flags: superreview?(darin) → superreview+
Updated•18 years ago
|
Attachment #193309 -
Flags: approval1.8b4?
Comment 21•18 years ago
|
||
andreas: see the patch attached to bug 283016. Gerv
Comment 22•18 years ago
|
||
that's in IDN code, and the only thing it does is show the host as punycode instead of the unicode form, afaict. therefore, I think bug 304904 should be fixed independently of this blacklist.
Comment 23•18 years ago
|
||
There is a check with isOnlySafeChars, but I suspect it will only get triggered if the IDNService is used, which does not happen with ascii-only hostsnames I think. Both checks are necessary, maybe it is possible to chare the ascii portion of the blacklist.
Updated•18 years ago
|
Attachment #193309 -
Flags: approval1.8b4? → approval1.8b4+
Comment 24•18 years ago
|
||
Fixed on trunk and branch. Any further updates in new bugs, please. Checking in ./modules/libpref/src/init/all.js; /cvsroot/mozilla/modules/libpref/src/init/all.js,v <-- all.js new revision: 3.591; previous revision: 3.590 done --- Checking in ./modules/libpref/src/init/all.js; /cvsroot/mozilla/modules/libpref/src/init/all.js,v <-- all.js new revision: 3.585.2.5; previous revision: 3.585.2.4 done Gerv
Status: REOPENED → RESOLVED
Closed: 18 years ago → 18 years ago
Resolution: --- → FIXED
Comment 25•18 years ago
|
||
guys, why does this work? I'm told this causes an NS_WARNING in prefread.cpp, due to lack of support for \u. http://lxr.mozilla.org/seamonkey/source/modules/libpref/src/prefread.cpp#388 This file is not parsed by a JS interpreter, but by a specialized parser. Did anyone test this patch? reopening because of that
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Updated•18 years ago
|
Flags: blocking1.8b5?
Assignee | ||
Comment 26•18 years ago
|
||
Please re-create or back out the latest patch. The patch makes infinite loop on the test case. See http://bugbug.cocolog-nifty.xn--cominfo-ef0d.nekodama.com/icons/nekodama64b.gif usenet: You *MUST* test before attaching a patch.
Comment 27•18 years ago
|
||
Gah... i'm also seeing the NS_WARNING in my trunk build. I figured the code had been tested when I reviewed this (and forgot that I never coded support for \u-escape sequences in prefread.cpp) :(
Comment 28•18 years ago
|
||
I backed this patch out on the trunk.
Comment 29•18 years ago
|
||
I filed bug 307438 on adding support for \u escape sequences to prefread.cpp
Updated•18 years ago
|
Attachment #193309 -
Flags: superreview+ → superreview-
Comment 30•18 years ago
|
||
Darin, is this critical to get backed out for Firefox 1.5 beta 1?
Comment 31•18 years ago
|
||
Asa: Hmm.. yes, I think so. I would have backed it out had I realized that it was commited on the 1.8 branch. Apparently, no one added the fixed1.8 keyword! :(
Comment 32•18 years ago
|
||
OK, I backed out the patch from the 1.8 branch as well.
Comment 33•18 years ago
|
||
backout from 1.8 branch was with a=asa ("per the results of our driver discussion" according to asa).
Comment 34•18 years ago
|
||
Aaargh. Mea culpa. My apologies: I should have tested, but mistakenly assumed this was trivial. I won't make that mistake again. Some punycoded test cases: u'www.pyth\u0337on.org' -> www.xn--python-e8d.org u'www.pyth\u0338on.org' -> www.xn--python-l8d.org u'www.pyth\u2044on.org' -> www.xn--python-sq0c.org u'www.pyth\u2215on.org' -> www.xn--python-se3c.org u'www.pyth\u23aeon.org' -> www.xn--python-lq5c.org u'www.pyth\u29f6on.org' -> www.xn--python-6w4d.org u'www.pyth\u29f8on.org' -> www.xn--python-lx4d.org u'www.pyth\u2afbon.org' -> www.xn--python-ef6d.org u'www.pyth\u2afdon.org' -> www.xn--python-sf6d.org u'www.pyth\uff0fon.org' -> www.pyth/on.org UTF-8 versions in an attachment coming soon. I'll let you know when I've tested them.
Comment 35•18 years ago
|
||
Here are some test cases. Although some are caught by the nameprep logic, others are not, and although some are more convincing than others (depending on your choice of font), some of them are quite believable spoofs, even one really devious one where the slash is made up of a Canadian syllabics character and a solidus overlay.
Comment 36•18 years ago
|
||
Also, note that URLs with nbsps in the domain name are displayed as clickable links: they probably shouldn't be, since nameprep will convert them to ASCII spaces, are not valid either in a conventional domain label, or in a Punycoded IDN label (since Punycoding passes them straight through). This should probably also apply to any other URL where the nameprep'd version of the FQDN has an ASCII space in it.
Updated•18 years ago
|
Attachment #195247 -
Attachment mime type: text/html → text/html;charset=UTF-8
Comment 37•18 years ago
|
||
My sincere apologies for not testing this - as the guy who checked it in, the buck stops here. And I forgot the fixed1.8 keyword as well :-( usenet@tonal.clara.co.uk: any chance of configuring your real name in Bugzilla, so we don't have to call you "usenet"? :-) Gerv
Comment 38•18 years ago
|
||
usenet@tonal.clara.co.uk: are you able to work on bug 307438 and then this one again? Gerv
Comment 39•18 years ago
|
||
Alternatively, we could do the new list using the current encoding method. Given that we are locked down for 1.5, this may be a better plan. Anyone got a Unicode-compatible editor? :-) Gerv
Comment 40•18 years ago
|
||
This is an (as yet untested) patch for this bug, just encoding the characters as UTF-8. This was generated by a Python program, as some of these characters cause some text editors (even emacs) to do funny things. Please don't commit this until I've tested it.
Attachment #190137 -
Attachment is obsolete: true
Updated•18 years ago
|
Flags: blocking1.8b5? → blocking1.8b5+
Comment 41•18 years ago
|
||
Note: this patch is not a proper CVS patch, since I can't currently make my build environment work. However, when manually applied to all.js in an installation of 1.5b1, it: * passes the pre-checkin test * blocks the display of all but one of the test cases, which I believe is a separate issue from that addressed in this bug, also related to IDN normalization, and will file as a new bug Can someone with better Mozilla-fu than me confirm this, please, and generate a proper patch prior to reviewing it?
Attachment #190146 -
Attachment is obsolete: true
Attachment #193305 -
Attachment is obsolete: true
Attachment #193309 -
Attachment is obsolete: true
Attachment #196171 -
Attachment is obsolete: true
Comment 42•18 years ago
|
||
*** Bug 308800 has been marked as a duplicate of this bug. ***
Comment 43•18 years ago
|
||
Gerv, can you drive this one? Looks like we need a proper patch and some reviews.
Comment 44•18 years ago
|
||
See also * Bug 309128 * Bug 309133 and bug 301694 as well, or there will some difficulty in adding these characters in a maintainable way. Perhaps a half-way house would be representing trhe codepoints in the blocklist in hex, and having a special-case decoder that only decodes bare hex number separated by spaces just for this purpose?
Comment 45•18 years ago
|
||
Is it that much more complex to just fix it properly by decoding \u and \x sequences? Gerv
Comment 46•18 years ago
|
||
(In reply to comment #45) > Is it that much more complex to just fix it properly by decoding \u and \x > sequences? That's bug 307438. Not going to decode \x, pref files are supposed to be utf8 and \x is just going to be misused.
Depends on: 307438
Comment 47•18 years ago
|
||
Bug 309311 has the canonical list of chars we are going to ban, including all of those from this bug. We are going to use that set of chars as soon as bug 307438, which allows us to insert them into prefs.js in a sensible way, is fixed. Gerv *** This bug has been marked as a duplicate of 309311 ***
Status: REOPENED → RESOLVED
Closed: 18 years ago → 18 years ago
Resolution: --- → DUPLICATE
Updated•18 years ago
|
Flags: blocking1.8b5+
You need to log in
before you can comment on or make changes to this bug.
Description
•