User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2)
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:126.96.36.199) Gecko/20060909 Firefox/188.8.131.52
In case of
Firefox allows users to click the link and displays essentially the following strings in the status and address bar (if it was able to make the request):
On Windows and Mac OS X the requests both go to *.example.com, and in the latter case the request will be malformed as it includes
While Apache rejects such a request, it's not difficult to work around that. I was unable to reproduce the problem on Linux. There are other characters that look like "/" and ":", though only some of them are displayed literally. Opera 9 and IE7 consider both resource identifiers malformed and do not attempt to load them.
Having read http://www.mozilla.org/projects/security/tld-idn-policy-list.html, I should add that I used a .de domain for testing, not example.com.
Created attachment 240404 [details]
Screenshot FF 1.5 on Debian
Using only U+2571 in the URL seems to work fine now on Linux.
Sounds like we need to add these two to our blacklist.
I'm a bit confused about the fullwidth colon (\uFF1A), though. RFC 3490 section 3.1, requirement 1, says we must treat both fullwidth and halfwidth ideographic full stops as label separators, yet both \uFF0E and \uFF61 are in our blacklist. nsIDNService::normalizeFullStops() converts them before the blacklist is applied, so I'm not sure why they're needed in the blacklist.
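For reference, the mapping RFC 3490 section 3.1 asks for is small enough to sketch. The C++ below is a minimal illustration, assuming the host has already been decoded into code points in a std::u32string; NormalizeFullStops() is a hypothetical stand-in for nsIDNService::normalizeFullStops(), not its actual code.

```cpp
#include <string>

// Hypothetical stand-in for nsIDNService::normalizeFullStops(): maps the
// three dot variants listed in RFC 3490, section 3.1 to U+002E FULL STOP.
std::u32string NormalizeFullStops(const std::u32string& host) {
    std::u32string out;
    out.reserve(host.size());
    for (char32_t c : host) {
        switch (c) {
            case U'\u3002':  // IDEOGRAPHIC FULL STOP
            case U'\uFF0E':  // FULLWIDTH FULL STOP
            case U'\uFF61':  // HALFWIDTH IDEOGRAPHIC FULL STOP
                out.push_back(U'.');
                break;
            default:
                out.push_back(c);
                break;
        }
    }
    return out;
}
```

If this runs before the blacklist lookup, blacklist entries for U+FF0E and U+FF61 can never match anything.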
The IDNA spec doesn't mention accepting fullwidth colon as a port delimiter, but it would be somewhat consistent to do so.
\u2571 is a no-brainer to add. \u2573 sorta looks like an 'X', and there are various "plus"-looking characters too.
Does the colon get converted in nameprep? I would have expected the colon to be rejected by net_isValidHostName, since I think we only call that after we've parsed out and removed the port, but we allow colons.
It looks like the whole box-drawing block isn't supposed to be allowed in output (http://www.unicode.org/reports/tr39/#IDN_Security_Profiles). As long as we do allow those characters, though, \u2571 would be good to have in the blacklist as an interim band-aid.
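On the nameprep question: RFC 3491 nameprep applies Unicode NFKC, and NFKC maps the fullwidth forms U+FF01..U+FF5E to ASCII U+0021..U+007E, so U+FF1A does become ':'. The sketch below only illustrates why the order of checks matters. FoldFullwidthAscii() covers just that one range of NFKC, and HostHasNoColon() is a hypothetical check loosely in the spirit of net_isValidHostName, not its real logic.

```cpp
#include <string>

// Tiny subset of the NFKC step in nameprep (RFC 3491): fullwidth forms
// U+FF01..U+FF5E have <wide> decompositions to U+0021..U+007E, so
// U+FF1A FULLWIDTH COLON becomes ASCII ':'. A real implementation would
// use a full Unicode normalizer.
std::u32string FoldFullwidthAscii(const std::u32string& s) {
    std::u32string out = s;
    for (char32_t& c : out) {
        if (c >= 0xFF01 && c <= 0xFF5E) {
            c = static_cast<char32_t>(c - 0xFF01 + 0x21);
        }
    }
    return out;
}

// Hypothetical host check: the port has already been split off, so an
// ASCII ':' should never appear in the host at this point.
bool HostHasNoColon(const std::u32string& host) {
    return host.find(U':') == std::u32string::npos;
}
```

Run on the raw input, the check passes; run on the normalized host, it fails, so a check that happens before nameprep never sees the colon.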
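A minimal sketch of the two options discussed here, assuming hosts are available as code points. The three-entry table is illustrative only (it is not Mozilla's blacklist), and the rejectBoxDrawing flag on HostIsSafe() is a hypothetical switch between the interim single-entry band-aid and refusing the whole Box Drawing block (U+2500..U+257F).

```cpp
#include <algorithm>
#include <iterator>
#include <string>

// Illustrative blacklist, kept sorted for std::binary_search. These three
// slash look-alikes are examples only, not Mozilla's actual list.
constexpr char32_t kBlacklist[] = {
    0x2044,  // FRACTION SLASH
    0x2215,  // DIVISION SLASH
    0x2571,  // BOX DRAWINGS LIGHT DIAGONAL UPPER RIGHT TO LOWER LEFT
};

bool IsBlacklisted(char32_t c) {
    return std::binary_search(std::begin(kBlacklist), std::end(kBlacklist), c);
}

// Box Drawing block, U+2500..U+257F.
bool IsBoxDrawing(char32_t c) { return c >= 0x2500 && c <= 0x257F; }

// rejectBoxDrawing == false: interim band-aid, only listed entries are caught.
// rejectBoxDrawing == true: every Box Drawing character is refused.
bool HostIsSafe(const std::u32string& host, bool rejectBoxDrawing) {
    for (char32_t c : host) {
        if (IsBlacklisted(c)) return false;
        if (rejectBoxDrawing && IsBoxDrawing(c)) return false;
    }
    return true;
}
```

With only the band-aid, \u2573 still gets through, which is the argument for excluding the whole block.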
(In reply to comment #3)
> The IDNA spec doesn't mention accepting fullwidth colon as a port delimiter,
> but it would be somewhat consistent to do so.
Port delimiters are not part of the domain name, they could only be part of the resource identifier; since the URL is parsed for the domain name first, any colon or colon-lookalike character cannot delimit the domain name from the port. I initially included this character precisely to test that Mozilla does not handle it this way.
CCing Neil Harris, who works on our IDN implementation.
That's interesting: as the poster says, this doesn't appear to work on Linux; something peculiar is happening here, since this should work in exactly the same way on all operating systems.
I've got some code lying around in an earlier IDN bug that never got merged, which might be useful for stopping this.
I think the best way to handle this in the short term is to add an extra check to isOnlySafeChars() that blacklists all characters that do not belong either to a script system, or to a very limited set of non-script characters. This will also have the effect of enforcing part of the ICANN rules for labels.
I've got some code lying around that might just do the trick.
(In reply to comment #7)
> That's interesting: as the poster says, this doesn't appear to work on Linux;
> something peculiar is happening here, since this should work in exactly the
> same way on all operating systems.
Since ':' cannot occur in a domain name, it is likely that the DNS client code on Linux simply rejects any hostname containing it; on Windows this is not the case (compare, for example, `ping a:b.example.org` on both systems, where example.org needs to have a wildcard record). The first example should work on all systems.
I think it's time for me to push forward the code I wrote to address bug 316727, which should fix this as well as many other issues.
I've got a patch already made, but it's untested: I'm currently generating sets of test cases for it, to try on 2.0rc1+patch. More soon.
*** This bug has been marked as a duplicate of 316727 ***
Reopening: the fix for bug 316727 has more issues to be tested, and I now have an experimental patch almost ready for this simpler bug; this will also test the waters for the full fix of bug 316727.
Created attachment 241003 [details]
Test cases for the above
These links test the two test cases given by the submitter.
OK, test case 1 is now caught by my experimental patch, which also blocks a huge number of other characters by adopting a whitelisting-by-Unicode-blocks approach, in addition to the existing very specific blacklist.
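To make the whitelisting-by-blocks idea concrete, here is a rough sketch of the lookup, not the code from the patch: a sorted table of allowed block ranges searched with std::upper_bound. The handful of blocks listed is an assumption chosen for illustration; a real table would be much longer, and as described above it would sit alongside the existing blacklist rather than replace it.

```cpp
#include <algorithm>
#include <iterator>
#include <string>

struct BlockRange {
    char32_t first;
    char32_t last;
};

// Hypothetical whitelist of Unicode blocks, sorted by start code point.
constexpr BlockRange kAllowedBlocks[] = {
    {0x0000, 0x007F},  // Basic Latin
    {0x0080, 0x00FF},  // Latin-1 Supplement
    {0x0100, 0x017F},  // Latin Extended-A
    {0x0370, 0x03FF},  // Greek and Coptic
    {0x0400, 0x04FF},  // Cyrillic
    {0x3040, 0x309F},  // Hiragana
    {0x30A0, 0x30FF},  // Katakana
    {0x4E00, 0x9FFF},  // CJK Unified Ideographs
    {0xAC00, 0xD7AF},  // Hangul Syllables
};

bool InAllowedBlock(char32_t c) {
    // Find the first block starting after c, then step back to the block
    // that could contain it.
    auto it = std::upper_bound(
        std::begin(kAllowedBlocks), std::end(kAllowedBlocks), c,
        [](char32_t v, const BlockRange& r) { return v < r.first; });
    if (it == std::begin(kAllowedBlocks)) return false;
    --it;
    return c <= it->last;
}

bool IsOnlyWhitelistedBlocks(const std::u32string& label) {
    return std::all_of(label.begin(), label.end(), InAllowedBlock);
}
```

The appeal of this design is that U+2571 and U+FF1A are rejected without either being named anywhere: they simply fall outside every listed block.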
However, the behaviour in test case 2 is more involved: I think there may be an interaction between the Unicode normalization of the fullwidth colon and the IPv6 code...
I'll try to take a look at the on-the-wire behaviour across multiple operating systems tomorrow.
Created attachment 241008 [details] [diff] [review]
EXPERIMENTAL patch: work-in-progress for script-block whitelisting
This is the experimental code so far, just for reference. Note: this is completely untested, and subject to rapid change.
Created attachment 241123 [details] [diff] [review]
More polished patch to nsIDNService.cpp; not yet smoketested, but works
This patch defangs both of the examples given in this bug on my Linux build, without specifically needing to reference any particular character.
The first becomes
and the second becomes
ASCII domain names and normal mixed-script IDNs still appear to work OK; my set of broken IDNs with bad characters is consistently caught by this too, and it doesn't crash with any of the tests.
NB this patch has not been fully smoketested yet.
The second testcase is an example of "ASCII smuggling" through Unicode normalization in the IDN Nameprep processing (see bug 316444). However, the comment at the end of bug 316444 seems to be contradicted by the second test case here: see the issues regarding URL round-tripping at the end of this comment.
I'm working on some code in bug 355181 that should shut off the possibility of using the ':' character in IDNs, by discriminating between the allowed character sets for RFC 1035 DNS names and dotted quads, and that for RFC 2732 IPv6 literals.
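One way to read "discriminating between the allowed character sets" is sketched below: a colon is accepted only inside an RFC 2732 bracketed literal, while ordinary names and dotted quads are restricted to letters, digits, '-' and '.'. This is a hypothetical illustration operating on the already-ASCII (Punycode) form of the host, not the code in bug 355181, and it checks only the character set, not the structure of the IPv6 address.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical host syntax check on the ASCII/Punycode form of a host.
bool IsAllowedHostSyntax(const std::string& host) {
    // RFC 2732 literal: "[...]" containing hex digits, ':' and '.'
    // (the '.' covers embedded IPv4 forms such as ::ffff:1.2.3.4).
    if (host.size() >= 2 && host.front() == '[' && host.back() == ']') {
        for (std::size_t i = 1; i + 1 < host.size(); ++i) {
            unsigned char c = static_cast<unsigned char>(host[i]);
            if (!(std::isxdigit(c) || c == ':' || c == '.')) return false;
        }
        return host.size() > 2;  // "[]" is not an address
    }
    // DNS name or dotted quad: letters, digits, '-' and '.', never ':'.
    if (host.empty()) return false;
    for (char ch : host) {
        unsigned char c = static_cast<unsigned char>(ch);
        if (!(std::isalnum(c) || c == '-' || c == '.')) return false;
    }
    return true;
}
```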
However, this example also raises an interesting round-tripping issue, which will probably become a new bug. The ASCII-smuggling behaviour of the second example allows an address containing a colon to appear in the location bar; that address is not itself looked up, so this is OK as far as it goes. However, reparsing the very same text that is displayed in the location bar will truncate the new hostname at the colon, and thus end up looking up a quite different domain name.
This is a problem of:
1. relying on Punycoding for obfuscation, when it was not originally intended for that purpose;
2. URL display and URL parsing sometimes not being round-trippable.
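The round-tripping problem can be reproduced with a toy parser. In this sketch, ParseHost() is a hypothetical simplification that treats everything before the first ASCII ':' as the host (IPv6 brackets are ignored), and FoldFullwidthAscii() stands in for the display-side nameprep step, covering only the fullwidth ASCII range of NFKC.

```cpp
#include <string>

// Hypothetical, simplified authority parser: the host is everything
// before the first ASCII ':' (or the whole string if there is none).
std::u32string ParseHost(const std::u32string& authority) {
    return authority.substr(0, authority.find(U':'));
}

// Subset of nameprep's NFKC: fullwidth U+FF01..U+FF5E -> ASCII.
std::u32string FoldFullwidthAscii(const std::u32string& s) {
    std::u32string out = s;
    for (char32_t& c : out) {
        if (c >= 0xFF01 && c <= 0xFF5E) {
            c = static_cast<char32_t>(c - 0xFF01 + 0x21);
        }
    }
    return out;
}

// The property the location bar should have: parsing the text it shows
// yields the same host that was shown.
bool RoundTrips(const std::u32string& shownHost) {
    return ParseHost(shownHost) == shownHost;
}
```

Parsing the typed text keeps the fullwidth colon inside the host. After folding for display, the shown text parses back to just www.example.com, so RoundTrips() returns false.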
Created attachment 241351 [details]
Extended set of testcases for the above...
Now with duplicate examples added that use the fake ".idntest" TLD, which I flag as IDN-compatible in my local installation.
Time is a little tight and this hasn't been tested on the trunk yet, so I'm a little worried about it... Moving the nomination request to the next release for now; we can always request approval for trunk-landed patches if that happens in time.
Is this patch ready for reviews now?
Neil: are you still working on this?
This bug probably needs to be updated to "critical" or "blocker"; given the recent very public reports of experimental exploitation of this spoofing technique, we are almost certain to see it in the wild very soon.
(see page 2 of the report for the use of homographs in the attack)
I'm impressed by the use of a wildcard something.cn certificate: that's clever.
Unfortunately, I haven't got the time or resources to test my patch properly at the moment, but I believe the patch is reasonably OK, if someone else wants to QA it.
I've also got an experimental patch filed for bug 316727 that enforces even more paranoid checking, preventing not only the use of unassigned characters but also the mixing of scripts except in certain explicitly allowed combinations, as per ICANN guidance on IDNs. In the long run, that patch is probably superior to this one, but in the short run it would need more QA and would be riskier to apply.
We definitely need a lot more discussion before banning script mixing. Let's just make sure the current character blacklist is solid.
See also bug 479336 for a quick-n-dirty blacklist update, and bug 479520 about looking into the new proposed IDNA2008 standards.