Closed Bug 1364283 (CVE-2017-7764) Opened 4 years ago Closed 4 years ago
Security: disallow "Canadian Syllabics" unicode block from IDN domains
User Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.96 Safari/537.36

Steps to reproduce:

VULNERABILITY DETAILS
Firefox should prevent the “Canadian Syllabics” unicode block from rendering in domain names with characters from other unicode blocks. This was observed in data found in the Certificate Transparency log while seeking to quantify the IDN impersonation/phishing problem (raw data attached).

REPRODUCTION CASE
There are a series of characters in the “CANADIAN SYLLABICS” unicode block which can be used to impersonate other domains. I believe mixing this block with other unicode blocks should be disallowed and the punycode value should be displayed.

The characters within this set that I believe could be abused:
http://www.fileformat.info/info/unicode/block/unified_canadian_aboriginal_syllabics/list.htm

(I do not know the registration status of any of the domains below)
http://xn--youtue-084a.com/ -- youtuᖯe.com -- example domain
http://xn--youtbe-z72a.com/ -- youtᑌbe.com -- example domain
http://xn--uny-8wq.com/ -- ᑭuny.com -- example domain
http://xn--oor-hxq.com -- ᑯoor.com -- example domain
http://xn--ego-73q.com/ -- ᒪego.com -- example domain
http://xn--fc-lym.com/ -- fcᒿ.com -- example domain is not fc2.com (alexa top 1m #97) -- this is likely the hardest to see (based on the fonts I’m using)
http://xn--ulu-7sr.com/ -- ᕼulu.com -- example domain
http://invalid.xn--acebook-yp9a.com/ -- ᖴacebook.com -- example domain

Tested with Firefox Nightly 55.0a1 (2017-05-11) (32-bit)
This issue has also been reported to chromium

Actual results:
unicode domains are displayed when mixing the “CANADIAN SYLLABICS” unicode block with other unicode characters.

Expected results:
Punycode values should have been displayed when mixing the “CANADIAN SYLLABICS” unicode block with other unicode characters.
---- background ----
(please excuse the length of this report)
To form the attached lists, I cross-referenced the Google CT Pilot log and the Alexa top 1 million domains (only .com domains). There are a fair number of false positives (non-abusive domain impersonations or python unidecode failures), but I chose not to remove them manually.

---- Other unicode characters observed ----
ĸ, 22, 0x138, "LATIN SMALL LETTER KRA"
96074858, 1509667199, xn--faceboo-jhb.com, facebooĸ.com, ĸ, facebook.com, 3, 1
86142753, 1507679999, xn--autodes-jhb.com, autodesĸ.com, ĸ, autodesk.com, 697, 1

ł, 5, 0x142, "LATIN SMALL LETTER L WITH STROKE"
94011919, 1524055021, xn--ppe-8ka60c.com, àppłe.com, àł, apple.com, 69, 1
94724468, 1500291180, xn--sack-01a.com, słack.com, ł, slack.com, 205, 1

ı, 100, 0x131, "LATIN SMALL LETTER DOTLESS I"
18331655, 1488327078, xn--reddt-q4a.com, reddıt.com, ı, reddit.com, 7, 1
95900673, 1500493680, xn--t-fka.com, tı.com, ı, ti.com, 3235, 1
84518766, 1497998760, xn--gml-kua34j.com, gmȧıl.com, ȧı, gmail.com, 22463, 1
95900424, 1500493860, xn--fat-jua.com, fıat.com, ı, fiat.com, 54102, 1
94504694, 1509148799, xn--curacao-egamng-hgc.com, curacao-egamıng.com, ı, curacao-egaming.com, 524456, 1
94724500, 1500493920, xn--suzu-kza.com, ısuzu.com, ı, isuzu.com, 866480, 1

ì, 25, 0xec, "LATIN SMALL LETTER I WITH GRAVE"
95900680, 1500670920, xn--twttr-7raz.com, twìttèr.com, ìè, twitter.com, 11, 1
85019386, 1507161599, xn--polonex-3ya.com, polonìex.com, ì, poloniex.com, 1595, 1
83724035, 1497798600, xn--gma-pma40b.com, gmaìĺ.com, ìĺ, gmail.com, 22463, 1

---- Special case observed ----
Two interesting domains were observed that bypass Chromium's checks by using only Cyrillic characters:
07022746, 1443571199, xn--80aac5cct.com, таобао.com, таобао, taobao.com, 10, 1
10303999, 1461542399, xn--e1anr4f.com, тіме.com, тіме, time.com, 817, 1
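The expected behavior in the report (show punycode whenever Canadian Syllabics characters are mixed with anything else) can be sketched roughly as follows. This is only an illustration of the reporter's block-based proposal, not Firefox code: the function names `decode_label` and `display_label` are hypothetical, and the block range U+1400..U+167F is taken from the Unicode block definition.

```python
# Hypothetical sketch of the reporter's proposal; not how Firefox implements it.
UCAS_FIRST, UCAS_LAST = 0x1400, 0x167F  # Unified Canadian Aboriginal Syllabics block


def decode_label(label: str) -> str:
    """Decode one DNS label; labels starting with 'xn--' carry a Punycode payload."""
    if label.lower().startswith("xn--"):
        return label[4:].encode("ascii").decode("punycode")
    return label


def in_ucas(ch: str) -> bool:
    """True if the character lies in the Canadian Syllabics block."""
    return UCAS_FIRST <= ord(ch) <= UCAS_LAST


def display_label(label: str) -> str:
    """Return the Unicode form, or the punycode form if UCAS is mixed with other characters."""
    decoded = decode_label(label)
    flags = [in_ucas(ch) for ch in decoded]
    if any(flags) and not all(flags):
        # Mixed label: fall back to the ASCII-compatible (punycode) form.
        return "xn--" + decoded.encode("punycode").decode("ascii")
    return decoded


if __name__ == "__main__":
    for label in ("xn--uny-8wq", "youtube"):
        print(label, "->", display_label(label))
```

Note that this literal block check treats a hyphen or digit next to a syllabic as "mixing" too; a real implementation would work on Unicode scripts rather than blocks.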
Group: firefox-core-security → network-core-security
Component: Untriaged → Networking
Product: Firefox → Core
According to http://unicode.org/repository/draft/trunk/Public/security/10.0.0/IdentifierStatus.txt the Canadian Syllabics block (U+1400 -> U+167F) is not allowed in identifiers, and therefore not in IDNA2008. So why are we accepting them? :jfkthame is the expert here... Gerv
sec-high might be overstating it, but the first youtube.com one is convincing with the font on my mac.
(In reply to Gervase Markham [:gerv] from comment #1)
> According to
> http://unicode.org/repository/draft/trunk/Public/security/10.0.0/IdentifierStatus.txt
> the Canadian Syllabics block (U+1400 -> U+167F) is not allowed in
> identifiers, and therefore not in IDNA2008. So why are we accepting them?

By default, we have network.IDN.restriction_profile set to "moderate". At the Moderately Restrictive level, Canadian Syllabics characters are accepted as part of the identifier profile thanks to their status as an "Aspirational" script, and the Moderately Restrictive profile allows a mixture of "Latin with other Recommended or Aspirational scripts except Cyrillic and Greek".

So it looks like our behavior is correct according to the specifications we're implementing; but I agree that some of these examples look problematic.

Two possible fixes:

(a) Change the default setting for network.IDN.restriction_profile from "moderate" to "high". This will force punycode rendering for mixed Latin+UCAS labels, since this combination is not one of the handful of mixtures that the Highly Restrictive profile allows.

(b) Exclude UCAS from the scripts that are allowed to be mixed with Latin at the Moderate level, like Cyrillic and Greek. To do this, we'd need to adjust the lookup tables used by nsIDNService::illegalScriptCombo in netwerk/dns/nsIDNService.cpp. However, given that what we're implementing here is a Unicode standard, we should also suggest a change to UTS#39 rather than simply deviate from the standard.

I'd suggest that we do (a) immediately (recognizing that it will potentially impact other mixed-script labels that are currently accepted and displayed as Unicode IDNs, but the number of real-world examples likely to be affected should be small).

To consider (b), I'll raise the status of Latin+UCAS at the Moderately Restrictive level for discussion in the Unicode Technical Committee. If the UTC agrees to make a change to Moderate, we could then consider reverting our default level.

http://www.unicode.org/reports/tr39/#Restriction_Level_Detection
http://www.unicode.org/reports/tr31/#Aspirational_Use_Scripts
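To make the "Latin plus one other script, except Cyrillic and Greek" rule concrete, here is a minimal sketch of a Moderately Restrictive style check. It is an assumption-laden illustration, not the logic of nsIDNService::illegalScriptCombo: the `SCRIPT_RANGES` table is a tiny hand-written stand-in for the real Unicode Script property, the function names are hypothetical, and the additional CJK mixtures that the UTS#39 profiles allow are left out.

```python
# Rough code point ranges standing in for the Unicode Script property.
# This table is an assumption for illustration; real code uses full Unicode data.
SCRIPT_RANGES = [
    (0x0041, 0x005A, "Latin"),
    (0x0061, 0x007A, "Latin"),
    (0x0370, 0x03FF, "Greek"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x1400, 0x167F, "Canadian_Aboriginal"),
]


def scripts_in(label: str) -> set:
    """Collect the scripts used in a label; unlisted characters are ignored."""
    found = set()
    for ch in label:
        cp = ord(ch)
        for lo, hi, name in SCRIPT_RANGES:
            if lo <= cp <= hi:
                found.add(name)
                break
    return found


def allowed_moderate(label: str, excluded=frozenset({"Cyrillic", "Greek"})) -> bool:
    """Single-script labels pass; Latin may mix with one non-excluded script."""
    scripts = scripts_in(label)
    if len(scripts) <= 1:
        return True
    others = scripts - {"Latin"}
    return "Latin" in scripts and len(others) == 1 and not (others & excluded)


if __name__ == "__main__":
    mixed = "\u146duny"  # Canadian Syllabics character followed by Latin "uny"
    print(allowed_moderate(mixed))
    # Option (b): add UCAS to the excluded set.
    print(allowed_moderate(mixed, frozenset({"Cyrillic", "Greek", "Canadian_Aboriginal"})))
```

Passing a larger `excluded` set models option (b). Option (a), switching to the Highly Restrictive profile, is not modeled here. The sketch also shows why the all-Cyrillic examples from comment 0 pass: a single-script label is never a mixture.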
Assignee: nobody → jfkthame
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Hmm. What's the timeline for a Unicode Technical Committee discussion? Given it's taken so long for someone to notice this, I would be surprised to see an epidemic of UCAS-based spoofing starting tomorrow. If we want to switch to Highly Restrictive, we should try and get some data on how much impact there would be first. AIUI other browsers also picked that level, because www.<latin-bit>-<native-bit>.tld or similar is an acknowledged use case, and because of the commonality of Latin, many scripts/languages in practice borrow words or letters. But this is remembering discussions from a long time ago, so we should get data. Gerv
(In reply to Gervase Markham [:gerv] from comment #5)
> Hmm. What's the timeline for a Unicode Technical Committee discussion?

Well... I could raise the issue immediately on the UTC's internal mailing list, and see if anyone chimes in with comments (provided we're happy with bringing this up in a somewhat wider forum); depending on the response, that might give an early indication of what's likely to happen (or not).

The next actual UTC meeting is at the beginning of August, so we could ask for it to be on the agenda then.
I think we should start with the mailing list. Perhaps the line to take is: "We notice there's this 'except for Cyrillic and Greek' rule. Presumably those were picked because of the extensive overlap in character shapes with Latin. How might other scripts be considered for that list? What about Canadian Syllabics, for example?" That's a bit better than "Hey, everyone, we're worried that Canadian Syllabics can be used for phishing..." Gerv
OK, I've asked a question on the (private) UTC list; will follow up here if anything interesting comes of it.
It turns out the latest proposed update to UTS#31, aimed at Unicode 10.0, drops the "aspirational scripts" category and merges them into "limited use": http://www.unicode.org/reports/tr31/tr31-26.html This will make the UCAS characters no longer eligible to be mixed with Latin in labels, per the "Moderately Restrictive" profile in UTS#39. I'd suggest, therefore, that we go ahead and make such a change in nsIDNService, in anticipation of the upcoming Unicode changes.
That seems like an excellent fix :-) CC Ryan so he can let Chrome know, as they might well want to make a similar change. Gerv
Thanks for the heads up, Gerv. I've created https://bugs.chromium.org/p/chromium/issues/detail?id=725461 for our tracking of this issue.
Actually, original reporter reported https://bugs.chromium.org/p/chromium/issues/detail?id=719199 :)
This is a minimal patch to just change how the Aspirational script characters are handled. Once Unicode 10 is released and we update all our data tables, etc., the IDTYPE_ASPIRATIONAL value will go away altogether, but in the meantime this seems the simplest way to fix the issue, and minimizes any backporting risk.
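The effect of treating Aspirational characters like Restricted ones can be sketched as below. This is not the attached patch: only the name IDTYPE_ASPIRATIONAL comes from this bug, while the `IdType` enum, `hypothetical_id_type`, and `label_is_safe` are invented for illustration, and the classification rules are deliberately crude.

```python
from enum import Enum


class IdType(Enum):
    """Hypothetical identifier types, loosely modeled on the names in this bug."""
    RESTRICTED = 0
    ALLOWED = 1
    ASPIRATIONAL = 2


def hypothetical_id_type(ch: str) -> IdType:
    """Crude stand-in for a Unicode identifier-type lookup table."""
    if 0x1400 <= ord(ch) <= 0x167F:  # Canadian Syllabics block
        return IdType.ASPIRATIONAL
    if ch.isascii() and (ch.isalnum() or ch == "-"):
        return IdType.ALLOWED
    return IdType.RESTRICTED  # everything else is rejected in this toy model


def label_is_safe(label: str, aspirational_is_restricted: bool) -> bool:
    """A label is displayable as Unicode only if no character is rejected."""
    for ch in label:
        id_type = hypothetical_id_type(ch)
        if id_type is IdType.RESTRICTED:
            return False
        if id_type is IdType.ASPIRATIONAL and aspirational_is_restricted:
            return False
    return True


if __name__ == "__main__":
    label = "\u146duny"
    print(label_is_safe(label, aspirational_is_restricted=False))  # old behavior
    print(label_is_safe(label, aspirational_is_restricted=True))   # patched behavior
```

In this model, a label made up only of Aspirational characters is also rejected once the flag is set, which is the consequence of folding the category into Restricted rather than only forbidding mixtures.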
Attachment #8870470 - Flags: review?(mcmanus)
Attachment #8869368 - Attachment is obsolete: true
Attachment #8870470 - Flags: review?(mcmanus) → review?(valentin.gosu)
Comment on attachment 8870470
Treat Aspirational scripts the same as Restricted, in anticipation of UAX#31 update

Review of attachment 8870470:
-----------------------------------------------------------------

Thanks for the patch. Looks good!
Attachment #8870470 - Flags: review?(valentin.gosu) → review+
Comment on attachment 8870470
Treat Aspirational scripts the same as Restricted, in anticipation of UAX#31 update

[Security approval request comment]

How easily could an exploit be constructed based on the patch?
The issue is potential domain spoofing; the patch arguably draws attention to the "Aspirational" scripts, which might lead someone to notice the possible "lookalike" characters there. Having noticed that, exploiting is as easy as registering a domain.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?
Not really, it's positioned as an update to match forthcoming Unicode changes.

Which older supported branches are affected by this flaw?
All

If not all supported branches, which bug introduced the flaw?
n/a

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?
ESR52 will need a slightly modified patch, but it is trivial to create.

How likely is this patch to cause regressions; how much testing does it need?
Minimal risk, and covered by unit tests.
Attachment #8870470 - Flags: sec-approval? → sec-approval+
https://hg.mozilla.org/integration/mozilla-inbound/rev/fa4c7abccb77836bd80ed8e5bfe9b44ed3e0c9c7 Bug 1364283 - Treat Aspirational scripts the same as Restricted, in anticipation of UAX#31 update. r=valentin
Here's the ESR52 version of the patch.
Attachment #8870599 - Attachment description: Treat Aspirational scripts the same as Restricted, in anticipation of UAX#31 update → [esr52] Treat Aspirational scripts the same as Restricted, in anticipation of UAX#31 update
Comment on attachment 8870470
Treat Aspirational scripts the same as Restricted, in anticipation of UAX#31 update

Fix a sec-high. Beta54+ & ESR52+. Should be in 54 beta 11.
https://hg.mozilla.org/releases/mozilla-esr52/rev/a6caa7628e365ac53d12fef9146ff09094b33e41 Bug 1364283 - Treat Aspirational scripts the same as Restricted, in anticipation of UAX#31 update. r=valentin a=gchang
Adding firstname.lastname@example.org to cc list (he requested via email@example.com & has access to the corresponding Chromium bug)
As mentioned in https://bugs.chromium.org/p/chromium/issues/detail?id=719199 comment #31, the example domains cannot be registered due to Verisign's script mixing policy:

"It turned out that Verisign's script mixing policy does not allow Latin and Canadian syllabics. As a result, none of examples in this bug report (involving mixing Latin and Canadian syllabics) can be registered in any TLDs subject to Verisign's policy.

https://www.verisign.com/en_US/channel-resources/domain-registry-products/idn/idn-policy/registration-rules/index.xhtml
All code points within an IDN must come from the same Unicode script. This is done to prevent confusable code points from appearing in the same IDN.

https://www.verisign.com/assets/idn/idn-canadian-aboriginal.html does not list any of [a-z]."

I never attempted to register any of the domains above, as I was unaware of this policy. Attempting to register any of these domains results in the error "Parameter value policy error (IDN commingles multiple scripts)" (using the first example above):
https://iwantmyname.com/?domain=youtu%E1%96%AFe

It appears that these domains can still be registered with certain ccTLDs, but the Verisign policy does significantly limit the scope of this issue.

Thanks,
Sam
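A registry-side rule of the form "all code points within an IDN must come from the same Unicode script" can be approximated as follows. This is a heuristic sketch, not Verisign's implementation: Python's standard library does not expose the Script property, so the first word of each character name (for example LATIN, CYRILLIC, CANADIAN) is used as an assumed proxy, and the function names are hypothetical.

```python
import unicodedata


def script_words(label: str) -> set:
    """Approximate each character's script by the first word of its Unicode name."""
    words = set()
    for ch in label:
        if ch.isdigit() or ch == "-":
            continue  # digits and hyphen are treated as script-neutral here
        words.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return words


def is_single_script(label: str) -> bool:
    """True when every counted character maps to the same name prefix."""
    return len(script_words(label)) <= 1


if __name__ == "__main__":
    print(is_single_script("\u146duny"))                 # Canadian Syllabics + Latin
    print(is_single_script("\u0442\u0456\u043c\u0435"))  # all-Cyrillic label from comment 0
```

A single-script rule of this kind rejects the mixed Latin+UCAS examples, but it accepts whole-script lookalikes such as the all-Cyrillic labels listed in comment 0.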
re comment 3: moderately restrictive vs highly restrictive

> All code points within an IDN must come from the same Unicode script. This is done to prevent confusable code points from appearing in the same IDN.

A lot of TLDs have a similar policy. The TLDs for Armenia, Israel, Thailand, India and Saudi Arabia do not allow mixing of their native scripts with Latin. They only allow a subset of characters in their native scripts, plus ASCII digits and hyphen. The exceptions are the CJK TLDs. ICANN's per-language LGRs also acknowledge that.

So, using the 'moderately restrictive' profile instead of the 'highly restrictive' profile has little practical impact. https://bugs.chromium.org/p/chromium/issues/detail?id=726950 has more information I collected.

I can file a new bug to talk about the 'moderately' vs 'highly' restrictive profiles.
Whiteboard: [post-critsmash-triage] → [post-critsmash-triage][adv-main54+][adv-esr52.2+]
I've reproduced this issue using STR from comment 0 on an affected build.

This is verified fixed on latest Nightly 55.0a1 (2017-06-07) and 54.0 (20170605204906) on the following OSes:
- Windows 10 x64
- Mac OS X 10.11.6
- Ubuntu 16.04 x64 LTS
Hi, I intend to discuss this issue at a DEFCON (Wall of Sheep) talk next Friday, 7/28. As information about this bug is public through CVE-2017-5076 and the Firefox CVE ( https://www.mozilla.org/en-US/security/advisories/mfsa2017-16/#1 ), I do not see any problem with this, but I did want to give you a heads-up as this issue is still labeled as restricted.
Thanks for the heads up. I've unhidden the bug.