Bug 1364283 (CVE-2017-7764)

Security: disallow "Canadian Syllabics" unicode block from IDN domains

VERIFIED FIXED in Firefox -esr52



2 years ago
Last month


(Reporter: samrerb, Assigned: jfkthame)


({csectype-spoof, sec-moderate})

55 Branch
Bug Flags:
sec-bounty +
in-testsuite +

Firefox Tracking Flags

(firefox-esr45 wontfix, firefox-esr52 54+ verified, firefox53 wontfix, firefox54+ verified, firefox55+ verified)


(Whiteboard: [post-critsmash-triage][adv-main54+][adv-esr52.2+])


(3 attachments, 1 obsolete attachment)



2 years ago
User Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.96 Safari/537.36

Steps to reproduce:

Firefox should prevent characters from the “Canadian Syllabics” Unicode block from being rendered in domain names that also contain characters from other Unicode blocks. This was observed in data found in the Certificate Transparency log while seeking to quantify the IDN impersonation/phishing problem (raw data attached).

There is a series of characters in the “CANADIAN SYLLABICS” Unicode block that can be used to impersonate other domains. I believe mixing this block with other Unicode blocks should be disallowed and the punycode value should be displayed instead. The characters within this set that I believe could be abused:
(I do not know the registration status of any of the domains below)
http://xn--youtue-084a.com/ -- youtuᖯe.com -- example domain 
http://xn--youtbe-z72a.com/ -- youtᑌbe.com -- example domain
http://xn--uny-8wq.com/ -- ᑭuny.com -- example domain
http://xn--oor-hxq.com -- ᑯoor.com -- example domain
http://xn--ego-73q.com/ -- ᒪego.com -- example domain
http://xn--fc-lym.com/ -- fcᒿ.com -- example domain, not fc2.com (Alexa top 1M #97) -- this is likely the hardest to spot (based on the fonts I’m using)
http://xn--ulu-7sr.com/ -- ᕼulu.com -- example domain
http://invalid.xn--acebook-yp9a.com/ -- ᖴacebook.com -- example domain
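The `xn--` forms above are the ASCII-compatible encodings (RFC 3492 Punycode plus the `xn--` prefix) of the spoofed labels. As a minimal sketch, Python's standard `punycode` codec can reproduce them one label at a time. Note that this applies raw Punycode only and skips the full IDNA processing (case mapping, normalization, validity checks) that a browser or registrar performs; the helper names are illustrative.

```python
def to_ace_label(label: str) -> str:
    """Encode one DNS label to its ASCII-compatible (xn--) form.

    Raw Punycode (RFC 3492) only; no IDNA mapping or validation.
    """
    if label.isascii():
        return label  # pure-ASCII labels need no encoding
    return "xn--" + label.encode("punycode").decode("ascii")


def from_ace_label(label: str) -> str:
    """Decode an xn-- label back to Unicode (no IDNA validation)."""
    if label.lower().startswith("xn--"):
        return label[4:].encode("ascii").decode("punycode")
    return label


if __name__ == "__main__":
    # Labels copied from the list above; the .com suffix is plain ASCII.
    for spoof in ["youtuᖯe", "youtᑌbe", "ᑭuny", "ᖴacebook"]:
        print(f"{spoof}.com -> {to_ace_label(spoof)}.com")
```

For example, `to_ace_label("youtuᖯe")` returns `"xn--youtue-084a"`, matching the first entry in the list.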

Tested with Firefox Nightly 55.0a1 (2017-05-11) (32-bit)
This issue has also been reported to Chromium.

Actual results:

Unicode domain names are displayed when a label mixes the “CANADIAN SYLLABICS” Unicode block with characters from other blocks.

Expected results:

Punycode values should have been displayed when a label mixes the “CANADIAN SYLLABICS” Unicode block with other Unicode characters.
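The reporter's proposed rule can be sketched as follows. This is an illustration of the suggestion in this report (show punycode whenever a label contains characters from the Unified Canadian Aboriginal Syllabics block, U+1400 to U+167F, together with any character outside it), not Firefox's actual logic; the function names are hypothetical.

```python
UCAS_FIRST = 0x1400  # first code point of the Canadian Syllabics block
UCAS_LAST = 0x167F   # last code point of the block


def is_ucas(ch: str) -> bool:
    """True if ch lies in the Unified Canadian Aboriginal Syllabics block."""
    return UCAS_FIRST <= ord(ch) <= UCAS_LAST


def mixes_ucas(label: str) -> bool:
    """True if the label has UCAS characters plus characters from any other block."""
    has_ucas = any(is_ucas(ch) for ch in label)
    has_other = any(not is_ucas(ch) for ch in label)
    return has_ucas and has_other


def display_form(label: str) -> str:
    """Return the label as it would be shown under the proposed rule."""
    if mixes_ucas(label):
        # Fall back to the ASCII-compatible (punycode) form.
        return "xn--" + label.encode("punycode").decode("ascii")
    return label
```

Under this rule, `display_form("youtuᖯe")` yields `"xn--youtue-084a"`, while labels written entirely in Latin or entirely in Canadian Syllabics are left as Unicode.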

  ---- background ----
(please excuse the length of this report)
To form the attached lists, I cross-referenced the Google CT Pilot log with the Alexa top 1 million domains (.com domains only).
There are a fair number of false positives (non-abusive domain impersonations or Python unidecode failures), but I chose not to remove them manually.

  ---- Other unicode characters observed ----

ĸ, 22, 0x138, "LATIN SMALL LETTER KRA"
96074858, 1509667199, xn--faceboo-jhb.com, facebooĸ.com , ĸ, facebook.com, 3, 1
86142753, 1507679999, xn--autodes-jhb.com, autodesĸ.com , ĸ, autodesk.com, 697, 1

94011919, 1524055021, xn--ppe-8ka60c.com, àppłe.com , àł, apple.com, 69, 1
94724468, 1500291180, xn--sack-01a.com, słack.com , ł, slack.com, 205, 1

18331655, 1488327078, xn--reddt-q4a.com, reddıt.com , ı, reddit.com, 7, 1
95900673, 1500493680, xn--t-fka.com, tı.com , ı, ti.com, 3235, 1
84518766, 1497998760, xn--gml-kua34j.com, gmȧıl.com , ȧı, gmail.com, 22463, 1
95900424, 1500493860, xn--fat-jua.com, fıat.com , ı, fiat.com, 54102, 1
94504694, 1509148799, xn--curacao-egamng-hgc.com, curacao-egamıng.com , ı, curacao-egaming.com, 524456, 1
94724500, 1500493920, xn--suzu-kza.com, ısuzu.com , ı, isuzu.com, 866480, 1

95900680, 1500670920, xn--twttr-7raz.com, twìttèr.com , ìè, twitter.com, 11, 1
85019386, 1507161599, xn--polonex-3ya.com, polonìex.com , ì, poloniex.com, 1595, 1
83724035, 1497798600, xn--gma-pma40b.com, gmaìĺ.com , ìĺ, gmail.com, 22463, 1

 ---- Special case observed ---

Two interesting domains were observed that bypass Chromium's checks by using only Cyrillic characters:
07022746, 1443571199, xn--80aac5cct.com, таобао.com , таобао, taobao.com, 10, 1
10303999, 1461542399, xn--e1anr4f.com, тіме.com , тіме, time.com, 817, 1


2 years ago
Group: firefox-core-security → network-core-security
Component: Untriaged → Networking
Product: Firefox → Core
According to http://unicode.org/repository/draft/trunk/Public/security/10.0.0/IdentifierStatus.txt the Canadian Syllabics block (U+1400 -> U+167F) is not allowed in identifiers, and therefore not in IDNA2008. So why are we accepting them?

:jfkthame is the expert here...

Flags: needinfo?(jfkthame)
sec-high might be overstating it, but the first youtube.com one is convincing with the font on my Mac.

Comment 3

2 years ago
(In reply to Gervase Markham [:gerv] from comment #1)
> According to
> http://unicode.org/repository/draft/trunk/Public/security/10.0.0/
> IdentifierStatus.txt the Canadian Syllabics block (U+1400 -> U+167F) is not
> allowed in identifiers, and therefore not in IDNA2008. So why are we
> accepting them?

By default, we have network.IDN.restriction_profile set to "moderate". At the Moderately Restrictive level[1], Canadian Syllabics characters are accepted as part of the identifier profile thanks to their status as an "Aspirational" script[2], and the Moderately Restrictive profile allows a mixture of "Latin with other Recommended or Aspirational scripts except Cyrillic and Greek"[1].

So it looks like our behavior is correct according to the specifications we're implementing; but I agree that some of these examples look problematic.

Two possible fixes:

(a) Change the default setting for network.IDN.restriction_profile from "moderate" to "high". This will force punycode rendering for mixed Latin+UCAS labels, since this combination is not one of the handful of mixtures that the Highly Restrictive profile allows.

(b) Exclude UCAS from the scripts that are allowed to be mixed with Latin at the Moderate level, as is already done for Cyrillic and Greek. To do this, we'd need to adjust the lookup tables used by nsIDNService::illegalScriptCombo in netwerk/dns/nsIDNService.cpp. However, given that what we're implementing here is a Unicode standard, we should also suggest a change to UTS#39 rather than simply deviate from the standard.
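To make the difference between options (a) and (b) concrete, here is a heavily simplified Python sketch of the two UTS #39 restriction levels discussed above. Assumptions: the script table is a tiny hypothetical range list (real implementations use the Unicode Script/Script_Extensions properties and the identifier profile), and the helper names are illustrative rather than Gecko's.

```python
# Simplified, hypothetical script table for illustration only.
SCRIPT_RANGES = [
    (0x0041, 0x005A, "Latin"),
    (0x0061, 0x007A, "Latin"),
    (0x00C0, 0x024F, "Latin"),  # approximate: also contains a few Common symbols
    (0x0370, 0x03FF, "Greek"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x1400, 0x167F, "Canadian_Aboriginal"),
    (0x3040, 0x309F, "Hiragana"),
    (0x30A0, 0x30FF, "Katakana"),
    (0x3100, 0x312F, "Bopomofo"),
    (0x4E00, 0x9FFF, "Han"),
    (0xAC00, 0xD7AF, "Hangul"),
]

# Script combinations that UTS #39 permits at the Highly Restrictive level.
HIGHLY_RESTRICTIVE_SETS = [
    frozenset({"Latin", "Han", "Hiragana", "Katakana"}),
    frozenset({"Latin", "Han", "Bopomofo"}),
    frozenset({"Latin", "Han", "Hangul"}),
]


def script_of(ch: str) -> str:
    cp = ord(ch)
    for lo, hi, name in SCRIPT_RANGES:
        if lo <= cp <= hi:
            return name
    return "Common"  # digits, hyphen and anything outside the table


def scripts_in(label: str) -> set:
    return {script_of(ch) for ch in label} - {"Common"}


def is_highly_restrictive(label: str) -> bool:
    """Single script, or one of the permitted CJK-with-Latin combinations."""
    scripts = scripts_in(label)
    return len(scripts) <= 1 or any(scripts <= allowed for allowed in HIGHLY_RESTRICTIVE_SETS)


def is_moderately_restrictive(label: str,
                              excluded=frozenset({"Cyrillic", "Greek"})) -> bool:
    """Latin plus one other script is allowed unless that script is excluded."""
    if is_highly_restrictive(label):
        return True
    scripts = scripts_in(label)
    others = scripts - {"Latin"}
    return "Latin" in scripts and len(others) == 1 and not (others & excluded)
```

With the default `excluded` set, `is_moderately_restrictive("youtuᖯe")` is True, which is the behavior reported in this bug. Option (b) corresponds to passing `excluded=frozenset({"Cyrillic", "Greek", "Canadian_Aboriginal"})`, and option (a) to using `is_highly_restrictive`. Both return False for that label. An all-Cyrillic label such as "таобао" passes at either level because it is single-script, which matches the special case noted in comment 0.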

I'd suggest that we do (a) immediately (recognizing that it will potentially impact other mixed-script labels that are currently accepted and displayed as Unicode IDNs, but the number of real-world examples likely to be affected should be small). To consider (b), I'll raise the status of Latin+UCAS at the Moderately Restrictive level for discussion in the Unicode Technical Committee. If the UTC agrees to make a change to Moderate, we could then consider reverting our default level.

[1] http://www.unicode.org/reports/tr39/#Restriction_Level_Detection
[2] http://www.unicode.org/reports/tr31/#Aspirational_Use_Scripts
Flags: needinfo?(jfkthame)


2 years ago
Assignee: nobody → jfkthame
Ever confirmed: true
Hmm. What's the timeline for a Unicode Technical Committee discussion? Given it's taken so long for someone to notice this, I would be surprised to see an epidemic of UCAS-based spoofing starting tomorrow. 

If we want to switch to Highly Restrictive, we should try and get some data on how much impact there would be first. AIUI other browsers also picked that level, because www.<latin-bit>-<native-bit>.tld or similar is an acknowledged use case, and because of the commonality of Latin, many scripts/languages in practice borrow words or letters. But this is remembering discussions from a long time ago, so we should get data.


Comment 6

2 years ago
(In reply to Gervase Markham [:gerv] from comment #5)
> Hmm. What's the timeline for a Unicode Technical Committee discussion?

Well... I could raise the issue immediately on the UTC's internal mailing list, and see if anyone chimes in with comments (provided we're happy with bringing this up in a somewhat wider forum); depending on the response, that might give an early indication of what's likely to happen (or not).

The next actual UTC meeting is at the beginning of August, so we could ask for it to be on the agenda then.
I think we should start with the mailing list. Perhaps the line to take is: "We notice there's this 'except for Cyrillic and Greek' rule. Presumably those were picked because of the extensive overlap in character shapes with Latin. How might other scripts be considered for that list? What about Canadian Syllabics, for example?"

That's a bit better than "Hey, everyone, we're worried that Canadian Syllabics can be used for phishing..."


Comment 8

2 years ago
OK, I've asked a question on the (private) UTC list; will follow up here if anything interesting comes of it.

Comment 9

2 years ago
It turns out the latest proposed update to UAX#31, aimed at Unicode 10.0, drops the "aspirational scripts" category and merges them into "limited use":

This will make the UCAS characters no longer eligible to be mixed with Latin in labels, per the "Moderately Restrictive" profile in UTS#39.

I'd suggest, therefore, that we go ahead and make such a change in nsIDNService, in anticipation of the upcoming Unicode changes.
That seems like an excellent fix :-) CC Ryan so he can let Chrome know, as they might well want to make a similar change.


Comment 11

2 years ago
Thanks for the heads up, Gerv. I've created https://bugs.chromium.org/p/chromium/issues/detail?id=725461 for our tracking of this issue.

Comment 12

2 years ago
Actually, original reporter reported https://bugs.chromium.org/p/chromium/issues/detail?id=719199 :)

Comment 13

2 years ago
This is a minimal patch to just change how the Aspirational script characters are handled. Once Unicode 10 is released and we update all our data tables, etc., the IDTYPE_ASPIRATIONAL value will go away altogether, but in the meantime this seems the simplest way to fix the issue, and minimizes any backporting risk.
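Conceptually, the patch remaps one identifier type onto another before the usual safety check runs. The following is a minimal Python sketch under a simplifying assumption: a label is rendered as punycode whenever it contains any Restricted character. `IdType`, `id_type`, `effective_id_type` and `label_is_safe` are hypothetical names. The actual change was made in nsIDNService (C++), not in code like this.

```python
from enum import Enum


class IdType(Enum):
    """Simplified identifier types (names are illustrative, not Gecko's)."""
    ALLOWED = "allowed"
    ASPIRATIONAL = "aspirational"
    RESTRICTED = "restricted"


def id_type(ch: str) -> IdType:
    # Hypothetical lookup: only the Canadian Syllabics block is classified as
    # Aspirational here; every other character is treated as Allowed.
    if 0x1400 <= ord(ch) <= 0x167F:
        return IdType.ASPIRATIONAL
    return IdType.ALLOWED


def effective_id_type(ch: str, aspirational_is_restricted: bool = True) -> IdType:
    t = id_type(ch)
    if aspirational_is_restricted and t is IdType.ASPIRATIONAL:
        return IdType.RESTRICTED  # the behavior change described in the patch
    return t


def label_is_safe(label: str, aspirational_is_restricted: bool = True) -> bool:
    """In this sketch, a label with any Restricted character is shown as punycode."""
    return all(
        effective_id_type(ch, aspirational_is_restricted) is not IdType.RESTRICTED
        for ch in label
    )
```

Passing `aspirational_is_restricted=False` models the pre-patch behavior, under which "youtuᖯe" counts as safe and is displayed as Unicode. With the default, it is flagged and would fall back to punycode.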
Attachment #8870470 - Flags: review?(mcmanus)


2 years ago
Attachment #8869368 - Attachment is obsolete: true
Attachment #8870470 - Flags: review?(mcmanus) → review?(valentin.gosu)
Comment on attachment 8870470 [details] [diff] [review]
Treat Aspirational scripts the same as Restricted, in anticipation of UAX#31 update

Review of attachment 8870470 [details] [diff] [review]:

Thanks for the patch. Looks good!
Attachment #8870470 - Flags: review?(valentin.gosu) → review+

Comment 15

2 years ago
Comment on attachment 8870470 [details] [diff] [review]
Treat Aspirational scripts the same as Restricted, in anticipation of UAX#31 update

[Security approval request comment]
How easily could an exploit be constructed based on the patch?
The issue is potential domain spoofing; the patch arguably draws attention to the "Aspirational" scripts, which might lead someone to notice the possible "lookalike" characters there. Having noticed that, exploiting is as easy as registering a domain.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?
Not really, it's positioned as an update to match forthcoming Unicode changes.

Which older supported branches are affected by this flaw?

If not all supported branches, which bug introduced the flaw?

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?
ESR52 will need a slightly modified patch, but it is trivial to create.

How likely is this patch to cause regressions; how much testing does it need?
Minimal risk, and covered by unit tests.
Attachment #8870470 - Flags: sec-approval?
Attachment #8870470 - Flags: approval-mozilla-esr52?
Attachment #8870470 - Flags: approval-mozilla-beta?
Attachment #8870470 - Flags: sec-approval? → sec-approval+

Comment 17

2 years ago
Bug 1364283 - Treat Aspirational scripts the same as Restricted, in anticipation of UAX#31 update. r=valentin


2 years ago
Attachment #8870599 - Attachment description: Treat Aspirational scripts the same as Restricted, in anticipation of UAX#31 update → [esr52] Treat Aspirational scripts the same as Restricted, in anticipation of UAX#31 update
Flags: sec-bounty?
Closed: 2 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Comment on attachment 8870470 [details] [diff] [review]
Treat Aspirational scripts the same as Restricted, in anticipation of UAX#31 update

Fix a sec-high. Beta54+ & ESR52+. Should be in 54 beta 11.
Attachment #8870470 - Flags: approval-mozilla-esr52?
Attachment #8870470 - Flags: approval-mozilla-esr52+
Attachment #8870470 - Flags: approval-mozilla-beta?
Attachment #8870470 - Flags: approval-mozilla-beta+
Blocks: 1367120
Blocks: 1367442

Comment 21

2 years ago
Bug 1364283 - Treat Aspirational scripts the same as Restricted, in anticipation of UAX#31 update. r=valentin a=gchang
Group: network-core-security → core-security-release

Comment 23

2 years ago
Adding jshin1987@gmail.com to cc list (he requested via jungshik@google.com & has access to the corresponding Chromium bug)

Comment 24

2 years ago
As mentioned in https://bugs.chromium.org/p/chromium/issues/detail?id=719199 comment #31, the example domains cannot be registered due to Verisign's script mixing policy:

"It turned out that Verisign's script mixing policy does not allow Latin and Canadian syllabics. As a result, none of examples in this bug report (involving mixing Latin and Canadian syllabics) can be registered in any TLDs subject to Verisign's policy.  


All code points within an IDN must come from the same Unicode script. This is done to prevent confusable code points from appearing in the same IDN.

https://www.verisign.com/assets/idn/idn-canadian-aboriginal.html does not list any of [a-z]. "
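A registry-side single-script check like the policy quoted above can be sketched as follows. This is not Verisign's implementation. Assumptions: the script lookup is a tiny hypothetical table, and ASCII digits and hyphen are treated as script-neutral (as many of the TLD policies described in comment 25 allow); whether Verisign's Canadian Aboriginal table permits them is not stated here.

```python
# Assumption: ASCII digits and hyphen do not count toward any script.
NEUTRAL = frozenset("0123456789-")


def script_of(ch: str) -> str:
    # Simplified, hypothetical script lookup for illustration only.
    cp = ord(ch)
    if ch.isascii() and ch.isalpha():
        return "Latin"
    if 0x1400 <= cp <= 0x167F:
        return "Canadian_Aboriginal"
    if 0x0400 <= cp <= 0x04FF:
        return "Cyrillic"
    return "Other"


def is_single_script(label: str) -> bool:
    """True if all non-neutral characters come from one script."""
    scripts = {script_of(ch) for ch in label if ch not in NEUTRAL}
    return len(scripts) <= 1
```

Under this check, the Latin+UCAS examples from comment 0 (e.g. "youtuᖯe") are rejected at registration time, while an all-Cyrillic name such as "таобао" would still pass, since the policy constrains mixing rather than lookalikes within a single script.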

I never attempted to register any of the domains above, as I was unaware of this policy. Attempting to register any of these domains results in the error "Parameter value policy error (IDN commingles multiple scripts)" (tested using the first example above).

It appears that such names are still registrable with certain ccTLDs, but that does significantly limit the scope of this issue.

Comment 25

2 years ago
re comment 3: moderately restrictive vs highly restrictive 

> All code points within an IDN must come from the same Unicode script. This is done to prevent confusable code points from appearing in the same IDN.

A lot of TLDs have a similar policy. Armenia, Israel, Thailand, India and Saudi Arabia do not allow mixing their native scripts with Latin. They only allow a subset of characters in their native scripts plus ASCII digits and hyphen. The exceptions are CJK TLDs. ICANN's per-language LGRs also acknowledge that.

So, using 'moderately restrictive' profile instead of 'highly restrictive' profile  has little practical impact.
https://bugs.chromium.org/p/chromium/issues/detail?id=726950 has more information I collected. 

I can file a new bug to talk about 'moderately vs highly' restrictive profiles.
Flags: qe-verify+
Whiteboard: [post-critsmash-triage]
Flags: sec-bounty? → sec-bounty+
Keywords: sec-high → sec-moderate
Whiteboard: [post-critsmash-triage] → [post-critsmash-triage][adv-main54+][adv-esr52.2+]
Alias: CVE-2017-7764
I've reproduced this issue using STR from comment 0 on an affected build. 

This is verified fixed on latest Nightly 55.0a1 (2017-06-07) and 54.0 (20170605204906) on the following OSes:
- Windows 10 x64
- Mac OS X 10.11.6
- Ubuntu 16.04 x64 LTS
Flags: qe-verify+
This is also verified on esr 52.2.0 (20170607123825).

Comment 28

2 years ago
Hi, I intend to discuss this issue at a DEFCON (Wall of Sheep) talk next Friday, 7/28. Since information about this bug is already public through CVE-2017-5076 and the Firefox CVE ( https://www.mozilla.org/en-US/security/advisories/mfsa2017-16/ ), I do not see any problem with this, but I did want to give you a heads-up, as this issue is still marked as restricted.
Thanks for the heads up. I've unhidden the bug.
Group: core-security-release