Closed Bug 1370497 (CVE-2017-7833) Opened 7 years ago Closed 7 years ago

URL spoofing using combining marks

Categories

(Firefox :: Address Bar, defect, P1)

53 Branch
defect

Tracking

()

RESOLVED FIXED
Firefox 58
Tracking Status
firefox-esr52 --- wontfix
firefox55 --- wontfix
firefox56 --- wontfix
firefox57 --- fixed
firefox58 --- fixed

People

(Reporter: rayyanh12, Assigned: jfkthame)

References

Details

(Keywords: csectype-spoof, sec-moderate, Whiteboard: [adv-main57+])

Attachments

(2 files)

Attached image PoC.png
User Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:53.0) Gecko/20100101 Firefox/53.0
Build ID: 20170518000419
Firefox for Android

Steps to reproduce:

http://xn--google-yri.com/ 


Actual results:

By adding this *ِ* (notice the weird thing under asterisk) we can actually spoof the URL (espicially the inexperienced users)

More info:
U+0650, ARABIC KASRA


Expected results:

The URL should be shown in PunnyCode or these letters should be banned.
jfkthame: bug 1370504 is also related. 

So presumably this is because we allow any single script plus Latin. This seems like the tip of the iceberg; various scripts are going to have all sorts of combining modifiers which one can match up with a Latin letter which happens to have pixels in the right place to hide it. 

Can we enforce that Arabic modifiers only go on Arabic letters, and so on?

Gerv
Flags: needinfo?(jfkthame)
Component: Untriaged → Location Bar
(In reply to Gervase Markham [:gerv] from comment #1)
> jfkthame: bug 1370504 is also related. 
> 
> So presumably this is because we allow any single script plus Latin. This
> seems like the tip of the iceberg; various scripts are going to have all
> sorts of combining modifiers which one can match up with a Latin letter
> which happens to have pixels in the right place to hide it. 
> 
> Can we enforce that Arabic modifiers only go on Arabic letters, and so on?

Possibly, though it's not quite that simple (the Arabic vowel marks are also used in Syriac, for example).

I'm not sure we even have to look at other scripts to find diacritic-based "spoofs"; there are small, easily-overlooked diacritics in the Combining Marks block at U+03xx that I suspect many users would miss. Consider something like https://www.xn--google-e4d.com/ ... this has a combining dot under the second "g", but it's hardly noticeable.

And then there are delights such as https://www.xn--mozlla-r9a478a.com/. Do domain registrars have policies that disallow things like this? AFAICS the IDN rules would permit it...
Domain registrars should, and often do, have policies which check if a registration is homographic with another registration. But this is different - this is adding a tiny extra combining mark and hoping it doesn't get noticed. So they homographs are not exact in in that sense, and probably won't be detected by the standard algorithm. But visually, the results are still very similar.

So I think we have an issue that this is a form of spoofing which is not entirely addressed by the current algorithms. Now, my understanding is (correct me if I'm wrong) that these small dots and combining marks and so on are all allocated to one or more scripts in the Unicode Character Database. Is that right? If so, why doesn't our script mixing code detect this? Are some of these dots allocated to Latin? Or is it because we allow "Latin plus one other", and so "Latin plus a random combining mark" will always be allowed?

Gerv
Generally, combining marks are considered to "inherit" the script property of the base character to which they're applied, and can in principle be used with any script. There are some -- e.g. from Arabic and Indic scripts -- that have one or more specific scripts with which they're normally associated (listed in ScriptExtensions.txt), but the general Combining Marks block (at U+03xx) is for the most part entirely script-agnostic. Something like "combining dot below" certainly has legitimate usage in Latin script, so there's no expectation that script-mixing prohibitions would be relevant.

The original report here relates to an Arabic vowel mark, which is also used in Syriac, but arguably should not be combined with Latin letters (it's not "wrong" to do so, in strictly Unicode terms, but it's sufficiently implausible that IDN rules could reasonably block it). But the more general issue of small, easily-overlooked diacritics can't be fixed that way, as many of them are truly generic -- or even primarily (though not exclusively) used with Latin script.
Flags: needinfo?(jfkthame)
Hmm.

It seems like there are a few ways we could go here.

Firstly, we could try and improve font rendering technology in some way such that such marks were more obvious when used - e.g. if they overlapped an existing bit of a letter, they didn't just disappear. This would be a Firefox (or graphite2?) fix (and a complex one).

Secondly, we could just disallow most combining marks, perhaps with a whitelist of those which were script-specific. People making domain names would need to use precomposed characters, or just not use the mark (just as e.g. English names can't have an apostrophe in LDH domain names). This has the risk of upsetting one or more language communities, particularly as ISTR that Unicode has gone backwards and forwards on whether scripts require precomposed versions of common letter+mark combos, and so some scripts may have them and others may not. 

Thirdly, we could try and develop a set of rules about where combining marks were allowed. This would probably involve the Unicode Consortium, either by getting them to better classify the allowed scripts for the generic combining marks (perhaps in an advisory rather than a normative way), or even by enumerating the allowed letters for each mark. These rules could either be implemented in browsers (as they don't require knowledge of all existing domain names, as homograph algorithms do) or by registries. The risk here is ending up with a massive data table that you need to ship to solve a very niche problem.

Do we need to consult the usual suspects at the Unicode Consortium to ask them for advice at this point?

Gerv
(In reply to Gervase Markham [:gerv] from comment #6)
> Firstly, we could try and improve font rendering technology in some way such
> that such marks were more obvious when used - e.g. if they overlapped an
> existing bit of a letter, they didn't just disappear. This would be a
> Firefox (or graphite2?) fix (and a complex one).

In the case of KASRA I don't see it at all in the URL bar on Mac: it seems to be below the visible area. If I click on http://xn--google-yri.com/ I see it in the body text of the Server Not Found page (almost looks like a spec of dust on my screen), but not where it matters in the address bar. The two examples in comment 2 are visible. Easy to miss for sure, but at least the differences are there for the eagle-eyed.

> Do we need to consult the usual suspects at the Unicode Consortium to ask
> them for advice at this point?

Probably for the best.

How does the KASRA character look on Windows? Since that's where most people are that's probably the one the determines if we call this sec-moderate or sec-high.
Keywords: csectype-spoof
Summary: URL Spoofing → URL spoofing using combining marks
Blocks: 1392190
See https://bugs.chromium.org/p/chromium/issues/detail?id=729979 for some mitigation the Chromium folk have implemented, although AFAICT it's not a complete solution to the general issue.
Yeah, seems to me that those mitigations are pretty specific. I'm not particularly interested in playing whack-a-mole such that what Firefox allows changes every release for the next 10 releases.

Gerv
Here's a patch that provides some mitigation, based on the Unicode ScriptExtensions property that identifies some combining marks as being used with particular script(s) even though they have Script=INHERITED as their primary property. This blocks things like the Arabic-vowel-on-Latin-letter example here.
Attachment #8910325 - Flags: review?(mak77)
Attachment #8910325 - Flags: feedback?(gerv)
Assignee: nobody → jfkthame
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Comment on attachment 8910325 [details] [diff] [review]
Check ScriptExtensions property of combining marks when available

This seems a pretty sensible and general mitigation. We may also have to switch to strict as well, but this seems orthogonal to that.

Gerv
Attachment #8910325 - Flags: feedback?(gerv) → feedback+
Comment on attachment 8910325 [details] [diff] [review]
Check ScriptExtensions property of combining marks when available

Review of attachment 8910325 [details] [diff] [review]:
-----------------------------------------------------------------

This is in netwerk and I'm not a peer for that module. I'm forwarding to jduell.
The code looks god afaict.
Attachment #8910325 - Flags: review?(mak77) → review?(jduell.mcbugs)
Just FTR, since bug 1402048 has just landed, I'll remove the #ifdef IDNA2008 from this patch as this is now unconditionally the way we build.
Attachment #8910325 - Flags: review?(jduell.mcbugs) → review?(valentin.gosu)
Comment on attachment 8910325 [details] [diff] [review]
Check ScriptExtensions property of combining marks when available

Review of attachment 8910325 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.
Attachment #8910325 - Flags: review?(valentin.gosu) → review+
Priority: -- → P1
https://hg.mozilla.org/mozilla-central/rev/6bd2d96c0c3d
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 58
Group: firefox-core-security → core-security-release
Flags: sec-bounty?
Please request Beta approval on this when you get a chance. And maybe ESR52 as well?
Flags: needinfo?(jfkthame)
Comment on attachment 8910325 [details] [diff] [review]
Check ScriptExtensions property of combining marks when available

Approval Request Comment
[Feature/Bug causing the regression]: not a regression
[User impact if declined]: potential for domain spoofing using diacritics that are invisible or near-invisible in the URL bar
[Is this code covered by automated tests?]: yes
[Has the fix been verified in Nightly?]: yes
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: not really
[Why is the change risky/not risky?]: adds a new check in nsIDNService::isLabelSafe to detect unexpected diacritic usage; in case of bugs I guess it could cause legitimate domain labels to display as punycode, but otherwise shouldn't affect functionality
[String changes made/needed]: none
Flags: needinfo?(jfkthame)
Attachment #8910325 - Flags: approval-mozilla-beta?
(In reply to Ryan VanderMeulen [:RyanVM] from comment #23)
> Please request Beta approval on this when you get a chance. And maybe ESR52
> as well?

This doesn't seem to me to really meet the ESR criteria, and I see we've marked other examples such as bug 1399540 as wontfix there. (Feel free to disagree if you think there's an ESR case to make.)
Hi Dan, given that this is a sec-mod and too many uplifts in beta57 already, I am leaning towards wontfix'ing this for 57. I'd like to get your opinion before making a final decision here. Thanks!
Flags: needinfo?(dveditz)
We keep getting dinged for this so it'd be nice to fix sooner, and the patch isn't scary. But in practice real phishing sites are making enough money from non-spoofing domain names. I guess we could live without it in 57 :-(
Flags: needinfo?(dveditz)
Comment on attachment 8910325 [details] [diff] [review]
Check ScriptExtensions property of combining marks when available

Thanks Dan. Sorry! :( 57 is getting an unusually large # of uplift requests. I'd like to keep our risks to quality and shipping on time to a minimal. I'd prefer to let this one ride the 58 train.
Attachment #8910325 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
An argument in favor is that we're already fixing bug 1399939 in Firefox 57 and it'd be nice to change the IDN behavior all at once. But that's not the strongest argument
Flags: needinfo?(rkothari)
Comment on attachment 8910325 [details] [diff] [review]
Check ScriptExtensions property of combining marks when available

Ok. Since this is still early beta, I'll take it. After a week, this uplift would be rejected.
Flags: needinfo?(rkothari)
Attachment #8910325 - Flags: approval-mozilla-beta- → approval-mozilla-beta+
(In reply to Jonathan Kew (:jfkthame) from comment #24)
> [Is this code covered by automated tests?]: yes
> [Has the fix been verified in Nightly?]: yes
> [Needs manual test from QE? If yes, steps to reproduce]: no

Setting qe-verify- based on Jonathan's assessment on manual testing needs and the fact that this fix has automated coverage.
Flags: qe-verify-
Flags: sec-bounty? → sec-bounty+
Whiteboard: [adv-main57+]
Alias: CVE-2017-7833
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: