Closed Bug 1397704 Opened 7 years ago Closed 7 years ago

(new) IDN homograph attack on latest Firefox

Categories

(Firefox :: Address Bar, defect)

55 Branch
defect
Not set
normal


RESOLVED DUPLICATE of bug 1370497

People

(Reporter: root, Unassigned)

Details

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:55.0) Gecko/20100101 Firefox/55.0
Build ID: 20170824053622

Steps to reproduce:

The latest Firefox doesn't show any security warning when the look-alike characters "ḳ" (U+1E33), "ṇ" (U+1E47), or "ḅ" (U+1E05) are used in a URL.
The Punycode form is not shown either.
This can be used for phishing attacks.

Domain examples:
facebooḳ.com (notice the "k with dot below" at the end), caṇon.com (notice the "n with dot below" in the middle)
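
For reference, the characters and the Punycode form mentioned in this report can be inspected with a short script. This is a minimal sketch using only the Python standard library; the helper names `describe` and `to_a_label` are made up for illustration, and the raw `punycode` codec is applied without the IDNA mapping and validation steps a browser performs, so treat the result as an approximation of what the address bar would show.

```python
import unicodedata

def describe(label):
    """Return (code point, Unicode name) for each non-ASCII character in a label."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch))
            for ch in label if ord(ch) > 0x7F]

def to_a_label(label):
    """Hypothetical helper: raw Punycode plus the 'xn--' ACE prefix.

    Assumption: skips IDNA mapping/validation; ASCII labels pass through.
    """
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")

# Domains from the report, written with escapes so the code points are explicit.
for domain in ("faceboo\u1e33.com", "ca\u1e47on.com"):
    labels = domain.split(".")
    print(domain, "->", ".".join(to_a_label(l) for l in labels))
    for label in labels:
        for codepoint, name in describe(label):
            print("   ", codepoint, name)
```

The expected result in this report is that the address bar shows the `xn--` form rather than the Unicode form.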


Actual results:

No punycode is shown when I open those domains in the latest Firefox (both mobile and desktop versions)


Expected results:

Punycode should be shown instead of the normal domain name.
Dupe of bug 1370497?
Component: Untriaged → Address Bar
I would say so.

FWIW, these aren't really homographs -- they are visibly different characters, although the difference is minor (to someone whose normal alphabet does not involve a dot-below diacritic).
Status: UNCONFIRMED → RESOLVED
Closed: 7 years ago
Resolution: --- → DUPLICATE
It's not quite the same as bug 1370497 is it? I could imagine doing something there such as not allowing combining marks from one script to apply to a different script, or even special-casing Latin (and some other scripts?) such that combining marks result in automatic punycode since sufficient precomposed characters should exist. That wouldn't fix these characters.

This particular block of Extended Latin looks like it's mostly used in languages distinct from Western European ones (Vietnamese, Ikwerre and other African languages, etc.). We could override the Unicode consortium's categorization and call it a different script. But then 1) we allow mixing with Latin (and have spoofs because of it) which wouldn't change anything, or 2) we go "highly restrictive" and block these, but then users of these languages can't have domain names because they need unadorned Latin characters too.

Neither approach helps.

The "good" news is that in practice phishers mostly use all-plain-Latin domain names because the people they're targeting are fooled even by that.
(In reply to Daniel Veditz [:dveditz] from comment #3)
> It's not quite the same as bug 1370497 is it? I could imagine doing
> something there such as not allowing combining marks from one script to
> apply to a different script, 

Yes, there are a few cases where that might help, though many combining marks don't "belong" to just a single script. (In Unicode terms, they generally have Script=Inherited, so that they "adopt" the script identity of the base to which they're applied. In cases like the Arabic vowel marks, I guess we could overrule that fairly safely -- although we'd still need to allow them to be applied to Syriac, for example, so purely categorizing them as "Arabic script" would be too narrow.)

For the general Combining Marks block, though, they are intended to be usable with any script -- including Latin, obviously.

> or even special-casing Latin (and some other
> scripts?) such that combining marks result in automatic punycode since
> sufficient precomposed characters should exist. That wouldn't fix these
> characters.

Yes, I guess there are arguably some distinctions, though the fact that IDN processing involves normalization means that many Latin-with-combining-marks domains are (by definition) the same as their precomposed versions. We could say that if a Latin-script label still has residual combining marks after normalization, that's a cue to use punycode, but I don't really see much value in that; there are plenty of precomposed characters (as here) that wouldn't be subject to such a check.
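
To illustrate the point about normalization, here is a minimal sketch of a "residual combining marks" check, assuming NFC as the normalization form and using only Python's `unicodedata` module. The function name `residual_marks` is hypothetical and this is not Firefox's actual logic; it only shows why such a check would not catch the domains in this bug.

```python
import unicodedata

def residual_marks(label):
    """Return the combining marks left in a label after NFC normalization.

    Assumption: general categories starting with 'M' count as combining marks.
    """
    nfc = unicodedata.normalize("NFC", label)
    return [ch for ch in nfc if unicodedata.category(ch).startswith("M")]

# "k" + U+0323 COMBINING DOT BELOW composes to the precomposed U+1E33,
# so both spellings are the same label after normalization.
decomposed = "faceboo" + "k\u0323"
precomposed = "faceboo\u1e33"
print(unicodedata.normalize("NFC", decomposed) == precomposed)

# Nothing is left over for the check to flag.
print(residual_marks(precomposed))

# A base letter with no precomposed dot-below form keeps its mark.
print(residual_marks("q\u0323"))
```

Only the last label would trigger the check, which is the limitation described above.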

ISTM that bundling at the registrar level is probably the only reasonable answer to "facebooḳ.com" vs "facebook.com", unless browsers decide to give up on treating non-European languages as (somewhere close to) first-class citizens. Even if we decided that dot-below is rare enough in real-world orthographies that we could blacklist it, there'd be things like "façebook.com" or "microsofț.com" that use letters from major European languages, yet will be overlooked by many monolingual English users.
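
As a rough illustration of what registrar-level bundling could key on, the sketch below strips diacritics from a label so that look-alike variants collapse to the same string. The function `bundle_key` and the rule it applies (NFD decomposition, then dropping combining marks) are assumptions made for this example, not any registry's actual policy; a real scheme would more likely use a confusables table such as the one in Unicode UTS #39.

```python
import unicodedata

def bundle_key(label):
    """Hypothetical bundling key: lowercase, decompose (NFD), drop combining marks."""
    nfd = unicodedata.normalize("NFD", label.lower())
    return "".join(ch for ch in nfd
                   if not unicodedata.category(ch).startswith("M"))

# Variants from this thread, written with escapes so the code points are explicit.
variants = ["facebook", "faceboo\u1e33", "fa\u00e7ebook", "microsof\u021b"]
for label in variants:
    print(label, "->", bundle_key(label))
```

Labels that share a key would be registered (or blocked) together, so the plain and dotted spellings could not end up with different owners.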
Group: firefox-core-security