Open Bug 1405845 Opened 8 years ago Updated 3 years ago

URL Spoofing with U+0307 after i

Categories

(Firefox :: Address Bar, enhancement, P3)

x86_64
All
enhancement

People

(Reporter: chromium.khalil, Unassigned)

https://www.i̇nstagram.com U+0307 (combining dot above) after i, j, l, or U+0131 (dotless i) is very hard to see, if it is visible at all, so it has to be blocked separately.
This isn't fixed by the highly restrictive profile, nor will it be fixed by considering the script of combining marks, since this block seems to be pretty generic. Can we make a special case and use punycode for _any_ combining marks with Latin script? There are so many pre-composed Latin characters that there should be more than enough for domain names in living languages.

Interestingly, this domain appears to be registered for real, by some kind of scammer. The https version doesn't exist but http://www.i̇nstagram.com is live. On my first visit I was redirected to a fake Adobe Flash download page, and on subsequent visits I was redirected to amazon.com (in both cases after a bunch of intermediate requests). I assume I got some sort of affiliate code for Amazon, but now I've tried it too often and get stuck on an intermediate "Click Validator" page which says it can't tell if I'm a fraud or have just clicked on too many ads :-)
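The special case proposed above can be sketched roughly as follows. This is only an illustration in Python, not Firefox's implementation: the function names are hypothetical, and because the standard library has no Script property, "Latin script" is approximated by checking that the Unicode character name starts with "LATIN".

```python
import unicodedata


def is_combining_mark(ch: str) -> bool:
    # General categories Mn, Mc and Me cover combining marks.
    return unicodedata.category(ch).startswith("M")


def is_latin_letter(ch: str) -> bool:
    # Assumption: the character-name prefix stands in for the Script
    # property, which Python's standard library does not expose.
    return ch.isalpha() and unicodedata.name(ch, "").startswith("LATIN ")


def latin_label_has_combining_mark(label: str) -> bool:
    """True if a combining mark survives NFC on top of a Latin base letter."""
    base = ""
    for ch in unicodedata.normalize("NFC", label):
        if is_combining_mark(ch):
            if base and is_latin_letter(base):
                return True
        else:
            base = ch
    return False


def display_label(label: str) -> str:
    """Hypothetical display rule: show the xn-- form for suspicious labels."""
    normalized = unicodedata.normalize("NFC", label)
    if latin_label_has_combining_mark(normalized):
        return "xn--" + normalized.encode("punycode").decode("ascii")
    return normalized


print(display_label("instagram"))         # unchanged
print(display_label("i\u0307nstagram"))   # shown in its xn-- form
print(display_label("cafe\u0301"))        # composes to a pre-composed letter, stays Unicode
```

Labels whose accents compose into pre-composed letters under NFC are unaffected by this rule; only marks left over after normalization trigger the fallback.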
Flags: needinfo?(jfkthame)
This is essentially the same example as mentioned in bug 1370497 comment 2 (the case suggested there was xn--mozlla-r9a478a, which is "mozı̇lla").

"i̇nstagram.com" (with combining dot above "i") should, in theory, be visibly distinct: it should end up with a second dot stacked above the original one. Likewise for "j". However, whether this actually works or the second dot just overprints the first will depend on the font, so we can't really rely on it. (In brief testing, it works for me with -apple-system on macOS -- the added dot is clearly visible -- but with Segoe UI from Windows, the extra dot disappears and so the spoof succeeds.) And in any case, even with perfect font support, the case of dotless-i + combining dot remains problematic.

It's not just U+0307, either. Consider U+0308 (combining dieresis), which could be applied to U+0131 dotless-i to produce a perfect spoof of ï (U+00EF), because the canonical decomposition of U+00EF is to <i, dieresis> and not <dotless-i, dieresis>. The same goes for other accents that can be applied to dotless-i to spoof accented forms of (dotted) i. Yet we can't just ban dotless-i either, as it is essential for languages such as Turkish.

If we're not prepared to just punt this kind of thing to the registrars, who (IMO) really should have policies that enforce bundling of names like this that are valid per IDN rules, but clearly "lookalikes" of another name, then perhaps disallowing combining marks in Latin script is the most reasonable mitigation. I imagine there'll be some (obscure but) legitimate use-case somewhere that will be impacted, but that'll be the price everyone has to pay for living in a messy world.

The question then arises whether we should consider doing the same for Cyrillic and Greek, which also have a few letters where a combining dot above might plausibly be "hidden", depending on the quality of font support. Examples like "ϊ̇" (Greek) and "ї̇" (Cyrillic) appear safe for me with -apple-system, as the added dot (correctly) appears stacked above the inherent dieresis; but with Segoe UI, the added dot overprints the right-hand dot of the dieresis and becomes virtually unnoticeable. And like with Latin, it could be argued that all the accented letters likely to be needed for "genuine" domain names are encoded in their own right, so any combining marks that remain after NFC normalization are most likely undesirable.

Gerv, WDYT? Is disallowing combining marks for the LGC scripts a reasonable compromise between functionality and safety here?
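The normalization behaviour described in this comment can be checked with a short Python sketch using the standard `unicodedata` module. The helper names `nfc` and `leftover_marks` are hypothetical and exist only for this illustration; the sketch shows which sequences keep a combining mark after NFC, and says nothing about how any browser actually decides what to display.

```python
import unicodedata


def nfc(text: str) -> str:
    return unicodedata.normalize("NFC", text)


def leftover_marks(text: str) -> list:
    """Code points of combining marks (Mn/Mc/Me) that survive NFC."""
    return [
        f"U+{ord(ch):04X}"
        for ch in nfc(text)
        if unicodedata.category(ch).startswith("M")
    ]


# U+00EF decomposes canonically to <i, U+0308> ...
print(unicodedata.decomposition("\u00ef"))   # 0069 0308
# ... so dotted i + dieresis composes fully and leaves no mark behind,
print(leftover_marks("i\u0308"))             # []
# while dotless i + dieresis has no pre-composed form and keeps its mark.
print(leftover_marks("\u0131\u0308"))        # ['U+0308']

# The Greek and Cyrillic examples keep the extra dot above after NFC.
print(leftover_marks("\u03ca\u0307"))        # ['U+0307']
print(leftover_marks("\u0457\u0307"))        # ['U+0307']

# Decoding the label cited from bug 1370497 comment 2.
decoded = "xn--mozlla-r9a478a"[4:].encode("ascii").decode("punycode")
print([f"U+{ord(c):04X}" for c in decoded if ord(c) > 127])  # ['U+0131', 'U+0307']
```

A non-empty result from `leftover_marks` is the condition the comment describes as "combining marks that remain after NFC normalization".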
Flags: needinfo?(jfkthame) → needinfo?(gerv)
(In reply to Jonathan Kew (:jfkthame) from comment #2)
> Gerv, WDYT? Is disallowing combining marks for the LGC scripts a reasonable
> compromise between functionality and safety here?

I think we should do what I said in another bug, which is a) measure impact, and b) consult with other browsers and language experts, and try and form a coordinated approach. Perhaps we need to consolidate all the remaining bugs (after the highly restrictive stuff and the combining marks script properties stuff) into a list, and then take a holistic approach to brainstorm restrictions which might help with entire classes, and then start a discussion. At this point, I don't really see this as being something which needs discussing in private any more.

And we do need to remember Gijs's very good point, which is that whatever domains are used for phishing, it often doesn't matter anyway, and Safe Browsing tends to pick them up pretty quickly. So fixing "everything" should be an explicit non-goal.

Gerv
Flags: needinfo?(gerv)
Group: firefox-core-security
Status: UNCONFIRMED → NEW
Ever confirmed: true
Priority: -- → P3
(In reply to Khalil Zhani from comment #4)
> See also https://bugs.chromium.org/p/chromium/issues/detail?id=750239

That bug is not public :-(

Gerv
(In reply to Gervase Markham [:gerv] from comment #5)
> (In reply to Khalil Zhani from comment #4)
> > See also https://bugs.chromium.org/p/chromium/issues/detail?id=750239
>
> That bug is not public :-(
>
> Gerv

Hmm, here is the CL link: https://chromium-review.googlesource.com/c/chromium/src/+/709919

I don't know why this bug is public. Isn't it a security bug?
Severity: normal → S3
See Also: → 1473911