User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0
Build ID: 20161209094039

Steps to reproduce:

See https://www.xn--80ak6aa92e.com/ - if you compare with https://www.apple.com/, the URL looks identical on Windows and Linux. On OS X, the font is slightly different, and it is potentially possible to distinguish between the two. The domain xn--80ak6aa92e.com is currently proxying requests to apple.com to demonstrate how difficult it is to distinguish the malicious domain. I will take it offline in the near future. This issue also exists in Chrome and has been reported to them as well. On Safari, the URL does not appear as "apple.com", likely because Safari interprets some of the characters as belonging to a different language.
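For anyone wanting to inspect the two forms of the domain above, here is a sketch using Python's stdlib punycode codec (this handles only the RFC 3492 bootstring encoding, not the full IDNA processing a browser performs):

```python
# The ACE (ASCII-compatible encoding) label is the part of
# xn--80ak6aa92e after the "xn--" prefix.
ace_label = "80ak6aa92e"
unicode_label = ace_label.encode("ascii").decode("punycode")

# The decoded label is five Cyrillic letters that render like "apple":
# U+0430 U+0440 U+0440 U+04CF U+0435 (а р р ӏ е)
assert unicode_label == "\u0430\u0440\u0440\u04cf\u0435"

# Round-tripping reproduces the ACE form.
assert unicode_label.encode("punycode") == b"80ak6aa92e"
```

This makes concrete why the spoof works: the URL bar shows the decoded Unicode form, which is visually indistinguishable from the Latin "apple".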
Valentin, any idea why the IDN URL stuff we do doesn't detect this as a homograph attack? Reporter, can you link to the chrome issue (which most of us likely can't see because it'll be confidential) so we can coordinate disclosure etc.? (CC'ing :dveditz and :abillings for this, and for moving this somewhere with the right sec group)
(In reply to :Gijs from comment #1)
> Valentin, any idea why the IDN URL stuff we do doesn't detect this as a
> homograph attack?

To be honest I don't know the IDN code well enough to be sure. It seems in the past our approach has been to blacklist the characters. Patch incoming.
Created attachment 8829419 [details] [diff] [review] Update blacklist pref MozReview-Commit-ID: re2Gs83qLT
I'm not happy about just blacklisting the "ӏ". Couldn't it be used in a legitimate Cyrillic domain? https://www.аррӏе.com/ is an example of a whole-script homograph, which our IDN display code is not designed to protect against -- for example https://www.асе.com/ spoofs https://www.ace.com/
(In reply to Simon Montagu :smontagu from comment #4) > https://www.аррӏе.com/ is an example of a whole-script homograph, which our > IDN display code is not designed to protect against -- for example > https://www.асе.com/ spoofs https://www.ace.com/ Indeed. Our IDN threat model specifically excludes whole-script homographs, because they can't be detected programmatically and our "TLD whitelist" approach didn't scale in the face of a large number of new TLDs. If you are buying a domain in a registry which does not have proper anti-spoofing protections (like .com), it is sadly the responsibility of domain owners to check for whole-script homographs and register them. We can't go blacklisting standard Cyrillic letters. If you think there is a problem here, complain to the .com registry who let you register https://www.xn--80ak6aa92e.com/ . Gerv
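To illustrate what "whole-script homograph" detection amounts to mechanically, here is a toy sketch in the spirit of the UTS #39 "skeleton" idea. The mapping below is hand-picked to cover only the letters in the examples above; a real implementation would use the full Unicode confusables.txt data (e.g. via ICU's uspoof API), not this subset:

```python
# Toy confusable mapping: a few Cyrillic letters and their Latin
# lookalikes. This is illustrative only, not the real UTS #39 data.
CONFUSABLE = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u04cf": "l",  # CYRILLIC SMALL LETTER PALOCHKA
}

def skeleton(label: str) -> str:
    """Map each character to its Latin lookalike, if any."""
    return "".join(CONFUSABLE.get(c, c) for c in label)

# The whole-script homographs from the comment above:
assert skeleton("\u0430\u0441\u0435") == "ace"              # асе vs ace
assert skeleton("\u0430\u0440\u0440\u04cf\u0435") == "apple"  # аррӏе vs apple
```

The point of the example is that such a check can only tell you two labels *could* be confused; it cannot tell you which one is the legitimate domain, which is why this is usually considered a registry-side problem.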
https://bugs.chromium.org/p/chromium/issues/detail?id=683314 As mentioned by :Gijs, bug is currently confidential
Comment on attachment 8829419 [details] [diff] [review] Update blacklist pref Review of attachment 8829419 [details] [diff] [review]: ----------------------------------------------------------------- r-, since Gerv and I agree that this approach is wrong.
FYI, Chromium has this CL: https://codereview.chromium.org/2683793010 . It affects about 2,800 more domains out of ~ 1 million IDN domains in .com TLD.
(In reply to Jungshik Shin from comment #8) > FYI, Chromium has this CL: https://codereview.chromium.org/2683793010 . It > affects about 2,800 more domains out of ~ 1 million IDN domains in .com TLD. Gerv/Valentin, is this something we can/should align with Chromium on?
FWIW, I think this is something that the security team should decide.
we can wontfix this later, once a final decision about comment 9 has been taken.
(In reply to :Gijs from comment #9)
> Gerv/Valentin, is this something we can/should align with Chromium on?

I would say no, and here's why. Some (many?) responsible registries implement bundling/blocking of homographic domains. In those registries, the owner of www.<something>.tld will always be the same as the owner of www.<cyrillic-lookalike>.tld. People may well have bought these domains in good faith and be using them for their businesses. Why should they be penalised because other registries are not doing the sensible thing?

It's a known and accepted issue that our current system does not suppress whole-script homographs in TLDs where the registry refuses to implement proper anti-spoofing controls. That was considered an acceptable tradeoff in order to hit the following goals:

* Not have first and second-class scripts in IDN, but have everything that is supported at all be first class
* Make it so that if an IDN works in one Firefox, it works in them all (certainty for site operators)

If we start putting restrictions on scripts which happen to look like Latin, such as Cyrillic, we are making that script a second-class citizen because not as much can be represented using it. If the Internet had started as a Russian invention, and Latin was late to the party, I think we'd be pretty annoyed at being treated that way. So I don't think we should treat Cyrillic that way.

Gerv
On the other side, it is also a matter of balancing breaking some rare domains that are already broken in the browser with the largest market share, vs. providing added phishing protection to most users. Fwiw, could we also evaluate a UI fix where the domain keeps working, but for these specific chars we don't actually decode them? Btw, ni? dveditz to evaluate the global discussion from a security point of view.
I think maybe we should do something about it. There's almost no way of differentiating between the ascii and IDN domains in comment 0, and unless the registrars do something about it immediately, users will be at risk.
It's not clear to me why we trust the registrars here, since they've clearly not shown competency thus far and indeed other browsers are not trusting them for those reasons. Trusting registrars also does not help us with any kind of subdomain situation.
"Trusting the registrars" would better be put as "making it clear the registries are responsible for their own actions". We went through a period where we tried to solve this problem completely, by having a TLD whitelist of sensible TLDs. Then, the number of TLDs exploded and that didn't scale any more. We were faced with a choice of breaking some of the conditions I outlined above, which would be bad for users of non-Latin languages and domain owners in general, or resolving to solve the problem as far as we could and, for the remaining edge cases, make it clear if it ever came up that it's the registry which is causing the problem by their own practices, and telling anyone upset where the real blame lies. We went for the second option. I freely admit that our current mechanism doesn't solve whole-script spoofing. This is not a surprise - it was a known and accepted fact when we implemented it. See https://wiki.mozilla.org/IDN_Display_Algorithm#Downsides . Gerv
(In reply to Marco Bonardo [::mak] from comment #13)
> Fwiw, could we also evaluate a UI fix where the domain keeps working but for
> these specific chars we don't actually decode them?

What do you mean "keeps working"? No one is suggesting refusing to connect to a web site; we're talking about whether we display the IDN form as our current algorithms say (the example has no mixed scripts) or whether we display the uglified punycode form.

Chrome's fix is to collect all the Cyrillic letters in a label and then see if they are all in the set of 22 confusables. If they are and the TLD is ascii then they show punycode. If they find a Cyrillic letter outside that set then they let the normal IDN algorithm make the decision about allowed script mixing.

аррӏе.com would be punycode
аррӏе.ru would be punycode
аррӏе.рф would be IDN

Advantages: protects users from registries not doing their job; protects against sub-domain label spoofing, where the registry has no say in any case.

Disadvantages: will uglify at least 2800 .com domains. Do we know how many are legit vs spoofing demonstrations like аррӏе.com? More concerning are the unknown number in other ascii TLDs like .ru, .ua, etc. Given 22 letters to play with, I would imagine a large number of legit Russian words fit in that set. It looks like some of those registries may only allow ascii domains on the ascii TLD and restrict the use of cyrillic to their cyrillic TLD (don't hold me to it--was skimming). On the other hand, the .eu registry definitely accepts cyrillic (Bulgaria is a member), so that could be a problem.

(In reply to Gervase Markham [:gerv] from comment #12)
> It's a known and accepted issue that our current system does not suppress
> whole-script homographs in TLDs

Should we unhide this bug, then?
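For clarity, the decision rule described above can be sketched as follows. The lookalike set here is a small illustrative subset, not Chrome's actual 22-letter list, and the script test is a crude Cyrillic-block range check:

```python
# Sketch of the heuristic described above: if every Cyrillic letter in a
# label is a Latin lookalike AND the TLD is pure ASCII, show punycode.
# Otherwise defer to the normal IDN display algorithm.
LATIN_LOOKALIKE_CYRILLIC = set(
    "\u0430\u0441\u0435\u043e\u0440\u0445\u0443\u04cf"  # а с е о р х у ӏ
)

def is_cyrillic(ch: str) -> bool:
    # Crude check: basic Cyrillic block only.
    return "\u0400" <= ch <= "\u04FF"

def show_punycode(label: str, tld: str) -> bool:
    cyr = [c for c in label if is_cyrillic(c)]
    if not cyr:
        return False          # no Cyrillic: normal rules apply
    if not tld.isascii():
        return False          # Cyrillic TLD: show the IDN form
    return all(c in LATIN_LOOKALIKE_CYRILLIC for c in cyr)

# The three examples from the comment above:
apple = "\u0430\u0440\u0440\u04cf\u0435"                  # аррӏе
assert show_punycode(apple, "com") is True                # punycode
assert show_punycode(apple, "ru") is True                 # punycode
assert show_punycode(apple, "\u0440\u0444") is False      # .рф stays IDN
```

Note that a label containing even one Cyrillic letter outside the lookalike set falls through to the normal mixed-script rules, which is what makes the heuristic narrow enough to spare most legitimate Cyrillic words.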
> That was considered an acceptable tradeoff in order to hit the following goals:
>
> * Not have first and second-class scripts in IDN, but have everything that
>   is supported at all be first class
> * Make it so that if an IDN works in one Firefox, it works in them all
>   (certainty for site operators)

Adopting Chrome's fix nails the second but fails on the first. We'd have to take the extra step of disallowing ascii confusables if the TLD is cyrillic, but that's only legalistically fair, and less so if you take the history of .com dominance into account.
(In reply to Daniel Veditz [:dveditz] from comment #17) > Chrome's fix is to collect all the Cyrillic letters in a label and then see > if they are all in the set of 22 confusables. If they are and the TLD is > ascii then they show punycode. I wonder if we could always show the idn form as we currently do, but also add the punycode somewhere in the UI, only for these cases, where all the chars are in that 22 set. For example before the url in the urlbar, like [(www.xn--80ak6aa92e.com/) https://www.apple.com/...] Or we could dash-underline the domain and on mouseover it may show the punycode form in a tooltip. Or we could have a small icon appearing before the url to toggle between the 2 modes (idn or punycode), just for these 2800 cases. There are various visual ways we could handle this, I guess, just to let the user notice the domain could be mis-interpreted.
We can certainly open this bug if you like. Complicating the UI for a very edge case doesn't seem like the right fix, both in UI complexity and in time/effort trade-off terms. I have not yet seen any instance where anyone has lost anything of value due to IDN whole-script homograph spoofing. I suspect I will never see such a case. There are a pretty limited number of such domains (for Cyrillic, at most 1 and normally 0 per important domain you want to spoof, e.g. https://www.xn--80ak6aa92e.com/ is the only one for apple.com). Phishers don't need this sort of trick - they make enough money without it. I really think that WONTFIX is the right resolution for this bug. Gerv
I definitely agree that special purpose UI goes too far and probably would not be understood by the majority of users. Perhaps blaming registrars and making use of Safe Browsing to protect users is the way to go. It definitely seems problematic to harm legitimate use in the process.
(In reply to Anne (:annevk) from comment #20) > Perhaps [...] and making use of Safe Browsing to protect users is the way to go. If Chrome ships the fix Jungshik mentioned in comment 8 they won't need to put these kind of spoofs in SafeBrowsing. Would they add it for us anyway?
Presumably they'd still do that for actual scamming sites. We don't rely on the user to tell those apart from other sites today.
As the Wordfence article says: why, to fix this, do you not set the network.IDN_show_punycode parameter to true?
I see this bug has been discussed for a long time. Why can't the solution be, or at least start from, setting the network.IDN_show_punycode parameter to true? It currently defaults to false, and this seems to allow the phishing issue to exist. You can read more about this on the WordFence blog, which says that changing the parameter I mentioned solves the issue... so why not change this parameter and default it to true for everyone in the next Firefox version? https://www.wordfence.com/blog/2017/04/chrome-firefox-unicode-phishing/
(In reply to Marco from comment #30) > I see this bug has been long discussed. > Why the solution can be or start from set the parameter > network.IDN_show_punycode set to true? Er, no, because that would make all non-Latin domain names show as gibberish. That's not really a good thing for people, countries and languages which don't use Latin letters. We want every script and language to be treated equally on the Internet. Gerv
:gerv would adopting comment #20 satisfy the issues regarding certain languages being treated as second-class citizens?

It looks like we should update  to state that the work has been implemented. I can't see .com in my whitelist prefs; does that mean the whitelist is out of date? The commit that corresponds with :dveditz's comment appears to be , and there don't seem to be many other changes to their url_formatter.cc 

Perhaps we could do a phased rollout of this algorithm for a TLD list if we wanted to audit the impact on domains? Has there been any further work on reaching a consensus across browsers on how these should be handled, as mentioned in ?

https://wiki.mozilla.org/IDN_Display_Algorithm
http://www.mozilla.org/projects/security/tld-idn-policy-list.html
https://chromium.googlesource.com/chromium/src/+/08cb718ba7c3961c1006176c9faba0a5841ec792%5E%21/#F0
https://chromium.googlesource.com/chromium/src/+log/60.0.3072.2/components/url_formatter/url_formatter.cc
(In reply to Jonathan Kingston [:jkt] from comment #32)
> :gerv would adopting comment #20 satisfy the issues regarding certain
> languages treated as second class citizens?

My current proposal is that Mozilla issue a statement to the effect that whole-script spoofing is a registry responsibility. I am just confirming we have consensus before drafting something.

> It looks like we should update  to state that the work has been
> implemented.

Good idea - done.

> I can't see .com in my whitelist prefs does that mean the
> whitelist is out of date?

 has a big "this is no longer relevant" sign at the top.

> Has there been any further work on making a consensus across browsers on how
> these should be handled as mentioned in ?

Chrome basically adopted the same policy as us a few years after we did, although as noted above, they've recently implemented a moderate tightening in very specific circumstances.

Gerv
>  has a big "this is no longer relevant" sign at the top.

I did see this; however, I was a little confused by the following two statements.  states this, which suggests some form of whitelist is in effect:

> If a TLD is in the whitelist, we unconditionally display Unicode.

 states:

> The whitelist mechanism still remains in the product for backwards compatibility

So, for clarity: is the whitelist still used? Looking at the code, .com has been removed from it (and perhaps others have too). Thanks for updating the wiki!

> Chrome basically adopted the same policy as us a few years after we did, although as noted above, they've recently implemented a moderate tightening in very specific circumstances.

Perhaps we should write this into something that resembles a standard (granted, standards on UI are not favoured, but this is somewhat of an edge case) so that anything that displays a URL could choose to follow it. For example, a domain registrar might choose to implement a more stringent restriction than the registry to reduce the number of complaints caused by punycode domain names.
UTS #39 is the standard, basically. (And it allows for variants which might be more appropriate for other actors such as registries.) I believe the whitelist is still used, although it may be OK to remove it. Gerv
I am sad, and a little disappointed, to read that someone is considering closing this security issue as WONTFIX. Why is the Microsoft Edge browser not affected by this security issue while Firefox and Chrome are? It is public now... using certain domains or characters, it is possible to show the same domain... I cannot believe a browser can show two completely different websites as www.apple.com... It is as if two completely identical postal addresses existed for delivering mail: this will always create problems, and I believe the registries will not solve this issue, so this behavior can be used for phishing in the future. I do not believe this should be closed as WONTFIX.
(In reply to Marco from comment #36) > I AM sad and little bit deluded by reading someone is considering to close > this security issue as wontfix. > Why Microsoft Edge browser are not affected by this security issue and > Firefox and Chrome is affected? Perhaps because Microsoft has decided to make the other choice. They privilege certain languages over others, depending on the context. As I understand it, they allow or disallow certain scripts based on the UI language of the copy of Windows you are running. The result of this, of course, is that if you buy an IDN domain, you can't know how many of your customers using Microsoft browsers it will work for, and how many will see garbage in the URL bar. We don't believe that creating this uncertainty for all non-Latin domain names is a good thing for the internationalization of the web. It certainly makes non-Latin languages second-class. Gerv
Safari is not affected, and Vivaldi seems not to be affected... there are several browsers that are not affected... Anyway, I think this should be resolved by someone, and the browser should not show two completely different domains as the same domain. Chrome may well fix this security issue in their bug, which is not accessible to everyone, maybe because it really is a security issue... I hope this issue will not be allowed to remain in Firefox.
(In reply to Gervase Markham [:gerv] from comment #37) > As I understand it, they allow or disallow certain scripts based on the UI > language of the copy of Windows you are running. I very much agree with this solution (or at least, something along the same lines as this solution). As someone who doesn't speak any languages which use a Cyrillic character set, a feature which renders Cyrillic characters in domain names provides no positive value for me. A domain name consisting entirely of Cyrillic characters is just as meaningless to me as xn--80ak6aa92e.com is. So for me, this feature not only opens me up to phishing attacks that I would normally be able to easily detect, but it's also useless. I understand that for those users who do speak a language which uses Cyrillic characters, rendering those characters properly in domain names is important. But for everyone else, it's worse than useless. So why not enable rendering Cyrillic characters based on the language settings the user has set in the host OS or browser? That way those who can benefit from seeing Cyrillic characters properly rendered will receive those benefits, while those for whom rendering those characters would do nothing but open them up to another attack vector can avoid that problem at no cost to them.
(In reply to Ajedi32 from comment #39)
> As someone who doesn't speak any languages which use a Cyrillic character
> set, a feature which renders Cyrillic characters in domain names provides no
> positive value for me.

However, there are many people using the US English version of Windows who do. The matching between the OS language and the user's spoken language(s) or understood scripts is deeply imperfect. People understand multiple languages and scripts. People use other people's computers. Speakers of non-English languages sometimes prefer to use their computers in English. I'm not sure there's a good way to fix this. We can hardly have a quiz at browser startup - "which of these characters do you recognise?"

Gerv
Well, you could just do what Edge does and ask the user whether to allow a particular character set the first time they see a domain using those characters. https://i.imgur.com/LYifRig.png You also don't necessarily _have_ to rely on the OS settings, browser settings would also work.
> UTS #39 is the standard, basically. (And it allows for variants which might be more appropriate for other actors such as registries.)

It appears the Firefox algorithm is still fully captured by isLabelSafe  (the whitelist is in this file also). Chrome's algorithm is roughly , with the notable exemptions above. I couldn't spot the following steps in Firefox:

a. If there is an error in ToUnicode conversion (e.g. contains disallowed characters, starts with a combining mark, or violates BiDi rules), punycode is displayed.
b. If the component contains either U+0338 or U+2027, punycode is displayed.
c. If there are any invisible characters (e.g. a sequence of the same combining mark or a sequence of Kana combining marks), punycode is shown.
d. Test the label for mixed-script confusables per UTS #39. If a mixed-script confusable is detected, show punycode.
e. If the label matches a dangerous pattern, punycode is shown.

a. Looks like this is covered by stringPrep 
b. Interestingly, Chrome added the hyphen  to this exclusion list, and their code references the even older mozillazine page 

It could be that steps b-e are handled by other code, or I'm not understanding it. Either way, I can't see tests for all of these. I also couldn't see that Google had implemented bug 737931 either. Steps b, c and e don't appear to be in UTS #39 and are certainly part of Google's new narrowing mentioned above (hence my point about standards). However, I need to rest and read the standard further, as UTS #39 appears to stack onto the work of other Unicode documents and RFCs.

We also appear to have rust-url checked in too; however, I don't think that matters here.
http://searchfox.org/mozilla-central/source/netwerk/dns/nsIDNService.cpp#782-871
http://www.chromium.org/developers/design-documents/idn-in-google-chrome
https://chromium.googlesource.com/chromium/src/+/master/components/url_formatter/url_formatter.cc#513
http://kb.mozillazine.org/Network.IDN.blacklist_chars
http://searchfox.org/mozilla-central/source/netwerk/dns/nsIDNService.cpp#532
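Steps (b) and (c) above can be sketched briefly. The following is an illustrative check, not the actual Chrome code; it covers only the explicit U+0338/U+2027 exclusions and the "same combining mark repeated" case, using the stdlib `unicodedata` module:

```python
import unicodedata

# Explicit exclusions from step (b): COMBINING LONG SOLIDUS OVERLAY
# and HYPHENATION POINT.
EXCLUDED = {"\u0338", "\u2027"}

def looks_suspicious(label: str) -> bool:
    """Flag labels containing excluded characters or the same
    combining mark twice in a row (which can render invisibly)."""
    if any(c in EXCLUDED for c in label):
        return True
    prev = None
    for c in label:
        if unicodedata.combining(c) and c == prev:
            return True
        prev = c
    return False

assert looks_suspicious("a\u0338b") is True          # contains U+0338
assert looks_suspicious("e\u0301\u0301") is True     # doubled acute accent
assert looks_suspicious("caf\u00e9") is False        # precomposed é is fine
```

The real Chrome implementation handles more cases (Kana combining mark sequences, dangerous patterns), but this shows the shape of the check.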
It could be just a "Warning" that appears if a website is using a domain made up entirely of non-Latin characters, like https://www.аррӏе.com/. It's not a warning that would show up very often, so it would not disturb users; it also would not remove access to any website, but at least users would be warned to double-check whether the site is safe or not. Regards.
> making use of Safe Browsing to protect users is the way to go.

I agree with this. I've been trying to persuade SafeBrowsing to do more on IDN spoofing protection, but I haven't been successful. SafeBrowsing has a lot more information at hand than poor web browsers (whose only data is just 'names'). My Chromium fix is an ad-hoc attempt until I can convince SafeBrowsing to take up this issue on their end (mostly).

There is a downside to depending entirely on SafeBrowsing, though. It cannot be quick enough to protect a user against a targeted attack, where a domain is registered and used to attack a specific user within, say, a few hours of the domain's creation.

As for registrars, well... I hope they'll be more vigilant than they have been...
> I also couldn't see that Google had implemented Bug 737931 either. Yes, Chromium did by virtue of ICU's uspoof API.
re comment 41: It protects users who do not speak any language written in Cyrillic (or who say No to the prompt), but Russian/Bulgarian/Ukrainian/etc speakers are still exposed to a potential attack. Without a user-prompt, that's what Chromium used to do (Accept-Language based approach).
(In reply to Jungshik Shin from comment #47) > re comment 41: > > It protects users who do not speak any language written in Cyrillic (or who > say No to the prompt), but Russian/Bulgarian/Ukrainian/etc speakers are > still exposed to a potential attack. > > Without a user-prompt, that's what Chromium used to do (Accept-Language > based approach). Yeah, it's not a full solution, but it does reduce the attack surface significantly. Users would only be able to be fooled by characters from languages they actually speak. Unless someone can think of a way to 100% solve this issue, it's probably best to take the approach of trying to mitigate the problem in as many ways as possible. Not exposing users who don't require IDNs to this attack vector is one of several layered mitigations you could apply.
(In reply to Ajedi32 from comment #41)
> Well, you could just do what Edge does and ask the user whether to allow a
> particular character set the first time they see a domain using those
> characters. https://i.imgur.com/LYifRig.png

This is just another way of making Cyrillic a second-class citizen on the web. Also, what if someone who understands Cyrillic uses the computer and turns on the setting, then someone who doesn't uses it later? Or someone who doesn't understand it says "no", and then someone who does comes along? And whenever the setting to display Cyrillic is set, аррӏе.com and apple.com will still look identical, so it provides no actual protection for people who understand Cyrillic. (Just because someone understands Cyrillic doesn't make the two domains magically look different to them.) It's not a good solution.

> You also don't necessarily
> _have_ to rely on the OS settings, browser settings would also work.

Browser settings are also a poor proxy for "what the user actually understands". This is why the Accept-Language header never worked well. (https://wiki.whatwg.org/wiki/Why_not_conneg)

Gerv
Anne has been suggesting for a while that we take a page from Safari, and only display the domain name in the URL bar. If we do that, we could easily _also_ display the punycode for IDNA domains. Maybe this would be a nice change for the 57 release?
We could, but should we? I don't think Punycode is useful to end users. It's meant as an interchange format. It's only useful to expose in the rare spoofing case being discussed here, which really should be solved through Safe Browsing and registrars upping their game. Other than that it would end up making the address bar more cluttered for non-Latin-alphabet users, which is also not reasonable.
Gerv, do you know if we are also handling the issues mentioned here: https://www.iab.org/documents/correspondence-reports-documents/2015-2/iab-statement-on-identifiers-and-unicode-7-0-0/

Thanks, Jungshik, for the clarifications. I further agree with Anne and Gerv's stance that showing punycode should be a last resort, used only to highlight domains that are clearly designed to confuse. Also, explaining to users why they should pick the correct language, and potentially all the languages they understand, would be unnecessary noise and is unlikely to solve the issue for all languages.
:jkt: the conclusion to that document seems to be: "the IAB recommends (as a temporary measure) not using hamza above a base character, in either precomposed or decomposed form, in any new identifier when user language information is not available." The question is whether that advice is aimed at registries or at us. It seems at first glance to be aimed at registries. The trouble is, if we add the hamza-using characters (and decomposed hamza) to our banned character list, it will be a long time before those characters can be reliably used in domain names even if the problem is solved and we remove it again. Gerv
(In reply to Gervase Markham [:gerv] from comment #49)
> (In reply to Ajedi32 from comment #41)
> > Well, you could just do what Edge does and ask the user whether to allow a
> > particular character set the first time they see a domain using those
> > characters. https://i.imgur.com/LYifRig.png
>
> This is just another way of making Cyrillic a second-class citizen on the
> web.

Well, no. You could just as easily treat Latin characters the same way. If the user's OS is set to a language which doesn't use Latin characters, don't display them until the user explicitly allows it. (Yes, I realize that for Latin characters that'd be a bit ridiculous for a variety of technical reasons, but if you're so concerned about making sure every character set is treated equally by the browser then this would be one way of doing that.)

> Also, what if someone who understands Cyrillic uses the computer, and turns
> on the setting, then someone who doesn't uses it later?

Fair point, this clearly wouldn't help in this scenario. How common do you suppose this situation actually is, though? I know _I_ don't have people who speak a language which uses Cyrillic characters using my computer. Others might, but as per my previous comment (comment #48), this idea isn't intended as a 100% solution, just a way of reducing the attack surface.

> Or someone who doesn't understand it says "no", and then someone who does comes along?

With the implementation in the screenshot I linked (https://i.imgur.com/LYifRig.png) this wouldn't be an issue. The "this domain uses characters from another language" indicator icon would continue to be displayed next to the punycode, allowing the user to easily view the fully-rendered version of the domain name and/or change language settings at any time.

> And whenever the setting to display Cyrillic is set, аррӏе.com and apple.com
> will still look identical, so it provides no actual protection for people
> who understand Cyrillic.
> (Just because someone understands Cyrillic doesn't
> make the two domains magically look different to them.)

Again, see comment #48.
Whatever you do, please do not disable the display of IDNs entirely and make punycode the default, as suggested by the (horribly U.S.-centric, borderline offensive) Wordfence post that started all this discussion again. You could allow users who want that to opt in, but not as a default. At most, you could warn about whole-script confusables in some way, though I agree that this would affect perfectly legitimate domain names as well; or, for a multi-script URL, you could even colour letters from different scripts in different colours (that would be enough to show some difference between the Latin and the Cyrillic URL without being too invasive or discouraging for Unicode users). For the rest, registries should just do their work, implement Unicode UTS #39, and not register strings that are confusable with already-existing domain names. And WONTFIX is a perfectly acceptable solution too.
This bug to my understanding is staying open for the following reasons: 1. To establish our stance related to policy 2. To establish the IDN differences between Firefox and Chrome As stated the current solution for people who mostly visit Latin based domains is to set "network.IDN_show_punycode" to true in about:config. If there is a need to discuss interface changes I suspect that should be another bug / mailing list. I suspect that isn't something we will ever do though. Gerv, I think there is still merit in establishing 1 and 2, is this the right place for that?
(In reply to Vittorio Bertola from comment #55)
> And wontfix is a perfectly acceptable solution too.

Strongly disagree. The mere possibility of two different domain names displayed in the browser's URL bar being visually indistinguishable from each other breaks one of the fundamentals of browser security: the ability of users to reliably determine which site they're communicating with. Without this ability, many other security features (such as HTTPS) become effectively useless against certain attacks. While I agree that disabling support for IDNs is not a good solution (and it seems like that won't happen, as there are plenty of much better alternatives suggested here), even that option would still be preferable to just ignoring this problem entirely.
Maybe Vittorio wants the security bug to remain so an attack can be carried out? I agree with Ajedi32 in comment 58. Jonathan, I do not understand where a security issue in Firefox should be discussed if not here, so I think this is the right place; if it is not, please tell me where to report and discuss this security issue. It is not acceptable that a browser shows two different domains as the same domain. I cannot understand why this is happening, and since there are browsers not vulnerable to this, Firefox should take example from those where this security issue does not exist! This is my view, and I hope Firefox will decide to put security first. Hackers, or whoever is interested in keeping this security issue in the browser, will not be happy about a fix... this leaves space for hackers... and it should be fixed, as the importance of security on the web is also growing. I find Vittorio's message in comment 55 offensive.
There is no perfect solution to this problem. Human languages are messy, inconsistent, and wonderful. Different scripts have letters which clash with each other. You can either say "Script A is more important than Script B - in case of a clash, Script A always wins", or you can live with the problem after reducing it as much as possible. We don't think it's fair to privilege one script over another. After all, if e.g. the Russians had invented the Internet and support for Latin letters came along later, would you like to be told that the Latin domain name you wanted wouldn't work in browsers because it happened to consist entirely of letters which clashed with Cyrillic? The reason some browsers are not vulnerable to this problem is that they don't support this technology properly. If a browser doesn't support <video>, they won't be affected by a security problem with the <video> element! If we decided to turn off IDNs entirely we'd also not have a problem, but that would have massive negative side effects for all web users who don't use Latin letters. The route to a solution here is for those with domain name databases, i.e. registries, to stop letting their customers attack one another. Thing is, the reason you are in here shouting at us rather than shouting at them is that we run things using an open community process, so we are easier to shout at. But that doesn't make us the right people to fix the problem. If you don't want to be attacked this way, buy a domain in a TLD which doesn't allow it. If your TLD does allow it, lobby your registry. In the mean time, Firefox users have Safe Browsing to protect them from actual phishing attempts, whether they use IDN lookalikes or not. Gerv
I am starting to think that a fix is possible, such as a warning as shown in comment 54, but it seems someone does not want this fixed. Gerv, you closed this bug as WONTFIX once and you seem to remain of that opinion; I am not of the same opinion, and I am happy to see that I am not the only one here looking and hoping for a fix. A solution can be found if people want one; it will not be found if things are decided this way. I hope to see this security issue fixed, or at least reduced with a warning, whether a browser warning or a different solution. The worst solution is to close this as WONTFIX. Registries will not solve this; the security issue is already happening, so responsibility for security lies in the browser, and the Chrome bug is not yet closed. I think (and hope) Chrome will also apply a solution, but I am a Firefox user, not a Chrome user, so what matters to me is here... or maybe the Edge browser will end up safer than Firefox.
Apologies if my comment has unnecessarily raised the tone of the discussion, it was not my intention. However, suggesting that some scripts are less important than others, or can be hampered to facilitate English-speaking users, *is* offensive and will be perceived that way by people and governments in several parts of the world. Moreover, if the existence of a given domain name is a security risk, then it should be addressed where that domain name is created, rather than asking each and every client to deal with the issue (when that domain name is used in email addresses or in server names, will you also require all email and SSH clients to warn about it?). I would rather encourage anyone who is not happy about the fact that ICANN policies still allow lazy/greedy registries to sell domain names that are confusable with existing ones to participate in the ongoing public consultation on the revision of the IDN registration guidelines: https://www.icann.org/public-comments/idn-guidelines-2017-03-03-en Actually, a comment from the Mozilla Foundation and/or some Firefox developers could be very useful.
I am unable to understand how the domain https://xn--e1awd7f.com/ can be shown as the same as https://www.apple.com. If I paste the characters https://xn--e1awd7f.com/ into the browser, they should not be converted to apple.com, which is a completely different address... maybe it is me who is ignorant on the subject, but... I cannot understand it. What I do understand is that if this is allowed, then all banks and other important websites can be copied and, with a different address, look like the same address: unicredit, BARCLAYS, mozilla, etc.
If you put https://xn--e1awd7f.com/ into Chrome, Chrome will also show the certificate as valid... and you want to learn from Chrome how to fix this, or is the issue fixed there? In Chrome the certificate is also accepted... have you tried to see how this address is shown in Chrome? It is impossible to know you are visiting a fake website once the page has loaded in the browser... this is why this behavior should come to an end, I believe.
(In reply to Gervase Markham [:gerv] from comment #60) > The route to a solution here is for those with domain name databases, i.e. > registries, to stop letting their customers attack one another. I disagree. While I do agree that domain registrars can and should be involved in addressing this, that cannot and should not be the main solution to this problem for several reasons. Firstly, regardless of what mistakes might be made by third parties, it's my general expectation that the user agent (the web browser) should always do its best to behave in a way that furthers the best interests of the user. There have been numerous examples in the past of browsers (including Firefox) intervening to protect users despite third parties being the ones responsible for creating the problem in the first place. For example, the recent distrust of WoSign was one such case where Firefox intervened to protect users (by distrusting new WoSign certs) despite a third party (WoSign) being the one responsible for creating the problem. Another example would be [this issue](https://bug623317.bugzilla.mozilla.org/show_bug.cgi?id=1295064) where people are currently discussing whether or not Firefox should disable autoplaying videos on mobile, despite website developers being the ones responsible for creating the problem by wasting bandwidth by loading and playing videos unnecessarily. Secondly, I'm not sure I even agree that addressing this issue is primarily the responsibility of domain registrars. While domain registrars can and should take steps to help defeat phishing whenever possible, browsers are the ones responsible for informing users of the domain of the site they are visiting. If the browser chooses to display two _different_ domains to the user in such a way as to make them visually indistinguishable from each other, I'm not sure that's really the domain registrar's fault.
Thirdly, even if domain registrars were to accept responsibility for this problem, I don't really trust them to take sufficient steps to prevent it. Just look at certificate authorities as an example. Despite the numerous regulations, audits, and certifications they have to submit to, occasionally some CAs will _still_ mess up and issue certificates in violation of the baseline requirements, endangering the security of users. In such cases, browsers typically intervene to protect their users by distrusting the root certificates of the offending CAs. If domain registrars make similar missteps in this case by selling misleading IDN domains to their customers, what recourse will browser vendors have? Are you going to maintain a whitelist of TLDs allowed to create IDN domains, and remove TLDs from that list if they consistently mess up? In previous comments it was suggested that you'd already tried that and determined it to not be a good idea. > If you don't want to be attacked this way, buy a domain in a TLD which doesn't allow it. That's great if you're a website operator, but it doesn't help at all if you're merely a user. Not to mention that moving to another TLD is hardly a trivial task, and some website operators may not even care enough about user security to make such a move. (Which again, comes back to my point about user agents intervening to protect users whenever possible despite the mistakes of third parties.) > the reason you are in here shouting at us rather than shouting at them is > that we run things using an open community process, so we are easier to > shout at. But that doesn't make us the right people to fix the problem. I'm unsure as to whether that statement was directed at me, but in any case I apologize if you perceive my comments here as "shouting". I certainly don't mean to come across that way, and I do appreciate the ability to be directly involved in these discussions.
(In reply to Marco from comment #63) > I AM unable to understand how the domain https://xn--e1awd7f.com/ can be > showed as the same as https://www.apple.com > > IF I paste into the browser the chatacter https://xn--e1awd7f.com/ this > should be not converted to apple.com who is completly different address... > > maybe it's me who I AM ignorant on the subject but... cannot understand... > but I understand that if this is allowed that all bank and other important > website can be copied and with a different address look the same address as > unicredit or BARCLAYS, mozzilla, etc. Marco, I'm starting to get the feeling that perhaps you are confused about what the intended purpose of this feature actually is. You may wish to read up on Internationalized Domain Names, and why they were implemented in the first place. https://en.wikipedia.org/wiki/Internationalized_domain_name Essentially, without this feature many characters from non-English languages cannot be used in domain names at all. (E.g. A person whose native language is Mandarin would not be able to create a domain name which uses Chinese characters.) In your example, the characters you see on screen, а, р, р, ӏ, е are not the same as the letters which make up the real apple.com's domain. They are instead characters from a different script which happen to resemble the Latin characters which make up Apple's name. That's why the solution to this issue is a bit more complicated than "let's just make xn--e1awd7f display as xn--e1awd7f". That's why, while I completely agree with you that IDNs should not be implemented in a way which allows two different domains to be displayed exactly the same way in the URL bar, others are reacting very negatively to your suggestion to turn off the feature entirely.
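A quick way to see that the lookalike string really is made of different code points is a short sketch using Python's standard `unicodedata` module (the two strings below are the spoof domain's label and the real one, as discussed above):

```python
import unicodedata

spoof = "аррӏе"  # the Cyrillic lookalike string from the spoof domain
real = "apple"   # plain ASCII Latin letters

# Near-identical glyphs in many fonts, completely different code points:
for s_ch, r_ch in zip(spoof, real):
    print(f"U+{ord(s_ch):04X} {unicodedata.name(s_ch)}"
          f"  vs  U+{ord(r_ch):04X} {unicodedata.name(r_ch)}")

print(spoof == real)  # False
```

The first pair printed, for example, is CYRILLIC SMALL LETTER A vs LATIN SMALL LETTER A, which is exactly why registering one does not give you the other.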
> That's why, while I completely agree with you that IDNs should not be implemented in a way which allows two different domains to be displayed exactly the same way in the URL bar, others are reacting very negatively to your suggestion to turn off the feature entirely. Exactly. > Actually, a comment from the Mozilla Foundation and/or some Firefox developers could be very useful. dveditz, Gijs, annevk and myself are all current developers for Firefox, to name a few. Gerv represents policy, was a developer for Firefox and, as I understand it, was the decider of our IDN policy in the first place. > I disagree. While I do agree that domain registrars can and should be involved in addressing this, that cannot and should not be the main solution to this problem for several reasons. As mentioned elsewhere, suggestions for how to fix this without becoming a global registrar would be welcome. So far I don't think there are any perfect solutions. As Firefox is one of the most translated browsers, I think it would do our users and contributors a disservice to treat any language as better. To me this significantly limits the solutions to: 1. Lobbying registrars/ICANN/similar to do the right thing 2. Working on tightening restrictions where it's clear it doesn't impact real domains 3. Using tools like Safe Browsing to create a blocklist of entries that clearly shouldn't be accessed The reason 2 is hard is that there are valid instances of brand names or dialects that could confuse users. These should probably be valid in the language of origin; in another language, however, they wouldn't be. The reason registrars are the right solution here is exactly that they have the systems and processes in place to prevent multiple registrations of similar domains; for example, .uk domains are restricted for sale at present so that .co.uk owners can purchase their own domain.
They also have the ability to rapidly check for all confusable matches at time of registration; there isn't a chance a user agent could do that before presenting a page to a user. Registrars could also reserve all confusable variants of a domain for the domain owner automatically. > Are you going to maintain a whitelist of TLDs allowed to create IDN domains, and remove TLDs from that list if they consistently mess up? In previous comments it was suggested that you'd already tried that and determined it to not be a good idea. It's unmaintainable. I worked for a domain registration company that was struggling to manage the pace of even deploying new extensions to search. At my last count there were 1500 extensions permitted by ICANN, each with differing restrictions and policies for IDN blocking.
What's wrong with the previous suggestion already made, which is that the raw punycode should be shown if: 1. The domain name uses punycode and 2. All characters are either ascii or ascii look-alike punycode and 3. The domain extension has no punycode in it I think in those cases it should be pretty obvious something fishy might be going on, no? Even if someone finds a legitimate use case for something caught by those rules, everything that applies to those three things is at high risk of being a phishing attempt, isn't it?
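The three rules above can be sketched in a few lines of Python. This is only an illustration: the lookalike set below is a tiny hand-picked sample of Cyrillic letters, whereas a real implementation would draw on the full Unicode TR39 confusables data, and `should_show_punycode` is a hypothetical name, not anything in Firefox:

```python
# A hand-picked sample of Cyrillic letters confusable with ASCII letters;
# the real confusables list (Unicode TR39) is far larger.
LATIN_LOOKALIKES = set("аеіорсухӏѕԁһ")

def should_show_punycode(unicode_host: str) -> bool:
    labels = unicode_host.split(".")
    tld = labels[-1]
    # Rule 3: the extension itself must be plain ASCII.
    if not tld.isascii():
        return False
    # Rules 1 + 2: some label contains non-ASCII characters, and every
    # character in that label is ASCII or an ASCII lookalike.
    for label in labels[:-1]:
        if label.isascii():
            continue  # no punycode needed for this label
        if all(ch.isascii() or ch in LATIN_LOOKALIKES for ch in label):
            return True
    return False

print(should_show_punycode("аррӏе.com"))   # True:  all-lookalike label
print(should_show_punycode("пример.com"))  # False: ordinary Cyrillic word
```

Note the objection raised later in the thread still applies: a legitimate all-Cyrillic word that happens to consist entirely of lookalikes would also trigger this check.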
Do we have the capability to detect on the browser side if a non-Latin-script domain name is a complete homograph of a Latin-script one? If so, would it be feasible to just not allow IDN for those? I know, my thinking here may be naive and we may not have the tools or it could get hairy (do we know the edge cases of this?) in some situations, but I believe it may be worth a thought.
Suppose the domain is a whole-script homograph made of characters outside the user's preferred language. That seems like a good moment to show a warning message in a popup. Also, using a different font for Cyrillic and Latin characters, the way OS X does it (as mentioned in comment 1), seems like an appropriate measure.
(In reply to jonas from comment #69) > What's wrong with the previous suggestion already made, which is that the > raw punycode should be shown if: > > 1. The domain name uses punycode and > 2. All characters are either ascii or ascii look-alike punycode and > 3. The domain extension has no punycode in it > > I think in those cases it should be pretty obvious something fishy might be > going on, no? Even if someone finds a legitimate use case for something > caught by those rules, everything that applies to those three things is at > high risk of being a phishing attempt, isn't it? What seems to be upsetting people the most is the use of a cert, so https makes them think it's trustworthy. We could limit the impact of "1." in the above list to sites using certs with xn--*.com as the domain (and other valid punycode matching 2 and 3 in the above list). In fact, I think in the https case (which is at an even higher risk of being a phishing attempt), where the punycode domain cert translates to ascii or an ascii look-alike, Firefox could just silently go to the *actual matching ascii https site*, as a kind of registrar/CA blacklisting method that won't impact the ones implementing proper anti-spoofing protections.
(In reply to Mardeg from comment #72) > What seems to be upsetting people the most is the use of a cert so https > makes them think it's trustworthy. If we limit the impact of "1." in the > above list to sites using certs with xn--*.com as the domain (and other > valid punycode matching 2 and 3 in the above list) At some point in the future--probably further out than I'd like, but coming nonetheless--every site will have a certificate and there will no longer be any use of insecure plaintext transmission.
(In reply to jonas from comment #69) > What's wrong with the previous suggestion already made, which is that the > raw punycode should be shown if: > > 1. The domain name uses punycode and > 2. All characters are either ascii or ascii look-alike punycode and > 3. The domain extension has no punycode in it > > I think in those cases it should be pretty obvious something fishy might be > going on, no? Even if someone finds a legitimate use case for something > caught by those rules, everything that applies to those three things is at > high risk of being a phishing attempt, isn't it? No. As noted in the first para of comment #60, it's perfectly legitimate to have a fully-Cyrillic .com address, and just because other parts of the internet use ascii doesn't mean we should 'break' all Cyrillic .com addresses that happen to only contain ascii-confusable Cyrillic characters. (In reply to Arthur Edelstein [:arthuredelstein] from comment #71) > Suppose the domain is a whole-script homograph made of characters outside > the user's preferred language. That seems like a good moment to show a > warning message in a popup. A number of downsides to this approach were already discussed in comment #47 and comment #49 et al. > Also, using a different font for Cyrillic and Latin characters, the way OS X > does it (as mentioned in comment 1), seems like an appropriate measure. This won't help at all, IMO. We already consider partial spoofing (low-severity) security bugs (e.g. when people find ways of scrolling the main domain out of view in the URL bar). It's not realistic to think that users can tell apart domain names when simply using a different font. (In reply to Robert Kaiser from comment #70) > Do we have the capability to detect on the browser side if a > non-Latin-script domain name is a complete homograph of a Latin-script one? I don't know. comment #46 seems to suggest that ICU has code for (parts of?) this. > If so, would it be feasible to just not allow IDN for those? 
That feels shaky. It would protect against some subclasses of this attack, but I suspect it wouldn't be sufficient (that is, a single character outside the "potential-spoofy-Latin" set would disable this protection, and it's questionable whether users would realistically notice the difference in that case). And it still has some of the same downsides of the suggestion in comment #69 (see earlier in my reply). There are words in Russian (and, I expect, other languages that use Cyrillic or Greek or other vaguely-like-Latin scripts) that consist entirely of homographs. Do we just stop users of those words/languages from using them effectively in domain names? That doesn't seem right, either. "Мат" and "Март" are both Russian words, but only the first is a homograph for an English word (though the second might be a homograph for non-English words that also use only Latin characters, I don't know). But the only point where there is an actual problem for users with this confusion is if "Мат.com" pretends to be "mat.com" - in other words, if the site is a spoof and/or malicious. This is why people are leaning towards using safebrowsing here - the ultimate test for "is this domain a malicious attempt to be something else" is related to the content, not the domain name. If we further accept that some uses of Cyrillic (or Greek, or ...) homographs are legitimate, it seems to me that there's really not much else the browser can do beyond what safebrowsing already does.
(In reply to :Gijs from comment #74) > (In reply to jonas from comment #69) > > What's wrong with the previous suggestion already made, which is that the > > raw punycode should be shown if: > > > > 1. The domain name uses punycode and > > 2. All characters are either ascii or ascii look-alike punycode and > > 3. The domain extension has no punycode in it > > > > I think in those cases it should be pretty obvious something fishy might be > > going on, no? Even if someone finds a legitimate use case for something > > caught by those rules, everything that applies to those three things is at > > high risk of being a phishing attempt, isn't it? > > No. As noted in the first para of comment #60, it's perfectly legitimate to > have a fully-Cyrillic .com address, and just because other parts of the > internet use ascii doesn't mean we should 'break' all Cyrillic .com > addresses that happen to only contain ascii-confusable Cyrillic characters. A .com domain with fully cyrillic characters where _each character is an ascii look-alike_? I am sorry, but I strongly disagree. If you register something that looks exactly like an ascii .com domain, you should have gotten the ascii .com domain. Even if you want to argue that some people who write cyrillic all day weren't aware of that and now already got the domain, the point is that there is no good way to allow such domains to NOT be browsed with a warning. After all, they are indistinguishable from a regular ascii domain, and that is after all why this bug report exists and why there could be phishing going on. While other languages should be respected, at some point you have to be realistic at what can be allowed without putting the user at risk.
Microsoft Edge seems to have a nice solution for that, by the way. If all characters are ascii look-alikes and there is no Cyrillic locale in use on the user's system, I strongly think there should be at least SOME sort of warning. Otherwise, phishing with such domains is simply unpreventable.
We now have an FAQ which makes our position clear: https://wiki.mozilla.org/IDN_Display_Algorithm_FAQ You may not agree with it, but it's our considered position, so please do not comment further here unless you have new information to add which you genuinely believe has not been considered. Gerv
Sorry, you caught me while writing, I just want to point this out to you: (In reply to Jonathan Kingston [:jkt] from comment #68) > > Actually, a comment from the Mozilla Foundation and/or some Firefox developers could be very useful. > > dveditz, Gijs, annevk and myself are all current developers for Firefox to > name a few. Gerv represents policy and was a developer for Firefox and as I > understand it was the decider of our IDN policy in the first place. Yes I know, what I meant is that it would be useful if you submitted a comment to ICANN's IDN guidelines consultation stating your problems with them allowing registries to sell whole-script confusables (the new guidelines draft now says that registries "may" apply Unicode TR-39 and block these registrations, but it's still at their discretion - it should say "must").
Sad to see that the final decision seems to be WONTFIX. It is incredible to think that a website can be shown in the browser with the same name and also a valid certificate, with no warning... so business seems to come first and security second... I understand this is a complex case, but I do not agree when I read that on the Internet it will still be possible, in the future, to use this kind of hole and browser vulnerability... to work out how to create new fake domains which are shown with a valid certificate (by the browser). You said that so far no big phishing or scam has been carried out... well, it seems we need to wait for a big scam or hack on the web before fixing issues... however, I am looking at this security issue as a user who knows the golden rule for staying safe is to check the browser address bar, and today... articles on the web are starting to say and demonstrate that you cannot trust the browser bar anymore, because it is possible to have two completely different websites that look like they have the same address. No warning, a valid certificate, the same web address... it is unbelievable to me. I love the Firefox browser, but today I am very disappointed... Security in this case seems not to be important, or not in first place. I think web addresses should not be repeatable. To me it is like saying that home addresses may be duplicated for usability, so that different addresses can look the same. When I discovered this security issue, I was surprised to see the browser show a completely unrelated address as identical to another... with the difference that you are redirected to another server. OK, I can stop following this bug and accept that this will not be fixed in Firefox. I will now monitor Chrome and other browsers... Firefox has a vulnerability now, for me and maybe not only for me... I believe it is right to wait for a fix, and to read in articles that a fix is awaited... but this fix seems never to come.
It is incredible that the certificate is still shown as valid in Chrome and in Firefox... so even an https website can be insecure, cloned with a different, very rare kind of address, a case that can grow in the future as the news of this vulnerability, left as it is now, ends up in the wrong hands. I am very disappointed today to read about the WONTFIX or the mere blacklisting... this is not a fix, but it is your decision... I will continue to search and look for security on the web and for a secure browser... sad to see Firefox has decided not to touch anything about this. End of my participation here.
Well, I personally think that the best idea is to check whether the whole domain has colliding scripts (primarily Cyrillic, Latin, and Greek; extend if needed) and, if so, axe the thing and show it as punycode.
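The "colliding scripts" check above can be sketched naively in Python, using the first word of each Unicode character name as a rough stand-in for the Script property (a real implementation would use ICU or the UCD Scripts.txt data). Note the caveat raised earlier in the thread: a whole-script homograph like the spoof domain in this bug uses only one script, so a mixed-script check alone would not catch it:

```python
import unicodedata

# Scripts whose letters commonly collide visually.
CONFUSABLE_SCRIPTS = {"LATIN", "CYRILLIC", "GREEK"}

def scripts_of(label):
    # "CYRILLIC SMALL LETTER A" -> "CYRILLIC", etc.; crude but illustrative.
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in label}

def has_colliding_scripts(label):
    return len(scripts_of(label) & CONFUSABLE_SCRIPTS) > 1

print(has_colliding_scripts("pаypal"))  # True:  Latin + Cyrillic 'а'
print(has_colliding_scripts("аррӏе"))   # False: whole-script Cyrillic
```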
> You may not agree with it, but it's our considered position, so please do not comment further here unless you have new information to add which you genuinely believe has not been considered. Here's something I haven't seen mentioned in this bug before at least - a way this could still be used as an attack even in the ideal world when registrars do their job properly. I think my considered opinion is that it's a very hard problem and it is likely not possible to fix it perfectly with no side-effects or holes, so to be honest I'm really not sure that the current official position should change - but I feel this is something worth considering at least. So, you've got https://apple.com and it has HSTS to enforce HTTPS access. So if someone hosting Dodgy Public Wi-fi Network tries to redirect you to their MITM'd http://apple.com the browser will refuse, since it will only visit over HTTPS. But, what if (say) the captive portal splash screen then redirected you to http://xn--80ak6aa92e.com/ ? The user is unlikely to have visited this page, and even though the host of Dodgy Public Wi-fi Network doesn't own the domain (because this is the ideal world in which the registrar does their job properly), the fact that the user is unlikely to have an HSTS entry for this domain means that the host of Dodgy Public Wi-fi Network can MITM HTTP requests to this fake domain, show their own spoof Apple site and capture people's iTunes accounts (or whatever). Now, I don't know if "a user browsing on a public Wi-Fi network and not ensuring they enter details over HTTPS (even though it looks like they're on the right domain)" would be something considered to be an issue or if that user would be considered to already have made a fatal error that can't be protected against (despite the fact that in the ordinary case they'd be protected by HSTS). But in the former case, perhaps HSTS entries could be expanded to apply to lookalike IDNs to prevent an attack like this?
Please see: since the Mozilla team is not willing to fix it, I will fix this myself by following the advice from this Polish website: https://niebezpiecznik.pl/post/uwaga-na-niewykrywalny-phishing-poprzez-domeny-ze-znakami-unicode-podobnymi-do-liter-z-alfabetu-lacinskiego/?more Nobody is able to recognize this phishing website. They said only the Safari browser shows the difference. Example: https://www.аррӏе.com/ Workaround: in about:config, set network.IDN_show_punycode to true.
A FIX will be released in the next version of Chrome; it is in testing and will be released in about a month. It seems the vulnerability will stay present in Firefox. Maybe in the future Opera will also fix the security issue. As you can read here: http://thehackernews.com/2017/04/unicode-Punycode-phishing-attack.html, the phishing attack is almost IMPOSSIBLE to detect. I think it is important to add this to this security bug report. I am happy to read that Chrome will FIX this security issue, because it is a security issue, a hole in the browsers... even if Mozilla has created an FAQ explaining why the hole is not being fixed. Many articles are hoping for a FIX; maybe they are not yet up to date on the Firefox team's decision not to fix this... yes, the issue can be worked around manually by users, but most users will remain EXPOSED. By not fixing this we are making the web a less secure place, where hackers can now exploit the fact that the browser can show one domain as identical to another. Yes, punycode is important... it can be allowed for registering a domain... but maybe it should not be converted and shown in converted form in the address bar of the browser... that's all. I hope that maybe with the Chrome FIX and the users' discussions a solution will also come here... or maybe it is time to consider finding a secure way to surf the web with the most secure browser possible. This phishing attack is ALMOST IMPOSSIBLE to detect; this is why it is so dangerous and can be used in the future if a fix is not released. Now I hope and think I have nothing more to say, as it has been requested not to post more replies here... I will respect that. I only want to say that I cannot believe the browser I have trusted until today is deciding to... leave this security issue as it is. It is a serious issue that allows hackers and attackers to use a vulnerable part of the browser.
The solution will never come from registrars, as has been said here, and the vulnerability has already been demonstrated today, created on the web without difficulty, so it will happen again in the future... email phishing, and not only email phishing, will start to grow, and with this issue almost anyone can be deceived... You can set Firefox manually to show punycode, but the next time you are on a clean machine and forget to activate it, you may be deceived... also, right now the "world" is talking about this issue; over time, when people have forgotten it, hackers can attack using this vulnerability. OK, done; as requested I will try not to post any more unless addressed. Thank you... Disappointed, very disappointed... maybe I will consider what to do, whether to stay with Firefox or prefer a safer browser, if this issue persists in Firefox and is solved in other browsers.
(In reply to Gervase Markham [:gerv] from comment #78) > We now have an FAQ which makes our position clear: > https://wiki.mozilla.org/IDN_Display_Algorithm_FAQ > > You may not agree with it, but it's our considered position, so please do > not comment further here unless you have new information to add which you > genuinely believe has not been considered. > > Gerv There are a few things which don't appear to be addressed by that page. In particular, it seems to dismiss a lot of partial solutions by arguing "this won't solve the problem for everybody in all cases, so we're just going to ignore it", whereas I think the correct attitude should be "this _will_ solve the problem for a large number of users (even if not everyone), therefore we should implement it". For example: > # OK. Why doesn't Firefox decide based on the script associated with the browser's UI language? > > Because many people use browsers with a UI language different to the ones they speak, Yes, but there are also many users who use a UI language which _exactly_ matches the language they speak. This would solve the problem for those users. > or that is only one of the ones they speak. So let users set multiple languages like Edge does. > And that's before you've accounted for shared computers and internet cafes, with multiple people of differing capabilities using the same computer. Again, just because this doesn't solve the problem for _everyone_ doesn't mean it's not a useful solution. Solving the problem for _some_ users is better than not solving the problem at all. > Also, this would make using IDN domain names a dodgy proposition for any organization, because they can never know which of their customers will see them correctly and which won't. Essentially all customers who speak the language the domain is written in (i.e. >99% of the site's target demographic) would see the domain rendered correctly. 
(And those who don't could permanently solve that issue on their computer with just a few clicks.) Those who don't speak the language the domain is written in likely wouldn't be able to enter the domain name correctly on their keyboard layout in the first place (which is a far bigger usability issue than seeing punycode is).

> Lastly, this fix wouldn't actually solve the problem for everyone. http://apple.com and http://аррІе.com/ look the same even to people who read Cyrillic.

Again, that's not an argument against implementing this.
I think enough has been said; everyone's positions are clear, as are the positives and negatives of the different approaches. You can refer to comment 78, and I think there will be further communications shortly. If you want to continue the discussion, please move it to a more appropriate place for discussing decisions, such as one of the Mozilla mailing lists. I'm now restricting comments to the editbugs group.
FYI: A similar issue arose even with only Japanese characters when we added IDN support. E.g., 口 (a Kanji character meaning "mouth") vs. ロ (the Japanese Katakana letter Ro). If both ハ口ープロジェクト.com and ハロープロジェクト.com exist, it's difficult to distinguish them in most Japanese fonts. So checking only the UI language doesn't solve some issues. That's why we supported IDN only for trustworthy TLDs. (And hyphen-like characters are even more complicated.)
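The two characters above really are distinct code points despite rendering near-identically, which a quick stdlib check makes visible (illustrative sketch only; `unicodedata.name` has no formal name for CJK ideographs, so a default is supplied):

```python
import unicodedata

kanji_kuchi = "\u53e3"   # 口  CJK ideograph meaning "mouth"
katakana_ro = "\u30ed"   # ロ  Katakana letter Ro

# Near-identical glyphs in most Japanese fonts, but distinct code points:
print(f"U+{ord(kanji_kuchi):04X}", unicodedata.name(kanji_kuchi, "<CJK ideograph>"))
print(f"U+{ord(katakana_ro):04X}", unicodedata.name(katakana_ro))
```

This is why byte-level comparison trivially distinguishes the two project domains even though the rendered glyphs do not.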
Couldn't we approach this from the other end and simply change the font so these characters don't look exactly alike? If we can make the differences less subtle, that could help. For that matter, we could do something like change the color of the URL if it contains characters not associated with the user's preferred language.
(My point being we could do something that simply affects the display to make these things easier to notice, instead of trying to do some kind of blocking)
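One rough way to sketch such a display heuristic is to bucket a label's characters by script and highlight the label when it uses scripts the user hasn't declared. The version below uses crude hand-picked code-point ranges purely for illustration; a real implementation would consult the Unicode Script property (e.g. via ICU), and the function names here are made up for this sketch:

```python
def label_scripts(label: str) -> set:
    """Crude script bucketing by code-point range (illustration only)."""
    scripts = set()
    for ch in label:
        cp = ord(ch)
        if cp < 0x80:
            scripts.add("Latin")
        elif 0x0370 <= cp <= 0x03FF:
            scripts.add("Greek")
        elif 0x0400 <= cp <= 0x04FF:
            scripts.add("Cyrillic")
        else:
            scripts.add("Other")
    return scripts

def should_highlight(label: str, user_scripts=frozenset({"Latin"})) -> bool:
    """Flag a label for distinct URL-bar styling if it uses any script
    outside the user's declared set."""
    return not label_scripts(label) <= user_scripts

print(should_highlight("apple"))                            # False
print(should_highlight("\u0430\u0440\u0440\u04cf\u0435"))   # True: the all-Cyrillic spoof
```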
Sheppy: see https://wiki.mozilla.org/Gerv's_IDN_Display_Algorithm_FAQ which addresses suggestions of changing the UI.
(In reply to Mardeg from comment #72)
> What seems to be upsetting people the most is the use of a cert so https
> makes them think it's trustworthy.

If that's the case, then people misunderstand what a "normal" certificate means. It just means that transmission to that domain is encrypted, so you verifiably send/receive information to/from the shown domain (homograph or not). It is not a verification of the identity of whoever runs the domain; that's what EV certs are for.

(In reply to :Gijs from comment #74)
> (In reply to Robert Kaiser from comment #70)
> > Do we have the capability to detect on the browser side if a
> > non-Latin-script domain name is a complete homograph of a Latin-script one?
>
> I don't know. comment #46 seems to suggest that ICU has code for (parts of?)
> this.
>
> > If so, would it be feasible to just not allow IDN for those?
>
> That feels shaky. It would protect against some subclasses of this attack,
> but I suspect it wouldn't be sufficient (that is, a single character outside
> the "potential-spoofy-Latin" set would disable this protection, and it's
> questionable whether users would realistically notice the difference in that
> case). And it still has some of the same downsides of the suggestion in
> comment #69 (see earlier in my reply). There are words in Russian (and, I
> expect, other languages that use Cyrillic or Greek or other
> vaguely-like-Latin scripts) that consist entirely of homographs. Do we just
> stop users of those words/languages from using them effectively in domain
> names? That doesn't seem right, either.

In theory we could go further and allow IDN if DNS for the other-script homograph resolves to the same host(s), or if EV certs are used (which imply actual verification of identity, not just of domain possession). But it gets hairy. Still, the status quo feels pretty bad as well.
> "Мат" and "Март" are both Russian words, but only the first is a homograph
> for an English word

For one, neither is really a homograph when used as a domain name, since neither matches an all-lower-case Latin string. For the other, words are nothing I'd take into account at all, as many domain names are or contain abbreviations.

That said, maybe the real long-term solution is having ICANN enforce TR-39 (UTS #39) across all registries, as comment #79 proposes. Still, for the moment, especially with other browsers "doing something" (and I don't agree their solutions are really good either), it makes us look like we don't care about security, which isn't just completely untrue but also bad marketing. That's what disturbs me most, personally.
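For reference, the core of the TR-39 mechanism (and of the ICU code alluded to in comment #46) is the "confusable skeleton": map every character to a canonical look-alike and compare the results. A toy sketch, hard-coding five entries from the confusables data purely for illustration; the real table shipped with ICU is far larger:

```python
# Minuscule hand-picked subset of the UTS #39 confusables data (illustration only).
CONFUSABLES = {
    "\u0430": "a",  # Cyrillic а
    "\u0435": "e",  # Cyrillic е
    "\u0440": "p",  # Cyrillic р
    "\u0441": "c",  # Cyrillic с
    "\u04cf": "l",  # Cyrillic palochka ӏ
}

def skeleton(label: str) -> str:
    """Replace each character with its canonical look-alike."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in label)

spoof = "\u0430\u0440\u0440\u04cf\u0435"  # the аррӏе label from this bug
print(skeleton(spoof) == skeleton("apple"))  # True: confusable pair detected
```

Two labels are confusable when their skeletons match; note this also catches the асе.com / ace.com example from comment #4, since Cyrillic с maps to Latin c.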