Bug 618051 - Enable IDN for .si
Status: RESOLVED FIXED
Product: Core
Classification: Components
Component: Networking: Domain Lists
Version: unspecified
Platform: All All
Importance: -- normal
Target Milestone: mozilla13
Assigned To: Gervase Markham [:gerv]
Reported: 2010-12-09 12:41 PST by Gervase Markham [:gerv]
Modified: 2012-03-06 12:12 PST

Attachments
Patch v.1 (831 bytes, patch)
2012-02-24 08:38 PST, Gervase Markham [:gerv]
akeybl: approval‑mozilla‑aurora+
akeybl: approval‑mozilla‑beta+

Description Gervase Markham [:gerv] 2010-12-09 12:41:05 PST
From benjamin.zwittnig@registry.si:

Hello,

On http://www.mozilla.org/projects/security/tld-idn-policy-list.html we have noticed that we need to inform you about the IDN aware TLD. On 20th of October 2010 we enabled IDN registration for .si.

Registry web page is http://www.registry.si/. Policy for IDN registrations is described on http://www.registry.si/idn.html and set of allowed characters is listed on http://www.registry.si/idn/characters.html

I would kindly request if you could enable support for IDNs for .si.

Best regards,

Benjamin Zwittnig,
Arnes (
Comment 1 Gervase Markham [:gerv] 2010-12-09 12:56:04 PST
I am not able to approve this at this time. The .si character list:
http://www.registry.si/idn/characters.html
contains a number of confusable characters, including the ligatures œ and æ, ĺ which is similar to i in some fonts, and lots of very tiny variants of accented characters. I think they need an anti-spoofing policy, certainly for the ligatures.

.si registry team: do you have any anti-spoofing policy, which would prevent e.g. www.encyclopædia.si and www.encyclopaedia.si being registered to two different entities?

Gerv
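
The kind of anti-spoofing rule asked about above can be sketched as a "bundling" check: fold each confusable ligature to the digraph it resembles, then refuse a registration whose folded form collides with an existing, different name. This is only an illustration. The folding table and the function names `skeleton` and `conflicts` are hypothetical and are not taken from any registry's policy or from Mozilla code.

```python
# Hypothetical folding table: each ligature maps to the ASCII digraph it resembles.
# Illustrative assumption only; a real policy would need a reviewed variant table.
LIGATURE_FOLDS = {"æ": "ae", "œ": "oe"}


def skeleton(label: str) -> str:
    """Return a comparison key in which the listed ligatures are folded."""
    folded = label.lower()
    for ligature, digraph in LIGATURE_FOLDS.items():
        folded = folded.replace(ligature, digraph)
    return folded


def conflicts(candidate: str, registered: set[str]) -> bool:
    """True if the candidate folds to the same key as a different registered label."""
    key = skeleton(candidate)
    return any(skeleton(name) == key and name != candidate for name in registered)


registered = {"encyclopaedia"}
print(conflicts("encyclopædia", registered))  # True: folds to "encyclopaedia"
print(conflicts("example", registered))       # False: no look-alike registered
```

A registry applying such a check would either block the second registration or reserve it for the holder of the first.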
Comment 2 Jo Hermans 2010-12-09 16:29:34 PST
Agreed. Note that this is not a (dis)approval of what currently exists or what might exist in the future. It is only about what the .si registry team would do with confusing entries like www.encyclopædia.si and www.encyclopaedia.si.
Comment 3 Benjamin Zwittnig 2010-12-23 05:36:28 PST
We do not have any special proactive mechanism to fight anti-spoofing.

Everyone can check if any 'similar' domain to his domain was registered. This can be done via Domain Availability Service (without any rate-limiting).
The service is similar to whois and runs on das.registry.si on port 4343. It is available also on our web page http://www.registry.si/

What is a 'similar' domain is up to a domain name owner to decide.

For quicker dispute resolution in case of infringements we have established ADR (Alternative Dispute Resolution). More information about the Dispute Resolution can be found on http://www.registry.si/domain-name-disputes-adr.html
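
A rough sketch of how a registrant might query the Domain Availability Service described above. The host and port come from the comment; everything else is an assumption: the comment only says the service is "similar to whois", so the framing used here (one query line ending in CRLF, response read until the server closes the connection) and the use of the Punycode form for IDN names are guesses, and the function names are hypothetical.

```python
import socket

DAS_HOST = "das.registry.si"  # host named in the comment above
DAS_PORT = 4343               # port named in the comment above


def build_query(domain: str) -> bytes:
    """Build a whois-style query line (assumed framing: ASCII name + CRLF)."""
    return domain.encode("idna") + b"\r\n"


def query_das(domain: str, host: str = DAS_HOST, port: int = DAS_PORT,
              timeout: float = 10.0) -> str:
    """Send one query and return the raw response text (assumed protocol)."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_query(domain))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


# Example (requires network access and a live service):
# print(query_das("example.si"))
```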
Comment 4 Benjamin Zwittnig 2010-12-24 03:42:23 PST
> We do not have any special proactive mechanism to fight anti-spoofing.

Typo. ...to fight spoofing.
Comment 5 Gervase Markham [:gerv] 2010-12-28 06:53:07 PST
Benjamin: I'm afraid our policy states that the registry must implement anti-spoofing measures. Saying "it's the responsibility of the registrant" is not good enough. That transfers the burden of understanding the complex topic of homographs from those who know about it (us) to those who don't know about it (domain owners). It also puts on them the burden of paying a registration fee for each homographic domain - which could be a large number of domains, if they have a number of confusable characters in their domain name.

And having dispute resolution is no good. Most phishing sites have a lifetime of a few days - a dispute resolution process wouldn't even have got started.

So I'm afraid that, until an anti-homograph policy is put in place, as outlined in http://www.mozilla.org/projects/security/tld-idn-policy-list.html , we will not be able to enable IDN for .si in Firefox.

Gerv
Comment 6 Benjamin Zwittnig 2010-12-29 05:30:46 PST
In addition to ADR we also have some rules which can (and will) be used to fight malicious use of domain names:

http://www.registry.si/fileadmin/dokumenti/register/ENG/general-terms.pdf:

8.1.3. that their Application is submitted in good faith and for lawful purposes, and does not encroach upon the rights of third parties;
8.1.4. that the Domain name is not in breach of the law, public order or morals;
8.1.5. that they will comply with these General terms and conditions for the duration of the Period;
8.2. Where Domain name holders fail to comply with their obligations referred to in clause 8.1 of these General terms and conditions, Arnes shall have the right not to register a Domain name, or to delete a Domain name at its own initiative.

The rules are not explicitly written to fight spoofing/phishing but any type of fraudulent domain name registration. Actions (block or delete) would be taken immediately after reported misuse.

What are 'similar' domains which should be bundled is still up to the registrant to decide.

Bundling is actually not specific to IDNs since O (capital letter 'o') and 0 (number 0) look very much the same.
Comment 7 Boštjan 2011-03-09 01:56:16 PST
Gervase: 

looking at the policy links at:
http://www.mozilla.org/projects/security/tld-idn-policy-list.html

I was unable to find a single anti-spoofing related policy for any country, what exactly is the problem here?

Boštjan
Comment 8 Jo Hermans 2011-03-09 03:15:55 PST
(In reply to comment #7)
> Gervase: 
> 
> looking at the policy links at:
> http://www.mozilla.org/projects/security/tld-idn-policy-list.html
> 
> I was unable to find a single anti-spoofing related policy for any country,
> what exactly is the problem here?
> 
> Boštjan

What do you mean? Let's take the Polish one:

http://www.dns.pl/IDN/idn-registration-policy.txt

First of all, it says "A combination of characters from different sets is not allowed". So you can't mix Latin characters with Cyrillic ones, but you can register pure Latin or Cyrillic names.

Second, it says "Registration of look-alike domain names is not allowed". So you can't register a Cyrillic domain name that just happens to look like a Latin one or vice versa.
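
The first rule quoted above (no mixing of character sets) can be approximated in a few lines. This is a sketch under a stated simplification: Python's standard library does not expose the Unicode Script property, so the first word of each letter's Unicode name ("LATIN", "CYRILLIC", ...) is used as a stand-in. The function names are hypothetical, and this is not how .pl or any browser actually implements the rule.

```python
import unicodedata


def scripts_used(label: str) -> set[str]:
    """Approximate the scripts in a label from each letter's Unicode name."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            # e.g. "LATIN SMALL LETTER AE" -> "LATIN"
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_single_script(label: str) -> bool:
    """True if all letters in the label appear to come from one script."""
    return len(scripts_used(label)) <= 1


print(scripts_used("cæron"))              # {'LATIN'}
print(is_single_script("c\u0430eron"))    # False: U+0430 is CYRILLIC SMALL LETTER A
```

Note that a check like this says nothing about look-alikes within one script, such as æ versus ae, which is the second and harder rule.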
Comment 9 Benjamin Zwittnig 2011-03-09 05:51:42 PST
We support ONLY latin characters (http://www.registry.si/idn/characters.html). This means that it is not possible to mix characters from different scripts in .si domain names.
Comment 10 Jothan Frakes 2011-05-30 05:53:52 PDT
Hi Benjamin, 

I want to make this work, and I am looking at your word lists and policy and I think I found the issue that is holding us all back from approving this.

Jo put in the link to NASK's IDN policy, which I think lets us get to the heart of the concern.

In looking at the registration policy from .pl which was approved, they have in place (rule 3) a means to ensure that registration of look-alike domains would not be possible.

Is a similar solution or rule in place in .SI? That is the blocking concern.
If you can provide me a link to show that homograph registrations are not possible, then we can move this forward.

example: 
If it is possible for Registrant A to have a domain name caeron.si 
and Registrant B to have cæron.si which is visually similar.  

Only one of those registrations should be allowed.  If both are allowed then the TLD does not make it onto the whitelist.  

The primary characters that are at issue are:
U+00E6 æ LATIN SMALL LETTER AE
U+0153 ΠLATIN SMALL LIGATURE OE

This is done to avoid the vector of homograph attacks and phishing, which is part of why the browser will expose Punycode. When the browser shows Punycode, users can see that the domain names are different.

-Jothan
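
The whitelist behaviour described above can be sketched as follows: names under a whitelisted TLD are shown in Unicode, and everything else falls back to the Punycode form. The whitelist contents and the function name `display_form` are hypothetical, chosen only for illustration; the sketch uses Python's built-in `idna` codec (IDNA 2003) rather than anything from the browser.

```python
# Hypothetical whitelist contents, for illustration only.
IDN_WHITELIST = {"pl", "de"}


def display_form(hostname: str, whitelist: set[str] = IDN_WHITELIST) -> str:
    """Return the hostname as a whitelist-based browser might display it."""
    tld = hostname.rsplit(".", 1)[-1].lower()
    if tld in whitelist:
        return hostname  # trusted TLD: show Unicode
    # Untrusted TLD: show the ASCII-compatible (Punycode) form instead.
    return hostname.encode("idna").decode("ascii")


print(display_form("cæron.pl"))  # cæron.pl
print(display_form("cæron.si"))  # xn--cron-voa.si
print(display_form("caeron.si")) # caeron.si
```

With the fallback, cæron.si and caeron.si are easy to tell apart; with Unicode display they differ by a single ligature.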
Comment 11 Jothan Frakes 2011-05-30 05:56:18 PDT
It appears my oe character in the above notes entered as the capitalized version as opposed to the lower case, but hopefully the point is still clear.   -j
Comment 12 Benjamin Zwittnig 2011-06-13 08:20:13 PDT
Hi Jothan,

They would register cæron.pl and caeron.pl to two different entities, as we would. The same is true for some other registries already approved by Mozilla as IDN-safe (.de, .at, .ch, .li, .lu, .nu...). All of these registries allow domain names containing at least the disputed U+00E6, and they do not check whether a corresponding domain using only basic Latin letters exists.

Please check domains tæst1234.pl (xn--tst1234-mxa.pl) and taest1234.pl at http://www.dns.pl/cgi-bin/en_whois.pl
One is registered to an organization named Arnes and the other belongs to a private person.

Benjamin
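
For readers who want to check the two example names themselves, this small sketch converts between the Punycode form quoted above and its Unicode form, and confirms that the two registrations are distinct names in the DNS. It uses Python's built-in `idna` codec; the helper names are hypothetical.

```python
def to_unicode(name: str) -> str:
    """Decode an ASCII-compatible (xn--) name to its Unicode form."""
    return name.encode("ascii").decode("idna")


def to_wire(name: str) -> bytes:
    """Encode a Unicode name to the ASCII form used in the DNS."""
    return name.encode("idna")


print(to_unicode("xn--tst1234-mxa.pl"))                       # tæst1234.pl
print(to_wire("tæst1234.pl") == to_wire("taest1234.pl"))      # False: two distinct names
```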
Comment 13 Jothan Frakes 2011-06-13 17:17:10 PDT
Benjamin, 

Good points.

Will you or someone else from the registry be at the Singapore ICANN meeting next week?  I'd like to work with you to resolve this face to face if we can.

-Jothan
Comment 14 Benjamin Zwittnig 2011-06-14 05:53:15 PDT
Hi Jothan,

unfortunately no one from the .si Registry will attend the ICANN meeting in Singapore.

In our previous correspondence we tried to explain that the .si registration policy has the same security mechanisms as many other ccTLD registries that are "IDN whitelisted" (.de, .at, .ch, .li, .lu, .nu, .pl). We only support one script (Latin), so preventing script mixing is a non-issue for .si. We also proved this by registering two example domain names under .pl.

We certainly believe and expect that all ccTLD registries should be treated equally and we hope that .si will be added to mozilla's IDN white list.

Best regards,

Benjamin
Comment 15 Boštjan 2011-06-14 06:48:50 PDT
Singapore since it starts with "si" or is there any other/deeper meaning to this request/question, Jothan? :)
Comment 16 Andrzej Bartosiewicz 2011-07-25 02:16:16 PDT
Dear Jothan, Benjamin and All Colleagues,

Today I'm managing my own company Yonita.com and since 2010 I'm no longer Head of .PL, but because I was the person who decided on the implementation of IDNs under .PL in 2003 let me explain the approach towards look-alike domains. I have also recommended similar policies to other TLDs (already implemented by them) so I feel responsible for those recommendations and I truly believe that we have taken the right decisions regarding Latin script. 

Of course, it is possible to register U+00E6 æ and combination of "ae" in two domain names for different registrants in many European ccTLDs. It's always the question what is "identical" and what is not. 

In Cyrillic there is a letter similar to Latin "y" (U+0423 and U+0443), and that is why it should NOT be allowed to register identical domains in Latin and Cyrillic or to mix scripts. What we can't do is disallow all similar-looking names, such as taest1234 and tæst1234. For some people they are similar, for others they are not. There must be a human decision about which characters look identical. We allow registering "0" (zero) and "O" (capital "o") in domain names, but they look quite alike. From my perspective æ and ae are less confusing than 0 and O. Is that true or not? Should we change the policy regarding zero and the letter "O"? Of course not.

You can take a look at that issue addressed in my short presentation:
http://yonita.com/ICANN-IDN-Variants-Bartosiewicz.pdf (slide #9).

I hope it helps.
Comment 17 Andrzej Bartosiewicz 2011-07-25 04:31:08 PDT
:gerv

I've checked registration rules for .SI and .PL, and .DE.

I do not understand, why .SI is not whitelisted, but .DE or .PL (and others) are whitelisted. 

Perhaps Mozilla does not understand the rules implemented by ccTLDs, but there are no ANTI-SPOOFING mechanisms for Latin script in any TLD that allows the whole Latin script. Why Slovenia is treated differently from Germany, I can't understand.

Please review the rules and apply the same policy for ALL TLDs with similar (identical) rules for Latin :)

Andrzej, Yonita Inc.
Comment 18 Jothan Frakes 2011-09-29 12:29:43 PDT
@Gerv, Peter, et al. in the community:

I have been spending quite a lot of time on IDN work in the context of being the coordinator of the Latin VIP team for new TLDs with ICANN, and I've taken away from that experience that handling of Latin characters beyond the LDH set (A-Z, 0-9, Hyphen "-", case insensitive) is not always perfect, due to ligature sets (ae/dz/oe etc) or some diacritical marks being visually indistinguishable depending upon fonts.

The premise of the whitelist, was to add some elegance and some safety and security to the end-user so that the Mozilla experience is a positive one. 

It is always best to default to the local registry to allow for them to make the best decisions on characters and code points to be used in their registries.

In the case of .SI, I'd like to make some strong points in favor of approving their request.

The first is their resilience, professionalism, and patience in getting this approved.  As the registry, it must be maddening and frustrating to have to field the customer requests or complaints about their IDN domain not working in Mozilla.  Their request to have us whitelist them is something that followed their introduction of the IDN domain names, so these names are out in the wild as we speak.  

Second point in their favor is that the registry has illustrated to me on more than one occasion to be extremely responsible, diligent and proactive in addressing matters where there were bad-actor registrations in a very expedient manner.  My understanding is that the whitelist exists as a technical workaround where this is not present.

Third point is that they have followed all of the guidelines we have published, and have listed codepoints, IDN policies, etc. that other TLDs have listed.

Fourth point is that they have in fact been treated differently with respect to the standards that they have been held to when contrasted against .de or .pl and others.  We recognize that the standards changed around or after those registries' submissions and approvals (as .si evidenced by registering ligature variants using the 'ash' / ae character under .pl).

Fifth point is that there is not a tremendous volume of registrations under .SI at this point, and far fewer <IDN>.si domain registrations.  Statistically, the likelihood of the strife that the whitelist was created to constrain is so small that one has to weigh whether the cure is worth the constraints it creates.

Finally, my sixth point is the comprehensive aspect of the previous five.  I'd like to state my open support for allowing the addition of .si into the whitelist based not on any of these individual points but rather the mesh of all of them being present as basis to approve.

I hope this is convincing enough to inspire the community approval of this string into the IDN whitelist.

-Jothan

Comment 19 Gervase Markham [:gerv] 2012-02-24 08:35:44 PST
As far as I can tell, all of the characters listed in http://www.registry.si/idn/characters.html are in the Latin script. That means that according to https://wiki.mozilla.org/IDN_Display_Algorithm , all permissible .si IDNs would pass our automated checks. That means that, according to the transitional arrangements on that page, .si is eligible for whitelisting.

Gerv
Comment 20 Gervase Markham [:gerv] 2012-02-24 08:38:23 PST
Created attachment 600408
Patch v.1
Comment 21 Gervase Markham [:gerv] 2012-02-24 08:39:10 PST
https://hg.mozilla.org/integration/mozilla-inbound/rev/dea5e2945b2f

Gerv
Comment 22 Marco Bonardo [::mak] 2012-02-25 02:18:19 PST
https://hg.mozilla.org/mozilla-central/rev/dea5e2945b2f
Comment 23 Gervase Markham [:gerv] 2012-02-27 05:02:23 PST
Comment on attachment 600408
Patch v.1

[Approval Request Comment]
User impact if declined: IDNs not working (displaying as gibberish) in relevant domain.
Testing completed (on m-c, etc.): Policy rather than code change; but patch has baked over the weekend.
Risk to taking this patch (and alternatives if risky): Very low.
String changes made by this patch: None.

Gerv
Comment 24 Alex Keybl [:akeybl] 2012-02-27 15:26:39 PST
Comment on attachment 600408
Patch v.1

[Triage Comment]
Assuming this has gone through all the proper processes, approving for Aurora 12 and Beta 11. We do not expect to see any regressions caused by this patch which, as Gerv notes, is really just a policy change.
Comment 26 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-03-05 16:44:31 PST
Is there anything QA can do to verify this fix?
Comment 27 Gervase Markham [:gerv] 2012-03-06 02:18:45 PST
It would be a waste of your time to try :-) This is a policy change, not a code change. Same for all the other changes with "PSL" in the subject.

Gerv
