I've compared the Public Suffix List to IE 8's ietldlist.xml. While the two lists do not have the same function, the comparison has thrown up some areas we should investigate to see if the PSL is deficient. Specifically, we should check whether the following need to be added:

+name.ae
+pro.ae
+info.ai
+inima.al
+soros.al
+tirana.al
+uniti.al
+upt.al
+iris.arpa
+biz.at
+info.at
+biz.bb
+info.bb
+store.bb
+asso.bj
+barreau.bj
+gouv.bj
+info.bw
+biz.by
+city.ca
+namlet.ca
+town.ca
+village.ca
+ville.ca
+travel.ch
+jobs.cn
+ngo.cn
+travel.cn
+jobs.cz
+pro.cz
+gouv.dj
+ass.dz
+aero.fo
+biz.fo
+coop.fo
+info.fo
+museum.fo
+name.fo
+gouv.ga
+hmg.gb
+biz.gh
+cuhk.hk
+hku.hk
+ust.hk
+ernet.in
+coop.it
+info.it
+pro.it
+tel.ki
+info.ms
+pro.ms
+web.ms
+museum.mw
+web.na
+gouv.nc
+web.nl
+lodz.pl
+siedlce.pl
+torun.pl
+alumni.pr
+exalumnos.pr
+ghabarovsk.ru
+idn.sg
+gouv.sn
+embaixada.st
+so.at.tc
+com.au.tc
+pro.tc
+shop.tc
+co.uk.tc
+me.uk.tc
+org.uk.tc
+co.at.tf
+aero.tj
+coop.tj
+dyn.tj
+info.tj
+museum.tj
+nom.tm
+mrt.tn
+benedettoxvi.va
+denedictumxvi.va
+mailservice.va
+osservatoreromano.va
+ossrom.va
+pcf.va
+photo.va
+sistinechapel.va
+vaticanstate.va
+vatican.va
+vatiradio.va

Gerv
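The comparison described above amounts to a set difference between the two lists. A minimal sketch, assuming both lists have already been reduced to plain suffix strings; the helper name `missing_from_psl` and the sample data are hypothetical, and parsing ietldlist.xml itself is not shown:

```python
def missing_from_psl(ie_suffixes, psl_suffixes):
    """Return suffixes present in the IE list but absent from the PSL."""
    psl = {s.lower().strip(".") for s in psl_suffixes}
    candidates = {s.lower().strip(".") for s in ie_suffixes} - psl
    # Sort on reversed labels so that entries group by TLD.
    return sorted(candidates, key=lambda s: s.split(".")[::-1])


# Hypothetical sample data, not the real contents of either list.
ie_list = ["co.uk", "name.ae", "pro.ae", "info.ai"]
psl_list = ["co.uk", "ae", "ai"]

for suffix in missing_from_psl(ie_list, psl_list):
    print("+" + suffix)
```

Anything this prints is only a candidate for investigation, since the two lists serve different purposes.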
I am actively researching these.
(In reply to comment #0)
Here's what I've checked so far:

> +name.ae
> +pro.ae
Wikipedia: listed, marked "deprecated in 2008"
Registrar: Relevant info is in http://www.aeda.ae/aeda/ar/policies/AEDA-POL-007_Domain_Name_Eligibility_Policy-ar.pdf ; I can't read Arabic but I note they list the other second-level TLDs but not these two.
Other: Still listed at some third-party domain registration sites
Recommendation: Unsure

> +info.ai
Wikipedia: Not listed
Registrar: Unreachable
Other: Not found in Google searches
Recommendation: Ignore

> +inima.al
> +soros.al
> +tirana.al
> +uniti.al
> +upt.al
Wikipedia: These are not eTLDs but FQDNs, grandfathered in as exceptions to an original "no direct second-level registrations" policy.
Registrar: Not listed. Unrelated: http://www.ert.gov.al/ert_alb/faq_det.html?Id=31 lists "mil.al" at the top and in the document body.
Recommendation: Add "mil.al" (and update Wikipedia)

> +iris.arpa
Wikipedia: Listed
Registrar: Listed
Recommendation: Add

> +biz.at
> +info.at
Wikipedia: Not listed
Registrar: Not listed
Other: http://www.info.at/ lists these (third-party registrar). Both resolve as FQDNs.
Recommendation: Ignore

> +biz.bb
> +info.bb
> +store.bb
Wikipedia: Listed
Registrar: Listed
Recommendation: Add

> +asso.bj
> +barreau.bj
> +gouv.bj
Wikipedia: Listed in passing; other eTLDs implied
Registrar: Listed in passing; other eTLDs implied
Recommendation: Add, possibly alongside contacting registrar for complete list

> +info.bw
Wikipedia: Not listed
Registrar: None given
Other: http://www.info.bw/ resolves to something that does not look related to domain registration
Recommendation: Ignore

> +biz.by
Wikipedia: Not listed
Registrar: http://hoster.by/ gives "of.by"; http://extmedia.com/domains.html gives "ooo.by", "odo.by", "zao.by", "oao.by", "llc.by", "chup.by", "ppl.by". However, typing these directly results in successful navigation as FQDNs, redirected back to the registrar in question.
Other: Resolves directly as FQDN.
Recommendation: Ignore

> +city.ca
> +namlet.ca
> +town.ca
> +village.ca
> +ville.ca
Wikipedia: Not listed
Registrar: http://www.cira.ca/assets/Documents/Legal/Registrars/registrationrules.pdf lists these as "reserved" in 3.3(b), except that "namlet.ca" is a typo and should be "hamlet.ca". 3.3(a) also lists "com.ca", "org.ca", "net.ca", "edu.ca", "gov.ca", "int.ca", and "mil.ca" as "reserved".
Recommendation: Unsure

> +travel.ch
Wikipedia: Not listed
Registrar: Not listed
Other: Resolves directly as FQDN.
Recommendation: Ignore

> +jobs.cn
> +ngo.cn
> +travel.cn
Wikipedia: Not listed
Registrar: Can't find applicable policy
Other: All resolve directly as FQDNs.
Recommendation: Ignore

> +jobs.cz
> +pro.cz
Wikipedia: Not listed
Registrar: Not listed
Other: All resolve directly as FQDNs.
Recommendation: Ignore
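The "resolves directly as FQDN" check used in the entries above could be automated roughly as follows. This is a sketch only: the helper name `resolves` is hypothetical, the resolver is injectable so the function can be exercised without network access, and a successful lookup says nothing about who operates the host or whether it redirects.

```python
import socket


def resolves(hostname, resolver=socket.getaddrinfo):
    """Return True if `hostname` itself has at least one address record."""
    try:
        # getaddrinfo raises socket.gaierror when the name cannot be resolved.
        return len(resolver(hostname, None)) > 0
    except (socket.gaierror, UnicodeError):
        return False


def classify(hostnames, resolver=socket.getaddrinfo):
    """Split candidate suffixes into those that resolve directly and those that do not."""
    direct = [h for h in hostnames if resolves(h, resolver)]
    silent = [h for h in hostnames if h not in direct]
    return direct, silent
```

Calling `classify(["travel.ch", "jobs.cz"])` with the default resolver performs live DNS lookups, so results will depend on when and where it is run.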
Peter: that's great work, thanks for taking this on.

One thought though: how much weight do we put on something resolving as an FQDN? I mean, if I were the registry of .zz, and I was offering domains below com.zz and net.zz, I might well make typing "com.zz" resolve to a website and have it pointed at a page advertising com.zz registrations. When you say "resolve directly as FQDNs", are you implying "and to a website unrelated to registrations or DNS"?

Gerv
(In reply to comment #3)
> When you say "resolve directly as FQDNs", are you implying "and to a website
> unrelated to registrations or DNS"?

No, I simply mean resolve. Although in most of the above cases, the resolved host is unrelated to anything registrar-wise, so it would fit that criterion too.

I think it's problematic if the eTLD list contains entries that are actually FQDNs. In Chrome's case, for example, it will refuse to navigate to them unless the user is fairly explicit (e.g. by adding a scheme). It's also not necessarily clear how to deal with cookies, since the website in question can set a domain cookie that we'd want to prevent "subdomains" from reading.

In existing third-party cases, e.g. the recent ones with ZaNiC, the registrar's page is itself at a subdomain, generally the "www." subdomain.
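To make the cookie concern concrete, here is a simplified sketch of how a suffix list feeds into a domain-cookie check. It is not the code any browser uses: the function names and the tiny rule sets are hypothetical, and only the basic matching behaviour of the list format (longest rule wins, "*." wildcards, "!" exceptions) is modelled.

```python
def public_suffix(hostname, rules):
    """Return the public suffix of `hostname` under a simplified matching algorithm."""
    labels = hostname.lower().strip(".").split(".")
    best = labels[-1:]  # default rule: the last label alone is a suffix
    for i in range(len(labels)):
        candidate = labels[i:]
        exact = ".".join(candidate)
        wildcard = ".".join(["*"] + candidate[1:])
        if "!" + exact in rules:
            # An exception rule prevails; the suffix is the rule minus its leftmost label.
            return ".".join(candidate[1:])
        if (exact in rules or wildcard in rules) and len(candidate) > len(best):
            best = candidate
    return ".".join(best)


def may_set_domain_cookie(host, cookie_domain, rules):
    """Reject domain cookies scoped to a public suffix or to an unrelated domain."""
    host = host.lower().strip(".")
    cookie_domain = cookie_domain.lower().strip(".")
    if cookie_domain == public_suffix(cookie_domain, rules):
        return False
    return host == cookie_domain or host.endswith("." + cookie_domain)


# Hypothetical rule sets: "info.at" absent versus present.
without_entry = {"at"}
with_entry = {"at", "info.at"}

print(may_set_domain_cookie("shop.info.at", "info.at", without_entry))
print(may_set_domain_cookie("shop.info.at", "info.at", with_entry))
```

With the entry absent, any host under "info.at" can scope a cookie to all of "info.at"; with it present, that is refused, which is also why listing a name that is really a single site changes how that site's own cookies behave.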
Followup on some of the above cases, since I just found that a Google search for "site:co.uk" will list everything that's a subdomain of that, making it much easier to check what's really used.

* For "name.ae" and "pro.ae", there are no results in Google, and if a registrar delists something I'd rather not list it without concrete evidence of real-world use. So I change my recommendation to "Ignore".
* For the various "al" subdomains, "mil.al" has many registered 3LDs. "soros.al" and "upt.al" look to have been directly registered instead of hosting 3LDs. Recommendation stands as above.
* For "iris.arpa", there are no results in Google. However, the registrar is quite clear, so I leave my recommendation as it stands.
* For "biz.at", there are no Google results. For "info.at" there are a number of 3LDs. Since "info.at" redirects to "www.info.at", maybe it's safe to add "info.at"?
* For the various "bb" subdomains, there are no results in Google, but the registrar is quite clear, so once again I leave my recommendation unchanged.
* For "gouv.bj", there are various 3LDs. For the other "bj" subdomains there are no results in Google. No text on the registrar's site clearly distinguishes these cases, so I leave my recommendation unchanged.
* For "info.bw", there are a couple of 3LDs in Google, but almost everything is just "www.info.bw". I think it is OK to ignore this.
* For the various "by" subdomains, there are no results in Google except for "of.by", which has a large number of 3LDs. Since "of.by" redirects to "hoster.by", maybe it's safe to add "of.by"?
* For the various "ca" subdomains, there are no results in Google (for any of the subdomains I mentioned) except for "village.ca", which looks to have been registered directly rather than hosting 3LDs.
* For "travel.ch", the only apparent 3LDs are all clearly owned by "travel.ch" (if you visit them), so I leave my recommendation unchanged.
* For "jobs.cn", all 3LDs seem to be owned by "jobs.cn". For "ngo.cn", there are a number of unrelated 3LDs, though "ngo.cn" resolves (although it doesn't load for me at the moment). "travel.cn" has no Google results. Waffling on adding "ngo.cn".
* For "jobs.cz", there are a number of 3LDs, and "jobs.cz" redirects to "www.jobs.cz". Maybe it's safe to add "jobs.cz"? For "pro.cz", there are few 3LDs, and all look to be owned by the same company.

In the cases above where it may be possible to add 2LDs that host 3LDs, I am worried about the possibility of the 2LD owner doing something nefarious with reading other sites' cookies.

More research:

(In reply to comment #0)
> +gouv.dj
Wikipedia: Not listed
Registrar: Not listed
Other: Two 3LDs listed in Google, one of which doesn't resolve.
Recommendation: Ignore

> +ass.dz
Wikipedia: Lists "asso.dz", as does eTLD data file; I believe "ass.dz" is a typo.
Registrar: No list I can find
Other: No Google results
Recommendation: Ignore

> +aero.fo
> +biz.fo
> +coop.fo
> +info.fo
> +museum.fo
> +name.fo
Wikipedia: Not listed
Registrar: Listed as "gTLDs" with "reserves the right to reject applications". Also lists "edu.fo", "gov.fo", "int.fo", "mil.fo", "org.fo"; in a separate subsection, also lists "nic.fo", "web.fo", "co.fo", "ftp.fo", "www.fo", "telnet.fo", "irc.fo", "internet.fo", and "mail.fo". Registrar does not mention availability of third-level registrations under these.
Other: "gov.fo" has a very small number of 3LDs in Google, most of which claim to be owned by "gov.fo". "internet.fo" has a couple of 3LDs. All other 2LDs have no subdomains as results.
Recommendation: Ignore

> +gouv.ga
Wikipedia: Not listed
Registrar: Mentioned in passing in an example; explicitly lists "org.ga", "or.ga", "com.ga", "co.ga", "edu.ga", "ed.ga", "ac.ga", "net.ga", "go.ga", "asso.ga", "Aéroport.ga", "Int.ga", "Presse.ga" as "second-level domains managed by nic.ga" (but doesn't list "gouv.ga").
Other: Registrar-provided example does not resolve, but there are many other 3LDs of "gouv.ga" on Google. "org.ga" and "co.ga" seem to have a couple of 3LDs each that are owned by the same people.
Recommendation: Add "gouv.ga" only (and update Wikipedia)

> +hmg.gb
Wikipedia: Mentioned, with a note that "dra.hmg.gb" is the lone subdomain in DNS
Registrar: None
Other: No results for "gb" in Google
Recommendation: Ignore

> +biz.gh
Wikipedia: Not listed
Registrar: http://www.nic.gh/customer/search_c.htm has "biz.gh" in dropdown
Other: "biz.gh" has no results in Google
Recommendation: Unsure. Normally I'd go with what the registrar says but this is a bit less clearly stated than some of the cases above.
More random info: With at least some cases in our existing list, we mark as eTLDs hostnames that resolve as FQDNs; for example, saotome.st and principe.st. I dunno whether that means this pattern is fine, or we should remove these cases, or neither.

(In reply to comment #0)
> +cuhk.hk
> +hku.hk
> +ust.hk
Wikipedia: Not listed
Registrar: Not listed
Other: These have sites found in Google, but seem to be universities, not true 2LDs any more than e.g. "hmc.edu" (my alma mater's website) is a 2LD because it has cs.hmc.edu, math.hmc.edu, etc.
Recommendation: Ignore

> +ernet.in
Wikipedia: Listed
Registrar: Not listed
Other: Various 3LDs found in Google under "ernet.in". However, "ernet.in" also resolves directly, and doesn't redirect.
Recommendation: Ignore

> +coop.it
> +info.it
> +pro.it
Wikipedia: Not listed
Registrar: Not listed (but I also can't find "gov.it" or "edu.it" listed). Unrelated note: there are multiple different lists of reserved names, one of which is ridiculously long (600+ pages). It's kind of weird that our current list pulls entries out of one of these, and not the others.
Other: "coop.it" and "info.it" both appear to host 3LDs; "pro.it" does not.
Recommendation: Ignore

> +tel.ki
Wikipedia: Listed. Unrelated: "de.ki" is listed as a host for 3LDs; "de.ki" resolves by redirecting to "www.de.ki", which indeed registers subdomains.
Registrar: I get redirected to "nic.mu" -- the NIC for a completely unrelated country (Mauritius). Weird.
Other: No sites found for "tel.ki" in a Google search
Recommendation: Add "de.ki"?

> +info.ms
> +pro.ms
> +web.ms
Wikipedia: Not listed
Registrar: Not listed
Other: "info.ms" has many 3LDs listed in Google. However, I was not able to actually open any sites. Google did cache them, though, and most seemed to actually exist somewhere else (the .ms sites were redirects); those that didn't redirect were all from a single domain parking spammer. "pro.ms" and "web.ms" resolve directly. "pro.ms" has no sites found in Google; "web.ms" seems to be populated solely by a single spammer (lots of pages about iPods).
Recommendation: Ignore

> +museum.mw
Wikipedia: Listed
Registrar: Listed
Other: Nothing found in Google
Recommendation: Add, since the registrar is clear.

> +web.na
Wikipedia: Not listed
Registrar: Not listed/unreachable
Other: A few 3LDs in Google
Recommendation: Ignore

> +gouv.nc
Wikipedia: Not listed
Registrar: Not listed. Unrelated: "asso.nc" is listed (in significant detail).
Other: "gouv.nc" resolves by redirecting. A number of 3LDs on "asso.nc", some on "gouv.nc".
Recommendation: Add "asso.nc", and update Wikipedia.

> +web.nl
Wikipedia: Not listed
Registrar: Not listed
Other: "web.nl" resolves by redirecting to "www.web.nl"; all 3LDs on Google look to be domain parking spam from a single source
Recommendation: Ignore

> +lodz.pl
> +siedlce.pl
> +torun.pl
Wikipedia: "lodz.pl" and "torun.pl" are listed.
Registrar: "siedlce.pl" is listed.
Other: "lodz.pl" resolves directly. All three seem to have 3LDs in Google.
Recommendation: Add "siedlce.pl".

> +alumni.pr
> +exalumnos.pr
Wikipedia: Not listed
Registrar: Not listed
Other: "exalumnos.pr" has a single result in Google, "demo.exalumnos.pr", whose content is boilerplate.
Recommendation: Ignore

> +ghabarovsk.ru
Wikipedia: Not listed
Registrar: Not listed
Other: I think this is a typo for "khabarovsk.ru".
Recommendation: Ignore

> +idn.sg
Wikipedia: Listed as "on trial from 4 July 2005 - 3 January 2006"
Registrar: Not listed
Other: No sites found in Google
Recommendation: Ignore

> +gouv.sn
Wikipedia: Not listed
Registrar: Listed alongside "univ.sn", "edu.sn", "org.sn", "art.sn", "com.sn", "perso.sn"
Other: All these have results -- if few -- on Google, and none resolve directly
Recommendation: Add all, and update Wikipedia

> +embaixada.st
Wikipedia: Listed
Registrar: Listed
Other: No 3LDs found in Google
Recommendation: Add

> +so.at.tc
> +com.au.tc
> +pro.tc
> +shop.tc
> +co.uk.tc
> +me.uk.tc
> +org.uk.tc
Wikipedia: Not listed
Registrar: No documentation?
Other: Other than "so.at.tc", all seem to have 3LDs on Google.
Recommendation: Ignore?

> +co.at.tf
Wikipedia: Not listed
Registrar: None available
Other: Seems to host some 3LDs in Google, but few are reachable
Recommendation: Ignore

> +aero.tj
> +coop.tj
> +dyn.tj
> +info.tj
> +museum.tj
Wikipedia: Listed
Registrar: No rules given, but another site has http://www.domain.tj/Docs/rules.pdf , which does not list these.
Other: None has any results in Google.
Recommendation: Ignore (and remove from Wikipedia?)

> +nom.tm
Wikipedia: Not listed
Registrar: Not listed
Other: I think this is a typo for "com.tm", which is mentioned in passing in some registrar docs (but has no 3LDs in Google).
Recommendation: Ignore

> +mrt.tn
Wikipedia: Not listed
Registrar: Not listed
Other: I think this is a typo for "rnrt.tn".
Recommendation: Ignore

> +benedettoxvi.va
> +denedictumxvi.va
> +mailservice.va
> +osservatoreromano.va
> +ossrom.va
> +pcf.va
> +photo.va
> +sistinechapel.va
> +vaticanstate.va
> +vatican.va
> +vatiradio.va
Wikipedia: "benedettoxvi.va", "osservatoreromano.va", "sistinechapel.va" are listed as websites, rather than 2LDs. "denedictumxvi.va" is a typo for "benedictumxvi.va", which is also listed as a website. Others are generally found as parent domains of hosts listed.
Registrar: None available
Other: The entries here which turn up on Google appear to be hostnames.
Recommendation: Ignore
For .tj, I have more info. The previously-mentioned http://www.domain.tj/Docs/rules.pdf also explicitly lists "nic.tj", "test.tj", and a number of geographical names. The first two of these give "reserved name" when tested at nic.tj, and the geographical names a mixture of "registered" and "geographic name". I have also found http://www.get.tj/info/?lang=en which can be read as listing the subdomains above, as well as "my.tj", "per.tj", and "pro.tj". When tested at nic.tj these give the response "public zone". Another reading of this document is that it's simply listing existing ICANN TLDs and saying none can be registered as 2LDs. This document also mentions a few geographic names in passing which have a wide variety of results when tested at nic.tj. I don't know what to make of this mess. I am inclined to add "nic.tj" and "test.tj" since those are much more clearly listed than any of these others. I have updated Wikipedia based on these findings.
Created attachment 413467
patch v1

This implements my recommendations above, basically. Didn't add ngo.cn as it is unrelated to hosting. Didn't add jobs.cz, same reason. Did add most other 3LD host sites.
Hi Peter,

I've been going through these. It's a lot of work :-) Thanks for taking it on. BTW, I didn't know you were at Harvey Mudd. Did you know Jesse Ruderman, Mozilla security guy, was also there?

I think we should have a policy which says that registry documentation is to be preferred over practical investigation. If the registry lists something, we add it. If they provide what appears to be a full list which doesn't contain something we are asking about, we shouldn't add it. Only if there is no documentation and no communication should we resort to guessing based on investigation and/or Wikipedia. And we should add only when the 2LD doesn't resolve and when there are clearly multiple different entities owning the 3LDs.

On top of that, we should err on the side of not adding rather than adding in unclear cases. Not adding something that should be added means a possible cookie data leak if there's a bad actor involved. Adding something which shouldn't be added means someone's website breaking in 25%+ of the browsers in the world.

Does this sound sane? Is it what you did? :-)

Gerv
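The policy proposed in this comment can be written down as a small decision function. This is a sketch of one reading of the proposal, not an agreed procedure: the `Evidence` fields and the `should_add` name are hypothetical, and treating an incomplete registry list the same as missing documentation is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Evidence:
    # True/False if the registry documentation was found; None if there is none.
    registry_lists_it: Optional[bool]
    registry_list_looks_complete: bool = False
    resolves_directly: bool = False
    distinct_owners_of_3lds: bool = False


def should_add(e: Evidence) -> bool:
    """Apply the proposed ordering: registry documentation first, investigation last."""
    if e.registry_lists_it is True:
        return True
    if e.registry_lists_it is False and e.registry_list_looks_complete:
        return False
    # Assumption: with no documentation, or only a partial list, fall back to
    # investigation and add only when both practical signals are present.
    return (not e.resolves_directly) and e.distinct_owners_of_3lds


# Illustrative inputs only; they do not restate findings for any real suffix.
print(should_add(Evidence(registry_lists_it=True)))
print(should_add(Evidence(registry_lists_it=None, resolves_directly=True,
                          distinct_owners_of_3lds=True)))
```

Because the final branch requires both signals, unclear cases default to not adding, which matches the "err on the side of not adding" part of the proposal.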
(In reply to comment #9)
> BTW, I didn't know you were at Harvey Mudd. Did you know Jesse Ruderman,
> Mozilla security guy, was also there?

Yes.

> I think we should have a policy which says that registry documentation is to be
> preferred over practical investigation.

I very much disagree. Most registries are unclear and/or inaccurate. We should match reality. This is simply another case of web browsers having to prefer the real world over someone's specification of it.

> On top of that, we should err on the side of not adding rather than adding in
> unclear cases. Not adding something that should be added means a possible
> cookie data leak if there's a bad actor involved. Adding something which
> shouldn't be added means someone's website breaking in 25%+ of the browsers in
> the world.

Not adding something that should be added means you can't even visit the site easily in Chrome. Adding something that shouldn't be added is extremely unlikely to cause any problems given that we're only doing it in cases where there is good evidence that separate actors own the various subdomains.

I don't view either case as desirable. We should not err on one side or the other; we should strive not to err. That is what I have done above.
(In reply to comment #10)
> I very much disagree. Most registries are unclear and/or inaccurate. We
> should match reality. This is simply another case of web browsers having to
> prefer the real world over someone's specification of it.

But if we pick the specification, the blame for any problems lies unambiguously with the registry. If we try and hit the real world and fail in some way, and someone's site breaks, it's our fault.

Gerv
Comment on attachment 413467
patch v1

r=gerv, with the exception of de.ki. We have a policy of not adding pseudo-registries (e.g. uk.net, iki.fi etc.) without their explicit permission, because we want them to know what they are asking for.

Gerv
(In reply to comment #12)
> (From update of attachment 413467)
> r=gerv, with the exception of de.ki. We have a policy of not adding
> pseudo-registries (eg uk.net, iki.fi etc.) without their explicit permission,
> because we want them to know what they are asking for.

I contacted the folks at de.ki. They replied to me by personal email saying "we think that including our domains is the right thing". However, they are apparently doing this not just for de.ki, but for a total of 1436 "pseudo-TLD+1s", which they sent me a list of.

I'm a bit unsure of whether we want to include all these on the main list. I can't really think of a reason not to other than "there's really a lot" and that it increases the chance of something being wrong if these folks lose control of these domains someday.

I will forward the email to you directly so you can take a look.
Wow... That probably requires a bit of thinking about! Let's get this patch in, and deal with that in a separate bug. Gerv
Created attachment 420123
patch v2

This is the same patch without de.ki. Carrying over gerv's r+.
Filed bug 537975 about the large list of sites.
roc: thank you :-) For about three consecutive mornings (UK time) I looked at the tree with an eye to checking this in, but there was orange every time :-( Gerv
Comment on attachment 420123
patch v2

Approved for 1.9.1.9 and 1.9.0.19, a=dveditz for release-drivers
1.9.1: http://hg.mozilla.org/releases/mozilla-1.9.1/rev/957fb1286630

1.9.0:
Checking in netwerk/dns/src/effective_tld_names.dat;
/cvsroot/mozilla/netwerk/dns/src/effective_tld_names.dat,v  <--  effective_tld_names.dat
new revision: 1.16; previous revision: 1.15
done

Gerv
Comment on attachment 420123
patch v2

Approved for 1.9.2.2, a=dveditz for release-drivers
verified on
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/957fb1286630
http://hg.mozilla.org/releases/mozilla-1.9.2/rev/eb79100feb19

using Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2.2) Gecko/20100316 Firefox/3.6.2 (.NET CLR 3.5.30729)
Raymond, you verified this for Firefox 3.5.9 using the candidate build? I don't see any mention of it above but you added that verified1.9.1 keyword.