User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9b5) Gecko/2008032619 Firefox/3.0b5
Build Identifier: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9b5) Gecko/2008032619 Firefox/3.0b5
On my site, https://www.warsaw.k12.in.us, clicking the favicon area to view the SSL information reports the site as k12.in.us, when it should go at least one subdomain further and show warsaw.k12.in.us.
Reproducible: Always
Steps to Reproduce:
1. Go to https://www.warsaw.k12.in.us
2. View the SSL information on the favicon.
3. It shows as a cert for k12.in.us
Actual Results: Cert shows as k12.in.us when it should at least show as warsaw.k12.in.us
Expected Results: Should at least show as warsaw.k12.in.us, but www.warsaw.k12.in.us would be better.
Confirming. k12.(state code).us should be considered a TLD for purposes of site identification. Seen also with https://surveys.hudsonville.k12.mi.us/
There are other subdomains within <statecode>.us that should also be considered TLDs for purposes of identification. They are listed in section 3.3 of RFC 1386.
Gerv manages the suffix list - cc'd.
Currently, the PSL lists 50 xx.us state prefixes, then has a comment:
// The registrar notes several more specific domains available in each state,
// such as state.*.us, dst.*.us, etc., but resolution of these is somewhat
// haphazard; in some states these domains resolve as addresses, while in others
// only subdomains are avilable, or even nothing at all.
http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/src/effective_tld_names.dat?raw=1
But it's good to know there's an RFC. However, for our purposes, it basically says "you can't make a list like the one you want". We could make things better by adding the ones we can add:
.STATE.<state>.US
.K12.<state>.US
.CC.<state>.US
.LIB.<state>.US
.FED.<state>.US
CCing pkasting, because the PSL actually affects navigability in Chrome.
Gerv
I originally wanted to add patterns like k12.*.us to save a few hundred rules, but that pattern isn't actually possible (they're not regular expressions).
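To make the wildcard point and the reported symptom concrete, here is a rough sketch of suffix matching. It is not the Gecko implementation: it assumes a rule is either an exact name or has `*` only as its leftmost label, it ignores exception rules, and the function names and rule sets are made up for illustration.

```python
def public_suffix(hostname, rules):
    """Return the longest public suffix of hostname under the given rules.

    Simplified sketch: supports exact rules and a leading '*' label only.
    Exception rules ('!') are not handled.
    """
    labels = hostname.lower().rstrip(".").split(".")
    best = labels[-1:]  # fallback: treat the last label as the suffix
    for start in range(len(labels)):
        candidate = labels[start:]
        exact = ".".join(candidate)
        # A wildcard rule replaces only the leftmost label of the candidate.
        wildcard = ".".join(["*"] + candidate[1:])
        if (exact in rules or wildcard in rules) and len(candidate) > len(best):
            best = candidate
    return ".".join(best)


def registrable_domain(hostname, rules):
    """Public suffix plus one more label, or None if hostname is itself a suffix."""
    labels = hostname.lower().rstrip(".").split(".")
    suffix_len = len(public_suffix(hostname, rules).split("."))
    if len(labels) <= suffix_len:
        return None
    return ".".join(labels[-(suffix_len + 1):])


# Hypothetical rule sets: the state-level rule only, then with an explicit k12 rule.
rules_before = {"us", "in.us"}
rules_after = {"us", "in.us", "k12.in.us"}
# A rule with '*' in the middle is never matched by this sketch.
rules_mid_wildcard = {"us", "k12.*.us"}

host = "www.warsaw.k12.in.us"
before = registrable_domain(host, rules_before)
after = registrable_domain(host, rules_after)
mid = registrable_domain(host, rules_mid_wildcard)
```

Under these assumptions, `before` is `k12.in.us` (the behaviour reported in this bug), `after` is `warsaw.k12.in.us`, and the mid-label wildcard set falls back to `in.us`, which is why one explicit rule per state is needed.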
The registrars have noted hundreds of variant domains that are reserved for U.S. addresses, but we've rarely encountered their usage in the wild. In other parts of the PSL, we've mostly followed a pattern of only noting real eTLDs, and ignoring ones that registrars claim are "reserved for future use" or "reserved" without subdomains actually available in them. Otherwise we'd need tens of thousands more entries (see Italy for a pathological example: their registrar has a multi-hundred-page PDF of reserved names). I don't have a problem with adding more specificity for U.S. domains where there's actual usage. This isn't urgent for Chrome navigability, as the critical bit there is whether a string has a TLD _at all_, whereas this is simply an issue of how granular the TLD is.
Some of these are clearly more used than others. k12.*.us definitely is. I'm wrong about FED.<state>.US in comment 5, but let's add the 200 rules for the other four. That'll cover a lot of what's out there. Gerv
OK, partial change of mind - let's not add state.*.us, because all those sites are, in a sense, owned by the same entity (the state government) and so they might want to share cookies. But I've added the other three. Gerv
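For anyone who wants to see the shape of such a change, here is a minimal sketch of expanding the three prefixes into explicit per-state rules. The helper name and the two-state sample list are hypothetical, and this is not the script (if any) that was used to produce the actual patch.

```python
def us_state_rules(state_codes, prefixes=("k12", "cc", "lib")):
    """Build explicit PSL-style rules such as 'k12.in.us', one per prefix and state."""
    rules = []
    for state in sorted(state_codes):
        for prefix in prefixes:
            rules.append(f"{prefix}.{state.lower()}.us")
    return rules


# Illustrative sample only; the real list would cover every state code in the PSL.
SAMPLE_STATES = ["in", "mi"]
sample_rules = us_state_rules(SAMPLE_STATES)
```

With 50 state codes this yields 150 rules. state.<state>.us is deliberately left out of the default prefixes, so sites under it keep sharing a registrable domain.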
Note for those CC'd here, this broke *.k12.hi.us -- see bug 614565
and bug 614565 currently proposes to back out this fix entirely, not just the .HI.us part