Bug 687165 (Closed) — Opened 14 years ago, closed 13 years ago

Pseudo-TLDs causing problems for Chrome navigations

Component: Core Graveyard :: Networking: Domain Lists (defect)
Hardware: x86_64 / All
Priority: Not set
Severity: normal

Tracking: Not tracked

RESOLVED FIXED

People

(Reporter: zerodpx, Assigned: zerodpx)


Details

The eTLD list currently contains pseudo-TLDs for at least CentralNic (ar.com, br.com, etc.), as well as operaunite.com and appspot.com. In all these cases, the "TLD" itself is actually a navigable address.

This causes problems for Chrome's omnibox, whose heuristics preclude navigation directly to TLDs unless the user overrides by explicitly typing a scheme or similar. This rule exists to prevent short strings like "co" from being treated as broken navigations instead of searches. Unfortunately, in these cases it prevents navigation to valid hostnames. See http://code.google.com/p/chromium/issues/detail?id=96086 . At least Chrome needs a way to solve this.

There are a couple of options I can think of:

(1) Remove these entries from the eTLD list. This is consistent with some other cases where I've elected to not add "TLDs" from various registries that allow navigation directly to those TLDs. The downside is that this lowers the cookie separation for such sites, e.g. "foo.appspot.com" will be allowed to set cookies on appspot.com. (I assume this downside is also present in Gecko.)

(2) Add new syntax to distinguish "TLDs" that, when taken alone, should be parsed ignoring the rules that mark them as TLDs. e.g. "=x.y.z" could be used to mean that "foo.x.y.z" has an eTLD of x.y.z, but x.y.z itself should be parsed by looking at any relevant rules for "y.z" and so forth, just like "foo.y.z" would be. The downside here is code complexity, and perhaps this might cause caller confusion if code is not prepared to deal with a suffix of one hostname having a different TLD.

Thoughts? Unless we can fix this relatively quickly, I'd probably patch the downstream Chrome copy of the eTLD list to implement #1, but I don't want a long-term fork.
Peter - pardon my question, but does eTLD list = public suffix list?

There was an event recently where a fairly "fast and loose" registration policy (if that) for third-party subdomains under a ccTLD (I am specifically naming co.cc in this case) created an attractive environment for some badware content. The result was that Google clamped the TLD from the listings.

Not all subdomain providers are bad actors. Many are project domains for open source, like those of DynDNS or appspot or operaunite. Others, such as CentralNic, are commercial providers who are very active participants and are 'good actors'. Whatever we do here, let's be careful not to harm good-actor entities like CentralNic, who have hundreds of thousands or perhaps millions of registrants who might be adversely impacted.

-Jothan
Yes, eTLD list = PSL.

I don't think the co.cc case is relevant, because if I'm thinking of the same thing you are, this involved a mass-delisting of sites from Google search, which has nothing to do with the PSL (or Gecko or Chrome etc.). If I'm mistaken, let me know.

I don't consider these cases to be "bad actors" and I'm not trying to punish anyone. But we should be concrete about what the purpose of the PSL is and what presence on or removal from the list will actually do. In Chrome, the most obvious effect is that address bar navigation is impacted -- in this case negatively (hence the bug). I think the other main effect is on cookie handling, where I believe both Gecko and Chrome check the list to determine whether a site is attempting to set a cookie on an eTLD.

Given the above, removing a pseudo-TLD from this list is not the end of the world. As I noted, it probably means "foo.appspot.com" can set (and read) cookies for "appspot.com" -- suboptimal, but par for the course for browsers for most of history. Of course I would rather make fixes with no downsides, which is why I tried to think of something like my proposal #2.
The fix for bug 531758 has added a ton of DynDNS domains that all fall into this category of "pseudo-TLDs that are actually directly navigable". I'm currently blocking these from going into Chrome's copy of the PSL, but this effectively means we're forked, which is not the situation I want.

Can I get any agreement about adding a new symbol, as in comment 0 proposal (2), so that we can mark these "TLDs" as navigable?
Interesting... I would lean towards option 2 of comment 0 if we did it, but I have some other thoughts that might let us 'plus this up' a bit.

Alternatively, what about leaving the existing format intact and using space-delimited flags after the entry? I think once we do what you're saying, we will come up with other use cases that could be incorporated in the PSL/eTLD list. I am assuming that derivative parsers might ignore _stuff_ beyond the right-hand side of the current uncommented PSL entry, so we could use args without breaking stuff for consumers of the list.

So we'd have a 'valid by itself' flag, and the entries could look like this (using your example from #2 in comment 0), using 'v', and identifying sub-domaining of an entry with '*':

=z.z v
=y.z v
=x.y.z v *

This would mean that z.z, y.z and x.y.z are valid by themselves, and that sub-domains would/could also be valid under x.y.z.

I am willing to help contribute towards this change, but 1) would want to engage this community about its benefits and impacts, 2) would want to ensure it doesn't break anything, and 3) would want input on other beneficial flagging that could be introduced from some of the non-cookie/non-search consumers / benefactors of the PSL.
(In reply to Jothan Frakes from comment #4)
> I am assuming that derivative parsers might ignore _stuff_ beyond the right
> hand side of the current uncommented psl entry and we could use args without
> breaking stuff for consumers of the list.

That would surprise me. It would only be the case if parsers implemented their tokenization as "read until you hit whitespace, but then ignore everything until after a newline". That wouldn't be the algorithm I'd write -- I'd either read in each token delimited by whitespace, or else read in each line delimited by newlines. Either method would result in changes like the ones you propose breaking parsers.

(I am aware in all this that adding a '=' symbol in the first place is also a breaking change, so it's not like making other breaking changes at the same time is really a huge problem.)

> =z.z v
> =y.z v
> =x.y.z v *
>
> would infer that z.z, y.z and x.y.z are valid by themselves, and that
> sub-domains would/could be valid also under x.y.z

I don't understand what you're proposing.

* Are the '=' and the 'v' symbolizing the same concept? If not, how are they distinct?
* What, precisely, is the difference between the '*' case and the non-'*' case?

In your example above, what are the correct TLDs for the following?

a.a.z.z
a.a.x.y.z
a.x.y.z
x.y.z
a.y.z
y.z

For reference, if you omitted the 'v' and '*' symbols, my proposal would answer as follows:

a.a.z.z   -> .z.z
a.a.x.y.z -> .x.y.z
a.x.y.z   -> .x.y.z
x.y.z     -> .y.z (both the "=x.y.z" rule and the "=y.z" rule would give this)
a.y.z     -> .y.z
y.z       -> .z

Without the '=' sign, the existing algorithm today would answer as follows:

a.a.z.z   -> .z.z
a.a.x.y.z -> .x.y.z
a.x.y.z   -> .x.y.z
x.y.z     -> .y.z (from the "y.z" rule)
a.y.z     -> .y.z
y.z       -> No TLD
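[Editor's note: the proposed '=' semantics above can be sketched as a small lookup function. This is a hypothetical illustration of proposal (2), not any shipping implementation; the function name, the `navigable` flag, and the simplified default rule are all assumptions.]

```python
def public_suffix(host, rules):
    """Sketch of the proposed '=' semantics.

    `rules` maps each suffix to True if it is an '='-style rule
    (navigable by itself).  A rule matches when it is a proper suffix
    of the host; an ordinary rule also matches the host exactly, but
    an '=' rule is skipped in the exact-match case, so lookup falls
    through to shorter rules (or the default rule).
    """
    matches = [rule for rule, navigable in rules.items()
               if host.endswith("." + rule)
               or (host == rule and not navigable)]
    if matches:
        # Longest matching rule (most labels) wins.
        return max(matches, key=lambda r: r.count("."))
    # Default rule: an unlisted TLD is the rightmost label.
    return host.rsplit(".", 1)[-1]

RULES = {"z.z": True, "y.z": True, "x.y.z": True}  # all '=' rules

public_suffix("x.y.z", RULES)  # -> "y.z" (the exact "=x.y.z" rule is skipped)
public_suffix("y.z", RULES)    # -> "z"   (no proper-suffix rule; default applies)
```

This reproduces the answer table given for the '=' proposal: the exact-match skip is the only difference from a plain longest-match lookup.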
I don't fully understand why Chrome prohibits navigation to an eTLD. Is presence in the eTLD list just being used as a signal that the user wants a search, relying on the assumption that if it's a bare eTLD it can't be a navigation?

Personally, I would lean toward including navigable eTLDs in the list, on the grounds that the potential for harm from a malicious site setting a supercookie outweighs the need to type a scheme to navigate to them. But I'm not adamant about that, and I recognize that needing the scheme is completely undiscoverable. In any case, actually fixing the problem is better still.

Only reading each line up to the first whitespace is explicitly part of the PSL spec (http://publicsuffix.org/list/). Of course, parsers may have ignored that. But given that, we can make a nominally nonbreaking change by putting the "=" at the end of the rule instead.
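[Editor's note: the "read up to the first whitespace" point can be illustrated with a minimal parser sketch. This is a simplified reading of the published format, not a full PSL parser; the function name and the trailing "v" flag are hypothetical.]

```python
def parse_rules(text):
    """Read PSL-style text: skip blank lines and '//' comment lines;
    each remaining line's rule is everything before the first
    whitespace, so any trailing annotation is invisible to a
    spec-compliant parser."""
    rules = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("//"):
            continue
        rules.append(stripped.split()[0])
    return rules

parse_rules("// comment\ncom\nappspot.com v\n")  # -> ["com", "appspot.com"]
```

Under this reading, a space-delimited flag after the rule would indeed be a nominally nonbreaking addition, as suggested above.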
(In reply to Pam Greene from comment #6)
> I don't fully understand why Chrome prohibits navigation to an eTLD. Is
> presence in the eTLD list just being used as a signal that the user wants a
> search, relying on the assumption that if it's a bare eTLD it can't be a
> navigation?

Chrome uses the validity of the user's input as a URL as an input to its heuristics that determine whether to search or navigate. Before the inclusion of any "pseudo-TLDs" (my term), all TLDs named as rules could legitimately be assumed to be non-navigable. For example, given the rules "bar" and "foo.bar", we could assume that "foo.bar" was not a hostname with a TLD of ".bar" (under the "bar" rule) because it was itself a TLD -- and worldwide, bare TLDs didn't resolve. While this assumption isn't explicitly listed in Chrome's registry_controlled_domain.h file, Chrome's code does indeed make this assumption.

This implies that one fix would be to simply remove this assumption and allow "foo.bar" in the above case to be reported as having TLD ".bar". The disadvantage of this is that it doesn't match reality very well -- it would result in e.g. treating "co.uk" as a valid host with TLD ".uk", which seems wrong. Since only the "pseudo-TLDs" seem to violate the above assumption, it seems reasonable to me to document the assumption explicitly and then also qualify these cases as exceptions.

> Only reading each line up to the first whitespace is explicitly part of the
> PSL spec (http://publicsuffix.org/list/).

Thanks for that link; I didn't know this. I notice that the documentation there is less detailed than the Chrome documentation. Do you know who owns this? I think it would be good to update the documentation there to be as thorough as possible -- e.g. by documenting details like "A wildcard rule implies a rule for its base, so *.foo implies an additional rule foo" (without which, exception rules don't actually make a lot of sense).
> given that, we can make a nominally nonbreaking change by
> putting the "=" at the end of the rule instead.

That still breaks the spec, in that really we want parsers to read this token lest they get the wrong result. Also, the other symbols we use ('*' and '!') both occur at the beginning of a rule. I think for consistency and ease of parsing it would be better to go ahead and add any symbols we need at the beginning of the rule rather than the end.
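[Editor's note: the search-vs-navigate heuristic described in this comment reduces to something like the sketch below. This is a drastic simplification of Chrome's actual omnibox logic; the navigable-flag dictionary is the proposed extension and does not exist in any real list today.]

```python
# True marks a suffix that is navigable by itself (the proposed '=' rules).
SUFFIXES = {"co.uk": False, "uk": False, "com": False, "appspot.com": True}

def omnibox_action(text):
    """If the input is exactly a listed suffix that is NOT marked
    navigable, assume the user wants a search; otherwise fall through
    to the normal URL heuristics (collapsed here to 'navigate')."""
    host = text.lower().rstrip("/")
    if host in SUFFIXES and not SUFFIXES[host]:
        return "search"
    return "navigate"

omnibox_action("co.uk")        # -> "search"   (bare non-navigable eTLD)
omnibox_action("appspot.com")  # -> "navigate" (marked navigable)
```

With the flag, the pseudo-TLDs stop being swallowed by the bare-eTLD rule while "co.uk" keeps its current behavior.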
(In reply to Peter Kasting from comment #7)
> Before the inclusion of any "pseudo-TLDs" (my term), all TLDs named as rules
> could legitimately be assumed to be non-navigable.

OK, so it's assuming that a TLD can't be a valid URL. Which used to be correct, and is still useful the great majority of the time, so we shouldn't ditch it. Are there enough interested parties here to collectively agree on how to annotate the file, or does this discussion need to move to some mailing list?

> > (http://publicsuffix.org/list/).
>
> I notice that the documentation there is less detailed than the Chrome
> documentation. Do you know who owns this? I think it would be good to
> update the documentation there to be as thorough as possible -- e.g. by
> documenting details like "A wildcard rule implies a rule for its base, so
> *.foo implies an additional rule foo" (without which, exception rules don't
> actually make a lot of sense).

I wrote the original on Mozilla's wiki ages ago. I've written to find out who owns it on publicsuffix.org. I agree it would be great to make the documentation as thorough as we can without contradicting any existing implementations.

> > given that, we can make a nominally nonbreaking change by
> > putting the "=" at the end of the rule instead.
>
> That still breaks the spec in that really we want parsers to read this token
> lest they get the wrong result.

How do we define "wrong"? It's accurate that the result returned is an eTLD. It's just also navigable. For purposes of cookie restrictions, it's probably better to consider it a TLD. For UI sorting, probably likewise, although it depends on the specifics.
Really, though, I'm just very reluctant to make a non-backwards-compatible change to a file that might be used by any number of people without our knowledge.
(I'm slightly surprised not to have been CCed on this bug before now...)

(In reply to Peter Kasting from comment #0)
> The eTLD list currently contains pseudo-TLDs for at least CentralNic
> (ar.com, br.com, etc.), as well as operaunite.com and appspot.com.
>
> In all these cases, the "TLD" itself is actually a navigable address.

Before getting further into the discussion, can I challenge an assumption? Why are we assuming that the correct behaviour is to navigate to such addresses? When owners of such domains ask to be put on the PSL, they need to be aware of the consequences - of which non-navigability is or could be one.

On a related note, does Chrome yet have a policy on what you plan to do when/if brand owners start buying TLDs like ".brand" and add an A record directly for them? That means that single words could be valid public hostnames. We are currently working through this issue ourselves. This is related to your assumption about "bare TLDs don't resolve" that you mention in comment 7.

> (1) Remove these entries from the eTLD list. This is consistent with some
> other cases where I've elected to not add "TLDs" from various registries
> that allow navigation directly to those TLDs. The downside is that this
> lowers the cookie separation for such sites, e.g. "foo.appspot.com" will be
> allowed to set cookies on appspot.com. (I assume this downside is also
> present in Gecko.)

Indeed. Removing appspot.com from the list, at least for cookie purposes, would be a significant change to their security model. At the very least, we would need to give them good warning. Anyway, I still think the original reason for adding them is sound.

> (2) Add new syntax to distinguish "TLDs" that, when taken alone, should be
> parsed ignoring the rules that mark them as TLDs. e.g. "=x.y.z" could be
> used to mean that "foo.x.y.z" has an eTLD of x.y.z, but x.y.z itself should
> be parsed by looking at any relevant rules for "y.z" and so forth, just like
> "foo.y.z" would be. The downside here is code complexity, and perhaps this
> might cause caller confusion if code is not prepared to deal with a suffix
> of one hostname having a different TLD.

The problem with this is that I suspect that some implementations may assume that getPublicSuffix(thingThatIsAPublicSuffix) always returns itself.

May I make a suggestion (3)? If the thing on the PSL has an embedded dot (i.e. is "co.uk" rather than just "co" or "uk"), get Chrome to do a DNS lookup to see if there's a website there before doing a search.

W.r.t. the spec: I have checkin rights to the publicsuffix.org site, so I can add improvements to the pages. If you would like to make a patch, you can get the code here:

http://viewvc.svn.mozilla.org/vc/projects/publicsuffix.org/trunk/

svn co svn://svn.mozilla.org/projects/publicsuffix.org/trunk publicsuffix.org

Gerv
(In reply to Pam Greene from comment #8)
> For purposes of cookie restrictions, it's probably
> better to consider it a TLD.

There are two cases, though. One is when "foo.appspot.com" tries to set cookies and one is when "appspot.com" itself tries to set cookies. The former should not be able to set cookies for appspot.com, but the latter should. So navigability is not the only issue here. The issue is that, at the core, a rule like "=appspot.com" needs to say that ".appspot.com" is a TLD but "appspot.com" should ignore this rule and continue searching (to be caught by the "com" rule), for all users of the TLD service, not just for navigations.

> For UI sorting, probably likewise, although it
> depends on the specifics.

I'm not sure what "UI sorting" refers to.

(In reply to Gervase Markham [:gerv] from comment #9)
> (I'm slightly surprised not to have been CCed on this bug before now...)

Sorry, I CCed bugzilla@gerv.net because that's what the Bugzilla autocompleter told me to put. I don't know what maintains that list.

> Before getting further into the discussion, can I challenge an assumption?
> Why are we assuming that the correct behaviour is to navigate to such
> addresses? When owners of such domains ask to be put on the PSL, they need
> to be aware of the consequences - of which non-navigability is or could be
> one.

In at least the appspot.com case, it's Google folks who contacted me to tell me there was a problem. I hope my comments above to Pam demonstrate why we can't just say "adding to this list may make these non-navigable" and walk away. If that isn't sufficient, I would also note that (a) Firefox doesn't determine navigability based on this like Chrome does, so users may have different expectations, and (b) since these really are navigable it seems kind of wrong to me to end up in a scenario where they "aren't".

> On a related note, does Chrome yet have a policy on what you plan to do
> when/if brand owners start buying TLDs like ".brand" and add an A record
> directly for them? That means that single words could be valid public
> hostnames. We are currently working through this issue ourselves. This is
> related to your assumption about "bare TLDs don't resolve" that you mention
> in comment 7.

If I could, I would simply ban this practice. I don't think it's appropriate for any true TLDs to actually be navigable in and of themselves.

What will actually happen is that Chrome will do a search, but will in the background also attempt to resolve the hostname, and if it can make an HTTP HEAD connection, will show the user a "did you mean http://foo?" infobar, where clicking the link will mean future attempts will navigate rather than search. This is because we have to handle this case for intranet hostnames today. I don't see a way to write the code in Chrome to ban the A record case but still work correctly on intranets, or I would.

> > (2) Add new syntax to distinguish "TLDs" that, when taken alone, should be
> > parsed ignoring the rules that mark them as TLDs. e.g. "=x.y.z" could be
> > used to mean that "foo.x.y.z" has an eTLD of x.y.z, but x.y.z itself should
> > be parsed by looking at any relevant rules for "y.z" and so forth, just like
> > "foo.y.z" would be. The downside here is code complexity, and perhaps this
> > might cause caller confusion if code is not prepared to deal with a suffix
> > of one hostname having a different TLD.
>
> The problem with this is that I suspect that some implementations may assume
> that getPublicSuffix(thingThatIsAPublicSuffix) always returns itself.

I think you're speaking in terms that are too implementation-specific to reason about. The current list docs say nothing about this, and in Chrome, for example, if you ask for the eTLD of an eTLD, you don't get back your input; you get back a flag that means "invalid hostname, doesn't have an eTLD".

> May I make a suggestion (3)? If the thing on the PSL has an embedded dot (ie
> is "co.uk" rather than just "co" or "uk"), get Chrome to do a DNS lookup to
> see if there's a website there before doing a search.

Not feasible, for various reasons. We guarantee that omnibox actions in Chrome are determined synchronously to avoid race conditions with user input, and it's impossible to do a synchronous DNS resolution. There are also problems where some users' DNS systems will take ridiculous amounts of time to fail, and we don't want users to have search queries that sit for 30 seconds or more before loading, even if those cases are unusual.
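[Editor's note: the two appspot.com cookie cases described above can be made concrete with a sketch. This is a hypothetical illustration of the proposed "=appspot.com" semantics, not Gecko's or Chrome's actual cookie code; the helper names and toy rule set are assumptions.]

```python
RULES = {"com": False, "appspot.com": True}  # True marks an '=' rule

def public_suffix(host):
    """Longest-match lookup under the proposed semantics: an '=' rule
    never matches the host exactly, only as a proper suffix."""
    matches = [r for r, navigable in RULES.items()
               if host.endswith("." + r)
               or (host == r and not navigable)]
    if matches:
        return max(matches, key=lambda r: r.count("."))
    return host.rsplit(".", 1)[-1]

def can_set_cookie(host, cookie_domain):
    """A host may set a cookie for itself or a parent domain, but
    never for its own public suffix."""
    if host != cookie_domain and not host.endswith("." + cookie_domain):
        return False
    return cookie_domain != public_suffix(host)

can_set_cookie("foo.appspot.com", "appspot.com")  # -> False (its own suffix)
can_set_cookie("appspot.com", "appspot.com")      # -> True  (its suffix is "com")
```

This is exactly the asymmetry the comment asks for: the subdomain is blocked, while appspot.com itself is treated as an ordinary registrable domain under "com".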
(In reply to Peter Kasting from comment #10)
> (In reply to Gervase Markham [:gerv] from comment #9)
> > (I'm slightly surprised not to have been CCed on this bug before now...)
>
> Sorry, I CCed bugzilla@gerv.net because that's what the Bugzilla
> autocompleter told me to put. I don't know what maintains that list.

It's a list of accounts in Bugzilla. That's an old account of mine which is disabled. Disabled accounts shouldn't appear in that list; I've filed bug 698697 about that.

> In at least the appspot.com case, it's Google folks who contacted me to tell
> me there was a problem.

One might respond that then there is a confusion within Google about what they want appspot.com to be, and they need to sort that out amongst themselves :-) You yourself said "I don't think it's appropriate for any true TLDs to actually be navigable in and of themselves." Putting yourself on the TLD list effectively means opting in to being a TLD, with all that this involves. Either they want that or they don't. Get them to decide and let you know :-)

> I hope my comments above to Pam demonstrate why we can't just say "adding to
> this list may make these non-navigable" and walk away. If that isn't
> sufficient, I would also note that (a) Firefox doesn't determine
> navigability based on this like Chrome does, so users may have different
> expectations, and (b) since these really are navigable it seems kind of
> wrong to me to end up in a scenario where they "aren't".

If the UK domain registrars decided to put a website on "co.uk", would you consider it a bug in Chrome that it was non-navigable?

Or, to put the question another way: you said "Before the inclusion of any "pseudo-TLDs" (my term), all TLDs named as rules could legitimately be assumed to be non-navigable." Say we had decided not to include any pseudo-TLDs at all. What would you do when/if that assumption was broken "from the other side" by some "real" TLD becoming navigable?

> What will actually happen is that Chrome will do a search, but will in the
> background also attempt to resolve the hostname, and if it can make an HTTP
> HEAD connection, will show the user a "did you mean http://foo?" infobar,
> where clicking the link will mean future attempts will navigate rather than
> search. This is because we have to handle this case for intranet hostnames
> today.
>
> I don't see a way to write the code in Chrome to ban the A record case but
> still work correctly on intranets, or I would.

Perhaps this process will be enough of a road bump in the user experience to stop brand owners trying it.

> > May I make a suggestion (3)? If the thing on the PSL has an embedded dot (ie
> > is "co.uk" rather than just "co" or "uk"), get Chrome to do a DNS lookup to
> > see if there's a website there before doing a search.
>
> Not feasible for various reasons.

OK. Here's another suggestion: for some time, we've been thinking of breaking out the pseudo-TLDs into their own section of the list, so users who want to exclude them, or treat them differently, can. What if we did that, and then just put a separator comment between the two, which your code could detect? You could then assume that anything after the separator had the equivalent of your "=" sign in front of it.

That would allow you to get what you want without making changes to the list format.

Gerv
I am backtracking on my assertion (that TLDs should in principle be non-navigable) here. I didn't think it through long enough, and on further thought it was flat wrong. Thanks for asking the right questions to make me realize this.

(In reply to Gervase Markham [:gerv] from comment #11)
> If the UK domain registrars decided to put a website on "co.uk", would you
> consider it a bug in Chrome that it was non-navigable?

Yes, and I would propose fixing the PSL similarly to how I'm proposing today.

> Or, to put the question another way: you said "Before the inclusion of any
> "pseudo-TLDs" (my term), all TLDs named as rules could legitimately be
> assumed to be non-navigable." Say we had decided not to include any
> pseudo-TLDs at all. What would you do when/if that assumption was broken
> "from the other side" by some "real" TLD becoming navigable?

I would fix the PSL, because clearly our assumption, which tried to reflect reality, no longer actually reflects reality. Reality wins. To phrase this differently: we used to assume something that I don't think we can safely assume anymore, and I think we need to make what was previously implicit be explicitly specified in the list. We can't just demand that appspot.com or anyone else pick which of two good behaviors they want.

> > I don't see a way to write the code in Chrome to ban the A record case but
> > still work correctly on intranets, or I would.
>
> Perhaps this process will be enough of a road bump in the user experience to
> stop brand owners trying it.

I want to retract my statement. I am still opposed to someone registering a word as a TLD and then allowing it to be navigable, but I was mistaken to claim it was because all TLDs should not be navigable. Rather, my claim is because I don't think it's a good user experience to try to type in an English word to search for it and then instead be navigated (or harassed to navigate). The existing TLD system basically keeps hostnames and search queries looking somewhat distinct. I don't really care if co.uk becomes navigable. I care very much if "food" or "shoes" or even "Nike" becomes a navigation instead of a search, because I think that's confusing. I also think this is more of a problem for Chrome, because it's more important to us that we understand user intent.

All that said, if that's what happens to reality, then that's what we will change to respect, much like HTML specs should say what browsers actually do and not something else.

> OK. Here's another suggestion: for some time, we've been thinking of
> breaking out the pseudo-TLDs into their own section of the list, so uses
> which want to exclude them, or treat them differently, can. What if we did
> that, and then just put a separator comment between the two, which your code
> could detect - you could then assume that anything after the separator had
> the equivalent of your "=" sign in front of it.
>
> That would allow you to get what you want without making changes to the list
> format.

This is technically possible, but I think inferior. I want to know why we're trying so hard to find ways of allowing Chrome to do something here without actually making it explicit in the PSL. I think we should purposefully make this explicit, for two reasons:

(1) I think you're worried about breaking some existing user of the PSL that we don't know about. This to me is an optimization for today versus the infinite future, and worse, one with no calculable value because we don't know who the consumers are, if they exist at all (do you have specific cases?). We should optimize for the ideal long-term case even if it's inconvenient in the short term.

(2) As I said above, we should reflect reality, and that means we should make this distinction clear. It is better for both humans and code to consume a format that's as clear and direct as possible, and I think marking this information on TLDs directly is better than implying it from some other sort of subtle distinction. This reduces confusion.
There are additional concerns here for SSL certificate validity (should certs for *.appspot.com be valid? What about *.com? If the two are distinct how do we know which is OK?). I have asked for someone more knowledgeable about this in Chromium-land to comment here; see http://code.google.com/p/chromium/issues/detail?id=102507#c9 .
In the meantime, my comment on that issue is: The distinction here seems to be between TLDs where one entity controls the servers for all subdomains (a la appspot.com, or perhaps future ICANN TLDs like ".<brandname>"), and TLDs where subdomains are under the control of separate entities (most TLDs). Note that this is not precisely the same set of cases as the navigability cases -- "jpn.com" might be navigable but doesn't have all subdomains under the control of one company, whereas perhaps a hypothetical ".<brandname>" TLD might not be navigable but _does_ have all subdomains under the control of one company. It is not clear to me whether this information belongs in the PSL or not. My instinct is to say "not" and claim this is a second list, much like the Mozilla list of "TLDs that have good policies around script homograph attacks", but I don't know who would maintain such a list.
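[Editor's note: one plausible shape for the certificate question above is a check that rejects any wildcard whose base is a public suffix. This is only a sketch of the question, not an endorsed policy; the toy suffix set is an assumption, and whether "*.appspot.com" should pass is exactly the open issue being discussed.]

```python
PUBLIC_SUFFIXES = {"com", "co.uk", "appspot.com"}

def wildcard_cert_ok(name):
    """Reject a wildcard whose base is a public suffix: '*.com' would
    match every .com host.  Note this also rejects '*.appspot.com',
    which a single operator legitimately controls -- the tension
    described in the comment above."""
    if not name.startswith("*."):
        return True  # non-wildcard names are out of scope here
    return name[2:] not in PUBLIC_SUFFIXES

wildcard_cert_ok("*.com")          # -> False
wildcard_cert_ok("*.example.com")  # -> True
wildcard_cert_ok("*.appspot.com")  # -> False
```

The sketch shows why a second distinction (single-operator vs. multi-operator suffixes) would be needed to get the appspot.com case right.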
The meta-issue here is that the PSL is being used for more and more uses which are outside its original scope, and then people are asking for it to be bent slightly to accommodate their use. Meanwhile, we didn't do anything to keep track of all the users and consumers, and so we can't contact them if we want to make a format change. Perhaps Yngve was right in proposing XML... (aargh!)

SSL certificate validity is definitely a new use, and while we should consider it, I don't think there is a get-it-fixed-this-week rush about it - we've been managing without that additional check for some time now. However, I don't think it's entirely unreasonable to argue that cookie-sharing and SSL validity go hand in hand. We don't want cookies shared across servers whose operators are different entities - and, given that one purpose of SSL is to tell you who you are talking to, arguably you don't want an SSL certificate shared by a number of servers whose operators are different entities either. But then it comes down to how you define "operator". This is deep, and requires careful thinking about.

This growth of uses for the PSL is, I guess, a sign of success :-), but I think that we need to avoid a patch-as-we-go procedure, and instead take a moment to step back and look at all the current use cases and see which ones we want to support, and how. To that end, I've started this wiki page:

https://wiki.mozilla.org/Public_Suffix_List/Use_Cases

Hopefully we can fill it out fairly quickly, both with more use cases and with more edge case examples, and it will help us decide on the way forward.

If we are going to be making breaking changes to the format, it seems that it might be a good time to jump to something more structured - either Yngve's XML, or JSON, or similar. We can then put that at a different URL, and people can migrate as and when they can.

Gerv
Thanks Gerv. Your comments all make sense to me. I especially like the idea of constructing a parallel, more descriptive format, which is more extensible in backwards-compatible ways, because that's something we could potentially do even without being sure we've thought of every possible use case.
Peter: please do add your thoughts to the wiki page.

Incidentally, it looks like Guava are running into the same problem you are with respect to navigability: http://guava-libraries.googlecode.com/svn/trunk/javadoc/com/google/common/net/InternetDomainName.html

The parallel format is important because there are now lots of apps and libraries out there consuming PSL data, sometimes dynamically downloaded: https://github.com/toddsundsted/Public-Suffix-List

Gerv
The current state of https://wiki.mozilla.org/Public_Suffix_List/Use_Cases suggests to me that there are actually only two ways of splitting the list - the top two use cases and the bottom two. pkasting: would you concur? Or have we missed something?

Gerv
All three of the questions I put on that page were designed to highlight ways in which that kind of a split might be wrong/insufficient. I also think if we're planning to use a richer description format for the list, we might as well just annotate all these sorts of capabilities separately anyway. Then it doesn't matter whether certain groups of them go together.
Oops, sorry, I didn't see those questions at all!

You ask: "Are there situations where one organization should be able to read/write another organization's cookies?"

The historical web security answer has always been "no", or at least they have tried hard to make it so. That was the purpose of the PSL - to make it "no" more often when it had previously accidentally been "yes". If wibble.com is a standard ICANN registration, and foo.wibble.com and wibble.com want to share cookies, then the obvious thing is for wibble.com not to be on the PSL at all. Then the normal, wanted behaviour is observed.

The "history sorting" case is: how, in history, do I sort the following:

www.wibble.co.uk
wibble.co.uk
foo.wibble.co.uk
george.co.uk

A strict alphanumeric sort leads to:

foo.wibble.co.uk
george.co.uk
wibble.co.uk
www.wibble.co.uk

when a better sort would probably be:

george.co.uk
wibble.co.uk
www.wibble.co.uk
foo.wibble.co.uk

(putting all wibble.co.uk together, and sorting www. and plain next to each other, and first. Or even combining all the wibble.co.uk into one sub-section.) But that's only possible using the PSL.

Regarding "buy-your-own-TLD", let's invent a new TLD, ".brand". Into which boxes would you put it on the right of the table?

Gerv
(In reply to Gervase Markham [:gerv] from comment #20)
> You ask: "Are there situations where one organization should be able to
> read/write another organization's cookies?"
>
> The historical web security answer has always been "no", or at least they
> have tried hard to make it so. That was the purpose of the PSL - to make it
> "no" more often when it had previously accidentally been "yes". If
> wibble.com is a standard ICANN registration, and foo.wibble.com and
> wibble.com want to share cookies, then the obvious thing is for wibble.com
> not to be on the PSL at all. Then the normal, wanted behaviour is observed.

Sure, that's the classic answer, but part of the reason for asking the question is to ferret out whether maybe we want to allow more flexibility here. Maybe someone wants some of the non-cookie "benefits" of the PSL without the cookie protections? Seems unlikely, but we should be confident of our "no" before proceeding.

I was also referring to the one-way case. Maybe appspot.com wants to be able to read/write cookies on *.appspot.com without allowing reading and writing on itself in return? Again, maybe we can discount this.

> The "history sorting" case is: how, in history, do I sort the following:
>
> www.wibble.co.uk
> wibble.co.uk
> foo.wibble.co.uk
> paul.co.uk
>
> A strict alphanumeric sort leads to:
>
> foo.wibble.co.uk
> paul.co.uk
> wibble.co.uk
> www.wibble.co.uk
>
> when a better sort would probably be:
>
> paul.co.uk
> wibble.co.uk
> www.wibble.co.uk
> foo.wibble.co.uk

It seems like just doing an alphabetical sort in reverse-hierarchical order (i.e. last component first) basically gets us what we want here, without any need to understand where the eTLD boundaries lie. Maybe I'm missing something. Since I don't think I understand this, I'm going to ignore it below.

> Regarding "buy-your-own-TLD", let's invent a new tld, ".brand". Into which
> boxes would you put it on the right of the table?
Let's separate the buy-your-own-TLD samples into two groups -- "more traditional" TLDs, e.g. some group buying ".awesome" and opening it for general registrations; and single-entity TLDs, as I imagine you were getting at with ".brand". The first group acts pretty much exactly like either "co.uk" or "appspot.com" does today, so we need not worry further about it.

The ".brand" case is more interesting. If "brand" alone is navigable, then I actually think we don't need to have ".brand" listed in any way, because I think it falls into the "not in list" column of all categories: it's navigable, we want to allow cookies on it to be readable/writeable from subdomains, and we want to allow *.brand SSL certs. (Though I guess I could imagine companies not wanting some of the last two things, although I'm not sure why.)

If ".brand" isn't navigable, then it seems like we have a case just like ".com".
W.r.t. your first point, we can ask some of our security experts, but they've invented things like CORS for cross-origin resource sharing. Breaking the same-origin model for a few select sites doesn't sound to me like something they'd welcome.

As for appspot.com reading and writing cookies for foo.appspot.com, I'm pretty certain this is currently a no-no, and always has been. appspot.com will only get cookies set by foo.appspot.com if foo.appspot.com specifically sets them as "appspot.com" cookies. And then, of course, they aren't its cookies any more.

History sorting: maybe a better example is "I want to group all related sites in one 'folder'". So I'd want to group foo.wibble.co.uk with bar.wibble.co.uk, but not group wibble.co.uk with george.co.uk. This requires the PSL, in order to know the structure of .uk. It's basically the same answers as the cookie question, AIUI.

I agree .awesome is like .com, and is not an interesting case.

The thing about .brand is that if you don't list it at all, then (at least in our implementation) it behaves like it's listed alone - i.e. a single "brand" entry. IOW, historically, there has been no difference between a TLD not present, and one which is present as just itself. This is for compatibility with correct domain display on intranets (foo.localhost, bar.somecompany) and new TLDs which, like .awesome, just open for general registration. But this means that there's no way currently to say ".brand should be able to set cookies for itself". Now we might want to say "yes, and we are happy for it to stay that way, as a discouragement to sites to make 'brand' navigable". I might well be of that opinion.

Gerv
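The "group all related sites in one 'folder'" case rests on computing the registrable domain (eTLD+1), which is exactly what the PSL enables. A minimal sketch of that lookup and grouping, using a hypothetical three-entry suffix set in place of the real list (a real implementation would also handle the PSL's wildcard and exception rules):

```python
# Sketch: group history hosts by registrable domain (eTLD+1).
# SUFFIXES is a toy stand-in for the full Public Suffix List.
SUFFIXES = {"uk", "co.uk", "com"}

def registrable_domain(host):
    labels = host.split(".")
    # Scan from the longest candidate suffix down to the shortest;
    # the first hit is the longest matching public suffix.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in SUFFIXES:
            if i == 0:
                return host  # host is itself a public suffix
            return ".".join(labels[i - 1:])  # suffix plus one label
    return host

def group_history(hosts):
    groups = {}
    for h in hosts:
        groups.setdefault(registrable_domain(h), []).append(h)
    return groups

hosts = ["foo.wibble.co.uk", "bar.wibble.co.uk",
         "wibble.co.uk", "george.co.uk"]
```

With this grouping, foo.wibble.co.uk and bar.wibble.co.uk land in the same wibble.co.uk folder, while george.co.uk stays separate -- the behaviour Gerv describes, which a suffix-unaware sort cannot guarantee.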
The cookie reading/writing interactions are worrisomely complex. I don't think foo.appspot.com is allowed to set a domain cookie on appspot.com at all, but I'm worried that maybe appspot.com might be allowed to set a cookie on foo.appspot.com, and I'm also worried that maybe appspot.com might set an appspot.com domain cookie that then is served to foo.appspot.com as an effectively "read-only" cookie. I'm going to ask Adam Barth and one of our other cookie guys to comment on this bug because I want the right people to have thought about cookie ramifications carefully.
On the brand vs. www.brand dialog: from what I understand of at least how the omnibox works (this is what I believe is the 'traffic logic' that routes input either to search or DNS), the presence of a dot in the string (even at the rightmost position, i.e. "brand.") ships the string to the DNS resolution process (hosts file, resolver, etc.), while its absence treats the string as a search string.

Paul Vixie of ISC (BIND/DHCP) discusses the concept in this article on CircleID: http://www.circleid.com/posts/20110620_domain_names_without_dots/ He identifies some of the behavior we might see from resolvers.

There are some brands out there that hope that simply typing their brand as a TLD, with or without www or a trailing dot, will get the user to their website. It is not clear that an A record for the dotless zone will be allowable, and there is mixed thought in the DNS community about how / whether it will work and be allowable.
(In reply to Gervase Markham [:gerv] from comment #20)
> The "history sorting" case is: how, in history, do I sort the following:
>
> www.wibble.co.uk
> wibble.co.uk
> foo.wibble.co.uk
> paul.co.uk
>
> A strict alphanumeric sort leads to:
>
> foo.wibble.co.uk
> paul.co.uk
> wibble.co.uk
> www.wibble.co.uk
>
> when a better sort would probably be:
>
> paul.co.uk
> wibble.co.uk
> www.wibble.co.uk
> foo.wibble.co.uk
>
> (putting all wibble.co.uk together, and sorting www. and plain next to each
> other, and first. Or even combining all the wibble.co.uk into one
> sub-section.) But that's only possible using the PSL.
>
> Regarding "buy-your-own-TLD", let's invent a new tld, ".brand". Into which
> boxes would you put it on the right of the table?

The third option is what I am hearing most frequently from brands who are intending to apply.
(In reply to Jothan Frakes from comment #24)
> on the brand vs www.brand dialog... from what I understand of at least how
> the omnibox works (this is what I believe is the 'traffic logic' that routes
> things either to search or DNS), the presence of a dot in the string (even
> at the rightmost position, ie "brand.") ships the string to DNS resolution
> process (hosts file, resolver, etc.) and its absence treats the string as a
> search string.

If you are speaking of the Chrome omnibox, you're definitely misinformed. If you mean the Firefox AwesomeBar, I don't know.

> It is not clear that an A record for the dotless zone will be allowable, and
> there is mixed thought in the DNS community about how / if it will work and
> be allowable.

If you have any voice in this discussion, please convey the strongest possible sentiments on the part of Chrome that this not be allowable. It will cause major problems with corporate intranets (due to name conflicts) and with search/navigation confusion in the Chrome omnibox for all users. (e.g. Many one-word searches will start showing infobars to users, which is a horrible UX.) None of our options for dealing with this are good, so it would be far better for the problem not to occur.
(In reply to Peter Kasting from comment #21)
> > when a better sort would probably be:
> >
> > paul.co.uk
> > wibble.co.uk
> > www.wibble.co.uk
> > foo.wibble.co.uk
>
> It seems like just doing an alphabetical sort in reverse-hierarchical order
> (i.e. last component first) basically gets us what we want here, without any
> need to understand where the eTLD boundaries lie. Maybe I'm missing
> something.

Since the non-TLD piece is the most meaningful one, it's plausible that you'd want to use that for your primary sort. Users may not remember or care whether a site was in .co.uk or .com (or .net or .org), so sort wibble.com and wibble.org near wibble.co.uk, rather than putting all .co.uk together:

paul.co.uk
wibble.co.uk
foo.wibble.co.uk
www.wibble.co.uk
wibble.com
www.wibble.com
wibble.org
zephyr.co.uk
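The ordering above can be produced mechanically once a host can be split into subdomain / registrable-name / public-suffix pieces -- which is precisely where the PSL is needed. A rough sketch, assuming a toy suffix set rather than the real list:

```python
# Sketch of a "non-TLD piece first" history sort: sort by the
# registrable name, then the public suffix, then subdomain labels.
# SUFFIXES is a toy stand-in for the full Public Suffix List.
SUFFIXES = {"co.uk", "com", "org"}

def sort_key(host):
    labels = host.split(".")
    # Find the longest matching public suffix (longest candidate first).
    for i in range(len(labels)):
        if ".".join(labels[i:]) in SUFFIXES:
            suffix = ".".join(labels[i:])
            name = labels[i - 1] if i > 0 else ""
            sub = ".".join(labels[:max(i - 1, 0)])
            return (name, suffix, sub)
    return (host, "", "")

hosts = ["paul.co.uk", "www.wibble.co.uk", "wibble.org", "zephyr.co.uk",
         "wibble.co.uk", "foo.wibble.co.uk", "wibble.com", "www.wibble.com"]
ordered = sorted(hosts, key=sort_key)
```

The empty-string subdomain for a bare registrable domain sorts first within its group, so wibble.co.uk lands just before foo.wibble.co.uk and www.wibble.co.uk, matching the list above.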
Let's not get side-tracked into the correct design of history UI; the point is just that some designs require a PSL-powered awareness of domain structure :-)

Jothan: that article from Paul Vixie is helpful and comforting. People at the top of ICANN are saying loudly that brand owners should not expect this to work. Which is good. So I think we can proceed on the basis that we do not have to 'make it work', and can even do things which make it work even less.

If you know of brand owners who still think that http://brand/ will (as opposed to 'should') work, can you ask them on what technical basis they hold out this hope?

Gerv
Hey Peter: has Adam Barth had a chance to look at this? Gerv
I think he only took a quick glance so far and sent me a comment like "sounds like we're trying to make the list serve too many masters". I thought he'd intended a longer look.
Looking now.
There are a bunch of issues in this thread. Let me try to address them separately:

1) The cookie-related threat that having appspot.com on the PSL mitigates is having foo.appspot.com set an appspot.com cookie, which would affect bar.appspot.com.

2) Having a *.appspot.com certificate is beneficial for security because that allows AppEngine to offer HTTPS to all its customers for free.

3) Having a *.com certificate is detrimental for security because a bad actor that could obtain such a certificate could mount many attacks.

The design tension we're running up against is that we're trying to make the PSL serve too many masters. For example, the set of domains that should be able to store cookies is different from the set of domains that should be able to have * certificates.

IMHO, we should evolve the PSL into a more general registry of well-known domain names. That probably means changing the format to JSON or XML so we can store more attributes in the future. For example, we might want to fold the preloaded HSTS domain list into this same list.

Currently, the set of attributes that we seem to want are:

1) NoCookies
2) NoStarCertificates
3) UseStrictTransportSecurity
4) (maybe) NonNavigable

(I'm also inclined to deal with the "brand" issue later. I'm hoping that if I ignore it long enough it will go away.)
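To make the registry proposal concrete, here is a purely hypothetical sketch of what such a JSON format and its lookup might look like. The attribute names are the four from the comment above; the domains and values are illustrative only, not a real format:

```python
# Hypothetical richer PSL registry, sketched as JSON. Every attribute
# defaults to false when absent, so the common case stays compact.
import json

registry_json = """
{
  "appspot.com": {"NoCookies": true},
  "com":         {"NoCookies": true, "NoStarCertificates": true},
  "example.net": {"UseStrictTransportSecurity": true}
}
"""

REGISTRY = json.loads(registry_json)

def has_flag(domain, flag):
    # Unlisted domains, and listed domains without the flag, get false.
    return REGISTRY.get(domain, {}).get(flag, False)
```

One nice property of per-attribute flags is that appspot.com can get NoCookies without NoStarCertificates -- exactly the "different masters" split described above.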
We might need to be clearer with that "no cookies" thing. For example, appspot.com should probably still be allowed to set cookies on itself. Maybe that flag should mean "subdomains should not be able to set cookies on this domain". Otherwise I think this generally lines up with the direction we've been going above. I like the addition of the HSTS list.
IMHO, NoCookies should mean no cookies. appspot.com shouldn't be able to set cookies for itself. We can arm wrestle the folks who run that service if necessary.
Really? Why? I mean, it prevents problems where foo.appspot.com reads the appspot.com domain cookies, but we can prevent that anyway, by making the write barrier between foo.appspot.com and appspot.com be a read/write barrier.
> Really? Why?

Certainly we wouldn't want appspot.com to be able to set domain cookies (i.e., cookies that affect foo.appspot.com). If it wanted to do that, then it's really operating as a cohesive entity and shouldn't be on this list.

So the question boils down to why we shouldn't let appspot.com set host-only cookies. My main motivation there is that Internet Explorer doesn't support host-only cookies. If we want to pick a design that IE will be able to implement, we shouldn't pick a design point that relies on the existence of host-only cookies to work.

Now, I have hope that IE will update its cookie processing to follow RFC 6265 (including implementing host-only cookies). If/when they do, we might want to reconsider and loosen the restriction (which is much easier than tightening a restriction).
I would have instead answered that we should allow appspot.com to set domain and host cookies just as we would allow any other site to do, and that they simply should not be sent to or writeable from subdomains. Yes, domain cookies are kind of stupid in this situation, but maybe the site is using them because IE doesn't support host cookies like you mentioned. Or something. It's clear to me from some of the sites we've seen in the real world -- like irkutsk.ru that Gerv brought up privately off-bug -- that we'll probably have sites in this position that want to set cookies, and I don't think we should ban that.
I'm not sure I understand why a domain would want to set cookies for its subdomains but would want to prevent those subdomains from setting cookies for itself. Can you help me understand?
It doesn't want to set cookies for its subdomains. It wants to set cookies for itself (only). The issue is that your suggestions don't allow it to set any cookies at all.

Take irkutsk.ru that I mentioned in comment 37. This is the domain for a particular territory of Russia. At irkutsk.ru itself is what appears to be a news site about the territory. And then irkutsk.ru is also treated as an eTLD for other sites which are about entities located in that territory, e.g. schools. So irkutsk.ru on its own is some legit site with content, as are *.irkutsk.ru; all of them may wish to set cookies; and none of them should know about each others' cookies.

We can implement this by making a cookie firewall between irkutsk.ru and its subdomains; neither side can read or write across the firewall. If instead we disallow irkutsk.ru from setting any kind of cookies, the subdomain sites still work fine, but irkutsk.ru itself has to do whatever it wants to do without using any cookies.

This isn't unique to this site; in general all the cases we're talking about are ones where there's some legit site that also acts as a TLD for other legit sites and we don't want cookie contamination among them. But that doesn't mean the "parent" site has no content and can get by without any cookies at all.
One more point of clarity just in case. When I said "domain cookies" in comment 37, I didn't mean "domain" in the sense of the normal practical effect (where subdomains see the cookie values), I just meant "domain" in the technical sense of asking the cookie subsystem to set a cookie for ".host.TLD". The actual effect of the "domain cookies" I'm suggesting would be identical to the effect of host cookies due to the "cookie firewall".
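The "cookie firewall" being debated can be sketched as a single predicate that gates both reads and writes across a suffix boundary. This is a sketch of the proposal, not any browser's implementation, and uses a toy list of firewalled suffixes in place of the annotated PSL:

```python
# Sketch of the proposed "cookie firewall": a host keeps cookies for
# itself, but cookies never cross a listed public-suffix boundary in
# either direction. Toy suffix set stands in for the real PSL.
FIREWALL_SUFFIXES = {"irkutsk.ru", "appspot.com"}

def cookie_allowed(request_host, cookie_domain):
    # Exact host match is always fine, even for a listed suffix:
    # under this proposal irkutsk.ru may keep cookies scoped to itself.
    if request_host == cookie_domain:
        return True
    # Otherwise require an ordinary domain match...
    if not request_host.endswith("." + cookie_domain):
        return False
    # ...and refuse when the cookie domain is a firewalled suffix,
    # which is the barrier between foo.irkutsk.ru and irkutsk.ru.
    return cookie_domain not in FIREWALL_SUFFIXES
```

Under this model irkutsk.ru keeps its own cookies, foo.irkutsk.ru keeps its own, and neither crosses the boundary; whether irkutsk.ru should be allowed any cookies at all is exactly the NoCookies question being argued here.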
That seems fine if you can convince IE to implement it. If you can't, the only interoperable path I see is to ban cookies altogether on these domains.
If IE doesn't implement it, then there is no interoperable path, is there? They'll allow appspot.com and its subdomains to read and write each others' cookies regardless, won't they?
My point is that it's much more likely they'll implement the NoCookies policy because that's what they already do for their public suffix list. To interoperate, we just need to agree on the list of domains that are on the list. For your proposal, they'll also need to adopt the cookie firewall mechanism, which isn't consistent with their existing approach to cookies (because it requires the concept of host-only cookies, which doesn't exist in IE). Now, that's not to say they won't adopt your proposal, but before heading down that road, we should probably check with them so we don't have divergent security models.
OK. Do you know any stakeholders that we could ask to come comment here?
I bet Jacob Rossi could route us to the right folks. If you don't have his email address, ask me off-bug and I can give it to you. (He's pretty active in the W3C, so you probably have his email already.)
(I think I wrote this above, but I'm hopeful that MS will implement RFC 6265, which includes host-only cookies. If they're planning to do that, then this might be an easy sell.)
I have a couple of things to say, but this discussion is rapidly expanding beyond the scope of a bug. Unfortunately, there is not a dedicated forum/mailing list for the PSL. Would you gentlemen care to step into mozilla.dev.tech.network? https://www.mozilla.org/about/forums/#dev-tech-network Gerv
This seems to have got bogged down. https://wiki.mozilla.org/Public_Suffix_List/Use_Cases seems to suggest that there's a lot of value in a simple split between 'registry' public suffixes and 'owner-requested' public suffixes, so let's start with that. I've filed bug 712640. AIUI, Chrome will be able to solve its problem, as recorded in this bug, by internally treating the two groups of suffixes differently. Gerv
Depends on: 712640
The list has been split in two. Gerv
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Product: Core → Core Graveyard