Closed Bug 1124625 Opened 10 years ago Closed 10 years ago

Add platform.sh to the public suffix list

Categories

(Core Graveyard :: Networking: Domain Lists, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: damien, Assigned: damien)

Details

Attachments

(1 file)

Attached patch platform.sh.diff
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.85 Safari/537.36

Expected results:

Platform.sh is a hybrid development/production PaaS that allows you to branch / merge the infrastructure as well as the code. Our client domains are hosted below `<region>.platform.sh`, where <region> is the name of the region (eu, us, ap, etc.).

Would you mind adding the following patch to the public suffix list? Thanks in advance.
+ This was placed in the appropriate section
Attachment #8553046 - Attachment is patch: true
Attachment #8553046 - Attachment mime type: text/x-patch → text/plain
(In reply to Jothan Frakes from comment #1)
> + This was placed in the appropriate section

But not in the correct (alphabetical) order :-)

Gerv
Status: UNCONFIRMED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Thanks everyone! Sorry about the order, it was not immediately obvious that entries in the PRIVATE DOMAINS section were ordered by company name. Damien
This has actually caused issues for Chrome, based on our interpretation of the PSL algorithm ( https://publicsuffix.org/list/ ). If we're wrong, it would be great to add clarity to the algorithm and I can fix our implementation.

We can boil down the platform.sh rules as follows:

sh
*.platform.sh

This is conceptually similar to the .jp structure of

jp
*.kobe.jp

The way Chrome has implemented this, the public suffix for "foo.kobe.jp" == null (*.kobe.jp == registry), but also that kobe.jp == null.

This is because we begin collecting rules, starting with the rightmost label, and find "jp"/"sh" (all labels in the rule match, extra labels in the domain), but we ALSO find the *.platform.sh/*.kobe.jp rule (all labels in the rule match *OR* the label in the rule is *), thus pick the longest rule as the prevailing rule, thus treat platform.sh/kobe.jp as a public suffix.

Now, for JP, this is incidentally correct, and none of the unit tests in http://mxr.mozilla.org/mozilla-central/source/netwerk/test/unit/data/test_psl.txt?raw=1 seem to cover this case. It also aligns with the previous discussion about gaps in registerable/public suffixes.

However, it would seem from the platform.sh inclusion that this may not be correct - that is, that the public suffix of platform.sh should be .sh

For Chrome, I'm going to exclude platform.sh from the current update of the PSL, and then we can follow up on here whether Chrome should change its treatment of the algorithm, or whether the <region>/<cluster>'s of platform.sh should be explicitly listed (e.g. as with Amazon).

Unlike AppSpot (which serves no content on the domain), it appears platform.sh is trying to host real content (given that it's presently setting cookies for platform.sh for analytics), so I'm confused as to the correct behaviour.
Flags: needinfo?(jothan)
Flags: needinfo?(gerv)
Ryan:

Damien:
a) is it your intent that someone hosting at foo.platform.sh should not be able to set cookies for platform.sh?
b) is it your intent that platform.sh should not be able to set cookies for foo.platform.sh?

(In reply to Ryan Sleevi from comment #6)
> This is because we begin collecting rules, starting with the rightmost
> label, and find "jp"/"sh" (all labels in the rule match, extra labels in the
> domain), but we ALSO find the *.platform.sh/*.kobe.jp rule (all labels in
> the rule match *OR* the label in the rule is *), thus pick the longest rule
> as the prevailing rule, thus treat platform.sh/kobe.jp as a public suffix.

I'm not entirely following you there, but if you are saying that in Chrome's implementation, the rule "*.platform.sh" matches the domain of the URL "http://platform.sh", then I think you have got it wrong. "*" does not match "non-existent".

Gerv
Flags: needinfo?(gerv)
Writing to nudge requester on needinfo
Flags: needinfo?(jothan)
Sorry for delay here.

Gervase:

> a) is it your intent that someone hosting at foo.platform.sh should not be able to set cookies for platform.sh?

Our intent is that someone hosting at `foo.bar.platform.sh` should not be able to set cookies for `bar.platform.sh`, `platform.sh` or `sh`.

From my reading of the specification, for the set of rules:

sh
*.platform.sh

I would expect:

public_suffix(foo.bar.platform.sh) == "bar.platform.sh"
public_suffix(bar.platform.sh) == "sh"
public_suffix(platform.sh) == "sh"

> b) is it your intent that platform.sh should not be able to set cookies for foo.platform.sh?

I do not care either way. We currently set cookies on `platform.sh`, but they are meant for this domain alone. My reading of the specification is that this case is out of scope anyway?

Damien
(In reply to Damien Tournoud from comment #9)
> Our intent is that someone hosting at `foo.bar.platform.sh` should not be
> able to set cookies for `bar.platform.sh`, `platform.sh` or `sh`.

Right, yes.

> From my reading of the specification, for the set of rules:
>
> sh
> *.platform.sh
>
> I would expect:
>
> public_suffix(foo.bar.platform.sh) == "bar.platform.sh"
> public_suffix(bar.platform.sh) == "sh"

No - public_suffix(bar.platform.sh) == "bar.platform.sh".

> public_suffix(platform.sh) == "sh"

I need Ryan's reaction to comment 7, and to the above, which I think differs from what he says Chrome does. But I do think public_suffix(bar.platform.sh) == "bar.platform.sh" falls clearly out of the documented algorithm at https://publicsuffix.org/list/.

Gerv
Gervase:

> No - public_suffix(bar.platform.sh) == "bar.platform.sh".

Correct, my bad, that was a typo.

> I need Ryan's reaction to comment 7, and to the above, which I think differs from what he says Chrome does. But I do think public_suffix(bar.platform.sh) == "bar.platform.sh" falls clearly out of the documented algorithm at https://publicsuffix.org/list/.

Hm? I assume you mean public_suffix(platform.sh) == "platform.sh"?

Damien
(In reply to Damien Tournoud from comment #11)
> Hm? I assume you mean public_suffix(platform.sh) == "platform.sh"?

No; I was commenting on Ryan's statement that "The way Chrome has implemented this, the public suffix for "foo.kobe.jp" == null." I think it should be "foo.kobe.jp".

Gerv
Apologies, it was a typo.

Using the test-psl notation from http://mxr.mozilla.org/mozilla-central/source/netwerk/test/unit/data/test_psl.txt?raw=1

checkPublicSuffix('jp', null) // because jp is a public suffix
checkPublicSuffix('foo.kobe.jp', null) // because *.kobe.jp is a public suffix
checkPublicSuffix('bar.foo.kobe.jp', 'foo.kobe.jp') // because *.kobe.jp is a public suffix

The question was/is on

1. checkPublicSuffix('kobe.jp', null) // because *.kobe.jp is a public suffix and * matching empty

vs

2. checkPublicSuffix('kobe.jp', 'jp') // because *.kobe.jp only applies to subdomains

The interpretation of 1 comes from the fact that we have multiple ccTLDs with only wildcard entries - examples include *.pg, *.np, *.ni, *.ke, *.kh, etc - without listing the respective ccTLD as a PSL. That's what originally led to this behaviour where *.ccTLD also implies ccTLD is a PSL (in line with Gerv's remarks that the PSL also contains the ICANN root zone database).

Equally, from the handling of the JP wildcards, the *.prefecture.jp rules don't explicitly list prefecture.jp as public suffixes, even though they are, per the docs.

Thus it seemed to infer a shorthand consistent with the interpretation of *.domain also implying domain was a PSL. platform.sh is the first to deviate from this, AFAICT, hence why it broke Chrome.
(In reply to Ryan Sleevi from comment #13)
> The interpretation of 1 comes from the fact that we have multiple ccTLDs
> with only wildcard entries - examples include *.pg, *.np, *.ni, *.ke, *.kh,
> etc - without listing the respective ccTLD as a PSL. That's what originally
> led to this behaviour where *.ccTLD also implies ccTLD is a PSL (in line
> with Gerv's remarks that the PSL also contains the ICANN root zone database)

Hmm. When I said that, I don't think I was intending what you think. Or, to put it another way, I hadn't considered e.g. *.pg in my thinking.

If someone goes to http://foo.pg, then it won't match any rule in the PSL, and so it'll fall back to the default rule, which is "*" according to the algorithm definition, so "foo.pg" will be highlighted in the URL bar as PublicSuffix+1.

In the PSL, we use the lack of a "pg" rule to indicate that as far as we know, the registry does not allow 2nd level registration. However, as long as people are following the algorithm, adding one should not change behaviour.

> Equally, from the handling of the JP wildcards, the *.prefecture.jp rules
> don't explicitly list prefecture.jp as public suffixes, even though they
> are, per the docs.

You mean the .jp docs? I don't doubt you, but reference please?

> Thus it seemed to infer a shorthand consistent with the interpretation of
> *.domain also implying domain was a PSL. platform.sh is the first to deviate
> from this, AFAICT, hence why it broke Chrome.

Thing is, I think the algorithm clearly states the opposite (i.e. it's not ambiguous, nor is this a loophole). Say we have a PSL of:

foo
*.bar.foo

And you are running checkPublicSuffix() on "bar.foo".
Let's look at the matching algorithm on https://publicsuffix.org/list/:

"A domain is said to match a rule if, when the domain and rule are both split, and one compares the labels from the rule to the labels from the domain, beginning at the right hand end, one finds that for every pair either they are identical, or that the label from the rule is "*" (star). The domain may legitimately have labels remaining at the end of this matching process."

So bar.foo matches foo, but it does not match *.bar.foo. The _rule_ may not have labels remaining at the end of the matching process, only the domain. If we try and match this domain to this rule, we match foo with foo and bar with bar, but then we stop because there is no "pair" in "for every pair" any more. The domain has run out of labels.

And if bar.foo matches foo, then its public suffix is foo.

Gerv
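The reading of the matching text given above can be sketched in Python. This is only an illustration of that interpretation, not code from Gecko or Chrome; the names `split_labels` and `matches` are hypothetical, and the lowercasing is an assumption made for convenience.

```python
def split_labels(name):
    """Split a domain or rule on dots, dropping empty labels."""
    return [label for label in name.lower().split(".") if label]


def matches(domain, rule):
    """Return True if `domain` matches `rule` under the reading above."""
    domain_labels = split_labels(domain)
    rule_labels = split_labels(rule)
    # Only the domain may have labels left over, never the rule.
    if len(domain_labels) < len(rule_labels):
        return False
    # Compare label pairs starting at the right-hand end.
    for d, r in zip(reversed(domain_labels), reversed(rule_labels)):
        if r != "*" and r != d:
            return False
    return True
```

Under this sketch, `matches("bar.foo", "foo")` is true and `matches("bar.foo", "*.bar.foo")` is false, because the rule still has a label left when the domain runs out.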
Having said all that, what do we _want_? I think Ryan's option 2 makes the most sense, and it's what's documented (if you agree with my analysis above). If we decide that this is the right thing, what bad effects would that have, if any? AIUI, this would mean Chrome changing how it interprets the current .jp rules. What badness would result? Would we need to update the .jp rules to specifically add more suffixes, to retain the old Chrome behaviour? Gerv
(In reply to Gervase Markham [:gerv] from comment #14)
> You mean the .jp docs? I don't doubt you, but reference please?

http://jprs.jp/doc/rule/saisoku-1.html

> "A domain is said to match a rule if, when the domain and rule are both
> split, and one compares the labels from the rule to the labels from the
> domain, beginning at the right hand end, one finds that for every pair
> either they are identical, or that the label from the rule is "*" (star).
> The domain may legitimately have labels remaining at the end of this
> matching process."
>
> So bar.foo matches foo, but it does not match *.bar.foo. The _rule_ may not
> have labels remaining at the end of the matching process, only the domain.
> If we try and match this domain to this rule, we match foo with foo and bar
> with bar, but then we stop because there is no "pair" in "for every pair"
> any more. The domain has run out of labels.

Eh, I don't mean to haggle on the wording, but it's the interpretation of "for every pair either they are identical, or that the label from the rule is "*" (star)".

> And if bar.foo matches foo, then its public suffix is foo.

I think option 2 is fine to go with, and filed http://crbug.com/459802 for it, but in our previous discussion about whether or not the PSL should explicitly list domains or whether it should default to "*", we decided it should explicitly list domains (Recall - this was because a team at Google, like teams at other organizations, assumed the PSL _did_ explicitly list domains).

So for the existing wildcard domains, it does seem correct that we should be explicit that, say, pg, np, etc are all explicit ccTLDs that even if they don't allow second level registrations, are _themselves_ on the PSL. Ditto the Japanese prefectures.
(In reply to Ryan Sleevi from comment #16)
> Eh, I don't mean to haggle on the wording, but it's the interpretation of
> "for every pair either they are identical, or that the label from the rule
> is "*" (star)".

Well, I think it's clear ;-), but if you want to propose even clearer wording I'd be happy to update the site.

> I think option 2 is fine to go with, and filed http://crbug.com/459802 for
> it, but in our previous discussion about whether or not the PSL should
> explicitly list domains or whether it should default to "*", we decided it
> should explicitly list domains (Recall - this was because a team at Google,
> like teams at other organizations, assumed the PSL _did_ explicitly list
> domains).

We have been moving in that direction as people provide patches, but there has been no concerted effort. For example, it may be that *.pg could in fact be an explicit list, but no-one has yet gone out to work out what it is.

> So for the existing wildcard domains, it does seem correct that we should be
> explicit that, say, pg, np, etc are all explicit ccTLDs that even if they
> don't allow second level registrations, are _themselves_ on the PSL. Ditto
> the Japanese prefectures.

There's a slight English parsing problem in that sentence, but I think you are arguing that if the PSL currently says:

*.pg

we should instead say

pg
*.pg

?

Gerv
(In reply to Gervase Markham [:gerv] from comment #17)
> (In reply to Ryan Sleevi from comment #16)
> > Eh, I don't mean to haggle on the wording, but it's the interpretation of
> > "for every pair either they are identical, or that the label from the rule
> > is "*" (star)".
>
> Well, I think it's clear ;-), but if you want to propose even clearer
> wording I'd be happy to update the site.

Thinking like a programmer, rather than someone good at effectively communicating:

- A domain or rule can be split into a list of labels using the separator "." (dot). The separator is not part of any of the labels. Empty labels are not permitted, meaning that leading and trailing dots are ignored.
- A domain is said to match a rule if and only if all of the following conditions are met:
  - When the domain and rule are split into corresponding labels, that the domain contains as many or more labels than the rule.
  - Beginning with the right most labels of both the domain and the rule, and continuing for all labels in the rule, one finds that for every pair, either they are identical, or that the label from the rule is "*".

The other part of the ambiguity I'm trying to correct here is whether "state.*.us" is a valid rule. Put differently, is "*" in a rule a terminator for rule processing or not. As implemented in Chrome (and several implementations I've examined), it is seen as a terminal, but I suspect based on this discussion, your intent is that it is not.

Perhaps it would be better to write this up IETF/W3C style as a proper spec algorithm where you can implement it however you like, provided that it implements the same observable behaviours.

> There's a slight English parsing problem in that sentence, but I think you
> are arguing that if the PSL currently says:
>
> *.pg
>
> we should instead say
>
> pg
> *.pg
>

Yes. If we are to change Chrome's behaviour, it would be good for the PSL to be explicit for all of the *.x cases that both .x and *.x are suffixes (at least, this is true for all the domains except for platform.sh).
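To make "option 2" concrete, here is a minimal Python sketch that combines the two matching conditions proposed in this comment with the prevailing-rule steps from https://publicsuffix.org/list/ (no match falls back to "*", otherwise the matching rule with the most labels wins). It deliberately ignores exception ("!") rules, and the function names `labels`, `matches` and `public_suffix` are made up for this illustration; it is not Chrome's or Gecko's implementation.

```python
def labels(name):
    """Split on dots; empty labels (leading/trailing dots) are ignored."""
    return [label for label in name.lower().split(".") if label]


def matches(domain, rule):
    """Domain has at least as many labels as the rule, and every pair
    (compared from the right) is identical or the rule label is "*"."""
    d, r = labels(domain), labels(rule)
    if len(d) < len(r):
        return False
    return all(rl == "*" or rl == dl
               for dl, rl in zip(reversed(d), reversed(r)))


def public_suffix(domain, rules):
    """Public suffix of `domain` for a list of rules (no "!" rules)."""
    matching = [rule for rule in rules if matches(domain, rule)]
    if matching:
        # The prevailing rule is the matching rule with the most labels.
        prevailing = max(matching, key=lambda rule: len(labels(rule)))
    else:
        # Assumption from the published algorithm: the default rule is "*".
        prevailing = "*"
    count = len(labels(prevailing))
    # The suffix is the rightmost `count` labels of the domain.
    return ".".join(labels(domain)[-count:])


PLATFORM_RULES = ["sh", "*.platform.sh"]
JP_RULES = ["jp", "*.kobe.jp"]
```

With these rules the sketch gives "bar.platform.sh" for both foo.bar.platform.sh and bar.platform.sh, "sh" for platform.sh, and "jp" for kobe.jp, which is the behaviour described as option 2 in this thread.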
I've updated the website with your new text - thank you :-)

(In reply to Ryan Sleevi from comment #18)
> The other part of ambiguity trying to correct here is whether "state.*.us"
> is a valid rule. Put differently, is "*" in a rule a terminator for rule
> processing or not. As implemented in Chrome (and several implementations
> I've examined), it is seen as a terminal, but I suspect based on this
> discussion, your intent is that it is not.

We experimented with this in the early days, and I think we came to the conclusion that "state.*.us" was not a valid rule, and so we have no rules of that type. So my intent then was that * can only appear at the end of a rule. However, the algorithm as written permits this. We also have no rules of the type "*.*.foo", which is also permitted by the algorithm. Should we permit these things, or change the text further to disallow them?

> Perhaps it would be better to write this up IETF/W3C style of a proper spec
> algorithm where you can implement it however, provided that it implements
> the same observable behaviours.

The latter part of that sentence was always the intent. The more formal language we have now was a result of pressure from Hixie, who wanted to be able to reference something from WHATWG documents. If we want to go to a higher level of formality, I'd gladly accept a rewrite from someone who "speaks" that style.

> Yes. If we are to change Chrome's behaviour, it would be good for the PSL to
> be explicit for all of the *.x cases that both .x and *.x are suffixes (at
> least, this is true for all the domains except for platform.sh)

Bug 1139842 filed.

Gerv
> "state.*.us"

Obviously, this particular example rule cannot be included; otherwise "state.delicio.us", a private domain, becomes a public suffix without the owner's knowledge.

A "*" should NOT represent a label position that's open for public registration, unless we want to give the public an easy way to register a public suffix - for example, have a rule "*.pub-suff.org", and allow anyone to register a "jane.pub-suff.org" to own a public suffix. This use case seems very unlikely. And even in that case, the "*" only appears at the leftmost position; it would be inappropriate to dictate the purpose of a subdomain of a private domain.

If a "*" can only represent a label position tightly controlled by an authority, it should map to only a limited list of actual domains, therefore it's not too laborious to just enumerate the actual domains.

We can have "*" at the leftmost position as an economical measure; but it seems quite pointless to have "*" elsewhere.
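If the list did decide to allow "*" only in the leftmost position, the restriction could be checked mechanically. The following Python sketch is purely hypothetical (the function `wildcard_only_leftmost` is not part of any PSL tooling mentioned in this bug); it rejects both "state.*.us" and "*.*.foo", the two shapes discussed above.

```python
def wildcard_only_leftmost(rule):
    """Return True if "*" appears, at most, as the leftmost label."""
    parts = rule.split(".")
    # Any "*" after the first label violates the proposed restriction.
    return "*" not in parts[1:]
```

For example, "*.platform.sh" and "sh" pass this check, while "state.*.us" and "*.*.foo" do not.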
Product: Core → Core Graveyard