Closed Bug 1139842 Opened 9 years ago Closed 5 years ago

Add "foo" rule for all occurrences of "*.foo" in PSL

Categories

(Core Graveyard :: Networking: Domain Lists, defect, P5)

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: gerv, Assigned: gerv)

Details

(Whiteboard: [necko-would-take])

Attachments

(1 file)

See discussion in bug 1124625. Wherever the PSL says:

*.foo

we should change it to:

foo
*.foo

This makes the intent clear for resolvable domains like http://wibble.foo, even though in most cases such domains are not supposed to exist.

Gerv
Attached patch Patch v.1
Here's a patch. But I think I've realised why we haven't done this before.

Mozilla's script for preparing the PSL for use has a problem; when checking for duplicate lines, it only checks the domain part (ignoring ! and *), so it sees "nom.br" as the same as "*.nom.br". I fixed the script (included in the patch) but it seems to my untrained eye as if the original script reflects the underlying data model inside Firefox:

http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/nsEffectiveTLDService.cpp#31

So if we make this change, our PSL-parsing code will break :-(
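To make the duplicate-check problem concrete, here is a minimal sketch. It is not the actual Mozilla preparation script; the helper names bare_domain and find_duplicates are made up for illustration, and the three rules are only a sample. It compares a check keyed on the bare domain with a check keyed on the full rule text.

```python
def bare_domain(rule):
    """Strip the exception ("!") and wildcard ("*.") markers from a rule."""
    if rule.startswith("!"):
        rule = rule[1:]
    if rule.startswith("*."):
        rule = rule[2:]
    return rule


def find_duplicates(rules, key):
    """Return the rules whose key was already produced by an earlier rule."""
    seen = set()
    duplicates = []
    for rule in rules:
        k = key(rule)
        if k in seen:
            duplicates.append(rule)
        seen.add(k)
    return duplicates


rules = ["br", "nom.br", "*.nom.br"]

# Keyed on the bare domain: "*.nom.br" collides with "nom.br".
collapsed = find_duplicates(rules, key=bare_domain)

# Keyed on the full rule text: the two rules are distinct.
exact = find_duplicates(rules, key=lambda rule: rule)

print(collapsed)
print(exact)
```

With the bare-domain key the pair "nom.br" / "*.nom.br" is reported as a duplicate, which is the behaviour described above; with the full-text key it is not.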

Ryan: would you concur with my assessment of our code?

Gerv
Assignee: nobody → gerv
Status: NEW → ASSIGNED
Attachment #8598651 - Flags: feedback?(ryan.sleevi)
(In reply to Gervase Markham [:gerv] from comment #1)
> Ryan: would you concur with my assessment of our code?

Right, it looks like Mozilla shares the same data model limitation that we have to fix for Chrome. For Chrome, it's an issue because we allow the caller to make a distinction between ICANN domains vs. PRIVATE domains.

I believe the line you meant to reference is http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/nsEffectiveTLDService.cpp#263 , as that assumes only a single entry can be returned. That is, you store "*.foo" as "foo", with IsWild(), and thus can't distinguish from "foo" as a non-wildcard entry.
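As a rough illustration of that limitation (a sketch only, not the nsEffectiveTLDService code; Entry and build_table are hypothetical names), a table keyed on the bare domain with a wildcard flag ends up identical whether or not the list also contains the plain "foo" rule:

```python
from dataclasses import dataclass


@dataclass
class Entry:
    is_wild: bool = False
    is_exception: bool = False


def build_table(rules):
    """Build a lookup table with one entry per bare domain."""
    table = {}
    for rule in rules:
        entry = Entry()
        if rule.startswith("!"):
            entry.is_exception = True
            rule = rule[1:]
        elif rule.startswith("*."):
            entry.is_wild = True
            rule = rule[2:]
        # A later rule for the same bare domain overwrites the earlier one.
        table[rule] = entry
    return table


wild_only = build_table(["*.foo"])
both = build_table(["foo", "*.foo"])

# Both tables hold a single "foo" entry flagged as a wildcard.
print(wild_only == both)
```

Because only one entry per domain survives, a lookup cannot tell "*.foo" alone apart from "foo" plus "*.foo".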
Attachment #8598651 - Flags: feedback?(ryan.sleevi)
Ryan: so what do we do? Do we try and update Firefox, Chrome and any other consumer of the code which barfs when we do this, or do we simply clarify the algorithm to specify the appropriate behaviour in this case (and update code to match) instead of encoding it directly?

Gerv
Flags: needinfo?(ryan.sleevi)
(In reply to Gervase Markham [:gerv] from comment #3)
> Ryan: so what do we do? Do we try and update Firefox, Chrome and any other
> consumer of the code which barfs when we do this, or do we simply clarify
> the algorithm to specify the appropriate behaviour in this case (and update
> code to match) instead of encoding it directly?

Well, I mean, we only encountered this issue in chrome because of platform.sh, which our code triggered on, and thus revealed the bug in Chrome code and the underlying assumptions.

My gut reaction is that if we only change the algorithm, no one will notice. That's because there's not really a central list for people who consume the PSL to know about updates, changes, etc. I'm not necessarily arguing there should be - that's overhead that may not be worthwhile - but just that, absent that, we've got three options.

1) Do nothing at all, and just accept the de facto implementation as the new norm, despite the text suggesting otherwise. At best, we update the text to reflect what clients implemented, not what they should implement.
2) Fix Firefox and Chrome, but don't change the PSL. Other applications may or may not realize they've misimplemented, and may or may not fix. The risk here is that applications won't realize they're broken, and keep the existing behaviour.
3) Fix it in the PSL. Firefox and Chrome will be blocked on updating the PSL until they update their behaviour. Hopefully other applications will break as well, in which case, they too will fix their implementations before updating the PSL. The risk here is that applications won't realize they're broken, and thus implement some new behaviour.

It's unclear to me what the "worst case" for 3 would be - that is, what problems would be risked. If we hadn't added the guards to Chrome, the behaviour that Chrome would have had is, well, taking the effective policy of the 'last' item encountered, which would be the wildcarded behaviour, which would be indistinguishable from the present behaviour. It _looks_ like Firefox would have done the same. Is that enough to generalize? I don't know.

My gut leans on 3, 1, 2 in order of preference; I don't think 1 is that palatable (ergo, neither is 2), but at least it gives us defined behaviours.
Flags: needinfo?(ryan.sleevi)
(In reply to Ryan Sleevi from comment #4)
> (In reply to Gervase Markham [:gerv] from comment #3)
> > Ryan: so what do we do? Do we try and update Firefox, Chrome and any other
> > consumer of the code which barfs when we do this, or do we simply clarify
> > the algorithm to specify the appropriate behaviour in this case (and update
> > code to match) instead of encoding it directly?
> 
> Well, I mean, we only encountered this issue in chrome because of
> platform.sh, which our code triggered on, and thus revealed the bug in
> Chrome code and the underlying assumptions.
> 
> My gut reaction is that changing the algorithm, no one will notice. That's
> because there's not really a central list for people who consume the PSL to
> know about updates, changes, etc. I'm not necessarily arguing there should
> be - that's overhead that may not be worthwhile - but just that, absent
> that, we've got three options.

Let's make sure we are being absolutely clear, here. For the set of rules

    xyz
    *.wibble.xyz

A) what the code currently does ("the de facto implementation") is:

    public_suffix(foo.bar.wibble.xyz) == "bar.wibble.xyz"
    public_suffix(bar.wibble.xyz)     == "bar.wibble.xyz"
    public_suffix(wibble.xyz)         == "wibble.xyz"
    public_suffix(xyz)                == "xyz"

and B) what the spec says is:

    public_suffix(foo.bar.wibble.xyz) == "bar.wibble.xyz"
    public_suffix(bar.wibble.xyz)     == "bar.wibble.xyz"
    public_suffix(wibble.xyz)         == "xyz"             <-- different
    public_suffix(xyz)                == "xyz"

So your options, in these terms, are:

1) endorsing A) and updating the website to say so
2) fixing the code to do B) but not changing the PSL 
3) fixing the code to do B) and changing the PSL to be explicit that B) is what's meant.

Is that a correct summary of the situation?
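The two interpretations can be sketched side by side. This is a simplified illustration, not any browser's code: the function names and the wildcard_implies_parent flag are invented for the example, and exception ("!") rules are deliberately left out.

```python
def rule_matches(rule_labels, domain_labels):
    """A rule matches when its labels equal the domain's rightmost labels ("*" matches any label)."""
    if len(rule_labels) > len(domain_labels):
        return False
    tail = domain_labels[-len(rule_labels):]
    return all(r == "*" or r == d for r, d in zip(rule_labels, tail))


def public_suffix(domain, rules, wildcard_implies_parent):
    """Return the public suffix of domain; the flag selects reading A (True) or B (False)."""
    rule_set = set(rules)
    if wildcard_implies_parent:
        # Reading A: "*.foo" is treated as if "foo" were also listed.
        rule_set |= {rule[2:] for rule in rules if rule.startswith("*.")}
    labels = domain.split(".")
    best = 1  # the default "*" rule covers one label when nothing else matches
    for rule in rule_set:
        rule_labels = rule.split(".")
        if rule_matches(rule_labels, labels):
            best = max(best, len(rule_labels))
    return ".".join(labels[-best:])


RULES = ["xyz", "*.wibble.xyz"]
DOMAINS = ["foo.bar.wibble.xyz", "bar.wibble.xyz", "wibble.xyz", "xyz"]

reading_a = {d: public_suffix(d, RULES, wildcard_implies_parent=True) for d in DOMAINS}
reading_b = {d: public_suffix(d, RULES, wildcard_implies_parent=False) for d in DOMAINS}

for d in DOMAINS:
    print(d, reading_a[d], reading_b[d])
```

The two readings agree on every input except "wibble.xyz", which is the line marked "different" above.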

> It's unclear to me what the "worst case" for 3 would be - that is, what
> problems would be risked. If we hadn't added the guards to Chrome, the
> behaviour that Chrome would have had is, well, taking the effective policy
> of the 'last' item encountered, which would be the wildcarded behaviour,
> which would be indistinguishable from the present behaviour. It _looks_ like
> Firefox would have done the same.

Firefox's prepare_tlds.py script barfs on this situation, so it would never have happened.

Gerv
(In reply to Gervase Markham [:gerv] from comment #5)

> A) what the code currently does ("the de facto implementation") is:
> 
>     public_suffix(foo.bar.wibble.xyz) == "bar.wibble.xyz"
>     public_suffix(bar.wibble.xyz)     == "bar.wibble.xyz"
>     public_suffix(wibble.xyz)         == "wibble.xyz"
>     public_suffix(xyz)                == "xyz"
> 
> and B) what the spec says is:
> 
>     public_suffix(foo.bar.wibble.xyz) == "bar.wibble.xyz"
>     public_suffix(bar.wibble.xyz)     == "bar.wibble.xyz"
>     public_suffix(wibble.xyz)         == "xyz"             <-- different
>     public_suffix(xyz)                == "xyz"
> 
> So your options, in these terms, are:
> 
> 1) endorsing A) and updating the website to say so
> 2) fixing the code to do B) but not changing the PSL 
> 3) fixing the code to do B) and changing the PSL to be explicit that B) is
> what's meant.
> 
> Is that a correct summary of the situation?

Yes. It's clear that multiple implementations of the PSL have inferred that A) was the correct behaviour, even though B) is what was intended. 

> Firefox's prepare_tlds.py script barfs on this situation, so it would never
> have happened.

The thought experiment was meant to be "If you didn't have prepare_tlds.py, how would you be affected?" Both Chrome and Firefox have pre-parsing phases for the PSL. Other languages and implementations do not. It was to work through the thought experiment of what negative effects we'd cause for those in the latter camp - that is, what's the "worst" that we could imagine happening.
(In reply to Gervase Markham [:gerv] from comment #5)
> (In reply to Ryan Sleevi from comment #4)
> > (In reply to Gervase Markham [:gerv] from comment #3)
> > > Ryan: so what do we do? Do we try and update Firefox, Chrome and any other
> > > consumer of the code which barfs when we do this, or do we simply clarify
> > > the algorithm to specify the appropriate behaviour in this case (and update
> > > code to match) instead of encoding it directly?
> > 

...


> 
> 1) endorsing A) and updating the website to say so
> 2) fixing the code to do B) but not changing the PSL 
> 3) fixing the code to do B) and changing the PSL to be explicit that B) is
> what's meant.
> 
> Is that a correct summary of the situation?
> 
> > It's unclear to me what the "worst case" for 3 would be - that is, what
> > problems would be risked. If we hadn't added the guards to Chrome, the
> > behaviour that Chrome would have had is, well, taking the effective policy
> > of the 'last' item encountered, which would be the wildcarded behaviour,
> > which would be indistinguishable from the present behaviour. It _looks_ like
> > Firefox would have done the same.
> 


#3 sounds like the right thing to do, but that said I want to remind everyone that we are still in the middle of a healthy stream of new TLD additions every week from ICANN's new TLD program, and I want to encourage factoring that into the decision making process on these three (or other) options to solve for foo here.

Something that would potentially impact how chrome or Firefox might treat the new TLDs would be something I would advise against unless there was a rapid path to #3 on both (say <30 days).

Option 1 seems to me the lowest cost in impact in the short term, though sub-optimal longer term vs #3.

Could #1 be done for now and then #3 be performed after the new G thundering herd has passed?



> Firefox's prepare_tlds.py script barfs on this situation, so it would never
> have happened.
> 
> Gerv
(In reply to Jothan Frakes from comment #7)
> Something that would potentially impact how chrome or Firefox might treat
> the new TLDs would be something I would advise against unless there was a
> rapid path to #3 on both (say <30 days).

Isn't this true for all bugs? I mean, the stream is never going to stop - it's not like ICANN is going to decide it *stops* liking money and all the endless junkets to exotic locales in the name of inclusiveness.

It's equally not fair to suggest we don't fix a bug, when already the incorporation of said thundering herd is done on an irregular basis - every 4-12 weeks rather than every week.

> Could #1 be done for now and then #3 be performed after the new G thundering
> herd has passed?

How do we quantify this?
(In reply to Ryan Sleevi from comment #8)
> (In reply to Jothan Frakes from comment #7)
> > Something that would potentially impact how chrome or Firefox might treat
> > the new TLDs would be something I would advise against unless there was a
> > rapid path to #3 on both (say <30 days).
> 
> Isn't this true for all bugs? I mean, the stream is never going to stop -
> it's not like ICANN is going to decide it *stops* liking money and all the
> endless junkets to exotic locales in the name of inclusiveness.


I hear your energy on $$$ and junkets.   Anyways, the stream of TLDs in this round is finite.  The 2012 round took 7-8 years to open up, and there is a static list that they are processing.

Another round would come years after, leaving some breathing room where browsers could revise their *.foo = foo logic.


> 
> It's equally not fair to suggest we don't fix a bug, when already the
> incorporation of said thundering herd is done on an irregular basis - every
> 4-12 weeks rather than every week.
> 

I was not suggesting a bug not be fixed, only that the timing of when that happens should be done in a smart way that takes the introduction of the balance of the 2012 round into consideration.

There will be a perceived bug if TLDs that light up in the root don't operate as expected.  Apple Safari experienced an influx of reports that came as a result of a list being too far out of sync impacting how navigation worked.  

 
> > Could #1 be done for now and then #3 be performed after the new G thundering
> > herd has passed?
> 
> How do we quantify this?

Hmm... We could take the CSV from ICANN to see a historical per week pace of TLD contracting and root addition, as well as calculate the time between contracting and root addition.  There were approximately 1400 distinct strings, and we have around 800 in the PSL.  We could then make a "horseshoes and hand grenades" close approximation of completion from that as to when that process might reasonably conclude.
Overall, I'm not terribly keen to delay a bug-fix so that ICANN (and its customers) can make more money sooner.

On a more pragmatic level, there's no question that Option 1 is viable (at least, for Chrome and Firefox, since we're both aware of the issue and can fix it). The issue is how do we communicate *that this issue even exists* to all of the downstream consumers. That's really the distinction. By updating the PSL data file, we have our best shot at reaching all consumers to ensure "correct" behaviour. If we just silently fix Chrome and Firefox, we have no practical insight into when an appropriate time to communicate that is, which is why I'd rather force the issue than what will inevitably happen (as with all open-source projects' bugs), which is "out of sight, out of mind".
Yeah, very good points, Ryan.  

I am not advocating commercial interests here as a focus, just that things work as expected.  For IDN TLDs, for community TLDs, and the generics too.

I have worked on projects where work that I felt was crucial got deferred to the void of "Phase 2" (I.e. Never) so I appreciate what you are saying about a delay.

It seems like the other consideration you raised is sorting out a manner of communication to integrators and developers beyond those known about *.sld.tld and how it functions vs how it is documented, or how they might effect a change to align with a change to correct this bug you identified.

To the best of my knowledge we don't have a means of communication or a list of PSL users available.
(In reply to Jothan Frakes from comment #11)
> To the best of my knowledge we don't have a means of communication or a list
> of PSL users available.

Correct. Short of them subscribing to this bug category (like I eventually figured out how to do) :)

The only path we have is "break things in source". A heavy-handed approach, but one which reasonably works. And, to be clear, we're not even sure it would break things. There's certainly a large batch of sites where it wouldn't break anything, and the "non-broken" behaviour would just happen to "do the right thing". Firefox and Chrome would 'break', but they could then be fixed. Or they could do what I did for Chrome with platform.sh, which is temporarily comment it out until I can get around to fixing Chrome.
Would it be correct to say there is consensus for updating the documentation?
(In reply to Ryan Sleevi from comment #6)
> (In reply to Gervase Markham [:gerv] from comment #5)
> 
> > A) what the code currently does ("the de facto implementation") is:
> > 
> >     public_suffix(foo.bar.wibble.xyz) == "bar.wibble.xyz"
> >     public_suffix(bar.wibble.xyz)     == "bar.wibble.xyz"
> >     public_suffix(wibble.xyz)         == "wibble.xyz"
> >     public_suffix(xyz)                == "xyz"
> > 
> > and B) what the spec says is:
> > 
> >     public_suffix(foo.bar.wibble.xyz) == "bar.wibble.xyz"
> >     public_suffix(bar.wibble.xyz)     == "bar.wibble.xyz"
> >     public_suffix(wibble.xyz)         == "xyz"             <-- different
> >     public_suffix(xyz)                == "xyz"
> > 
> > So your options, in these terms, are:
> > 
> > 1) endorsing A) and updating the website to say so
> > 2) fixing the code to do B) but not changing the PSL 
> > 3) fixing the code to do B) and changing the PSL to be explicit that B) is
> > what's meant.
> > 
> > Is that a correct summary of the situation?
> 
> Yes. It's clear that multiple implementations of the PSL have inferred that
> A) was the correct behaviour, even though B) is what was intended. 

Hang on. I've now confused myself. "Updating the PSL" would involve changing:

bar.wibble.xyz
xyz

to

bar.wibble.xyz
wibble.xyz
xyz

Surely that would encode interpretation A) in the PSL, not B)?

Gerv
(In reply to Gervase Markham [:gerv] from comment #14)
> Hang on. I've now confused myself. "Updating the PSL" would involve changing:
> 
> bar.wibble.xyz
> xyz
> 
> to
> 
> bar.wibble.xyz
> wibble.xyz
> xyz
> 
> Surely that would encode interpretation A) in the PSL, not B)?

I'm afraid you have confused yourself, and I'm not quite sure how you got there ;)

Let's use platform.sh as a concrete example for this discussion

Current rules:

ICANN:
sh

PRIVATE:
*.platform.sh

public_suffix(ICANN_ONLY, sh) == "sh"
public_suffix(ICANN_ONLY, foo.sh) == "sh"
public_suffix(ICANN_ONLY, platform.sh) == "sh"
public_suffix(ICANN_ONLY, bar.platform.sh) == "sh"
public_suffix(ALL, sh) == "sh"
public_suffix(ALL, foo.sh) == "sh"
public_suffix(ALL, bar.platform.sh) == "platform.sh"

public_suffix(ALL, platform.sh) == "??"

A) (aka the de facto implementation) == "platform.sh" in Chrome, "sh" in Firefox
B) (aka fixed) == "sh"

Right, so hopefully we're in agreement there

Now let's apply the same logic to a simpler form

public_suffix(bar.foo.nom.br) == "foo.nom.br"
public_suffix(foo.nom.br) == "foo.nom.br"
public_suffix(br) == "br"

public_suffix(nom.br) == "??"

A) [bugged] would yield "nom.br" in Chrome, "br" in Firefox
B) [fixed] would yield.... "nom.br" (if we update the PSL) or "br" (if we do not)

public_suffix(foo.bar.fj) == "bar.fj"
public_suffix(bar.fj) == "bar.fj"
public_suffix(fj) == "??"

A) [bugged] would yield "fj" (no rule, so it falls to the default *)
B) [fixed] would yield "fj" (explicit rule)

Finally, the last example

public_suffix(foo.bar.kawasaki.jp) == "bar.kawasaki.jp"
public_suffix(bar.kawasaki.jp) == "bar.kawasaki.jp"
public_suffix(jp) == "jp"
public_suffix(kawasaki.jp) == "??"

A) [bugged] would yield "kawasaki.jp" (wildcards treated as match) in Chrome, "jp" in Firefox
B) [fixed] would yield... "kawasaki.jp" (if we update the PSL) or "jp" (if we do not)


So the "update the PSL" isn't about adding wibble.xyz for bar.wibble.xyz rules, but if there's a *.wibble.xyz, AND wibble.xyz is ALSO a public suffix (which, for all of the ICANN domains, it is, but for platform.sh, IT IS NOT), then we'd encode it. So the only result that should change is platform.sh, if we encode things right.

However, changing any of the public suffix rules for the ICANN section will break our scripts, because they can't handle the case where both "*.wibble.xyz" and "wibble.xyz" are present. Firefox 'breaks' for the ICANN domains (which really only comes up with nom.br, the JP prefectures, and sch.uk) but does the right thing for platform.sh - Chrome is the opposite.
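A minimal sketch of that "update the PSL" step, under the assumption that only ICANN-section wildcards gain an explicit parent rule and the PRIVATE section is left alone. The helper name add_parent_rules is hypothetical, and the rule lists are abbreviated samples rather than the real list contents.

```python
def add_parent_rules(section_rules):
    """For each "*.foo" rule, make sure a plain "foo" rule sits just before it."""
    existing = set(section_rules)
    updated = []
    for rule in section_rules:
        if rule.startswith("*."):
            parent = rule[2:]
            if parent not in existing:
                updated.append(parent)
                existing.add(parent)
        updated.append(rule)
    return updated


icann = ["br", "*.nom.br", "jp", "*.kawasaki.jp", "sh"]
private = ["*.platform.sh"]

# Only the ICANN section is rewritten; platform.sh stays a non-suffix.
new_icann = add_parent_rules(icann)
new_private = list(private)

print(new_icann)
print(new_private)
```

After the rewrite, "nom.br" and "kawasaki.jp" are listed explicitly, while "platform.sh" still has no rule of its own.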
I think if I understand the differences here, this is more likely to impact the space in ccTLDs within the ICANN section and subdomains within the PRIVATE section space.

In looking at this, I thought I'd examine the differences in ccTLD entries.

(I am partly sounding this out so that it a] helps me, because I learn things about the nuances of how the PSL is applied all the time through these types of tickets vs additions and update requests; and b] helps someone less intimate with the PSL and how it works understand the issue we're discussing.)

We used platform.sh as an example from the PRIVATE section, and .SH has a flat entry in the ICANN section.

Saint Helena (.SH) has 'stub zones' (like COM.SH, NET.SH) where third level names are offered underneath by the registry.  They also offer direct second level registrations.

PSL Entry is "SH"

Let's move down to the first occurrence of a "*." that I could find, which was for Bangladesh.  According to their FAQ they do not offer direct second level registrations.  Explicitly, they have COM.BD, EDU.BD, AC.BD, NET.BD, GOV.BD, ORG.BD, and MIL.BD where third level names are possible.

PSL Entry is "*.BD"

Would a private entry for PLATFORM.COM.BD also be problematic?

Should we get these asterisked entries (and others like SH where there are potential 3lds) explicitly listing their directly available subs for strings like this?  (That may not be a trivial project)

Pardon these questions but I'd like to understand it more, really.
(In reply to Jothan Frakes from comment #16)
> Would a private entry for PLATFORM.COM.BD also be problematic?

No, if by "problematic" you mean "would cause Firefox or Chrome to barf".

> Should we get these asterisked entries (and others like SH where there are
> potential 3lds) explicitly listing their directly available subs for strings
> like this?  (That may not be a trivial project)

Ryan is generally in favour of this, although we don't have an explicit project to do it.

Gerv
Bug 1163015 filed on fixing Firefox. Ryan: can you review it quickly and tell me if it's incorrect in any way?

Gerv
> 
> Ryan is generally in favour of this, although we don't have an explicit
> project to do it.
> 

I am too, although it is daunting the amount of work and outreach required for that.

Many of the downstream (!= browser) PSL integrators look to PSL for this, and it would be a way to better serve the community and enhance user experience.  

A discussion for a different bug, unless we go for #3 in a stringent way.
(In reply to Jothan Frakes from comment #16)
> I think if I understand the differences here, this is more likely to impact
> the space in ccTLDs within the ICANN section and subdomains within the
> PRIVATE section space.

Not really. For the ccTLDs, it is virtually a no-op.

> PSL Entry is "*.BD"
> 
> Would a private entry for PLATFORM.COM.BD also be problematic?

Nope.

> 
> Should we get these asterisked entries (and others like SH where there are
> potential 3lds) explicitly listing their directly available subs for strings
> like this?  (That may not be a trivial project)

That's not necessary.


The issue is that we have entries for
*.platform.sh
.sh

Chrome assumes that "*" applies to both sub-domains of platform.sh and to platform.sh. That's because the PSL has historically had entries of _just_ *.ccTLD, but .ccTLD was also an eTLD (because it was an actual TLD). However, the PSL didn't properly encode this as

*.ccTLD
.ccTLD

Instead, it relied on the implicit rule that "*" would catch it all. However, because some (many) consumers use the PSL to determine if a domain is IANA-assigned, I _suspect_ that many implementations adopted Chrome's logic - that is, if you encountered *.ccTLD, you assumed that .ccTLD was a valid eTLD (this is true by "*" rules, but I guess I mean "Code assumed it was IANA assigned")

Platform.sh is a special case because

*.platform.sh
.sh

Does not mean that platform.sh is an eTLD. There's a separate question as to whether it should be (to which I'd argue yes, since having platform.sh be able to set cookies for and compromise all the *.platform.sh users seems counter-intuitive for me, but such is the web I guess), but that's what caused the bug.

For all of the ICANN *.foo / *.foo.bar examples, both foo and foo.bar are *also* themselves eTLDs. That's why I suggested we fix that, and then code (such as Chromium) can stop assuming that a *.foo rule also means .foo is an appropriately-assigned eTLD.
(In reply to Gervase Markham [:gerv] from comment #18)
> Bug 1163015 filed on fixing Firefox. Ryan: can you review it quickly and
> tell me if it's incorrect in any way?
> 
> Gerv

It's fine/correct.
My two cents:

A "*" rule could always be amended by an exception rule sometime in future. For example

*.platform.sh
!www.platform.sh

The public suffix of "www.platform.sh" is "platform.sh". Therefore "platform.sh" -is- a public suffix. The set of public suffixes is the set of all possible outputs of the algorithm.

Now, it makes sense that adding or removing exception rules should not affect the set of public suffixes. Therefore it makes sense that rule "*.foo" implies rule "foo".

I think the best option is to fix the algorithm to that effect. We don't need to change PSL, or browser implementations.

If "platform.sh" really does not want to be a public suffix, we can add an exception rule

*.platform.sh
!platform.sh
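The "*.platform.sh" / "!www.platform.sh" example above can be checked with a small sketch of the published matching rules: an exception rule prevails over other matches, and its leftmost label is dropped to give the suffix. The function name is made up for illustration and this is not any browser's implementation.

```python
def public_suffix_with_exceptions(domain, rules):
    """Return the public suffix of domain, honouring "!" exception rules."""
    labels = domain.split(".")
    best = 1  # the default "*" rule covers one label when nothing else matches
    for rule in rules:
        is_exception = rule.startswith("!")
        rule_labels = (rule[1:] if is_exception else rule).split(".")
        if len(rule_labels) > len(labels):
            continue
        tail = labels[-len(rule_labels):]
        if all(r == "*" or r == d for r, d in zip(rule_labels, tail)):
            if is_exception:
                # An exception rule prevails; drop its leftmost label.
                return ".".join(tail[1:])
            best = max(best, len(rule_labels))
    return ".".join(labels[-best:])


RULES = ["sh", "*.platform.sh", "!www.platform.sh"]

for name in ["www.platform.sh", "foo.platform.sh", "platform.sh"]:
    print(name, public_suffix_with_exceptions(name, RULES))
```

Under these rules "platform.sh" appears as an output for "www.platform.sh", even though looking up "platform.sh" itself yields "sh".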
(In reply to zhong.j.yu from comment #22)
> If "platform.sh" really does not want to be a public suffix, we can add an
> exception rule
> 
> *.platform.sh
> !platform.sh

This is not desirable though. A parent of a public suffix should always be a public suffix too, for simplicity's sake.
(In reply to zhong.j.yu from comment #23)
> (In reply to zhong.j.yu from comment #22)
> > If "platform.sh" really does not want to be a public suffix, we can add an
> > exception rule
> > 
> > *.platform.sh
> > !platform.sh
> 
> This is not desirable though. A parent of a public suffix should always be a
> public suffix too, for simplicity's sake.

It's also not desirable because it has all the same implementation challenges discussed on this bug. That is, it'd break for both our parsers, AFAICT :)
> it'd break for both our parsers, AFAICT :)

and mine too... it's not surprising that a programmer studies the current list and makes some assumptions to simplify the code and improve performance.
Hi guys, upon further consideration, I think there is nothing wrong with the current PSL, and we should not change it.

The definition of "public suffix" is very clear:

>A "public suffix" is one under which Internet users can (or historically could) 
>directly register names.

Therefore, "platform.sh" is clearly not a public suffix, because the public cannot directly register domains under it. Neither is "bd".

The definition of PSL is also very clear - it is an enumeration of all public suffix domains. There are wildcard and exception rules, but they are just shorthand notations, we should not give them any special semantics. Rule "*.platform.sh" does not imply rule "platform.sh", and we should not explicitly add "platform.sh" to PSL either.

---

Given the definition of "public suffix" and PSL, we should present a simple algorithm on the website for

    is_public_suffix(domain) -> boolean
    
there is no ambiguity or confusion about this algorithm.

With that settled, given an arbitrary domain, it's very clear which part or parts of it are public suffixes. For example, within "foo.bar.platform.sh", "bar.platform.sh" and "sh" are public suffix domains. How this information is utilized is entirely up to the application.

It's possible that no part of a domain is a public suffix. For example, domain "bd" or "foo.local" contains no public suffix. Again, it is up to the application to decide how to handle it. 
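A sketch of what such an is_public_suffix check could look like, assuming the list is read purely as an enumeration: wildcards expand to exactly one extra label, exception rules remove a name, and there is no implicit "*" fallback. The rule list is a small sample and public_suffixes_within is a hypothetical helper.

```python
def is_public_suffix(domain, rules):
    """True if domain itself is enumerated by the rules (no implicit "*" fallback)."""
    if "!" + domain in rules:
        return False
    labels = domain.split(".")
    for rule in rules:
        if rule.startswith("!"):
            continue
        rule_labels = rule.split(".")
        # The rule must cover the whole domain, label for label.
        if len(rule_labels) == len(labels) and all(
            r == "*" or r == d for r, d in zip(rule_labels, labels)
        ):
            return True
    return False


def public_suffixes_within(domain, rules):
    """List every trailing part of domain that is itself a public suffix."""
    labels = domain.split(".")
    candidates = [".".join(labels[i:]) for i in range(len(labels))]
    return [c for c in candidates if is_public_suffix(c, rules)]


RULES = ["sh", "*.platform.sh", "*.bd"]

print(public_suffixes_within("foo.bar.platform.sh", RULES))
print(public_suffixes_within("bd", RULES))
print(public_suffixes_within("foo.local", RULES))
```

With this reading, "platform.sh" and "bd" are not public suffixes, and "foo.local" contains none at all, which matches the examples given above.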

---

So far so good. The confusion comes when the website defines an algorithm for

    public_suffix(domain) -> domain

My question is, is this necessary? Obviously Firefox/Chrome use this function for some specific purposes; do other applications need the exact same definition for their purposes too?

If they do, we need to clearly explain the function's intention, and perhaps coin a formal term like "-the- public suffix of a domain" to go with it.

If the function could return a non-public-suffix, it will be very confusing; in that case, we ought to rename the function/term to something more appropriate.
see also https://bugzilla.mozilla.org/show_bug.cgi?id=1163015#c5 for reasoning that a domain could only appear in PSL in one of the three forms.
(In reply to Ryan Sleevi from comment #20)
> For all of the ICANN *.foo / *.foo.bar examples, both foo and foo.bar are
> *also* themselves eTLDs. That's why I suggested we fix that, and then code
> (such as Chromium) can stop assuming that a *.foo rule also means .foo is an
> appropriately-assigned eTLD.

Let's take .il as an example. It has the following rule:

*.il

Their rules state that nobody can register directly under .il:

http://www.isoc.org.il/domains/il-domain-rules.html

I don't think it would be a good idea for the PSL maintainers to unilaterally add the following rule:

il

That is, at best, confusing, because of il-domain-rules.html above. I think the PSL maintainers should reach out to the .il registry and get approval before adding such a rule.

Also, the PSL algorithm states that the prevailing rule is "*" when no other rule matches. But in the case of "il", the algorithm must not yield "il" because that is not a public suffix, as stated in il-domain-rules.html and as implied by the presence of the rule "*.il" and the absence of the rule "il".

The "*" rule may make sense when there are no rules at all for the particular TLD. So I think we should consider changing the algorithm (the spec).

Another example is .uk. They switched from:

*.uk

to:

uk
ac.uk
<etc>

This is also consistent with the view that the presence of "*.uk" and absence of "uk" meant that nobody could register directly under .uk previously.
(In reply to erikvanderpoel from comment #28)

I agree, except

> The "*" rule may make sense when there are no rules at all for the
> particular TLD. So I think we should consider changing the algorithm (the
> spec).

In the case of "foo.local", "local" is not exactly a "public" suffix. It should be left for the application to decide; most applications probably would take the TLD as the fallback output because it fits their purposes, but that doesn't need to be an official mandate.
(In reply to erikvanderpoel from comment #28)
> Let's take .il as an example. It has the following rule:
> 
> *.il
> 
> Their rules state that nobody can register directly under .il:
> 
> http://www.isoc.org.il/domains/il-domain-rules.html
> 
> I don't think it would be a good idea for the PSL maintainers to
> unilaterally add the following rule:
> 
> il
> 
> That is, at best, confusing, because of il-domain-rules.html above. I think
> the PSL maintainers should reach out to the .il registry and get approval
> before adding such a rule.
> 
> Also, the PSL algorithm states that the prevailing rule is "*" when no other
> rule matches. But in the case of "il", the algorithm must not yield "il"
> because that is not a public suffix, as stated in il-domain-rules.html and
> as implied by the presence of the rule "*.il" and the absence of the rule
> "il".

Hi Erik,

I appreciate your point, but I think you're demonstrably wrong in how the public suffix is used and how it is intended. That is, your view was at one time reflective of the reality of its usage, but no longer so, and I think it does a disservice to the consumers to advance it :/

The logical implication of your position is that the "*" suffices for all the new gTLDs that have been added, and that we should not add them without demonstration or request by the registry. It also means that we should not have added the IANA root zone database as a source for the PSL, since those too had no demonstration of registerability.

I think the flaw in the thinking comes from believing that an entry such as "il" is an indicator that subdomains of "il" are registerable, and I think that's the wrong way to think of it. The inclusion of "il" is an indicator that "il" is controlled by the registry - that is, explicitly, that it is NOT registerable. It is also an indicator that it is a valid TLD (as demonstrated by all of the gTLD inclusions)

If you approach an entry in the PSL with those two things in mind, then hopefully it becomes clear precisely why and how "il" is needed, because as it is today, there is zero expression that "il" is a valid, IANA-delegated ccTLD, as opposed to "something else" (e.g. .local, which isn't delegated, but is reserved)

For applications using the PSL as a surrogate both for the IANA Root Zone Database and as a knowledge of where administrative boundaries lie, minimally, then failing to include the entries for "ccTLD" when a "*.ccTLD", but properly respecting that "*.ccTLD" says nothing about "ccTLD", means that "ccTLD" will no longer be recognized as a valid ccTLD. That's the logical, natural, and hopefully obvious outcome, and it's also hopefully understood how undesirable that would be.

I think in its original intent, your remarks are entirely correct. But in just a cursory scan of https://publicsuffix.org/learn/, you can quickly see that even though this was the intent, it doesn't at all match the usage, and the PSL maintainers have realized that enough to adapt and change policies (such as the explicit inclusion of the new gTLDs, even though, as you say, the prevailing "*" totally matches for the original purposes)
So this bug intends to change the definition of "public suffix", and/or the definition of "public suffix list"? Isn't that dangerous, particularly because there is no way to notify existing applications that depend on the old definitions?
(In reply to zhong.j.yu from comment #31)
> So this bug intends to change the definition of "public suffix", and/or the
> definition of "public suffix list"? Isn't that dangerous, particularly
> because there is no way to notify existing applications that depend on the
> old definitions?

This does not at all change the definition, and aligns these few domains with the existing practice of the past three years.
I'm obviously ignorant of the history of this subject. But from https://publicsuffix.org/

> A "public suffix" is one under which Internet users can (or historically could) 
> directly register names.

I don't understand how "il" falls within the definition.

Automatically adding "foo" for any "*.foo" does not seem to create any new information; it just adds redundancy to the source data. Isn't it better to fix the algorithm instead?
(In reply to Ryan Sleevi from comment #30)
> as
> it is today, there is zero expression that "il" is a valid, IANA-delegated
> ccTLD

The "*.il" rule tells us that il is a valid TLD.

If I understand correctly, "*.platform.sh" revealed an issue in Chrome, and the concern is that addressing that issue without adding the rule "il" would change the behavior of Chrome under il and a few other TLDs. Is that correct?

I'm trying to understand whether that change in behavior could be observed "on the wire". For example, cookies are currently only set for 3LDs under il because there are no HTTP servers at the 2LDs (e.g. co.il). Is that correct?

Are cookies a good example here? Do you have other usages in mind?
(In reply to erikvanderpoel from comment #34)
> (In reply to Ryan Sleevi from comment #30)
> > as
> > it is today, there is zero expression that "il" is a valid, IANA-delegated
> > ccTLD
> 
> The "*.il" rule tells us that il is a valid TLD.
> 

No, it doesn't, which is the entire point of this bug. Assuming that "*.il" says something about .il is broken and wrong, as demonstrated by what happens when you examine *.platform.sh

The algorithm documented (but not correctly implemented by Chrome in one way, and Firefox in another), is clear that *.foo is a statement about the descendents of foo.
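To make that reading concrete, here is a minimal sketch of the label-by-label matching step as described on publicsuffix.org. The function name `rule_matches` is hypothetical and this is not code from Chrome or Firefox; it only illustrates that a wildcard rule needs one more label than its parent, so "*.il" cannot match "il" itself.

```python
def rule_matches(rule: str, domain: str) -> bool:
    """Return True if a PSL rule matches a domain, comparing labels right to left."""
    rule_labels = rule.lstrip("!").split(".")
    domain_labels = domain.lower().split(".")
    if len(domain_labels) < len(rule_labels):
        # A rule can never match a domain that has fewer labels than the rule.
        return False
    for r, d in zip(reversed(rule_labels), reversed(domain_labels)):
        # "*" stands for exactly one label; anything else must be equal.
        if r != "*" and r != d:
            return False
    return True


print(rule_matches("*.il", "co.il"))  # True
print(rule_matches("*.il", "il"))     # False
print(rule_matches("il", "il"))       # True
```

Under this sketch, a list that contains only "*.il" leaves "il" to the implicit "*" fallback rule rather than to any explicit entry.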

> If I understand correctly, "*.platform.sh" revealed an issue in Chrome, and
> the concern is that addressing that issue without adding the rule "il" would
> change the behavior of Chrome under il and a few other TLDs. Is that correct?
> 

It is hard to say if your understanding is correct, because I think you're misunderstanding a sizable piece here.

> I'm trying to understand whether that change in behavior could be observed
> "on the wire". For example, cookies are currently only set for 3LDs under il
> because there are no HTTP servers at the 2LDs (e.g. co.il). Is that correct?

You are mistaken to assume this is only about cookies, as evidenced by the many uses of the PSL.

As I said in the beginning, to fix handling (in Chrome *and* Firefox) to reflect the documented algorithm, let alone the myriad other implementations, means rejecting .il as a valid TLD - unless further steps are taken.
(In reply to Ryan Sleevi from comment #35)

I apologize for asking questions during the Memorial Day weekend. I may come back on Tuesday.
(In reply to erikvanderpoel from comment #34)
> I'm trying to understand whether that change in behavior could be observed
> "on the wire". For example, cookies are currently only set for 3LDs under il
> because there are no HTTP servers at the 2LDs (e.g. co.il). Is that correct?

I've done some tests about browser behaviors on cookie domain and public suffix,
see http://www.ietf.org/mail-archive/web/http-state/current/msg01457.html

For this discussion, let's say a cookie-public-suffix is a domain that is treated as a public suffix as far as cookie handling is concerned. 

"il" is a cookie-public-suffix. However, all TLDs are cookie-public-suffix anyway, regardless of their standing in PSL.

More interestingly, given rule "*.nom.br", is "nom.br" a cookie-public-suffix? It is, in Firefox/Chrome/Safari/IE.

On the other hand, given rules of child domains of "amazonaws.com", is "amazonaws.com" a cookie-public-suffix? It -is-not- on Firefox/Chrome; It -is- on Safari.
Hi Ryan, I'm sorry about the confrontational nature of my questions. Let me try a different approach.

As I understand it, the proposal is to add, for each "*.tld" and "*.2ld.tld", the corresponding rule without the leading "*.". Will the registries be allowed to opt out of this change? Or is there going to be a "tld" rule for every TLD?

How is the wording on the publicsuffix.org site going to change? I'm a bit concerned that this change might increase complexity.

The Public Suffix List seems to have become important enough that a change of this nature ought to be written up in some kind of proposal that includes the problem statement and/or motivation, other approaches considered, and problems that would occur if this change was not made.

Also, have the PSL maintainers considered a mailing list for announcements and maybe another for discussion?

Finally, I believe that the test cases can be improved in two ways. Instead of checking the label one below the public suffix, check the public suffix itself. Also, instead of testing the actual rules, allow arbitrary rules to be specified (including intentionally strange ones) and then test the lookups against those rules.

http://mxr.mozilla.org/mozilla-central/source/netwerk/test/unit/data/test_psl.txt?raw=1
(In reply to erikvanderpoel from comment #38)
> As I understand it, the proposal is to add, for each "*.tld" and
> "*.2ld.tld", the corresponding rule without the leading "*.". Will the
> registries be allowed to opt out of this change? Or is there going to be a
> "tld" rule for every TLD?

Well, we established early on that there would be a "tld" rule for every TLD when we started including the ICANN new registries and incorporating the IANA root zone database.

Opt-out doesn't make sense, because there's no functional change in behaviour from the perspective of the TLD. That is, every inclusion of a TLD is equivalent to the "*" matching rule from the POV of cookies, and is formally correct from the POV of recognized vs unrecognized TLDs.

So no, no opt-out, and yes, every TLD.

> How is the wording on the publicsuffix.org site going to change? I'm a bit
> concerned that this change might increase complexity.

There's no change needed to the wording. Both the existing definition and the existing algorithm remain correct in the face of this change. The issue is that both Firefox and Chrome (and a number of other implementations) have "incorrectly" implemented the algorithm documented, in a way that naturally causes issues (due to the ways to implement)

Firefox assumes that "*.tld" is an expression about "tld" in its data storage model, but not in its processing model (instead, whether or not "tld" is a public suffix falls to the "*" rule)

Chrome assumes that "*.tld" is an expression about "tld" in its data storage model AND in its processing model, because the presence of "*.tld" in the PSL was and is equivalent to also recognizing "tld" as valid (and was equivalent to a "*" rule for cookies, but NOT equivalent for purpose of determining whether IANA delegated)

This applies to a wide variety of libraries as well (judging by those on publicsuffix.org)

> The Public Suffix List seems to have become important enough that a change
> of this nature ought to be written up in some kind of proposal that includes
> the problem statement and/or motivation, other approaches considered, and
> problems that would occur if this change was not made.

This bug has somewhat exhaustively captured that (for example, comment #4)

> Also, have the PSL maintainers considered a mailing list for announcements
> and maybe another for discussion?

This has been somewhat exhaustively discussed (for example, Comment #4, Comment #10, Comment #12)

> Finally, I believe that the test cases can be improved in two ways. Instead
> of checking the label one below the public suffix, check the public suffix
> itself. Also, instead of testing the actual rules, allow arbitrary rules to
> be specified (including intentionally strange ones) and then test the
> lookups against those rules.

I don't follow this proposal, but it sounds like exactly what Chrome does for its unit tests of the PSL.

https://chromium.googlesource.com/chromium/src/+/master/net/base/registry_controlled_domains/registry_controlled_domain_unittest.cc (and the related .gperf files at https://chromium.googlesource.com/chromium/src/+/master/net/base/registry_controlled_domains/ )
(In reply to Ryan Sleevi from comment #39)
> Well, we established early on that there would be a "tld" rule for every TLD
> when we started including the ICANN new registries and incorporating the
> IANA root zone database.
> 
> Opt-out doesn't make sense, because there's no functional change in
> behaviour from the perspective of the TLD. That is, every inclusion of a TLD
> is equivalent to the "*" matching rule from the POV of cookies

Yes, I see that new ICANN TLDs and IANA root zone TLDs have been added to the PSL, without "*.", "!" and "PRIVATE". Is it too late to change course? How about having a rule like "?tld" when the PSL maintainers know that the TLD is (or will soon be) valid, but don't know whether the rule should be "*.", "!", "PRIVATE" or "" (normal). Then applications can choose how to treat those TLDs, especially in the areas of security and privacy.

It seems to me that a domain name can be a public suffix, below a public suffix, above a public suffix, or between public suffixes.

For example, given the rules:

sh
*.platform.sh

Then the following domains are as indicated:

foo.bar.platform.sh is below a public suffix
bar.platform.sh is a public suffix
platform.sh is between public suffixes
sh is a public suffix

And given the rule:

*.il

Then:

il is above all public suffixes
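
The four categories can be sketched in code. This is only an illustration under stated assumptions: the names `explicit_suffix_len` and `classify` are hypothetical, exception ("!") rules are left out for brevity, and the implicit "*" fallback rule is deliberately NOT applied, so that only explicit rules count. With the fallback applied, "il" would instead come out as a public suffix.

```python
def _matches(rule_labels, labels):
    """True if the rule labels match the rightmost labels of the domain."""
    if len(labels) < len(rule_labels):
        return False
    return all(r in ("*", d) for r, d in zip(reversed(rule_labels), reversed(labels)))


def explicit_suffix_len(labels, rules):
    """Label count of the longest matching explicit rule, or 0 if none matches."""
    best = 0
    for rule in rules:
        rule_labels = rule.split(".")
        if _matches(rule_labels, labels):
            best = max(best, len(rule_labels))
    return best


def classify(domain, rules):
    labels = domain.lower().split(".")
    n = explicit_suffix_len(labels, rules)
    if n == len(labels):
        return "is a public suffix"
    below = 0 < n < len(labels)
    # A rule lies beneath the domain if it is longer and ends with the domain's labels.
    beneath = any(
        len(r.split(".")) > len(labels) and r.split(".")[-len(labels):] == labels
        for r in rules
    )
    if below and beneath:
        return "is between public suffixes"
    if below:
        return "is below a public suffix"
    if beneath:
        return "is above all public suffixes"
    return "is not covered by any explicit rule"


for name in ["foo.bar.platform.sh", "bar.platform.sh", "platform.sh", "sh"]:
    print(name, classify(name, ["sh", "*.platform.sh"]))
print("il", classify("il", ["*.il"]))
```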

So the cookie spec could clarify (if it hasn't already) that cookies can only be set for domains below a public suffix (and not above a public suffix).

For example, blogspot.com can have a cookie because com is a public suffix, but il cannot have a cookie because it is above all public suffixes (*.il).

Judging from Zhong's cookie tests, the browsers still behave very differently. This suggests that it is still early days in the Public Suffix era (though obviously not in the Cookie era). So maybe Public Suffix can still be changed, and the browsers can still try to interoperate eventually.
(In reply to erikvanderpoel from comment #40)
> Yes, I see that new ICANN TLDs and IANA root zone TLDs have been added to
> the PSL, without "*.", "!" and "PRIVATE". Is it too late to change course?

Yes

> How about having a rule like "?tld" when the PSL maintainers know that the
> TLD is (or will soon be) valid, but don't know whether the rule should be
> "*.", "!", "PRIVATE" or "" (normal). Then applications can choose how to
> treat those TLDs, especially in the areas of security and privacy.

I don't see the value in this, and it also creates further issues in as much as the absence of a TLD (which falls to a "*") is now not only conceptually different but potentially functionally different than the presence of a "?tld"

For applications, this seems to represent a much larger breaking change, and without clear benefits or need. That is, neither Firefox, Chrome, Safari, nor IE has needed this, nor has any other consumer, or else there would have been a proposal or request for such a distinction. Absent actual need, I'm disinclined to advocate a change to the format.

> It seems to me that a domain name can be a public suffix, below a public
> suffix, above a public suffix, or between public suffixes.

Sure

> For example, given the rules:
> 
> sh
> *.platform.sh
> 
> Then the following domains are as indicated:
> 
> foo.bar.platform.sh is below a public suffix
> bar.platform.sh is a public suffix
> platform.sh is between public suffixes
> sh is a public suffix

That is correct in the intent, and how we got to this issue.

> And given the rule:
> 
> *.il
> 
> Then:
> 
> il is above all public suffixes

I disagree with this; il is a public suffix. This is intrinsically obvious by being a ccTLD. All existing ccTLDs are inherently public suffixes, regardless of whether they allow registrations at the 2LD or 3LD, as they all represent domain boundaries below the ccTLD (e.g. il)

I'm uncertain about the basis of your opposition to adding "il", other than that it seems to be founded on a definition of public suffix that doesn't reflect the well-documented and long-standing usages; I'd much rather align the data with practice rather than proscribe existing practice.

> So the cookie spec could clarify (if it hasn't already) that cookies can
> only be set for domains below a public suffix (and not above a public
> suffix).

Well, no, that's not correct either. platform.sh should, in theory, be allowed to set cookies. It is between, which is nothing more than saying it is both above and below.

> For example, blogspot.com can have a cookie because com is a public suffix,
> but il cannot have a cookie because it is above all public suffixes (*.il).

I agree with the end result, but disagree with the path or logic taken to get there :/

> 
> Judging from Zhong's cookie tests, the browsers still behave very
> differently. This suggests that it is still early days in the Public Suffix
> era (though obviously not in the Cookie era). So maybe Public Suffix can
> still be changed, and the browsers can still try to interoperate eventually.

Zhong's tests have issues, but this isn't the bug for them, and we're far from the early days of the Public Suffix, and whatever changes may be worthwhile, I do not think it necessary to solve or discuss them to resolve the very real issues today.

I think it far more important to address the real issues before the theoretical issues, and I see these concerns about such expressions as "?tld" to be far in the latter camp, while including "il" unquestionably in the former, from examining implementation and practice.

Do you have a concrete scenario or implementation where this would break? If not, it'd be much better to resolve this issue now and then separately worry about treatment of cookies (which ostensibly isn't a PSL issue) or other future use cases.
Hmmm, after sending my previous comment, I realized that the DNS root is now a public suffix because ICANN is allowing the public to register domains there.

If the youtube TLD is similar to youtube.com, will http://youtube/ be allowed to have cookies?

What about http://accountants/ ?

If YouTube asked the PSL maintainers to change the rule to "!youtube", would that change be accepted?

(I'm getting closer to accepting/understanding "il" alongside "*.il".)
currently the list maps each domain to a one-bit flag. yet we try to use the list for many different reasons. maybe we should create another list, where each domain is mapped to a set of flags

    il: ccTLD, no-public-subdomain, no-cookie, etc.
(In reply to zhong.j.yu from comment #43)
> currently the list maps each domain to a one-bit flag. yet we try to use the
> list for many different reasons. maybe we should create another list, where
> each domain is mapped to a set of flags
> 
>     il: ccTLD, no-public-subdomain, no-cookie, etc.

No, that's not accurate. The list already contains multiple bits of state, as reflected in common implementations.

1) ICANN-delegated vs private domain boundary (reflected in comments, not in format)
2) Wildcard vs non-wildcard
3) Exception vs non-exception
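
As a rough illustration of those three bits, the sketch below parses PSL-style text into records. The `Rule` class, `parse_psl`, and the sample text are hypothetical; the only format details assumed are the "// ===BEGIN ICANN DOMAINS===" and "// ===BEGIN PRIVATE DOMAINS===" section comments and the "!" and "*." rule prefixes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str           # rule text with any "!" or "*." prefix removed
    is_private: bool    # True when the rule sits in the PRIVATE section
    is_wildcard: bool   # True when the rule was written as "*.name"
    is_exception: bool  # True when the rule was written as "!name"


def parse_psl(text):
    rules = []
    is_private = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("//"):
            # The ICANN/PRIVATE distinction lives only in comment markers.
            if "===BEGIN PRIVATE DOMAINS===" in line:
                is_private = True
            elif "===BEGIN ICANN DOMAINS===" in line:
                is_private = False
            continue
        if not line:
            continue
        line = line.split()[0]  # a rule ends at the first whitespace
        is_exception = line.startswith("!")
        if is_exception:
            line = line[1:]
        is_wildcard = line.startswith("*.")
        if is_wildcard:
            line = line[2:]
        rules.append(Rule(line, is_private, is_wildcard, is_exception))
    return rules


SAMPLE = """\
// ===BEGIN ICANN DOMAINS===
jp
kobe.jp
*.kobe.jp
!city.kobe.jp
// ===BEGIN PRIVATE DOMAINS===
*.platform.sh
"""

for rule in parse_psl(SAMPLE):
    print(rule)
```

Note that "kobe.jp" and "*.kobe.jp" produce two records with the same `name`; a store keyed on `name` alone could keep only one of them, which is the data-model limitation discussed earlier in this bug.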
(In reply to erikvanderpoel from comment #42)
> Hmmm, after sending my previous comment, I realized that the DNS root is now
> a public suffix because ICANN is allowing the public to register domains
> there.

Yes, a special hell.

> 
> If the youtube TLD is similar to youtube.com, will http://youtube/ be
> allowed to have cookies?
> 
> What about http://accountants/ ?

At present, no, and this is independent of any changes we may make. This is because "*" is the terminal rule, and as such, "youtube" and "accountants" are always seen as public suffixes (and thus denied cookie setting privilege), much the same way "intranet" hosts are denied this (and have been since before the PSL even came to be)

> If YouTube asked the PSL maintainers to change the rule to "!youtube", would
> that change be accepted?

Perhaps, but it wouldn't change Chrome's behaviour or policies, for example.
I do think it's important to keeping the discussion productive that we distinguish "things we might do in the future or change" and "things we can/should do now"

Zhong, Erik, not to diminish your contributions, because I think there's important discussion to be had, but I don't think it's particularly productive to have it on this bug. I think the merits of your proposals to change things are best suited for bugs reflecting those proposals, and bugs such as this may serve as illustrative towards those ends, but I don't think we should block a solution to this real and immediate issue from being resolved.
I've read all Ryan's comments multiple times, and I cannot find technical justifications for the change to PSL.

If Chrome has a bug, Chrome can simply fix it. There is no reason to alter PSL. There is no reason to assume that other implementations have the same bug, which incidentally can be fixed by the proposed PSL change. 

The change will break many consumers of PSL in various ways. It is very presumptuous to say that it is a good thing. People have been interpreting PSL just fine for their use cases; let's not assume that they don't know what they are doing. 

The change may introduce bugs silently to some implementations that compromise internet security.
(In reply to Ryan Sleevi from comment #46)
> Erik, not to diminish your contributions, because I think there's
> important discussion to be had, but I don't think it's particularly
> productive to have it on this bug.

I did not intend to delay the resolution of this bug. I am concerned about potential confusion:

(https://publicsuffix.org/)
> A "public suffix" is one under which Internet users can (or historically
> could) directly register names.

(In reply to Ryan Sleevi from comment #30)
> I think the flaw in the thinking comes from believing that an entry such as
> "il" is an indicator that subdomains of "il" are registerable, and I think
> that's the wrong way to think of it.

(In reply to Ryan Sleevi from comment #39)
> There's no change needed to the wording.

I'm not sure whether or how to change the wording. Perhaps "directly or indirectly"?

< A "public suffix" is one under which Internet users can (or historically
< could) directly or indirectly register names.

By the way, I'm assuming that "kobe.jp" will be added:

jp
kobe.jp
*.kobe.jp

Will the following tests remain unchanged?

(http://mxr.mozilla.org/mozilla-central/source/netwerk/test/unit/data/test_psl.txt?raw=1)
> checkPublicSuffix('com', null);
> checkPublicSuffix('example.com', 'example.com');
> ...
> checkPublicSuffix('c.kobe.jp', null);

Here, the input c.kobe.jp matches the rules jp, kobe.jp and *.kobe.jp, so we take the longest rule, but there are no more labels in the input, so the output must be null. Is that right?
(In reply to zhong.j.yu from comment #47)
> I've read all Ryan's comments multiple times, and I cannot find technical
> justifications for the change to PSL.

As the very first comment says, see the discussion on bug 1124625.

> If Chrome has a bug, Chrome can simply fix it. There is no reason to alter
> PSL. There is no reason to assume that other implementations have the same
> bug, which incidentally can be fixed by the proposed PSL change. 

There's no assumption - it is true across other implementations.

> The change will break many consumers of PSL in various ways. 

Can you provide a single demonstration of this claim? I believe you're just making it up, because I have actually taken the time to review the multiple documented implementations. If there is a concrete claim of breakage, then we can evaluate it compared to the other breakage.

> It is very
> presumptuous to say that it is a good thing. People have been interpreting
> PSL just fine for their use cases; let's not assume that they don't know
> what they are doing. 
> 
> The change may introduce bugs silently to some implementations that
> compromise internet security.

This is just spreading uncertainty. I'm sure you can appreciate that I'd prefer technical arguments with solid demonstrations. I feel I've gone repeatedly out of the way to demonstrate to you how this is an issue. It's one thing if you don't agree, but it seems that you still don't understand ("If Chrome has a bug" suggests you're still unaware of the root issue)

I hope that, in reading my replies, you can see that I've tried very hard to document the risks, the considerations, and the alternatives. It's unquestionable that we've identified issues in *multiple* implementations that yield incorrect behaviour. That you disagree with that behaviour is certainly your prerogative, but it's clear from history and practice that you're on a side not well supported, certainly not technically, and you're not providing any alternative solutions that might be discussed or evaluated for their technical viability. As such, it's hard to seriously weigh your objections, other than just treating them as a "No", acknowledging them, and then ignoring them in order to make real progress.
(In reply to erikvanderpoel from comment #48)
> < A "public suffix" is one under which Internet users can (or historically
> < could) directly or indirectly register names.

That would imply that "amazonaws.com" is also a public suffix.
(In reply to erikvanderpoel from comment #48)
> (https://publicsuffix.org/)
> > A "public suffix" is one under which Internet users can (or historically
> > could) directly register names.

Right, I don't know if that's meant to be an exhaustive definition, hence https://publicsuffix.org/learn/
> Some people use the PSL to determine what is a valid domain name and what isn't.

Also, https://wiki.mozilla.org/Public_Suffix_List
> As well as being used to prevent cookies from being set where they shouldn't be,
> the list can also potentially be used for other applications where the registry
> controlled and privately controlled parts of a domain name need to be known, for
> example when grouping by top-level domains.

And also the broader discussion that led to the inclusion of the PRIVATE domain section

https://wiki.mozilla.org/Public_Suffix_List/Use_Cases

(And, for historic sake, https://bugzilla.mozilla.org/show_bug.cgi?id=712640 and https://bugzilla.mozilla.org/show_bug.cgi?id=576508 reflecting these current behaviours )


> I'm not sure whether or how to change the wording. Perhaps "directly or
> indirectly"?

It sounds like if there is a change to be made, it'd be pointing to the more holistic or nuanced discussion? 

> < A "public suffix" is one under which Internet users can (or historically
> < could) directly or indirectly register names.
> 
> By the way, I'm assuming that "kobe.jp" will be added:
> 
> jp
> kobe.jp
> *.kobe.jp

Correct

> 
> Will the following tests remain unchanged?
> 
> (http://mxr.mozilla.org/mozilla-central/source/netwerk/test/unit/data/
> test_psl.txt?raw=1)
> > checkPublicSuffix('com', null);
> > checkPublicSuffix('example.com', 'example.com');
> > ...
> > checkPublicSuffix('c.kobe.jp', null);
> 
> Here, the input c.kobe.jp matches the rules jp, kobe.jp and *.kobe.jp, so we
> take the longest rule, but there are no more labels in the input, so the
> output must be null. Is that right?

Correct, this matches the *.kobe.jp rule and thus returns 'null'.
Without this change, looking up 'kobe.jp' would return 'jp' (Firefox) or 'kobe.jp' (Chrome, several other implementations)
With this change, looking up 'kobe.jp' would return 'null' (all implementations)
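
For reference, a minimal sketch of the documented algorithm, written in the spirit of checkPublicSuffix from test_psl.txt. The names `registrable_domain`, `TODAY` and `PROPOSED` are hypothetical, and the sketch models only the algorithm as documented on publicsuffix.org, not the storage model of Firefox, Chrome, or any other implementation.

```python
def registrable_domain(domain, rules):
    """Return the public suffix plus one label, or None if there is no extra label."""
    labels = domain.lower().split(".")
    suffix_len = 1  # implicit "*" rule: the rightmost label is a public suffix
    for rule in rules:
        exception = rule.startswith("!")
        rule_labels = rule.lstrip("!").split(".")
        if len(rule_labels) > len(labels):
            continue
        if all(r in ("*", d) for r, d in zip(reversed(rule_labels), reversed(labels))):
            if exception:
                # An exception rule prevails; its leftmost label is not part of the suffix.
                suffix_len = len(rule_labels) - 1
                break
            suffix_len = max(suffix_len, len(rule_labels))
    if len(labels) <= suffix_len:
        return None  # the input is itself a public suffix
    return ".".join(labels[-(suffix_len + 1):])


TODAY = ["jp", "*.kobe.jp", "!city.kobe.jp"]
PROPOSED = TODAY + ["kobe.jp"]

for name in ["kobe.jp", "c.kobe.jp", "b.c.kobe.jp", "city.kobe.jp"]:
    print(name, registrable_domain(name, TODAY), registrable_domain(name, PROPOSED))
```

In this sketch the only input whose result differs between the two rule sets is "kobe.jp" itself; "c.kobe.jp" is a public suffix under both.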
(In reply to Ryan Sleevi from comment #49)
> (In reply to zhong.j.yu from comment #47)
> > I've read all Ryan's comments multiple times, and I cannot find technical
> > justifications for the change to PSL.
> 
> As the very first comment says, see the discussion on bug 1124625.

I've read that multiple times too, and I don't understand your reasoning.  

> 
> > If Chrome has a bug, Chrome can simply fix it. There is no reason to alter
> > PSL. There is no reason to assume that other implementations have the same
> > bug, which incidentally can be fixed by the proposed PSL change. 
> 
> There's no assumption - it is true across other implementations.
> 
> > The change will break many consumers of PSL in various ways. 
> 
> Can you provide a single demonstration of this claim? I believe you're just
> making it up, because I have actually taken the time to review the multiple
> documented implementations. If there is a concrete claim of breakage, then
> we can evaluate it compared to the other breakage.

It breaks Firefox and Chrome:) And it breaks my implementation. So, I'm not making it up.

I'm not sure what "documented implementations" you are referring to. Every http client should depend on PSL, and there are lots of http client libraries; I don't think you've reviewed a significant portion of them. Do they not matter?
(In reply to Ryan Sleevi from comment #51)
> (In reply to erikvanderpoel from comment #48)
> > (https://publicsuffix.org/)
> > > A "public suffix" is one under which Internet users can (or historically
> > > could) directly register names.
> 
> Right, I don't know if that's meant to be an exhaustive definition, hence
> https://publicsuffix.org/learn/

Please provide a more complete definition. I think that's our misunderstanding here, you are using the term in a way that we are not aware of.
(In reply to zhong.j.yu from comment #52)
> It breaks Firefox and Chrome:) And it breaks my implementation. So, I'm not
> making it up.

We've established that it "breaks" Firefox and Chrome because they've improperly implemented the public suffix list. Further, it "breaks" in that they've added checks before parsing, but they do parse correctly, and even if those checks were removed, the correct and desired behaviour would result. So I think we may be using "break" somewhat inconsistently.

Does it yield any incorrect result for any of the implementations? For Firefox, Chrome, and for those documented on https://publicsuffix.org/learn/, it would not yield incorrect results.
It seems like this proposed change requires an explanatory wiki page which pulls together all of the salient facts. They are now spread across two bugs and multiple points of misunderstanding (including by me, earlier). I'm not sure anyone new could easily get their head around what is going on in this bug.

I will attempt to put such a page together.

Gerv
Ok, here it is:
https://wiki.mozilla.org/Public_Suffix_List/platform.sh_Problem

There's one question for Ryan in it - my apologies if it's answered already somewhere in this thread.

Particularly looking at Ryan, but anyone: if the document is incorrect on the facts, please fix it - we are all looking for clarity! And hopefully, as we set down a clear narrative, either consensus will emerge, or it will become clear exactly where the disagreement lies.

Gerv
Gervase - my observation is that, for handling "kobe.jp" cookie domain, Firefox/Safari/Chrome/Opera behave the same - "kobe.jp" cookie domain is rejected; "s1.kobe.jp" cannot set a cookie with domain="kobe.jp", such a cookie will not be delivered to "kobe.jp", "s1.kobe.jp", "s2.kobe.jp"

"*.nom.br" and "*.sch.uk" are tested with the same result.

(actually, Firefox allows site "kobe.jp" to set a cookie with domain "kobe.jp"; but apparently the cookie domain is reset to null; the cookie can only be applied to "kobe.jp" itself, not to "s1.kobe.jp" etc.)

For "*.platform.sh", 
 - Firefox/Safari treats it the same as "*.kobe.jp"
 - Chrome/Opera treat "platform.sh" as a normal domain; presumably because the rule 
   has not been incorporated yet
(In reply to zhong.j.yu from comment #57)
> Gervase - my observation is that, for handling "kobe.jp" cookie domain,
> Firefox/Safari/Chrome/Opera behave the same - "kobe.jp" cookie domain is
> rejected; "s1.kobe.jp" cannot set a cookie with domain="kobe.jp", such a
> cookie will be not delivered to "kobe.jp", "s1.kobe.jp", "s2.kobe.jp"

s1.kobe.jp can't set a cookie for kobe.jp because s1.kobe.jp is equivalent to a TLD setting a cookie (e.g. .com can't set a cookie because it's an eTLD, not because it's the terminal TLD)

> "*.nom.br" and "*.sch.uk" are tested with the same result.
> 
> (actually, Firefox allows site "kobe.jp" to set a cookie with domain
> "kobe.jp"; but apparently the cookie domain is reset to null; the cookie can
> only be applied to "kobe.jp" itself, not to "s1.kobe.jp" etc.)

This is true for Firefox, but not true for other implementations, and begins to get closer to the heart of the issue. kobe.jp can set a cookie for kobe.jp, except kobe.jp shouldn't be able to (kobe.jp is itself registry controlled, an effective TLD).

Other implementations allow setting to kobe.jp and applying that to s1.kobe.jp and foo.s1.kobe.jp, since that's how cookies are spec'd (RFC 6265).

Firefox's behaviour is logical (preventing kobe.jp from applying to s1.kobe.jp), but that's not actually spec'd as such.

That's why we end up with such divergent behaviour.

Of course, adding kobe.jp to the PSL, reflecting its actual status, also ensures that you can't set a cookie on kobe.jp in Firefox (and all other implementations that follow 6265 and reject setting cookie domains equivalent to public suffixes)

> 
> For "*.platform.sh", 
>  - Firefox/Safari treats it the same as "*.kobe.jp"
>  - Chrome/Opera treat "platform.sh" as a normal domain; presumably because
> the rule 
>    has not been incorporated yet

No, that's not correct. We have explicitly disabled *.platform.sh until the resolution of the wildcard issue. Fixing *.platform.sh will directly impact all other * rules, as discussed in https://bugzilla.mozilla.org/show_bug.cgi?id=1124625#c6 , so we're not fixing that until we can reach consensus on the expected behaviour.

Of course, for platform.sh, it's a special snowflake because it would have the "kobe.jp hole" problem, and we will end up with divergent behaviour between browsers, because "platform.sh" is a hole between two public suffixes (*.platform.sh and .sh). Some browsers will allow platform.sh (which does host real content) to set a cookie, other browsers will not (because there's a public suffix lower in the domain components).

This is why I remarked, on the original issue, that platform.sh is doing something weird that's going to cause issues. It's incumbent on them to solve the larger issue of "What you're doing is going to be weird across browsers", but it highlighted a mismatch between the language as specified (*.foo doesn't express anything about foo, per the PSL algorithm) and the actual practice of consumers of the PSL (*.foo does express something about foo).
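
To make the mismatch concrete, here is a minimal Python sketch of the matching algorithm as described on publicsuffix.org. The function name `public_suffix` and the short `RULES` list are illustrative assumptions built from the rules mentioned in this thread; this is not code from Firefox, Chrome, or any other consumer.

```python
# Hypothetical, abbreviated rule list: only rules discussed in this thread.
RULES = ["jp", "*.kobe.jp", "!city.kobe.jp", "sh", "*.platform.sh"]


def public_suffix(domain, rules):
    """Return the public suffix of `domain` under the given PSL-style rules."""
    labels = domain.lower().split(".")
    matches = []  # (is_exception, number_of_labels) for every matching rule
    for rule in rules:
        is_exception = rule.startswith("!")
        rule_labels = rule.lstrip("!").split(".")
        if len(rule_labels) > len(labels):
            continue  # a rule longer than the domain can never match it
        tail = labels[len(labels) - len(rule_labels):]
        # Compare label by label; "*" stands for exactly one label.
        if all(r in ("*", l) for r, l in zip(rule_labels, tail)):
            matches.append((is_exception, len(rule_labels)))

    exception_lengths = [n for exc, n in matches if exc]
    if exception_lengths:
        # An exception rule prevails; the suffix is the rule minus its leftmost label.
        n = max(exception_lengths) - 1
    elif matches:
        n = max(n for _, n in matches)  # otherwise the longest matching rule wins
    else:
        n = 1  # no rule matched: the implicit "*" rule applies
    return ".".join(labels[-n:])
```

Under these assumed rules, `public_suffix("platform.sh", RULES)` yields `"sh"` and `public_suffix("kobe.jp", RULES)` yields `"jp"`: the wildcard rule alone says nothing about its parent. Appending a plain `"platform.sh"` rule to the list makes the same call yield `"platform.sh"`, which is the effect this bug proposes.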
(In reply to Ryan Sleevi from comment #58)
> s1.kobe.jp can't set a cookie for kobe.jp because s1.kobe.jp is equivalent
> to a TLD setting a cookie (e.g. .com can't set a cookie because it's an
> eTLD, not because it's the terminal TLD)

Consider the rule "!city.kobe.jp" - Firefox/Chrome/Safari do not allow "city.kobe.jp" to set a cookie for "kobe.jp".

Compare that to "www.amazonaws.com" - Firefox/Chrome allow "www.amazonaws.com" to set a cookie for "amazonaws.com". The cookie is applicable to "amazonaws.com", "www.amazonaws.com" and "xxx.amazonaws.com".

Safari treats "amazonaws.com" the same as "kobe.jp" - "www.amazonaws.com" cannot set a cookie for "amazonaws.com".

Firefox/Chrome treat "amazonaws.com" differently from "kobe.jp" or "platform.sh", apparently because the wildcard plays a semantic role here. I think that should be avoided - wildcard/exception should be simply a syntactic device.

This is all too confusing, and I plan to propose an Errata to RFC6265, that a cookie domain cannot be a parent of a public suffix, even if that parent is not a public suffix itself. That will cover "kobe.jp", "amazonaws.com", "platform.sh" etc.
(In reply to Ryan Sleevi from comment #58)
> (In reply to zhong.j.yu from comment #57)
> > (actually, Firefox allows site "kobe.jp" to set a cookie with domain
> > "kobe.jp"; but apparently the cookie domain is reset to null; the cookie can
> > only be applied to "kobe.jp" itself, not to "s1.kobe.jp" etc.)

rfc6265#section-5.3 has a special clause saying that, if the cookie domain is a public suffix but is the same as the request domain, the cookie is accepted as if its domain were null. Such a cookie will only be applied to that exact request domain.
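
A rough Python sketch of that clause and the domain check that follows it in RFC 6265 section 5.3. The function names and the `is_public_suffix` predicate are assumptions made for illustration, and the domain-match here is simplified (it ignores the RFC's requirement that the host not be an IP address).

```python
def domain_match(host, domain):
    """Simplified RFC 6265 domain-match: equal, or host ends with '.' + domain."""
    return host == domain or host.endswith("." + domain)


def resolve_cookie_domain(request_host, domain_attr, is_public_suffix):
    """Return (cookie_domain, host_only_flag), or None if the cookie is ignored.

    `is_public_suffix` is a caller-supplied predicate (hypothetical here);
    both hosts are assumed to be canonicalized already.
    """
    if domain_attr and is_public_suffix(domain_attr):
        if domain_attr == request_host:
            # The special clause: treat the cookie as if no Domain was given.
            domain_attr = ""
        else:
            return None  # a public suffix as cookie domain is rejected
    if domain_attr:
        if not domain_match(request_host, domain_attr):
            return None
        return (domain_attr, False)  # cookie applies to subdomains as well
    return (request_host, True)  # host-only cookie: exact host only
```

For example, if the predicate treats "kobe.jp" as a public suffix, a response from "kobe.jp" with Domain=kobe.jp ends up as a host-only cookie for "kobe.jp", while the same attribute sent by "s1.kobe.jp" is ignored.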

Firefox is the only browser that implements this clause of the RFC. I plan to propose an erratum to RFC6265 to drop the clause.
The current wording of RFC6265 would allow "foo.compute.amazonaws.com" to set a cookie with domain "amazonaws.com", and the cookie would apply to all child domains of "amazonaws.com". That is obviously flawed.

Fortunately Firefox/Chrome/Safari do not allow that to happen. Safari took a simpler approach - "amazonaws.com" is simply not allowed, period. Firefox/Chrome took a smarter approach - it's allowed in certain cases; but that approach is complicated, and other clients may fail to implement it correctly, especially with the confusion over what a "public suffix" is.

So I think the cookie RFC should be fixed with a more conservative, and simpler, policy regarding cookie domains - a cookie domain cannot be a TLD, a public suffix, or a parent of a public suffix.
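
One possible reading of that proposed policy, sketched in Python. `cookie_domain_allowed` and the rule list are hypothetical; this is an illustration of the three conditions named above, not the behaviour of any shipping browser.

```python
# Hypothetical, abbreviated rule list built from domains named in this thread.
RULES = ["com", "jp", "*.kobe.jp", "!city.kobe.jp",
         "compute.amazonaws.com", "sh", "*.platform.sh"]


def cookie_domain_allowed(domain, rules):
    """Reject a TLD, a public suffix, or a parent of a public suffix."""
    domain = domain.lower()
    labels = domain.split(".")
    if len(labels) == 1:
        return False  # a bare TLD
    exceptions = {r[1:] for r in rules if r.startswith("!")}
    for rule in rules:
        if rule.startswith("!"):
            continue  # exception rules only carve names out of wildcards
        rule_labels = rule.split(".")
        same_length = len(rule_labels) == len(labels)
        # The domain is itself a public suffix, unless an exception rule exempts it.
        if (same_length
                and all(r in ("*", l) for r, l in zip(rule_labels, labels))
                and domain not in exceptions):
            return False
        # The domain is a parent of a public suffix: a longer rule ends with it.
        if len(rule_labels) > len(labels) and rule_labels[-len(labels):] == labels:
            return False
    return True
```

With these assumed rules, "kobe.jp", "amazonaws.com" and "platform.sh" are all rejected as parents of a public suffix, while "city.kobe.jp" and "example.com" remain usable as cookie domains.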

With that fix, cookie RFC will not be affected by our discussion here of which is or is not a public suffix. And with that fix, our discussion here does not need to worry about implications to cookie handling either; in particular, expanding "*.foo" to "foo" is not necessary as far as cookie domain is concerned.
(In reply to zhong.j.yu from comment #61)
> With that fix, cookie RFC will not be affected by our discussion here of
> which is or is not a public suffix. And with that fix, our discussion here
> does not need to worry about implications to cookie handling either; in
> particular, expanding "*.foo" to "foo" is not necessary as far as cookie
> domain is concerned.

That's great (although I disagree with several bits of the past three messages, but this isn't the bug for that), but this issue isn't just about cookies.

It sounds like either way, adding "foo" for cases of "*.foo" will cause what you view "the right thing" as to happen, so you are no longer opposed? Is that a fair statement?
(In reply to Gervase Markham [:gerv] from comment #56)
> Ok, here it is:
> https://wiki.mozilla.org/Public_Suffix_List/platform.sh_Problem
> 
> There's one question for Ryan in it - my apologies if it's answered already
> somewhere in this thread.
> 
> Particularly looking at Ryan, but anyone: if the document is incorrect on
> the facts, please fix it - we are all looking for clarity! And hopefully, as
> we set down a clear narrative, either consensus will emerge, or it will
> become clear exactly where the disagreement lies.

Gerv: Before derailing this bug or the wiki page, what's the preferred way to collaborate? There are definitely some factual issues here, but I don't want to start an edit war, and I don't see a good way to have a collaborative discussion through the MozWiki system, only editing.
(In reply to Ryan Sleevi from comment #62)
> It sounds like either way, adding "foo" for cases of "*.foo" will cause what
> you view "the right thing" as to happen, so you are no longer opposed? Is
> that a fair statement?

Sorry, I'm still opposed to it:) though not because of cookies.

Erik's and my protest is that, "il" is not a public suffix, per our understanding of the definition; therefore "il" should not be included in PSL. You argue that "il" is a public suffix. 

If the definition should be changed, please do, but please give us a clear definition.
Ryan: feel free to edit; if I disagree with the edits, we'll take it to discussion rather than edit warring. I'm sure we can manage :-)

Gerv
(In reply to zhong.j.yu from comment #64)
> Erik's and my protest is that, "il" is not a public suffix, per our
> understanding of the definition; therefore "il" should not be included in
> PSL. You argue that "il" is a public suffix. 
> 
> If the definition should be changed, please do, but please give us a clear
> definition.

I've forwarded Gerv a thread that I started in 2013 in which I asked him (and the other PSL maintainers) regarding this. Gerv, Jothan, and Simone at that time all responded similarly in support of including *the full IANA root zone database* in the PSL.

Indeed, if you look at the history of the PSL itself, you will find it actually expressed this in its very first incarnation - https://wiki.mozilla.org/TLD_List - the thing that birthed the PSL itself.

I asked, very explicitly, in 2013 of the PSL maintainers whether we (Chrome, but also Google) should be treating the PSL as containing the IANA Root Zone database or as a supplemental overlay for purposes of cookies. The resounding response, from all maintainers, was that the PSL is "the RZD + domain policies". 

What was unquestionable during that discussion was that each of the core PSL maintainers - those committing code - agreed that the PSL exists to support browser behaviour, and is not strictly limited to cookies, nor had it been for several years.

In fact, in that thread (and I hope Gerv doesn't mind me quoting him out of context), he specifically stated

> As discussed in the CAB Forum today [note 19/12/2013], I'd say "be a superset of" is the
> right characterization.
>
> [snip]
>
> So if something is in the RZD [note: root zone database], it should be in the PSL - if not,
> that's a bug.

This, of course, also reflects the PSL maintainers' involvement in the IETF DBound group ( https://www.ietf.org/mailman/listinfo/dbound ), in which the discussion is about expressing domain boundaries.

What I'm trying to communicate to you and Erik is that the PSL has, for years, not been solely about cookies. That's a position that I'm somewhat surprised to see Gerv arguing against, since he was the clearest proponent of fully expressing all operational TLDs in the PSL in the thread in which I asked him precisely whether or not the PSL should forbid such usages (usages which he now seemingly argues are non-conventional/broken, but which at the time were seen as very much in line with the spirit and intent of the PSL).

Now, we can haggle around the definition of what's a public suffix, but I think that's thoroughly unproductive and reflects an idealistic world that ignores how it is actively being used, has historically been used, and had been previously and explicitly 'blessed' by the PSL maintainers as being within purpose. But I don't believe for a second that definition should block landing the fix, because that question of definition exists wholly independent of whether or not "il" is a domain.

I hope you understand that your opposition only encourages forking, because at the end of the day, it's more important to do the right thing for users and implementations than it is to have an idealized definition. The W3C/HTML have a phrase for this, "priority of constituencies" - http://www.w3.org/TR/html-design-principles/#priority-of-constituencies - in which theoretical purity is the least important.

Since we agree it won't intrinsically cause problems (other than those that already misimplemented the PSL, such as both Firefox and Chrome), since we agree it leads to the right behaviour (even if implementations don't check/preprocess, as a number of implementations don't), and since it does solve real problems (such as allowing the resolution of *.platform.sh), I'd much rather "Fix it" and then argue about what the PSL is or should be in a separate bug, where we can argue about whether we should rip out all of the new gTLDs from the PSL (according to the definition of the PSL, ostensibly we should, since there's a variety of gTLDs that are not public suffixes according to the definition being advocated; .google is but one of many such examples).
(In reply to zhong.j.yu from comment #64)
> Erik's and my protest is that, "il" is not a public suffix, per our
> understanding of the definition; therefore "il" should not be included in
> PSL. You argue that "il" is a public suffix. 
> 
> If the definition should be changed, please do, but please give us a clear
> definition.

Perhaps a more useful definition would be the one that Mozilla's very own code reflects (and which I updated the Wiki page Gerv mentioned to reflect), which is "effective TLD".

An effective TLD is one that is an actual, literal TLD (e.g. IANA delegated), or, within the space of those TLDs, behaves "similar to" a TLD (that is, offers registrations for third parties which are administratively disjoint).

This definition suffers on a few fronts:
- With the ICANN gTLD land rush, "brand TLDs" are a thing, in which foo.domain and bar.domain may be under the same operational aegis. This, of course, is broken for a litany of reasons wholly independent of any public suffix causes, but itself is not compatible with the public suffix algorithm of "*" assuming that foo.domain and bar.domain are disjoint.
- This doesn't provide a good definition for "things in the middle" (kobe.jp, platform.sh), since TLD as a concept implies top level, and it's weird to have middle-level carveouts.

This reflects its usage in things like FIDO Alliance's U2F facets, usage in browsers (such as Firefox and Chrome), and in popular libraries and languages such as Guava and Go.
(In reply to Gervase Markham [:gerv] from comment #56)
> Ok, here it is:
> https://wiki.mozilla.org/Public_Suffix_List/platform.sh_Problem

Hi Gerv, thank you for writing this up. I think it covers the issues very well and offers a nice set of solutions too. I agree that solution 1 is the best.

The solutions seem to refer to *.platform.sh as a rule that has not been added yet, but it has. Also, I think you meant "wont" instead of "want".

I have another suggestion for the first sentence at publicsuffix.org:

A "public suffix" is a domain under which Internet users can register names.

This proposal changes "one" to "domain", removes "historically could" and removes "directly". It may remove confusion for some readers, though the cookie examples could be improved further by adding the *.platform.sh case, showing the alternating nature of cookie validity as you traverse that part of the tree. But perhaps that example should only be added after the cookie spec is modified, or with a note that not all browsers implement this the same way.
Thanks, Ryan, that's very helpful. 

I don't have a problem with re-categorizing "kobe.jp" and "il" as public suffixes. They are "public", after all. I'm no longer opposed to including them in PSL either, even though I don't like it.

"*.platform.sh" is the only wildcard rule belonging to a private company. That is a very bad precedent. They should have provided a list of domains instead of a wildcard.

And for gTLDs like ".google" and ".youtube", it's very hard for a casual observer to understand in just what sense they are considered "public". Right now cookies cannot be set on these domains. But I don't believe Google would hesitate to change Chrome's cookie policy when they find it convenient. 

The situation at publicsuffix.org seems quite fluid. I have no more inputs. Hope you guys can sort it out soon:)

I will not file an errata to rfc6265 at this time either.
(In reply to zhong.j.yu from comment #69)

Zhong, thank you for looking into the cookie issues. I think your contribution to this discussion has been valuable. I agree that things are still a bit fluid. I hope that the cookie spec will be tightened up in the future, perhaps with your help.
I need to read the recent discussions here and study Ryan's updates to the wiki page, but I'm not going to get that done today and I'm on holiday next week. So let's resume in ten days' time :-)

Gerv
I see that *.il has been updated, but there are a few others, such as *.bd and *.kobe.jp. Are they going to be updated in a similar fashion when the information is received from the registries?
Whiteboard: [necko-would-take]
I just found this bug.  I read the *.platform.sh bug, the wiki page, and the Chrome bug on this, but not all the comments on this bug (apologies).

I'm concerned about the idea that if:

*.x

...is a rule in the registry, it implies:

x

...should always be a rule as well.  The relevant Wiki section on this talks about "if kobe.jp can set cookies foo.bar.kobe.jp can read that's bad", which I agree with, but I think if that can occur, that's a clear bug in the cookie store; the cookie store should only return cookies up the chain to the eTLD, and not just for any domain which is a suffix of the current host.
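
A rough sketch of that lookup rule in Python, assuming the host's eTLD has already been computed elsewhere. `cookie_lookup_domains` is a hypothetical helper for illustration, not an API of any real cookie store, and it only covers domain cookies (host-only cookies for the exact host would be handled separately).

```python
def cookie_lookup_domains(host, etld):
    """List the cookie domains to consult for `host`.

    Walks from the full host up the chain of parents and stops before the
    effective TLD, so nothing at or above the eTLD is ever consulted.
    `etld` is assumed to be the public suffix of `host`, computed elsewhere.
    """
    labels = host.lower().split(".")
    etld_len = len(etld.split("."))
    # Indices 0 .. len(labels) - etld_len - 1 are the names strictly below the eTLD.
    return [".".join(labels[i:]) for i in range(len(labels) - etld_len)]
```

With rules "jp" and "*.kobe.jp", the eTLD of "foo.bar.kobe.jp" is "bar.kobe.jp", so the only domain consulted is "foo.bar.kobe.jp" and a cookie stored under "kobe.jp" would never be returned to it.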

So the relevant bit of the Wiki rationale here is the idea of navigability, that one should not be allowed to navigate to a TLD.  Here the important bit to me is whether a particular input _is_ theoretically navigable, i.e. what will happen when you make DNS requests.  My knowledge of DNS is extremely poor, but my impression is that since "com" is a TLD, if you try to put a host on your network named "com", navigation to that host may not work, since DNS lookup may return the nameserver for the "com" TLD.  In the case of the Chrome omnibox, we want to make can't-possibly-succeed navigations searches, so we check for inputs that look like TLDs.

But in the "kobe.jp" case, it seems like there's no particular reason the registrar couldn't choose to allow navigation to kobe.jp.  This is, in fact, the precise case we really have with platform.sh, and from my perspective, there's nothing wrong with "sh" and "*.platform.sh" as rules and having platform.sh be navigable, and there's no particular motivating reason why we need to add "kobe.jp" to the PSL.

And from a semantic perspective, having both "*.x" and "x" as rules seems wrong, which was probably the reason for the architecture bug 1163015 complains about; if every subdomain of x is an eTLD, then x is not the effective TLD of any of those subdomains, and thus is not an eTLD at all.

So, IMO, bug 1163015 does not need to be fixed, the PSL does not need to be changed, https://bugs.chromium.org/p/chromium/issues/detail?id=459802 should be fixed to remove Chrome's assumption here, and we should ensure that doesn't open some sort of cookie-related security hole in Chrome.  Basically, this is "solution 1" on the Wiki, but without the last bullet point, or the associated Firefox bug that would need fixing before it could be implemented.
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P5
Closing in favor of the ML discussion.
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX
Product: Core → Core Graveyard