Open Bug 1184059 Opened 9 years ago Updated 1 year ago

compatibility issues with domain labels beginning or ending with hyphens

Categories

(Core :: Security: PSM, defect, P3)


Webcompat Priority P3

People

(Reporter: annevk, Unassigned)

References

Details

(Whiteboard: [psm-backlog])

Attachments

(1 file)

I cannot access https://smaug----.github.io/ in Firefox which smells a lot like bug 1136616.
It's because mozilla::pkix disallows labels ending with a hyphen: https://dxr.mozilla.org/mozilla-central/rev/4feb4dd910a5a2d3061dbdd376a80975206819c6/security/pkix/lib/pkixnames.cpp#1996
Is there a compelling reason to change this?
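For reference, the rule in question is the classic LDH ("letter-digit-hyphen") label syntax from RFC 1034/1123: a label may contain hyphens, but must not begin or end with one. A minimal sketch of that rule (illustrative Python, not the actual mozilla::pkix C++):

```python
# Hypothetical sketch of the LDH-label rule enforced by pkixnames.cpp;
# this is NOT the actual mozilla::pkix code.
import re

LDH_LABEL = re.compile(r"^[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?$")

def is_valid_ldh_label(label: str) -> bool:
    """True if `label` is letters/digits/hyphens with no edge hyphen."""
    return bool(LDH_LABEL.match(label))

# "smaug----" ends with a hyphen, so the strict rule rejects it even
# though DNS resolvers and other browsers accept it.
print(is_valid_ldh_label("smaug----"))  # False
print(is_valid_ldh_label("github"))     # True
print(is_valid_ldh_label("-foo"))       # False
```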
Flags: needinfo?(annevk)
Well, I can access this resource in other browsers. And I can access this resource in Firefox if I don't use HTTPS. So there is some compatibility concern here.

Also, I think it's problematic that we have two divergent stacks with regards to hosts/domains. pkix made a decision here with regards to domains that hasn't made it through the rest of the Firefox codebase. It would be much better if we had a shared standardized primitive.
Flags: needinfo?(annevk)
(In reply to Anne (:annevk) from comment #2)
> It would be
> much better if we had a shared standardized primitive.

Yes, that would be a good thing to have.
Whiteboard: [psm-backlog]
Note that they're also disallowed in the start of a label:

https://dxr.mozilla.org/mozilla-central/rev/4feb4dd910a5a2d3061dbdd376a80975206819c6/security/pkix/lib/pkixnames.cpp#1935
Summary: (presumably) mozilla::pkix disallows several hyphens in domain label → compatibility issues with domain labels beginning or ending with hyphens
Would it be detrimental to allow it in the meantime while the standard is still unclear? I can't think of any downsides to it being allowed while things are still undecided.
FWIW, I think this basically needs to allow all the domains https://url.spec.whatwg.org/#concept-host-parser can return. Unfortunately there isn't really a declarative production to compare to.
RFC3696's language is "If the hyphen is used, it is not permitted to appear at either the beginning or end of a label."

We contravene RFCs when suitable, I just want to be sure we should. Let me cc in Adam Roach, since he's an apps AD at IETF. Maybe he'll have an opinion? :)
Flags: needinfo?(adam)
Note that we already have at least one exception as we allow underscores. Restricting domain labels beyond what other browsers restrict them to is just bad for web compatibility.
Flags: needinfo?(miket)
Another way to frame the question: The client's resolver accepted the name. So did the server's domain, which responded with an address. Firefox accepted the name and made a request including it. The server accepted that request and presented a certificate for the requested name. _At_this_point_, what purpose is served by rejecting the name?
Attached image edge-bing-search
(In reply to Anne (:annevk) from comment #11)
> Note that we already have at least one exception as we allow underscores.
> Restricting domain labels beyond what other browsers restrict them to is
> just bad for web compatibility.

Agree. Especially with an opaque "Your connection is not secure" error message.

If we feel like it's worth being different than Chrome and Safari, we should be way more clear to the user and developers. Otherwise it just feels like Firefox is busted.

(Edge does something...interesting: they treat the domain as a search query)
Flags: needinfo?(miket)
> We contravene RFCs when suitable, I just want to be sure we
> should. Let me cc in Adam Roach, since he's an apps AD at IETF.
> Maybe he'll have an opinion? :)

I poked around to see if any of the IETF ops folks wanted to weigh in, but didn't get anything back from them.

From a *standards* perspective, what I'd really like to see us do is (a) remain standards compliant (that is, don't accept invalid hostnames); (b) fix up how we present this error to users; (c) be consistent between http and https (that is, stop accepting invalid hostnames for http); and (d) coordinate with Chrome, Safari, and Edge to do the same.

I understand that there may be good product reasons to take a different path, but I'll let the people who are responsible for product decisions represent that position. I do, however, want to make sure they have the information necessary to weigh short-term product behavior against long-term protocol ecosystem damage. While I don't agree with everything he says in here, Martin Thomson (ni'd so he can weigh in) makes some good points about why the foregoing suggestion is better in the long run than accepting malformed hostnames: https://datatracker.ietf.org/doc/draft-thomson-postel-was-wrong/
Flags: needinfo?(adam) → needinfo?(martin.thomson)
BTW, it's now https://tools.ietf.org/html/draft-iab-protocol-maintenance and generating a ton of discussion.

In short, there's always pressure to just fix the code.  But we do no one a service in the long term if that's what we do.  Yeah, Anne might be able to access Olli's page, but we've now just made the sekret kabal of browsers that much more exclusive.

Interesting that curl and wget permit this as well.  Adding Daniel.
Flags: needinfo?(martin.thomson)
Oops, forgot to add: https://github.com/curl/curl/issues/1441  On which we need a similar sort of conclusion.
I think some nuance got lost here. We can parse these hosts already. However, we don't accept them over HTTPS because we have a separate host validation routine there that doesn't match what we do elsewhere. We already plan to align our host parser with https://url.spec.whatwg.org/#concept-host-parser, which is a standard and can be implemented outside of browsers quite easily (and also deals with the IDNA question, incidentally), and has been adopted outside of browsers, e.g., by Node.js.

The question here is mainly whether we should maintain a separate parser for HTTPS or whether we should just align it.
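The divergence is easy to demonstrate with any generic URL parser (Python's urllib here, standing in for Firefox's host parser): the host with a trailing-hyphen label parses fine, and the rejection only happens later in the separate HTTPS/certificate path.

```python
# A generic URL parser happily extracts a host whose leftmost label
# ends with a hyphen; nothing at this layer objects.
from urllib.parse import urlsplit

host = urlsplit("https://smaug----.github.io/").hostname
print(host)  # smaug----.github.io
```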
Arguably there is a bug in TR46 that allows this to be considered valid.  That bug is with the CheckHyphens flag, which overloads two ideas:

1. that the Unicode domain name is not what RFC 5890 calls a tagged domain name (those with the "--" in the right place), and
2. that the Unicode domain name does not start or end with hyphens

I can see the point in passing strings that are already encoded as tagged domain names through, and hence being able to disable checking of the third and fourth characters for hyphens. However, there is no case where an LDH label with hyphens at the start or end is valid ... in the specs.

Now, we might collectively agree that this is silly and that these restrictions can be lifted.  Clearly, much of the infrastructure seems to pass them.  But if we don't make the specs and reality match somehow we're back in the soup.

Anne, the question you are asking isn't especially interesting, nor does it do much to advance things other than to sweep the problem under the rug.  I'm more concerned that we don't actually seem to have a standard here and that various people are pulling in different directions.  Let's not lose this opportunity to fix the real problem.
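The two ideas that the CheckHyphens flag overloads can be separated as in this illustrative sketch (hypothetical helper names, not the UTS 46 API):

```python
def has_hyphen34(label: str) -> bool:
    # Idea 1: hyphens in positions 3 and 4, which is what marks a
    # tagged label per RFC 5890 (e.g. the "xn--" A-label prefix).
    return label[2:4] == "--"

def has_edge_hyphen(label: str) -> bool:
    # Idea 2: a leading or trailing hyphen, never valid for an
    # LDH label in the specs.
    return label.startswith("-") or label.endswith("-")

# An already-encoded A-label legitimately has "--" in positions 3-4:
print(has_hyphen34("xn--nxasmq6b"))     # True
print(has_edge_hyphen("xn--nxasmq6b"))  # False
# ...whereas an edge hyphen is a different problem entirely:
print(has_edge_hyphen("smaug----"))     # True
```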
For the record, it appears as though GitHub no longer permits the creation of accounts that would produce an invalid domain name.  https://screenshots.firefox.com/Of6NVCgMZ8PAe0kZ/github.com
I'm confused, I pointed to a standard that we mostly follow (there's bugs, e.g., bug 1365893, surprise), except for HTTPS. It seems therefore that my question is quite relevant.
The problem here is that we claim to follow a bunch of relevant standards and some of those disagree.  We should address the disagreement, not just pick the one that is most convenient.
I chatted a bit with Martin since it seemed we're talking past each other. Outcome:

a) This is necessary for the web
b) This is what everyone does anyway
c) We have a standard for host parsing that's web compatible: https://url.spec.whatwg.org/#concept-host-parser
d) It's unclear how to get this standard adopted by the relevant IETF standards and get everyone aligned
[Ftr, my Android 6.0.1 phone can't resolve tom--.github.io.]
My bad, c) in comment 22 is wrong somewhat. Bug 1136616 comment 14 goes into this to some extent and https://github.com/whatwg/url/issues/159#issuecomment-271674428 has rationale why URLs (and their host parser) and DNS diverge. So imposing additional restrictions on URL hosts when doing DNS is warranted and it's an open question to what additional restrictions are needed (or need to be removed, from the other direction).

Just encountered this bug myself in the wild (student web page, they use chrome), here is a test page that should be good to test against.

https://subdomain-ending-with-a-dash-.glitch.me/

On chrome it shows no warning. IMO, this seems like one of the places where the browser should just accept the malformed URL since all other parts of the system accept it. Or alternatively change the message so it contains a line about it failing domain name syntax verification since the certificate is still valid, even if it may be unintentional.

See Also: → 1139039

Perhaps an approach here is that we don't validate wildcarded labels? So we remain strict for registrable domains, but any subdomains of those that use a wildcarded certificate are reachable, as long as the networking stack allows.

So you cannot use a certificate for test--.example.com, but if you have a certificate for *.example.com, test--.example.com uses that, and the browser generally allows for navigating to test--.example.com, mozilla::pkix won't complain.

Ryan, I realized we discussed this many years ago, but it keeps coming back. Any thoughts on what browsers can align on here?
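One way to read this proposal, as a sketch (hypothetical helpers, not mozilla::pkix code): skip per-label strictness when the host is covered by a wildcard, but keep it when matching a literal certificate name.

```python
def label_is_strict_ldh(label: str) -> bool:
    # Assumed strict rule: non-empty, no leading/trailing hyphen.
    return bool(label) and not label.startswith("-") and not label.endswith("-")

def cert_matches(host: str, cert_name: str) -> bool:
    # Sketch of the proposal: a wildcard cert for *.example.com covers
    # test--.example.com without validating the leftmost label, while
    # a literal cert_name still requires strict labels throughout.
    if cert_name.startswith("*."):
        base = cert_name[2:]
        first, sep, rest = host.partition(".")
        return sep == "." and rest == base
    return host == cert_name and all(
        label_is_strict_ldh(l) for l in host.split("."))

print(cert_matches("test--.example.com", "*.example.com"))       # True
print(cert_matches("test--.example.com", "test--.example.com"))  # False
```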

Flags: needinfo?(ryan.sleevi)

Anne: That only punts the problem to a different layer, the QUIC/TLS layer, for which the specs then point to the DNS RFCs and things get weird again.

For example, TLS 1.3 has Server Name Indication support as mandatory. Punting the problem from the DNS layer ("We'll accept anything") and from the certificate layer ("We'll allow any label to match a wildcard") still has to deal with the TLS layer, on both the client and the server, in validating the protocol invariants. That is, the following language:

   "HostName" contains the fully qualified DNS hostname of the server,
   as understood by the client.  The hostname is represented as a byte
   string using ASCII encoding without a trailing dot.  This allows the
   support of internationalized domain names through the use of A-labels
   defined in [RFC5890].  DNS hostnames are case-insensitive.  The
   algorithm to compare hostnames is described in [RFC5890], Section
   2.3.2.4.

For example, Chrome's QUIC implementation does Yet Another Thing different from its TLS, certificate, and HTTP implementations, using this method to make sure that the SNI is valid. Why? Because it's Yet A Different Team, and that team doesn't have access to Chrome's URL parser in non-Chrome use cases, nor does it want to have to carry Chrome's (not-quite-)WHATWG URL parser. In that scenario, it would allow test--.example.com but wouldn't allow -test.example.com, even though the browser would allow navigation. It only allows underscores because of an unaddressed TODO while implementing.

I'm not sure I have much useful to add beyond what :abr and :mt mentioned, in Comment #14 / Comment #15, and which I understood the statement in Comment #24 to be about.

Regrettably, it seems like this is another place where Chrome had a bug / didn't correctly implement the spec, and things went downhill. The relevant code checks to make sure it starts with a letter-digit, but doesn't make sure it ends on a letter-digit. I filed a bug for that.

URL parsing continues to be a slightly-mitigated-but-overall-security-disaster. Bugs like https://crbug.com/449829 (Fixed in https://crbug.com/456391) or https://crbug.com/695474 show where the misalignment between DNS and URLs gets... messy. Similarly, reconstituting hostnames for URLs, as done by HTTP/2, also leads to weirdness (Fix). While it seems like more folks are starting to align on a WHATWG spec (such as in response to security issues), it seems they're doing it by forking Chrome's parser, bugs and all, which is discouraging.

I know Comment #24 captured some of my past concerns about the WHATWG URL spec being prescriptive about hostnames, but I do wonder if that's the only path to get out of this mess. While I wouldn't be comfortable with the lax parsing as is (which completely disregards RFC 1123/1034), perhaps the answer is to fold that text in, highlight some of the exceptions (e.g. underscores), figure out the interop issues (e.g. trailing hyphens), and effectively hard fork 1034/1123. If we're not willing to do that, it seems like we need to figure out a plan to break web compat to get alignment back on 1123/1034, and while that may not be the best thing to do during a global pandemic, it might be the best thing to do for the long term ecosystem.

Flags: needinfo?(ryan.sleevi)

(In reply to Anne (:annevk) from comment #28)

> Perhaps an approach here is that we don't validate wildcarded labels? So we remain strict for registrable domains, but any subdomains of those that use a wildcarded certificate are reachable, as long as the networking stack allows.

Please, no. It shouldn't be possible to use a wildcard certificate for any domain for which one couldn't get a normal single-domain certificate.

> If we're not willing to do that, it seems like we need to figure out a plan to break web compat to get alignment back on 1123/1034, and while that may not be the best thing to do during a global pandemic, it might be the best thing to do for the long term ecosystem.

Let's see some statistics on how often users make requests to domains that have labels with leading and/or trailing hyphens. I bet the numbers would be well within the tolerance for deprecation & removal and maybe even lower than those for underscores in HTTPS domain names, which were already deprecated (and removed?).

> I bet the numbers would be well within the tolerance for deprecation & removal and maybe even lower than those for underscores in HTTPS domain names, which were already deprecated (and removed?).

For clarification, does this imply that you would formally block domains with labels with leading/trailing underscores/hyphens on both HTTP and HTTPS? Or that you would simply formalize this error for HTTPS only? Or that you would remove the validation rule and allow domains with labels with leading/trailing underscores/hyphens?

To add my own 2c: blocking domains with labels with leading or trailing underscores/hyphens doesn't just change the behavior of domains, it has a second-order effect of pushing application developers to change the behavior of usernames/slugs in their software and data models. It's extremely common for services to allow subdomains with usernames; adding constraints more strict than [-a-z0-9]+ would have a ripple effect requiring frameworks and app developers to update their validation rules to cope with this rule (and deal with ongoing customer support issues from existing users with usernames/slugs that are invalid as a subdomain).

Django, for instance, validates slugs (SlugField, validate_slug, etc.) with a simple [-\w]+. Standardizing on the current Firefox behavior would require Django to either deprecate slugs as they exist today for use with subdomains, or provide another field type for this purpose:

https://github.com/django/django/blob/7c6b66383da5f9a67142334cd2ed2d769739e8f1/django/core/validators.py#L232
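The mismatch is easy to demonstrate: a string that passes a `[-\w]+` slug check can still be invalid as an LDH label (plain `re` sketch, not Django's actual validator):

```python
import re

# Roughly Django's validate_slug pattern vs. a strict LDH-label rule.
slug_ok = re.compile(r"^[-\w]+$")
ldh_ok = re.compile(r"^[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?$")

username = "qix-"
print(bool(slug_ok.match(username)))  # True: fine as a slug/username
print(bool(ldh_ok.match(username)))   # False: invalid as a DNS label
```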

GitHub, as mentioned, has started disallowing usernames with trailing hyphens, but many already exist, and all of the GitHub Pages for these users are broken in Firefox. E.g.: https://qix-.github.io

This is, unfortunately, a bug that's extremely easy for developers to overlook until after it's become a problem. My own service suffers from this issue, and only discovered this ticket as a result of a customer support ticket.

To be clear: The current Django behaviour is inconsistent with longstanding specs regarding DNS labels, so that's ostensibly a bug. RFC 1034 was written before I could even read, for example ;)

Now, I agree, the situation is messy because software inconsistently follows this, because different layers are more or less liberal (unfortunately), but from a spec perspective, it's not guaranteed these will work. I understand that's not perfect, but updating the hostname allocation to actually adhere to the preferred name syntax (and as modified by 1123) is a good thing. Unfortunately, bad advice often causes issues like this to be discovered later than ideal for servers.

In short, moving now is a good thing to do anyways, regardless of decisions made here.

Severity: normal → S3

The severity field for this bug is relatively low, S3. However, the bug has 8 duplicates.
:keeler, could you consider increasing the bug severity?

For more information, please visit auto_nag documentation.

Flags: needinfo?(dkeeler)

The last needinfo from me was triggered in error by recent activity on the bug. I'm clearing the needinfo since this is a very old bug and I don't know if it's still relevant.

Flags: needinfo?(dkeeler)
Webcompat Priority: --- → ?
Webcompat Priority: ? → P3