Closed Bug 1966515 Opened 5 months ago Closed 4 months ago

Let's Encrypt: Issuance for Invalid Internationalized Domain Name

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: aaron, Assigned: aaron)

Details

(Whiteboard: [close on 2025-06-03] [ca-compliance] [uncategorized])

Preliminary Report

We are filing this preliminary report to begin a discussion within the WebPKI community about whether or not the behavior described below is in violation of the Mozilla Root Store Policy or the Baseline Requirements. We believe that the behavior is fully compliant with the requirements as they stand, but want to share our research to confirm whether the rest of the community shares our understanding.

If the root programs agree with our assessment, we ask that this bug be closed as INVALID.

Summary

Description

The certificate with serial 03:a8:3a:83:15:2b:87:e9:86:ee:11:d1:e3:70:7e:dd:bd:0b contains the Subject Alternative Name xn--2ug.walesbonner.net. The first label of this DNS name contains the punycode encoding of the single Unicode codepoint U+200E, “LEFT-TO-RIGHT MARK”.

The question is whether the issuance of this certificate (or others containing DNS names with similar labels) constitutes a violation of any root program requirements.

Relevant policies

RFC 5280, Section 7.2, says:

To accommodate internationalized domain names in the current structure, conforming implementations MUST convert internationalized domain names to the ASCII Compatible Encoding (ACE) format as specified in Section 4 of RFC 3490 before storage in the dNSName field.

Examining the ToASCII encoding algorithm described by RFC 3490, Section 4.1, we see:

  1. Perform the steps specified in [NAMEPREP] and fail if there is an error.

Here NAMEPREP is a reference to RFC 3491, which is simply a profile of STRINGPREP, also known as RFC 3454. That document dedicates all of Section 5 to a list of “Prohibited Output”, and Section 5.8 specifically lists U+200E “LEFT-TO-RIGHT MARK” as a prohibited character.

Therefore, it seems correct to conclude that xn--2ug.walesbonner.net is not a valid Internationalized Domain Name.
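The Nameprep failure described above can be reproduced with Python's standard library, which exposes the RFC 3454 tables in the `stringprep` module and an IDNA 2003 Nameprep implementation in `encodings.idna`. This is only a minimal sketch for illustration, not the tooling used to prepare this report; the helper name `nameprep_rejects` is hypothetical.

```python
import stringprep
from encodings.idna import nameprep

LRM = "\u200e"  # U+200E LEFT-TO-RIGHT MARK


def nameprep_rejects(label: str) -> bool:
    """Return True if the IDNA 2003 Nameprep step fails for this label."""
    try:
        nameprep(label)
    except UnicodeError:
        return True
    return False


# RFC 3454 Section 5.8 corresponds to table C.8 in Appendix C.
print(stringprep.in_table_c8(LRM))   # is U+200E listed in table C.8?
print(nameprep_rejects(LRM))         # does ToASCII step 1 fail for it?
print(nameprep_rejects("example"))   # a plain ASCII label for comparison
```

Because Nameprep is step 1 of ToASCII, a failure here means the label can never be produced by an IDNA 2003 conversion, which is the sense in which the name is not a valid IDN.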

However, despite the fact that this name contains a code point disallowed by RFC 5280, we believe that it is nonetheless in compliance with the Baseline Requirements. To show this, we first examine three definitions from the Baseline Requirements, Section 1.6.1:

P-Label: A XN-Label that contains valid output of the Punycode algorithm (as defined in RFC 3492, Section 6.3) from the fifth and subsequent positions.

XN-Label: From RFC 5890: "The class of labels that begin with the prefix "xn--" (case independent), but otherwise conform to the rules for LDH labels."

Non-Reserved LDH Label: From RFC 5890: "The set of valid LDH labels that do not have '--' in the third and fourth positions."

By these definitions, the label xn--2ug is a valid P-Label, as it begins with the prefix “xn--”, and the remainder of the label is valid output from the Punycode algorithm with input U+200E. Note that this definition directly references the Punycode spec (RFC 3492) rather than referencing the higher-level IDNA spec (RFC 3490) that we examined above, and that the Punycode algorithm itself does not forbid any particular code points from being encoded.
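The claim that `2ug` is valid Punycode output for U+200E can be checked with Python's built-in `punycode` codec. This is a small illustrative sketch; it treats "valid output of the Punycode algorithm" as "decodes successfully and re-encodes to the same string", which is our reading of the definition rather than wording taken from the Baseline Requirements.

```python
import unicodedata

label = "xn--2ug"
suffix = label[4:]  # the definition looks at the fifth and subsequent positions

# Decode the suffix with the Punycode codec (RFC 3492); no IDNA rules applied.
decoded = suffix.encode("ascii").decode("punycode")
for ch in decoded:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")  # U+200E LEFT-TO-RIGHT MARK

# Round trip: encoding the decoded text reproduces the original suffix.
reencoded = decoded.encode("punycode").decode("ascii")
print(reencoded == suffix)  # True
```

Note that nothing in this round trip consults Nameprep or the IDNA 2008 tables, which is exactly the gap discussed in this report.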

Next we examine the restrictions on a dNSName in a Subscriber Certificate Subject Alternative Name, as set forth in the Baseline Requirements, Section 7.1.2.7.12:

The Fully-Qualified Domain Name or the FQDN portion of the Wildcard Domain Name contained in the entry MUST be composed entirely of P-Labels or Non-Reserved LDH Labels joined together by a U+002E FULL STOP (".") character.

Because the first label is a valid P-Label, and the second and third labels are valid Non-Reserved LDH Labels, the whole DNS name is valid for inclusion in the SANs.
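The label-by-label check above can be sketched in Python as follows. This is a hedged illustration of the Section 7.1.2.7.12 rule as quoted, not Let's Encrypt's actual policy code; all function names are hypothetical, the LDH pattern assumes the usual 1 to 63 character limit, and "valid Punycode output" is approximated by a decode and re-encode round trip.

```python
import re

# LDH label: letters, digits, hyphens; no leading or trailing hyphen; 1-63 chars.
LDH = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_ldh(label: str) -> bool:
    return LDH.fullmatch(label) is not None


def is_non_reserved_ldh(label: str) -> bool:
    # Valid LDH label without "--" in the third and fourth positions.
    return is_ldh(label) and label[2:4] != "--"


def is_p_label(label: str) -> bool:
    # XN-Label whose remainder round-trips through the Punycode codec.
    if not (is_ldh(label) and label[:4].lower() == "xn--"):
        return False
    suffix = label[4:]
    try:
        decoded = suffix.encode("ascii").decode("punycode")
        reencoded = decoded.encode("punycode").decode("ascii")
    except UnicodeError:
        return False
    return bool(decoded) and reencoded.lower() == suffix.lower()


def san_dnsname_allowed(name: str) -> bool:
    # Only the FQDN portion of a Wildcard Domain Name is checked.
    fqdn = name[2:] if name.startswith("*.") else name
    return all(is_p_label(l) or is_non_reserved_ldh(l) for l in fqdn.split("."))


print(san_dnsname_allowed("xn--2ug.walesbonner.net"))
print(san_dnsname_allowed("ab--cd.example.com"))
```

The first call accepts the name from this report, while the second rejects a Reserved LDH Label that is not an XN-Label. No step in this check looks at which code points the P-Label decodes to.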

And here at the bottom of this same section, we see the following call-out:

Note: As an explicit exception from RFC 5280, P-Labels are permitted to not conform to IDNA 2003. These Requirements allow for the inclusion of P-Labels that do not conform with IDNA 2003 to support newer versions of the Unicode character repertoire, among other improvements to the various IDNA standards.

So it seems clear that a DNS name which is not a valid IDN is nonetheless permitted by the Baseline Requirements, and that the issuance of a certificate containing the name xn--2ug.walesbonner.net does not constitute a violation of the requirements.

At the same time, it seems that perhaps the intent of that exception is so that DNS names can, for example, comply with IDNA 2008 (RFC 5890 and related documents) instead. That specification introduced an IANA Registry which marks U+200E as “DISALLOWED”. So perhaps the issuance of a certificate containing the name xn--2ug.walesbonner.net is in violation of the spirit of the requirement, if not its letter.

Source of incident disclosure

Certificate Problem Report from a third-party researcher, received at 04:04 UTC, 2025-05-14.

Flags: needinfo?(incident-reporting)

While these IDNs technically comply with the Baseline Requirements, specifically the requirement that they begin with xn--, their validity as DNS names is questionable:

  1. Although the Punycode strings (P-labels) are syntactically valid, the corresponding Unicode representations (U-labels) may include characters disallowed by IDNA or result in invalid domain names (RFC 5891, Section 5.4).
  2. These domains may appear resolvable due to wildcard DNS configurations (e.g., all subdomains under ugo.florist resolve to the same IP), but it’s unclear whether the specific subdomain labels actually exist.

Some examples of relevant IDNs extracted from certificates in CT logs:

This is a very interesting bug, thanks for proactively disclosing!

Section 7.1.2 of the BRs states:

Except as explicitly noted, all normative requirements imposed by RFC 5280 shall apply, in addition to the normative requirements imposed by this document

Based on the Note you referenced from BRs Section 7.1.2.7.12, does the produced dNSName value support "newer versions of the Unicode character repertoire, among other improvements to the various IDNA standards" to justify the exception?

(In reply to Dimitris Zacharopoulos from comment #2)

Based on the Note you referenced from BRs Section 7.1.2.7.12, does the produced dNSName value support "newer versions of the Unicode character repertoire, among other improvements to the various IDNA standards" to justify the exception?

First, I'll note that the quoted sentence is non-normative text provided as motivation for the exception, not a restriction placed upon the exception. That said:

On the one hand, it does not support "other improvements to the various IDNA standards", which I would interpret to essentially mean "IDNA 2008". As I mentioned, the Unicode code point in the example domain used throughout the report is also marked as DISALLOWED in IDNA 2008.

On the other hand, it does support "newer versions of the Unicode character repertoire". Here I'll quote from Corey Bonnell's message on MDSP:

This flexibility is needed because domain registrars have not universally settled on IDNA 2008 (there’s a lot of domains out there with emoji, which are generally disallowed by IDNA 2008), and user agents generally do not follow the IDNA standards.

The code point in the specific example used in this report (U+200E "LEFT-TO-RIGHT MARK") is of course not an emoji, and is not generally renderable by User Agents. But as he states, many emoji are also disallowed: for example, the IDNA 2008 registry disallows the whole range 1F300-1F6D5 CYCLONE..HINDU TEMPLE, which includes all basic smileys such as U+1F60E "SMILING FACE WITH SUNGLASSES". I believe it is not the role of Certificate Authorities to make the determination of which characters can or should be rendered by User Agents. Quoting again from Corey:

While this may appear to be overly permissive, it is important to remember that the BR profile requires that the subject CN be represented in its LDH-label form and must not use U-labels. LDH label representations are always unambiguous in terms of the DNS protocol, so there is no possibility of confusion or spoofing. Thus, this requirement for using LDH labels throughout the profile provides a clean separation of concerns: the CA need not be concerned about the vagaries of user agent behavior (which can change with no notice or change to a standard) and is instead exclusively concerned with the validation and processing of domain names as represented by LDH labels. The guardrails established by the P-label definition provide assurance that XN labels will, at the very least, contain valid Punycode. The rendering of that Punycode domain label in Unicode (or not) is exclusively a user agent concern.

Aaron - thanks for raising this question for discussion, and for the depth of analysis provided.

Taking our own research into consideration, the contents of SC-048, the in-force BRs, the comments in this bug, and early comments in the MDSP discussion, we support closing this with a status of INVALID.

We find no violations of the TLS BRs or the Chrome Root Program Policy.

If others disagree with that perspective, we welcome additional discussion.

[Note: We're not clearing the need-info to encourage other SC members to share their views.]

Summary:
The certificate is technically compliant with the Baseline Requirements and the Mozilla Root Store Policy, so this bug should be closed as "INVALID". However, Mozilla has significant concerns about the inclusion of Unicode code points that are disallowed by both IDNA 2003 and IDNA 2008. Mozilla believes such practices undermine user safety and trust in the Web PKI and recommends that the CA/Browser Forum clarify the rules to better protect against domain spoofing and misuse.


Statement of Issue:
Whether a certificate containing a DNS name with prohibited Unicode characters complies with the CA/Browser Forum Baseline Requirements. Specifically, the SAN in question contained a DNS label xn--2ug, which is the Punycode representation of U+200E (LEFT-TO-RIGHT MARK), which is "Prohibited" or "Disallowed" in IDNA 2003 and IDNA 2008, respectively.


Applicable Rules:

  • BRs Section 7.1.2: Normative requirements of RFC 5280 apply except where explicitly overridden.
  • BRs Section 7.1.2.7.12: Requires the "Fully-Qualified Domain Name ... contained in the [SAN] entry [to] be composed entirely of P-Labels or Non-Reserved LDH Labels joined together by a U+002E FULL STOP (“.”) character." It is noted, "As an explicit exception from RFC 5280, P-Labels are permitted to not conform to IDNA 2003. These Requirements allow for the inclusion of P-Labels that do not conform with IDNA 2003 to support newer versions of the Unicode character repertoire, among other improvements to the various IDNA standards."
  • RFC 5280 and RFC 3490/3491: Require IDNs to undergo Nameprep, which prohibits characters like U+200E.
  • IDNA 2008 and IANA Tables: Also prohibit U+200E as a valid character for DNS labels.

Analysis:
From a strictly technical standpoint, the certificate is compliant with the TLS Baseline Requirements. The TLS BRs explicitly define a P-Label as valid Punycode output, regardless of the Unicode input, and permit its use in SAN entries even when the input would be invalid under IDNA 2003. However, this permissiveness creates a significant risk to user safety:

  • Characters like U+200E can manipulate text rendering in unpredictable ways, potentially enabling domain spoofing or misleading UI behavior.
  • Even if such domains are syntactically resolvable, they may not be meaningful or displayable in a safe and predictable manner in browsers or other user agents.
  • It appears that the "Note" in section 7.1.2.7.12 of the BRs is meant to accommodate Unicode evolution and IDNA 2008 and not to promote the encoding of characters prohibited under IDNA specifications.

Conclusion:
The situation presented does not currently violate the TLS Baseline Requirements or other policy, but it could present a risk of spoofing and user confusion. To preserve user trust and ensure that domain names in certificates are meaningful, displayable, and safe, this issue should be addressed through updated requirements. Mozilla urges the CA/Browser Forum to prohibit the issuance of certificates containing SAN entries with P-Labels derived from Unicode code points that are disallowed by Nameprep (IDNA 2003) or IANA IDNA tables (IDNA 2008), or that are otherwise likely to create ambiguity, rendering issues, or spoofing risk in user-facing applications. The CA/Browser Forum needs to revisit and clarify the intent and scope of its exception in BR Section 7.1.2.7.12. A future ballot should consider restricting the use of P-Labels to exclude those derived from code points disallowed by modern IDNA standards and Unicode guidance, and to align certificate issuance to promote a safer and more predictable user experience.

Assignee: nobody → aaron
Whiteboard: [ca-compliance] [uncategorized]

With two root programs concurring that this bug should be closed as INVALID, we present the following closure summary (not because we have to, but because it's fun!). We do not intend to provide any further updates on this bug, and ask that it be closed as INVALID at your earliest convenience.

Report Closure Summary

Description

Let's Encrypt has issued at least one certificate containing a dnsName SAN containing a P-Label which encodes a character (in this case, U+200E "LEFT-TO-RIGHT MARK") which is disallowed by IDNA 2003 and IDNA 2008. This is a violation of RFC 5280, which requires that Internationalized Domain Names comply with IDNA 2003. This is not a violation of the Baseline Requirements or Root Program policy, as the BRs contain an explicit exception to that clause of RFC 5280.

Root Cause(s)

Let's Encrypt's issuance software contains a policy package which enforces various requirements and limitations on the names we are willing to issue for. This package strictly implements the definitions found in the Baseline Requirements, and does not go above-and-beyond to implement IDNA 2003 or IDNA 2008 restrictions on names. A prior incident regarding Reserved LDH Labels led us to implement this strict interpretation of the requirements.

Remediation

We engaged our incident response procedures, drafted the above preliminary report, and engaged the community in discussion both about whether the described behavior is a compliance incident (conclusion: it is not) and whether the described behavior should be an incident (conclusion: more discussion to be had).

We are internally discussing whether we should voluntarily restrict our issuance to exclude names which do not comply with IDNA 2003 or IDNA 2008. We have not yet come to a conclusion on this topic, largely due to the existence of domains such as i❤️.ws and 🦗.fm, which are part of the Web and have certificates today but would be forbidden by adherence to IDNA.

Commitment summary

We will participate in future public discussion regarding whether dnsNames in publicly-trusted certificates should be required to adhere to IDNA 2003 and/or IDNA 2008.

Flags: needinfo?(incident-reporting)

(In reply to Aaron Gable from comment #6)

We are internally discussing whether we should voluntarily restrict our issuance to exclude names which do not comply with IDNA 2003 or IDNA 2008. We have not yet come to a conclusion on this topic, largely due to the existence of domains such as i❤️.ws and 🦗.fm, which are part of the Web and have certificates today but would be forbidden by adherence to IDNA.

Commitment summary

We will participate in future public discussion regarding whether dnsNames in publicly-trusted certificates should be required to adhere to IDNA 2003 and/or IDNA 2008.

As an interim, would voluntarily enforcing tables C.1.2, C.2.2*, C.6, C.7 & C.8 from https://datatracker.ietf.org/doc/html/rfc3454 (a subset of the tables from NAMEPREP) be appropriate?

These consist of characters that would generally not be acceptable for display in browsers due to the ability to confuse or impersonate domains - control, display & space characters. It would seem to serve as an appropriate middle ground in the interim, and probably a good starting point for a BR amendment.

(* ZWJ - 200D could potentially be an exception as it is included in a variety of emojis and script)
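To make the proposal concrete, here is a rough sketch of such a check using the RFC 3454 tables that ship in Python's `stringprep` module. It is an assumption-laden illustration rather than a proposed implementation: the names `SUBSET` and `blocked_chars` are hypothetical, the ZWJ exception is modelled as a simple flag, and the input is assumed to be the already-decoded Unicode form of a P-Label.

```python
import stringprep

ZWJ = "\u200d"  # ZERO WIDTH JOINER, listed in table C.2.2

# The subset of Nameprep prohibition tables named in this comment.
SUBSET = {
    "C.1.2": stringprep.in_table_c12,  # non-ASCII space characters
    "C.2.2": stringprep.in_table_c22,  # non-ASCII control characters
    "C.6": stringprep.in_table_c6,     # inappropriate for plain text
    "C.7": stringprep.in_table_c7,     # inappropriate for canonical representation
    "C.8": stringprep.in_table_c8,     # change display properties or are deprecated
}


def blocked_chars(u_label: str, allow_zwj: bool = True) -> list[tuple[str, str]]:
    """List (code point, table) pairs for characters the subset would reject."""
    hits = []
    for ch in u_label:
        if allow_zwj and ch == ZWJ:
            continue  # optional exception for emoji sequences and scripts
        for table, in_table in SUBSET.items():
            if in_table(ch):
                hits.append((f"U+{ord(ch):04X}", table))
    return hits


print(blocked_chars("\u200e"))                  # the label from this bug
print(blocked_chars("i\u2764"))                 # letter plus heavy black heart
print(blocked_chars("\u200d", allow_zwj=False)) # ZWJ without the exception
```

Under these assumptions the LEFT-TO-RIGHT MARK is rejected via table C.8, a heart emoji label passes, and ZWJ is rejected only when the exception is switched off.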

(In reply to Sophie Jones from comment #7)

As an interim, would voluntarily enforcing tables C.1.2, C.2.2*, C.6, C.7 & C.8 from https://datatracker.ietf.org/doc/html/rfc3454 (a subset of the tables from NAMEPREP) be appropriate?

While this could be a reasonable interim measure, it does require making a series of decisions around what kinds of characters are or are not appropriate in domain names. As you note yourself, even those tables are not a clean subset, as the Zero-Width Joiner has many legitimate uses. There is a very reasonable argument that a CA should not refuse to issue for domains that a browser is happy to load and render: doing so renders visitors to those sites incapable of benefiting from encryption, through no fault of their own.

As noted above, we would prefer to have this conversation on mailing lists, so that this bug can be closed. We do not intend to provide any further updates, and ask that it be closed as INVALID.

This is a final call for comments or questions on this Incident Report.

Otherwise, it will be closed as "INVALID" on approximately 2025-06-03.

Status: NEW → ASSIGNED
Whiteboard: [ca-compliance] [uncategorized] → [close on 2025-06-03] [ca-compliance] [uncategorized]
Status: ASSIGNED → RESOLVED
Closed: 4 months ago
Flags: needinfo?(incident-reporting)
Resolution: --- → INVALID