Open Bug 1902670

Google Trust Services: SXG certificates issued without correctly checking CAA restrictions

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

ASSIGNED

People

(Reporter: gts-external, Assigned: gts-external, NeedInfo)

Details

(Whiteboard: [ca-compliance] [uncategorized])

Google Trust Services is investigating an issue with the issuance of SXG certificates. We have identified a small number of certificates that were issued without verifying CAA records in a manner fully conformant with our CP.

A fix has been deployed, and the affected certificates will be revoked within the 24-hour timeline mandated by the Baseline Requirements.

We will post a full incident report by Monday, June 24.

Assignee: nobody → gts-external
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [ca-compliance] [uncategorized]

Incident Report

Summary

Google Trust Services issues both TLS and SXG certificates. SXG certificates have CAA checking requirements in addition to RFC8659; these are outlined in section 3.2.2.8 of our CP, based on section 4.2 of the SXG draft specification. While refactoring to improve the codebase, we noticed a corner case in which one of these additional SXG requirements was not properly checked.

Specifically, issuance of an SXG certificate requires the presence of an “issue” or “issuewild” CAA property that authorizes the CA to issue the certificate. We correctly prohibited SXG issuance when no CAA records were present, but we did not correctly handle the case where other CAA records were present but none of them was an applicable “issue” or “issuewild” property; in that case, our code permitted issuance. The most common scenarios in which this occurred were when the subscriber’s only CAA record contained an “iodef” property, or when the subscriber was issuing for a non-wildcard FQDN but had only an “issuewild” CAA property. TLS CAA checks and other CAA checks, including the “cansignhttpexchanges=yes” requirement on relevant “issue”/“issuewild” properties, were unaffected. All affected SXG certificates satisfy the CAA requirements for TLS server certificates under the TLS BRs.
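To make the two scenarios concrete, here is a minimal Python sketch, assuming a simplified in-memory record representation (the names `CaaRecord`, `relevant_issue_properties` and `sxg_has_relevant_property` are hypothetical, not GTS’ code). It selects the relevant properties the way RFC8659 does (for a wildcard name, “issuewild” properties take precedence if any exist; otherwise “issue” properties are used) and applies the SXG rule that at least one such property must exist.

```python
from typing import List, NamedTuple


class CaaRecord(NamedTuple):
    """Simplified CAA resource record (hypothetical representation)."""
    flags: int
    tag: str
    value: str


def relevant_issue_properties(records: List[CaaRecord], wildcard: bool) -> List[CaaRecord]:
    """Select the properties that govern issuance, following RFC 8659:
    for a wildcard name, issuewild properties win if any exist; otherwise
    (and always for non-wildcard names) the issue properties are used."""
    issuewild = [r for r in records if r.tag == "issuewild"]
    if wildcard and issuewild:
        return issuewild
    return [r for r in records if r.tag == "issue"]


def sxg_has_relevant_property(records: List[CaaRecord], wildcard: bool) -> bool:
    """SXG needs at least one relevant issue/issuewild property;
    a record set that is merely non-empty does not qualify."""
    return bool(relevant_issue_properties(records, wildcard))


# Scenario 1: the only CAA record is an iodef property.
iodef_only = [CaaRecord(0, "iodef", "mailto:security@example.com")]
# Scenario 2: only an issuewild property is published.
issuewild_only = [CaaRecord(0, "issuewild", "pki.goog; cansignhttpexchanges=yes")]

print(sxg_has_relevant_property(iodef_only, wildcard=False))      # False
print(sxg_has_relevant_property(issuewild_only, wildcard=False))  # False
print(sxg_has_relevant_property(issuewild_only, wildcard=True))   # True
```

In both scenarios the record set is non-empty, which is why a check that only asks "is there any CAA record?" lets them through.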

An error occurred during the first attempted rollout of the patch: the patch did not reach production, and 3 additional certificates were misissued due to the bug described above.

All affected active certificates were revoked within 24 hours of their discovery.

Impact

A total of 58 certificates were issued as a result of the SXG CAA bug between July 19th, 2022 and June 14th, 2024. Of these, 9 were active at the time the incident was discovered, and 3 additional certificates were issued during incident handling, for a total of 12 active certificates; all 12 were revoked within 24 hours. Issuance of SXG certificates was halted for 2 hours and 28 minutes while the bug was being remediated. All certificates were short-lived, with 45-day lifetimes. While the bug was present, we issued ~8.5 million SXG certificates in total.
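For scale, the figures above can be combined directly. The ~8.5 million total is approximate, so the resulting share is approximate as well; this is only a back-of-the-envelope calculation.

```python
total_sxg_issued = 8_500_000   # approximate total stated in the report
misissued_total = 58           # all affected certificates, 2022-07-19 to 2024-06-14
active_at_discovery = 9
issued_during_handling = 3

revoked = active_at_discovery + issued_during_handling
share_percent = misissued_total / total_sxg_issued * 100

print(revoked)                  # 12
print(f"{share_percent:.5f}%")  # 0.00068%
```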

Timeline

All times are UTC.

2013-01-01

  • 00:00 RFC 6844 defining CAA is published

2018-05-10

  • 00:00 BR 1.5.8 is adopted, which states CAs MAY treat a non-empty CAA record set without a prohibitive “issue” or “issuewild” property tag as permission to issue

2018-05-15

  • 16:20 CAA library is added to GTS’ Domain Validation Service. Previous issuance used a different implementation

2018-11-30

  • 00:00 Errata attempting to clarify the handling of non-empty CAA record sets without “issue” or “issuewild” property tag is set to “Held for Document Update”

2019-02-06

2019-03-11

  • 17:00 Feature request filed for SXG CAA feature containing summarized requirements

2019-04-10

  • 20:28 SXG CAA policy check containing the bug that caused this incident is added to the codebase

2019-11-01

  • 00:00 RFC 8659 is published clarifying the processing of a CAA RRset that is non-empty but does not contain any “issue” or “issuewild” property tags (see section 7.4)

2020-08-21

  • 00:00 GTS starts its publishing process for CP 1.12 which includes the requirement to follow the SXG CAA algorithm

2021-06-17

  • 01:54 GTS issues its first SXG certificate

2022-07-19

  • 18:00 First certificate affected by the bug is issued

2024-06-11

  • 19:32 Potential CAA bug affecting SXG certificates is discovered in the code
  • 20:08 Notification sent to halt an ongoing rollout and to roll out again once a pending code fix is submitted
  • 20:35 CAA bug affecting SXG certificates is confirmed; a patch with updated unit tests is submitted

2024-06-12

  • 11:25 First rollout attempt to fix the issue is started (this rollout did not contain the fix)
  • 14:20 First developer obtains log access
  • 23:40 Second developer obtains log access

2024-06-13

  • 17:56 Development of tool to identify authorizations that could have resulted in misissuance is completed
  • 21:11 Discovery of 9 affected active certificates
  • 23:46 GTS verifies that its CAA checks pass caatestsuite.com

2024-06-14

  • 02:04 Integration tests to verify code patch are complete
  • 02:36 Notifications sent to subscribers of the 9 affected certificates
  • 03:30 Final certificate affected by the bug is issued
  • 14:12 Incident officially declared
  • 14:27 Preliminary report is published to Bugzilla
  • 16:49 The 9 initially discovered active certificates are revoked
  • 18:05 Three (3) additional newly misissued certificates are discovered
  • 18:27 GTS realizes the newly misissued certificates are due to the first rollout on 2024-06-12 not containing the fix
  • 19:04 GTS begins rollout to halt SXG issuance
  • 19:41 Rollout to halt SXG issuance completed globally
  • 21:10 Second rollout with fix included completed globally
  • 21:24 Development of tool for historical analysis tying affected authorizations to certificates is completed
  • 21:43 Confirmed all CAA checks attached to offending authorizations have expired
  • 21:56 Completed historical impact analysis
  • 22:09 SXG issuance is re-enabled globally
  • 22:43 The 3 remaining certificates are revoked
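The durations quoted in the Impact section can be checked against the timeline above. This sketch takes the relevant UTC timestamps from the entries; measuring each revocation window from the corresponding discovery entry is our reading of "within 24 hours of their discovery".

```python
from datetime import datetime, timedelta


def ts(s: str) -> datetime:
    """Parse a 'YYYY-MM-DD HH:MM' timestamp from the timeline (all UTC)."""
    return datetime.strptime(s, "%Y-%m-%d %H:%M")


# Halt completed globally -> issuance re-enabled globally.
halt_completed = ts("2024-06-14 19:41")
issuance_reenabled = ts("2024-06-14 22:09")

# (discovered, revoked) for the two groups of active certificates.
windows = {
    "9 initial certificates": (ts("2024-06-13 21:11"), ts("2024-06-14 16:49")),
    "3 additional certificates": (ts("2024-06-14 18:05"), ts("2024-06-14 22:43")),
}

print(issuance_reenabled - halt_completed)  # 2:28:00
for name, (discovered, revoked) in windows.items():
    elapsed = revoked - discovered
    print(name, elapsed, elapsed <= timedelta(hours=24))
```

Both revocation windows (about 19.6 hours and 4.6 hours) fall inside 24 hours.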

Root Cause Analysis

Background

GTS’ CAA code has evolved over time. It initially supported only RFC6844 and its errata; GTS then added support for the SXG draft, made minor modifications for RFC8659, and finally added support for RFC8657. This incident was discovered while a new engineer was refactoring the CAA code in preparation for supporting Ballot SC-067, Multi-Perspective Issuance Corroboration (MPIC).

GTS’ CAA code runs as an RPC on our Domain Validation Service (DVS). The API started out simple. The request was represented as { RequestContext ctx, string dns_identifier } and the response as { bool gts_allowed, Timestamp query_time }. The DVS handles all required logging and internally represents the result as either PERMITTED, NOT_PERMITTED, or NO_CAA_RECORD_FOUND.
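A minimal Python sketch of that original RPC shape, assuming dataclass stand-ins for the message types (the real definitions are not shown in the report, and `ctx` is a plain dict placeholder for RequestContext). It also shows one assumed mapping from the internal three-valued result to the boolean `gts_allowed`: for TLS, RFC8659 treats an empty CAA record set as unrestricted, so both PERMITTED and NO_CAA_RECORD_FOUND collapse to true. Any rule that needs to tell those two cases apart has to act before the result is collapsed.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum, auto


class CaaResult(Enum):
    """Internal three-valued result named in the report."""
    PERMITTED = auto()
    NOT_PERMITTED = auto()
    NO_CAA_RECORD_FOUND = auto()


@dataclass
class CaaRequest:
    ctx: dict             # placeholder for RequestContext (assumed shape)
    dns_identifier: str


@dataclass
class CaaResponse:
    gts_allowed: bool
    query_time: datetime


def to_response(result: CaaResult) -> CaaResponse:
    """Collapse the internal result into the boolean API (assumed TLS mapping).

    Under RFC 8659 an empty CAA record set does not restrict issuance, so
    NO_CAA_RECORD_FOUND is reported as allowed, exactly like PERMITTED.
    """
    allowed = result is not CaaResult.NOT_PERMITTED
    return CaaResponse(gts_allowed=allowed, query_time=datetime.now(timezone.utc))


request = CaaRequest(ctx={}, dns_identifier="example.com")
for result in CaaResult:
    print(request.dns_identifier, result.name, to_response(result).gts_allowed)
```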

In 2019, we added support for the SXG draft with an eye towards incorporating RFC8657, the CAA ACME extensions. The CAA RPC API was extended to support a set of open-ended restrictions that would satisfy both use cases and be flexible towards future changes. The CAA request restrictions were structured as follows:

message Restriction {
  oneof type {
    // Require at least one CAA record.
    bool require_caa_record = 1;
    // Key value equality.
    // If key is missing then we default to 'only_if_present' for deciding if
    // the check succeeds.
    EqualityRestriction equal_to = 2;
    ...
  }
}

SXG’s algorithm is represented as two separate restrictions on top of RFC8659, restriction: { require_caa_record: true } and restriction: { equal_to: { key: "cansignhttpexchanges" value: "yes" } }, which are checked independently. The EqualityRestriction ran against every relevant “issue” and “issuewild” property, while the “require_caa_record” policy check was implemented at the very end of the algorithm and essentially transformed “NO_CAA_RECORD_FOUND” results into “NOT_PERMITTED” when the restriction was in place.
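The following Python sketch models the flow just described. It rests on several assumptions: records are already fetched (CAA tree-climbing is omitted), "pki.goog" stands in for the CA’s issuer domain, the set of understood tags is simplified, and the 'only_if_present' behaviour of EqualityRestriction is read as "passes when no relevant property exists" (our reading, chosen because it reproduces the "GTS SXG" columns of the table GTS posted later in this bug). All names are hypothetical; this is not GTS’ code. Note how the last step only rewrites NO_CAA_RECORD_FOUND, so a record set containing just an “iodef” property still comes out as permitted.

```python
from enum import Enum, auto
from typing import NamedTuple, Optional

CA_DOMAIN = "pki.goog"                        # issuer domain assumed for this sketch
KNOWN_TAGS = {"issue", "issuewild", "iodef"}  # tags this sketch understands


class Result(Enum):
    PERMITTED = auto()
    NOT_PERMITTED = auto()
    NO_CAA_RECORD_FOUND = auto()


class CaaRecord(NamedTuple):
    flags: int
    tag: str
    value: str


def parse_issue_value(value: str) -> Optional[tuple]:
    """Split 'issuer; key=value; ...' into (issuer, params); None if malformed."""
    issuer, *rest = [p.strip() for p in value.split(";")]
    params = {}
    for part in rest:
        if not part:
            continue
        if "=" not in part:
            return None  # a parameter without '=value' is malformed
        key, val = part.split("=", 1)
        params[key.strip()] = val.strip()
    return issuer, params


def relevant_properties(records, wildcard):
    issuewild = [r for r in records if r.tag == "issuewild"]
    if wildcard and issuewild:
        return issuewild
    return [r for r in records if r.tag == "issue"]


def rfc8659_check(records, wildcard) -> Result:
    """Simplified RFC 8659 evaluation of an already-discovered CAA RRset."""
    if not records:
        return Result.NO_CAA_RECORD_FOUND
    if any(r.flags & 128 and r.tag not in KNOWN_TAGS for r in records):
        return Result.NOT_PERMITTED  # unknown tag with the critical flag set
    relevant = relevant_properties(records, wildcard)
    if not relevant:
        return Result.PERMITTED  # records exist, but none restrict issuance
    for r in relevant:
        parsed = parse_issue_value(r.value)
        if parsed is not None and parsed[0] == CA_DOMAIN:
            return Result.PERMITTED
    return Result.NOT_PERMITTED


def equal_to(records, wildcard, key, value) -> bool:
    """EqualityRestriction ('only_if_present' as read above): every relevant
    property must carry key=value; passes vacuously when there are none."""
    for r in relevant_properties(records, wildcard):
        parsed = parse_issue_value(r.value)
        if parsed is None or parsed[1].get(key) != value:
            return False
    return True


def sxg_check_buggy(records, wildcard) -> bool:
    result = rfc8659_check(records, wildcard)
    if result is Result.PERMITTED and not equal_to(
            records, wildcard, "cansignhttpexchanges", "yes"):
        result = Result.NOT_PERMITTED
    # require_caa_record, applied at the very end: it only rewrites the
    # NO_CAA_RECORD_FOUND case, so an iodef-only RRset slips through.
    if result is Result.NO_CAA_RECORD_FOUND:
        result = Result.NOT_PERMITTED
    return result is Result.PERMITTED


print(sxg_check_buggy([], wildcard=False))  # False
print(sxg_check_buggy([CaaRecord(0, "iodef", "mailto:security@example.com")],
                      wildcard=False))      # True (the bug)
```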

However, the “require_caa_record” restriction did not require that a relevant “issue”/“issuewild” property be present in the record set. This oversight allowed issuance that violated the draft spec and our CP: any non-empty CAA record set that does not prohibit issuance under RFC8659 would be evaluated as “PERMITTED” and pass the check. We do not believe the primary issue was in the code itself, but rather in the translation between the requirements and the intermediate language of the policy restrictions. The restriction, as described by its name and documentation, was insufficient for the SXG use case when combined with our TLS implementation at the time it was created. The code performed exactly as documented, but the documentation did not match the requirements.
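For contrast, here is a minimal sketch of a check that enforces the requirement directly, using a simplified record model ("pki.goog" as the issuer domain, tree-climbing omitted, hypothetical names throughout; not GTS’ patch). The relevant-property test corresponds to the renamed restriction listed in the action items, require_relevant_issue_or_issuewild_record, and runs before anything else, so an empty record set and a non-empty set without a relevant property are refused for the same reason. Requiring every relevant property to carry cansignhttpexchanges=yes is our simplified reading of the draft.

```python
from typing import NamedTuple, Optional

CA_DOMAIN = "pki.goog"                        # issuer domain assumed for this sketch
KNOWN_TAGS = {"issue", "issuewild", "iodef"}  # tags this sketch understands


class CaaRecord(NamedTuple):
    flags: int
    tag: str
    value: str


def parse_issue_value(value: str) -> Optional[tuple]:
    """Split 'issuer; key=value; ...' into (issuer, params); None if malformed."""
    issuer, *rest = [p.strip() for p in value.split(";")]
    params = {}
    for part in rest:
        if not part:
            continue
        if "=" not in part:
            return None
        key, val = part.split("=", 1)
        params[key.strip()] = val.strip()
    return issuer, params


def relevant_properties(records, wildcard):
    issuewild = [r for r in records if r.tag == "issuewild"]
    if wildcard and issuewild:
        return issuewild
    return [r for r in records if r.tag == "issue"]


def sxg_check_fixed(records, wildcard) -> bool:
    # RFC 8659: an unknown tag with the critical flag forbids issuance.
    if any(r.flags & 128 and r.tag not in KNOWN_TAGS for r in records):
        return False
    # require_relevant_issue_or_issuewild_record: refuse when no relevant
    # property exists, whether or not other CAA records are present.
    relevant = relevant_properties(records, wildcard)
    if not relevant:
        return False
    parsed = [parse_issue_value(r.value) for r in relevant]
    # At least one relevant property must authorize this CA...
    if not any(p is not None and p[0] == CA_DOMAIN for p in parsed):
        return False
    # ...and the relevant properties must carry cansignhttpexchanges=yes.
    return all(p is not None and p[1].get("cansignhttpexchanges") == "yes"
               for p in parsed)


iodef_only = [CaaRecord(0, "iodef", "mailto:security@example.com")]
print(sxg_check_fixed(iodef_only, wildcard=False))  # False
```

Applied to the record sets in the table posted later in this bug, this sketch returns true only for the “issue "pki.goog; cansignhttpexchanges=yes"” rows and the wildcard “issuewild "pki.goog; cansignhttpexchanges=yes"” row, matching the SXG columns.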

This hypothesis is consistent with our tests. Our test coverage was high, but if a requirement is missed, there will be no associated test. GTS currently has 100% unit test code coverage over the directory containing the CAA implementation. We also have end-to-end tests through a separate health monitoring service covering FQDNs both with and without wildcards, as well as negative tests (tests that should fail issuance) covering the case where no CAA records exist. There are also two separate CAA checks covering RFC8657’s ACME account and validation method requirements, as well as SXG specifically. Third, and finally, we manually run a test suite covering Andrew Ayer’s caatestsuite.com when making functional changes to our libraries, but that test suite does not contain SXG checks. At no point did we attempt to test the missing requirement.

Missed SXG CAA Requirement

There are a few factors that contributed to missing the SXG CAA requirement.

  1. Misunderstanding of non-empty CAA record sets without “issue”/“issuewild” properties. The TLS CAA algorithm and the SXG CAA addition were written by two different authors at different times. Both were originally written against RFC6844, as RFC8659 had not yet been published. RFC6844 was known to be ambiguous (see the 2018 CA/B Forum discussion and the RFC6844 errata) with respect to the handling of a non-empty CAA record set that contains no relevant “issue”/“issuewild” properties, which is exactly the case at issue here. It is possible that the implementer and reviewer misinterpreted the implemented functionality for this edge case at the time of the SXG implementation. There was a significant amount of discussion about the topic in 2018, and GTS’ original CAA implementation for TLS certificates did allow issuance in this case, so this would have been an inconsistent interpretation across the team and a bug regardless.
  2. The hard requirements were lost in the initial review. The original bug that assigned the subtask read:
    We need to confirm the presence of CAA when checking for HTTPExchange certificates and we
    need to check for the presence of an additional "cansignhttpexchanges" parameter.
    https://wicg.github.io/webpackage/draft-yasskin-http-origin-signed-responses.html#cross-origin-cert-req
    ...

    This summary of the problem is exactly what was implemented. It is likely that the author and the reviewer relied too heavily on the bug description, against which the code appeared correct.

In this instance, we did not create a well-defined mapping between specification and code as we normally do to ensure compliance.

Requirements Handling for GTS Software

While handling the issue described in Bug 1751984, and in particular considering the addition of the TLS 1.2 requirement to the “tls-alpn-01” challenge (RFC8737) immediately before its publication, we recognized we had a gap in handling new RFCs, similar to this issue. It was too easy for MUSTs to be added to the requirements, for the team to assume that we already complied with the text, and for new ballots to be under-scrutinized. We instituted a new practice within the team where triaging a new RFC requires copying it into a separate doc, then highlighting and commenting on every MUST with a link to the relevant code and an explanation. This doc receives its own review on top of the normal code review to make sure the feature is complete end-to-end from a compliance standpoint.

Although we instituted the practice immediately and reviewed a few other RFCs that we had recently incorporated, we did not go back to the SXG specification and perform the same formal review upon it because it is still a draft. The SXG requirements only appear formally in our CP/CPS. The value or concern was perceived to be lower than other areas.

The CAA code had been touched many times over the past five years, but the error was not detected. As described above, the code was consistent with its naming and documentation, so nothing was easy to spot as wrong. Given the generic nature of the policy restrictions, there were no links within the CAA code itself to the exact specification we were attempting to satisfy. The developer who discovered the bug asked a question over team chat about the specific restriction while looking at the code, and the specification was provided to them in response. They then noticed the inconsistency. The documentation should have been right next to the code to enable less familiar developers to spot the discrepancy between intent and functionality. This is especially important for less obvious or less well-known requirements that originate from a draft specification.

Rollout of fix

After fixing the bug, a rollout to production was performed, but it mistakenly did not contain the patch. The changelist number of our configuration is independent of our code changelist. During the emergency rollout, the configuration changelist number was copied into the field for the code changelist, and that changelist did not include the code patch.

To add more context: once the first rollout completed, we notified subscribers, instructing them to fix their DNS records and request new certificates. One of them requested new certificates without updating their DNS, which resulted in three additional misissued certificates; at this point, we realized the rollout did not include the fix. As part of our expedited release process, we did not perform a step to confirm that the patch was deployed in production. Upon detecting the issue, GTS blocked issuance of SXG certificates while we rolled out the patch to production, this time confirming the results with a test. Before resuming issuance, we verified that all affected authorizations were deactivated or expired.
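The report does not detail the verification step being added to the unplanned-rollout procedure. As a sketch of the kind of check it could contain (all names hypothetical): after a rollout, run record sets that the buggy code allowed and the SXG requirements refuse through the deployed checker, together with a positive control, and fail the rollout if any answer is wrong. Here `check` is any callable taking `(records, wildcard)`; in practice it would wrap a call to the deployed service, which is not shown.

```python
from typing import Callable, Iterable, List, NamedTuple, Tuple

Record = Tuple[int, str, str]  # (flags, tag, value)


class RegressionCase(NamedTuple):
    name: str
    records: Tuple[Record, ...]
    wildcard: bool
    expect_allowed: bool


# Record sets the unpatched code allowed but the SXG requirements refuse,
# plus one positive control that must still be allowed.
SXG_REGRESSION_CASES = [
    RegressionCase("iodef only",
                   ((0, "iodef", "mailto:security@example.com"),), False, False),
    RegressionCase("issuewild only, non-wildcard",
                   ((0, "issuewild", "pki.goog; cansignhttpexchanges=yes"),), False, False),
    RegressionCase("authorized issue",
                   ((0, "issue", "pki.goog; cansignhttpexchanges=yes"),), False, True),
]


def verify_fix_deployed(check: Callable[[Tuple[Record, ...], bool], bool],
                        cases: Iterable[RegressionCase]) -> List[str]:
    """Return the names of cases where the deployed checker gives the wrong answer."""
    return [c.name for c in cases if check(c.records, c.wildcard) != c.expect_allowed]


def old_behaviour(records, wildcard):
    # Crude stand-in for the unpatched check: on these three cases it gives the
    # same answers as the "GTS SXG" column of the table posted later in this bug.
    return len(records) > 0


failures = verify_fix_deployed(old_behaviour, SXG_REGRESSION_CASES)
print(failures)  # ['iodef only', 'issuewild only, non-wildcard']
```

A non-empty result would block the rollout, which is the situation the first, patch-less rollout would have produced.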

Root Cause Summary

  • Poor process used for translating the requirements from the draft to the intermediate language of the policy restrictions. There was no detailed review of the requirements of the kind GTS began instituting after Bug 1751984. If this explicit mapping had been performed, it is likely the disconnect between the requirement and the restriction would have been raised.
  • Translation to generic intermediate policy language did not include a link to the hard requirements, making it difficult for developers to question the code.
  • Emergency rollout process did not include a test verifying the intended fix, resulting in three additional misissued certificates.

Lessons Learned

What went well

  • All affected active certificates were revoked within 24 hours of discovery.
  • The problem was discovered internally.
  • Our audit logs allowed us to fully review past authorizations and narrow the scope of affected certificates.

What didn't go well

  • We needed to write custom code to interface with our raw DNS audit logs, which lengthened the investigation. The code we wrote should allow us to investigate more quickly in the future.
  • Our initial rollout did not include the fix. We did not catch the issue until performing a larger analysis of the entire dataset.

Where we got lucky

  • Only 7 unique active domains were impacted, making notifications and revocation trivial.
  • The bug was limited to an additional CAA requirement for SXG certificates which is in addition to normal TLS CAA requirements.

Action Items

| Action Item | Kind | Due Date |
|---|---|---|
| Add missing unit tests for SXG CAA | Prevent | 2024-06-11 |
| Add missing E2E tests for SXG CAA | Prevent | 2024-06-28 |
| Rename “require_caa_record” to “require_relevant_issue_or_issuewild_record” and rewrite documentation | Prevent | 2024-06-28 |
| Add all CAA requirement documentation to the relevant code | Prevent | 2024-06-28 |
| Update the unplanned rollouts procedure to include a verification step to ensure a fix is working | Mitigate | 2024-07-08 |
| Perform a CAA requirement analysis against all relevant RFCs/drafts | Detect | 2024-07-12 |

Appendix

Details of affected certificates

Below are the precertificates of all impacted certificates.

12 active certificates during the time of incident remediation
https://crt.sh/?sha256=14295a7667091cedc41e49ef45d94ab1a85d10a74919a5ac66ae3a5f902f5752
https://crt.sh/?sha256=24eeb8e4899836a30b4326fb8d5b2542630fa70dcef646f1a7b3a20939916c3b
https://crt.sh/?sha256=7cfaab00f9859ccd9b6811aa9215a300012dec9438c711e49d960a474945201d
https://crt.sh/?sha256=59de765ed3047cc03bc1782e4865ed98ef643adfc549fbfa859daae3ee79c7c9
https://crt.sh/?sha256=6960256cf2524d02214089e01dc9853ecdf53d20d1b20b387e9303dbd9d4edf3
https://crt.sh/?sha256=66736f7bf78093043907f146a9b791c1b65e6608d8a2272d7e6605669e6cb53e
https://crt.sh/?sha256=dbd8c6556b6c4f27074fe60aa2e45afe7db864d3807e70342e393ddf795ceaff
https://crt.sh/?sha256=fe06e00bbfe611edadc2b909713f1aafb9f5d691d2660e2d79375014f7043586
https://crt.sh/?sha256=ebeb2313d5fd6e3801220bfd4b2b1247912ae07f433287658018896cfc82145c
https://crt.sh/?sha256=6fcbcda1c5b8bfb9e81d372e9ae571c04f579ccfe070ab7e64cdb64d882657ab
https://crt.sh/?sha256=59b03c8bf75efd453e305719c476eb97a6e98065c71c536aef1b6ed94f36e387
https://crt.sh/?sha256=3528efe998a86fca9b3a5bdc4091e992cf9689d76ab5a1d9a7cd1ec3dc3e6917

All 58 historical certificates
https://crt.sh/?sha256=14295a7667091cedc41e49ef45d94ab1a85d10a74919a5ac66ae3a5f902f5752
https://crt.sh/?sha256=24eeb8e4899836a30b4326fb8d5b2542630fa70dcef646f1a7b3a20939916c3b
https://crt.sh/?sha256=7cfaab00f9859ccd9b6811aa9215a300012dec9438c711e49d960a474945201d
https://crt.sh/?sha256=9019819e5563b31584ce5f617e8f49bfbcfffca3398993df45862a989e0a48b7
https://crt.sh/?sha256=fd0a83df337295614f7870e4b2de1278b0840221620fc5cc68e5b3adf98f6b1c
https://crt.sh/?sha256=9834b1864d39f65202365155e02be632a126c986bbfa1c02504c2f5ed9cc9243
https://crt.sh/?sha256=de2adacedc1ea0ebd6b4dc29e04e99444597570bacc9929191e0802ec66293a7
https://crt.sh/?sha256=dc13c1e22e5bc39aa6bad498cbe78c21b4ac59453c3da3395057eda5d3ec5fc8
https://crt.sh/?sha256=59b03c8bf75efd453e305719c476eb97a6e98065c71c536aef1b6ed94f36e387
https://crt.sh/?sha256=4ffab0852638ac14bbadbd8456afedc65541e0b3f8e22d1fc2a70720b211a829
https://crt.sh/?sha256=6960256cf2524d02214089e01dc9853ecdf53d20d1b20b387e9303dbd9d4edf3
https://crt.sh/?sha256=f7ea4de0b387aaa90cc63fc447cecb9edd819a41cc09887125240be014f872b9
https://crt.sh/?sha256=ebeb2313d5fd6e3801220bfd4b2b1247912ae07f433287658018896cfc82145c
https://crt.sh/?sha256=07aee83ff5a1d39f15740092368b81be4b7a6877584bf8889dde65f35ab00e46
https://crt.sh/?sha256=6963fa749184e0868401f6c52bf568962b8ab9d9dccfd5b830beac6e276450f0
https://crt.sh/?sha256=ec5c08de760d73be975d969eb6989dfb69bed83539e55c919f9b82abd7b3ce94
https://crt.sh/?sha256=3528efe998a86fca9b3a5bdc4091e992cf9689d76ab5a1d9a7cd1ec3dc3e6917
https://crt.sh/?sha256=5e2828bf7d8e4bcaedbacf6b7df7bd488f27f28e9151de4e9c40b8a5da8158ae
https://crt.sh/?sha256=12932a6df3bada8affe7264a7d5569aa74ec3c4cc3ccd5b4c62659b32365277e
https://crt.sh/?sha256=59de765ed3047cc03bc1782e4865ed98ef643adfc549fbfa859daae3ee79c7c9
https://crt.sh/?sha256=9af64d161eb5a9934203d941d7a9a6cc187d220e0f9c09770c16293975f81fa0
https://crt.sh/?sha256=854b0efba2323c42d5df745ed12a274fc3b87c0151a001d17029cfbff10dd532
https://crt.sh/?sha256=55f91ea4f5bd2d383c9221253b522eab09fbcaa7e265be502f37653edafb4043
https://crt.sh/?sha256=6daed283b11e25c1d90222a45a695e76f74825420317d6e5ccc3102ee171c9cc
https://crt.sh/?sha256=6fcbcda1c5b8bfb9e81d372e9ae571c04f579ccfe070ab7e64cdb64d882657ab
https://crt.sh/?sha256=2d4b1938be00416da3df8ea84a1e9dc5979dc079ccdbbb2f77895a38d0dcf1b9
https://crt.sh/?sha256=7220ca3f1e9505de08a7ae7b5d3858f17d198875fd33b778c8e937ca38b2b10b
https://crt.sh/?sha256=6a6dc72bf33b7dde2dfeca5141b6213cd6726ddd3d5c1674febdfe4b92f1ac08
https://crt.sh/?sha256=e5b6b5cccf33fbc321e6d00cf57cefcf98f8e6faf222c78dc073d2d519bd6a5d
https://crt.sh/?sha256=fa50913f1d145d09dae3ac618af5ac89e0cfb46ec7dca6f7ea20ff853e2118de
https://crt.sh/?sha256=be2b23ab6edb90e0b3b693010a35073d92408b8181ecf0e2f1b6c9e9be3055a5
https://crt.sh/?sha256=b941078acc630c0940e153da158d31c2e5e175e0e5dd1f26fe1f936ae5de65ac
https://crt.sh/?sha256=66736f7bf78093043907f146a9b791c1b65e6608d8a2272d7e6605669e6cb53e
https://crt.sh/?sha256=5a77e59acc38c353b300b7ba8566e689f38e91f7b95970fb0c89961761c5ebae
https://crt.sh/?sha256=1e9fd855af8f3a2dfb53c422658e9a49be71439a86633533d32fad6b7a5c8957
https://crt.sh/?sha256=a4697bed8689aa228bf88e293ff45f9819c010d5883ece7acd2aa3313ba7d3db
https://crt.sh/?sha256=64959f6303eb6dd42d1b0babad61a141ed849ee6c146a90df6259597dc6a523c
https://crt.sh/?sha256=1651a960c6a6ca3e42d23ee9e4efe3b45cdef14ea778c12ac0140a0a2b1b52df
https://crt.sh/?sha256=7db9587b891877c57d269d26ea3422ef0247d21dd13d3e55d4ca3bb11c40203d
https://crt.sh/?sha256=9d871c537520cffec7c360f768560f41b4fe8197ca78d1871d068042c2236b44
https://crt.sh/?sha256=9cbd83ee255fbbf643c2a23f6d17173db966bc6cb5d9b4c0f07185422772d6b3
https://crt.sh/?sha256=4e44f90ca0e4f4a156ab67d463f2e70c60af295b16d4c74f8fff4b3c3d1237cb
https://crt.sh/?sha256=5b9623deef661a4f97b88f80923a4384e63d9cb26b7a00e275b42654af95ae68
https://crt.sh/?sha256=893a560af43587e802ff3a3b3b83d504d6adb21dfd33ecbfbbf962438376b4f8
https://crt.sh/?sha256=be1f05c180a1df7790df8aed9e53fd4d5380a6c7edb452e1ca0a8b830780bf32
https://crt.sh/?sha256=3a91d2dcceb569b92bbad04ce64ec6e0eb22341e730007cdaff918694a0d461c
https://crt.sh/?sha256=2d867e6736acbebe81a4d90dc84e2f2af815dd7a6491a39996507d472c6e8120
https://crt.sh/?sha256=6342f7556807804ebb28e0cad81a8bbed0ca8fb426f4d7587ede188bee7984cc
https://crt.sh/?sha256=3dc7232ba3174b7e2f1b99bf772e3e75f1b19618681b1073ce88dc69be8f483a
https://crt.sh/?sha256=b55ef65d3992b11c2f1e447761b2656e4b0774175d8904d880aef7d678e19dfd
https://crt.sh/?sha256=55670046b8c7e9260ba0b777a9b57bdf1cc70481cb42d51c158e63fac48381f2
https://crt.sh/?sha256=8e175a06ece3feb737c7eb97c2f14f9caa431ac676b477013cdd71994a30b9c7
https://crt.sh/?sha256=9b35b2026cbf150e527a3fce334cb06a1b0dadf793ae248d337027ea73f1b524
https://crt.sh/?sha256=50ca01491f361677d7b280030d407d92c18f4fe59b0226d4102fd1f52775c144
https://crt.sh/?sha256=1b930864997e7f6c5076678b47dc4a89f75ce6ba4ffc8fe81575b36120de0026
https://crt.sh/?sha256=e97552b42c495e6926c5b4660f02a32fac80a76ffa1b742a37a0aff8f5b8d1d4
https://crt.sh/?sha256=dbd8c6556b6c4f27074fe60aa2e45afe7db864d3807e70342e393ddf795ceaff
https://crt.sh/?sha256=fe06e00bbfe611edadc2b909713f1aafb9f5d691d2660e2d79375014f7043586

Thanks for this incident report.

I'm not sure I understand what the bug was. The summary says "other CAA checks, including the “cansignhttpexchange=yes” requirement on relevant “issue” / “issuewild” properties, were not affected". But it sounds like in some cases this requirement wasn't enforced? Could you provide some examples of CAA record sets that were and were not interpreted correctly?

> Third, and finally, we manually run a test suite covering Andrew Ayer’s caatestsuite.com when making functional changes to our libraries, but that test suite does not contain SXG checks.

The lack of SXG test cases in caatestsuite.com is definitely something that should be fixed! Would GTS be willing to contribute some test cases?

> Although we instituted the practice immediately and reviewed a few other RFCs that we had recently incorporated, we did not go back to the SXG specification and perform the same formal review upon it because it is still a draft. The SXG requirements only appear formally in our CP/CPS. The value or concern was perceived to be lower than other areas.

It seems like this incorrect perception was part of the problem. Are you making any changes to your practices to ensure that requirements in relevant draft standards are treated as seriously as requirements in published RFCs?

Flags: needinfo?(gts-external)

(In reply to comment #2)

> I'm not sure I understand what the bug was. The summary says "other CAA checks, including the “cansignhttpexchange=yes” requirement on relevant “issue” / “issuewild” properties, were not affected". But it sounds like in some cases this requirement wasn't enforced? Could you provide some examples of CAA record sets that were and were not interpreted correctly?

Absolutely, we realize the problem is subtle and has a lot of nuance to convey.

Below is a set of CAA record sets with the expected outcomes for a theoretical request to issue for both the non-wildcard and wildcard versions of “example.com”. The table shows the expected outcome of TLS issuance following RFC8659, the expected outcome following the SXG requirements as specified in the draft, and the SXG outcome with the faulty code. WC stands for wildcard domain. Answers marked with an asterisk (*) are ones in which GTS’ implementation of SXG CAA at the time of incident discovery differs from the requirements of SXG.

| CAA Record Set | TLS Non-WC | SXG Non-WC | GTS SXG Non-WC | TLS WC | SXG WC | GTS SXG WC |
|---|---|---|---|---|---|---|
| example.com CAA 0 issue "pki.goog" | yes | no | no | yes | no | no |
| example.com CAA 0 issue "other-ca.com" | no | no | no | no | no | no |
| example.com CAA 0 issue "pki.goog; cansignhttpexchanges" | no | no | no | no | no | no |
| example.com CAA 0 issue "pki.goog; cansignhttpexchanges=yes" | yes | yes | yes | yes | yes | yes |
| example.com CAA 0 issue "pki.goog; cansignhttpexchanges=no" | yes | no | no | yes | no | no |
| example.com CAA 0 issuewild "pki.goog" | yes | no | yes* | yes | no | no |
| example.com CAA 0 issuewild "other-ca.com" | yes | no | yes* | no | no | no |
| example.com CAA 0 issuewild "pki.goog; cansignhttpexchanges" | yes | no | yes* | no | no | no |
| example.com CAA 0 issuewild "pki.goog; cansignhttpexchanges=yes" | yes | no | yes* | yes | yes | yes |
| example.com CAA 0 issuewild "pki.goog; cansignhttpexchanges=no" | yes | no | yes* | yes | no | no |
| example.com CAA 128 unknowntag "" | no | no | no | no | no | no |
| example.com CAA 0 unknowntag "" | yes | no | yes* | yes | no | yes* |
| example.com CAA 0 iodef "mailto:security@example.com" | yes | no | yes* | yes | no | yes* |
| <No answer section for example.com or com> | yes | no | no | yes | no | no |
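The table can also be checked mechanically. This Python sketch transcribes it (record sets abbreviated, "example.com CAA" dropped) and lists every cell where the faulty GTS SXG answer differs from the SXG requirement. It also asserts that whenever the faulty code answered yes, TLS CAA processing answered yes too, consistent with the statement in the report that all affected SXG certificates satisfy the TLS CAA requirements.

```python
# Columns: TLS, SXG, GTS SXG for non-wildcard, then the same for wildcard.
TABLE = """\
0 issue "pki.goog"|yes no no yes no no
0 issue "other-ca.com"|no no no no no no
0 issue "pki.goog; cansignhttpexchanges"|no no no no no no
0 issue "pki.goog; cansignhttpexchanges=yes"|yes yes yes yes yes yes
0 issue "pki.goog; cansignhttpexchanges=no"|yes no no yes no no
0 issuewild "pki.goog"|yes no yes yes no no
0 issuewild "other-ca.com"|yes no yes no no no
0 issuewild "pki.goog; cansignhttpexchanges"|yes no yes no no no
0 issuewild "pki.goog; cansignhttpexchanges=yes"|yes no yes yes yes yes
0 issuewild "pki.goog; cansignhttpexchanges=no"|yes no yes yes no no
128 unknowntag ""|no no no no no no
0 unknowntag ""|yes no yes yes no yes
0 iodef "mailto:security@example.com"|yes no yes yes no yes
<no answer>|yes no no yes no no"""

divergent = []
for line in TABLE.splitlines():
    record, answers = line.split("|")
    tls_n, sxg_n, gts_n, tls_w, sxg_w, gts_w = (a == "yes" for a in answers.split())
    for scope, tls, sxg, gts in (("non-WC", tls_n, sxg_n, gts_n),
                                 ("WC", tls_w, sxg_w, gts_w)):
        # Every certificate the faulty code would allow also passes TLS CAA.
        assert not gts or tls, (record, scope)
        if gts != sxg:
            divergent.append((record, scope, gts, sxg))

for record, scope, gts, sxg in divergent:
    print(f"{scope:7} {record}: GTS={'yes' if gts else 'no'}, SXG={'yes' if sxg else 'no'}")
print(len(divergent))  # 9
```

All nine divergences are cases where the faulty code said yes and the SXG requirements say no, and each one is a non-empty record set with no relevant “issue”/“issuewild” property for the requested name.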

Note: there is a typo in our initial report, which was quoted above: the parameter tag is cansignhttpexchanges, not cansignhttpexchange.

>> Third, and finally, we manually run a test suite covering Andrew Ayer’s caatestsuite.com when making functional changes to our libraries, but that test suite does not contain SXG checks.

> The lack of SXG test cases in caatestsuite.com is definitely something that should be fixed! Would GTS be willing to contribute some test cases?

GTS would be happy to propose test cases to caatestsuite.com, and we can work out the details on GitHub. We have added a corresponding action item (AI).

>> Although we instituted the practice immediately and reviewed a few other RFCs that we had recently incorporated, we did not go back to the SXG specification and perform the same formal review upon it because it is still a draft. The SXG requirements only appear formally in our CP/CPS. The value or concern was perceived to be lower than other areas.

> It seems like this incorrect perception was part of the problem. Are you making any changes to your practices to ensure that requirements in relevant draft standards are treated as seriously as requirements in published RFCs?

GTS does believe that all requirements in a CP/CPS ought to receive the same treatment. All new specifications have been given this treatment when adding code since 2022, and they will continue to receive it going forward.

When we instituted the practice, we did not perform a full review of all specifications that had been implemented in the past. Our main concern was prioritizing the most recent RFCs, which may have changed during our implementation. Because the SXG spec had been published more than two years before that incident, and our CP/CPS had not changed those requirements, it was not prioritized as part of that incident remediation.

Here is an updated view of our AIs. All crossed-out AIs have been completed. In our initial report, we omitted a full SXG requirement analysis; we have added it as an additional action item.

| Action Item | Kind | Due Date |
|---|---|---|
| Add missing unit tests for SXG CAA | Prevent | 2024-06-11 |
| Add missing E2E tests for SXG CAA | Prevent | 2024-06-28 |
| Rename “require_caa_record” to “require_relevant_issue_or_issuewild_record” and rewrite documentation | Prevent | 2024-06-28 |
| Add all CAA requirement documentation to the relevant code | Prevent | 2024-06-28 |
| Update the unplanned rollouts procedure to include a verification step to ensure a fix is working | Mitigate | 2024-07-08 |
| Perform a CAA requirement analysis against all relevant RFCs/drafts | Detect | 2024-07-12 |
| Perform an SXG requirement analysis against the draft specification | Detect | 2024-07-19 |
| Propose SXG tests for caatestsuite.com | Prevent | 2024-07-19 |