Closed Bug 1793467 Opened 2 years ago Closed 2 years ago

Google Trust Services: invalid CRL reason code

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: agwa-bugs, Assigned: cadecairns)

Details

(Whiteboard: [ca-compliance] [crl-failure])

Attachments

(1 file)

2.83 KB, application/x-pkcs7-crl
Attached file 7UCuXZuLUIg.crl

GTS has issued the following certificate:

https://crt.sh/?sha256=9BD98AB55879AA3264068BAA523DB0CF5E3C1E764E8F7D7DAFA9A886041F55B2

containing this CRL distribution point:

URI:http://crls.pki.goog/gts1p5/7UCuXZuLUIg.crl

The CRL's entry for this certificate contains a CRL reason code of 7. 7 is not a valid value of the CRLReason enumeration as defined in RFC 5280 section 5.3.1. Therefore, this CRL violates RFC 5280.

NB: this is an RFC 5280 violation, not a violation of Mozilla's new CRL reason code policy which only applies to revocations that occur after October 1, 2022.

I have attached a copy of the CRL.

Assignee: bwilson → cadecairns
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [ca-compliance]

Thank you for your report. Google Trust Services is investigating and will respond in the coming days.

1. How your CA first became aware of the problem

Andrew Ayer reported the problem by opening this bug.

2. A timeline of the actions your CA took in response.

YYYY-MM-DD HH:MM (UTC) Description
2020-11-20 17:21 A bug was introduced in a library used to generate and deal with common X.509 extensions. The bug caused an incorrect conversion between the protocol buffer representation of revocation reasons used internally and a small subset of CRLReason codes.
2021-01-18 11:51 The code containing the bug was deployed into production as part of a change to consolidate some functionality into one architectural component.
2022-07-22 15:34 We received a request from our legal team to revoke three certificates. The reason code privilegeWithdrawn was the most appropriate option for these 3 cases. The subscriber had already been notified and confirmed they were prepared for the revocation.
2022-07-22 20:09 All three certificates were revoked.
2022-10-03 23:55 We received this incident report and began investigating.
2022-10-04 01:12 The developer examining the applicable code identified the bug that caused the off-by-two error and submitted a fix.
2022-10-04 15:29 We began deploying the fix to production using our normal process, which follows a staged approach that takes several days to complete.
2022-10-06 09:52 The deployment concluded.

3. Whether your CA has stopped, or has not yet stopped, certificate issuance or the process giving rise to the problem or incident.

Google Trust Services has not stopped issuing certificates as this incident did not produce misissued certificates. We have completed a deployment of a bug fix to prevent this behavior, and the affected CRL has been republished with the correct reason code.

4. In a case involving certificates, a summary of the problematic certificates. For each problem: the number of certificates, and the date the first and last certificates with that problem were issued. In other incidents that do not involve enumerating the affected certificates (e.g. OCSP failures, audit findings, delayed responses, etc.), please provide other similar statistics, aggregates, and a summary for each type of problem identified. This will help us measure the severity of each problem.

Three certificates were affected by this issue. They were all revoked on July 29, 2022. Only one certificate was still within its validity period when this bug was filed.

5. In a case involving certificates, the complete certificate data for the problematic certificates.

6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

From 2020 into early 2021, there was a project to consolidate and simplify revocation functionality in Go, which included migrating functionality previously implemented in C++. During the migration, a bug was introduced in the conversion between the protocol buffer representation of a revocation reason used by our internal systems and its corresponding CRLReason code as defined in RFC 5280 section 5.3.1.

Internally, our revocation reason codes are represented as a protobuf enum, so each one maps to an integer constant. Protobuf enums must define a default value of zero, and to distinguish the empty case from an explicit value, that default is generally reserved and carries no semantic meaning. In our enum, 0 is UNKNOWN and 1 is RFC 5280’s UNSPECIFIED, which breaks the direct mapping. The conversion code assumed that every reason was off by one, but the proto constants did not skip 7 as RFC 5280 does. As a result, the mapping was correct for unspecified through certificateHold but incorrect for removeFromCRL, privilegeWithdrawn, and aACompromise. The root cause of this incident was the lack of a technical control to ensure the validity of these values.

On July 29, 2022, we received a request from our legal team to revoke three certificates, for which the reason code privilegeWithdrawn was most appropriate. Because the corresponding protocol buffer representation had a numeric value of 8, it was incorrectly mapped to a CRLReason value of 7, resulting in this issue.
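The conversion error described above can be illustrated with a minimal Go sketch. This is not the actual Google Trust Services code: the type and constant names are hypothetical, and the internal numbering is reconstructed from the report (0 is UNKNOWN, privilegeWithdrawn is 8) together with the later clarification in this bug that the enum has no removeFromCRL entry. The value 9 for aACompromise is an assumption that follows from that numbering.

```go
package main

import "fmt"

// RevocationReason is a hypothetical stand-in for the internal protobuf enum.
type RevocationReason int

const (
	ReasonUnknown              RevocationReason = iota // 0: reserved protobuf default
	ReasonUnspecified                                  // 1
	ReasonKeyCompromise                                // 2
	ReasonCACompromise                                 // 3
	ReasonAffiliationChanged                           // 4
	ReasonSuperseded                                   // 5
	ReasonCessationOfOperation                         // 6
	ReasonCertificateHold                              // 7
	ReasonPrivilegeWithdrawn                           // 8 (no removeFromCRL entry precedes it)
	ReasonAACompromise                                 // 9 (assumed)
)

// buggyCRLReason reproduces the flawed assumption that every internal value
// is exactly one greater than its RFC 5280 CRLReason code.
func buggyCRLReason(r RevocationReason) int {
	return int(r) - 1
}

// crlReasonByInternal is an explicit table, so the unused RFC 5280 value 7
// and the missing removeFromCRL entry cannot shift later codes.
var crlReasonByInternal = map[RevocationReason]int{
	ReasonUnspecified:          0,
	ReasonKeyCompromise:        1,
	ReasonCACompromise:         2,
	ReasonAffiliationChanged:   3,
	ReasonSuperseded:           4,
	ReasonCessationOfOperation: 5,
	ReasonCertificateHold:      6,
	ReasonPrivilegeWithdrawn:   9,
	ReasonAACompromise:         10,
}

// crlReason converts an internal reason to its CRLReason code and fails
// loudly for values that have no mapping, such as the UNKNOWN default.
func crlReason(r RevocationReason) (int, error) {
	code, ok := crlReasonByInternal[r]
	if !ok {
		return 0, fmt.Errorf("no CRLReason defined for internal reason %d", int(r))
	}
	return code, nil
}

func main() {
	r := ReasonPrivilegeWithdrawn
	fixed, err := crlReason(r)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("buggy: %d, fixed: %d\n", buggyCRLReason(r), fixed)
}
```

Under these assumptions the subtraction yields 7 for privilegeWithdrawn, the value that RFC 5280 leaves unused, while the explicit table yields 9. An explicit table also makes the off-by-two visible in review, since each pair is written out rather than derived arithmetically.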

7. List of steps your CA is taking to resolve the situation and ensure that such situation or incident will not be repeated in the future, accompanied with a binding timeline of when your CA expects to accomplish each of these remediation steps.

To avoid a recurrence of this issue in the future, we are adding checks for the numeric reason codes as well as the text version to ensure they are always aligned before signing them. Let’s Encrypt recently published a blog post in which they describe their collection of checks and their intention to upstream them to ZLint. Rather than reinvent the wheel, we are offering our support in this endeavor. We appreciate Let's Encrypt taking the lead on the expanded rules to benefit the entire web PKI ecosystem. We will meet with them soon and will provide a timeline for remediation in our CA by Friday, October 14.

Unit tests for the affected reason codes have already been added. There is also now a plan to centralize the metadata on the enumeration values using custom options, thereby providing one location for required metadata.
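As an illustration of the kind of pre-signing check described above, the following Go sketch verifies that each entry's numeric reason code is defined in RFC 5280 section 5.3.1 and agrees with its text form. It is only a sketch under stated assumptions: the revokedEntry type, its fields, and the checkEntries function are hypothetical names, not part of our systems or of ZLint.

```go
package main

import "fmt"

// rfc5280Reasons maps each CRLReason value defined in RFC 5280 section 5.3.1
// to its name. The value 7 is absent because the RFC leaves it unused.
var rfc5280Reasons = map[int]string{
	0:  "unspecified",
	1:  "keyCompromise",
	2:  "cACompromise",
	3:  "affiliationChanged",
	4:  "superseded",
	5:  "cessationOfOperation",
	6:  "certificateHold",
	8:  "removeFromCRL",
	9:  "privilegeWithdrawn",
	10: "aACompromise",
}

// revokedEntry is a hypothetical view of one CRL entry just before signing.
type revokedEntry struct {
	Serial     string // serial number, as text for readability
	ReasonName string // text form of the intended reason
	ReasonCode int    // numeric code that would be encoded in the CRL
}

// checkEntries returns the first problem found: either a code that RFC 5280
// does not define, or a code whose RFC name differs from the intended reason.
func checkEntries(entries []revokedEntry) error {
	for _, e := range entries {
		name, ok := rfc5280Reasons[e.ReasonCode]
		if !ok {
			return fmt.Errorf("serial %s: reason code %d is not defined in RFC 5280", e.Serial, e.ReasonCode)
		}
		if name != e.ReasonName {
			return fmt.Errorf("serial %s: reason code %d means %s, not %s", e.Serial, e.ReasonCode, name, e.ReasonName)
		}
	}
	return nil
}

func main() {
	entries := []revokedEntry{
		{Serial: "01", ReasonName: "privilegeWithdrawn", ReasonCode: 7},
	}
	if err := checkEntries(entries); err != nil {
		fmt.Println("refusing to sign:", err)
	}
}
```

A check of this shape would have rejected the CRL in this incident, because 7 has no entry in the table. It would also catch a code that is valid but misaligned, such as 8 paired with the name privilegeWithdrawn.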

> Internally, our revocation reason codes are represented as a protobuf enum, so each one maps to an integer constant. Protobuf enums must define a default value of zero, and to distinguish the empty case from an explicit value, that default is generally reserved and carries no semantic meaning. In our enum, 0 is UNKNOWN and 1 is RFC 5280’s UNSPECIFIED, which breaks the direct mapping. The conversion code assumed that every reason was off by one, but the proto constants did not skip 7 as RFC 5280 does. As a result, the mapping was correct for unspecified through certificateHold but incorrect for removeFromCRL, privilegeWithdrawn, and aACompromise. The root cause of this incident was the lack of a technical control to ensure the validity of these values.

Based on this explanation, I would think that the enum would look something like this:

enum RevocationReason {
        UNKNOWN = 0;
        UNSPECIFIED = 1;
        KEYCOMPROMISE = 2;
        CACOMPROMISE = 3;
        AFFILIATIONCHANGED = 4;
        SUPERSEDED = 5;
        CESSATIONOFOPERATION = 6;
        CERTIFICATEHOLD = 7;
        REMOVEFROMCRL = 8;
        PRIVILEGEWITHDRAWN = 9;
        AACOMPROMISE = 10;
}

> On July 29, 2022, we received a request from our legal team to revoke three certificates, for which the reason code privilegeWithdrawn was most appropriate. Because the corresponding protocol buffer representation had a numeric value of 8, it was incorrectly mapped to a CRLReason value of 7, resulting in this issue.

So why do you say that the protocol buffer representation of privilegeWithdrawn had a value of 8?

Flags: needinfo?(cadecairns)

Because it "assumed that each reason was off-by-one". So KEYCOMPROMISE (2) got mapped to the correct value of keyCompromise (1), but PRIVILEGEWITHDRAWN (9) got mapped to the incorrect value of removeFromCRL (8).

Gah, apologies, I realized my comment was incorrect but accidentally posted it instead of deleting it. Ignore me!

Hi Andrew,

We should have clarified this in our incident report. removeFromCRL does not have a corresponding value in our enum since we have not built support for it. Thus, this issue is an off-by-two.

Flags: needinfo?(cadecairns)

Google Trust Services is monitoring this bug for comments or questions. We will post an update with our timeline for remediation tomorrow as promised.

We have spoken to Let’s Encrypt and committed Google Trust Services to help push through CRL linting support for ZLint. We will work with the maintainers to develop a plan under https://github.com/zmap/zlint/issues/458. The architecture, APIs, and dependencies of the new lints will need to be agreed upon, and the existing crypto/x509 CRL parsing code will likely not meet our needs, which will lengthen the timeline. Given the cross-team collaboration necessary, we cannot give a firm time commitment other than that we will start the process this quarter.

We believe the other remediations we implemented as described in comment#2 should be sufficient in the interim. If there are no comments or questions, we request consideration to close this bug.

Flags: needinfo?(bwilson)

I will look at closing this on or about Wed. 26-Oct-2022.

Google Trust Services is monitoring this bug for comments or questions.

Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED
Product: NSS → CA Program
Whiteboard: [ca-compliance] → [ca-compliance] [crl-failure]