Closed Bug 1678410 Opened 4 years ago Closed 4 years ago

IdenTrust: Invalid OCSP Response Held in Cache

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: roots, Assigned: roots)

Details

(Whiteboard: [ca-compliance] [ocsp-failure])

Steps to reproduce:

Summary:
On 2020-10-16 we deployed a new OCSP signing certificate and new OCSP responses for certificates signed by our DST Root CA X3 certificate. After uploading the new responses to our OCSP responder, the cached copy of the old response was not purged. We learned from ISRG that the cache at our CDN is held for up to 12 hours, which extended beyond the expiration of the previous OCSP signing certificate. This caused OCSP validation errors for some relying parties of the Let’s Encrypt subordinate CAs that have been cross-signed by DST Root CA X3. There was also a period of 30 minutes during which external monitors reported an outage caused by excessive traffic to the responder.

  1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.
    IdenTrust:
    On 2020-10-16 at 13:03 (GMT-06:00), our monitoring system issued a connectivity alert affecting the IdenTrust Commercial Roots OCSP Responder.

  2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.
    IdenTrust:
    2020-09-16 19:00:00 UTC: A delegated OCSP signing certificate was issued from DST Root CA X3 with a lifetime of 30 days.
    Not Before: Sep 16 19:00:00 2020 GMT
    Not After : Oct 16 19:00:00 2020 GMT
    2020-10-14 05:58:29 UTC: An OCSP response was signed by the delegated OCSP signer with the following parameters:
    This Update: Oct 14 05:58:29 2020 GMT
    Next Update: Oct 21 05:57:29 2020 GMT
    2020-10-14 19:00 UTC: A new delegated OCSP signing certificate was issued from DST Root CA X3 with a lifetime of 30 days.
    Not Before: Oct 14 19:00:00 2020 GMT
    Not After: Nov 13 19:00:00 2020 GMT
    2020-10-16 15:59:42 UTC: A new OCSP response was signed by the new delegated OCSP signer with the following parameters:
    This Update: Oct 16 15:59:42 2020 GMT
    Next Update: Oct 23 15:58:42 2020 GMT
    2020-10-16 19:00:00 UTC: The old delegated OCSP signing certificate expired. From this time, any CDN PoP that still had the old response cached would be serving OCSP responses that could not be validated, causing client errors. ISRG maintains a logging-only monitor of OCSP requests; its logs did not catch this because its CDN PoP had refreshed with the updated response before the certificate expired, but based on the cache configuration and error reports we presume some customers experienced this.
    2020-10-16 19:30 - 22:05 UTC: An ISRG monitoring tool logged an outage when looking up OCSP responses. The errors indicated: "error for /certs/lets-encrypt-x3-cross-signed.pem: getting issuer: Get http://apps.identrust.com/roots/dstrootcax3.p7c: dial tcp 192.147.157.177:80: i/o timeout". During this time, excessive traffic to the OCSP responder overloaded the servers handling requests.
    2020-10-16 19:00 - 2020-10-17 04:00 UTC: CDN PoPs that had cached the old response (for up to 12 hours) expired it and refreshed with a valid OCSP response. Until each cache entry expired, invalid OCSP responses may have been delivered to relying parties.
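
The exposure window in the timeline above can be reproduced with a short calculation. This is only a sketch: it assumes the new response reached the responder at its signing time (the report does not give the upload time) and that every cache entry lives for at most 12 hours. The function and variable names are illustrative and not part of any IdenTrust or ISRG tooling.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc

# Values taken from the timeline above.
OLD_SIGNER_NOT_AFTER = datetime(2020, 10, 16, 19, 0, 0, tzinfo=UTC)
OLD_RESPONSE_NEXT_UPDATE = datetime(2020, 10, 21, 5, 57, 29, tzinfo=UTC)
# Assumption: the new response was uploaded when it was signed.
NEW_RESPONSE_PUBLISHED = datetime(2020, 10, 16, 15, 59, 42, tzinfo=UTC)
CDN_MAX_CACHE_AGE = timedelta(hours=12)  # "up to 12 hours" per the report


def stale_window(signer_not_after, new_published, max_cache_age):
    """Return (start, end) of the period in which a cache may still serve a
    response signed by an expired signer, or None if there is no such period.

    A PoP that cached the old response just before the new one was published
    keeps it until new_published + max_cache_age at the latest.
    """
    latest_cache_expiry = new_published + max_cache_age
    if latest_cache_expiry <= signer_not_after:
        return None
    return signer_not_after, latest_cache_expiry


if __name__ == "__main__":
    # The old response claimed validity well past its signer's expiry.
    print("nextUpdate outlives signer:",
          OLD_RESPONSE_NEXT_UPDATE > OLD_SIGNER_NOT_AFTER)
    window = stale_window(OLD_SIGNER_NOT_AFTER, NEW_RESPONSE_PUBLISHED,
                          CDN_MAX_CACHE_AGE)
    print("possible stale window:", window)
```

Under these assumptions the window runs from 2020-10-16 19:00:00 UTC to 2020-10-17 03:59:42 UTC, which matches the last timeline entry.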

  3. Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem. A statement that you have will be considered a pledge to the community; a statement that you have not requires an explanation.
    IdenTrust:
    Issuance was not stopped for this incident.

  4. A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.
    IdenTrust:
    OCSP Signer Exp 2020-10-16
    OCSP Signer Exp 2020-11-13
    Let's Encrypt Authority X3
    Let's Encrypt Authority X4

  5. The complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem.
    IdenTrust:
    OCSP Signer Exp 2020-10-16 https://crt.sh/?id=3635578609
    OCSP Signer Exp 2020-11-13 https://crt.sh/?id=3635375168
    Let's Encrypt Authority X3 https://crt.sh/?id=15706126
    Let's Encrypt Authority X4 https://crt.sh/?id=1571029

  6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.
    IdenTrust:
    The Let's Encrypt cross-signed certificates have an embedded OCSP responder URL of http://isrg.trustid.ocsp.identrust.com. This URL exists specifically to deliver OCSP status for the Let's Encrypt cross-signed certificates. We were not aware that ISRG cached responses for 12 hours, which resulted in more traffic than expected.
    Information about how this cache is operated had not been shared between us and ISRG. We did not understand that a cache purge is necessary to clear cached responses in less than 12 hours when we update the OCSP signing certificate. As a result, we updated the signing certificate and OCSP response but did not purge the cache before the old response's signing certificate expired.

  7. List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.
    IdenTrust:
    In order to avoid recurrence of this issue, we have implemented the following:

    1. After the certificate is renewed, our PKI team will publish the new certificate. This is now scheduled to happen at least 48 hours prior to the previous certificate expiring.
    2. We will publish the renewed certificate and host it on the secondary server in order to reduce traffic to the main server.
    3. If we are unable to publish the renewed certificate within 24 hours of the prior certificate expiring, we will contact ISRG to have them manually clear the cache.
    4. We continue reviewing all monitoring to determine if further changes are needed.
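
The timing rules in steps 1 and 3 can be expressed as a simple pre-publication check. This is a hypothetical sketch rather than IdenTrust's actual procedure: the thresholds come from the list above, while the function name and the returned labels are invented for illustration.

```python
from datetime import datetime, timedelta, timezone

PUBLISH_LEAD_TARGET = timedelta(hours=48)   # step 1: publish >= 48 h before expiry
PURGE_LEAD_THRESHOLD = timedelta(hours=24)  # step 3: under 24 h, ask ISRG to purge


def remediation_action(publish_time, old_signer_not_after):
    """Classify a planned publication time against the remediation rules.

    Returns one of three illustrative labels (not from the report):
      "on_schedule"         - at least 48 hours before the old signer expires
      "late"                - between 24 and 48 hours before expiry
      "request_cache_purge" - less than 24 hours before expiry (or after it)
    """
    lead = old_signer_not_after - publish_time
    if lead >= PUBLISH_LEAD_TARGET:
        return "on_schedule"
    if lead >= PURGE_LEAD_THRESHOLD:
        return "late"
    return "request_cache_purge"


if __name__ == "__main__":
    utc = timezone.utc
    expiry = datetime(2020, 10, 16, 19, 0, 0, tzinfo=utc)
    # The incident itself: new response signed about 3 hours before expiry.
    print(remediation_action(datetime(2020, 10, 16, 15, 59, 42, tzinfo=utc), expiry))
```

Applied to the incident, where the new response was signed roughly three hours before the old signer expired, the check returns "request_cache_purge".
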
Assignee: bwilson → roots
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [ca-compliance]
Type: defect → task
Summary: Invalid OCSP Response Held in Cache → IdenTrust: Invalid OCSP Response Held in Cache

Shouldn't 7. iii. above say, "If we are unable to publish the renewed certificate prior to within 24 hours of the prior certificate expiring, we will contact ISRG to have them manually clear the cache."?
Are there any other refinements to your remediation plan / steps going forward to avoid a recurrence?
If not, then have all remediations been implemented? And if so, then can this matter be closed?

Flags: needinfo?(roots)

(In reply to Ben Wilson from comment #1)

Shouldn't 7. iii. above say, "If we are unable to publish the renewed certificate prior to within 24 hours of the prior certificate expiring, we will contact ISRG to have them manually clear the cache."?
IdenTrust: Yes, we accept the suggested language update.
Are there any other refinements to your remediation plan / steps going forward to avoid a recurrence?
IdenTrust: No
If not, then have all remediations been implemented?
IdenTrust: Yes
And if so, then can this matter be closed?
IdenTrust: No additional changes, our remediation changes have been implemented. This issue can be closed out.

Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Flags: needinfo?(roots)
Product: NSS → CA Program
Whiteboard: [ca-compliance] → [ca-compliance] [ocsp-failure]