Closed Bug 1576789 Opened 5 years ago Closed 4 years ago

Let’s Encrypt: 2019.08.20 Incident: Incorrect OCSP responses under certain conditions

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jaas, Assigned: jaas)

Details

(Whiteboard: [ca-compliance] [ocsp-failure])

On 2019.08.20 at 08:48 UTC we received a report from community member and Apache httpd developer, Stefan Eissing, that under certain conditions our OCSP caching layer would return a valid OCSP response but not the one that was requested. This resulted in our OCSP service acting in violation of RFC 6960.
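To make the failure mode concrete: an OCSP response can be validly signed and still be the wrong answer if it describes a different certificate than the one asked about. The sketch below shows the kind of client-side comparison that would surface such a mismatch. The `CertID` class and `response_matches_request` function are hypothetical names used for illustration; the four fields follow the CertID structure in RFC 6960, but no real ASN.1 parsing or signature checking is done here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CertID:
    """Hypothetical stand-in for the CertID fields defined in RFC 6960."""
    hash_algorithm: str
    issuer_name_hash: bytes
    issuer_key_hash: bytes
    serial_number: int


def response_matches_request(requested: CertID, answered: CertID) -> bool:
    """Return True only if the response describes the certificate that was requested."""
    # Dataclass equality compares every field, so a different serial number
    # or a different issuer both count as a mismatch.
    return requested == answered


# Illustrative values only, not taken from the incident.
asked = CertID("sha1", b"\x01" * 20, b"\x02" * 20, serial_number=1001)
other = CertID("sha1", b"\x01" * 20, b"\x02" * 20, serial_number=2002)

if not response_matches_request(asked, other):
    print("valid response, but for a different certificate")
```

A client that skips this comparison and only verifies the signature would accept the status of an unrelated certificate.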

Upon further investigation, we believe that the only condition that triggered the incorrect behavior was an OCSP request made via POST with the “Expect: 100-continue” header (described in RFC 7231, section 5.1.1) set. So far we have no reason to believe that the problem affected any significant portion of OCSP requests.
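For readers unfamiliar with this request shape, the following is a minimal sketch of the header block such a client would send before transmitting the body. The function name `build_ocsp_post_head`, the host `ocsp.example.test`, and the placeholder body are assumptions made for illustration; only the POST method, the `Expect: 100-continue` header, and the `application/ocsp-request` media type (RFC 6960) come from the specifications. Nothing is sent over the network.

```python
def build_ocsp_post_head(host: str, path: str, body_length: int,
                         expect_continue: bool = True) -> bytes:
    """Build the HTTP/1.1 request head for an OCSP POST (hypothetical helper).

    With expect_continue=True the client announces that it will wait for a
    "100 Continue" interim response before sending the request body.
    """
    lines = [
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        "Content-Type: application/ocsp-request",
        f"Content-Length: {body_length}",
    ]
    if expect_continue:
        lines.append("Expect: 100-continue")
    # A blank line terminates the header block.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


# Placeholder bytes standing in for a DER-encoded OCSPRequest.
der_request = b"\x30\x00"
head = build_ocsp_post_head("ocsp.example.test", "/", len(der_request))
print(head.decode("ascii"))
```

Most OCSP clients either use GET or send the POST body immediately, which is consistent with the report's statement that this form of request was rare.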

We quickly determined that the problem was with our CDN, Akamai, since our OCSP responder origin servers were not seeing any of the requests in question. We reported the problem to Akamai and they have fixed the issue.

After initially confirming the report, we reached out to multiple other CAs that we believed would also be affected. Other affected CAs should also benefit from the fix that Akamai made.

Assignee: wthayer → jaas
Status: NEW → ASSIGNED
Whiteboard: [ca-compliance]

Here is a more complete timeline in case it's helpful:

2019-08-20 08:48 UTC - Initial report from Stefan Eissing
2019-08-20 17:15 UTC - Ticket filed with Akamai
2019-08-21 17:25 UTC - A temporary workaround fix is applied to production (strip problematic header)
2019-08-21 17:28 UTC - Private disclosures made to root programs
2019-08-26 23:46 UTC - Akamai confirms global permanent fix, public disclosure made

Summary: 2019.08.20 Let’s Encrypt Incident: Incorrect OCSP responses under certain conditions → ISRG/Let’s Encrypt: 2019.08.20 Incident: Incorrect OCSP responses under certain conditions
Summary: ISRG/Let’s Encrypt: 2019.08.20 Incident: Incorrect OCSP responses under certain conditions → Let’s Encrypt: 2019.08.20 Incident: Incorrect OCSP responses under certain conditions

Josh: thank you for this information. To close this out, please provide a full incident report as described here. While I realize that many of these answers may be redundant or irrelevant, consistent reporting helps to ensure that we don't miss some valuable information, such as a root cause or preventative measures.

Flags: needinfo?(jaas)
  1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.

On 2019.08.20 at 08:48 UTC we received a report from community member and Apache httpd developer, Stefan Eissing, that under certain conditions our OCSP caching layer would return a valid OCSP response but not the one that was requested. This resulted in our OCSP service acting in violation of RFC 6960.

The report was received via a direct message to one of our engineers on our Community Forums:

https://community.letsencrypt.org/

  2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

2019-08-20 08:48 UTC - Initial report from Stefan Eissing
2019-08-20 17:15 UTC - Ticket filed with Akamai
2019-08-21 17:25 UTC - A temporary workaround fix is applied to production (strip problematic header)
2019-08-21 17:28 UTC - Private disclosures made to root programs
2019-08-26 23:46 UTC - Akamai confirms global permanent fix, public disclosure made

  3. Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem. A statement that you have will be considered a pledge to the community; a statement that you have not requires an explanation.

We did not stop issuance at any time as a result of this incident. Given that the problem was limited to very rare forms of OCSP requests, and did not affect the integrity of the certificate validation and issuance process itself, it did not make sense to take the highly disruptive step of stopping issuance.

  4. A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.

No certificates were problematic as part of this incident. A very small number of OCSP responses were incorrect. See answer to question #1.

  5. The complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem.

No certificates were problematic as part of this incident.

  6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

This incident was the result of a bug at our CDN provider. We do not know when the bug was introduced in their systems. The bug was not detected earlier because it was triggered by an unusual form of OCSP request.

  7. List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.

We reported the problem to our CDN and they have fixed the issue. A short-term protective measure was taken in cooperation with our CDN (strip the problematic header), followed up by a more complete long-term fix from the CDN (handle the header correctly).
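The short-term measure amounts to removing one request header before the request is processed further. The sketch below illustrates that idea in isolation. It is not the CDN's actual configuration, which is not described in this report; the function name `strip_expect_header` and the sample headers are hypothetical.

```python
from typing import List, Tuple

Header = Tuple[str, str]


def strip_expect_header(headers: List[Header]) -> List[Header]:
    """Return a copy of the request headers with any Expect header removed.

    HTTP header names are case-insensitive, so the comparison is done in
    lower case. All other headers are preserved in their original order.
    """
    return [(name, value) for name, value in headers if name.lower() != "expect"]


# Sample request headers, made up for illustration.
incoming = [
    ("Host", "ocsp.example.test"),
    ("Content-Type", "application/ocsp-request"),
    ("Expect", "100-continue"),
]
print(strip_expect_header(incoming))
```

Removing the header sidesteps the faulty code path without changing the request body, which is why it could serve as a stopgap until the header was handled correctly.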

After initially confirming the report, we reached out to multiple other CAs that we believed would also be affected. Other affected CAs should also benefit from the fix that the CDN made.

Flags: needinfo?(jaas)
Flags: needinfo?(wthayer)

It appears that all questions have been answered and remediation is complete.

Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Flags: needinfo?(wthayer)
Resolution: --- → FIXED
Product: NSS → CA Program
Whiteboard: [ca-compliance] → [ca-compliance] [ocsp-failure]