- How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.
On 2019-08-27 at 14:03 MDT, a bug was opened on Bugzilla reporting that a valid OCSP response was not being returned for the certificate with serial number 099AAEEF271B78187D8209501EF733DE; instead, a single-byte file with the value ‘0’ was received. This certificate was valid and should have returned a valid, signed OCSP response.
- A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.
2019-08-27 14:03 MDT: A bug was opened on Bugzilla reporting that a valid OCSP response was not being returned.
2019-08-27 14:10 MDT: Investigation begins. During the investigation, two issues were discovered that caused an invalid OCSP response to be returned.
2019-08-27 15:55 MDT: The original issue was identified; the file was corrected and the CDN cache was flushed.
2019-08-27 18:01 MDT: The pre-certificate in question had its OCSP enabled and a valid response pushed live.
2019-08-28 17:30 MDT: Patch to fix new issuance tested and implemented.
- Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem. A statement that you have will be considered a pledge to the community; a statement that you have not requires an explanation.
The CA is still issuing, as a fix has been made to ensure that all pre-certificates are enabled for OCSP when they are initially saved during the CT pre-cert process, so that any failure later in the process will not prevent OCSP generation for that certificate. A script has also been created to go back and enable OCSP for any previous pre-certs that are in the same invalid state. This script is still running and is expected to finish on the 29th.
- A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.
This issue impacts OCSP responses only; it is not an issue with the certificates themselves. However, it would have impacted OCSP responses for all pre-certs, of which there are just over 1 million.
- The complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem.
This issue impacts OCSP responses only; it is not an issue with the certificates themselves.
- Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.
The first issue caused a single-byte value of ‘0’ to be returned instead of a 5-byte ASN.1 OCSP UnAuthorized response when a valid OCSP response for the requested certificate was not found. DigiCert’s OCSP infrastructure consists of a front-facing CDN backed by several origin servers spread throughout the world. When a request passes through the CDN cache and is received by an origin server, the origin server checks whether it has a pre-generated response for the requested certificate. If it finds one, that response is returned. If it does not, it returns a default response file saved on the file system, which is supposed to contain the OCSP UnAuthorized response. Following origin server upgrades, that default file on some of the servers was overwritten with the incorrect response, causing it to be returned and cached on the CDN. Once the error was discovered, the file was corrected and the CDN cache was flushed.
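The lookup-with-fallback behaviour described above can be sketched roughly as follows. This is only an illustration, not DigiCert's implementation: the function names, the in-memory dictionary standing in for the pre-generated response store, and the validation helper are all assumptions. The 5-byte value is the standard DER encoding of an OCSPResponse whose only field is responseStatus = unauthorized (6).

```python
# DER encoding of an OCSPResponse carrying only responseStatus = unauthorized(6):
# SEQUENCE (0x30), length 3, ENUMERATED (0x0A), length 1, value 6.
UNAUTHORIZED_DER = bytes([0x30, 0x03, 0x0A, 0x01, 0x06])


def select_response(pregenerated: dict, serial: str, default_body: bytes) -> bytes:
    """Return the pre-generated response for `serial`, else the default body.

    `pregenerated` is a hypothetical stand-in for the origin server's store
    of pre-generated responses, keyed by upper-case hex serial number.
    """
    response = pregenerated.get(serial.upper())
    if response is not None:
        return response
    return default_body


def default_file_is_valid(default_body: bytes) -> bool:
    """True only if the default file holds the exact 5-byte unauthorized response."""
    return default_body == UNAUTHORIZED_DER
```

A helper like `default_file_is_valid` could be run after an origin server upgrade to catch an overwritten default file before it is served and cached.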
As the first issue was being investigated, it was also determined that the origin servers did not actually have a pre-generated response for that certificate when they should have. When a certificate that requires CT is in the process of being issued, the pre-certificate is created, signed, and then saved into the database in a pending final issuance state before it is sent to the CT logs. Once the CT process is complete, the final certificate is created and saved in the database in a final issuance state and, at the same time, enabled for OCSP. The problem occurred because OCSP was not being enabled when the CT process failed for whatever reason (network issues, not enough working CT logs, etc.). During a revocation event, OCSP does get re-enabled correctly, and OCSP responses would start to be generated and pushed out to the origin servers for delivery. The pre-certificate in question had its OCSP enabled and a valid response pushed live.
When a pre-cert ultimately fails to be submitted to CT, a final certificate is not signed or returned to the customer. Most of our OCSP testing has focused on cases where a certificate was fully issued, or where a fully issued certificate or pre-cert was revoked; a test case for OCSP on an unused, failed pre-cert had not been added.
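The issuance flow and the effect of the fix can be illustrated with a small model. This is a simplified sketch under assumptions: the record fields, state names, and the `submit_to_ct` callback are hypothetical and do not reflect the actual database schema or issuance code. The `enable_ocsp_at_precert` flag switches between the old behaviour (OCSP enabled only on final issuance) and the fixed behaviour (OCSP enabled when the pre-certificate is first saved).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CertRecord:
    serial: str
    state: str = "pending_final_issuance"  # hypothetical state name
    ocsp_enabled: bool = False


def issue_with_ct(serial: str,
                  submit_to_ct: Callable[[str], bool],
                  enable_ocsp_at_precert: bool) -> CertRecord:
    """Model the flow: save pre-cert, submit to CT, then finalize."""
    # The pre-certificate is signed and saved before CT submission.
    record = CertRecord(serial=serial)
    if enable_ocsp_at_precert:
        # Fixed behaviour: OCSP is enabled as soon as the pre-cert is saved.
        record.ocsp_enabled = True

    if not submit_to_ct(serial):
        # CT failed: no final certificate is signed or returned.
        record.state = "ct_failed"  # hypothetical state name
        return record

    # CT succeeded: the final certificate is saved and enabled for OCSP.
    record.state = "final_issuance"
    record.ocsp_enabled = True
    return record
```

With a CT submission that always fails, the old flow leaves `ocsp_enabled` false, which is the state the backfill script is correcting, while the fixed flow leaves it true.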
- List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.
For the UnAuthorized file on the origin servers, we are adding a check to our external monitoring tool that verifies a correct response is returned for an unknown certificate. This check will also be performed before a new origin server is brought into the rotation.
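One way such a monitoring check might validate the response body is sketched below. It is an assumption-laden illustration, not the actual monitoring tool: the function names are hypothetical, fetching the response over HTTP is left out, and the parser reads only the outer SEQUENCE header and the responseStatus field rather than performing full ASN.1 decoding.

```python
from typing import Optional

STATUS_SUCCESSFUL = 0    # OCSPResponseStatus values from RFC 6960
STATUS_UNAUTHORIZED = 6


def ocsp_response_status(body: bytes) -> Optional[int]:
    """Extract responseStatus from a DER OCSPResponse, or None if malformed."""
    # The smallest well-formed response is 5 bytes and starts with SEQUENCE.
    if len(body) < 5 or body[0] != 0x30:
        return None
    # Skip the outer length: short form is one byte, long form is 1 + n bytes.
    first_len = body[1]
    offset = 2 if first_len < 0x80 else 2 + (first_len & 0x7F)
    if len(body) < offset + 3:
        return None
    # responseStatus is an ENUMERATED (0x0A) with a one-byte value.
    if body[offset] != 0x0A or body[offset + 1] != 0x01:
        return None
    return body[offset + 2]


def unknown_cert_check_passes(body: bytes) -> bool:
    """Check that the body for an unknown certificate is an unauthorized response."""
    return ocsp_response_status(body) == STATUS_UNAUTHORIZED
```

A single-byte body such as the one reported in this incident is rejected by the length check, so `unknown_cert_check_passes` would return false and the monitor could raise an alert.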
We are adding an automated test to our integration test pipeline that will cause a certificate request to fail on CT submission and we will ensure that an OCSP response is still generated for the certificate.
A post-mortem will be conducted before 12 September to see what we can learn from this and what can be done to prevent similar incidents in the future.