Note: All times mentioned in this document are in the CET time zone (GMT+1).
Summary of the incident:
The following services were unavailable:
- The OCSP server for “TunTrust Root CA” and “TunTrust Services CA” was intermittently unavailable from 3:18pm on 8 September to 1:30pm on 10 September 2020.
- The CRLs of “TunTrust Root CA” and “TunTrust Services CA” were unavailable for 20 hours, from 3:18pm on 8 September to 12pm (noon) on 9 September 2020.
1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.
08 September 2020
03:19pm: The monitoring system alerted the system administrators that the OCSP server and the CRL repository were down. The administrators responded to the alert by investigating the cause of the issue.
03:34pm: The root cause was identified: while executing a script that automates the patch management process, the system administrator omitted a required configuration step, which resulted in the inadvertent deletion of the OCSP and CRL virtual machines (VMs).
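The alert described above came from an external availability probe of the revocation endpoints. A minimal sketch of such a probe is shown below; the endpoint URLs are hypothetical placeholders, not TunTrust's actual CRL/OCSP addresses, and the real monitoring system is not described in this report.

```python
import urllib.error
import urllib.request

# Hypothetical endpoints -- placeholders, not the real TunTrust URLs.
ENDPOINTS = [
    "http://crl.example.tn/TunTrustRootCA.crl",
    "http://ocsp.example.tn/",
]

def classify(status_code):
    """Map an HTTP status code to a monitoring state: 2xx/3xx counts as UP."""
    if status_code is not None and 200 <= status_code < 400:
        return "UP"
    return "DOWN"

def probe(url, timeout=10):
    """Fetch the endpoint and return 'UP' or 'DOWN' (network errors count as DOWN)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify(resp.status)
    except (urllib.error.URLError, OSError):
        return "DOWN"

if __name__ == "__main__":
    for url in ENDPOINTS:
        print(url, probe(url))
```

A production monitor would additionally validate the fetched content (e.g., that the CRL parses and is within its nextUpdate window), since an HTTP 200 alone does not prove the revocation data is usable.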
2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.
Date & Time | Event
08 September 2020 3:19pm | The monitoring system alerts the system administrators that the OCSP server and the CRL repository are down.
08 September 2020 3:34pm | Automated patch management console powered off for offline troubleshooting. Certificate issuance stopped.
09 September 2020 12pm | CRL repository restored; external-facing CRL services fully operational. Our revocation-request monitoring shows that no revocation request was received during the downtime of the CRL and OCSP services.
09 September 2020 4:10pm | Initial problem report posted to Bugzilla.
10 September 2020 1:30pm | OCSP server restored; external-facing OCSP services fully operational.
11 September 2020 8:00am | Certificate issuance resumed.
3. Whether your CA has stopped, or has not yet stopped, certificate issuance or the process giving rise to the problem or incident. A statement that you have stopped will be considered a pledge to the community; a statement that you have not stopped requires an explanation.
Certificate issuance was stopped on 8 September 2020 at 3:34pm and resumed on 11 September 2020 at 8am. In addition, our monitoring and audit logging processes confirmed that no certificate was issued during the entire downtime of the CRL and OCSP services.
4. In a case involving certificates, a summary of the problematic certificates. For each problem: the number of certificates, and the date the first and last certificates with that problem were issued. In other incidents that do not involve enumerating the affected certificates (e.g. OCSP failures, audit findings, delayed responses, etc.), please provide other similar statistics, aggregates, and a summary for each type of problem identified. This will help us measure the severity of each problem.
There was no certificate mis-issuance. We also verified that no revocation requests were received or processed during the CRL and OCSP downtime.
5. In a case involving certificates, the complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem. In other cases not involving a review of affected certificates, please provide other similar, relevant specifics, if any.
This incident did not affect the issuance of certificates; it affected only the OCSP responder and CRLs of “TunTrust Services CA” and “TunTrust Root CA”.
6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.
While executing the script that automates the patch management process, the system administrator omitted a required configuration step, which resulted in the inadvertent deletion of the OCSP and CRL VMs.
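One kind of safeguard that would have prevented this class of error is a deny-list guard in the automation itself, so that revocation-service VMs can never be deleted by a patch-management run regardless of which configuration steps were executed. The sketch below is purely illustrative: the VM names, the `delete_fn` hook, and the dry-run default are assumptions, not TunTrust's actual tooling.

```python
# Hypothetical safeguard -- names and the delete hook are illustrative.
PROTECTED_VMS = {"ocsp-vm", "crl-vm"}  # revocation-service VMs: never auto-delete

def safe_delete(vm_name, delete_fn, dry_run=True):
    """Refuse to delete protected VMs; require dry_run=False before acting.

    delete_fn is the underlying hypervisor/cloud delete call, injected so the
    guard can be tested without touching real infrastructure.
    """
    if vm_name in PROTECTED_VMS:
        raise PermissionError(f"refusing to delete protected VM: {vm_name}")
    if dry_run:
        return f"DRY-RUN: would delete {vm_name}"
    delete_fn(vm_name)
    return f"deleted {vm_name}"
```

Defaulting to a dry run forces an explicit, deliberate flag before any destructive action, which is a common defense against exactly the kind of missing-step mistake described above.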
7. List of steps your CA is taking to resolve the situation and ensure that such situation or incident will not be repeated in the future, accompanied with a binding timeline of when your CA expects to accomplish each of these remediation steps.
10 September 2020: Reverted to the legacy patch management process, which is included in the scope of our prior and current ETSI and WebTrust audits. Automation of the patch management process is deferred in light of our priorities and available resources.