Steps to reproduce:
- How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.
During an internal review on 2020-04-17 we identified that some of the batch revocation data we rolled out on 2020-04-08 erroneously overwrote previously published revocation data for our old subordinate CAs Google Trust Services (GTS) Y3 and Y4.
- A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.
2019-09-05 – A scheduled, periodic, manual ceremony is conducted to produce CRLs and OCSP responses for GIAG4, GIAG4ECC, GTS CA 1O1, GTS CA 1D2, GTS Y1, GTS Y2, GTS Y3 and GTS Y4. Revocation information is produced in batches to reduce the need to access offline key material. In this case “Batch 1” was valid from 2019-09-30 to 2020-04-15, and “Batch 2” was valid from 2020-04-01 to 2020-10-15.
2019-09-30 – On schedule, the previously deployed batch of revocation information is replaced with “Batch 1.”
2020-01-27 – Post deployment, an internal review of production configuration finds a signature algorithm mismatch in the chains of GTS Y3 and GTS Y4 (EC 256 vs 384).
2020-01-30 – Mozilla bug 1612389 is filed.
2020-01-31 – As part of the response to this incident (bug 1612389), GTS Y3 and GTS Y4 are re-issued with the correct signature algorithm, and new revocation data for them is produced, valid from 2020-01-31 to 2020-10-15.
2020-02-03 – New GTS Y3 and GTS Y4 and their corresponding revocation data, as well as revocation data for old GTS Y3 and GTS Y4, are published.
2020-04-08 08:15 UTC – As “Batch 1” is approaching expiration (on 2020-04-15), the “Batch 2” revocation data is retrieved from the safe, on schedule.
2020-04-08 14:55 UTC – “Batch 2” is installed, overwriting revocation data produced on 2020-01-31 for old GTS Y3 and old GTS Y4, effectively un-revoking them.
2020-04-17 16:30 UTC – An internal review identifies that “Batch 2” contained the old versions of the revocation data for the old GTS Y3 and GTS Y4, and that the valid data was erroneously overwritten by outdated information.
2020-04-17 17:02 UTC – Partial rollback of “Batch 2” to restore intended revocation data begins.
2020-04-17 18:31 UTC – The rollout of the correct CRLs finishes and the correct CRLs are now being served.
2020-04-18 01:25 UTC – Rollout of the correct OCSP responses finishes and the correct OCSP responses for old GTS Y3 and GTS Y4 are now being served.
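To make the exposure window in the timeline above explicit, the following minimal sketch computes how long the outdated data was in place. It assumes the outdated data was served from the moment the “Batch 2” installation is logged (2020-04-08 14:55 UTC) until the logged end of each corrective rollout; the variable names are illustrative only.

```python
from datetime import datetime, timezone

# Timestamps taken from the timeline above (all UTC).
batch2_installed = datetime(2020, 4, 8, 14, 55, tzinfo=timezone.utc)
crl_rollout_done = datetime(2020, 4, 17, 18, 31, tzinfo=timezone.utc)
ocsp_rollout_done = datetime(2020, 4, 18, 1, 25, tzinfo=timezone.utc)

# Assumption: outdated data was served for the whole interval between
# the install and the end of the corresponding corrective rollout.
crl_window = crl_rollout_done - batch2_installed
ocsp_window = ocsp_rollout_done - batch2_installed

print("CRL exposure: ", crl_window)   # 9 days, 3:36:00
print("OCSP exposure:", ocsp_window)  # 9 days, 10:30:00
```

Under that assumption, the outdated CRLs were in place for roughly 9 days and 3.5 hours, and the outdated OCSP responses for roughly 9 days and 10.5 hours.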
- Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem.
The outdated revocation data was rolled back and replaced with correct CRL and OCSP responses.
- A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.
Two certificates were impacted:
- The complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem.
Revocation data for:
- Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.
The offline nature of root keys means that generation of CRLs and OCSP responses for issuing certificates is a mostly manual process.
Bringing root keys online to produce these objects is a sensitive security task. Each time keys are brought online there is a risk of unauthorized use. As such, our processes are designed to limit the need to access these keys. This is accomplished by producing OCSP responses and CRLs for the associated certificates in batches, which greatly reduces the need to bring these root keys online.
The manual nature of this process exposes it to the risk of human error. This is what happened in this case.
Although we have review procedures in place to help catch these manual errors, the ongoing COVID-19 pandemic made accessing the associated root keys and revocation data more complicated. The extraordinary access restrictions associated with the pandemic introduced additional special safety procedures for entering the facility and for leaving it in a timely manner with limited human contact.
These safety procedures also limited our ability to include a wider real-time peer review as we normally would. While not an excuse, highly abnormal operating conditions were a contributing factor to the human error that led to this incident.
- List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.
Correct revocation information is now being served for the old GTS Y3 and GTS Y4.
We have added new checklist items to our procedures when publishing pre-generated revocation data and are evaluating additional options to further improve our processes. In particular we are implementing presubmit checks:
- to prevent submitting CRL files that have the same or lower CRLNumber
- to prevent accidental removal of revoked entries in CRLs
- to prevent accidental change from Revoked to Good for OCSP responses
- to prevent overwriting revocation data that has newer thisUpdate
We are also improving our Sub CA creation and revocation procedures to include an explicit action for pre-produced revocation data.
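As an illustration only, the presubmit checks listed above could be sketched along the following lines. This is not our production tooling: the `CrlSnapshot` and `OcspSnapshot` types and the `check_crl` and `check_ocsp` functions are hypothetical names, and the sketch assumes the relevant fields (CRLNumber, thisUpdate, revoked serial numbers, OCSP status) have already been parsed out of the published and candidate files.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import FrozenSet, List


@dataclass(frozen=True)
class CrlSnapshot:
    """Hypothetical summary of an already-parsed CRL."""
    crl_number: int
    this_update: datetime
    revoked_serials: FrozenSet[str]


@dataclass(frozen=True)
class OcspSnapshot:
    """Hypothetical summary of an already-parsed OCSP response."""
    status: str  # "good" or "revoked"
    this_update: datetime


def check_crl(published: CrlSnapshot, candidate: CrlSnapshot) -> List[str]:
    """Return reasons the candidate CRL must not replace the published one."""
    errors = []
    if candidate.crl_number <= published.crl_number:
        errors.append("CRLNumber is the same or lower than the published CRL")
    # A real check may need an exception for entries that are dropped on
    # purpose, e.g. after the revoked certificate has expired.
    dropped = published.revoked_serials - candidate.revoked_serials
    if dropped:
        errors.append("revoked entries would be removed: " + ", ".join(sorted(dropped)))
    if candidate.this_update < published.this_update:
        errors.append("published CRL has a newer thisUpdate")
    return errors


def check_ocsp(published: OcspSnapshot, candidate: OcspSnapshot) -> List[str]:
    """Return reasons the candidate OCSP response must not be published."""
    errors = []
    if published.status == "revoked" and candidate.status == "good":
        errors.append("status would change from Revoked to Good")
    if candidate.this_update < published.this_update:
        errors.append("published OCSP response has a newer thisUpdate")
    return errors


def utc(year: int, month: int, day: int) -> datetime:
    return datetime(year, month, day, tzinfo=timezone.utc)


# Illustrative values only: a published "revoked" response and a
# pre-produced "good" response whose validity starts later.
published = OcspSnapshot(status="revoked", this_update=utc(2020, 1, 31))
candidate = OcspSnapshot(status="good", this_update=utc(2020, 4, 1))
for problem in check_ocsp(published, candidate):
    print("BLOCKED:", problem)
```

With these illustrative values the thisUpdate comparison alone would pass, because the pre-produced response has the later start of validity; it is the Revoked-to-Good check that blocks the submission. This is why the checks are intended to be applied together rather than individually.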