We failed to process 210 customer revocation requests within 24 hours, which violates the Baseline Requirements (BRs) for Publicly Trusted SSL certificates. Section 4.9.1.1 states: The CA SHALL revoke a Certificate within 24 hours if the Subscriber requests in writing that the CA revoke the Certificate.
1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.
During the course of business, the RA team noticed a delayed revocation request and escalated it to the PKI Engineering team on 09/23/22 at 8:47 AM MST. At 10:55 AM MST the same day, the PKI Engineering team confirmed that 210 customer requests for revocation had not been processed as expected.
2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.
MM/DD/YY HH:MM (all times are MST)
- 09/23/22 8:47 - RA team noticed a delayed revocation request and escalated
- 09/23/22 10:55 - PKI Engineering confirmed issue and identified root cause
- 09/23/22 13:45 - PKI Engineering revoked all 210 certificates. Note: only 123 of these certificates were still active.
- 09/23/22 13:45 - PKI Engineering added monitoring to mitigate further issues with revocation requests
3. Whether your CA has stopped or has not yet stopped, issuing certificates with the problem. A statement that you have will be considered a pledge to the community; a statement that you have not requires an explanation.
The bug was limited to certificate revocation requests and subscriber certificate issuance was not impacted.
4. A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.
Summary: 210 customer requests for revocation were not processed; 123 of the affected certificates were still active.
Date of First: 11/13/2020
Date of Last: 09/22/2022
5. The complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem.
<refer to attached file revocation_certsh.txt>
6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now. See Google's guidance on root cause analysis for ideas of what to include.
We introduced a bug when integrating a new event queue system (RabbitMQ) in April 2020. In rare cases, the event queue can become unresponsive, causing revocation requests to go unprocessed. The requests are still persisted in a database but are never processed. Because the requests were stored in this database, GoDaddy engineers were able to review them and process the revocations with the originally requested reasons.
The bug went undetected for so long largely because the conditions that cause revocation requests to go unprocessed are rare. Since the introduction of the new queue system, we have processed over 2 million revocation requests successfully.
7. List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied by a timeline of when your CA expects to accomplish these things.
- On 9/23/22, PKI Engineering revoked all 210 certificates with the originally requested revocation reason.
- On 9/23/22, PKI Engineering implemented automated alerts that notify the team to take action if any revocations are not processed as scheduled.
- Additional System Updates (Pending 1/31/23): Implement an automated failsafe to process delayed revocation requests.
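
For illustration only, the alerting and failsafe steps listed above could be sketched roughly as follows. This is not GoDaddy's implementation: the `revocation_requests` table, its columns, the one-hour `ALERT_AFTER` threshold, and the `revoke` callback are all hypothetical names chosen for the example. The idea is that the database of persisted requests, rather than the event queue, is treated as the source of truth, so a request that the queue dropped is still found and handled.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical threshold, chosen to sit well inside the 24-hour BR deadline.
ALERT_AFTER = timedelta(hours=1)

# Hypothetical schema for persisted revocation requests (epoch seconds, UTC).
SCHEMA = """
CREATE TABLE revocation_requests (
    id INTEGER PRIMARY KEY,
    serial TEXT NOT NULL,
    reason TEXT NOT NULL,
    requested_at INTEGER NOT NULL,
    processed_at INTEGER
)
"""


def find_overdue(conn, now, threshold=ALERT_AFTER):
    """Return (id, serial, reason) rows that are persisted but still
    unprocessed once `threshold` has elapsed since the request."""
    cutoff = int((now - threshold).timestamp())
    return conn.execute(
        "SELECT id, serial, reason FROM revocation_requests "
        "WHERE processed_at IS NULL AND requested_at <= ? "
        "ORDER BY requested_at",
        (cutoff,),
    ).fetchall()


def process_overdue(conn, now, revoke, threshold=ALERT_AFTER):
    """Failsafe pass: revoke every overdue request with its originally
    requested reason, bypassing the event queue, and mark it processed."""
    handled = []
    for req_id, serial, reason in find_overdue(conn, now, threshold):
        revoke(serial, reason)  # hypothetical callback into the CA system
        conn.execute(
            "UPDATE revocation_requests SET processed_at = ? WHERE id = ?",
            (int(now.timestamp()), req_id),
        )
        conn.commit()  # commit per request so a later failure keeps earlier work
        handled.append(serial)
    return handled
```

Under these assumptions, a scheduled job could call `find_overdue` to raise an alert and `process_overdue` to act on the same rows, for example `process_overdue(conn, datetime.now(timezone.utc), revoke)`.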