Closed Bug 1734953 Opened 2 months ago Closed 27 days ago

GoDaddy: Certificate Problem Report responses greater than 24 hours

Categories

(NSS :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: brittany, Assigned: brittany)

Details

(Whiteboard: [ca-compliance])

User Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.71 Safari/537.36

Incident Report

1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.

Problem Summary:
We failed to respond to two (2) Certificate Problem Reports (CPRs) within 24 hours of receipt, which is a violation of the Baseline Requirements (BRs) for Publicly Trusted SSL certificates, Section 4.9.5 which states that “Within 24 hours after receiving a Certificate Problem Report, the CA SHALL investigate the facts and circumstances related to a Certificate Problem Report and provide a preliminary report on its findings to both the Subscriber and the entity who filed the Certificate Problem Report.”

Discovery Details:
On 09/27/2021 at 12:00 MST, during a routine check of the CPR inbox, practices@starfieldtech.com, an RA Associate (RA) identified 2 CPRs that had been received more than 24 hours prior (09/26/2021 at 09:22 and 09/26/2021 at 09:51). Both CPRs were reporting a possible phishing attack related to the same certificate.

2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

DD/MM/YYYY HH:MM (Times are all MST)

  • 09/26/2021 09:15 As part of routine procedures, RA checks the Certificate Problem Reporting (CPR) inbox, practices@starfieldtech.com and verifies the queue was cleared.
  • 09/26/2021 09:22 A CPR was received in the practices@starfieldtech.com inbox reporting a possible phishing attack on eisbt.com.
  • 09/26/2021 09:51 A CPR (different from the above received at 09/26/2021 09:22) was received in the practices@starfieldtech.com inbox reporting a possible phishing attack on eisbt.com.
  • 09/27/2021 12:00 RA checks the CPR inbox, notices the time, and starts investigating the reports.
  • 09/27/2021 12:27 RA escalates to RA Management regarding the time between receipt of CPRs and review.
  • 09/27/2021 12:32 RA completes the CPR review process for both CPRs, which includes revocation of the reported certificate for eisbt.com (https://crt.sh/?id=5286958942)
  • 09/27/2021 13:35 RA Management communicates incident to the Compliance team.
  • 09/27/2021 15:34 A stakeholder meeting is held to continue the investigation which included: confirming understanding of the issue, gathering additional information, assessing impact, and next steps.
  • 09/28/2021 09:55 Continued communication with stakeholders to determine root cause and continue with next steps.
  • 09/29/2021 11:00 RA Management implements a change in internal process which is documented and communicated to the team. This includes training on the updated process and coverage.
  • 09/29/2021 15:07 A stakeholder meeting is held to brainstorm systematic changes to address the root of the issue and to help prevent reoccurrence.
  • 09/30/2021 12:53 RA Management adds additional changes to the manual process which includes redundancy between RA locations resulting in additional coverage. Updates are communicated to the team.
  • 10/01/2021 09:30 RA Management and Compliance meet with internal IT teams to discuss the possibility of a long-term solution, involving integration with other systems already in use within the company.
  • 10/04/2021 12:00 A stakeholder meeting is held to brief additional members of management on the issue and to enlist additional support for a systematic fix.
  • 10/05/2021 10:30 Bi-weekly compliance touch point meeting is held which includes a review of the CPRs missed.
  • 10/06/2021 11:30 Stakeholder meeting is held to review mitigations to date including manual updates to the process, training, as well as proposed systematic alerting and reminders. Next steps include finalizing CPR impact assessment for 2021 and implementing systematic alerting.
  • 10/06/2021 15:42 RA Management queries for all CPRs submitted during 2021 (between 01/01/2021 through 10/05/2021) and verifies that all CPRs were addressed within the 24-hour requirement except for the two that are the subject of this incident report.
  • 10/07/2021 09:00 RA Management implements systematic alerts and reminders, including call outs to group Slack channels and escalations to RA Management.

3. Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem. A statement that you have will be considered a pledge to the community; a statement that you have not requires an explanation.

As of 10/07/2021, we have implemented automated alerting and reinforced accountability within the remaining manual review process through system updates, documentation, and training to prevent this in the future. Subscriber certificate issuance was not directly impacted.

4. A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.

Impact was limited to two (2) certificate problem reports. Subscriber certificate issuance was not directly impacted.

5. The complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem.

Impact was limited to two (2) certificate problem reports. The subject of both certificate problem reports was the following certificate, which has since been revoked: https://crt.sh/?id=5286958942

6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now. See Google's guidance on root cause analysis for ideas of what to include.

Certificate Problem Reports (CPRs) are received via an email to the practices@starfieldtech.com shared email inbox. Prior to 10/07/2021, the monitoring of the inbox was limited to procedural checking throughout the day by RAs (multiple) during weekdays and RAs (limited) on the weekends. During the weekend in question, the RA personnel who were present differed from the regularly scheduled RAs who typically performed the weekend monitoring of the CPR inbox.

RA Management completed an impact assessment for all CPRs submitted during 2021 (between 01/01/2021 through 10/05/2021) and confirmed that no other problem reports were out of compliance with the 24 hour response requirement. It was also noted that of all problem reports received during 2021, only about 6.5% of problem reports were received during weekend hours.

The low frequency of weekend CPRs coupled with a change in the regularly scheduled RA personnel meant that the problem went undetected until the next RA manually checked the inbox and discovered that there were incidents reported more than 24 hours prior.

7. List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.

Mitigation Strategy:

  • M1. System Update (Completed): Implement automated alerts to notify the RA team and management to provide additional visibility into CPR inbox messages.
  • M2. Process Updates (Completed): Update monitoring schedule to ensure more coverage and redundancy. Additionally, within team documentation, formalize a RACI chart to further clarify roles and responsibilities.
  • M3. People Training (Completed): Train/coach RA to reinforce importance of Certificate Problem reports, and all associated timeframes.

Implementation Actions Completed:

  • (M1): As of 10/07/2021, RA management implemented systematic alerts and reminders, including call outs to group Slack channels and escalations to RA Management. Completed.
  • (M2) On 09/29/2021, the monitoring schedule was updated to include additional coverage. Subsequently, on 10/06/2021, the monitoring schedule was finalized and in addition to the coverage added on 09/29, redundancy in the process was added to include checks from both US and International RA functions. Additionally, on 10/07/2021, a RACI was published on the team’s Confluence site adding additional clarity to the manual process. Completed.
  • (M3) On 09/29/2021, the RA Policy Manager trained/re-trained the RA supervisors and those team members responsible for monitoring CPRs, which included a specific briefing of this specific incident, training on the updated schedule, and reinforcement of policy. Additionally, changes made after the original training on 09/29 were communicated real time and via email. Completed.
Assignee: bwilson → brittany
Status: UNCONFIRMED → NEW
Type: defect → task
Ever confirmed: true
Whiteboard: [ca-compliance]
Status: NEW → ASSIGNED

All mitigation strategy items have been completed as mentioned in the incident report above. We are continuing to track this bug for any questions from the community.

If there isn't any objection, we would like to request closing this bug on 11/1/2021. We will continue to monitor for questions/comments from the community.

I'll close this on or about Friday 30-Oct-2021 unless there are any objections.

Flags: needinfo?(bwilson)
Status: ASSIGNED → RESOLVED
Closed: 27 days ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.