Closed Bug 1888667 Opened 7 months ago Closed 4 months ago

VikingCloud: Delayed preliminary report of CPR to affected Subscribers

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: andreaholland, Assigned: andreaholland)

Details

(Whiteboard: [ca-compliance] [policy-failure])


Incident Report

Delayed preliminary report to affected Subscribers regarding a certificate problem report.

Summary

Two affected Subscribers were not provided with a preliminary report within 24 hours of our receipt of a Certificate Problem Report (CPR), as required by TLS BR Section 4.9.5.

Impact

Two Subscribers received the preliminary report on a Certificate Problem Report late: approximately 78 hours after the CPR was received, rather than within the required 24 hours.

Timeline (All times are UTC.)

2024-03-04 13:04 – CPR received reporting potential misissuance due to incorrect encoding order of Subject RDN attributes.

2024-03-04 21:18 – Began investigating the CPR.

2024-03-04 21:19 – Sent an email to the report filer acknowledging receipt of the CPR and the start of the investigation.

2024-03-05 21:42 – Created the preliminary Bugzilla report on the misissuance.

2024-03-07 19:19 – Notified the two affected Subscribers.
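To make the gap concrete, the following minimal Python sketch measures selected timeline entries against the 24-hour window from TLS BR Section 4.9.5, using the UTC timestamps above. The constant names, the short event labels, and the choice of which entries to check are illustrative assumptions, not part of VikingCloud's documented process.

```python
from datetime import datetime, timedelta

# CPR receipt time, taken from the timeline above (UTC).
CPR_RECEIVED = datetime(2024, 3, 4, 13, 4)

# Window for the preliminary report under TLS BR Section 4.9.5.
WINDOW = timedelta(hours=24)

# Selected timeline entries (UTC); the short labels are illustrative.
EVENTS = {
    "investigation began": datetime(2024, 3, 4, 21, 18),
    "reporter acknowledged": datetime(2024, 3, 4, 21, 19),
    "subscribers notified": datetime(2024, 3, 7, 19, 19),
}


def hours_elapsed(event_time: datetime) -> float:
    """Hours between CPR receipt and the given event."""
    return (event_time - CPR_RECEIVED).total_seconds() / 3600


def within_window(event_time: datetime) -> bool:
    """True if the event occurred no later than 24 hours after CPR receipt."""
    return event_time - CPR_RECEIVED <= WINDOW


if __name__ == "__main__":
    for label, when in EVENTS.items():
        status = "OK" if within_window(when) else "LATE"
        print(f"{label:<22} {hours_elapsed(when):6.2f} h  {status}")
```

Run as a script, it reports the reporter acknowledgment at 8.25 hours (within the window) and the Subscriber notification at 78.25 hours, which is flagged as late.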

Root Cause Analysis

Why was there a problem?

Because the Subscribers were not notified within 24 hours of our receiving a Certificate Problem Report.

Why were the Subscribers not notified within 24 hours?

Because the 24-hour requirement for Subscriber notification was missing from the internal documentation for CPR response.

Why was the 24-hour requirement for Subscriber notification missing from the internal documentation for the CPR process?

Because the 24-hour response time to Subscribers was not clearly delineated.

Why was the 24-hour response time to Subscribers not clearly delineated?

Because there was no checklist with timing requirements to verify that all required CPR actions had occurred.

Why was there no timing set up in the checklist?

Because we did not include timing requirements specifically for Subscriber notifications.

Lessons Learned

What went well

  • The response to the initial report filer was sent within 24 hours.
  • The investigation into the CPR began within 24 hours.

What didn't go well

The process for responding to the CPR was not clear enough specifying timing on notifying the affected Subscribers.

Where we got lucky

N/A

Action Items

| Action Item | Kind | Due Date |
|---|---|---|
| Clearly note in CPR process exact timing for notifying Subscribers | Prevent | 2024-04-05 |
| Update checklist to include all time requirements for valid CPRs to verify actions have occurred. | Prevent | 2024-04-05 |

Assignee: nobody → andreaholland
Status: UNCONFIRMED → ASSIGNED
Type: defect → task
Ever confirmed: true
Whiteboard: [ca-compliance] [policy-failure]

Update

Action Items

| Action Item | Kind | Due Date |
|---|---|---|
| Clearly note in CPR process exact timing for notifying Subscribers. | Prevent | Completed |
| Update checklist to include all time requirements for valid CPRs to verify actions have occurred. | Prevent | Completed |

We are monitoring this bug for any further questions or comments.

> The process for responding to the CPR was not clear enough specifying timing on notifying the affected Subscribers.

What was your process? What did you change?

Thank you for this report.

Similar to Comment 3, and related to the internal documentation identified in your Root Cause Analysis, it seems that asking “why” one or more additional times (e.g., why was timing not included for Subscriber notifications?) may reveal human or organizational causes that also contributed to the incident. A more detailed analysis may highlight these causes.

Also, can you describe how these contributing causes avoided detection until this incident? Were timely preliminary reports successfully provided to Subscribers in the past without the checklist you're proposing as an Action Item?

Our internal process for a certificate problem report is that the request is directed to senior level support personnel who evaluates the email request. If it is a direct subscriber revocation request, the information in the request is validated, the certificate(s) are revoked, and the requesting subscriber is notified of the revocation within 24 hours. If it comes from a third-party reporter about a potential misissuance or security event, it is escalated to the compliance team. Within 24 hours of receipt of the CPR, a preliminary report is sent to the third-party reporter. At this point, a preliminary report is also sent to the potentially impacted subscriber(s) to inform them of the reported CPR.

The internal process documentation included notifying the impacted subscriber(s), but there was no delineation of the requirement to notify them within 24 hours of receipt of a CPR. The internal process documentation now includes a specific checklist entry for each preliminary report recipient, with the required 24-hour timeline. We have also added management-preapproved generic response templates to assist us in this process.

Since we implemented the separate CPR email address, all the CPRs have been direct subscriber revocation requests so the reporter and the subscriber are the same. Therefore, there has not been a situation in which the subscriber would not have received the preliminary report within 24 hours.
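The per-recipient checklist entry described above can be sketched as follows. This is a minimal Python illustration, assuming a hypothetical `ChecklistItem` structure and `build_checklist` helper (neither is from VikingCloud's documentation): every preliminary-report recipient gets its own entry due 24 hours after receipt, and when the reporter is also the subscriber, as in a direct subscriber revocation request, the two collapse into one entry.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# TLS BR Section 4.9.5: preliminary report to the Subscriber and the reporter
# within 24 hours of receiving the CPR.
PRELIMINARY_REPORT_WINDOW = timedelta(hours=24)


@dataclass
class ChecklistItem:
    """One preliminary-report recipient and its deadline (hypothetical structure)."""
    recipient: str
    due: datetime
    sent: Optional[datetime] = None

    def overdue(self, now: datetime) -> bool:
        """Overdue if sent after the deadline, or still unsent once the deadline has passed."""
        if self.sent is not None:
            return self.sent > self.due
        return now > self.due


def build_checklist(received: datetime, reporter: str, subscribers: list[str]) -> list[ChecklistItem]:
    """Create one entry per distinct recipient, all due 24 hours after receipt.

    For a direct subscriber revocation request, the reporter is also the
    subscriber, so the two collapse into a single entry.
    """
    due = received + PRELIMINARY_REPORT_WINDOW
    # dict.fromkeys removes duplicates while preserving order.
    recipients = list(dict.fromkeys([reporter, *subscribers]))
    return [ChecklistItem(recipient=r, due=due) for r in recipients]


if __name__ == "__main__":
    received = datetime(2024, 3, 4, 13, 4)
    # Third-party CPR: the reporter plus two affected Subscribers (placeholder names).
    items = build_checklist(received, "third-party-reporter", ["subscriber-a", "subscriber-b"])
    for item in items:
        print(item.recipient, "due", item.due.isoformat())
```

Keeping a separate entry per recipient, rather than a single "preliminary report sent" item, is what lets a late Subscriber notification show up even when the reporter was answered on time.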

We are monitoring this bug for any further questions or comments.

> Our internal process for a certificate problem report is that the request is directed to senior level support personnel who evaluates the email request.

How do you make this determination? Just any email that comes to the CPR endpoint?

> all the CPRs have been direct subscriber revocation requests so the reporter and the subscriber are the same.

How do you determine if these are the same entity?

> but there was no delineation of the requirement to notify them within 24 hours of receipt of a CPR.

How and why was this missed? How often do you review your procedures? How many people are involved in reviewing them?

For the CPR process, all requests coming into VikingCloud from a reporter using the CPR email address are directed to senior-level support personnel. The compliance team reviews this process annually, and we have completed the action item to improve it by adding all timing requirements to our CPR checklist. During this process if the CPR report appears to be the subscriber making the request, an alternative means is used to confirm the request.

We are monitoring this bug for any further questions or comments.

If there are no further questions or comments, I request this bug be closed.

I still don’t think I understand how this process works.

> How do you determine if these are the same entity?

To which I think you responded:

> During this process if the CPR report appears to be the subscriber making the request, an alternative means is used to confirm the request.

What does this mean? What is the “alternative means”?

Flags: needinfo?(bwilson)

Our senior-level support staff verify that the Reporter of a CPR is the Subscriber by using alternative means, such as checking that the Reporter is registered in our system as an active Subscriber within the Organization and is requesting revocation of a certificate issued to that same Organization.
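The account check described above might look like the following minimal Python sketch. The `SubscriberRecord` type, the `accounts` lookup keyed by lowercase email address, and the `reporter_is_subscriber` function are hypothetical; the comment does not describe VikingCloud's actual data model, and the sketch covers only the account lookup, not any other confirmation step.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SubscriberRecord:
    """Hypothetical account record: a registered contact and their Organization."""
    email: str
    organization: str
    active: bool


def reporter_is_subscriber(
    reporter_email: str,
    cert_organization: str,
    accounts: dict[str, SubscriberRecord],
) -> bool:
    """Return True only if the reporter is a registered, active Subscriber
    within the same Organization the certificate was issued to.

    Assumes the keys of `accounts` are lowercase email addresses.
    """
    record = accounts.get(reporter_email.strip().lower())
    if record is None or not record.active:
        # Unknown or inactive contact: treat as a third-party report.
        return False
    return record.organization == cert_organization
```

A `False` result would route the CPR down the third-party path, where both the reporter and the affected Subscriber(s) need separate preliminary reports.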

We are monitoring this bug for any further questions or comments.

I will close this bug next Wednesday, 2024-06-05, unless there are additional questions.

Status: ASSIGNED → RESOLVED
Closed: 4 months ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED