Closed Bug 1955799 Opened 7 months ago Closed 6 months ago

CFCA: Failed to follow Report lifecycle rule to respond within 7 days

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: songxinlei, Assigned: songxinlei)

Details

(Whiteboard: [ca-compliance] [disclosure-failure])

Steps to reproduce:

While handling incidents 1888881, 1949131, 1888882 and 1886135, we (CFCA) repeatedly failed to respond within 7 days, as required by the Report lifecycle management section of the Incident Reporting Guidelines.

Although one of the incidents has been resolved and another is expected to reach a resolved status next week, we reviewed the time gaps between responses in all four incidents and found a serious problem in our incident handling.

We've listed several improvements in 1888882 and stated that we will follow the requirement to respond within 7 days.
But as some community members requested, and for our own improvement, we're opening a separate bug here.

Actual results:

While handling incidents in the community, we failed to respond within 7 days in several discussions.

Expected results:

We MUST respond and answer questions within 7 days.

Actions have been taken to prevent or mitigate a recurrence of incidents like this, as described in Comment 43 of bug #1888882.

Here's a full incident report.

Full Incident Report

Summary

  • CA Owner CCADB unique ID: A000272
  • Incident description: While handling incidents 1888881, 1949131, 1888882 and 1886135, we (CFCA) repeatedly failed to respond within 7 days, as required by the Report lifecycle management section of the Incident Reporting Guidelines.
  • Timeline summary:
    • Non-compliance start date: 2024-04-01
    • Non-compliance identified date: 2025-03-22
    • Non-compliance end date: 2025-03-22
  • Relevant policies: Incident Reporting Guidelines, section Report lifecycle management: CA Owners should respond promptly to comments and questions, and MUST respond within 7 days, even if only to acknowledge the request and provide a timeline for a full response.
  • Source of incident disclosure: Self Reported

Impact

  • Total number of certificates: N/A
  • Total number of "remaining valid" certificates: N/A
  • Affected certificate types: N/A
  • Incident heuristic: N/A
  • Was issuance stopped in response to this incident, and why or why not?: N/A
  • Analysis: N/A
  • Additional considerations: N/A

Timeline

For Bug 1886135:

  • 2024-04-01: failed to respond within 7 days since 2024-03-23, and failed to submit reports as promised
  • 2024-04-30: failed to respond within 7 days since 2024-04-05
  • 2024-05-08: failed to respond within 7 days since 2024-04-30
  • 2024-05-17: failed to respond within 7 days since 2024-05-08
  • 2024-06-06: failed to respond within 7 days since 2024-05-23
  • 2024-06-21: failed to respond within 7 days since 2024-06-13
  • 2024-06-30: failed to respond within 7 days since 2024-06-21
  • 2024-07-12: failed to respond within 7 days since 2024-06-30
  • 2024-08-21: failed to respond within 7 days since 2024-08-06
  • 2024-09-12: failed to respond within 7 days since 2024-08-21

For Bug 1888881:

  • 2024-05-05: failed to respond within 7 days since 2024-04-07
  • 2024-05-17: failed to respond within 7 days since 2024-05-05
  • 2024-09-12: failed to respond within 7 days since 2024-05-17
  • 2025-02-13: failed to respond within 7 days since 2024-09-12
  • 2025-03-12: failed to respond within 7 days since 2025-02-19

For Bug 1888882:

  • 2024-04-09: failed to respond within 7 days since 2024-04-01
  • 2024-05-08: failed to respond within 7 days since 2024-04-30
  • 2024-05-17: failed to respond within 7 days since 2024-05-08
  • 2024-06-06: failed to respond within 7 days since 2024-05-27
  • 2024-06-21: failed to respond within 7 days since 2024-06-13
  • 2024-06-30: failed to respond within 7 days since 2024-06-21
  • 2024-07-12: failed to respond by 2024-07-11, as promised on 2024-07-08
  • 2024-07-30: failed to respond within 7 days since 2024-07-19
  • 2024-08-21: failed to respond within 7 days since 2024-08-07
  • 2024-09-12: failed to respond within 7 days since 2024-08-21
  • 2025-03-12: failed to respond within 7 days since 2025-02-19
  • 2025-03-18: failed to submit the report within the week of 2025-03-12, as promised
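
The size of each gap can be recomputed from the entries above. The sketch below does this for the Bug 1888881 entries only, reading each entry as (date of the late response, date of the preceding post); that reading of the entry format is an assumption, and the names `ENTRIES`, `gap_days` and `days_overdue` are hypothetical.

```python
from datetime import date

# The 7-day limit comes from the Incident Reporting Guidelines quoted above.
LIMIT_DAYS = 7

# (preceding post, late response) pairs copied from the Bug 1888881 timeline.
# Assumption: the first date in each entry is when the late response was posted.
ENTRIES = [
    (date(2024, 4, 7), date(2024, 5, 5)),
    (date(2024, 5, 5), date(2024, 5, 17)),
    (date(2024, 5, 17), date(2024, 9, 12)),
    (date(2024, 9, 12), date(2025, 2, 13)),
    (date(2025, 2, 19), date(2025, 3, 12)),
]


def gap_days(since: date, responded: date) -> int:
    """Calendar days between the preceding post and the response."""
    return (responded - since).days


def days_overdue(since: date, responded: date) -> int:
    """Days beyond the 7-day limit (0 if the response was on time)."""
    return max(0, gap_days(since, responded) - LIMIT_DAYS)


for since, responded in ENTRIES:
    print(since, responded, gap_days(since, responded), days_overdue(since, responded))
```

Every pair in this list exceeds the limit, which matches how the entries are worded.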

Related Incidents

Bug | Date | Description
1886135 | 2024-03-19 | Several failures to respond within 7 days
1888881 | 2024-04-01 | Several failures to respond within 7 days
1888882 | 2024-04-01 | Several failures to respond within 7 days

Root Cause Analysis

Contributing Factor # 1: Missing certificate incident handling plan

  • Description: A certificate incident handling plan provides standardized incident procedures, strengthens the team's response capability, and helps ensure that our team meets compliance requirements and community rules through proactive risk awareness. We had no such plan.
  • Timeline: N/A
  • Detection: N/A
  • Interaction with other factors: Without a handling plan, supervision cannot work properly.
  • Root Cause Analysis methodology used: N/A

Contributing Factor # 2: Lack of supervision

  • Description: A supervision mechanism is important for making sure the handling plan is executed. Supervision plays a monitoring and measurement role.
  • Timeline: N/A
  • Detection: N/A
  • Interaction with other factors: Supervision ensures that the certificate incident handling plan is executed as expected.
  • Root Cause Analysis methodology used: N/A

Contributing Factor # 3: Shortage of human resources

  • Description: Most of the late responses occurred while we were handling other incidents, mainly the delayed revocation. It was the first time our team had handled such a large number of revocations. Without adequate staffing, the quality of our responses suffered, which led to this incident.
  • Timeline: N/A
  • Detection: N/A
  • Interaction with other factors: N/A
  • Root Cause Analysis methodology used: N/A

Lessons Learned

  • What went well: After reviewing the incident handling details, we allocated additional personnel to the team.
  • What didn’t go well: Human resources were not fully utilized, our team was unstable because of potential staff turnover, and monitoring of both the execution process and its outcomes was insufficient.
  • Where we got lucky: N/A
  • Additional: N/A

Action Items

Action Item | Kind | Corresponding Root Cause(s) | Evaluation Criteria | Due Date | Status
Establish a certificate incident handling plan | Prevent | Root Cause # 1 | Internal Policy Development & Issuance | 2025-03-24 | Complete
Establish a supervision team | Prevent | Root Cause # 2 | Internal Policy Development & Issuance | 2025-03-24 | Complete
Invest more human resources into our team | Prevent | Root Cause # 3 | Team member change | 2025-03-24 | Complete
Assign tasks to check Bugzilla at least twice a week | Prevent | Root Cause # 1/2/3 | Weekly Report | 2025-03-24 | Complete

Appendix

N/A

I think the manual fixes are great, but I'd also recommend more automated controls. In particular, when you hit day 6 since the last post (or the day an update is due if Ben gives you a "Next Update" allowance), alarms should start ringing about the need to post. Could you pull questions into JIRA from Bugzilla and track it that way? Speaking from experience, adding more people doesn't usually fix the issue and can even make it worse by causing confusion over responsibilities.
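
The alarm rule suggested above (start alerting on day 6 since the last post, or on the due date when a "Next Update" has been granted) could be sketched roughly as follows. This is only an illustration under stated assumptions: the function and constant names are hypothetical, dates are plain calendar dates, and nothing here reflects an actual Bugzilla or JIRA integration.

```python
from datetime import date, timedelta
from typing import Optional

RESPONSE_LIMIT_DAYS = 7  # limit from the Incident Reporting Guidelines
ALARM_DAY = 6            # assumption: alert one day before the limit


def alarm_start(last_post: date, next_update: Optional[date] = None) -> date:
    """First day on which the reminder should fire."""
    if next_update is not None:
        # A "Next Update" allowance replaces the default 7-day window.
        return next_update
    return last_post + timedelta(days=ALARM_DAY)


def should_alarm(today: date, last_post: date,
                 next_update: Optional[date] = None) -> bool:
    """True once the alarm day is reached; stays True until a new post resets it."""
    return today >= alarm_start(last_post, next_update)


def is_overdue(today: date, last_post: date,
               next_update: Optional[date] = None) -> bool:
    """True when the deadline itself has already passed."""
    deadline = next_update or last_post + timedelta(days=RESPONSE_LIMIT_DAYS)
    return today > deadline
```

For example, with a last post on 2024-03-23 and no allowance, `should_alarm` becomes true on 2024-03-29 and `is_overdue` becomes true on 2024-03-31.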

Hi Jeremy,

You're absolutely right about automation.

I'm testing with the n8n Community Edition, integrating Airtable and some IM clients. But it's quite a different story in China, because the social instant messaging apps are relatively more "closed", i.e., harder to integrate with.

Although there are several great apps such as Slack and Telegram, the internet connection to them is not stable.
In any case, we're still looking into it. The main process is done: it checks the time of the last non-CFCA reply or comment every day, so we are notified daily until the bug is RESOLVED.

I'm not sure whether this needs to go into the Action Items. What do you suggest, Jeremy?

Hi Jeremy,

We've implemented an n8n + Airtable + Slack automation workflow that monitors the bugs related to CFCA, runs a daily check, and posts the bugs that are not yet RESOLVED to our channel.

It's a simple workflow right now, and some logic still needs to be implemented and improved, but it is at least working.

Thank you again for the suggestions.
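
The core of such a daily check, finding how long a non-CFCA comment has gone unanswered, might look like the sketch below. It is not the actual n8n workflow. The comment dictionaries use `creator` and `creation_time` keys modeled on Bugzilla REST comment objects, the account name in `CFCA_ACCOUNTS` is hypothetical, and measuring from the oldest unanswered comment (rather than the latest one) is a deliberately conservative assumption.

```python
from datetime import datetime, timezone
from typing import Optional

# Hypothetical account list; the real CFCA Bugzilla accounts are not stated here.
CFCA_ACCOUNTS = {"cfca@example.com"}


def parse_time(value: str) -> datetime:
    """Parse a UTC timestamp such as '2024-03-23T08:00:00Z'."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def days_waiting(comments: list, now: datetime) -> Optional[int]:
    """Whole days since the oldest non-CFCA comment that has no later CFCA reply.

    Returns None when the most recent comment is from CFCA (nothing pending).
    """
    pending_since = None
    for comment in sorted(comments, key=lambda c: parse_time(c["creation_time"])):
        if comment["creator"] in CFCA_ACCOUNTS:
            pending_since = None  # a CFCA reply clears the pending question
        elif pending_since is None:
            pending_since = parse_time(comment["creation_time"])
    if pending_since is None:
        return None
    return (now - pending_since).days


def bugs_to_report(bugs: dict, now: datetime, threshold_days: int = 0) -> list:
    """Bug IDs whose pending wait is at least threshold_days, sorted by ID."""
    result = []
    for bug_id, comments in bugs.items():
        waited = days_waiting(comments, now)
        if waited is not None and waited >= threshold_days:
            result.append(bug_id)
    return sorted(result)
```

A threshold of 0 reports every bug with a pending question each day, which matches the "notified every day" behaviour described earlier; raising it to 6 would only report bugs close to the 7-day limit.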

Hi, we're still monitoring this bug. If there are no other comments, we'll submit a Closure Summary and request closure.

Thanks Michael for the updates! I like the Slack automation flow for bugs. It helps make sure you don’t miss anything. I don’t have any more questions and really appreciate the insights on what you've done to automate the reminders.

(In reply to Jeremy from comment #6)

> Thanks Michael for the updates! I like the Slack automation flow for bugs. It helps make sure you don’t miss anything. I don’t have any more questions and really appreciate the insights on what you've done to automate the reminders.

Thank you, Jeremy, for inspiring us to look into automation.

We're updating the Closure Summary here.


Report Closure Summary

  • Incident description: While handling incidents 1888881, 1949131, 1888882 and 1886135, we (CFCA) repeatedly failed to respond within 7 days, as required by the Report lifecycle management section of the Incident Reporting Guidelines.
  • Incident Root Cause(s): Missing certificate incident handling plan; lack of supervision; shortage of human resources.
  • Remediation description: Establish a certificate incident handling plan, establish a supervision team, invest more human resources into our team, assign tasks to check Bugzilla at least twice a week, and develop an automated workflow that notifies us through an app we use daily.
  • Commitment summary: Adding more people to the team usually does not fix the problem by itself, but we can rely on the automation workflow. We have implemented a simple one, and some of its detailed logic still needs to be improved. We will keep improving the workflow so that we can monitor the bugs better.

All Action Items disclosed in this report have been completed as described, and we request its closure.

I will close this on or about Friday, 11-Apr-2025, unless there are additional questions or issues to address.

Flags: needinfo?(bwilson)

We're still monitoring this bug until it's RESOLVED. There are no other updates.
Thank you, Ben.
