Closed Bug 1945389 Opened 1 year ago Closed 1 year ago

HARICA: delayed revocation for bug 1943596

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dzacharo, Assigned: dzacharo)

Details

(Whiteboard: [ca-compliance] [leaf-revocation-delay])

Steps to reproduce:

Preliminary Incident Report

Summary

Based on bug https://bugzilla.mozilla.org/show_bug.cgi?id=1943596, HARICA was expected to revoke 43 non-expired, non-revoked, mis-issued S/MIME SV certificates by 2025-01-27 18:20 EET, a deadline that was communicated to affected subscribers when the mis-issuance was detected.

For fewer than 50 certificate revocations, the current HARICA revocation process is manual. A task is assigned to the on-duty Validation Specialist for the designated day (several hours before the end of the 5-day deadline) to ensure that the remaining non-revoked certificates are force-revoked. Depending on the volume, this task involves the assistance of multiple Validation Specialists, but it is coordinated by one Validation Specialist. The designated Validation Specialist forgot to handle this task on the designated day.

The issue was discovered on 2025-01-31, during the drafting of the full incident report for Bug 1943596, where it was noticed that the revocation date for the affected certificates was missing from the timeline.

35 non-revoked mis-issued certificates were revoked on 2025-01-31. 8 certificates were already revoked by their Subscribers.

A full incident report will be published no later than 2025-02-07.

Whiteboard: [ca-compliance] [leaf-revocation-delay]
Type: enhancement → task
Assignee: nobody → jimmy
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true

Incident Report

Summary

Based on bug https://bugzilla.mozilla.org/show_bug.cgi?id=1943596, HARICA was expected to revoke 43 non-expired, non-revoked, mis-issued S/MIME SV certificates by 2025-01-27 18:20 EET, a deadline that was communicated to affected subscribers when the mis-issuance was detected.

For fewer than 50 certificate revocations, the current HARICA revocation process is manual. A task is assigned to the on-duty Validation Specialist for the designated day (several hours before the end of the 5-day deadline) to ensure that the remaining non-revoked certificates are force-revoked. Depending on the volume, this task involves the assistance of multiple Validation Specialists, but it is coordinated by one Validation Specialist. The designated Validation Specialist forgot to handle this task on the designated day.

The issue was discovered on 2025-01-31, during the drafting of the full incident report for Bug 1943596, where it was noticed that the revocation date for the affected certificates was missing from the timeline.

35 non-revoked mis-issued certificates were revoked on 2025-01-31. 8 certificates were already revoked by their Subscribers.

Impact

35 SV S/MIME certificates affected by Bug 1943596 were not revoked in the expected 5-day revocation period.

Timeline

All timestamps in EET (GMT+2)

  • 2025-01-24 10:03 HARICA identified 43 SV S/MIME certificates impacted by bug 1943596 and notified the subscribers of the scheduled revocation on 2025-01-27 at 18:20 EET.
  • 2025-01-24 10:30 An internal task was created and assigned to the Validation Specialist on duty on 2025-01-27 to verify the remaining active certificates and proceed with their revocation.
  • 2025-01-27 The Validation Specialist overlooked the assigned task, resulting in a missed revocation deadline.
  • 2025-01-31 11:30 While drafting the full incident report for Bug 1943596, it was discovered that 35 certificates had not been revoked.
  • 2025-01-31 12:12 The remaining certificates were revoked.
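For reference, the length of the delay follows directly from the deadline and the last timeline entry. The following is a minimal Python sketch, assuming a fixed UTC+2 offset for EET as stated above; the variable names are illustrative and not part of any HARICA tooling.

```python
from datetime import datetime, timedelta, timezone

# EET modeled as a fixed UTC+2 offset, matching the "EET (GMT+2)" note.
EET = timezone(timedelta(hours=2), "EET")

deadline = datetime(2025, 1, 27, 18, 20, tzinfo=EET)    # scheduled revocation
revoked_at = datetime(2025, 1, 31, 12, 12, tzinfo=EET)  # actual revocation

delay = revoked_at - deadline
days = delay.days
hours, remainder = divmod(delay.seconds, 3600)
minutes = remainder // 60
print(f"Revocation delay: {days} days, {hours} hours, {minutes} minutes")
# prints "Revocation delay: 3 days, 17 hours, 52 minutes"
```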

Root Cause Analysis

Here we present the results of the “5 whys” root cause methodology that was followed:

  1. Why was the revocation delayed?

Because the scheduled revocation wasn’t executed as planned.

  2. Why was the scheduled revocation not executed as planned?

Because the Validation Specialist on duty, who was assigned to this task, did not complete the revocation.

  3. Why did the Validation Specialist not complete the revocation task?

The task was overlooked among other prioritized tasks.

  4. Why did the Validation Specialist overlook the task among other prioritized tasks?

The revocation task had the same severity flag as other tasks, so it was not prioritized over them. Simply put, the Validation Specialist on duty neglected to execute this scheduled task. The revocation process for fewer than 50 certificates relies solely on human oversight, and no technical controls were implemented to ensure that the scheduled revocation is executed.

Lessons Learned

What went well

  • HARICA identified the issue internally

What didn't go well

  • The Validation Specialist responsible for the revocation did not prioritize the task and did not complete it.
  • There was no technical control to prevent the delayed revocation.
  • There was no follow-up from another Validation Specialist because the task was assumed to have been completed by the Validation Specialist assigned to it.

Where we got lucky

  • The non-revocation was detected when drafting the incident report for bug 1943596

Action Items

| Action Item | Kind | Due Date |
| --- | --- | --- |
| Modify the CRM settings to ensure that assigned tickets are automatically unlocked and released after the due date. | Mitigate | Already implemented |
| Implement an automated certificate revocation scheduling process through the RA, which will deprecate the current 50-certificate threshold used for manual revocation. Validation Specialists will set the revocation date, time, and the appropriate revocation reason for the certificate(s). A web service will run daily to check if the subscriber has already revoked them. If any certificates remain active as the deadline approaches, the web service will automatically revoke them using the predefined reason. | Prevent | 2025-03-31 |

Appendix

Details of affected certificates

Serial, SHA-256 fingerprint
49917FA4976174A927282ABFE98F77EB,6F8B41D09996C3B84A3FD1ABF36751A2E544C581D96A25F20BDB78D547443B09
4425F43E0630691141CB822D71F0AED9,9A1923BAD348BCB019F0A1B85E5E25E7C23259FFBDCDB06E1F9BF4241A64A370
618D4B081D73CE1B65AF0F833E1A47A6,D41597B98BB84CEEC01979F5920EE11BB9F518452153B5468AC51668F9C6A5CF
55B1F5BA7E7BA00EE2CFA70C5FB9C87,FE9F09632823B2925E21B71CFBC1988EA6F3AAEB207A11830FC70CF977C73D9F
6A5D7C80FE06E0CF465145E894284BAC,772B341E88A470E5F5F6E369BC6C59A240108CB643A8B8709E39815B7BA87A11
2596AA6EE7B9282E0E87051C740AEEEB,407667F44286714D2167B384DBA235D035C56015E645C81C22F3ED3038D1816E
727B50FD6CFD039EBDDE0170DBA8B9FE,B93CC2875C2C937880F790A0056D9DC49D76CA3DEF3EF9F0A86CF4FAB2445536
6C4A3E70893F86F93913EEEAC8C48CE5,87EB53E5DFA25CD0C80D13D2A3BF9B9D7320A4C04E100C1371E5AE54BCD18047
78058E42C1B922DADAA43A5E8CD24FA8,A48C17E47C50140A409E29F8BDA9712DF7B9ADCC729CE15D3FBD87416B7AFF1C
4376DC37CE6AA031E1B5E9D4415B0484,82EFCED2E0AF4A0B66556CAFA5B1A467EDE930E0A1688793EEBCF46BEC98A5FD
0AD281D0A1B4A2E03F48892E932A7A6,D391358E6DC0E3EA089323215B7AD5A25298D8BE4364F75F543035FA953990FA
0EAE3D2FF9BF399DF1806D8660EA4947,435B2F2FF3D5188749B4414EC08C29A098214DD75FC666C821C89879EAAC4B26
3D79D7E55AA178A506ABEF2BD4E40E28,9DD7F72103AFCC4099A256CBFFA8C91D5C4A01628A9892C217429EF760206F89
6BBF2D017FE1B92095032D81CF48FBCF,E4630CE8D04B6653CA10BF7231894A3E71F7C0896684B882EFA0AC5FDE63225F
535C97622936ACF4A4A9C32489606882,B811082DE00223B7AE3DCFE9A8996B4F08D89860CF27887B3A73DB558B02E353
13F5D1F98D20D8917803AD313BE2AEBB,9D8A9F3FA90710ED4BF5F63B50DC9E8DB675AC8944C39AB1B994A91803E0C541
54928B5D1E163EB39351987B31CF00D9,39CB0AE0E9B1F566FE9B1C586D53C3653939E10134A6046841D61111F4864849
52A694732F979C42938E98C0D26D5361,19916B0048FC6C3C0A3BEF2BDC078CBB51AF2C55344F739D6E5D676AC773FC9C
4F36D2A01580D955FDD95C6D0E7FFB21,ABB54A26CE491DFF90784DD8B5BB2B14E40860B55C8C5C0FCB92EF97456CDE18
08FD07F27A2ECF2BDB7649B1FE3185BF,55793C73CF907274A8ECD5566566DD239086E074EE1DE52837B3FF17A789B5C5
220385BBA7F9300EF2EFCEC3BC639D9F,6E54254E3A029C378AD2754B9B0E0E41C6E81D18F66DB22FFA317ED54C3AA30C
7310758EF762272EE474342744D0EC7B,73A35DD319A36A2B99E4D23D077D857D320AEAB120112F9E74EC051AFF6747AD
644B3A88DBB3A3EB1BF2D8226B6AC4F2,E087846233D615CCCA0256E45236CAEA8B0C23E21391148B67BC66AB32056339
13666C611DB8D1058E45B34D57A238FE,FCEADF30BB5DFB1000C9989A575EEA72E8D9DD9BF56A795A317A628C84CCF25A
751AB6B25656762B4A19E77B4426C59F,6AD7DFDC90424CEEA9725412200E8A3176447A2D6EF02B0FC81E7F005ADB7EF3
66465BF21B87680A28A9CFCBAC4D5075,C36221A1A16D116E4BFFFDF5980E35696AEECE45AB6561D3F16273C0FDBFBE63
0C9269AD8B5D6EE591DF4302FFF7D430,3502EE88747F81BD70C1C15AFFC279DE09F02A5C6582B0E763DE09AC4470D798
418B4AB2D6EEA5A95180F2FAF51EFCB9,1EFE9EA7993AE13C3A3A743446A25C4AE9A1AACB00F059B97B40C2123224A81A
4B60E9929EBC95FFCD7804236BBB2B5C,A725832E81728651BE7E344194B01424D8E1AC84082D48747B3756BBA73387FC
3AEE0829E8AFD69AB107598EDB22891F,3020510F10369B3C09595789425E877D6AD4D2E35369ED0601A708791AD30FFC
233B9A91A47754A673C5AE7778465C22,84333324D4136396B568EA8F75E726DEB27855EC33413DD3ADF45FBC861075A6
1735B41434CA5625826159C80CC6244F,A94C805B558EDBC77835352135671EE784357C7222549FAE5E94BCF9EC2497C4
154371260896DF0FDD33F7B640A77888,B57104C755ACB8368818EF4FBB87C8B537059DACE30A3E8CC5B1B15F9A6B88C6
78841F189F0583098B8983CFAD2E57DD,14A884E772B8AB2E534DE737F089FEF16501057EC98DE05A82062D0B96835253
5E816D4C1AD4D9600D1A9DC8D1FCD51E,83D7D8F0CED74BCBADB743A3A596FDE70435C87E3A9DECF6B37A262699E1EB63

Based on Incident Reporting Template v. 2.0

Regarding the remaining action item, the team has completed a first draft of the design for “scheduled bulk certificate revocation” via the RA portal.

A certificate serial number uniquely identifies a specific end-entity certificate (and its precertificate). The tool will request the following input from authorized RA personnel:

  • certificate serial number
  • scheduled revocation date and time (in the appropriate timezone)
  • revocation reason (restricted to the values allowed when the CA initiates the revocation)
  • internal ticket number linked to an incident

Due to the criticality of the operation and the possible significant impact (accidental or incorrect revocation of large numbers of certificates), additional controls are being considered:

  • limit the number of serial numbers per batch
  • review and second approval before the scheduled bulk revocation is “armed”. When the request is approved, the tool will send a notification email to the affected Subscribers.
  • notifications to the larger compliance team will be sent during submission, updates, approvals and final execution of such tasks.

The action item is also updated to indicate that the web service will run hourly instead of daily.
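As an illustration only, and not HARICA's actual implementation, the inputs and controls listed above could be modeled roughly as follows. Every name here (`ScheduledRevocation`, `run_sweep`, `MAX_BATCH_SIZE`, `ALLOWED_REASONS`) is hypothetical, as are the batch limit and the reason list. The sketch also assumes that the periodic pass revokes once the scheduled time is reached, and it stands in two callables for the CA back end.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, List, Optional

# Hypothetical values: the report names these controls but not their settings.
MAX_BATCH_SIZE = 50
ALLOWED_REASONS = frozenset({"superseded", "privilegeWithdrawn"})


@dataclass
class ScheduledRevocation:
    serials: List[str]                 # certificate serial numbers
    revoke_at: datetime                # scheduled date and time, timezone-aware
    reason: str                        # CA-initiated revocation reason
    ticket: str                        # internal ticket linked to the incident
    approved_by: Optional[str] = None  # second approver; None means not "armed"

    def validate(self) -> None:
        if not 0 < len(self.serials) <= MAX_BATCH_SIZE:
            raise ValueError("batch must contain 1..MAX_BATCH_SIZE serials")
        if self.revoke_at.tzinfo is None:
            raise ValueError("revoke_at must carry a timezone")
        if self.reason not in ALLOWED_REASONS:
            raise ValueError("reason is not allowed for CA-initiated revocation")
        if not self.ticket:
            raise ValueError("an internal ticket number is required")


def run_sweep(
    request: ScheduledRevocation,
    now: datetime,
    is_revoked: Callable[[str], bool],
    revoke: Callable[[str, str], None],
) -> List[str]:
    """One periodic pass; returns the serials revoked during this pass."""
    request.validate()
    if request.approved_by is None or now < request.revoke_at:
        return []  # not armed yet, or the scheduled time has not been reached
    pending = [s for s in request.serials if not is_revoked(s)]
    for serial in pending:
        revoke(serial, request.reason)
    return pending


# Usage with in-memory stand-ins for the CA back end (placeholder serials).
EET = timezone(timedelta(hours=2))
revoked = {"AA01"}  # pretend the subscriber already revoked this one
request = ScheduledRevocation(
    serials=["AA01", "BB02"],
    revoke_at=datetime(2025, 1, 27, 18, 20, tzinfo=EET),
    reason="superseded",
    ticket="TICKET-0001",
    approved_by="second-reviewer",
)
print(run_sweep(request, datetime(2025, 1, 27, 19, 0, tzinfo=EET),
                lambda s: s in revoked,
                lambda s, reason: revoked.add(s)))  # prints ['BB02']
```

Because each pass re-checks the revocation status, running it again after the deadline is harmless: certificates already revoked by the Subscriber or by an earlier pass are skipped.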

| Action Item | Kind | Due Date |
| --- | --- | --- |
| Modify the CRM settings to ensure that assigned tickets are automatically unlocked and released after the due date. | Mitigate | Already implemented |
| Implement an automated certificate revocation scheduling process through the RA, which will deprecate the current 50-certificate threshold used for manual revocation. Validation Specialists will set the revocation date, time, and the appropriate revocation reason for the certificate(s). A web service will run hourly to check if the subscriber has already revoked them. If any certificates remain active as the deadline approaches, the web service will automatically revoke them using the predefined reason. | Prevent | 2025-03-31 |

There are only four "whys" in the "five whys" analysis used in place of a meaningful root-cause analysis in the incident report, so I'd like to propose a fifth:

"why are revocations of less than fifty certificates handled manually?"

As an aside, I'd like to bring to the attention of HARICA this article from the British Medical Journal, enumerating just some of the limitations and hazards of the "five whys" methodology: https://qualitysafety.bmj.com/content/26/8/671. In particular, I think these quotes are of particular relevance: "the potential for users to rely on off-the-cuff deduction, rather than situated observation when developing answers" as well as "systems thinking requires both depth and breadth of analysis".

(In reply to mpalmer from comment #3)

Thank you for the additional feedback and the reference to the limitations and hazards of the current methodology. Any root cause analysis methodology has inherent risks if the entity does not want to perform in-depth analysis and does not want to discover systemic issues. In HARICA’s case, having used this methodology for several incidents, we always try to ask follow-up questions until we find a result that reveals a systemic problem that, once addressed, can prevent the issue from recurring.

To the best of our knowledge, the “five whys” methodology does not need to always ask “five” questions exactly. It can be more, or it can be less depending on the systemic issues detected. Here is a link describing this particular aspect.

HARICA went through the exercise and stopped at the fourth question because we detected a systemic issue that once fixed, should prevent this incident from happening again.

We did consider a fifth “why” very similar to what you proposed: "why are revocations handled manually?"

and the answer did not reveal any additional issue or root cause. The answer was as simple as “it was considered reasonable to allow manual revocations for a limited number of cases based on the existing human resources available for this task”.

Ben is considering (still in draft) a threshold of 100 revocations for a “Mass Revocation Event”. We believe our threshold of 50 continues to be reasonable for manual revocation by humans, but the process needs to be enhanced by implementing some automation as described in the last action item.

We are still on track to implement the additional controls described in Comment 2.

If there are no additional questions or concerns with this plan, we kindly ask the Next Update to be set for 2025-03-21 so we can provide an update on the progress. If there is meaningful progress before that date, we will post an update sooner.

Whiteboard: [ca-compliance] [leaf-revocation-delay] → [ca-compliance] [leaf-revocation-delay] Nexgt update 2025-03-21
Whiteboard: [ca-compliance] [leaf-revocation-delay] Nexgt update 2025-03-21 → [ca-compliance] [leaf-revocation-delay] Next update 2025-03-21

Here is the update on our action items.

We are on track to deliver the remaining action items as scheduled. We would like to set the next update to 2025-03-28.

The last action item was completed successfully and pushed to production on 2025-03-28.

If there are no action items remaining, and you believe that this case can be closed, then please submit a Closure Summary:
https://www.ccadb.org/cas/incident-report#how-are-reports-closed
https://www.ccadb.org/cas/incident-report#closure-report
https://www.ccadb.org/cas/incident-report#incident-closure-summary
Thanks,
Ben

Thank you Ben. Due to time restrictions, we will try to submit a Closure Summary next week.

Flags: needinfo?(jimmy)

Incident Report Closure Summary

Incident Description

On January 31, 2025, HARICA discovered that 35 S/MIME (SV) certificates had not been revoked by the required deadline of 2025-01-27 18:20 EET, as part of corrective actions for a prior mis-issuance (Bug 1943596). These certificates were scheduled for revocation because they had been mis-issued.

Incident Root Cause(s)

The revocation task was assigned to a single Validation Specialist and was not completed on the scheduled date due to human error. HARICA’s internal processes for manually revoking fewer than 50 certificates did not include automated reminders, redundancy, or oversight mechanisms to prevent such omissions.

Remediation Description

HARICA revoked the outstanding certificates upon discovery. The affected subscribers were informed. Process improvements were implemented, including setting reminders for scheduled actions, and introducing monitoring functions to track pending revocations.

Commitment Summary

HARICA is committed to minimizing human error through automation and increased procedural oversight. We are evaluating full automation of revocation processes and will continue refining internal controls to ensure timely revocation of certificates.

All action items have been completed as described, and we respectfully request closure of this incident.

Flags: needinfo?(dzacharo)

I'll close this on or about Wed. 23-Apr-2025, unless there are issues or questions to discuss.

Flags: needinfo?(bwilson)
Whiteboard: [ca-compliance] [leaf-revocation-delay] Next update 2025-03-21 → [ca-compliance] [leaf-revocation-delay]
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED