Sectigo: Failure to reply to Certificate Problem Reports within 24 hours
Categories
(CA Program :: CA Certificate Compliance, task)
Tracking
(Not tracked)
People
(Reporter: martijn.katerbarg, Assigned: martijn.katerbarg)
Details
(Whiteboard: [close on 2025-12-10] [ca-compliance] [policy-failure])
Preliminary Incident Report
Summary
- Incident description:
On October 13th we became aware of a problem with our Certificate Problem Report email queue, which resulted in us failing to reply to several Certificate Problem Reports within the required 24 hours.
- Relevant policies: Baseline Requirements for the Issuance and Management of Publicly-Trusted TLS Server Certificates Version 2.1.7, Section 4.9.5
- Source of incident disclosure: Third Party Reported
Comment 1•3 months ago
Full Incident Report
Summary
- CA Owner CCADB unique ID: A000016
- Incident description:
On October 13th we became aware of a problem with our Certificate Problem Report email queue, which resulted in us failing to reply to several Certificate Problem Reports within the required 24 hours.
- Timeline summary:
- Non-compliance start date: 2025-09-19 – 09:51 UTC
- Non-compliance identified date: 2025-10-13 – 08:01 UTC
- Non-compliance end date: 2025-10-13 – 20:57 UTC
- Relevant policies:
Baseline Requirements for the Issuance and Management of Publicly-Trusted TLS Server Certificates Version 2.1.7, Section 4.9.5
- Source of incident disclosure: Third Party Reported
Impact
- Total number of certificates: 150 Certificates were referenced by a total of 68 Certificate Problem Reports which fall under the scope of this incident.
- Total number of "remaining valid" certificates: 0. All Certificate Problem reports have been triaged.
- Affected certificate types: TLS (DV, OV, EV), S/MIME and Code Signing
- Incident heuristic:
The incident affected all certificate problem reports (CPR) filed between 2025-09-18 – 09:51 UTC and 2025-10-13 – 08:01 UTC.
To further expand on the scope, the CPRs can be divided into:
- 53 CPRs filed by a single customer, of which:
o 3 were for their own TLS Certificates, resulting in 34 revocations.
o 52 were for their own S/MIME Certificates, resulting in 104 revocations.
o Note: 2 of the CPRs referenced both TLS and S/MIME certificates.
- 3 CPRs were deemed invalid.
- 12 further CPRs were reported, mostly by third parties, of which:
o 1 was for a single TLS certificate, resulting in 1 revocation.
o 1 was for an S/MIME certificate, resulting in 1 revocation.
o 10 were for Code Signing certificates, resulting in 10 revocations.
- Was issuance stopped in response to this incident, and why or why not?: No. This incident affected certificate problem report handling, not certificate issuance.
- Analysis: N/A.
- Additional considerations: N/A.
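The scope breakdown can be reconciled against the headline figures of 68 CPRs and 150 certificates. The following is a minimal arithmetic check in Python. The variable names are illustrative, and it assumes the 3 invalid CPRs are counted separately from the other two groups.

```python
# Figures copied from the Impact section; variable names are illustrative.
single_customer_tls, single_customer_smime, overlap = 3, 52, 2
# 2 CPRs referenced both TLS and S/MIME certificates, so they are counted once.
single_customer_cprs = single_customer_tls + single_customer_smime - overlap
invalid_cprs = 3
other_cprs = 1 + 1 + 10  # TLS, S/MIME, Code Signing

total_cprs = single_customer_cprs + invalid_cprs + other_cprs
total_revocations = 34 + 104 + 1 + 1 + 10

print(total_cprs, total_revocations)  # prints: 68 150
```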
Timeline
All times are in UTC.
- 2025-09-17
o 20:22 We make changes to our (Salesforce-based) CRM, specifically the “case origin” field. Unbeknownst to us at the time, this breaks the logic that directs CPR emails into the correct queue.
- 2025-09-18
o 08:01 We receive the first CPR since the change; it does not show up in the appropriate filter.
- 2025-10-12
o 20:39 I personally receive an email at my work address from a third party I am acquainted with, requesting assistance with getting code signing CPRs processed. The reporter states that emails sent to the CPR address have not been actioned as of late.
- 2025-10-13
o 08:01 I read the aforementioned email and start investigating with the appropriate teams.
o 08:35 We confirm our queue of CPR emails (or SSL Abuse, as we call these) is empty.
o 09:59 Based on emails we are able to trace, we build a custom filter to find all pending CPR emails. At this time visibility into pending cases is restored. Our abuse team starts processing the backlog of emails.
o 15:51 We open a ticket with our IT helpdesk to address the breakdown in our CRM.
o 20:57 All emails have been processed.
- 2025-10-15
o 18:27 Our SF team confirms the issue is due to a recent change to the Case Origin logic. The team continues to discuss the best option going forward.
- 2025-10-17
o 22:18 Our SF team reverts the recent change, restoring functionality of the Case Origin field and the logic utilizing the field.
Related Incidents
| Bug | Date | Description |
|---|---|---|
| 1905509 | 2024-06-29 | Both incidents affect the timely response to a CPR, however the root cause differs. |
| 1959733 | 2025-04-10 | Both incidents affect the timely response to a CPR. In bug 1959733 CPR emails did not reach the CA due to email filtering at the ISP, which did not occur in our case. |
| 1888881 | 2024-04-01 | Both incidents affect the timely response to a CPR. In bug 1888881 CPR emails were not processed due to personnel deficiencies, which did not occur in our case. |
| 1963629 | 2025-04-30 | Both incidents affect the timely response to a CPR. Bug 1963629 involved a second CPR email address, which did not occur in our case. |
| 1967929 | 2025-05-22 | Both incidents affect the timely response to a CPR. In bug 1967929 CPR emails were not processed timely due to personnel deficiencies, which did not occur in our case. |
| 1970727 | 2025-05-22 | Both incidents affect the timely response to a CPR. In bug 1970727 the affected CPR email was incorrectly marked as spam. |
| 1985466 | 2025-08-27 | Both incidents affect the timely response to a CPR. In bug 1985466 CPR emails were not processed timely due to personnel deficiencies, which did not occur in our case. |
Root Cause Analysis
Contributing Factor 1: Logic change to Case Origin field
- Description:
We confirmed that the root cause is a change to the Case Origin field, which disrupted the logic in the TriggerUtility Apex class responsible for stamping the Case Reason. This logic relies on a custom label (Set_Reason_Case_Checker) to match specific origin values. With the Case Origin values changed, the logic could no longer be applied, resulting in blank case reasons. As the list view filter for showing pending CPRs relied on these fields being filled properly, cases no longer showed up.
- Timeline: The logic change took place in September 2025.
- Detection: This was detected as part of our incident investigation.
- Interaction with other factors: N/A
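The failure mode described for Contributing Factor 1 can be modeled in a few lines. This is an illustrative Python sketch only: the real logic is Salesforce Apex (TriggerUtility) and is not shown in this report, and every origin value, reason value, and function name below is hypothetical.

```python
# Illustrative Python model; the actual implementation is an Apex class that
# this report does not reproduce. All values below are hypothetical.
ORIGIN_TO_REASON = {"Email - SSL Abuse": "SSL Abuse"}  # stands in for the custom label

def stamp_case_reason(case: dict) -> dict:
    """Set the case reason when the origin matches a known value, else leave it blank."""
    stamped = dict(case)
    stamped["reason"] = ORIGIN_TO_REASON.get(case["origin"], "")
    return stamped

def pending_cpr_view(cases: list[dict]) -> list[dict]:
    """List view filter that depends on the reason having been stamped."""
    return [c for c in cases if c["reason"] == "SSL Abuse" and c["status"] == "New"]

# Same kind of email, before and after the origin value was renamed.
before_change = stamp_case_reason({"id": 1, "origin": "Email - SSL Abuse", "status": "New"})
after_change = stamp_case_reason({"id": 2, "origin": "Email: SSL Abuse", "status": "New"})

visible = pending_cpr_view([before_change, after_change])
print([c["id"] for c in visible])  # prints: [1]
```

Case 2 still exists in this model; it is only invisible to the view. That matches the report's account that a custom filter was enough to recover the backlog.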
Contributing Factor 2: Certificate Problem Report logic not tested as part of changes in CRM processing logic
- Description:
Due to a gap in understanding the requirements, the CPR logic flow was not specifically tested when changes were made to the Case Origin field. The lack of testing led to the issue going undetected until externally reported.
- Timeline: The logic change took place in September 2025.
- Detection: This was detected as part of our incident investigation.
- Interaction with other factors: The gap in understanding of the requirements around CPRs directly caused the flow of CPRs to go untested when changes were made to the Case Origin field.
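One way to cover the untested path described under Contributing Factor 2 is an end-to-end check that pushes a synthetic CPR through intake and asserts it appears in the pending queue. The sketch below is hypothetical Python, not the test that was actually implemented: `cpr_flow_still_works`, `FakeCrm`, and the placeholder address are assumptions, and the two callables would need to be wired to the real CRM.

```python
from typing import Callable, Iterable

def cpr_flow_still_works(
    create_case_from_email: Callable[[str, str], str],
    list_pending_cpr_case_ids: Callable[[], Iterable[str]],
    cpr_address: str,
) -> bool:
    """Push a synthetic CPR through intake and confirm it is visible in the CPR queue."""
    case_id = create_case_from_email(cpr_address, "[TEST] synthetic certificate problem report")
    return case_id in set(list_pending_cpr_case_ids())


class FakeCrm:
    """In-memory stand-in for the CRM, used only to exercise the check."""

    def __init__(self, reason_stamping_works: bool) -> None:
        self.reason_stamping_works = reason_stamping_works
        self.cases: dict[str, str] = {}  # case id -> case reason

    def create_case_from_email(self, to_address: str, subject: str) -> str:
        case_id = f"case-{len(self.cases) + 1}"
        self.cases[case_id] = "SSL Abuse" if self.reason_stamping_works else ""
        return case_id

    def list_pending_cpr_case_ids(self) -> list[str]:
        return [cid for cid, reason in self.cases.items() if reason == "SSL Abuse"]


healthy, broken = FakeCrm(True), FakeCrm(False)
address = "cpr@example.com"  # placeholder address
healthy_ok = cpr_flow_still_works(healthy.create_case_from_email, healthy.list_pending_cpr_case_ids, address)
broken_ok = cpr_flow_still_works(broken.create_case_from_email, broken.list_pending_cpr_case_ids, address)
print(healthy_ok, broken_ok)  # prints: True False
```

Because the check takes the intake and queue lookups as parameters, it asserts on what the abuse team actually sees, not on any intermediate field.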
Contributing Factor 3: Gap in communications and understanding of requirements between SF / CRM team and SSL Abuse team
- Description:
It’s become clear during our investigation that the SF/CRM team was not sufficiently aware of the flow of CPR cases, and the requirements which apply to us for responding to these in a timely fashion.
- Timeline: This was detected in October 2025.
- Detection: Self-detected as part of our investigation.
- Interaction with other factors: The gap in understanding of the requirements around CPRs directly caused the flow of CPRs to go untested when changes were made to the Case Origin field.
Lessons Learned
- What went well:
  - Once identified, we were quickly able to set up a workaround to capture backlog and future cases.
- What didn’t go well:
  - With the Case Origin logic change, we did not confirm whether there was any impact on the CPR processing logic.
  - We detected gaps between responsible departments in understanding the requirements in place.
- Where we got lucky:
  - One of the third-party reporters reached out through an out-of-band method.
Action Items
| Action Item | Kind | Corresponding Root Cause(s) | Evaluation Criteria | Due Date | Status |
|---|---|---|---|---|---|
| Restore Case Origin logic | Prevent | Contributing Factor 1 | By restoring the Case Origin logic, existing flows will return to normal operations. | 2025-10-21 | Completed |
| Add SLA with automated monitoring and alerting | Prevent | Contributing Factor 1 | An automated SLA will be added that triggers an alert when a CPR was received at least 12 hours ago and has not yet been processed. The monitoring will use a less restrictive filter, which increases coverage and would have caught the cases affected by this bug. | 2025-11-30 | Ongoing |
| Add testing CPR flows as part of any SF/CRM changes | Prevent | Contributing Factor 2 | Verifying that the CPR logic still works after any change affecting cases, “email to case” logic, or filters should prevent a recurrence of this root cause. | 2025-10-30 | Ongoing |
| Iterate and clarify the requirements placed on the SSLAbuse team to the SF/CRM team | Prevent | Contributing Factor 3 | Already as part of this bug, the importance and requirements of CPRs have been made clear to the SF/CRM team, which is already leading to contributions from the team on further improvements, as outlined within these Action Items. | 2025-09-21 | Completed |
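The monitoring action item can be sketched as a periodic check for unanswered reports past the 12-hour threshold. This is a minimal Python sketch under stated assumptions, not the production implementation: `InboundReport`, `overdue_reports`, and the sample data are hypothetical, and the input is assumed to be selected by a broad filter (for example, everything sent to the CPR address) that does not depend on derived fields such as the case reason.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

ALERT_AFTER = timedelta(hours=12)  # threshold named in the action item

@dataclass
class InboundReport:
    case_id: str
    received_at: datetime
    first_response_at: Optional[datetime] = None  # None means not yet processed

def overdue_reports(reports: list[InboundReport], now: datetime) -> list[InboundReport]:
    """Return unanswered reports that are at least ALERT_AFTER old."""
    return [
        r for r in reports
        if r.first_response_at is None and now - r.received_at >= ALERT_AFTER
    ]

# Hypothetical sample data for illustration only.
now = datetime(2025, 10, 13, 8, 1, tzinfo=timezone.utc)
reports = [
    InboundReport("A", datetime(2025, 9, 18, 9, 51, tzinfo=timezone.utc)),
    InboundReport("B", now - timedelta(hours=2)),
    InboundReport("C", now - timedelta(days=3), first_response_at=now - timedelta(days=2, hours=20)),
]
print([r.case_id for r in overdue_reports(reports, now)])  # prints: ['A']
```

Alerting at 12 hours leaves half of the 24-hour window required by Section 4.9.5 to respond once the alert fires.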
Comment 2•3 months ago
Our third action item was completed as scheduled.
Based on the final remaining action item, we would like to request a next-update for 2025-11-30.
Comment 3•2 months ago
Yesterday our automated monitoring and alerting changes for CPRs went into production, completing our final action item.
We intend to post a Report Closure Summary by this time next week.
Comment 4•2 months ago
Report Closure Summary
- Incident description:
A logic change to our Certificate Problem Report email queue resulted in us failing to reply to several Certificate Problem Reports (CPR) within the required 24 hours.
- Incident Root Cause(s):
A logic change to the Case Origin field within our CRM, combined with insufficient testing of the CPR email queue, led to this incident.
- Remediation description:
We have restored the Case Origin logic, added testing of CPR flows as part of any CRM changes, and added an automated SLA with alerting and escalation methods to the CPR queue.
- Commitment summary:
Sectigo is committed to keeping these changes in place and to maintaining awareness within our organization of the expectations regarding CPRs.
All Action Items disclosed in this report have been completed as described, and we request its closure.
Comment 5•2 months ago
This is a final call for comments or questions on this Incident Report.
Otherwise, it will be closed on approximately 2025-12-10.