Open Bug 1933353 Opened 3 months ago Updated 13 days ago

IdenTrust: Incorrect response for OCSP validation

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

ASSIGNED

People

(Reporter: roots, Assigned: roots)

Details

(Whiteboard: [ca-compliance] [ocsp-failure] Next update 2025-03-03)

Attachments

(1 file)

203.88 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36 Edg/131.0.0.0

Steps to reproduce:

Preliminary Incident Report

Summary

On November 23, 2024, during our maintenance window, we discovered that a limited number of TLS certificates were receiving an “unauthorized” OCSP response. The issue was remediated quickly after discovery. A complete incident report will be provided by Friday, December 6, 2024.

Assignee: nobody → roots
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [ca-compliance] [ocsp-failure]
Type: defect → task

Complete Incident Report

Summary

On November 23, 2024, during our maintenance window, we discovered that a limited number of TLS certificates were receiving an “unauthorized” OCSP response, which is a violation of our CPS Section 9.6.1, CA Representations and Warranties:
… Maintaining an online 24x7 publicly accessible Repository with current information regarding the status (valid or revoked) of all unexpired Certificates…

Impact

6,450 certificates possibly returning an ‘unauthorized’ OCSP response.
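For context, “unauthorized” is one of the error statuses defined for OCSP in RFC 6960: the responder returns only a responseStatus of 6 and no signed response body. The sketch below reads that status from a DER-encoded response using only the Python standard library. The function name and sample bytes are illustrative and are not taken from IdenTrust's systems.

```python
# OCSPResponseStatus values from RFC 6960, section 4.2.1 (value 4 is unused).
OCSP_STATUS = {
    0: "successful",
    1: "malformedRequest",
    2: "internalError",
    3: "tryLater",
    5: "sigRequired",
    6: "unauthorized",
}


def ocsp_response_status(der: bytes) -> str:
    """Return the responseStatus name from a DER-encoded OCSPResponse.

    Only the outer SEQUENCE header and the leading ENUMERATED are read;
    the signature and the certificate status are not checked.
    """
    if len(der) < 5 or der[0] != 0x30:
        raise ValueError("not a DER SEQUENCE")
    # Skip the SEQUENCE length: the short form is one byte, the long form
    # is 0x80 | n followed by n length bytes.
    pos = 2 if der[1] < 0x80 else 2 + (der[1] & 0x7F)
    if len(der) < pos + 3 or der[pos:pos + 2] != b"\x0a\x01":
        raise ValueError("expected a one-byte ENUMERATED responseStatus")
    code = der[pos + 2]
    return OCSP_STATUS.get(code, f"unknown({code})")


# An error response carries no responseBytes, so the whole message is 5 bytes.
UNAUTHORIZED = bytes.fromhex("30 03 0a 01 06")
print(ocsp_response_status(UNAUTHORIZED))  # unauthorized
```

A monitoring probe could apply a check like this to each responder reply and raise an alert on any status other than "successful".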

Timeline – All times are UTC

2024-11-23 16:06:00 – Initiated scheduled change control
2024-11-23 16:48:05 – Issue was introduced into the REST API
2024-11-23 18:14:49 – OCSP traffic was diverted from cloud responders to internal responders, which responded with “unauthorized”
2024-11-23 19:09:06 – External monitoring system reported validation error “unauthorized” from OCSP responders for newly issued certificates
2024-11-23 22:12:00 – Change control team was alerted and started investigation
2024-11-23 22:39:40 – Traffic was diverted to cloud responders and validation services returned to normal
2024-11-24 00:09:00 – Issue identified with the REST API and deployment rolled back
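
The intervals implied by the timeline can be checked with simple arithmetic. The snippet below uses only the timestamps listed above; the variable names are illustrative.

```python
from datetime import datetime, timezone

FMT = "%Y-%m-%d %H:%M:%S"


def utc(stamp: str) -> datetime:
    """Parse a timeline entry as a timezone-aware UTC datetime."""
    return datetime.strptime(stamp, FMT).replace(tzinfo=timezone.utc)


diverted_to_internal = utc("2024-11-23 18:14:49")
first_external_alert = utc("2024-11-23 19:09:06")
team_alerted = utc("2024-11-23 22:12:00")
restored_to_cloud = utc("2024-11-23 22:39:40")

# Time during which internal responders served OCSP traffic.
exposure = restored_to_cloud - diverted_to_internal
# Gap between the first external alert and the start of the investigation.
alert_to_response = team_alerted - first_external_alert

print(exposure)           # 4:24:51
print(alert_to_response)  # 3:02:54
```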

Root Cause Analysis

IdenTrust employs both internal and cloud OCSP responders, with requests being load balanced between them. The internal and cloud OCSP responders operate on distinct architectural models.

During our change window on Saturday, November 23, 2024, new API code was deployed to production, directing all OCSP traffic to the cloud responders. This update caused an issue with reconciling database time zone differences, which was discovered when OCSP traffic was redirected back to the internal responders. The issue only affected the internal responders, while the cloud responders remained unaffected.
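
The report does not describe the defect in more detail. As a generic illustration only, the sketch below shows how a time zone mismatch between two data stores can make a freshly written record appear to lie in the future, so that a lookup for a newly issued certificate fails. The offset, function names, and timestamps are invented for the example and do not describe IdenTrust's implementation.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical: the reader assumes stored values are in a fixed local
# offset (UTC-7 here), while the writer actually stored them in UTC.
LOCAL_OFFSET = timezone(timedelta(hours=-7))


def to_utc(naive: datetime, source_tz: timezone) -> datetime:
    """Attach the source zone to a naive timestamp and convert it to UTC."""
    if naive.tzinfo is not None:
        raise ValueError("expected a naive timestamp")
    return naive.replace(tzinfo=source_tz).astimezone(timezone.utc)


def is_issued(issued_at_utc: datetime, now_utc: datetime) -> bool:
    """A record is visible once its issuance time is not in the future."""
    return issued_at_utc <= now_utc


# Written as 16:30 UTC without zone information, queried at 17:00 UTC.
stored = datetime(2024, 11, 23, 16, 30, 0)
now = datetime(2024, 11, 23, 17, 0, 0, tzinfo=timezone.utc)

misread = to_utc(stored, LOCAL_OFFSET)   # interpreted as 23:30 UTC
correct = to_utc(stored, timezone.utc)   # interpreted as 16:30 UTC

print(is_issued(misread, now))  # False
print(is_issued(correct, now))  # True
```

Storing and comparing timezone-aware UTC values throughout avoids this class of error.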

To prevent this issue from recurring, we have implemented the following measures:

  1. Added a new test scenario to our suite of pre-production tests.
  2. Fixed the REST API bug.
  3. Prioritized external alerts over internal alerts.
  4. Enhanced logging on internal responders to include serial numbers in all OCSP response logs, including those for 'unauthorized' responses.
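
A minimal sketch of the logging measure in item 4, assuming a Python-style logger: every response is logged with the certificate serial number, whatever the status. The logger name, message format, and serial number are hypothetical.

```python
import io
import logging


def log_ocsp_response(logger: logging.Logger, serial: int, status: str) -> None:
    """Record one OCSP response, always including the certificate serial."""
    # The serial is written as uppercase hex.
    logger.info("ocsp_response serial=%s status=%s", format(serial, "X"), status)


# Example wiring: send log lines to an in-memory buffer.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
logger = logging.getLogger("ocsp.responder.example")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

# Made-up serial number; error responses are logged like any other.
log_ocsp_response(logger, 0x40016F2A, "unauthorized")
print(buffer.getvalue(), end="")
# INFO ocsp_response serial=40016F2A status=unauthorized
```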

Lessons Learned

What went well

The team was alerted to the issue by external monitoring.

What didn't go well

  1. We identified the potentially affected certificates but couldn't determine if OCSP requests were made for them during the issue.
  2. The alert went unnoticed because it was part of the change control process.
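
Once responder logs carry serial numbers, the question in the first point reduces to a time filter and a set intersection. The sketch below assumes log records shaped as (timestamp, serial, status) tuples, which is a hypothetical format; the window boundaries come from the timeline and the sample data is made up.

```python
from datetime import datetime, timezone

# Period during which internal responders served traffic (UTC, from the timeline).
WINDOW_START = datetime(2024, 11, 23, 18, 14, 49, tzinfo=timezone.utc)
WINDOW_END = datetime(2024, 11, 23, 22, 39, 40, tzinfo=timezone.utc)


def serials_queried(records, affected, start=WINDOW_START, end=WINDOW_END):
    """Return affected serials that got an 'unauthorized' reply in the window.

    records: iterable of (timestamp, serial, status) tuples (hypothetical shape).
    affected: collection of serial strings for potentially affected certificates.
    """
    hit = {
        serial
        for when, serial, status in records
        if start <= when <= end and status == "unauthorized"
    }
    return hit & set(affected)


# Made-up sample data, not taken from the incident.
sample = [
    (datetime(2024, 11, 23, 18, 30, tzinfo=timezone.utc), "0A01", "unauthorized"),
    (datetime(2024, 11, 23, 19, 0, tzinfo=timezone.utc), "0B02", "successful"),
    (datetime(2024, 11, 23, 23, 15, tzinfo=timezone.utc), "0C03", "unauthorized"),
]
print(serials_queried(sample, {"0A01", "0B02", "0C03"}))  # {'0A01'}
```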

Where we got lucky

Our cloud responders were not affected, and we have the ability to easily redirect traffic to them.

Action Items

Action Item                                     | Kind                  | Due Date
Add new test scenario                           | Preventive            | Done
Fix the bug in the REST API                     | Corrective            | 2024-12-31
Prioritize external alerts over internal alerts | Preventive/Corrective | 2024-12-31
Add additional logging                          | Corrective/Preventive | 2025-02-28

Appendix

Details of affected certificates

See attached file

We resolved the REST API issue on December 19, 2024, and are working on prioritizing external and internal alerts.
We will provide another status update by January 31, 2025.

Whiteboard: [ca-compliance] [ocsp-failure] → [ca-compliance] [ocsp-failure] Next update 2025-01-31

We’ve improved the alerts system. Existing alerts will continue to notify the on-call operator as usual. Additionally, they will now send a notification to a designated channel. This ensures all relevant stakeholders are promptly informed of any issues, allowing them to monitor and verify that each alarm is addressed and resolved efficiently.
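
The fan-out described here can be sketched as a dispatcher that delivers each alert to every configured destination. This is an illustration under assumptions: the two in-memory lists stand in for the on-call notification and the designated channel, and none of the names reflect the actual tooling.

```python
from typing import Callable, List

Sink = Callable[[str], None]


def make_dispatcher(sinks: List[Sink]) -> Callable[[str], int]:
    """Return a function that sends each alert to every sink.

    The returned function reports how many sinks accepted the alert.
    """
    def dispatch(message: str) -> int:
        delivered = 0
        for sink in sinks:
            try:
                sink(message)
                delivered += 1
            except Exception:
                # A failing sink must not block delivery to the others.
                continue
        return delivered
    return dispatch


# Stand-ins for the real integrations (both hypothetical).
on_call_inbox: List[str] = []
channel_feed: List[str] = []

notify = make_dispatcher([on_call_inbox.append, channel_feed.append])
notify("external monitor: OCSP responder returned 'unauthorized'")
```

Delivering to both destinations means a missed page is still visible to everyone watching the channel.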

Please provide an update on all open action items.

Flags: needinfo?(roots)
Whiteboard: [ca-compliance] [ocsp-failure] Next update 2025-01-31 → [ca-compliance] [ocsp-failure]

(In reply to Ben Wilson from comment #5)

> Please provide an update on all open action items.

The only remaining task for this issue is to add logging from OCSP responders. This has been scheduled to be completed by Saturday, March 1, 2025.

Flags: needinfo?(roots)
Whiteboard: [ca-compliance] [ocsp-failure] → [ca-compliance] [ocsp-failure] Next update 2025-03-03