IdenTrust: Incorrect response for OCSP validation
Categories
(CA Program :: CA Certificate Compliance, task)
Tracking
(Not tracked)
People
(Reporter: roots, Assigned: roots)
Details
(Whiteboard: [ca-compliance] [ocsp-failure] Next update 2025-03-03)
Attachments
(1 file)
203.88 KB,
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
Preliminary Incident Report
Summary
On November 23, 2024, during our maintenance window, we discovered that our OCSP responders returned an “unauthorized” response for a limited number of TLS certificates. The issue was remediated quickly after discovery. A complete incident report will be provided by Friday, December 6, 2024.
Complete Incident Report
Summary
On November 23, 2024, during our maintenance window, we discovered that a limited number of TLS certificates were receiving an “unauthorized” OCSP response, which violates our CPS Section 9.6.1, CA Representations and Warranties:
… Maintaining an online 24x7 publicly accessible Repository with current information regarding the status (valid or revoked) of all unexpired Certificates…
Impact
Up to 6,450 certificates may have returned an ‘unauthorized’ OCSP response.
Timeline – All times are UTC
2024-11-23 16:06:00 - Initiated scheduled change control
2024-11-23 16:48:05 - Issue was introduced into the REST API
2024-11-23 18:14:49 - OCSP traffic was diverted from cloud responders to internal responders, which responded with “unauthorized”
2024-11-23 19:09:06 - External monitoring system reported validation error “unauthorized” from OCSP responders for newly issued certificates
2024-11-23 22:12:00 - Change control team was alerted and started investigation
2024-11-23 22:39:40 - Traffic was diverted to cloud responders and validation services returned to normal
2024-11-24 00:09:00 - Issue identified with the REST API and deployment rolled back
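The intervals implied by the timeline can be computed directly from the timestamps above. The sketch below is only an illustration: the variable names are ours, and treating the 18:14:49 diversion and the 22:39:40 restoration as the bounds of the exposure window is an assumption drawn from the timeline entries, not a figure stated in the report.

```python
from datetime import datetime, timezone


def utc(ts: str) -> datetime:
    """Parse a timeline timestamp; all timeline times are UTC."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)


# Timestamps copied from the timeline above.
diverted_to_internal = utc("2024-11-23 18:14:49")
external_alert = utc("2024-11-23 19:09:06")
team_alerted = utc("2024-11-23 22:12:00")
restored_to_cloud = utc("2024-11-23 22:39:40")

# Assumed exposure window: internal responders were serving traffic.
exposure_window = restored_to_cloud - diverted_to_internal
# Gap between the external monitoring alert and the start of investigation.
alert_to_investigation = team_alerted - external_alert

print(exposure_window)         # 4:24:51
print(alert_to_investigation)  # 3:02:54
```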
Root Cause Analysis
IdenTrust employs both internal and cloud OCSP responders, with requests being load balanced between them. The internal and cloud OCSP responders operate on distinct architectural models.
During our change window on Saturday, November 23, 2024, new API code was deployed to production, directing all OCSP traffic to the cloud responders. This update caused an issue with reconciling database time zone differences, which was discovered when OCSP traffic was redirected back to the internal responders. The issue only affected the internal responders, while the cloud responders remained unaffected.
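The report does not describe the defect itself, so the following is only a hypothetical illustration of how a time zone mismatch between a database and a responder can affect newly issued certificates. It assumes the database stores issuance times as naive UTC values and that the responder wrongly interprets them in a local zone of UTC-7. The offset, the function names, and the "unknown if issued in the future" rule are all assumptions for the sake of the example, not IdenTrust's implementation.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical local zone of the responder (an assumption for this sketch).
RESPONDER_ZONE = timezone(timedelta(hours=-7))


def is_known_buggy(issued_naive_utc: datetime, now_utc: datetime) -> bool:
    """Buggy check: labels a naive UTC value with the responder's local zone."""
    issued = issued_naive_utc.replace(tzinfo=RESPONDER_ZONE)
    return issued <= now_utc


def is_known_fixed(issued_naive_utc: datetime, now_utc: datetime) -> bool:
    """Fixed check: labels the naive value as UTC before comparing."""
    issued = issued_naive_utc.replace(tzinfo=timezone.utc)
    return issued <= now_utc


# A certificate issued five minutes before the status request.
issued = datetime(2024, 11, 23, 17, 0, 0)
now = datetime(2024, 11, 23, 17, 5, 0, tzinfo=timezone.utc)

print(is_known_buggy(issued, now))  # False: issuance appears to be in the future
print(is_known_fixed(issued, now))  # True
```

Under these assumptions only recently issued certificates are affected, because older issuance times still fall in the past even after the erroneous seven-hour shift.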
To prevent this issue from recurring, we have implemented the following measures:
- Added a new test scenario to our suite of pre-production tests.
- Fixed the REST API bug.
- Prioritized external alerts over internal alerts.
- Enhanced logging on internal responders to include serial numbers in all OCSP response logs, including those for 'unauthorized' responses.
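As a minimal sketch of the logging measure above, the example below records the certificate serial number alongside the response status for every OCSP response, including 'unauthorized' ones. The logger name, the message format, and the helper functions `build_ocsp_logger` and `log_ocsp_response` are hypothetical; the report does not specify how the responders log.

```python
import io
import logging


def build_ocsp_logger(stream) -> logging.Logger:
    """Create a logger that writes one line per OCSP response to `stream`."""
    logger = logging.getLogger("ocsp.responder.sketch")  # hypothetical name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False
    return logger


def log_ocsp_response(logger: logging.Logger, serial_hex: str, status: str) -> None:
    """Log the serial number for every response; non-successful ones as warnings."""
    level = logging.INFO if status == "successful" else logging.WARNING
    logger.log(level, "ocsp_response serial=%s status=%s", serial_hex, status)


buffer = io.StringIO()
log = build_ocsp_logger(buffer)
log_ocsp_response(log, "0a1b2c", "successful")    # made-up serial numbers
log_ocsp_response(log, "0a1b2d", "unauthorized")
print(buffer.getvalue(), end="")
```

With serial numbers in every log line, it becomes possible to check after an incident whether status requests were actually made for a given certificate.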
Lessons Learned
What went well
The team was alerted to the issue by external monitoring.
What didn't go well
- We identified the potentially affected certificates but couldn't determine if OCSP requests were made for them during the issue.
- The alert went unnoticed because it was part of the change control process.
Where we got lucky
Our cloud responders were not affected, and we have the ability to easily redirect traffic to them.
Action Items
Action Item | Kind | Due Date |
---|---|---|
Add new test scenario | Preventive | Done |
Fix the bug in the REST API | Corrective | 2024-12-31 |
Prioritize external alerts over internal alerts | Preventive/Corrective | 2024-12-31 |
Add additional logging | Corrective/Preventive | 2025-02-28 |
Appendix
Details of affected certificates
See attached file
We resolved the REST API issue on December 19, 2024, and are working on prioritizing external and internal alerts.
We will provide another status update by January 31, 2025.
We’ve improved the alerting system. Existing alerts will continue to notify the on-call operator as usual and will now also send a notification to a designated channel. This keeps all relevant stakeholders informed of any issues so they can monitor and verify that each alarm is addressed and resolved.
Comment 5•17 days ago
Please provide an update on all open action items.
(In reply to Ben Wilson from comment #5)
Please provide an update on all open action items.
The only remaining task for this issue is to add logging from OCSP responders. This has been scheduled to be completed by Saturday, March 1, 2025.