Open Bug 1917459 Opened 15 days ago Updated 14 days ago

emSign PKI Services : OCSP Responder Time Inconsistency

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

ASSIGNED

People

(Reporter: naveen.ml, Assigned: naveen.ml)

Details

(Whiteboard: [ca-compliance] [ocsp-failure])

Attachments

(2 files)

Incident Report

Summary

An external researcher notified emSign CA of an inconsistency in the OCSP responses for a recently revoked certificate: the revocation timestamps in the OCSP response and in the CRL for that certificate differed by 12 hours. Our investigation determined that, because of an incorrect configuration, the OCSP responder applied a faulty IST-to-UTC time conversion to certificates revoked between 12:00 and 23:59 IST, while the CRL timestamps were accurate. The investigation also found that the OCSP responders were updating at the correct frequency; the time conversion issue only made it appear that responses were not being updated at appropriate intervals. In summary, the production deployment process lacked sufficient review of configuration settings, which allowed incorrect time conversion logic to reach production, specifically for the OCSP responder.

Impact

The impact of this incident was minimal: no customers reported any issues validating the revocation status of the certificates in question. Both the CRL and the OCSP responder reported the correct revocation status. Certificates revoked during the affected window were consistently marked as revoked by both the CRL and the OCSP responder, although the revocation timestamp in the OCSP response differed by 12 hours from the one in the CRL. Exposure was limited to the certificates revoked during that window; no other services or certificates were affected.

Timeline

All times are IST.

2024-09-05 18:10: An external researcher alerted emSign CA to a discrepancy in the OCSP response for a recently revoked certificate.
2024-09-05 19:00: emSign CA initiated an investigation into the revocation timings between OCSP and CRL.
2024-09-06 15:45: The investigation team identified that certificates revoked between 12:00 and 23:59 IST were impacted by an incorrect IST-to-UTC time conversion in the OCSP responder configuration. It was also confirmed that the CRL timestamps were accurate.
2024-09-06 19:30: We formally raised an incident (Incident No. EMINCPKI0020) after confirming the discrepancy, and an internal investigation began to determine the root cause of the OCSP and CRL inconsistency. The relevant teams were notified to evaluate the issue’s severity and impact.
2024-09-06 21:30: It was confirmed that the issue resulted from an incorrect configuration for time conversion being moved to production as part of a recent release.
2024-09-06 23:30: A fix was developed and successfully tested in the staging environment.
2024-09-07 00:15: The corrected configuration was deployed to the production environment.
2024-09-07 00:30: Full synchronization between OCSP and CRL was achieved, with enhanced monitoring and additional maker/checker controls implemented as part of configuration management to prevent future discrepancies.

Root Cause Analysis

The issue was caused by a flaw in the configuration responsible for converting certificate timestamps from IST to UTC. For certificates revoked between 12:00 and 23:59 IST, this flaw caused an incorrect revocation time to be published to the OCSP responder. In addition, the lack of a thorough configuration review process allowed the incorrect configuration to be moved to production, resulting in the discrepancy described above.
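As an illustration of the conversion described above, here is a minimal Python sketch. The report does not disclose the actual configuration or code, so everything here is an assumption: `ist_to_utc` shows one correct way to convert, and `lossy_round_trip` shows one hypothetical mistake (serializing with a 12-hour clock and no AM/PM marker) that would produce a 12-hour error only for times from 12:00 to 23:59. It is not claimed to be the defect emSign found.

```python
from datetime import datetime, timedelta, timezone

# IST is a fixed UTC+05:30 offset with no daylight saving time.
IST = timezone(timedelta(hours=5, minutes=30), "IST")


def ist_to_utc(local: datetime) -> datetime:
    """Convert a naive IST wall-clock time to a timezone-aware UTC datetime."""
    if local.tzinfo is not None:
        raise ValueError("expected a naive IST timestamp")
    return local.replace(tzinfo=IST).astimezone(timezone.utc)


def lossy_round_trip(local: datetime) -> datetime:
    """Hypothetical bug: write the time with a 12-hour clock and no AM/PM
    marker, then read it back. Afternoon and evening hours lose 12 hours."""
    text = local.strftime("%Y-%m-%d %I:%M:%S")  # %I is the 12-hour clock
    return datetime.strptime(text, "%Y-%m-%d %I:%M:%S")


if __name__ == "__main__":
    # Sample hours on an arbitrary day; these are not real revocation times.
    for hour in (0, 9, 12, 18):
        original = datetime(2024, 9, 5, hour, 10)
        drift = original - lossy_round_trip(original)
        print(f"{original:%H:%M} IST -> {ist_to_utc(original):%H:%M} UTC, "
              f"drift from hypothetical bug: {drift}")
```

Using an explicit, timezone-aware conversion and a 24-hour format throughout avoids this class of error, whatever the real cause was.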

Lessons Learned

This incident highlighted the importance of thorough configuration management, review, and deployment validation, especially for time-sensitive operations such as certificate issuance and revocation. A lack of attention to detail in the time zone conversion from IST to UTC resulted in a 12-hour discrepancy between the timestamps in OCSP and CRL responses, with the OCSP responder reporting the incorrect time. Further, configuration changes were not adequately tested against real-world scenarios, including time zone variations, so the issue went unnoticed.

The incident further emphasized the need for quicker communication channels with relevant stakeholders, ensuring timely detection and resolution of potential issues. Additionally, a better-defined incident escalation process could have expedited the internal response and reduced the time to resolve the problem.

Improving our development and configuration management practices as part of deployment will prevent similar issues in the future and enhance overall system security and reliability.

What went well

• Prompt notification from an external researcher allowed the issue to be identified early.
• The investigation team quickly identified the root cause of the problem and implemented a fix within a short time frame.
• Strong collaboration between the development, security, and operations teams during the investigation and remediation process.
• There was no customer impact, as both the CRL and the OCSP responses marked the certificate as revoked.

What didn't go well

The configuration review process did not detect the error in the time conversion logic, so flawed logic was deployed to production. Additionally, the testing environment did not sufficiently replicate real-world scenarios, which allowed the bug to remain undetected.

Where we got lucky

The issue was isolated to certificates revoked during a specific time window, limiting the scope of the impact. No reports of exploitation or security incidents were associated with this discrepancy for any revoked certificates.

Action Items

| Action Item | Kind | Due Date |
| --- | --- | --- |
| Correct the configuration for IST-to-UTC conversions | Mitigated | 2024-09-06 |
| Implement a stricter configuration review process for time-sensitive configurations | Prevent | 2024-09-07 |
| Conduct a thorough audit of configuration management for OCSP/CRL-related changes and deployments | Prevent | 2024-09-12 |
| Enhance testing to simulate scenarios relating to revocation and the behaviour of OCSP and CRL responses | Prevent | 2024-09-19 |
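
The testing action item could include an automated consistency check between the two revocation sources. The following Python sketch is only an illustration under stated assumptions: the function names, the one-second tolerance, and the sample records are all hypothetical, and it assumes the OCSP and CRL revocation times have already been parsed into timezone-aware datetimes by whatever tooling the CA uses.

```python
from datetime import datetime, timedelta, timezone


def revocation_times_agree(ocsp_time: datetime,
                           crl_time: datetime,
                           tolerance: timedelta = timedelta(seconds=1)) -> bool:
    """Return True when the two revocation timestamps match within tolerance."""
    if ocsp_time.tzinfo is None or crl_time.tzinfo is None:
        raise ValueError("both timestamps must be timezone-aware")
    return abs(ocsp_time - crl_time) <= tolerance


def find_mismatches(records: dict) -> list:
    """records maps a serial number to (ocsp_time, crl_time).
    Return the sorted serial numbers whose timestamps disagree."""
    return sorted(serial for serial, (ocsp, crl) in records.items()
                  if not revocation_times_agree(ocsp, crl))


# Made-up sample data for illustration only; not taken from the incident.
utc = timezone.utc
sample = {
    "serial-a": (datetime(2024, 9, 5, 3, 30, tzinfo=utc),
                 datetime(2024, 9, 5, 3, 30, tzinfo=utc)),
    "serial-b": (datetime(2024, 9, 5, 0, 40, tzinfo=utc),
                 datetime(2024, 9, 5, 12, 40, tzinfo=utc)),
}

if __name__ == "__main__":
    print(find_mismatches(sample))
```

Running such a check against test revocations performed at several times of day, including after 12:00 IST, would cover the window affected in this incident.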

Based on Incident Reporting Template v. 2.0

For what it's worth, I was the reporter of this issue, and for transparency, I'm sharing the OCSP replies and CRL, as well as the report I sent to emSign, here.

This file contains the 4 different OCSP responses I saw at the time of my report, plus the CRL file, each in both raw format and openssl-decoded form.

Assignee: nobody → naveen.ml
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [ca-compliance] [ocsp-failure]