Closed Bug 1903823 Opened 3 months ago Closed 2 months ago

WISeKey: OCSP responding "Unauthorized" for a TLS certificate

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: pfuentes, Assigned: pfuentes)

Details

(Whiteboard: [ca-compliance] [ocsp-failure])

Steps to reproduce:

We are opening this incident as a placeholder and we will publish a full incident report early next week.

Today our internal monitoring system raised an alarm because a TLS certificate was listed in OCSP Watch with "error from server: unauthorized".
The SSLMate entry pointed to a different certificate, not under our Root, so we assumed it was a false positive and reported it. SSLMate confirmed that there was indeed a problem with one of our certificates, and that the link in the entry was incorrect.

The certificate (https://crt.sh/?id=13445735643) is not revoked (and it must not be) and we are checking the root cause of this situation.

We will write a full incident report, using the appropriate format, in the coming days, once our investigation has progressed further.

No other certificates seem to be affected. Certificate issuance and revocation services are working as usual.

Assignee: nobody → pfuentes
Status: UNCONFIRMED → ASSIGNED
Type: defect → task
Ever confirmed: true
Whiteboard: [ca-compliance] [ocsp-failure]

Hello,
just a heads-up about this.

We are finalising our investigation into the root cause of the problem (in summary, an error prevented the final certificate from being fully issued). Given that the issue affects only a pre-certificate, we are considering requesting that this bug be closed as INVALID.

The rationale for this request, if we finally make it, would be based on:

  • The BRs make it optional to respond for reserved serial numbers (Section 4.9.10, "The OCSP responder MAY provide definitive responses about “reserved” certificate serial numbers, as if there was a corresponding Certificate that matches the Precertificate [RFC6962].")
  • Similar bugs such as https://bugzilla.mozilla.org/show_bug.cgi?id=1580393#c8

We may need a bit more time before we post here again. If in the meantime anyone wants to chime in about the issue (an unauthorised response for a pre-certificate where the certificate doesn't exist), comments are welcome!

Hi Pedro,

I believe bug 1580393 was quite a few years ago, before some changes were made to the Mozilla Policy.

Specifically, Section 5.4 states that

a CA MUST provide CRL and OCSP services and responses in accordance with this policy for all certificates presumed to exist based on the presence of a precertificate, even if the certificate does not actually exist.

I believe the BR language about the reserved status would fulfill this requirement, but an unauthorized response would not.

I agree with Martijn's analysis of the MRSP. Additionally, the OCSP carve-out for Reserved serials in the BRs is ineffective (and I filed this bug to remove it a year ago) because Section 7.1.2.9 also says

The existence of a signed Precertificate can be treated as evidence of a corresponding Certificate also existing.

Since all Precertificates are evidence of the corresponding Certificate existing, then all serials used in precertificates have evidence that they are actually Assigned, not just Reserved, and therefore OCSP responders MUST provide responses for them.

Thanks, Martijn and Aaron, for your comments.

Incident Report

Summary

A certificate issuance error occurred due to a race condition in EJBCA when a client application simultaneously requested two certificates for the same user. This resulted in one certificate request not being fully processed and not correctly persisted in our CA and OCSP database.

Impact

The impacted certificate was not initially persisted correctly, although it was stored as a precertificate in the EJBCA IncompleteIssuanceJournalData table and later manually published. Additionally, the certificate was not automatically published into the OCSP database, so the OCSP service was responding "unauthorized" for this certificate.

Timeline

All times are UTC.

2024-06-19

  • 17:16 Certificate issuance attempted but affected by a concurrency issue.

2024-06-20

  • 02:02 A notification was sent from our monitoring tool indicating changes in OISTE's OCSP Watch feed.
  • 04:39 A report was sent to SSLMate indicating that a URL in the entry that appeared in our feed was incorrect and we (incorrectly) assumed this was a false positive.
  • 11:46 SSLMate responded acknowledging that the URL was indeed incorrect but the problem still existed for an OISTE certificate.
  • 12:18 Investigation started.

2024-06-21

  • 02:09 Precertificate manually published to the CA and OCSP database.

2024-06-22

  • 02:02 A notification was sent from our monitoring tool indicating recovery in OISTE's OCSP Watch feed.

2024-06-24

  • 17:37 Automated monitoring implemented for the IncompleteIssuanceJournalData EJBCA table to get an alert if any precertificate gets registered there.
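
The monitoring added on 2024-06-24 can be pictured as a periodic query against the journal table. The following is only a minimal sketch under stated assumptions: an in-memory SQLite database stands in for the EJBCA database, and the column names (`serial_number`, `created_at`), the sample serial, and the functions `demo_database` and `check_incomplete_issuance` are hypothetical. They are not EJBCA's actual schema or API, and the report does not describe how the real check was built.

```python
import sqlite3
from typing import List


def demo_database() -> sqlite3.Connection:
    """Create an in-memory stand-in for the journal table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE IncompleteIssuanceJournalData ("
        "serial_number TEXT PRIMARY KEY, created_at TEXT NOT NULL)"
    )
    return conn


def check_incomplete_issuance(conn: sqlite3.Connection) -> List[str]:
    """Return one alert message per row currently in the journal table."""
    rows = conn.execute(
        "SELECT serial_number, created_at "
        "FROM IncompleteIssuanceJournalData ORDER BY created_at"
    ).fetchall()
    return [
        f"ALERT: precertificate {serial} pending since {created}"
        for serial, created in rows
    ]


if __name__ == "__main__":
    conn = demo_database()
    print(check_incomplete_issuance(conn))  # empty table, no alerts
    # Simulate an issuance that stopped after the precertificate stage.
    conn.execute(
        "INSERT INTO IncompleteIssuanceJournalData VALUES (?, ?)",
        ("0A1B2C", "2024-06-19T17:16:00Z"),  # made-up serial
    )
    for alert in check_incomplete_issuance(conn):
        print(alert)
```

Run on a schedule, a check of this kind would alert on any row that lingers in the table, which is the signal the action item describes.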

Root Cause Analysis

The root cause was a race condition in EJBCA triggered by simultaneous certificate requests for the same EJBCA user from the same client application. EJBCA is apparently not equipped to handle such concurrent operations.
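
One common way for a client application to avoid this kind of race is to serialize requests per user, so that two requests for the same EJBCA user are never in flight at once. The report does not say how the client was actually changed, so the sketch below is only an illustration of that general technique: the class `PerUserSerializer`, the user name, and the `fake_request` callable are all hypothetical.

```python
import threading
import time
from collections import defaultdict


class PerUserSerializer:
    """Allow at most one in-flight issuance request per user name."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the lock table itself
        self._locks = defaultdict(threading.Lock)

    def _lock_for(self, username):
        with self._guard:
            return self._locks[username]

    def submit(self, username, request_fn):
        # Requests for different users still run in parallel;
        # requests for the same user wait for each other.
        with self._lock_for(username):
            return request_fn()


def demo():
    """Fire two concurrent requests for one user; return peak concurrency."""
    serializer = PerUserSerializer()
    counter_lock = threading.Lock()
    state = {"active": 0, "max_active": 0}

    def fake_request():
        with counter_lock:
            state["active"] += 1
            state["max_active"] = max(state["max_active"], state["active"])
        time.sleep(0.05)  # stands in for the round trip to the CA
        with counter_lock:
            state["active"] -= 1
        return "issued"

    threads = [
        threading.Thread(target=serializer.submit, args=("user-1", fake_request))
        for _ in range(2)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["max_active"]


if __name__ == "__main__":
    print("peak concurrent requests for user-1:", demo())
```

A lock held in one process only helps if all requests for a user pass through that process; a client running as several instances would need a shared lock or a queue instead.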

Lessons Learned

What went well

  • Our monitoring systems detected the issue and manual intervention allowed for the certificate to be published.

What didn't go well

  • Our client application could not prevent a race condition with concurrent requests for the same EJBCA user, resulting in a certificate that failed to be fully issued and published to the OCSP database.

Where we got lucky

  6. Revocation
    ...
    For end entity certificates, if the CA provides revocation information via an Online Certificate Status Protocol (OCSP) service:
  • it MUST update that service at least every four days;

For this reason, and according to our understanding, this would not warrant an Incident Report and the bug should be closed as INVALID, but we are providing the Incident Report here anyway to allow other CAs to learn from our experience.

Action Items

Action Item | Kind | Due Date
The precertificate was manually published to the CA and OCSP database. | Correct | 2024-06-21
Implement monitoring for the IncompleteIssuanceJournalData EJBCA table to get an alert if any precertificate gets registered there. | Prevent | 2024-06-24
Improve the EJBCA client application to avoid issues related to concurrency. | Prevent | 2024-07-25
Study and possibly enable the EJBCA Pre-Certificate Maintenance Service (https://docs.keyfactor.com/ejbca/latest/pre-certificate-maintenance-service), which can automatically publish to the OCSP database precertificates whose issuance did not complete. | Prevent | 2024-07-25

Appendix

Details of affected certificates

BR 4.10.2:

The CA SHALL maintain an online 24x7 Repository that application software can use to automatically check the current status of all unexpired Certificates issued by the CA.

MRSP 6 paragraph 1:

CA operators MUST maintain an online 24x7 repository mechanism whereby application software can automatically check online the current status of all unexpired certificates issued by the CA.

MRSP 6 paragraph 3:

responses MUST have a defined value in the nextUpdate field

Unauthorized OCSP responses are error responses that contain neither a nextUpdate field nor the status of the certificate. Therefore they cannot satisfy the requirement of operating a 24x7 service to check the current status of a certificate.
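
To make the point concrete: under RFC 6960, an error response consists only of the `responseStatus` field, with no `responseBytes`, so there is nowhere for a certificate status or `nextUpdate` to appear. The sketch below reads that field directly from DER using only the standard library. The function names are illustrative, and the parser handles just the outer header; it is not a full OCSP decoder and verifies nothing.

```python
# responseStatus values defined in RFC 6960 (value 4 is not used).
OCSP_RESPONSE_STATUS = {
    0: "successful",
    1: "malformedRequest",
    2: "internalError",
    3: "tryLater",
    5: "sigRequired",
    6: "unauthorized",
}


def ocsp_response_status(der: bytes) -> str:
    """Read responseStatus from a DER-encoded OCSPResponse.

    OCSPResponse ::= SEQUENCE {
        responseStatus  ENUMERATED,
        responseBytes   [0] EXPLICIT ResponseBytes OPTIONAL }
    """
    if len(der) < 2 or der[0] != 0x30:
        raise ValueError("not a DER SEQUENCE")
    # Skip the SEQUENCE length, which may use the short or long form.
    offset = 2 + (der[1] & 0x7F) if der[1] & 0x80 else 2
    if len(der) < offset + 3:
        raise ValueError("truncated response")
    if der[offset] != 0x0A or der[offset + 1] != 0x01:
        raise ValueError("expected a one-byte ENUMERATED responseStatus")
    return OCSP_RESPONSE_STATUS.get(der[offset + 2], "unknown code")


def carries_cert_status(der: bytes) -> bool:
    """Only a 'successful' response includes responseBytes with a status."""
    return ocsp_response_status(der) == "successful"


if __name__ == "__main__":
    # The complete DER encoding of an 'unauthorized' response is five bytes.
    unauthorized = bytes.fromhex("30030a0106")
    print(ocsp_response_status(unauthorized))
    print(carries_cert_status(unauthorized))
```

The entire `unauthorized` response is the five bytes `30 03 0A 01 06`, which shows why it cannot express good, revoked, or a validity interval.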

The four day window applies when updating from one definitive response to another (e.g. good to revoked).

Therefore, this was a compliance violation.

Similar incidents: Bug 1753123, Bug 1758372

The race condition is interesting, and the fact that it impacts EJBCA implies that this could be a common configuration mistake. Do you have more details on that? I'll note that you seem to be awaiting a vendor fix and are still impacted, which allows mis-issuances to continue. I also see no mention of issuance being stopped during the investigation, before the band-aid for detection was applied.

I'm still not quite sure what you mean by the initial certificate not being fully processed. A pre-certificate is mentioned, so presumably at least one well-formed and signed certificate was created for linting and CT purposes. Given it hit the OCSP cache, I presume it was signed with an intermediary and thus was considered a valid certificate albeit with the poison extension applied?

(In reply to Pedro Fuentes from comment #4)

Where we got lucky

  6. Revocation
    ...
    For end entity certificates, if the CA provides revocation information via an Online Certificate Status Protocol (OCSP) service:
  • it MUST update that service at least every four days;

For this reason, and according to our understanding, this would not warrant an Incident Report and the bug should be closed as INVALID, but we are providing the Incident Report here anyway to allow other CAs to learn from our experience.

This is an interestingly unique interpretation of MRSP policy. How are you reconciling the acknowledgement of Comment 2 and Comment 3 specifically on MRSP interpretation with this view?

Flags: needinfo?(pfuentes)

Hello @Andrew, @Wayne,
we aren't challenging the need to treat this as an incident, even though I'm still unsure about the interpretation of the 4-day window. IMHO it may also be read as applying to the initial load of information into the OCSP responder DB. (To avoid misinterpretation of this comment: our systems update the OCSP information in real time, except in this particular case, because of the incident being disclosed.)

Anyhow, we opened this Bugzilla proactively from our side because we thought it was the right thing to do, so we don't have any issue with having this incident going through its natural process. Any comment that is received in the meantime is always enriching and an opportunity to learn.

Next week we will publish a progress report. As we said, issuance systems are working as expected and we have already implemented a first set of countermeasures that control the risk of recurrence of this problem.

Flags: needinfo?(pfuentes)

(In reply to Pedro Fuentes from comment #1)

The rationale for this request, if we finally make it, would be based on:

  • The BRs make it optional to respond for reserved serial numbers (Section 4.9.10, "The OCSP responder MAY provide definitive responses about “reserved” certificate serial numbers, as if there was a corresponding Certificate that matches the Precertificate [RFC6962].")
  • Similar bugs such as https://bugzilla.mozilla.org/show_bug.cgi?id=1580393#c8

That comment in the bug says, "Given the outcome of the discussion on the mozilla.dev.security.policy list," and if you follow the link to the discussion, you will see that in September 2019, Mozilla added the following policy, and previous incidents were closed only because they happened before the finalization of the policy: "A CA must provide OCSP services and responses in accordance with Mozilla policy for all certificates presumed to exist based on the presence of a Precertificate, even if the certificate does not actually exist." Ballot SC23: Precertificates was passed so that the BRs didn't conflict with Mozilla policy.

You can see this policy still exists in version 2.9 of the Mozilla Root Store Policy as Martijn said in comment #2.

A heads-up on the progress...

The action item "Improve the EJBCA client application to avoid issues related to concurrency", which was planned for 25 July, has already been implemented.

The pending item "Study and possibly enable the EJBCA Pre-Certificate Maintenance Service..." wouldn't be needed, but most likely we'd still complete it as a backup countermeasure.

We are closely monitoring all systems.

No update this week. The pending item may or may not be executed, depending on the outcome of some tests.

Regarding the link with the GoDaddy incident: our case is different because it was caused by a "bug" that needed correction and not by architecture design. Still, I think it would be beneficial to clarify the permitted lead times for updating information in the OCSP responder, both for the initial update after certificate issuance and for updates after revocation status changes. In particular, I think the applicability of the four-day period stated in the Mozilla Policy should be clarified.

Flags: needinfo?(bwilson)

I'm wondering whether a 15-minute latency for publication of OCSP responses for pre-certificates would be something that should be adopted either in the Mozilla Root Store Policy or by the CA/B Forum. I filed an issue in GitHub for this: https://github.com/mozilla/pkipolicy/issues/280.
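
For a sense of scale, the timeline in the incident report can be compared against both the 15-minute figure floated here and the four-day interval in the MRSP. This is only a rough worked calculation, assuming the period without a definitive OCSP response ran from the issuance attempt (2024-06-19 17:16 UTC) to the manual publication (2024-06-21 02:09 UTC); the report does not state the exact moment responses changed.

```python
from datetime import datetime, timedelta, timezone

# Timestamps taken from the incident timeline (UTC).
ISSUANCE_ATTEMPT = datetime(2024, 6, 19, 17, 16, tzinfo=timezone.utc)
MANUAL_PUBLICATION = datetime(2024, 6, 21, 2, 9, tzinfo=timezone.utc)

# Limits discussed in the thread.
PROPOSED_LIMIT = timedelta(minutes=15)  # a proposal, not adopted policy
FOUR_DAY_WINDOW = timedelta(days=4)     # MRSP OCSP update interval


def publication_gap(issued: datetime, published: datetime) -> timedelta:
    """Time between the issuance attempt and OCSP publication."""
    return published - issued


if __name__ == "__main__":
    gap = publication_gap(ISSUANCE_ATTEMPT, MANUAL_PUBLICATION)
    minutes = int(gap.total_seconds() // 60)
    print(f"gap: {gap} ({minutes} minutes)")
    print("exceeds 15-minute proposal:", gap > PROPOSED_LIMIT)
    print("exceeds four-day window:", gap > FOUR_DAY_WINDOW)
```

Under that assumption the gap is 32 hours 53 minutes (1973 minutes): far beyond a 15-minute limit, yet inside four days, which is why the reading of the four-day window matters to the discussion.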

No update this week. We will report next week on whether we have decided to execute the pending optional action item.

Regarding the proposed change in the MRSP, I think this should be discussed with a broader perspective (not only precerts) at the CABF because, the way I see it, there's an inconsistency with the 24-hour CRL update period when there's a revocation status change.

In the CA/B Forum - here is a draft proposal - https://github.com/cabforum/servercert/pull/535.

Flags: needinfo?(bwilson)

We have completed all the planned tasks, including the last optional action (enable the EJBCA Pre-Certificate Maintenance Service).

We don't foresee other actions related to this issue, and we request that the bug be scheduled for closure next week, leaving time for any further comments.

I intend to close this on or about next Wed. 2024-07-31, unless there are additional questions or comments.

Flags: needinfo?(bwilson)
Status: ASSIGNED → RESOLVED
Closed: 2 months ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED