Closed Bug 1649939 Opened 5 years ago Closed 5 years ago

WISeKey: Incorrect OCSP Delegated Responder Certificate

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: ryan.sleevi, Assigned: pfuentes)

Details

(Whiteboard: [ca-compliance] [ocsp-failure] [covid-19])

The following was originally reported to m.d.s.p. at https://www.mail-archive.com/dev-security-policy@lists.mozilla.org/msg13493.html

WISeKey has issued one or more OCSP Delegated Responders, as defined within RFC 6960, Section 2.6 and Section 4.2.2.2, without including the id-pkix-ocsp-nocheck extension, as required by the Baseline Requirements, Version 1, Section 13.2.5 through Version 1.7.0, Section 4.9.9.

Example certificate: https://crt.sh/?id=1490728458
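
The condition described in this report can be expressed as a simple profile check. The following is a minimal sketch, not tooling used by anyone in this thread: it assumes the certificate's fields have already been parsed into a plain dictionary (the `is_ca`, `eku` and `extensions` keys, the function name and the returned labels are all hypothetical), and it relies only on the two object identifiers involved, id-kp-OCSPSigning (1.3.6.1.5.5.7.3.9) and id-pkix-ocsp-nocheck (1.3.6.1.5.5.7.48.1.5).

```python
# Object identifiers as defined in RFC 6960.
ID_KP_OCSP_SIGNING = "1.3.6.1.5.5.7.3.9"
ID_PKIX_OCSP_NOCHECK = "1.3.6.1.5.5.7.48.1.5"


def classify_certificate(cert):
    """Classify an already-parsed certificate profile.

    `cert` is a hypothetical dict with keys:
      is_ca      -- bool, the basicConstraints cA flag
      eku        -- iterable of EKU OID strings (empty if there is no EKU extension)
      extensions -- iterable of OID strings for all extensions present
    """
    eku = set(cert.get("eku", ()))
    extensions = set(cert.get("extensions", ()))

    # Without the OCSP Signing EKU the certificate is not a delegated responder.
    if ID_KP_OCSP_SIGNING not in eku:
        return "not-a-delegated-responder"
    # With the EKU and the nocheck extension, the profile matches the requirement.
    if ID_PKIX_OCSP_NOCHECK in extensions:
        return "delegated-responder-with-nocheck"
    # EKU present, nocheck absent: the situation reported in this bug.
    if cert.get("is_ca", False):
        return "ca-with-ocsp-signing-eku-missing-nocheck"
    return "responder-missing-nocheck"
```

A sub-CA profile that lists the OCSP Signing EKU next to, say, the server authentication EKU, and that carries no nocheck extension, would fall into the third category.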

Please provide an incident report, including the timeline for revocation.

Hello,
we acknowledge the reception of this bug, but we need to properly assess the problem and impact.
In particular, the affected SubCAs are operated by the same entity that operates the Root, and therefore we don't perceive the severity of the potential security problem raised in the m.d.s.p.
Also, we consider that the need to introduce EKUs in subCAs, mandated recently by Mozilla and other Root programs, implies the need to consider all the types of leaf certificates issued by the CA, and therefore this would require including the OCSP Signing EKU, without implying that the CA is acting as a delegated responder.
We will follow up and complete the information ASAP.
Regards,
Pedro

Pedro and Ryan,

We are a consumer of WISeKey certificates that are used to secure healthcare systems.

You must not revoke these subCA certificates for a minimum period of 90 days unless you can demonstrate that this newly discovered vulnerability is actively being exploited.

The volume of the certificates requiring replacement and the nature of their use will not allow us to remediate this issue in less than 90 days.

Thank you for your understanding
Mark Arnott

Dear Mark,
please hold on because we just agreed with Ryan on a transition plan of 90 days. This plan would imply that you would need to install a new certificate chain in the servers, but won't require an immediate revocation.
I'll keep you posted separately from this discussion.
Thanks,
Pedro

(In reply to mark.arnott from comment #2)

Welcome to Bugzilla. As a consumer of WISeKey certificates, you were required to agree that you understood these terms and conditions, including the possibility of revocation. These are obligations Allscripts was required to acknowledge in their Subscriber agreement, and part of ongoing commitments that WISeKey made, either contractually or as a condition to being trusted. So absolute statements like "must not" are, unfortunately, not helpful or appropriate, although your participation is welcome, since one of the expectations if WISeKey fails to fulfill those obligations is a customer-by-customer breakdown of the factors leading to that failure, and a clear plan and timeline to ensure that in the future, WISeKey will replace such certificates within the 7 days required.

Pedro: I did not agree to the plan. I simply stated I better understood your plan. As written, I don’t think it’s sufficient to address the challenges, and that’s up to WISeKey to demonstrate, along with the above. Either we’re using the mailing list to discuss abstractly, which is fine and something to encourage, or using it to discuss incident response, which would be deeply troubling for WISeKey. I want to make sure you understand the need to have a comprehensive plan that demonstrates the needs being met, and a comprehensive plan to mitigate this going forward. Anything more than 7 days to destruction has to require something more comprehensive to demonstrate agility going forward.

Sorry, Ryan, but I'm confused now. The reading of your message in m.d.s.p. made me think you were in agreement as we were setting a deadline to destroy the keys.

If what is required is to expose a detailed plan, our proposal is:

  1. Today, 4th of July, we sent already a message to our main customers about the need to replace their certificates in 7 days. This could be alleviated if we reach an agreement on the alternative plan.
  2. We create today new CAs that will substitute the ones that include the EKU
  3. We initiate the process to replace the subscriber certificates, depending on their pace sending the new requests
  4. We reissue today the offending CAs, eliminating the OCSP Signing EKU
  5. We send the new Issuing CA to customers who are not able to replace their certificates, so they can reconfigure the chain in their systems
  6. Within the 7-day period, we revoke the previous certificate of the offending CAs
  7. Within the period from now to the 4th of October, we revoke the reissued certificates and we destroy the keys, with the appropriate audit proofs

My main problem is that every day counts when we talk about a 7-day deadline, so my intent is to advance in parallel. I'd really appreciate it if you could give your opinion on this plan.

Thanks.

I can understand that in your rush, this is causing some things to be misinterpreted. I agree that your proposal was something meaningfully different than the options put forward, but that doesn't mean it achieves the goal of finding the right or appropriate balance. The discussion with Peter Bowen, for example, highlighted some of the challenges and limitations with the strategy you outline here.

The plan you outline in Comment #5 only seems to focus on the compliance risk, but does nothing to establish that the security risk was, is, or will be mitigated. Peter Bowen's provided many useful considerations to keep in mind, with respect to audits, and that's a part of it. I don't know how much I can stress: the revocation does nothing for the security risk, precisely because of the security risk.

As a trusted CA, your incident responses need to comprehensively show how they've mitigated the risks and, if a revocation delay happens, a separate incident report needs to look at how future revocation delays are being comprehensively mitigated. This is something that has been repeatedly clarified, and I'm not sure what more I can say about that.

As discussed on the thread, your revocation and elimination of the OCSP Signing EKU don't eliminate the security risk: they only focus on the compliance obligation. I appreciate that your attempt at replacement in 90 days is to try and address the security risk. I can spell out the set of mitigations that might be appropriate, for WISeKey, if you'll commit to doing them, or you can take this opportunity to think about how to demonstrate the key hasn't been misused, can't be misused, and won't be misused during the next 90 days. Regardless, however, the burden is on WISeKey to demonstrate that they understand the security risk, and to provide the appropriate transparency and external verification that the risk is mitigated. Your 90 day proposal is useful, because it tries to anchor that the risk will not be indefinite, but for an issue like this, you still have to demonstrate that the risk is mitigated well enough that 90 days is not unreasonable.

Ryan,
my rationale for evaluating the risk level is based on the fact that we are the operator of the Root keys.

This control over the keys and the right to operate the OCSP responders of the hierarchy imply that the existence of this CA does not change the risk level, as Peter Bowen pointed out in his comments, in particular when saying "If the CA has control over the keys in the certificates in question, then I do not see that there is a risk that is greater than already exists".

I'm containing the risk even further by setting a deadline for the keys.

Best

Pedro: And I've repeatedly stated that there's no basis for making that determination. You, as a CA, are responsible for demonstrating that risk assessment is justified, not merely asserting it.

As it seems unlikely that WISeKey will be able to demonstrate suitable controls for the 90 day period, let's talk about what a "reasonable" set of expectations are to balance the security risk:

  • If WISeKey has not maintained audit logs that can link every signature created with these keys to a specific object being signed, so as to demonstrate no OCSP messages have been signed, this plan would be an unacceptable risk.
  • If WISeKey cannot produce a period-of-time audit using the WebTrust criteria demonstrating the non-performance of OCSP signing using these keys, this plan would be an unacceptable risk.
  • If WISeKey does not commit to producing a Detailed Control Report covering this period, with specific controls with respect to the use of these keys, this plan would be an unacceptable risk. Note that this may be dependent on Webtrust finalizing such a report, and thus may regardless represent an unacceptable risk. You should discuss with your auditors.
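
The first expectation above, linking every signature made with a key to a specific signed object, can be illustrated with a small reconciliation routine. This is only a sketch under stated assumptions: the record shapes (`key_id`, `digest`, `object_type`), the idea that both the HSM log and the CA software log expose a digest of the signed data, and the function name are all hypothetical and do not describe WISeKey's systems or any particular HSM.

```python
from collections import Counter

# Assumption: these keys are only ever expected to sign certificates and CRLs.
ALLOWED_OBJECT_TYPES = {"certificate", "crl"}


def reconcile_signatures(hsm_events, ca_events, key_id):
    """Match HSM signing operations for `key_id` against CA software records.

    hsm_events -- iterable of dicts: {"key_id": str, "digest": str}
    ca_events  -- iterable of dicts: {"key_id": str, "digest": str, "object_type": str}
    Returns (unexplained, disallowed):
      unexplained -- digests signed by the HSM with no matching CA record
      disallowed  -- CA records whose object type is not in ALLOWED_OBJECT_TYPES
    """
    ca_for_key = [e for e in ca_events if e["key_id"] == key_id]
    # Count CA records per digest so each record explains at most one HSM operation.
    available = Counter(e["digest"] for e in ca_for_key)

    unexplained = []
    for event in hsm_events:
        if event["key_id"] != key_id:
            continue
        if available[event["digest"]] > 0:
            available[event["digest"]] -= 1
        else:
            unexplained.append(event["digest"])

    disallowed = [e for e in ca_for_key if e["object_type"] not in ALLOWED_OBJECT_TYPES]
    return unexplained, disallowed
```

Under these assumptions, two empty lists are the outcome an audit would need to show; any unexplained digest would have to be treated as a signature whose purpose cannot be demonstrated.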

It's not sufficient to simply say the keys were under WISeKey's control, no harm, no foul. You must demonstrate that it's not possible for these keys to have been misused. If you cannot, the expectation is that you will treat them as if they have been misused, because you have no way of showing they haven't.

If it helps any, and to be honest, I'm increasingly worried that WISeKey is not taking this seriously enough, think about what procedures you would use if you had given one of these sub-CAs to a hostile third-party, and all you have is the audit report. What would be an appropriate level of evidence that would justify not immediately destroying the key? By design, all CAs are treated as 'hostile'; they are untrusted until they can prove otherwise. If you're going to proceed, you have to prove otherwise.

This is not the same as operating a responder certificate yourself, precisely because you have this capability but without the controls to go with it. You have to demonstrate that.

I'd like to do a short update on this bug.

WISeKey has been focussing all its efforts into containing the effects of this incident, so we preferred to stop posting here about intermediate or proposed plans.

By tomorrow afternoon (CET) we'll have completed the first phase of the action plan, which includes these items:

  • Creation of new CAs (with new keys) to substitute the three affected, which are disabled for new issuances (done by 4th of July)
  • Revocation of two of the affected CAs by Thursday the 9th of July
  • Implementation of additional and specific controls to provide assurance that, in the period between now and the destruction of each key, the private keys are only used for signing CRLs (if applicable). In particular these additional controls are oriented to correlate the signing operations of the HSM for the specific keys with the activity of the CA software, in order to generate specific audit material related to this incident.

Then we will initiate the second phase of the plan, that includes:

  • Revocation of the remaining CA within a period of no more than 90 days, trying to make it as close as possible to 30 days.
  • Destruction of the private keys of all affected CAs(*)

We will publish ASAP, most likely tomorrow afternoon, the incident report, where we will include the risk analysis that motivates the (maximum) 90-day delay for one of the CAs. We understand that this will imply an additional incident report for the delayed revocation, as two of the CAs will be revoked some hours after the 7-day period and one of the CAs will be revoked later.

As this is just an update, I'd appreciate that further comments come after we publish the incident report, which will be our next message here.

(*) We are coordinating with our auditors the best approach to provide assurance on the destructions of the keys, in terms of doing a single report or two reports, depending if we defer the destruction to happen during a single activity or we do it separately, which right now doesn't seem to make a significant difference in terms of the outcome of our risk assessment.

  1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.

We received an email triggered by the creation of this bug, and we also saw the associated discussion in m.d.s.p.

  2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

2/July – Received the notification for the bug; initiated an investigation of the issue, identifying it as a critical incident that could endanger the continuity of our services
2..4/July – Discussions in m.d.s.p. and this bug in order to identify the severity and draft possible lines of action. Contact with main customers to notify them of the issue
4/July – Creation of three new CAs to substitute the affected ones and start of the transfer of services
4..8/July – Implementation of additional controls to monitor the usage of the involved keys and generate specific audit evidence
9/July - Revocation of 2 of the 3 affected CAs (Personal CAs 1 and 2)
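
For reference, the deadlines discussed in this bug can be worked out with plain date arithmetic. This sketch assumes the year is 2020 (taken from the whiteboard entry in this bug), counts whole calendar days only, and treats the start dates as assumptions: the 7-day period is counted from the 2 July notification and the 90-day period from the 4 July proposal.

```python
from datetime import date, timedelta

NOTIFIED = date(2020, 7, 2)  # assumption: date the bug notification was received
PROPOSED = date(2020, 7, 4)  # assumption: date the 90-day plan was proposed


def deadline(start, days):
    """Return the calendar date `days` days after `start`."""
    return start + timedelta(days=days)


seven_day_deadline = deadline(NOTIFIED, 7)
ninety_day_deadline = deadline(PROPOSED, 90)

print(seven_day_deadline.isoformat())   # 2020-07-09
print(ninety_day_deadline.isoformat())  # 2020-10-02
```

Counted this way, 90 days from 4 July falls on 2 October, slightly earlier than the 4th of October mentioned in the proposal, which appears to count three calendar months instead.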

  3. Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem. A statement that you have will be considered a pledge to the community; a statement that you have not requires an explanation.

The affected issuances were performed during 2019, after the rule to include the EKU in SubCAs was enforced by the BR. No other SubCAs have been created with this EKU since then. We have ensured that the certificate templates used to generate SubCAs don’t have the EKU.

  4. A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.

The certificates that can be considered as delegated responders of our Roots are:
https://crt.sh/?id=1490728539 : WISeKey CertifyID Personal GB CA 1
https://crt.sh/?id=1490728458 : WISeKey CertifyID Personal GB CA 2
https://crt.sh/?id=1435074103 : WISeKey CertifyID SSL GB CA 1
These three CAs are owned and operated by WISeKey.

  5. The complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem.

See #4

  6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

This error was introduced by the common misinterpretation that EKUs need to be inserted in SubCAs to effectively constrain a CA to issuing certain types of certificates. We included the OCSP Signing EKU for that purpose, as the CA would sign certificates with such an EKU. WISeKey doesn’t use subCAs for OCSP delegation and none are configured to act as such. We saw certain discussions in m.d.s.p. and Bugzilla about this topic, but the statements made (e.g. https://bugzilla.mozilla.org/show_bug.cgi?id=991209#c10) and the matters discussed made us think that this was only a security issue for externally operated CAs.

  7. List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.

The CA servers using these keys don't have any functionality enabled to produce OCSP responses. In our opinion, the compromise of these keys for the purpose of creating OCSP responses is closely related to the risks associated with a CA key compromise: it has a comparable, potentially severe impact (e.g. massive mis-issuance), and both compromises would result from the materialisation of similar threats (taking control of the HSM, taking control of the CA, etc.). We consider that we already have security measures in place to contain such threats, in compliance with independent audit criteria (e.g. WebTrust Principles and Criteria for Certification Authorities – SSL Baseline with Network Security).

Nevertheless, given the high potential impact if the vulnerability were exploited, WISeKey recognises the risks derived from this vulnerability and has established the measures required to remediate them, eliminating the vulnerability as fast as possible given the side effects of the remediation actions. We also understand the need to establish any additional controls that could help provide assurance that the vulnerability is not exploited during the execution of the remediation plan.

As expressed in comment #9, WISeKey is executing an action plan in two phases:
First Phase:

  • Creation of new CAs (with new keys) to substitute the three affected, which are disabled for new issuances (Done by 4th of July)
  • Revocation of two of the affected CAs by Thursday the 9th of July (Done). In particular we are revoking the CAs for personal certificates.
  • Implementation of additional and specific controls to provide assurance that, in the period between now and the destruction of each key, the private keys are only used for signing CRLs (if applicable). In particular these additional controls are oriented to correlate the signing operations of the HSM for the specific keys with the activity of the CA software, in order to generate specific audit material related to this incident.

Second phase:

  • Revocation of the remaining CA within a period of no more than 90 days, trying to make it as close as possible to 30 days.
  • Destruction of the private keys of all affected CAs(*)

We consider the (maximum) 90-day delay necessary in order to ensure that our actions don't cause irreparable harm. In particular, we have an important customer with some hundreds of TLS certificates used in health-related applications. Stopping the services that rely on these certificates through revocation would leave thousands of doctors in the US unable to access drug prescription services, and we are certain that such an interruption, in the current pandemic situation affecting the US population, would cost a number of human lives. As WISeKey we are bound to comply with the industry requirements, but we are convinced that a delay in the revocation is the adequate measure to contain that potential cost, which is impossible to accept from a human standpoint.

Therefore, we consider that the adequate and balanced mitigation is to revoke the affected CA for TLS services within a period of no more than 90 days, leaving time for our customers to replace the certificates in critical services. We are pushing our customers to shorten this deadline as much as possible, but right now we need to communicate the longest possible delay of 90 days.

We understand that the plan implies a breach of the revocation deadlines, so we will open a specific bug for that discussion.

——————————————
(*) We are coordinating with our auditors the best approach to provide assurance on the destruction of the keys, in terms of doing a single report or two reports, depending on whether we defer the destruction to happen during a single activity or we do it separately. The alternatives are specific reports or having this noted in the section “Other matters” in the next annual audit report.

Hello,
just a short update on this.

As discussed, Bug #1651730 (Delayed revocation) has been the main place to post comments on the action plan.

Some days ago I posted in Bug #1651730 the new expectation to complete the revocation and key destruction for the remaining CA in a much shorter period. We still plan to significantly shorten the 90-day deadline initially set in comment #10 (i.e. the first half of October), but I have to say that this will probably happen some days later than mentioned in Bug #1651730: instead of around the 15th of August, it will happen around the 25th.

This delay is due to a case of COVID-19 in one of our employees, so we have to follow certain rules imposed by the Swiss Health Authority. This will not affect any critical service and continuity of operations will not be compromised, but we will still have certain restrictions in the next two weeks that will slow down certain tasks.

Regards,
Pedro

Whiteboard: [ca-compliance] → [ca-compliance] [covid-19] Next Update - 25 October, 2020

Hi Pedro,
I am willing to close this bug and consolidate further discussion of this issue under WISeKey's bug for delayed revocation, Bug #1651730. The comments in this bug contain many valuable disclosures and observations which are preserved for cross-reference in that bug. However, before I close this bug, I would like to understand what steps WISeKey takes to ensure it is following and participating in discussions that highlight these types of issues. For instance, I assume that WISeKey follows and occasionally participates in discussions about: the CA incidents of other CAs in Bugzilla; changes and discussions of CA/Browser Forum Guidelines, and in particular, certificate profiles; and m.d.s.p posts, which discuss developments and interpretations of the applicable standards for CAs. (Section 2.1 of the Mozilla Root Store Policy states, "CAs MUST follow and be aware of discussions in the mozilla.dev.security.policy forum, where Mozilla's root program is coordinated. They are encouraged, but not required, to contribute to those discussions.") I think that understanding what additional steps you take to ensure this expectation is met, in order to detect and prevent future issues, is useful to ensure that no future interpretation issues arise.
Thanks,
Ben

Dear Ben,
I'd like to confirm that WISeKey actively follows all the discussions in m.d.s.p, most of the discussions in Bugzilla (in particular we are subscribed to the CA Compliance messages) and the exchanges in the CA/B Forum. Key people in the organization follow these discussions and whenever an issue is considered relevant for us, messages are forwarded internally for awareness and discussion.
In terms of active participation, this is done when we consider that we add value to the discussions. In general we'd wish there was a different dynamic in these discussions, which sometimes doesn't encourage the necessary debate that would be so important to enrich the community.
Thanks,
Pedro

Besides the above, I'd like to do a final update, confirming that we keep for now the deadline around the 25th to revoke the third CA and destroy the keys. We will do a final report in the bug concerning the revocation delay. Almost all affected leaf certificates are already replaced and we are doing the final checks and planning the revocation ceremonies.

I am closing this bug. Remaining issues can be addressed in the other bug for delayed revocation, Bug #1651730.

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Product: NSS → CA Program
Whiteboard: [ca-compliance] [covid-19] Next Update - 25 October, 2020 → [ca-compliance] [ocsp-failure] [covid-19]