Closed Bug 1651730 Opened 1 year ago Closed 1 year ago

WISeKey: Failure to revoke ICA Certificates within 7 days (OCSP EKU)

Categories

(NSS :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: pfuentes, Assigned: pfuentes)

Details

(Whiteboard: [ca-compliance] [delayed-revocation-ca] Next Update 2020-12-01)

Steps to reproduce:

As per the action plan disclosed in https://bugzilla.mozilla.org/show_bug.cgi?id=1649939#c10, WISeKey will not be able to meet the 7-day deadline to revoke the non-compliant CAs.
We will elaborate on the reasons for the delay in the coming days.

Assignee: bwilson → pfuentes
Status: UNCONFIRMED → ASSIGNED
Type: defect → task
Ever confirmed: true
Whiteboard: [ca-compliance] [delayed-revocation-ca]

An update is needed.

Flags: needinfo?(pfuentes)

(In reply to Ben Wilson from comment #1)

An update is needed.

Hello,
As the status overlaps with the incident reporting of the original bug (#1649939), I'm publishing the update here, and we would then formalise the interim and final incident reports in both bugs, as required and if that is OK with you.

In comment #10 of bug 1649939, we disclosed the actions already executed, which included the revocation of two of the three offending CAs on the 9th of July. The remaining action (revocation of the third CA) was expected to happen within a maximum of 90 days (so during the first half of October), for the reasons explained in comment #10.

We are trying to demonstrate a "best effort" approach, so our current expectation is to trim that period to about 30 days. Right now we plan to revoke this remaining CA no later than the 15th of August, but we don't yet have an exact date.

In terms of key destruction, our current plan is to do this in a single event for all the affected CAs after revoking the third CA. Due to the internal nature of these CAs (owned and operated by us, as operators of the Roots), we plan to follow our usual key destruction process, leaving audit evidence (paperwork and video recording), so our auditors can perform the usual validations during the next annual audit. The auditors will also be able to check the private key usage audit logs, to give the community more assurance that these CAs never dealt with OCSP responses. All this would be reflected in the next annual audit report ("Other matters" section), linked to this incident and the two related bugs.

I hope this fulfils your expectations in terms of an interim report.

Best regards,
Pedro

Flags: needinfo?(pfuentes)

(In reply to Pedro Fuentes from comment #2)

Hello,
As the status overlaps with the incident reporting of the original bug (#1649939), I'm publishing the update here, and we would then formalise the interim and final incident reports in both bugs, as required and if that is OK with you.

Please keep these separate. They are meant to address two independent elements of the incident.

Bug 1649939 should look to understand how the incident happened and the factors contributing to it, as well as what is being done to address those factors.

Please use this incident to focus on the factors that WISeKey evaluated in deciding to delay remediation, and how that's being addressed going forward. In particular, you should be looking to understand how to ensure that any intermediate can be revoked within the BR-permitted timeframe, without hardship, and the steps you're taking to ensure that.

We are trying to demonstrate a "best effort" approach, so our current expectation is to trim that period to about 30 days. Right now we plan to revoke this remaining CA no later than the 15th of August, but we don't yet have an exact date.

This is really encouraging. Thanks for sharing this.

In terms of key destruction, our current plan is to do this in a single event for all the affected CAs after revoking the third CA. Due to the internal nature of these CAs (owned and operated by us, as operators of the Roots), we plan to follow our usual key destruction process, leaving audit evidence (paperwork and video recording), so our auditors can perform the usual validations during the next annual audit. The auditors will also be able to check the private key usage audit logs, to give the community more assurance that these CAs never dealt with OCSP responses. All this would be reflected in the next annual audit report ("Other matters" section), linked to this incident and the two related bugs.

This is an update probably relevant for the other bug. In general, it would be better to have such evidence sooner. The problem for Relying Parties is that any issues with the evidence won't be detected until the next audit period (and it's unclear, from this issue, when that is). For a Relying Party, that's indistinguishable from doing nothing at all. This is because a CA might do nothing, or might have insufficient evidence, but that won't be detected until well after the fact.

We can continue the discussion on Bug 1649939 for this matter, the broader problem of "How do we independently demonstrate that we're behaving in a trustworthy manner"

Flags: needinfo?(pfuentes)

(In reply to Ryan Sleevi from comment #3)

Please keep these separate. They are meant to address two independent elements of the incident.

Thanks for the guidance, we'll do so and try to keep both discussions separate, although there are overlapping aspects of it.

We can continue the discussion on Bug 1649939 for this matter, the broader problem of "How do we independently demonstrate that we're behaving in a trustworthy manner"

If it's OK with you, I'll keep the details on this in the other bug. I would just like to clarify the rationale for destroying the keys of all CAs in a single action. When we discussed this with our auditor, they told us that they would not be available to witness the activity on the day we wanted to revoke the first two CAs, and that it would have to happen up to two weeks later. Once it was clear that the third CA could be revoked much earlier than initially expected, we saw that holding one ceremony and then another some days later would make no real difference compared with holding a single ceremony and unifying the audit evidence intended to provide the required assurance to relying parties.

Flags: needinfo?(pfuentes)
Whiteboard: [ca-compliance] [delayed-revocation-ca] → [ca-compliance] [delayed-revocation-ca] Next Update - 15-August 2020

Hello,
I'd like to provide the status on this issue.

As explained in the related Bug #1649939, we had initially planned a revocation delay of three months, but we have managed to reduce this time significantly.

As of today, 25th of August, we have revoked the third and last CA affected by the OCSP EKU problem. We have performed the key destruction ceremony for all affected CAs, generating the required audit evidence.

During this procedure, we also exported a digitally signed copy of the specific logs that we have been generating to monitor the keys of the three CAs, to provide clear audit evidence that the keys haven't been used for anything other than CRL generation.

I'll write a final incident report ASAP, focusing on the requirements set in https://wiki.mozilla.org/CA/Responding_To_An_Incident#Revocation

Best,
Pedro

I'd like to publish the full incident report here. I remain available for any further clarification.

  1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.
    The difficulty in meeting the revocation deadline was detected when analysing the impact on our customers' operations of revoking a CA involved in SSL issuance, which was affected by the problem arising from the inclusion of the OCSP Signing EKU in the CA Certificate, as discussed in the related Bug #1649939. We checked the feasibility of a revocation within the normative deadline and found an extremely high impact on one particular customer that works in the health industry.

  2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.
    2/July – Received the notification of the mis-issuance bug. We initiated an investigation of the issue, identifying it as a critical incident that could endanger the continuity of our services
    2..4/July – Discussions in m.d.s.p. and this bug in order to identify the severity and draft possible lines of action. Contact with main customers to notify them of the issue
    4/July – Creation of three new CAs to replace the affected ones and start of the transfer of services
    4..8/July – Implementation of additional controls to monitor the usage of the involved keys and generate specific audit evidence
    9/July – Revocation of 2 of the 3 affected CAs (Personal CAs 1 and 2). These CAs were revoked shortly after the 5-day deadline had passed
    25/August – Revocation of the third affected CA (SSL CA 1). We initially communicated a 90-day deadline for the revocation, but we managed to accelerate the process. All CA keys were destroyed according to our procedures, leaving an adequate audit trail.
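
    The elapsed time behind each revocation in the timeline above can be worked out as in the following sketch. It assumes that the clock starts on 2 July 2020, the notification date given in the timeline (the year is taken from the whiteboard dates of this bug). The window length is passed in as a parameter, because the title of this bug refers to 7 days while the report mentions a 5-day deadline. The function names are illustrative only.

```python
from datetime import date

# Dates taken from the timeline above; the year 2020 comes from the
# whiteboard entries of this bug.
NOTIFIED = date(2020, 7, 2)
REVOCATIONS = {
    "WISeKey CertifyID Personal GB CA 1": date(2020, 7, 9),
    "WISeKey CertifyID Personal GB CA 2": date(2020, 7, 9),
    "WISeKey CertifyID SSL GB CA 1": date(2020, 8, 25),
}


def days_to_revocation(notified: date, revoked: date) -> int:
    """Whole days between the notification and the revocation."""
    return (revoked - notified).days


def overrun(notified: date, revoked: date, window_days: int) -> int:
    """Days by which a revocation exceeded the window (0 if it was on time)."""
    return max(0, days_to_revocation(notified, revoked) - window_days)


if __name__ == "__main__":
    # Assumption: a 7-day window, as in the title of this bug.
    window = 7
    for name, revoked in REVOCATIONS.items():
        elapsed = days_to_revocation(NOTIFIED, revoked)
        late = overrun(NOTIFIED, revoked, window)
        print(f"{name}: revoked after {elapsed} days, {late} days past the window")
```

    Keeping the window as a parameter means the same calculation can be repeated for whichever deadline is considered applicable.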

  3. Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem. A statement that you have will be considered a pledge to the community; a statement that you have not requires an explanation.

    The problem that caused the mis-issuance was properly addressed, as per the original bug indicated above, by ensuring that the Ceremony scripts use CA Certificate Profiles that do not include the OCSP Signing EKU.
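
    As an illustration of the kind of pre-ceremony check described above, the sketch below rejects a CA certificate profile that lists the OCSP Signing EKU. This is not WISeKey's actual ceremony tooling: the profile dictionary layout and the function name are hypothetical. Only the OID of id-kp-OCSPSigning (1.3.6.1.5.5.7.3.9) is a fixed value.

```python
# OID of id-kp-OCSPSigning, the EKU at issue in this bug.
OCSP_SIGNING_OID = "1.3.6.1.5.5.7.3.9"


def profile_violations(profile: dict) -> list:
    """Return the problems found in a (hypothetical) certificate profile.

    The profile layout is an assumption made for this sketch:
      {"name": str, "is_ca": bool, "extended_key_usage": [OID strings]}
    """
    problems = []
    ekus = profile.get("extended_key_usage", [])
    if profile.get("is_ca", False) and OCSP_SIGNING_OID in ekus:
        problems.append(
            f"{profile.get('name', '<unnamed>')}: CA profile includes "
            f"the OCSP Signing EKU ({OCSP_SIGNING_OID})"
        )
    return problems


if __name__ == "__main__":
    # Hypothetical profile for illustration: serverAuth plus OCSP Signing.
    bad_profile = {
        "name": "Example SSL CA profile",
        "is_ca": True,
        "extended_key_usage": ["1.3.6.1.5.5.7.3.1", OCSP_SIGNING_OID],
    }
    for problem in profile_violations(bad_profile):
        print(problem)
```

    A ceremony script could run a check of this kind before signing and abort if the returned list is not empty.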

  4. A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.
    See #5

  5. The complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem.

    https://crt.sh/?id=1490728539 : WISeKey CertifyID Personal GB CA 1
    https://crt.sh/?id=1490728458 : WISeKey CertifyID Personal GB CA 2
    https://crt.sh/?id=1435074103 : WISeKey CertifyID SSL GB CA 1

  6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.
    We don’t consider this a mistake, but a decision that we had to take.
    (Note that this section is intended to cover the items “The decision and rationale for delaying revocation” and “Any decision to not comply with the timeline” in https://wiki.mozilla.org/CA/Responding_To_An_Incident#Revocation)
    We analysed the situation with the customer and detected that the CA was used for a multitude of servers important for the operations of a health-related application. Those services would suffer severe disruptions if the CA used for SSL issuance was revoked prematurely.
    We checked the Mozilla Policy and verified that an extension of the revocation deadlines is acceptable after an appropriate risk analysis. The revocation could affect a health-related service operating in the US during a severe pandemic, and the customer made us aware that a number of people would certainly die if they did not get their prescriptions for medical treatment. We therefore concluded that the harm arising from the CA revocation was too high and that this case merited the exceptional treatment foreseen by the Mozilla policy.
    Consequently, before the 5-day deadline we realised that, for the reasons above, we wouldn’t be revoking the SSL CA within the normative deadline.

  7. List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.
    In general: During this incident we managed to replace the SSL certificates of the rest of the customers within the normative deadline, even though we already knew that we’d have to extend the period. This served as a way to test our procedures for rolling out new CAs and replacing certificates, and to ensure that customers accept the need for abrupt revocations when necessary.
    In particular, regarding the main reason for delaying the revocation of the CA for SSL certificates: We had a late revocation issue with the same customer some months ago, and we have been helping them to deploy an automated certificate management tool that will enable them to roll out new certificates on their servers in hours instead of weeks. This project is not yet complete due to the complexity of the customer's environment, but it’s going at a good pace and automation is expected to be available during 2020.

We also ensured with our auditors that they are aware of the issue, so they have reviewed our action plan during the incident and the problem will be noted in our next audit report, as requested in https://wiki.mozilla.org/CA/Responding_To_An_Incident#Revocation

I have set the next update for this bug to 1-Oct-2020. Meanwhile, I'll review Comment #6 and see if it is satisfactory.

Flags: needinfo?(bwilson)
Whiteboard: [ca-compliance] [delayed-revocation-ca] Next Update - 15-August 2020 → [ca-compliance] [delayed-revocation-ca] Next Update - 1-Oct-2020

Hello,
The next update was set for yesterday, but on our side we have no other activities or issues to report, as the plan was executed as announced in comment #2 and reported in comment #6.
We remain available for any further clarification; otherwise, maybe this could be closed.
Txs,
Pedro

(In reply to Pedro Fuentes from comment #6)

We checked the Mozilla Policy and verified that an extension of the revocation deadlines is acceptable after an appropriate risk analysis

The Mozilla policy doesn't say that delaying revocation is acceptable; Mozilla imposes additional requirements of transparency. That wiki article explicitly says that "Mozilla does not grant exceptions to the BR revocation requirements."

Before I close this, is there an auditor's attestation, report or letter regarding CA key destruction that can be uploaded as an attachment here?

Flags: needinfo?(bwilson)
Whiteboard: [ca-compliance] [delayed-revocation-ca] Next Update - 1-Oct-2020 → [ca-compliance] [delayed-revocation-ca] Next Update 2020-12-01

(In reply to Ben Wilson from comment #10)

Before I close this, is there an auditor's attestation, report or letter regarding CA key destruction that can be uploaded as an attachment here?

Hello Ben,
As explained in comment #2, we performed the key destruction ceremony and generated the appropriate audit trail, which the auditors will review and include in the annual audit report as part of the disclosure of incidents.
Best regards,
Pedro

Thanks. I'll close this matter next week (Dec. 7-11) with the understanding that documentation of the key destruction is forthcoming in the annual audit report.

Flags: needinfo?(bwilson)
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED