Closed Bug 1610767 Opened 2 years ago Closed 1 year ago

WISeKey: Failure to meet revocation deadline

Categories

(NSS :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: pfuentes, Assigned: pfuentes)

Details

(Whiteboard: [ca-compliance] [delayed-revocation-leaf])

This bug is related to bug 1609501 (https://bugzilla.mozilla.org/show_bug.cgi?id=1609501)

As disclosed in the bug referenced above, WISeKey misissued certificates due to a typo in the organization field. The problem affected 8 certificates, of which 7 required replacement/revocation and 1 had already expired.

Of the offending certificates, 6 were revoked on the 5th day and one was left pending because the customer was unable to replace it in time.

WISeKey has considered it appropriate to extend the revocation period for the following reasons:

  • The certificate is used by the legitimate subscriber
  • The customer needs the assistance of an external contractor to replace the certificate, and a weekend and a bank holiday fell within the 5-day period, which made it difficult for them to meet our request to replace the certificate on time
  • The certificate is used for an internal VPN service that is not accessed by the public
  • The VPN service is important for the customer's operations and a disruption would be harmful to them
  • The customer is a pharmaceutical company that processes medical prescriptions, and disruptions in their services can lead to patients not getting access to adequate medical treatment and, in a worst-case scenario, endanger their lives

Under these circumstances, WISeKey considers that the risks of revoking the certificate before it is safely replaced largely exceed the risks of the BR non-compliance.

We hope our position in this exceptional case is acceptable to the community.

We fully understand the implications of this decision and we will communicate it to our auditors, so that it is reflected in our next audit report.

As informed by the customer by Tuesday the 22nd of January, they expect to revoke this certificate by Wednesday the 23rd, and we hope they can meet this expectation. We will post timely updates here until the problem is solved.

Thanks for your understanding and best regards,
Pedro Fuentes

Assignee: wthayer → pfuentes
Status: UNCONFIRMED → ASSIGNED
Type: defect → task
Ever confirmed: true
Whiteboard: [ca-compliance] [delayed-revocation-leaf] → [ca-compliance] [delayed-revocation-leaf] Next Update - 23-January 2020

Thanks, Pedro, for filing this. It looks like most, but not all of the information requested is present.

In particular, I want to confirm two of the actions the CA must/will take, and there is one set of unanswered questions:

Please confirm:

  • The issue will need to be listed as a finding in your CA’s next BR audit statement.
  • Your CA will work with your auditor (and supervisory body, as appropriate) and the Root Store(s) that your CA participates in to ensure your analysis of the risk and plan of remediation is acceptable.

Please provide details on:

  • That you will perform an analysis to determine the factors that prevented timely revocation of the certificates, and include a set of remediation actions in the final incident report that aim to prevent future revocation delays.
Flags: needinfo?(pfuentes)

Note that the format here, to help make sure the information is present, is to make sure you're formatting as an Incident Report, with the focus of the incident report being the missed revocation. That is, the incident report isn't about the typo in the O field (That's Bug 1609501), but about the missed revocation.

Thanks!

Hello,
sorry if the structure of my post wasn't formal enough. Let me rewrite it according to the Policy.

  • The decision and rationale for delaying revocation will be disclosed to Mozilla in the form of a preliminary incident report immediately; preferably before the BR mandated revocation deadline.

WISeKey has considered it appropriate to extend the revocation period for the following reasons:

  • The certificate is used by the legitimate subscriber
  • The customer needs the assistance of an external contractor to replace the certificate, and a weekend and a bank holiday fell within the 5-day period, which made it difficult for them to meet our request to replace the certificate on time
  • The certificate is used for an internal VPN service that is not accessed by the public
  • The VPN service is important for the customer's operations and a disruption would be harmful to them
  • The customer is a pharmaceutical company that processes medical prescriptions, and disruptions in their services can lead to patients not getting access to adequate medical treatment and endanger their lives

Under these circumstances, WISeKey considers that the risks of revoking the certificate before it is safely replaced largely exceed the risks of the policy non-compliance.

  • Any decision to not comply with the timeline specified in the Baseline Requirements must also be accompanied by a clear timeline describing if and when the problematic certificates will be revoked or expire naturally, and supported by the rationale to delay revocation.

Our current position is that the certificate will be revoked as soon as the customer replaces it. We have frequent exchanges with the customer and they have made clear that they will replace the certificate, but they are having technical difficulties completing the process.
As explained above, the rationale for delaying the revocation of this certificate, used by its legitimate subscriber for an important infrastructure, is based on a possible disruption of a service that, in a worst-case scenario, can endanger human lives due to a delay in accessing a medical prescription.
If the customer fails to do the replacement in a reasonable time (right now we expect this to be done during this week), we would re-evaluate the risks.

  • The issue will need to be listed as a finding in your CA’s next BR audit statement.

We have communicated the issue to our auditors and they will perform their duties in the next audit (scheduled for June).

  • Your CA will work with your auditor (and supervisory body, as appropriate) and the Root Store(s) that your CA participates in to ensure your analysis of the risk and plan of remediation is acceptable.

In our exchanges with the auditors, they consider our approach adequate. We are hereby disclosing the information to the Mozilla Root Store (which is a reference for others) and we expect to receive a statement if our perception of the risk balance is not acceptable.

  • That you will perform an analysis to determine the factors that prevented timely revocation of the certificates, and include a set of remediation actions in the final incident report that aim to prevent future revocation delays.

We don't know whether a case of such an exceptional nature can lead to remediation actions that prevent it from happening again. If at some point in the future we again have to balance the risks of revoking a certificate that is used by its legitimate subscriber for lawful purposes, and that supports a critical infrastructure, it could happen that we find it justified to grant an extension.

Flags: needinfo?(pfuentes)

Hello,
I'd like to inform you that the offending certificate has been revoked after being safely replaced by the customer.
Please let us know if further actions are required.
Thanks,
Pedro

I think it's questionable whether other root stores would consider this sufficient, but that's for other root stores to communicate.

I'm concerned about the lack of steps to mitigate or prevent the risk of this in the future. WISeKey has contracts in place with their customers to ensure they have the right to revocation, and WISeKey's own policies state it will revoke. The suggestion that WISeKey will not be doing anything to prevent this from happening in the future, and may allow it to happen again, is quite troubling.

There seem to be a number of opportunities here for improvement:

  • Reminding all WISeKey customers of the timelines for revocations, that they are non-negotiable, and working to ensure customers have solutions that are viable for them, such as moving to privately-trusted CAs for these mission critical systems.
  • WISeKey investing in automation, for example, ACME or other solutions
  • WISeKey reducing the certificate lifetimes for the certificates it issues, to ensure that replacement becomes routine and well-supported

These are just three examples of positive preventative steps that could be taken. I do hope WISeKey re-evaluates what steps it can take to broadly reduce the risk of delayed revocations going forward. I wanted to see if you wanted to update this issue with that.

Note that the incident report in Comment #3 still does not follow the template, which I requested in Comment #2.

Flags: needinfo?(pfuentes)

(In reply to Ryan Sleevi from comment #5)

I think it's questionable whether other root stores would consider this sufficient, but that's for other root stores to communicate.

I'm concerned about the lack of steps to mitigate or prevent the risk of this in the future. WISeKey has contracts in place with their customers to ensure they have the right to revocation, and WISeKey's own policies state it will revoke. The suggestion that WISeKey will not be doing anything to prevent this from happening in the future, and may allow it to happen again, is quite troubling.

Ryan, I guess we have a communication issue... The point is not that we are unwilling to do anything to keep this from happening again in the future, but that it’s impossible to prevent an exceptional situation from ever happening again. In our contracts we formalize the fact that the CPS is a binding document, and therefore there are stipulations related to revocation that are contractually set. But in every contract there exists the concept of « force majeure », which can prevail in exceptional situations. What we said is that we don’t see how we can prevent an exceptional situation from occurring.

Just to be clear, and to avoid more issues with my broken English, I understand « prevent » as « to keep or impede something from occurring », not as « to reduce the likelihood of something occurring ».

There seem to be a number of opportunities here for improvement:

  • Reminding all WISeKey customers of the timelines for revocations, that they are non-negotiable, and working to ensure customers have solutions that are viable for them, such as moving to privately-trusted CAs for these mission critical systems.

Besides what was said above about « force majeure », the Policy itself admits that in exceptional cases the revocation timelines may need to be reconsidered based on a risk analysis. If the community considers that no exception may be admitted by a CA, then the exceptions must be removed from the Policy; only in that case would CAs be empowered to reduce the applicability of « force majeure » and impose immutable revocation timelines.

  • WISeKey investing in automation, for example, ACME or other solutions

While these solutions, in which we ARE investing (e.g. we already support ACME for special projects), improve certificate management, they will not eliminate these exceptional cases (e.g. in this particular case the customer didn’t support such protocols) related to the risks of revocation.

  • WISeKey reducing the certificate lifetimes for the certificates it issues, to ensure that replacement becomes routine and well-supported

Again, this is positive and the way to go, but it can’t prevent an exceptional case from occurring, which is the point here.

These are just three examples of positive preventative steps that could be taken. I do hope WISeKey re-evaluates what steps it can take to broadly reduce the risk of delayed revocations going forward. I wanted to see if you wanted to update this issue with that.

As I said, your recommendations are of evident benefit, but I don’t think they are remediation measures that prevent situations like the present one from happening in the future. What is clear is that this comes after a misissuance, and that is where we have to put our efforts, by applying recommendations like the ones you propose, which, as I said, are all already part of our roadmap.

Note that the incident report in Comment #3 still does not follow the template, which I requested in Comment #2.

My bad, I took that template to be for misissuances, not for revocation incidents. Anyhow... I very much hope not to have to master the way to communicate incidents... not because I don’t want to report properly, but because I’d rather not have to do it at all.

Flags: needinfo?(pfuentes)

I think the objective is to understand the factors that made this delay necessary, and to take every reasonable step to prevent this in the future.

I'm not sure force majeure is the relevant concept here: the customer of WISeKey made design decisions that led them to have difficulties if WISeKey enforced the Subscriber Agreement and its CP/CPS. This wasn't an act of God, a declared war, or a natural disaster: this was a failure to plan appropriately.

I totally understand and appreciate that WISeKey can't 'force' their customers to plan appropriately. However, WISeKey made the decision to accept the risk and responsibility for that failure to plan, and thus has to accept the accountability that comes with it. What I want to make sure is that, in the future, WISeKey is less likely to accept that risk and responsibility, by understanding that if and as WISeKey does so, they lose the trust of the community by prioritizing their business interests and their customers.

We've had CAs misissue SHA-1 certificates, or weak keys, or even MITM CA certificates, under an argument very similar to the one here - force majeure, which is really "the customer would be inconvenienced". We want and need that to stop, and we get there by incident reports regarding revocation, so that the CA makes it clear that they're taking steps to prevent the need for this in the future, as well as making commitments to the community that they understand they won't and can't do this regularly every time a customer of theirs is inconvenienced.

Obviously, there are trade-offs. When CAs are stuck between that rock and a hard place, and make the decision to favor their customer over the community, we want to make sure that they take steps to help all of their customers going forward. We can't 'force' a CA to revoke - that technical capability exists only for the CA, and we don't want every CA thinking they'll be immediately distrusted the moment they make a decision not to revoke some certificate. The decision on whether to trust or distrust a CA is not based solely on their decision to revoke or not revoke, but how they respond to the overall incident, the same as how we treat other incidents.

That's why the response concerned me, because it doesn't really help understand what WISeKey is doing to help prevent these situations. I'm similarly a little concerned that customer business decisions/negligence/poor planning are being conflated with force majeure. We need the ecosystem to be better :)

Ryan, I really appreciate your insights as something positive and something that helps to improve.

As CAs we have to balance the impact of our decisions. It’s not just about not wanting to annoy our customers; sometimes we have to take decisions based on the effect on the end users that access our customer’s servers. Should I prevent a person from accessing their medical prescription because I personally decided not to give my customer some slack to safely replace a certificate? That is certainly not the same as making a user get an error when visiting a commercial website... in such a case my hand wouldn’t tremble when clicking on the revoke button... As the saying goes, with great power comes great responsibility...

That's understandable, but that's also why it's important and useful to understand what steps are being taken to make sure that certificates won't be used in such situations.

In effect, the customer is offloading the risk of their decision to WISeKey, by trying to make WISeKey responsible (if things go wrong) for their poor planning.
That has the effect of offloading the risk of WISeKey's decisions onto the browsers and community: If/when WISeKey decides not to revoke, do the browsers remove trust in WISeKey?

It's not reasonable for customers to offload the risk of their decisions onto the whole Internet community, which is why the incident reports try to address how those risks are being minimized and reduced. For example, WISeKey reminding its customers of the contracts, and helping them find alternative solutions (such as private PKIs) helps reduce that risk, both to WISeKey and to the community.

If nothing is done, and this is treated as a "Sometimes bad things happen", and it happens again, I think it'd raise much more serious questions. That's why I wanted to make sure we take the opportunity to understand the challenges the customer faced, and from there, how to do everything possible to make sure no customer is ever faced with those challenges, so that this doesn't happen again. Now, as you say, this isn't perfect - but we want to make sure the next time, it was totally clear that every effort was made to help guide the customer away from those risky decisions, they made them anyways, and that WISeKey revoking that certificate in no way makes them responsible for the consequences. That's why we have policies and guidelines in the first place - to help set those expectations.

Pedro,
Could you please provide a final incident report?
Thanks,
Ben

Flags: needinfo?(pfuentes)
QA Contact: wthayer → bwilson

Hello Ben,
please allow us to come back to this in a few days with the final report.
Thanks,
Pedro

Hello,
please find below the final incident report.
Thanks,
Pedro

  1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.

The problem meeting the revocation deadline was detected after we communicated to the customer the need to revoke their certificate due to a mis-issuance. At some point the customer told us that there was a particular server that wasn’t in their main premises and that required a contractor to intervene. Because the incident report was communicated on a weekend, the customer was left with too few working days to arrange the intervention to replace the certificate, so they made us aware that they might not be able to do it in time.

  2. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

2020-01-15 10:57 PST: The original bug (1609501) about the mis-issuance is published.
2020-01-21 01:47 PST: We communicate (https://bugzilla.mozilla.org/show_bug.cgi?id=1609501#c5) that the customer has issues replacing one certificate (https://crt.sh/?id=1864166725) and that we will have to give them an extension.
2020-01-22 02:34 PST: We publish the present bug (1610767), reporting the problem in meeting the revocation deadline.
2020-01-23 04:32 PST: The certificate is revoked (2020-01-23 13:32:25 UTC) once safely replaced.
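
For reference, the elapsed time implied by this timeline can be checked with a short calculation. This is only a minimal sketch under two assumptions that the thread does not confirm: that the 5-day revocation window starts at the publication of the original bug, and that PST can be treated as a fixed UTC-8 offset. The variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Assumption: PST is treated as a fixed UTC-8 offset.
PST = timezone(timedelta(hours=-8))

# Timestamps copied from the timeline above.
misissuance_bug_published = datetime(2020, 1, 15, 10, 57, tzinfo=PST)
certificate_revoked = datetime(2020, 1, 23, 13, 32, 25, tzinfo=timezone.utc)

# Assumption: the 5-day revocation window starts when the original bug
# was published; the thread does not state the exact start of the clock.
revocation_window = timedelta(days=5)
deadline = misissuance_bug_published + revocation_window

# Aware datetimes in different zones can be subtracted directly.
elapsed = certificate_revoked - misissuance_bug_published
overdue = certificate_revoked - deadline

print("elapsed since publication:", elapsed)
print("time past assumed deadline:", overdue)
```

Under these assumptions the revocation came 7 days, 18:35:25 after the original bug was published, which is 2 days, 18:35:25 past the assumed 5-day window. A different starting point for the clock would shift both figures.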

  3. Whether your CA has stopped, or has not yet stopped, issuing certificates with the problem. A statement that you have will be considered a pledge to the community; a statement that you have not requires an explanation.
    The problem that caused the mis-issuance was properly addressed, as per the original bug indicated above. This is not a problem affecting our revocation capabilities that could affect other certificates.

  4. A summary of the problematic certificates. For each problem: number of certs, and the date the first and last certs with that problem were issued.
    See #5

  5. The complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem.
    https://crt.sh/?id=1864166725

  6. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

We don’t consider this a mistake, but a decision that we had to take.
We analysed the situation with the customer and found that the server was used for a VPN that is very important for the operations of a health-related application (the customer is a relevant company in the U.S. Pharma sector, dealing with medical prescriptions).
We checked the Mozilla Policy and verified that an extension of the revocation deadlines is acceptable after an appropriate risk analysis. Given the criticality of the application and the fact that the revocation could affect a health-related service, and therefore hypothetically endanger human lives, and also considering that the certificate was used by its legitimate subscriber and that the mis-issuance was “just” a typo in the Organization name, we came to the conclusion that the risk of revoking was too high and that this case merited the exceptional treatment foreseen by the Mozilla policy.
Consequently, before the 5-day deadline we realised that, according to the above, it could be advisable to give the customer some extra time, and we gave notice of this in bug #1609501.

  7. List of steps your CA is taking to resolve the situation and ensure such issuance will not be repeated in the future, accompanied with a timeline of when your CA expects to accomplish these things.

As explained in the discussion above, we don’t consider this something endemic to our particular CA, but an exceptional case that required exceptional treatment.
Ryan Sleevi pointed out in the previous comments that a possible solution would be to ensure that Trusted Certificates aren’t used in mission-critical services, but we don’t consider this something that falls under our control. What is in our control is to ensure that customers understand the circumstances for revocation set by the CABF/BR, which we reflect in our CPS, and to ensure that the subscriber agreement is properly binding in terms of respecting these stipulations.

Flags: needinfo?(pfuentes)

The relevant certificate was revoked in January 2020 (after 8 days). Final incident report has been provided, so I am closing this bug.

Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Whiteboard: [ca-compliance] [delayed-revocation-leaf] Next Update - 23-January 2020 → [ca-compliance] [delayed-revocation-leaf]