Closed Bug 1656487 Opened 4 years ago Closed 4 years ago

Izenpe: Failure to revoke within 5 days

Categories

(CA Program :: CA Certificate Compliance, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: o-garcia, Assigned: o-garcia)

Details

(Whiteboard: [ca-compliance] [leaf-revocation-delay])

This bug comes from https://bugzilla.mozilla.org/show_bug.cgi?id=1653284

  1. How your CA first became aware of the problem (e.g. via a problem report submitted to your Problem Reporting Mechanism, a discussion in mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the time and date.

As indicated in https://bugzilla.mozilla.org/show_bug.cgi?id=1653284, we had to revoke some certificates because of a misissuance. The incident was reported on July 16th at 15:18 UTC, so the 5 days that the Baseline Requirements allow for revoking a subscriber certificate ended on July 21st at 15:18 UTC. The certificates were revoked on July 22nd at 7:23 UTC, and we became aware of the delay in revocation on July 22nd. The certificates were therefore revoked 16 hours and 5 minutes after the deadline.
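The delay figure can be checked with a short calculation. This is only an illustrative sketch: the timestamps come from this report, the year 2020 is taken from the closing date mentioned later in this bug, and the variable names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Timestamps from the report, in UTC. The year (2020) is an assumption
# based on the closing date given later in this bug.
reported = datetime(2020, 7, 16, 15, 18, tzinfo=timezone.utc)
revoked = datetime(2020, 7, 22, 7, 23, tzinfo=timezone.utc)

# The 5-day revocation window cited in the report.
REVOCATION_WINDOW = timedelta(days=5)

deadline = reported + REVOCATION_WINDOW
delay = revoked - deadline

print(deadline.isoformat())  # 2020-07-21T15:18:00+00:00
print(delay)                 # 16:05:00
```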

  1. A timeline of the actions your CA took in response. A timeline is a date-and-time-stamped sequence of all relevant events. This may include events before the incident was reported, such as when a particular requirement became applicable, or a document changed, or a bug was introduced, or an audit was done.

July 16th at 15:18 UTC -> the incident was reported
July 21st at 15:18 UTC -> the 5-day revocation deadline expired
July 22nd at 7:23 UTC -> all affected certificates were revoked
August 4th -> As an improvement measure, we are going to add the new type "SSL certificate misissuance" to the classification in our database, to make sure such events are handled correctly.

  1. Whether your CA has stopped, or has not yet stopped, certificate issuance or the process giving rise to the problem or incident. A statement that you have stopped will be considered a pledge to the community; a statement that you have not stopped requires an explanation.

We do not have any pending revocation requests due to any misissuance.

  1. In a case involving certificates, a summary of the problematic certificates. For each problem: the number of certificates, and the date the first and last certificates with that problem were issued. In other incidents that do not involve enumerating the affected certificates (e.g. OCSP failures, audit findings, delayed responses, etc.), please provide other similar statistics, aggregates, and a summary for each type of problem identified. This will help us measure the severity of each problem.

12 certificates were involved. In this case the problem was not with their issuance, but with their revocation.

  1. In a case involving certificates, the complete certificate data for the problematic certificates. The recommended way to provide this is to ensure each certificate is logged to CT and then list the fingerprints or crt.sh IDs, either in the report or as an attached spreadsheet, with one list per distinct problem. In other cases not involving a review of affected certificates, please provide other similar, relevant specifics, if any.

https://crt.sh/?id=976518444
https://crt.sh/?id=955588634
https://crt.sh/?id=1154593917
https://crt.sh/?id=1130629695
https://crt.sh/?id=923040929
https://crt.sh/?id=967383323
https://crt.sh/?id=1154423592
https://crt.sh/?id=962032211
https://crt.sh/?id=1137176095
https://crt.sh/?id=1137206205
https://crt.sh/?id=1109849921
https://crt.sh/?id=1231234643

  1. Explanation about how and why the mistakes were made or bugs introduced, and how they avoided detection until now.

We have two different teams:
• Validation agents -> compliance review
• Issuance/revocation agents -> have access to the PKI console to issue or revoke certificates
These are the main steps of our internal procedure for security incidents:
1.- Event detection: as soon as an event is received, it is registered centrally in an internal database. If it affects a critical service (as defined by our BIA), the incident receives specific treatment. In this case it came from Bugzilla. The CSO follows the whole process.
2.- Event identification: we identify and classify the event as one of the following:
• Failure or Security Incident (Breakdown): An event in which the security of an asset has been compromised.
• Threat (problem): Detection of an agent that may cause damage to one of the Izenpe assets.
• Weakness (problem): Detection of a possibility of failure in some of the assets of Izenpe.
• Malfunction (failure): Error in any of the Izenpe computer assets, which may lead to a weakness or a security incident.
• Improvement (request): Detection of an improvement in the application of security measures.
• Privacy: all those security violations that cause the destruction, loss or accidental or illicit alteration of personal data transmitted, stored or otherwise processed, or the unauthorized communication of or access to such data.
If it is necessary to activate the Contingency Plan, we have specific steps.
In this case it was classified as a threat.
3.- Communication: depending on the type of event, it is notified to the corresponding entities (e.g. CABForum, Spanish Ministry, third parties, etc.). In this case it was notified to Bugzilla and to the customer.
4.- Event treatment: once the event has been identified and classified, the CSO is responsible for its treatment, from beginning to closure. If there is a risk of CA key compromise, there is a specific procedure. In this case Izenpe, in agreement with the customer and keeping in mind the impact of the revocation on that entity, decided to wait until the last available day to revoke. The CSO communicated the need to revoke all those certificates to the issuance/revocation team through an automatically generated email. That email specified that the request was due to a security incident, and it included (among other information) the event type, criticality and risk level. The event was classified as a threat (we have no evidence of compromise), so although the email specified that all certificates had to be revoked by July 21st, they were revoked the next day in the first working hour.
5.- Close: security events, once closed, can be used for awareness-raising practices, as examples of what might have happened, how to respond to them and how to avoid them in the future. In particular, detected safety events may be used within the internal security training plan.

  1. List of steps your CA is taking to resolve the situation and ensure that such situation or incident will not be repeated in the future, accompanied with a binding timeline of when your CA expects to accomplish each of these remediation steps.

August 5th -> add the new type "SSL certificate misissuance" to the classification in our incident database

Assignee: bwilson → o-garcia
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [ca-compliance][delayed revocation leaf]
Whiteboard: [ca-compliance][delayed revocation leaf] → [ca-compliance][delayed-revocation-leaf]

Thanks Oscar.

I think you've done a good job of describing the status quo. What I think is missing here is a description of the remediation you've implemented and how it will prevent this from recurring in the future.

For example, it's unclear what the classification will impact, and whether there are technical controls or if this is still a manual process. It doesn't really describe the steps you take to, for example, analyze your responsiveness and how effective whatever controls you have implemented are. I'm hoping you can share more details here?

Flags: needinfo?(o-garcia)

Hi Ryan, first of all, sorry for the delay. In this case the problem was that the revocation team didn't receive the information correctly, so we have focused the remediation on improving that communication.

Once the incident is detected, identified and communicated to the corresponding entities, it is registered in our database. Until August 5th, it was registered as a security threat. In the "Event Treatment" phase an email is automatically sent to the revocation team, including (among other information) the event type, criticality, risk level, and the need to revoke those certificates. But it did not include any deadline date or time for the revocation.

To reduce the risk of this happening again, we have added a new event type to our internal database called "Certificate misissuance". The email generated for the revocation team for this event type will include the deadline for revoking all affected certificates.
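As a rough illustration of this change, the sketch below builds a notification that carries an explicit revocation deadline when the event type has a known revocation window. It is not Izenpe's actual system: every name here (`SecurityEvent`, `REVOCATION_WINDOWS`, `build_notification`) and the example criticality and risk values are hypothetical, and only the 5-day window and the event type name come from this bug.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional

# Hypothetical mapping from event type to the maximum time allowed before
# revocation. Only the 5-day value for misissuance comes from this bug.
REVOCATION_WINDOWS = {"Certificate misissuance": timedelta(days=5)}


@dataclass
class SecurityEvent:
    event_type: str
    criticality: str
    risk_level: str
    reported_at: datetime
    certificate_ids: List[str]


def revocation_deadline(event: SecurityEvent) -> Optional[datetime]:
    """Return the revocation deadline, or None if the type has no window."""
    window = REVOCATION_WINDOWS.get(event.event_type)
    return event.reported_at + window if window is not None else None


def build_notification(event: SecurityEvent) -> str:
    """Build the body of the email sent to the revocation team."""
    lines = [
        f"Event type: {event.event_type}",
        f"Criticality: {event.criticality}",
        f"Risk level: {event.risk_level}",
        f"Certificates to revoke: {', '.join(event.certificate_ids)}",
    ]
    deadline = revocation_deadline(event)
    if deadline is not None:
        # The explicit deadline is the piece the earlier emails lacked.
        lines.append(f"Revoke by: {deadline:%Y-%m-%d %H:%M} UTC")
    return "\n".join(lines)


event = SecurityEvent(
    event_type="Certificate misissuance",
    criticality="high",   # illustrative value
    risk_level="medium",  # illustrative value
    reported_at=datetime(2020, 7, 16, 15, 18, tzinfo=timezone.utc),
    certificate_ids=["976518444", "955588634"],  # crt.sh IDs from this report
)
print(build_notification(event))
```

With the incident's report time, the last line of the message reads "Revoke by: 2020-07-21 15:18 UTC". An event type without a configured window, such as "Threat", produces no deadline line at all, which mirrors the gap described above.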

In any case, all security incidents are treated as soon as possible, but in this case we decided to wait until the last available moment to revoke, to reduce the impact on our customers.

Flags: needinfo?(o-garcia)

Thanks for the update.

I'm a little concerned it took nearly a month to update. Could you share what you're doing about ensuring timely communications?

Flags: needinfo?(o-garcia)

We don't have a written procedure detailed enough to define how much time we have to answer a request in Bugzilla, but obviously we must meet the Mozilla requirements, and we should also provide updates at least every week. We put the countermeasures indicated in our first comment into production on the planned date, but we didn't update this bug. This was due to summer holidays, COVID, etc. Until now we haven't had any delay in answering questions or updating posts, and we'll do our best to continue like that.
Thanks

Flags: needinfo?(o-garcia)
Flags: needinfo?(bwilson)

I believe that this issue can be closed. I'll close it on or about 6-November-2020 unless further discussion is needed.

Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Flags: needinfo?(bwilson)
Resolution: --- → FIXED
Product: NSS → CA Program
Whiteboard: [ca-compliance][delayed-revocation-leaf] → [ca-compliance] [leaf-revocation-delay]